
Chat Completions

The chat completions endpoint is the core of the LLM API, allowing you to have conversations with AI models.

Endpoint

POST /v1/chat/completions
POST /v1/llm/chat (WAYSCloud native)

Request Format

{
  "model": "mixtral-8x7b",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Hello!"}
  ],
  "temperature": 0.7,
  "max_tokens": 1000,
  "top_p": 0.9,
  "stream": false
}

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | - | Model ID (see Models) |
| messages | array | Yes | - | Conversation history |
| temperature | float | No | 0.7 | Sampling randomness (0.0-2.0) |
| max_tokens | integer | No | 1000 | Maximum tokens to generate |
| top_p | float | No | 1.0 | Nucleus sampling (0.0-1.0) |
| stream | boolean | No | false | Enable streaming |
| stop | string/array | No | null | Stop sequences |
| presence_penalty | float | No | 0.0 | Positive values push toward new topics (-2.0 to 2.0) |
| frequency_penalty | float | No | 0.0 | Positive values reduce verbatim repetition (-2.0 to 2.0) |
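
The stop and penalty parameters do not appear in the examples below, so here is a minimal sketch of a request body that exercises them. The field names follow the table above; the specific values (and the prompt) are illustrative, not recommendations.

```python
# Request body exercising the optional sampling controls from the table above.
payload = {
    "model": "mixtral-8x7b",
    "messages": [{"role": "user", "content": "List three Norwegian cities."}],
    "max_tokens": 200,
    "temperature": 0.3,        # low randomness for factual output
    "stop": ["\n\n"],          # cut generation at the first blank line
    "frequency_penalty": 0.5,  # discourage repeating the same tokens
}
```

This dict can be passed as the `json=` argument to `requests.post`, exactly as in the Python examples below.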

Message Roles

  • system - Instructions for the model's behavior
  • user - User messages
  • assistant - Assistant responses (for multi-turn conversations)

Example Requests

Simple Request

curl -X POST "https://api.wayscloud.services/v1/chat/completions" \
  -H "Authorization: Bearer wayscloud_llm_abc123_YourSecretKey" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mixtral-8x7b",
    "messages": [
      {"role": "user", "content": "What is the capital of Norway?"}
    ]
  }'

With System Prompt

curl -X POST "https://api.wayscloud.services/v1/chat/completions" \
  -H "Authorization: Bearer $WAYSCLOUD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-80b-instruct",
    "messages": [
      {"role": "system", "content": "You are a Norwegian history expert. Always respond in Norwegian."},
      {"role": "user", "content": "Tell me about the Viking Age"}
    ],
    "temperature": 0.7,
    "max_tokens": 500
  }'

Multi-Turn Conversation

{
  "model": "mixtral-8x7b",
  "messages": [
    {"role": "system", "content": "You are a helpful coding assistant"},
    {"role": "user", "content": "How do I read a file in Python?"},
    {"role": "assistant", "content": "You can use the `open()` function: `with open('file.txt') as f: content = f.read()`"},
    {"role": "user", "content": "How do I write to a file?"}
  ]
}

Python Examples

Basic Request

import requests
import os

API_KEY = os.getenv('WAYSCLOUD_API_KEY')

response = requests.post(
    'https://api.wayscloud.services/v1/chat/completions',
    headers={
        'Authorization': f'Bearer {API_KEY}',
        'Content-Type': 'application/json'
    },
    json={
        'model': 'mixtral-8x7b',
        'messages': [
            {'role': 'user', 'content': 'Hello!'}
        ]
    }
)

result = response.json()
print(result['choices'][0]['message']['content'])

Using OpenAI SDK

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.getenv('WAYSCLOUD_API_KEY'),
    base_url='https://api.wayscloud.services/v1'
)

response = client.chat.completions.create(
    model='mixtral-8x7b',
    messages=[
        {'role': 'user', 'content': 'Hello!'}
    ]
)

print(response.choices[0].message.content)

Conversation Manager

import os
import requests

class ChatSession:
    def __init__(self, api_key, model='mixtral-8x7b', system_prompt=None):
        self.api_key = api_key
        self.model = model
        self.messages = []

        if system_prompt:
            self.messages.append({'role': 'system', 'content': system_prompt})

    def send(self, user_message):
        self.messages.append({'role': 'user', 'content': user_message})

        response = requests.post(
            'https://api.wayscloud.services/v1/chat/completions',
            headers={
                'Authorization': f'Bearer {self.api_key}',
                'Content-Type': 'application/json'
            },
            json={
                'model': self.model,
                'messages': self.messages
            }
        )

        result = response.json()
        assistant_message = result['choices'][0]['message']['content']

        self.messages.append({'role': 'assistant', 'content': assistant_message})

        return assistant_message

# Usage
chat = ChatSession(
    api_key=os.getenv('WAYSCLOUD_API_KEY'),
    system_prompt='You are a helpful Python programming assistant'
)

print(chat.send('How do I make HTTP requests?'))
print(chat.send('Show me an example with error handling'))

JavaScript Example

const axios = require('axios');

async function chat(message) {
  const response = await axios.post(
    'https://api.wayscloud.services/v1/chat/completions',
    {
      model: 'mixtral-8x7b',
      messages: [{ role: 'user', content: message }]
    },
    {
      headers: {
        'Authorization': `Bearer ${process.env.WAYSCLOUD_API_KEY}`,
        'Content-Type': 'application/json'
      }
    }
  );

  return response.data.choices[0].message.content;
}

// Usage (top-level await is not available in CommonJS, so use .then)
chat('What is the capital of Norway?').then(answer => console.log(answer));

Response Format

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1699012345,
  "model": "mixtral-8x7b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of Norway is Oslo."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 8,
    "total_tokens": 23
  }
}
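
The usage block makes token accounting straightforward, which helps with the cost monitoring recommended under Best Practices. A minimal sketch of accumulating usage across requests (the `UsageTracker` class is illustrative, not part of the API):

```python
class UsageTracker:
    """Accumulate token counts from the usage block of each response."""

    def __init__(self):
        self.prompt_tokens = 0
        self.completion_tokens = 0

    def record(self, result):
        # result is the parsed JSON response shown above
        usage = result.get("usage", {})
        self.prompt_tokens += usage.get("prompt_tokens", 0)
        self.completion_tokens += usage.get("completion_tokens", 0)

    @property
    def total_tokens(self):
        return self.prompt_tokens + self.completion_tokens

tracker = UsageTracker()
tracker.record({"usage": {"prompt_tokens": 15, "completion_tokens": 8, "total_tokens": 23}})
print(tracker.total_tokens)  # 23
```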

Finish Reasons

  • stop - Model finished naturally
  • length - Hit max_tokens limit
  • content_filter - Content filtered
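
It is worth checking finish_reason before trusting the output. A minimal sketch, using the response shape shown above; the handling of truncated replies here is one possible policy, not part of the API:

```python
def extract_reply(result):
    """Return the assistant text, flagging truncation via finish_reason."""
    choice = result["choices"][0]
    text = choice["message"]["content"]
    if choice["finish_reason"] == "length":
        # Hit the max_tokens limit: the reply is cut off mid-thought.
        text += " [truncated: raise max_tokens or ask the model to continue]"
    return text

# Example with the response shape shown above
sample = {
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "The capital of Norway is Oslo."},
        "finish_reason": "stop",
    }]
}
print(extract_reply(sample))  # The capital of Norway is Oslo.
```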

Error Responses

400 Bad Request

{
  "error": {
    "message": "Invalid model specified",
    "type": "invalid_request_error",
    "code": "invalid_model"
  }
}

429 Rate Limit

{
  "error": {
    "message": "Rate limit exceeded",
    "type": "rate_limit_error"
  }
}

Best Practices

  1. Set max_tokens to prevent excessive costs
  2. Use system prompts for consistent behavior
  3. Implement retry logic for rate limits
  4. Cache responses when appropriate
  5. Monitor token usage for cost control
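
Retry logic for rate limits (practice 3) can be as simple as exponential backoff on HTTP 429. A sketch using requests; the retry count and delay schedule are illustrative choices, not WAYSCloud requirements:

```python
import time
import requests

def post_with_retry(url, headers, body, max_retries=3):
    """POST, retrying with exponential backoff when the API returns 429."""
    for attempt in range(max_retries + 1):
        response = requests.post(url, headers=headers, json=body)
        if response.status_code != 429 or attempt == max_retries:
            return response
        time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s, ...
    return response
```

Any non-429 response (including other errors) is returned to the caller unchanged, so normal error handling still applies.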

Next Steps