LLM API
OpenAI-compatible API for AI language models. Reasoning-capable models may return extended metadata in model_output_metadata.reasoning, which contains inference traces. The request format is unchanged.
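The text above does not say where model_output_metadata sits inside a response, so the following Python sketch searches the decoded JSON recursively rather than assuming a fixed path. The helper name find_reasoning is hypothetical and not part of the API.

```python
from typing import Any, Optional


def find_reasoning(response: Any) -> Optional[Any]:
    """Depth-first search for model_output_metadata.reasoning in decoded JSON."""
    if isinstance(response, dict):
        metadata = response.get("model_output_metadata")
        if isinstance(metadata, dict) and "reasoning" in metadata:
            return metadata["reasoning"]
        children = list(response.values())
    elif isinstance(response, list):
        children = response
    else:
        return None  # scalars cannot contain the field
    for child in children:
        found = find_reasoning(child)
        if found is not None:
            return found
    return None
```

The function returns None when no reasoning metadata is present, which is the expected case for models that are not reasoning-capable.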
Endpoints
| Method | Path | Description |
|---|---|---|
| GET | /v1/llm/models | List models |
| GET | /v1/models | List models |
| POST | /v1/llm/chat | Chat completion |
| POST | /v1/llm/chat/completions | Chat completion |
| POST | /v1/chat/completions | Chat completion |
| POST | /v1/llm/embeddings | Create Embedding |
| POST | /v1/embeddings | Create Embedding |
GET /v1/llm/models
List models
List available models (WAYSCloud endpoint)
Response:
| Field | Type | Description |
|---|---|---|
| object | string | Always 'list' |
| data | array | List of model objects |
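As a minimal sketch of how a client might consume this response, the Python below extracts the model identifiers from a body shaped like the documented example. The function name model_ids is hypothetical, and the sample body is copied from the example response in this section.

```python
import json
from typing import List


def model_ids(body: str) -> List[str]:
    """Return the id of every model object in a list-models response body."""
    payload = json.loads(body)
    if payload.get("object") != "list":
        raise ValueError("expected a response with object == 'list'")
    return [item["id"] for item in payload.get("data", [])]


sample = """
{
  "object": "list",
  "data": [
    {"id": "mixtral-8x7b", "object": "model", "owned_by": "wayscloud", "created": 1700000000},
    {"id": "deepseek-v3", "object": "model", "owned_by": "wayscloud", "created": 1700000000}
  ]
}
"""
print(model_ids(sample))  # ['mixtral-8x7b', 'deepseek-v3']
```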
Example:
curl https://api.wayscloud.services/v1/llm/models \
-H "X-API-Key: wayscloud_llm_abc12_YOUR_SECRET"
Response:
{
"object": "list",
"data": [
{
"id": "mixtral-8x7b",
"object": "model",
"owned_by": "wayscloud",
"created": 1700000000
},
{
"id": "deepseek-v3",
"object": "model",
"owned_by": "wayscloud",
"created": 1700000000
}
]
}
GET /v1/models
List models
List available models (OpenAI-compatible endpoint)
Returns a list of all available LLM models.
Response Example:
{
"object": "list",
"data": [
{"id": "deepseek-v3", "object": "model", "owned_by": "wayscloud"},
{"id": "qwen3-235b-instruct", "object": "model", "owned_by": "wayscloud"},
{"id": "qwen3-vl-8b", "object": "model", "owned_by": "wayscloud"}
]
}
Response:
| Field | Type | Description |
|---|---|---|
| object | string | Always 'list' |
| data | array | List of model objects |
Example:
curl https://api.wayscloud.services/v1/models \
-H "X-API-Key: wayscloud_llm_abc12_YOUR_SECRET"
Response:
{
"object": "list",
"data": [
{
"id": "mixtral-8x7b",
"object": "model",
"owned_by": "wayscloud",
"created": 1700000000
},
{
"id": "deepseek-v3",
"object": "model",
"owned_by": "wayscloud",
"created": 1700000000
}
]
}
POST /v1/llm/chat
Chat completion
WAYSCloud LLM chat completion endpoint
Supports both streaming and non-streaming responses.
Request Body:
| Field | Type | Description |
|---|---|---|
| model | string | Required. Model alias (e.g., 'mixtral-8x7b') |
| messages | array | Required. Conversation messages, each with a role and content |
| stream | boolean | Optional. Enable SSE streaming |
| temperature | number | Optional. Sampling temperature (e.g., 0.7) |
| max_tokens | integer | Optional. Maximum number of tokens to generate (e.g., 256) |
| top_p | number | Optional. Nucleus sampling threshold |
| tools | array | Tool definitions for function calling. Accepted for OpenAI compatibility, not yet implemented. |
| tool_choice | string or object | Controls tool selection. Accepted for OpenAI compatibility, not yet implemented. |
| agent_id | string | Agent identifier for AI agents. Stored for logging only. |
| region | string | Preferred datacenter region for inference. Currently only 'oslo' is available. |
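The request fields above could be assembled on the client side roughly as follows. This is a sketch, not the service's own validation logic: the function name build_chat_payload is hypothetical, and it only enforces what the table states (model and messages are required, everything else is optional and omitted when unset).

```python
from typing import Any, Dict, List, Optional


def build_chat_payload(
    model: str,
    messages: List[Dict[str, str]],
    *,
    stream: bool = False,
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
    top_p: Optional[float] = None,
    agent_id: Optional[str] = None,
    region: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a chat completion request body from the documented fields."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages is required")
    payload: Dict[str, Any] = {"model": model, "messages": messages}
    if stream:
        payload["stream"] = True
    optional = {
        "temperature": temperature,
        "max_tokens": max_tokens,
        "top_p": top_p,
        "agent_id": agent_id,  # stored for logging only, per the table
        "region": region,      # only 'oslo' is currently available
    }
    # Leave unset fields out so the server applies its own defaults.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload
```

tools and tool_choice are left out of the sketch because the table marks them as accepted but not yet implemented.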
Example:
{
"model": "qwen3-235b-thinking",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "What is the capital of Norway?"
}
],
"temperature": 0.7,
"max_tokens": 256
}
Response example:
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1700000000,
"model": "qwen3-235b-thinking",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The capital of Norway is Oslo."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 24,
"completion_tokens": 8,
"total_tokens": 32
}
}
Example:
curl -X POST https://api.wayscloud.services/v1/llm/chat \
-H "X-API-Key: wayscloud_llm_abc12_YOUR_SECRET" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3-235b-thinking",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "What is the capital of Norway?"
}
],
"temperature": 0.7,
"max_tokens": 256
}'
Response:
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1700000000,
"model": "qwen3-235b-thinking",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The capital of Norway is Oslo."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 24,
"completion_tokens": 8,
"total_tokens": 32
}
}
POST /v1/llm/chat/completions
Chat completion
WAYSCloud LLM chat completion endpoint (with /completions suffix)
Alias for /v1/llm/chat for nginx compatibility. Supports both streaming and non-streaming responses.
Request Body:
| Field | Type | Description |
|---|---|---|
| model | string | Required. Model alias (e.g., 'mixtral-8x7b') |
| messages | array | Required. Conversation messages, each with a role and content |
| stream | boolean | Optional. Enable SSE streaming |
| temperature | number | Optional. Sampling temperature (e.g., 0.7) |
| max_tokens | integer | Optional. Maximum number of tokens to generate (e.g., 256) |
| top_p | number | Optional. Nucleus sampling threshold |
| tools | array | Tool definitions for function calling. Accepted for OpenAI compatibility, not yet implemented. |
| tool_choice | string or object | Controls tool selection. Accepted for OpenAI compatibility, not yet implemented. |
| agent_id | string | Agent identifier for AI agents. Stored for logging only. |
| region | string | Preferred datacenter region for inference. Currently only 'oslo' is available. |
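When stream is true the endpoint responds with SSE. The exact chunk layout is not documented here, so the Python sketch below assumes the OpenAI streaming convention: each event is a line of the form "data: {json}", text arrives in choices[].delta.content, and the stream ends with "data: [DONE]". The function name iter_stream_text and the sample lines are illustrative only.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from SSE lines (assumed OpenAI-style chunks)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hypothetical stream, written by hand to match the assumed format.
sample_lines = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Once upon"}}]}',
    'data: {"choices": [{"index": 0, "delta": {"content": " a time"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample_lines)))  # Once upon a time
```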
Example:
{
"model": "qwen3-235b-thinking",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "What is the capital of Norway?"
}
],
"temperature": 0.7,
"max_tokens": 256
}
Response example:
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1700000000,
"model": "qwen3-235b-thinking",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The capital of Norway is Oslo."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 24,
"completion_tokens": 8,
"total_tokens": 32
}
}
Example:
curl -X POST https://api.wayscloud.services/v1/llm/chat/completions \
-H "X-API-Key: wayscloud_llm_abc12_YOUR_SECRET" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3-235b-thinking",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "What is the capital of Norway?"
}
],
"temperature": 0.7,
"max_tokens": 256
}'
Response:
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1700000000,
"model": "qwen3-235b-thinking",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The capital of Norway is Oslo."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 24,
"completion_tokens": 8,
"total_tokens": 32
}
}
POST /v1/chat/completions
Chat completion
OpenAI-compatible chat completion endpoint
Drop-in replacement for OpenAI's /v1/chat/completions endpoint. Compatible with the OpenAI SDK and other OpenAI-compatible clients.
Features:
- Non-streaming and streaming (SSE) responses
- Temperature and max_tokens control
- Agent fields (tools, tool_choice, agent_id) accepted for OpenAI compatibility; tool calling is not yet implemented and agent_id is stored for logging only
- Automatic token counting and billing
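For clients that do not use an SDK, a request to this endpoint could be prepared with the Python standard library as sketched below. The base URL and the X-API-Key header are taken from the curl examples in this document; the helper names chat_request and send are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "https://api.wayscloud.services"


def chat_request(
    api_key: str,
    payload: Dict[str, Any],
    path: str = "/v1/chat/completions",
) -> urllib.request.Request:
    """Prepare (but do not send) a POST request for a chat completion."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> Dict[str, Any]:
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


request = chat_request(
    "wayscloud_llm_abc12_YOUR_SECRET",
    {
        "model": "mixtral-8x7b",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 100,
    },
)
# result = send(request)  # performs the network call; needs a valid key
```

Passing path="/v1/llm/chat" or path="/v1/llm/chat/completions" targets the other chat endpoints, which accept the same request body.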
Request Example:
{
"model": "mixtral-8x7b",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"}
],
"temperature": 0.7,
"max_tokens": 100
}
Streaming Example:
{
"model": "deepseek-v3",
"messages": [{"role": "user", "content": "Write a story"}],
"stream": true,
"max_tokens": 500
}
AI Agent Example:
{
"model": "deepseek-v3",
"messages": [{"role": "user", "content": "Help me code"}],
"agent_id": "ephemeral",
"tools": [],
"tool_choice": "auto"
}
Available Models:
Chat (general purpose):
- deepseek-v3, deepseek-v3.1: Versatile, 131k context
- qwen3-235b-instruct: Flagship MoE, 262k context
- llama-3.3-70b: Enterprise, 131k context
- mixtral-8x7b: Fast lightweight MoE
- gpt-oss-120b, gpt-oss-20b: Open-weight
- gemma-3n-e4b: Ultra-lightweight, cheapest
- kimi-k2.5: Long-context conversational
- qwen3.5-9b: Lightweight, 262k context
Reasoning:
- deepseek-r1: Advanced reasoning
- qwen3-235b-thinking: MoE reasoning
Code:
- qwen3-coder-480b: Code generation
Vision (multimodal):
- qwen3-vl-8b: Text + image input
Embeddings:
- embedding-multilingual: 1024-dim, for RAG
Moderation:
- llamaguard-4: Content safety
Request Body:
| Field | Type | Description |
|---|---|---|
| model | string | Required. Model alias (e.g., 'mixtral-8x7b') |
| messages | array | Required. Conversation messages, each with a role and content |
| stream | boolean | Optional. Enable SSE streaming |
| temperature | number | Optional. Sampling temperature (e.g., 0.7) |
| max_tokens | integer | Optional. Maximum number of tokens to generate (e.g., 256) |
| top_p | number | Optional. Nucleus sampling threshold |
| tools | array | Tool definitions for function calling. Accepted for OpenAI compatibility, not yet implemented. |
| tool_choice | string or object | Controls tool selection. Accepted for OpenAI compatibility, not yet implemented. |
| agent_id | string | Agent identifier for AI agents. Stored for logging only. |
| region | string | Preferred datacenter region for inference. Currently only 'oslo' is available. |
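A non-streaming response shaped like the response example for this endpoint could be read as in the Python sketch below. The function name read_completion is hypothetical, and the sample body is an abbreviated copy of that documented example.

```python
import json
from typing import Any, Dict


def read_completion(body: str) -> Dict[str, Any]:
    """Pull the assistant text, finish reason, and token usage from a response."""
    data = json.loads(body)
    choice = data["choices"][0]  # the documented example returns one choice
    usage = data.get("usage", {})
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice.get("finish_reason"),
        "total_tokens": usage.get("total_tokens"),
    }


sample_body = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "qwen3-235b-thinking",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "The capital of Norway is Oslo."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 24, "completion_tokens": 8, "total_tokens": 32}
}
"""
print(read_completion(sample_body)["text"])  # The capital of Norway is Oslo.
```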
Example:
{
"model": "qwen3-235b-thinking",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "What is the capital of Norway?"
}
],
"temperature": 0.7,
"max_tokens": 256
}
Response example:
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1700000000,
"model": "qwen3-235b-thinking",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The capital of Norway is Oslo."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 24,
"completion_tokens": 8,
"total_tokens": 32
}
}
Example:
curl -X POST https://api.wayscloud.services/v1/chat/completions \
-H "X-API-Key: wayscloud_llm_abc12_YOUR_SECRET" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3-235b-thinking",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "What is the capital of Norway?"
}
],
"temperature": 0.7,
"max_tokens": 256
}'
Response:
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1700000000,
"model": "qwen3-235b-thinking",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The capital of Norway is Oslo."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 24,
"completion_tokens": 8,
"total_tokens": 32
}
}
POST /v1/llm/embeddings
Create Embedding
OpenAI-compatible embeddings endpoint.
Request body:
Response: OpenAI-compatible embedding list.
Example:
curl -X POST https://api.wayscloud.services/v1/llm/embeddings \
-H "X-API-Key: wayscloud_llm_abc12_YOUR_SECRET"
POST /v1/embeddings
Create Embedding
OpenAI-compatible embeddings endpoint.
Request body:
Response: OpenAI-compatible embedding list.
Example:
curl -X POST https://api.wayscloud.services/v1/embeddings \
-H "X-API-Key: wayscloud_llm_abc12_YOUR_SECRET"
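The request body for the embeddings endpoints is not documented above. Since they are described as OpenAI-compatible, the Python sketch below assumes OpenAI's format: a body with "model" and "input", and a response whose data items each carry an "index" and an "embedding". The helper names embedding_request and vectors are hypothetical; the default model comes from the Available Models list.

```python
import json
import urllib.request
from typing import Any, Dict, List

BASE_URL = "https://api.wayscloud.services"


def embedding_request(
    api_key: str,
    texts: List[str],
    model: str = "embedding-multilingual",
    path: str = "/v1/embeddings",
) -> urllib.request.Request:
    """Prepare (but do not send) an embeddings request."""
    # ASSUMPTION: "model" and "input" follow OpenAI's embeddings request format.
    body = {"model": model, "input": texts}
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def vectors(response: Dict[str, Any]) -> List[List[float]]:
    """Return embeddings ordered by input position."""
    # ASSUMPTION: OpenAI-style list where each item has "index" and "embedding".
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]
```

The same helper can target /v1/llm/embeddings by passing path="/v1/llm/embeddings".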