LLM API

AI language models (OpenAI-compatible). Reasoning-capable models may return extended metadata in model_output_metadata.reasoning containing inference traces, without changing the request format.
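The page does not say where model_output_metadata is attached in the response. The sketch below is a minimal, defensive reader that assumes it may sit at the top level, on a choice, or on a choice's message; the helper name extract_reasoning is hypothetical.

```python
from typing import Any, Optional


def extract_reasoning(response: dict[str, Any]) -> Optional[Any]:
    """Return model_output_metadata.reasoning if present, else None.

    Assumption: the location of model_output_metadata is not documented,
    so the top level, each choice, and each message are all checked.
    """
    candidates = [response]
    for choice in response.get("choices", []):
        candidates.append(choice)
        candidates.append(choice.get("message") or {})
    for obj in candidates:
        metadata = obj.get("model_output_metadata")
        if isinstance(metadata, dict) and "reasoning" in metadata:
            return metadata["reasoning"]
    return None
```

Responses from models that return no reasoning metadata yield None, so callers can use the same code path for every model.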

Endpoints

| Method | Path | Description |
| --- | --- | --- |
| GET | /v1/llm/models | List models |
| GET | /v1/models | List models |
| POST | /v1/llm/chat | Chat completion |
| POST | /v1/llm/chat/completions | Chat completion |
| POST | /v1/chat/completions | Chat completion |
| POST | /v1/llm/embeddings | Create Embedding |
| POST | /v1/embeddings | Create Embedding |
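
All of the paths above share the same host and, in the curl examples on this page, the same X-API-Key header. A small helper along these lines can build a request for any of them. The helper name build_request is hypothetical, and choosing GET or POST from the presence of a payload is a convenience of this sketch, not something the API defines.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://api.wayscloud.services"


def build_request(path: str, api_key: str,
                  payload: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request for one of the listed paths.

    GET is used when there is no payload, POST with a JSON body otherwise.
    """
    headers = {"X-API-Key": api_key}
    data = None
    if payload is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(payload).encode("utf-8")
    method = "GET" if payload is None else "POST"
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)
```

Passing the result to urllib.request.urlopen sends the request.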

GET /v1/llm/models

List models

List available models (WAYSCloud endpoint)

Response:

| Field | Type | Description |
| --- | --- | --- |
| object | string | Values: list |
| data | array | |

Example:

bash
curl https://api.wayscloud.services/v1/llm/models \
  -H "X-API-Key: wayscloud_llm_abc12_YOUR_SECRET"

Response:

json
{
  "object": "list",
  "data": [
    {
      "id": "mixtral-8x7b",
      "object": "model",
      "owned_by": "wayscloud",
      "created": 1700000000
    },
    {
      "id": "deepseek-v3",
      "object": "model",
      "owned_by": "wayscloud",
      "created": 1700000000
    }
  ]
}
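
As a rough Python counterpart to the curl call, the sketch below fetches the listing and extracts the model ids. The function names are hypothetical; the parsing relies only on the object, data, and id fields shown in the response above.

```python
import json
import urllib.request

MODELS_URL = "https://api.wayscloud.services/v1/llm/models"


def model_ids(listing: dict) -> list[str]:
    """Extract model ids from a parsed list-models response."""
    if listing.get("object") != "list":
        raise ValueError("expected an object of type 'list'")
    return [entry["id"] for entry in listing.get("data", [])]


def fetch_model_ids(api_key: str) -> list[str]:
    """Call the endpoint and return the ids (performs a network request)."""
    req = urllib.request.Request(MODELS_URL, headers={"X-API-Key": api_key})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return model_ids(json.load(resp))
```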

GET /v1/models

List models

List available models (OpenAI-compatible endpoint)

Returns a list of all available LLM models.

Response Example:

json
{
  "object": "list",
  "data": [
    {"id": "deepseek-v3", "object": "model", "owned_by": "wayscloud"},
    {"id": "qwen3-235b-instruct", "object": "model", "owned_by": "wayscloud"},
    {"id": "qwen3-vl-8b", "object": "model", "owned_by": "wayscloud"}
  ]
}

Response:

| Field | Type | Description |
| --- | --- | --- |
| object | string | Values: list |
| data | array | |

Example:

bash
curl https://api.wayscloud.services/v1/models \
  -H "X-API-Key: wayscloud_llm_abc12_YOUR_SECRET"

Response:

json
{
  "object": "list",
  "data": [
    {
      "id": "mixtral-8x7b",
      "object": "model",
      "owned_by": "wayscloud",
      "created": 1700000000
    },
    {
      "id": "deepseek-v3",
      "object": "model",
      "owned_by": "wayscloud",
      "created": 1700000000
    }
  ]
}

POST /v1/llm/chat

Chat completion

WAYSCloud LLM chat completion endpoint

Supports both streaming and non-streaming responses.

Request Body:

| Field | Type | Description |
| --- | --- | --- |
| model | string | Required. Model alias (e.g., 'mixtral-8x7b') |
| messages | array | Required. |
| stream | boolean | Enable SSE streaming |
| temperature | object | |
| max_tokens | object | |
| top_p | object | |
| tools | object | Tool definitions for function calling. Accepted for OpenAI compatibility, not yet implemented. |
| tool_choice | object | Controls tool selection. Accepted for OpenAI compatibility, not yet implemented. |
| agent_id | object | Agent identifier for AI agents. Stored for logging only. |
| region | object | Preferred datacenter region for inference. Currently only 'oslo' is available. |

Example:

json
{
  "model": "qwen3-235b-thinking",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is the capital of Norway?"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 256
}
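
A body like the one above can be assembled programmatically. The sketch below enforces the two fields marked Required and leaves out optional fields that are not set. It assumes that omitting unset optional fields is acceptable, which this page implies but does not state; build_chat_payload is a hypothetical name.

```python
from typing import Any, Optional


def build_chat_payload(
    model: str,
    messages: list[dict[str, str]],
    *,
    stream: bool = False,
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
    top_p: Optional[float] = None,
) -> dict[str, Any]:
    """Assemble a chat request body, omitting optional fields left unset."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages is required")
    payload: dict[str, Any] = {"model": model, "messages": messages}
    if stream:
        payload["stream"] = True
    optional = {"temperature": temperature, "max_tokens": max_tokens, "top_p": top_p}
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```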

Response example:

json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "qwen3-235b-thinking",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of Norway is Oslo."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 8,
    "total_tokens": 32
  }
}
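
To read a response of this shape in Python, a sketch such as the following pulls out the first choice's text, the finish reason, and the token usage. It relies only on the fields shown in the example; summarize_completion is a hypothetical name.

```python
from typing import Any


def summarize_completion(response: dict[str, Any]) -> dict[str, Any]:
    """Pull the first choice's text, finish reason, and token usage."""
    choice = response["choices"][0]
    usage = response.get("usage", {})
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice.get("finish_reason"),
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
    }
```

In the example response, total_tokens (32) is the sum of prompt_tokens (24) and completion_tokens (8).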

Example:

bash
curl -X POST https://api.wayscloud.services/v1/llm/chat \
  -H "X-API-Key: wayscloud_llm_abc12_YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "qwen3-235b-thinking",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is the capital of Norway?"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 256
}'

POST /v1/llm/chat/completions

Chat completion

WAYSCloud LLM chat completion endpoint (with /completions suffix)

Alias for /v1/llm/chat, provided for nginx compatibility. Supports both streaming and non-streaming responses.

Request Body:

| Field | Type | Description |
| --- | --- | --- |
| model | string | Required. Model alias (e.g., 'mixtral-8x7b') |
| messages | array | Required. |
| stream | boolean | Enable SSE streaming |
| temperature | object | |
| max_tokens | object | |
| top_p | object | |
| tools | object | Tool definitions for function calling. Accepted for OpenAI compatibility, not yet implemented. |
| tool_choice | object | Controls tool selection. Accepted for OpenAI compatibility, not yet implemented. |
| agent_id | object | Agent identifier for AI agents. Stored for logging only. |
| region | object | Preferred datacenter region for inference. Currently only 'oslo' is available. |

Example:

json
{
  "model": "qwen3-235b-thinking",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is the capital of Norway?"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 256
}

Response example:

json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "qwen3-235b-thinking",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of Norway is Oslo."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 8,
    "total_tokens": 32
  }
}

Example:

bash
curl -X POST https://api.wayscloud.services/v1/llm/chat/completions \
  -H "X-API-Key: wayscloud_llm_abc12_YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "qwen3-235b-thinking",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is the capital of Norway?"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 256
}'

POST /v1/chat/completions

Chat completion

OpenAI-compatible chat completion endpoint

Drop-in replacement for OpenAI's /v1/chat/completions endpoint. Compatible with the OpenAI SDK and other OpenAI-compatible clients.

Features:

  • Non-streaming and streaming (SSE) responses
  • Temperature and max_tokens control
  • Agent-related fields accepted (tools, agent_id, tool_choice); tools and tool_choice are not yet implemented, and agent_id is stored for logging only
  • Automatic token counting and billing

Request Example:

json
{
  "model": "mixtral-8x7b",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "temperature": 0.7,
  "max_tokens": 100
}

Streaming Example:

json
{
  "model": "deepseek-v3",
  "messages": [{"role": "user", "content": "Write a story"}],
  "stream": true,
  "max_tokens": 500
}
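
This page does not show the streamed chunk format. The sketch below assumes the OpenAI streaming convention, since the endpoint is described as a drop-in replacement: each event is a "data: {json}" line carrying text under choices[].delta.content, and the stream ends with "data: [DONE]". The function name iter_stream_text is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from SSE lines.

    Assumption: chunks follow the OpenAI streaming convention
    ("data: {json}" lines, terminated by "data: [DONE]").
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

With urllib, the lines can come from the open response, for example (line.decode("utf-8") for line in resp).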

AI Agent Example:

json
{
  "model": "deepseek-v3",
  "messages": [{"role": "user", "content": "Help me code"}],
  "agent_id": "ephemeral",
  "tools": [],
  "tool_choice": "auto"
}

Available Models:

Chat — general purpose:

  • deepseek-v3, deepseek-v3.1 - Versatile, 131k context
  • qwen3-235b-instruct - Flagship MoE, 262k context
  • llama-3.3-70b - Enterprise, 131k context
  • mixtral-8x7b - Fast lightweight MoE
  • gpt-oss-120b, gpt-oss-20b - Open-weight
  • gemma-3n-e4b - Ultra-lightweight, cheapest
  • kimi-k2.5 - Long-context conversational
  • qwen3.5-9b - Lightweight, 262k context

Reasoning:

  • deepseek-r1 - Advanced reasoning
  • qwen3-235b-thinking - MoE reasoning

Code:

  • qwen3-coder-480b - Code generation

Vision (multimodal):

  • qwen3-vl-8b - Text + image input

Embeddings:

  • embedding-multilingual - 1024-dim, for RAG

Moderation:

  • llamaguard-4 - Content safety
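
The categories above can be kept client-side as a simple lookup, for example to catch an embeddings alias being passed to a chat call before the request is sent. This is only a snapshot of the list on this page, and the names MODELS_BY_CATEGORY and category_of are hypothetical; the list-models endpoints remain the authoritative source.

```python
# Snapshot of the aliases listed on this page, grouped by category.
MODELS_BY_CATEGORY = {
    "chat": [
        "deepseek-v3", "deepseek-v3.1", "qwen3-235b-instruct", "llama-3.3-70b",
        "mixtral-8x7b", "gpt-oss-120b", "gpt-oss-20b", "gemma-3n-e4b",
        "kimi-k2.5", "qwen3.5-9b",
    ],
    "reasoning": ["deepseek-r1", "qwen3-235b-thinking"],
    "code": ["qwen3-coder-480b"],
    "vision": ["qwen3-vl-8b"],
    "embeddings": ["embedding-multilingual"],
    "moderation": ["llamaguard-4"],
}


def category_of(model: str) -> str:
    """Return the category a model alias is listed under."""
    for category, names in MODELS_BY_CATEGORY.items():
        if model in names:
            return category
    raise KeyError(f"unknown model alias: {model}")
```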

Request Body:

| Field | Type | Description |
| --- | --- | --- |
| model | string | Required. Model alias (e.g., 'mixtral-8x7b') |
| messages | array | Required. |
| stream | boolean | Enable SSE streaming |
| temperature | object | |
| max_tokens | object | |
| top_p | object | |
| tools | object | Tool definitions for function calling. Accepted for OpenAI compatibility, not yet implemented. |
| tool_choice | object | Controls tool selection. Accepted for OpenAI compatibility, not yet implemented. |
| agent_id | object | Agent identifier for AI agents. Stored for logging only. |
| region | object | Preferred datacenter region for inference. Currently only 'oslo' is available. |

Example:

json
{
  "model": "qwen3-235b-thinking",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is the capital of Norway?"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 256
}

Response example:

json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "qwen3-235b-thinking",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of Norway is Oslo."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 8,
    "total_tokens": 32
  }
}

Example:

bash
curl -X POST https://api.wayscloud.services/v1/chat/completions \
  -H "X-API-Key: wayscloud_llm_abc12_YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "qwen3-235b-thinking",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is the capital of Norway?"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 256
}'

POST /v1/llm/embeddings

Create Embedding

OpenAI-compatible embeddings endpoint.

Request body:

Response: OpenAI-compatible embedding list.

Example:

bash
curl -X POST https://api.wayscloud.services/v1/llm/embeddings \
  -H "X-API-Key: wayscloud_llm_abc12_YOUR_SECRET"
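
The request body is not listed on this page. Since the endpoint is described as OpenAI-compatible, the sketch below assumes the OpenAI field names ("model" and "input" in the request, data[].embedding and data[].index in the response). The embedding-multilingual alias comes from the model list on this page; the function names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Union

EMBEDDINGS_URL = "https://api.wayscloud.services/v1/llm/embeddings"


def build_embeddings_request(api_key: str, texts: Union[str, list[str]],
                             model: str = "embedding-multilingual") -> urllib.request.Request:
    """Build (but do not send) an embeddings request.

    Assumption: the body uses the OpenAI field names "model" and "input".
    """
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    headers = {"X-API-Key": api_key, "Content-Type": "application/json"}
    return urllib.request.Request(EMBEDDINGS_URL, data=body,
                                  headers=headers, method="POST")


def vectors_from(response: dict[str, Any]) -> list[list[float]]:
    """Return embedding vectors ordered by their "index" field.

    Assumption: the response follows the OpenAI list shape (data[].embedding).
    """
    items = sorted(response.get("data", []), key=lambda item: item.get("index", 0))
    return [item["embedding"] for item in items]
```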

POST /v1/embeddings

Create Embedding

OpenAI-compatible embeddings endpoint.

Request body:

Response: OpenAI-compatible embedding list.

Example:

bash
curl -X POST https://api.wayscloud.services/v1/embeddings \
  -H "X-API-Key: wayscloud_llm_abc12_YOUR_SECRET"