LLM API

OpenAI-compatible API for AI language models. Reasoning-capable models may also return inference traces in `model_output_metadata.reasoning`; the request format is the same for all models.
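As a rough illustration of reading that metadata, the Python sketch below looks for `model_output_metadata.reasoning` in a parsed response. The documentation does not say where `model_output_metadata` sits in the response, so checking both the top level and each choice's `message` is an assumption, as is the sample payload. `extract_reasoning` is a hypothetical helper name.

```python
from typing import Any, Optional


def extract_reasoning(response: dict[str, Any]) -> Optional[Any]:
    """Return model_output_metadata.reasoning if present, else None.

    Assumption: the metadata may sit at the top level of the response or
    inside a choice's message; the documentation does not specify which.
    """
    candidates = [response]
    for choice in response.get("choices", []):
        candidates.append(choice.get("message", {}))
    for container in candidates:
        metadata = container.get("model_output_metadata")
        if isinstance(metadata, dict) and "reasoning" in metadata:
            return metadata["reasoning"]
    return None


# Hypothetical response shape, for illustration only.
sample = {
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "Oslo."}}
    ],
    "model_output_metadata": {"reasoning": "Norway's capital is Oslo."},
}
print(extract_reasoning(sample))
```

Returning `None` when the field is absent lets the same client code handle reasoning and non-reasoning models.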

Endpoints

Method	Path	Description
`GET`	`/v1/llm/models`	List models
`GET`	`/v1/models`	List models
`POST`	`/v1/llm/chat`	Chat completion
`POST`	`/v1/llm/chat/completions`	Chat completion
`POST`	`/v1/chat/completions`	Chat completion
`POST`	`/v1/llm/embeddings`	Create Embedding
`POST`	`/v1/embeddings`	Create Embedding

GET /v1/llm/models

List models

List available models (WAYSCloud endpoint)

Response:

Field	Type	Description
`object`	`string`	Always `list`
`data`	`array`	List of model objects

Example:

bash

curl https://api.wayscloud.services/v1/llm/models \
  -H "X-API-Key: wayscloud_llm_abc12_YOUR_SECRET"

Response:

json

{
  "object": "list",
  "data": [
    {
      "id": "mixtral-8x7b",
      "object": "model",
      "owned_by": "wayscloud",
      "created": 1700000000
    },
    {
      "id": "deepseek-v3",
      "object": "model",
      "owned_by": "wayscloud",
      "created": 1700000000
    }
  ]
}
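A minimal Python sketch of reading this response, using the example body above as input. `model_ids` is a hypothetical helper name, and no network call is made here.

```python
import json

# Response body copied from the example above.
body = """
{
  "object": "list",
  "data": [
    {"id": "mixtral-8x7b", "object": "model", "owned_by": "wayscloud", "created": 1700000000},
    {"id": "deepseek-v3", "object": "model", "owned_by": "wayscloud", "created": 1700000000}
  ]
}
"""


def model_ids(payload: dict) -> list[str]:
    """Return the id of every model in a list-models response."""
    if payload.get("object") != "list":
        raise ValueError("expected a response with object == 'list'")
    return [model["id"] for model in payload.get("data", [])]


print(model_ids(json.loads(body)))
```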

GET /v1/models

List models

List available models (OpenAI-compatible endpoint)

Returns a list of all available LLM models.

Response Example:

json

{
  "object": "list",
  "data": [
    {"id": "deepseek-v3", "object": "model", "owned_by": "wayscloud"},
    {"id": "qwen3-235b-instruct", "object": "model", "owned_by": "wayscloud"},
    {"id": "qwen3-vl-8b", "object": "model", "owned_by": "wayscloud"}
  ]
}

Response:

Field	Type	Description
`object`	`string`	Always `list`
`data`	`array`	List of model objects

Example:

bash

curl https://api.wayscloud.services/v1/models \
  -H "X-API-Key: wayscloud_llm_abc12_YOUR_SECRET"

Response:

json

{
  "object": "list",
  "data": [
    {
      "id": "mixtral-8x7b",
      "object": "model",
      "owned_by": "wayscloud",
      "created": 1700000000
    },
    {
      "id": "deepseek-v3",
      "object": "model",
      "owned_by": "wayscloud",
      "created": 1700000000
    }
  ]
}
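The curl call above could be sketched in Python with only the standard library, as below. `build_models_request` and `list_models` are hypothetical helper names. The base URL and `X-API-Key` header are taken from the curl example, while the 30-second timeout is an arbitrary choice.

```python
import json
import urllib.request

BASE_URL = "https://api.wayscloud.services"


def build_models_request(api_key: str, openai_path: bool = True) -> urllib.request.Request:
    """Build (but do not send) a GET request for the model list."""
    path = "/v1/models" if openai_path else "/v1/llm/models"
    return urllib.request.Request(
        BASE_URL + path,
        headers={"X-API-Key": api_key},
        method="GET",
    )


def list_models(api_key: str) -> list[str]:
    """Send the request and return model ids (performs a network call)."""
    with urllib.request.urlopen(build_models_request(api_key), timeout=30) as resp:
        payload = json.load(resp)
    return [model["id"] for model in payload["data"]]


req = build_models_request("wayscloud_llm_abc12_YOUR_SECRET")
print(req.get_method(), req.full_url)
```

Only `list_models` touches the network; the request builder can be checked offline.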

POST /v1/llm/chat

Chat completion

WAYSCloud LLM chat completion endpoint

Supports both streaming and non-streaming responses.

Request Body:

Field	Type	Description
`model`	`string`	Required. Model alias (e.g., 'mixtral-8x7b')
`messages`	`array`	Required. Conversation messages, each an object with `role` and `content`.
`stream`	`boolean`	Enable SSE streaming
`temperature`	`number`	Optional. Sampling temperature (e.g., 0.7).
`max_tokens`	`integer`	Optional. Maximum number of tokens to generate (e.g., 256).
`top_p`	`number`	Optional. Nucleus sampling parameter.
`tools`	`array`	Tool definitions for function calling. Accepted for OpenAI compatibility, not yet implemented.
`tool_choice`	`string` or `object`	Controls tool selection (e.g., 'auto'). Accepted for OpenAI compatibility, not yet implemented.
`agent_id`	`string`	Agent identifier for AI agents. Stored for logging only.
`region`	`string`	Preferred datacenter region for inference. Currently only 'oslo' is available.
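One way to assemble a request body from these fields is sketched below in Python. `build_chat_payload` is a hypothetical helper, not part of any SDK. It leaves out optional fields that are not set, on the assumption that the server applies its own defaults for missing fields.

```python
from typing import Any, Optional


def build_chat_payload(
    model: str,
    messages: list[dict[str, str]],
    *,
    stream: Optional[bool] = None,
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
    top_p: Optional[float] = None,
    region: Optional[str] = None,
) -> dict[str, Any]:
    """Return a chat request body containing only the fields that were set."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages is required")
    payload: dict[str, Any] = {"model": model, "messages": messages}
    optional = {
        "stream": stream,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "top_p": top_p,
        "region": region,
    }
    # Drop unset options so the request stays minimal.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


payload = build_chat_payload(
    "qwen3-235b-thinking",
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of Norway?"},
    ],
    temperature=0.7,
    max_tokens=256,
)
print(sorted(payload))
```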

Example:

json

{
  "model": "qwen3-235b-thinking",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is the capital of Norway?"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 256
}

Response example:

json

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "qwen3-235b-thinking",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of Norway is Oslo."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 8,
    "total_tokens": 32
  }
}
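A short Python sketch of pulling the useful parts out of this response, using the example body above as input. `summarize_completion` is a hypothetical helper name.

```python
import json

# Response body copied from the example above.
raw = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "qwen3-235b-thinking",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "The capital of Norway is Oslo."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 24, "completion_tokens": 8, "total_tokens": 32}
}
"""


def summarize_completion(response: dict) -> dict:
    """Extract the first choice's text, its finish reason, and token usage."""
    choice = response["choices"][0]
    usage = response.get("usage", {})
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": usage.get("total_tokens"),
    }


summary = summarize_completion(json.loads(raw))
print(summary["text"])
```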

Example:

bash

curl -X POST https://api.wayscloud.services/v1/llm/chat \
  -H "X-API-Key: wayscloud_llm_abc12_YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "qwen3-235b-thinking",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is the capital of Norway?"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 256
}'

POST /v1/llm/chat/completions

Chat completion

WAYSCloud LLM chat completion endpoint (with /completions suffix)

Alias for `/v1/llm/chat`, provided for nginx compatibility. It accepts the same request body and supports both streaming and non-streaming responses.

Request Body:

Field	Type	Description
`model`	`string`	Required. Model alias (e.g., 'mixtral-8x7b')
`messages`	`array`	Required. Conversation messages, each an object with `role` and `content`.
`stream`	`boolean`	Enable SSE streaming
`temperature`	`number`	Optional. Sampling temperature (e.g., 0.7).
`max_tokens`	`integer`	Optional. Maximum number of tokens to generate (e.g., 256).
`top_p`	`number`	Optional. Nucleus sampling parameter.
`tools`	`array`	Tool definitions for function calling. Accepted for OpenAI compatibility, not yet implemented.
`tool_choice`	`string` or `object`	Controls tool selection (e.g., 'auto'). Accepted for OpenAI compatibility, not yet implemented.
`agent_id`	`string`	Agent identifier for AI agents. Stored for logging only.
`region`	`string`	Preferred datacenter region for inference. Currently only 'oslo' is available.

Example:

json

{
  "model": "qwen3-235b-thinking",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is the capital of Norway?"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 256
}

Response example:

json

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "qwen3-235b-thinking",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of Norway is Oslo."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 8,
    "total_tokens": 32
  }
}

Example:

bash

curl -X POST https://api.wayscloud.services/v1/llm/chat/completions \
  -H "X-API-Key: wayscloud_llm_abc12_YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "qwen3-235b-thinking",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is the capital of Norway?"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 256
}'

POST /v1/chat/completions

Chat completion

OpenAI-compatible chat completion endpoint

Drop-in replacement for OpenAI's `/v1/chat/completions` endpoint. Compatible with the OpenAI SDK and other OpenAI-compatible clients.

Features:

Non-streaming and streaming (SSE) responses
Temperature and max_tokens control
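Agent-related fields (tools, tool_choice, agent_id) are accepted; per the Request Body table below, tools and tool_choice are not yet implemented and agent_id is stored for logging only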
Automatic token counting and billing

Request Example:

json

{
  "model": "mixtral-8x7b",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "temperature": 0.7,
  "max_tokens": 100
}

Streaming Example:

json

{
  "model": "deepseek-v3",
  "messages": [{"role": "user", "content": "Write a story"}],
  "stream": true,
  "max_tokens": 500
}
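With `"stream": true` the response arrives as server-sent events. The Python sketch below shows one way to reassemble the text, assuming the stream follows the OpenAI convention: `data: {...}` lines whose chunks carry `choices[].delta.content`, ended by `data: [DONE]`. The documentation only states that SSE streaming is supported, so this chunk shape is an assumption and the sample lines are hypothetical. `iter_stream_text` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines (assumed format)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # assumed end-of-stream marker
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hypothetical stream, for illustration only.
sample_lines = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Once upon"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": " a time"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_stream_text(sample_lines)))
```

In a real client the `lines` argument would be the decoded lines of the HTTP response body, read as they arrive.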

AI Agent Example:

json

{
  "model": "deepseek-v3",
  "messages": [{"role": "user", "content": "Help me code"}],
  "agent_id": "ephemeral",
  "tools": [],
  "tool_choice": "auto"
}

Available Models:

Chat — general purpose:

deepseek-v3, deepseek-v3.1 - Versatile, 131k context
qwen3-235b-instruct - Flagship MoE, 262k context
llama-3.3-70b - Enterprise, 131k context
mixtral-8x7b - Fast lightweight MoE
gpt-oss-120b, gpt-oss-20b - Open-weight
gemma-3n-e4b - Ultra-lightweight, cheapest
kimi-k2.5 - Long-context conversational
qwen3.5-9b - Lightweight, 262k context

Reasoning:

deepseek-r1 - Advanced reasoning
qwen3-235b-thinking - MoE reasoning

Code:

qwen3-coder-480b - Code generation

Vision (multimodal):

qwen3-vl-8b - Text + image input

Embeddings:

embedding-multilingual - 1024-dim, for RAG

Moderation:

llamaguard-4 - Content safety

Request Body:

Field	Type	Description
`model`	`string`	Required. Model alias (e.g., 'mixtral-8x7b')
`messages`	`array`	Required. Conversation messages, each an object with `role` and `content`.
`stream`	`boolean`	Enable SSE streaming
`temperature`	`number`	Optional. Sampling temperature (e.g., 0.7).
`max_tokens`	`integer`	Optional. Maximum number of tokens to generate (e.g., 256).
`top_p`	`number`	Optional. Nucleus sampling parameter.
`tools`	`array`	Tool definitions for function calling. Accepted for OpenAI compatibility, not yet implemented.
`tool_choice`	`string` or `object`	Controls tool selection (e.g., 'auto'). Accepted for OpenAI compatibility, not yet implemented.
`agent_id`	`string`	Agent identifier for AI agents. Stored for logging only.
`region`	`string`	Preferred datacenter region for inference. Currently only 'oslo' is available.

Example:

json

{
  "model": "qwen3-235b-thinking",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is the capital of Norway?"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 256
}

Response example:

json

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "qwen3-235b-thinking",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of Norway is Oslo."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 8,
    "total_tokens": 32
  }
}

Example:

bash

curl -X POST https://api.wayscloud.services/v1/chat/completions \
  -H "X-API-Key: wayscloud_llm_abc12_YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "qwen3-235b-thinking",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is the capital of Norway?"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 256
}'

POST /v1/llm/embeddings

Create Embedding

OpenAI-compatible embeddings endpoint.

Request body:

Response: OpenAI-compatible embedding list.

Example:

bash

curl -X POST https://api.wayscloud.services/v1/llm/embeddings \
  -H "X-API-Key: wayscloud_llm_abc12_YOUR_SECRET"
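The request body is not documented here and the curl call above sends none. As a sketch only, the Python below assumes the OpenAI embeddings convention: a JSON body with `model` and `input`, and a response whose `data` items carry `index` and `embedding`. Those field names are assumptions, the sample response is hypothetical (real vectors from `embedding-multilingual` are listed as 1024-dimensional), and both helper names are made up for illustration.

```python
from typing import Any


def build_embedding_payload(texts: list[str], model: str = "embedding-multilingual") -> dict[str, Any]:
    """Build a request body using the assumed OpenAI-style fields."""
    if not texts:
        raise ValueError("at least one input text is required")
    return {"model": model, "input": texts}


def embeddings_from_response(response: dict[str, Any]) -> list[list[float]]:
    """Return vectors ordered by their index field (assumed response shape)."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


# Hypothetical response with tiny vectors, for illustration only.
sample_response = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.0, 1.0, 0.0]},
        {"object": "embedding", "index": 0, "embedding": [1.0, 0.0, 0.0]},
    ],
}

print(build_embedding_payload(["hello", "world"]))
print(embeddings_from_response(sample_response))
```

Sorting by `index` keeps each vector aligned with its input text even if the items come back out of order.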

POST /v1/embeddings

Create Embedding

OpenAI-compatible embeddings endpoint.

Request body:

Response: OpenAI-compatible embedding list.

Example:

bash

curl -X POST https://api.wayscloud.services/v1/embeddings \
  -H "X-API-Key: wayscloud_llm_abc12_YOUR_SECRET"
