LLM API Overview

The WAYSCloud LLM API provides access to 10 state-of-the-art language models through an OpenAI-compatible API. It is well suited to chatbots, content generation, code assistance, and reasoning tasks.

Key Features

  • 🤖 10 Powerful Models - Including reasoning models with chain-of-thought
  • 🔌 OpenAI Compatible - Drop-in replacement for OpenAI API
  • ⚡ Streaming Support - Real-time responses with Server-Sent Events (SSE)
  • 🔐 Secure - API key authentication with rate limiting
  • 📊 Usage Tracking - Automatic token counting and billing

Base URL

https://api.wayscloud.services/v1

Available Models

Standard Chat Models

| Model | Description | Context | Best For |
| --- | --- | --- | --- |
| qwen3-80b-instruct | High-quality general purpose | 32K tokens | General tasks, multilingual |
| deepseek-v3 | Advanced reasoning and coding | 64K tokens | Complex tasks, code generation |
| kimi-k2 | Long context support | 200K tokens | Long documents, analysis |
| mixtral-8x7b | Fast and efficient | 32K tokens | Quick responses, cost-effective |
| llama-3.1-405b | Large-scale reasoning | 128K tokens | Complex reasoning, analysis |

Reasoning Models (Chain-of-Thought)

| Model | Description | Context | Best For |
| --- | --- | --- | --- |
| deepseek-r1 | Advanced reasoning with thinking process | 64K tokens | Math, logic, problem-solving |
| qwen3-80b-thinking | Reasoning-optimized Qwen | 32K tokens | Step-by-step reasoning |
| qwen3-235b-thinking | Large-scale reasoning model | 32K tokens | Complex multi-step problems |

Specialized Models

| Model | Description | Context | Best For |
| --- | --- | --- | --- |
| qwen3-coder-480b | Code generation and debugging | 32K tokens | Programming tasks |
| llamaguard-4 | Content moderation | 8K tokens | Safety, content filtering |

Authentication

All requests require an API key in the Authorization header:

Authorization: Bearer wayscloud_llm_prod_YOUR_API_KEY

Get Your API Key

  1. Log in to WAYSCloud Console
  2. Navigate to API Keys
  3. Click Create New Key
  4. Select LLM service
  5. Copy your API key (starts with wayscloud_llm_prod_)
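Avoid hardcoding the key in source files; reading it from an environment variable keeps it out of version control. A minimal sketch (the variable name WAYSCLOUD_LLM_API_KEY is our own choice, not mandated by the service; the prefix check simply mirrors step 5 above):

```python
import os

# Read the API key from the environment instead of hardcoding it.
# WAYSCLOUD_LLM_API_KEY is an assumed variable name.
api_key = os.environ.get("WAYSCLOUD_LLM_API_KEY", "")

def looks_like_wayscloud_key(key: str) -> bool:
    """Sanity check: console-issued keys start with wayscloud_llm_prod_."""
    prefix = "wayscloud_llm_prod_"
    return key.startswith(prefix) and len(key) > len(prefix)
```

Pass `api_key` to the client constructor shown in the Quick Start below; failing fast on a malformed key beats a confusing 401 later.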

Quick Start

Using cURL

curl -X POST https://api.wayscloud.services/v1/chat/completions \
  -H "Authorization: Bearer wayscloud_llm_prod_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-80b-instruct",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is quantum computing?"}
    ],
    "temperature": 0.7,
    "max_tokens": 500
  }'

Using Python (OpenAI SDK)

from openai import OpenAI

# Initialize client with WAYSCloud endpoint
client = OpenAI(
    api_key="wayscloud_llm_prod_YOUR_API_KEY",
    base_url="https://api.wayscloud.services/v1"
)

# Create chat completion
response = client.chat.completions.create(
    model="qwen3-80b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is quantum computing?"}
    ],
    temperature=0.7,
    max_tokens=500
)

print(response.choices[0].message.content)

Streaming Responses

# Stream responses in real-time
stream = client.chat.completions.create(
    model="qwen3-80b-instruct",
    messages=[{"role": "user", "content": "Write a story about a robot."}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
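Callers often want the assembled message as well as the real-time output. The pattern is the same regardless of SDK: append each non-empty delta to a buffer while printing it. The sketch below mocks the chunk objects with small dataclasses (the real ones come from the OpenAI SDK; the shapes shown are assumptions based on the example above):

```python
from dataclasses import dataclass
from typing import List, Optional

# Minimal stand-ins for the SDK's streaming chunk objects (assumed shapes).
@dataclass
class Delta:
    content: Optional[str]

@dataclass
class Choice:
    delta: Delta

@dataclass
class Chunk:
    choices: List[Choice]

def collect_stream(stream) -> str:
    """Print deltas as they arrive and return the assembled message."""
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta is not None:  # some chunks carry no content (e.g. role-only)
            print(delta, end="")
            parts.append(delta)
    return "".join(parts)
```

With a real stream from `client.chat.completions.create(..., stream=True)`, `collect_stream(stream)` prints the response as it arrives and returns the full text at the end.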

Using Reasoning Models

# Use deepseek-r1 for complex reasoning
response = client.chat.completions.create(
    model="deepseek-r1",
    messages=[
        {"role": "user", "content": "Solve: If x + 2 = 5, what is x^2 + 3x?"}
    ]
)

print(response.choices[0].message.content)
# Output includes thinking process and final answer

Using Node.js

const OpenAI = require('openai');

const client = new OpenAI({
  apiKey: 'wayscloud_llm_prod_YOUR_API_KEY',
  baseURL: 'https://api.wayscloud.services/v1'
});

async function chat() {
  const completion = await client.chat.completions.create({
    model: 'qwen3-80b-instruct',
    messages: [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: 'Explain neural networks simply.' }
    ],
    temperature: 0.7,
    max_tokens: 500
  });

  console.log(completion.choices[0].message.content);
}

chat();

Response Format

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "qwen3-80b-instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing is..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 150,
    "total_tokens": 175
  }
}
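The `usage` block is what drives billing, and `total_tokens` should equal `prompt_tokens + completion_tokens`. A small sketch of pulling these fields out of a raw response body (with the Python SDK you would read `response.usage.total_tokens` directly instead; the body below is the example response above):

```python
import json

# Raw response body, taken from the example above.
body = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "qwen3-80b-instruct",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Quantum computing is..."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 25, "completion_tokens": 150, "total_tokens": 175}
}
"""

resp = json.loads(body)
answer = resp["choices"][0]["message"]["content"]      # the model's reply
tokens_billed = resp["usage"]["total_tokens"]          # what you pay for
```

Logging `tokens_billed` per request is a cheap way to reconcile your own records against the Usage Tracking feature.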

Rate Limits

  • 1000 requests/minute per API key
  • Burst: Up to 100 additional requests
  • Token limits: Vary by subscription plan (see pricing)
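When the per-minute limit is exhausted, OpenAI-compatible APIs conventionally answer with HTTP 429, and the official SDKs surface this as a `RateLimitError`; retrying with exponential backoff plus jitter is the standard response. A dependency-free sketch of that pattern (the rate-limit signal is modeled here as a plain `RuntimeError` so the example stands alone; substitute the SDK's exception in real code):

```python
import random
import time

def with_backoff(call, max_attempts=5, base_delay=1.0):
    """Retry `call` with exponential backoff and jitter when it signals
    rate limiting (modeled as RuntimeError("rate_limited"))."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RuntimeError as exc:
            last_try = attempt == max_attempts - 1
            if "rate_limited" not in str(exc) or last_try:
                raise  # non-rate-limit errors and exhausted retries propagate
            # Sleep 1s, 2s, 4s, ... plus jitter to avoid synchronized retries.
            time.sleep(base_delay * (2 ** attempt) + random.random() * 0.1)
```

With the Python SDK you can also let the client retry for you by passing `max_retries` to the `OpenAI` constructor.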

Model Selection Guide

Need fast responses? → Use mixtral-8x7b (most cost-effective)

Need complex reasoning? → Use deepseek-r1 or qwen3-235b-thinking (shows thinking process)

Need code generation? → Use qwen3-coder-480b or deepseek-v3

Need multilingual support? → Use qwen3-80b-instruct (excellent for Chinese, English, others)

Need to process long documents? → Use kimi-k2 (200K context window)
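If model choice is made at runtime, the guide above reduces to a small lookup. A sketch distilled from it (the model names come from the tables earlier on this page; the task labels are our own shorthand):

```python
# Task label -> model, distilled from the Model Selection Guide.
MODEL_FOR_TASK = {
    "fast": "mixtral-8x7b",
    "reasoning": "deepseek-r1",
    "code": "qwen3-coder-480b",
    "multilingual": "qwen3-80b-instruct",
    "long-context": "kimi-k2",
    "moderation": "llamaguard-4",
}

def pick_model(task: str) -> str:
    """Return a model for the task, defaulting to the general-purpose model."""
    return MODEL_FOR_TASK.get(task, "qwen3-80b-instruct")
```

The returned string plugs straight into the `model` parameter of the Quick Start examples.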

Next Steps

  • Chat Completions (coming soon) - Detailed API reference
  • Streaming (coming soon) - Real-time responses with SSE
  • Reasoning Models (coming soon) - How to use chain-of-thought models
  • Error Handling (coming soon) - Error codes and troubleshooting
  • Code Examples (coming soon) - More examples in different languages

Support

Need help?