LLM API Overview
WAYSCloud LLM API provides access to 10 state-of-the-art language models via an OpenAI-compatible API. It is well suited for chatbots, content generation, code assistance, and reasoning tasks.
Key Features
- 🤖 10 Powerful Models - Including reasoning models with chain-of-thought
- 🔌 OpenAI Compatible - Drop-in replacement for OpenAI API
- ⚡ Streaming Support - Real-time responses with Server-Sent Events (SSE)
- 🔐 Secure - API key authentication with rate limiting
- 📊 Usage Tracking - Automatic token counting and billing
Base URL
```
https://api.wayscloud.services/v1
```
Available Models
Standard Chat Models
| Model | Description | Context | Best For |
|---|---|---|---|
| `qwen3-80b-instruct` | High-quality general purpose | 32K tokens | General tasks, multilingual |
| `deepseek-v3` | Advanced reasoning and coding | 64K tokens | Complex tasks, code generation |
| `kimi-k2` | Long context support | 200K tokens | Long documents, analysis |
| `mixtral-8x7b` | Fast and efficient | 32K tokens | Quick responses, cost-effective |
| `llama-3.1-405b` | Large-scale reasoning | 128K tokens | Complex reasoning, analysis |
Reasoning Models (Chain-of-Thought)
| Model | Description | Context | Best For |
|---|---|---|---|
| `deepseek-r1` ⭐ | Advanced reasoning with thinking process | 64K tokens | Math, logic, problem-solving |
| `qwen3-80b-thinking` | Reasoning-optimized Qwen | 32K tokens | Step-by-step reasoning |
| `qwen3-235b-thinking` | Large-scale reasoning model | 32K tokens | Complex multi-step problems |
Specialized Models
| Model | Description | Context | Best For |
|---|---|---|---|
| `qwen3-coder-480b` | Code generation and debugging | 32K tokens | Programming tasks |
| `llamaguard-4` | Content moderation | 8K tokens | Safety, content filtering |
Authentication
All requests require an API key in the `Authorization` header:

```
Authorization: Bearer wayscloud_llm_prod_YOUR_API_KEY
```
Get Your API Key
- Log in to WAYSCloud Console
- Navigate to API Keys
- Click Create New Key
- Select LLM service
- Copy your API key (starts with `wayscloud_llm_prod_`)
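To keep the key out of source code, you can load it from an environment variable. A minimal sketch (the variable name `WAYSCLOUD_API_KEY` below is our own suggested convention, not something the platform requires):

```python
import os

# Read the key from the environment; the variable name is a convention of this example.
api_key = os.environ.get("WAYSCLOUD_API_KEY", "wayscloud_llm_prod_YOUR_API_KEY")

# Headers for any request to the API
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
}
```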
Quick Start
Using cURL
```bash
curl -X POST https://api.wayscloud.services/v1/chat/completions \
  -H "Authorization: Bearer wayscloud_llm_prod_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-80b-instruct",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is quantum computing?"}
    ],
    "temperature": 0.7,
    "max_tokens": 500
  }'
```
Using Python (OpenAI SDK)
```python
from openai import OpenAI

# Initialize client with WAYSCloud endpoint
client = OpenAI(
    api_key="wayscloud_llm_prod_YOUR_API_KEY",
    base_url="https://api.wayscloud.services/v1"
)

# Create chat completion
response = client.chat.completions.create(
    model="qwen3-80b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is quantum computing?"}
    ],
    temperature=0.7,
    max_tokens=500
)

print(response.choices[0].message.content)
```
Streaming Responses
```python
# Stream responses in real-time
stream = client.chat.completions.create(
    model="qwen3-80b-instruct",
    messages=[{"role": "user", "content": "Write a story about a robot."}],
    stream=True
)

for chunk in stream:
    # Some chunks (e.g. the final one) may carry no choices or no delta content
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
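Under the hood, streaming uses Server-Sent Events: each chunk arrives as a `data: {...}` line, and the stream ends with a `data: [DONE]` sentinel. If you are not using an SDK, a minimal parser might look like this (a sketch, assuming the OpenAI-style SSE framing; the sample lines below are illustrative, not real API output):

```python
import json

def parse_sse_lines(lines):
    """Yield parsed chunk objects from OpenAI-style SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        yield json.loads(payload)

# Example with two chunks followed by the sentinel:
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(c["choices"][0]["delta"]["content"] for c in parse_sse_lines(sample))
```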
Using Reasoning Models
```python
# Use deepseek-r1 for complex reasoning
response = client.chat.completions.create(
    model="deepseek-r1",
    messages=[
        {"role": "user", "content": "Solve: If x + 2 = 5, what is x^2 + 3x?"}
    ]
)

# Output includes the thinking process and the final answer
print(response.choices[0].message.content)
```
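Some reasoning models wrap their chain-of-thought in `<think>...</think>` tags before the final answer. This is a common convention for DeepSeek-R1-style models, but check the actual output format of each model before relying on it. If that convention holds, the two parts can be separated like so:

```python
import re

def split_reasoning(text):
    """Split a response into (thinking, answer), assuming <think>...</think> tags."""
    match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if not match:
        return "", text  # no thinking block found; treat the whole text as the answer
    thinking = match.group(1).strip()
    answer = text[match.end():].strip()
    return thinking, answer

# Illustrative output string, not real model output:
thinking, answer = split_reasoning(
    "<think>x + 2 = 5 so x = 3; x^2 + 3x = 9 + 9 = 18.</think>The answer is 18."
)
```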
Using Node.js
```javascript
const OpenAI = require('openai');

const client = new OpenAI({
  apiKey: 'wayscloud_llm_prod_YOUR_API_KEY',
  baseURL: 'https://api.wayscloud.services/v1'
});

async function chat() {
  const completion = await client.chat.completions.create({
    model: 'qwen3-80b-instruct',
    messages: [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: 'Explain neural networks simply.' }
    ],
    temperature: 0.7,
    max_tokens: 500
  });
  console.log(completion.choices[0].message.content);
}

chat();
```
Response Format
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "qwen3-80b-instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing is..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 150,
    "total_tokens": 175
  }
}
```
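The `usage` block is what billing is based on, so it is worth logging per request. A sketch of per-request cost accounting (the per-1K-token prices here are made-up placeholders, not WAYSCloud's actual rates; substitute your plan's pricing):

```python
import json

# The usage block from a response like the one above
response_json = '{"usage": {"prompt_tokens": 25, "completion_tokens": 150, "total_tokens": 175}}'
usage = json.loads(response_json)["usage"]

# Hypothetical example rates in USD per 1K tokens
PROMPT_PRICE_PER_1K = 0.0005
COMPLETION_PRICE_PER_1K = 0.0015

cost = (usage["prompt_tokens"] / 1000 * PROMPT_PRICE_PER_1K
        + usage["completion_tokens"] / 1000 * COMPLETION_PRICE_PER_1K)
```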
Rate Limits
- 1000 requests/minute per API key
- Burst: Up to 100 additional requests
- Token limits: Vary by subscription plan (see pricing)
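When the limit is exceeded, requests are rejected (typically with HTTP 429, though the exact behavior is a detail for the Error Handling reference). A simple exponential-backoff retry loop is usually enough; this sketch uses a stand-in `send_request` callable and a local `RateLimitError` class in place of whatever your HTTP client actually raises:

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for whatever your HTTP client raises on a 429 response."""

def retry_with_backoff(send_request, max_retries=5, base_delay=0.5):
    """Call send_request, retrying on RateLimitError with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            return send_request()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)

# Demo: fail twice with a rate-limit error, then succeed.
calls = {"n": 0}
def send_request():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError()
    return "ok"

result = retry_with_backoff(send_request, base_delay=0.01)
```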
Model Selection Guide
Need fast responses?
→ Use `mixtral-8x7b` (most cost-effective)
Need complex reasoning?
→ Use `deepseek-r1` or `qwen3-235b-thinking` (shows thinking process)
Need code generation?
→ Use `qwen3-coder-480b` or `deepseek-v3`
Need multilingual support?
→ Use `qwen3-80b-instruct` (excellent for Chinese, English, and other languages)
Need to process long documents?
→ Use `kimi-k2` (200K context window)
Next Steps
- Chat Completions (coming soon) - Detailed API reference
- Streaming (coming soon) - Real-time responses with SSE
- Reasoning Models (coming soon) - How to use chain-of-thought models
- Error Handling (coming soon) - Error codes and troubleshooting
- Code Examples (coming soon) - More examples in different languages
Support
Need help?
- 📧 Email: support@wayscloud.services
- 💬 Mattermost Chat
- 📚 OpenAPI Spec