LLM API Overview

The WAYSCloud LLM API provides access to 10 state-of-the-art language models through an OpenAI-compatible API. It is well suited to chatbots, content generation, code assistance, and reasoning tasks.

Key Features

  • 🤖 10 Powerful Models - Including reasoning models with chain-of-thought
  • 🔌 OpenAI Compatible - Drop-in replacement for OpenAI API
  • ⚡ Streaming Support - Real-time responses with Server-Sent Events (SSE)
  • 🔐 Secure - API key authentication with rate limiting
  • 📊 Usage Tracking - Automatic token counting and billing

Base URL

https://api.wayscloud.services/v1

Available Models

Standard Chat Models

| Model | Description | Context | Best For |
| --- | --- | --- | --- |
| qwen3-80b-instruct | High-quality general purpose | 32K tokens | General tasks, multilingual |
| deepseek-v3 | Advanced reasoning and coding | 64K tokens | Complex tasks, code generation |
| kimi-k2 | Long context support | 200K tokens | Long documents, analysis |
| mixtral-8x7b | Fast and efficient | 32K tokens | Quick responses, cost-effective |
| llama-3.1-405b | Large-scale reasoning | 128K tokens | Complex reasoning, analysis |

Reasoning Models (Chain-of-Thought)

| Model | Description | Context | Best For |
| --- | --- | --- | --- |
| deepseek-r1 | Advanced reasoning with thinking process | 64K tokens | Math, logic, problem-solving |
| qwen3-80b-thinking | Reasoning-optimized Qwen | 32K tokens | Step-by-step reasoning |
| qwen3-235b-thinking | Large-scale reasoning model | 32K tokens | Complex multi-step problems |

Specialized Models

| Model | Description | Context | Best For |
| --- | --- | --- | --- |
| qwen3-coder-480b | Code generation and debugging | 32K tokens | Programming tasks |
| llamaguard-4 | Content moderation | 8K tokens | Safety, content filtering |

Authentication

All requests require an API key in the Authorization header:

Authorization: Bearer wayscloud_llm_prod_YOUR_API_KEY

Get Your API Key

  1. Log in to WAYSCloud Console
  2. Navigate to API Keys
  3. Click Create New Key
  4. Select LLM service
  5. Copy your API key (starts with wayscloud_llm_prod_)
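Avoid hardcoding the key in source files; reading it from an environment variable keeps it out of version control. A minimal sketch (the variable name WAYSCLOUD_LLM_API_KEY is our own choice, not mandated by the service; the prefix check simply mirrors step 5 above):

```python
import os

# Read the API key from the environment instead of hardcoding it.
# WAYSCLOUD_LLM_API_KEY is an assumed variable name.
api_key = os.environ.get("WAYSCLOUD_LLM_API_KEY", "")

def looks_like_wayscloud_key(key: str) -> bool:
    """Sanity check: console-issued keys start with wayscloud_llm_prod_."""
    prefix = "wayscloud_llm_prod_"
    return key.startswith(prefix) and len(key) > len(prefix)
```

Pass `api_key` to the client constructor shown in the Quick Start below; failing fast on a malformed key beats a confusing 401 later.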

Quick Start

Using cURL

curl -X POST https://api.wayscloud.services/v1/chat/completions \
  -H "Authorization: Bearer wayscloud_llm_prod_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-80b-instruct",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is quantum computing?"}
    ],
    "temperature": 0.7,
    "max_tokens": 500
  }'

Using Python (OpenAI SDK)

from openai import OpenAI

# Initialize client with WAYSCloud endpoint
client = OpenAI(
    api_key="wayscloud_llm_prod_YOUR_API_KEY",
    base_url="https://api.wayscloud.services/v1"
)

# Create chat completion
response = client.chat.completions.create(
    model="qwen3-80b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is quantum computing?"}
    ],
    temperature=0.7,
    max_tokens=500
)

print(response.choices[0].message.content)

Streaming Responses

# Stream responses in real-time
stream = client.chat.completions.create(
    model="qwen3-80b-instruct",
    messages=[{"role": "user", "content": "Write a story about a robot."}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
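Callers often want the assembled message as well as the real-time output. The pattern is the same regardless of SDK: append each non-empty delta to a buffer while printing it. The sketch below mocks the chunk objects with small dataclasses (the real ones come from the OpenAI SDK; the shapes shown are assumptions based on the example above):

```python
from dataclasses import dataclass
from typing import List, Optional

# Minimal stand-ins for the SDK's streaming chunk objects (assumed shapes).
@dataclass
class Delta:
    content: Optional[str]

@dataclass
class Choice:
    delta: Delta

@dataclass
class Chunk:
    choices: List[Choice]

def collect_stream(stream) -> str:
    """Print deltas as they arrive and return the assembled message."""
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta is not None:  # some chunks carry no content (e.g. role-only)
            print(delta, end="")
            parts.append(delta)
    return "".join(parts)
```

With a real stream from `client.chat.completions.create(..., stream=True)`, `collect_stream(stream)` prints the response as it arrives and returns the full text at the end.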

Using Reasoning Models

# Use deepseek-r1 for complex reasoning
response = client.chat.completions.create(
    model="deepseek-r1",
    messages=[
        {"role": "user", "content": "Solve: If x + 2 = 5, what is x^2 + 3x?"}
    ]
)

print(response.choices[0].message.content)
# Output includes thinking process and final answer

Using Node.js

const OpenAI = require('openai');

const client = new OpenAI({
  apiKey: 'wayscloud_llm_prod_YOUR_API_KEY',
  baseURL: 'https://api.wayscloud.services/v1'
});

async function chat() {
  const completion = await client.chat.completions.create({
    model: 'qwen3-80b-instruct',
    messages: [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: 'Explain neural networks simply.' }
    ],
    temperature: 0.7,
    max_tokens: 500
  });

  console.log(completion.choices[0].message.content);
}

chat();

Response Format

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "qwen3-80b-instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing is..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 150,
    "total_tokens": 175
  }
}
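The `usage` block is what drives billing, and `total_tokens` should equal `prompt_tokens + completion_tokens`. A small sketch of pulling these fields out of a raw response body (with the Python SDK you would read `response.usage.total_tokens` directly instead; the body below is the example response above):

```python
import json

# Raw response body, taken from the example above.
body = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "qwen3-80b-instruct",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Quantum computing is..."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 25, "completion_tokens": 150, "total_tokens": 175}
}
"""

resp = json.loads(body)
answer = resp["choices"][0]["message"]["content"]      # the model's reply
tokens_billed = resp["usage"]["total_tokens"]          # what you pay for
```

Logging `tokens_billed` per request is a cheap way to reconcile your own records against the Usage Tracking feature.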

Rate Limits

  • 1000 requests/minute per API key
  • Burst: Up to 100 additional requests
  • Token limits: Vary by subscription plan (see pricing)
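When the per-minute limit is exhausted, OpenAI-compatible APIs conventionally answer with HTTP 429, and the official SDKs surface this as a `RateLimitError`; retrying with exponential backoff plus jitter is the standard response. A dependency-free sketch of that pattern (the rate-limit signal is modeled here as a plain `RuntimeError` so the example stands alone; substitute the SDK's exception in real code):

```python
import random
import time

def with_backoff(call, max_attempts=5, base_delay=1.0):
    """Retry `call` with exponential backoff and jitter when it signals
    rate limiting (modeled as RuntimeError("rate_limited"))."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RuntimeError as exc:
            last_try = attempt == max_attempts - 1
            if "rate_limited" not in str(exc) or last_try:
                raise  # non-rate-limit errors and exhausted retries propagate
            # Sleep 1s, 2s, 4s, ... plus jitter to avoid synchronized retries.
            time.sleep(base_delay * (2 ** attempt) + random.random() * 0.1)
```

With the Python SDK you can also let the client retry for you by passing `max_retries` to the `OpenAI` constructor.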

Model Selection Guide

Need fast responses? → Use mixtral-8x7b (most cost-effective)

Need complex reasoning? → Use deepseek-r1 or qwen3-235b-thinking (shows thinking process)

Need code generation? → Use qwen3-coder-480b or deepseek-v3

Need multilingual support? → Use qwen3-80b-instruct (excellent for Chinese, English, others)

Need to process long documents? → Use kimi-k2 (200K context window)
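If model choice is made at runtime, the guide above reduces to a small lookup. A sketch distilled from it (the model names come from the tables earlier on this page; the task labels are our own shorthand):

```python
# Task label -> model, distilled from the Model Selection Guide.
MODEL_FOR_TASK = {
    "fast": "mixtral-8x7b",
    "reasoning": "deepseek-r1",
    "code": "qwen3-coder-480b",
    "multilingual": "qwen3-80b-instruct",
    "long-context": "kimi-k2",
    "moderation": "llamaguard-4",
}

def pick_model(task: str) -> str:
    """Return a model for the task, defaulting to the general-purpose model."""
    return MODEL_FOR_TASK.get(task, "qwen3-80b-instruct")
```

The returned string plugs straight into the `model` parameter of the Quick Start examples.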

Next Steps

  • Chat Completions (coming soon) - Detailed API reference
  • Streaming (coming soon) - Real-time responses with SSE
  • Reasoning Models (coming soon) - How to use chain-of-thought models
  • Error Handling (coming soon) - Error codes and troubleshooting
  • Code Examples (coming soon) - More examples in different languages

Support

Need help?