LLM API
OpenAI-compatible chat completions API hosted in EU datacenters. Change the base URL and API key in your existing OpenAI SDK code and it works immediately.
Best for: applications that need language model inference with EU data residency, per-token billing, and no minimum commitment.
Activate LLM API | LLM API reference
What this is
WAYSCloud LLM is an OpenAI-compatible inference API at https://api.wayscloud.services/v1/llm. It implements the /v1/chat/completions and /v1/models endpoints with the same request and response format as the OpenAI API. You authenticate with a Bearer token, pick a model, and send messages. Responses can be streamed or returned as a single block. All inference runs in EU datacenters. No data leaves Europe.
When to use it
Use this when:
- You need language model inference with EU data residency
- You want to use existing OpenAI SDK code with a different provider
- You need per-token billing without a fixed monthly commitment
- You want access to multiple models (general purpose, reasoning, code) through one API
Consider something else when:
- You need image or video generation — use GPU Studio
- You want a pre-built chatbot with knowledge base — use Chatbot (CaaS)
- You need audio transcription — use Speech Intelligence
What you get
- OpenAI-compatible endpoint at https://api.wayscloud.services/v1/llm
- 11 models: general purpose, reasoning, code generation, content moderation
- Streaming with server-sent events for real-time token delivery
- EU data residency: all inference runs in European datacenters
- Per-token billing with no minimum commitment
- 3 plan tiers with different rate limits: Starter (60 RPM), Pro (120 RPM), Enterprise (300 RPM)
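Requests beyond your plan's per-minute limit are rejected (typically signalled as HTTP 429), so production clients should retry with backoff rather than fail outright. A minimal sketch of that pattern around an arbitrary request function; `RateLimitError`, `with_retries`, and the delay values are illustrative stand-ins, not part of any WAYSCloud SDK:

```python
import time

class RateLimitError(Exception):
    """Illustrative stand-in for an HTTP 429 (rate limited) response."""

def with_retries(send, max_attempts=4, base_delay=0.01):
    """Call send(); on a rate-limit error, sleep and retry with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return send()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            time.sleep(base_delay * 2 ** attempt)

# Usage with a fake sender that is rate-limited twice, then succeeds:
calls = {"n": 0}
def fake_send():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError()
    return "ok"

print(with_retries(fake_send))  # after two retried failures, prints: ok
```

In real code, `send` would wrap the HTTP call and you would raise `RateLimitError` when the response status is 429.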
Pricing
All prices exclude VAT. Per-token billing.
chat/light
| Model | Metric | EUR/M tokens | NOK/M tokens | SEK/M tokens | DKK/M tokens |
|---|---|---|---|---|---|
| gemma-3n-e4b | total tokens | 0.093 | 1.10 | 1.04 | 0.69 |
| gpt-oss-20b | total tokens | 0.2325 | 2.75 | 2.61 | 1.73 |
| mixtral-8x7b | total tokens | 3.40 | 40 | 38 | 25.09 |
chat/processing
| Model | Metric | EUR/M tokens | NOK/M tokens | SEK/M tokens | DKK/M tokens |
|---|---|---|---|---|---|
| deepseek-v3 | total tokens | 5.95 | 70 | 66.50 | 43.91 |
| deepseek-v3.1 | total tokens | 2.79 | 33 | 31.32 | 20.70 |
| gpt-oss-120b | total tokens | 0.70 | 8.25 | 7.83 | 5.17 |
| kimi-k2.5 | total tokens | 2.33 | 27.50 | 26.10 | 17.25 |
| llama-3.3-70b | total tokens | 4.09 | 48.40 | 45.94 | 30.36 |
| qwen3-235b-instruct | total tokens | 0.93 | 11 | 10.44 | 6.90 |
| qwen3.5-9b | total tokens | 0.47 | 5.50 | 5.22 | 3.45 |
code
| Model | Metric | EUR/M tokens | NOK/M tokens | SEK/M tokens | DKK/M tokens |
|---|---|---|---|---|---|
| qwen3-coder-480b | total tokens | 10.20 | 120 | 114 | 75.27 |
embedding
| Model | Metric | EUR/M tokens | NOK/M tokens | SEK/M tokens | DKK/M tokens |
|---|---|---|---|---|---|
| embedding-multilingual | total tokens | 0.093 | 1.10 | 1.04 | 0.69 |
moderation
| Model | Metric | EUR/M tokens | NOK/M tokens | SEK/M tokens | DKK/M tokens |
|---|---|---|---|---|---|
| llamaguard-4 | total tokens | 2.13 | 25 | 23.75 | 15.68 |
reasoning
| Model | Metric | EUR/M tokens | NOK/M tokens | SEK/M tokens | DKK/M tokens |
|---|---|---|---|---|---|
| deepseek-r1 | input tokens | 34 | 400 | 380 | 250.91 |
| qwen3-235b-thinking | input tokens | 14.88 | 175 | 166.25 | 109.77 |
vision
| Model | Metric | EUR/M tokens | NOK/M tokens | SEK/M tokens | DKK/M tokens |
|---|---|---|---|---|---|
| qwen3-vl-8b | total tokens | 0.837 | 9.90 | 9.40 | 6.21 |
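With per-token billing, the cost of a request is simply the token count divided by one million, multiplied by the model's per-million-token rate. A quick sanity-check helper; the function is ours, and the example rates are taken from the tables above:

```python
def cost_eur(tokens: int, eur_per_million: float) -> float:
    """Cost in EUR for a token count billed at a per-million-token rate."""
    return tokens / 1_000_000 * eur_per_million

# 500,000 total tokens on mixtral-8x7b at 3.40 EUR/M tokens:
print(round(cost_eur(500_000, 3.40), 2))  # 1.7
```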
How it works
- Activate the LLM service in the dashboard and choose a plan (Starter, Pro, or Enterprise).
- Copy your API key. It is shown only once.
- Set the base URL to https://api.wayscloud.services/v1/llm in your OpenAI SDK or HTTP client.
- Send a chat completion request with your chosen model and messages array.
- Receive the response as a single JSON object or as a stream of server-sent events.
- Monitor usage in the dashboard: tokens consumed, cost by model, and recent requests.
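When you opt for streaming in step 5, the OpenAI-compatible format delivers the response as server-sent events: each event is a `data:` line carrying a JSON chunk whose `choices[0].delta` holds a piece of the assistant text, and the stream ends with `data: [DONE]`. A sketch of extracting the text deltas from raw SSE lines; the sample events below are fabricated for illustration:

```python
import json

def deltas(sse_lines):
    """Yield assistant content fragments from OpenAI-style SSE 'data:' lines."""
    for line in sse_lines:
        if not line.startswith("data:"):
            continue  # skip comments, blank keep-alives, other event fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):
            yield delta["content"]

# Usage with fabricated sample events:
sample = [
    'data: {"choices": [{"delta": {"content": "Oslo"}}]}',
    'data: {"choices": [{"delta": {"content": "."}}]}',
    "data: [DONE]",
]
print("".join(deltas(sample)))  # Oslo.
```

The OpenAI SDK does this parsing for you (see the streaming example further down); the sketch matters only if you consume the HTTP stream directly.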
What you see in the dashboard
- Tokens this month: input + output token count with progress toward plan limit
- Cost this month: broken down by model
- Models used: list with per-model token consumption and average latency
- Recent requests: timestamp, model, token count, latency, status
- Plan details: current tier, rate limit, included tokens
Fastest way to get started
Dashboard
- Open my.wayscloud.services and go to AI & Machine Learning then LLM API
- Click Activate, choose Starter plan, and copy your API key
API
```shell
curl -X POST https://api.wayscloud.services/v1/llm/chat/completions \
  -H "Authorization: Bearer wayscloud_llm_abc12_YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-235b-thinking",
    "messages": [{"role": "user", "content": "What is the capital of Norway?"}],
    "max_tokens": 128
  }'
```
Example request and response
Request: Chat completion with system prompt
```shell
curl -X POST https://api.wayscloud.services/v1/llm/chat/completions \
  -H "Authorization: Bearer wayscloud_llm_abc12_YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1",
    "messages": [
      {"role": "system", "content": "You are a concise technical writer."},
      {"role": "user", "content": "Explain the difference between TCP and UDP in two sentences."}
    ],
    "temperature": 0.3,
    "max_tokens": 256
  }'
```
Response:
```json
{
  "id": "chatcmpl-9f2a7b3c",
  "object": "chat.completion",
  "model": "deepseek-r1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "TCP is a connection-oriented protocol that guarantees ordered, reliable delivery of data through acknowledgments and retransmissions. UDP is connectionless and sends datagrams without delivery guarantees, which makes it faster but suitable only when occasional packet loss is acceptable."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 31,
    "completion_tokens": 47,
    "total_tokens": 78
  }
}
```
Python with OpenAI SDK:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.wayscloud.services/v1/llm",
    api_key="wayscloud_llm_abc12_YOUR_SECRET",
)

response = client.chat.completions.create(
    model="mixtral-8x7b",
    messages=[{"role": "user", "content": "Summarize GDPR Article 17 in plain language."}],
    stream=True,
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
Available models:
| Model | Description | Context | EUR | NOK | SEK | DKK |
|---|---|---|---|---|---|---|
| deepseek-v3 | General purpose, fast | 128K | 7 | 80 | 80 | 55 |
| deepseek-r1 | Reasoning (chain-of-thought) | 128K | 13 / 36 | 150 / 420 | 150 / 420 | 103 / 290 |
| qwen3-80b-instruct | General purpose, balanced | 128K | 1 / 8 | 15 / 90 | 15 / 90 | 10 / 62 |
| qwen3-80b-thinking | Reasoning with thinking | 128K | 1 / 9 | 15 / 100 | 15 / 100 | 10 / 69 |
| qwen3-235b-thinking | Large reasoning model | 128K | 3 / 16 | 35 / 180 | 35 / 180 | 24 / 124 |
| qwen3-coder-480b | Code generation | 128K | 10 | 120 | 120 | 83 |
| qwen3-vl-32b | Vision + language | 32K | — | — | — | — |
| kimi-k2 | General purpose | 128K | 5 / 14 | 60 / 160 | 60 / 160 | 41 / 110 |
| mixtral-8x7b | Low-cost general purpose | 32K | 4 | 45 | 45 | 31 |
| llama-3.1-405b | Large general purpose | 128K | 22 | 250 | 250 | 172 |
| llamaguard-4 | Content moderation | — | 2 | 25 | 25 | 17 |
Prices are per 1M tokens. Where two values are shown: input / output. All prices exclude VAT.
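Where two values are shown, input and output tokens bill at different rates, so the cost of a request splits into two terms. A sketch using the deepseek-r1 row above (13 EUR/M input, 36 EUR/M output) and the `usage` fields from the earlier response example; the helper name is ours:

```python
def split_cost_eur(prompt_tokens: int, completion_tokens: int,
                   eur_in_per_m: float, eur_out_per_m: float) -> float:
    """Cost in EUR when input and output tokens bill at different per-M rates."""
    return (prompt_tokens * eur_in_per_m
            + completion_tokens * eur_out_per_m) / 1_000_000

# 31 prompt + 47 completion tokens on deepseek-r1 at 13 / 36 EUR per M tokens:
print(split_cost_eur(31, 47, 13, 36))  # ~0.0021 EUR
```

Reasoning models also bill for their chain-of-thought output, so completion token counts (and costs) run higher than the visible answer alone would suggest.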
Common use cases
- Customer support — draft replies, summarize tickets, classify intent
- Content generation — blog posts, product descriptions, email templates
- Code assistance — code review, generation, documentation, bug explanation
- Data extraction — parse unstructured text into structured fields
- Translation — translate text between languages with context awareness
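For the data-extraction use case, the usual pattern is to ask the model to reply in JSON and validate the reply before using it, since models sometimes wrap their output in a markdown code fence. A tolerant parser sketch; `parse_model_json` is a hypothetical helper, not part of any SDK:

```python
import json

def parse_model_json(text: str) -> dict:
    """Parse a model reply as JSON, tolerating a ```json ... ``` fence around it."""
    text = text.strip()
    if text.startswith("```"):
        text = text.split("\n", 1)[1]     # drop the opening fence line
        text = text.rsplit("```", 1)[0]   # drop the closing fence
    return json.loads(text)

# A typical fenced reply, as a model might produce it:
reply = '```json\n{"name": "Acme AS", "org_number": "912345678"}\n```'
print(parse_model_json(reply)["name"])  # Acme AS
```

`json.loads` raises `json.JSONDecodeError` on malformed output, which is your cue to retry the request or fall back to manual handling.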
Related services
- GPU Studio — image and video generation
- Chatbot (CaaS) — pre-built chatbot with knowledge base and RAG
- Speech Intelligence — transcribe audio, then analyze with LLM
Related documentation
- Run an LLM Request — step-by-step guide
- LLM API reference — all endpoints
- API Keys — managing API credentials
- Getting Started — platform overview