
LLM API

OpenAI-compatible chat completions API hosted in EU datacenters. Change the base URL and API key in your existing OpenAI SDK code and it works immediately.

Best for: applications that need language model inference with EU data residency, per-token billing, and no minimum commitment.

Activate LLM API | LLM API reference


What this is

WAYSCloud LLM is an OpenAI-compatible inference API at https://api.wayscloud.services/v1/llm. It implements the /v1/chat/completions and /v1/models endpoints with the same request and response format as the OpenAI API. You authenticate with a Bearer token, pick a model, and send messages. Responses can be streamed or returned as a single block. All inference runs in EU datacenters. No data leaves Europe.


When to use it

Use this when:

  • You need language model inference with EU data residency
  • You want to use existing OpenAI SDK code with a different provider
  • You need per-token billing without a fixed monthly commitment
  • You want access to multiple models (general purpose, reasoning, code) through one API


What you get

  • OpenAI-compatible endpoint at https://api.wayscloud.services/v1/llm
  • 11 models: general purpose, reasoning, code generation, content moderation
  • Streaming with server-sent events for real-time token delivery
  • EU data residency: all inference runs in European datacenters
  • Per-token billing with no minimum commitment
  • 3 plan tiers with different rate limits: Starter (60 RPM), Pro (120 RPM), Enterprise (300 RPM)
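Because each tier enforces a requests-per-minute limit, client code should be prepared for throttling. A minimal retry sketch, assuming the API signals an exceeded rate limit with HTTP 429 (the conventional status for throttling; this page does not confirm the exact status code):

```python
import time

def backoff_delay(attempt, base=1.0, cap=30.0):
    # Exponential backoff: base * 2^attempt seconds, capped at `cap`.
    return min(cap, base * (2 ** attempt))

def post_with_retry(send, max_attempts=5, base=1.0):
    # `send` is a zero-argument function that issues one HTTP request and
    # returns a response object with a `status_code` attribute
    # (e.g. a closure over requests.post(...)).
    for attempt in range(max_attempts):
        resp = send()
        if resp.status_code != 429:  # assumed throttling status
            return resp
        time.sleep(backoff_delay(attempt, base=base))
    return resp
```

The wrapper takes a callable rather than a URL so it works unchanged with any HTTP library.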

Pricing

All prices exclude VAT. Per-token billing.

chat/light

Model         Metric        EUR/M tokens  NOK/M tokens  SEK/M tokens  DKK/M tokens
gemma-3n-e4b  total tokens  0.093         1.10          1.04          0.69
gpt-oss-20b   total tokens  0.2325        2.75          2.61          1.73
mixtral-8x7b  total tokens  3.40          40            38            25.09

chat/processing

Model                Metric        EUR/M tokens  NOK/M tokens  SEK/M tokens  DKK/M tokens
deepseek-v3          total tokens  5.95          70            66.50         43.91
deepseek-v3.1        total tokens  2.79          33            31.32         20.70
gpt-oss-120b         total tokens  0.70          8.25          7.83          5.17
kimi-k2.5            total tokens  2.33          27.50         26.10         17.25
llama-3.3-70b        total tokens  4.09          48.40         45.94         30.36
qwen3-235b-instruct  total tokens  0.93          11            10.44         6.90
qwen3.5-9b           total tokens  0.47          5.50          5.22          3.45

code

Model             Metric        EUR/M tokens  NOK/M tokens  SEK/M tokens  DKK/M tokens
qwen3-coder-480b  total tokens  10.20         120           114           75.27

embedding

Model                   Metric        EUR/M tokens  NOK/M tokens  SEK/M tokens  DKK/M tokens
embedding-multilingual  total tokens  0.093         1.10          1.04          0.69

moderation

Model         Metric        EUR/M tokens  NOK/M tokens  SEK/M tokens  DKK/M tokens
llamaguard-4  total tokens  2.13          25            23.75         15.68

reasoning

Model                Metric        EUR/M tokens  NOK/M tokens  SEK/M tokens  DKK/M tokens
deepseek-r1          input tokens  34            400           380           250.91
qwen3-235b-thinking  input tokens  14.88         175           166.25        109.77

vision

Model        Metric        EUR/M tokens  NOK/M tokens  SEK/M tokens  DKK/M tokens
qwen3-vl-8b  total tokens  0.837         9.90          9.40          6.21

View all plans in dashboard


How it works

  1. Activate the LLM service in the dashboard and choose a plan (Starter, Pro, or Enterprise).
  2. Copy your API key. It is shown only once.
  3. Set the base URL to https://api.wayscloud.services/v1/llm in your OpenAI SDK or HTTP client.
  4. Send a chat completion request with your chosen model and messages array.
  5. Receive the response as a single JSON object or as a stream of server-sent events.
  6. Monitor usage in the dashboard: tokens consumed, cost by model, and recent requests.
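Steps 4 and 5 can be sketched with only the standard library. `WAYSCLOUD_API_KEY` is a hypothetical environment variable name, not something this page defines:

```python
import json
import urllib.request

BASE_URL = "https://api.wayscloud.services/v1/llm"

def build_request(model, messages, stream=False, max_tokens=256):
    # Step 4: assemble the chat completion payload.
    return {"model": model, "messages": messages,
            "stream": stream, "max_tokens": max_tokens}

def send(payload, api_key):
    # Step 5: POST and read back a single JSON object (non-streaming).
    req = urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Authorization": "Bearer " + api_key,
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (requires a real key):
#   import os
#   payload = build_request("mixtral-8x7b",
#                           [{"role": "user", "content": "Hi"}])
#   print(send(payload, os.environ["WAYSCLOUD_API_KEY"]))
```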

What you see in the dashboard

  • Tokens this month: input + output token count with progress toward plan limit
  • Cost this month: broken down by model
  • Models used: list with per-model token consumption and average latency
  • Recent requests: timestamp, model, token count, latency, status
  • Plan details: current tier, rate limit, included tokens

Fastest way to get started

Dashboard

  1. Open my.wayscloud.services and go to AI & Machine Learning then LLM API
  2. Click Activate, choose Starter plan, and copy your API key

API

bash
curl -X POST https://api.wayscloud.services/v1/llm/chat/completions \
  -H "Authorization: Bearer wayscloud_llm_abc12_YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-235b-thinking",
    "messages": [{"role": "user", "content": "What is the capital of Norway?"}],
    "max_tokens": 128
  }'

Example request and response

Request: Chat completion with system prompt

bash
curl -X POST https://api.wayscloud.services/v1/llm/chat/completions \
  -H "Authorization: Bearer wayscloud_llm_abc12_YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1",
    "messages": [
      {"role": "system", "content": "You are a concise technical writer."},
      {"role": "user", "content": "Explain the difference between TCP and UDP in two sentences."}
    ],
    "temperature": 0.3,
    "max_tokens": 256
  }'

Response:

json
{
  "id": "chatcmpl-9f2a7b3c",
  "object": "chat.completion",
  "model": "deepseek-r1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "TCP is a connection-oriented protocol that guarantees ordered, reliable delivery of data through acknowledgments and retransmissions. UDP is connectionless and sends datagrams without delivery guarantees, which makes it faster but suitable only when occasional packet loss is acceptable."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 31,
    "completion_tokens": 47,
    "total_tokens": 78
  }
}
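Because the response format matches OpenAI's, the fields can be read with plain dictionary access. A small sketch using the response above (content abbreviated):

```python
import json

raw = """
{
  "id": "chatcmpl-9f2a7b3c",
  "object": "chat.completion",
  "model": "deepseek-r1",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant",
                 "content": "TCP is a connection-oriented protocol..."},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 31, "completion_tokens": 47, "total_tokens": 78}
}
"""

resp = json.loads(raw)
answer = resp["choices"][0]["message"]["content"]   # the assistant reply
total = resp["usage"]["total_tokens"]               # tokens billed
```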

Python with OpenAI SDK:

python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.wayscloud.services/v1/llm",
    api_key="wayscloud_llm_abc12_YOUR_SECRET"
)

response = client.chat.completions.create(
    model="mixtral-8x7b",
    messages=[{"role": "user", "content": "Summarize GDPR Article 17 in plain language."}],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Available models:

Model                Description                   Context  EUR      NOK        SEK        DKK
deepseek-v3          General purpose, fast         128K     7        80         80         55
deepseek-r1          Reasoning (chain-of-thought)  128K     13 / 36  150 / 420  150 / 420  103 / 290
qwen3-80b-instruct   General purpose, balanced     128K     1 / 8    15 / 90    15 / 90    10 / 62
qwen3-80b-thinking   Reasoning with thinking       128K     1 / 9    15 / 100   15 / 100   10 / 69
qwen3-235b-thinking  Large reasoning model         128K     3 / 16   35 / 180   35 / 180   24 / 124
qwen3-coder-480b     Code generation               128K     10       120        120        83
qwen3-vl-32b         Vision + language             32K
kimi-k2              General purpose               128K     5 / 14   60 / 160   60 / 160   41 / 110
mixtral-8x7b         Low-cost general purpose      32K      4        45         45         31
llama-3.1-405b       Large general purpose         128K     22       250        250        172
llamaguard-4         Content moderation                     2        25         25         17

Prices are per 1M tokens. Where two values are shown: input / output. All prices exclude VAT.
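Since each response reports its own token usage, per-request cost is simple arithmetic. A sketch using the deepseek-r1 prices from the table and the token counts from the example response earlier on this page:

```python
def cost_eur(prompt_tokens, completion_tokens, input_price, output_price):
    # Prices are EUR per 1M tokens, as in the table above.
    return (prompt_tokens * input_price
            + completion_tokens * output_price) / 1_000_000

# deepseek-r1 in the table: 13 EUR input / 36 EUR output per 1M tokens.
# The example response used 31 prompt and 47 completion tokens:
cost = cost_eur(31, 47, 13, 36)  # about 0.0021 EUR
```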


Common use cases

  • Customer support — draft replies, summarize tickets, classify intent
  • Content generation — blog posts, product descriptions, email templates
  • Code assistance — code review, generation, documentation, bug explanation
  • Data extraction — parse unstructured text into structured fields
  • Translation — translate text between languages with context awareness
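For the data-extraction use case, a common pattern is to ask the model for JSON and parse the reply defensively, since models sometimes wrap JSON in prose or code fences. `extract_json` is a hypothetical helper, not part of the API:

```python
import json
import re

def extract_json(text):
    # Search for the first {...} span instead of parsing the whole reply,
    # so surrounding prose or code fences do not break json.loads.
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in reply")
    return json.loads(match.group(0))

# A typical model reply with fences around the payload:
reply = 'Here you go:\n```json\n{"name": "Ola Nordmann", "city": "Oslo"}\n```'
fields = extract_json(reply)
```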


Open in dashboard