
LLM API

OpenAI-compatible chat completions API hosted in EU datacenters. Change the base URL and API key in your existing OpenAI SDK code and it works immediately.

Best for: applications that need language model inference with EU data residency, per-token billing, and no minimum commitment.

Activate LLM API | LLM API reference


What this is

WAYSCloud LLM is an OpenAI-compatible inference API at https://api.wayscloud.services/v1/llm. It implements the /v1/chat/completions and /v1/models endpoints with the same request and response format as the OpenAI API. You authenticate with a Bearer token, pick a model, and send messages. Responses can be streamed or returned as a single block. All inference runs in EU datacenters. No data leaves Europe.


When to use it

Use this when:

  • You need language model inference with EU data residency
  • You want to use existing OpenAI SDK code with a different provider
  • You need per-token billing without a fixed monthly commitment
  • You want access to multiple models (general purpose, reasoning, code) through one API


What you get

  • OpenAI-compatible endpoint at https://api.wayscloud.services/v1/llm
  • 11 models: general purpose, reasoning, code generation, content moderation
  • Streaming with server-sent events for real-time token delivery
  • EU data residency: all inference runs in European datacenters
  • Per-token billing with no minimum commitment
  • 3 plan tiers with different rate limits: Starter (60 RPM), Pro (120 RPM), Enterprise (300 RPM)
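Because each tier enforces a requests-per-minute limit, client code should be prepared for throttling. A minimal retry sketch, assuming the API signals an exceeded rate limit with HTTP 429 (the conventional status for throttling; this page does not confirm the exact status code):

```python
import time

def backoff_delay(attempt, base=1.0, cap=30.0):
    # Exponential backoff: base * 2^attempt seconds, capped at `cap`.
    return min(cap, base * (2 ** attempt))

def post_with_retry(send, max_attempts=5, base=1.0):
    # `send` is a zero-argument function that issues one HTTP request and
    # returns a response object with a `status_code` attribute
    # (e.g. a closure over requests.post(...)).
    for attempt in range(max_attempts):
        resp = send()
        if resp.status_code != 429:  # assumed throttling status
            return resp
        time.sleep(backoff_delay(attempt, base=base))
    return resp
```

The wrapper takes a callable rather than a URL so it works unchanged with any HTTP library.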

Pricing

All prices exclude VAT. Per-token billing.

chat/light

Model         Metric        EUR/M tokens  NOK/M tokens  SEK/M tokens  DKK/M tokens
gemma-3n-e4b  total tokens  0.093         1.10          1.04          0.69
gpt-oss-20b   total tokens  0.2325        2.75          2.61          1.73
mixtral-8x7b  total tokens  3.40          40            38            25.09

chat/processing

Model                Metric        EUR/M tokens  NOK/M tokens  SEK/M tokens  DKK/M tokens
deepseek-v3          total tokens  5.95          70            66.50         43.91
deepseek-v3.1        total tokens  2.79          33            31.32         20.70
gpt-oss-120b         total tokens  0.70          8.25          7.83          5.17
kimi-k2.5            total tokens  2.33          27.50         26.10         17.25
llama-3.3-70b        total tokens  4.09          48.40         45.94         30.36
qwen3-235b-instruct  total tokens  0.93          11            10.44         6.90
qwen3.5-9b           total tokens  0.47          5.50          5.22          3.45

code

Model             Metric        EUR/M tokens  NOK/M tokens  SEK/M tokens  DKK/M tokens
qwen3-coder-480b  total tokens  10.20         120           114           75.27

embedding

Model                   Metric        EUR/M tokens  NOK/M tokens  SEK/M tokens  DKK/M tokens
embedding-multilingual  total tokens  0.093         1.10          1.04          0.69

moderation

Model         Metric        EUR/M tokens  NOK/M tokens  SEK/M tokens  DKK/M tokens
llamaguard-4  total tokens  2.13          25            23.75         15.68

reasoning

Model                Metric        EUR/M tokens  NOK/M tokens  SEK/M tokens  DKK/M tokens
deepseek-r1          input tokens  34            400           380           250.91
qwen3-235b-thinking  input tokens  14.88         175           166.25        109.77

vision

Model        Metric        EUR/M tokens  NOK/M tokens  SEK/M tokens  DKK/M tokens
qwen3-vl-8b  total tokens  0.837         9.90          9.40          6.21

View all plans in dashboard


How it works

  1. Activate the LLM service in the dashboard and choose a plan (Starter, Pro, or Enterprise).
  2. Copy your API key. It is shown only once.
  3. Set the base URL to https://api.wayscloud.services/v1/llm in your OpenAI SDK or HTTP client.
  4. Send a chat completion request with your chosen model and messages array.
  5. Receive the response as a single JSON object or as a stream of server-sent events.
  6. Monitor usage in the dashboard: tokens consumed, cost by model, and recent requests.
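Steps 4 and 5 can be sketched with only the standard library. `WAYSCLOUD_API_KEY` is a hypothetical environment variable name, not something this page defines:

```python
import json
import urllib.request

BASE_URL = "https://api.wayscloud.services/v1/llm"

def build_request(model, messages, stream=False, max_tokens=256):
    # Step 4: assemble the chat completion payload.
    return {"model": model, "messages": messages,
            "stream": stream, "max_tokens": max_tokens}

def send(payload, api_key):
    # Step 5: POST and read back a single JSON object (non-streaming).
    req = urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Authorization": "Bearer " + api_key,
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (requires a real key):
#   import os
#   payload = build_request("mixtral-8x7b",
#                           [{"role": "user", "content": "Hi"}])
#   print(send(payload, os.environ["WAYSCLOUD_API_KEY"]))
```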

What you see in the dashboard

  • Tokens this month: input + output token count with progress toward plan limit
  • Cost this month: broken down by model
  • Models used: list with per-model token consumption and average latency
  • Recent requests: timestamp, model, token count, latency, status
  • Plan details: current tier, rate limit, included tokens

Fastest way to get started

Dashboard

  1. Open my.wayscloud.services and go to AI & Machine Learning then LLM API
  2. Click Activate, choose Starter plan, and copy your API key

API

bash
curl -X POST https://api.wayscloud.services/v1/llm/chat/completions \
  -H "Authorization: Bearer wayscloud_llm_abc12_YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-235b-thinking",
    "messages": [{"role": "user", "content": "What is the capital of Norway?"}],
    "max_tokens": 128
  }'

Example request and response

Request: Chat completion with system prompt

bash
curl -X POST https://api.wayscloud.services/v1/llm/chat/completions \
  -H "Authorization: Bearer wayscloud_llm_abc12_YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1",
    "messages": [
      {"role": "system", "content": "You are a concise technical writer."},
      {"role": "user", "content": "Explain the difference between TCP and UDP in two sentences."}
    ],
    "temperature": 0.3,
    "max_tokens": 256
  }'

Response:

json
{
  "id": "chatcmpl-9f2a7b3c",
  "object": "chat.completion",
  "model": "deepseek-r1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "TCP is a connection-oriented protocol that guarantees ordered, reliable delivery of data through acknowledgments and retransmissions. UDP is connectionless and sends datagrams without delivery guarantees, which makes it faster but suitable only when occasional packet loss is acceptable."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 31,
    "completion_tokens": 47,
    "total_tokens": 78
  }
}
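Because the response format matches OpenAI's, the fields can be read with plain dictionary access. A small sketch using the response above (content abbreviated):

```python
import json

raw = """
{
  "id": "chatcmpl-9f2a7b3c",
  "object": "chat.completion",
  "model": "deepseek-r1",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant",
                 "content": "TCP is a connection-oriented protocol..."},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 31, "completion_tokens": 47, "total_tokens": 78}
}
"""

resp = json.loads(raw)
answer = resp["choices"][0]["message"]["content"]   # the assistant reply
total = resp["usage"]["total_tokens"]               # tokens billed
```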

Python with OpenAI SDK:

python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.wayscloud.services/v1/llm",
    api_key="wayscloud_llm_abc12_YOUR_SECRET"
)

response = client.chat.completions.create(
    model="mixtral-8x7b",
    messages=[{"role": "user", "content": "Summarize GDPR Article 17 in plain language."}],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Available models:

Model                Description                   Context  EUR      NOK        SEK        DKK
deepseek-v3          General purpose, fast         128K     7        80         80         55
deepseek-r1          Reasoning (chain-of-thought)  128K     13 / 36  150 / 420  150 / 420  103 / 290
qwen3-80b-instruct   General purpose, balanced     128K     1 / 8    15 / 90    15 / 90    10 / 62
qwen3-80b-thinking   Reasoning with thinking       128K     1 / 9    15 / 100   15 / 100   10 / 69
qwen3-235b-thinking  Large reasoning model         128K     3 / 16   35 / 180   35 / 180   24 / 124
qwen3-coder-480b     Code generation               128K     10       120        120        83
qwen3-vl-32b         Vision + language             32K
kimi-k2              General purpose               128K     5 / 14   60 / 160   60 / 160   41 / 110
mixtral-8x7b         Low-cost general purpose      32K      4        45         45         31
llama-3.1-405b       Large general purpose         128K     22       250        250        172
llamaguard-4         Content moderation                     2        25         25         17

Prices are per 1M tokens. Where two values are shown: input / output. All prices exclude VAT.
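Since each response reports its own token usage, per-request cost is simple arithmetic. A sketch using the deepseek-r1 prices from the table and the token counts from the example response earlier on this page:

```python
def cost_eur(prompt_tokens, completion_tokens, input_price, output_price):
    # Prices are EUR per 1M tokens, as in the table above.
    return (prompt_tokens * input_price
            + completion_tokens * output_price) / 1_000_000

# deepseek-r1 in the table: 13 EUR input / 36 EUR output per 1M tokens.
# The example response used 31 prompt and 47 completion tokens:
cost = cost_eur(31, 47, 13, 36)  # about 0.0021 EUR
```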


Common use cases

  • Customer support — draft replies, summarize tickets, classify intent
  • Content generation — blog posts, product descriptions, email templates
  • Code assistance — code review, generation, documentation, bug explanation
  • Data extraction — parse unstructured text into structured fields
  • Translation — translate text between languages with context awareness
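For the data-extraction use case, a common pattern is to ask the model for JSON and parse the reply defensively, since models sometimes wrap JSON in prose or code fences. `extract_json` is a hypothetical helper, not part of the API:

```python
import json
import re

def extract_json(text):
    # Search for the first {...} span instead of parsing the whole reply,
    # so surrounding prose or code fences do not break json.loads.
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in reply")
    return json.loads(match.group(0))

# A typical model reply with fences around the payload:
reply = 'Here you go:\n```json\n{"name": "Ola Nordmann", "city": "Oslo"}\n```'
fields = extract_json(reply)
```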


Open in dashboard