Rate Limits
WAYSCloud implements rate limits to ensure fair usage and maintain service quality. This guide covers rate limits for each service, how to handle rate limiting, and strategies to optimize your usage.
Overview
Rate limits are applied per API key and are measured in requests per time window. When you exceed a rate limit, you'll receive a 429 Too Many Requests response.
Rate Limits by Service
Storage API
| Resource | Limit | Window |
|---|---|---|
| API Requests | 1000 requests | per minute |
| Burst Allowance | +100 additional requests | per minute |
| Max File Size | 50GB | per upload |
| Concurrent Connections | 100 | per API key |
Notes:
- Burst allowance allows temporary spikes in traffic
- Multipart uploads count as one request per part
- List operations are paginated (max 1000 objects per page)
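Because list operations return at most 1000 objects per page, large buckets are read by following a continuation marker, one request per page. A minimal sketch of that loop (the `fetch_page` callback and the `objects`/`next_marker` field names are assumptions for illustration, not the documented response shape):

```python
def list_all_objects(fetch_page, page_size=1000):
    """Collect all objects by following the pagination marker.

    fetch_page(marker, limit) must return a dict with an 'objects' list
    and an optional 'next_marker' (hypothetical field names).
    """
    objects, marker = [], None
    while True:
        page = fetch_page(marker, page_size)
        objects.extend(page.get('objects', []))
        marker = page.get('next_marker')
        if not marker:
            return objects
```

Each iteration is one API request, so a full listing of N objects costs roughly N / 1000 requests against the per-minute limit.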
LLM API
| Resource | Limit | Window |
|---|---|---|
| API Requests | 1000 requests | per minute |
| Burst Allowance | +100 additional requests | per minute |
| Max Tokens (input + output) | Varies by model | per request |
| Concurrent Requests | 50 | per API key |
Token Limits by Model:
| Model | Max Context | Max Output Tokens |
|---|---|---|
| mixtral-8x7b | 32K tokens | 4K tokens |
| qwen3-80b-instruct | 32K tokens | 4K tokens |
| qwen3-80b-thinking | 32K tokens | 4K tokens |
| qwen3-235b-thinking | 32K tokens | 4K tokens |
| deepseek-v3 | 64K tokens | 8K tokens |
| deepseek-r1 | 64K tokens | 8K tokens |
| kimi-k2 | 200K tokens | 8K tokens |
| llama-3.1-405b | 128K tokens | 8K tokens |
| qwen3-coder-480b | 32K tokens | 4K tokens |
| llamaguard-4 | 8K tokens | 1K tokens |
Notes:
- Token limits include both input (prompt) and output (completion)
- Streaming requests count as one request
- Rate limits apply to all models combined
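Since the per-model limit covers prompt and completion together, it can help to cap the requested output tokens before sending. A sketch based on the table above (treating "K" as 1000 for simplicity; exact prompt lengths would require the model's tokenizer):

```python
# Context windows and output caps from the table above (illustrative values).
MODEL_LIMITS = {
    'mixtral-8x7b': {'context': 32_000, 'max_output': 4_000},
    'deepseek-v3':  {'context': 64_000, 'max_output': 8_000},
    'kimi-k2':      {'context': 200_000, 'max_output': 8_000},
}

def allowed_output_tokens(model, prompt_tokens):
    """Largest completion that still fits: bounded by the model's output cap
    and by the context window minus the prompt."""
    limits = MODEL_LIMITS[model]
    room = limits['context'] - prompt_tokens
    if room <= 0:
        raise ValueError(f'prompt exceeds {model} context window')
    return min(limits['max_output'], room)
```

For example, a 30K-token prompt to mixtral-8x7b leaves room for only about 2K output tokens even though the model's output cap is 4K.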
Database API
| Resource | Limit | Window |
|---|---|---|
| API Requests | 500 requests | per minute |
| Database Creation | 10 databases | per hour |
| Snapshot Creation | 20 snapshots | per hour |
| Max Databases | 100 | per account |
| Max Connections | 100 | per database |
| Firewall Rules | 50 | per database |
Notes:
- Database operations are more resource-intensive, so their limits are lower
- Snapshot operations don't count toward general API limit
- Max database size: 1TB
DNS API
| Resource | Limit | Window |
|---|---|---|
| Zone Creation | 10 zones | per hour |
| Record Creation | 100 records | per minute |
| Record Updates | 100 records | per minute |
| Record Deletion | 100 records | per minute |
| Zone Queries (GET) | 1000 requests | per minute |
| Record Queries (GET) | 1000 requests | per minute |
| Batch Operations | 1000 records | per request |
Notes:
- Zone creation is rate-limited to prevent abuse
- Batch operations count as one request regardless of record count
- DNSSEC operations have same limits as regular operations
GPU API
| Resource | Limit | Window |
|---|---|---|
| API Requests | 100 requests | per minute |
| Concurrent Jobs | 10 jobs | per API key |
| Job Status Checks | 1000 requests | per minute |
Job-Specific Limits:
| Job Type | Max Duration | Max Output |
|---|---|---|
| Video Generation | 10 minutes | 30 seconds |
| Text-to-Speech | 5 minutes | 10MB |
| Audio Transcription | 30 minutes | 1 hour audio |
| Image Generation | 5 minutes | 4K resolution |
Notes:
- Jobs are asynchronous and don't count toward rate limit once submitted
- Status checks have separate, higher limit
- Webhook delivery doesn't count toward limits
Rate Limit Headers
All API responses include rate limit headers:
HTTP/1.1 200 OK
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 742
X-RateLimit-Reset: 1699123456
Headers:
- X-RateLimit-Limit - Maximum requests allowed in window
- X-RateLimit-Remaining - Requests remaining in current window
- X-RateLimit-Reset - Unix timestamp when limit resets
Handling Rate Limits
429 Too Many Requests Response
When you exceed a rate limit, you'll receive:
HTTP/1.1 429 Too Many Requests
Retry-After: 60
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1699123456
{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "Rate limit exceeded",
    "retry_after": 60,
    "details": {
      "limit": "1000 requests/minute",
      "window": "60 seconds"
    }
  }
}
Response Headers:
- Retry-After - Seconds to wait before retrying
Exponential Backoff
Implement exponential backoff for 429 responses:
import time
import requests

def api_call_with_backoff(url, headers, max_retries=5):
    for attempt in range(max_retries):
        response = requests.get(url, headers=headers)
        if response.status_code == 200:
            return response.json()
        if response.status_code == 429:
            # Honor the Retry-After header, backing off exponentially:
            # never retry before the server says it's safe
            retry_after = int(response.headers.get('Retry-After', 60))
            wait_time = max(retry_after, 2 ** attempt)
            print(f"Rate limited. Waiting {wait_time}s before retry {attempt + 1}/{max_retries}")
            time.sleep(wait_time)
            continue
        # Handle other errors
        response.raise_for_status()
    raise Exception("Max retries exceeded")
JavaScript/Node.js Example
async function apiCallWithBackoff(url, headers, maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, { headers });
    if (response.ok) {
      return await response.json();
    }
    if (response.status === 429) {
      // Honor the Retry-After header, backing off exponentially:
      // never retry before the server says it's safe
      const retryAfter = parseInt(response.headers.get('Retry-After') || '60', 10);
      const waitTime = Math.max(retryAfter, Math.pow(2, attempt));
      console.log(`Rate limited. Waiting ${waitTime}s (attempt ${attempt + 1}/${maxRetries})`);
      await new Promise(resolve => setTimeout(resolve, waitTime * 1000));
      continue;
    }
    throw new Error(`HTTP ${response.status}: ${await response.text()}`);
  }
  throw new Error('Max retries exceeded');
}
Optimization Strategies
1. Monitor Rate Limit Headers
Track remaining requests to avoid hitting limits:
import time
import requests

def smart_api_call(url, headers):
    response = requests.get(url, headers=headers)
    # Check remaining requests
    remaining = int(response.headers.get('X-RateLimit-Remaining', 0))
    reset_time = int(response.headers.get('X-RateLimit-Reset', 0))
    if remaining < 10:
        # Close to the limit - space the remaining requests over the window
        wait_until_reset = reset_time - time.time()
        if wait_until_reset > 0:
            print(f"Approaching rate limit. {remaining} requests remaining.")
            delay = wait_until_reset / max(remaining, 1)
            print(f"Slowing down: sleeping {delay:.1f}s")
            time.sleep(delay)
    return response.json()
2. Batch Operations
Use batch endpoints when available:
# Bad - Multiple individual requests (uses 100 API calls)
for record in records:
    api.create_dns_record(zone_id, record)

# Good - Single batch request (uses 1 API call)
api.create_dns_records_batch(zone_id, records)
Services with Batch Support:
- DNS API: Create/update/delete multiple records
- Storage API: Use multipart upload for large files
- Database API: Manage multiple firewall rules
3. Cache Responses
Cache responses when data doesn't change frequently:
import time
from functools import lru_cache

@lru_cache(maxsize=128)
def get_dns_records(zone_id):
    """Cache DNS records indefinitely (lru_cache has no expiry)"""
    return api.list_dns_records(zone_id)

# Use time-based caching when entries must expire
cache = {}
CACHE_TTL = 60  # seconds

def get_cached_data(key):
    if key in cache:
        data, timestamp = cache[key]
        if time.time() - timestamp < CACHE_TTL:
            return data
    # Fetch fresh data
    data = api.get_data(key)
    cache[key] = (data, time.time())
    return data
4. Use Webhooks (GPU API)
Instead of polling job status, use webhooks:
# Bad - Polling (uses many API calls)
while True:
    status = api.get_job_status(job_id)
    if status['status'] in ['completed', 'failed']:
        break
    time.sleep(5)  # Poll every 5 seconds

# Good - Webhooks (uses 1 API call)
job = api.create_job(
    job_type='video_generation',
    webhook_url='https://myapp.com/webhook'
)
# Job status delivered to your webhook when complete
5. Distribute Load
Spread requests evenly over time:
import time

def rate_limited_loop(items, requests_per_second=10):
    """Process items with client-side rate limiting"""
    interval = 1.0 / requests_per_second
    for item in items:
        start_time = time.time()
        # Process item
        process_item(item)
        # Wait to maintain rate
        elapsed = time.time() - start_time
        if elapsed < interval:
            time.sleep(interval - elapsed)
6. Use Multiple API Keys
For high-volume applications, use multiple API keys:
from itertools import cycle
import requests

# Multiple API keys
api_keys = [
    'wayscloud_storage_key1_secret',
    'wayscloud_storage_key2_secret',
    'wayscloud_storage_key3_secret'
]

# Round-robin through keys
key_cycle = cycle(api_keys)

def make_request(url):
    api_key = next(key_cycle)
    headers = {'Authorization': f'Bearer {api_key}'}
    return requests.get(url, headers=headers)
Ensure compliance with Terms of Service when using multiple keys. Contact support for high-volume use cases.
Quota Limits
In addition to rate limits, some resources have quota limits:
Storage Quotas
| Plan | Storage Quota | Bandwidth |
|---|---|---|
| Free | 10GB | 50GB/month |
| Basic | 100GB | 500GB/month |
| Pro | 1TB | 5TB/month |
| Enterprise | Custom | Custom |
Database Quotas
| Plan | Max Databases | Max DB Size | Snapshots |
|---|---|---|---|
| Free | 3 | 1GB | 5 per DB |
| Basic | 10 | 10GB | 10 per DB |
| Pro | 50 | 100GB | 30 per DB |
| Enterprise | Custom | Custom | Custom |
DNS Quotas
| Plan | Max Zones | Records per Zone | Queries/Month |
|---|---|---|---|
| Free | 3 | 100 | 1M |
| Basic | 10 | 500 | 10M |
| Pro | 50 | 2000 | 100M |
| Enterprise | Custom | Custom | Custom |
Monitoring Usage
Via Dashboard
Monitor API usage at my.wayscloud.services/usage:
- Real-time request counts
- Rate limit violations
- Quota usage
- Historical data
Via API
Check usage programmatically:
curl -X GET "https://provision.wayscloud.net/api/v1/dashboard/usage" \
-H "Authorization: Bearer {keycloak_token}"
Response:
{
  "period": "2025-11-04",
  "services": {
    "storage": {
      "requests": 45234,
      "rate_limit_hits": 3,
      "quota_used": "45.2GB",
      "quota_limit": "100GB"
    },
    "llm": {
      "requests": 12456,
      "tokens_used": 5234567,
      "rate_limit_hits": 0
    },
    "database": {
      "requests": 3421,
      "databases_active": 5,
      "rate_limit_hits": 0
    }
  }
}
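A usage response like this can feed a simple quota check, for example warning when a service crosses 80% of its storage quota. A sketch under the assumption that quota values arrive as "NN.NGB"-style strings, as in the example above (values in other units would need extra parsing):

```python
def quota_fraction(used, limit):
    """Parse 'NN.NGB'-style strings (assumed format) into a used/limit fraction."""
    def gb(value):
        return float(value.rstrip('GB'))
    return gb(used) / gb(limit)

def storage_alerts(usage, threshold=0.8):
    """Return warnings for services whose quota usage meets the threshold."""
    alerts = []
    for name, service in usage['services'].items():
        if 'quota_used' in service and 'quota_limit' in service:
            frac = quota_fraction(service['quota_used'], service['quota_limit'])
            if frac >= threshold:
                alerts.append(f'{name}: {frac:.0%} of quota used')
    return alerts
```

Running such a check on a schedule (or after each usage poll) gives early warning before uploads start failing on quota.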
Upgrading Limits
Need higher limits? Contact us:
- Email: sales@wayscloud.no
- Subject: Rate Limit Increase Request
- Include:
- Current usage patterns
- Required limits
- Use case description
- Expected growth
Enterprise plans offer custom rate limits and dedicated capacity.
Best Practices Summary
- ✅ Monitor headers - Check X-RateLimit-Remaining
- ✅ Implement backoff - Use exponential backoff for 429 errors
- ✅ Use batch operations - Reduce API calls with batch endpoints
- ✅ Cache responses - Cache data that doesn't change frequently
- ✅ Use webhooks - Avoid polling with webhook notifications
- ✅ Distribute load - Spread requests evenly over time
- ✅ Log violations - Track and investigate rate limit hits
- ✅ Plan capacity - Monitor usage trends and upgrade proactively
Next Steps
- Error Handling - Handle rate limit errors properly
- Authentication - API key management
- Monitoring - Track API usage and quota consumption
Support
Questions about rate limits?
- Email: support@wayscloud.no
- Documentation: https://docs.wayscloud.services
- Dashboard: https://my.wayscloud.services