Rate Limits
WAYSCloud implements rate limits to ensure fair usage and maintain service quality. This guide covers rate limits for each service, how to handle rate limiting, and strategies to optimize your usage.
Overview
Rate limits are applied per API key and are measured in requests per time window. When you exceed a rate limit, you'll receive a 429 Too Many Requests response.
Rate Limits by Service
Storage API
| Resource | Limit | Window |
|---|---|---|
| API Requests | 1000 requests | per minute |
| Burst Allowance | +100 additional requests | per minute |
| Max File Size | 50GB | per upload |
| Concurrent Connections | 100 | per API key |
Notes:
- Burst allowance allows temporary spikes in traffic
- Multipart uploads count as one request per part
- List operations are paginated (max 1000 objects per page)
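Because list operations return at most 1000 objects per page, large buckets are read by following a continuation marker, one request per page. A minimal sketch of that loop (the `fetch_page` callback and the `objects`/`next_marker` field names are assumptions for illustration, not the documented response shape):

```python
def list_all_objects(fetch_page, page_size=1000):
    """Collect all objects by following the pagination marker.

    fetch_page(marker, limit) must return a dict with an 'objects' list
    and an optional 'next_marker' (hypothetical field names).
    """
    objects, marker = [], None
    while True:
        page = fetch_page(marker, page_size)
        objects.extend(page.get('objects', []))
        marker = page.get('next_marker')
        if not marker:
            return objects
```

Each iteration is one API request, so a full listing of N objects costs roughly N / 1000 requests against the per-minute limit.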
LLM API
| Resource | Limit | Window |
|---|---|---|
| API Requests | 1000 requests | per minute |
| Burst Allowance | +100 additional requests | per minute |
| Max Tokens (input + output) | Varies by model | per request |
| Concurrent Requests | 50 | per API key |
Token Limits by Model:
| Model | Max Context | Max Output Tokens |
|---|---|---|
| mixtral-8x7b | 32K tokens | 4K tokens |
| qwen3-80b-instruct | 32K tokens | 4K tokens |
| qwen3-80b-thinking | 32K tokens | 4K tokens |
| qwen3-235b-thinking | 32K tokens | 4K tokens |
| deepseek-v3 | 64K tokens | 8K tokens |
| deepseek-r1 | 64K tokens | 8K tokens |
| kimi-k2 | 200K tokens | 8K tokens |
| llama-3.1-405b | 128K tokens | 8K tokens |
| qwen3-coder-480b | 32K tokens | 4K tokens |
| llamaguard-4 | 8K tokens | 1K tokens |
Notes:
- Token limits include both input (prompt) and output (completion)
- Streaming requests count as one request
- Rate limits apply to all models combined
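Since the per-model limit covers prompt and completion together, it can help to cap the requested output tokens before sending. A sketch based on the table above (treating "K" as 1000 for simplicity; exact prompt lengths would require the model's tokenizer):

```python
# Context windows and output caps from the table above (illustrative values).
MODEL_LIMITS = {
    'mixtral-8x7b': {'context': 32_000, 'max_output': 4_000},
    'deepseek-v3':  {'context': 64_000, 'max_output': 8_000},
    'kimi-k2':      {'context': 200_000, 'max_output': 8_000},
}

def allowed_output_tokens(model, prompt_tokens):
    """Largest completion that still fits: bounded by the model's output cap
    and by the context window minus the prompt."""
    limits = MODEL_LIMITS[model]
    room = limits['context'] - prompt_tokens
    if room <= 0:
        raise ValueError(f'prompt exceeds {model} context window')
    return min(limits['max_output'], room)
```

For example, a 30K-token prompt to mixtral-8x7b leaves room for only about 2K output tokens even though the model's output cap is 4K.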
Database API
| Resource | Limit | Window |
|---|---|---|
| API Requests | 500 requests | per minute |
| Database Creation | 10 databases | per hour |
| Snapshot Creation | 20 snapshots | per hour |
| Max Databases | 100 | per account |
| Max Connections | 100 | per database |
| Firewall Rules | 50 | per database |
Notes:
- Database operations are more resource-intensive, so their limits are lower
- Snapshot operations don't count toward general API limit
- Max database size: 1TB
DNS API
| Resource | Limit | Window |
|---|---|---|
| Zone Creation | 10 zones | per hour |
| Record Creation | 100 records | per minute |
| Record Updates | 100 records | per minute |
| Record Deletion | 100 records | per minute |
| Zone Queries (GET) | 1000 requests | per minute |
| Record Queries (GET) | 1000 requests | per minute |
| Batch Operations | 1000 records | per request |
Notes:
- Zone creation is rate-limited to prevent abuse
- Batch operations count as one request regardless of record count
- DNSSEC operations have same limits as regular operations
GPU API
| Resource | Limit | Window |
|---|---|---|
| API Requests | 100 requests | per minute |
| Concurrent Jobs | 10 jobs | per API key |
| Job Status Checks | 1000 requests | per minute |
Job-Specific Limits:
| Job Type | Max Duration | Max Output |
|---|---|---|
| Video Generation | 10 minutes | 30 seconds |
| Text-to-Speech | 5 minutes | 10MB |
| Audio Transcription | 30 minutes | 1 hour audio |
| Image Generation | 5 minutes | 4K resolution |
Notes:
- Jobs are asynchronous and don't count toward rate limit once submitted
- Status checks have separate, higher limit
- Webhook delivery doesn't count toward limits
Rate Limit Headers
All API responses include rate limit headers:
HTTP/1.1 200 OK
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 742
X-RateLimit-Reset: 1699123456
Headers:
- X-RateLimit-Limit - Maximum requests allowed in window
- X-RateLimit-Remaining - Requests remaining in current window
- X-RateLimit-Reset - Unix timestamp when limit resets
Handling Rate Limits
429 Too Many Requests Response
When you exceed a rate limit, you'll receive:
HTTP/1.1 429 Too Many Requests
Retry-After: 60
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1699123456
{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "Rate limit exceeded",
    "retry_after": 60,
    "details": {
      "limit": "1000 requests/minute",
      "window": "60 seconds"
    }
  }
}
Response Headers:
- Retry-After - Seconds to wait before retrying
Exponential Backoff
Implement exponential backoff for 429 responses:
import time
import requests

def api_call_with_backoff(url, headers, max_retries=5):
    for attempt in range(max_retries):
        response = requests.get(url, headers=headers)
        if response.status_code == 200:
            return response.json()
        if response.status_code == 429:
            # Honor the Retry-After header, backing off exponentially:
            # never retry before the server says it's safe
            retry_after = int(response.headers.get('Retry-After', 60))
            wait_time = max(retry_after, 2 ** attempt)
            print(f"Rate limited. Waiting {wait_time}s before retry {attempt + 1}/{max_retries}")
            time.sleep(wait_time)
            continue
        # Handle other errors
        response.raise_for_status()
    raise Exception("Max retries exceeded")
JavaScript/Node.js Example
async function apiCallWithBackoff(url, headers, maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, { headers });
    if (response.ok) {
      return await response.json();
    }
    if (response.status === 429) {
      // Honor the Retry-After header, backing off exponentially:
      // never retry before the server says it's safe
      const retryAfter = parseInt(response.headers.get('Retry-After') || '60', 10);
      const waitTime = Math.max(retryAfter, Math.pow(2, attempt));
      console.log(`Rate limited. Waiting ${waitTime}s (attempt ${attempt + 1}/${maxRetries})`);
      await new Promise(resolve => setTimeout(resolve, waitTime * 1000));
      continue;
    }
    throw new Error(`HTTP ${response.status}: ${await response.text()}`);
  }
  throw new Error('Max retries exceeded');
}
Optimization Strategies
1. Monitor Rate Limit Headers
Track remaining requests to avoid hitting limits:
import time
import requests

def smart_api_call(url, headers):
    response = requests.get(url, headers=headers)
    # Check remaining requests
    remaining = int(response.headers.get('X-RateLimit-Remaining', 0))
    reset_time = int(response.headers.get('X-RateLimit-Reset', 0))
    if remaining < 10:
        # Close to the limit - space the remaining requests over the window
        wait_until_reset = reset_time - time.time()
        if wait_until_reset > 0:
            print(f"Approaching rate limit. {remaining} requests remaining.")
            delay = wait_until_reset / max(remaining, 1)
            print(f"Slowing down: sleeping {delay:.1f}s")
            time.sleep(delay)
    return response.json()
2. Batch Operations
Use batch endpoints when available:
# Bad - Multiple individual requests (uses 100 API calls)
for record in records:
    api.create_dns_record(zone_id, record)

# Good - Single batch request (uses 1 API call)
api.create_dns_records_batch(zone_id, records)
Services with Batch Support:
- DNS API: Create/update/delete multiple records
- Storage API: Use multipart upload for large files
- Database API: Manage multiple firewall rules
3. Cache Responses
Cache responses when data doesn't change frequently:
import time
from functools import lru_cache

@lru_cache(maxsize=128)
def get_dns_records(zone_id):
    """Cache DNS records indefinitely (lru_cache has no expiry)"""
    return api.list_dns_records(zone_id)

# Use time-based caching when entries must expire
cache = {}
CACHE_TTL = 60  # seconds

def get_cached_data(key):
    if key in cache:
        data, timestamp = cache[key]
        if time.time() - timestamp < CACHE_TTL:
            return data
    # Fetch fresh data
    data = api.get_data(key)
    cache[key] = (data, time.time())
    return data
4. Use Webhooks (GPU API)
Instead of polling job status, use webhooks:
# Bad - Polling (uses many API calls)
while True:
    status = api.get_job_status(job_id)
    if status['status'] in ['completed', 'failed']:
        break
    time.sleep(5)  # Poll every 5 seconds

# Good - Webhooks (uses 1 API call)
job = api.create_job(
    job_type='video_generation',
    webhook_url='https://myapp.com/webhook'
)
# Job status delivered to your webhook when complete
5. Distribute Load
Spread requests evenly over time:
import time

def rate_limited_loop(items, requests_per_second=10):
    """Process items with client-side rate limiting"""
    interval = 1.0 / requests_per_second
    for item in items:
        start_time = time.time()
        # Process item
        process_item(item)
        # Wait to maintain rate
        elapsed = time.time() - start_time
        if elapsed < interval:
            time.sleep(interval - elapsed)
6. Use Multiple API Keys
For high-volume applications, use multiple API keys:
from itertools import cycle
import requests

# Multiple API keys
api_keys = [
    'wayscloud_storage_key1_secret',
    'wayscloud_storage_key2_secret',
    'wayscloud_storage_key3_secret'
]

# Round-robin through keys
key_cycle = cycle(api_keys)

def make_request(url):
    api_key = next(key_cycle)
    headers = {'Authorization': f'Bearer {api_key}'}
    return requests.get(url, headers=headers)
Ensure compliance with Terms of Service when using multiple keys. Contact support for high-volume use cases.
Quota Limits
In addition to rate limits, some resources have quota limits:
Storage Quotas
| Plan | Storage Quota | Bandwidth |
|---|---|---|
| Free | 10GB | 50GB/month |
| Basic | 100GB | 500GB/month |
| Pro | 1TB | 5TB/month |
| Enterprise | Custom | Custom |
Database Quotas
| Plan | Max Databases | Max DB Size | Snapshots |
|---|---|---|---|
| Free | 3 | 1GB | 5 per DB |
| Basic | 10 | 10GB | 10 per DB |
| Pro | 50 | 100GB | 30 per DB |
| Enterprise | Custom | Custom | Custom |
DNS Quotas
| Plan | Max Zones | Records per Zone | Queries/Month |
|---|---|---|---|
| Free | 3 | 100 | 1M |
| Basic | 10 | 500 | 10M |
| Pro | 50 | 2000 | 100M |
| Enterprise | Custom | Custom | Custom |
Monitoring Usage
Via Dashboard
Monitor API usage at my.wayscloud.services/usage:
- Real-time request counts
- Rate limit violations
- Quota usage
- Historical data
Via API
Check usage programmatically:
curl -X GET "https://provision.wayscloud.net/api/v1/dashboard/usage" \
-H "Authorization: Bearer {keycloak_token}"
Response:
{
  "period": "2025-11-04",
  "services": {
    "storage": {
      "requests": 45234,
      "rate_limit_hits": 3,
      "quota_used": "45.2GB",
      "quota_limit": "100GB"
    },
    "llm": {
      "requests": 12456,
      "tokens_used": 5234567,
      "rate_limit_hits": 0
    },
    "database": {
      "requests": 3421,
      "databases_active": 5,
      "rate_limit_hits": 0
    }
  }
}
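A usage response like this can feed a simple quota check, for example warning when a service crosses 80% of its storage quota. A sketch under the assumption that quota values arrive as "NN.NGB"-style strings, as in the example above (values in other units would need extra parsing):

```python
def quota_fraction(used, limit):
    """Parse 'NN.NGB'-style strings (assumed format) into a used/limit fraction."""
    def gb(value):
        return float(value.rstrip('GB'))
    return gb(used) / gb(limit)

def storage_alerts(usage, threshold=0.8):
    """Return warnings for services whose quota usage meets the threshold."""
    alerts = []
    for name, service in usage['services'].items():
        if 'quota_used' in service and 'quota_limit' in service:
            frac = quota_fraction(service['quota_used'], service['quota_limit'])
            if frac >= threshold:
                alerts.append(f'{name}: {frac:.0%} of quota used')
    return alerts
```

Running such a check on a schedule (or after each usage poll) gives early warning before uploads start failing on quota.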
Upgrading Limits
Need higher limits? Contact us:
- Email: sales@wayscloud.no
- Subject: Rate Limit Increase Request
- Include:
- Current usage patterns
- Required limits
- Use case description
- Expected growth
Enterprise plans offer custom rate limits and dedicated capacity.
Best Practices Summary
- ✅ Monitor headers - Check X-RateLimit-Remaining
- ✅ Implement backoff - Use exponential backoff for 429 errors
- ✅ Use batch operations - Reduce API calls with batch endpoints
- ✅ Cache responses - Cache data that doesn't change frequently
- ✅ Use webhooks - Avoid polling with webhook notifications
- ✅ Distribute load - Spread requests evenly over time
- ✅ Log violations - Track and investigate rate limit hits
- ✅ Plan capacity - Monitor usage trends and upgrade proactively
Next Steps
- Error Handling - Handle rate limit errors properly
- Authentication - API key management
- Monitoring - Track API usage and quota consumption
Support
Questions about rate limits?
- Email: support@wayscloud.no
- Documentation: https://docs.wayscloud.services
- Dashboard: https://my.wayscloud.services