Rate Limits
Overview of Skytells API rate limits — how they work, headers, and how to handle them gracefully.
The Skytells API enforces per-account rate limits to ensure fair usage and platform stability. When you exceed a limit, the API returns a 429 Too Many Requests response with an error_id of RATE_LIMIT_EXCEEDED.
Rate limits are applied at the account level across all API keys belonging to the same account, measured over a rolling time window.
The Retry-After Header
Every 429 response includes a Retry-After header containing the number of seconds to wait before retrying:
HTTP/1.1 429 Too Many Requests
Retry-After: 12

Always read the Retry-After header instead of guessing wait times. Retrying immediately will keep returning 429 and waste your quota recovery window.
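As a minimal sketch, the header can be parsed defensively, falling back to a short default when it is missing or malformed. (The HTTP spec also allows an HTTP-date form of Retry-After, which this helper does not handle.)

```typescript
// Parse a Retry-After value (in seconds), with a conservative fallback
// when the header is absent or not a positive number.
function parseRetryAfterSeconds(header: string | null, fallback = 1): number {
  const seconds = Number(header);
  return Number.isFinite(seconds) && seconds > 0 ? seconds : fallback;
}
```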
Handling Rate Limits
The recommended strategy is exponential backoff with jitter — wait progressively longer between retries, with a small random offset to spread out requests from multiple clients.
Retry with exponential backoff
async function fetchWithRetry(url: string, options: RequestInit, retries = 5) {
  for (let attempt = 0; attempt < retries; attempt++) {
    const res = await fetch(url, options);
    if (res.status !== 429) return res;
    // Honor the server's Retry-After when present; otherwise fall back
    // to an exponential delay (1 s, 2 s, 4 s, ...).
    const retryAfter = Number(res.headers.get('Retry-After') ?? 0);
    const baseMs = Math.max(retryAfter * 1000, 2 ** attempt * 1000);
    // Random jitter spreads out retries from multiple clients.
    const jitter = Math.random() * 500;
    await new Promise(r => setTimeout(r, baseMs + jitter));
  }
  throw new Error('Max retries exceeded');
}

General Best Practices
| Practice | Why |
|---|---|
| Read Retry-After header | Avoid guessing wait durations |
| Use exponential backoff | Spread retry load over time |
| Add random jitter | Prevent synchronized retries from multiple clients |
| Cache GET /v1/models | Model listings change infrequently — avoid polling |
| Poll predictions at 2–5 s intervals | Fast polling is the most common cause of hitting limits |
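The backoff-with-jitter practice above can be isolated as a small pure helper; the 500 ms jitter window and the exponential fallback base are illustrative choices, not documented API values.

```typescript
// Compute a retry delay in milliseconds for a given attempt number.
function backoffDelayMs(attempt: number, retryAfterSeconds: number): number {
  // Prefer the server-provided Retry-After; otherwise back off
  // exponentially (1 s, 2 s, 4 s, ...).
  const baseMs = Math.max(retryAfterSeconds * 1000, 2 ** attempt * 1000);
  // Random jitter prevents synchronized retries across clients.
  return baseMs + Math.random() * 500;
}
```

Keeping the delay calculation pure makes it easy to unit-test without any network calls.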
Sustained high-frequency polling is the most common cause of hitting rate limits. Switch to webhooks to eliminate polling entirely.
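If you do poll, the 2–5 s guidance can be sketched as a loop with an injected fetch function. Note that `getPrediction` and the `'processing'` status value below are hypothetical stand-ins for illustration, not confirmed Skytells API names.

```typescript
// Poll until a prediction leaves the (hypothetical) 'processing' state,
// waiting intervalMs between attempts instead of hammering the API.
async function pollPrediction(
  getPrediction: (id: string) => Promise<{ status: string }>,
  id: string,
  intervalMs = 3000,
  maxAttempts = 100,
): Promise<{ status: string }> {
  for (let i = 0; i < maxAttempts; i++) {
    const prediction = await getPrediction(id);
    if (prediction.status !== 'processing') return prediction;
    await new Promise(r => setTimeout(r, intervalMs));
  }
  throw new Error('Polling timed out');
}
```

Injecting `getPrediction` keeps the loop testable and makes it trivial to swap in your real API client.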
Limits by Surface
Models Rate Limits
Per-model concurrency and request-per-minute limits for inference workloads.
Edge Rate Limits
Limits for the Edge API (edge.skytells.ai) and streaming endpoints.