API Rate Limits
Overview of Skytells API rate limits — how they work, headers, and how to handle them gracefully.
Rate Limits
The Skytells API enforces per-account rate limits to ensure fair usage and platform stability. When you are rate limited, the HTTP status codes, JSON error bodies, and stable error_id / error.code values are documented in the error references: see 429 — Too Many Requests (prediction and general API responses) and Inference API errors — Rate limiting (OpenAI-compatible error objects, including gateway tier limits).
Limits depend on your account tier and configuration. Each tier defines separate ceilings per scope: by model, by requests and tokens over a time window, and by API surface (Standard API vs Edge). This page focuses on headers and client behavior; tier spend bands and typical ceilings are summarized on Account tiers.
Rate limits are evaluated at the account level across all API keys for that account (within each scope described above).
Response headers (limits and usage)
On successful responses (HTTP 2xx), the gateway may attach Skytells rate-limit headers so you can throttle proactively. Retry-After is not sent on 2xx — it is only used on 429 (see below).
| Header | Description |
|---|---|
| x-skytells-ratelimit-limit-requests | Maximum requests allowed in the current window. |
| x-skytells-ratelimit-remaining-requests | Estimated requests remaining in the window. |
| x-skytells-ratelimit-window | Window length in seconds (or - if not set). |
| x-skytells-ratelimit-limit-tokens-in | Maximum input tokens allowed in the window. |
| x-skytells-ratelimit-remaining-tokens-in | Estimated input tokens remaining. |
| x-skytells-ratelimit-limit-tokens-out | Maximum output tokens allowed in the window. |
| x-skytells-ratelimit-remaining-tokens-out | Estimated output tokens remaining. |
Conventions: A value of - means a dimension is not configured or not applicable. Values are informational; under load, remaining counts are best-effort. Not every account has every dimension — your effective limits are reflected in these headers when the gateway applies them.
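To make the conventions above concrete, here is a minimal sketch of parsing these headers on a 2xx response and deciding whether to throttle proactively. The header names come from the table above; the `RateLimitInfo` shape, helper names, and the 10% throttle threshold are illustrative choices, not part of the API.

```typescript
interface RateLimitInfo {
  limitRequests: number | null;
  remainingRequests: number | null;
  windowSeconds: number | null;
}

function parseRateLimitHeaders(headers: Headers): RateLimitInfo {
  // A value of "-" (or a missing header) means the dimension is not configured.
  const num = (name: string): number | null => {
    const v = headers.get(name);
    if (v === null || v === '-') return null;
    const n = Number(v);
    return Number.isFinite(n) ? n : null;
  };
  return {
    limitRequests: num('x-skytells-ratelimit-limit-requests'),
    remainingRequests: num('x-skytells-ratelimit-remaining-requests'),
    windowSeconds: num('x-skytells-ratelimit-window'),
  };
}

// Example policy: slow down once fewer than 10% of requests remain.
function shouldThrottle(info: RateLimitInfo, threshold = 0.1): boolean {
  if (info.limitRequests === null || info.remainingRequests === null) return false;
  return info.remainingRequests / info.limitRequests < threshold;
}
```

Because remaining counts are best-effort under load, treat them as a hint for pacing rather than a hard guarantee.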
Retry-After (HTTP 429 only)
On 429 rate-limit responses (see API errors and Inference — Rate limiting for payload fields), the gateway may set Retry-After to a whole number of seconds to wait. If the computed wait is ≤ 0, Retry-After may be omitted; in that case use error.details.reset (Unix seconds) and the same header names as on success, when present. Prefer Retry-After when it is set; otherwise derive the wait from details.reset or fall back to exponential backoff.
```
HTTP/1.1 429 Too Many Requests
Retry-After: 12
```

Read Retry-After when present instead of guessing; retrying immediately will often keep returning 429. On 429, x-skytells-ratelimit-* headers may still be present and reflect gateway state merged into the error response.
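The precedence described above (Retry-After first, then details.reset, then backoff) can be sketched as a small helper. The `resetUnixSeconds` parameter stands in for the value you parse out of the error body; the function name and signature are assumptions for illustration.

```typescript
function computeWaitMs(
  retryAfterHeader: string | null,
  resetUnixSeconds: number | null,
  fallbackMs: number,
  nowMs: number = Date.now(),
): number {
  // 1. Prefer a positive Retry-After value (whole seconds).
  const retryAfter = Number(retryAfterHeader);
  if (retryAfterHeader !== null && Number.isFinite(retryAfter) && retryAfter > 0) {
    return retryAfter * 1000;
  }
  // 2. Otherwise derive the wait from the reset timestamp, if it is in the future.
  if (resetUnixSeconds !== null) {
    const waitMs = resetUnixSeconds * 1000 - nowMs;
    if (waitMs > 0) return waitMs;
  }
  // 3. Fall back to the caller's backoff delay.
  return fallbackMs;
}
```

A caller would typically pass its current exponential-backoff delay as `fallbackMs` so the helper degrades gracefully when neither signal is present.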
Handling Rate Limits
The recommended strategy is exponential backoff with jitter — wait progressively longer between retries, with a small random offset to spread out requests from multiple clients.
Retry with exponential backoff
```typescript
async function fetchWithRetry(url: string, options: RequestInit, retries = 5) {
  for (let attempt = 0; attempt < retries; attempt++) {
    const res = await fetch(url, options);
    if (res.status !== 429) return res;
    // Prefer Retry-After; otherwise fall back to exponential backoff
    // (1 s, 2 s, 4 s, …, capped at 30 s).
    const retryAfter = Number(res.headers.get('Retry-After'));
    const baseMs = Number.isFinite(retryAfter) && retryAfter > 0
      ? retryAfter * 1000
      : Math.min(2 ** attempt * 1000, 30_000);
    // Add random jitter to avoid synchronized retries across clients.
    const jitter = Math.random() * 500;
    await new Promise((r) => setTimeout(r, baseMs + jitter));
  }
  throw new Error('Max retries exceeded');
}
```

General Best Practices
| Practice | Why |
|---|---|
| Read Retry-After header | Avoid guessing wait durations |
| Use exponential backoff | Spread retry load over time |
| Add random jitter | Prevent synchronized retries from multiple clients |
| Cache GET /v1/models | Model listings change infrequently; avoid polling |
| Poll predictions at 2–5 s intervals | Fast polling is the most common cause of hitting limits |
Sustained high-frequency polling is the most common cause of hitting rate limits. Switch to webhooks to eliminate polling entirely.
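If you cannot use webhooks yet, a polling loop that honors the 2–5 s guidance might look like the sketch below. The caller supplies `fetchStatus` (for example, a GET to the prediction endpoint) and a predicate for terminal states; both, along with the helper name and the attempt cap, are assumptions for illustration.

```typescript
async function pollUntilDone<T>(
  fetchStatus: () => Promise<T>,
  isDone: (status: T) => boolean,
  intervalMs = 3000, // within the recommended 2–5 s band
  maxAttempts = 100,
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await fetchStatus();
    if (isDone(status)) return status;
    // Wait the full interval between polls; never poll in a tight loop.
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error('Polling timed out');
}
```

Bounding the number of attempts keeps a stuck job from polling forever; the webhook path above remains the better option for long-running work.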
Limits by Surface
Models Rate Limits
Per-model concurrency and request-per-minute limits for inference workloads.
Edge Rate Limits
Edge gateway and streaming — throughput, concurrent streams, and operational guidance.