API Rate Limits
Overview of Skytells API rate limits — how they work, headers, and how to handle them gracefully.
Rate Limits
The Skytells API enforces per-account rate limits to ensure fair usage and platform stability. When you are rate limited, the HTTP status codes, JSON error bodies, and stable error_id / error.code values are documented in the error references: see 429 — Too Many Requests (prediction and general API responses) and Inference API errors — Rate limiting (OpenAI-compatible error objects, including gateway tier limits).
Limits depend on your account tier and configuration. Each tier defines separate ceilings per scope: by model, by requests and tokens over a time window, and by API surface (Standard API vs Edge). This page focuses on headers and client behavior; tier spend bands and typical ceilings are summarized on Account tiers.
Rate limits are evaluated at the account level across all API keys for that account (within each scope described above).
Response headers (limits and usage)
On successful responses (HTTP 2xx), the gateway may attach Skytells rate-limit headers so you can throttle proactively. Retry-After is not sent on 2xx — it is only used on 429 (see below).
| Header | Description |
|---|---|
| x-skytells-ratelimit-limit-requests | Maximum requests allowed in the current window. |
| x-skytells-ratelimit-remaining-requests | Estimated requests remaining in the window. |
| x-skytells-ratelimit-window | Window length in seconds (or - if not set). |
| x-skytells-ratelimit-limit-tokens-in | Maximum input tokens allowed in the window. |
| x-skytells-ratelimit-remaining-tokens-in | Estimated input tokens remaining. |
| x-skytells-ratelimit-limit-tokens-out | Maximum output tokens allowed in the window. |
| x-skytells-ratelimit-remaining-tokens-out | Estimated output tokens remaining. |
Conventions: A value of - means a dimension is not configured or not applicable. Values are informational; under load, remaining counts are best-effort. Not every account has every dimension — your effective limits are reflected in these headers when the gateway applies them.
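To make the conventions above concrete, here is a minimal sketch of parsing these headers on a 2xx response and deciding whether to throttle proactively. The header names come from the table above; the `RateLimitInfo` shape, helper names, and the 10% throttle threshold are illustrative choices, not part of the API.

```typescript
interface RateLimitInfo {
  limitRequests: number | null;
  remainingRequests: number | null;
  windowSeconds: number | null;
}

function parseRateLimitHeaders(headers: Headers): RateLimitInfo {
  // A value of "-" (or a missing header) means the dimension is not configured.
  const num = (name: string): number | null => {
    const v = headers.get(name);
    if (v === null || v === '-') return null;
    const n = Number(v);
    return Number.isFinite(n) ? n : null;
  };
  return {
    limitRequests: num('x-skytells-ratelimit-limit-requests'),
    remainingRequests: num('x-skytells-ratelimit-remaining-requests'),
    windowSeconds: num('x-skytells-ratelimit-window'),
  };
}

// Example policy: slow down once fewer than 10% of requests remain.
function shouldThrottle(info: RateLimitInfo, threshold = 0.1): boolean {
  if (info.limitRequests === null || info.remainingRequests === null) return false;
  return info.remainingRequests / info.limitRequests < threshold;
}
```

Because remaining counts are best-effort under load, treat them as a hint for pacing rather than a hard guarantee.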
Retry-After (HTTP 429 only)
On 429 rate-limit responses (see API errors and Inference — Rate limiting for payload fields), the gateway may set Retry-After to a whole number of seconds to wait. If the computed wait is ≤ 0, Retry-After may be omitted; in that case use error.details.reset (Unix seconds) and the same header names as on success, when present. Prefer Retry-After when it is set; otherwise derive the wait from details.reset or fall back to exponential backoff.
```
HTTP/1.1 429 Too Many Requests
Retry-After: 12
```

Read Retry-After when present instead of guessing; retrying immediately will often keep returning 429. On 429, x-skytells-ratelimit-* headers may still be present and reflect gateway state merged into the error response.
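The precedence described above (Retry-After first, then details.reset, then backoff) can be sketched as a small helper. The `resetUnixSeconds` parameter stands in for the value you parse out of the error body; the function name and signature are assumptions for illustration.

```typescript
function computeWaitMs(
  retryAfterHeader: string | null,
  resetUnixSeconds: number | null,
  fallbackMs: number,
  nowMs: number = Date.now(),
): number {
  // 1. Prefer a positive Retry-After value (whole seconds).
  const retryAfter = Number(retryAfterHeader);
  if (retryAfterHeader !== null && Number.isFinite(retryAfter) && retryAfter > 0) {
    return retryAfter * 1000;
  }
  // 2. Otherwise derive the wait from the reset timestamp, if it is in the future.
  if (resetUnixSeconds !== null) {
    const waitMs = resetUnixSeconds * 1000 - nowMs;
    if (waitMs > 0) return waitMs;
  }
  // 3. Fall back to the caller's backoff delay.
  return fallbackMs;
}
```

A caller would typically pass its current exponential-backoff delay as `fallbackMs` so the helper degrades gracefully when neither signal is present.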
Handling Rate Limits
The recommended strategy is exponential backoff with jitter — wait progressively longer between retries, with a small random offset to spread out requests from multiple clients.
Retry with exponential backoff
```typescript
async function fetchWithRetry(url: string, options: RequestInit, retries = 5) {
  for (let attempt = 0; attempt < retries; attempt++) {
    const res = await fetch(url, options);
    if (res.status !== 429) return res;
    // Prefer Retry-After; otherwise fall back to exponential backoff
    // (1 s, 2 s, 4 s, …, capped at 30 s).
    const retryAfter = Number(res.headers.get('Retry-After'));
    const baseMs = Number.isFinite(retryAfter) && retryAfter > 0
      ? retryAfter * 1000
      : Math.min(2 ** attempt * 1000, 30_000);
    // Add random jitter to avoid synchronized retries across clients.
    const jitter = Math.random() * 500;
    await new Promise((r) => setTimeout(r, baseMs + jitter));
  }
  throw new Error('Max retries exceeded');
}
```

General Best Practices
| Practice | Why |
|---|---|
| Read Retry-After header | Avoid guessing wait durations |
| Use exponential backoff | Spread retry load over time |
| Add random jitter | Prevent synchronized retries from multiple clients |
| Cache GET /v1/models | Model listings change infrequently; avoid polling |
| Poll predictions at 2–5 s intervals | Fast polling is the most common cause of hitting limits |
Sustained high-frequency polling is the most common cause of hitting rate limits. Switch to webhooks to eliminate polling entirely.
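If you cannot use webhooks yet, a polling loop that honors the 2–5 s guidance might look like the sketch below. The caller supplies `fetchStatus` (for example, a GET to the prediction endpoint) and a predicate for terminal states; both, along with the helper name and the attempt cap, are assumptions for illustration.

```typescript
async function pollUntilDone<T>(
  fetchStatus: () => Promise<T>,
  isDone: (status: T) => boolean,
  intervalMs = 3000, // within the recommended 2–5 s band
  maxAttempts = 100,
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await fetchStatus();
    if (isDone(status)) return status;
    // Wait the full interval between polls; never poll in a tight loop.
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error('Polling timed out');
}
```

Bounding the number of attempts keeps a stuck job from polling forever; the webhook path above remains the better option for long-running work.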
Limits by Surface
Models Rate Limits
Per-model concurrency and request-per-minute limits for inference workloads.
Edge Rate Limits
Edge gateway and streaming — throughput, concurrent streams, and operational guidance.