Rate Limits

API Rate Limits

Overview of Skytells API rate limits: how they work, which headers report them, and how to handle them gracefully.


The Skytells API enforces per-account rate limits to ensure fair usage and platform stability. The HTTP status codes, JSON error bodies, and stable error_id / error.code values returned when you are rate limited are documented in the error references: see 429 — Too Many Requests (prediction and general API responses) and Inference API errors — Rate limiting (OpenAI-compatible error objects, including gateway tier limits).

Limits depend on your account tier and configuration. Each tier defines separate ceilings, for example per model, for requests and tokens over a time window, and per API surface (Standard API vs. Edge). This page focuses on response headers and client behavior; tier spend bands and typical ceilings are summarized on Account tiers.

Rate limits are evaluated at the account level across all API keys for that account (within each scope described above).

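(Diagram: each request is checked against your limits. If it is under the limit, it is processed and a response is returned; otherwise the gateway returns 429, described in the error docs.)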

Response headers (limits and usage)

On successful responses (HTTP 2xx), the gateway may attach Skytells rate-limit headers so you can throttle proactively. Retry-After is never sent on 2xx responses; it is only used on 429 (see below).

| Header | Description |
| --- | --- |
| `x-skytells-ratelimit-limit-requests` | Maximum requests allowed in the current window. |
| `x-skytells-ratelimit-remaining-requests` | Estimated requests remaining in the window. |
| `x-skytells-ratelimit-window` | Window length in seconds (or `-` if not set). |
| `x-skytells-ratelimit-limit-tokens-in` | Maximum input tokens allowed in the window. |
| `x-skytells-ratelimit-remaining-tokens-in` | Estimated input tokens remaining. |
| `x-skytells-ratelimit-limit-tokens-out` | Maximum output tokens allowed in the window. |
| `x-skytells-ratelimit-remaining-tokens-out` | Estimated output tokens remaining. |

Conventions: a value of `-` means the dimension is not configured or does not apply. Values are informational; under load, remaining counts are best-effort estimates. Not every account has every dimension, and the headers reflect your effective limits only when the gateway applies them.
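To throttle proactively, a client can read these headers after each successful response. The sketch below parses them with the standard Fetch `Headers` object (browsers and Node 18+), mapping a missing header or `-` to `null`. The names `parseRateLimitHeaders` and `shouldThrottle` are hypothetical, and the 10% "slow down" threshold is an illustrative assumption, not a Skytells value.

```typescript
// Header names from the table above. A missing header or "-" means the
// dimension is not configured for this account.
const HEADERS = {
  limitRequests: 'x-skytells-ratelimit-limit-requests',
  remainingRequests: 'x-skytells-ratelimit-remaining-requests',
  windowSeconds: 'x-skytells-ratelimit-window',
  limitTokensIn: 'x-skytells-ratelimit-limit-tokens-in',
  remainingTokensIn: 'x-skytells-ratelimit-remaining-tokens-in',
  limitTokensOut: 'x-skytells-ratelimit-limit-tokens-out',
  remainingTokensOut: 'x-skytells-ratelimit-remaining-tokens-out',
} as const;

type RateLimitKey = keyof typeof HEADERS;
type RateLimitInfo = Record<RateLimitKey, number | null>;

// Parse one header value: null when absent, "-", empty, or not numeric.
function parseValue(raw: string | null): number | null {
  if (raw === null) return null;
  const trimmed = raw.trim();
  if (trimmed === '' || trimmed === '-') return null;
  const n = Number(trimmed);
  return Number.isFinite(n) ? n : null;
}

// Hypothetical helper: collect every rate-limit dimension from a response.
function parseRateLimitHeaders(headers: Headers): RateLimitInfo {
  const info = {} as RateLimitInfo;
  for (const key of Object.keys(HEADERS) as RateLimitKey[]) {
    info[key] = parseValue(headers.get(HEADERS[key]));
  }
  return info;
}

// Hypothetical policy: slow down once fewer than `fraction` of the window's
// requests remain. The 10% default is an assumption, not a Skytells value.
function shouldThrottle(info: RateLimitInfo, fraction = 0.1): boolean {
  const { limitRequests, remainingRequests } = info;
  if (limitRequests === null || remainingRequests === null || limitRequests <= 0) {
    return false; // dimension not reported, so there is nothing to act on
  }
  return remainingRequests < limitRequests * fraction;
}
```

For example, `shouldThrottle(parseRateLimitHeaders(res.headers))` after each 2xx response tells you whether to pause before the next request. Because remaining counts are best-effort, treat this as a hint and still handle 429 responses.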


Retry-After (HTTP 429 only)

On 429 rate-limit responses (see API errors and Inference — Rate limiting for payload fields), the gateway may set Retry-After to a whole number of seconds to wait. If the computed wait is zero or negative, Retry-After may be omitted; in that case, use error.details.reset (a Unix timestamp in seconds) and, when present, the same rate-limit headers that appear on successful responses. Prefer Retry-After when it is set; otherwise derive the wait from details.reset, or fall back to exponential backoff.

HTTP/1.1 429 Too Many Requests
Retry-After: 12
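The precedence described above can be written as a small pure function. This is a minimal sketch: `retryDelayMs` and the `RateLimitErrorBody` shape are hypothetical, only the `error.details.reset` field is taken from this page, and the backoff base (1 s) and cap (30 s) are assumptions. Add jitter on top of the returned value before sleeping.

```typescript
// Minimal shape of the 429 body fields this sketch reads. Only
// error.details.reset is documented on this page; the rest is ignored.
interface RateLimitErrorBody {
  error?: { details?: { reset?: number } };
}

// Hypothetical helper: milliseconds to wait before retrying a 429.
// Precedence: Retry-After, then error.details.reset, then backoff.
function retryDelayMs(
  retryAfterHeader: string | null,
  body: RateLimitErrorBody | null,
  attempt: number,
  nowMs: number = Date.now(),
): number {
  // 1. Retry-After: whole seconds to wait (empty values are ignored).
  if (retryAfterHeader !== null && retryAfterHeader.trim() !== '') {
    const seconds = Number(retryAfterHeader);
    if (Number.isFinite(seconds) && seconds >= 0) return seconds * 1000;
  }
  // 2. error.details.reset: Unix time (seconds) when the window resets.
  const reset = body?.error?.details?.reset;
  if (typeof reset === 'number' && Number.isFinite(reset)) {
    return Math.max(0, reset * 1000 - nowMs);
  }
  // 3. Fallback: exponential backoff of 1 s, 2 s, 4 s, ... capped at 30 s.
  //    Base and cap are illustrative assumptions, not Skytells values.
  return Math.min(1000 * 2 ** attempt, 30_000);
}
```

With the response above, `retryDelayMs('12', null, 0)` yields 12000 ms. A reset time already in the past yields 0, which matches the case where the gateway omits Retry-After because the computed wait is not positive.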

Handling Rate Limits

The recommended strategy is exponential backoff with jitter — wait progressively longer between retries, with a small random offset to spread out requests from multiple clients.

Retry with exponential backoff

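```typescript
async function fetchWithRetry(url: string, options: RequestInit = {}, retries = 5): Promise<Response> {
  for (let attempt = 0; attempt < retries; attempt++) {
    const res = await fetch(url, options);
    if (res.status !== 429) return res;

    // Prefer Retry-After (whole seconds); otherwise back off exponentially:
    // 1 s, 2 s, 4 s, ... capped at 30 s (base and cap are illustrative).
    const header = res.headers.get('Retry-After');
    const seconds = header ? Number(header) : NaN;
    const delayMs = Number.isFinite(seconds)
      ? seconds * 1000
      : Math.min(1000 * 2 ** attempt, 30_000);

    // Random jitter (up to 500 ms) keeps clients from retrying in lockstep.
    const jitter = Math.random() * 500;
    await res.body?.cancel(); // discard the unused 429 body before retrying
    await new Promise(r => setTimeout(r, delayMs + jitter));
  }
  throw new Error('Max retries exceeded');
}
```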

General Best Practices

| Practice | Why |
| --- | --- |
| Read the Retry-After header | Avoid guessing wait durations |
| Use exponential backoff | Spread retry load over time |
| Add random jitter | Prevent synchronized retries from multiple clients |
| Cache GET /v1/models | Model listings change infrequently, so avoid polling them |
| Poll predictions at 2–5 s intervals | Fast polling is the most common cause of hitting limits |
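Since fast polling is the most common way to hit limits, it helps to centralize the polling loop. The sketch below clamps the interval to the recommended 2–5 s band. `pollUntilDone` is a hypothetical helper, and `check` is whatever you supply (for example, a GET on a prediction's status URL); no Skytells endpoint, status name, or timeout default is assumed from this page.

```typescript
// Hypothetical helper: call `check` every `intervalMs` until it reports done.
// `check` is caller-supplied; this sketch assumes no Skytells endpoint.
async function pollUntilDone<T>(
  check: () => Promise<{ done: boolean; value: T }>,
  intervalMs = 3000,
  timeoutMs = 10 * 60_000, // illustrative overall limit, not a Skytells value
  sleep: (ms: number) => Promise<void> = ms => new Promise<void>(r => setTimeout(r, ms)),
): Promise<T> {
  // Keep the interval inside the recommended 2-5 s band.
  const interval = Math.min(Math.max(intervalMs, 2000), 5000);
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const { done, value } = await check();
    if (done) return value;
    if (Date.now() + interval > deadline) {
      throw new Error('Polling timed out');
    }
    await sleep(interval);
  }
}
```

The injectable `sleep` parameter exists only to make the loop testable; in production the default `setTimeout`-based delay is used. Combine it with `fetchWithRetry` inside `check` so that a 429 during polling is also retried with backoff.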

Limits by Surface
