Advanced · 40 min · Module 3 of 4

Rate Limits & Error Handling

Handle 429s gracefully, build a prediction queue, write user-friendly error messages, and set budget guardrails — for a resilient production app.

What you'll be able to do after this module

Build an error handling layer that never crashes under load — automatic retry with backoff, per-user queuing, budget guardrails, and user-friendly error messages that don't leak implementation details.


Rate limits by plan

| Plan | Standard req/min | Burst | Edge |
|---|---|---|---|
| Free | 60 | Limited | — |
| Pro | 600 | Allowed | — |
| Business | Custom | Custom | ✓ |
| Enterprise | Custom | Custom | ✓ |

When you exceed your limit, the API returns 429 Too Many Requests with a Retry-After header telling you exactly when to retry.
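
The Retry-After value can arrive as delta-seconds ("2") or as an HTTP date. A small parser normalizes both forms to a wait in milliseconds (a sketch — `parseRetryAfter` is a hypothetical helper, not part of the SDK):

```typescript
// Parses a Retry-After header into a delay in milliseconds.
// Accepts delta-seconds ("2") or an HTTP-date; falls back when absent/invalid.
function parseRetryAfter(header: string | null, fallbackMs = 1000): number {
  if (!header) return fallbackMs;

  const seconds = Number(header);
  if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);

  const date = Date.parse(header); // HTTP-date form
  if (!Number.isNaN(date)) return Math.max(0, date - Date.now());

  return fallbackMs;
}
```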


Error taxonomy — retryable vs. not

The most important distinction in error handling:

| HTTP Status | Category | Action |
|---|---|---|
| 400 | Bad request | Fix your code — do not retry |
| 401 | Invalid API key | Fix your key — do not retry |
| 403 | Plan/permission issue | Upgrade or fix permissions — do not retry |
| 422 | Invalid input | Fix the request body — do not retry |
| 429 | Rate limited | Retry with backoff after Retry-After |
| 500 | Server error | Retry with backoff |
| 502, 503 | Service unavailable | Retry with backoff |
| 504 | Gateway timeout | Retry with backoff |
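
The table reduces to a one-line predicate (a minimal sketch; if your client exposes numeric status codes, this is more robust than matching on message strings):

```typescript
// Encodes the taxonomy above: 429 and any 5xx are retryable;
// every other 4xx is a caller error that retrying will never fix.
function isRetryable(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}
```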

Exponential backoff with jitter

Never retry immediately on 429. Add a delay that grows with each attempt, plus random jitter to prevent synchronized retry storms:

async function withRetry<T>(
  fn: () => Promise<T>,
  { maxAttempts = 5, baseDelayMs = 500 }: { maxAttempts?: number; baseDelayMs?: number } = {},
): Promise<T> {
  let lastError: Error | undefined;

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err: unknown) {
      lastError = err instanceof Error ? err : new Error(String(err));

      // Matching on the message is a fallback — prefer a structured status
      // field if your SDK exposes one. Word boundaries keep an unrelated
      // number like "1500" from being mistaken for a 5xx status.
      const isRetryable =
        lastError.message.includes('429') ||
        /\b5\d{2}\b/.test(lastError.message);

      if (!isRetryable) throw lastError; // non-retryable — give up immediately
      if (attempt === maxAttempts - 1) break;

      // Exponential backoff + ±20% jitter
      const base = baseDelayMs * Math.pow(2, attempt);
      const jitter = base * 0.2 * (Math.random() * 2 - 1);
      const delay = Math.round(base + jitter);
      console.warn(`Attempt ${attempt + 1} failed. Retrying in ${delay}ms...`);
      await new Promise(r => setTimeout(r, delay));
    }
  }

  throw lastError!;
}

// Usage — completely transparent to the calling code
const prediction = await withRetry(() =>
  client.predictions.create({
    model: 'truefusion-pro',
    input: { prompt, width: 1024, height: 1024 },
  })
);

Backoff progression with the defaults: 500ms → 1s → 2s → 4s between the five attempts (±20% jitter)
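
The delay math is easiest to unit-test when pulled into a pure function with an injectable random source (a sketch — `computeDelay` is a hypothetical helper, not part of the example above):

```typescript
// Pure version of the backoff-plus-jitter calculation: base * 2^attempt,
// nudged by up to ±20%. Passing rand makes the jitter deterministic in tests.
function computeDelay(
  attempt: number,
  baseDelayMs = 500,
  rand: () => number = Math.random,
): number {
  const base = baseDelayMs * Math.pow(2, attempt);
  const jitter = base * 0.2 * (rand() * 2 - 1); // ±20%
  return Math.round(base + jitter);
}
```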


Prediction queue

For high-throughput apps (batch processing, multiple concurrent users), queue predictions to stay safely under your rate limit:

// lib/prediction-queue.ts
import PQueue from 'p-queue';

// Pro plan: 600 req/min = 10/sec. Stay at 8 to be safe.
const queue = new PQueue({
  concurrency: 10,
  interval: 1000,
  intervalCap: 8,
});

export function enqueuePrediction<T>(fn: () => Promise<T>): Promise<T> {
  // p-queue types add() as possibly resolving void (e.g. aborted tasks),
  // hence the cast.
  return queue.add(fn) as Promise<T>;
}

export function getQueueStats() {
  return { pending: queue.size, running: queue.pending };
}
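
If you can't add a dependency, the concurrency half of this can be sketched by hand (`SimpleQueue` is illustrative only — it caps concurrent tasks but has no per-interval cap, so unlike p-queue it is not a full rate limiter):

```typescript
// Dependency-free FIFO queue that limits how many tasks run at once.
class SimpleQueue {
  private running = 0;
  private waiting: Array<() => void> = [];

  constructor(private concurrency: number) {}

  async add<T>(fn: () => Promise<T>): Promise<T> {
    // Re-check after every wake-up in case another caller took the slot.
    while (this.running >= this.concurrency) {
      await new Promise<void>(resolve => this.waiting.push(resolve));
    }
    this.running++;
    try {
      return await fn();
    } finally {
      this.running--;
      this.waiting.shift()?.(); // wake the next waiter, if any
    }
  }
}
```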

Budget guardrails

Prevent unexpected bills with a daily spend limit:

// lib/budget.ts
import { Redis } from '@upstash/redis';

const redis = Redis.fromEnv();
const DAILY_BUDGET_USD = 50;

const MODEL_COSTS: Record<string, number> = {
  'truefusion-edge': 0.01,
  'truefusion': 0.02,
  'truefusion-pro': 0.04,
  'truefusion-2.0': 0.06,
  'truefusion-ultra': 0.08,
  'beatfusion-2.0': 0.75,
};

function today() { return new Date().toISOString().split('T')[0]; }

export async function checkBudget(model: string): Promise<void> {
  const cost = MODEL_COSTS[model] ?? 0.04;
  const key = `budget:global:${today()}`;
  const current = parseFloat(await redis.get<string>(key) ?? '0');

  if (current + cost > DAILY_BUDGET_USD) {
    throw new BudgetExceededError(
      `Daily budget of $${DAILY_BUDGET_USD} would be exceeded. Current: $${current.toFixed(2)}`
    );
  }
}

export async function recordSpend(model: string): Promise<void> {
  const cost = MODEL_COSTS[model] ?? 0.04;
  const key = `budget:global:${today()}`;
  await redis.incrbyfloat(key, cost);
  await redis.expire(key, 86_400 * 2); // expire after 2 days
}

// Exported so API routes can catch it and return a specific error response.
export class BudgetExceededError extends Error {
  readonly name = 'BudgetExceededError';
}
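
The same flow can be exercised without Redis using an in-memory store (`InMemoryBudget` is a hypothetical test double, not production code — note it shares the check-then-record race of the Redis version under concurrent requests):

```typescript
// In-memory stand-in for the Redis-backed budget flow, useful in unit tests.
class InMemoryBudget {
  private spend = new Map<string, number>();

  constructor(
    private dailyLimitUsd: number,
    private costs: Record<string, number>,
  ) {}

  // Mirrors checkBudget: throws if the next prediction would bust the cap.
  check(model: string, day: string): void {
    const cost = this.costs[model] ?? 0.04;
    const current = this.spend.get(day) ?? 0;
    if (current + cost > this.dailyLimitUsd) {
      throw new Error(`Daily budget of $${this.dailyLimitUsd} would be exceeded`);
    }
  }

  // Mirrors recordSpend: adds the model's cost to today's total.
  record(model: string, day: string): void {
    const cost = this.costs[model] ?? 0.04;
    this.spend.set(day, (this.spend.get(day) ?? 0) + cost);
  }

  total(day: string): number {
    return this.spend.get(day) ?? 0;
  }
}
```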

User-facing error messages

Never show raw API error messages to users. Map them to clear, actionable copy:

interface UserFacingError {
  message: string;
  action?: string;
  retryable: boolean;
}

function toUserError(httpStatus: number, detail?: string): UserFacingError {
  switch (httpStatus) {
    case 401:
      return {
        message: 'Authentication failed.',
        action: 'Please refresh the page and try again.',
        retryable: false,
      };
    case 422:
      return {
        message: 'Your request contained an invalid prompt.',
        action: detail?.includes('nsfw') ? 'Try a different description.' : undefined,
        retryable: false,
      };
    case 429:
      return {
        message: 'Our AI service is busy right now.',
        action: 'Please try again in a few seconds.',
        retryable: true,
      };
    case 500:
    case 502:
    case 503:
    case 504:
      return {
        message: "We're experiencing a temporary issue.",
        action: "We're on it. Please try again shortly.",
        retryable: true,
      };
    default:
      return {
        message: 'Something went wrong.',
        action: 'Please try again.',
        retryable: true,
      };
  }
}

Monitoring and alerting

Set up alerts so you know about problems before your users do:

| Metric | Alert threshold | Why |
|---|---|---|
| Error rate | > 1% of requests over 5 min | Something is wrong |
| Rate limit rate | > 5% of requests hitting 429 | Approaching quota ceiling |
| P99 prediction latency | > 30s for images | Model backlog or degradation |
| Daily spend | > 80% of budget | Approaching limit |
| Failed webhook deliveries | > 0 per hour | Processing pipeline broken |

// Log structured events — send to Datadog, Logtail, Sentry, etc.
function logPredictionEvent(event: {
  predictionId: string;
  model: string;
  status: 'created' | 'succeeded' | 'failed' | 'rate_limited';
  latencyMs?: number;
  httpStatus?: number;
}) {
  console.log(JSON.stringify({
    service: 'skytells',
    timestamp: new Date().toISOString(),
    ...event,
  }));
}
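
To populate latencyMs, a tiny timing wrapper is enough (`withTiming` is a hypothetical helper, not part of the SDK):

```typescript
// Times an async call and returns the result alongside elapsed milliseconds.
async function withTiming<T>(
  fn: () => Promise<T>,
): Promise<{ result: T; latencyMs: number }> {
  const start = Date.now();
  const result = await fn();
  return { result, latencyMs: Date.now() - start };
}
```

You would then pass the measured latencyMs into logPredictionEvent alongside the prediction's id and status.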

Summary

The four resilience patterns:

  1. Exponential backoff — only for 429 and 5xx. Never for other 4xx.
  2. Prediction queue — throttle concurrent requests to stay under your rate limit
  3. Budget guardrails — daily spend cap with Redis, checked before each prediction
  4. User-friendly errors — map HTTP status codes to clear, non-technical messages

Next: the Edge API — ultra-low latency for Business and Enterprise accounts.
