Rate Limits & Error Handling
Handle 429s gracefully, build a prediction queue, write user-friendly error messages, and set budget guardrails — for a resilient production app.
What you'll be able to do after this module
Build an error handling layer that never crashes under load — automatic retry with backoff, per-user queuing, budget guardrails, and user-friendly error messages that don't leak implementation details.
Rate limits by plan
| Plan | Standard req/min | Burst | Edge |
|---|---|---|---|
| Free | 60 | Limited | — |
| Pro | 600 | Allowed | — |
| Business | Custom | Custom | ✓ |
| Enterprise | Custom | Custom | ✓ |
When you exceed your limit, the API returns 429 Too Many Requests with a Retry-After header telling you exactly when to retry.
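The `Retry-After` value can be either a number of seconds or an HTTP date, so it pays to parse both forms. A minimal sketch (the helper name and the fallback delay are assumptions, not part of the API):

```typescript
// Convert a Retry-After header into a wait in milliseconds.
// Handles both forms: delta-seconds ("2") and an HTTP date.
// fallbackMs is an arbitrary default, not something the API specifies.
function retryAfterMs(header: string | null, fallbackMs = 1000): number {
  if (!header) return fallbackMs;
  const seconds = Number(header);
  if (!Number.isNaN(seconds)) return Math.max(0, seconds * 1000);
  const date = Date.parse(header);
  return Number.isNaN(date) ? fallbackMs : Math.max(0, date - Date.now());
}
```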
Error taxonomy — retryable vs. not
The most important distinction in error handling:
| HTTP Status | Category | Action |
|---|---|---|
| 400 | Bad request | Fix your code — do not retry |
| 401 | Invalid API key | Fix your key — do not retry |
| 403 | Plan/permission issue | Upgrade or fix permissions — do not retry |
| 422 | Invalid input | Fix the request body — do not retry |
| 429 | Rate limited | Retry with backoff after `Retry-After` |
| 500 | Server error | Retry with backoff |
| 502, 503 | Service unavailable | Retry with backoff |
| 504 | Gateway timeout | Retry with backoff |
Retrying 4xx errors (except 429) is almost always wrong. 422 Invalid Input will never succeed no matter how many times you retry — fix the request instead.
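That distinction is worth encoding once rather than re-deriving at every call site. A small predicate sketch following the table above:

```typescript
// One place to decide: retry 429 and any 5xx, never any other 4xx.
function isRetryableStatus(status: number): boolean {
  if (status === 429) return true;        // rate limited — back off, then retry
  return status >= 500 && status <= 599;  // server-side errors are transient
}
```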
Exponential backoff with jitter
Never retry immediately on 429. Add a delay that grows with each attempt, plus random jitter to prevent synchronized retry storms:
```ts
async function withRetry<T>(
  fn: () => Promise<T>,
  options = { maxAttempts: 5, baseDelayMs: 500 },
): Promise<T> {
  let lastError: Error | undefined;
  for (let attempt = 0; attempt < options.maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err: unknown) {
      lastError = err instanceof Error ? err : new Error(String(err));
      const isRetryable =
        lastError.message.includes('429') ||
        /5\d{2}/.test(lastError.message);
      if (!isRetryable) throw lastError; // non-retryable — give up immediately
      if (attempt === options.maxAttempts - 1) break;
      // Exponential backoff + ±20% jitter
      const base = options.baseDelayMs * Math.pow(2, attempt);
      const jitter = base * 0.2 * (Math.random() * 2 - 1);
      const delay = Math.round(base + jitter);
      console.warn(`Attempt ${attempt + 1} failed. Retrying in ${delay}ms...`);
      await new Promise(r => setTimeout(r, delay));
    }
  }
  throw lastError!;
}

// Usage — completely transparent to the calling code
const prediction = await withRetry(() =>
  client.predictions.create({
    model: 'truefusion-pro',
    input: { prompt, width: 1024, height: 1024 },
  })
);
```

Backoff progression: 500ms → 1s → 2s → 4s → 8s (±jitter)
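Without the jitter term, that progression is just `baseDelayMs * 2 ** attempt`:

```typescript
// The backoff schedule above, computed without jitter for illustration.
const baseDelayMs = 500;
const schedule = Array.from({ length: 5 }, (_, attempt) => baseDelayMs * 2 ** attempt);
// schedule → [500, 1000, 2000, 4000, 8000]
```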
Prediction queue
For high-throughput apps (batch processing, multiple concurrent users), queue predictions to stay safely under your rate limit:
```ts
// lib/prediction-queue.ts
import PQueue from 'p-queue';

// Pro plan: 600 req/min = 10/sec. Stay at 8 to be safe.
const queue = new PQueue({
  concurrency: 10,
  interval: 1000,
  intervalCap: 8,
});

export function enqueuePrediction<T>(fn: () => Promise<T>): Promise<T> {
  return queue.add(fn) as Promise<T>;
}

export function getQueueStats() {
  return { pending: queue.size, running: queue.pending };
}
```

The queue handles throttling automatically. You call `enqueuePrediction()` instead of calling Skytells directly — the queue releases each call at the right rate.
Budget guardrails
Prevent unexpected bills with a daily spend limit:
```ts
// lib/budget.ts
import { Redis } from '@upstash/redis';

const redis = Redis.fromEnv();
const DAILY_BUDGET_USD = 50;

const MODEL_COSTS: Record<string, number> = {
  'truefusion-edge': 0.01,
  'truefusion': 0.02,
  'truefusion-pro': 0.04,
  'truefusion-2.0': 0.06,
  'truefusion-ultra': 0.08,
  'beatfusion-2.0': 0.75,
};

function today() { return new Date().toISOString().split('T')[0]; }

export async function checkBudget(model: string): Promise<void> {
  const cost = MODEL_COSTS[model] ?? 0.04;
  const key = `budget:global:${today()}`;
  const current = parseFloat(await redis.get<string>(key) ?? '0');
  if (current + cost > DAILY_BUDGET_USD) {
    throw new BudgetExceededError(
      `Daily budget of $${DAILY_BUDGET_USD} would be exceeded. Current: $${current.toFixed(2)}`
    );
  }
}

export async function recordSpend(model: string): Promise<void> {
  const cost = MODEL_COSTS[model] ?? 0.04;
  const key = `budget:global:${today()}`;
  await redis.incrbyfloat(key, cost);
  await redis.expire(key, 86_400 * 2); // expire after 2 days
}

// Exported so callers can catch it with instanceof
export class BudgetExceededError extends Error {
  readonly name = 'BudgetExceededError';
}
```

User-facing error messages
Never show raw API error messages to users. Map them to clear, actionable copy:
```ts
interface UserFacingError {
  message: string;
  action?: string;
  retryable: boolean;
}

function toUserError(httpStatus: number, detail?: string): UserFacingError {
  switch (httpStatus) {
    case 401:
      return {
        message: 'Authentication failed.',
        action: 'Please refresh the page and try again.',
        retryable: false,
      };
    case 422:
      return {
        message: 'Your request contained an invalid prompt.',
        action: detail?.includes('nsfw') ? 'Try a different description.' : undefined,
        retryable: false,
      };
    case 429:
      return {
        message: 'Our AI service is busy right now.',
        action: 'Please try again in a few seconds.',
        retryable: true,
      };
    case 500:
    case 503:
      return {
        message: "We're experiencing a temporary issue.",
        action: "We're on it. Please try again shortly.",
        retryable: true,
      };
    default:
      return {
        message: 'Something went wrong.',
        action: 'Please try again.',
        retryable: true,
      };
  }
}
```

Never expose raw Skytells API error messages to your users. They often contain internal model names, stack traces, or validation internals that leak implementation details and confuse non-technical users.
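At the API boundary, only the mapped copy should reach the browser. A sketch of the response body (the `requestId` field is an assumption — a correlation id users can quote to support without exposing upstream detail; the interface is restated so the snippet stands alone):

```typescript
interface UserFacingError { message: string; action?: string; retryable: boolean; }

// Serialize a mapped error for the client — no raw upstream message included.
function errorResponseBody(userError: UserFacingError, requestId: string) {
  return {
    error: userError.message,
    action: userError.action ?? null,
    retryable: userError.retryable,
    requestId, // hypothetical correlation id for support lookups
  };
}
```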
Monitoring and alerting
Set up alerts so you know about problems before your users do:
| Metric | Alert threshold | Why |
|---|---|---|
| Error rate | > 1% of requests over 5 min | Something is wrong |
| Rate limit rate | > 5% hitting 429 | Approaching quota ceiling |
| P99 prediction latency | > 30s for images | Model backlog or degradation |
| Daily spend | > 80% of budget | Approaching limit |
| Failed webhook deliveries | > 0 per hour | Processing pipeline broken |
```ts
// Log structured events — send to Datadog, Logtail, Sentry, etc.
function logPredictionEvent(event: {
  predictionId: string;
  model: string;
  status: 'created' | 'succeeded' | 'failed' | 'rate_limited';
  latencyMs?: number;
  httpStatus?: number;
}) {
  console.log(JSON.stringify({
    service: 'skytells',
    timestamp: new Date().toISOString(),
    ...event,
  }));
}
```

Summary
Your app will handle every failure mode gracefully. No crashes, no silent data loss, no surprise bills.
The four resilience patterns:
- Exponential backoff — only for `429` and `5xx`. Never for `4xx`.
- Prediction queue — throttle concurrent requests to stay under your rate limit
- Budget guardrails — daily spend cap with Redis, checked before each prediction
- User-friendly errors — map HTTP status codes to clear, non-technical messages
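Put together, the order of operations is: budget check first, then the queue, with retries innermost, and spend recorded only on success. The sketch below stubs each piece so the flow is visible on its own; in a real app you would swap the stubs for `checkBudget`, `enqueuePrediction`, `withRetry`, and `recordSpend` from the sections above:

```typescript
// Flow sketch with stubs — traces the order the four guards fire in.
const trace: string[] = [];

const checkBudget = async () => { trace.push('budget-check'); };                           // stub
const enqueue = async <T>(fn: () => Promise<T>) => { trace.push('queue'); return fn(); };  // stub
const withRetry = async <T>(fn: () => Promise<T>) => { trace.push('retry-wrap'); return fn(); }; // stub
const createPrediction = async () => { trace.push('api-call'); return { id: 'pred_1' }; }; // stub

async function guardedPrediction() {
  await checkBudget();                                             // 1. budget guardrail
  const result = await enqueue(() => withRetry(createPrediction)); // 2. queue → 3. backoff → call
  trace.push('record-spend');                                      // 4. recordSpend only on success
  return result;
}
```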
Next: the Edge API — ultra-low latency for Business and Enterprise accounts.
Webhooks
Implement reliable webhook handling — signature verification, idempotency, async processing, and a complete production webhook receiver.
Edge API
Route latency-sensitive calls through the Skytells Edge gateway for ultra-low latency AI responses. Available for Business and Enterprise accounts on supported models only.