Error Handling & Best Practices
Handle errors from both the Prediction and Inference APIs robustly — implement retry logic, manage rate limits, set up webhooks, secure API keys, and ship production-ready AI integrations.
What you'll be able to do after this module
- Write a single error-handling layer that works for both APIs
- Retry transient errors with exponential backoff
- Set up webhooks instead of polling for long-running audio and video predictions
- Safely manage API keys
- Pass the production readiness checklist before you go live
The two error formats — side by side
Module 3 showed the schemas. Here's what to do with them.
Prediction API error (top-level `error_id`):

```json
{
  "id": "pred_abc123",
  "status": "failed",
  "error_id": "INSUFFICIENT_CREDITS",
  "error_message": "Your account has run out of credits.",
  "error": null
}
```

Inference API error (nested `error` object):

```json
{
  "error": {
    "error_id": "context_length_exceeded",
    "type": "InferenceError",
    "code": "context_length_exceeded",
    "message": "Your messages exceed the model context window.",
    "param": "messages",
    "status": 400,
    "request_id": "req_xyz789"
  }
}
```

The key differences:

- Prediction errors live at the top level of the Prediction Object (`status: "failed"` + `error_id`)
- Inference errors are nested under an `"error"` key, with a `type` of `"InferenceError"` and an HTTP `status`
Detecting which API returned an error
```python
# Minimal app-level exception types (adapt these to your codebase)
class PredictionAPIError(Exception):
    def __init__(self, message, error_id, prediction_id):
        super().__init__(message)
        self.error_id, self.prediction_id = error_id, prediction_id

class InferenceAPIError(Exception):
    def __init__(self, message, code, status, request_id=None):
        super().__init__(message)
        self.code, self.status, self.request_id = code, status, request_id

def is_prediction_error(obj: dict) -> bool:
    return obj.get("status") == "failed" and "error_id" in obj

def is_inference_error(obj: dict) -> bool:
    return isinstance(obj.get("error"), dict)

def handle_response(obj: dict):
    if is_inference_error(obj):
        err = obj["error"]
        raise InferenceAPIError(
            message=err["message"],
            code=err["code"],
            status=err["status"],
            request_id=err.get("request_id"),
        )
    if is_prediction_error(obj):
        raise PredictionAPIError(
            message=obj["error_message"],
            error_id=obj["error_id"],
            prediction_id=obj["id"],
        )
```

Prediction API error codes
| error_id | HTTP status | Meaning | What to do |
|---|---|---|---|
| `UNAUTHORIZED` | 401 | Invalid or missing API key | Verify `x-api-key` is set and not expired |
| `MODEL_NOT_FOUND` | 404 | `model_id` doesn't exist | Check the model catalog — the ID may have changed |
| `INSUFFICIENT_CREDITS` | 402 | Account ran out of credits | Top up in Dashboard or set auto-recharge |
| `INVALID_INPUT` | 422 | Input doesn't match `input_schema` | Log the validation errors from `error_message` |
| `RATE_LIMIT_EXCEEDED` | 429 | Too many requests | Respect the `Retry-After` header, add jitter to retries |
| `PREDICTION_FAILED` | 500 | Model failed at inference time | Retry once; if it persists, contact support with the `prediction_id` |
| `TIMEOUT` | 504 | Prediction exceeded time limit | Use webhooks for long-running jobs (video, audio) |
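The "what to do" column can be encoded once as a lookup table, so every caller surfaces the same user-facing message and makes the same retry decision. The messages below are illustrative placeholders, not official API strings:

```python
# Maps each Prediction API error_id from the table above to
# (user-facing message, is_retryable). Wording is illustrative.
PREDICTION_ERRORS = {
    "UNAUTHORIZED": ("Check your API key configuration.", False),
    "MODEL_NOT_FOUND": ("The requested model is unavailable.", False),
    "INSUFFICIENT_CREDITS": ("Your account is out of credits.", False),
    "INVALID_INPUT": ("The input did not match the model's schema.", False),
    "RATE_LIMIT_EXCEEDED": ("Too many requests. Slowing down.", True),
    "PREDICTION_FAILED": ("The model failed to run.", True),
    "TIMEOUT": ("The job exceeded its time limit.", True),
}

def describe_error(error_id: str) -> tuple[str, bool]:
    """Return (user_message, is_retryable) for a Prediction API error_id."""
    return PREDICTION_ERRORS.get(error_id, ("An unknown error occurred.", False))
```

Centralizing this table means a new error code only has to be handled in one place.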
Inference API error codes
| code | HTTP status | Meaning |
|---|---|---|
| `invalid_api_key` | 401 | Invalid, revoked, or missing API key |
| `insufficient_quota` | 402 | Account ran out of token quota |
| `model_not_found` | 404 | Requested model not available |
| `context_length_exceeded` | 400 | Prompt + completion > context window |
| `invalid_request_error` | 400 | Malformed request body |
| `rate_limit_exceeded` | 429 | Too many requests per minute/day |
| `server_error` | 500 | Infrastructure error — safe to retry |
| `service_unavailable` | 503 | Model temporarily offline — retry with backoff |
The Inference API returns HTTP 4xx/5xx status codes alongside the error body. The Prediction API's polling endpoint, by contrast, always returns HTTP 200: you receive the Prediction Object and check its `status` field to detect failure.
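That difference shapes how a polling loop must be written: you branch on the Prediction Object's `status` field, not on the HTTP status code. A minimal sketch; the terminal states (`"succeeded"`/`"failed"`) follow the examples above, while any non-terminal status name is an assumption:

```python
import os
import time

def _fetch_prediction(prediction_id: str) -> dict:
    """Default fetcher: GET /v1/predictions/:id with the x-api-key header."""
    import httpx  # imported lazily so the polling logic is testable without it
    resp = httpx.get(
        f"https://api.skytells.ai/v1/predictions/{prediction_id}",
        headers={"x-api-key": os.environ["SKYTELLS_API_KEY"]},
    )
    return resp.json()

def poll_prediction(prediction_id, fetch=_fetch_prediction,
                    interval=2.0, max_polls=60):
    """Poll until the prediction reaches a terminal state.

    The endpoint always returns HTTP 200, so we branch on the
    Prediction Object's `status` field, never the HTTP status code.
    """
    for _ in range(max_polls):
        prediction = fetch(prediction_id)
        if prediction["status"] in ("succeeded", "failed"):
            return prediction
        time.sleep(interval)
    raise TimeoutError(f"{prediction_id} still pending after {max_polls} polls")
```

Injecting `fetch` keeps the loop unit-testable; in production the default httpx-based fetcher is used. For long video/audio jobs, prefer the webhook pattern below over any polling loop.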
Retry with exponential backoff
Retry on transient errors: rate limits (429), server errors (500/503), and prediction failures.
```python
import time
import random
import httpx

def call_with_retry(fn, max_attempts: int = 4):
    """
    Retry a callable with exponential backoff + jitter.
    Retryable: 429, 500, 503
    Non-retryable: 400, 401, 402, 404, 422
    """
    NON_RETRYABLE = {400, 401, 402, 404, 422}
    for attempt in range(max_attempts):
        try:
            return fn()
        except httpx.HTTPStatusError as e:
            status = e.response.status_code
            if status in NON_RETRYABLE:
                raise  # Don't retry client errors
            if attempt == max_attempts - 1:
                raise  # Final attempt — surface the error
            # Respect Retry-After if present
            retry_after = e.response.headers.get("Retry-After")
            if retry_after:
                time.sleep(float(retry_after))
            else:
                # Exponential backoff: 1s, 2s, 4s … with ±25% jitter
                base_delay = 2 ** attempt
                jitter = random.uniform(0.75, 1.25)
                time.sleep(base_delay * jitter)

# Usage
result = call_with_retry(
    lambda: client.chat.completions.create(
        model="deepbrain-router",
        messages=[{"role": "user", "content": "Hello"}],
    )
)
```

Stop polling — use webhooks
Polling GET /v1/predictions/:id is fine in development, but for audio and video jobs (which can take 30–120 seconds), webhooks are the production pattern.
```python
import httpx, os

resp = httpx.post(
    "https://api.skytells.ai/v1/predictions",
    headers={"x-api-key": os.environ["SKYTELLS_API_KEY"]},
    json={
        "model_id": "stable-video-6s",
        "inputs": {"prompt": "A drone shot over a mountain lake at dawn"},
        "webhook_url": "https://yourapp.com/api/webhooks/skytells",
        "webhook_events": ["prediction.completed", "prediction.failed"],
    },
)
print(resp.json()["id"])  # prediction ID — store this
```

Skytells will POST the full Prediction Object to your `webhook_url` when the job completes:
```python
from fastapi import FastAPI, Request, Header, HTTPException
import hmac, hashlib, os

app = FastAPI()
WEBHOOK_SECRET = os.environ["SKYTELLS_WEBHOOK_SECRET"]  # set in Dashboard

@app.post("/api/webhooks/skytells")
async def handle_webhook(
    request: Request,
    x_skytells_signature: str = Header(None),
):
    body = await request.body()
    # Verify signature — never skip this in production
    expected = hmac.new(WEBHOOK_SECRET.encode(), body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(f"sha256={expected}", x_skytells_signature or ""):
        raise HTTPException(status_code=401, detail="Invalid signature")
    event = await request.json()
    prediction = event["data"]
    if prediction["status"] == "succeeded":
        output_urls = prediction["output"]  # download within 24 hours
        await save_to_cloud_storage(output_urls)
    elif prediction["status"] == "failed":
        await log_prediction_failure(prediction["id"], prediction["error_id"])
```

Always verify the webhook signature. Without signature verification, an attacker can POST fake completion events to your endpoint. Use `hmac.compare_digest` (Python) or a constant-time comparison to prevent timing attacks.
Output URL expiry
Prediction output URLs (images, video, audio) are temporary CDN links that expire after 24 hours. Transfer them to your own storage immediately on completion.
```python
import io
import httpx, boto3
from urllib.parse import urlparse
from pathlib import Path

def save_outputs(prediction: dict, bucket: str) -> list[str]:
    """Download Prediction outputs to S3 and return permanent URLs."""
    s3 = boto3.client("s3")
    permanent_urls = []
    for i, url in enumerate(prediction.get("output", [])):
        # Determine file extension from URL
        ext = Path(urlparse(url).path).suffix or ".bin"
        key = f"predictions/{prediction['id']}/{i}{ext}"
        # Stream-download into memory, then upload
        # (for very large videos, stream to a temp file instead)
        buf = io.BytesIO()
        with httpx.stream("GET", url) as r:
            r.raise_for_status()
            for chunk in r.iter_bytes():
                buf.write(chunk)
        buf.seek(0)
        s3.upload_fileobj(buf, bucket, key)
        permanent_urls.append(f"https://{bucket}.s3.amazonaws.com/{key}")
    return permanent_urls
```

API key security
Never commit API keys to source control. A key leaked in a public repository must be treated as fully compromised — rotate it immediately from the Dashboard.
Good practices:
- Environment variables only — `SKYTELLS_API_KEY` in `.env`, never hardcoded
- Add `.env` to `.gitignore` immediately when you create the file
- Separate keys per environment — use a different key for development, staging, and production
- Rotate keys on a schedule — treat keys like passwords; rotate every 90 days or immediately on suspected exposure
- Scope keys if possible — create read-only keys for analytics, write keys for generation, per the Dashboard's key scoping options
- Use secrets managers — AWS Secrets Manager, GCP Secret Manager, or Doppler to inject keys at runtime, not at build time
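Whichever injection mechanism you use, validate the key at startup so a missing or empty value crashes the process immediately instead of surfacing later as a confusing 401 on the first live request. A minimal sketch:

```python
import os

def require_api_key(var: str = "SKYTELLS_API_KEY") -> str:
    """Read the API key from the environment, failing fast if absent.

    Raising at import/startup time makes misconfiguration obvious in CI
    and during deploys, rather than at the first user-facing request.
    """
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(
            f"{var} is not set: export it, or inject it via your secrets manager"
        )
    return key
```

Call `require_api_key()` once during application startup and pass the result to your client, rather than reading `os.environ` at every call site.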
```gitignore
# .gitignore — must contain
.env
.env.local
.env.*.local
*.env
```

Rate limit strategy
The platform enforces per-minute and per-day request limits. Your retry logic (above) handles the 429, but also:
- Batch appropriately — embeddings support arrays of inputs; use batches instead of loop-per-item
- Pre-cache common completions — if 80% of requests share the same prompt prefix, cache the response
- Queue in production — use a job queue (BullMQ, Celery, SQS) to smooth out traffic spikes rather than firing concurrent requests
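The batching advice above can be sketched as a small wrapper: chunk the input list and send one embeddings request per chunk instead of one per item. The endpoint (`POST /v1/embeddings`) comes from this course; the model name `text-embed-1` and the `{"data": [{"embedding": [...]}]}` response shape are assumptions modeled on common embeddings APIs, so check the API reference before relying on them:

```python
import os

def chunks(items: list, size: int):
    """Yield successive fixed-size slices of a list."""
    for i in range(0, len(items), size):
        yield items[i : i + size]

def embed_batched(texts: list[str], batch_size: int = 64) -> list[list[float]]:
    """One POST /v1/embeddings call per batch, instead of one per text."""
    import httpx  # lazy import; only needed when actually calling the API
    vectors: list[list[float]] = []
    for batch in chunks(texts, batch_size):
        resp = httpx.post(
            "https://api.skytells.ai/v1/embeddings",
            headers={"x-api-key": os.environ["SKYTELLS_API_KEY"]},
            # "text-embed-1" is a placeholder model name
            json={"model": "text-embed-1", "input": batch},
        )
        resp.raise_for_status()
        vectors.extend(item["embedding"] for item in resp.json()["data"])
    return vectors
```

Embedding 10,000 documents this way costs ~157 requests at `batch_size=64` instead of 10,000, which keeps you well under per-minute rate limits.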
Production readiness checklist
Before you go live:

- API key is in an environment variable, not hardcoded
- `.env` is in `.gitignore`
- Retry logic covers 429 and 5xx with exponential backoff
- Prediction outputs are downloaded to permanent storage within 24 hours
- Webhooks are set up for video/audio predictions
- Webhook signatures are verified server-side
- Error types from both APIs are caught and surfaced to users with helpful messages
- Test with `INSUFFICIENT_CREDITS` (low-balance account) to confirm graceful degradation
- Test with `RATE_LIMIT_EXCEEDED` to confirm retry behaviour
- Separate API keys for dev, staging, and production environments
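The error-handling items on the checklist can be rehearsed without spending credits by feeding canned error payloads through your handling layer. A sketch, with the detection helpers restated from earlier in this module so the snippet runs standalone:

```python
def is_prediction_error(obj: dict) -> bool:
    return obj.get("status") == "failed" and "error_id" in obj

def is_inference_error(obj: dict) -> bool:
    return isinstance(obj.get("error"), dict)

# Canned payloads mirroring the two error shapes shown at the top of this module
CREDITS_ERROR = {
    "id": "pred_abc123",
    "status": "failed",
    "error_id": "INSUFFICIENT_CREDITS",
    "error_message": "Your account has run out of credits.",
    "error": None,
}
RATE_LIMIT_ERROR = {
    "error": {
        "error_id": "rate_limit_exceeded",
        "type": "InferenceError",
        "code": "rate_limit_exceeded",
        "message": "Too many requests.",
        "status": 429,
    }
}

# Each payload should route to exactly one handler
assert is_prediction_error(CREDITS_ERROR) and not is_inference_error(CREDITS_ERROR)
assert is_inference_error(RATE_LIMIT_ERROR) and not is_prediction_error(RATE_LIMIT_ERROR)
```

Keep fixtures like these in your test suite; they catch regressions in the handling layer whenever it is refactored.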
Congratulations — you've completed AI Foundations
You now understand the Skytells AI platform from the ground up: models and billing, both APIs and their schemas, hands-on predictions and inference calls, and production error handling.
What you can build now:

- A chat UI backed by `POST /v1/chat/completions` with streaming
- An image generation pipeline using `POST /v1/predictions` with async polling or webhooks
- A semantic search system using `POST /v1/embeddings`
- Stateful multi-turn conversations using `POST /v1/responses`
Where to go next:
- Image Generation learning path — advanced TrueFusion prompting, ControlNet, upscaling
- SDK Mastery learning path — idiomatic Python and TypeScript SDK patterns
- Building Production Apps learning path — queuing, caching, cost optimization at scale