The Two APIs — Prediction & Inference Schemas
Understand the Prediction API and Inference API side by side — their request schemas, response schemas, lifecycles, error formats, and exactly when to use each one.
What you'll be able to do after this module
Read any Skytells request or response and immediately know which API it belongs to. Understand every field in the Prediction Object and the ChatCompletionResponse. Never confuse the type: "inference" field on Predictions again.
Two APIs, one base URL
The Skytells REST API exposes two distinct sub-APIs under https://api.skytells.ai/v1:
| | Prediction API | Inference API |
|---|---|---|
| What it generates | Media — images, video, audio | Text — LLM completions, embeddings |
| Endpoints | POST /v1/predictions · GET /v1/predictions/:id | POST /v1/chat/completions · POST /v1/responses · POST /v1/embeddings |
| Response style | Async by default — jobs are queued and polled; sync when the wait parameter is enabled | Synchronous or streaming (SSE) |
| Auth header | x-api-key: sk-... | x-api-key: sk-... or Authorization: Bearer sk-... |
| OpenAI-compatible | No | Yes — swap base_url + key, nothing else changes |
| Schema family | Skytells Prediction schema | OpenAI-compatible schema |
Both use the same API key. Both follow REST conventions. Everything else is different.
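Because both sub-APIs accept the same key, the only thing that varies is how you present it. A minimal sketch of the two documented header styles — the helper name `auth_headers` is illustrative, not part of any Skytells SDK:

```python
BASE_URL = "https://api.skytells.ai/v1"

def auth_headers(api_key: str, bearer: bool = False) -> dict:
    """Build request headers for either sub-API.

    The Prediction API accepts x-api-key; the Inference API accepts
    either x-api-key or an OpenAI-style Authorization: Bearer header.
    """
    if bearer:
        return {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}
    return {"x-api-key": api_key, "Content-Type": "application/json"}
```

The Bearer form is what makes OpenAI SDK compatibility possible: an OpenAI client pointed at `BASE_URL` sends exactly this header.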
The Prediction API
Request schema
POST /v1/predictions
{
"model": "truefusion-pro",
"input": {
"prompt": "A red fox in a snowy forest, golden hour cinematic",
"aspect_ratio": "16:9",
"num_outputs": 1,
"guidance": 3.0
},
"webhook": "https://yourapp.com/webhooks/skytells"
}

| Field | Type | Required | Description |
|---|---|---|---|
model | string | ✓ | The model's namespace value from /v1/models (e.g. truefusion-pro) |
input | object | ✓ | Model-specific input fields. Structure defined by the model's input_schema |
webhook | string | — | URL to receive prediction lifecycle events (recommended for video/audio) |
The input object's shape varies per model. Every model exposes its input_schema as a JSON Schema object in the /v1/models/:namespace response. That schema defines exactly which fields are valid, required, and their types.
How to read a model's input_schema
# Fetch the input schema for truefusion-pro
curl https://api.skytells.ai/v1/models/truefusion-pro \
-H "x-api-key: $SKYTELLS_API_KEY" \
  | python3 -m json.tool | grep -A 50 '"input_schema"'

A typical image model input_schema looks like:
{
"type": "object",
"title": "Input",
"required": ["prompt"],
"properties": {
"prompt": {
"type": "string",
"description": "Text prompt for image generation",
"x-order": 0
},
"aspect_ratio": {
"type": "string",
"enum": ["1:1", "16:9", "9:16", "4:3", "3:4"],
"default": "1:1",
"description": "Aspect ratio for the generated image",
"x-order": 1
},
"num_outputs": {
"type": "integer",
"default": 1,
"minimum": 1,
"maximum": 4,
"description": "Number of images to generate",
"x-order": 4
},
"guidance": {
"type": "number",
"default": 3.0,
"minimum": 0,
"maximum": 10,
"description": "Guidance for generated image",
"x-order": 6
},
"seed": {
"type": "integer",
"description": "Random seed. Set for reproducible generation",
"x-order": 7
}
}
}

| Schema field | What it means |
|---|---|
required | Fields the model will reject if missing |
properties[field].type | Data type: string, integer, number, boolean, array |
properties[field].enum | Allowed values only — passing anything else errors |
properties[field].default | Omit this field and the model uses this value |
properties[field].minimum / maximum | Numeric bounds |
x-order | Display ordering in the Console — no functional effect |
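These rules can be enforced client-side before spending credits on a request. Below is a deliberately minimal validator — it only covers required, type, enum, and numeric bounds, not full JSON Schema; for complete coverage use a real JSON Schema library such as jsonschema:

```python
_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool, "array": list}

def validate_input(schema: dict, payload: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for field in schema.get("required", []):
        if field not in payload:
            errors.append(f"missing required field: {field}")
    for field, value in payload.items():
        spec = schema.get("properties", {}).get(field)
        if spec is None:
            continue  # unknown fields: let the server decide
        expected = _TYPES.get(spec.get("type"))
        if expected and not isinstance(value, expected):
            errors.append(f"{field}: expected {spec['type']}")
            continue
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"{field}: must be one of {spec['enum']}")
        if "minimum" in spec and value < spec["minimum"]:
            errors.append(f"{field}: below minimum {spec['minimum']}")
        if "maximum" in spec and value > spec["maximum"]:
            errors.append(f"{field}: above maximum {spec['maximum']}")
    return errors
```

Running it against the truefusion-pro schema above catches a missing prompt, an out-of-enum aspect_ratio, or num_outputs outside 1–4 before the API ever sees the request.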
Prediction Object schema (response)
Both POST /v1/predictions and GET /v1/predictions/:id return a Prediction Object:
{
"id": "d05b96fc-7fdf-4528-8e61-aa1092f48040",
"status": "succeeded",
"type": "inference",
"stream": false,
"input": {
"prompt": "romantic, love, r&b",
"lyrics": "[verse]\nYour smile lights the sky..."
},
"output": [
"https://delivery.skytells.cloud/us/2026/03/06/7bfdb9a4.mp3"
],
"created_at": "2026-03-06T20:15:23.000000Z",
"started_at": "2026-03-06T20:14:35.160110297Z",
"completed_at": "2026-03-06T20:15:23.293497292Z",
"updated_at": "2026-03-06T20:15:24.000000Z",
"privacy": "private",
"source": "api",
"model": { "name": "beatfusion-2.0", "type": "audio" },
"webhook": { "url": null, "events": [] },
"metrics": { "predict_time": 48.13, "total_time": 48.16 },
"metadata": {
"billing": { "credits_used": 0.75 },
"storage": { "files": [] },
"data_available": true
},
"urls": {
"get": "https://api.skytells.ai/v1/predictions/d05b96fc-.../",
"cancel": "https://api.skytells.ai/v1/predictions/d05b96fc-.../cancel",
"stream": "https://api.skytells.ai/v1/predictions/d05b96fc-.../stream",
"delete": "https://api.skytells.ai/v1/predictions/d05b96fc-.../delete"
}
}

| Field | Type | Description |
|---|---|---|
id | UUID string | Unique prediction ID — use to poll status |
status | string | queued · processing · succeeded · failed · canceled |
type | string | Always "inference" — see note below |
input | object | The exact input you sent |
output | array of strings | CDN URLs to the generated files. null until succeeded |
model.name | string | Model namespace that ran |
model.type | string | "image" · "video" · "audio" |
metrics.predict_time | number | Seconds the model spent generating |
metrics.total_time | number | Total wall-clock time including queuing |
metadata.billing.credits_used | number | Credit cost deducted for this prediction |
urls.get | string | Canonical URL to poll this prediction |
urls.cancel | string | POST this to cancel if still processing |
type: "inference" does NOT mean this object came from the Inference API. Every prediction — whether it generates an image, video, or audio — has type: "inference" because all AI model runs are inference computations internally. This field describes the computation type, not the API route. You'll always get this field on Prediction Objects regardless of which model you ran.
Prediction lifecycle
- Fast image models (e.g. truefusion-edge): often return succeeded in the initial POST — no polling needed
- Standard image models (e.g. truefusion-pro): 5–20 seconds — poll GET /v1/predictions/:id
- Video and audio models: 30 seconds to several minutes — use webhooks
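The lifecycle above implies a poll-until-terminal loop. Here is a sketch with the HTTP call injected as a callable, so the loop stays transport-agnostic (in real use, fetch would GET /v1/predictions/:id with your x-api-key header; the function and parameter names are mine):

```python
import time

TERMINAL = {"succeeded", "failed", "canceled"}

def poll_prediction(fetch, interval: float = 2.0, timeout: float = 300.0) -> dict:
    """Call fetch() until the prediction reaches a terminal status.

    fetch: zero-argument callable returning the Prediction Object as a dict.
    Raises TimeoutError if no terminal status arrives within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        prediction = fetch()
        if prediction["status"] in TERMINAL:
            return prediction
        if time.monotonic() > deadline:
            raise TimeoutError(f"prediction still {prediction['status']} after {timeout}s")
        time.sleep(interval)
```

For video and audio models, prefer the webhook field over this loop so you are not holding a poller open for minutes.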
The Inference API
Request schema
POST /v1/chat/completions
{
"model": "deepbrain-router",
"messages": [
{ "role": "system", "content": "You are a helpful assistant." },
{ "role": "user", "content": "What is machine learning?" }
],
"stream": false,
"temperature": 0.7,
"max_tokens": 512
}

| Field | Type | Required | Description |
|---|---|---|---|
model | string | ✓ | LLM namespace: deepbrain-router · gpt-5 · gpt-5.4 |
messages | array | ✓ | Conversation history. Each item has role (system/user/assistant) and content (string) |
stream | boolean | — | true = stream tokens as server-sent events. Default false |
max_tokens | integer | — | Max tokens to generate. Default 8192 |
temperature | number | — | Randomness: 0 = deterministic, 2 = very creative. Default 0.7 |
top_p | number | — | Nucleus sampling. Default 0.95 |
frequency_penalty | number | — | Penalise repeated tokens (−2 to 2). Default 0.0 |
presence_penalty | number | — | Penalise already-used tokens (−2 to 2). Default 0.0 |
stop | string or array | — | Stop sequences — generation halts at any match |
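The defaults in the table can be made explicit in a small payload builder — an illustrative helper, not an official SDK, with default values taken directly from the table above:

```python
def build_chat_request(model: str, messages: list, **overrides) -> dict:
    """Assemble a /v1/chat/completions body, starting from the documented defaults."""
    body = {
        "model": model,
        "messages": messages,
        "stream": False,
        "max_tokens": 8192,
        "temperature": 0.7,
        "top_p": 0.95,
        "frequency_penalty": 0.0,
        "presence_penalty": 0.0,
    }
    body.update(overrides)  # e.g. stream=True, stop=["\n\n"]
    return body
```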
ChatCompletionResponse schema
{
"id": "chatcmpl-DKQ7HZtNYLc7uK0Dpn0JggRUUhuBE",
"object": "chat.completion",
"created": 1773759323,
"model": "deepbrain-router",
"system_fingerprint": "fp_490a4ad033",
"choices": [
{
"index": 0,
"logprobs": null,
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": "Machine learning is a branch of AI...",
"annotations": [],
"refusal": null
},
"content_filter_results": {
"hate": { "filtered": false, "severity": "safe" },
"self_harm": { "filtered": false, "severity": "safe" },
"sexual": { "filtered": false, "severity": "safe" },
"violence": { "filtered": false, "severity": "safe" },
"protected_material_code": { "filtered": false, "detected": false },
"protected_material_text": { "filtered": false, "detected": false }
}
}
],
"prompt_filter_results": [
{
"prompt_index": 0,
"content_filter_results": {
"hate": { "filtered": false, "severity": "safe" },
"self_harm": { "filtered": false, "severity": "safe" },
"sexual": { "filtered": false, "severity": "safe" },
"violence": { "filtered": false, "severity": "safe" },
"jailbreak": { "filtered": false, "detected": false }
}
}
],
"usage": {
"prompt_tokens": 9,
"completion_tokens": 10,
"total_tokens": 19,
"completion_tokens_details": {
"reasoning_tokens": 0,
"audio_tokens": 0,
"accepted_prediction_tokens": 0,
"rejected_prediction_tokens": 0
},
"prompt_tokens_details": {
"cached_tokens": 0,
"audio_tokens": 0
}
}
}

Responsible AI & Safety: Skytells enforces content safety on every inference response. The content_filter_results field on each choice shows real-time analysis of the model's output. The prompt_filter_results array does the same for your input. A filtered: true value means Skytells blocked that content from being sent or received. Your application should check finish_reason: "content_filter" to detect when a response was halted by the safety layer.
| Field | Type | Description |
|---|---|---|
id | string | Completion ID, prefix chatcmpl- |
object | string | Always "chat.completion" |
system_fingerprint | string | Identifies the exact model version that served the request |
choices[0].message.content | string | The generated text — this is the main output |
choices[0].message.annotations | array | Structured annotations added by the model (citations, tool calls) |
choices[0].message.refusal | string | null | Non-null when the model explicitly refused to answer |
choices[0].finish_reason | string | stop · length · content_filter |
choices[0].content_filter_results | object | Per-category safety analysis of the response |
prompt_filter_results[0].content_filter_results | object | Per-category safety analysis of the input prompt |
usage.prompt_tokens | integer | Tokens consumed by your messages |
usage.completion_tokens | integer | Tokens generated |
usage.total_tokens | integer | Sum — billed in tokens for LLM models |
usage.completion_tokens_details | object | Breakdown: reasoning, audio, speculative prediction tokens |
usage.prompt_tokens_details | object | Breakdown: cached, audio tokens in the prompt |
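Putting the finish_reason and refusal guidance into practice, a sketch of defensive response handling (the helper name is mine):

```python
def extract_completion(response: dict) -> str:
    """Return the assistant text, raising on safety-filtered or refused output."""
    choice = response["choices"][0]
    if choice["finish_reason"] == "content_filter":
        raise RuntimeError("response was halted by the safety layer")
    message = choice["message"]
    if message.get("refusal"):
        raise RuntimeError(f"model refused: {message['refusal']}")
    return message["content"]
```

Checking finish_reason before touching message.content matters because a filtered response may carry a null or truncated content field.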
Stateful conversation — /v1/responses
POST /v1/responses adds stateful multi-turn conversations via previous_response_id. Pass the ID from a previous response and the server replays the conversation history without you resending the full message array.
{
"model": "deepbrain-router",
"input": "How is it different from the Inference API?",
"previous_response_id": "resp_abc123",
"instructions": "You are a helpful developer assistant."
}

Response (ResponseObject):
{
"id": "resp_xyz789",
"object": "response",
"created_at": 1748000000,
"model": "deepbrain-router",
"status": "completed",
"output_text": "The Prediction API generates media...",
"output": [
{
"type": "message",
"role": "assistant",
"content": [{ "type": "output_text", "text": "The Prediction API generates media..." }]
}
],
"usage": { "input_tokens": 12, "output_tokens": 41, "total_tokens": 53 }
}

Embeddings — /v1/embeddings
{
"model": "deepbrain-router",
"input": "A photorealistic mountain lake at sunrise",
"encoding_format": "float"
}

Response:
{
"object": "list",
"data": [
{
"object": "embedding",
"index": 0,
"embedding": [0.0023064255, -0.009327292, 0.015797101, "..."]
}
],
"model": "deepbrain-router",
"usage": { "prompt_tokens": 9, "total_tokens": 9 }
}

Error schemas — two different formats
The two APIs return errors in fundamentally different shapes. Knowing which format to parse is essential.
Prediction API error
{
"error_id": "INSUFFICIENT_CREDITS",
"message": "You do not have enough credits to run this prediction"
}

Error fields are top-level — error_id and message are at the root of the object.
Inference API error (OpenAI-compatible)
{
"error": {
"message": "You have run out of credits.",
"type": "authorization_error",
"code": "insufficient_credits",
"error_id": "INSUFFICIENT_CREDITS",
"status": 402,
"param": null,
"request_id": "req_abc123xyz",
"details": null
}
}

Error fields are nested under "error". The Inference API error adds type, code, status, param, and request_id.
Always branch on error_id — it's the stable, machine-readable identifier for programmatic handling. Never parse message strings.
def parse_error(body: dict) -> dict:
    # Prediction API: error_id is top-level
    if "error_id" in body:
        return {"source": "prediction", "id": body["error_id"], "msg": body["message"]}
    # Inference API: nested under 'error'
    if "error" in body:
        err = body["error"]
        return {
            "source": "inference",
            "id": err["error_id"],
            "msg": err["message"],
            "request_id": err.get("request_id"),
        }
    return {}

How to tell which API a response came from
| You see this in the response body... | It came from... |
|---|---|
"id" is a UUID (xxxxxxxx-xxxx-...) | Prediction API |
"id" starts with chatcmpl- | Inference API — /v1/chat/completions |
"id" starts with resp_ | Inference API — /v1/responses |
Has a choices[] array | Inference API |
Has an output[] array of CDN URLs | Prediction API |
Has status field (queued/processing/succeeded) | Prediction API |
Has object: "chat.completion" or object: "response" | Inference API |
Error has top-level error_id | Prediction API error |
Error has error.error_id nested | Inference API error |
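The heuristics in the table translate directly into code. A sketch of a classifier (ordering matters — check the error shapes first, since error bodies have neither choices nor a UUID id):

```python
import re

UUID_RE = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")

def identify_api(body: dict) -> str:
    """Classify a Skytells response body using the heuristics above."""
    if "error_id" in body:
        return "prediction-error"
    if isinstance(body.get("error"), dict) and "error_id" in body["error"]:
        return "inference-error"
    if body.get("object") in ("chat.completion", "response") or "choices" in body:
        return "inference"
    rid = str(body.get("id", ""))
    if rid.startswith(("chatcmpl-", "resp_")):
        return "inference"
    if UUID_RE.fullmatch(rid):
        return "prediction"
    if body.get("status") in ("queued", "processing", "succeeded", "failed", "canceled"):
        return "prediction"
    return "unknown"
```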
Decision guide — which API to use?
| Task | Use | Endpoint |
|---|---|---|
| Generate image, video, or audio | Prediction API | POST /v1/predictions |
| LLM chat — answer a question | Inference API | POST /v1/chat/completions |
| LLM chat — multi-turn conversation | Inference API | POST /v1/responses |
| Generate vector embeddings | Inference API | POST /v1/embeddings |
| Streaming text to a UI | Inference API | POST /v1/chat/completions with stream: true |
| OpenAI SDK drop-in replacement | Inference API | Change base_url and key only |
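For the streaming row, each server-sent event carries a JSON chunk on a data: line, with data: [DONE] terminating the stream — the standard OpenAI-compatible SSE framing, which this sketch assumes also holds for Skytells:

```python
import json

def parse_sse_line(line: str):
    """Parse one server-sent-events line from a stream: true response.

    Returns the decoded JSON chunk, the string "DONE" for the terminator,
    or None for blank lines and non-data fields.
    """
    line = line.strip()
    if not line or not line.startswith("data:"):
        return None
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        return "DONE"
    return json.loads(payload)
```

In a streaming client you would feed each line of the HTTP response body through this parser and append each chunk's delta content to the UI as it arrives.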
Summary — the key schemas
PREDICTION API
POST /v1/predictions
Body: { model: string, input: { ...model-specific }, webhook?: string }
Returns: Prediction Object
{ id: UUID, status: "queued|processing|succeeded|failed",
type: "inference" ← ALWAYS - not the API name,
output: ["https://cdn.url/..."] ← available when succeeded,
metadata.billing.credits_used: number }
INFERENCE API
POST /v1/chat/completions
Body: { model: string, messages: [{role, content}], stream?: bool, ... }
Returns: { id: "chatcmpl-...", object: "chat.completion",
choices: [{ message: { role, content } }], usage: {...} }
POST /v1/responses
Body: { model: string, input: string|messages, previous_response_id?: string }
Returns: { id: "resp_...", object: "response", output_text: string }
POST /v1/embeddings
Body: { model: string, input: string|string[] }
Returns: { data: [{ embedding: number[] }] }

Up next: hands-on time — you'll make your first real Prediction API call and generate an image.
Models & Billing
Navigate Skytells' full model catalog — image, video, audio, and text/LLM models — and accurately estimate costs before you build.
Your First Prediction
Make your first real API call — generate an AI image with the Prediction API, understand the full async lifecycle, and read outputs correctly.