The Two APIs — Prediction & Inference Schemas

Understand the Prediction API and Inference API side by side — their request schemas, response schemas, lifecycles, error formats, and exactly when to use each one.

What you'll be able to do after this module

Read any Skytells request or response and immediately know which API it belongs to. Understand every field in the Prediction Object and the ChatCompletionResponse. Never confuse the type: "inference" field on Predictions again.


Two APIs, one base URL

The Skytells REST API exposes two distinct sub-APIs under https://api.skytells.ai/v1:

 | Prediction API | Inference API
What it generates | Media — images, video, audio | Text — LLM completions, embeddings
Endpoints | POST /v1/predictions · GET /v1/predictions/:id | POST /v1/chat/completions · POST /v1/responses · POST /v1/embeddings
Response style | Async by default — jobs are queued and polled; synchronous with the wait parameter | Synchronous or streaming (SSE)
Auth header | x-api-key: sk-... | x-api-key: sk-... or Authorization: Bearer sk-...
OpenAI-compatible | No | Yes — swap base_url + key, nothing else changes
Schema family | Skytells Prediction schema | OpenAI-compatible schema

Both use the same API key. Both follow REST conventions. Everything else is different.


The Prediction API

Request schema

POST /v1/predictions

{
  "model": "truefusion-pro",
  "input": {
    "prompt": "A red fox in a snowy forest, golden hour cinematic",
    "aspect_ratio": "16:9",
    "num_outputs": 1,
    "guidance": 3.0
  },
  "webhook": "https://yourapp.com/webhooks/skytells"
}
Field | Type | Required | Description
model | string | yes | The model's namespace value from /v1/models (e.g. truefusion-pro)
input | object | yes | Model-specific input fields. Structure defined by the model's input_schema
webhook | string | no | URL to receive prediction lifecycle events (recommended for video/audio)

The input object's shape varies per model. Every model exposes its input_schema as a JSON Schema object in the /v1/models/:namespace response. That schema defines exactly which fields are valid, required, and their types.

How to read a model's input_schema

# Fetch the input schema for truefusion-pro
curl https://api.skytells.ai/v1/models/truefusion-pro \
  -H "x-api-key: $SKYTELLS_API_KEY" \
  | python3 -m json.tool | grep -A 50 '"input_schema"'

A typical image model input_schema looks like:

{
  "type": "object",
  "title": "Input",
  "required": ["prompt"],
  "properties": {
    "prompt": {
      "type": "string",
      "description": "Text prompt for image generation",
      "x-order": 0
    },
    "aspect_ratio": {
      "type": "string",
      "enum": ["1:1", "16:9", "9:16", "4:3", "3:4"],
      "default": "1:1",
      "description": "Aspect ratio for the generated image",
      "x-order": 1
    },
    "num_outputs": {
      "type": "integer",
      "default": 1,
      "minimum": 1,
      "maximum": 4,
      "description": "Number of images to generate",
      "x-order": 4
    },
    "guidance": {
      "type": "number",
      "default": 3.0,
      "minimum": 0,
      "maximum": 10,
      "description": "Guidance for generated image",
      "x-order": 6
    },
    "seed": {
      "type": "integer",
      "description": "Random seed. Set for reproducible generation",
      "x-order": 7
    }
  }
}
Schema field | What it means
required | Fields the model rejects the request without
properties[field].type | Data type: string, integer, number, boolean, array
properties[field].enum | Allowed values only — passing anything else errors
properties[field].default | Omit the field and the model uses this value
properties[field].minimum / maximum | Numeric bounds
x-order | Display ordering in the Console — no functional effect
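Those keywords are enough to catch most bad requests before they leave your machine. The sketch below is a deliberately simplified client-side check — it covers only the required / enum / minimum / maximum keywords from the table, not full JSON Schema, and it flags unknown fields, which is stricter than JSON Schema itself. For production use, a real validator (e.g. the jsonschema package) is the safer choice.

```python
def check_input(payload: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = []
    # required: reject missing fields up front
    for field in schema.get("required", []):
        if field not in payload:
            problems.append(f"missing required field: {field}")
    for field, value in payload.items():
        spec = schema.get("properties", {}).get(field)
        if spec is None:
            # Stricter than JSON Schema: treat unlisted fields as mistakes.
            problems.append(f"unknown field: {field}")
            continue
        if "enum" in spec and value not in spec["enum"]:
            problems.append(f"{field}: {value!r} not in {spec['enum']}")
        if "minimum" in spec and value < spec["minimum"]:
            problems.append(f"{field}: {value} below minimum {spec['minimum']}")
        if "maximum" in spec and value > spec["maximum"]:
            problems.append(f"{field}: {value} above maximum {spec['maximum']}")
    return problems
```

For example, against the truefusion-pro schema above, `check_input({"prompt": "a fox", "num_outputs": 9}, input_schema)` flags num_outputs as above its maximum of 4.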

Prediction Object schema (response)

Both POST /v1/predictions and GET /v1/predictions/:id return a Prediction Object:

{
  "id": "d05b96fc-7fdf-4528-8e61-aa1092f48040",
  "status": "succeeded",
  "type": "inference",
  "stream": false,
  "input": {
    "prompt": "romantic, love, r&b",
    "lyrics": "[verse]\nYour smile lights the sky..."
  },
  "output": [
    "https://delivery.skytells.cloud/us/2026/03/06/7bfdb9a4.mp3"
  ],
  "created_at": "2026-03-06T20:15:23.000000Z",
  "started_at": "2026-03-06T20:14:35.160110297Z",
  "completed_at": "2026-03-06T20:15:23.293497292Z",
  "updated_at": "2026-03-06T20:15:24.000000Z",
  "privacy": "private",
  "source": "api",
  "model": { "name": "beatfusion-2.0", "type": "audio" },
  "webhook": { "url": null, "events": [] },
  "metrics": { "predict_time": 48.13, "total_time": 48.16 },
  "metadata": {
    "billing": { "credits_used": 0.75 },
    "storage": { "files": [] },
    "data_available": true
  },
  "urls": {
    "get": "https://api.skytells.ai/v1/predictions/d05b96fc-.../",
    "cancel": "https://api.skytells.ai/v1/predictions/d05b96fc-.../cancel",
    "stream": "https://api.skytells.ai/v1/predictions/d05b96fc-.../stream",
    "delete": "https://api.skytells.ai/v1/predictions/d05b96fc-.../delete"
  }
}
Field | Type | Description
id | UUID string | Unique prediction ID — use it to poll status
status | string | queued · processing · succeeded · failed · canceled
type | string | Always "inference" — a fixed label, not the name of the API that served the request
input | object | The exact input you sent
output | array of strings | CDN URLs to the generated files. null until succeeded
model.name | string | Model namespace that ran
model.type | string | "image" · "video" · "audio"
metrics.predict_time | number | Seconds the model spent generating
metrics.total_time | number | Total wall-clock time including queuing
metadata.billing.credits_used | number | Credit cost deducted for this prediction
urls.get | string | Canonical URL to poll this prediction
urls.cancel | string | POST this to cancel if still processing

Prediction lifecycle

POST /v1/predictions → queued → processing → succeeded ✓ (output: CDN URLs) or failed ✗ (error message)
  • Fast image models (e.g. truefusion-edge): often return succeeded in the initial POST — no polling needed
  • Standard image models (e.g. truefusion-pro): 5–20 seconds — poll GET /v1/predictions/:id
  • Video and audio models: 30 seconds to several minutes — use webhooks
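For the polling path, the lifecycle reduces to a loop that re-fetches the Prediction Object until its status is terminal. In this sketch, fetch stands in for an HTTP GET of urls.get (e.g. a requests.get(...).json() call); injecting it keeps the loop self-contained, and the 2-second interval and 10-minute cap are illustrative choices, not documented limits.

```python
import time

# Terminal statuses from the Prediction Object's status field.
TERMINAL = {"succeeded", "failed", "canceled"}

def wait_for_prediction(prediction_id, fetch, interval=2.0, timeout=600.0):
    """Poll until the Prediction Object reaches a terminal status.

    fetch: callable taking a prediction ID and returning the Prediction
    Object as a dict (i.e. a GET of /v1/predictions/:id).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        prediction = fetch(prediction_id)
        if prediction["status"] in TERMINAL:
            return prediction
        time.sleep(interval)
    raise TimeoutError(f"prediction {prediction_id} still running after {timeout}s")
```

With a real HTTP client, fetch might be `lambda pid: requests.get(f"https://api.skytells.ai/v1/predictions/{pid}", headers={"x-api-key": key}).json()` — but for video and audio, webhooks remain the better fit than polling.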

The Inference API

Request schema

POST /v1/chat/completions

{
  "model": "deepbrain-router",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user",   "content": "What is machine learning?" }
  ],
  "stream": false,
  "temperature": 0.7,
  "max_tokens": 512
}
Field | Type | Required | Description
model | string | yes | LLM namespace: deepbrain-router · gpt-5 · gpt-5.4
messages | array | yes | Conversation history. Each item has role (system/user/assistant) and content (string)
stream | boolean | no | true = stream tokens as server-sent events. Default false
max_tokens | integer | no | Max tokens to generate. Default 8192
temperature | number | no | Randomness: 0 = deterministic, 2 = very creative. Default 0.7
top_p | number | no | Nucleus sampling. Default 0.95
frequency_penalty | number | no | Penalise repeated tokens (−2 to 2). Default 0.0
presence_penalty | number | no | Penalise already-used tokens (−2 to 2). Default 0.0
stop | string or array | no | Stop sequences — generation halts at any match
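The defaults above can be folded into a small payload builder that only sends values differing from the server-side defaults, keeping request bodies minimal. This is a hypothetical helper, not part of any Skytells SDK.

```python
# Server-side defaults from the request-schema table above.
DEFAULTS = {
    "stream": False,
    "max_tokens": 8192,
    "temperature": 0.7,
    "top_p": 0.95,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
}

def chat_payload(model: str, messages: list[dict], **options) -> dict:
    """Build a /v1/chat/completions body, dropping options that restate defaults."""
    for key in options:
        if key not in DEFAULTS and key != "stop":
            raise ValueError(f"unknown option: {key}")
    body = {"model": model, "messages": messages}
    body.update({k: v for k, v in options.items() if DEFAULTS.get(k) != v})
    return body
```

For example, `chat_payload("deepbrain-router", msgs, temperature=0.7, max_tokens=256)` omits temperature (it matches the default) but includes max_tokens.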

ChatCompletionResponse schema

{
  "id": "chatcmpl-DKQ7HZtNYLc7uK0Dpn0JggRUUhuBE",
  "object": "chat.completion",
  "created": 1773759323,
  "model": "deepbrain-router",
  "system_fingerprint": "fp_490a4ad033",
  "choices": [
    {
      "index": 0,
      "logprobs": null,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "Machine learning is a branch of AI...",
        "annotations": [],
        "refusal": null
      },
      "content_filter_results": {
        "hate":                    { "filtered": false, "severity": "safe" },
        "self_harm":               { "filtered": false, "severity": "safe" },
        "sexual":                  { "filtered": false, "severity": "safe" },
        "violence":                { "filtered": false, "severity": "safe" },
        "protected_material_code": { "filtered": false, "detected": false },
        "protected_material_text": { "filtered": false, "detected": false }
      }
    }
  ],
  "prompt_filter_results": [
    {
      "prompt_index": 0,
      "content_filter_results": {
        "hate":      { "filtered": false, "severity": "safe" },
        "self_harm": { "filtered": false, "severity": "safe" },
        "sexual":    { "filtered": false, "severity": "safe" },
        "violence":  { "filtered": false, "severity": "safe" },
        "jailbreak": { "filtered": false, "detected": false }
      }
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 10,
    "total_tokens": 19,
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "audio_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    },
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "audio_tokens": 0
    }
  }
}
Field | Type | Description
id | string | Completion ID, prefixed chatcmpl-
object | string | Always "chat.completion"
system_fingerprint | string | Identifies the exact model version that served the request
choices[0].message.content | string | The generated text — this is the main output
choices[0].message.annotations | array | Structured annotations added by the model (citations, tool calls)
choices[0].message.refusal | string or null | Non-null when the model explicitly refused to answer
choices[0].finish_reason | string | stop · length · content_filter
choices[0].content_filter_results | object | Per-category safety analysis of the response
prompt_filter_results[0].content_filter_results | object | Per-category safety analysis of the input prompt
usage.prompt_tokens | integer | Tokens consumed by your messages
usage.completion_tokens | integer | Tokens generated
usage.total_tokens | integer | Sum of both — LLM usage is billed in tokens
usage.completion_tokens_details | object | Breakdown: reasoning, audio, speculative prediction tokens
usage.prompt_tokens_details | object | Breakdown: cached and audio tokens in the prompt
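In practice, reading a ChatCompletionResponse comes down to a handful of these fields: the text at choices[0].message.content, and finish_reason / refusal to decide whether that text is complete and usable. A small illustrative helper (the function name and return shape are this sketch's own, not part of any SDK):

```python
def read_completion(resp: dict) -> dict:
    """Summarise the fields of a ChatCompletionResponse that matter most."""
    choice = resp["choices"][0]
    message = choice["message"]
    return {
        "text": message["content"],                            # the main output
        "refused": message.get("refusal") is not None,         # explicit refusal?
        "truncated": choice["finish_reason"] == "length",      # hit max_tokens
        "filtered": choice["finish_reason"] == "content_filter",
        "tokens_billed": resp["usage"]["total_tokens"],
    }
```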

Stateful conversation — /v1/responses

POST /v1/responses adds stateful multi-turn conversations via previous_response_id. Pass the ID from a previous response and the server replays the conversation history without you resending the full message array.

{
  "model": "deepbrain-router",
  "input": "How is it different from the Inference API?",
  "previous_response_id": "resp_abc123",
  "instructions": "You are a helpful developer assistant."
}

Response (ResponseObject):

{
  "id": "resp_xyz789",
  "object": "response",
  "created_at": 1748000000,
  "model": "deepbrain-router",
  "status": "completed",
  "output_text": "The Prediction API generates media...",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [{ "type": "output_text", "text": "The Prediction API generates media..." }]
    }
  ],
  "usage": { "input_tokens": 12, "output_tokens": 41, "total_tokens": 53 }
}
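The chaining pattern is mechanical: thread each response's id into the next request as previous_response_id. In this sketch, send stands in for the HTTP POST to /v1/responses (e.g. via requests); injecting it keeps the example self-contained.

```python
def converse(turns, send, model="deepbrain-router"):
    """Send each user turn to /v1/responses, threading previous_response_id.

    send: callable taking a request body dict and returning the
    ResponseObject as a dict.
    """
    previous_id = None
    replies = []
    for turn in turns:
        body = {"model": model, "input": turn}
        if previous_id is not None:
            # The server replays history for us — no message array needed.
            body["previous_response_id"] = previous_id
        resp = send(body)
        previous_id = resp["id"]
        replies.append(resp["output_text"])
    return replies
```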

Embeddings — /v1/embeddings

{
  "model": "deepbrain-router",
  "input": "A photorealistic mountain lake at sunrise",
  "encoding_format": "float"
}

Response:

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023064255, -0.009327292, 0.015797101, "..."]
    }
  ],
  "model": "deepbrain-router",
  "usage": { "prompt_tokens": 9, "total_tokens": 9 }
}
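The data[i].embedding vectors are typically compared with cosine similarity — two texts with similar meaning produce vectors that point in similar directions. A minimal pure-Python sketch (no numpy):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("embedding dimensions differ")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```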

Error schemas — two different formats

The two APIs return errors in fundamentally different shapes. Knowing which format to parse is essential.

Prediction API error

{
  "error_id": "INSUFFICIENT_CREDITS",
  "message": "You do not have enough credits to run this prediction"
}

Error fields are top-level: error_id and message sit at the root of the object.

Inference API error (OpenAI-compatible)

{
  "error": {
    "message": "You have run out of credits.",
    "type": "authorization_error",
    "code": "insufficient_credits",
    "error_id": "INSUFFICIENT_CREDITS",
    "status": 402,
    "param": null,
    "request_id": "req_abc123xyz",
    "details": null
  }
}

Error fields are nested under "error". The Inference API error adds type, code, status, param, and request_id.

Always branch on error_id — it's the stable, machine-readable identifier for programmatic handling. Never parse message strings.

def parse_error(body: dict) -> dict:
    # Prediction API: error_id is top-level
    if "error_id" in body:
        return {"source": "prediction", "id": body["error_id"], "msg": body.get("message")}

    # Inference API: nested under 'error'
    if "error" in body:
        err = body["error"]
        return {
            "source": "inference",
            "id": err.get("error_id") or err.get("code"),
            "msg": err.get("message"),
            "request_id": err.get("request_id"),
        }

    # Not a recognised error shape
    return {}

How to tell which API a response came from

You see this in the response body... | It came from...
"id" is a UUID (xxxxxxxx-xxxx-...) | Prediction API
"id" starts with chatcmpl- | Inference API — /v1/chat/completions
"id" starts with resp_ | Inference API — /v1/responses
Has a choices[] array | Inference API
Has an output[] array of CDN URLs | Prediction API
Has a status field (queued/processing/succeeded) | Prediction API
Has object: "chat.completion" or object: "response" | Inference API
Error has top-level error_id | Prediction API error
Error has error.error_id nested | Inference API error
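The table above can be expressed as a small classifier — handy for logging or metrics code that handles bodies from both APIs. This is a heuristic sketch of the rules, not an official utility; the id-prefix checks come first because a ResponseObject also carries a status field.

```python
import re

# Prediction IDs are plain lowercase UUIDs.
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)

def identify_api(body: dict) -> str:
    """Classify a response body as 'prediction', 'inference', or 'unknown'."""
    rid = body.get("id", "")
    if rid.startswith("chatcmpl-") or rid.startswith("resp_"):
        return "inference"
    if "choices" in body or body.get("object") in ("chat.completion", "response"):
        return "inference"
    if isinstance(body.get("error"), dict):
        return "inference"                      # nested error format
    if UUID_RE.match(rid) or "error_id" in body or "status" in body:
        return "prediction"                     # UUID id, or top-level error_id
    return "unknown"
```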

Decision guide — which API to use?

Task | Use | Endpoint
Generate image, video, or audio | Prediction API | POST /v1/predictions
LLM chat — answer a question | Inference API | POST /v1/chat/completions
LLM chat — multi-turn conversation | Inference API | POST /v1/responses
Generate vector embeddings | Inference API | POST /v1/embeddings
Streaming text to a UI | Inference API | POST /v1/chat/completions with stream: true
OpenAI SDK drop-in replacement | Inference API | Change base_url and key only

Summary — the key schemas

PREDICTION API
  POST /v1/predictions
  Body:    { model: string, input: { ...model-specific }, webhook?: string }
  Returns: Prediction Object
           { id: UUID, status: "queued|processing|succeeded|failed",
             type: "inference" ← ALWAYS - not the API name,
             output: ["https://cdn.url/..."] ← available when succeeded,
             metadata.billing.credits_used: number }

INFERENCE API
  POST /v1/chat/completions
  Body:    { model: string, messages: [{role, content}], stream?: bool, ... }
  Returns: { id: "chatcmpl-...", object: "chat.completion",
             choices: [{ message: { role, content } }], usage: {...} }

  POST /v1/responses
  Body:    { model: string, input: string|messages, previous_response_id?: string }
  Returns: { id: "resp_...", object: "response", output_text: string }

  POST /v1/embeddings
  Body:    { model: string, input: string|string[] }
  Returns: { data: [{ embedding: number[] }] }

Up next: hands-on time — you'll make your first real Prediction API call and generate an image.
