The Two APIs — Prediction & Inference Schemas
Understand the Prediction API and Inference API side by side — their request schemas, response schemas, lifecycles, error formats, and exactly when to use each one.
What you'll be able to do after this module
Read any Skytells request or response and immediately know which API it belongs to. Understand every field in the Prediction Object and the ChatCompletionResponse. Never confuse the type: "inference" field on Predictions again.
Two APIs, one base URL
The Skytells REST API exposes two distinct sub-APIs under https://api.skytells.ai/v1:
| | Prediction API | Inference API |
|---|---|---|
| What it generates | Media — images, video, audio | Text — LLM completions, embeddings |
| Endpoints | POST /v1/predictions · GET /v1/predictions/:id | POST /v1/chat/completions · POST /v1/responses · POST /v1/embeddings |
| Response style | Async by default — jobs are queued and polled; sync when the wait parameter is enabled | Synchronous or streaming (SSE) |
| Auth header | x-api-key: sk-... | x-api-key: sk-... or Authorization: Bearer sk-... |
| OpenAI-compatible | No | Yes — swap base_url + key, nothing else changes |
| Schema family | Skytells Prediction schema | OpenAI-compatible schema |
Both use the same API key. Both follow REST conventions. Everything else is different.
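Because both sub-APIs accept the same key, the only thing that varies is how you present it. A minimal sketch of the two documented header styles — the helper name `auth_headers` is illustrative, not part of any Skytells SDK:

```python
BASE_URL = "https://api.skytells.ai/v1"

def auth_headers(api_key: str, bearer: bool = False) -> dict:
    """Build request headers for either sub-API.

    The Prediction API accepts x-api-key; the Inference API accepts
    either x-api-key or an OpenAI-style Authorization: Bearer header.
    """
    if bearer:
        return {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}
    return {"x-api-key": api_key, "Content-Type": "application/json"}
```

The Bearer form is what makes OpenAI SDK compatibility possible: an OpenAI client pointed at `BASE_URL` sends exactly this header.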
The Prediction API
Request schema
POST /v1/predictions
{
"model": "truefusion-pro",
"input": {
"prompt": "A red fox in a snowy forest, golden hour cinematic",
"aspect_ratio": "16:9",
"num_outputs": 1,
"guidance": 3.0
},
"webhook": "https://yourapp.com/webhooks/skytells"
}

| Field | Type | Required | Description |
|---|---|---|---|
model | string | ✓ | The model's namespace value from /v1/models (e.g. truefusion-pro) |
input | object | ✓ | Model-specific input fields. Structure defined by the model's input_schema |
webhook | string | — | URL to receive prediction lifecycle events (recommended for video/audio) |
The input object's shape varies per model. Every model exposes its input_schema as a JSON Schema object in the /v1/models/:namespace response. That schema defines exactly which fields are valid, required, and their types.
How to read a model's input_schema
# Fetch the input schema for truefusion-pro
curl https://api.skytells.ai/v1/models/truefusion-pro \
-H "x-api-key: $SKYTELLS_API_KEY" \
  | python3 -m json.tool | grep -A 50 '"input_schema"'

A typical image model input_schema looks like:
{
"type": "object",
"title": "Input",
"required": ["prompt"],
"properties": {
"prompt": {
"type": "string",
"description": "Text prompt for image generation",
"x-order": 0
},
"aspect_ratio": {
"type": "string",
"enum": ["1:1", "16:9", "9:16", "4:3", "3:4"],
"default": "1:1",
"description": "Aspect ratio for the generated image",
"x-order": 1
},
"num_outputs": {
"type": "integer",
"default": 1,
"minimum": 1,
"maximum": 4,
"description": "Number of images to generate",
"x-order": 4
},
"guidance": {
"type": "number",
"default": 3.0,
"minimum": 0,
"maximum": 10,
"description": "Guidance for generated image",
"x-order": 6
},
"seed": {
"type": "integer",
"description": "Random seed. Set for reproducible generation",
"x-order": 7
}
}
}

| Schema field | What it means |
|---|---|
required | Fields the model will reject if missing |
properties[field].type | Data type: string, integer, number, boolean, array |
properties[field].enum | Allowed values only — passing anything else errors |
properties[field].default | Omit this field and the model uses this value |
properties[field].minimum / maximum | Numeric bounds |
x-order | Display ordering in the Console — no functional effect |
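These rules can be enforced client-side before spending credits on a request. Below is a deliberately minimal validator — it only covers required, type, enum, and numeric bounds, not full JSON Schema; for complete coverage use a real JSON Schema library such as jsonschema:

```python
_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool, "array": list}

def validate_input(schema: dict, payload: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for field in schema.get("required", []):
        if field not in payload:
            errors.append(f"missing required field: {field}")
    for field, value in payload.items():
        spec = schema.get("properties", {}).get(field)
        if spec is None:
            continue  # unknown fields: let the server decide
        expected = _TYPES.get(spec.get("type"))
        if expected and not isinstance(value, expected):
            errors.append(f"{field}: expected {spec['type']}")
            continue
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"{field}: must be one of {spec['enum']}")
        if "minimum" in spec and value < spec["minimum"]:
            errors.append(f"{field}: below minimum {spec['minimum']}")
        if "maximum" in spec and value > spec["maximum"]:
            errors.append(f"{field}: above maximum {spec['maximum']}")
    return errors
```

Running it against the truefusion-pro schema above catches a missing prompt, an out-of-enum aspect_ratio, or num_outputs outside 1–4 before the API ever sees the request.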
Prediction Object schema (response)
Both POST /v1/predictions and GET /v1/predictions/:id return a Prediction Object:
{
"id": "d05b96fc-7fdf-4528-8e61-aa1092f48040",
"status": "succeeded",
"type": "inference",
"stream": false,
"input": {
"prompt": "romantic, love, r&b",
"lyrics": "[verse]\nYour smile lights the sky..."
},
"output": [
"https://delivery.skytells.cloud/us/2026/03/06/7bfdb9a4.mp3"
],
"created_at": "2026-03-06T20:15:23.000000Z",
"started_at": "2026-03-06T20:14:35.160110297Z",
"completed_at": "2026-03-06T20:15:23.293497292Z",
"updated_at": "2026-03-06T20:15:24.000000Z",
"privacy": "private",
"source": "api",
"model": { "name": "beatfusion-2.0", "type": "audio" },
"webhook": { "url": null, "events": [] },
"metrics": { "predict_time": 48.13, "total_time": 48.16 },
"metadata": {
"billing": { "credits_used": 0.75 },
"storage": { "files": [] },
"data_available": true
},
"urls": {
"get": "https://api.skytells.ai/v1/predictions/d05b96fc-.../",
"cancel": "https://api.skytells.ai/v1/predictions/d05b96fc-.../cancel",
"stream": "https://api.skytells.ai/v1/predictions/d05b96fc-.../stream",
"delete": "https://api.skytells.ai/v1/predictions/d05b96fc-.../delete"
}
}

| Field | Type | Description |
|---|---|---|
id | UUID string | Unique prediction ID — use to poll status |
status | string | queued · processing · succeeded · failed · canceled |
type | string | Always "inference" — see note below |
input | object | The exact input you sent |
output | array of strings | CDN URLs to the generated files. null until succeeded |
model.name | string | Model namespace that ran |
model.type | string | "image" · "video" · "audio" |
metrics.predict_time | number | Seconds the model spent generating |
metrics.total_time | number | Total wall-clock time including queuing |
metadata.billing.credits_used | number | Credit cost deducted for this prediction |
urls.get | string | Canonical URL to poll this prediction |
urls.cancel | string | POST this to cancel if still processing |
type: "inference" does NOT mean this object came from the Inference API. Every prediction — whether it generates an image, video, or audio — has type: "inference" because all AI model runs are inference computations internally. This field describes the computation type, not the API route. You'll always get this field on Prediction Objects regardless of which model you ran.
Prediction lifecycle
- Fast image models (e.g. truefusion-edge): often return succeeded in the initial POST — no polling needed
- Standard image models (e.g. truefusion-pro): 5–20 seconds — poll GET /v1/predictions/:id
- Video and audio models: 30 seconds to several minutes — use webhooks
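The lifecycle above implies a poll-until-terminal loop. Here is a sketch with the HTTP call injected as a callable, so the loop stays transport-agnostic (in real use, fetch would GET /v1/predictions/:id with your x-api-key header; the function and parameter names are mine):

```python
import time

TERMINAL = {"succeeded", "failed", "canceled"}

def poll_prediction(fetch, interval: float = 2.0, timeout: float = 300.0) -> dict:
    """Call fetch() until the prediction reaches a terminal status.

    fetch: zero-argument callable returning the Prediction Object as a dict.
    Raises TimeoutError if no terminal status arrives within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        prediction = fetch()
        if prediction["status"] in TERMINAL:
            return prediction
        if time.monotonic() > deadline:
            raise TimeoutError(f"prediction still {prediction['status']} after {timeout}s")
        time.sleep(interval)
```

For video and audio models, prefer the webhook field over this loop so you are not holding a poller open for minutes.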
The Inference API
Request schema
POST /v1/chat/completions
{
"model": "deepbrain-router",
"messages": [
{ "role": "system", "content": "You are a helpful assistant." },
{ "role": "user", "content": "What is machine learning?" }
],
"stream": false,
"temperature": 0.7,
"max_tokens": 512
}

| Field | Type | Required | Description |
|---|---|---|---|
model | string | ✓ | LLM namespace: deepbrain-router · gpt-5 · gpt-5.4 |
messages | array | ✓ | Conversation history. Each item has role (system/user/assistant) and content (string) |
stream | boolean | — | true = stream tokens as server-sent events. Default false |
max_tokens | integer | — | Max tokens to generate. Default 8192 |
temperature | number | — | Randomness: 0 = deterministic, 2 = very creative. Default 0.7 |
top_p | number | — | Nucleus sampling. Default 0.95 |
frequency_penalty | number | — | Penalise repeated tokens (−2 to 2). Default 0.0 |
presence_penalty | number | — | Penalise already-used tokens (−2 to 2). Default 0.0 |
stop | string or array | — | Stop sequences — generation halts at any match |
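The defaults in the table can be made explicit in a small payload builder — an illustrative helper, not an official SDK, with default values taken directly from the table above:

```python
def build_chat_request(model: str, messages: list, **overrides) -> dict:
    """Assemble a /v1/chat/completions body, starting from the documented defaults."""
    body = {
        "model": model,
        "messages": messages,
        "stream": False,
        "max_tokens": 8192,
        "temperature": 0.7,
        "top_p": 0.95,
        "frequency_penalty": 0.0,
        "presence_penalty": 0.0,
    }
    body.update(overrides)  # e.g. stream=True, stop=["\n\n"]
    return body
```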
ChatCompletionResponse schema
{
"id": "chatcmpl-DKQ7HZtNYLc7uK0Dpn0JggRUUhuBE",
"object": "chat.completion",
"created": 1773759323,
"model": "deepbrain-router",
"system_fingerprint": "fp_490a4ad033",
"choices": [
{
"index": 0,
"logprobs": null,
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": "Machine learning is a branch of AI...",
"annotations": [],
"refusal": null
},
"content_filter_results": {
"hate": { "filtered": false, "severity": "safe" },
"self_harm": { "filtered": false, "severity": "safe" },
"sexual": { "filtered": false, "severity": "safe" },
"violence": { "filtered": false, "severity": "safe" },
"protected_material_code": { "filtered": false, "detected": false },
"protected_material_text": { "filtered": false, "detected": false }
}
}
],
"prompt_filter_results": [
{
"prompt_index": 0,
"content_filter_results": {
"hate": { "filtered": false, "severity": "safe" },
"self_harm": { "filtered": false, "severity": "safe" },
"sexual": { "filtered": false, "severity": "safe" },
"violence": { "filtered": false, "severity": "safe" },
"jailbreak": { "filtered": false, "detected": false }
}
}
],
"usage": {
"prompt_tokens": 9,
"completion_tokens": 10,
"total_tokens": 19,
"completion_tokens_details": {
"reasoning_tokens": 0,
"audio_tokens": 0,
"accepted_prediction_tokens": 0,
"rejected_prediction_tokens": 0
},
"prompt_tokens_details": {
"cached_tokens": 0,
"audio_tokens": 0
}
}
}

Responsible AI & Safety: Skytells enforces content safety on every inference response. The content_filter_results field on each choice shows real-time analysis of the model's output. The prompt_filter_results array does the same for your input. A filtered: true value means Skytells blocked that content from being sent or received. Your application should check finish_reason: "content_filter" to detect when a response was halted by the safety layer.
| Field | Type | Description |
|---|---|---|
id | string | Completion ID, prefix chatcmpl- |
object | string | Always "chat.completion" |
system_fingerprint | string | Identifies the exact model version that served the request |
choices[0].message.content | string | The generated text — this is the main output |
choices[0].message.annotations | array | Structured annotations added by the model (citations, tool calls) |
choices[0].message.refusal | string | null | Non-null when the model explicitly refused to answer |
choices[0].finish_reason | string | stop · length · content_filter |
choices[0].content_filter_results | object | Per-category safety analysis of the response |
prompt_filter_results[0].content_filter_results | object | Per-category safety analysis of the input prompt |
usage.prompt_tokens | integer | Tokens consumed by your messages |
usage.completion_tokens | integer | Tokens generated |
usage.total_tokens | integer | Sum — billed in tokens for LLM models |
usage.completion_tokens_details | object | Breakdown: reasoning, audio, speculative prediction tokens |
usage.prompt_tokens_details | object | Breakdown: cached, audio tokens in the prompt |
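Putting the finish_reason and refusal guidance into practice, a sketch of defensive response handling (the helper name is mine):

```python
def extract_completion(response: dict) -> str:
    """Return the assistant text, raising on safety-filtered or refused output."""
    choice = response["choices"][0]
    if choice["finish_reason"] == "content_filter":
        raise RuntimeError("response was halted by the safety layer")
    message = choice["message"]
    if message.get("refusal"):
        raise RuntimeError(f"model refused: {message['refusal']}")
    return message["content"]
```

Checking finish_reason before touching message.content matters because a filtered response may carry a null or truncated content field.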
Stateful conversation — /v1/responses
POST /v1/responses adds stateful multi-turn conversations via previous_response_id. Pass the ID from a previous response and the server replays the conversation history without you resending the full message array.
{
"model": "deepbrain-router",
"input": "How is it different from the Inference API?",
"previous_response_id": "resp_abc123",
"instructions": "You are a helpful developer assistant."
}

Response (ResponseObject):
{
"id": "resp_xyz789",
"object": "response",
"created_at": 1748000000,
"model": "deepbrain-router",
"status": "completed",
"output_text": "The Prediction API generates media...",
"output": [
{
"type": "message",
"role": "assistant",
"content": [{ "type": "output_text", "text": "The Prediction API generates media..." }]
}
],
"usage": { "input_tokens": 12, "output_tokens": 41, "total_tokens": 53 }
}

Embeddings — /v1/embeddings
{
"model": "deepbrain-router",
"input": "A photorealistic mountain lake at sunrise",
"encoding_format": "float"
}

Response:
{
"object": "list",
"data": [
{
"object": "embedding",
"index": 0,
"embedding": [0.0023064255, -0.009327292, 0.015797101, "..."]
}
],
"model": "deepbrain-router",
"usage": { "prompt_tokens": 9, "total_tokens": 9 }
}

Error schemas — two different formats
The two APIs return errors in fundamentally different shapes. Knowing which format to parse is essential.
Prediction API error
{
"error_id": "INSUFFICIENT_CREDITS",
"message": "You do not have enough credits to run this prediction"
}

Error fields are top-level — error_id and message are at the root of the object.
Inference API error (OpenAI-compatible)
{
"error": {
"message": "You have run out of credits.",
"type": "authorization_error",
"code": "insufficient_credits",
"error_id": "INSUFFICIENT_CREDITS",
"status": 402,
"param": null,
"request_id": "req_abc123xyz",
"details": null
}
}

Error fields are nested under "error". The Inference API error adds type, code, status, param, and request_id.
Always branch on error_id — it's the stable, machine-readable identifier for programmatic handling. Never parse message strings.
def parse_error(body: dict) -> dict:
    # Prediction API: error_id is top-level
    if "error_id" in body:
        return {"source": "prediction", "id": body["error_id"], "msg": body["message"]}
    # Inference API: nested under 'error'
    if "error" in body:
        err = body["error"]
        return {
            "source": "inference",
            "id": err["error_id"],
            "msg": err["message"],
            "request_id": err.get("request_id"),
        }
    return {}

How to tell which API a response came from
| You see this in the response body... | It came from... |
|---|---|
"id" is a UUID (xxxxxxxx-xxxx-...) | Prediction API |
"id" starts with chatcmpl- | Inference API — /v1/chat/completions |
"id" starts with resp_ | Inference API — /v1/responses |
Has a choices[] array | Inference API |
Has an output[] array of CDN URLs | Prediction API |
Has status field (queued/processing/succeeded) | Prediction API |
Has object: "chat.completion" or object: "response" | Inference API |
Error has top-level error_id | Prediction API error |
Error has error.error_id nested | Inference API error |
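The heuristics in the table translate directly into code. A sketch of a classifier (ordering matters — check the error shapes first, since error bodies have neither choices nor a UUID id):

```python
import re

UUID_RE = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")

def identify_api(body: dict) -> str:
    """Classify a Skytells response body using the heuristics above."""
    if "error_id" in body:
        return "prediction-error"
    if isinstance(body.get("error"), dict) and "error_id" in body["error"]:
        return "inference-error"
    if body.get("object") in ("chat.completion", "response") or "choices" in body:
        return "inference"
    rid = str(body.get("id", ""))
    if rid.startswith(("chatcmpl-", "resp_")):
        return "inference"
    if UUID_RE.fullmatch(rid):
        return "prediction"
    if body.get("status") in ("queued", "processing", "succeeded", "failed", "canceled"):
        return "prediction"
    return "unknown"
```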
Decision guide — which API to use?
| Task | Use | Endpoint |
|---|---|---|
| Generate image, video, or audio | Prediction API | POST /v1/predictions |
| LLM chat — answer a question | Inference API | POST /v1/chat/completions |
| LLM chat — multi-turn conversation | Inference API | POST /v1/responses |
| Generate vector embeddings | Inference API | POST /v1/embeddings |
| Streaming text to a UI | Inference API | POST /v1/chat/completions with stream: true |
| OpenAI SDK drop-in replacement | Inference API | Change base_url and key only |
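For the streaming row, each server-sent event carries a JSON chunk on a data: line, with data: [DONE] terminating the stream — the standard OpenAI-compatible SSE framing, which this sketch assumes also holds for Skytells:

```python
import json

def parse_sse_line(line: str):
    """Parse one server-sent-events line from a stream: true response.

    Returns the decoded JSON chunk, the string "DONE" for the terminator,
    or None for blank lines and non-data fields.
    """
    line = line.strip()
    if not line or not line.startswith("data:"):
        return None
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        return "DONE"
    return json.loads(payload)
```

In a streaming client you would feed each line of the HTTP response body through this parser and append each chunk's delta content to the UI as it arrives.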
Summary — the key schemas
PREDICTION API
POST /v1/predictions
Body: { model: string, input: { ...model-specific }, webhook?: string }
Returns: Prediction Object
{ id: UUID, status: "queued|processing|succeeded|failed",
type: "inference" ← ALWAYS - not the API name,
output: ["https://cdn.url/..."] ← available when succeeded,
metadata.billing.credits_used: number }
INFERENCE API
POST /v1/chat/completions
Body: { model: string, messages: [{role, content}], stream?: bool, ... }
Returns: { id: "chatcmpl-...", object: "chat.completion",
choices: [{ message: { role, content } }], usage: {...} }
POST /v1/responses
Body: { model: string, input: string|messages, previous_response_id?: string }
Returns: { id: "resp_...", object: "response", output_text: string }
POST /v1/embeddings
Body: { model: string, input: string|string[] }
Returns: { data: [{ embedding: number[] }] }

Up next: hands-on time — you'll make your first real Prediction API call and generate an image.
Models & Billing
Navigate Skytells' full model catalog — image, video, audio, and text/LLM models — and accurately estimate costs before you build.
Your First Prediction
Make your first real API call — generate an AI image with the Prediction API, understand the full async lifecycle, and read outputs correctly.