API Schema

Skytells API v1 — a RESTful, resource-based API for AI predictions, models, and more.

API v1

The Skytells API v1 is a RESTful, resource-based API. Every endpoint maps to a resource — models, predictions, webhooks — following standard HTTP conventions.

All endpoints are served under:

https://api.skytells.ai/v1

How It Works

A single request travels through four stages before you receive a response:

Your app sends a request with its API key in the x-api-key header, and the request passes through:

1. Auth — the key is checked; an invalid key returns a 401 error.
2. Validate — the input is checked; bad input returns a 400 error.
3. Process — the request is executed.
4. Response — the result is returned to your app.
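As a sketch, the same flow expressed with Python's standard library. `build_request` is a hypothetical helper (not part of any Skytells SDK), and the path and key values are placeholders:

```python
import json
import urllib.request

API_BASE = "https://api.skytells.ai/v1"

def build_request(path, api_key, payload=None):
    """Build an authenticated Skytells API request.

    The x-api-key header is checked at the Auth stage (invalid key
    -> 401); the body is checked at the Validate stage (bad input
    -> 400) before the request is processed.
    """
    data = json.dumps(payload).encode() if payload is not None else None
    return urllib.request.Request(
        API_BASE + path,
        data=data,
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST" if payload is not None else "GET",
    )

# urllib.request.urlopen(build_request("/models", "YOUR_API_KEY")) would
# then perform the call; here we only construct the request object.
req = build_request("/models", "YOUR_API_KEY")
```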

API Schemas

The API is organized around two interaction models.

Predictions handles media generation, while Inference covers other tasks such as text, code, reasoning, and compute.

Difference between Predictions and Inference

| Task | Predictions | Inference |
| --- | --- | --- |
| Media generation | Yes | Depends on the model |
| Text, code, reasoning, compute | Depends on the model | Yes |
| Compute | Skytells-tasks | Yes |
| Synchronous | Yes | Yes |
| Webhooks | Yes | Task-specific |
| Polling | Yes | Yes |
| OpenAI-compatible | No | Yes — at the sub-API level: Chat Completions, Responses, and Embeddings each follow OpenAI-compatible schemas |
| Streaming | Task-specific | Yes, when stream is enabled |
| Schema | Prediction Object Schema | Sub-API specific: Chat Completions, Responses, Embeddings — all follow OpenAI-compatible schemas augmented with Skytells safety fields |

How models are mapped to APIs

Each model is mapped to the appropriate API based on its capabilities and requirements. Every model has a unique API type, and each API type has a unique schema. These schemas are defined in the Model Catalog, including the inference API for each model.

PLAYGROUND / CONSOLE

If you're running inference on a model or creating a prediction, you can use the Playground / Console to test the model and inspect the response schema. The schema and inference endpoint are clearly shown in the UI under the "API" tab.


API Flow

Here's how the API endpoints orchestrate the two interaction models:

Skytells API v1 routes every request through an Orchestrator Layer, which fans out to:

• Models API — GET /v1/models, GET /v1/models/:slug
• Predictions API — POST /v1/predictions, GET /v1/predictions, GET /v1/predictions/:id, DELETE /v1/predictions/:id
• Inference API — POST /v1/chat/completions, POST /v1/responses, POST /v1/embeddings
• Webhooks — task status events
• Status — service health

When API v1 receives a request, Skytells's Orchestrator Layer determines whether it should be handled by the Predictions API or the Inference API. As task state changes occur within either API, webhook events are emitted through the Webhook Dispatcher to notify subscribed systems.

For the Inference API flow, jump to the Inference API Lifecycle section.


Prediction Lifecycle

A prediction moves through a series of states from the moment you create it until it completes. At each transition, Skytells can fire a webhook to your server so you never need to poll.

POST /v1/predictions creates the prediction in the pending state. Once a worker picks it up, it moves to starting; once the model is loaded, to processing; and once output is ready, to succeeded. If an error occurs, the prediction transitions to failed, and if the user cancels it before completion, to cancelled.

Status Meanings

| Status | Description |
| --- | --- |
| pending | Request accepted, waiting for a worker to pick it up |
| starting | Worker assigned, model is loading into memory |
| processing | Model is actively running inference on your input |
| succeeded | Inference completed — outputs are available in output[] |
| failed | An error occurred — check the error field for details |
| cancelled | Cancelled by the user before completion |
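Since succeeded, failed, and cancelled are the only terminal statuses, a polling loop can stop as soon as one of them appears. A hedged sketch (`wait_for_prediction` and `fetch` are hypothetical names, not SDK functions; `fetch` stands in for a GET /v1/predictions/:id call):

```python
import time

TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}

def wait_for_prediction(fetch, interval=2.0, sleep=time.sleep):
    """Poll a prediction until it reaches a terminal status.

    `fetch` is any callable returning the current prediction JSON.
    """
    while True:
        prediction = fetch()
        if prediction["status"] in TERMINAL_STATUSES:
            return prediction
        sleep(interval)

# Stubbed walk through the lifecycle: pending -> processing -> succeeded.
states = iter([{"status": "pending"}, {"status": "processing"},
               {"status": "succeeded", "output": ["..."]}])
final = wait_for_prediction(lambda: next(states), sleep=lambda _: None)
```

In production you would prefer webhooks over this loop, as the section above notes.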

Webhook Triggers

Every status transition can fire a webhook to your server. Subscribe to any combination of events on each prediction request:

| Event | Fired when status reaches |
| --- | --- |
| prediction.started | processing |
| prediction.succeeded | succeeded |
| prediction.failed | failed |
| prediction.cancelled | cancelled |
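The prediction object's webhook field carries a url and an events[] list, so a creation body that subscribes to two events might look like this sketch (the model slug, the input field, and the helper name are assumptions for illustration, not the documented request schema):

```python
def prediction_payload(model, inputs, webhook_url, events):
    """Assemble a POST /v1/predictions body that subscribes the given
    webhook URL to a set of lifecycle events."""
    return {
        "model": model,                     # hypothetical field name
        "input": inputs,                    # hypothetical field name
        "webhook": {"url": webhook_url, "events": list(events)},
    }

payload = prediction_payload(
    "example/image-model",                  # made-up model slug
    {"prompt": "a lighthouse at dusk"},
    "https://example.com/hooks/skytells",
    ["prediction.succeeded", "prediction.failed"],
)
```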

Prediction Object Schema

The prediction object has these top-level fields and nested objects:

• id: string, status: string, type: string, stream: boolean, source: string, privacy: string
• created_at, started_at, completed_at, updated_at: string
• model — name: string, type: string
• metrics — predict_time: float, total_time: float
• metadata — billing: Billing, where billing carries credits_used: float
• webhook — url: string, events: string[]
• urls — get: string, cancel: string, stream: string, delete: string

For detailed information on the Prediction Object JSON Schema, see the Prediction API reference.
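As a reading aid, a sketch that pulls a few fields out of a prediction object shaped like the schema above (the sample values are fabricated, and `summarize` is a hypothetical helper):

```python
def summarize(prediction):
    """Pull the fields a dashboard typically needs out of a prediction."""
    return {
        "id": prediction["id"],
        "status": prediction["status"],
        "model": prediction["model"]["name"],
        "predict_time": prediction["metrics"]["predict_time"],
        "credits_used": prediction["metadata"]["billing"]["credits_used"],
    }

# Fabricated sample following the schema's field names.
example = {
    "id": "pred_123", "status": "succeeded",
    "model": {"name": "example/image-model", "type": "prediction"},
    "metrics": {"predict_time": 1.8, "total_time": 2.4},
    "metadata": {"billing": {"credits_used": 0.5}},
}
info = summarize(example)
```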


Inference Lifecycle

Inference requests are synchronous — you send a request and receive a response in a single connection. There is no polling, no prediction ID, and no webhook needed.

Your app sends POST /v1/chat/completions (or /v1/responses, /v1/embeddings) to the Skytells Gateway. The Authorization Gateway authorizes the request, then the Model Router resolves the model namespace and makes a routing decision:

• Model served by Skytells — tokens are generated on Skytells's global GPU infrastructure.
• Model served by a partner — the request is forwarded to the partner's infrastructure via contract, which generates the tokens.

In both cases the response comes back in the sub-API schema (Chat / Responses / Embeddings).

Inference Infrastructure

Where a request is executed depends on the model:

| Model Source | Inference Location | Notes |
| --- | --- | --- |
| Skytells native models | Skytells global infrastructure | Low-latency, edge-compatible nodes. No data leaves Skytells. |
| Partner-provided models | Partner infrastructure | Execution is delegated to the partner under a formal infrastructure agreement. Skytells remains the billing and API layer. |
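Whichever infrastructure serves the model, the request body is the same. A minimal Chat Completions body following the OpenAI-compatible base schema (the model slug is a made-up placeholder):

```python
def chat_completion_body(model, user_message, stream=False):
    """Minimal OpenAI-compatible body for POST /v1/chat/completions."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": stream,
    }

body = chat_completion_body("example/chat-model", "Hello!")
```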

Inference API Schema

The Inference API is not a single schema — it is an umbrella for three sub-APIs, each with its own schema:

  • Chat Completions — OpenAI Chat Completions schema, plus Skytells content_filter_results and prompt_filter_results safety fields.
  • Responses — OpenAI Responses schema, plus Skytells content_filters[] array for prompt and completion safety.
  • Embeddings — OpenAI Embeddings schema.

Because the sub-APIs follow OpenAI-compatible schemas as their base, existing OpenAI SDK code requires only a base_url change. The additional Skytells safety fields are returned alongside the standard fields and are safe to ignore.
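For example, a response reader can stick to the standard fields and leave the safety fields untouched. The sample response below is fabricated, and the placement of the filter fields is illustrative only:

```python
def extract_text(response):
    """Read the standard Chat Completions field; the extra Skytells
    safety fields ride alongside and can simply be ignored."""
    return response["choices"][0]["message"]["content"]

sample = {
    "choices": [{
        "message": {"role": "assistant", "content": "Hi there!"},
        "finish_reason": "stop",
        "content_filter_results": {"hate": {"filtered": False}},  # Skytells extra
    }],
    "prompt_filter_results": [],  # Skytells extra
}
text = extract_text(sample)
```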

With base_url = https://api.skytells.ai/v1, your app calls the Skytells Gateway, which routes Chat / Responses / Embeddings traffic to the matching inference sub-API and returns the OpenAI-compatible schema plus Skytells safety fields.

Authentication

All requests require your API key in the x-api-key header. For setup, examples, and security best practices, see Authentication.


HTTP Methods

| Method | Usage | Example |
| --- | --- | --- |
| GET | Retrieve a resource or list | GET /v1/models |
| POST | Create a new resource | POST /v1/predictions |
| DELETE | Remove a resource | DELETE /v1/predictions/{id} |

Pagination

List endpoints support pagination via query parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| per_page | integer | 15 | Results per page (max 50) |
| page | integer | 1 | Page number |
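A page-walking sketch under the parameters above. `fetch_page` is a stand-in for the actual HTTP call, and stopping on a short page is an assumption about how the endpoint signals the final page:

```python
def paginate(fetch_page, per_page=15):
    """Walk a list endpoint page by page.

    `fetch_page(page, per_page)` should issue e.g.
    GET /v1/predictions?page=2&per_page=15 and return the list of items.
    """
    page = 1
    while True:
        items = fetch_page(page, per_page)
        yield from items
        if len(items) < per_page:  # short page -> assume no more pages
            break
        page += 1

# Stub: 5 items total, 2 per page -> pages of 2, 2, then 1.
data = list(range(5))
fetch = lambda page, per_page: data[(page - 1) * per_page : page * per_page]
collected = list(paginate(fetch, per_page=2))
```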

Error Codes

All errors return a JSON body with an error_id and message. See the full API Errors reference for descriptions and resolution steps.

| HTTP Status | error_id |
| --- | --- |
| 400 | INVALID_INPUT · INVALID_PARAMETER |
| 401 | UNAUTHORIZED · ACCOUNT_SUSPENDED |
| 402 | PAYMENT_REQUIRED · INSUFFICIENT_CREDITS |
| 403 | SECURITY_VIOLATION |
| 404 | MODEL_NOT_FOUND |
| 429 | RATE_LIMIT_EXCEEDED |
| 500 | INTERNAL_ERROR |
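A sketch of client-side error handling based on this table. Which error_ids are worth retrying is a client policy choice, not something the API mandates:

```python
# Client policy (assumption): rate limits and server faults are transient.
RETRYABLE = {"RATE_LIMIT_EXCEEDED", "INTERNAL_ERROR"}

def classify_error(status, body):
    """Decide how to react to an error response; every error body
    carries an error_id and a message."""
    error_id = body["error_id"]
    if error_id in RETRYABLE:
        return "retry"
    if status == 402:
        return "top_up_credits"
    return "fix_request"

action = classify_error(429, {"error_id": "RATE_LIMIT_EXCEEDED",
                              "message": "Too many requests"})
```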

Resources
