Inference API
Skytells Inference API — the gateway for running LLMs, text, code, reasoning, and embedding workloads. Sub-APIs (Chat, Responses, Embeddings) operate under this umbrella.
The Skytells Inference API is the gateway for running heavy AI workloads — large language models (LLMs), text generation, reasoning, code execution, and embeddings. Think of it the same way you think of the Predictions API: it is the umbrella that governs a family of specialized sub-APIs, each with its own endpoint, schema, and streaming behavior. This API is the backbone of the heavy compute and AI workloads that Skytells offers.
The sub-APIs that operate under Inference are:
Chat Completions
POST /v1/chat/completions — Turn-based conversation with LLMs. Streaming supported. Follows the OpenAI Chat Completions schema with additional Skytells safety fields.
Responses
POST /v1/responses — Stateful multi-turn responses with server-side conversation management. Streaming supported. Follows the OpenAI Responses schema with additional Skytells safety fields.
Embeddings
POST /v1/embeddings — Generate vector embeddings for semantic search and similarity. Follows the OpenAI Embeddings schema.
Enterprise Deployments
Run your models on dedicated Skytells infrastructure with private networking and custom hardware tiers.
The Inference API itself is not a single endpoint — it is the namespace under which Chat, Responses, and Embeddings operate. Each sub-API has its own request/response schema. OpenAI compatibility applies at the sub-API level, not at the Inference API level as a whole.
How Inference Differs from Predictions
Both Inference and Predictions run AI models, but they serve different workloads:
| | Predictions API | Inference API |
|---|---|---|
| Primary use | Media generation (images, video, audio) | LLMs, text, code, reasoning, embeddings |
| Execution model | Asynchronous (queue → poll or webhook) | Synchronous or streaming (single connection) |
| Sub-APIs | Single prediction schema | Chat Completions, Responses, Embeddings |
| OpenAI-compatible schemas | No | Yes — at the sub-API level (Chat, Responses, Embeddings) |
| Skytells safety fields | Model-dependent | Yes — content_filter_results on every response |
| Streaming | Task-specific | Yes — via "stream": true or ?stream=true |
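The streaming rows above refer to server-sent events delivered over a single connection. As a rough client-side sketch (assuming the `data: {...}` / `data: [DONE]` framing used by OpenAI-compatible streams; the sample payloads here are illustrative, not real model output):

```python
import json

def parse_sse_chunks(raw: str) -> list[dict]:
    """Parse server-sent-event lines into JSON chunks.

    Assumes the OpenAI-compatible framing: each event is a line
    starting with "data: ", and the stream ends with "data: [DONE]".
    """
    chunks = []
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank separator lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunks.append(json.loads(payload))
    return chunks

raw = (
    'data: {"choices": [{"delta": {"content": "Hel"}}]}\n\n'
    'data: {"choices": [{"delta": {"content": "lo"}}]}\n\n'
    'data: [DONE]\n'
)
text = "".join(c["choices"][0]["delta"]["content"] for c in parse_sse_chunks(raw))
```

In practice the OpenAI SDK handles this framing for you when you pass `stream=True`; the sketch only shows what travels over the wire.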
Sub-API Schemas
Each sub-API under Inference has its own schema. All three currently follow OpenAI-compatible schemas, augmented with Skytells-specific safety fields:
| Sub-API | Endpoint | Schema Base | Skytells Additions |
|---|---|---|---|
| Chat Completions | POST /v1/chat/completions | OpenAI Chat Completions | content_filter_results per choice, prompt_filter_results at root |
| Responses | POST /v1/responses | OpenAI Responses | content_filters[] array at root |
| Embeddings | POST /v1/embeddings | OpenAI Embeddings | — |
Because Chat Completions and Responses follow the OpenAI schema as a base, you can use the official OpenAI SDK by pointing base_url to https://api.skytells.ai/v1. The additional Skytells safety fields are returned alongside the standard fields and are fully optional to consume.
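Since the safety fields are additive, code written against the standard schema keeps working. A minimal sketch of consuming them defensively — only the field name `content_filter_results` comes from the table above; the nested category structure (`"hate"`, `"filtered"`) is an illustrative assumption:

```python
def get_filter_results(response: dict) -> list:
    """Collect per-choice content_filter_results from a Chat Completions
    response dict, tolerating their absence (they are optional to consume)."""
    results = []
    for choice in response.get("choices", []):
        filters = choice.get("content_filter_results")
        if filters is not None:
            results.append(filters)
    return results

# Works whether or not the Skytells fields are present:
plain = {"choices": [{"message": {"content": "hi"}}]}
flagged = {"choices": [{"message": {"content": "hi"},
                        "content_filter_results": {"hate": {"filtered": False}}}]}
```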
Supported Models
These models are routed through the Inference API. See the Model Catalog for full schema and pricing details.
| Model | Namespace | Vendor | Input | Output |
|---|---|---|---|---|
| GPT-5 | gpt-5 | OpenAI | $0.50 / 1M tokens | $1.25 / 1M tokens |
| GPT-5.4 | gpt-5.4 | OpenAI | $0.50 / 1M tokens | $1.25 / 1M tokens |
| GPT-5.3 Codex | gpt-5.3-codex | OpenAI | $0.50 / 1M tokens | $1.25 / 1M tokens |
| DeepBrain Router | deepbrain-router | Skytells | $0.50 / 1M tokens | $1.25 / 1M tokens |
DeepBrain Router (deepbrain-router) is the recommended default. It intelligently routes each request to the best underlying model for the task — no model selection required on your end.
Using the OpenAI SDK with Inference Sub-APIs
Since Chat Completions, Responses, and Embeddings all follow OpenAI-compatible request/response schemas, you can use any OpenAI SDK or client by changing two values:
| Setting | OpenAI | Skytells |
|---|---|---|
| base_url | https://api.openai.com/v1 | https://api.skytells.ai/v1 |
| api_key | Your OpenAI key | Your Skytells key |
Drop-in replacement (Chat Completions)
from openai import OpenAI
client = OpenAI(
api_key="YOUR_SKYTELLS_API_KEY",
base_url="https://api.skytells.ai/v1",
)
response = client.chat.completions.create(
model="deepbrain-router",
messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

Authentication
Authenticate using your Skytells API key in the x-api-key header (or as Authorization: Bearer ... for OpenAI SDK compatibility). Obtain your key from the Console.
x-api-key: YOUR_API_KEY

See Authentication for full details.
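Both header styles can be produced by one small helper. A sketch — the two header names come from this section; the helper itself is illustrative:

```python
def auth_headers(api_key: str, bearer: bool = False) -> dict:
    """Build auth headers for the Inference API.

    The native scheme is the x-api-key header; the Bearer form exists
    for OpenAI SDK compatibility.
    """
    if bearer:
        return {"Authorization": f"Bearer {api_key}"}
    return {"x-api-key": api_key}
```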
Endpoints Overview
| Method | Path | Sub-API | Streaming |
|---|---|---|---|
| POST | /v1/chat/completions | Chat Completions | Yes — "stream": true or ?stream=true |
| POST | /v1/responses | Responses | Yes — "stream": true |
| POST | /v1/embeddings | Embeddings | No |
Enterprise Deployments
Skytells offers enterprise deployments for organizations that need dedicated infrastructure, with private networking and custom hardware tiers.
See Enterprise Deployments for more details.
Error Handling
Errors from any Inference sub-API follow a consistent schema with additional Skytells fields. See Inference Errors for the full error catalog and retry guidance.
{
"error": {
"message": "The model 'unknown-model' was not found.",
"type": "server_error",
"code": "model_not_found",
"error_id": "MODEL_NOT_FOUND",
"status": 404,
"request_id": "req_abc123"
}
}
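The status and code fields in this envelope make retry decisions straightforward. A sketch of one reasonable policy — which statuses are retryable is an assumption here; defer to the Inference Errors reference for the authoritative list:

```python
# Assumption: rate limits (429) and transient server errors are worth retrying.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}

def is_retryable(error_body: dict) -> bool:
    """Decide whether to retry, based on the error envelope shown above."""
    err = error_body.get("error", {})
    return err.get("status") in RETRYABLE_STATUSES

not_found = {"error": {"code": "model_not_found", "status": 404}}
rate_limited = {"error": {"code": "rate_limit_exceeded", "status": 429}}
```

Pair this with exponential backoff, and log the `request_id` field when contacting support.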
Enterprise Deployments
Unlike standard endpoints, Enterprise customers get fully dedicated endpoints that are secured and managed by Skytells, reachable privately via the Skytells Private Network or over the public internet, depending on your organization's settings. Contact Sales for a custom deployment or endpoint.