Inference API

The Skytells Inference API is the gateway for running heavy AI workloads — large language models (LLMs), text generation, reasoning, code execution, and embeddings. Think of it the same way you think of the Predictions API: it is the umbrella that governs a family of specialized sub-APIs, each with its own endpoint, schema, and streaming behavior. Together, these sub-APIs form the backbone of the heavy compute that Skytells offers.

The sub-APIs that operate under Inference are:

- Chat Completions
- Responses
- Embeddings

How Inference Differs from Predictions

Both Inference and Predictions run AI models, but they serve different workloads:

| | Predictions API | Inference API |
| --- | --- | --- |
| Primary use | Media generation (images, video, audio) | LLMs, text, code, reasoning, embeddings |
| Execution model | Asynchronous (queue → poll or webhook) | Synchronous or streaming (single connection) |
| Sub-APIs | Single prediction schema | Chat Completions, Responses, Embeddings |
| OpenAI-compatible schemas | No | Yes — at the sub-API level (Chat, Responses, Embeddings) |
| Skytells safety fields | Model-dependent | Yes — `content_filter_results` on every response |
| Streaming | Task-specific | Yes — via `"stream": true` or `?stream=true` |

Sub-API Schemas

Each sub-API under Inference has its own schema. All three currently follow OpenAI-compatible schemas, augmented with Skytells-specific safety fields:

| Sub-API | Endpoint | Schema Base | Skytells Additions |
| --- | --- | --- | --- |
| Chat Completions | `POST /v1/chat/completions` | OpenAI Chat Completions | `content_filter_results` per choice, `prompt_filter_results` at root |
| Responses | `POST /v1/responses` | OpenAI Responses | `content_filters[]` array at root |
| Embeddings | `POST /v1/embeddings` | OpenAI Embeddings | |
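
Because the safety additions are plain JSON fields, they can be read straight off the parsed response body. A sketch, assuming each entry in `content_filter_results` is keyed by category name and carries a boolean `filtered` flag (the response fragment below is hypothetical, for illustration only):

```python
def flagged_categories(choice: dict) -> list[str]:
    """Return the filter categories that flagged this choice.

    Assumes content_filter_results maps category names to objects
    with a boolean "filtered" field.
    """
    results = choice.get("content_filter_results", {})
    return [name for name, result in results.items() if result.get("filtered")]


# Hypothetical choice from a parsed Chat Completions response.
choice = {
    "message": {"role": "assistant", "content": "..."},
    "content_filter_results": {
        "hate": {"filtered": False},
        "violence": {"filtered": True},
    },
}
print(flagged_categories(choice))  # ['violence']
```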

Supported Models

These models are routed through the Inference API. See the Model Catalog for full schema and pricing details.

| Model | Namespace | Vendor | Input | Output |
| --- | --- | --- | --- | --- |
| GPT-5 | `gpt-5` | OpenAI | $0.50 / 1M tokens | $1.25 / 1M tokens |
| GPT-5.4 | `gpt-5.4` | OpenAI | $0.50 / 1M tokens | $1.25 / 1M tokens |
| GPT-5.3 Codex | `gpt-5.3-codex` | OpenAI | $0.50 / 1M tokens | $1.25 / 1M tokens |
| DeepBrain Router | `deepbrain-router` | Skytells | $0.50 / 1M tokens | $1.25 / 1M tokens |

Using the OpenAI SDK with Inference Sub-APIs

Since Chat Completions, Responses, and Embeddings all follow OpenAI-compatible request/response schemas, you can use any OpenAI SDK or client by changing two values:

| Setting | OpenAI | Skytells |
| --- | --- | --- |
| `base_url` | `https://api.openai.com/v1` | `https://api.skytells.ai/v1` |
| `api_key` | Your OpenAI key | Your Skytells key |

Drop-in replacement (Chat Completions)

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_SKYTELLS_API_KEY",
    base_url="https://api.skytells.ai/v1",
)

response = client.chat.completions.create(
    model="deepbrain-router",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```
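
The same two-value swap covers the Embeddings sub-API. A sketch with the OpenAI Python SDK; the model name is a placeholder (no embedding model is listed above — pick one from the Model Catalog), and the cosine-similarity helper is illustrative, not part of the API:

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def main():
    # Requires: pip install openai
    from openai import OpenAI

    client = OpenAI(
        api_key="YOUR_SKYTELLS_API_KEY",
        base_url="https://api.skytells.ai/v1",
    )
    # POST /v1/embeddings follows the OpenAI Embeddings schema.
    response = client.embeddings.create(
        model="YOUR_EMBEDDING_MODEL",  # placeholder: see the Model Catalog
        input=["Hello, world!", "Hi there!"],
    )
    a, b = (item.embedding for item in response.data)
    print(cosine_similarity(a, b))


if __name__ == "__main__":
    main()
```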

Authentication

Authenticate using your Skytells API key in the x-api-key header (or as Authorization: Bearer ... for OpenAI SDK compatibility). Obtain your key from the Console.

x-api-key: YOUR_API_KEY

See Authentication for full details.
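
Either header form can be attached to a raw HTTP request without any SDK. A standard-library sketch; the request body mirrors the Chat Completions example on this page, and the request is built but not sent:

```python
import json
import urllib.request


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build an authenticated POST to the Chat Completions endpoint."""
    return urllib.request.Request(
        "https://api.skytells.ai/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,  # or: "Authorization": f"Bearer {api_key}"
        },
        method="POST",
    )


req = build_request(
    "YOUR_API_KEY",
    {"model": "deepbrain-router", "messages": [{"role": "user", "content": "Hello!"}]},
)
# Send with urllib.request.urlopen(req) once a real key is in place.
print(req.full_url)
```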


Endpoints Overview

| Method | Path | Sub-API | Streaming |
| --- | --- | --- | --- |
| POST | `/v1/chat/completions` | Chat Completions | Yes — `"stream": true` or `?stream=true` |
| POST | `/v1/responses` | Responses | Yes — `"stream": true` |
| POST | `/v1/embeddings` | Embeddings | No |
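
When streaming is enabled, the body arrives as server-sent events. Assuming the OpenAI-style framing implied by the compatible schemas (each event is a `data: {...}` line, terminated by `data: [DONE]`), a small parser sketch:

```python
import json


def iter_sse_chunks(lines):
    """Yield parsed JSON chunks from OpenAI-style SSE 'data:' lines.

    Assumes each event is a 'data: {...}' line and the stream ends
    with 'data: [DONE]'.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


# Illustrative stream fragment.
raw = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(
    c["choices"][0]["delta"].get("content", "") for c in iter_sse_chunks(raw)
)
print(text)  # Hello
```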

Enterprise Deployments

Skytells offers enterprise deployments for organizations that need dedicated infrastructure, with custom hardware tiers and private networking options.

See Enterprise Deployments for more details.

Error Handling

Errors from any Inference sub-API follow a consistent schema with additional Skytells fields. See Inference Errors for the full error catalog and retry guidance.

```json
{
  "error": {
    "message": "The model 'unknown-model' was not found.",
    "type": "server_error",
    "code": "model_not_found",
    "error_id": "MODEL_NOT_FOUND",
    "status": 404,
    "request_id": "req_abc123"
  }
}
```
