Inference API
Skytells Inference API — the gateway for running LLMs, text, code, reasoning, and embedding workloads. Sub-APIs (Chat, Responses, Embeddings) operate under this umbrella.
The Skytells Inference API is the gateway for running heavy AI workloads — large language models (LLMs), text generation, reasoning, code execution, and embeddings. Think of it the same way you think of the Predictions API: it is the umbrella that governs a family of specialized sub-APIs, each with its own endpoint, schema, and streaming behavior. This API is the backbone of the heavy compute and AI workloads that Skytells offers.
The sub-APIs that operate under Inference are:
Chat Completions
POST /v1/chat/completions — Turn-based conversation with LLMs. Streaming supported. Follows the OpenAI Chat Completions schema with additional Skytells safety fields.
Responses
POST /v1/responses — Stateful multi-turn responses with server-side conversation management. Streaming supported. Follows the OpenAI Responses schema with additional Skytells safety fields.
Embeddings
POST /v1/embeddings — Generate vector embeddings for semantic search and similarity. Follows the OpenAI Embeddings schema.
Enterprise Deployments
Run your models on dedicated Skytells infrastructure with private networking and custom hardware tiers.
The Inference API itself is not a single endpoint — it is the namespace under which Chat, Responses, and Embeddings operate. Each sub-API has its own request/response schema. OpenAI compatibility applies at the sub-API level, not at the Inference API level as a whole.
How Inference Differs from Predictions
Both Inference and Predictions run AI models, but they serve different workloads:
| | Predictions API | Inference API |
|---|---|---|
| Primary use | Media generation (images, video, audio) | LLMs, text, code, reasoning, embeddings |
| Execution model | Asynchronous (queue → poll or webhook) | Synchronous or streaming (single connection) |
| Sub-APIs | Single prediction schema | Chat Completions, Responses, Embeddings |
| OpenAI-compatible schemas | No | Yes — at the sub-API level (Chat, Responses, Embeddings) |
| Skytells safety fields | Model-dependent | Yes — content_filter_results on every response |
| Streaming | Task-specific | Yes — via "stream": true or ?stream=true |
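The streaming rows above refer to server-sent events delivered over a single connection. As a rough client-side sketch (assuming the `data: {...}` / `data: [DONE]` framing used by OpenAI-compatible streams; the sample payloads here are illustrative, not real model output):

```python
import json

def parse_sse_chunks(raw: str) -> list[dict]:
    """Parse server-sent-event lines into JSON chunks.

    Assumes the OpenAI-compatible framing: each event is a line
    starting with "data: ", and the stream ends with "data: [DONE]".
    """
    chunks = []
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank separator lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunks.append(json.loads(payload))
    return chunks

raw = (
    'data: {"choices": [{"delta": {"content": "Hel"}}]}\n\n'
    'data: {"choices": [{"delta": {"content": "lo"}}]}\n\n'
    'data: [DONE]\n'
)
text = "".join(c["choices"][0]["delta"]["content"] for c in parse_sse_chunks(raw))
```

In practice the OpenAI SDK handles this framing for you when you pass `stream=True`; the sketch only shows what travels over the wire.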
Sub-API Schemas
Each sub-API under Inference has its own schema. All three currently follow OpenAI-compatible schemas, augmented with Skytells-specific safety fields:
| Sub-API | Endpoint | Schema Base | Skytells Additions |
|---|---|---|---|
| Chat Completions | POST /v1/chat/completions | OpenAI Chat Completions | content_filter_results per choice, prompt_filter_results at root |
| Responses | POST /v1/responses | OpenAI Responses | content_filters[] array at root |
| Embeddings | POST /v1/embeddings | OpenAI Embeddings | — |
Because Chat Completions and Responses follow the OpenAI schema as a base, you can use the official OpenAI SDK by pointing base_url to https://api.skytells.ai/v1. The additional Skytells safety fields are returned alongside the standard fields and are fully optional to consume.
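Since the safety fields are additive, code written against the standard schema keeps working. A minimal sketch of consuming them defensively — only the field name `content_filter_results` comes from the table above; the nested category structure (`"hate"`, `"filtered"`) is an illustrative assumption:

```python
def get_filter_results(response: dict) -> list:
    """Collect per-choice content_filter_results from a Chat Completions
    response dict, tolerating their absence (they are optional to consume)."""
    results = []
    for choice in response.get("choices", []):
        filters = choice.get("content_filter_results")
        if filters is not None:
            results.append(filters)
    return results

# Works whether or not the Skytells fields are present:
plain = {"choices": [{"message": {"content": "hi"}}]}
flagged = {"choices": [{"message": {"content": "hi"},
                        "content_filter_results": {"hate": {"filtered": False}}}]}
```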
Supported Models
These models are routed through the Inference API. See the Model Catalog for full schema and pricing details.
| Model | Namespace | Vendor | Input | Output |
|---|---|---|---|---|
| GPT-5 | gpt-5 | OpenAI | $0.50 / 1M tokens | $1.25 / 1M tokens |
| GPT-5.4 | gpt-5.4 | OpenAI | $0.50 / 1M tokens | $1.25 / 1M tokens |
| GPT-5.3 Codex | gpt-5.3-codex | OpenAI | $0.50 / 1M tokens | $1.25 / 1M tokens |
| DeepBrain Router | deepbrain-router | Skytells | $0.50 / 1M tokens | $1.25 / 1M tokens |
DeepBrain Router (deepbrain-router) is the recommended default. It intelligently routes each request to the best underlying model for the task — no model selection required on your end.
Using the OpenAI SDK with Inference Sub-APIs
Since Chat Completions, Responses, and Embeddings all follow OpenAI-compatible request/response schemas, you can use any OpenAI SDK or client by changing two values:
| Setting | OpenAI | Skytells |
|---|---|---|
| base_url | https://api.openai.com/v1 | https://api.skytells.ai/v1 |
| api_key | Your OpenAI key | Your Skytells key |
Drop-in replacement (Chat Completions)
from openai import OpenAI
client = OpenAI(
api_key="YOUR_SKYTELLS_API_KEY",
base_url="https://api.skytells.ai/v1",
)
response = client.chat.completions.create(
model="deepbrain-router",
messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

Authentication
Authenticate using your Skytells API key in the x-api-key header (or as Authorization: Bearer ... for OpenAI SDK compatibility). Obtain your key from the Console.
x-api-key: YOUR_API_KEY

See Authentication for full details.
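Both header styles can be produced by one small helper. A sketch — the two header names come from this section; the helper itself is illustrative:

```python
def auth_headers(api_key: str, bearer: bool = False) -> dict:
    """Build auth headers for the Inference API.

    The native scheme is the x-api-key header; the Bearer form exists
    for OpenAI SDK compatibility.
    """
    if bearer:
        return {"Authorization": f"Bearer {api_key}"}
    return {"x-api-key": api_key}
```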
Endpoints Overview
| Method | Path | Sub-API | Streaming |
|---|---|---|---|
| POST | /v1/chat/completions | Chat Completions | Yes — "stream": true or ?stream=true |
| POST | /v1/responses | Responses | Yes — "stream": true |
| POST | /v1/embeddings | Embeddings | No |
Enterprise Deployments
Skytells offers enterprise deployments for organizations that need dedicated infrastructure, with private networking and custom hardware tiers.
See Enterprise Deployments for more details.
Error Handling
Errors from any Inference sub-API follow a consistent schema with additional Skytells fields. See Inference Errors for the full error catalog and retry guidance.
{
"error": {
"message": "The model 'unknown-model' was not found.",
"type": "server_error",
"code": "model_not_found",
"error_id": "MODEL_NOT_FOUND",
"status": 404,
"request_id": "req_abc123"
}
}
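The status and code fields in this envelope make retry decisions straightforward. A sketch of one reasonable policy — which statuses are retryable is an assumption here; defer to the Inference Errors reference for the authoritative list:

```python
# Assumption: rate limits (429) and transient server errors are worth retrying.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}

def is_retryable(error_body: dict) -> bool:
    """Decide whether to retry, based on the error envelope shown above."""
    err = error_body.get("error", {})
    return err.get("status") in RETRYABLE_STATUSES

not_found = {"error": {"code": "model_not_found", "status": 404}}
rate_limited = {"error": {"code": "rate_limit_exceeded", "status": 429}}
```

Pair this with exponential backoff, and log the `request_id` field when contacting support.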
Enterprise Deployments
Unlike standard endpoints, Enterprise customers get fully dedicated endpoints that are secured and managed by Skytells, reachable privately via the Skytells Private Network or over the public internet, depending on your organization's settings. Contact Sales for a custom deployment or endpoint.