Conversations API
Reference overview for the Chat Completions sub-API — POST /v1/chat/completions.
The Chat API, also known as the Conversations API, is part of the Inference APIs and provides a turn-based conversation model. It accepts a history of messages and returns the model's next reply, either synchronously or as a stream of SSE chunks, with built-in safety evaluations. The Chat API is ideal for traditional conversational use cases where you maintain a message history and receive the model's response in a chat format. It is fully compatible with the OpenAI Chat Completions API, with Skytells-specific additions for content safety filtering and jailbreak detection. For a modern alternative to chat completions with persistent memory and richer streaming events, see the Responses API; for safety features, see Safety and Responsible AI.
- Endpoint: `POST /v1/chat/completions`
- SDK access: `client.chat.completions.create(params)`
- Streaming: send `"stream": true` — returns `AsyncIterable<ChatCompletionChunk>` (SDK) or SSE (REST)
- OpenAI-compatible: yes — same schema, augmented with Skytells `content_filter_results` and `prompt_filter_results`
How it works
The Conversations API (also known as the Chat API) is designed for turn-based conversations. You send a list of messages representing the conversation history, and the model generates the next reply. Each message has a role (system, user, or assistant) and content (text or multimodal). The API processes these messages in order and produces a response that continues the conversation. You can receive the response as a complete ChatCompletion JSON object, or as a stream of incremental updates (SSE) for real-time applications.
As part of Skytells’ commitment to safety, responsible AI, and the proper use of AI technologies, the API includes built-in content safety evaluations for both input prompts and generated completions, helping you detect, monitor, and filter harmful content.
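The turn-based model above is plain data: keep an array of messages, send the whole array each turn, and append the assistant's reply before the next turn. A minimal sketch in TypeScript — the `client.chat.completions.create` call is shown only as a comment since it needs a live API key, and the reply here is a stand-in value, not real model output:

```typescript
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

// The full conversation history is resent on every turn.
const messages: ChatMessage[] = [
  { role: 'system', content: 'You are a helpful assistant.' },
];

// Add the user's turn, then (in a real app) call the API:
//   const completion = await client.chat.completions.create({ model: 'deepbrain-router', messages });
//   const reply = completion.choices[0].message;
messages.push({ role: 'user', content: 'What is the capital of France?' });
const reply: ChatMessage = { role: 'assistant', content: 'Paris' }; // stand-in for the model's answer

// Append the reply so the next turn sees the whole history.
messages.push(reply);
console.log(messages.length); // 3 messages now form the history for turn two
```

Because the API is stateless, this client-side array is the only conversation state there is; nothing is stored server-side between calls.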
When to use Chat API vs Responses API
| Use Case | Recommendation |
|---|---|
| Simple Q&A, stateless tasks | Chat API — send message history each turn |
| Multi-turn agents, memory across calls | Responses API — use previous_response_id |
| Tool-using agents with persistent context | Responses API — store: true required |
| Low-latency, token-efficient | Chat API (no server storage overhead) |
Create a Chat Completion
Full endpoint reference — every request parameter, response shape, streaming format, and multi-client code examples.
Chat Objects
Named type definitions: ChatCompletion, ChatCompletionChunk, ChatMessage, ContentFilterResults, PromptFilterResults, ChatCompletionUsage.
Quick Example
Create a chat completion
```typescript
import Skytells from 'skytells';

const client = Skytells(process.env.SKYTELLS_API_KEY);

const completion = await client.chat.completions.create({
  model: 'deepbrain-router',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is the capital of France?' },
  ],
});

console.log(completion.choices[0].message.content); // "Paris"
console.log(completion.usage.total_tokens);
```

Returns a ChatCompletion object, or a stream of ChatCompletionChunk objects when `stream: true`.
Conversation API FAQs
Which models can I use with the Chat API?
The Chat API supports all of Skytells' language models along with models from our partners, including general-purpose models like `deepbrain-router` and `gpt-5-nano`, as well as specialized models for tasks like coding, reasoning, or multimodal input. Refer to the Model Catalog for the latest list of available models and their capabilities. When creating a chat completion, specify the desired model in the `model` parameter of your request.
Can I use the Chat API for real-time applications?
Yes! By setting `"stream": true` in your request, you can receive the model's response as a stream of SSE chunks, allowing you to display the response incrementally as it's generated. For even faster responses (under 2 seconds), consider using fast models like `deepbrain-router` or `gpt-5-nano`.
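As a sketch of how a client consumes that stream — assuming the OpenAI-compatible chunk shape, where each chunk carries a `choices[0].delta.content` fragment — the helper below concatenates deltas into the full reply; the mock generator stands in for the `AsyncIterable` the SDK returns:

```typescript
type ChatCompletionChunk = {
  choices: { delta: { content?: string } }[];
};

// Accumulate streamed delta fragments into the complete reply text.
async function collectStream(stream: AsyncIterable<ChatCompletionChunk>): Promise<string> {
  let text = '';
  for await (const chunk of stream) {
    text += chunk.choices[0]?.delta.content ?? '';
  }
  return text;
}

// Mock standing in for: await client.chat.completions.create({ ..., stream: true })
async function* mockStream(): AsyncGenerator<ChatCompletionChunk> {
  for (const piece of ['The capital ', 'of France ', 'is Paris.']) {
    yield { choices: [{ delta: { content: piece } }] };
  }
}

const streamedReply = await collectStream(mockStream());
console.log(streamedReply); // "The capital of France is Paris."
```

In a UI you would typically render each delta as it arrives rather than waiting for the accumulated string.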
How does the Chat API handle content safety?
The Chat API includes built-in content safety evaluations for both input prompts and generated completions. Each response includes `content_filter_results` for the generated completion and `prompt_filter_results` for the input messages, which categorize and rate the severity of any potentially harmful content. You can use this information to implement your own filtering logic or to monitor the safety of interactions. For more details, see the Safety Types and Responsible AI documentation.
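The exact shape of `content_filter_results` is documented in the Chat Objects reference; assuming a category-to-rating map (the category names and severity scale below are illustrative, not confirmed field names), a custom filtering pass might look like:

```typescript
type FilterRating = { filtered: boolean; severity: 'safe' | 'low' | 'medium' | 'high' };
type ContentFilterResults = Record<string, FilterRating>;

// Collect the categories whose rating crosses our own threshold.
function flaggedCategories(results: ContentFilterResults): string[] {
  const blocked: FilterRating['severity'][] = ['medium', 'high'];
  return Object.entries(results)
    .filter(([, rating]) => rating.filtered || blocked.includes(rating.severity))
    .map(([category]) => category);
}

// Illustrative sample; real category names come from the API response.
const sample: ContentFilterResults = {
  hate: { filtered: false, severity: 'safe' },
  violence: { filtered: true, severity: 'medium' },
};
console.log(flaggedCategories(sample)); // ["violence"]
```

The same helper applies unchanged to `prompt_filter_results` if it shares this per-category structure.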
Is the Chat API compatible with the OpenAI Chat Completions API?
Yes! The Chat API follows the same schema as OpenAI's Chat Completions API, with Skytells-specific additions for content safety filtering. You can use the OpenAI SDK with Skytells by simply changing the `baseURL` to `https://api.skytells.ai/v1`. All standard parameters and response formats are supported, along with Skytells' enhanced safety features, which are exposed through Skytells-specific fields in the response. These fields can be safely ignored when using the OpenAI SDK, but you can take full advantage of them with the Skytells SDKs. For more details on the schema and safety features, see the Chat Objects reference.
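A minimal configuration sketch of that swap, assuming the official `openai` npm package — only the `baseURL` and API key change; everything else is standard OpenAI SDK usage:

```typescript
import OpenAI from 'openai';

// Point the official OpenAI SDK at the Skytells-compatible endpoint.
const client = new OpenAI({
  apiKey: process.env.SKYTELLS_API_KEY,
  baseURL: 'https://api.skytells.ai/v1',
});

// From here on, client.chat.completions.create(...) works as usual;
// Skytells-specific fields appear as extra properties on the response.
```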
Which Skytells API handles requests to the Chat API?
The Chat API is part of the Skytells Inference APIs, which also include the Responses API and the Embeddings API.
How do I debug or monitor Chat API usage?
Skytells provides detailed response objects that include usage information (token counts), content safety evaluations, and error messages when applicable. You can log these details in your application for monitoring and debugging purposes. Additionally, Skytells' dashboard offers analytics and insights into your API usage, including breakdowns of which models you're using, how many tokens are being processed, and any safety filter triggers. For more information on the response schema and safety features, see the Chat Objects reference and the Safety Types documentation.
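As a sketch of that logging, assuming the OpenAI-compatible `usage` shape (`prompt_tokens`, `completion_tokens`, `total_tokens`), a small helper can summarize each response for your application logs:

```typescript
type ChatCompletionUsage = {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
};

// One-line summary suitable for structured application logs.
function summarizeUsage(model: string, usage: ChatCompletionUsage): string {
  return `${model}: ${usage.prompt_tokens} prompt + ${usage.completion_tokens} completion = ${usage.total_tokens} tokens`;
}

// Illustrative values; in practice pass completion.model and completion.usage.
const logLine = summarizeUsage('deepbrain-router', {
  prompt_tokens: 21,
  completion_tokens: 5,
  total_tokens: 26,
});
console.log(logLine); // "deepbrain-router: 21 prompt + 5 completion = 26 tokens"
```

Emitting this per request gives you the same per-model token breakdown the dashboard aggregates, but inside your own logging pipeline.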