Chat Completions
OpenAI-compatible chat completions with streaming, tool calling, vision, and structured JSON output.
The Chat API provides OpenAI-compatible chat completions via POST /v1/chat/completions. If you're migrating from OpenAI, Anthropic, or any other provider, the interface is identical — just change the import and API key.
Access it through client.chat.completions. The API is stateless: you pass the full conversation history on every call. For stateful, server-persisted multi-turn conversations, see the Responses API.
Chat vs Responses: Use Chat when you want full control over conversation state (e.g., storing in your database). Use Responses when you want the server to manage conversation context via previous_response_id.
Non-streaming
Returns a complete ChatCompletion response once the model finishes generating. This is the simplest way to get a response — the SDK blocks until the model completes.
Non-streaming

```ts
import Skytells from 'skytells';

const client = Skytells(process.env.SKYTELLS_API_KEY);

const completion = await client.chat.completions.create({
  model: 'deepbrain-router',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is the capital of France?' },
  ],
});

console.log(completion.choices[0].message.content); // "Paris"
console.log(completion.usage);
// { prompt_tokens: 28, completion_tokens: 5, total_tokens: 33 }
```

Streaming
Pass stream: true to get an AsyncIterable<ChatCompletionChunk>. Each chunk contains a delta with partial content. Streaming is ideal for real-time UIs where you want to display tokens as they arrive.
Streaming calls are not retried if they fail after the stream starts. See Reliability.
Streaming

```ts
// Note the await: with stream: true the call resolves to an async iterable of chunks.
const stream = await client.chat.completions.create({
  model: 'deepbrain-router',
  messages: [{ role: 'user', content: 'Tell me a short story.' }],
  stream: true,
});

let fullText = '';
for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content ?? '';
  process.stdout.write(delta);
  fullText += delta;
}
console.log('\nDone:', fullText);
```

Parameters
All parameters follow the OpenAI chat completions schema. The SDK passes them through to the API without modification.
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | ✅ | Model slug (e.g. "deepbrain-router") |
| messages | ChatCompletionMessageParam[] | ✅ | Conversation history |
| stream | boolean | — | true for streaming |
| max_tokens | number | — | Max completion tokens |
| temperature | number | — | Sampling temperature (0–2) |
| top_p | number | — | Nucleus sampling probability |
| n | number | — | Number of completion choices |
| stop | string \| string[] | — | Stop sequences |
| presence_penalty | number | — | Penalise new topics (−2 to 2) |
| frequency_penalty | number | — | Penalise repeated tokens (−2 to 2) |
| logprobs | boolean | — | Include per-token log probabilities |
| top_logprobs | number | — | Top N logprobs per token (requires logprobs: true) |
| tools | ChatCompletionTool[] | — | Function/tool definitions |
| tool_choice | ChatCompletionToolChoiceOption | — | Tool invocation mode |
| user | string | — | End-user identifier |
| response_format | object | — | { type: 'json_object' } or { type: 'text' } |
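For orientation, here is a sketch of a request body exercising several of the optional sampling fields above. The values are arbitrary examples, not recommendations:

```typescript
// A hypothetical request body combining common sampling knobs.
// Every optional field can be omitted; server-side defaults then apply.
const request = {
  model: 'deepbrain-router',
  messages: [{ role: 'user' as const, content: 'Name three rivers.' }],
  max_tokens: 64,    // cap completion length
  temperature: 0.2,  // mostly deterministic sampling
  top_p: 0.9,        // nucleus sampling cutoff
  n: 2,              // request two alternative completions
  stop: ['\n\n'],    // halt at the first blank line
};
```

Each field maps one-to-one to a table row above; the SDK passes them through to the API unchanged.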
Multi-turn Conversations
The API is stateless — you must pass the full conversation history every time.
Append each assistant reply to your messages array before continuing. If you'd prefer the server to manage this for you, see the Responses API which chains conversations via previous_response_id.
Multi-turn

```ts
const messages: ChatCompletionMessageParam[] = [
  { role: 'system', content: 'You are a Python tutor.' },
  { role: 'user', content: 'What is a list comprehension?' },
];

const first = await client.chat.completions.create({
  model: 'deepbrain-router',
  messages,
});

// Append assistant reply
messages.push({
  role: 'assistant',
  content: first.choices[0].message.content,
});

// Continue conversation
messages.push({
  role: 'user',
  content: 'Show me an example with filtering.',
});

const second = await client.chat.completions.create({
  model: 'deepbrain-router',
  messages,
});

console.log(second.choices[0].message.content);
```

Tool Calling
Define tools with a JSON Schema. When finish_reason is "tool_calls", execute the tool and send the result back as a tool message. This enables your model to interact with external systems — APIs, databases, or any function you define.
Always validate and sanitise data extracted from tool arguments before using it in queries or file operations.
Tool Calling

```ts
const tools = [
  {
    type: 'function' as const,
    function: {
      name: 'get_weather',
      description: 'Returns current weather for a location',
      parameters: {
        type: 'object',
        properties: {
          location: { type: 'string', description: 'City name' },
          unit: { type: 'string', enum: ['celsius', 'fahrenheit'] },
        },
        required: ['location'],
      },
    },
  },
];

const completion = await client.chat.completions.create({
  model: 'deepbrain-router',
  messages: [{ role: 'user', content: "What's the weather in Paris?" }],
  tools,
  tool_choice: 'auto',
});

const choice = completion.choices[0];
if (choice.finish_reason === 'tool_calls') {
  const call = choice.message.tool_calls![0];
  console.log(call.function.name);                  // "get_weather"
  const args = JSON.parse(call.function.arguments); // { location: "Paris" }

  // Execute the tool, then send the result back as a `tool` message.
  // `getWeather` here stands in for your own implementation.
  const result = await getWeather(args.location, args.unit);

  const followUp = await client.chat.completions.create({
    model: 'deepbrain-router',
    messages: [
      { role: 'user', content: "What's the weather in Paris?" },
      choice.message, // the assistant message carrying tool_calls
      { role: 'tool', tool_call_id: call.id, content: JSON.stringify(result) },
    ],
    tools,
  });
  console.log(followUp.choices[0].message.content);
}
```

Vision / Image Inputs
Pass images as part of a multi-part content array in user messages. Supports URLs and base64. This is supported by vision-capable models — check the model catalog for supported capabilities.
Vision

```ts
const completion = await client.chat.completions.create({
  model: 'deepbrain-router',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'Describe this image in detail.' },
        {
          type: 'image_url',
          // Accepts a remote URL or a base64 data URL,
          // e.g. `data:image/jpeg;base64,...`
          image_url: { url: 'https://example.com/photo.jpg' },
        },
      ],
    },
  ],
});

console.log(completion.choices[0].message.content);
```

Structured JSON Output
Use response_format: { type: 'json_object' } to constrain the response content to syntactically valid JSON that you can safely JSON.parse(). Combine it with a clear instruction in your prompt for structured extraction, and check that finish_reason is "stop" — a response truncated at max_tokens may be cut off mid-document.
JSON Mode

```ts
const completion = await client.chat.completions.create({
  model: 'deepbrain-router',
  messages: [
    {
      role: 'user',
      content: 'List 3 programming languages with their year of creation. Respond in JSON.',
    },
  ],
  response_format: { type: 'json_object' },
});

const languages = JSON.parse(completion.choices[0].message.content!);
console.log(languages);
```

Finish Reasons
The finish_reason field on each choice tells you why the model stopped generating. Handle each case appropriately:
| finish_reason | Meaning |
|---|---|
| "stop" | Normal completion |
| "length" | Truncated at max_tokens |
| "tool_calls" | Model wants to invoke a tool |
| "content_filter" | Content was filtered — see Safety |
| null | Streaming chunk (not the final chunk) |
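One way to dispatch on these values in client code — a sketch; the action strings are illustrative, not part of the SDK:

```typescript
type FinishReason = 'stop' | 'length' | 'tool_calls' | 'content_filter' | null;

// Map each finish_reason to the caller's next step.
function nextStep(reason: FinishReason): string {
  switch (reason) {
    case 'stop':
      return 'done'; // normal completion
    case 'length':
      return 'raise max_tokens'; // output was truncated
    case 'tool_calls':
      return 'execute tools'; // run the tool, send back a tool message
    case 'content_filter':
      return 'handle filtered content'; // see Safety
    default:
      return 'keep streaming'; // null: a non-final streaming chunk
  }
}
```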
Content Filtering
At Skytells, every inference response includes content filter metadata. Check whether a completion was filtered using the Safety module — no additional API call required:
```ts
const completion = await client.chat.completions.create({
  model: 'deepbrain-router',
  messages: [{ role: 'user', content: userInput }],
});

if (client.safety.wasFiltered(completion)) {
  const categories = client.safety.getFilteredCategories(completion);
  console.warn('Filtered categories:', categories);
}
```

Response Shapes
ChatCompletion

See ChatCompletion in the Reference for the full type.

- id: string
- object: 'chat.completion'
- created: number
- model: string
- choices: ChatCompletionChoice[] (completion choices)
- usage: object (prompt_tokens, completion_tokens, total_tokens)
- system_fingerprint: string

ChatCompletionChoice

- index: number
- message: ChatCompletionMessage
- finish_reason: string (stop | length | tool_calls | content_filter | null)
- content_filter_results: Record<string, unknown>
Response Shapes

```ts
// ChatCompletion shape
{
  id: "chatcmpl_abc123",
  object: "chat.completion",
  created: 1700000000,
  model: "deepbrain-router",
  choices: [{
    index: 0,
    message: {
      role: "assistant",
      content: "Paris is the capital of France."
    },
    finish_reason: "stop"
  }],
  usage: {
    prompt_tokens: 28,
    completion_tokens: 8,
    total_tokens: 36
  }
}
```

Error Handling
```ts
import { SkytellsError } from 'skytells';

try {
  const completion = await client.chat.completions.create({ /* ... */ });
} catch (e) {
  if (e instanceof SkytellsError) {
    switch (e.errorId) {
      case 'RATE_LIMIT_EXCEEDED':
        // Retry after a delay or reduce request rate
        break;
      case 'CONTENT_POLICY_VIOLATION':
        // Prompt or response was flagged
        break;
      case 'REQUEST_TIMEOUT':
        // Increase the client timeout or reduce max_tokens
        break;
    }
    console.error(e.errorId, e.httpStatus, e.message);
  }
}
```

For the full error reference and all error IDs, see Errors. For rate limiting and retry configuration, see Reliability.
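If you prefer to handle RATE_LIMIT_EXCEEDED yourself rather than rely on the SDK's built-in retry configuration, an exponential-backoff wrapper might look like this. The helper and delay values are illustrative, not part of the SDK:

```typescript
// Exponential backoff: 500 ms, 1 s, 2 s, 4 s, capped at 8 s.
function backoffMs(attempt: number): number {
  return Math.min(500 * 2 ** attempt, 8000);
}

// Hypothetical wrapper retrying a call a few times before giving up.
// Do not wrap streaming calls that have already started (see Reliability).
async function withRetries<T>(fn: () => Promise<T>, maxAttempts = 4): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (e) {
      // In practice, rethrow immediately unless the errorId is retryable.
      lastError = e;
      await new Promise((resolve) => setTimeout(resolve, backoffMs(attempt)));
    }
  }
  throw lastError;
}
```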
Related
- Chat Completions REST API — Underlying REST endpoint
- Responses API — Stateful multi-turn conversations with previous_response_id
- Models — Discover available models and their capabilities
- Model Catalog — Browse all available models
- Safety — Content moderation for chat responses (no extra API call)
- Errors — All error IDs and handling patterns
- Reliability — Timeouts, retries, and streaming reliability
Best Practices
- System prompt: Always set a system message to define the assistant's persona and constraints.
- Token budgets: Set max_tokens to prevent unexpectedly large responses and control cost.
- Temperature: Use 0 for deterministic outputs (code, structured data); 0.7–1.0 for creative tasks.
- Content filtering: Check finish_reason === 'content_filter' when accepting user-supplied prompts. See Safety and Responsible AI.
- Multi-turn memory: The API is stateless — pass the full conversation history every time. Use Responses for server-managed state.
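Putting the checklist together, here is a sketch of a request builder; the persona text and budget values are only examples:

```typescript
// Build a request applying the checklist: a system prompt, an explicit
// token budget, and deterministic sampling for structured output.
function buildRequest(userInput: string) {
  return {
    model: 'deepbrain-router',
    messages: [
      { role: 'system' as const, content: 'You are a concise assistant. Answer in JSON.' },
      { role: 'user' as const, content: userInput },
    ],
    max_tokens: 256,                                   // token budget
    temperature: 0,                                    // deterministic output
    response_format: { type: 'json_object' as const }, // structured extraction
  };
}
```

After the call returns, still check finish_reason before trusting the content.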