Chat Completions
OpenAI-compatible chat completions with streaming, tool calling, vision, and structured JSON output.
The Chat API provides OpenAI-compatible chat completions via POST /v1/chat/completions. If you're migrating from OpenAI, Anthropic, or any other provider, the interface is identical — just change the import and API key.
Access it through client.chat.completions. The API is stateless: you pass the full conversation history on every call. For stateful, server-persisted multi-turn conversations, see the Responses API.
Chat vs Responses: Use Chat when you want full control over conversation state (e.g., storing in your database). Use Responses when you want the server to manage conversation context via previous_response_id.
Non-streaming
Returns a complete ChatCompletion response once the model finishes generating. This is the simplest way to get a response — the SDK blocks until the model completes.
Non-streaming

```ts
import Skytells from 'skytells';

const client = Skytells(process.env.SKYTELLS_API_KEY);

const completion = await client.chat.completions.create({
  model: 'deepbrain-router',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is the capital of France?' },
  ],
});

console.log(completion.choices[0].message.content); // "Paris"
console.log(completion.usage);
// { prompt_tokens: 28, completion_tokens: 5, total_tokens: 33 }
```

Streaming
Pass stream: true to get an AsyncIterable<ChatCompletionChunk>. Each chunk contains a delta with partial content. Streaming is ideal for real-time UIs where you want to display tokens as they arrive.
Streaming calls are not retried if they fail after the stream starts. See Reliability.
Streaming

```ts
// Note the await: with stream: true the call resolves to an async iterable of chunks.
const stream = await client.chat.completions.create({
  model: 'deepbrain-router',
  messages: [{ role: 'user', content: 'Tell me a short story.' }],
  stream: true,
});

let fullText = '';
for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content ?? '';
  process.stdout.write(delta);
  fullText += delta;
}
console.log('\nDone:', fullText);
```

Parameters
All parameters follow the OpenAI chat completions schema. The SDK passes them through to the API without modification.
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | ✅ | Model slug (e.g. "deepbrain-router") |
| messages | ChatCompletionMessageParam[] | ✅ | Conversation history |
| stream | boolean | — | true for streaming |
| max_tokens | number | — | Max completion tokens |
| temperature | number | — | Sampling temperature (0–2) |
| top_p | number | — | Nucleus sampling probability |
| n | number | — | Number of completion choices |
| stop | string \| string[] | — | Stop sequences |
| presence_penalty | number | — | Penalise new topics (−2 to 2) |
| frequency_penalty | number | — | Penalise repeated tokens (−2 to 2) |
| logprobs | boolean | — | Include per-token log probabilities |
| top_logprobs | number | — | Top N logprobs per token (requires logprobs: true) |
| tools | ChatCompletionTool[] | — | Function/tool definitions |
| tool_choice | ChatCompletionToolChoiceOption | — | Tool invocation mode |
| user | string | — | End-user identifier |
| response_format | object | — | { type: 'json_object' } or { type: 'text' } |
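For orientation, here is a sketch of a request body exercising several of the optional sampling fields above. The values are arbitrary examples, not recommendations:

```typescript
// A hypothetical request body combining common sampling knobs.
// Every optional field can be omitted; server-side defaults then apply.
const request = {
  model: 'deepbrain-router',
  messages: [{ role: 'user' as const, content: 'Name three rivers.' }],
  max_tokens: 64,    // cap completion length
  temperature: 0.2,  // mostly deterministic sampling
  top_p: 0.9,        // nucleus sampling cutoff
  n: 2,              // request two alternative completions
  stop: ['\n\n'],    // halt at the first blank line
};
```

Each field maps one-to-one to a table row above; the SDK passes them through to the API unchanged.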
Multi-turn Conversations
The API is stateless — you must pass the full conversation history every time.
Append each assistant reply to your messages array before continuing. If you'd prefer the server to manage this for you, see the Responses API which chains conversations via previous_response_id.
Multi-turn

```ts
const messages: ChatCompletionMessageParam[] = [
  { role: 'system', content: 'You are a Python tutor.' },
  { role: 'user', content: 'What is a list comprehension?' },
];

const first = await client.chat.completions.create({
  model: 'deepbrain-router',
  messages,
});

// Append assistant reply
messages.push({
  role: 'assistant',
  content: first.choices[0].message.content,
});

// Continue conversation
messages.push({
  role: 'user',
  content: 'Show me an example with filtering.',
});

const second = await client.chat.completions.create({
  model: 'deepbrain-router',
  messages,
});

console.log(second.choices[0].message.content);
```

Tool Calling
Define tools with a JSON Schema. When finish_reason is "tool_calls", execute the tool and send the result back as a tool message. This enables your model to interact with external systems — APIs, databases, or any function you define.
Always validate and sanitise data extracted from tool arguments before using it in queries or file operations.
Tool Calling

```ts
const tools = [
  {
    type: 'function' as const,
    function: {
      name: 'get_weather',
      description: 'Returns current weather for a location',
      parameters: {
        type: 'object',
        properties: {
          location: { type: 'string', description: 'City name' },
          unit: { type: 'string', enum: ['celsius', 'fahrenheit'] },
        },
        required: ['location'],
      },
    },
  },
];

const completion = await client.chat.completions.create({
  model: 'deepbrain-router',
  messages: [{ role: 'user', content: "What's the weather in Paris?" }],
  tools,
  tool_choice: 'auto',
});

const choice = completion.choices[0];
if (choice.finish_reason === 'tool_calls') {
  const call = choice.message.tool_calls![0];
  console.log(call.function.name);                  // "get_weather"
  const args = JSON.parse(call.function.arguments); // { location: "Paris" }

  // Execute the tool, then send the result back as a `tool` message.
  // `getWeather` here stands in for your own implementation.
  const result = await getWeather(args.location, args.unit);

  const followUp = await client.chat.completions.create({
    model: 'deepbrain-router',
    messages: [
      { role: 'user', content: "What's the weather in Paris?" },
      choice.message, // the assistant message carrying tool_calls
      { role: 'tool', tool_call_id: call.id, content: JSON.stringify(result) },
    ],
    tools,
  });
  console.log(followUp.choices[0].message.content);
}
```

Vision / Image Inputs
Pass images as part of a multi-part content array in user messages. Supports URLs and base64. This is supported by vision-capable models — check the model catalog for supported capabilities.
Vision

```ts
const completion = await client.chat.completions.create({
  model: 'deepbrain-router',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'Describe this image in detail.' },
        {
          type: 'image_url',
          // Accepts a remote URL or a base64 data URL,
          // e.g. `data:image/jpeg;base64,...`
          image_url: { url: 'https://example.com/photo.jpg' },
        },
      ],
    },
  ],
});

console.log(completion.choices[0].message.content);
```

Structured JSON Output
Use response_format: { type: 'json_object' } to constrain the response content to syntactically valid JSON that you can safely JSON.parse(). Combine it with a clear instruction in your prompt for structured extraction, and check that finish_reason is "stop" — a response truncated at max_tokens may be cut off mid-document.
JSON Mode

```ts
const completion = await client.chat.completions.create({
  model: 'deepbrain-router',
  messages: [
    {
      role: 'user',
      content: 'List 3 programming languages with their year of creation. Respond in JSON.',
    },
  ],
  response_format: { type: 'json_object' },
});

const languages = JSON.parse(completion.choices[0].message.content!);
console.log(languages);
```

Finish Reasons
The finish_reason field on each choice tells you why the model stopped generating. Handle each case appropriately:
| finish_reason | Meaning |
|---|---|
| "stop" | Normal completion |
| "length" | Truncated at max_tokens |
| "tool_calls" | Model wants to invoke a tool |
| "content_filter" | Content was filtered — see Safety |
| null | Streaming chunk (not the final chunk) |
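One way to dispatch on these values in client code — a sketch; the action strings are illustrative, not part of the SDK:

```typescript
type FinishReason = 'stop' | 'length' | 'tool_calls' | 'content_filter' | null;

// Map each finish_reason to the caller's next step.
function nextStep(reason: FinishReason): string {
  switch (reason) {
    case 'stop':
      return 'done'; // normal completion
    case 'length':
      return 'raise max_tokens'; // output was truncated
    case 'tool_calls':
      return 'execute tools'; // run the tool, send back a tool message
    case 'content_filter':
      return 'handle filtered content'; // see Safety
    default:
      return 'keep streaming'; // null: a non-final streaming chunk
  }
}
```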
Content Filtering
At Skytells, every inference response includes content filter metadata. Check whether a completion was filtered using the Safety module — no additional API call required:
```ts
const completion = await client.chat.completions.create({
  model: 'deepbrain-router',
  messages: [{ role: 'user', content: userInput }],
});

if (client.safety.wasFiltered(completion)) {
  const categories = client.safety.getFilteredCategories(completion);
  console.warn('Filtered categories:', categories);
}
```

Response Shapes
ChatCompletion

See ChatCompletion in the Reference for the full type.

- id: string
- object: 'chat.completion'
- created: number
- model: string
- choices: ChatCompletionChoice[] (completion choices)
- usage: object (prompt_tokens, completion_tokens, total_tokens)
- system_fingerprint: string

ChatCompletionChoice

- index: number
- message: ChatCompletionMessage
- finish_reason: string (stop | length | tool_calls | content_filter | null)
- content_filter_results: Record<string, unknown>
Response Shapes

```ts
// ChatCompletion shape
{
  id: "chatcmpl_abc123",
  object: "chat.completion",
  created: 1700000000,
  model: "deepbrain-router",
  choices: [{
    index: 0,
    message: {
      role: "assistant",
      content: "Paris is the capital of France."
    },
    finish_reason: "stop"
  }],
  usage: {
    prompt_tokens: 28,
    completion_tokens: 8,
    total_tokens: 36
  }
}
```

Error Handling
```ts
import { SkytellsError } from 'skytells';

try {
  const completion = await client.chat.completions.create({ /* ... */ });
} catch (e) {
  if (e instanceof SkytellsError) {
    switch (e.errorId) {
      case 'RATE_LIMIT_EXCEEDED':
        // Retry after a delay or reduce request rate
        break;
      case 'CONTENT_POLICY_VIOLATION':
        // Prompt or response was flagged
        break;
      case 'REQUEST_TIMEOUT':
        // Increase the client timeout or reduce max_tokens
        break;
    }
    console.error(e.errorId, e.httpStatus, e.message);
  }
}
```

For the full error reference and all error IDs, see Errors. For rate limiting and retry configuration, see Reliability.
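If you prefer to handle RATE_LIMIT_EXCEEDED yourself rather than rely on the SDK's built-in retry configuration, an exponential-backoff wrapper might look like this. The helper and delay values are illustrative, not part of the SDK:

```typescript
// Exponential backoff: 500 ms, 1 s, 2 s, 4 s, capped at 8 s.
function backoffMs(attempt: number): number {
  return Math.min(500 * 2 ** attempt, 8000);
}

// Hypothetical wrapper retrying a call a few times before giving up.
// Do not wrap streaming calls that have already started (see Reliability).
async function withRetries<T>(fn: () => Promise<T>, maxAttempts = 4): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (e) {
      // In practice, rethrow immediately unless the errorId is retryable.
      lastError = e;
      await new Promise((resolve) => setTimeout(resolve, backoffMs(attempt)));
    }
  }
  throw lastError;
}
```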
Related
- Chat Completions REST API — Underlying REST endpoint
- Responses API — Stateful multi-turn conversations with previous_response_id
- Models — Discover available models and their capabilities
- Model Catalog — Browse all available models
- Safety — Content moderation for chat responses (no extra API call)
- Errors — All error IDs and handling patterns
- Reliability — Timeouts, retries, and streaming reliability
Best Practices
- System prompt: Always set a system message to define the assistant's persona and constraints.
- Token budgets: Set max_tokens to prevent unexpectedly large responses and control cost.
- Temperature: Use 0 for deterministic outputs (code, structured data); 0.7–1.0 for creative tasks.
- Content filtering: Check finish_reason === 'content_filter' when accepting user-supplied prompts. See Safety and Responsible AI.
- Multi-turn memory: The API is stateless — pass the full conversation history every time. Use Responses for server-managed state.
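Putting the checklist together, here is a sketch of a request builder; the persona text and budget values are only examples:

```typescript
// Build a request applying the checklist: a system prompt, an explicit
// token budget, and deterministic sampling for structured output.
function buildRequest(userInput: string) {
  return {
    model: 'deepbrain-router',
    messages: [
      { role: 'system' as const, content: 'You are a concise assistant. Answer in JSON.' },
      { role: 'user' as const, content: userInput },
    ],
    max_tokens: 256,                                   // token budget
    temperature: 0,                                    // deterministic output
    response_format: { type: 'json_object' as const }, // structured extraction
  };
}
```

After the call returns, still check finish_reason before trusting the content.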