TypeScript SDK

Chat Completions

OpenAI-compatible chat completions with streaming, tool calling, vision, and structured JSON output.

The Chat API provides OpenAI-compatible chat completions via POST /v1/chat/completions. If you're migrating from OpenAI, Anthropic, or any other provider, the interface is identical — just change the import and API key.

Access it through client.chat.completions. The API is stateless: you pass the full conversation history on every call. For stateful, server-persisted multi-turn conversations, see the Responses API.

Non-streaming

Returns a complete ChatCompletion response once the model finishes generating. This is the simplest way to get a response: the returned promise resolves only when generation is complete.

import Skytells from 'skytells';

const client = Skytells(process.env.SKYTELLS_API_KEY);

const completion = await client.chat.completions.create({
  model: 'deepbrain-router',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is the capital of France?' },
  ],
});

console.log(completion.choices[0].message.content); // "Paris"
console.log(completion.usage);
// { prompt_tokens: 28, completion_tokens: 5, total_tokens: 33 }

Streaming

Pass stream: true to get an AsyncIterable<ChatCompletionChunk>. Each chunk contains a delta with partial content. Streaming is ideal for real-time UIs where you want to display tokens as they arrive.

const stream = await client.chat.completions.create({
  model: 'deepbrain-router',
  messages: [{ role: 'user', content: 'Tell me a short story.' }],
  stream: true,
});

let fullText = '';
for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content ?? '';
  process.stdout.write(delta);
  fullText += delta;
}
console.log('\nDone:', fullText);
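The accumulation pattern above can be factored into a reusable helper. The sketch below is illustrative, not an SDK export; ChunkLike mirrors the relevant parts of ChatCompletionChunk, and the mock stream stands in for a real streaming response.

```typescript
// Illustrative helper (not an SDK export): collects streamed deltas into the
// final message text, stopping once a chunk reports a finish_reason.
interface ChunkLike {
  choices: { delta?: { content?: string }; finish_reason?: string | null }[];
}

async function collectStream(stream: AsyncIterable<ChunkLike>): Promise<string> {
  let text = '';
  for await (const chunk of stream) {
    const choice = chunk.choices[0];
    text += choice?.delta?.content ?? '';
    if (choice?.finish_reason) break; // the final chunk carries finish_reason
  }
  return text;
}

// Mock stream for demonstration; in real use, pass the awaited result of
// client.chat.completions.create({ ..., stream: true }).
async function* mockStream(): AsyncIterable<ChunkLike> {
  yield { choices: [{ delta: { content: 'Hello' }, finish_reason: null }] };
  yield { choices: [{ delta: { content: ', world' }, finish_reason: null }] };
  yield { choices: [{ delta: {}, finish_reason: 'stop' }] };
}
```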

Parameters

All parameters follow the OpenAI chat completions schema. The SDK passes them through to the API without modification.

| Field | Type | Required | Description |
| ----- | ---- | -------- | ----------- |
| model | string | Yes | Model slug (e.g. "deepbrain-router") |
| messages | ChatCompletionMessageParam[] | Yes | Conversation history |
| stream | boolean | No | true for streaming |
| max_tokens | number | No | Max completion tokens |
| temperature | number | No | Sampling temperature (0–2) |
| top_p | number | No | Nucleus sampling probability |
| n | number | No | Number of completion choices |
| stop | string or string[] | No | Stop sequences |
| presence_penalty | number | No | Penalise new topics (−2 to 2) |
| frequency_penalty | number | No | Penalise repeated tokens (−2 to 2) |
| logprobs | boolean | No | Include per-token log probabilities |
| top_logprobs | number | No | Top N logprobs per token (requires logprobs: true) |
| tools | ChatCompletionTool[] | No | Function/tool definitions |
| tool_choice | ChatCompletionToolChoiceOption | No | Tool invocation mode |
| user | string | No | End-user identifier |
| response_format | object | No | { type: 'json_object' } or { type: 'text' } |
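Several of these parameters have documented ranges. The helper below is a hypothetical illustration (buildParams is not part of the SDK): it merges caller overrides into a set of defaults and range-checks the sampling parameters before a request is sent.

```typescript
// Hypothetical helper (not an SDK export): applies defaults and validates
// the documented parameter ranges before the request is sent.
interface SamplingParams {
  model: string;
  temperature?: number;       // 0–2
  top_p?: number;             // 0–1
  presence_penalty?: number;  // −2 to 2
  frequency_penalty?: number; // −2 to 2
  max_tokens?: number;
}

function buildParams(overrides: Partial<SamplingParams> = {}): SamplingParams {
  const params: SamplingParams = {
    model: 'deepbrain-router',
    temperature: 0.7,
    max_tokens: 512,
    ...overrides,
  };
  const inRange = (v: number | undefined, lo: number, hi: number) =>
    v === undefined || (v >= lo && v <= hi);
  if (!inRange(params.temperature, 0, 2)) throw new RangeError('temperature must be in [0, 2]');
  if (!inRange(params.top_p, 0, 1)) throw new RangeError('top_p must be in [0, 1]');
  if (!inRange(params.presence_penalty, -2, 2)) throw new RangeError('presence_penalty must be in [-2, 2]');
  if (!inRange(params.frequency_penalty, -2, 2)) throw new RangeError('frequency_penalty must be in [-2, 2]');
  return params;
}
```

The resulting object spreads directly into client.chat.completions.create({ ...buildParams(), messages }).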

Multi-turn Conversations

The API is stateless — you must pass the full conversation history every time. Append each assistant reply to your messages array before continuing. If you'd prefer the server to manage this for you, see the Responses API which chains conversations via previous_response_id.

const messages: ChatCompletionMessageParam[] = [
  { role: 'system', content: 'You are a Python tutor.' },
  { role: 'user', content: 'What is a list comprehension?' },
];

const first = await client.chat.completions.create({
  model: 'deepbrain-router',
  messages,
});

// Append assistant reply
messages.push({
  role: 'assistant',
  content: first.choices[0].message.content,
});

// Continue conversation
messages.push({
  role: 'user',
  content: 'Show me an example with filtering.',
});

const second = await client.chat.completions.create({
  model: 'deepbrain-router',
  messages,
});
console.log(second.choices[0].message.content);
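Because history grows with every turn, long conversations eventually exceed the model's context window. One common mitigation, sketched here as an illustration rather than an SDK feature, is to keep the system message plus only the most recent messages:

```typescript
// Illustrative trimming strategy (not an SDK feature): retain system
// messages plus the last `maxRecent` user/assistant/tool messages.
interface MessageLike {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string | null;
}

function trimHistory(messages: MessageLike[], maxRecent: number): MessageLike[] {
  const system = messages.filter((m) => m.role === 'system');
  const rest = messages.filter((m) => m.role !== 'system');
  return [...system, ...rest.slice(-maxRecent)];
}
```

Token-aware trimming (counting actual tokens rather than messages) is more precise, but message-count trimming is often enough in practice.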

Tool Calling

Define tools with a JSON Schema. When finish_reason is "tool_calls", execute the tool and send the result back as a tool message. This enables your model to interact with external systems — APIs, databases, or any function you define.

const tools = [
  {
    type: 'function' as const,
    function: {
      name: 'get_weather',
      description: 'Returns current weather for a location',
      parameters: {
        type: 'object',
        properties: {
          location: { type: 'string', description: 'City name' },
          unit: { type: 'string', enum: ['celsius', 'fahrenheit'] },
        },
        required: ['location'],
      },
    },
  },
];

const completion = await client.chat.completions.create({
  model: 'deepbrain-router',
  messages: [{ role: 'user', content: "What's the weather in Paris?" }],
  tools,
  tool_choice: 'auto',
});

const choice = completion.choices[0];
if (choice.finish_reason === 'tool_calls') {
  const call = choice.message.tool_calls![0];
  console.log(call.function.name);
  // "get_weather"
  console.log(JSON.parse(call.function.arguments));
  // { location: "Paris" }
}
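To complete the round trip, execute the function yourself and send the result back as a tool message, then call create() again so the model can produce its final answer. In this sketch, lookupWeather is a stand-in for your real implementation; the tool message shape (role, tool_call_id, content) follows the OpenAI-compatible schema.

```typescript
// Completing the tool-call round trip. `lookupWeather` is a stand-in for
// your real implementation.
interface ToolCallLike {
  id: string;
  function: { name: string; arguments: string };
}

function lookupWeather(args: { location: string; unit?: string }) {
  return { location: args.location, temperature: 18, unit: args.unit ?? 'celsius' };
}

function runToolCall(call: ToolCallLike) {
  const args = JSON.parse(call.function.arguments);
  const result = lookupWeather(args);
  // Push this onto `messages` (after the assistant message containing the
  // tool_calls), then call create() again for the model's final answer.
  return {
    role: 'tool' as const,
    tool_call_id: call.id,
    content: JSON.stringify(result),
  };
}
```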

Vision / Image Inputs

Pass images as part of a multi-part content array in user messages; both URLs and base64 data URLs are supported. Image input requires a vision-capable model, so check the model catalog for supported capabilities.

const completion = await client.chat.completions.create({
  model: 'deepbrain-router',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'Describe this image in detail.' },
        {
          type: 'image_url',
          image_url: { url: 'https://example.com/photo.jpg' },
        },
      ],
    },
  ],
});

console.log(completion.choices[0].message.content);
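For base64 input, one approach (a sketch, assuming the endpoint accepts data URLs in image_url.url as the OpenAI-compatible schema does) is to encode the raw image bytes as a data URL:

```typescript
// Sketch of base64 image input (not an SDK helper): encode raw image bytes
// as a data URL and pass it in image_url.url.
function toDataUrl(bytes: Buffer, mimeType = 'image/jpeg'): string {
  return `data:${mimeType};base64,${bytes.toString('base64')}`;
}

// Usage with a local file:
// import { readFileSync } from 'node:fs';
// const part = {
//   type: 'image_url',
//   image_url: { url: toDataUrl(readFileSync('photo.jpg')) },
// };
```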

Structured JSON Output

Use response_format: { type: 'json_object' } to constrain the response content to valid JSON that you can safely JSON.parse(). JSON mode guarantees syntax, not schema, so pair it with a clear prompt instruction describing the fields you want for reliable structured extraction.

const completion = await client.chat.completions.create({
  model: 'deepbrain-router',
  messages: [
    {
      role: 'user',
      content: 'List 3 programming languages with their year of creation. Respond in JSON.',
    },
  ],
  response_format: { type: 'json_object' },
});

const languages = JSON.parse(completion.choices[0].message.content!);
console.log(languages);
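Since the field names and structure of the JSON are still up to the model, validate the parsed value at runtime before using it. A minimal hand-rolled type guard (the Language shape matches the example prompt and is purely illustrative):

```typescript
// Runtime validation of JSON-mode output. JSON mode guarantees the content
// parses, but not that it has the shape you expect.
interface Language {
  name: string;
  year: number;
}

function isLanguageList(value: unknown): value is Language[] {
  return (
    Array.isArray(value) &&
    value.every(
      (v) =>
        typeof v === 'object' && v !== null &&
        typeof (v as Language).name === 'string' &&
        typeof (v as Language).year === 'number',
    )
  );
}

const parsed: unknown = JSON.parse('[{"name":"Python","year":1991}]');
if (!isLanguageList(parsed)) {
  throw new Error('Model returned an unexpected JSON shape');
}
```

For complex schemas, a validation library such as Zod does the same job with less boilerplate.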

Finish Reasons

The finish_reason field on each choice tells you why the model stopped generating. Handle each case appropriately:

| finish_reason | Meaning |
| ------------- | ------- |
| "stop" | Normal completion |
| "length" | Truncated at max_tokens |
| "tool_calls" | Model wants to invoke a tool |
| "content_filter" | Content was filtered — see Safety |
| null | Streaming chunk (not the final chunk) |
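A switch over the finish_reason covers every case; the mapped actions here are illustrative suggestions, not SDK behavior:

```typescript
// Sketch of dispatching on finish_reason; the string labels and suggested
// actions in the comments are illustrative.
type FinishReason = 'stop' | 'length' | 'tool_calls' | 'content_filter' | null;

function describeFinish(reason: FinishReason): string {
  switch (reason) {
    case 'stop':
      return 'complete'; // use the content as-is
    case 'length':
      return 'truncated'; // consider raising max_tokens or continuing
    case 'tool_calls':
      return 'needs-tools'; // execute tools, send results back
    case 'content_filter':
      return 'filtered'; // check Safety before showing anything to the user
    default:
      return 'streaming'; // null: mid-stream chunk, keep reading
  }
}
```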

Content Filtering

At Skytells, every inference response includes content filter metadata. Check whether a completion was filtered using the Safety module — no additional API call required:

const completion = await client.chat.completions.create({
  model: 'deepbrain-router',
  messages: [{ role: 'user', content: userInput }],
});

if (client.safety.wasFiltered(completion)) {
  const categories = client.safety.getFilteredCategories(completion);
  console.warn('Filtered categories:', categories);
}

Response Shapes

ChatCompletion

See ChatCompletion in the Reference for the full type.

  • id (string): Unique completion ID.
  • object ('chat.completion'): Object type.
  • created (number): Unix timestamp.
  • model (string): Model used.
  • choices (ChatCompletionChoice[]): Array of completion choices.
  • usage (object): prompt_tokens, completion_tokens, total_tokens.
  • system_fingerprint (string): Optional system fingerprint.

ChatCompletionChoice

  • index (number): Choice index.
  • message (ChatCompletionMessage): Message with role, content, and optional tool_calls.
  • finish_reason (string | null): stop | length | tool_calls | content_filter | null.
  • content_filter_results (Record<string, unknown>): Optional filter metadata.

// ChatCompletion shape
{
  id: "chatcmpl_abc123",
  object: "chat.completion",
  created: 1700000000,
  model: "deepbrain-router",
  choices: [{
    index: 0,
    message: {
      role: "assistant",
      content: "Paris is the capital of France."
    },
    finish_reason: "stop"
  }],
  usage: {
    prompt_tokens: 28,
    completion_tokens: 8,
    total_tokens: 36
  }
}

Error Handling

import { SkytellsError } from 'skytells';

try {
  const completion = await client.chat.completions.create({ /* ... */ });
} catch (e) {
  if (e instanceof SkytellsError) {
    switch (e.errorId) {
      case 'RATE_LIMIT_EXCEEDED':
        // Retry after delay or reduce request rate
        break;
      case 'CONTENT_POLICY_VIOLATION':
        // Prompt or response was flagged
        break;
      case 'REQUEST_TIMEOUT':
        // Increase client timeout or reduce max_tokens
        break;
    }
    console.error(e.errorId, e.httpStatus, e.message);
  }
}

For the full error reference and all error IDs, see Errors. For rate limiting and retry configuration, see Reliability.
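For transient failures such as RATE_LIMIT_EXCEEDED, a simple exponential-backoff wrapper works as a stopgap. This sketch is not an SDK feature (the SDK's built-in retry configuration is covered in Reliability); it retries any thrown error, so in practice you would check e instanceof SkytellsError and its errorId before retrying.

```typescript
// Illustrative retry wrapper (not an SDK feature). Retries with exponential
// backoff: baseDelayMs, 2x, 4x, ... up to maxAttempts tries.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (e) {
      lastError = e;
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastError;
}

// const completion = await withRetry(() =>
//   client.chat.completions.create({ model: 'deepbrain-router', messages }),
// );
```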

  • Chat Completions REST API — Underlying REST endpoint
  • Responses API — Stateful multi-turn conversations with previous_response_id
  • Models — Discover available models and their capabilities
  • Model Catalog — Browse all available models
  • Safety — Content moderation for chat responses (no extra API call)
  • Errors — All error IDs and handling patterns
  • Reliability — Timeouts, retries, and streaming reliability

Best Practices

  • System prompt: Always set a system message to define the assistant's persona and constraints.
  • Token budgets: Set max_tokens to prevent unexpectedly large responses and control cost.
  • Temperature: Use 0 for deterministic outputs (code, structured data); 0.7–1.0 for creative tasks.
  • Content filtering: Check finish_reason === 'content_filter' when accepting user-supplied prompts. See Safety and Responsible AI.
  • Multi-turn memory: The API is stateless — pass the full conversation history every time. Use Responses for server-managed state.
