Responses API
Stateful multi-turn conversations with reasoning models, 9 streaming event types, and server-side persistence.
The Responses API (POST /v1/responses) provides a stateful, multi-turn conversation API following the OpenAI Responses schema. It supports reasoning models, tool calling, multi-turn context via previous_response_id, and server-sent event streaming with 9 distinct event types.
Access via client.responses or client.chat.responses.
Non-streaming
Returns a complete ResponsesResponse once the model finishes generating.
Non-streaming
import Skytells from 'skytells';
const client = Skytells(process.env.SKYTELLS_API_KEY);
const response = await client.responses.create({
model: 'gpt-5.3-codex',
input: [{ role: 'user', content: 'Explain recursion simply.' }],
instructions: 'You are a helpful tutor.',
});
// Output is an array of output messages
console.log(response.output[0].content[0].text);
console.log(response.usage);
// { input_tokens: 32, output_tokens: 120, total_tokens: 152 }
Streaming
Pass stream: true — the method returns AsyncIterable<ResponsesStreamEvent> directly (no extra await).
Streaming calls are not retried if they fail after the stream starts. See Reliability.
Streaming
for await (const event of client.responses.create({
model: 'gpt-5.3-codex',
input: [{ role: 'user', content: 'Write a limerick about JavaScript.' }],
stream: true,
})) {
if (event.type === 'response.output_text.delta') {
process.stdout.write(event.delta);
}
if (event.type === 'response.completed') {
console.log('\nDone. Usage:', event.response.usage);
}
}
Parameters
| Field | Type | Description |
|---|---|---|
| model | string | Model identifier, e.g. "gpt-5.3-codex" |
| input | string \| ResponsesInputMessage[] | Text prompt or array of role/content messages |
| instructions | string \| null | System-level instructions |
| stream | boolean | true for SSE streaming |
| max_output_tokens | number \| null | Maximum tokens in the output |
| temperature | number | Sampling temperature (0–2) |
| top_p | number | Nucleus sampling probability |
| tools | ChatCompletionTool[] | Tool/function definitions |
| tool_choice | string \| object | 'none' \| 'auto' \| 'required' \| { type: 'function', function: { name } } |
| parallel_tool_calls | boolean | Allow parallel tool calls |
| reasoning | { effort?, summary? } | Reasoning effort and summary verbosity |
| store | boolean | Persist the response server-side (for multi-turn) |
| previous_response_id | string \| null | Chain to a prior response for multi-turn |
| metadata | Record<string, unknown> | Arbitrary key/value pairs for labelling |
| user | string \| null | End-user identifier |
| truncation | string | Truncation strategy ('auto', 'disabled') |
| frequency_penalty | number | Token frequency penalty |
| presence_penalty | number | Token presence penalty |
| text | { format?, verbosity? } | Output text formatting options |
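Many of these fields combine naturally in a single request. The sketch below builds one such request object against a local interface that mirrors the table; the interface is illustrative only (the SDK ships its own exported types), and the field values are arbitrary examples.

```typescript
// Illustrative local interface mirroring the parameter table above.
// Not the SDK's exported type; shown only to make the field shapes concrete.
interface ResponsesRequestSketch {
  model: string;
  input: string | { role: string; content: string }[];
  instructions?: string | null;
  max_output_tokens?: number | null;
  temperature?: number;
  top_p?: number;
  reasoning?: {
    effort?: 'none' | 'low' | 'medium' | 'high';
    summary?: 'auto' | 'concise' | 'detailed';
  };
  store?: boolean;
  previous_response_id?: string | null;
  metadata?: Record<string, unknown>;
}

// A request combining sampling, length, reasoning, and persistence controls.
const request: ResponsesRequestSketch = {
  model: 'gpt-5.3-codex',
  input: [{ role: 'user', content: 'Summarize the CAP theorem.' }],
  instructions: 'Answer in three sentences.',
  max_output_tokens: 256,
  temperature: 0.3,
  reasoning: { effort: 'medium', summary: 'concise' },
  store: true,
  metadata: { feature: 'docs-example' },
};
```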
Multi-turn with previous_response_id
Instead of sending the full conversation history on every call (as the Chat API requires), you pass the id of the previous response. The server reconstructs the context automatically.
store: true must be set when you intend to use a response as the parent of a future call. Only stored responses can be referenced by previous_response_id.
Multi-turn
// Turn 1
const turn1 = await client.responses.create({
model: 'gpt-5.3-codex',
input: [{ role: 'user', content: 'My name is Alex. What is a closure in JavaScript?' }],
store: true, // persist for future turns
});
console.log(turn1.id); // "resp_abc123"
console.log(turn1.output[0].content[0].text);
// Turn 2 — no need to repeat history
const turn2 = await client.responses.create({
model: 'gpt-5.3-codex',
input: [{ role: 'user', content: 'Can you give me an example using my name?' }],
previous_response_id: turn1.id,
});
console.log(turn2.output[0].content[0].text);
// Model remembers "Alex" from turn 1
Reasoning Models
Use the reasoning parameter with models that support extended thinking:
const response = await client.responses.create({
model: 'gpt-5.3-codex',
input: [{ role: 'user', content: 'Solve: if 2x + 5 = 17, what is x?' }],
reasoning: {
effort: 'high', // 'none' | 'low' | 'medium' | 'high'
summary: 'detailed', // 'auto' | 'concise' | 'detailed'
},
});
Streaming Events
The ResponsesStreamEvent is a discriminated union on type. Nine event types:
| type | Description |
|---|---|
| response.created | Initial response snapshot (status: 'in_progress') |
| response.in_progress | Intermediate state update |
| response.completed | Final response with full usage |
| response.output_item.added | New output message opened |
| response.output_item.done | Output message closed |
| response.content_part.added | Content part within a message opened |
| response.content_part.done | Content part closed |
| response.output_text.delta | Incremental text chunk |
| response.output_text.done | Final accumulated text for one content part |
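Because each text event carries output_index and content_index, the final text can be reassembled from deltas alone. The sketch below accumulates deltas per content part; the two event shapes it consumes follow the documentation above, and the remaining seven event types pass through untouched.

```typescript
// Minimal shapes for the events this sketch consumes; the real
// ResponsesStreamEvent union has nine members.
type TextDelta = {
  type: 'response.output_text.delta';
  output_index: number;
  content_index: number;
  delta: string;
};
type StreamEventSketch = TextDelta | { type: string };

// Concatenate deltas keyed by (output_index, content_index), so text from
// different messages or content parts never interleaves.
function accumulateText(events: StreamEventSketch[]): Map<string, string> {
  const parts = new Map<string, string>();
  for (const event of events) {
    if (event.type === 'response.output_text.delta') {
      const e = event as TextDelta;
      const key = `${e.output_index}:${e.content_index}`;
      parts.set(key, (parts.get(key) ?? '') + e.delta);
    }
  }
  return parts;
}
```

In a live stream you would apply the same logic inside the `for await` loop shown earlier instead of collecting events into an array first.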
Event Shapes
// response.output_text.delta
{
type: 'response.output_text.delta',
sequence_number: 5,
output_index: 0,
content_index: 0,
item_id: 'msg_abc123',
delta: 'The Pythagorean' // incremental text
}
Tool Calling
Define tools the same way as the Chat API. Inspect response.output for function call items.
Tool Calling
const response = await client.responses.create({
model: 'gpt-5.3-codex',
input: [{ role: 'user', content: "What's the weather in Berlin?" }],
tools: [
{
type: 'function',
function: {
name: 'get_weather',
description: 'Returns current weather',
parameters: {
type: 'object',
properties: { city: { type: 'string' } },
required: ['city'],
},
},
},
],
tool_choice: 'auto',
});
// Inspect output for tool call items
const outputItem = response.output[0];
console.log(outputItem);
Content Filtering
The Responses API includes Skytells-specific content_filters on the response:
const response = await client.responses.create({
model: 'gpt-5.3-codex',
input: [{ role: 'user', content: userInput }],
});
if (response.content_filters?.some(f => f.blocked)) {
const blocked = response.content_filters.filter(f => f.blocked);
console.warn('Content blocked:', blocked.map(f => f.source_type));
}
For full content filtering utilities, see Safety.
Response Shape
| Field | Type | Description |
|---|---|---|
| id | string | Response identifier (pass as previous_response_id to chain turns) |
| object | 'response' | Always 'response' |
| created_at | number | Unix timestamp of creation |
| status | string | completed \| in_progress \| failed |
| model | string | Model that produced the response |
| output | ResponsesOutputMessage[] | Array of ResponsesOutputMessage output messages |
| usage | object | input_tokens, output_tokens, total_tokens, plus details on cached/reasoning tokens |
| content_filters | ResponsesContentFilter[] | Skytells-specific content filter results |
| instructions | string \| null | Instructions used for this response |
| previous_response_id | string \| null | Parent response, if chained |
| reasoning | object | effort and summary settings used |
| store | boolean | Whether the response was persisted |
| metadata | Record<string, unknown> | Metadata supplied on the request |
For the full TypeScript definitions, see Reference.
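Because text lives two levels deep (output message, then content part), direct indexing like `response.output[0].content[0].text` can throw on responses with no text output, such as a pure tool-call turn. A defensive extractor, sketched against local interfaces that mirror the documented shape (not the SDK's exported types):

```typescript
// Minimal local shapes mirroring the documented response structure.
interface OutputPartSketch {
  type: string;
  text?: string;
}
interface OutputMessageSketch {
  type: string;
  role?: string;
  content?: OutputPartSketch[];
}

// Return the first output_text part's text, or null when the
// response contains no text output.
function firstOutputText(output: OutputMessageSketch[]): string | null {
  for (const message of output) {
    for (const part of message.content ?? []) {
      if (part.type === 'output_text' && typeof part.text === 'string') {
        return part.text;
      }
    }
  }
  return null;
}
```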
// ResponsesResponse shape
{
id: "resp_abc123",
object: "response",
created_at: 1700000000,
status: "completed",
model: "gpt-5.3-codex",
output: [{
id: "msg_001",
type: "message",
role: "assistant",
content: [{
type: "output_text",
text: "Recursion is when a function calls itself..."
}]
}],
usage: {
input_tokens: 32,
output_tokens: 120,
total_tokens: 152,
output_tokens_details: { reasoning_tokens: 15 }
}
}
Differences from Chat API
| Feature | Chat | Responses |
|---|---|---|
| Multi-turn | Full history on every call | previous_response_id reference |
| State | Stateless | Persistent with store: true |
| Input type | messages array | input string or array + instructions |
| Streaming events | Single delta stream | 9 typed event types |
| Reasoning control | — | reasoning.effort and summary |
| Server-side identity | — | id on response (chainable) |
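The multi-turn difference in the table above can be made concrete with two hypothetical request builders: the Chat style resends the full history every call, while the Responses style only carries the last response id (storing the first turn so it can be referenced later).

```typescript
type Msg = { role: 'user' | 'assistant'; content: string };

// Chat style: the caller owns the history and resends all of it each turn.
function nextChatRequest(history: Msg[], userText: string) {
  const messages = [...history, { role: 'user' as const, content: userText }];
  return { model: 'gpt-5.3-codex', messages };
}

// Responses style: the server owns the history; the caller keeps only the
// previous response id. The first turn sets store: true so later turns can
// reference it via previous_response_id.
function nextResponsesRequest(previousResponseId: string | null, userText: string) {
  return {
    model: 'gpt-5.3-codex',
    input: [{ role: 'user' as const, content: userText }],
    ...(previousResponseId
      ? { previous_response_id: previousResponseId }
      : { store: true }),
  };
}
```

Request payloads in the Chat style grow with every turn; Responses payloads stay constant in size regardless of conversation length.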
Error Handling
import { SkytellsError } from 'skytells';
try {
const response = await client.responses.create({ /* ... */ });
} catch (e) {
if (e instanceof SkytellsError) {
console.error(e.errorId, e.httpStatus, e.message);
}
}
For the full error reference, see Errors.
Related
- Responses REST API — Underlying REST endpoint
- Chat API — Stateless chat completions (compare with Differences from Chat API)
- Models — Discover available models and their capabilities
- Model Catalog — Browse all available models
- Safety — Content moderation for responses (no extra API call with wasFiltered())
- Errors — All error IDs and handling patterns
- Reliability — Timeouts, retries, and streaming reliability