Embeddings API

Reference overview for the Embeddings sub-API — POST /v1/embeddings.

The Embeddings API is part of the Inference APIs. It converts text into dense vector representations (float arrays) that capture semantic meaning. Use embeddings for semantic search, nearest-neighbor retrieval, clustering, classification, and retrieval-augmented generation (RAG). Pass a single string or a batch of strings; the API returns one vector per input, synchronously in a single response.

It follows the same request and response shape as the OpenAI Embeddings API, so you can point the OpenAI SDK at Skytells with a baseURL override. Pair embeddings with the Chat API or Responses API when you need to generate answers from retrieved context. For safety features on generative APIs, see Safety and Responsible AI.

  • Endpoint: POST /v1/embeddings
  • SDK access: client.embeddings.create(params)
  • Input: one string or string[] — multiple inputs are embedded in one request (batching)
  • OpenAI-compatible: yes — same schema; use baseURL: 'https://api.skytells.ai/v1' with the OpenAI client

How it works

You send a model identifier and an input (string or list of strings). The service returns an EmbeddingResponse whose data array contains one Embedding per input, each with a vector of floats. You can request "encoding_format": "float" (default) or "base64" for compact transfer, and use dimensions on supported models to truncate vectors. Compare vectors with cosine similarity or store them in a vector database for search.
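As a concrete sketch of the truncation step: if a model does not support the dimensions parameter, you can often truncate vectors client-side, but you should re-normalize before cosine comparisons. The helper below (`truncateAndNormalize` is a name introduced here, not part of the SDK) shows the idea; whether truncation preserves retrieval quality depends on the model, so check the Model Catalog first.

```typescript
// Sketch: client-side truncation of an embedding, then re-normalization.
// Truncating changes the vector's length, so normalize to unit length
// before using cosine similarity.
function truncateAndNormalize(vec: number[], dims: number): number[] {
  const head = vec.slice(0, dims);
  const norm = Math.sqrt(head.reduce((sum, v) => sum + v * v, 0));
  return head.map((v) => v / norm);
}

// Toy 4-d vector truncated to 2 dimensions:
const short = truncateAndNormalize([3, 4, 0, 0], 2);
console.log(short); // [ 0.6, 0.8 ] — unit length
```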

When to use Embeddings API vs Chat or Responses API

Use case → recommendation:

  • Semantic search, duplicate detection, clustering: Embeddings API — index and compare vectors
  • RAG (find documents, then generate an answer): Embeddings API for retrieval + Chat API or Responses API for generation
  • Conversation, Q&A, tool use: Chat API or Responses API — not embeddings alone
  • Stateless chat with OpenAI SDKs: Chat API — embeddings are for vectors, not dialogue

Quick Example

Create embeddings

Skytells SDK
import Skytells from 'skytells';

const client = new Skytells(process.env.SKYTELLS_API_KEY);

const result = await client.embeddings.create({
  model: 'skytells-embed-3-large',
  input: 'The quick brown fox jumps over the lazy dog.',
});

const vector = result.data[0].embedding;
// Array of 3072 floats for skytells-embed-3-large
console.log(vector.length);   // 3072
console.log(vector[0]);       // e.g. 0.0023064255

// For semantic similarity: compute cosine similarity between two vectors
function cosineSimilarity(a: number[], b: number[]) {
  const dot = a.reduce((sum, v, i) => sum + v * b[i], 0);
  const magA = Math.sqrt(a.reduce((sum, v) => sum + v * v, 0));
  const magB = Math.sqrt(b.reduce((sum, v) => sum + v * v, 0));
  return dot / (magA * magB);
}

Returns an EmbeddingResponse containing a list of Embedding items and EmbeddingUsage token counts.
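The field names below are a sketch of that shape, assuming the OpenAI-compatible layout described above; see Embedding Objects for the authoritative definitions.

```typescript
// Sketch of the response shape, assuming OpenAI compatibility.
// Consult Embedding Objects for the authoritative field definitions.
interface Embedding {
  object: 'embedding';
  index: number;        // position of the corresponding input
  embedding: number[];  // the vector (float encoding)
}

interface EmbeddingUsage {
  prompt_tokens: number;
  total_tokens: number;
}

interface EmbeddingResponse {
  object: 'list';
  model: string;
  data: Embedding[];    // one entry per input, ordered by index
  usage: EmbeddingUsage;
}
```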


Embeddings API FAQs

Which models can I use with the Embeddings API?

The Embeddings API supports Skytells embedding models (for example skytells-embed-3-large) and partner models where offered. See the Model Catalog for dimensions, limits, and availability. Set the model parameter on each request.

Can I embed multiple texts in one request?

Yes. Pass input as an array of strings to embed several texts in a single call. The response data array aligns with your inputs by index.
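One way to keep that index alignment explicit is a small pairing helper. This is a sketch; `alignByIndex` is a name introduced here, not part of the SDK, and the mock data stands in for a real response.

```typescript
// Sketch: pair a batch of input texts with their returned vectors by index.
interface EmbeddingItem { index: number; embedding: number[] }

function alignByIndex(texts: string[], data: EmbeddingItem[]) {
  // Sort defensively by index, then zip with the original texts.
  const byIndex = [...data].sort((a, b) => a.index - b.index);
  return texts.map((text, i) => ({ text, vector: byIndex[i].embedding }));
}

// Usage with a mock response whose items arrive out of order:
const pairs = alignByIndex(
  ['alpha', 'beta'],
  [{ index: 1, embedding: [0.2] }, { index: 0, embedding: [0.1] }],
);
console.log(pairs[0]); // { text: 'alpha', vector: [ 0.1 ] }
```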

What is the difference between float and base64 encoding?

"encoding_format": "float" (default) returns each embedding as a JSON array of numbers. "base64" returns the same bytes encoded as Base64, which can reduce payload size for large batches. Decode Base64 on the client to recover the float buffer if needed.

Is the Embeddings API compatible with the OpenAI Embeddings API?

Yes. The endpoint and fields match OpenAI’s embeddings shape. Use the OpenAI SDK with baseURL set to https://api.skytells.ai/v1 and your Skytells API key. Behavior and model IDs follow Skytells’ catalog.
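A minimal configuration sketch, assuming the openai npm package is installed; the baseURL and key are the only Skytells-specific parts.

```typescript
// Sketch: the official OpenAI SDK pointed at Skytells via baseURL.
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.SKYTELLS_API_KEY, // Skytells key, not an OpenAI key
  baseURL: 'https://api.skytells.ai/v1',
});

const res = await client.embeddings.create({
  model: 'skytells-embed-3-large', // Skytells model ID from the catalog
  input: 'Hello, Skytells',
});
console.log(res.data[0].embedding.length);
```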

Which Skytells API handles requests to the Embeddings API?

The Embeddings API is part of the Skytells Inference APIs, alongside the Chat API, Responses API, and related inference endpoints.

How do I debug or monitor Embeddings API usage?

Responses include usage with token counts you can log. Use your application logs for per-request debugging and the Skytells dashboard for aggregate usage and model breakdowns. For field-level detail, see Embedding Objects.
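A minimal logging sketch: the usage field names assume the OpenAI-compatible usage object, and `formatUsageLog` is a helper introduced here, not part of the SDK.

```typescript
// Sketch: a per-request usage log line built from the response's usage object.
interface EmbeddingUsage { prompt_tokens: number; total_tokens: number }

function formatUsageLog(model: string, usage: EmbeddingUsage): string {
  return `embeddings model=${model} prompt_tokens=${usage.prompt_tokens} total_tokens=${usage.total_tokens}`;
}

// e.g. after a request: console.log(formatUsageLog(result.model, result.usage));
console.log(formatUsageLog('skytells-embed-3-large', { prompt_tokens: 12, total_tokens: 12 }));
// embeddings model=skytells-embed-3-large prompt_tokens=12 total_tokens=12
```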
