Embeddings API

Reference overview for the Embeddings sub-API — POST /v1/embeddings.

The Embeddings API is part of the Inference APIs. It converts text into dense vector representations (float arrays) that capture semantic meaning. Use embeddings for semantic search, nearest-neighbor retrieval, clustering, classification, and retrieval-augmented generation (RAG). Pass a single string or a batch of strings; the API returns one vector per input, synchronously in a single response.

It follows the same request and response shape as the OpenAI Embeddings API, so you can point the OpenAI SDK at Skytells with a baseURL override. Pair embeddings with the Chat API or Responses API when you need to generate answers from retrieved context. For safety features on generative APIs, see Safety and Responsible AI.

  • Endpoint: POST /v1/embeddings
  • SDK access: client.embeddings.create(params)
  • Input: one string or string[] — multiple inputs are embedded in one request (batching)
  • OpenAI-compatible: yes — same schema; use baseURL: 'https://api.skytells.ai/v1' with the OpenAI client

How it works

You send a model identifier and an input (string or list of strings). The service returns an EmbeddingResponse whose data array contains one Embedding per input, each with a vector of floats. You can request "encoding_format": "float" (default) or "base64" for compact transfer, and use dimensions on supported models to truncate vectors. Compare vectors with cosine similarity or store them in a vector database for search.
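As a concrete sketch of the truncation step: if a model does not support the dimensions parameter, you can often truncate vectors client-side, but you should re-normalize before cosine comparisons. The helper below (`truncateAndNormalize` is a name introduced here, not part of the SDK) shows the idea; whether truncation preserves retrieval quality depends on the model, so check the Model Catalog first.

```typescript
// Sketch: client-side truncation of an embedding, then re-normalization.
// Truncating changes the vector's length, so normalize to unit length
// before using cosine similarity.
function truncateAndNormalize(vec: number[], dims: number): number[] {
  const head = vec.slice(0, dims);
  const norm = Math.sqrt(head.reduce((sum, v) => sum + v * v, 0));
  return head.map((v) => v / norm);
}

// Toy 4-d vector truncated to 2 dimensions:
const short = truncateAndNormalize([3, 4, 0, 0], 2);
console.log(short); // [ 0.6, 0.8 ] — unit length
```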

When to use Embeddings API vs Chat or Responses API

Use case → recommendation:

  • Semantic search, duplicate detection, clustering: Embeddings API — index and compare vectors
  • RAG (find documents, then generate an answer): Embeddings API for retrieval + Chat API or Responses API for generation
  • Conversation, Q&A, tool use: Chat API or Responses API — not embeddings alone
  • Stateless chat with OpenAI SDKs: Chat API — embeddings are for vectors, not dialogue

Quick Example

Create embeddings

Skytells SDK
import Skytells from 'skytells';

const client = new Skytells(process.env.SKYTELLS_API_KEY);

const result = await client.embeddings.create({
  model: 'skytells-embed-3-large',
  input: 'The quick brown fox jumps over the lazy dog.',
});

const vector = result.data[0].embedding;
// Array of 3072 floats for skytells-embed-3-large
console.log(vector.length);   // 3072
console.log(vector[0]);       // e.g. 0.0023064255

// For semantic similarity: compute cosine similarity between two vectors
function cosineSimilarity(a: number[], b: number[]) {
  const dot = a.reduce((sum, v, i) => sum + v * b[i], 0);
  const magA = Math.sqrt(a.reduce((sum, v) => sum + v * v, 0));
  const magB = Math.sqrt(b.reduce((sum, v) => sum + v * v, 0));
  return dot / (magA * magB);
}

Returns an EmbeddingResponse containing a list of Embedding items and EmbeddingUsage token counts.
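The field names below are a sketch of that shape, assuming the OpenAI-compatible layout described above; see Embedding Objects for the authoritative definitions.

```typescript
// Sketch of the response shape, assuming OpenAI compatibility.
// Consult Embedding Objects for the authoritative field definitions.
interface Embedding {
  object: 'embedding';
  index: number;        // position of the corresponding input
  embedding: number[];  // the vector (float encoding)
}

interface EmbeddingUsage {
  prompt_tokens: number;
  total_tokens: number;
}

interface EmbeddingResponse {
  object: 'list';
  model: string;
  data: Embedding[];    // one entry per input, ordered by index
  usage: EmbeddingUsage;
}
```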


Embeddings API FAQs

Which models can I use with the Embeddings API?

The Embeddings API supports Skytells embedding models (for example skytells-embed-3-large) and partner models where offered. See the Model Catalog for dimensions, limits, and availability. Set the model parameter on each request.

Can I embed multiple texts in one request?

Yes. Pass input as an array of strings to embed several texts in a single call. The response data array aligns with your inputs by index.
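One way to keep that index alignment explicit is a small pairing helper. This is a sketch; `alignByIndex` is a name introduced here, not part of the SDK, and the mock data stands in for a real response.

```typescript
// Sketch: pair a batch of input texts with their returned vectors by index.
interface EmbeddingItem { index: number; embedding: number[] }

function alignByIndex(texts: string[], data: EmbeddingItem[]) {
  // Sort defensively by index, then zip with the original texts.
  const byIndex = [...data].sort((a, b) => a.index - b.index);
  return texts.map((text, i) => ({ text, vector: byIndex[i].embedding }));
}

// Usage with a mock response whose items arrive out of order:
const pairs = alignByIndex(
  ['alpha', 'beta'],
  [{ index: 1, embedding: [0.2] }, { index: 0, embedding: [0.1] }],
);
console.log(pairs[0]); // { text: 'alpha', vector: [ 0.1 ] }
```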

What is the difference between float and base64 encoding?

"encoding_format": "float" (default) returns each embedding as a JSON array of numbers. "base64" returns the same bytes encoded as Base64, which can reduce payload size for large batches. Decode Base64 on the client to recover the float buffer if needed.

Is the Embeddings API compatible with the OpenAI Embeddings API?

Yes. The endpoint and fields match OpenAI’s embeddings shape. Use the OpenAI SDK with baseURL set to https://api.skytells.ai/v1 and your Skytells API key. Behavior and model IDs follow Skytells’ catalog.
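A minimal configuration sketch, assuming the openai npm package is installed; the baseURL and key are the only Skytells-specific parts.

```typescript
// Sketch: the official OpenAI SDK pointed at Skytells via baseURL.
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.SKYTELLS_API_KEY, // Skytells key, not an OpenAI key
  baseURL: 'https://api.skytells.ai/v1',
});

const res = await client.embeddings.create({
  model: 'skytells-embed-3-large', // Skytells model ID from the catalog
  input: 'Hello, Skytells',
});
console.log(res.data[0].embedding.length);
```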

Which Skytells API handles requests to the Embeddings API?

The Embeddings API is part of the Skytells Inference APIs, alongside the Chat API, Responses API, and related inference endpoints.

How do I debug or monitor Embeddings API usage?

Responses include usage with token counts you can log. Use your application logs for per-request debugging and the Skytells dashboard for aggregate usage and model breakdowns. For field-level detail, see Embedding Objects.
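A minimal logging sketch: the usage field names assume the OpenAI-compatible usage object, and `formatUsageLog` is a helper introduced here, not part of the SDK.

```typescript
// Sketch: a per-request usage log line built from the response's usage object.
interface EmbeddingUsage { prompt_tokens: number; total_tokens: number }

function formatUsageLog(model: string, usage: EmbeddingUsage): string {
  return `embeddings model=${model} prompt_tokens=${usage.prompt_tokens} total_tokens=${usage.total_tokens}`;
}

// e.g. after a request: console.log(formatUsageLog(result.model, result.usage));
console.log(formatUsageLog('skytells-embed-3-large', { prompt_tokens: 12, total_tokens: 12 }));
// embeddings model=skytells-embed-3-large prompt_tokens=12 total_tokens=12
```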
