Rate Limits

Edge Rate Limits

Rate limits for the Skytells Edge API and streaming endpoints.

The Edge API is designed for low-latency inference and real-time streaming. It operates under separate, more conservative limits to protect the shared edge infrastructure.

Tier ceilings (RPM, concurrent streams) follow the same spend-based tier system as the Standard API — see Account tiers. This page focuses on Edge-specific behavior (streaming, duration). For x-skytells-ratelimit-* headers and 429 handling, see the Rate limits overview.


Edge vs Standard API

| | Standard API | Edge API |
| --- | --- | --- |
| Optimized for | Batch workloads, async predictions | Low-latency, streaming output |
| RPM limit | Tier-based — see Account tiers | Tier-based — see Account tiers |
| Concurrent streams | N/A | Tier-based per account |
| Max stream duration | N/A | 300 seconds |
| Webhooks supported | Yes | No — use streaming directly |

Streaming Limits

Admission flow for a new stream: when you connect, the server checks whether your open streams are within the tier's concurrent-stream limit. If they are, the stream is opened; if not, the request is rejected with 429 rate_limit_exceeded. While the stream is open, data flows until the prediction completes or the 300-second duration ceiling is reached, at which point the server closes the stream.

Each open stream counts as one concurrent stream slot. Slots are released when the stream closes — either because the prediction completed, was cancelled, or timed out. Exact stream ceilings for your tier are listed on Account tiers.


Headers and Errors

Edge responses use the same x-skytells-ratelimit-* family as the Standard API — see Response headers (limits and usage). On 429, follow Retry-After and error.details as in the overview.
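A minimal retry-delay helper, sketched under the assumptions that a lowercased header dict is available and that 429 responses may carry a Retry-After header as described in the overview; the fallback backoff constants are illustrative, not documented values.

```python
import random
from typing import Optional

def retry_delay(status: int, headers: dict, attempt: int) -> Optional[float]:
    """Return seconds to wait before retrying, or None if no retry is needed.

    Assumes header names are lowercased; the exact fields on your
    responses may differ, so treat this as a sketch.
    """
    if status != 429:
        return None
    # Prefer the server's explicit hint when present.
    retry_after = headers.get("retry-after")
    if retry_after is not None:
        return float(retry_after)
    # Otherwise fall back to capped exponential backoff with jitter.
    return min(2 ** attempt, 30) + random.uniform(0, 1)

# Example: a 429 carrying Retry-After is honored exactly.
print(retry_delay(429, {"retry-after": "12"}, attempt=0))  # 12.0
```

Honoring Retry-After first keeps your client aligned with the server's own view of when capacity frees up; the jittered fallback only applies when the hint is missing.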


Best Practices for Edge

| Practice | Why |
| --- | --- |
| Close streams as soon as output is complete | Frees concurrent stream slots immediately |
| Do not open speculative streams | Only open a stream when you intend to consume all output |
| Use the Standard API for non-real-time workloads | Reserves edge capacity for latency-sensitive use cases |
| Handle stream closes gracefully | The server may close the stream on timeout — always handle end events |
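The last two practices can be combined in one consumer loop: stop cleanly on a server-sent close and never assume a stream outlives the 300-second ceiling. The event names ("data", "end", "timeout") and the `(type, payload)` tuple shape below are illustrative assumptions standing in for whatever iterator your SDK exposes.

```python
import time

MAX_STREAM_SECONDS = 300  # server-side duration ceiling from the table above

def consume(events):
    """Drain a stream of (type, payload) events, stopping cleanly on close.

    `events` stands in for the event iterator your client library
    provides; the event names here are hypothetical.
    """
    started = time.monotonic()
    chunks = []
    for kind, payload in events:
        if kind in ("end", "timeout"):
            break  # server closed the stream: stop reading, don't raise
        if time.monotonic() - started > MAX_STREAM_SECONDS:
            break  # defensive: never trust a stream to outlive the ceiling
        chunks.append(payload)
    return "".join(chunks)

# Simulated stream: two data chunks, then a server-sent end event.
print(consume([("data", "hel"), ("data", "lo"), ("end", "")]))  # hello
```

Treating "end" and "timeout" as normal termination, rather than errors, is what keeps a timed-out stream from crashing the consumer.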
