Edge Rate Limits
Rate limits for the Skytells Edge API and streaming endpoints.
The Edge API (edge.skytells.ai) is designed for low-latency inference and real-time streaming. It operates under separate, more conservative limits to protect the shared edge infrastructure.
Edge vs Standard API
| | Standard API (api.skytells.ai) | Edge API (edge.skytells.ai) |
|---|---|---|
| Optimized for | Batch workloads, async predictions | Low-latency, streaming output |
| RPM limit | Tier-based (25–150+) | Tier-based (see below) |
| Concurrent streams | N/A | Tier-based per account |
| Max stream duration | N/A | 300 seconds |
| Webhooks supported | Yes | No — use streaming directly |
Streaming Limits
Each open stream counts as one concurrent stream slot. Slots are released when the stream closes — either because the prediction completed, was cancelled, or timed out.
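Because each open stream occupies a slot until it closes, it can help to mirror the server-side limit in the client so extra requests queue locally instead of failing at the edge. The sketch below is a hypothetical client-side guard (not part of any official Skytells SDK), sized to your tier's slot count:

```python
import threading
from contextlib import contextmanager


class StreamSlotLimiter:
    """Client-side guard that mirrors the server's concurrent-stream limit.

    Size it to your tier's slot count (e.g. 2 for Tier 1). A slot is held
    while the stream is open and released when it closes, matching the
    server's behavior described above.
    """

    def __init__(self, max_streams: int):
        self._slots = threading.BoundedSemaphore(max_streams)

    @contextmanager
    def stream_slot(self, timeout: float = 30.0):
        # Block until a local slot frees up rather than opening a stream
        # the edge would reject.
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError("no free stream slot within timeout")
        try:
            yield
        finally:
            # Always release, even if the stream errors or times out.
            self._slots.release()
```

Wrapping each stream in `with limiter.stream_slot():` guarantees the slot is freed even when the server closes the stream early.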
Limits by Account Tier
Edge API limits follow the same spend-based tier system as the Standard API, but with more conservative RPM and stream ceilings to protect shared edge infrastructure.
| Tier | Monthly Spend | Edge RPM | Concurrent Streams |
|---|---|---|---|
| Tier 1 | $0 – $100 | 10 | 2 |
| Tier 2 | $100 – $500 | 25 | 5 |
| Tier 3 | $500 – $2,000 | 75 | 10 |
| Tier 4 | $2,000+ | Higher | Higher |
| Enterprise | Per contract | Custom | Custom |
Tier upgrades are applied automatically as your monthly spend crosses each threshold. Enterprise limits are negotiated per contract — contact Skytells Support for details.
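The tier table above can be encoded as a small lookup if you want to anticipate your limits from projected spend. This is an illustrative helper, not an official API; it assumes each spend boundary belongs to the higher tier (the table itself does not specify which side of the boundary applies):

```python
# (spend ceiling in USD, Edge RPM, concurrent streams), from the tier table.
EDGE_TIERS = [
    (100, 10, 2),     # Tier 1: $0 - $100
    (500, 25, 5),     # Tier 2: $100 - $500
    (2000, 75, 10),   # Tier 3: $500 - $2,000
]


def edge_limits(monthly_spend: float):
    """Return (rpm, concurrent_streams) for a given monthly spend.

    Returns None for Tier 4 / Enterprise, where limits are negotiated.
    """
    for ceiling, rpm, streams in EDGE_TIERS:
        if monthly_spend < ceiling:
            return rpm, streams
    return None  # Tier 4+ / Enterprise: contact Skytells Support
```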
Edge Rate Limit Headers
Every Edge API response includes the same rate limit headers as the Standard API, plus stream-specific headers:
```
X-RateLimit-Limit-RPM: 30
X-RateLimit-Remaining-RPM: 22
X-RateLimit-Limit-Streams: 3
X-RateLimit-Remaining-Streams: 2
X-RateLimit-Reset: 1741910220
```
Best Practices for Edge
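Clients can read these headers to decide whether it is safe to open another stream before hitting a limit. A minimal sketch, assuming the header names shown above and integer values (the `should_open_stream` helper is illustrative, not an SDK function):

```python
def parse_edge_rate_headers(headers: dict) -> dict:
    """Extract Edge rate-limit state from a response's headers.

    Assumes the header names shown above and integer string values.
    """
    return {
        "rpm_limit": int(headers["X-RateLimit-Limit-RPM"]),
        "rpm_remaining": int(headers["X-RateLimit-Remaining-RPM"]),
        "streams_limit": int(headers["X-RateLimit-Limit-Streams"]),
        "streams_remaining": int(headers["X-RateLimit-Remaining-Streams"]),
        "reset_epoch": int(headers["X-RateLimit-Reset"]),
    }


def should_open_stream(state: dict) -> bool:
    # Opening a stream consumes both an RPM token and a stream slot,
    # so require headroom on both before starting one.
    return state["rpm_remaining"] > 0 and state["streams_remaining"] > 0
```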
| Practice | Why |
|---|---|
| Close streams as soon as output is complete | Frees concurrent stream slots immediately |
| Do not open speculative streams | Only open a stream when you intend to consume all output |
| Use the Standard API for non-real-time workloads | Reserves edge capacity for latency-sensitive use cases |
| Handle stream closure gracefully | The server may close the stream on timeout, so always handle end events |
Leaving idle streams open unnecessarily will exhaust your concurrent stream slots and block other real-time requests. Always close streams explicitly when done.
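Handling closure gracefully mostly means treating every terminal event as a signal to stop consuming and return, which releases the stream slot. The event names (`output`, `done`, `error`) below are placeholders, since the actual Edge event schema is not specified in this guide:

```python
def consume_stream(events, on_token):
    """Drain a stream of (event, data) pairs until it ends.

    `events` is any iterable of (event, data) tuples, e.g. from a
    streaming client. The server may close the stream on timeout
    (max 300 seconds), so both "done" and "error" are treated as
    terminal and the function always returns, freeing the slot.
    """
    for event, data in events:
        if event == "output":
            on_token(data)
        elif event in ("done", "error"):
            return event  # terminal event: stop consuming immediately
    # The iterator ended without a terminal event, e.g. a server-side
    # timeout or dropped connection.
    return "closed"
```

Returning on the first terminal event, rather than continuing to iterate, is what keeps idle streams from holding a concurrent-stream slot.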