Edge Rate Limits
Rate limits for the Skytells Edge API and streaming endpoints.
The Edge API is designed for low-latency inference and real-time streaming. It operates under separate, more conservative limits to protect the shared edge infrastructure.
Tier ceilings (RPM, concurrent streams) follow the same spend-based tier system as the Standard API — see Account tiers. This page focuses on Edge-specific behavior (streaming, duration). For x-skytells-ratelimit-* headers and 429 handling, see the Rate limits overview.
Edge vs Standard API
| | Standard API | Edge API |
|---|---|---|
| Optimized for | Batch workloads, async predictions | Low-latency, streaming output |
| RPM limit | Tier-based — see Account tiers | Tier-based — see Account tiers |
| Concurrent streams | N/A | Tier-based per account |
| Max stream duration | N/A | 300 seconds |
| Webhooks supported | Yes | No — use streaming directly |
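The 300-second ceiling is per stream, so a long-running consumer should stop reading before the server cuts it off. A minimal client-side guard might look like the sketch below; `consume_with_deadline` and its `events` iterable are illustrative stand-ins, not part of any Skytells client:

```python
import time

MAX_STREAM_SECONDS = 300  # Edge API cap from the table above

def consume_with_deadline(events, limit=MAX_STREAM_SECONDS):
    """Collect stream events, stopping before the server-side cut-off.

    `events` stands in for a real Edge stream iterator.
    """
    start = time.monotonic()
    collected = []
    for event in events:
        collected.append(event)
        if time.monotonic() - start >= limit:
            break  # stop reading; the server would close the stream anyway
    return collected
```

Stopping client-side keeps the end of the stream under your control instead of surfacing as an unexpected server-side close.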
Streaming Limits
Each open stream counts as one concurrent stream slot. Slots are released when the stream closes — either because the prediction completed, was cancelled, or timed out. Exact stream ceilings for your tier are listed on Account tiers.
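Because slots are only returned when a stream closes, it can help to mirror the ceiling client-side so a burst of opens fails fast locally rather than being rejected at the edge. A sketch using a bounded semaphore; the limit of 5 is a placeholder, since your real ceiling is tier-based (see Account tiers):

```python
import threading
from contextlib import contextmanager

class StreamSlots:
    """Client-side guard mirroring the server's concurrent-stream ceiling.

    The default of 5 is a placeholder, not a real tier limit.
    """
    def __init__(self, max_streams=5):
        self._sem = threading.BoundedSemaphore(max_streams)

    @contextmanager
    def stream(self):
        self._sem.acquire()      # take a slot before opening the stream
        try:
            yield
        finally:
            self._sem.release()  # slot freed as soon as the stream closes
```

Wrapping each stream in `with slots.stream(): ...` guarantees the local slot is released even if consumption raises.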
Headers and errors
Edge responses use the same x-skytells-ratelimit-* family as the Standard API — see Response headers (limits and usage). On 429, follow Retry-After and error.details as in the overview.
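Honoring `Retry-After` on a 429 can be as simple as the helpers below. `Retry-After` is the standard HTTP header; the exact `x-skytells-ratelimit-*` suffix used here (`-remaining`) is an assumption for illustration, so check the Response headers reference for the real names:

```python
import time

def retry_after_seconds(headers, default=1.0):
    """Parse the delay a 429 response asks for, falling back to `default`."""
    value = headers.get("Retry-After")
    try:
        return max(0.0, float(value))
    except (TypeError, ValueError):
        return default

def wait_and_note(headers):
    """Sleep out the back-off, returning the remaining-request count if sent.

    The `-remaining` suffix is an assumed member of the
    x-skytells-ratelimit-* family, not a documented name.
    """
    time.sleep(retry_after_seconds(headers))
    remaining = headers.get("x-skytells-ratelimit-remaining")
    return int(remaining) if remaining is not None else None
```

Defaulting to a short delay when the header is missing or malformed keeps the retry loop safe instead of hammering the endpoint.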
Best Practices for Edge
| Practice | Why |
|---|---|
| Close streams as soon as output is complete | Frees concurrent stream slots immediately |
| Do not open speculative streams | Only open a stream when you intend to consume all output |
| Use the Standard API for non-real-time workloads | Reserves edge capacity for latency-sensitive use cases |
| Handle server-side stream closure gracefully | The server may close the stream on timeout; always handle end events |
Leaving idle streams open unnecessarily will exhaust your concurrent stream slots and block other real-time requests. Always close streams explicitly when done.
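One robust pattern for closing explicitly is wrapping every stream in `contextlib.closing`, so `close()` runs even when consumption raises. `FakeStream` below is a stand-in for whatever stream object an Edge client returns; only the iteration-plus-`close()` contract matters:

```python
from contextlib import closing

def drain(stream):
    """Read all output, guaranteeing close() runs even on error."""
    with closing(stream):  # always calls stream.close() on exit
        return list(stream)

class FakeStream:
    """Stand-in for an Edge stream: iterable with a close() method."""
    def __init__(self, events):
        self._events = iter(events)
        self.closed = False

    def __iter__(self):
        return self._events

    def close(self):
        self.closed = True
```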