Edge Rate Limits
Rate limits for the Skytells Edge API and streaming endpoints.
The Edge API (edge.skytells.ai) is designed for low-latency inference and real-time streaming. It operates under separate, more conservative limits to protect the shared edge infrastructure.
Edge vs Standard API
| | Standard API (api.skytells.ai) | Edge API (edge.skytells.ai) |
|---|---|---|
| Optimized for | Batch workloads, async predictions | Low-latency, streaming output |
| RPM limit | Tier-based (25–150+) | Tier-based (see below) |
| Concurrent streams | N/A | Tier-based per account |
| Max stream duration | N/A | 300 seconds |
| Webhooks supported | Yes | No — use streaming directly |
Streaming Limits
Each open stream counts as one concurrent stream slot. Slots are released when the stream closes — either because the prediction completed, was cancelled, or timed out.
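Because each open stream occupies a slot until it closes, it can help to mirror the server-side limit in the client so extra requests queue locally instead of failing at the edge. The sketch below is a hypothetical client-side guard (not part of any official Skytells SDK), sized to your tier's slot count:

```python
import threading
from contextlib import contextmanager


class StreamSlotLimiter:
    """Client-side guard that mirrors the server's concurrent-stream limit.

    Size it to your tier's slot count (e.g. 2 for Tier 1). A slot is held
    while the stream is open and released when it closes, matching the
    server's behavior described above.
    """

    def __init__(self, max_streams: int):
        self._slots = threading.BoundedSemaphore(max_streams)

    @contextmanager
    def stream_slot(self, timeout: float = 30.0):
        # Block until a local slot frees up rather than opening a stream
        # the edge would reject.
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError("no free stream slot within timeout")
        try:
            yield
        finally:
            # Always release, even if the stream errors or times out.
            self._slots.release()
```

Wrapping each stream in `with limiter.stream_slot():` guarantees the slot is freed even when the server closes the stream early.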
Limits by Account Tier
Edge API limits follow the same spend-based tier system as the Standard API, but with more conservative RPM and stream ceilings to protect shared edge infrastructure.
| Tier | Monthly Spend | Edge RPM | Concurrent Streams |
|---|---|---|---|
| Tier 1 | $0 – $100 | 10 | 2 |
| Tier 2 | $100 – $500 | 25 | 5 |
| Tier 3 | $500 – $2,000 | 75 | 10 |
| Tier 4 | $2,000+ | Higher | Higher |
| Enterprise | Per contract | Custom | Custom |
Tier upgrades are applied automatically as your monthly spend crosses each threshold. Enterprise limits are negotiated per contract — contact Skytells Support for details.
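The tier table above can be encoded as a small lookup if you want to anticipate your limits from projected spend. This is an illustrative helper, not an official API; it assumes each spend boundary belongs to the higher tier (the table itself does not specify which side of the boundary applies):

```python
# (spend ceiling in USD, Edge RPM, concurrent streams), from the tier table.
EDGE_TIERS = [
    (100, 10, 2),     # Tier 1: $0 - $100
    (500, 25, 5),     # Tier 2: $100 - $500
    (2000, 75, 10),   # Tier 3: $500 - $2,000
]


def edge_limits(monthly_spend: float):
    """Return (rpm, concurrent_streams) for a given monthly spend.

    Returns None for Tier 4 / Enterprise, where limits are negotiated.
    """
    for ceiling, rpm, streams in EDGE_TIERS:
        if monthly_spend < ceiling:
            return rpm, streams
    return None  # Tier 4+ / Enterprise: contact Skytells Support
```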
Edge Rate Limit Headers
Every Edge API response includes the same rate limit headers as the Standard API, plus stream-specific headers:
```
X-RateLimit-Limit-RPM: 30
X-RateLimit-Remaining-RPM: 22
X-RateLimit-Limit-Streams: 3
X-RateLimit-Remaining-Streams: 2
X-RateLimit-Reset: 1741910220
```
Best Practices for Edge
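Clients can read these headers to decide whether it is safe to open another stream before hitting a limit. A minimal sketch, assuming the header names shown above and integer values (the `should_open_stream` helper is illustrative, not an SDK function):

```python
def parse_edge_rate_headers(headers: dict) -> dict:
    """Extract Edge rate-limit state from a response's headers.

    Assumes the header names shown above and integer string values.
    """
    return {
        "rpm_limit": int(headers["X-RateLimit-Limit-RPM"]),
        "rpm_remaining": int(headers["X-RateLimit-Remaining-RPM"]),
        "streams_limit": int(headers["X-RateLimit-Limit-Streams"]),
        "streams_remaining": int(headers["X-RateLimit-Remaining-Streams"]),
        "reset_epoch": int(headers["X-RateLimit-Reset"]),
    }


def should_open_stream(state: dict) -> bool:
    # Opening a stream consumes both an RPM token and a stream slot,
    # so require headroom on both before starting one.
    return state["rpm_remaining"] > 0 and state["streams_remaining"] > 0
```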
| Practice | Why |
|---|---|
| Close streams as soon as output is complete | Frees concurrent stream slots immediately |
| Do not open speculative streams | Only open a stream when you intend to consume all output |
| Use the Standard API for non-real-time workloads | Reserves edge capacity for latency-sensitive use cases |
| Handle stream closure gracefully | The server may close the stream on timeout, so always handle end events |
Leaving idle streams open unnecessarily will exhaust your concurrent stream slots and block other real-time requests. Always close streams explicitly when done.
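Handling closure gracefully mostly means treating every terminal event as a signal to stop consuming and return, which releases the stream slot. The event names (`output`, `done`, `error`) below are placeholders, since the actual Edge event schema is not specified in this guide:

```python
def consume_stream(events, on_token):
    """Drain a stream of (event, data) pairs until it ends.

    `events` is any iterable of (event, data) tuples, e.g. from a
    streaming client. The server may close the stream on timeout
    (max 300 seconds), so both "done" and "error" are treated as
    terminal and the function always returns, freeing the slot.
    """
    for event, data in events:
        if event == "output":
            on_token(data)
        elif event in ("done", "error"):
            return event  # terminal event: stop consuming immediately
    # The iterator ended without a terminal event, e.g. a server-side
    # timeout or dropped connection.
    return "closed"
```

Returning on the first terminal event, rather than continuing to iterate, is what keeps idle streams from holding a concurrent-stream slot.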