Enterprise Deployments
Unlike standard endpoints, Enterprise customers get fully dedicated endpoints secured and managed by Skytells—reachable privately via Skytells Private Network or over the internet, per org settings. Contact Sales for a custom deployment or endpoint.
What this is
Most customers use the shared Skytells API: one public surface, shared capacity.
Enterprise inference means Skytells runs your models—or models we set up for you—on infrastructure reserved for your organization. You still use the normal Inference APIs (Chat, Responses, Embeddings). The difference is where the request runs and who else shares it: your traffic goes to your environment, not the general pool.
For CPU and GPU tiers, billing, and hardware IDs used with dedicated inference, see Enterprise compute.
Standard vs Enterprise
Unlike the standard inference endpoints—the shared surface used by most customers—Enterprise customers receive fully dedicated endpoints: inference URLs and routing reserved for your organization, not shared with other tenants.
Those dedicated endpoints are secured and managed by Skytells. Skytells operates the stack, applies platform security controls, and keeps the service current; you integrate against the endpoint, not raw infrastructure.
Network access is configured per user or organization settings:
- Private — The endpoint is reachable only on Skytells Private Network. Traffic does not traverse the public internet, which supports stricter isolation and compliance requirements.
- Internet — The same dedicated endpoint can be exposed on the public internet when your policy and contract allow it, so client applications can call it from outside Skytells’s private network.
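One way this choice can surface in client code is a small selector keyed off org settings. A sketch only — the environment variable name and both base URLs below are hypothetical; real dedicated URLs come from your contract:

```python
import os

# Hypothetical base URLs for illustration; real dedicated URLs come from your contract.
PRIVATE_BASE = "https://inference.internal.skytells.private"  # reachable only on Skytells Private Network
INTERNET_BASE = "https://your-org.dedicated.skytells.ai"      # public internet, if your org enables it

def select_base_url(network_mode: str) -> str:
    """Pick the dedicated endpoint to call based on org network settings."""
    if network_mode == "private":
        return PRIVATE_BASE
    if network_mode == "internet":
        return INTERNET_BASE
    raise ValueError(f"unknown network mode: {network_mode}")

# Default to the private path unless the org has opted into internet exposure.
base_url = select_base_url(os.environ.get("SKYTELLS_NETWORK_MODE", "private"))
```

Keeping the selection in one place makes it easy to flip a deployment between private-only and internet-exposed without touching call sites.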
How private connectivity differs from the internet
Across major cloud private link designs (for example AWS PrivateLink, Azure Private Link), the pattern is the same in spirit: a private endpoint appears in your network as a private IP (an interface your workloads call). Traffic to the service rides the provider’s private network—not the public internet. At the provider edge, flows are often source-mapped through a NAT or managed address pool so the backend service sees stable, provider-scoped addresses while your internal IPs stay inside your network. Skytells Private Network follows this idea: no public internet hop for the private path; routing and address mapping are operated by Skytells on your behalf.
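The source-mapping idea can be illustrated with a toy NAT table — purely conceptual, with made-up addresses; the real mapping is operated by Skytells and is invisible to clients:

```python
# Toy illustration of source NAT at the provider edge: the backend service
# sees stable, provider-scoped addresses, never your internal IPs.
nat_pool = ["100.64.0.10", "100.64.0.11"]  # provider-managed address pool (made-up)
nat_table: dict[str, str] = {}

def source_map(internal_ip: str) -> str:
    """Map an internal source IP to a stable provider-scoped address."""
    if internal_ip not in nat_table:
        # Assign the next pool address; real NAT also tracks ports and flows.
        nat_table[internal_ip] = nat_pool[len(nat_table) % len(nat_pool)]
    return nat_table[internal_ip]

# Repeated requests from the same workload reuse the same mapped address,
# so the backend sees a stable source while internal IPs stay private.
first = source_map("10.0.1.5")
second = source_map("10.0.1.5")
```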
Private path — from your app, requests travel over Skytells Private Network to your dedicated inference endpoint; no hop touches the public internet.
Internet path — the same dedicated inference surface is reachable over the public internet when your org enables it; clients call the dedicated base URL as they would any HTTPS API.
| | Shared (default) | Enterprise |
|---|---|---|
| Endpoint model | Shared inference surface | Dedicated endpoints for your org |
| Operations | Skytells-managed shared pool | Secured and managed by Skytells — dedicated to you |
| Network | Public internet | Skytells Private Network only, internet, or both — per org / user settings |
| Base URL | Standard api.skytells.ai surface | Dedicated base URL for your deployment |
Models and API surface
Your dedicated base URL exposes the same Inference API as the shared platform—Chat, Responses, and Embeddings—with the same paths, schemas, streaming behavior, and error model; only the host (and thus base_url in clients) changes. For operations, references, and SDK usage, see Inference and Authentication. Runtime hardware is covered under Enterprise compute.
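Because only the host changes, moving to a dedicated deployment is typically a one-line configuration change. A minimal sketch — the dedicated hostname and the API path shown are hypothetical; use the values from your contract and the SDK docs:

```python
def chat_url(base_url: str, path: str = "/v1/chat/completions") -> str:
    """Same API path on either surface; only the host differs.

    The path here is illustrative, not a confirmed Skytells route.
    """
    return base_url.rstrip("/") + path

shared = chat_url("https://api.skytells.ai")
dedicated = chat_url("https://your-org.dedicated.skytells.ai")  # hypothetical dedicated host
```

Everything else — request schemas, streaming, error handling — stays identical, so existing client code continues to work against the dedicated base URL.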
Any supported model in our catalog can be deployed on Enterprise infrastructure for your organization. Beyond that, an enterprise or business may request a custom deployment—bring-your-own weights, fine-tunes, or other Skytells-provisioned stacks—scoped to your account. What appears on your endpoint is reflected in the Models API for keys authorized to use it, and is agreed in your contract.
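Conceptually, the Models API filters the catalog down to what a given key is authorized for: public catalog models plus any custom deployments scoped to that key's organization. A toy sketch — the field names and model IDs below are invented for illustration:

```python
def visible_models(catalog: list[dict], key_org: str) -> list[str]:
    """Return model IDs a key can see: public models plus org-scoped custom deployments."""
    return [
        m["id"]
        for m in catalog
        if m.get("public") or m.get("org") == key_org
    ]

# Hypothetical catalog mixing a public model with an org-scoped custom deployment.
catalog = [
    {"id": "skytells-chat-1", "public": True},
    {"id": "acme-finetune-7b", "public": False, "org": "acme"},
]
```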
Request a deployment or endpoint
Custom deployments and dedicated endpoints are not available through self-serve signup. To request a custom model deployment or a dedicated endpoint, contact Sales. The team will walk through your model, networking (private vs internet), hardware, and compliance requirements.
Related
- Enterprise compute — hardware tiers, GPU options, and billing context for dedicated inference
- Inference API — Chat, Responses, and Embeddings overview
- Models API — which models your key can see