GPU Infrastructure
Skytells provides on-demand and reserved GPU compute across a global network — from individual H100 and A100 nodes to distributed multi-GPU configurations spanning multiple regions. Purpose-built for AI training, inference, and large-scale parallel workloads.
Skytells GPU Infrastructure gives you access to high-performance accelerated compute — provisioned on demand, distributed globally, and managed from the same Console as the rest of your infrastructure.
GPU instances are first-class virtual machines on the Skytells infrastructure layer. They follow the same networking, VPC, and firewall model as CPU instances but are provisioned on hardware with dedicated GPU allocation — no shared GPU time, no fractional access. The GPU is yours for the lifetime of the instance.
The full product overview is at skytells.ai/infrastructure.
Available GPU hardware
Skytells operates a global network of accelerated nodes spanning multiple GPU hardware generations:
| GPU | Architecture | VRAM | Best for |
|---|---|---|---|
| NVIDIA H100 | Hopper | 80 GB HBM3 | Large model training, LLM fine-tuning, high-throughput inference |
| NVIDIA A100 | Ampere | 40 GB / 80 GB HBM2e | Training, mixed-precision inference, scientific simulation |
| NVIDIA A10 | Ampere | 24 GB GDDR6 | Inference at scale, image generation, video processing |
| NVIDIA L40S | Ada Lovelace | 48 GB GDDR6 | Production inference, diffusion models, real-time generation |
Hardware availability varies by region. The instance creation form shows which GPU types are available in each region at the time of provisioning.
GPU node availability reflects real inventory. If a specific GPU type is not listed for your preferred region, select an adjacent region or check back — inventory is continuously expanded across the Skytells global network.
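The hardware table above can be captured as a small lookup for sanity-checking GPU choices against a model's memory needs. The VRAM figures come from the table; the helper function itself is illustrative, not part of any Skytells API:

```python
# VRAM figures (GB) taken from the hardware table above.
GPU_SPECS = {
    "H100": {"architecture": "Hopper", "vram_gb": 80},
    "A100-80": {"architecture": "Ampere", "vram_gb": 80},
    "A100-40": {"architecture": "Ampere", "vram_gb": 40},
    "L40S": {"architecture": "Ada Lovelace", "vram_gb": 48},
    "A10": {"architecture": "Ampere", "vram_gb": 24},
}

def gpus_with_headroom(required_vram_gb: float) -> list[str]:
    """Return GPU types whose VRAM meets the requirement, largest first."""
    fits = [name for name, spec in GPU_SPECS.items()
            if spec["vram_gb"] >= required_vram_gb]
    return sorted(fits, key=lambda n: -GPU_SPECS[n]["vram_gb"])

# A 7B-parameter model in fp16 needs roughly 14 GB for weights alone,
# plus headroom for KV cache and activations.
print(gpus_with_headroom(24))
```

As a rule of thumb, size for weights plus cache and activation headroom, then pick the smallest GPU that clears that bar in your target region.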
Global distribution
Skytells GPU nodes are distributed across its global network. This means:
- Low-latency inference — deploy GPU instances in the region closest to your users or your downstream systems. A model running in a regional node serves requests with minimal round-trip time.
- Training fault tolerance — distribute training workloads across nodes in different regions to reduce exposure to a single point of failure or localized hardware contention.
- Multi-region pipelines — a generation workload in one region can feed output to a serving instance in another over Skytells' network backbone, without traversing the public internet when both instances are in the same VPC or connected via private routing.
To provision in a specific region, select it during instance creation. GPU hardware options are filtered to what is available in that region.
Why dedicated GPU instances
GPU resources on shared infrastructure are unpredictable. Shared GPU pools, fractional allocations, and burstable GPU access are suitable for exploration — not for production model workloads where latency, throughput, and memory headroom directly affect user experience.
Skytells GPU instances are fully dedicated:
- The entire GPU (or set of GPUs on multi-GPU nodes) is allocated exclusively to your instance.
- VRAM is not shared with other tenants.
- No noisy-neighbour effects on compute performance.
- You control the software stack: CUDA version, driver version, frameworks, libraries.
Deploying a GPU instance
GPU instances are deployed through the same workflow as standard instances. The GPU is selected during the size and hardware selection step.
- Open the Skytells Console and navigate to Infrastructure → All Instances.
- Click New Instance.
- Select a region. Regions with GPU availability will show accelerated hardware options.
- In the size selection, choose a GPU instance type (H100, A100, A10, L40S, or other available options).
- Select the number of GPUs if multi-GPU nodes are available in that region.
- Select or create a VPC for private network placement.
- Attach a firewall group to control inbound access (e.g., expose only the inference API port, block all else).
- Confirm provisioning. The GPU instance reaches Running status in seconds.
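The Console flow above is the documented path. For teams that script provisioning, the selections map naturally onto a request body like the following — a minimal sketch in which the field names and values (`region`, `gpu_type`, `vpc_id`, and so on) are assumptions for illustration, not the real Skytells API schema:

```python
# Hypothetical provisioning payload mirroring the Console steps above.
# Field names are illustrative -- consult the Skytells API reference
# for the actual schema.
def build_gpu_instance_request(region: str, gpu_type: str, gpu_count: int,
                               vpc_id: str, firewall_group_id: str) -> dict:
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    return {
        "region": region,
        "size": {"gpu_type": gpu_type, "gpu_count": gpu_count},
        "vpc_id": vpc_id,                        # private network placement
        "firewall_group_id": firewall_group_id,  # inbound access control
    }

payload = build_gpu_instance_request("us-east", "H100", 2,
                                     "vpc-training", "fw-inference-only")
```

Validating the GPU count before submission mirrors the Console, which only offers counts available in the selected region.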
GPU instances cost more than CPU instances. Confirm the GPU type and count before provisioning. To avoid unnecessary spend, stop or decommission instances when workloads are not running.
Networking GPU instances
GPU instances participate in VPCs and firewall groups identically to CPU instances:
- Private GPU-to-GPU communication — place multiple GPU instances in the same VPC. They communicate over private IP, which is essential for distributed training where GPUs exchange gradients at high bandwidth.
- Inference API exposure — attach a firewall group that opens only your inference server's port (e.g., `8080` or `443`) to the internet. Leave all other ports closed.
- Worker-only nodes — GPU training workers that do not serve public traffic should have no public inbound rules at all. All orchestration traffic flows over the private VPC.
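These two postures can be expressed as data. The rule structure below is illustrative, not the Console's actual firewall schema — it just makes the deny-by-default stance concrete:

```python
def inference_only_rules(port: int = 8080) -> list[dict]:
    """Inbound rules exposing a single inference port; every other port
    stays closed because nothing else is explicitly opened."""
    return [{
        "direction": "inbound",
        "protocol": "tcp",
        "port": port,
        "source": "0.0.0.0/0",  # public clients
    }]

def worker_only_rules() -> list[dict]:
    """Training workers get no public inbound rules at all; orchestration
    traffic stays on the private VPC."""
    return []
```

The worker posture is simply an empty inbound rule set — the safest configuration for any node that never serves public traffic.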
Common workload patterns
LLM fine-tuning
Run supervised fine-tuning or RLHF on H100 or A100 nodes. Provision one or more GPU instances inside a VPC, write your training data to object storage, and run your training job directly on the machine.
Model inference serving
Deploy a GPU instance in the region closest to your users. Serve your model via a framework like vLLM, TGI, or Triton. Attach a firewall group that exposes only the inference port. Use a Skytells domain or your own DNS to front the instance.
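Frameworks like vLLM expose an OpenAI-compatible HTTP API, so a client request is just a JSON body posted to the instance. The instance address, port, and model name below are placeholders for your deployment:

```python
import json

def chat_request(prompt: str, model: str = "my-finetuned-model") -> str:
    """Build an OpenAI-style chat completion body, as served by vLLM."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return json.dumps(body)

# POST this to http://<instance-ip>:8080/v1/chat/completions -- the same
# port you opened in the instance's firewall group.
req = chat_request("Summarize today's logs.")
```

Because only the inference port is open, this endpoint is the instance's entire public surface.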
Distributed training
Provision multiple GPU instances in the same VPC. Use the private IP addresses for inter-node communication (NCCL, GLOO). No public routing required for the training communication plane.
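For a multi-node NCCL job inside a VPC, each node's launch command points at rank 0's private IP as the rendezvous address, so the training plane never touches public routing. A sketch assuming PyTorch's `torchrun` launcher; the IPs and script name are placeholders:

```python
def torchrun_cmd(private_ips: list[str], node_rank: int,
                 gpus_per_node: int, script: str,
                 port: int = 29500) -> str:
    """Build the torchrun command for one node of a multi-node job.
    private_ips[0] is the rank-0 node's private VPC address."""
    return (f"torchrun --nnodes={len(private_ips)} "
            f"--nproc_per_node={gpus_per_node} "
            f"--node_rank={node_rank} "
            f"--master_addr={private_ips[0]} "
            f"--master_port={port} {script}")

# Command for the second node (rank 1) of a two-node, 8-GPU-per-node job.
cmd = torchrun_cmd(["10.0.0.4", "10.0.0.5"], node_rank=1,
                   gpus_per_node=8, script="train.py")
```

Every node runs the same command with only `--node_rank` changed; NCCL then discovers peers over the private addresses.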
Global inference network
Deploy inference replicas in multiple regions. Route users to the nearest replica via DNS-based geo-routing or a load balancer. Each regional instance operates independently on dedicated GPU hardware.
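The routing decision reduces to "lowest latency among regions that actually have a replica." A minimal sketch — the region names and millisecond figures are illustrative, not measured Skytells data:

```python
# Approximate client-zone -> region latencies (ms); illustrative only.
LATENCY_MS = {
    "eu-client": {"eu-west": 12, "us-east": 85, "ap-south": 140},
    "us-client": {"eu-west": 80, "us-east": 10, "ap-south": 190},
}

def nearest_replica(client_zone: str, deployed_regions: set[str]) -> str:
    """Pick the deployed region with the lowest latency for this client."""
    candidates = LATENCY_MS[client_zone]
    return min((r for r in candidates if r in deployed_regions),
               key=candidates.get)

# If ap-south has no replica yet, EU clients still land on eu-west.
print(nearest_replica("eu-client", {"eu-west", "us-east"}))
```

Geo-DNS or a load balancer performs this selection in practice; because each replica owns its GPU, a region dropping out of the deployed set degrades latency but not correctness.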
Real-time generation pipelines
Run image, video, or audio generation models on A10 or L40S nodes optimized for throughput. Chain generation nodes over a VPC for pipeline stages (prompt → image → upscale → delivery).
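The prompt → image → upscale → delivery chain is function composition across nodes. In the sketch below each stage body is a stand-in for a real model call running on its own GPU node, with handoff over the private VPC; the CDN URL is a placeholder:

```python
def generate(prompt: str) -> dict:
    """Stage 1: text-to-image (stand-in for the real model call)."""
    return {"prompt": prompt, "image": f"raw({prompt})"}

def upscale(artifact: dict) -> dict:
    """Stage 2: upscaling on a separate node."""
    return {**artifact, "image": f"upscaled({artifact['image']})"}

def deliver(artifact: dict) -> str:
    """Stage 3: publish to delivery (hypothetical CDN path)."""
    return f"https://cdn.example/out/{artifact['prompt']}.png"

def pipeline(prompt: str) -> str:
    return deliver(upscale(generate(prompt)))
```

Keeping each stage stateless makes it easy to scale an individual bottleneck stage (often upscaling) by adding nodes without touching the others.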
GPU instances and Eve
Eve executes agentic tasks — including AI generation — by invoking Skytells model infrastructure. When Eve runs a prediction (image generation, music composition, video creation), the underlying compute is Skytells GPU infrastructure running the relevant model. You do not need to provision a GPU instance to use Eve; Eve routes to the platform's model layer automatically.
GPU instances are for when you need direct access to the hardware — your own models, your own software stack, your own data.
Availability guarantees
- GPU nodes are on-demand — provisioned immediately when inventory is available in the selected region.
- Skytells continuously adds GPU capacity to its global network. Regions with constrained availability expand on a rolling basis.
- The control plane (Console management, metrics, status) is independent of the instance OS and remains available even during instance OS-level issues.
- For training workloads requiring guaranteed capacity, use reserved instances where available — contact Skytells for enterprise GPU reservation.
Related
- Instances — general VM provisioning, detail pages, and lifecycle management.
- VPCs — private networking for GPU node clusters.
- Firewalls — access control for GPU inference endpoints.
- Eve — the Skytells agent that uses Skytells GPU infrastructure for model-driven tasks.
- Predictions — managed AI inference via the Skytells platform, without provisioning your own GPU instance.
- Security — security practices that apply to all Skytells infrastructure.