Running Custom Models on Skytells

Skytells offers CPU and GPU hardware tiers for dedicated Enterprise inference, with per-second billing and both standard and multi-GPU options. See Enterprise inference for endpoints and networking.

Overview

This page lists supported hardware for workloads that run on Skytells-managed compute—including Enterprise dedicated inference deployments. Dedicated inference uses the same hardware catalog as the rest of the platform; billing is per second of active processing (idle time is not charged). For list pricing, see Hardware Pricing.
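
Because billing is per second of active processing, estimating cost is a single multiplication. A minimal sketch follows; the rates in it are made-up placeholders for illustration, not Skytells list prices (those are on Hardware Pricing):

```python
# Per-second billing sketch. Rates below are illustrative assumptions,
# not real Skytells prices; see the Hardware Pricing page for list rates.
HYPOTHETICAL_RATES_PER_SECOND = {
    "cpu": 0.000100,             # assumption: placeholder rate
    "gpu-t4": 0.000225,          # assumption: placeholder rate
    "gpu-a100-large": 0.001400,  # assumption: placeholder rate
}

def billed_cost(hardware_id: str, active_seconds: float) -> float:
    """Cost for active processing time only; idle time is not charged."""
    return HYPOTHETICAL_RATES_PER_SECOND[hardware_id] * active_seconds

# Example: 90 seconds of active processing on an A100 tier.
print(f"${billed_cost('gpu-a100-large', 90):.4f}")
```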

If you are defining endpoints, private networking, or how to request a deployment, start with Enterprise inference.

Standard hardware

These tiers are generally available. The ID column is how hardware is referenced in billing and in the Console; the sketch after the table shows where an ID might appear in a deployment request.

| Hardware | ID | GPU | CPU | GPU RAM | RAM |
|---|---|---|---|---|---|
| CPU (Small) | cpu-small | | 1x | | 2 GB |
| CPU | cpu | | 4x | | 8 GB |
| Nvidia T4 GPU | gpu-t4 | 1x | 4x | 16 GB | 16 GB |
| Nvidia L40S GPU | gpu-l40s | 1x | 10x | 48 GB | 65 GB |
| 2x Nvidia L40S GPU | gpu-l40s-2x | 2x | 20x | 96 GB | 144 GB |
| Nvidia A100 (80 GB) GPU | gpu-a100-large | 1x | 10x | 80 GB | 144 GB |
| 2x Nvidia A100 (80 GB) GPU | gpu-a100-large-2x | 2x | 20x | 160 GB | 288 GB |
| Nvidia H100 GPU | gpu-h100 | 1x | 13x | 80 GB | 72 GB |
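
This page does not document the exact request shape for selecting a tier. As a purely hypothetical illustration, assuming a placeholder endpoint URL and payload fields (name, hardware, min_instances), a deployment request referencing a tier by its ID might look like this:

```python
import os
import requests

# Hypothetical deployment request. The endpoint URL and payload fields are
# illustrative assumptions, not a documented Skytells API; the point is that
# hardware is selected by the tier ID from the table above.
payload = {
    "name": "my-model",              # assumption: illustrative field
    "hardware": "gpu-a100-large",    # tier ID from the ID column above
    "min_instances": 1,              # assumption: illustrative field
}
resp = requests.post(
    "https://api.skytells.example/v1/deployments",  # assumption: placeholder URL
    json=payload,
    headers={"Authorization": f"Bearer {os.environ['SKYTELLS_API_TOKEN']}"},
)
resp.raise_for_status()
print(resp.json())
```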

Additional multi-GPU hardware

Larger multi-GPU shapes are available with committed spend contracts. Contact Support for availability and custom pricing.

| Hardware | ID |
|---|---|
| 4x Nvidia A100 (80 GB) GPU | gpu-a100-large-4x |
| 8x Nvidia A100 (80 GB) GPU | gpu-a100-large-8x |
| 2x Nvidia H100 GPU | gpu-h100-2x |
| 4x Nvidia H100 GPU | gpu-h100-4x |
| 8x Nvidia H100 GPU | gpu-h100-8x |
| 4x Nvidia L40S GPU | gpu-l40s-4x |
| 8x Nvidia L40S GPU | gpu-l40s-8x |

Per-second list pricing for these tiers (where applicable) is on Hardware Pricing. Enterprise deployments may use a subset of tiers depending on model size, latency targets, and contract terms; Sales can align hardware to your workload. The sketch below shows one rough way to map model size to a tier.
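
A back-of-the-envelope sizing sketch: estimate the GPU RAM needed to hold model weights and pick the smallest tier from the tables above that fits. The 1.2x headroom factor for activations and KV cache is a common rule of thumb, an assumption rather than a Skytells sizing guarantee:

```python
# Rough tier selection by model size. The headroom factor is an assumed
# rule of thumb, not a Skytells guarantee; always validate with Sales.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

# (tier ID, total GPU RAM in GB) from the standard hardware table above
# (subset for illustration).
TIERS = [
    ("gpu-t4", 16),
    ("gpu-l40s", 48),
    ("gpu-a100-large", 80),
    ("gpu-l40s-2x", 96),
    ("gpu-a100-large-2x", 160),
]

def smallest_fitting_tier(params_billion: float, precision: str = "fp16") -> str:
    weights_gb = params_billion * BYTES_PER_PARAM[precision]
    needed_gb = weights_gb * 1.2  # assumed headroom for activations/KV cache
    for tier_id, ram_gb in TIERS:
        if ram_gb >= needed_gb:
            return tier_id
    raise ValueError("Needs a larger multi-GPU shape; contact Sales.")

# Example: a 13B-parameter model in fp16 needs ~31 GB, so gpu-l40s (48 GB) fits.
print(smallest_fitting_tier(13))
```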

