Model Schemas
Understand how model input and output schemas work, and how they differ across model types.
How Schemas Work
Every model on Skytells publishes a JSON Schema for its input and output. When you create a prediction, the input object you send must conform to the model's input_schema. The response will match the output_schema.
You can retrieve any model's schema programmatically:
```bash
curl https://api.skytells.ai/v1/models \
  -H "x-api-key: YOUR_API_KEY"
```

Or fetch a single model's schema by slug (`GET /v1/models/{slug}`):

```bash
curl "https://api.skytells.ai/v1/models/truefusion?fields=input_schema,output_schema" \
  -H "x-api-key: YOUR_API_KEY"
```

Each model in the response includes `input_schema` and `output_schema` fields.
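If you'd rather fetch a schema from code, the same request can be built with Python's standard library. This is a minimal sketch: the endpoint and header come from the curl examples above, and `YOUR_API_KEY` is a placeholder.

```python
import json
import urllib.request

API_BASE = "https://api.skytells.ai/v1"

def schema_request(slug: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for one model's input/output schemas."""
    url = f"{API_BASE}/models/{slug}?fields=input_schema,output_schema"
    return urllib.request.Request(url, headers={"x-api-key": api_key})

# To actually fetch (requires a valid key and network access):
# model = json.load(urllib.request.urlopen(schema_request("truefusion", "YOUR_API_KEY")))
```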
Input Schema Structure
Input schemas follow JSON Schema conventions:
```json
{
  "type": "object",
  "title": "Input",
  "required": ["prompt"],
  "properties": {
    "prompt": {
      "type": "string",
      "title": "Prompt",
      "description": "Text prompt for generation",
      "x-order": 0
    },
    "aspect_ratio": {
      "type": "string",
      "enum": ["1:1", "16:9", "9:16"],
      "default": "1:1",
      "x-order": 1
    }
  }
}
```

Key fields:
| Field | Meaning |
|---|---|
| `required` | Parameters you must include in your request |
| `type` | Data type: `string`, `integer`, `number`, `boolean`, or `array` |
| `enum` | Fixed set of allowed values |
| `default` | Value used if you omit the parameter |
| `minimum` / `maximum` | Numeric bounds |
| `format` | Special format: `uri` for URLs, `password` for secrets |
| `x-order` | Display ordering hint |
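To catch schema violations before spending credits, you can validate an input payload client-side. The sketch below handles only the fields described above (`required`, `type`, `enum`, `default`); it is illustrative rather than a full JSON Schema validator, so for production use a library such as `jsonschema`.

```python
TYPE_MAP = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
}

def validate_input(schema: dict, payload: dict) -> dict:
    """Check a payload against a model's input_schema and fill in defaults."""
    props = schema.get("properties", {})
    result = dict(payload)
    for name in schema.get("required", []):
        if name not in result:
            raise ValueError(f"missing required parameter: {name}")
    for name, value in result.items():
        spec = props.get(name, {})
        expected = TYPE_MAP.get(spec.get("type"))
        if expected and not isinstance(value, expected):
            raise ValueError(f"{name}: expected {spec['type']}")
        if "enum" in spec and value not in spec["enum"]:
            raise ValueError(f"{name}: must be one of {spec['enum']}")
    # Fill defaults for optional parameters that were omitted
    for name, spec in props.items():
        if name not in result and "default" in spec:
            result[name] = spec["default"]
    return result
```

Against the example schema above, `validate_input(schema, {"prompt": "hi"})` returns the payload with `aspect_ratio` defaulted to `"1:1"`, and an empty payload raises a `ValueError` for the missing `prompt`.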
Schema Complexity by Model Tier
Models range from minimal to highly configurable schemas. Here's how complexity scales:
Minimal schema (2–4 parameters)
Models like TrueFusion, Imagen 3, and Imagen 4 have streamlined schemas—just prompt, aspect_ratio, and maybe a negative_prompt or safety filter. These are ideal when you want fast results with minimal configuration.
```json
{
  "input": {
    "prompt": "A sunset over mountains",
    "aspect_ratio": "16:9"
  }
}
```

Standard schema (8–12 parameters)
Models like TrueFusion Pro, TrueFusion Edge, and Flux.1 Edge add control over generation quality: num_inference_steps, guidance, seed, output_format, and output_quality. You get fine-grained tuning without overwhelming options.
```json
{
  "input": {
    "prompt": "A cyberpunk cityscape at night",
    "aspect_ratio": "21:9",
    "num_inference_steps": 35,
    "guidance": 5,
    "seed": 42,
    "output_format": "png",
    "output_quality": 95
  }
}
```

Advanced schema (12+ parameters)
Models like TrueFusion Standard (with LoRA support), TrueFusion Ultra (with inpainting + style references), and Flux 2 Flex (with 10 reference images + prompt upsampling) expose the full range of creative control.
```json
{
  "input": {
    "prompt": "Portrait in the style of @ref, soft lighting",
    "reference_images": ["https://example.com/style.jpg"],
    "reference_tags": ["ref"],
    "resolution": "1080p",
    "aspect_ratio": "4:3",
    "seed": 12345
  }
}
```

Output Schema Patterns
There are only two output patterns across all models:
Single URL
Most models return a single URL string:
```json
{
  "type": "string",
  "format": "uri"
}
```

Response:
```json
{
  "output": "https://delivery.skytells.ai/abc123.jpg"
}
```

Used by: TrueFusion Max, TrueFusion Ultra, TrueFusion 2.0, all video models, all audio models, Imagen, FLUX.2 Pro.
Array of URLs
Models that support multiple outputs return an array:
```json
{
  "type": "array",
  "items": { "type": "string", "format": "uri" }
}
```

Response:
```json
{
  "output": [
    "https://delivery.skytells.ai/abc123.jpg",
    "https://delivery.skytells.ai/def456.jpg"
  ]
}
```

Used by: TrueFusion, TrueFusion Pro, TrueFusion Edge, TrueFusion Standard, GPT-Image-1.
Always check the output schema of the model you're using. If you expect an array but the model returns a single string (or vice versa), your parsing logic will break.
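One way to make parsing robust to both patterns is to normalize the output to a list before any downstream handling. A minimal sketch (the helper name is ours, not part of the API):

```python
def output_urls(output) -> list:
    """Normalize a prediction's `output` field to a list of URL strings,
    whether the model returns a single URL or an array of URLs."""
    if isinstance(output, str):
        return [output]
    if isinstance(output, list):
        return list(output)
    raise TypeError(f"unexpected output type: {type(output).__name__}")
```

With this in place, code that iterates over `output_urls(prediction["output"])` works unchanged across single-URL and array-output models.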
Type-Specific Schema Patterns
Image models
Common parameters: prompt, aspect_ratio, seed, output_format, output_quality
Advanced parameters vary by model:
- img2img: `image`, `prompt_strength`
- Inpainting: `image`, `mask`
- Reference-based: `reference_images`, `reference_tags`, `image_prompt`, `input_images`
- LoRA: `lora_weights`, `lora_scale`
- Quality control: `num_inference_steps`, `guidance`, `guidance_scale`
- Speed: `go_fast`, `speed_mode`, `megapixels`
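As a concrete illustration, an img2img request might combine the common and advanced parameters like this. Parameter names and value ranges vary by model, so treat this payload as a hypothetical sketch and check the target model's `input_schema` first.

```python
# Hypothetical img2img input; confirm parameter names and ranges
# against the model's input_schema before sending.
img2img_input = {
    "prompt": "Turn this photo into a watercolor painting",
    "image": "https://example.com/photo.jpg",  # source image URL
    "prompt_strength": 0.7,                    # how strongly the prompt overrides the source
    "seed": 42,                                # fixed seed for reproducible results
    "output_format": "png",
}
```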
Video models
Common parameters: prompt, aspect_ratio, duration/seconds
Model-specific patterns:
- Frame control: `start_image`, `end_image`, `last_frame`
- Reference: `reference_images`, `input_reference`
- Audio: `generate_audio` (Veo), `audio` (Wan 2.5)
- Flexibility: `cfg_scale` (lower = more creative)
- Safety: `negative_prompt`, `person_generation`
Audio models
Unique parameters:
- `lyrics`: song lyrics with structure tags (`[Verse]`, `[Chorus]`, etc.)
- `prompt`: style/mood description
- `sample_rate`, `bitrate`, `audio_format`: audio encoding settings
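If you assemble lyrics programmatically, the structure tags are just bracketed section markers inside the string. A small helper (ours, not part of the API) might look like:

```python
def build_lyrics(sections):
    """Join (tag, text) pairs into a tagged lyrics string,
    e.g. [("Verse", "..."), ("Chorus", "...")]."""
    return "\n\n".join(f"[{tag}]\n{text}" for tag, text in sections)

lyrics = build_lyrics([
    ("Verse", "City lights are fading out"),
    ("Chorus", "But we keep on singing loud"),
])
```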
Pricing Models
Different models use different billing units. Understanding these helps you estimate costs accurately.
| Pricing Unit | How it works | Example |
|---|---|---|
| Per image | Flat rate per generated image | TrueFusion: $0.03/image |
| Per second | Billed by output duration | Veo 3.1: $0.43/second |
| Per prediction | Flat rate per API call | Mera: $3.42/prediction |
| Per GPU second | Billed by GPU compute time | TrueFusion Pano: $0.02/GPU s |
| Per computing second | Billed by total compute time | TrueFusion Optima: $0.008/s |
| Per megapixel | Billed by output resolution | FLUX.2 Pro: $0.02/MP |
| Per 5 seconds | Chunked video billing | Video Upscale: $0.10/5s |
Some models have conditional pricing based on input parameters. For example, Veo 3.1 Fast charges differently depending on whether `generate_audio` is enabled. Check the model's `pricing.criterias` field in the API response for the exact rules, which may depend on resolution, audio generation, or other inputs.
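The table above translates directly into cost estimates. The sketch below assumes that chunked billing rounds a partial chunk up to a full chunk; confirm the rounding rule for your model before relying on it.

```python
import math

def per_second_cost(duration_s: float, rate: float) -> float:
    """Duration-billed models, e.g. Veo 3.1 at $0.43/second."""
    return duration_s * rate

def chunked_cost(duration_s: float, rate_per_chunk: float, chunk_s: int) -> float:
    """Chunk-billed models, e.g. Video Upscale at $0.10 per 5 s.
    Assumes a partial chunk is billed as a full chunk."""
    return math.ceil(duration_s / chunk_s) * rate_per_chunk
```

Under these assumptions, an 8-second Veo 3.1 clip costs 8 × $0.43 = $3.44, and a 12-second upscale spans three 5-second chunks at $0.10 each, or $0.30.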
Partner Models
Some models are served through partner APIs (OpenAI, Google). These models have "inference_party": "partner" in their metadata and typically require you to provide your own API key:
```json
{
  "input": {
    "openai_api_key": "sk-...",
    "prompt": "A watercolor painting of a cottage"
  }
}
```

Partner-served models: GPT-Image-1, Sora 2, Sora 2 Pro
The API key field uses "format": "password" and "x-cog-secret": true — it is never logged or stored by Skytells.