Intermediate · 25 min · Module 3 of 3

Audio with BeatFusion

Generate music and audio tracks with BeatFusion 2.0. Combine AI audio with your AI videos to build complete multimedia content pipelines.

BeatFusion overview

BeatFusion is Skytells' audio generation model family. It generates original music, sound effects, and ambient audio from a text description.

Model           Cost        Quality   Best for
beatfusion-2.0  $0.75/pred  High      Production use, music
beatfusion-1.0  $0.45/pred  Standard  Prototyping, ambient audio
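Budgeting a batch of tracks against the per-prediction prices above is simple multiplication:

```python
# Per-prediction prices from the table above.
tracks = 20
print(f"beatfusion-2.0: ${tracks * 0.75:.2f}")  # beatfusion-2.0: $15.00
print(f"beatfusion-1.0: ${tracks * 0.45:.2f}")  # beatfusion-1.0: $9.00
```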

Your first audio prediction

curl -X POST https://api.skytells.ai/v1/predictions \
  -H "x-api-key: $SKYTELLS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "beatfusion-2.0",
    "input": {
      "prompt": "Upbeat corporate background music, light piano and strings, professional, 120 BPM",
      "duration_seconds": 30
    }
  }'

When the prediction succeeds, its output field contains an audio file URL (.mp3 or .wav).
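For reference, a completed prediction response looks roughly like this (the id and URL are placeholders; the field names match the polling code later in this module):

```json
{
  "id": "pred_abc123",
  "status": "succeeded",
  "output": ["https://example.storage/track.mp3"]
}
```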

BeatFusion input parameters

Parameter         Type    Description
prompt            string  Description of the desired audio
duration_seconds  int     Length in seconds (5–120)
seed              int     Reproducibility seed
bpm               int     Beats per minute (optional)
key               string  Musical key, e.g. "C major" (optional)
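Combining the optional parameters, a full request body might look like this (all values illustrative):

```json
{
  "model": "beatfusion-2.0",
  "input": {
    "prompt": "Warm jazz trio, brushed drums, upright bass, late-night lounge",
    "duration_seconds": 45,
    "seed": 1234,
    "bpm": 100,
    "key": "A minor"
  }
}
```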

Writing effective audio prompts

BeatFusion responds well to prompts that specify:

  1. Genre or style — jazz, cinematic, lo-fi, EDM, ambient
  2. Instruments — piano, acoustic guitar, synths, strings, drums
  3. Mood/energy — relaxing, intense, uplifting, dark
  4. Tempo — slow, medium, 90 BPM, 140 BPM
  5. Use case — background music, video intro, podcast intro
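One way to keep prompts consistent across a batch is to assemble them from these five elements programmatically. A minimal sketch (the helper name is our own, not part of the API):

```python
def build_audio_prompt(genre: str, instruments: list[str], mood: str,
                       tempo: str, use_case: str) -> str:
    """Join the five prompt elements into one comma-separated prompt."""
    return ", ".join([genre, *instruments, mood, tempo, use_case])

prompt = build_audio_prompt(
    genre="lo-fi hip hop beat",
    instruments=["muted piano chords", "soft kick drum"],
    mood="relaxing",
    tempo="85 BPM",
    use_case="study music",
)
print(prompt)
# lo-fi hip hop beat, muted piano chords, soft kick drum, relaxing, 85 BPM, study music
```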

Examples

"Cinematic orchestral score, strings and brass swelling to a dramatic climax, 
 epic film trailer style, 80 BPM"

"Lo-fi hip hop beat, vinyl crackle, muted piano chords, soft kick drum, 
 relaxing study music, 85 BPM"

"Tech startup product demo music, upbeat electronic, clean synths, 
 confident and modern, 125 BPM"

"Ambient meditation music, soft bells, nature sounds, slow evolving pads, 
 no percussion, peaceful"

Python: generate audio

import time
import urllib.request
import json
import os

API_KEY = os.environ["SKYTELLS_API_KEY"]
BASE = "https://api.skytells.ai/v1"

def generate_audio(prompt: str, duration_seconds: int = 30) -> str:
    req = urllib.request.Request(
        f"{BASE}/predictions",
        data=json.dumps({
            "model": "beatfusion-2.0",
            "input": {
                "prompt": prompt,
                "duration_seconds": duration_seconds,
            },
        }).encode(),
        headers={
            "x-api-key": API_KEY,
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        prediction = json.loads(resp.read())

    prediction_id = prediction["id"]
    while prediction["status"] not in ("succeeded", "failed", "canceled"):
        time.sleep(3)
        req = urllib.request.Request(
            f"{BASE}/predictions/{prediction_id}",
            headers={"x-api-key": API_KEY},
        )
        with urllib.request.urlopen(req) as resp:
            prediction = json.loads(resp.read())

    if prediction["status"] != "succeeded":
        raise RuntimeError(prediction.get("error"))

    return prediction["output"][0]  # audio file URL

audio_url = generate_audio(
    "Lo-fi study music, soft piano, warm vinyl crackle, 85 BPM",
    duration_seconds=60,
)
print("Audio:", audio_url)

Combining video and audio with FFmpeg

Once you have both a video URL and an audio URL, combine them using FFmpeg:

import subprocess
import urllib.request

def download(url: str, path: str):
    urllib.request.urlretrieve(url, path)

def merge_video_audio(video_url: str, audio_url: str, output_path: str):
    download(video_url, "/tmp/video.mp4")
    download(audio_url, "/tmp/audio.mp3")

    subprocess.run([
        "ffmpeg", "-y",
        "-i", "/tmp/video.mp4",
        "-i", "/tmp/audio.mp3",
        "-c:v", "copy",
        "-c:a", "aac",
        "-shortest",
        output_path,
    ], check=True)

merge_video_audio(video_url, audio_url, "final_output.mp4")

The -shortest flag stops encoding when the shorter stream ends, ensuring the audio doesn't run past the video.
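If the audio is shorter than the video, you can loop it instead of cutting the video off. A sketch using FFmpeg's -stream_loop option (the helper name is ours; -stream_loop must precede the input it applies to):

```python
import subprocess

def looped_merge_cmd(video_path: str, audio_path: str, output_path: str) -> list[str]:
    # -stream_loop -1, placed before the audio input, repeats the audio
    # indefinitely; -shortest then stops encoding when the video ends.
    return [
        "ffmpeg", "-y",
        "-i", video_path,
        "-stream_loop", "-1", "-i", audio_path,
        "-c:v", "copy", "-c:a", "aac",
        "-shortest",
        output_path,
    ]

# subprocess.run(looped_merge_cmd("/tmp/video.mp4", "/tmp/audio.mp3", "looped.mp4"), check=True)
```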

Multimedia pipeline

Put it all together — a complete pipeline that generates a video, generates matching audio, and merges them:

import os
import time
import urllib.request
import json
import subprocess

API_KEY = os.environ["SKYTELLS_API_KEY"]
BASE = "https://api.skytells.ai/v1"

def create_prediction(model: str, input_data: dict) -> dict:
    req = urllib.request.Request(
        f"{BASE}/predictions",
        data=json.dumps({"model": model, "input": input_data}).encode(),
        headers={
            "x-api-key": API_KEY,
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def wait_for(prediction_id: str, interval: int = 5) -> str:
    while True:
        req = urllib.request.Request(
            f"{BASE}/predictions/{prediction_id}",
            headers={"x-api-key": API_KEY},
        )
        with urllib.request.urlopen(req) as resp:
            p = json.loads(resp.read())
        if p["status"] == "succeeded":
            return p["output"][0]
        if p["status"] in ("failed", "canceled"):
            raise RuntimeError(p.get("error"))
        time.sleep(interval)

scene = "A mountain biker riding through a pine forest trail at sunset"
music = "Energetic indie rock, electric guitar, driving drums, adventurous, 130 BPM"

print("Generating video...")
v_pred = create_prediction("truefusion-video-pro", {
    "prompt": scene,
    "duration_seconds": 10,
    "aspect_ratio": "16:9",
})

print("Generating audio...")
a_pred = create_prediction("beatfusion-2.0", {
    "prompt": music,
    "duration_seconds": 30,
})

video_url = wait_for(v_pred["id"])
audio_url = wait_for(a_pred["id"])

print("Merging...")
urllib.request.urlretrieve(video_url, "/tmp/video.mp4")
urllib.request.urlretrieve(audio_url, "/tmp/audio.mp3")
subprocess.run([
    "ffmpeg", "-y",
    "-i", "/tmp/video.mp4", "-i", "/tmp/audio.mp3",
    "-c:v", "copy", "-c:a", "aac", "-shortest",
    "scene_with_music.mp4",
], check=True)

print("Done! Output: scene_with_music.mp4")

Summary

You've completed the Video & Audio path:

  1. Video generation — 9 models from Skytells, Google, and OpenAI
  2. Async polling — video takes minutes, always poll or use webhooks
  3. Audio with BeatFusion — generate music from text descriptions
  4. Multimedia pipeline — combine AI video + AI audio with FFmpeg
