Intermediate · 25 min · Module 3 of 3

Audio with BeatFusion

Generate music and audio tracks with BeatFusion 2.0. Combine AI audio with your AI videos to build complete multimedia content pipelines.

BeatFusion overview

BeatFusion is Skytells' audio generation model family. It generates original music, sound effects, and ambient audio from a text description.

Model           Cost        Quality   Best for
beatfusion-2.0  $0.75/pred  High      Production use, music
beatfusion-1.0  $0.45/pred  Standard  Prototyping, ambient audio
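Budgeting a batch of tracks against the per-prediction prices above is simple multiplication:

```python
# Per-prediction prices from the table above.
tracks = 20
print(f"beatfusion-2.0: ${tracks * 0.75:.2f}")  # beatfusion-2.0: $15.00
print(f"beatfusion-1.0: ${tracks * 0.45:.2f}")  # beatfusion-1.0: $9.00
```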

Your first audio prediction

curl -X POST https://api.skytells.ai/v1/predictions \
  -H "x-api-key: $SKYTELLS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "beatfusion-2.0",
    "input": {
      "prompt": "Upbeat corporate background music, light piano and strings, professional, 120 BPM",
      "duration_seconds": 30
    }
  }'

When the prediction succeeds, its output field contains an audio file URL (.mp3 or .wav).
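For reference, a completed prediction response looks roughly like this (the id and URL are placeholders; the field names match the polling code later in this module):

```json
{
  "id": "pred_abc123",
  "status": "succeeded",
  "output": ["https://example.storage/track.mp3"]
}
```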

BeatFusion input parameters

Parameter         Type    Description
prompt            string  Description of the desired audio
duration_seconds  int     Length in seconds (5–120)
seed              int     Reproducibility seed
bpm               int     Beats per minute (optional)
key               string  Musical key, e.g. "C major" (optional)
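Combining the optional parameters, a full request body might look like this (all values illustrative):

```json
{
  "model": "beatfusion-2.0",
  "input": {
    "prompt": "Warm jazz trio, brushed drums, upright bass, late-night lounge",
    "duration_seconds": 45,
    "seed": 1234,
    "bpm": 100,
    "key": "A minor"
  }
}
```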

Writing effective audio prompts

BeatFusion responds well to prompts that specify:

  1. Genre or style — jazz, cinematic, lo-fi, EDM, ambient
  2. Instruments — piano, acoustic guitar, synths, strings, drums
  3. Mood/energy — relaxing, intense, uplifting, dark
  4. Tempo — slow, medium, 90 BPM, 140 BPM
  5. Use case — background music, video intro, podcast intro
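One way to keep prompts consistent across a batch is to assemble them from these five elements programmatically. A minimal sketch (the helper name is our own, not part of the API):

```python
def build_audio_prompt(genre: str, instruments: list[str], mood: str,
                       tempo: str, use_case: str) -> str:
    """Join the five prompt elements into one comma-separated prompt."""
    return ", ".join([genre, *instruments, mood, tempo, use_case])

prompt = build_audio_prompt(
    genre="lo-fi hip hop beat",
    instruments=["muted piano chords", "soft kick drum"],
    mood="relaxing",
    tempo="85 BPM",
    use_case="study music",
)
print(prompt)
# lo-fi hip hop beat, muted piano chords, soft kick drum, relaxing, 85 BPM, study music
```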

Examples

"Cinematic orchestral score, strings and brass swelling to a dramatic climax, 
 epic film trailer style, 80 BPM"

"Lo-fi hip hop beat, vinyl crackle, muted piano chords, soft kick drum, 
 relaxing study music, 85 BPM"

"Tech startup product demo music, upbeat electronic, clean synths, 
 confident and modern, 125 BPM"

"Ambient meditation music, soft bells, nature sounds, slow evolving pads, 
 no percussion, peaceful"

Python: generate audio

import time
import urllib.request
import json
import os

API_KEY = os.environ["SKYTELLS_API_KEY"]
BASE = "https://api.skytells.ai/v1"

def generate_audio(prompt: str, duration_seconds: int = 30) -> str:
    req = urllib.request.Request(
        f"{BASE}/predictions",
        data=json.dumps({
            "model": "beatfusion-2.0",
            "input": {
                "prompt": prompt,
                "duration_seconds": duration_seconds,
            },
        }).encode(),
        headers={
            "x-api-key": API_KEY,
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        prediction = json.loads(resp.read())

    prediction_id = prediction["id"]
    while prediction["status"] not in ("succeeded", "failed", "canceled"):
        time.sleep(3)
        req = urllib.request.Request(
            f"{BASE}/predictions/{prediction_id}",
            headers={"x-api-key": API_KEY},
        )
        with urllib.request.urlopen(req) as resp:
            prediction = json.loads(resp.read())

    if prediction["status"] != "succeeded":
        raise RuntimeError(prediction.get("error"))

    return prediction["output"][0]  # audio file URL

audio_url = generate_audio(
    "Lo-fi study music, soft piano, warm vinyl crackle, 85 BPM",
    duration_seconds=60,
)
print("Audio:", audio_url)

Combining video and audio with FFmpeg

Once you have both a video URL and an audio URL, combine them using FFmpeg:

import subprocess
import urllib.request

def download(url: str, path: str):
    urllib.request.urlretrieve(url, path)

def merge_video_audio(video_url: str, audio_url: str, output_path: str):
    download(video_url, "/tmp/video.mp4")
    download(audio_url, "/tmp/audio.mp3")

    subprocess.run([
        "ffmpeg", "-y",
        "-i", "/tmp/video.mp4",
        "-i", "/tmp/audio.mp3",
        "-c:v", "copy",
        "-c:a", "aac",
        "-shortest",
        output_path,
    ], check=True)

merge_video_audio(video_url, audio_url, "final_output.mp4")

The -shortest flag stops encoding when the shorter stream ends, ensuring the audio doesn't run past the video.
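If the audio is shorter than the video, you can loop it instead of cutting the video off. A sketch using FFmpeg's -stream_loop option (the helper name is ours; -stream_loop must precede the input it applies to):

```python
import subprocess

def looped_merge_cmd(video_path: str, audio_path: str, output_path: str) -> list[str]:
    # -stream_loop -1, placed before the audio input, repeats the audio
    # indefinitely; -shortest then stops encoding when the video ends.
    return [
        "ffmpeg", "-y",
        "-i", video_path,
        "-stream_loop", "-1", "-i", audio_path,
        "-c:v", "copy", "-c:a", "aac",
        "-shortest",
        output_path,
    ]

# subprocess.run(looped_merge_cmd("/tmp/video.mp4", "/tmp/audio.mp3", "looped.mp4"), check=True)
```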

Multimedia pipeline

Put it all together — a complete pipeline that generates a video, generates matching audio, and merges them:

import os
import time
import urllib.request
import json
import subprocess

API_KEY = os.environ["SKYTELLS_API_KEY"]
BASE = "https://api.skytells.ai/v1"

def create_prediction(model: str, input_data: dict) -> dict:
    req = urllib.request.Request(
        f"{BASE}/predictions",
        data=json.dumps({"model": model, "input": input_data}).encode(),
        headers={
            "x-api-key": API_KEY,
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def wait_for(prediction_id: str, interval: int = 5) -> str:
    while True:
        req = urllib.request.Request(
            f"{BASE}/predictions/{prediction_id}",
            headers={"x-api-key": API_KEY},
        )
        with urllib.request.urlopen(req) as resp:
            p = json.loads(resp.read())
        if p["status"] == "succeeded":
            return p["output"][0]
        if p["status"] in ("failed", "canceled"):
            raise RuntimeError(p.get("error"))
        time.sleep(interval)

scene = "A mountain biker riding through a pine forest trail at sunset"
music = "Energetic indie rock, electric guitar, driving drums, adventurous, 130 BPM"

print("Generating video...")
v_pred = create_prediction("truefusion-video-pro", {
    "prompt": scene,
    "duration_seconds": 10,
    "aspect_ratio": "16:9",
})

print("Generating audio...")
a_pred = create_prediction("beatfusion-2.0", {
    "prompt": music,
    "duration_seconds": 30,
})

video_url = wait_for(v_pred["id"])
audio_url = wait_for(a_pred["id"])

print("Merging...")
urllib.request.urlretrieve(video_url, "/tmp/video.mp4")
urllib.request.urlretrieve(audio_url, "/tmp/audio.mp3")
subprocess.run([
    "ffmpeg", "-y",
    "-i", "/tmp/video.mp4", "-i", "/tmp/audio.mp3",
    "-c:v", "copy", "-c:a", "aac", "-shortest",
    "scene_with_music.mp4",
], check=True)

print("Done! Output: scene_with_music.mp4")

Summary

You've completed the Video & Audio path:

  1. Video generation — 9 models from Skytells, Google, and OpenAI
  2. Async polling — video takes minutes, always poll or use webhooks
  3. Audio with BeatFusion — generate music from text descriptions
  4. Multimedia pipeline — combine AI video + AI audio with FFmpeg
