Intermediate · 25 min · Module 3 of 3
Audio with BeatFusion
Generate music and audio tracks with BeatFusion 2.0. Combine AI audio with your AI videos to build complete multimedia content pipelines.
BeatFusion overview
BeatFusion is Skytells' audio generation model family. It generates original music, sound effects, and ambient audio from a text description.
| Model | Cost | Quality | Best for |
|---|---|---|---|
| beatfusion-2.0 | $0.75/pred | High | Production use, music |
| beatfusion-1.0 | $0.45/pred | Standard | Prototyping, ambient audio |
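Since each prediction is billed at a flat rate per call, batch costs are easy to estimate up front. A quick sketch, with prices taken from the table above:

```python
# Per-prediction prices from the model table above.
PRICE_PER_PREDICTION = {"beatfusion-2.0": 0.75, "beatfusion-1.0": 0.45}

def estimate_cost(model: str, num_tracks: int) -> float:
    """Total cost in USD for a batch of predictions against one model."""
    return PRICE_PER_PREDICTION[model] * num_tracks

print(estimate_cost("beatfusion-2.0", 100))
print(estimate_cost("beatfusion-1.0", 100))
```

Prototyping 100 tracks on beatfusion-1.0 instead of 2.0 saves about $30 per batch.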
Your first audio prediction
curl -X POST https://api.skytells.ai/v1/predictions \
  -H "x-api-key: $SKYTELLS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "beatfusion-2.0",
    "input": {
      "prompt": "Upbeat corporate background music, light piano and strings, professional, 120 BPM",
      "duration_seconds": 30
    }
  }'

The response output contains an audio file URL (.mp3 or .wav).
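The Python examples later in this module read the prediction's id, status, output, and error fields, so a finished response has roughly the shape below. This is an illustrative sketch with made-up values, not an exact schema:

```python
import json

# Illustrative succeeded-prediction payload; field names match what the
# polling code in this module reads. The id and URL are made up.
sample = json.loads("""
{
  "id": "pred_abc123",
  "status": "succeeded",
  "output": ["https://cdn.skytells.ai/audio/pred_abc123.mp3"],
  "error": null
}
""")

if sample["status"] == "succeeded":
    audio_url = sample["output"][0]
    print(audio_url)
```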
BeatFusion input parameters
| Parameter | Type | Description |
|---|---|---|
| prompt | string | Description of the desired audio |
| duration_seconds | int | Length in seconds (5–120) |
| seed | int | Reproducibility seed |
| bpm | int | Beats per minute (optional) |
| key | string | Musical key, e.g. "C major" (optional) |
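The optional parameters sit alongside prompt and duration_seconds in the same input object. A sketch of a request body exercising all of them (values illustrative):

```python
import json

# Full input object using every parameter from the table above.
payload = {
    "model": "beatfusion-2.0",
    "input": {
        "prompt": "Warm jazz trio, brushed drums, upright bass",
        "duration_seconds": 45,   # must be within 5-120
        "seed": 42,               # reuse to reproduce the same output
        "bpm": 96,
        "key": "C major",
    },
}
body = json.dumps(payload)
print(body)
```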
Writing effective audio prompts
BeatFusion responds well to prompts that specify:
- Genre or style — jazz, cinematic, lo-fi, EDM, ambient
- Instruments — piano, acoustic guitar, synths, strings, drums
- Mood/energy — relaxing, intense, uplifting, dark
- Tempo — slow, medium, 90 BPM, 140 BPM
- Use case — background music, video intro, podcast intro
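These ingredients can be assembled mechanically. A small helper like this (a sketch, not part of any SDK) keeps prompts consistent across a batch:

```python
def build_audio_prompt(style, instruments, mood, tempo=None, use_case=None):
    """Assemble a comma-separated audio prompt from the elements above."""
    parts = [style, ", ".join(instruments), mood]
    if tempo:
        parts.append(tempo)
    if use_case:
        parts.append(use_case)
    return ", ".join(parts)

prompt = build_audio_prompt(
    "lo-fi hip hop beat",
    ["muted piano", "soft kick drum"],
    "relaxing",
    tempo="85 BPM",
    use_case="study music",
)
print(prompt)
```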
Examples
"Cinematic orchestral score, strings and brass swelling to a dramatic climax,
epic film trailer style, 80 BPM"
"Lo-fi hip hop beat, vinyl crackle, muted piano chords, soft kick drum,
relaxing study music, 85 BPM"
"Tech startup product demo music, upbeat electronic, clean synths,
confident and modern, 125 BPM"
"Ambient meditation music, soft bells, nature sounds, slow evolving pads,
no percussion, peaceful"

Python: generate audio
import time
import urllib.request
import json
import os
API_KEY = os.environ["SKYTELLS_API_KEY"]
BASE = "https://api.skytells.ai/v1"
def generate_audio(prompt: str, duration_seconds: int = 30) -> str:
    req = urllib.request.Request(
        f"{BASE}/predictions",
        data=json.dumps({
            "model": "beatfusion-2.0",
            "input": {
                "prompt": prompt,
                "duration_seconds": duration_seconds,
            },
        }).encode(),
        headers={
            "x-api-key": API_KEY,
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        prediction = json.loads(resp.read())
    prediction_id = prediction["id"]
    while prediction["status"] not in ("succeeded", "failed", "canceled"):
        time.sleep(3)
        req = urllib.request.Request(
            f"{BASE}/predictions/{prediction_id}",
            headers={"x-api-key": API_KEY},
        )
        with urllib.request.urlopen(req) as resp:
            prediction = json.loads(resp.read())
    if prediction["status"] != "succeeded":
        raise RuntimeError(prediction.get("error"))
    return prediction["output"][0]  # audio file URL

audio_url = generate_audio(
    "Lo-fi study music, soft piano, warm vinyl crackle, 85 BPM",
    duration_seconds=60,
)
print("Audio:", audio_url)

Combining video and audio with FFmpeg
Once you have both a video URL and an audio URL, combine them using FFmpeg:
import subprocess
import urllib.request
def download(url: str, path: str):
    urllib.request.urlretrieve(url, path)

def merge_video_audio(video_url: str, audio_url: str, output_path: str):
    download(video_url, "/tmp/video.mp4")
    download(audio_url, "/tmp/audio.mp3")
    subprocess.run([
        "ffmpeg", "-y",
        "-i", "/tmp/video.mp4",
        "-i", "/tmp/audio.mp3",
        "-c:v", "copy",
        "-c:a", "aac",
        "-shortest",
        output_path,
    ], check=True)

merge_video_audio(video_url, audio_url, "final_output.mp4")

The -shortest flag stops encoding when the shorter stream ends, ensuring the audio doesn't run past the video.
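After merging, it's worth sanity-checking that -shortest behaved as expected. This sketch reads a file's duration with ffprobe (which ships with FFmpeg); it assumes ffprobe is on your PATH:

```python
import subprocess

def ffprobe_duration_cmd(path: str) -> list:
    # ffprobe can print a container's duration as a bare number on stdout.
    return [
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        path,
    ]

def media_duration(path: str) -> float:
    """Duration of a media file in seconds, via ffprobe."""
    out = subprocess.run(
        ffprobe_duration_cmd(path),
        capture_output=True, text=True, check=True,
    )
    return float(out.stdout.strip())

# With -shortest, the merged file should not outlast the video track, e.g.:
# assert media_duration("final_output.mp4") <= media_duration("/tmp/video.mp4") + 0.1
```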
Multimedia pipeline
Put it all together — a complete pipeline that generates a video, generates matching audio, and merges them:
import os
import time
import urllib.request
import json
import subprocess
API_KEY = os.environ["SKYTELLS_API_KEY"]
BASE = "https://api.skytells.ai/v1"
def create_prediction(model: str, input_data: dict) -> dict:
    req = urllib.request.Request(
        f"{BASE}/predictions",
        data=json.dumps({"model": model, "input": input_data}).encode(),
        headers={
            "x-api-key": API_KEY,
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def wait_for(prediction_id: str, interval: int = 5) -> str:
    while True:
        req = urllib.request.Request(
            f"{BASE}/predictions/{prediction_id}",
            headers={"x-api-key": API_KEY},
        )
        with urllib.request.urlopen(req) as resp:
            p = json.loads(resp.read())
        if p["status"] == "succeeded":
            return p["output"][0]
        if p["status"] in ("failed", "canceled"):
            raise RuntimeError(p.get("error"))
        time.sleep(interval)
scene = "A mountain biker riding through a pine forest trail at sunset"
music = "Energetic indie rock, electric guitar, driving drums, adventurous, 130 BPM"
print("Generating video...")
v_pred = create_prediction("truefusion-video-pro", {
    "prompt": scene,
    "duration_seconds": 10,
    "aspect_ratio": "16:9",
})
print("Generating audio...")
a_pred = create_prediction("beatfusion-2.0", {
    "prompt": music,
    "duration_seconds": 30,
})
video_url = wait_for(v_pred["id"])
audio_url = wait_for(a_pred["id"])
print("Merging...")
urllib.request.urlretrieve(video_url, "/tmp/video.mp4")
urllib.request.urlretrieve(audio_url, "/tmp/audio.mp3")
subprocess.run([
    "ffmpeg", "-y",
    "-i", "/tmp/video.mp4", "-i", "/tmp/audio.mp3",
    "-c:v", "copy", "-c:a", "aac", "-shortest",
    "scene_with_music.mp4",
], check=True)
print("Done! Output: scene_with_music.mp4")

Summary
You've completed the Video & Audio path:
- Video generation — 9 models from Skytells, Google, and OpenAI
- Async polling — video generation takes minutes, so always poll or use webhooks
- Audio with BeatFusion — generate music from text descriptions
- Multimedia pipeline — combine AI video + AI audio with FFmpeg
Next steps:
- Building Production Apps — webhooks, rate limiting, and the Edge API
- SDK Mastery — cleaner code with the Python or TypeScript SDK