# AI Voiceovers & Music
Add AI-generated background music with BeatFusion and create lip-synced avatar videos with LipFusion to elevate your content.
## Two audio tools for creators
Skytells offers two audio generation models for content creators:
| Model | What it does | Best for |
|---|---|---|
| `beatfusion-2.0` | Generates original music from a text prompt | Background music, intros, b-roll |
| `lipfusion` | Animates a face to match an audio file | Talking-head videos, avatar content |
## Creating background music with BeatFusion
Match music mood to your content type:
```bash
curl -X POST https://api.skytells.ai/v1/predictions \
  -H "x-api-key: $SKYTELLS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "beatfusion-2.0",
    "input": {
      "prompt": "Upbeat corporate background music, light piano and acoustic guitar, energetic but not distracting, 120 BPM, 30 seconds",
      "duration_seconds": 30
    }
  }'
```

### Music prompt formulas by content type
**Social media montage:**

```text
Upbeat [genre] background music, [tempo] BPM, [instruments],
energetic, [mood], no vocals, [duration] seconds
```

**Tutorial / educational:**

```text
Calm focus music, lo-fi [genre], subtle [instruments],
minimal, non-distracting, [duration] seconds
```

**Product launch / reveal:**

```text
Cinematic build-up, [genre], starts minimal then swells to a
dramatic peak at [X] seconds, [instruments], epic feel
```

**Lifestyle / vlog:**

```text
Warm acoustic [genre], [instruments], feel-good, positive energy,
suitable for video background, [duration] seconds
```

### Music generation reference
| Content type | Prompt style | Duration |
|---|---|---|
| TikTok / Reel (15s) | Punchy, energetic, recognizable hook | 15–30s |
| Product showcase | Clean, modern, minimal | 30–60s |
| Tutorial / how-to | Calm focus, unobtrusive | 60–120s |
| Vlog intro | Upbeat, branded, memorable | 10–15s |
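The formulas above can be filled in programmatically when you generate music for many clips. This is an illustrative sketch: `build_music_prompt` and the template names are examples, not part of any Skytells SDK.

```python
# Illustrative helper: render the music-prompt formulas from this section.
# Template keys and the function name are examples, not an official API.
TEMPLATES = {
    "montage": ("Upbeat {genre} background music, {tempo} BPM, {instruments}, "
                "energetic, {mood}, no vocals, {duration} seconds"),
    "tutorial": ("Calm focus music, lo-fi {genre}, subtle {instruments}, "
                 "minimal, non-distracting, {duration} seconds"),
}

def build_music_prompt(content_type, **fields):
    """Render a prompt template; raises KeyError if a field is missing."""
    return TEMPLATES[content_type].format(**fields)

prompt = build_music_prompt(
    "montage", genre="pop", tempo=120,
    instruments="synth and drums", mood="feel-good", duration=30,
)
print(prompt)
```

The rendered string can be passed directly as the `prompt` field of a `beatfusion-2.0` request.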
## Adding music to video (ffmpeg)
Once you have both files, merge them:
```bash
# Combine video + AI music
ffmpeg -i social_video.mp4 -i background_music.mp3 \
  -c:v copy -c:a aac \
  -filter_complex "[1:a]volume=0.3[music];[0:a][music]amix=inputs=2:duration=first[out]" \
  -map 0:v -map "[out]" \
  output_with_music.mp4
```

The `volume=0.3` filter keeps the music subtle behind any original audio. Adjust to taste.
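If you merge from a script instead of the shell, the same command can be assembled in Python. `build_mix_command` is a hypothetical helper; the filter string mirrors the one above, with the music volume exposed as a parameter.

```python
def build_mix_command(video, music, output, music_volume=0.3):
    """Build the ffmpeg argv that mixes music under the video's own audio."""
    filt = (f"[1:a]volume={music_volume}[music];"
            "[0:a][music]amix=inputs=2:duration=first[out]")
    return [
        "ffmpeg", "-i", video, "-i", music,
        "-c:v", "copy", "-c:a", "aac",
        "-filter_complex", filt,
        "-map", "0:v", "-map", "[out]",
        output,
    ]

cmd = build_mix_command("social_video.mp4", "background_music.mp3",
                        "output_with_music.mp4", music_volume=0.25)
# subprocess.run(cmd, check=True)  # uncomment to actually invoke ffmpeg
```

Building the argument list (rather than a shell string) avoids quoting problems with the filter expression.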
## LipFusion: talking-head videos
LipFusion animates any portrait image to match an audio file — perfect for creating:
- Spokesperson videos without a camera
- Multilingual content from a single image
- AI avatar explainer videos
```bash
curl -X POST https://api.skytells.ai/v1/predictions \
  -H "x-api-key: $SKYTELLS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lipfusion",
    "input": {
      "image_url": "https://yoursite.com/avatar.jpg",
      "audio_url": "https://yoursite.com/voiceover.mp3"
    }
  }'
```

Output: an `.mp4` of the portrait speaking in sync with the audio.
### Requirements for best LipFusion results
| Factor | Recommendation |
|---|---|
| Image | Front-facing, neutral expression, good lighting |
| Image resolution | At least 512×512 |
| Audio quality | Clear voice, minimal background noise |
| Audio format | MP3 or WAV, 44.1kHz |
| Video duration | Matches audio length (up to 60s) |
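The requirements above are cheap to check client-side before spending a prediction. `validate_lipfusion_inputs` is an illustrative pre-flight guard, not part of the Skytells API; it only mirrors the table.

```python
def validate_lipfusion_inputs(width, height, audio_seconds, audio_ext):
    """Client-side sanity checks mirroring the requirements table above."""
    problems = []
    if min(width, height) < 512:
        problems.append("image should be at least 512x512")
    if audio_seconds > 60:
        problems.append("audio exceeds the 60s output limit")
    if audio_ext.lower() not in (".mp3", ".wav"):
        problems.append("audio should be MP3 or WAV")
    return problems

# A well-formed input produces an empty problem list
print(validate_lipfusion_inputs(1024, 1024, 45, ".mp3"))  # → []
```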
## Complete creator workflow

### Full Python workflow
```python
import os
import time
import urllib.request
import json
import subprocess

API_KEY = os.environ["SKYTELLS_API_KEY"]
BASE = "https://api.skytells.ai/v1"

def create_and_wait(model, input_data):
    """Create a prediction and poll until it succeeds or fails."""
    req = urllib.request.Request(
        f"{BASE}/predictions",
        data=json.dumps({"model": model, "input": input_data}).encode(),
        headers={"x-api-key": API_KEY, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        prediction = json.loads(resp.read())

    while prediction["status"] not in ("succeeded", "failed"):
        time.sleep(5)
        req = urllib.request.Request(
            f"{BASE}/predictions/{prediction['id']}",
            headers={"x-api-key": API_KEY},
        )
        with urllib.request.urlopen(req) as resp:
            prediction = json.loads(resp.read())

    if prediction["status"] != "succeeded":
        raise RuntimeError(prediction.get("error"))
    return prediction["output"][0]

# 1. Generate video
video_url = create_and_wait("truefusion-video-pro", {
    "prompt": "A barista making pour-over coffee, morning light, cinematic",
    "duration_seconds": 10,
    "aspect_ratio": "9:16",
})

# 2. Generate matching music (slightly longer than the video; -shortest trims it)
music_url = create_and_wait("beatfusion-2.0", {
    "prompt": "Calm morning café ambience, acoustic guitar, warm, relaxing, 10 seconds",
    "duration_seconds": 12,
})

# 3. Download both
urllib.request.urlretrieve(video_url, "video.mp4")
urllib.request.urlretrieve(music_url, "music.mp3")

# 4. Merge with ffmpeg: pad the music with silence, trim to the video length,
#    and use it as the only audio stream (the generated video has none)
subprocess.run([
    "ffmpeg", "-i", "video.mp4", "-i", "music.mp3",
    "-c:v", "copy", "-c:a", "aac",
    "-filter_complex", "[1:a]volume=0.4[m];[m]apad[out]",
    "-map", "0:v", "-map", "[out]",
    "-shortest", "final.mp4",
], check=True)

print("Done! Saved to final.mp4")
```

## Summary
- BeatFusion generates original background music — match mood to content type
- LipFusion creates talking-head videos from portrait + audio
- Use ffmpeg to merge video + music tracks
- Generate all assets in parallel, then merge — saves time
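The parallel tip above can be sketched with `concurrent.futures`. Here `create_and_wait` is the polling helper from the workflow script, passed in as a parameter so the sketch stays self-contained; model names and inputs follow the example above.

```python
from concurrent.futures import ThreadPoolExecutor

def generate_assets(create_and_wait):
    """Run video and music generation concurrently. Each call blocks while
    polling, so running them in threads overlaps the waiting time."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        video = pool.submit(create_and_wait, "truefusion-video-pro", {
            "prompt": "A barista making pour-over coffee, morning light, cinematic",
            "duration_seconds": 10,
            "aspect_ratio": "9:16",
        })
        music = pool.submit(create_and_wait, "beatfusion-2.0", {
            "prompt": "Calm morning café ambience, acoustic guitar, warm, relaxing",
            "duration_seconds": 12,
        })
        # .result() re-raises any RuntimeError from a failed prediction
        return video.result(), music.result()
```

With two assets that each take a minute or two to generate, overlapping the polling roughly halves the wall-clock wait.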
In the next module, you'll automate this entire workflow with a scheduling pipeline.