Sync vs. Async Patterns
Choose the right generation architecture for every use case — real-time response, optimistic UI, background queue, and push notification patterns.
What you'll be able to build after this module
Select the right architecture pattern based on your generation time and user experience requirements — and implement it correctly the first time.
The fundamental decision tree
Pattern 1: Direct synchronous response
When: Generation completes in < 3 seconds. User is waiting at the screen.
Models: truefusion-edge (~1.5s), chat endpoints, fast audio
// app/api/preview/route.ts
import Skytells from '@skytells/sdk';
const client = Skytells(process.env.SKYTELLS_API_KEY, {
baseUrl: 'https://edge.skytells.ai/v1', // Edge for < 2s response
});
export const runtime = 'edge';
export async function POST(req: Request) {
const { prompt } = await req.json();
const prediction = await client.predictions.create({
model: 'truefusion-edge',
input: { prompt, width: 512, height: 512, num_inference_steps: 4 },
});
// The SDK waits for completion by default, so the prediction has already succeeded here
return Response.json({ url: prediction.output![0] });
}
Pattern 1 requires the Edge API (Business/Enterprise plans). On lower plans, use truefusion-pro with Pattern 2 — it averages ~8s, which is perfectly smooth behind a loading spinner.
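From the browser, calling this route is a single awaited fetch. A sketch (the route path matches the handler above; `fetchFn` is a parameter only so the helper is easy to test without a server):

```typescript
// Call the synchronous preview route and return the generated image URL.
async function generatePreview(
  prompt: string,
  fetchFn: typeof fetch = fetch,
): Promise<string> {
  const res = await fetchFn('/api/preview', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt }),
  });
  if (!res.ok) throw new Error(`Preview failed with status ${res.status}`);
  const { url } = (await res.json()) as { url: string };
  return url;
}
```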
Pattern 2: Optimistic UI + polling
When: Generation takes 5–15 seconds. User is waiting, but a loading state is acceptable.
Models: truefusion-pro (~8s), truefusion-2.0 (~12s)
// app/api/generate/route.ts
// `client` is the shared Skytells SDK instance (created as in Pattern 1).
export async function POST(req: Request) {
const { prompt } = await req.json();
// Don't wait for completion — return the prediction ID immediately
const prediction = await client.predictions.create({
model: 'truefusion-pro',
input: { prompt, width: 1024, height: 1024 },
wait: false, // Return immediately
});
return Response.json({
predictionId: prediction.id,
status: prediction.status, // 'queued' or 'processing'
});
}
// app/api/generate/[id]/route.ts
export async function GET(_req: Request, { params }: { params: { id: string } }) {
const prediction = await client.predictions.get(params.id);
return Response.json({
status: prediction.status,
output: prediction.output ?? null,
error: prediction.error ?? null,
});
}
Pattern 3: Fire-and-forget + webhook notification
When: Generation takes > 15 seconds (video, audio, high-quality batch).
Models: truefusion-video-pro (30–120s), beatfusion-2.0 (30–90s), mera (2–5min)
// POST /api/generate
export async function POST(req: Request) {
const { prompt, userId } = await req.json();
const prediction = await client.predictions.create({
model: 'truefusion-video-pro',
input: { prompt, duration_seconds: 10 },
webhook: `${process.env.BASE_URL}/api/webhooks/skytells`,
webhookEventsFilter: ['completed'],
wait: false,
});
// Store job
await db.jobs.create({
data: { id: prediction.id, userId, status: 'pending' },
});
// Return immediately — don't make the user wait
return Response.json({ jobId: prediction.id });
}
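Webhook deliveries are typically retried on failure, so the handler that follows should tolerate duplicate events. A minimal in-memory guard sketch (`markProcessed` is a hypothetical helper; in production, a unique constraint or upsert in your database does this job, since in-memory state doesn't survive serverless or multi-node deployments):

```typescript
// Track processed webhook event IDs so a retried delivery is ignored.
// In-memory only: fine for a single long-lived process, not for serverless.
const processedEvents = new Set<string>();

function markProcessed(eventId: string): boolean {
  if (processedEvents.has(eventId)) return false; // duplicate delivery
  processedEvents.add(eventId);
  return true; // first time we've seen this event
}
```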
// POST /api/webhooks/skytells
export async function POST(req: Request) {
const raw = await req.text();
// ... (verify signature — see Webhooks module)
const prediction = JSON.parse(raw);
await db.jobs.update({
where: { id: prediction.id },
data: { status: prediction.status, outputUrl: prediction.output?.[0] },
});
// Notify user via your push system (Pusher, SSE, WebSocket, email, etc.)
await notifyUser(prediction.id, prediction.status);
return Response.json({ received: true });
}
Pattern 4: Background queue (batch processing)
When: You need to process many requests at once — batch generation, scheduled jobs, bulk exports.
// workers/generation.ts
import { Queue, Worker } from 'bullmq';
import Skytells from '@skytells/sdk';
const client = Skytells(process.env.SKYTELLS_API_KEY);
export const generationQueue = new Queue('generation', {
connection: { host: process.env.REDIS_HOST, port: 6379 },
defaultJobOptions: {
attempts: 3,
backoff: { type: 'exponential', delay: 2000 },
},
});
// Worker — processes jobs at a controlled rate.
// Producers enqueue work elsewhere, e.g.:
//   await generationQueue.add('generate', { prompt, userId, jobId });
new Worker('generation', async (job) => {
const { prompt, userId, jobId } = job.data;
const prediction = await client.predictions.create({
model: 'truefusion-pro',
input: { prompt, width: 1024, height: 1024 },
});
await db.results.create({
data: { jobId, userId, outputUrl: prediction.output![0], status: 'done' },
});
return { predictionId: prediction.id };
}, {
connection: { host: process.env.REDIS_HOST, port: 6379 },
concurrency: 5, // process 5 jobs at once
limiter: { max: 8, duration: 1000 }, // max 8/second
});
Choosing the right pattern
| Pattern | Latency | Complexity | When to use |
|---|---|---|---|
| 1: Direct sync | < 3s | Low | Real-time previews, Edge API |
| 2: Optimistic UI + poll | 5–15s | Low | Most image generation |
| 3: Fire-and-forget + webhook | Any | Medium | Video, audio, long jobs |
| 4: Background queue | Any | High | Batch processing, high volume |
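The table can be read as a tiny decision function. A sketch (the names are illustrative, not part of the SDK):

```typescript
type Pattern = 'direct-sync' | 'optimistic-poll' | 'webhook' | 'queue';

// Map expected generation time (and batch needs) to a pattern from the table.
function choosePattern(
  expectedSeconds: number,
  opts: { batch?: boolean } = {},
): Pattern {
  if (opts.batch) return 'queue'; // Pattern 4: many jobs, controlled rate
  if (expectedSeconds < 3) return 'direct-sync'; // Pattern 1: user waits inline
  if (expectedSeconds <= 15) return 'optimistic-poll'; // Pattern 2: spinner + poll
  return 'webhook'; // Pattern 3: push the result later
}
```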
Don't over-engineer. Pattern 2 (optimistic UI + polling) handles 80% of use cases beautifully. Only add webhooks (Pattern 3) or queues (Pattern 4) when generation time or scale genuinely requires it.
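On the frontend, Pattern 2's polling loop can be a small helper around the status route. This sketch assumes the two routes shown above; `getStatus` is injected so the loop is testable without a server, and a 2-second interval is a reasonable default:

```typescript
type PredictionStatus = {
  status: 'queued' | 'processing' | 'succeeded' | 'failed';
  output: string[] | null;
  error: string | null;
};

// Poll the status endpoint until the prediction settles or we give up.
async function pollPrediction(
  id: string,
  getStatus: (id: string) => Promise<PredictionStatus>,
  intervalMs = 2000,
  maxAttempts = 60,
): Promise<PredictionStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await getStatus(id);
    if (res.status === 'succeeded' || res.status === 'failed') return res;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Prediction ${id} did not settle in time`);
}
```

In a real page, `getStatus` would be a thin wrapper around `fetch('/api/generate/' + id)`.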
Summary
You can now design the right architecture for any generation use case. Pick the pattern that matches your latency requirements and user expectations — not the most complex one.
The four patterns:
- Direct sync — sub-3s, Edge API, simple
- Optimistic UI + poll — 5–15s, return job ID, poll every 2s on the frontend
- Webhook — any duration, fire-and-forget, push result to user
- Background queue — high volume, controlled rate, Celery or BullMQ
Next: caching strategies to reduce costs and improve response times.