Intermediate · 40 min · Module 2 of 6

Workflow Patterns

Master the three foundational workflow patterns — prompt chaining, routing, and parallelization — that form the building blocks of every AI system.

What you'll learn in this module

  • How to build reliable prompt chains with gates
  • How to route inputs to specialized handlers
  • How to parallelize independent LLM calls for speed
  • When each pattern is the right choice

Pattern 1: Prompt Chaining

The simplest multi-step pattern. Each LLM call processes the output of the previous one, transforming data through a pipeline.

Input → Step 1: Generate → Gate (Pass/Fail) → Step 2: Refine → Gate (Pass/Fail) → Step 3: Format → Output. A failed gate triggers Error / Retry.

How it works

  1. Break a complex task into sequential subtasks
  2. Each step has a focused prompt optimized for its specific job
  3. Between steps, add gates — programmatic checks that verify quality before proceeding

When to use it

  • The task naturally decomposes into distinct phases (draft → review → polish)
  • You need higher accuracy than a single call can deliver
  • You want to trade latency for quality

Example: Content Pipeline

| Step | Task | Gate |
| --- | --- | --- |
| 1 | Generate a blog outline from a topic and audience | Check outline has 3–7 sections |
| 2 | Expand each section into paragraphs | Check word count > 200 per section |
| 3 | Edit for tone and grammar | Verify no placeholder text remains |
| 4 | Generate meta description and title | Check length constraints |

Implementation sketch

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.SKYTELLS_API_KEY,
  baseURL: "https://api.skytells.ai/v1",
});

async function contentPipeline(topic: string, audience: string) {
  // Step 1: Generate outline
  const outline = await client.chat.completions.create({
    model: "deepbrain-router",
    messages: [
      { role: "user", content: `Create a blog outline about "${topic}" for ${audience}. Return 3-7 sections as JSON.` },
    ],
  });
  const outlineText = outline.choices[0].message.content!;

  // Gate: verify outline structure (also reject non-array JSON, which would
  // otherwise slip through a bare length check)
  const sections = JSON.parse(outlineText);
  if (!Array.isArray(sections) || sections.length < 3 || sections.length > 7) {
    throw new Error("Outline must have 3-7 sections");
  }

  // Step 2: Expand each section
  const expanded = await client.chat.completions.create({
    model: "deepbrain-router",
    messages: [
      { role: "user", content: `Expand this outline into full paragraphs:\n${outlineText}` },
    ],
  });
  const expandedText = expanded.choices[0].message.content!;

  // Step 3: Edit for tone
  const edited = await client.chat.completions.create({
    model: "deepbrain-router",
    messages: [
      { role: "user", content: `Edit the following for a professional tone. Remove any placeholder text:\n${expandedText}` },
    ],
  });

  return edited.choices[0].message.content;
}
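The sketch above simply throws when a gate fails. The Error / Retry path from the diagram can be added with a small generic helper that re-runs a step until its gate passes — a minimal sketch, where the helper name `withGate` and the `maxAttempts` default are illustrative, not part of any API:

```typescript
// Re-run `step` until `gate` accepts its output, up to maxAttempts times.
// The final failure surfaces as an error so the caller can decide what to do.
async function withGate<T>(
  step: () => Promise<T>,
  gate: (result: T) => boolean,
  maxAttempts = 3,
): Promise<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await step();
    if (gate(result)) return result;
  }
  throw new Error(`Gate still failing after ${maxAttempts} attempts`);
}
```

Step 1 could then be wrapped as `withGate(() => generateOutline(topic), (s) => s.length >= 3 && s.length <= 7)`, where `generateOutline` is a hypothetical wrapper around the Step 1 call — keeping gate logic separate from prompt logic.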

Pattern 2: Routing

A classifier LLM examines the input and directs it to a specialized handler. This is how you get both breadth (handling many input types) and depth (each handler is optimized for its specific case).

Input → Classifier LLM → Handler A / Handler B / Handler C (by type) → Output

How it works

  1. A lightweight LLM call classifies the input into categories
  2. Based on the classification, route to a specialized prompt/workflow
  3. Each handler can use different models, prompts, or even entirely different tools

When to use it

  • Inputs vary widely in type or complexity
  • Different input types need fundamentally different processing
  • You want to optimize cost by using cheaper models for simple inputs

Example: Support Ticket Router

Incoming ticket → Classify: billing / technical / general → Billing Agent (access to payment APIs) | Technical Agent (access to logs, docs) | General Agent (FAQ + knowledge base)
| Category | Model | Tools available | Response template |
| --- | --- | --- | --- |
| Billing | GPT-4-class | Payment API, refund system | Formal, include account details |
| Technical | GPT-4-class | Log search, documentation RAG | Technical, include steps to reproduce |
| General | GPT-3.5-class | Knowledge base search | Friendly, link to help articles |

The classifier prompt

The quality of routing depends entirely on the classifier. Design it carefully:

Classify the following support ticket into exactly one category:
- billing: payment issues, subscription changes, refunds, invoices
- technical: bugs, errors, API issues, integration problems
- general: how-to questions, feature requests, feedback

Respond with only the category name, nothing else.

Ticket: {{input}}

Tips for reliable classification:

  • Give clear, mutually exclusive category definitions
  • Include 2-3 example keywords per category
  • Ask for a single-word response to avoid parsing issues
  • Test with edge cases that sit between categories
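Wiring the classifier and handlers together might look like the sketch below. To keep the routing logic testable, the completion call is passed in as a function; in production it would wrap `client.chat.completions.create` as in the chaining example. The names `parseCategory` and `routeTicket` are illustrative, and falling back to `general` on an off-script reply is one possible design choice, not a requirement:

```typescript
type Category = "billing" | "technical" | "general";
type Complete = (prompt: string) => Promise<string>;

// Map whatever the classifier says onto a known category,
// falling back to the safest handler on an off-script reply.
function parseCategory(label: string): Category {
  const l = label.trim().toLowerCase();
  return l === "billing" || l === "technical" ? l : "general";
}

async function routeTicket(ticket: string, complete: Complete): Promise<string> {
  const category = parseCategory(
    await complete(
      `Classify the following support ticket into exactly one category:\n` +
        `- billing: payment issues, subscription changes, refunds, invoices\n` +
        `- technical: bugs, errors, API issues, integration problems\n` +
        `- general: how-to questions, feature requests, feedback\n\n` +
        `Respond with only the category name, nothing else.\n\nTicket: ${ticket}`,
    ),
  );

  // Each handler gets its own prompt; in a real system it could also
  // get its own model and tools, as in the table above.
  const handlers: Record<Category, (t: string) => Promise<string>> = {
    billing: (t) => complete(`You are a formal billing agent. Respond to:\n${t}`),
    technical: (t) => complete(`You are a technical support agent. Respond to:\n${t}`),
    general: (t) => complete(`You are a friendly support agent. Respond to:\n${t}`),
  };
  return handlers[category](ticket);
}
```

Normalizing the classifier's reply (`trim().toLowerCase()`) matters in practice: models occasionally add whitespace or capitalization even when told to respond with a single word.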

Pattern 3: Parallelization

When subtasks are independent, run them simultaneously instead of sequentially. This dramatically reduces latency.

Sectioning

Split a task into independent parallel subtasks:

Input → Split → [LLM 1: Summarize | LLM 2: Extract entities | LLM 3: Sentiment analysis] → Combine → Output

Voting

Run the same task multiple times and aggregate results for higher confidence:

Input → [LLM Call 1 | LLM Call 2 | LLM Call 3] → Majority Vote / Aggregate → Output

When to use parallelization

| Variant | Use when | Benefit |
| --- | --- | --- |
| Sectioning | Task has independent subtasks | Lower latency (wall-clock time) |
| Voting | Task needs high confidence on a single answer | Higher accuracy |

Example: Document Analysis

Process a document three ways simultaneously:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.SKYTELLS_API_KEY,
  baseURL: "https://api.skytells.ai/v1",
});

async function analyzeDocument(document: string) {
  const [summary, entities, sentiment] = await Promise.all([
    client.chat.completions.create({
      model: "deepbrain-router",
      messages: [{ role: "user", content: `Summarize this document in 3 sentences:\n${document}` }],
    }),
    client.chat.completions.create({
      model: "deepbrain-router",
      messages: [{ role: "user", content: `Extract all named entities (people, companies, locations) as JSON:\n${document}` }],
    }),
    client.chat.completions.create({
      model: "deepbrain-router",
      messages: [{ role: "user", content: `Analyze the sentiment (positive/neutral/negative) with confidence score:\n${document}` }],
    }),
  ]);

  return {
    summary: summary.choices[0].message.content,
    entities: JSON.parse(entities.choices[0].message.content!),
    sentiment: sentiment.choices[0].message.content,
  };
}
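The voting variant sends the identical prompt several times in parallel and keeps the most frequent answer. A minimal sketch, again taking the completion call as a parameter so the aggregation stays testable — `majorityVote` and `askWithVoting` are illustrative names, and in production `complete` would wrap `client.chat.completions.create` as above:

```typescript
type Complete = (prompt: string) => Promise<string>;

// Return the most frequent answer, normalizing whitespace and case
// so "Positive" and "positive " count as the same vote.
function majorityVote(answers: string[]): string {
  const counts = new Map<string, number>();
  for (const answer of answers) {
    const key = answer.trim().toLowerCase();
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1])[0][0];
}

// Ask the same question `votes` times concurrently and aggregate.
async function askWithVoting(
  prompt: string,
  complete: Complete,
  votes = 3,
): Promise<string> {
  const answers = await Promise.all(
    Array.from({ length: votes }, () => complete(prompt)),
  );
  return majorityVote(answers);
}
```

An odd vote count avoids two-way ties; voting works best on constrained outputs (a category, a yes/no, a single number) where exact-match aggregation is meaningful.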

Combining Patterns

The real power comes from composing these patterns. A production system often looks like:

Input → Route: classify input type → (Simple) Chain: quick response → Output, or (Complex) Parallel: multi-analysis → Chain: synthesize + review → Output

Decision framework

When designing a workflow, evaluate at each step:

| Question | If yes → |
| --- | --- |
| Can this step be broken into sequential phases? | Chain them |
| Does the input need different handling by type? | Route first |
| Are there independent subtasks? | Parallelize them |
| Does the output need to be high-confidence? | Use voting |

Start simple. Add complexity only when measurement shows you need it.
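As one illustration, the routed pipeline from the diagram can be wired as a thin skeleton where each branch is one of the patterns above. The prompts and the `handleInput` name are placeholders, and the completion call is injected so the control flow stands on its own:

```typescript
type Complete = (prompt: string) => Promise<string>;

async function handleInput(input: string, complete: Complete): Promise<string> {
  // Route: a cheap classification picks the path.
  const label = (
    await complete(`Is this request "simple" or "complex"? Answer with one word.\n${input}`)
  )
    .trim()
    .toLowerCase();

  if (label === "simple") {
    // Simple path: one quick call is enough.
    return complete(`Answer briefly:\n${input}`);
  }

  // Parallel: independent analyses run concurrently.
  const [facts, risks] = await Promise.all([
    complete(`List the key facts in:\n${input}`),
    complete(`List open questions or risks in:\n${input}`),
  ]);

  // Chain: synthesize, then review.
  const draft = await complete(
    `Write a response to:\n${input}\n\nUsing these facts:\n${facts}\n\nAnd these risks:\n${risks}`,
  );
  return complete(`Review and tighten this response:\n${draft}`);
}
```

The simple path costs two calls; the complex path costs five — which is exactly the trade the router exists to make.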


What you now understand

| Pattern | Architecture | Key benefit |
| --- | --- | --- |
| Prompt Chaining | Sequential LLM calls with gates | Higher accuracy through decomposition |
| Routing | Classifier + specialized handlers | Optimized handling per input type |
| Parallelization | Concurrent independent calls | Lower latency or higher confidence |
| Composition | Patterns combined | Production-grade flexibility |

Up next: Advanced Orchestration Patterns — orchestrator-workers, evaluator-optimizer loops, and autonomous agents.
