Every practitioner who has worked seriously with AI tools has hit the same ceiling. You write a prompt that asks the AI to do something genuinely complex — research a topic, draft a polished document, analyze a dataset, produce a finished...
In This Chapter
- The Limits of the Single Prompt
- What Chaining Is and Why It Matters
- The Anatomy of a Chain
- Chain Design Principles
- Types of Chains
- Manual Chaining: Doing It by Hand
- Semi-Automated Chaining: No-Code Tools
- Fully Automated Chains: Python Scripting
- Chain Design Patterns
- Quality Gates: When to Add Human Review
- Managing Context Between Chain Steps
- Error Handling in Chains
- Scenario Walkthrough: Alex's Content Creation Chain
- Scenario Walkthrough: Raj's Code Review Chain
- Scenario Walkthrough: Elena's Consulting Deliverable Chain
- Research Breakdown: Multi-Step AI Workflow Effectiveness
- Putting It Together: Building Your First Chain
Chapter 35: Chaining AI Interactions and Multi-Step Workflows
The Limits of the Single Prompt
Every practitioner who has worked seriously with AI tools has hit the same ceiling. You write a prompt that asks the AI to do something genuinely complex — research a topic, draft a polished document, analyze a dataset, produce a finished deliverable — and the output is underwhelming. Not because the AI is incapable, but because what you asked for was not one task. It was five tasks, bundled together into a single instruction, with no room for iteration, review, or correction between them.
The solution is not a better prompt. The solution is a chain.
Chaining AI interactions means breaking a complex task into discrete steps, executing each step as its own AI interaction, and feeding the output of each step as input to the next. Where a single prompt says "research this topic and write me a polished report," a chain says: first, gather and summarize the relevant sources; then, identify the key themes in those summaries; then, build an outline based on those themes; then, draft each section; then, revise for consistency and tone. At each step, the AI is doing one thing well rather than five things adequately.
This is not a subtle improvement. Practitioners who move from single-prompt thinking to chain thinking consistently report that output quality jumps in ways that feel qualitative rather than marginal — not just better prose but structurally sounder analysis, more coherent arguments, and deliverables that actually match what was needed rather than a plausible approximation of it.
This chapter teaches you how to design, build, and execute AI chains — from manual chains you run by hand in a chat interface, through semi-automated chains using no-code tools, to fully automated chains using Python code. It covers the design principles that make chains robust, the failure modes that make them fragile, and the human-in-the-loop checkpoints that keep quality high even when automation handles the mechanics.
What Chaining Is and Why It Matters
A chain is a sequence of AI interactions where the output of one step becomes the input for the next, with the whole sequence directed toward a complex goal that no single step could accomplish alone.
The power of chains comes from several sources.
Quality through focus. When an AI is asked to do one well-defined thing at a time, it can do that thing well. When it is asked to do many things simultaneously, it must balance competing demands and often does none of them optimally. A step that says "summarize these five sources" will produce better summaries than a step that says "summarize these five sources and also identify themes and also write an introduction."
Checkpoints for human judgment. A chain creates natural pause points where a human can review what has been produced, catch errors before they propagate, and redirect the work if the AI has gone astray. In a single-prompt workflow, an error in reasoning at the start propagates invisibly through to the finished output. In a chain, an error in step two is visible before it poisons steps three through six.
Context management. AI models have context windows — limits on how much text they can consider at once. A single prompt trying to do everything at once can exhaust that context window before producing a complete result. A chain distributes the cognitive load across multiple interactions, with each step working on a focused, manageable slice of the overall task.
Reusability. Individual steps in a well-designed chain can be reused across different chains. A "summarize source documents" step is useful in a research chain, a competitive analysis chain, and a literature review chain. Building a library of reliable chain steps is an investment that pays compound returns.
Auditability. A chain produces intermediate outputs at each step, creating a record of how a final deliverable was produced. This is valuable for quality review, for training and improvement, and in professional contexts where showing your work matters.
💡 Intuition: Think of a chain like a kitchen brigade in a restaurant. One person preps the ingredients, another does the initial cooking, another does the finishing work, and a chef reviews before the plate goes out. No single station is trying to do everything. The quality of the final dish emerges from the coordination of focused, sequential work — not from one person doing everything at once.
The Anatomy of a Chain
The basic structure of a chain is:
Input → Step 1 → Output 1 → Step 2 → Output 2 → Step 3 → Output 3 → ... → Final Output
Each element of this structure has a role.
The initial input is the raw material that starts the chain. It might be a topic, a set of source documents, a dataset, a brief, a set of requirements — whatever the chain is meant to process. The quality and clarity of the initial input shapes everything downstream. Garbage in, garbage out applies at the chain level even more than at the individual-prompt level.
Each step is a discrete AI interaction with a specific, focused purpose. Good steps have: - A clear instruction that can be stated in one or two sentences - A well-defined input (the output of the previous step, plus any additional context) - A clearly describable expected output format - A criterion by which a human reviewer can judge whether the output is acceptable
Each intermediate output is both a deliverable in its own right (it should be useful and coherent) and the input for the next step. If an intermediate output is so incoherent that the next step cannot use it, the chain has broken.
The final output is the deliverable the chain was designed to produce. In a well-designed chain, the final output is higher quality than anything achievable through a single prompt because it has been built up through focused, reviewed steps.
Between steps, there may be human review points — moments where a practitioner reviews the intermediate output, edits it if necessary, and approves it before the chain continues. The placement of these review points is one of the most important design decisions in chain construction.
Chain Design Principles
Decompose Before You Chain
The most common chain design error is starting to build before you have fully decomposed the task. Before writing a single prompt, answer this question: what are the logical steps a skilled human expert would take to complete this task?
If a skilled human researcher were producing a competitive analysis report, they would: 1. Identify the companies and dimensions to analyze 2. Gather information on each company 3. Organize that information into a comparison framework 4. Identify key patterns and insights from the comparison 5. Structure a report that communicates those insights 6. Review and polish the draft
That decomposition is the skeleton of your chain. Each human step becomes an AI step. The chain mirrors the expert process.
This decomposition exercise often reveals that what you thought was one task is actually six or eight distinct tasks — and that some of those tasks require judgment that an AI should not perform without human review.
Each Step Should Have a Clear, Verifiable Output
A step that produces output you cannot evaluate is a step that will silently fail without your knowing it. Before adding a step to a chain, ask: if this step produces bad output, will I be able to tell?
Good outputs are verifiable: a list of sources is verifiable (do these sources exist and are they relevant?), an outline is verifiable (does this structure make sense for the goal?), a draft is verifiable (does this say what I intended?). Bad outputs are vague: "insights" is not verifiable unless you specify what form the insights should take and how to evaluate them.
When you cannot make a step's output verifiable, that is often a sign that the step is doing too many things and needs to be further decomposed.
Build in Human Review at Key Junctions
Not every step needs a human review gate. Steps that are purely mechanical — reformatting, basic summarization, translation — can often proceed without review. Steps that involve judgment — drawing conclusions, making recommendations, adapting content for an audience — benefit from human oversight.
The general heuristic: add a human review gate anywhere that: - An error would be invisible but consequential - The AI is making a judgment call rather than executing a mechanical task - The output will be seen by clients, stakeholders, or the public - The stakes of a mistake are high
Human review gates slow chains down. That is a feature, not a bug. The slowdown is worth it when the alternative is propagating an undetected error through five subsequent steps.
Design for Failure
Every chain will eventually produce bad output at some step. The question is whether your chain design makes that failure visible and recoverable, or invisible and catastrophic.
Design for failure by asking for each step: what is the worst plausible output this step could produce, and what happens if the next step receives that output? If the answer is "the next step fails visibly and I notice," that is acceptable. If the answer is "the next step proceeds and the failure propagates invisibly to the final output," you need either a human review gate or an automated quality check at that step.
Also ask: if this step produces bad output, do I have a fallback? For automated chains, fallbacks might be automated (flag for human review if output does not meet format requirements). For manual chains, fallbacks are simpler: you review and revise before proceeding.
⚠️ Common Pitfall: The "just get to the final output" trap. When you are in the middle of executing a chain and step 3 produces something marginal, the temptation is to pass it to step 4 and hope the AI fixes it. Resist this. A mediocre step 3 output fed into step 4 does not become a good step 4 output. The chain compounds errors just as readily as it compounds quality.
Types of Chains
Linear Chains: A → B → C
The simplest chain type. Each step takes exactly one input and produces exactly one output, which feeds the next step. Linear chains are appropriate for tasks with a natural sequential structure: research → synthesize → write → revise.
When to use: When the task has a clear beginning, middle, and end, and the output of each step unambiguously feeds the next.
Design consideration: In a long linear chain (six or more steps), errors can compound. Add human review gates every two or three steps.
Example — Research to Report Chain: 1. Take the research question → generate a list of sub-questions to investigate 2. Take each sub-question → produce a brief summary of what is known 3. Take all summaries → identify the five most important findings 4. Take the five findings → draft a report structure 5. Take the structure → draft each section 6. Take the full draft → revise for consistency, flow, and accuracy
Branching Chains: A → B1 or B2 Based on Condition
A branching chain forks at a decision point. Based on the output of one step (or a human review of that output), the chain takes one of two or more paths. Branching chains are appropriate when the appropriate next step depends on what you find.
When to use: When the task involves a classification, assessment, or decision that determines subsequent work.
Example — Content Triage Chain: 1. Review submitted content → classify as: needs major revision, needs minor revision, or ready to publish 2a. If major revision: → generate a rewrite brief → rewrite → review 2b. If minor revision: → generate specific edits → apply edits → review 2c. If ready: → format for publication → publish
Design consideration: Branch conditions need to be clearly specified. If the branching condition is ambiguous, the chain will branch inconsistently. For automated chains, define branch conditions programmatically. For manual chains, you make the branching decision yourself at the fork.
Iterative Chains: A → B → Evaluate → A Again If Needed
An iterative chain cycles back to an earlier step when the output of a later step does not meet a quality threshold. The loop continues until the output is satisfactory.
When to use: When quality is the paramount concern and you cannot predict in advance how many revision passes will be needed. Writing, code generation, and complex analysis tasks benefit from iterative chains.
Example — Draft Quality Loop: 1. Generate draft based on brief 2. Evaluate draft against quality criteria (scoring 1-10 on: relevance, accuracy, tone, structure) 3. If score < 7: generate specific improvement instructions → return to step 1 with improved brief 4. If score >= 7: proceed to formatting and review
Design consideration: Iterative chains can loop indefinitely if the quality threshold is never met. Always define a maximum iteration count. Three to five cycles is usually sufficient; if the chain is still not producing acceptable output after five cycles, the issue is likely with the initial input or the quality criteria, not with the number of iterations.
💡 Intuition: An iterative chain is like a sculptor's process — rough shape, then refinement, then more refinement. You do not try to carve the final details in the first pass.
Parallel Chains: A1 and A2 Simultaneously → Merge
A parallel chain runs two or more independent sub-chains simultaneously, then merges the outputs. Parallel chains are appropriate when the final output requires input from multiple independent research or generation tasks.
When to use: When you need to gather information from multiple domains that do not depend on each other, then synthesize.
Example — Competitive Analysis Chain: - Sub-chain A: Research Company 1 → summarize strengths/weaknesses → format - Sub-chain B: Research Company 2 → summarize strengths/weaknesses → format - Sub-chain C: Research Company 3 → summarize strengths/weaknesses → format - Merge step: Take all three formatted summaries → synthesize comparative analysis → draft report
Design consideration: Parallel chains are powerful but require a well-designed merge step. The merge step must handle variability in the outputs of the parallel sub-chains — they will rarely be perfectly consistent in format or depth.
Manual Chaining: Doing It by Hand
The simplest way to run a chain is manually: you prompt the AI, receive the output, review it, then write the next prompt using that output as input. You are the orchestration layer.
Manual chaining requires no special tools and works in any AI chat interface. Its advantages are control and visibility — you see every intermediate output, you can intervene at any point, and you can adapt the chain based on what the AI produces.
How to execute a manual chain:
-
Write out your chain steps on paper or in a document before you start. This is your chain specification. What is the input and output of each step?
-
Execute step 1. Review the output critically. If it is not acceptable, revise the prompt and try again. Do not proceed until step 1's output meets your quality bar.
-
Copy the accepted step 1 output. Open the prompt for step 2. Paste the step 1 output where the chain specification says it goes.
-
Execute step 2. Review. Revise if necessary. Proceed only when satisfied.
-
Continue until the final output is complete.
The chain specification document is your key tool for manual chaining. It should include, for each step: - The step number and purpose - What input it takes (which previous step's output, plus any static context) - The prompt template with a clear placeholder for the variable input - The expected output format - The quality criteria for moving to the next step
🗣️ Chain Specification Template:
CHAIN: [Chain Name]
Goal: [What the chain produces]
Total Steps: [N]
STEP 1: [Step Name]
Input: [Initial input - describe format]
Prompt:
"[Static context, if any]
[Instructions for this step]
[Input placeholder: {step_1_input}]
[Output format instructions]"
Expected Output: [Describe what good output looks like]
Quality Gate: Human review before Step 2? [Yes/No]
Pass Criteria: [How will you know this output is acceptable?]
STEP 2: [Step Name]
Input: Output from Step 1 [+ any additional context]
Prompt: [...]
Expected Output: [...]
Quality Gate: [...]
Pass Criteria: [...]
[Continue for all steps]
⚠️ Common Pitfall: Skipping the chain specification and just "improvising" step by step. This works for simple two-step chains. For anything with four or more steps, improvisation leads to drift — the chain slowly shifts away from the original goal as you make small accommodating decisions at each step.
Semi-Automated Chaining: No-Code Tools
No-code automation platforms — Zapier, Make (formerly Integromat), and n8n — let you build chains that execute automatically based on triggers, without writing code. These platforms can call AI APIs, transform data between steps, integrate with other services, and handle conditional logic.
What these tools do well: - Trigger chains based on external events (a new email arrives, a form is submitted, a file appears in a folder) - Pass data between steps automatically - Integrate AI steps with non-AI steps (store output to a spreadsheet, send an email, create a record in a database) - Run chains on a schedule - Handle multiple parallel chains
Zapier AI Chains: Zapier allows you to add AI actions (using OpenAI or other providers) as steps in a Zap. You can pass previous step outputs as variables into AI prompts, enabling chains like: New customer feedback email → AI classifies sentiment and category → AI drafts suggested response → stores both in CRM.
Make Scenarios: Make's visual interface lets you build flow diagrams of chain steps. AI modules can call APIs, and data transformers can reformat outputs between steps. Make's conditional routing handles branching chains cleanly.
n8n: The open-source option, which can be self-hosted for privacy-sensitive workflows. n8n has dedicated LLM nodes and strong support for AI chains, including memory/context management between steps.
Limitations of no-code tools: - Less flexible than custom code for complex data transformations - Dependent on the platform's available integrations - Can become expensive at scale (per-task pricing on Zapier and Make) - Debugging is harder than in code — when a chain fails, finding which step failed and why takes more investigation - Context management across steps is limited compared to what you can do in code
For workflows that trigger frequently and involve straightforward data flow, no-code tools are excellent. For workflows that require complex logic, large-scale processing, or fine-grained control over prompts and context, code is the better choice.
Fully Automated Chains: Python Scripting
Python gives you complete control over AI chains: precise prompt construction, full context management, custom quality gates, retry logic, logging, and integration with any data source or destination. Chapter 36 covers the Anthropic and OpenAI Python SDKs in detail. Here, we introduce the concept with a working example.
🐍 Code Block: A Simple Two-Step Chain
import anthropic
client = anthropic.Anthropic()
def run_chain(initial_input: str) -> dict:
"""A simple two-step chain: outline then expand."""
# Step 1: Generate outline
step1_prompt = f"""Create a detailed outline for a blog post about:
{initial_input}
Format as numbered sections with 3-4 bullet points each."""
step1_response = client.messages.create(
model="claude-opus-4-6",
max_tokens=1024,
messages=[{"role": "user", "content": step1_prompt}]
)
outline = step1_response.content[0].text
# Step 2: Expand first section
step2_prompt = f"""Using this outline:
{outline}
Write a fully expanded version of Section 1 only.
Target 400-500 words. Conversational but authoritative tone."""
step2_response = client.messages.create(
model="claude-opus-4-6",
max_tokens=2048,
messages=[{"role": "user", "content": step2_prompt}]
)
return {
"outline": outline,
"section_1": step2_response.content[0].text
}
result = run_chain("effective AI prompting for marketing professionals")
print("OUTLINE:\n", result["outline"])
print("\nSECTION 1:\n", result["section_1"])
Notice what this code makes explicit: the outline from step 1 is captured as a string and inserted directly into the step 2 prompt. The chain is transparent — every intermediate value is a named variable you can inspect, log, or print.
For longer chains, a more structured approach using a list of step outputs is cleaner:
🐍 Code Block: Multi-Step Chain with Logging
import anthropic
import json
from datetime import datetime
client = anthropic.Anthropic()
def run_research_chain(topic: str, output_log: bool = True) -> dict:
"""
A four-step research and synthesis chain:
1. Generate research questions
2. For each question, summarize what is generally known
3. Identify key themes across all summaries
4. Draft a structured synthesis document
"""
chain_output = {
"topic": topic,
"started_at": datetime.now().isoformat(),
"steps": {}
}
# Step 1: Research question generation
step1_response = client.messages.create(
model="claude-opus-4-6",
max_tokens=512,
messages=[{
"role": "user",
"content": (
f"Generate 5 focused research questions about: {topic}\n\n"
"Format as a numbered list. Each question should be specific "
"and answerable through research."
)
}]
)
questions = step1_response.content[0].text
chain_output["steps"]["step_1_questions"] = questions
if output_log:
print("Step 1 complete: Research questions generated")
# Step 2: Brief answers to each question
step2_response = client.messages.create(
model="claude-opus-4-6",
max_tokens=2048,
messages=[{
"role": "user",
"content": (
f"For each of these research questions about '{topic}', "
f"provide a concise 2-3 sentence answer based on established knowledge:\n\n"
f"{questions}\n\n"
"Format: repeat each question, then provide the answer below it."
)
}]
)
answers = step2_response.content[0].text
chain_output["steps"]["step_2_answers"] = answers
if output_log:
print("Step 2 complete: Questions answered")
# Step 3: Theme identification
step3_response = client.messages.create(
model="claude-opus-4-6",
max_tokens=1024,
messages=[{
"role": "user",
"content": (
f"Review these research findings about '{topic}':\n\n"
f"{answers}\n\n"
"Identify the 3-4 most important cross-cutting themes. "
"For each theme, write 2-3 sentences explaining what the "
"research findings tell us about it."
)
}]
)
themes = step3_response.content[0].text
chain_output["steps"]["step_3_themes"] = themes
if output_log:
print("Step 3 complete: Themes identified")
# Step 4: Synthesis document
step4_response = client.messages.create(
model="claude-opus-4-6",
max_tokens=2048,
messages=[{
"role": "user",
"content": (
f"Using these themes and findings about '{topic}':\n\n"
f"THEMES:\n{themes}\n\n"
f"DETAILED FINDINGS:\n{answers}\n\n"
"Write a structured synthesis document with:\n"
"- An executive summary (2-3 sentences)\n"
"- A section for each major theme\n"
"- A conclusion with 3 key takeaways\n\n"
"Write clearly and concisely for a professional audience."
)
}]
)
synthesis = step4_response.content[0].text
chain_output["steps"]["step_4_synthesis"] = synthesis
if output_log:
print("Step 4 complete: Synthesis document drafted")
chain_output["completed_at"] = datetime.now().isoformat()
chain_output["final_output"] = synthesis
# Save chain log
if output_log:
log_filename = f"chain_log_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json"
with open(log_filename, "w") as f:
json.dump(chain_output, f, indent=2)
print(f"\nChain log saved to {log_filename}")
return chain_output
# Run the chain
result = run_research_chain("the psychology of effective AI prompting")
print("\n=== FINAL SYNTHESIS ===")
print(result["final_output"])
This code introduces two practices that become important as chains grow more complex: logging intermediate outputs (so you can inspect any step) and capturing metadata (start time, completion time) for later analysis.
Chain Design Patterns
Several chain structures appear repeatedly across different professional contexts. These design patterns are not rigid templates — they are starting points that you adapt to your specific situation.
The Research Chain: Search → Extract → Synthesize → Format
Use case: Any task that requires gathering information from multiple sources and producing a coherent summary or analysis.
Steps: 1. Define research scope: generate the specific questions, keywords, or dimensions to investigate 2. Gather: for each item in the research scope, produce a focused summary 3. Extract: identify the key findings, data points, or insights from all summaries 4. Synthesize: identify patterns, tensions, and conclusions across all findings 5. Format: structure the synthesis into the appropriate output format (report, brief, presentation, etc.)
Human review gates: After step 2 (are these summaries accurate and relevant?) and after step 4 (does this synthesis reflect what the sources actually say?).
The Writing Chain: Outline → Draft → Edit → Adapt
Use case: Content creation where quality, consistency, and audience fit are critical.
Steps: 1. Brief analysis: extract the key constraints, goals, and audience from the brief 2. Outline: generate a structure that meets the brief 3. Draft: write each section based on the approved outline 4. Edit: revise for clarity, concision, and flow 5. Adapt: adjust tone, vocabulary, and examples for the specific target audience or channel
Human review gates: After step 2 (does this structure make sense for the brief?) and after step 4 (is this draft meeting the quality bar?). The brief analysis in step 1 also benefits from human verification.
The Analysis Chain: Collect → Clean → Analyze → Interpret → Report
Use case: Data analysis, research synthesis, or any task requiring systematic examination of a body of information.
Steps: 1. Structure the data: organize raw inputs into a consistent format 2. Clean and validate: identify gaps, inconsistencies, and data quality issues 3. Analyze: apply the specified analytical lens (comparison, trend analysis, categorization, etc.) 4. Interpret: translate findings into plain language conclusions 5. Report: structure the interpretation for the target audience
Human review gates: After step 2 (what do you want to do about the data quality issues?) and after step 4 (do these interpretations accurately reflect the analysis?).
The Review Chain: Generate → Critique → Revise → Final Check
Use case: High-stakes deliverables where quality review is essential — proposals, reports, communications, code.
Steps: 1. Generate: produce the initial version 2. Critique: evaluate against specific quality criteria, producing a structured critique 3. Revise: address each critique point in a revised version 4. Final check: verify that the revision addresses all critique points without introducing new issues 5. Human approval: human reviews the final version before it goes out
Human review gates: After step 2 (do you agree with the critique? Are there issues the AI missed?) and the mandatory human approval in step 5.
💡 Intuition: Chain design patterns are like recipes — they give you a reliable starting structure, but every actual execution will require small adaptations based on the specific ingredients (your data, your audience, your quality bar) you are working with.
Quality Gates: When to Add Human Review
Quality gates are checkpoints in a chain where a human reviews the intermediate output before the chain proceeds. Adding too many gates makes a chain slow and defeats the purpose of automation. Adding too few creates the risk of invisible errors propagating through to the final output.
The decision framework for adding a quality gate:
Add a gate if: - The next step uses the current output as a foundation for significant work (if step 3's output shapes steps 4 through 8, a gate after step 3 is worth the slowdown) - The current step makes interpretive or judgment-based decisions (not just mechanical processing) - An error in the current step would be invisible in subsequent outputs - The output will eventually be seen by an external audience - The step involves factual claims that should be verified
Skip a gate if: - The step is purely mechanical (reformatting, counting, sorting) - An error in this step would be immediately visible in the next step's output - The chain loops iteratively and will self-correct - Speed is more important than quality for this specific workflow
The minimal gate rule: At minimum, add a human gate before any chain output is delivered to an external party. Never let an automated chain produce a client-facing deliverable without human review of the final output, regardless of how confident you are in the chain's reliability.
✅ Best Practice: Build your chains with more gates than you think you need. After running the chain a dozen times, you will have a sense of which gates almost never catch issues — those are the ones to remove. It is much easier to remove a gate you do not need than to add one back after a chain has produced bad output that reached an external audience.
Managing Context Between Chain Steps
AI models do not have persistent memory between separate API calls or chat sessions. Each step in a chain starts fresh, with only what you explicitly provide in that step's prompt. Managing what context each step receives is one of the most important design decisions in chain construction.
The context cascade problem: As a chain progresses, the total amount of relevant context grows. By step 5 in a six-step chain, there is output from steps 1 through 4 that might be relevant. But providing all of it in every subsequent prompt can exhaust the AI's context window and dilute the signal.
Strategies for context management:
Summarize intermediate outputs: Rather than passing the full text of each previous step, have an intermediate "summarization" step that condenses the key points from prior steps into a compact format that can be included in subsequent prompts without context bloat.
Pass only what the next step needs: Be selective about what you include in each step's prompt. Step 4 in a writing chain probably needs the outline and any editorial feedback, but it may not need the original brief if the outline already incorporates the brief's requirements.
Maintain a chain state document: For manual chains, keep a running document that holds the current state of the most important decisions and conclusions. Each step adds to this document; subsequent steps are given the current state document rather than all previous outputs individually.
Use structured formats: JSON or structured markdown is easier for an AI to parse accurately than unstructured prose. When passing complex information between steps, use a consistent structure that the receiving step's prompt explicitly references.
⚠️ Common Pitfall: The "full context" instinct — putting everything from all previous steps into every subsequent prompt, believing that more context is always better. It is not. Irrelevant context can confuse the AI, and excessive context can hit token limits or degrade output quality. Be deliberate about what each step actually needs.
Error Handling in Chains
Chains fail in several predictable ways. Designing for these failures up front makes chains robust.
Off-format output: The AI produces output in a different format than expected, breaking the downstream step's ability to use it. Mitigation: specify output formats precisely in prompts, use structured formats (JSON, numbered lists) when format matters, add a validation step that checks format compliance before passing output downstream.
Hallucination propagation: The AI fabricates a fact in an early step, and subsequent steps treat that fabrication as established truth. Mitigation: add human review gates at steps involving factual claims, include instructions to flag uncertainty ("if you are not certain, say so explicitly"), and in automated chains, use a fact-checking step for critical factual claims.
Quality degradation: The quality of output is marginally acceptable at each step but the chain produces a mediocre final result because small inadequacies at each step compound. Mitigation: set quality bars at each step rather than only at the end, use iterative loops for steps where quality is most critical.
Context window exhaustion: The accumulated context from previous steps exceeds what the AI can process. Mitigation: summarize rather than pass full outputs, use models with larger context windows for steps that require it, restructure the chain to reduce context dependencies.
Prompt injection: Malicious or problematic content in the input data modifies the AI's behavior in an unintended way. Mitigation: treat external data as untrusted, use separate system prompts and user content fields, validate inputs before they enter the chain.
🐍 Code Block: Chain with Validation and Error Handling
import anthropic
import re
client = anthropic.Anthropic()
class ChainValidationError(Exception):
"""Raised when a chain step produces output that fails validation."""
pass
def validate_numbered_list(text: str, min_items: int = 3) -> bool:
"""Check that text contains at least min_items numbered list items."""
pattern = r'^\s*\d+\.'
numbered_lines = [
line for line in text.split('\n')
if re.match(pattern, line.strip())
]
return len(numbered_lines) >= min_items
def chain_step_with_retry(
prompt: str,
validator_fn,
max_attempts: int = 3,
model: str = "claude-opus-4-6"
) -> str:
"""Execute a chain step with validation and retry logic."""
for attempt in range(1, max_attempts + 1):
response = client.messages.create(
model=model,
max_tokens=1024,
messages=[{"role": "user", "content": prompt}]
)
output = response.content[0].text
if validator_fn(output):
return output
if attempt < max_attempts:
print(f"Step output failed validation (attempt {attempt}). Retrying...")
# Add a clarifying instruction to the retry prompt
prompt = prompt + "\n\nIMPORTANT: Your previous response did not meet the format requirements. Please ensure you provide a numbered list with at least 3 items."
raise ChainValidationError(
f"Step failed validation after {max_attempts} attempts. Last output:\n{output}"
)
def robust_outline_chain(topic: str) -> dict:
"""Two-step chain with validation at step 1."""
# Step 1: Generate outline with validation
outline_prompt = (
f"Create a detailed outline for a guide about: {topic}\n\n"
"Format as a numbered list of main sections. Include at least 5 sections."
)
try:
outline = chain_step_with_retry(
outline_prompt,
validator_fn=lambda text: validate_numbered_list(text, min_items=5)
)
except ChainValidationError as e:
print(f"Chain failed at step 1: {e}")
return {"success": False, "error": str(e)}
print("Step 1 validated: Outline has required structure")
# Step 2: Expand the outline into section headers and brief descriptions
expand_prompt = (
f"Using this outline:\n\n{outline}\n\n"
"For each section, add 2-3 bullet points describing the key points "
"that section will cover. Keep bullet points brief (one sentence each)."
)
expanded = client.messages.create(
model="claude-opus-4-6",
max_tokens=2048,
messages=[{"role": "user", "content": expand_prompt}]
).content[0].text
return {
"success": True,
"outline": outline,
"expanded_outline": expanded
}
result = robust_outline_chain("writing effective AI prompts for business communication")
if result["success"]:
print("Chain completed successfully")
print("\nExpanded Outline:")
print(result["expanded_outline"])
Scenario Walkthrough: Alex's Content Creation Chain
🎭 Alex Chen — Digital Marketing Manager
Alex manages content for a mid-sized B2B software company. Her team produces three blog posts per week, plus email newsletters, social posts, and occasional longer-form pieces. The current process: a junior writer produces a draft, Alex reviews and revises heavily, a subject matter expert checks technical accuracy, a copy editor reviews, then Alex does a final brand check.
The whole process takes about three days per post and Alex's review time is the bottleneck.
Alex decides to chain AI to handle the mechanical parts of the workflow, so her time is spent on judgment rather than mechanics.
Alex's five-step content chain:
Step 1 — Brief processing and research questions Input: the content brief (topic, target audience, key points to cover, SEO keywords) Output: a set of five research questions the post should answer, plus a fact-check checklist
Step 2 — Research synthesis Input: the research questions from step 1 Output: a 400-word research summary addressing each question
Alex reviews this output. She verifies factual claims, adds any information she knows from her own expertise, and flags anything that needs SME input. This step used to be done entirely by a writer; now the AI produces a first pass that Alex refines in fifteen minutes rather than spending an hour starting from scratch.
Step 3 — Outline Input: approved research summary + original brief Output: a structured post outline with section headers and 2-3 bullet points per section
Alex reviews the outline. This is where she makes strategic decisions about emphasis and structure. The AI's outline is usually a good starting point that she tweaks.
Step 4 — Draft Input: approved outline + research summary + brand voice guidelines Output: a full draft post
Step 5 — Brand voice and SEO check Input: the draft from step 4 Output: a structured review identifying specific passages to revise for brand voice consistency and SEO optimization, plus a suggested revised version of each flagged passage
Alex does a final review and copy-edit, then publishes.
Results: Three days per post dropped to one day. Alex's personal time per post dropped from about three hours to about forty-five minutes. The junior writer now focuses on interview-based pieces that require human source relationships rather than research-and-write tasks. Post quality — measured by engagement metrics — improved, because the chain produces more thoroughly researched, better-structured posts than the previous process did.
The key: Alex did not remove herself from the chain. She removed the mechanical work from her to-do list and focused her expertise where it creates value.
Scenario Walkthrough: Raj's Code Review Chain
🎭 Raj Patel — Senior Software Engineer
Raj works on a team that ships features weekly. Code review is a constant time sink — reviewing for correctness, security, style, and documentation takes hours per week.
Raj builds a four-step chain that handles the mechanical parts of code review.
Step 1 — Code analysis Input: the pull request diff (the changed code) Output: a structured summary of what the code does, what it changes, and any immediate concerns (e.g., "this function has no error handling for null inputs")
Step 2 — Security review Input: the full code in the PR (not just the diff) + the step 1 summary Output: a security-focused review covering: input validation, authentication/authorization concerns, data handling, dependency risks
Step 3 — Test coverage analysis Input: the PR code + existing test files Output: an assessment of test coverage, specifically identifying: functions with no tests, edge cases not covered by existing tests, suggested additional test cases
Step 4 — Documentation check Input: the PR code Output: a list of functions/classes with insufficient documentation + suggested docstrings for each
Raj reviews all four outputs before the PR is approved. The AI review surfaces issues that human reviewers, moving quickly, sometimes miss. It also reduces the amount of nit-picking in human review comments — because basic issues are caught before human review, human reviewers can focus their attention on architectural and design concerns.
Raj's key insight: The chain does not replace human code review. It raises the floor so that human review starts from a higher baseline.
Scenario Walkthrough: Elena's Consulting Deliverable Chain
🎭 Elena Rodriguez — Independent Management Consultant
Elena produces strategic deliverables for clients: market analyses, organizational assessments, implementation plans. Each deliverable requires research, synthesis, structured writing, and calibration to the client's specific context and communication style.
Elena's five-step deliverable chain:
Step 1 — Research collection and organization Input: the research question or deliverable scope + a collection of source documents, interview notes, and data Output: organized thematic summaries from all input sources, with source citations
Elena reviews carefully. She knows what the sources actually say and can catch any misinterpretation at this stage.
Step 2 — Insight development Input: the organized thematic summaries + the client's strategic context Output: a set of key insights, each stated as a specific finding with supporting evidence from the summaries
Elena reviews and often adds or modifies insights based on her professional judgment. The AI surfaces what the research shows; Elena applies the strategic lens that turns research into advice.
Step 3 — Deliverable structure Input: approved insights + deliverable format guidelines + the specific audience (CEO, board, operational team, etc.) Output: a detailed deliverable structure tailored to the audience
Step 4 — Draft Input: approved structure + approved insights + source summaries for reference Output: a full draft deliverable
Step 5 — Client voice adaptation Input: the draft + examples of previous communications with this client + notes on client preferences Output: a revised draft calibrated to the client's vocabulary, level of detail preference, and communication style
Elena does a final review and delivers.
Elena's key insight: The chain takes care of the time-intensive mechanical work — organizing research, drafting prose, applying consistent structure. Elena's value to the client is in the quality of her insights and judgment, not in her typing speed. The chain lets her spend more time on the thinking and less on the writing.
Research Breakdown: Multi-Step AI Workflow Effectiveness
Research on how multi-step AI workflows compare to single-prompt approaches consistently shows quality improvements — but the size of the improvement depends heavily on task complexity and chain design quality.
Studies of structured AI-assisted writing find that decomposed, multi-step approaches produce outputs rated higher on coherence, accuracy, and relevance than single-prompt approaches for complex writing tasks. The effect is strongest for tasks requiring both research and synthesis — the chain approach's ability to separate these cognitively distinct tasks appears to be the key mechanism.
Research on AI-assisted code generation finds that chains incorporating review and testing steps catch significantly more bugs than single-pass generation. The review chain pattern — generate, critique, revise — produces code that passes more test cases than direct generation approaches.
For analytical tasks, iterative chains with explicit quality evaluation steps outperform single-pass approaches on tasks where quality can be measured. The key variable is not the number of iterations but whether the quality evaluation step is specific enough to produce actionable revision instructions.
The human-in-the-loop evidence is consistently positive: practitioner review at key chain junctions improves final output quality beyond what fully automated chains achieve, even when those review interventions are brief. A five-minute human review at the midpoint of a chain is worth more than five additional automated steps.
The practical implication: chains are worth the design investment for complex, high-stakes tasks. For simple, low-stakes tasks, the overhead of chain design and execution is not justified by the quality improvement.
Putting It Together: Building Your First Chain
The best way to learn chain design is to build one. Here is a sequence for your first chain project:
Choose the right task. Select a task you perform regularly that has three or more distinct phases (research, then writing, then review; or gathering, then analysis, then reporting). The task should produce output you genuinely care about — this gives you a real quality bar to measure against.
Map the expert process. Write down the steps a skilled human expert would take to complete this task well. Be specific. "Research" is not a step; "identify the five most relevant sources and summarize each in two sentences" is a step.
Write the chain specification. For each step: what is the input, what is the output format, what does good output look like, does this step need a human review gate?
Build incrementally. Start with a two-step version of your chain. Run it, review the outputs, identify what works and what does not. Add one step at a time until the chain covers the full workflow.
Document what you learn. Keep notes on which steps reliably produce good output and which are fragile. Your chain design will improve rapidly if you treat each run as a learning opportunity.
📋 Action Checklist: Starting Your First Chain
- [ ] Identify a recurring complex task with three or more distinct phases
- [ ] Map the expert process: what steps would a skilled human take?
- [ ] Write a chain specification document with input/output for each step
- [ ] Identify which steps need human review gates
- [ ] Build a two-step version and test it on a real task
- [ ] Add steps one at a time, testing after each addition
- [ ] Document which steps are reliable and which need refinement
- [ ] After five runs, review your gates: are there any you can remove? Any you should add?
- [ ] Consider whether any steps would benefit from automation (see Chapter 36)
The shift from single-prompt thinking to chain thinking is one of the highest-leverage changes you can make in your AI practice. It is also one that pays ongoing dividends — as your chains mature and stabilize, you build a library of reliable workflow components that can be recombined for new tasks.
Continue to Chapter 36 to learn how to implement chains programmatically using the Anthropic and OpenAI Python APIs.