Chapter 38 Exercises: Multi-Agent Development Systems

These exercises progress from basic recall through creative application to challenging multi-chapter integration. Complete them in order within each tier, but feel free to skip tiers that are too easy for your current level.


Tier 1: Recall (Exercises 1-6)

These exercises test your understanding of the core concepts from this chapter.

Exercise 1: Agent Role Identification

For each of the following tasks, identify which agent role (Architect, Coder, Tester, Reviewer, or a specialist role) should be responsible. Explain your reasoning.

a) Deciding whether to use a relational database or a document store for a new feature.
b) Writing a class that implements a caching layer specified in a design document.
c) Creating tests that verify a password hashing function handles Unicode characters correctly.
d) Flagging that a method has 15 parameters and recommending extraction into a configuration object.
e) Determining that the system needs a message queue between two microservices.
f) Checking that all API endpoints validate input against SQL injection.

Exercise 2: Orchestration Pattern Matching

Match each scenario to the most appropriate orchestration pattern (sequential pipeline, parallel execution, hierarchical delegation, or event-driven). Justify your choice.

a) A bug fix that requires changing one file, testing it, and reviewing the change.
b) A new feature that needs simultaneous frontend work, backend work, and database migrations.
c) An implementation that needs to be analyzed by security, performance, and accessibility reviewers independently.
d) A CI/CD system that runs agents in response to commits, PR comments, and test results.
e) A microservices project where each service has its own architect-coder-tester mini-team.

Exercise 3: Communication Mechanism Selection

For each scenario, identify whether shared context, message passing, or artifact exchange is the most appropriate communication mechanism. Explain why.

a) An architect agent needs to share a design document with the coder agent.
b) A tester agent needs to tell the orchestrator that 3 out of 15 tests failed.
c) Five parallel review agents need to contribute their findings to a single review report.
d) A coder agent needs access to the full project source code while implementing.
e) A monitoring system needs to track which agents have completed their work.

Exercise 4: Conflict Classification

Classify each of the following conflicts by type (design-implementation, implementation-test, review-implementation, or cross-agent priority). Then state which resolution strategy you would apply.

a) The architect designed a synchronous API, but the coder implemented it with async/await because the database library requires it.
b) The tester finds that the login function returns None on failure, but the design specification says it should raise an AuthenticationError.
c) The reviewer says the code should use dependency injection, but the coder argues this adds unnecessary complexity for a simple script.
d) The security agent says user data must be encrypted at rest, but the performance agent says encryption will make queries 5x slower.

Exercise 5: Scaling Terminology

Define each of the following terms in the context of multi-agent systems. Provide one concrete example for each.

a) Coordination tax
b) Span of control
c) Domain-based partitioning
d) Dynamic team composition
e) Token budget

Exercise 6: Monitoring Metrics

List the five most important metrics for monitoring a multi-agent pipeline. For each metric, explain what a problematic value would indicate and what action you would take.


Tier 2: Apply (Exercises 7-12)

These exercises ask you to apply the concepts in practical scenarios.

Exercise 7: System Prompt Design

Write a complete system prompt for a Security Agent that reviews code for security vulnerabilities. Your prompt must include:

- A clear role definition
- Specific areas of focus (at least five)
- At least three explicit "Do NOT" constraints
- A required output format

Exercise 8: Pipeline Design

Design a sequential pipeline for the following task: "Add a REST API endpoint for user profile updates that validates input, updates the database, and sends a notification email."

For each stage, specify:

- The agent role
- The input it receives
- The output it produces
- What constitutes a "pass" vs. "fail" for that stage

Exercise 9: Context Summarization

Given the following (abbreviated) architect output, write two different summaries: one optimized for the coder agent and one optimized for the tester agent. Each summary should be no more than 200 words.

## Design Document: User Profile API

### Architecture Decision: REST vs. GraphQL
After analyzing the existing codebase (which uses REST for all 12 current
endpoints), we chose REST to maintain consistency. GraphQL would offer
more flexible queries but would introduce a new paradigm the team has not
used before.

### Components
1. ProfileController - handles HTTP requests and responses
2. ProfileService - business logic for profile operations
3. ProfileValidator - input validation using Pydantic models
4. UserRepository - database access layer

### Interfaces
- ProfileController.update(request: Request) -> Response
- ProfileService.update_profile(user_id: int, data: ProfileUpdate) -> Profile
- ProfileValidator.validate(data: dict) -> ProfileUpdate
- UserRepository.save(profile: Profile) -> Profile

### Constraints
- Email changes require re-verification
- Username changes limited to once per 30 days
- Profile photo must be under 5MB
- All changes must be audit-logged

### Error Handling
- 400 for validation errors with field-level details
- 404 if user not found
- 409 if username is taken
- 429 if username change rate limit exceeded

Exercise 10: Conflict Resolution Implementation

Write a Python function called resolve_review_conflicts that takes a list of review findings from two different reviewer agents and resolves conflicts. Two findings conflict if they refer to the same file and line number but recommend different actions. Your function should:

  • Identify conflicting findings
  • Resolve by severity (CRITICAL > WARNING > SUGGESTION)
  • If severity ties, prefer the finding with more detailed evidence
  • Return the merged, de-conflicted list
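If you want a starting point, here is one possible skeleton, not a reference solution. It assumes each finding is a dict with file, line, severity, action, and evidence keys, and takes the two reviewers' lists as separate arguments (a single combined list works the same way, since concatenation is the first step):

```python
# Severity ordering from the exercise; higher rank wins a conflict.
SEVERITY_RANK = {"CRITICAL": 2, "WARNING": 1, "SUGGESTION": 0}

def resolve_review_conflicts(findings_a, findings_b):
    """Merge two reviewers' findings, keeping one winner per (file, line)."""
    merged = {}
    for finding in findings_a + findings_b:
        key = (finding["file"], finding["line"])
        current = merged.get(key)
        if current is None or current["action"] == finding["action"]:
            # No conflict: first occurrence wins, duplicates are dropped.
            merged.setdefault(key, finding)
            continue
        new_rank = SEVERITY_RANK[finding["severity"]]
        cur_rank = SEVERITY_RANK[current["severity"]]
        # Higher severity wins; on a tie, the longer evidence string wins.
        if new_rank > cur_rank or (
            new_rank == cur_rank
            and len(finding.get("evidence", "")) > len(current.get("evidence", ""))
        ):
            merged[key] = finding
    return list(merged.values())
```

Your full answer should also decide what "more detailed evidence" means; string length is the simplest proxy, not necessarily the best one.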

Exercise 11: Budget Calculator

Write a Python function that estimates the cost of a multi-agent pipeline run given:

- Number of agents
- Average input tokens per agent
- Average output tokens per agent
- Number of expected feedback loops
- Cost per 1000 input tokens ($0.003) and per 1000 output tokens ($0.015)

The function should return the estimated cost and warn if it exceeds a given budget.
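A minimal sketch is shown below. It assumes each feedback loop re-runs a single agent at the same average token counts; how loops map to re-runs is a modeling choice you should state and justify in your own answer:

```python
def estimate_pipeline_cost(
    num_agents,
    avg_input_tokens,
    avg_output_tokens,
    feedback_loops,
    budget,
    input_cost_per_1k=0.003,
    output_cost_per_1k=0.015,
):
    """Estimate run cost; each feedback loop is modeled as one extra agent run."""
    runs = num_agents + feedback_loops  # assumption: one re-run per loop
    per_run = (
        avg_input_tokens / 1000 * input_cost_per_1k
        + avg_output_tokens / 1000 * output_cost_per_1k
    )
    total = runs * per_run
    if total > budget:
        print(f"WARNING: estimated ${total:.2f} exceeds budget ${budget:.2f}")
    return total
```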

Exercise 12: Trace Analysis

Given the following pipeline execution trace, answer the questions below:

[00.0s] pipeline_start | issue=#87
[00.1s] agent_start    | planner
[07.2s] agent_complete | planner | tokens=1840 | cost=$0.02
[07.3s] agent_start    | architect
[21.5s] agent_complete | architect | tokens=4200 | cost=$0.08
[21.6s] agent_start    | coder
[45.0s] agent_complete | coder | tokens=6100 | cost=$0.14
[45.1s] agent_start    | tester (parallel)
[45.1s] agent_start    | reviewer (parallel)
[62.3s] tester complete | 2/10 FAILED | cost=$0.06
[65.8s] reviewer complete | 1 CRITICAL | cost=$0.07
[65.9s] agent_start    | coder (fix cycle 1)
[88.1s] agent_complete | coder | tokens=5800 | cost=$0.12
[88.2s] agent_start    | tester (retest)
[99.4s] tester complete | 1/10 FAILED | cost=$0.05
[99.5s] agent_start    | coder (fix cycle 2)
[120.3s] agent_complete | coder | tokens=5200 | cost=$0.11
[120.4s] agent_start   | tester (retest)
[130.1s] tester complete | 10/10 PASSED | cost=$0.05
[130.2s] pipeline_complete | SUCCESS

a) What was the total pipeline execution time?
b) What was the total cost?
c) How many fix cycles were needed?
d) Which agent was the most expensive?
e) What was the parallelism speedup for the testing/review phase?
f) If the tester and reviewer had run sequentially, what would the total time have been?
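To check your per-agent tallies mechanically, a small parser over the trace format is enough. This sketch assumes the line format printed above and parses only two sample lines; extending it to the full trace is part of the exercise:

```python
import re

# Tally per-agent cost from trace lines in the format shown above.
trace = """\
[07.2s] agent_complete | planner | tokens=1840 | cost=$0.02
[62.3s] tester complete | 2/10 FAILED | cost=$0.06
"""

costs = {}
for line in trace.splitlines():
    agent = re.search(r"\[\s*([\d.]+)s\]\s+(?:agent_complete \| )?(\w+)", line)
    cost = re.search(r"cost=\$([\d.]+)", line)
    if agent and cost:
        name = agent.group(2)
        # Repeated runs of the same agent (fix cycles, retests) accumulate.
        costs[name] = costs.get(name, 0.0) + float(cost.group(1))

print(costs)
```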


Tier 3: Analyze (Exercises 13-18)

These exercises require deeper analysis and critical thinking.

Exercise 13: Single Agent vs. Multi-Agent Trade-off Analysis

For each of the following project scenarios, analyze whether a single agent or multi-agent approach is more appropriate. Consider context window usage, quality requirements, cost, and execution time.

a) Generating a simple CRUD API with 4 endpoints and basic tests (approximately 200 lines of code).
b) Building a complete authentication system with OAuth2, JWT, session management, rate limiting, and comprehensive security testing (approximately 2000 lines).
c) Fixing a one-line bug in a well-tested codebase where the test that catches the bug already exists.
d) Refactoring a 5000-line monolithic module into 15 smaller modules while maintaining all existing tests.

Exercise 14: Role Bleed Analysis

The following system prompt has several problems that could lead to role bleed (an agent overstepping its defined role). Identify at least five problems and rewrite the prompt to fix them.

You are a code reviewer. Review the code and fix any issues you find.
If the tests are failing, update them. If the architecture seems wrong,
redesign it. Make sure everything works perfectly. You can use any tools
you need.

Exercise 15: Orchestration Pattern Comparison

Design a multi-agent solution for the following task using two different orchestration patterns. For each, draw the agent flow diagram, list the pros and cons, and recommend which you would use in production.

Task: "Implement a data export feature that reads from three different database tables, transforms the data into CSV and JSON formats, validates the output, and uploads to cloud storage."

Exercise 16: Communication Overhead Analysis

A team uses the shared context communication model where every agent reads the entire workspace. The workspace currently contains:

- Design document: 3,000 tokens
- Implementation code: 8,000 tokens
- Test code: 5,000 tokens
- Test results: 1,000 tokens
- Review report: 2,000 tokens

If each agent has a 32,000-token context window:

a) How much of each agent's context window is consumed by the shared workspace?
b) What is the maximum additional context each agent can use for its own reasoning?
c) At what workspace size would this model become impractical?
d) Propose a context management strategy that reduces per-agent consumption by at least 40%.
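Parts a) and b) are arithmetic on the numbers above; a snippet like this can double-check your figures (the dict keys are just labels, and linear addition of token counts is assumed):

```python
# Workspace token counts from the exercise.
workspace = {
    "design_doc": 3_000,
    "implementation": 8_000,
    "tests": 5_000,
    "test_results": 1_000,
    "review_report": 2_000,
}
context_window = 32_000

workspace_total = sum(workspace.values())
consumed_fraction = workspace_total / context_window   # part a)
reasoning_headroom = context_window - workspace_total  # part b)
print(f"{consumed_fraction:.1%} consumed, {reasoning_headroom} tokens free")
```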

Exercise 17: Failure Mode Analysis

For a five-agent sequential pipeline (planner, architect, coder, tester, reviewer), analyze the following failure scenarios:

a) The architect agent times out after 120 seconds.
b) The coder agent produces code that does not compile.
c) The tester agent produces tests that test the wrong thing (they pass but do not actually verify the requirements).
d) The reviewer agent always approves everything (gives no critical feedback).

For each, explain: (1) how you would detect the failure, (2) what the impact is if not detected, and (3) what mitigation you would implement.

Exercise 18: Cost Optimization

A multi-agent pipeline currently costs $2.50 per run with the following breakdown:

- Planner (Claude Sonnet): $0.10
- Architect (Claude Sonnet): $0.40
- Coder (Claude Sonnet): $0.80
- Tester (Claude Sonnet): $0.50
- Reviewer (Claude Sonnet): $0.30
- Average 1.5 fix cycles adding: $0.40

The team wants to reduce the cost to under $1.00 per run without significantly reducing quality. Propose a cost optimization strategy that includes:

- Model selection per agent (which agents could use a cheaper model?)
- Context reduction techniques
- Pipeline optimization (which stages could be combined or eliminated?)
- Caching strategies


Tier 4: Evaluate (Exercises 19-24)

These exercises ask you to make and defend judgments.

Exercise 19: Agent Team Size Evaluation

A colleague proposes a 12-agent team for a medium-complexity feature:

1. Requirements Agent
2. Architect Agent
3. Database Agent
4. Backend Agent
5. Frontend Agent
6. API Agent
7. Unit Test Agent
8. Integration Test Agent
9. Security Agent
10. Performance Agent
11. Documentation Agent
12. Code Review Agent

Evaluate this proposal. Which agents are essential? Which could be combined? Which could be eliminated? Propose an optimized team with justification.

Exercise 20: Conflict Resolution Strategy Evaluation

Compare three conflict resolution strategies:

A) Strict priority hierarchy (security > architect > reviewer > tester > coder)
B) Evidence-based resolution (more evidence wins)
C) Mediator agent (a separate agent resolves all conflicts)

For each strategy, evaluate:

- Fairness (does every agent's perspective get appropriate consideration?)
- Speed (how quickly are conflicts resolved?)
- Cost (what is the resource overhead?)
- Quality (does the strategy tend to produce good outcomes?)
- Failure modes (how can the strategy go wrong?)

Recommend which strategy (or combination) you would use for a production system.

Exercise 21: Quality Gate Evaluation

Design a set of quality gates for a multi-agent pipeline. A quality gate is a checkpoint where the pipeline pauses to verify that the output meets minimum standards before proceeding.

For each gate, specify:

- Where it occurs in the pipeline
- What it checks
- What happens when the check fails
- Whether it requires human approval

Evaluate the trade-off between thoroughness (more gates) and speed (fewer gates).

Exercise 22: Monitoring Dashboard Design

Design a monitoring dashboard for a multi-agent pipeline that runs 50+ times per day. Specify:

- Which metrics appear on the main overview screen
- What alerts would trigger notifications
- What drill-down views are available
- How historical trends are displayed

Justify each design decision in terms of what it helps the operator understand or act on.

Exercise 23: Human-in-the-Loop Evaluation

Evaluate three levels of human involvement in a multi-agent pipeline:

A) Full autonomy: Pipeline runs end-to-end without human intervention. Humans only review the final PR.
B) Checkpoint approval: Humans approve the design before implementation begins, and review the final PR.
C) Active oversight: Humans review and approve every agent's output before passing it to the next agent.

For each level, evaluate: quality risk, speed, human time cost, and suitability for different project types (startup prototype vs. financial system vs. internal tool).

Exercise 24: Real-World Readiness Assessment

Assess whether a multi-agent pipeline is ready for production use in each of these scenarios. For each, list the criteria that must be met and identify the highest-risk gap.

a) An open-source library with 1000+ stars that accepts community contributions.
b) A startup building an MVP with a two-person engineering team.
c) A bank's internal development team building customer-facing financial tools.
d) A solo developer building personal projects on weekends.


Tier 5: Create (Exercises 25-30)

These exercises require you to design and build original solutions.

Exercise 25: Custom Agent Role Design

Design a complete agent role for a Database Migration Agent that:

- Analyzes schema changes in a design document
- Generates migration scripts (forward and rollback)
- Validates migrations against existing data
- Checks for data loss risks

Provide the complete system prompt, tool access list, output format specification, and three example interactions.

Exercise 26: Hybrid Orchestrator

Design and implement (in pseudocode or Python) a hybrid orchestrator that combines sequential, parallel, and hierarchical patterns. The orchestrator should:

- Run planning and design sequentially
- Decompose implementation into sub-tasks and run them in parallel
- Run testing and review in parallel after all implementations complete
- Support a feedback loop for failed tests

Include the flow diagram, the orchestration logic, and at least three test scenarios.
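To make the requirements concrete, here is a bare asyncio skeleton of the hybrid flow. Everything in it is illustrative: run_agent stands in for a real LLM call, the subtask names come from an assumed decomposition, and the feedback loop is reduced to a single retry hook you would need to generalize:

```python
import asyncio

async def run_agent(name, payload):
    """Placeholder for a real agent invocation (an LLM call in practice)."""
    await asyncio.sleep(0)
    return {"agent": name, "input": payload, "status": "ok"}

async def hybrid_pipeline(request):
    # Sequential phase: planning then design.
    plan = await run_agent("planner", request)
    design = await run_agent("architect", plan)

    # Parallel phase: implementation sub-tasks (names are illustrative).
    subtasks = ["backend", "frontend", "database"]
    impls = await asyncio.gather(
        *(run_agent(f"coder:{t}", design) for t in subtasks)
    )

    # Parallel phase: testing and review run concurrently.
    test, review = await asyncio.gather(
        run_agent("tester", impls), run_agent("reviewer", impls)
    )

    # Feedback hook: a real orchestrator would loop until tests pass.
    if test["status"] != "ok":
        impls = [await run_agent("coder:fix", test)]
    return {"design": design, "impls": impls, "test": test, "review": review}

result = asyncio.run(hybrid_pipeline("add data export feature"))
```

The hierarchical piece is left for you: wrap each sub-task in its own mini-team (architect, coder, tester) and have hybrid_pipeline delegate to those teams instead of single coder calls.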

Exercise 27: Multi-Agent Code Review System

Design a multi-agent code review system that uses three specialized reviewers:

1. Correctness Reviewer - verifies logic and behavior
2. Security Reviewer - checks for vulnerabilities
3. Maintainability Reviewer - evaluates readability and structure

Define the system prompt for each reviewer, the aggregation strategy for merging their reports, the conflict resolution approach when reviewers disagree, and the final report format.

Exercise 28: Pipeline Observability System

Design and implement a complete observability system for a multi-agent pipeline. Include:

- Structured logging with correlation IDs
- Metrics collection and aggregation
- Trace visualization
- Anomaly detection rules
- Alert configuration

Provide Python code for the core components and a sample output showing what the logs, metrics, and traces look like for a successful run and a failed run.
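As a starting point for the structured-logging component only, the sketch below threads one correlation ID through every event so a single run can be traced end to end. The function names and record fields are illustrative, not a fixed API:

```python
import json
import logging
import time
import uuid

logging.basicConfig(level=logging.INFO)

def new_run_id():
    """One correlation ID per pipeline run."""
    return uuid.uuid4().hex[:12]

def log_event(run_id, event, **fields):
    """Emit one structured JSON log line carrying the correlation ID."""
    record = {"ts": round(time.time(), 3), "run_id": run_id, "event": event, **fields}
    logging.getLogger("pipeline").info(json.dumps(record))
    return record

run_id = new_run_id()
log_event(run_id, "pipeline_start", issue="#87")
log_event(run_id, "agent_complete", agent="coder", tokens=6100, cost=0.14)
```

Metrics, traces, and alerts can all be derived from these records because every line is machine-parseable JSON keyed by run_id.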

Exercise 29: Multi-Agent Debugging System

Design a multi-agent system specifically for debugging. The system should include:

- A Reproducer Agent that creates a minimal test case for a bug
- An Analyzer Agent that identifies the root cause
- A Fixer Agent that proposes and implements a fix
- A Verifier Agent that confirms the fix resolves the bug without regressions

Define the workflow, communication protocol, and how the system handles bugs it cannot fix automatically.

Exercise 30: Capstone: Complete Pipeline Implementation

Build a complete multi-agent development pipeline (in Python) that:

1. Accepts a natural-language feature request
2. Produces a design document
3. Generates implementation code
4. Creates and runs tests
5. Performs a code review
6. Generates a summary report

Requirements:

- Use the agent role definitions from this chapter
- Implement at least two orchestration patterns (sequential + parallel)
- Include conflict resolution
- Include monitoring with structured logging
- Include budget tracking
- Handle at least two types of failures gracefully
- Produce a human-readable execution report

This is an extended project. Expect it to take 3-5 hours. Test your pipeline with at least three different feature requests of varying complexity.