Chapter 13 Exercises: Diagnosing and Fixing Bad Outputs

These exercises move from conceptual understanding of the diagnostic framework to hands-on application, using both the failure cases provided here and real failures from your own work. The "Failure Diagnosis Lab" in Section B is the core of this chapter's practical skill development.

Section A: The Diagnostic Framework

Exercise 1 — Root Cause Matching (Core)

Goal: Build fluency in matching observed failures to root causes.

For each of the following bad output descriptions, identify the most likely root cause from the seven-cause framework. Write one sentence explaining your reasoning.

Asked for a competitive analysis of the CRM market; received a general, accurate overview of the CRM market with no mention of specific competitors.
Asked for a meeting summary; received a 1,200-word essay when you needed a bulleted action item list.
Asked for a recommendation on cloud storage pricing; AI recommended AWS pricing tiers that were two years out of date.
Asked for "something creative" for a product launch; AI produced three ideas that were all minor variations of the same theme — social media-first campaign approaches.
Asked for research citations supporting a claim about remote work productivity; AI provided three citations with authors, journals, and dates — but two of the papers don't exist.
A long research session started with the instruction "always maintain a skeptical analytical tone." By message 15, the AI is writing in an enthusiastic, positive tone with no skepticism.
Asked for employee benefits recommendations; AI produced recommendations centered on equity grants, health insurance, and 401k matching — the team is hourly workers at a small retail store.

Exercise 2 — Triage Matrix Practice (Core)

Goal: Practice applying the repair vs. restart decision.

For each scenario, apply the Triage Matrix. Specify: (1) distance from goal (1-5), (2) effort to repair (High/Low), and (3) your decision (Repair / Restart / Consider carefully).

Wrote a product description with correct features and benefits, but in third person when you need second person.
Asked for a go-to-market strategy; received a strategic analysis instead — broad framework with no implementation steps, opposite structure from what was needed.
Asked for a 5-bullet executive summary; received a 15-bullet version where bullets 1-5 are exactly what you need and bullets 6-15 are surplus.
Asked AI to draft a legal contract; received something that looks like a contract but lacks standard protective clauses and uses non-standard terminology in ways that would require complete expert redraft.
Received a code refactoring that uses the wrong design pattern — you needed dependency injection but got a singleton pattern. The rest of the code is well-written.
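
The three-way decision in this exercise can be written down as a small function, which is a handy way to check that your answers are consistent with each other. The sketch below is one plausible reading of a 2x2 matrix, not the chapter's own definition: it assumes that 1 means "almost usable" and 5 means "far from the goal", and that a distance of 1 or 2 counts as "close". The function name `triage` and both thresholds are hypothetical; adjust them to match the matrix as the chapter presents it.

```python
def triage(distance_from_goal: int, repair_effort: str) -> str:
    """Return 'Repair', 'Restart', or 'Consider carefully'.

    Assumptions (not taken from the chapter): 1 means the output is
    almost usable, 5 means it is far off, and 1-2 counts as "close".
    """
    if not 1 <= distance_from_goal <= 5:
        raise ValueError("distance_from_goal must be between 1 and 5")
    effort = repair_effort.strip().lower()
    if effort not in {"low", "high"}:
        raise ValueError("repair_effort must be 'Low' or 'High'")

    close = distance_from_goal <= 2
    if close and effort == "low":
        return "Repair"              # near the goal and cheap to fix
    if not close and effort == "high":
        return "Restart"             # far from the goal and costly to fix
    return "Consider carefully"      # the two mixed quadrants


print(triage(1, "Low"))
print(triage(5, "High"))
```

If your written decision for a scenario disagrees with what the function returns, that is worth a sentence of explanation: either a threshold is wrong for your situation or the scenario has a factor the matrix does not capture.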

Exercise 3 — Failure to Prevention Mapping (Core)

Goal: Connect diagnostic root causes back to preventive prompting practices.

For each root cause below, write: (1) one sentence describing the preventive practice, and (2) the chapter that contains the relevant guidance.

Insufficient context
Vague instruction
Format mismatch
Training data bias
Context window overflow

This exercise is a review of the chapter's synthesis table in reverse — working from the failure back to the preventive skill.

Section B: The Failure Diagnosis Lab

The following section provides five bad AI outputs for systematic diagnosis. For each one:

1. Read the original prompt and the bad output
2. Identify the specific failure
3. Identify the root cause(s)
4. Write a complete repair prompt using the appropriate template
5. Reflect: what would have prevented this in the original prompt?

Lab Case 1 — The Generic Strategy (Core)

Original prompt:

"What's the best marketing strategy for our new product launch?"

AI Output received:

"For a successful new product launch, consider a comprehensive marketing strategy that includes:

Digital Marketing: Leverage social media platforms, SEO, and content marketing to build awareness. Create compelling content that highlights your product's unique value proposition.

Email Marketing: Build an email list and send targeted campaigns to potential customers. Offer early access or exclusive discounts to drive pre-launch sign-ups.

Influencer Partnerships: Partner with relevant influencers in your industry to expand your reach. Micro-influencers often provide better ROI than macro-influencers for niche products.

PR and Media Outreach: Send press releases to relevant publications and journalists. A successful media placement can dramatically amplify your launch.

Paid Advertising: Allocate budget for targeted paid campaigns on platforms where your audience is most active.

The key is to start building awareness 4-6 weeks before launch and maintain momentum through launch day and beyond."

Your tasks:
- Name the specific failure
- Identify the root cause(s)
- Write a complete repair prompt using the appropriate template
- Write the prevention: what would the original prompt have needed to produce a useful response?

Lab Case 2 — The Format Disaster (Core)

Original prompt:

"Summarize the key points from this 8-page product brief for a 10-minute internal team meeting."

AI Output received:

"The product brief introduces Meridian, a B2B expense tracking solution aimed at mid-size companies in the 50-500 employee range. The document begins by establishing the problem context: finance teams at these organizations spend an average of 11.3 hours per week on manual expense reconciliation, a figure the brief attributes to a 2022 industry survey. This inefficiency translates to approximately $47,000 annually in lost productivity per finance headcount, creating a clear ROI case for automation.

The product itself is described as a three-module platform: automated receipt capture via mobile app, AI-powered categorization that learns company-specific expense policies, and a compliance reporting module that integrates with QuickBooks, NetSuite, and SAP. The integration capability is presented as a key differentiator, with the brief noting that competitors either lack native integration or require expensive middleware.

Pricing is positioned at the premium end of the market, with a stated rationale of targeting buyers who have already decided to invest in automation and are selecting on quality rather than price. The launch timeline anticipates a Q3 beta with general availability in Q4. The brief identifies three target buyer personas: CFOs at growth-stage companies, VP Finance at established mid-market companies, and Controllers at private equity portfolio companies.

The document closes with a competitive landscape section that positions Meridian against four named competitors, arguing that the combination of deep integration capability, AI categorization accuracy, and compliance reporting creates a sustainable competitive moat..."

[Response continues for 6 more paragraphs of prose]

Your tasks:
- Name the specific failure
- Identify the root cause(s)
- Write a complete repair prompt
- What format specification would have prevented this?

Lab Case 3 — The Hallucinated Citation (Core)

Original prompt:

"What does the research say about the effectiveness of four-day work weeks on employee productivity?"

AI Output received:

"Research on four-day work weeks shows largely positive results. A landmark 2022 study by Microsoft Japan found that implementing a four-day work week increased productivity by 40%, while employee happiness increased significantly. This was supported by a 2023 meta-analysis published in the Journal of Applied Organizational Psychology (Hendricks et al., 2023), which analyzed 47 studies and found that compressed work weeks produced an average 18.6% improvement in productivity metrics while reducing burnout scores by 23%.

The Perpetual Guardian trial in New Zealand (2018) showed similar results, with 78% of employees reporting improved work-life balance with no measurable decline in output. A 2024 UK study coordinated by the 4 Day Week Global organization found that 92% of participating companies maintained the policy after the trial ended, citing sustained or improved productivity.

However, not all research is uniformly positive. A Stanford working paper (Chen & Williams, 2023) found that four-day work weeks disproportionately benefit knowledge workers in office settings, with limited or mixed benefits for service, retail, and manufacturing environments."

Your tasks:
- Identify which specific claims should be flagged for verification
- Which of these is most likely to be hallucinated, and why?
- Write a repair prompt for this response
- Write a prevention instruction to add to future research prompts

(Note: Some of the claims in this output are real; some are invented. Your job is to identify which claims require verification regardless, and to write prompts that prevent this ambiguity.)

Lab Case 4 — The Code That Almost Works (Extension)

Original prompt:

"Write a Python function that takes a list of dictionaries containing 'name' and 'score' keys and returns the top 3 highest scorers."

AI Output received:

def get_top_scorers(players):
    sorted_players = sorted(players, key=lambda x: x['score'])
    return sorted_players[:3]

Context: This code runs without errors and returns 3 players. But it consistently returns the wrong players.

Your tasks:
- Identify the specific bug (without running the code; reason through what it does)
- Which root cause explains why this bug appears?
- Write a repair prompt using the CoT approach from Chapter 10 combined with the Targeted Correction template
- What prevention instruction could be added to code generation prompts to reduce this type of error?

Lab Case 5 — The Authoritative Wrong Answer (Extension)

Original prompt:

"Write a section for our investor update explaining how our approach to data privacy differs from industry standard practice. We are GDPR compliant and have implemented privacy-by-design principles."

AI Output received:

"Our approach to data privacy reflects the highest standards in the industry. Unlike many competitors who remain primarily focused on minimum compliance requirements, we have implemented a comprehensive privacy-by-design architecture that exceeds GDPR requirements in several key areas.

Our data minimization approach limits collection to strictly necessary data, ensuring that less than 15% of user data collected by our platform is retained beyond 30 days — compared to the industry average of 67% data retention beyond 90 days. This is benchmarked against the 2023 European Data Protection Board audit of 89 SaaS companies, which found this retention pattern to be characteristic of privacy-leading organizations.

Our encryption standards surpass requirements, using AES-256 for all data at rest (the GDPR minimum is AES-128), and our independently audited privacy controls have been verified under ISO 27701 certification, which only 12% of SaaS companies of our size have achieved."

Your tasks:
- Identify every specific factual claim that requires verification before this appears in an investor update
- Why is this output particularly dangerous despite sounding authoritative?
- Write a repair prompt that acknowledges the good structure but addresses the fabrication risk
- Write a system-level prevention prompt to prepend to any investor communication generation request

Section C: Repair Prompt Practice

Exercise 9 — Write All Seven Repair Templates from Memory (Core)

Goal: Internalize the repair templates so you can apply them without reference.

Without looking at the chapter, write a version of each of the seven repair prompt templates in your own words:

The Targeted Correction
The Task Clarification
The Format Fix
The Context Reload
The Factual Correction
The Depth Request
The Full Restart

After writing your versions, compare them to the chapter's templates. What did you include? What did you miss? What did you phrase differently in a way that might be even more effective?

Exercise 10 — Repair a Real Output from Your Own Work (Core)

Goal: Apply the full diagnostic workflow to a real failure.

Find a recent AI output from your own work that you were not satisfied with — something you had to substantially edit, restart, or abandon.

Apply the full diagnostic workflow:
1. What was the specific failure?
2. Which root cause(s) were responsible?
3. Apply the Triage Matrix: would you have repaired or restarted?
4. Write the repair prompt you should have used
5. Test it: does the repair prompt produce substantially better output than the original?

What do you now know about this task type that you didn't know before this analysis?

Exercise 11 — Building a Prevention List from Failures (Core)

Goal: Turn failures into systematic improvements to your prompting practice.

Review the five Lab Cases from Section B and any real failures from Exercise 10. For each root cause you identified:

Build a "prevention checklist" item for that root cause — a specific addition you can make to prompts for that task type to prevent the failure.

Format:

Task type: [e.g., "Research questions" or "Code generation" or "Strategy prompts"]
Root cause to prevent: [root cause name]
Prevention instruction to add: [specific instruction text]

You should have at least 5 prevention checklist items by the end of this exercise. Add them to your pattern library (from Chapter 11) as notes under the relevant patterns.
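
Once the checklist exists, it can be applied mechanically: look up the task type and append every matching prevention instruction to the prompt. The sketch below shows one way to do that. The class `PreventionItem`, the function `with_prevention`, and all three sample entries are hypothetical illustrations, not items from the chapter; replace them with the entries you actually wrote.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreventionItem:
    task_type: str     # e.g. "Research questions"
    root_cause: str    # root cause the instruction is meant to prevent
    instruction: str   # text to add to prompts of this task type


# Hypothetical entries, for illustration only.
CHECKLIST = [
    PreventionItem("Research questions", "Hallucination",
                   "Cite only sources you are confident exist; say so when unsure."),
    PreventionItem("Research questions", "Format mismatch",
                   "Answer as a bulleted list of findings, one line each."),
    PreventionItem("Strategy prompts", "Insufficient context",
                   "Use the company, audience, and budget details given above."),
]


def with_prevention(prompt: str, task_type: str, checklist=CHECKLIST) -> str:
    """Append every prevention instruction logged for this task type."""
    extras = [item.instruction for item in checklist
              if item.task_type == task_type]
    if not extras:
        return prompt  # nothing logged for this task type yet
    bullets = "\n".join(f"- {text}" for text in extras)
    return f"{prompt}\n\nBefore answering, follow these rules:\n{bullets}"


print(with_prevention("What does the research say about remote work?",
                      "Research questions"))
```

Keeping the root cause on each item, even though the function does not use it, makes it easy to see later which failures a given instruction was supposed to prevent.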

Section D: Building Your Failure Taxonomy

Exercise 12 — Start Your Failure Taxonomy (Core)

Goal: Begin the long-term practice of documenting failures.

Create a failure taxonomy document using this format for each entry:

DATE:
TASK:
FAILURE: (specific description)
ROOT CAUSE:
REPAIR PROMPT USED:
RESULT OF REPAIR:
PREVENTION FOR NEXT TIME:

Log at least 5 entries, combining the Lab Cases from this chapter with real failures from your own work.

After 5 entries: look for patterns. Do multiple entries share the same root cause? The same task type? The same failure mode? What does this tell you about your most important area for prompt improvement?
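
If you prefer to keep the taxonomy in a script or notebook rather than a document, the entry format above maps directly onto a small record type. The following is a minimal sketch under that assumption; `FailureEntry` and `recurring_root_causes` are hypothetical names, and the field names simply mirror the labels in the exercise.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FailureEntry:
    date: str
    task: str
    failure: str
    root_cause: str
    repair_prompt_used: str
    result_of_repair: str
    prevention_for_next_time: str

    def render(self) -> str:
        """Format the entry using the field labels from the exercise."""
        return "\n".join([
            f"DATE: {self.date}",
            f"TASK: {self.task}",
            f"FAILURE: {self.failure}",
            f"ROOT CAUSE: {self.root_cause}",
            f"REPAIR PROMPT USED: {self.repair_prompt_used}",
            f"RESULT OF REPAIR: {self.result_of_repair}",
            f"PREVENTION FOR NEXT TIME: {self.prevention_for_next_time}",
        ])


def recurring_root_causes(entries, minimum=2):
    """Return (root_cause, count) pairs seen at least `minimum` times,
    most frequent first."""
    counts = Counter(entry.root_cause for entry in entries)
    return [(cause, n) for cause, n in counts.most_common() if n >= minimum]
```

Running `recurring_root_causes` over your first five entries answers the first pattern question directly; the same approach works for task type if you count `entry.task` instead.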

Exercise 13 — Pattern Analysis: Your Failure Distribution (Extension)

Goal: Identify your personal failure distribution and compare it to the chapter's research benchmark.

Track every significant AI failure you experience over 10 consecutive working days. For each failure: log the root cause.

After 10 days: count failures by root cause. Your personal distribution probably differs from the chapter's research benchmark:
- Insufficient context: ~35%
- Vague instruction: ~25%
- Format mismatch: ~20%
- Hallucination: ~10%
- Others: ~10% combined

What does your actual distribution tell you? Which root causes are above average for you, and why? Which are below average? What specific interventions would address your highest-frequency failure causes?
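
The comparison in this exercise is simple arithmetic, and a short script keeps it honest. The sketch below assumes you have logged one root-cause name per failure as a plain list of strings. The benchmark values are the approximate shares quoted in this exercise; the function name `compare_to_benchmark` and the choice to fold any cause not named in the benchmark into "Others" are assumptions made for illustration.

```python
from collections import Counter

# Approximate benchmark shares quoted in the exercise, in percent.
BENCHMARK = {
    "Insufficient context": 35,
    "Vague instruction": 25,
    "Format mismatch": 20,
    "Hallucination": 10,
    "Others": 10,
}


def compare_to_benchmark(logged_causes):
    """Return {cause: (your_percent, benchmark_percent, difference)}."""
    if not logged_causes:
        raise ValueError("log at least one failure first")
    # Fold any cause the benchmark does not list by name into "Others".
    counts = Counter(c if c in BENCHMARK else "Others" for c in logged_causes)
    total = len(logged_causes)
    report = {}
    for cause, expected in BENCHMARK.items():
        actual = round(100 * counts[cause] / total, 1)
        report[cause] = (actual, expected, round(actual - expected, 1))
    return report


# Hypothetical ten-day log: one root-cause name per failure.
log = (["Insufficient context"] * 5
       + ["Hallucination"] * 3
       + ["Context window overflow"] * 2)

for cause, (mine, bench, diff) in compare_to_benchmark(log).items():
    print(f"{cause}: {mine}% vs ~{bench}% ({diff:+} points)")
```

With only ten days of data the percentages will be noisy, so treat large differences as prompts for reflection rather than as firm conclusions.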

Section E: Integration Across Part 2

Exercise 14 — Full Diagnostic Cycle (Core)

Goal: Practice the complete prompt → output → diagnose → repair → document cycle.

Choose a complex, real work task. Run the full cycle:

Write your initial prompt (without over-engineering it)
Evaluate the output honestly: what's right, what's wrong
Apply the diagnostic framework to the failures
Run repair prompts as needed
Document each failure in your taxonomy

Reflect on the whole cycle: how did the diagnostic framework change how you experienced the failures? How did the repair prompts compare to your pre-framework instinct about what to try?

Exercise 15 — The Anti-Pattern Audit (Extension)

Goal: Identify anti-patterns in your own prompting practice.

Review the last 20 prompts you have written (scroll through your chat history). For each one, ask:
- Did it have explicit context?
- Did it have a specific instruction?
- Did it specify format?
- Did it include hallucination prevention for any factual claims?

Count: what percentage of your prompts included each element? What is the gap between your current practice and the standards from Chapters 7-9?

Write 3-5 specific improvements to your regular prompting practice based on this audit.
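
The counting step of this audit can be sketched as follows, assuming you record each reviewed prompt as a dictionary of yes/no answers to the four questions. The key names and the function `audit` are hypothetical. As a simplification, the sketch divides by all reviewed prompts, including prompts that made no factual claims and so had no need for hallucination prevention; exclude those by hand if you want a stricter figure.

```python
ELEMENTS = ("context", "specific_instruction", "format",
            "hallucination_prevention")


def audit(prompts):
    """Return the percentage of prompts that included each element."""
    if not prompts:
        raise ValueError("audit at least one prompt")
    total = len(prompts)
    return {
        element: round(100 * sum(1 for p in prompts if p.get(element)) / total, 1)
        for element in ELEMENTS
    }


# Hypothetical sample of four reviewed prompts.
sample = [
    {"context": True, "specific_instruction": True, "format": False,
     "hallucination_prevention": False},
    {"context": False, "specific_instruction": True, "format": False,
     "hallucination_prevention": False},
    {"context": True, "specific_instruction": True, "format": True,
     "hallucination_prevention": False},
    {"context": False, "specific_instruction": False, "format": False,
     "hallucination_prevention": True},
]

for element, percent in audit(sample).items():
    print(f"{element}: {percent}%")
```

The elements with the lowest percentages are natural candidates for the 3-5 improvements you write at the end of the exercise.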

Exercise 16 — Part 2 Retrospective (Core)

Goal: Synthesize learning across all of Part 2.

Answer these questions with specific examples from your own work during this Part:

What is the single most impactful change to your prompting practice from Chapters 7-9 (fundamentals)?
Which advanced technique from Chapter 10 has produced the most meaningful improvement in output quality for you?
What pattern from Chapter 11 have you built or plan to build that will most reduce friction in your recurring work?
What multimodal workflow from Chapter 12 are you most likely to implement?
What failure root cause do you most commonly encounter, and what is your prevention plan for it?

These five answers are your personal AI prompting development plan — where you are now and what you'll work on next.

Exercise 17 — Teach It Back (Extension)

Goal: Deepen understanding through teaching.

Teach the diagnostic framework to one other person: a colleague, a friend, or even an AI assistant acting as a student. Explain:
- The seven root causes
- The diagnostic questions
- At least two repair templates

After explaining, test their understanding with a scenario ("if you got an output that was generic and didn't reference our industry, what root cause is that, and what would you do?").

What did you learn from teaching it? Where did your explanation break down or require clarification? Update your own understanding based on the teaching experience.

Exercise 18 — Cross-Chapter Integration: The Full Workflow (Extension)

Goal: Apply tools from multiple chapters to a single complex task.

Choose a high-stakes, complex task you need to complete this week. Build the full workflow:

Chapter 11: Choose or build the appropriate pattern
Chapter 10: Identify which advanced technique(s) to add (CoT, few-shot, self-critique)
Chapter 12: Determine if any multimodal inputs are available
Chapter 13: Build your failure prevention checklist (what root causes are most likely for this task, and what instructions prevent them)

Run the task. Document: which preparation steps from which chapters produced the most value? What would you have missed if you had approached it without this preparation?

Exercise 19 — Build a Personal "Bad Output First Aid Kit" (Core)

Goal: Create a quick reference card you can use when you get a bad output.

Build a one-page reference card containing:
- The 7 root causes (very brief description each)
- The 5 diagnostic questions (one sentence each)
- The Triage Matrix (the 2x2 decision tool)
- The 7 repair template names with one trigger phrase for each

This card should be accessible wherever you work. The goal: when you get a bad output and aren't sure what to do, you open this card, run through it in 3 minutes, and know your next action.

Exercise 20 — One-Month Prompt Quality Review (Extension)

Goal: Measure improvement in your prompting quality over time.

Retrieve a sample of prompts from one month ago (from chat history). Apply the diagnostic framework in reverse: for each prompt, identify what was missing that could have caused failure (insufficient context, vague instruction, format specification absent, etc.).

Count: what percentage of your prompts from one month ago would have benefited from improvements now visible to you?

Now review your most recent prompts. Apply the same analysis. What is the improvement rate?

If you're currently working through this textbook, your improvement should be measurable. If it isn't, identify which chapters' practices you haven't yet integrated into your regular prompting and make a specific plan.
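
The exercise does not define "improvement rate" precisely. One reasonable reading, used in the sketch below, is the drop in the share of prompts that were missing at least one element, measured in percentage points. Both function names are hypothetical, and the counts in the example are invented for illustration.

```python
def gap_rate(prompts_with_gaps: int, prompts_reviewed: int) -> float:
    """Percentage of reviewed prompts missing at least one element."""
    if prompts_reviewed <= 0:
        raise ValueError("review at least one prompt")
    if not 0 <= prompts_with_gaps <= prompts_reviewed:
        raise ValueError("prompts_with_gaps must be between 0 and prompts_reviewed")
    return 100 * prompts_with_gaps / prompts_reviewed


def improvement(old_rate: float, new_rate: float) -> float:
    """Percentage-point drop in the gap rate (positive means improvement)."""
    return old_rate - new_rate


# Hypothetical counts: 15 of 20 old prompts had gaps, 6 of 20 recent ones do.
a_month_ago = gap_rate(15, 20)
recent = gap_rate(6, 20)
print(f"Gap rate a month ago: {a_month_ago}%")
print(f"Gap rate now: {recent}%")
print(f"Improvement: {improvement(a_month_ago, recent)} percentage points")
```

Use the same sample size and the same criteria for both reviews; otherwise the two rates are not comparable.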