In This Chapter
- The Right Way to Think About AI Failure
- 1. The Seven Root Causes of Bad Output
- 2. The Diagnostic Framework: 5 Questions for Any Bad Output
- 3. The Triage Matrix: Repair vs. Restart
- 4. Fix Strategies by Root Cause
- 5. Repair Prompt Templates
- 6. When to Recognize Unfixable Outputs
- 7. Documenting Failures: Building a Personal Failure Taxonomy
- 8. Platform-Specific Failure Modes
- 9. Scenario: Alex — Diagnosing 3 Types of Bad Marketing Copy
- 10. Scenario: Raj — Diagnosing Bad Code
- 11. Scenario: Elena — When Authoritative-Sounding Output Is Wrong
- 12. Synthesis: Connecting Diagnosis Back to Chapters 7-12
- 13. Content Blocks
- Summary
Chapter 13: Diagnosing and Fixing Bad Outputs
The Right Way to Think About AI Failure
Every competent AI user gets bad outputs. Every single day, people using these tools get responses that are wrong, incomplete, off-tone, hallucinatory, confused about the task, or simply useless. This is not a sign that AI tools don't work. It is the normal operating condition of a powerful but imperfect technology.
What separates effective AI users from ineffective ones is not that they get fewer bad outputs. It's what they do when they get one.
Ineffective response to a bad output: frustration, retry the same prompt hoping for better luck, switch tools, or give up.
Effective response to a bad output: treat it as a diagnostic signal. Ask: what does this output tell me about what the model understood (or misunderstood) from my prompt? What was missing from my request that would have prevented this failure? What specific repair will fix the specific problem?
This reframe — bad output as information, not failure — is the foundation of everything in this chapter. Once you adopt it, the experience of getting a bad AI output changes from frustrating to systematic. You stop hoping the model will do better next time and start engineering the conditions for it to succeed.
This is the capstone chapter of Part 2. Everything from Chapters 7 through 12 — understanding what a prompt is, the role of context, instruction design, advanced techniques, prompt patterns, and multimodal inputs — converges here in the skill that matters most for daily AI use: knowing what to do when something goes wrong.
1. The Seven Root Causes of Bad Output
Before you can fix a problem, you need to know what caused it. AI output failures cluster into seven distinct root causes, each with specific diagnostic signals and specific fixes.
Root Cause 1: Insufficient Context
What it is: The model did not have enough information about your situation, organization, audience, or goals to produce a relevant, accurate output. It knew what to do but not what to optimize for.
Diagnostic signals:
- Output is generic — could apply to any company or situation
- Output uses reasonable approaches but not your specific approach
- Output sounds plausible but doesn't fit your actual context
- The model made assumptions about your audience, goals, or constraints that were wrong
Example output indicator: "You should consider increasing social media engagement by posting more consistently, engaging with comments, and running targeted ads."
This could describe any company's social media strategy. It tells you the model had no specific context about your company, brand, competitive position, or what you've already tried.
Root Cause 2: Vague Instruction
What it is: The instruction was too broad, too abstract, or underspecified for the model to know what you actually wanted.
Diagnostic signals:
- Output is broad when you needed specific
- Output covers too many things at insufficient depth
- Output interprets your request differently from what you intended
- Multiple valid interpretations exist and the model chose the wrong one
Example output indicator: Asked "analyze our marketing," the model produces a five-page document that is broad and shallow, covering everything from brand identity to ad spend to social media to customer segmentation — all at the same level of depth, none usefully deep.
The instruction "analyze our marketing" contained no specificity about what aspect of marketing, what time period, what criteria, what depth, or what output you needed.
Root Cause 3: Format Mismatch
What it is: The output has the right content but the wrong format — wrong structure, wrong length, wrong level of formality, wrong presentation style for the use case.
Diagnostic signals:
- Content is mostly right but unusable in its current form
- Output is much too long or much too short
- Output uses the wrong structural format (prose when you needed bullets, a table when you needed narrative)
- Output doesn't match the platform or medium it will be used in
Example output indicator: You asked for "talking points for a 5-minute presentation" and received a five-page essay that would take 20 minutes to read, beautifully structured as prose paragraphs.
The content may be excellent. The format is unusable for the stated purpose.
Root Cause 4: Wrong Capability
What it is: You asked the model for something it cannot do — either in general (a known limitation) or in this specific configuration (a limitation of the particular tool or version you're using).
Diagnostic signals:
- The model produces a generic answer when you needed specific, up-to-date information
- The model cannot access content it would need to answer (a website, a database, a real-time feed)
- The model refuses or hedges excessively on a task that should be straightforward
- The output clearly lacks information that would require specific access or capabilities
Example output indicator: Asking a non-web-enabled language model "what was the stock price of Apple on March 15, 2026?" produces a refusal ("I don't have access to real-time data") or a hallucination (a confidently stated but invented price).
This is not a prompting failure — it's a capability mismatch. No prompting improvement will fix it; you need a different tool or approach.
Root Cause 5: Hallucination
What it is: The model confidently states factual information that is incorrect, invented, or unverifiable — including made-up citations, non-existent products, incorrect statistics, or fabricated quotes.
Diagnostic signals:
- The output contains specific, verifiable claims that cannot be confirmed
- Statistics, citations, or references that sound authoritative but don't check out
- Specific named entities (people, companies, products, places) that behave suspiciously in context
- Information presented with high confidence on topics where you know the model may lack reliable training data
Example output indicator: Asked to cite research supporting a claim, the model produces "Smith et al. (2021), Journal of Applied Psychology, found that..."— but no such paper exists. Or it states a market size figure ("the global AI market is $127.4 billion") that doesn't match any credible source.
Hallucination is distinct from other failure modes because it requires special handling: not just a better prompt, but explicit verification of key claims.
Root Cause 6: Training Data Bias
What it is: The model's training data reflects particular biases — cultural, historical, demographic, or ideological — that produce outputs that are systematically skewed in ways that may not be immediately obvious.
Diagnostic signals:
- Recommendations that implicitly assume a particular demographic or cultural context
- Solutions that consistently favor established large-company approaches over small-business or emerging-market approaches
- Historical information that reflects dominant narratives while omitting alternatives
- Creative outputs that default to particular cultural representations
Example output indicator: Asked "what are the best employee benefits to offer?", the output focuses entirely on health insurance, 401k matching, and stock options — approaches suited to US-based, tech-industry, full-time employees. For a company with hourly workers in a different country, this is not only wrong but systematically wrong in a specific direction.
Root Cause 7: Context Window Overflow
What it is: The conversation or document has become too long for the model to maintain full, consistent attention on all of it. Earlier instructions, context, or constraints are effectively forgotten.
Diagnostic signals:
- The model seems to "forget" instructions you gave earlier in the conversation
- Quality of responses degrades over a long exchange
- The model contradicts something it said earlier without acknowledging it
- Output no longer matches the format or constraints specified in an early system prompt
Example output indicator: In a long editing session, you specified "always write in second person" at the start. By message 20, the model has drifted to third person without acknowledgment.
This is a structural limitation, not a prompting failure in the traditional sense. The fix is conversation management, not prompt revision.
2. The Diagnostic Framework: 5 Questions for Any Bad Output
When you receive output that isn't what you needed, run through these five questions before deciding how to respond:
Diagnostic Question 1: What specific failure is present?
Name the failure precisely. Not "this isn't right" but: "The format is wrong — I got prose when I needed bullets" or "The content is generic — no specific reference to our industry" or "This claim is incorrect — the date is wrong."
The more precisely you can name the failure, the more precisely you can fix it. Vague failure identification produces vague repair prompts.
Diagnostic Question 2: Which root cause(s) are responsible?
Match the specific failure to one or more root causes from the list above. Most failures have a primary root cause. Some have two.
| Observed Failure | Likely Root Cause |
|---|---|
| Output is generic | Insufficient context |
| Output is off-topic or addresses the wrong thing | Vague instruction |
| Content right, format wrong | Format mismatch |
| Model says it can't do something / uses outdated data | Wrong capability |
| Confident specific claim that's verifiably wrong | Hallucination |
| Systematically skewed toward one type of solution | Training data bias |
| Model seems to have forgotten earlier instructions | Context window overflow |
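As an illustration only, the table above can be sketched as a simple lookup. The failure labels and the `diagnose` helper here are hypothetical shorthand, not part of any tool:

```python
# Hypothetical lookup pairing observed failures with their likely root causes,
# mirroring the diagnostic table above.
ROOT_CAUSES = {
    "generic output": "insufficient context",
    "off-topic output": "vague instruction",
    "right content, wrong format": "format mismatch",
    "claims it can't / uses outdated data": "wrong capability",
    "confident but verifiably wrong claim": "hallucination",
    "systematically skewed recommendations": "training data bias",
    "forgotten earlier instructions": "context window overflow",
}

def diagnose(observed_failure: str) -> str:
    """Return the likely root cause, or a prompt to look more closely."""
    return ROOT_CAUSES.get(
        observed_failure,
        "unclear - re-read the output against the seven root causes",
    )

print(diagnose("generic output"))  # insufficient context
```

The point of the sketch: diagnosis is a lookup, not a guess — name the observed failure first, and the likely root cause follows.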
Diagnostic Question 3: What was missing or wrong in the prompt?
For each identified root cause, identify the specific missing or incorrect element:
- Insufficient context → What context should I have provided?
- Vague instruction → What dimension of specificity was missing? (Task? Format? Criteria? Depth?)
- Format mismatch → What format specification was absent or unclear?
- Wrong capability → Is there a tool that has this capability, or a different approach that doesn't require it?
- Hallucination → How can I instruct the model to flag uncertainty or stick to verifiable claims?
- Training data bias → What counter-instruction or example would rebalance the output?
- Context window overflow → What conversation management approach would maintain essential context?
Diagnostic Question 4: How far is the output from what I need?
Estimate the distance between the current output and a usable output on a 1-5 scale:
- 1 (Very close): Minor adjustments needed — one element is slightly off
- 2 (Close): Structural improvements needed — format or focus needs adjusting
- 3 (Moderate): Significant rewriting needed — right idea, wrong execution
- 4 (Far): Fundamental approach is wrong — needs to be rebuilt from different framing
- 5 (Very far): Wrong task interpretation or wrong capability — requires restart
Diagnostic Question 5: Repair or restart?
Based on questions 1-4, decide: should you repair the existing output (targeted fix) or restart with a fundamentally different prompt?
The Triage Matrix (detailed in the next section) provides the decision framework.
3. The Triage Matrix: Repair vs. Restart
The Triage Matrix is a 2x2 tool for deciding whether to repair or restart after a bad output:
| | Low distance from goal (content/format fixable) | High distance from goal (fundamentally wrong) |
|---|---|---|
| Low effort | Repair (quick fix) | Restart (the output can't be salvaged) |
| High effort | Repair (invest the time) | Consider restarting carefully |
Quadrant 1: Low effort, low distance → Repair immediately
The fix is small and obvious. Use a targeted repair prompt. Example: the response is slightly too long, or it used the wrong technical term throughout.

Quadrant 2: High effort, low distance → Repair (invest the time)
The output has the right structure and approach but needs substantial editing or additional content. A targeted repair may be more efficient than starting over, because you keep the good structure. Example: the analysis framework is right, but the content in each section needs more depth.

Quadrant 3: Low effort, high distance → Restart
The output is fundamentally wrong (wrong task interpretation, wrong capability, etc.) and a new prompt would be quick to construct. Don't invest time trying to salvage an output that's heading in the wrong direction. Example: the model wrote a blog post when you needed a client proposal — start over with a completely different prompt.

Quadrant 4: High effort, high distance → Consider carefully
The output is substantially wrong, and rebuilding would be significant work. Evaluate: what specifically would a restart accomplish that repair cannot? Sometimes in this quadrant, a significant repair prompt ("start over, keeping only X from your current response, and approach it this way") produces a better result than abandoning everything.
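One way to read the matrix as a decision rule — a minimal sketch, assuming the 1-5 distance score from Diagnostic Question 4 and a rough low/high effort estimate. The `triage` function and its return labels are illustrative, not canonical:

```python
def triage(distance: int, effort: str) -> str:
    """Map a distance score (1-5) and effort estimate ("low"/"high")
    to a Triage Matrix quadrant. Illustrative thresholds: a score of
    1-2 counts as "low distance", 3-5 as "high distance"."""
    close = distance <= 2
    if close and effort == "low":
        return "repair (quick fix)"
    if close and effort == "high":
        return "repair (invest the time)"
    if not close and effort == "low":
        return "restart (the output can't be salvaged)"
    return "consider carefully (significant repair vs. full restart)"

print(triage(1, "low"))   # repair (quick fix)
print(triage(5, "low"))   # restart (the output can't be salvaged)
```

The threshold between "close" and "far" (here, 2 vs. 3) is a judgment call; the value of the rule is that it forces you to score distance and effort before reflexively retrying.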
4. Fix Strategies by Root Cause
Fix Strategy 1: Addressing Insufficient Context
What to do: Add the missing context explicitly. Don't assume the model inferred it.
Repair prompt pattern:
Your response was too generic for our specific situation. Let me add context I should
have provided:
[SPECIFIC CONTEXT: industry, company size, existing constraints, what you've already tried,
specific audience characteristics, etc.]
With this context, please revise your response to address our specific situation.
Prevention for next time: Use the context-loading techniques from Chapter 8. Before any significant prompt, ask: "What does the model need to know about my situation that it couldn't assume?"
Fix Strategy 2: Addressing Vague Instruction
What to do: Replace the vague instruction with specific dimensions of what you want. Specify the scope, depth, format, and criteria explicitly.
Repair prompt pattern:
My original request was too broad. I need to be more specific:
What I actually need: [SPECIFIC TASK]
Scope: [SPECIFIC SCOPE — not "marketing" but "email subject line performance for Q3 campaigns"]
Depth: [SPECIFIC DEPTH — not "analysis" but "3 specific recommendations with implementation steps"]
Criteria: [SPECIFIC CRITERIA — what makes the output good for your use case]
Please respond to this more specific version of the request.
Prevention: Use the instruction design techniques from Chapter 9. Before submitting a prompt, ask: "Could this instruction be interpreted in more than one way? What's missing?"
Fix Strategy 3: Addressing Format Mismatch
What to do: Keep the content, change the format. This is the easiest repair.
Repair prompt pattern:
The content is good but the format isn't right for my use case. Please reformat this response:
Current format: [what it is]
Needed format: [what you need — bullet points, table, numbered list, specific length,
specific section structure, etc.]
Use case: [how this will be used — why the format matters]
Keep all the substantive content, just restructure it.
Prevention: Always specify format explicitly in your original prompt. Chapter 7's format specification guidance applies here.
Fix Strategy 4: Addressing Wrong Capability
What to do: If the model genuinely lacks the capability you need, don't try to prompt your way around a fundamental limitation. Find the right tool or reframe the request to work with what the model can do.
Option A — Tool change: Use a web-enabled model for current information. Use ChatGPT Advanced Data Analysis for computational work. Use a specialized tool for the specific capability you need.
Option B — Reframe the request:
I understand you don't have access to [REAL-TIME DATA / SPECIFIC INFORMATION].
Instead of [original request], please:
- Tell me what information I would need to gather to answer this question myself
- Explain what methodology I should use to analyze it
- Describe what to look for when I retrieve current information
This turns a capability limitation into a usable framework.
Fix Strategy 5: Addressing Hallucination
What to do: Distinguish between correcting a specific hallucination and building systemic hallucination resistance into your prompts.
Immediate repair:
That claim is incorrect. The actual answer is [X]. I know this because [brief evidence].
With this correction, please revise the relevant section of your response.
Also flag any other claims in your response that you are less than certain about.
Systemic prevention:
In your response, please:
- Flag any specific factual claim (statistics, dates, citations) that you are not
certain is accurate with [UNVERIFIED]
- Avoid specific numeric claims unless you are confident they are accurate
- If you cite a source, include enough detail (author, publication, year) for me to verify it
Prevention for high-stakes content: Use the self-critique factual audit from Chapter 10 (Elena's protocol). For research-dependent tasks, provide the sources yourself rather than asking the model to cite from memory.
Fix Strategy 6: Addressing Training Data Bias
What to do: Explicitly counter the bias by specifying the perspective or context you need.
Repair prompt pattern:
Your response assumes [IDENTIFIED ASSUMPTION — e.g., "a US-based corporate context"
or "a large company with significant resources"].
My actual context is [SPECIFIC CONTEXT]. Please reframe your recommendations for
this specific situation.
Prevention: When asking for recommendations or analysis, specify your context explicitly upfront. "For a 15-person startup in Southeast Asia with limited capital" is context that prevents bias-default recommendations.
Fix Strategy 7: Addressing Context Window Overflow
What to do: Reset the essential context without starting a new conversation.
Repair prompt pattern:
We've covered a lot of ground in this conversation. Let me restate the key parameters
that should guide the rest of this session:
1. Task: [core task reminder]
2. Format requirement: [key format specification]
3. Constraint: [most important constraint]
4. Current status: [where we are in the process]
With these parameters re-established, please continue with [NEXT STEP].
Prevention: For long conversations, periodically restate key constraints. For very long sessions, consider starting a fresh conversation and pasting in the essential context rather than continuing a degraded session.
5. Repair Prompt Templates
These are the core repair prompt patterns, ready to use with minimal adaptation:
Template 1: The Targeted Correction
Use when: A specific element is wrong but the rest is fine.
That's not quite right because [SPECIFIC REASON]. Specifically, I need [SPECIFIC CORRECTION].
Please revise just [THE SPECIFIC SECTION/ELEMENT] to address this.
Keep everything else as is.
Template 2: The Task Clarification
Use when: The model misunderstood what you were asking for.
You misunderstood the task. What I actually need is [CORRECT TASK DESCRIPTION].
What you produced was [DESCRIPTION OF WHAT YOU GOT].
What I need is [DESCRIPTION OF WHAT YOU ACTUALLY NEED — be specific about
the difference].
Please try again with this corrected understanding.
Template 3: The Format Fix
Use when: Content is acceptable but format is wrong.
Good structure but wrong format. Keep all the content and restructure it as:
[EXACT FORMAT SPECIFICATION]
The content should be preserved completely — I just need it formatted
differently for [USE CASE].
Template 4: The Context Reload
Use when: Output is generic due to missing context, or context has been lost in a long conversation.
Start over with more context:
[DETAILED CONTEXT: company, industry, audience, specific situation, constraints,
what you've already tried, what the output will be used for]
Now try again: [ORIGINAL TASK]
Template 5: The Factual Correction
Use when: A specific claim is wrong and you know the correct information.
That claim is incorrect. The actual answer is [X]. I know this because [brief evidence].
With this correction, please:
1. Revise the section containing the error
2. Review the rest of your response for similar claims I should verify
Template 6: The Depth Request
Use when: Output is too shallow — correct direction but insufficient detail.
The structure is right but this needs more depth. Specifically for [SECTION/ELEMENT]:
- What you gave me: [brief description of current content]
- What I need instead: [description of needed depth — specific examples, data, reasoning steps]
Please expand this section only. Keep the rest as is.
Template 7: The Full Restart
Use when: The output is fundamentally wrong and repair would be more work than restarting.
This response isn't working because [SPECIFIC REASON — wrong approach, wrong capability,
fundamentally misunderstood task].
Let me start over with a clearer request:
[NEW, IMPROVED PROMPT — apply lessons from the diagnostic framework]
6. When to Recognize Unfixable Outputs
Not every bad output is fixable with better prompting. Some outputs are unfixable for the current tool, task, or context:
Unfixable: Genuine capability gap. If the model doesn't have access to the information or capability you need, no prompt improvement will help. Use a different tool.
Unfixable: Wrong tool for the task. Asking a general language model to produce legally binding documents, make medical diagnoses, or run financial calculations is misapplied capability regardless of prompt quality.
Unfixable: The task requires knowledge the model doesn't have. Asking for analysis of your unpublished internal data, your proprietary code, or events that occurred after the model's training cutoff will not produce accurate results from prompting alone.
Unfixable: Task requires specialized accuracy the model cannot provide. For tasks where small errors have large consequences (legal, medical, financial, safety-critical), AI-generated outputs may always require human expert review regardless of prompt quality. This is not fixable — it is appropriate acknowledgment of the tool's limits.
Recognizing the signal: If you have run two or three repair iterations and the fundamental problem persists despite good diagnosis and targeted repair prompts, you are likely hitting a genuine capability or knowledge limit. Stop trying to fix and change your approach.
7. Documenting Failures: Building a Personal Failure Taxonomy
The most effective AI users treat failures as learning opportunities that accumulate in a documented form. A personal failure taxonomy is a record of:
- The task type
- What went wrong
- Which root cause was responsible
- What repair prompt fixed it
- What prevention to apply next time
This takes two to three minutes to log after a significant failure. Over months of use, it produces a personalized reference guide to your own failure patterns.
Why bother? Because most users have consistent failure patterns across their specific task types. If your AI-generated reports always fail due to insufficient context about your company structure, that pattern will recur until you build context-loading into your standard approach. The taxonomy makes the pattern visible.
Simple documentation format:
DATE: [date]
TASK: [brief description]
FAILURE: [what went wrong]
ROOT CAUSE: [from the 7 root causes list]
REPAIR: [what you did to fix it]
PREVENTION: [what to add to future prompts for this task type]
After collecting 20-30 entries, group them by task type and root cause. The patterns will tell you exactly where to invest in prompt improvement for your specific work.
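The grouping step can be sketched in a few lines — the log entries here are invented examples following the documentation format above, not real data:

```python
from collections import Counter

# Hypothetical failure-log entries using the fields from the format above.
failure_log = [
    {"task": "weekly report", "root_cause": "insufficient context"},
    {"task": "weekly report", "root_cause": "insufficient context"},
    {"task": "launch email", "root_cause": "format mismatch"},
]

# Group by root cause to surface the recurring pattern.
pattern = Counter(entry["root_cause"] for entry in failure_log)
print(pattern.most_common(1))  # [('insufficient context', 2)]
```

Even a three-entry log makes the point: the most common root cause tells you which preventive skill (context loading, here) to build into your standard prompts for that task type.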
8. Platform-Specific Failure Modes
Different AI platforms have characteristic failure modes worth knowing:
ChatGPT (GPT-4o):
- Occasionally over-eager to help — may fabricate specifics when it doesn't know
- Can be verbose without explicit length constraints
- Very strong instruction following, but sometimes over-literal interpretation

Claude:
- More conservative about factual claims — may refuse or hedge more than necessary
- Strong at nuanced instruction following and maintaining specified constraints
- May occasionally be overly cautious about requests it interprets as potentially harmful

Gemini:
- Strong at current information with web access
- Can struggle with consistency in very long structured outputs
- Generally reliable for Google Workspace integration tasks

General patterns across all platforms:
- All models hallucinate more on obscure topics, recent events, and specific numerical data
- All models produce better output with explicit format specification than without
- All models improve significantly with context — the "blank slate" problem affects every platform
9. Scenario: Alex — Diagnosing 3 Types of Bad Marketing Copy
Alex asked for a product launch email for a new Brightleaf kitchen product. She received an output that failed in three distinct ways. Here's how she diagnosed each one:
Failure 1: Generic brand voice
The email sounded like a placeholder template — "energetic, clean design" and "sustainable materials" language that could come from any lifestyle brand.
Diagnosis: Insufficient context (Root Cause 1) — she had forgotten to include the few-shot brand voice examples from her pattern library.
Repair: Template 4 (Context Reload) — reload the brand voice examples and re-request.
Failure 2: Wrong email structure
The email was structured as a long narrative, when her email campaigns always follow hook → single key benefit → proof point → CTA, in no more than 3 short paragraphs.
Diagnosis: Format mismatch (Root Cause 3) — she hadn't specified her email structure.
Repair: Template 3 (Format Fix) — keep the content, restructure to the correct format.
Failure 3: Incorrect claim about the product
The email claimed the product was "dishwasher safe," which it is not.
Diagnosis: Hallucination (Root Cause 5) — the model inferred this common product attribute without having it confirmed.
Repair: Template 5 (Factual Correction) — correct the claim, then review the rest of the email for similar unsupported inferences.
She ran all three repairs in sequence — context reload first, then format fix, then factual correction on the final version. Total repair time: 8 minutes. Better than the 35-minute manual rewrite she would have done before learning the diagnostic framework.
10. Scenario: Raj — Diagnosing Bad Code
In a typical week, Raj encounters three types of AI-generated code failures, each requiring different diagnosis:
Type A: Wrong approach
The AI generates code that solves a slightly different problem than the one he described, or uses an approach that's valid in general but wrong for his architecture.
Diagnosis: Vague instruction (Root Cause 2). The description of what he needed was underspecified — it said what, but not how-within-our-system.
Repair: Template 2 (Task Clarification) — specify the correct approach and the architectural constraints the solution must respect.
Prevention: Add architecture context to code prompts: "In our codebase, we use [specific pattern/library/constraint]. Solutions must be compatible with this."
Type B: Subtle logic bug
The code looks correct, compiles, and runs — but has an off-by-one error, an edge case that fails silently, or an inverted logic condition.
Diagnosis: Not exactly a root cause from the main list — a precision failure related to the statistical nature of model generation. The code is mostly correct because the overall pattern is common, but the specific detail is wrong.
Repair: Template 1 (Targeted Correction), combined with the chain-of-thought debugging from Chapter 10: "This function has a logic error at line [X]: the condition should be > not >=. Can you explain why you chose >= and review whether the same issue appears elsewhere in the function?"
Prevention: Add an explicit code review request to generation prompts: "After generating this function, review it for: off-by-one errors, edge cases with empty input, and logical condition inversions."
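The > vs. >= failure above can be made concrete. This is a hypothetical eligibility check, assuming the business rule is "strictly above the threshold":

```python
def is_discount_eligible(order_total: float, threshold: float = 100.0) -> bool:
    """Eligible when the order total strictly exceeds the threshold."""
    # Subtle bug a model might generate: `>=` instead of `>`, silently
    # including boundary-value orders that the rule should exclude.
    return order_total > threshold  # corrected: strict comparison

print(is_discount_eligible(100.01))  # True
print(is_discount_eligible(100.0))   # False — the >= version gets this wrong
```

Note what makes this failure dangerous: both versions compile, run, and pass every test that doesn't hit the exact boundary. That's why the prevention step asks the model to review its own output for condition inversions rather than trusting a clean run.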
Type C: Security issue
The code does what was asked, but insecurely — missing input validation, an outdated cryptographic approach, credentials stored in memory.
Diagnosis: Training data bias (Root Cause 6). Common code patterns in training data often carry security issues, and the model reproduces the common pattern without security improvements.
Repair: Template 5 (Factual Correction) for specific issues: "This code doesn't validate [INPUT] before [OPERATION]. This is a security risk because [REASON]. Please add the appropriate validation."
Prevention: Add a security review to the generation prompt: "After generating this code, review it as a security-focused engineer. Identify any: input validation gaps, credential handling issues, dependency vulnerabilities, or injection risks."
11. Scenario: Elena — When Authoritative-Sounding Output Is Wrong
Elena received an AI-generated section of a market analysis that contained the following sentence: "According to a 2023 McKinsey Global Institute study, companies that implement AI-driven supply chain optimization see an average ROI of 340% within 18 months."
This is the worst kind of bad output: confidently stated, specifically attributed, exactly the type of data that makes a consulting report credible — and entirely fabricated. No such study exists. The "340%" figure was invented.
Diagnosis: Hallucination (Root Cause 5). Classic pattern: specific attribution + specific quantitative claim + plausible source name = high confidence presentation of invented information.
What made this dangerous: The output didn't feel wrong. The writing style was professional, the claim was plausible, and the attribution sounded real. Without verification, it would have appeared in a client deliverable.
Elena's verification protocol for this type of content:
1. Any statistic that cites a specific source: verify the source exists and that the study contains the stated finding
2. Any ROI claim: cross-reference with at least one independent source
3. Any "companies that..." generalization: check whether any actual named companies confirm it
The repair: Template 5 (Factual Correction) + her Chapter 10 factual audit protocol. She flags the hallucinated claim, removes it, and instructs the model to replace it with either a verified claim or a framing that acknowledges the claim without specific attribution ("organizations implementing supply chain AI report significant ROI improvements, with specific results varying substantially by context and implementation quality").
The lesson Elena took from this: "The professional tone is the tell. When AI-generated content sounds exactly like a well-sourced consulting report, that's exactly when to be most skeptical about the sourcing. The more authoritative the tone, the more carefully I verify the underlying claims."
12. Synthesis: Connecting Diagnosis Back to Chapters 7-12
Chapter 13 is not just about fixing bad outputs — it's about understanding why the earlier chapters matter.
Each root cause in the diagnostic framework maps directly to a skill developed in an earlier chapter:
| Root Cause | Preventive Skill | Chapter |
|---|---|---|
| Insufficient context | Context-loading techniques | Chapter 8 |
| Vague instruction | Instruction design | Chapter 9 |
| Format mismatch | Output specification | Chapter 7 |
| Wrong capability | Platform literacy and tool selection | Chapter 6 |
| Hallucination | Self-critique and factual audit | Chapter 10 |
| Training data bias | Role assignment and explicit context | Chapters 8 and 9 |
| Context window overflow | Conversation management, pattern design | Chapter 11 |
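The mapping above can be sketched as a simple lookup, so that naming a root cause during diagnosis points directly at the skill and chapter to revisit. This is an illustrative encoding; the dictionary and function names are assumptions for this sketch, not terminology from the book:

```python
# Illustrative encoding of the root-cause-to-chapter mapping.
PREVENTIVE_SKILL = {
    "insufficient context": ("context-loading techniques", "Chapter 8"),
    "vague instruction": ("instruction design", "Chapter 9"),
    "format mismatch": ("output specification", "Chapter 7"),
    "wrong capability": ("platform literacy and tool selection", "Chapter 6"),
    "hallucination": ("self-critique and factual audit", "Chapter 10"),
    "training data bias": ("role assignment and explicit context", "Chapters 8 and 9"),
    "context window overflow": ("conversation management, pattern design", "Chapter 11"),
}

def skill_to_revisit(root_cause: str) -> str:
    """Map a diagnosed root cause to the preventive skill and chapter."""
    skill, chapter = PREVENTIVE_SKILL[root_cause.lower()]
    return f"{skill} ({chapter})"

print(skill_to_revisit("Hallucination"))
# → self-critique and factual audit (Chapter 10)
```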
When you apply the diagnostic framework to a bad output and identify the root cause, you are identifying which earlier skill was not applied — or was applied insufficiently. The diagnostic framework is both a repair tool and a feedback loop that reinforces the earlier skills.
13. Content Blocks
💡 Intuition: Bad Output as a Diagnostic Signal
Every bad output is answering the question: "What did the model understand from my prompt?" A generic output tells you it lacked specific context. An off-task output tells you the instruction was ambiguous. A hallucinated output tells you it was generating plausibly patterned content in the absence of reliable knowledge. Read the output as a diagnostic signal before you decide how to respond.
⚠️ Common Pitfall: The Retry Loop
The most common ineffective response to a bad output is retrying the same prompt — sometimes multiple times — hoping for a better result. This works occasionally (if the failure was due to random generation variance). But for systematic failures caused by insufficient context, vague instruction, format mismatch, or capability limits, the same prompt will produce the same type of failure. Diagnose first, then repair.
✅ Best Practice: Name the Failure Before Repairing It
Before writing a repair prompt, write one sentence describing exactly what is wrong: "The output is generic — no reference to our industry or specific situation." "The format is wrong — prose when I need a bulleted list." "This claim is incorrect — the date should be 2024, not 2023." Naming the failure precisely produces a more targeted repair prompt. The more specifically you can describe the problem, the more specifically the model can fix it.
⚖️ Myth vs. Reality: "AI Just Makes Things Up"
Myth: AI hallucination is random and unpredictable — you never know when it will invent something. Reality: Hallucination has consistent patterns. It is most common for specific numerical claims, citation attribution, events near or after the training cutoff, and obscure topics with limited training data. It is least common for well-established, frequently documented information. Knowing the hallucination risk profile lets you apply verification effort where it matters most, not uniformly.
⚖️ Myth vs. Reality: "If I Didn't Get It Right, AI Can't Do This Task"
Myth: If a prompt didn't produce good output, AI can't perform this type of task. Reality: Most "AI can't do this" conclusions actually reflect a prompt failure, not a capability failure. The same model that produced a useless output on a vague prompt often produces excellent output on a specific, well-constructed prompt for the same task. Applying the diagnostic framework before concluding capability failure usually reveals a solvable prompt problem.
🎭 Scenario Walkthrough: Three-Minute Diagnosis
You receive an AI output that is wrong. Three minutes:
- Minute 1: Read the output carefully. Identify the specific failure. Write one sentence describing what is wrong.
- Minute 2: Match the failure to a root cause from the seven-cause framework. Identify what was missing from your original prompt that caused this.
- Minute 3: Choose a repair template from the five options. Write the repair prompt.
Running this three-minute process before typing anything produces better repairs than immediately re-prompting.
📋 Action Checklist: Diagnostic Framework Application
- [ ] When you get a bad output, resist the urge to re-prompt immediately
- [ ] Name the specific failure in one sentence
- [ ] Identify the root cause from the seven-cause framework
- [ ] Apply the Triage Matrix: repair or restart?
- [ ] Choose the matching repair template
- [ ] After the repair, note what would have prevented this in the original prompt
- [ ] Document significant failures in your personal failure taxonomy
📊 Research Breakdown: How Common Are the Root Causes?
Analysis of AI prompt failures across professional users suggests the following approximate distribution of root causes:
- Insufficient context: ~35% of failures
- Vague instruction: ~25% of failures
- Format mismatch: ~20% of failures
- Hallucination: ~10% of failures
- Wrong capability: ~5% of failures
- Training data bias: ~3% of failures
- Context window overflow: ~2% of failures
The dominant causes — insufficient context and vague instruction — account for roughly 60% of failures and are entirely preventable with the skills from Chapters 7-9. This is the empirical case for investing in prompting fundamentals before advanced techniques.
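As a quick sanity check, the distribution above can be encoded and summed. This is a minimal sketch using the chapter's approximate percentages, which are rough estimates, not precise measurements:

```python
# The chapter's approximate failure distribution (rough estimates).
failure_share = {
    "insufficient context": 35,
    "vague instruction": 25,
    "format mismatch": 20,
    "hallucination": 10,
    "wrong capability": 5,
    "training data bias": 3,
    "context window overflow": 2,
}

# The shares cover the full distribution.
assert sum(failure_share.values()) == 100

# The two dominant causes are exactly the ones prompting fundamentals prevent.
preventable = failure_share["insufficient context"] + failure_share["vague instruction"]
print(f"Preventable with prompting fundamentals: {preventable}%")
```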
Summary
Bad AI output is not failure — it is information. The diagnostic framework in this chapter gives you a systematic tool for reading that information: identifying which of the seven root causes is responsible, assessing how far the output is from what you need, and choosing the repair approach that directly addresses the cause.
The repair prompt templates are practical tools that can be applied immediately. The Triage Matrix helps you decide when to invest in repair and when to restart. The failure taxonomy builds the long-term learning that makes your prompting progressively better.
This chapter closes Part 2, which has taken you from the fundamentals of what a prompt is (Chapter 7) through context design (Chapter 8), instruction mechanics (Chapter 9), advanced techniques (Chapter 10), reusable patterns (Chapter 11), multimodal inputs (Chapter 12), and now the diagnostic and repair skills that complete the cycle.
The most effective AI users are not the ones who get the best outputs on the first attempt. They are the ones who have the skills to understand what went wrong, fix it efficiently, and build that learning into how they prompt next time. That is the skill this chapter provides.
Part 3 extends the view: from individual prompting skill to the broader questions of how AI tools fit into professional workflows, teams, and organizations.