Chapter 29 Exercises: Hallucinations, Errors, and How to Catch Them

Instructions

These exercises are designed to build your practical hallucination detection skills. Some exercises ask you to reflect on your own practices; others provide concrete AI outputs for analysis. Complete them in order — they build on each other.


Part 1: Spot the Hallucination (Provided Examples)

The following AI-generated responses contain hallucinations or errors of varying types. For each one, identify: (a) what type of error is present, (b) what signals should have flagged it as suspicious, and (c) how you would verify whether it is accurate.

Exercise 1.1

An AI responds to the question "Who founded Patagonia and when?":

"Patagonia was founded in 1975 by Yvon Chouinard and his business partner Doug Tompkins. Chouinard originally ran a small mountaineering equipment company called Chouinard Equipment, which he converted into Patagonia when he realized the apparel market offered better growth opportunities."

Identify the error (there is at least one significant factual problem here), the type of error, the detection signals, and the verification approach.

Exercise 1.2

An AI responds to the question "What does GDPR say about data retention?":

"Under GDPR Article 17a, organizations must delete personal data no later than 36 months after the purpose of collection has been fulfilled, unless a legitimate business interest exception under Article 6(1)(f) applies. Fines for non-compliance with retention requirements are set at up to 2% of global annual turnover."

What should make you suspicious here? What would you verify and how?

Exercise 1.3

An AI responds to a request for a citation supporting the claim that remote work increases productivity:

"Bloom, N., Liang, J., Roberts, J., & Ying, Z. J. (2015). Does working from home work? Evidence from a Chinese experiment. The Quarterly Journal of Economics, 130(1), 165-218. https://doi.org/10.1093/qje/qju032"

This citation appears completely professional and specific. What is your verification procedure? What are you checking for?

Exercise 1.4

An AI responds to "What was the outcome of the 2024 EU AI Act vote?":

"The EU AI Act passed its final trilogue vote in December 2023 and received formal approval from the European Parliament in March 2024 with 523 votes in favor and 46 against, with 49 abstentions. It entered into force in August 2024 and will be fully applicable by 2026 with a phased implementation timeline."

What is the primary risk category here? What should you check?

Exercise 1.5

An AI responds to a question about a medication's dosing:

"Standard adult dosing for metformin for Type 2 diabetes management begins at 500mg twice daily with meals, titrated upward to a maximum of 2,550mg per day in divided doses. Dose adjustments are required for eGFR below 45 mL/min/1.73m², and metformin is contraindicated when eGFR falls below 30."

How should you approach this type of output? What does the specificity here tell you about risk level?


Part 2: Classifying AI Errors

Exercise 2.1: Error Spectrum Classification

For each scenario below, identify which error type from the spectrum applies (pure hallucination, confident error, plausible fabrication, outdated information, context collapse, subtle distortion):

a) AI states that a Python library's requests.get() function requires an auth_token parameter, when that was deprecated two versions ago.

b) AI generates a citation for a journal article by a real author, with a plausible title in their field, that does not actually exist.

c) AI correctly describes a medication's mechanism of action but gives the wrong half-life.

d) AI provides an accurate summary of a policy document except that it consistently refers to "mandatory compliance" where the document says "recommended best practice."

e) AI accurately describes trends in renewable energy but states that solar capacity surpassed wind globally in 2019, when the crossover actually occurred in 2022.

Exercise 2.2: Risk Domain Sorting

Sort the following tasks into HIGH hallucination risk and LOWER hallucination risk:

a) Asking AI to generate five brainstormed names for a new product

b) Asking AI for the current federal minimum wage

c) Asking AI to summarize a 2,000-word article you've pasted into the prompt

d) Asking AI for three possible frameworks for structuring a presentation

e) Asking AI for the correct citation for a 2019 behavioral economics paper

f) Asking AI what the HIPAA penalty tiers are for data breach violations

g) Asking AI to help you rewrite a paragraph for clarity

h) Asking AI whether a specific chemical compound is included on the EU REACH restricted substances list


Part 3: Building Detection Skills

Exercise 3.1: The "Too Specific" Signal Practice

Review the following AI response and mark every element that is suspiciously specific and therefore warrants verification:

"According to McKinsey's 2023 Global State of AI report, organizations that have reached AI 'maturity' — defined as those generating over $100M in annual revenue from AI deployments — represent 12% of the survey respondents (up from 7% in 2021) and are 4.3 times more likely to have established a formal AI governance framework. The report surveyed 2,318 executives across 22 industries and 64 countries."

List every specific claim you would check.
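A small script can serve as a first pass for this kind of review. The sketch below is a minimal illustration, not a production tool; the regex patterns are one assumption about what "suspiciously specific" looks like. It pulls out dollar amounts, percentages, multipliers, years, and large counts as candidates for verification:

```python
import re

# Patterns for the kinds of overly precise figures worth verifying.
# This is a starter set, not an exhaustive taxonomy.
SPECIFICITY_PATTERNS = [
    r"\$\d[\d,.]*[MBK]?",                # dollar amounts like $100M
    r"\b\d+(?:\.\d+)?%",                 # percentages like 12%
    r"\b\d+(?:\.\d+)?\s*(?:times|x)\b",  # multipliers like 4.3 times
    r"\b(?:19|20)\d{2}\b",               # years like 2021
    r"\b\d{1,3}(?:,\d{3})+\b",           # large counts like 2,318
]

def flag_specifics(text: str) -> list[str]:
    """Return every suspiciously specific figure found in the text."""
    hits = []
    for pattern in SPECIFICITY_PATTERNS:
        hits.extend(re.findall(pattern, text))
    return hits

sample = ("organizations generating over $100M in annual revenue represent "
          "12% of respondents (up from 7% in 2021) and are 4.3 times more "
          "likely to have a governance framework. The report surveyed "
          "2,318 executives.")
print(flag_specifics(sample))
```

Run against the McKinsey-style passage above, this surfaces the same figures you should be flagging by hand. The script only identifies candidates; every extracted item still needs a source check.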

Exercise 3.2: The Challenge Technique Practice

Using the challenge technique, write three follow-up questions you could ask an AI model that has just given you a statistic attributed to a named source. The questions should be designed to surface uncertainty or elicit a correction without being adversarial.

Exercise 3.3: Designing Your Personal Verification Protocol

Based on the framework in Section 6, write your own personal hallucination detection protocol for your specific professional domain. Include:

- The three most common types of claims you encounter in AI output in your work
- Your primary source for verifying each type
- Your threshold for "must verify" vs. "spot-check" vs. "low-risk, proceed"


Part 4: Applied Verification

Exercise 4.1: Citation Audit

Use an AI tool to generate a list of five academic citations relevant to a topic in your field. Then verify each one using:

- Google Scholar
- DOI resolution (doi.org)
- PubMed (if applicable)
- The publisher's website

Record: how many were real, how many were completely fabricated, how many had real authors but wrong details, how many had correct titles but wrong publication years or journals.
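Before resolving each DOI in a browser, a quick format sanity check can catch obviously malformed identifiers. The sketch below is a minimal illustration; the regex is a loose approximation of DOI syntax, not the authoritative grammar. A well-formed DOI can still be fabricated, so resolving the doi.org URL and comparing the landing page against the claimed title and journal remains the real verification step:

```python
import re

# Loose sanity check: modern DOIs start with "10.", then a 4-9 digit
# registrant code, a slash, and a suffix. Passing this check proves
# nothing about whether the DOI actually resolves.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")

def doi_resolution_url(doi: str) -> str:
    """Validate a DOI's basic format and return the doi.org URL to resolve it."""
    if not DOI_RE.match(doi):
        raise ValueError(f"Not a well-formed DOI: {doi!r}")
    return f"https://doi.org/{doi}"

# The Bloom et al. (2015) citation from Exercise 1.3:
print(doi_resolution_url("10.1093/qje/qju032"))
```

Open the printed URL and check that the title, authors, journal, and abstract match what the model claimed, not merely that the page exists.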

Exercise 4.2: The Statistic Trail

Ask an AI model for three statistics related to a topic you're working on (e.g., "productivity impacts of open office plans," "e-commerce growth trends," "attrition rates in tech companies"). For each statistic, attempt to trace it back to its original source. What proportion can you verify? What proportion have no traceable origin?

Exercise 4.3: Current Events Verification

Ask an AI model about something that happened in the past 6 months in your industry or professional field. Cross-check every specific claim the model makes against current news sources. How many details are accurate? How many are outdated, wrong, or extrapolated?


Part 5: Confidence Calibration

Exercise 5.1: Confidence Mismatch Identification

Review the following AI response and identify every point where the expressed confidence outstrips what the available evidence could warrant:

"The research on this is quite clear: organizations that implement AI tools see a productivity improvement of between 22% and 31% within the first year of deployment. This is consistent across industries and firm sizes, with the effects being particularly pronounced in knowledge work roles where repetitive cognitive tasks can be substantially automated. The ROI on AI tool investment typically reaches breakeven within 7-9 months."

What is the model communicating through its language? What claims would need verification before you could use these figures professionally?

Exercise 5.2: Rewriting for Honest Confidence

Rewrite the paragraph from Exercise 5.1 to reflect appropriate epistemic humility — while still communicating useful information. The rewritten version should be accurate to what can actually be claimed without fabricated specificity.

Exercise 5.3: Building Intuition for High-Confidence Flags

Write your own list of five "high-confidence language patterns" that you'll watch for in AI output — phrases or constructions that signal the model is presenting something as certain that you should treat as unverified. Examples: "research clearly shows," "studies have found," "according to [specific organization]."
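Once you have your list, you can turn it into a simple scanner. The sketch below uses a starter set of patterns, assumptions drawn from the examples above and meant to be extended, to flag high-confidence phrasing in a draft:

```python
import re

# Starter list of high-confidence language patterns; extend with your own.
# These phrases signal certainty the model may not actually have.
CONFIDENCE_FLAGS = [
    r"research (?:clearly |consistently )?shows",
    r"studies have (?:found|shown)",
    r"according to \S+(?: \S+){0,2}",          # attribution to a named source
    r"the (?:research|evidence|data) (?:is|are) clear",
    r"it is well[- ]established",
]

def flag_confident_language(text: str) -> list[str]:
    """Return each high-confidence phrase found in the text, case preserved."""
    hits = []
    for pattern in CONFIDENCE_FLAGS:
        hits.extend(m.group(0) for m in re.finditer(pattern, text, re.IGNORECASE))
    return hits

sample = ("Research clearly shows remote work helps, and studies have found "
          "a 22% gain.")
print(flag_confident_language(sample))
```

As with the other sketches, a flagged phrase is not proof of a hallucination; it is a prompt to ask where the claim actually comes from before you rely on it.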


Part 6: Reflection Exercises

Exercise 6.1: Your Hallucination History

Think back over your AI use. Has AI output you've trusted ever turned out to be wrong? What was the context? What detection practice, had you applied it, would have caught the error? What did you learn?

Exercise 6.2: Stakes Analysis

For the AI use you do in your professional life, create a stakes matrix: what types of AI output go directly into high-stakes deliverables, what goes into medium-stakes work, and what is low-stakes experimentation? How does this analysis change your verification practice?

Exercise 6.3: The Informed Confidence Statement

Write a one-paragraph "informed confidence statement" — a description of where you extend trust to AI tools, where you verify, and why. This should be something you could share with a colleague or supervisor to explain your AI verification practices.


Answer Guidance

Exercise 1.1: Doug Tompkins co-founded Esprit and The North Face, not Patagonia. Patagonia was founded by Yvon Chouinard (together with his wife, Malinda Chouinard), and in 1973 rather than 1975. This is a confident error: the surrounding context is accurate (Chouinard Equipment did evolve into Patagonia, and the founding period is roughly right), but the named co-founder is fabricated from plausible adjacency.

Exercise 1.3: This citation is real and verifiable — it is the famous Nicholas Bloom Stanford/Ctrip study. However, the verification process (check DOI, check journal, check that the abstract matches how the model characterizes the study) is the same regardless. The point of this exercise is that verification looks the same whether the citation is real or fabricated.

Exercise 1.5: Medical dosing information requires direct verification against current prescribing information (FDA label, manufacturer PI, clinical pharmacist consultation) regardless of how accurate it sounds. This is a safety-critical domain where the stakes of error are highest.