Chapter 4 Exercises: Practicing Trust Calibration
These exercises are designed to build practical, hands-on skill in calibrating trust for AI outputs. Some exercises require you to use an AI tool; others are reflection and analysis exercises. Work through them in order for the best progression.
Part A: Foundational Calibration (Exercises 1-5)
Exercise 1: Zone Classification Practice
For each of the following tasks, identify which Trust Zone (1-5) it belongs to and write a two-sentence justification for your answer.
- Asking AI to rewrite a paragraph in a more formal tone
- Asking AI for the current unemployment rate in your country
- Asking AI to draft 10 email subject lines for a promotional campaign
- Asking AI to explain what a derivative is in finance
- Asking AI to generate a Python function that validates email addresses
- Asking AI whether a specific medication interacts with another medication
- Asking AI to cite three academic papers on machine learning interpretability
- Asking AI to summarize a 10-page document you have pasted into the chat
- Asking AI to draft a non-disclosure agreement for a business partnership
- Asking AI to brainstorm five angles for a blog post on remote work productivity
Reflection: Were there any tasks where you were unsure of the zone classification? What made them ambiguous?
Exercise 2: Controlled Hallucination Hunt
Setup: You will deliberately ask an AI tool to generate information in a Zone 3 area and then verify every factual claim.
1. Ask an AI tool: "Tell me about three significant studies on AI's impact on workplace productivity from the last five years. Include the authors, institution, publication year, and key findings."
2. Attempt to verify every specific claim: author names, institutional affiliations, publication titles, findings, and dates.
3. Document each claim as: Verified Correct / Verified Incorrect / Unverifiable.
4. Write a one-paragraph summary of what you found.
Learning objective: Direct, personal experience with citation hallucination is more memorable than reading about it. This exercise creates that experience.
Exercise 3: The Confidence Decoupling Exercise
Goal: Practice noticing confident tone separately from evaluating accuracy.
1. Ask an AI tool a question you already know the answer to in your professional domain. Ask it in a way that would normally elicit a confident, authoritative response.
2. Before checking the answer, rate your sense of confidence in the AI's response based purely on how it sounds (1-5 scale).
3. Now evaluate the actual accuracy of the response based on your expertise.
4. Note the gap between "how confident the AI sounded" and "how accurate it was."
5. Repeat with five different questions, varying the difficulty.
Questions to reflect on:
- Did tone correlate with accuracy?
- Were there cases where the AI sounded uncertain but was accurate?
- Were there cases where the AI sounded extremely confident but was wrong or imprecise?
Exercise 4: Personal Reliability Mapping
Create a personal reliability map for your professional domain using the following template. You will fill in cells based on your actual experience and knowledge.
Template structure:
| Task Type | Domain | Expected Zone | Verification Method |
|---|---|---|---|
Fill in at least 15 rows covering task types and domains most relevant to your actual work. Be specific — not "code generation" but "Python REST API boilerplate" or "SQL query optimization for large datasets."
Share this with a colleague and compare your maps. Where do your reliability assessments diverge? Discuss why.
Exercise 5: Trust Zone Reclassification
Go back to a recent piece of work where you used AI assistance. List every AI-assisted component. For each one:
- Identify which zone you implicitly treated it as when you used it
- Identify which zone it should have been classified as given what you now know
- Note whether you over-trusted, under-trusted, or correctly calibrated
- Identify any errors that slipped through, and any over-verification that cost unnecessary time
Write a short (half-page) summary of what you would do differently.
Part B: Verification Practice (Exercises 6-10)
Exercise 6: The Citation Verification Sprint
1. Ask an AI tool: "Give me five key statistics about social media marketing effectiveness, with sources."
2. For each statistic, spend no more than 5 minutes attempting to verify it.
3. Record: source exists (yes/no), statistic is accurate (yes/no/not found), source is current (yes/no).
4. Calculate your "accuracy rate" for the five statistics.
5. If you have access to a different AI tool, run the same query there and compare results.
Discussion question: Given your results, what verification effort would you apply to AI-generated statistics in your work going forward?
Exercise 7: The Reverse Lookup Drill
Setup: AI frequently generates plausible-sounding but nonexistent references.
1. Ask an AI tool to recommend five books about a professional topic in your field, with brief descriptions of what each book covers.
2. Use a library catalog, book retailer, or Google to verify: Does each book exist? Is the author correct? Does the description match the actual book?
3. Note any discrepancies.
4. Now ask for five more books, but this time prompt: "Only recommend books that you are certain exist. Do not recommend anything you are uncertain about."
5. Repeat the verification. Did the prompt change the accuracy rate?
Exercise 8: The Numerical Reasoning Check
Language models are unreliable for multi-step numerical reasoning. Test this directly.
1. Create three word problems involving percentage calculations and multi-step arithmetic. Make them realistic to your work (e.g., calculating a marketing ROI, a software project cost estimate, a compound growth figure).
2. Ask an AI tool to solve each one, showing its work.
3. Solve each one yourself or with a calculator.
4. Compare results. Note which steps, if any, the AI got wrong.
Extension: Ask the AI to double-check its work. Does it catch its own errors?
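For step 3, a few lines of code are often faster and more reliable than a calculator for multi-step arithmetic. The sketch below uses hypothetical figures; substitute the numbers from your own word problems.

```python
# Checking an AI's multi-step arithmetic yourself.
# All figures below are hypothetical placeholders.

campaign_cost = 12_000          # total campaign spend
revenue_attributed = 18_600     # revenue credited to the campaign

# ROI as a percentage: (gain - cost) / cost * 100
roi_pct = (revenue_attributed - campaign_cost) / campaign_cost * 100

# Compound growth: 4% monthly growth over 6 months on a 10,000 base
base = 10_000
grown = base * (1 + 0.04) ** 6

print(f"ROI: {roi_pct:.1f}%")          # 55.0%
print(f"After 6 months: {grown:.2f}")  # 12653.19
```

Comparing output like this against the AI's "shown work" makes it easy to spot exactly which intermediate step went wrong.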
Exercise 9: Domain Expertise as Verification Filter
This exercise tests whether your professional expertise acts as an adequate verification filter for Zone 2 outputs.
1. Select a topic you know extremely well professionally (e.g., your core technical specialty, your primary business domain).
2. Ask an AI tool to explain a nuanced concept in that domain: one where a non-expert might not notice an error but you would.
3. Read the output carefully and flag every claim as: Fully accurate / Partially accurate / Inaccurate / Oversimplified to the point of being misleading.
4. Write a corrected version of any inaccurate or misleading sections.
5. Estimate: for this type of content, what percentage of errors would a non-expert catch? What percentage would you catch?
Key insight: This reveals whether Zone 2 status is appropriate for this type of content for you specifically. If you would not catch the errors either, it belongs in Zone 3.
Exercise 10: The Sampling Verification Method
For a large volume of AI-generated content (such as a long research summary, a full report draft, or a large batch of short content items):
1. Use AI to generate a substantial piece of content with many factual claims (e.g., a 1,000-word industry overview).
2. Rather than verifying every claim, randomly select 5 claims and verify each one.
3. Calculate the error rate in your sample.
4. Based on the sample error rate, decide whether to: (a) use the content with light editing, (b) verify all remaining claims, or (c) discard and regenerate with a different approach.
5. If your sample shows any errors, verify all remaining claims and note how accurate your sample rate prediction was.
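The sampling and decision steps above can be sketched in a few lines of Python. The claim list, seed, and verification outcome below are hypothetical stand-ins; in practice you would list the claims by hand and verify the sampled ones yourself.

```python
import random

# `claims` stands in for the factual claims extracted from the AI draft.
claims = [f"claim {i}" for i in range(1, 21)]   # pretend the draft has 20 claims

random.seed(7)                       # fixed seed so the sample is reproducible
sample = random.sample(claims, 5)    # draw 5 claims without replacement

errors_found = 1                     # hypothetical result of manual verification
sample_error_rate = errors_found / len(sample)

# Decision rule from step 5: any error in the sample triggers
# full verification of the remaining claims.
if errors_found == 0:
    action = "use with light editing"
else:
    action = "verify all remaining claims"

print(f"sample error rate: {sample_error_rate:.0%} -> {action}")
```

Note that a 5-claim sample is a coarse instrument: it can miss a low error rate entirely, which is why the exercise treats even a single sampled error as grounds for full verification.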
Part C: Calibration Building (Exercises 11-20)
Exercise 11: The Trust Calibration Log Setup
Set up your personal Trust Calibration Log. This is an ongoing tool you will maintain throughout your use of AI tools.
Required columns:
- Date
- Tool used
- Task type
- Domain
- What zone you treated it as
- What happened (was there an error? was verification needed?)
- Calibration update (what does this tell you about your model?)
Set up this log in your preferred tool (spreadsheet, note app, document). Commit to logging at least 10 AI interactions over the next two weeks.
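If you prefer a plain-text setup over a spreadsheet, the log can be a CSV file maintained by a short script. This is a minimal sketch: the filename and the example row are hypothetical, but the columns match the list above.

```python
import csv
from pathlib import Path

# Hypothetical filename; the columns mirror the exercise's required list.
LOG = Path("trust_calibration_log.csv")
COLUMNS = ["date", "tool", "task_type", "domain",
           "zone_treated_as", "what_happened", "calibration_update"]

def log_interaction(row: dict) -> None:
    """Append one AI interaction, writing the header on first use."""
    new_file = not LOG.exists()
    with LOG.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        if new_file:
            writer.writeheader()
        writer.writerow(row)

# Hypothetical example entry.
log_interaction({
    "date": "2024-05-01",
    "tool": "ChatGPT",
    "task_type": "statistics lookup",
    "domain": "marketing",
    "zone_treated_as": "3",
    "what_happened": "one of four figures unverifiable",
    "calibration_update": "treat industry stats as Zone 3, always verify",
})
```

A CSV has the advantage that the two-week review is easy to script: you can count errors per tool or per zone with a few more lines, or open the same file in any spreadsheet app.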
After two weeks, review the log and write a half-page calibration summary: What patterns do you see? What has changed in your trust calibration?
Exercise 12: The Trust Audit on a Real Project
Apply the four-stage Trust Audit framework to a real project you have recently completed with AI assistance:
Stage 1 (Inventory): List every AI-assisted element.
Stage 2 (Categorize): For each element, note whether you used it directly, modified it, or discarded it. What zone were you implicitly treating it as?
Stage 3 (Evaluate): Were there errors? Did you catch them, or did someone else?
Stage 4 (Update): What should change in your calibration going forward?
Write up the full Trust Audit in a two-page document. Keep it. Revisit it in three months after you have done several more audits to see how your calibration has evolved.
Exercise 13: Calibration Comparison
Find one other person in your organization or network who uses AI tools. Ask them to complete the self-assessment in section 4.13 independently. Compare your scores.
Discussion questions:
- Where do your trust calibrations differ significantly?
- Have either of you had costly over-trust or under-trust experiences?
- What has each of you learned from direct experience that should be passed on?
- Is there a "team calibration" standard that would benefit your organization?
Write a one-page summary of your discussion and any calibration updates you adopted based on it.
Exercise 14: The Red Flag List Sprint
Based on everything you have done in this chapter — the exercises, the case studies, the reading — compile your initial personal Red Flag List.
Your Red Flag List should include:
- Specific task types where you have directly observed errors
- Specific domains where you have verified AI output to be unreliable
- Specific tools where you have observed systematic failure patterns
- Any patterns from the exercises that surprised you
Aim for at least 10 specific entries. These should be specific enough to be actionable: not "AI can be wrong about statistics" but "ChatGPT generated incorrect industry statistics in two of the three marketing research tasks I tested."
Post this list somewhere visible to you. Review and update it monthly.
Exercise 15: The Over-Trust Challenge
This exercise is designed to reveal your over-trust tendencies specifically.
1. For one full workday, every time you use AI assistance, note whether you verified the output before using it, modified it, or used it directly.
2. At the end of the day, for every item you used directly without verification, ask: "What would have happened if this were wrong?"
3. Identify any cases where the answer is "something costly would have happened": those are cases where you over-trusted.
4. Identify any cases where the answer is "nothing much": those are cases where you correctly trusted or appropriately streamlined verification.
Write a short reflection: What is your over-trust pattern? What is the highest-risk thing you used without verification that day?
Exercise 16: The Under-Trust Challenge
The mirror of Exercise 15.
1. For one full workday, every time you would normally verify AI output independently, ask: "Given the zone classification of this task, is this verification actually necessary?"
2. Identify cases where you are verifying Zone 1 tasks unnecessarily.
3. Calculate roughly how much time you spent on unnecessary verification.
4. Identify the emotional or cognitive source of the over-verification: Is it general distrust of AI? Perfectionism? Habit from early AI use when tools were less reliable?
Write a short reflection: What is your under-trust pattern? What verification steps can you safely eliminate going forward?
Exercise 17: Building a Team Calibration Standard (Group Exercise)
If you work in a team environment, this exercise develops a shared calibration standard for your team's AI use.
1. Gather three to five colleagues who use AI tools.
2. As a group, identify the five most common AI use cases in your team.
3. For each use case, agree on: Which zone does it fall into? What is the required verification process? Who is responsible for verification?
4. Document this as a one-page "Team AI Trust Standard."
5. Commit to reviewing it quarterly.
The goal is not to create a bureaucratic policy but to align implicit calibration assumptions across the team — because when one person over-trusts and another under-trusts, the team's AI quality is unpredictable.
Exercise 18: The Adversarial Prompting Test
Test how your AI tool responds when you ask it to be explicit about its uncertainty.
1. Ask an AI a Zone 3 question (something involving specific statistics or recent events).
2. After you receive the answer, ask: "How confident are you in these specific figures? What is the likelihood that some of these are inaccurate?"
3. Note what happens: Does the AI express appropriate uncertainty? Does it double down on confidence? Does it give you useful information about its reliability for this question?
4. Try this with several different questions and note the patterns.
5. Write a one-paragraph assessment: Is this AI tool's self-reported confidence calibrated? Can you use its uncertainty expressions as a reliability signal?
Exercise 19: Cross-Tool Calibration Comparison
If you have access to two or more AI tools (e.g., ChatGPT and Claude, or Claude and Copilot), run a calibration comparison:
1. Select five tasks across different zones (include at least one Zone 1, two Zone 2/3, and one domain-specific task).
2. Run the same prompt on both tools.
3. Evaluate and compare: accuracy, fluency, appropriate uncertainty expression, completeness.
4. Document any systematic differences you observe between the tools for these task types.
This builds a tool-specific calibration layer on top of your general calibration model.
Exercise 20: The Calibration Evolution Tracker
This is a long-term exercise designed to be revisited quarterly.
Initial calibration snapshot: Based on everything in this chapter and the exercises above, write a two-page summary of your current trust calibration. Include:
- Your personal reliability map
- Your Red Flag List
- Your over-trust and under-trust tendencies
- Your verification practices by zone
Quarterly review: Every three months, reread your previous snapshot and write an update:
- What has changed in your calibration?
- What new failure patterns have you observed?
- What areas have you found more reliable than expected?
- Have any AI tools you use changed significantly, requiring recalibration?
After one year of quarterly reviews, you will have a rich record of how your AI intuition has evolved — and concrete evidence of the specific calibration lessons that have mattered most in your actual work.