Appendix H: Answers to Selected Exercises

This appendix provides model answers for selected exercises from each chapter. For exercises that involve prompting and AI interaction, model answers show both the prompt and an example of what a high-quality response looks like. Your actual AI responses will differ — use these as benchmarks for evaluating quality, not as expected exact matches.


Chapter 1: Understanding What AI Actually Is

Exercise 1.2: Distinguish AI Outputs by Type

Exercise: Classify each of the following tasks as (a) well-suited for current AI, (b) partially suited with human oversight required, or (c) not well-suited for current AI. Justify your classifications.

Tasks: (1) Drafting a blog post about a historical event. (2) Deciding whether to fire an employee. (3) Extracting key dates from a legal contract. (4) Predicting next month's stock price. (5) Explaining a medical diagnosis to a patient's family.

Model Answer:

  1. Drafting a blog post about a historical event — (a) Well-suited. AI excels at generating fluent prose about well-documented historical topics. It can produce a complete first draft quickly. Key caveats: verify specific dates and quotations; style and voice will need human refinement; very recent historical events may fall after training cutoff.

  2. Deciding whether to fire an employee — (c) Not well-suited. This decision involves legal exposure, interpersonal nuance, organizational context, documented performance history, equity considerations, and human judgment about a complex individual situation. AI can help structure the relevant considerations or draft HR documentation, but the decision itself requires human accountability. Delegating it to AI would be both impractical and ethically inappropriate.

  3. Extracting key dates from a legal contract — (a) Well-suited. This is a structured extraction task with clear inputs and outputs. AI performs this reliably on most contracts. Recommended practice: verify extracted dates against the source document, especially for high-value contracts.

  4. Predicting next month's stock price — (c) Not well-suited. Stock price prediction depends on real-time market data, unpredictable events, and complex dynamics that exceed AI's capabilities. AI can analyze historical patterns or synthesize analyst reports, but cannot reliably predict prices. Any AI claiming otherwise should be treated with extreme skepticism.

  5. Explaining a medical diagnosis to a patient's family — (b) Partially suited with oversight required. AI can help a clinician prepare accessible explanations, check for medical jargon, and suggest how to present difficult information empathetically. However, the clinician must provide the explanation — this is a relationship-intensive, emotionally significant interaction requiring human presence, accountability, and real-time adaptation.


Exercise 1.5: Identify the Training Cutoff Problem

Exercise: Identify three types of professional tasks where the training cutoff would cause significant problems, and three where it would not matter.

Model Answer:

Where training cutoff causes significant problems:

  1. Legal research involving recent case law. Court decisions issued after the cutoff do not exist in the model's knowledge. An AI citing "current case law" may omit decisions that substantially changed the legal landscape.

  2. Investment or financial analysis. Market conditions, interest rates, regulatory changes, and company financials change continuously. Analysis based on outdated financial data can lead to poor decisions.

  3. Competitive intelligence. Competitor product launches, organizational changes, pricing updates, and market share shifts are invisible to the model after the cutoff date.

Where training cutoff does not matter:

  1. Drafting a business memo or email. Writing skill and professional communication conventions are stable; the cutoff is irrelevant.

  2. Explaining foundational concepts (mathematics, physics, historical events, established business frameworks). These do not change.

  3. Code review for syntax and logic. Programming language syntax and best practices change slowly; most code review tasks are unaffected by a cutoff that is 12-18 months old.


Chapter 2: How Large Language Models Work

Exercise 2.1: The Next-Token Intuition

Exercise: Explain next-token prediction in plain language to a non-technical colleague in 3-4 sentences. Then explain why this mechanism produces hallucinations.

Model Answer (explanation):

"A language model is trained to play an extremely sophisticated fill-in-the-blank game. It reads billions of pages of text and learns statistical patterns — which words tend to follow which other words in which contexts. When you ask it a question, it generates a response word by word, each time picking the word that its training says is most likely to come next given everything written so far. It does not retrieve stored answers from a database — it composes each word fresh based on learned patterns.

This mechanism produces hallucinations because 'the most statistically likely next word' and 'the factually true next word' are not always the same thing. If the model has learned that a certain type of question (e.g., 'What is the capital of X?') is typically followed by a place name, it will produce a place name — even for a country it knows little about or has inconsistent information about. The fluency and confidence of the output are properties of the language patterns it has learned, not evidence that the content is true."
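The fill-in-the-blank intuition can be made concrete with a deliberately tiny sketch. The bigram counter below is a toy stand-in for the statistical patterns a real model learns (real LLMs use neural networks over subword tokens, not word counts), and the corpus, words, and `next_word` helper are all invented for illustration:

```python
from collections import Counter, defaultdict

# Toy corpus: real models train on billions of pages, not four sentences.
corpus = (
    "the capital of france is paris . "
    "the capital of italy is rome . "
    "the capital of spain is madrid . "
    "the capital of atlantis is "  # a subject the "model" knows nothing about
).split()

# Count which word follows which (a bigram model -- a crude stand-in
# for the statistical patterns a neural network learns).
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word(word):
    """Return the statistically most likely next word."""
    return follows[word].most_common(1)[0][0]

# In training, "is" is always followed by a place name, so the model
# confidently emits one -- even when the question was about atlantis.
print(next_word("is"))  # a place name: fluent and confident, not true
```

The point of the toy: the counter has no concept of truth, only of what usually comes next. That is the seed of hallucination.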


Chapter 3: Context, Memory, and Limitations

Exercise 3.3: Context Window Management

Exercise: You have a 50-page research report and want AI to help you write an executive summary. Describe the problem and a practical approach to solving it.

Model Answer:

The problem: A 50-page report is approximately 25,000-30,000 words, or roughly 35,000-40,000 tokens. While this fits within the context windows of frontier models (Claude's 200K context, Gemini's 1M context), stuffing the entire document into a single prompt raises issues: the model's attention becomes diluted across the full length, important information in the middle may be underweighted, and API token costs scale with input length.

Practical approach (hierarchical summarization):

Step 1 — Section-level summaries: Break the document into logical sections (chapters or major headings). Submit each section with this prompt:

"Summarize the following section of a research report in 3-4 bullet points. Capture the main finding, key supporting evidence, and any significant limitations or caveats. Do not include section headings or transitions — just the substance.\n\n[PASTE SECTION]"

Step 2 — Synthesis: Paste all section summaries into a new conversation with this prompt:

"Below are summaries of sections from a 50-page research report. Write an executive summary of 300 words that a senior executive could read in 2 minutes. Include: the central finding, 2-3 key supporting points, and any critical risks or recommendations.\n\n[PASTE ALL SECTION SUMMARIES]"

Step 3 — Verify: Review the final summary against the original document's introduction, conclusion, and any tables you know are important. Correct any misrepresentations.

This approach fits comfortably within modest context windows and often produces a more focused result than submitting the full document at once.
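The two summarization steps above can be sketched as a small pipeline. The prompt texts come from Steps 1 and 2; the `summarize` callable, the naive section splitter, and the function names are illustrative assumptions — in practice `summarize` would wrap a real API client call:

```python
# Sketch of hierarchical summarization. The `summarize` callable is
# injected so the orchestration logic stays independent of any
# particular API client.

SECTION_PROMPT = (
    "Summarize the following section of a research report in 3-4 "
    "bullet points. Capture the main finding, key supporting evidence, "
    "and any significant limitations or caveats.\n\n{section}"
)

SYNTHESIS_PROMPT = (
    "Below are summaries of sections from a research report. Write a "
    "300-word executive summary covering the central finding, 2-3 key "
    "supporting points, and any critical risks or recommendations."
    "\n\n{summaries}"
)

def split_sections(report: str) -> list:
    """Naive splitter: treat blank-line-separated blocks as sections.
    A real pipeline would split on chapter or heading boundaries."""
    return [s.strip() for s in report.split("\n\n") if s.strip()]

def executive_summary(report: str, summarize) -> str:
    # Step 1: summarize each section independently.
    section_summaries = [
        summarize(SECTION_PROMPT.format(section=s))
        for s in split_sections(report)
    ]
    # Step 2: synthesize the section summaries into one summary.
    return summarize(
        SYNTHESIS_PROMPT.format(summaries="\n\n".join(section_summaries))
    )
```

Step 3 (verification against the original document) remains a human task and is deliberately absent from the code.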


Chapter 4: Writing Effective Prompts

Exercise 4.4: Prompt Iteration

Exercise: Start with the prompt "Write about climate change." Improve it through three iterations, explaining what you changed at each step and why.

Model Answer:

Iteration 0 (original): "Write about climate change."

Problems: No audience, no purpose, no length, no angle, no format. The model will produce a generic overview of uncertain quality and direction.


Iteration 1: "Write a 500-word overview of climate change for high school students. Cover the causes, current impacts, and what individuals can do."

What changed: Added audience (high school students), length (500 words), and structure (causes, impacts, actions). Why: This gives the model enough constraints to make appropriate vocabulary and depth choices, and the structure ensures coverage.


Iteration 2: "Write a 500-word overview of climate change for high school students preparing for a class debate. Cover: (1) the scientific consensus on causes, (2) two current, specific real-world impacts, and (3) three concrete actions individuals can take that have meaningful (not just symbolic) impact. Use plain language. Avoid jargon. Do not use the word 'crucial.'"

What changed: Added purpose (debate preparation), specificity on impacts (two specific examples), a meaningful qualifier for actions (not just symbolic), and style constraints (plain language, one word to avoid). Why: Purpose shapes which aspects to emphasize. Specificity on examples prevents vague gestures. The "not just symbolic" constraint pushes against weak content. The word prohibition prevents a filler word that appears constantly in AI outputs.


Iteration 3: "Write a 500-word overview of climate change for 10th-grade students preparing for a class debate on climate policy. Cover: (1) the IPCC scientific consensus on human causes (cite what percentage of scientists agree), (2) two specific real-world impacts that have occurred since 2015 — give examples with locations, (3) three individual actions that researchers estimate have the highest per-person carbon reduction impact, with estimated savings in metric tons of CO2 per year. Write at a 9th-grade reading level. Do not include opinions or policy recommendations — stick to established facts and research findings."

What changed: Added grade level precision, specified a research citation (IPCC, percentage), required recency for examples (post-2015), specified a quantitative standard for individual actions, set a reading level target, and added a factual-only constraint. Why: The final version produces a specific, well-bounded, high-quality piece of writing rather than a generic overview. Each added constraint removes a dimension of ambiguity.


Chapter 5: Advanced Prompting Techniques

Exercise 5.2: Chain-of-Thought Application

Exercise: Use chain-of-thought prompting to solve the following problem: A company has three departments. Department A generates $400K revenue and costs $300K to run. Department B generates $600K and costs $500K. Department C generates $200K and costs $280K. The company is considering cutting one department to improve profitability. Which should be cut?

Model Prompt and Response:

Prompt:

"Think through the following business problem step by step before reaching a conclusion. Show your reasoning at each step.\n\nA company has three departments:\n- Department A: $400K revenue, $300K cost\n- Department B: $600K revenue, $500K cost\n- Department C: $200K revenue, $280K cost\n\nThe company wants to cut one department to improve profitability. Which should be cut, and what factors beyond the simple numbers should they consider?"

Example of high-quality response:

Step 1: Calculate net contribution of each department.

  • Dept A: $400K - $300K = $100K profit
  • Dept B: $600K - $500K = $100K profit
  • Dept C: $200K - $280K = -$80K loss

Step 2: Identify the clear finding. Department C is the only department operating at a loss. On raw numbers, cutting it improves profitability by $80K (stopping the loss). Departments A and B both contribute positively.

Step 3: But the decision is more complex than the raw numbers. Several factors should be considered before acting:

  • Strategic value of C: Is Department C new and expected to reach profitability? Is it a loss-leader that enables revenue in A or B (e.g., a product demo or support function)? Is it required for regulatory compliance?
  • Cost structure: Are the $280K costs fixed or variable? If largely fixed, cutting the department may not reduce costs by the full $280K.
  • Revenue dependencies: Does Dept B's $600K revenue depend on Dept C's output in any way?
  • Alternative improvements: Can Dept C's costs be reduced or revenue increased more cheaply than shutdown?
  • People and transition costs: Eliminating a department carries severance, morale, and institutional knowledge costs that do not appear in a simple P&L comparison.

Step 4: Recommendation. On the basis of financial contribution alone, Department C is the priority for either turnaround or elimination. However, a responsible decision requires investigating the factors above before making a final call. The numbers are necessary but not sufficient.
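The Step 1 arithmetic can be checked in a few lines (figures in $K, taken directly from the exercise):

```python
# Net contribution per department: revenue minus cost, in $K.
departments = {"A": (400, 300), "B": (600, 500), "C": (200, 280)}

contribution = {d: rev - cost for d, (rev, cost) in departments.items()}
print(contribution)  # {'A': 100, 'B': 100, 'C': -80}

# On raw numbers alone, the loss-maker is the cut candidate.
cut_candidate = min(contribution, key=contribution.get)
print(cut_candidate)  # C
```

As Step 3 notes, this identifies only the financial candidate; the strategic and cost-structure questions are not in the spreadsheet.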


Chapter 6: AI for Research and Information Work

Exercise 6.1: Evaluating AI Research Outputs

Exercise: You ask an AI for a summary of research on the effectiveness of standing desks for reducing back pain. The AI returns a confident 200-word summary citing "multiple studies." What are the three most important things you should verify before using this in a work report?

Model Answer:

  1. Whether the cited studies exist and say what the AI claims. AI models frequently hallucinate citations — inventing plausible-sounding paper titles, authors, and journals. Before treating any AI-provided citation as real, search for it in Google Scholar, PubMed, or a library database. If the AI did not cite specific papers (just said "multiple studies"), this is a signal to find the actual sources yourself.

  2. Whether the summary reflects the actual research consensus on this specific question. The research on standing desks and back pain is more nuanced than popular perception suggests — some studies show minimal benefit, and some show new problems (leg fatigue, varicose veins) that emerge from standing too long. A confident AI summary may smooth over genuine uncertainty. Search for a systematic review or meta-analysis on the topic to get a sense of what the evidence base actually supports.

  3. Whether the research is current and applicable to your context. "Reducing back pain" is a broad claim — the evidence may apply specifically to people with existing lower back conditions, may not account for ergonomic sit-stand protocols (vs. just standing all day), and the research base has shifted over time. Verify the recency of the sources and whether the population studied matches the context of your report.


Chapter 7: Trust, Verification, and Accuracy

Exercise 7.3: Building a Verification Protocol

Exercise: Design a three-tier verification protocol for a communications team that regularly uses AI to draft external-facing content.

Model Answer:

Tier 1 — Standard review (for most content): All AI-assisted drafts receive a human editorial review before publication. The reviewer checks for: tone consistency with brand voice, logical flow and accuracy of any claims, factual statements that include statistics or named third parties (verify each against primary sources), and any regulatory or legal language (flag for legal review if present). Estimated time: 10-20 minutes per draft.

Tier 2 — Enhanced review (for high-visibility or sensitive content): Content going to press releases, executive communications, external reports, or crisis communications receives: two independent reviewer passes (one for content accuracy, one for tone and brand), verification of all statistics against the original source documents, and sign-off by a senior communications lead. Any uncertainty about factual accuracy triggers escalation to the subject-matter expert. Estimated time: 1-2 hours per draft.

Tier 3 — Expert review (for technical or specialized content): Content involving medical, legal, financial, scientific, or compliance claims requires review by a qualified subject-matter expert who holds accountability for accuracy in their domain. The AI draft is treated as a starting point only; the expert's judgment takes precedence over the AI's wording on substance. Estimated time: variable.

Documentation requirement: All published AI-assisted content should have a record noting which tier of review was applied and who approved it. This supports accountability and enables post-publication auditing if issues arise.
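For teams that track content in a workflow tool, the tier rules above could be encoded as a simple routing function. This is a hypothetical sketch: the `Content` flags and field names are illustrative, not part of the protocol itself.

```python
from dataclasses import dataclass

@dataclass
class Content:
    # Press release, executive comms, external report, or crisis comms.
    high_visibility: bool = False
    # Medical, legal, financial, scientific, or compliance claims.
    specialized_claims: bool = False

def review_tier(content: Content) -> int:
    """Route content to the strictest applicable review tier."""
    if content.specialized_claims:
        return 3  # expert review by a subject-matter expert
    if content.high_visibility:
        return 2  # enhanced review: two passes plus senior sign-off
    return 1      # standard review: one human editorial pass
```

Note the ordering: specialized claims win even for low-visibility content, because the accountability sits with the domain expert.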


Chapter 8: AI in Your Professional Workflow

Exercise 8.1: Workflow Mapping

Exercise: Map the steps of a weekly task you currently do manually. Identify which steps AI can handle, which require human judgment, and which are ambiguous.

Model Answer (example task: preparing a weekly team briefing document):

  1. Collect updates from team members. AI role: draft the request email. Human role: send it and follow up.
  2. Read and organize raw updates. AI role: categorize and structure. Human role: review for missing context.
  3. Identify key themes and risks. AI role: propose themes. Human role: validate against domain knowledge.
  4. Draft briefing document. AI role: write the full draft. Human role: edit and add judgment calls.
  5. Add strategic context and recommendations. Ambiguous: AI can propose options, but the human must own all recommendations.
  6. Final review and distribution. Human only: this step carries the accountability.

Analysis: Steps 1-4 are reasonable AI-assistance candidates. Step 5 is genuinely ambiguous — AI can help generate options, but recommendations going to leadership carry the human's credibility and accountability. Step 6 should remain human-controlled because the reviewer catches issues the AI introduced, not just validates what a human wrote.


Chapters 9-13: Selected Exercises

Chapter 9 Exercise: Ethics Scenario

Exercise: A colleague tells you they used Claude to write an entire client proposal and submitted it without review. Assess the ethical issues.

Model Answer: Several concerns arise. First, professional representation: the client is paying for your colleague's expertise, not an AI's output — submitting unreviewed AI work misrepresents the nature of the service delivered. Second, quality assurance: an unreviewed AI proposal may contain errors, hallucinated facts, or recommendations not grounded in the actual client situation — this creates professional and legal risk. Third, accountability: the colleague is professionally accountable for what was submitted; having not read it, they cannot fulfill that accountability. The issue is not that AI was used — it is that human judgment and review were eliminated entirely. The appropriate correction is to establish a policy that all AI-assisted client deliverables receive substantive human review before submission.


Chapter 11 Exercise: API Integration Planning

Exercise: You want to automate the summarization of 200 customer support tickets per week using the API. What are three key design decisions?

Model Answer:

  1. Model selection and cost. At 200 tickets per week and roughly 1,000 tokens per ticket (ticket plus summary), this is approximately 200,000 tokens per week. At Claude Haiku pricing (~$0.25 per million input tokens), input costs come to roughly $0.05 per week; even with output tokens included, the total stays well under $1 per week — use the cheapest capable model for this repetitive task. Reserve more capable (expensive) models for complex or escalated cases.

  2. Output format standardization. Define a structured output format (e.g., JSON with fields for: issue category, sentiment, urgency level, resolution requested, and summary text) and enforce it in the system prompt. Structured outputs are far more useful for downstream systems than freeform prose.

  3. Error handling and quality gates. Automated pipelines will encounter edge cases: tickets in unexpected languages, extremely short or long tickets, tickets with sensitive information. Build in: input validation before sending to the API, error handling for API failures with retry logic, a sampling-based human review (e.g., 5% of outputs reviewed weekly) to catch systematic errors, and a flagging mechanism for tickets where the AI's confidence or output quality appears low.
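The three decisions above can be sketched in code. The cost arithmetic mirrors decision 1 at illustrative rates; the field names and the `validate_summary` and `sample_for_review` helpers are assumptions for illustration, not a real SDK:

```python
import json
import random

# Decision 1: back-of-envelope weekly cost at an illustrative rate.
weekly_tokens = 200 * 1_000                # tickets x ~tokens per ticket
cost = weekly_tokens / 1_000_000 * 0.25    # ~$0.25 per million input tokens
print(f"${cost:.2f}/week")                 # $0.05/week

# Decision 2: enforce the structured output format.
REQUIRED_FIELDS = {"issue_category", "sentiment", "urgency_level",
                   "resolution_requested", "summary"}

def validate_summary(raw: str) -> dict:
    """Parse the model's JSON output and check required fields.
    Raises ValueError so the pipeline can flag the ticket for review."""
    data = json.loads(raw)
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return data

# Decision 3: sample a fixed fraction of outputs for weekly human review.
def sample_for_review(outputs: list, rate: float = 0.05, seed: int = 0) -> list:
    rng = random.Random(seed)
    k = max(1, round(len(outputs) * rate))
    return rng.sample(outputs, k)
```

A ticket whose output fails `validate_summary` would join the human-review queue rather than flow silently into downstream systems.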


For all exercises not answered here, model answers are available in the companion resources at [the book's website]. Answers to computational exercises in Chapter 11 are included in the online code repository.