Chapter 39 Quiz: Measuring Effectiveness


Question 1

What is the primary reason that untracked AI use tends to look more effective than it actually is?

A) AI tools are designed to make users feel productive even when they aren't
B) People naturally remember AI successes more vividly than AI-assisted failures
C) Time savings from AI use are inherently difficult to measure
D) Quality metrics always favor AI-assisted work

Answer **B** is correct. There is a systematic memory asymmetry in how practitioners perceive their AI use: successes (the brilliant first draft, the time saved) are memorable and accumulate in the mental ledger, while failures (the hour spent verifying a plausible-but-wrong analysis) tend to get attributed to "verification" rather than "AI failure." This bias makes untracked AI use look better than it is. Measurement corrects this by creating an objective record.

Question 2

A practitioner calculates their weekly time savings from AI use: 5 hours. Their hourly time value is $50/hour and their annual AI subscription costs $360. What is the approximate annual ROI of their AI subscription?

A) 14x
B) 35x
C) 28x
D) 7x

Answer **B** is correct. Calculation: 5 hours/week × $50/hour × 50 weeks = $12,500 in annual time value. ROI = $12,500 / $360 = approximately 35x. This illustrates why, for almost any professional earning a reasonable salary and getting genuine value from AI tools, the ROI on AI subscriptions is dramatically positive.
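The arithmetic can be sketched directly. The 50-week working year is the chapter's figure; the function name is ours:

```python
def annual_roi(hours_saved_per_week, hourly_value, annual_cost, weeks_per_year=50):
    """Annual ROI multiple: time value created divided by subscription cost."""
    annual_time_value = hours_saved_per_week * hourly_value * weeks_per_year
    return annual_time_value / annual_cost

# 5 hours/week at $50/hour against a $360/year subscription
print(round(annual_roi(5, 50, 360)))  # → 35
```

The same function makes it easy to test how sensitive the ROI is to the inputs: even at 1 hour/week saved, the multiple is still roughly 7x.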

Question 3

What does a consistently high iteration count (5+ rounds) for a specific task type most likely indicate?

A) The AI tool is broken or malfunctioning
B) Either the task is poorly suited to AI assistance, or the prompting approach needs fundamental rethinking
C) The user needs to subscribe to a more expensive AI tool
D) The task requires more detailed factual information than AI can provide

Answer **B** is correct. Consistently high iteration counts signal that something is fundamentally wrong with the current approach: either the task type is genuinely not well-suited to AI assistance (and the practitioner should reconsider using AI for it), or the prompt structure is so poorly matched to the task that each iteration is trying to fix problems introduced by the previous prompt. The response is not to try harder with the same approach but to fundamentally rethink it.

Question 4

What is the "AI batting average"?

A) The percentage of AI interactions that save more than 30 minutes
B) The percentage of AI first outputs that are usable with only minor revision
C) The ratio of AI-assisted tasks to manually-completed tasks
D) The accuracy rate of AI factual claims in a practitioner's domain

Answer **B** is correct. The AI batting average is a metric borrowed from baseball: the percentage of AI first outputs that are usable with only minor revision — good enough that the practitioner is mostly editing and refining rather than substantively rewriting. A mature practitioner on well-suited tasks should have a batting average above 0.6; beginners typically start at 0.3-0.5 and improve with practice.
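A minimal sketch of computing the metric from an interaction log, assuming each first output is recorded as usable with minor revision (True) or not (False); the sample log is illustrative:

```python
def batting_average(interactions):
    """Fraction of AI first outputs usable with only minor revision.

    Each entry is True (usable as-is or with minor edits) or
    False (needed substantive rewriting or was discarded).
    """
    if not interactions:
        return 0.0
    return sum(interactions) / len(interactions)

# Hypothetical log: 7 usable first outputs out of 10 interactions
log = [True, True, False, True, True, True, False, True, False, True]
avg = batting_average(log)
print(avg)        # 0.7
print(avg > 0.6)  # above the chapter's mature-practitioner benchmark
```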

Question 5

Elena's quality measurement reveals that AI assistance is improving her data quality and communication quality but reducing her analytical rigor. What does she do in response?

A) Stops using AI for analytical work
B) Accepts the quality trade-off as inherent to AI-assisted work
C) Adds a "devil's advocate" prompt step where she asks AI to challenge the analysis
D) Uses a different AI tool for analytical tasks

Answer **C** is correct. Elena's response is to adapt her workflow to address the specific quality dimension where AI assistance is creating a problem. By adding a step where she explicitly asks AI to challenge and critique the analysis it helped produce, she counteracts the tendency toward rigorous-looking but superficial analysis. This is measurement-driven practice improvement: identifying a specific weakness and designing a specific intervention.

Question 6

What does the "coverage metric" track?

A) The geographic coverage of AI tools' knowledge base
B) What percentage of eligible tasks are AI-assisted and the nature of that assistance
C) How many different AI tools a practitioner uses
D) The percentage of an organization's employees using AI tools

Answer **B** is correct. Coverage metrics track the scope of AI's role in your work: what percentage of your tasks involve AI assistance, whether AI is leading the task or assisting with components, and whether certain task categories remain unassisted that might benefit from AI. Coverage tracking surfaces both over-use (tasks being AI-assisted that don't benefit) and under-use (tasks not being AI-assisted that would).
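One way to sketch coverage tracking, assuming each eligible task is logged with an assistance mode of "led", "assisted", or "none" (the task names and modes here are hypothetical):

```python
from collections import Counter

def coverage(task_log):
    """Share of eligible tasks with any AI assistance, plus a
    breakdown of assistance mode ('led', 'assisted', 'none')."""
    modes = Counter(mode for _, mode in task_log)
    total = sum(modes.values())
    assisted = total - modes.get("none", 0)
    return assisted / total, dict(modes)

# A hypothetical week of eligible tasks
log = [("draft report", "led"), ("data cleanup", "assisted"),
       ("client call prep", "none"), ("email triage", "led")]
share, breakdown = coverage(log)
print(f"{share:.0%}")  # 75%
```

The breakdown is what surfaces over-use and under-use: a large "none" bucket in a category that resembles the "led" tasks is a candidate for expansion.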

Question 7

The "productivity illusion" research finding refers to:

A) The finding that productivity metrics consistently overstate real-world AI benefits
B) The tendency for AI users to feel more productive than they are because AI reduces cognitive effort even when output quality hasn't proportionally improved
C) The illusion that AI will eventually eliminate the need for human productivity
D) The mistaken belief that AI tools are more productive than they actually are

Answer **B** is correct. Research from Microsoft and other organizations has identified that AI use often reduces cognitive effort — work feels easier — which users experience as feeling more productive. However, output quality may not have improved proportionally. This is the "productivity illusion": subjective productivity feeling is decoupled from objective output quality. This is precisely why measurement matters — feeling productive and being productive are not the same thing.

Question 8

What does a declining iteration efficiency trend (rounds needed going up over time) most likely indicate?

A) The practitioner is using AI for harder tasks as their skill develops
B) The AI tool's performance has degraded
C) The practitioner's prompting practice may have plateaued or regressed
D) The practitioner is using AI for tasks it isn't suited for

Answer: **A and C are both potentially correct.** The chapter notes that a rising iteration count can mean you are taking on harder, more complex tasks, which is positive: it means you are expanding your AI use into more challenging territory. It can also mean that your prompt quality has degraded or your practice has stagnated. Distinguishing the two requires qualitative examination of which task types the high-iteration interactions involve. For scoring purposes, if forced to choose one, **A** is the answer the chapter endorses: investigate before assuming stagnation.

Question 9

The "stop doing" analysis identifies:

A) Tasks that AI cannot perform under any circumstances
B) AI use cases with negative ROI — where AI assistance is not saving time or is reducing quality
C) Team members who should stop using AI tools
D) AI tool subscriptions that should be cancelled

Answer **B** is correct. The "stop doing" analysis applies the general productivity principle of productive elimination to AI use. By identifying task categories where both time savings are low and quality is not improved — and being willing to stop AI-assisting those tasks — practitioners concentrate their AI use on high-leverage areas and stop wasting time on AI interactions that aren't generating value.

Question 10

Which quality measurement method is most reliable for determining whether AI assistance actually improves output quality?

A) Self-assessment rubrics
B) Time-to-completion tracking
C) Blind comparison (colleague rates outputs without knowing which was AI-assisted)
D) Client satisfaction scores aggregated over a month

Answer **C** is correct. Blind comparison removes the bias that comes from knowing which output was AI-assisted. Self-assessment rubrics are useful and necessary but subject to confirmation bias. Client satisfaction scores are valuable but lag and conflate many factors beyond AI assistance. Blind comparison provides the most controlled signal about the specific quality effect of AI assistance on equivalent tasks.

Question 11

According to the chapter, when should a practitioner adjust their optimization strategy to focus on discovering new use cases rather than refining existing ones?

A) After the first month of tracking
B) When their AI subscription cost goes up
C) When key metrics have been stable for 6-8 weeks despite deliberate experimentation (indicating they've reached the ceiling for their current approach)
D) When a new AI model is released

Answer **C** is correct. The chapter describes the "diminishing returns problem": as you optimize your AI use, gains get smaller and smaller. The signal that you've reached the ceiling for your current approach is metrics stability despite deliberate experimentation. At this point, the next gains aren't available by further refining the same use cases — they come from expanding into new use cases, new workflows, or new AI capabilities.
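The "stable despite experimentation" signal can be checked mechanically. A crude sketch, assuming a weekly metric series; the six-week window and the tolerance band are our illustrative choices, not the chapter's:

```python
def plateaued(weekly_metric, weeks=6, tolerance=0.05):
    """True if the last `weeks` observations all sit within
    ±tolerance of their own mean: a crude stability signal."""
    recent = weekly_metric[-weeks:]
    if len(recent) < weeks:
        return False  # not enough history to judge
    mean = sum(recent) / len(recent)
    return all(abs(v - mean) <= tolerance * mean for v in recent)

# Hypothetical weekly time savings (hours): early growth, then flat
print(plateaued([3.0, 4.1, 4.8, 5.0, 5.1, 4.9, 5.0, 5.1, 5.0]))  # True
```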

Question 12

Raj's developer productivity measurement uses four metrics. Which of these is described as "the gold standard quality measure" for code?

A) Code review cycle time
B) Developer-reported velocity
C) Post-merge defect rate
D) Co-pilot review flag frequency

Answer **C** is correct. Post-merge defect rate — how often merged code generates issues requiring follow-up patches — is described as the gold standard quality measure for code because it captures actual failures in production rather than indicators or proxies. Code review cycle time and flag frequency are useful leading indicators, but the defect rate is the ultimate measure of whether the code was good.

Question 13

A practitioner's measurement data shows that AI assistance saves them 40% of time on content drafting but their error rate on AI-assisted content is 25% higher than on non-AI-assisted content. What should they conclude?

A) AI assistance is net positive — the time savings outweigh the error rate increase
B) AI assistance is net negative — they should stop using AI for content drafting
C) More information is needed — specifically, what downstream costs the higher error rate creates
D) The error rate increase is within normal variation and can be ignored

Answer **C** is correct. The 40% time savings and 25% error rate increase cannot be compared without knowing the downstream costs of errors. If errors are caught in review and corrected cheaply, the time savings may still be net positive. If errors reach clients and create relationship damage or require expensive rework, the time savings are more than offset. The right response is to investigate the error cost, not to make an immediate judgment from the raw numbers alone.
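The trade-off reduces to a break-even comparison once the error cost is known. A hedged sketch with hypothetical dollar figures (none of these numbers come from the chapter):

```python
def net_value(time_saved_hours, hourly_value, extra_errors, cost_per_error):
    """Net economic effect of AI assistance on a task category:
    time value gained minus downstream cost of the added errors."""
    return time_saved_hours * hourly_value - extra_errors * cost_per_error

# 4 hours saved at $50/hour, with 2 extra errors slipping through
print(net_value(4, 50, 2, 30))   # 140: errors caught cheaply in review
print(net_value(4, 50, 2, 500))  # -800: errors reaching clients
```

The same time savings flips from net positive to net negative purely on the per-error cost, which is why the answer is "investigate" rather than a verdict from the raw percentages.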

Question 14

What distinguishes team-level measurement from individual-level measurement?

A) Team measurement is only relevant for organizations with more than 50 employees
B) Team measurement focuses on aggregate outcomes (total time savings, error rate trends, adoption depth) and quality distribution across members rather than individual performance
C) Team measurement replaces individual measurement once the team is established
D) Team measurement focuses exclusively on cost and ROI rather than quality

Answer **B** is correct. Team-level measurement answers different questions than individual measurement. The key team-level metrics — aggregate time savings, quality distribution across members, adoption depth, error rate trends, best practice propagation rate — are about collective outcomes and organizational health, not individual performance. Both levels of measurement serve important but distinct purposes.

Question 15

Alex's ROI analysis for her team's AI adoption shows a 2.76x return. She notes this is "conservative." What factors is the conservative estimate NOT accounting for?

A) The cost of AI subscriptions
B) Her management time investment in policy and training
C) Quality improvements and the downstream effects of a reduced error rate
D) The team's hourly time values

Answer **C** is correct. Alex explicitly notes that her 2.76x ROI calculation doesn't account for quality improvements (reduced error rate, improved client satisfaction) or their downstream value (reduced rework, stronger client relationships, reputation). These are real economic benefits that are harder to quantify precisely but are meaningful. The stated ROI is based only on direct time savings — which makes it more defensible to a skeptical audience but understates the true value.