Quiz: The Streetlight Effect
Target: 70% or higher to proceed confidently.
Section 1: Multiple Choice (1 point each)
1. Goodhart's Law states:
   - A) Bad metrics drive out good metrics
   - B) When a measure becomes a target, it ceases to be a good measure
   - C) All metrics are inherently flawed
   - D) Measurement always improves outcomes
Answer
**B)** When a measure becomes a target, it ceases to be a good measure, because people optimize for the metric rather than the underlying construct. *Reference:* Section 4.2

2. The McNamara Fallacy proceeds through four steps. The final step is:
   - A) Measuring what's easy to measure
   - B) Disregarding what can't be measured
   - C) Assuming what can't be measured isn't important
   - D) Declaring that what can't be measured doesn't exist
Answer
**D)** The final and most dangerous step: what can't be easily measured "really doesn't exist." *Reference:* Section 4.1

3. "Metric displacement" refers to:
   - A) Replacing one metric with another
   - B) The proxy measure actively crowding out the construct it represents
   - C) Metrics being displaced by qualitative assessment
   - D) Old metrics losing relevance over time
Answer
**B)** The metric doesn't just fail to capture the construct; it displaces it. Test prep crowds out actual learning. *Reference:* Section 4.3

4. The chapter identifies four structural reasons why "measuring better" doesn't solve the streetlight effect. Which is NOT one of them?
   - A) The most important things are hardest to measure
   - B) Better metrics get gamed too
   - C) Measurement is inherently evil
   - D) Measurement systems create constituencies
Answer
**C)** The chapter explicitly states that measurement is essential; the problem is confusing the measurement with the thing being measured. *Reference:* Section 4.5

5. The GDP example illustrates the streetlight effect because:
   - A) GDP is inaccurately calculated
   - B) GDP measures economic activity but misses health, environment, equality, and wellbeing
   - C) GDP was never intended to be a useful measure
   - D) GDP has been replaced by better measures worldwide
Answer
**B)** GDP captures economic activity but not the dimensions of human welfare that economic activity is supposed to serve. *Reference:* Section 4.3

6. The citation count problem in academia demonstrates:
   - A) That scientists are uniquely corrupt
   - B) That even the institution dedicated to truth-seeking evaluates its members using metrics that distort truth-seeking
   - C) That citations are meaningless
   - D) That all scientific research is unreliable
Answer
**B)** The irony of science using metrics that select for publishability over importance. *Reference:* Section 4.7

Section 2: True/False with Justification (1 point each)
7. "The streetlight effect means we should stop using metrics entirely."
Answer
**False.** The chapter explicitly states that measurement is essential. The error is not in measuring but in confusing the measurement with the thing being measured. The strategies proposed are about using metrics wisely, not eliminating them.

8. "If we create a perfect metric that truly captures the construct, Goodhart's Law will no longer apply."
Answer
**False.** The closely related Campbell's Law applies to *any* metric used for high-stakes decisions. Even a "perfect" metric becomes gameable once it becomes a target. The problem is structural, not technical.

9. "The streetlight effect only operates in fields that use quantitative metrics."
Answer
**False (mostly).** While the clearest examples involve quantitative metrics, qualitative fields can exhibit the effect too, by focusing on the aspects of a question that are most easily discussed, published about, or taught, while ignoring harder dimensions. However, quantitative metrics amplify the effect dramatically.

10. "Robert McNamara applied quantitative management to Vietnam because he didn't understand the difference between manufacturing and warfare."
Answer
**False.** McNamara was analytically brilliant and understood the difference in principle. The error was structural: the institutional environment demanded legible, reportable metrics, and the metrics available measured the wrong things. Individual insight could not overcome institutional incentive structures.

Section 3: Short Answer (2 points each)
11. Explain the "five-step deeper pattern" that the chapter identifies across all streetlight effect examples. Apply it briefly to one example NOT from the chapter.
Sample Answer
The five steps: (1) The construct is complex and hard to quantify, (2) A proxy is adopted because it's measurable, (3) The system attaches rewards/punishments to the proxy, (4) Actors optimize for the proxy at the expense of the construct, (5) The proxy's improvement is mistaken for construct improvement. Example: Social worker caseloads. Construct: child welfare. Proxy: cases closed per month. System: social workers evaluated by closure rate. Decoupling: workers close cases prematurely to meet targets. Blindness: management believes child welfare is improving because case closure rates are rising.

12. What is the difference between the streetlight effect and the authority cascade as failure modes?
Sample Answer
The authority cascade introduces a specific wrong answer through prestige dynamics: one person says something wrong, and the field follows. The streetlight effect distorts the landscape of inquiry: it determines which questions get asked, not which answers are given. The authority cascade is about *who* says it; the streetlight effect is about *what gets measured*. Both are structural, but they operate on different aspects of knowledge production.

Section 4: Applied Scenario (3 points)
13. A tech company introduces "lines of code written per day" as a productivity metric for software developers. Within six months, the metric has improved significantly. Using the chapter's framework, analyze what is likely happening and recommend a better approach.
Sample Answer
The construct (developer productivity / code quality) has decoupled from the proxy (lines of code). Developers are likely: writing verbose code that could be more concise, avoiding code deletion/refactoring (which reduces lines), splitting single statements across multiple lines, and avoiding time spent on architecture, testing, and documentation (which produce fewer lines). The metric incentivizes code *volume* rather than code *value*.

Better approach: Use multiple metrics (features shipped, bug rates, code review scores, user impact) with no single target. Include qualitative peer assessment. Rotate which metrics receive attention. Explicitly track the gap between lines written and features delivered. And critically: measure outcomes (working software that users value) rather than activities (typing).

Scoring & Next Steps
| Score | Assessment | Recommended Action |
|---|---|---|
| < 50% | Needs review | Re-read 4.1–4.3 and Goodhart's Law |
| 50–69% | Partial | Review the five-step pattern and four structural reasons |
| 70–85% | Solid | Ready to proceed |
| > 85% | Strong | Proceed to Chapter 5 |
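
The scoring table can be applied mechanically. The Python sketch below is one way to do it, under two assumptions that the quiz does not state outright: the total of 17 points is derived from the section headings (6 + 4 + 4 + 3), and boundary scores are resolved so that exactly 70% counts as "Solid" (matching the stated target) and exactly 85% also counts as "Solid". The names `SECTION_POINTS`, `MAX_POINTS`, and `assess` are hypothetical and chosen only for illustration.

```python
# Point values read off the section headings of this quiz (an assumption:
# the quiz itself never states a total).
SECTION_POINTS = {
    "multiple_choice": 6 * 1,   # questions 1-6, 1 point each
    "true_false": 4 * 1,        # questions 7-10, 1 point each
    "short_answer": 2 * 2,      # questions 11-12, 2 points each
    "applied_scenario": 3,      # question 13, 3 points
}
MAX_POINTS = sum(SECTION_POINTS.values())  # 17


def assess(points_earned: float) -> tuple[float, str]:
    """Return (percentage, assessment label) for a raw point score.

    Boundary handling is an assumption: 70% and 85% both map to "Solid".
    """
    if not 0 <= points_earned <= MAX_POINTS:
        raise ValueError(f"points_earned must be between 0 and {MAX_POINTS}")

    percent = 100 * points_earned / MAX_POINTS

    if percent < 50:
        label = "Needs review"
    elif percent < 70:
        label = "Partial"
    elif percent <= 85:
        label = "Solid"
    else:
        label = "Strong"
    return percent, label


if __name__ == "__main__":
    for points in (8, 11, 12, 15):
        percent, label = assess(points)
        print(f"{points}/{MAX_POINTS} = {percent:.1f}% -> {label}")
```

Under these assumptions, 12 of 17 points (about 70.6%) is the lowest score that meets the target, while 11 of 17 (about 64.7%) falls in the "Partial" band.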