Chapter 20 Exercises
How to use these exercises: Work through the parts in order. Part A builds recognition skills, Part B develops analysis, Part C applies concepts to your own domain, Part D requires synthesis across multiple ideas, Part E stretches into advanced territory, and Part M provides interleaved practice that mixes skills from all levels.
For self-study, aim to complete at least Parts A and B. For a course, your instructor will assign specific sections. For the Deep Dive path, do everything.
Part A: Pattern Recognition
These exercises develop the fundamental skill of recognizing legibility traps across domains.
A1. For each of the following scenarios, identify (i) the complex system being simplified, (ii) the measured dimension, (iii) the unmeasured dimensions being lost, and (iv) whether the scenario is at the "first-generation success" stage or the "second-generation failure" stage.
a) A hospital measures physician performance by patient satisfaction scores collected via post-visit surveys. Satisfaction scores have been rising steadily for two years. Physicians have been spending more time on bedside manner and less time on thorough diagnostic workups. Misdiagnosis rates have begun to tick upward, but the hospital does not track them as a performance metric.
b) A city police department is measured by the number of arrests made per officer per month. Arrest numbers are impressive. Officers are arresting people for minor infractions that previously would have been handled with a warning. The district attorney's office is overwhelmed, and conviction rates are falling.
c) A social media company measures content moderation success by the percentage of flagged content removed within 24 hours. The removal rate has reached 97 percent. Moderators, under pressure to meet the speed metric, are removing borderline content that does not actually violate policies, including legitimate political speech and satire.
d) A national government measures economic health primarily by GDP growth. GDP has been growing at 4 percent annually. The growth is driven by resource extraction that is depleting topsoil, contaminating aquifers, and reducing biodiversity. None of these are included in GDP.
e) A university measures research productivity by number of publications per faculty member. Publication counts have tripled in a decade. The average quality and replicability of published studies have declined. Faculty spend less time on teaching and mentorship.
f) A farm shifts from diverse polyculture to single-crop monoculture for efficiency. First-year yields are the highest in the farm's history. Soil biological activity has declined by 40 percent, but the farmer is not measuring soil biology.
A2. Classify each of the following as primarily an example of (a) metric fixation, (b) dashboard driving, (c) institutional lock-in, (d) the destruction of illegible knowledge, or (e) blaming the people rather than the metrics. Some may involve more than one pattern.
a) A school district fires its most experienced teachers (who resist test-prep-focused instruction) and replaces them with new graduates willing to follow the scripted curriculum.
b) A CEO reviews quarterly KPI reports but has not visited a factory floor, spoken to a frontline employee, or taken a customer call in three years.
c) A forestry agency continues planting monoculture spruce because its entire supply chain -- nurseries, sawmills, transport contracts -- is built around spruce.
d) A hospital administrator argues that a surgeon with excellent complication rates must be performing well, despite nurses reporting that the surgeon routinely ignores safety protocols.
e) A government agency, confronted with evidence that its standardized testing program has narrowed curricula, responds by adding standardized tests in additional subjects.
f) A hedge fund replaces experienced portfolio managers who use qualitative judgment with algorithmic trading systems that optimize quantitative signals. Performance improves for two years and then suffers a catastrophic drawdown during an unprecedented market event.
A3. For each pair, identify which scenario is more likely to result in a legibility trap and explain why:
a) Scenario A: A school district uses standardized test scores as one of twelve indicators of school quality. Scenario B: A school district uses standardized test scores as the sole determinant of school ratings, teacher evaluations, and funding allocation.
b) Scenario A: A company sets quarterly revenue targets and reviews them in the context of customer retention, employee satisfaction, and product quality data. Scenario B: A company sets quarterly revenue targets and ties all executive compensation to hitting them.
c) Scenario A: A central government sets national agricultural production targets and allows regional officials to determine how to meet them based on local conditions. Scenario B: A central government prescribes specific crops, planting densities, and harvest schedules for every farm in the country.
A4. The chapter identifies five self-reinforcing mechanisms that maintain legibility traps: institutional constituencies, cognitive commitment, sunk cost dynamics, destruction of alternatives, and illegibility of failure. For each of the following failed legibility projects, identify which mechanism(s) made the trap hardest to escape:
a) The continuation of monoculture forestry in Germany despite second-rotation decline
b) The persistence of high-rise public housing projects despite rising crime and social dysfunction
c) The continuation of Soviet quota-based planning despite well-documented Goodhart distortions
d) The persistence of No Child Left Behind-style testing despite evidence of curriculum narrowing
e) The continuation of metric-fixated management at a company whose experienced employees have left
A5. Identify three legibility traps in your daily life -- situations in which a complex reality has been reduced to a simple metric, the metric is being optimized at the expense of the underlying reality, and the infrastructure built around the metric makes change difficult. For each, describe the arc: What was simplified? What initial success did the simplification produce? What is being lost?
Part B: Analysis
These exercises require deeper analysis of legibility trap dynamics.
B1. The Arc Analysis. Choose one of the following legibility projects and trace the full arc of legibility failure:
- Body Mass Index (BMI) as the primary measure of health
- Credit scores as the primary measure of creditworthiness
- University rankings (e.g., U.S. News & World Report)
- Social media engagement metrics as the measure of content quality
- Crime statistics (reported crimes per capita) as the measure of public safety
For your chosen project:
a) Identify the complex system being simplified and the measurable dimension(s) chosen.
b) Describe the unmeasured dimensions being lost.
c) Identify the first-generation success: what improved on the measured dimension?
d) Identify the second-generation failure: what degraded on the unmeasured dimensions?
e) Identify the self-reinforcing mechanisms that maintain the trap.
f) Assess whether the trap has closed (alternatives destroyed, course correction impossible) or is still open (escape possible with sufficient will).
g) Propose a mixed-methods alternative that would preserve the legitimate benefits of the metric while mitigating the legibility trap.
B2. The Soviet Quota Game. A central planning agency must set a production quota for a shoe factory. The planners' underlying goal is for the factory to produce shoes that are actually useful to citizens. Consider the following sequence of quotas and predict the Goodhart distortion each would produce:
a) Quota: Number of pairs of shoes produced per month.
b) Quota: Total weight of shoes produced per month.
c) Quota: Number of pairs of shoes produced per month, with a minimum quality inspection requirement.
d) Quota: Number of pairs of shoes sold to citizens (rather than produced).
e) Quota: Citizen satisfaction rating with shoes purchased.
For each quota, explain: What would the factory manager optimize? What dimension of shoe quality or utility would be sacrificed? Is there any quota that would not produce a Goodhart distortion? Why or why not?
Now compare this exercise to a real-world corporate KPI system you have encountered. Are the dynamics fundamentally different, or merely less extreme?
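If you want to test your predictions for B2, one option is a toy model like the sketch below. It is only an illustration under stated assumptions: the three shoe designs, the labor budget, and the "usefulness" weights are all hypothetical numbers invented for this example, and the model covers only quotas (a) and (b). The manager is assumed to pick whichever single design maximizes the quota score within a fixed labor budget.

```python
# Toy model of the shoe-factory quota game.
# ALL numbers are hypothetical and chosen only for illustration.

LABOR_HOURS = 1000.0  # assumed fixed monthly labor budget

# Each design: (name, labor hours per pair, kg per pair, usefulness to a wearer, 0..1)
DESIGNS = [
    ("child-size thin sole", 0.5, 0.3, 0.2),
    ("standard adult shoe", 1.0, 0.8, 1.0),
    ("oversized heavy boot", 1.2, 2.0, 0.3),
]


def pairs_made(hours_per_pair):
    """Pairs the factory can produce if it makes only this design."""
    return LABOR_HOURS / hours_per_pair


# Score functions: what the planner counts under each rule.
def score_pairs(hours, kg, usefulness):
    return pairs_made(hours)               # quota (a): number of pairs


def score_weight(hours, kg, usefulness):
    return pairs_made(hours) * kg          # quota (b): total weight


def score_useful(hours, kg, usefulness):
    return pairs_made(hours) * usefulness  # the unmeasured goal


def best_design(score_fn):
    """Name of the design a score-maximizing manager would choose."""
    best = max(DESIGNS, key=lambda d: score_fn(d[1], d[2], d[3]))
    return best[0]


if __name__ == "__main__":
    print("pairs quota  ->", best_design(score_pairs))
    print("weight quota ->", best_design(score_weight))
    print("real goal    ->", best_design(score_useful))
```

Under these assumed numbers, each quota leads the manager to a different design than the one that best serves citizens. Extending the model to quotas (c) through (e) requires deciding how inspections, sales, and satisfaction ratings can themselves be gamed, which is the substance of the exercise.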
B3. The Testing Trap Quantified. Research the following (or use estimates based on the chapter's discussion):
a) The percentage of instructional time devoted to test preparation in a typical U.S. elementary school before NCLB (pre-2002) versus after NCLB (2005-2015).
b) The change in instructional time devoted to science, social studies, art, music, and physical education in U.S. elementary schools between 2001 and 2010.
c) The trend in scores on the NAEP (the low-stakes federal assessment) versus scores on state standardized tests over the same period.
Using these data, assess whether the "score inflation" phenomenon described in the chapter is supported by evidence. If state test scores rose while NAEP scores were flat, what does this tell you about the relationship between the metric and the reality?
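One way to make the B3 comparison concrete is to fit a trend line to each series and compare the slopes. The sketch below assumes you have entered both series as "percent proficient" by year. The figures shown are hypothetical placeholders, not real NAEP or state results, and should be replaced with the data you collect. Because the two tests define proficiency differently, treat the gap between slopes as a rough signal rather than a precise estimate.

```python
# Compare the trend in state test results with the trend in NAEP results.
# The data below are HYPOTHETICAL placeholders, not real scores.


def trend_slope(years, values):
    """Ordinary least-squares slope: change in value per year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    denominator = sum((x - mean_x) ** 2 for x in years)
    return numerator / denominator


years = [2003, 2005, 2007, 2009]
state_pct_proficient = [60.0, 66.0, 72.0, 78.0]  # hypothetical
naep_pct_proficient = [30.0, 31.0, 30.0, 31.0]   # hypothetical

state_slope = trend_slope(years, state_pct_proficient)
naep_slope = trend_slope(years, naep_pct_proficient)

# A large positive gap means the high-stakes measure is rising faster than
# the low-stakes one, which is the pattern the chapter calls score inflation.
divergence = state_slope - naep_slope

if __name__ == "__main__":
    print(f"State trend: {state_slope:.2f} points per year")
    print(f"NAEP trend:  {naep_slope:.2f} points per year")
    print(f"Divergence:  {divergence:.2f} points per year")
```

A divergence near zero would suggest the two measures are tracking the same underlying change; a large divergence suggests that gains on the high-stakes test are not carrying over to the low-stakes one.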
B4. Dashboard Forensics. Obtain a real or hypothetical corporate dashboard (many examples are available in management textbooks or online). For each metric on the dashboard:
a) Identify what the metric measures.
b) Identify at least one important dimension of organizational health that the metric does not capture.
c) Describe how an employee could "game" the metric -- improve the number without improving the underlying reality.
d) Assess whether the metric is functioning as a thermometer (providing information) or a thermostat (driving behavior).
e) Propose one qualitative measure that could be used alongside the metric to reduce legibility trap risk.
Part C: Application to Your Own Domain
These exercises connect legibility traps to your area of expertise.
C1. Identify the most important legibility project in your professional domain -- the primary metric or set of metrics used to evaluate performance, allocate resources, or make decisions. Then:
a) Describe the complex system the metrics are intended to capture.
b) Identify the measured dimensions and the unmeasured dimensions.
c) Assess where the legibility project is on the arc: first-generation success, emerging second-generation failure, or full legibility trap?
d) Identify any institutional constituencies that depend on the continuation of the current metrics.
e) Describe what practitioners in your domain say the metrics miss. Is anyone listening?
f) Propose a mixed-methods alternative that would preserve the legitimate value of the metrics while mitigating legibility trap risk.
C2. Identify a case in your professional experience where a legibility project was resisted -- where practitioners pushed back against a metric, a standardization, or a simplification. Analyze the case:
a) What was the legibility project?
b) Who resisted, and on what grounds?
c) Was the resistance successful? If so, what made it work? If not, why did it fail?
d) In retrospect, was the resistance justified? Did the practitioners' concerns prove valid?
e) What institutional structures would have made the resistance more effective?
C3. Design an "early warning system" for legibility traps in your domain. What signals would indicate that a legibility project has entered the dangerous phase -- that first-generation success is masking second-generation degradation? Who should be monitoring for these signals? What authority should they have to trigger course correction?
Part D: Synthesis
These exercises require integrating ideas across multiple chapters.
D1. Legibility Traps and Goodhart's Law. Chapter 15 showed that metrics used as targets are corrupted. Chapter 20 shows that this corruption follows a predictable arc and creates self-reinforcing traps.
a) Explain the relationship between Goodhart's Law and the Arc of Legibility Failure. Is the arc merely Goodhart's Law played out over time, or is it something more?
b) Identify a case from Chapter 20 in which Goodhart distortions were the primary mechanism of second-generation failure. Identify a case where the failure was driven by something other than Goodhart distortions (e.g., ecological degradation, community destruction).
c) Can a legibility trap exist without Goodhart distortions? Can Goodhart distortions exist without creating a legibility trap? Use examples to support your answer.
d) Design a metric system that is resistant to both Goodhart corruption and legibility trap dynamics. What features would it need? Is such a system possible in practice?
D2. Legibility Traps and Cascading Failures. Chapter 18 showed that tightly coupled systems are vulnerable to cascading failures. Chapter 20 shows that legibility projects create tight coupling by eliminating the diversity and redundancy that buffer systems against shock.
a) Explain how the Soviet planned economy's legibility created the tight coupling that made cascading failures inevitable.
b) Identify the "circuit breakers" (Ch. 18) that were removed by the legibility projects described in Chapter 20. For each, explain what the circuit breaker was, why it was classified as "waste" by the legibility project, and what happened when it was removed.
c) Propose a general principle: does legibility always increase tight coupling? Or are there forms of legibility that preserve loose coupling?
D3. Legibility Traps and Iatrogenesis. Chapter 19 showed that interventions can cause more harm than the problems they address. Chapter 20 shows that the "doubling down" phase of a legibility trap is a form of iatrogenesis.
a) Map the doubling-down phase of the forestry legibility trap onto the iatrogenic framework: What is the "disease" (the problem the intervention addresses)? What is the "treatment" (the intervention)? What is the iatrogenic harm?
b) Explain why the doubling-down response is an example of the intervention spiral (Ch. 19). Why does each new intervention produce the need for another intervention?
c) Apply the Intervention Calculus (Ch. 19) to the decision to implement No Child Left Behind. Who bore the burden of proof? Should they have? What would a proper cost-benefit analysis have included?
D4. Legibility Traps and Redundancy. Chapter 17 argued that redundancy is not waste. Chapter 20 shows that legibility projects systematically classify redundancy as waste and eliminate it.
a) In each of the four major case studies (forestry, urban renewal, Soviet planning, standardized testing), identify what the legibility project classified as "redundant" or "wasteful." For each, explain why it was actually essential.
b) Propose a principle for distinguishing genuine waste from vital redundancy. How would you operationalize this principle in an organization that is under pressure to "eliminate waste" and "improve efficiency"?
Part E: Advanced Challenges
These exercises push beyond the chapter's material into deeper or more speculative territory.
E1. Research Elinor Ostrom's eight design principles for sustainable commons governance. For each principle, explain how it resists the Arc of Legibility Failure. Then identify a real-world governance system (a specific fishery, a specific forest, a specific irrigation system) that successfully applies these principles and has avoided the legibility trap. What can centralized institutions learn from this case?
E2. The chapter argues that legibility traps are self-reinforcing because they destroy their own counter-evidence. But some legibility traps have eventually been escaped (German forestry transitioned to close-to-nature management; urban renewal was eventually discredited). Research one of these reversals and analyze: What broke the trap? Was it an external shock, a generational change, a political realignment, or something else? Can the mechanism of escape be generalized?
E3. James C. Scott distinguishes between "thin simplifications" (census data, maps) and "thick simplifications" (comprehensive plans that reshape reality to match the simplified model). Is the Arc of Legibility Failure specific to thick simplifications, or can thin simplifications also produce legibility traps? Construct an argument with examples.
E4. The chapter discusses legibility traps in traditional institutions: governments, schools, corporations. Consider whether the same pattern operates in algorithmic systems. When a recommendation algorithm (YouTube, TikTok, Spotify) simplifies user preferences into a vector of measurable signals, optimizes for engagement, and progressively narrows the user's information environment, is this a legibility trap? Who is trapped -- the user, the platform, or both? How does the speed of algorithmic optimization change the dynamics of the arc?
E5. Write a 500-word proposal for a "Legibility Trap Audit" -- a systematic process that an organization could use to identify legibility traps in its own operations. What would the audit look for? Who would conduct it? How would it distinguish between useful measurement and destructive metric fixation? How would it handle the political resistance that such an audit would inevitably provoke?
Part M: Mixed Practice (Interleaved Review)
These exercises mix concepts from Chapters 14-20 to build integrated understanding.
M1. A national healthcare system measures hospital quality by patient mortality rates (legibility, Ch. 16/20). Hospitals, to reduce their mortality rates, begin refusing to admit the sickest patients, transferring them to other facilities (Goodhart's Law, Ch. 15). The transferred patients arrive at receiving hospitals in worse condition due to transfer delays (iatrogenesis, Ch. 19). Meanwhile, hospitals eliminate "redundant" specialist positions to focus resources on the metrics being tracked (redundancy stripping, Ch. 17). When a pandemic strikes, these hospitals lack the specialist capacity to handle the surge (cascading failure, Ch. 18). Trace the complete failure chain, identify the legibility trap at its root, and propose a polycentric alternative.
M2. A school district implements a data-driven management system that evaluates teachers by student test scores (legibility trap, Ch. 20), adjusts classroom assignments based on predictive algorithms (overfitting, Ch. 14), and eliminates programs that cannot demonstrate measurable outcomes (redundancy stripping, Ch. 17). Teachers who understand their students' individual needs (metis) are overridden by the algorithmic system (Goodhart's Law, Ch. 15). When a new type of assessment is introduced, the entire system -- optimized for the old test -- collapses (cascading failure, Ch. 18). Design a school system that resists all five failure modes simultaneously.
M3. A corporation's supply chain is optimized for maximum efficiency, with single-source suppliers, just-in-time delivery, and minimal inventory (redundancy stripping, Ch. 17). Performance is tracked by a dashboard showing cost-per-unit, delivery times, and defect rates (legibility, Ch. 20). All three metrics are green. When a key supplier fails, the cascade destroys the entire production line (cascading failure, Ch. 18). The company's response is to add more dashboard metrics and more frequent supplier audits (doubling down, Ch. 20). Explain why this response is iatrogenic (Ch. 19) and propose a via negativa alternative.
M4. A government's environmental policy reduces forest management to a single metric: total forested area (legibility trap, Ch. 20). To hit the target, the government plants vast monoculture tree plantations (simplification). The plantations count as "forested area" on the dashboard but support a fraction of the biodiversity of natural forests (Goodhart distortion, Ch. 15). The monocultures, lacking ecological redundancy (Ch. 17), are devastated by a bark beetle outbreak that would have been contained in a diverse forest (cascading failure, Ch. 18). The government responds by spraying pesticides, which kill the beetle's natural predators, creating a worse outbreak in the following season (iatrogenesis, Ch. 19). Map this entire chain of failures and identify the point at which intervention would have been most effective.
M5. A tech company measures engineering productivity by lines of code committed per developer per week. Identify all of the failure patterns from Chapters 14-20 that this single metric could trigger. Be specific: name each pattern, explain the mechanism, and predict the outcome. Then design a measurement system that captures engineering productivity without creating a legibility trap.