Chapter 8 Exercises: Regression to the Mean — Why Hot Streaks Cool Down
Level 1: Recall and Comprehension
1.1 What is regression to the mean? Define it in your own words without using jargon. Why does it happen, mathematically?
1.2 What did Francis Galton observe when studying the heights of parents and their adult children? What did he mistakenly think was the mechanism, and what is the actual mechanism?
1.3 Use the formula for regression to the mean. If the correlation between first-season and second-season performance is r = 0.6, and a player's first-season batting average was 2.5 standard deviations above the population mean, how many standard deviations from the mean is their second-season average expected to be?
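A minimal sketch for checking answers of this kind, assuming the chapter's formula is the standard linear prediction (expected second score in standard-deviation units = r times the first score in standard-deviation units). The function name is hypothetical, and the numbers below are illustrative rather than taken from the exercise.

```python
def expected_second_z(r: float, first_z: float) -> float:
    """Expected second performance, in standard deviations from the mean.

    Assumes the standard linear prediction: expected z2 = r * z1.
    """
    return r * first_z


# Illustrative values only: r = 0.5, first performance 2.0 SD above the mean.
print(expected_second_z(0.5, 2.0))  # 1.0
```

Note that r = 1 predicts no regression at all, while r = 0 predicts a full return to the mean.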
1.4 Explain the "illusion of coaching effects" created by regression to the mean. Why does an intervention after a terrible performance almost always appear to "work"?
1.5 What is the difference between regression to the mean and genuine performance decline? List three signs that help distinguish them.
1.6 Why does regression to the mean make it dangerous to change strategy after extreme outcomes (either extremely good or extremely bad)?
1.7 Dr. Yuki says "the most dangerous moment in any endeavor is right after it's going brilliantly." Explain this warning using the concept of regression to the mean.
Level 2: Application
2.1 A student scores 98% on their first exam — the highest in the class. Their professor expects great things from them. On the second exam, the student scores 84% — still above average, but much lower than the first. The professor suspects the student "peaked" and is now "declining." Apply regression to the mean to explain what is more likely happening. What additional information would you need to confirm it's regression rather than genuine decline?
2.2 Marcus's startup had revenues of $8,200, $11,500, and $14,800 in months 1-3. The month-to-month correlation for early-stage startup revenue is typically around r = 0.4. Month 3's $14,800 is about 2.1 standard deviations above the historical startup mean of $8,000 (with a standard deviation of $3,200). What does regression to the mean predict for Month 4? Show your calculation.
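When a problem is stated in raw units rather than standard deviations, the same prediction can be written directly in those units, because the standard deviation cancels out. The sketch below assumes the standard linear form, predicted = mean + r × (observed − mean). The function name and the numbers are hypothetical and are not the exercise's values.

```python
def predict_next(observed: float, mean: float, r: float) -> float:
    """Predicted next value in raw units under simple linear regression to the mean."""
    return mean + r * (observed - mean)


# Illustrative values only: observed 150 against a mean of 100, with r = 0.5.
print(predict_next(150.0, 100.0, 0.5))  # 125.0
```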
2.3 A basketball team fires its coach after a season with 20 wins (historically, the team averages 35 wins). The next coach leads the team to 38 wins. Sports journalists attribute the turnaround to the new coach. What alternative explanation should they have considered, and how would you test between the two explanations?
2.4 Nadia's video goes viral with 450,000 views. Her next five videos average 3,100 views (her historical average). She concludes her formula for viral videos isn't working. Apply regression to the mean to evaluate this conclusion. What would you tell her?
2.5 A school implements a new tutoring program specifically for its lowest-performing students (bottom 15% on standardized tests). After one semester of tutoring, the same students show significant improvement. The principal credits the tutoring program. What threat to this conclusion does regression to the mean present? How would you design a study that properly controls for it?
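One way to see the threat is to simulate students who receive no tutoring at all. The sketch below is an illustration under stated assumptions, not a model of any real school: each test score is a fixed ability plus independent luck, both normally distributed with equal spread, and every name and parameter is hypothetical.

```python
import random
import statistics


def simulate_bottom_group(n=10_000, ability_sd=10.0, luck_sd=10.0,
                          bottom_frac=0.15, seed=0):
    """Return (mean test 1, mean test 2) for students selected as the bottom group on test 1."""
    rng = random.Random(seed)
    ability = [rng.gauss(0, ability_sd) for _ in range(n)]
    # Both tests share the same ability; only the luck term is redrawn.
    test1 = [a + rng.gauss(0, luck_sd) for a in ability]
    test2 = [a + rng.gauss(0, luck_sd) for a in ability]  # no intervention applied
    cutoff = sorted(test1)[int(n * bottom_frac)]
    bottom = [i for i in range(n) if test1[i] < cutoff]
    before = statistics.mean(test1[i] for i in bottom)
    after = statistics.mean(test2[i] for i in bottom)
    return before, after


before, after = simulate_bottom_group()
print(f"bottom group, test 1: {before:.1f}; same students, test 2: {after:.1f}")
```

Under these assumptions the selected group scores higher on the second test even though nothing was done for them, which is the improvement a tutoring program could mistakenly be credited with.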
2.6 In investing, a "star fund manager" outperforms the market by 3 percentage points annually for five years. They are then given significantly more assets to manage. In years 6-10, their performance is essentially average. What role might regression to the mean play in explaining this pattern?
Level 3: Analysis
3.1 The correlation between a baseball player's batting average in the first half and second half of the same season is approximately r = 0.5. The correlation between a player's batting average in Year 1 and Year 2 is approximately r = 0.6. Consider a player who bats .380, and assume a population mean of .265 and a standard deviation of .025. Using the regression-to-the-mean formula, calculate the expected follow-up average in two cases: (a) .380 is a first-half average and you are predicting the second half; (b) .380 is a Year 1 average and you are predicting Year 2. Interpret the difference.
3.2 The chapter discusses the "coaching intervention illusion" — that interventions following bad performances appear to work even when they have no effect. Design a controlled experiment that would allow a company to distinguish genuine improvement caused by a management intervention from improvement caused by regression to the mean after bad months.
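A possible starting point for thinking about such a design is a simulated randomized trial: select the units that had a bad month, randomly assign half to the intervention, and compare the two groups' gains. The sketch below is an illustration only. The skill-plus-luck model, the 20% selection cutoff, and all names are assumptions, not the chapter's method.

```python
import random
import statistics


def run_trial(true_effect, n=20_000, seed=1):
    """Return (treated gain, control gain, difference) among units with a bad first month."""
    rng = random.Random(seed)
    skill = [rng.gauss(0, 1) for _ in range(n)]
    month1 = [s + rng.gauss(0, 1) for s in skill]

    # Select the worst 20% of first-month performers, then randomize them.
    cutoff = sorted(month1)[n // 5]
    poor = [i for i in range(n) if month1[i] < cutoff]
    rng.shuffle(poor)
    half = len(poor) // 2
    treated, control = poor[:half], poor[half:]

    def gain(i, is_treated):
        # Second month: same skill, fresh luck, plus the effect if treated.
        month2 = skill[i] + rng.gauss(0, 1) + (true_effect if is_treated else 0.0)
        return month2 - month1[i]

    gain_t = statistics.mean(gain(i, True) for i in treated)
    gain_c = statistics.mean(gain(i, False) for i in control)
    return gain_t, gain_c, gain_t - gain_c


gain_t, gain_c, diff = run_trial(true_effect=0.0)
print(f"treated gain {gain_t:.2f}, control gain {gain_c:.2f}, difference {diff:.2f}")
```

With a true effect of zero, both groups still show a positive gain, so a before-and-after comparison alone would mislead. Only the treated-minus-control difference isolates the intervention, because the control group absorbs the regression component.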
3.3 Galton originally called regression to the mean "regression toward mediocrity." Is this label accurate or misleading? Explain carefully: does regression to the mean imply that exceptional people become average? That systems tend toward mediocrity? Or something more precise? What does the label get wrong?
3.4 Compare how regression to the mean operates in three different domains: (a) athletic performance, (b) startup revenue, (c) academic test scores. For each domain: What are the true-ability and luck components? How strong is the year-to-year correlation (high or low)? What magnitude of regression would you typically expect? Which domain do you think is most dangerous in terms of misleading decision-makers, and why?
Level 4: Synthesis and Evaluation
4.1 Dr. Yuki tells Marcus that his three-month hot streak might be regression fodder. But suppose Marcus pushes back: "The correlation between my monthly revenue and the next month's revenue is probably low — you said r = 0.4 — which means there's a lot of variance. But doesn't that variance also mean that another three great months in a row is plausible? I might just be someone whose true underlying growth rate is high." Evaluate this argument. Is Marcus right? What is the correct response to a scenario with genuinely high variance? How do you distinguish a truly high-growth startup from a lucky one in a high-variance environment?
4.2 Some commentators have argued that the concept of regression to the mean has been overextended and that it is sometimes used to dismiss genuine expertise, genuine skill, and genuine streaks. Make the strongest possible argument that this criticism is valid — that regression to the mean is sometimes misapplied to explain away real patterns. Then evaluate when regression to the mean is the right explanation and when genuine sustained exceptional performance is real.
4.3 The chapter argues that making irreversible decisions during hot streaks is dangerous because of regression to the mean. But consider: in competitive environments, speed may be essential. If you wait for more data, you may lose the opportunity. Analyze the trade-off between "gather more data to avoid regression traps" and "act quickly to capture the opportunity." When does the cost of waiting outweigh the cost of acting on incomplete data?
4.4 "The most dangerous moment in any endeavor is right after it's going brilliantly." Is this always true? Generate three scenarios in which it would be correct, and then generate two counter-scenarios in which the most dangerous moment is actually right after things are going terribly (when counting on regression to the mean to bring a rebound might lead to complacency rather than action). What framework would you use to decide which situation you're in?
Level 5: Creative and Personal Application
5.1 Think of three "turnarounds" you've witnessed or experienced — a student who improved dramatically after tutoring, a team that turned around after a coaching change, a person who got better after starting a new routine. For each case, evaluate: How extreme was the performance before the intervention? Is it possible the improvement was regression to the mean rather than a genuine effect of the intervention? What evidence would convince you either way?
5.2 Keep a performance journal for four weeks on a domain that matters to you (grades, athletic training, creative work, a skill you're developing). Record your daily or weekly performance. At the end of four weeks, identify your best and worst weeks. Were there interventions or strategy changes that coincided with changes in performance? Using regression to the mean, re-interpret those changes. What story did the performance data tell you without regression awareness? What story does it tell with it?
5.3 Find a news article from sports, business, or entertainment that attributes a performance change to a specific cause (a new coach, a new product launch, a change in strategy, a new partnership). Using regression to the mean, write an alternative analysis of the same story. What would need to be true for the journalist's causal story to be correct? What evidence would demonstrate that regression was not the primary explanation?
5.4 Marcus is considering dropping to part-time school based on three great startup months. Write the conversation you think Dr. Yuki should have with him — including what questions she would ask, what data she would want to see, and what she would recommend. Then write Marcus's most sophisticated counter-argument. Who do you think is right, and why?
5.5 Design a personal decision-making protocol for yourself that builds in protection against regression-to-the-mean traps. Specifically: How long will you track performance before making major decisions? What comparison points will you use? How will you distinguish a genuine signal from a statistical fluctuation? Write this as a concrete set of rules, not general principles.