Chapter 7 Further Reading: The Law of Large Numbers — Why Small Samples Lie
Foundational Texts
1. Mlodinow, Leonard. The Drunkard's Walk: How Randomness Rules Our Lives (2008). Pantheon Books.
Mlodinow's accessible exploration of randomness covers the law of large numbers, regression to the mean, and the human tendency to find patterns in noise. His treatment of the small-sample problem is elegantly non-technical, making it ideal for readers encountering these ideas for the first time. Particularly strong chapters on how people misread streaks in sports and business. One of the best general-audience books on probability and luck.
2. Kahneman, Daniel. Thinking, Fast and Slow (2011). Farrar, Straus and Giroux.
Kahneman won the 2002 Nobel Memorial Prize in Economic Sciences for his work with Amos Tversky on judgment and decision-making. Part of that contribution involves how people systematically misread small samples: the "law of small numbers," the erroneous belief that small samples should closely resemble the population they came from. Chapter 10 ("The Law of Small Numbers") is directly relevant to this chapter. The full book provides the psychological foundation for why human beings are structurally bad at statistical reasoning.
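The "law of small numbers" can be made concrete with Kahneman and Tversky's classic maternity-ward question: a small hospital records far more days on which births run more than 60% boys than a large one does, simply because small samples swing more. A short exact calculation confirms it (the ward sizes of 15 and 45 births per day are illustrative choices, not figures from the book):

```python
from math import comb

def p_at_least(n, k, p=0.5):
    """Exact probability that a Binomial(n, p) count is at least k."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Probability that more than 60% of a day's births are boys:
# small ward (15 births/day) vs. large ward (45 births/day).
small = p_at_least(15, 10)   # >60% of 15 means at least 10 boys
large = p_at_least(45, 28)   # >60% of 45 means at least 28 boys
print(f"small ward: {small:.3f}, large ward: {large:.3f}")
```

The small ward exceeds 60% boys on roughly 15% of days, the large ward on well under 10% of days, even though both share the same underlying 50/50 birth rate.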
3. Gilovich, Thomas. How We Know What Isn't So: The Fallibility of Human Reason in Everyday Life (1991). Free Press.
Gilovich's classic includes detailed treatment of the hot hand studies (he co-authored the original Gilovich, Vallone, and Tversky 1985 paper). The book covers belief in hot hands, streaks, and the tendency to see patterns in random sequences. Essential reading for understanding how smart people misread small samples. Note: read alongside the Miller and Sanjurjo (2018) revision to get the updated picture.
Key Research Papers
4. Nosek, B.A., et al. (2015). "Estimating the Reproducibility of Psychological Science." Science, 349(6251).
The landmark Open Science Collaboration paper that documented the replication crisis in psychology. Nosek and some 270 collaborators replicated 100 psychology studies; depending on the criterion, only about 36–39% replicated (36% produced statistically significant results in the original direction; 39% were subjectively judged to have replicated). The paper carefully discusses what "replication success" means, the relationship between original study sample size and replication success, and the role of effect size inflation. Available open-access at osf.io.
5. Button, K.S., Ioannidis, J.P.A., Mokrysz, C., Nosek, B.A., Flint, J., Robinson, E.S.J., & Munafò, M.R. (2013). "Power failure: Why small sample size undermines the reliability of neuroscience." Nature Reviews Neuroscience, 14(5), 365–376.
The systematic review that documented median statistical power of approximately 21% in neuroscience studies. The paper explains how low power interacts with publication bias to produce a literature full of inflated or false findings. Essential reading for understanding the mechanisms behind the replication crisis. Highly readable despite being a technical paper.
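The mechanism Button and colleagues describe, low power plus a significance filter, is easy to see in simulation. The sketch below (illustrative numbers, not taken from the paper) plants a small true effect, runs many underpowered studies, and "publishes" only the significant ones; the published effect estimates come out dramatically inflated:

```python
import random
import statistics

random.seed(1)
TRUE_D, N = 0.3, 20          # small true effect; only 20 subjects per group
published = []
for _ in range(5000):
    treat = [random.gauss(TRUE_D, 1) for _ in range(N)]
    ctrl = [random.gauss(0, 1) for _ in range(N)]
    diff = statistics.mean(treat) - statistics.mean(ctrl)
    se = (statistics.variance(treat) / N + statistics.variance(ctrl) / N) ** 0.5
    if abs(diff / se) > 1.96:          # "significant" (z-test for simplicity)
        published.append(diff)         # ...so it gets published

print(f"power ~ {len(published) / 5000:.2f}, "
      f"mean published effect: {statistics.mean(published):.2f} (true = {TRUE_D})")
```

With power well below 50%, a result must badly overestimate the true effect in order to clear the significance bar at all; the filter does the inflating, no misconduct required.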
6. Miller, J.B., & Sanjurjo, A. (2018). "Surprised by the Hot Hand Fallacy? A Truth in the Law of Small Numbers." Econometrica, 86(6), 2019–2047.
The paper that partially rehabilitated the hot hand by finding a mathematical bias in studies that denied it. Miller and Sanjurjo show that conditioning on streaks in finite sequences introduces a selection bias. After correction, they find modest evidence for a genuine hot hand in skill tasks. A masterclass in how small-sample issues affect even technically sophisticated analyses, and how science self-corrects when errors are found.
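The selection bias Miller and Sanjurjo identify is counterintuitive but easy to verify by simulation: across many short sequences of a fair coin, the average within-sequence proportion of heads immediately following a streak of heads falls below 50%, even though every flip is independent. A minimal Monte Carlo sketch (the sequence length and streak length are illustrative choices):

```python
import random
import statistics

random.seed(0)

def prop_heads_after_streak(flips, k=3):
    """Proportion of heads among flips that immediately follow k heads in a
    row, within one sequence. Returns None if no flip follows such a streak."""
    after = [flips[i] for i in range(k, len(flips)) if all(flips[i - k:i])]
    return statistics.mean(after) if after else None

# Average, across many 20-flip sequences of a FAIR coin, of the
# within-sequence proportion of heads following three heads in a row.
props = []
for _ in range(20000):
    seq = [random.randint(0, 1) for _ in range(20)]
    p = prop_heads_after_streak(seq, k=3)
    if p is not None:
        props.append(p)

print(f"mean proportion of heads after HHH: {statistics.mean(props):.3f}")
```

Naive intuition says this average should be 0.500; the shortfall is exactly the bias that, once corrected, revived the hot hand.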
7. Ioannidis, J.P.A. (2005). "Why Most Published Research Findings Are False." PLOS Medicine, 2(8), e124.
Perhaps the most-cited and most-discussed paper in the methodology of science. Ioannidis demonstrates mathematically that under typical research conditions — small samples, publication bias, multiple comparisons — the majority of published findings in many fields are likely to be false. Available free online. Challenging but accessible to careful readers. Essential for anyone who wants to evaluate scientific claims critically.
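Ioannidis's core argument is Bayes' rule applied to the research pipeline. In the simplest case (no bias, no competing teams), the probability that a significant finding is true, the positive predictive value, depends on power, the significance threshold, and the prior odds that tested hypotheses are true. A sketch using the ~21% median power reported by Button et al. and an illustrative prior of 0.10 (the prior is an assumption, not a figure from the paper):

```python
def ppv(prior, power, alpha=0.05):
    """Positive predictive value: P(hypothesis true | significant result),
    via Bayes' rule in Ioannidis's simplest no-bias scenario."""
    true_pos = power * prior          # true hypotheses that reach significance
    false_pos = alpha * (1 - prior)   # false hypotheses that do so by chance
    return true_pos / (true_pos + false_pos)

# One in ten tested hypotheses true, at neuroscience's ~21% median power:
print(f"PPV at 21% power: {ppv(prior=0.10, power=0.21):.2f}")
# Same prior, but a well-powered study:
print(f"PPV at 80% power: {ppv(prior=0.10, power=0.80):.2f}")
```

Under this prior, fewer than a third of significant findings are true at 21% power; raising power to 80% roughly doubles the PPV, which is why power matters even to readers who only ever see the significant results.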
Applied Reading
8. Silver, Nate. The Signal and the Noise: Why So Many Predictions Fail — but Some Don't (2012). Penguin Press.
Silver's examination of forecasting in fields from weather to baseball to politics. The book is essentially a guided tour through the small-sample problem in different domains: how do we separate real signals from noise when the data is limited and volatile? Silver's discussion of baseball statistics (the PECOTA system), political forecasting, and financial prediction is especially relevant to themes of this chapter. His treatment of Bayesian reasoning as the solution to the signal/noise problem is illuminating.
9. Reinhart, Alex. Statistics Done Wrong: The Woefully Complete Guide (2015). No Starch Press.
A concise, practical guide to the most common statistical mistakes — most of which trace back to small samples, multiple comparisons, and misunderstanding p-values. Reinhart covers the replication crisis, statistical power, publication bias, and base rate neglect with unusual clarity and wit. Available free online at statisticsdonewrong.com. Particularly strong on why p < 0.05 is not what most people think it is.
10. Simmons, J.P., Nelson, L.D., & Simonsohn, U. (2011). "False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant." Psychological Science, 22(11), 1359–1366.
The paper that coined the term "researcher degrees of freedom": the many small decisions researchers make (when to stop collecting data, which outliers to exclude, which covariates to include) that each individually seem reasonable but together create enormous opportunity for false-positive findings. The paper demonstrated the point with a deliberately impossible finding obtained by standard methods: that listening to "When I'm Sixty-Four" made participants chronologically younger. The lesson: even completely honest researchers can produce false findings through flexible small-sample analysis. Highly readable and somewhat alarming.
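Of the researcher degrees of freedom Simmons and colleagues catalog, optional stopping (test, and collect more data only if the result is not yet significant) is the easiest to simulate. The sketch below uses an illustrative peeking schedule and a z-test for simplicity; the false-positive rate climbs well above the nominal 5% even though there is no true effect anywhere:

```python
import random
import statistics

random.seed(2)

def z_stat(a, b):
    """Two-sample z statistic for a difference in means."""
    n = len(a)
    diff = statistics.mean(a) - statistics.mean(b)
    se = (statistics.variance(a) / n + statistics.variance(b) / n) ** 0.5
    return diff / se

false_positives = 0
TRIALS = 4000
for _ in range(TRIALS):
    a, b = [], []
    # Peek after every 10 observations per group; stop at "significance".
    for _ in range(5):
        a += [random.gauss(0, 1) for _ in range(10)]
        b += [random.gauss(0, 1) for _ in range(10)]   # no true effect at all
        if abs(z_stat(a, b)) > 1.96:
            false_positives += 1
            break

print(f"false-positive rate with peeking: {false_positives / TRIALS:.3f}")
```

Each individual look uses the conventional 5% threshold; it is the repeated looking, invisible in the final write-up, that multiplies the error rate.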
Online Resources
- G*Power: free software for running power analyses and calculating required sample sizes. Available at gpower.hhu.de. Input your expected effect size, desired power, and significance level; get the required sample size.
- Our World in Data — Scientific Progress section: Excellent visualizations of the replication crisis and its scope across different scientific fields.
- Pre-registration resources: The Open Science Framework (osf.io) provides free pre-registration tools for anyone who wants to commit to hypotheses before collecting data — including for personal experiments.
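The kind of calculation tools like G*Power perform can be approximated in a few lines using the standard normal approximation for a two-sided, two-sample comparison of means (exact t-based answers, which G*Power reports, run slightly higher):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample z-test
    of means (normal approximation; exact t-test answers are a bit larger)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the test
    z_beta = z.inv_cdf(power)            # quantile for the desired power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

print(n_per_group(0.5))   # "medium" effect (Cohen's d = 0.5) -> 63 per group
print(n_per_group(0.2))   # "small" effect (d = 0.2) -> 393 per group
```

Detecting a "medium" effect at 80% power already takes about 63 subjects per group under this approximation, and a "small" effect about 393 per group, which goes a long way toward explaining why underpowered studies are so common.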