Chapter 7 Exercises: The Law of Large Numbers — Why Small Samples Lie


Level 1: Recall and Comprehension

1.1 State the law of large numbers in your own words. What does it say, and what does it explicitly not say about individual outcomes?

1.2 What is the difference between the weak law of large numbers and the strong law of large numbers? Why does this distinction matter in practice?

1.3 Define variance in plain language. Explain why higher variance in a measurement means you need a larger sample size to detect a real pattern.

1.4 What is statistical power? If a study has 20% power, what does that mean about its ability to detect real effects?

1.5 What is the multiple comparisons problem? Give a concrete example of how it creates false patterns in social media analytics.
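
As a starting point for 1.5, the short sketch below shows how quickly the chance of at least one spurious "significant" pattern grows as more comparisons are checked. It assumes every comparison tests a true null hypothesis and that the tests are independent, which real analytics data rarely satisfies exactly. The function name and the numbers of comparisons are illustrative, not taken from the chapter.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent tests of true nulls."""
    return 1.0 - (1.0 - alpha) ** num_tests


# Hypothetical numbers of comparisons a creator might scan in an analytics dashboard
# (for example: days of the week, posting hours, hashtags, thumbnail styles).
for num_tests in (1, 7, 20, 50):
    rate = familywise_error_rate(num_tests)
    print(f"{num_tests:>2} comparisons -> P(at least one false pattern) = {rate:.2f}")
```

The point of the sketch is only the shape of the curve: the more slices of the data you inspect, the more likely it is that one of them looks meaningful by chance.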

1.6 Distinguish between pre-registered hypotheses and post-hoc explanations. Why does the distinction matter for interpreting results?

1.7 What is the "winner's curse" in science, as described in this chapter? How does it cause published research to systematically overstate effect sizes?


Level 2: Application

2.1 A fair coin is flipped 20 times and lands heads 14 times (70%). Someone concludes: "This coin might be biased." Apply the law of large numbers to evaluate this claim. How many flips would you want to see before concluding bias? What would you look for?
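
One way to ground your answer to 2.1 is to compute how often a fair coin produces a result at least this lopsided. The sketch below uses the exact binomial distribution. The helper name `upper_tail` is illustrative, and doubling the one-sided tail is valid here only because a fair coin's distribution is symmetric.

```python
from math import comb


def upper_tail(n_flips: int, min_heads: int, p_heads: float = 0.5) -> float:
    """Exact probability of seeing at least min_heads heads in n_flips flips."""
    return sum(
        comb(n_flips, k) * p_heads**k * (1 - p_heads) ** (n_flips - k)
        for k in range(min_heads, n_flips + 1)
    )


# The same 70% heads rate at increasing sample sizes, under a fair coin.
for n_flips in (20, 100, 200):
    min_heads = n_flips * 7 // 10
    one_sided = upper_tail(n_flips, min_heads)
    two_sided = 2 * one_sided  # symmetric because p_heads = 0.5
    print(f"{min_heads}/{n_flips} heads: two-sided probability = {two_sided:.2g}")
```

Comparing the three rows shows why the same percentage carries very different weight at 20 flips than at 200.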

2.2 Nadia tracks which days her Instagram Reels perform best. Over 8 weeks, she posts 32 videos and finds that Friday posts average 4,200 views while Monday posts average 2,800 views. Her dataset contains 8 Friday posts and 6 Monday posts. List three specific reasons why this 8-week finding should not be treated as an established fact and acted upon.

2.3 Marcus's startup had revenue of $8,200 in Month 1, $11,500 in Month 2, and $14,800 in Month 3 — a steady upward trend. He presents this data to a potential investor and says it demonstrates reliable growth. Using principles from this chapter, write the two key questions an investor familiar with the law of large numbers would immediately ask.

2.4 A psychology study finds that students who listened to classical music while studying scored 12% higher on a subsequent test (n = 22 students, p = 0.04). Apply the Button et al. (2013) framework from this chapter to evaluate how much you should believe this finding. What would make you more confident?

2.5 You're managing a school fundraiser. In the first week, your best fundraiser collected $800 while the average was $300. You plan to give that top performer extra responsibilities next week. What does the law of large numbers suggest might happen? Name the statistical phenomenon at play.

2.6 A hedge fund manager posts returns of +18% for three years running in a market that averaged +7%. Someone says, "Three years is a long track record — this is genuine skill." Use the law of large numbers to explain why three years may still be an insufficient sample.


Level 3: Analysis

3.1 Nadia is testing whether a specific video format (fast-cut B-roll) performs better than her standard format (talking head). She posts 6 fast-cut videos and 6 talking-head videos over 12 weeks and finds that fast-cut averages 3,800 views vs. 2,900 for talking head. Her view counts per video range from 900 to 15,000 (very high variance). Analyze whether her sample is adequate to conclude the format difference is real. Roughly what sample size per format would you need if the true effect is a 30% lift over the talking-head average and the standard deviation in views is 3,000?
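
For the sample-size part of 3.1, a rough estimate can be sketched with the standard normal-approximation formula for comparing two means, n per group = 2 * ((z_alpha + z_power) * sd / delta)^2. Several inputs below are assumptions rather than facts from the exercise: the 30% effect is read as 30% of the 2,900-view talking-head average, the test is two-sided at alpha = 0.05, and the target power is 80%. View counts are heavily skewed, so treat the result as an order-of-magnitude guide only.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(delta: float, sd: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate videos needed per format to detect a mean difference of delta."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance cutoff
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    return ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)


baseline_views = 2900            # talking-head average from the exercise
delta = 0.30 * baseline_views    # assumed meaning of "a 30% effect"
n_per_format = sample_size_per_group(delta, sd=3000)
print(f"Roughly {n_per_format} videos per format, versus the 6 Nadia has")
```

Changing `power` or `alpha` in the call shows how sensitive the required sample is to the standard of evidence you demand.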

3.2 The chapter states that in the Button et al. (2013) study, median statistical power in neuroscience studies was approximately 21%. Calculate: If a study has 21% power and the true effect exists, what is the probability it will not find the effect? If 1,000 underpowered studies are run and 50% of them are testing a real effect, how many true positives would you expect? How many false positives (at a 5% false-positive rate)? What fraction of the published "significant" results would be false positives?
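
After working 3.2 by hand, you can check your arithmetic with a sketch like the one below. It assumes every statistically significant result is published and that nothing else is, which is a simplification. The function name is illustrative.

```python
def expected_significant_results(num_studies: int, share_real: float,
                                 power: float, alpha: float):
    """Return expected true positives, false positives, and the false share."""
    real_effects = num_studies * share_real
    null_effects = num_studies - real_effects
    true_positives = real_effects * power    # real effects that get detected
    false_positives = null_effects * alpha   # null effects flagged by chance
    false_share = false_positives / (true_positives + false_positives)
    return true_positives, false_positives, false_share


tp, fp, false_share = expected_significant_results(
    num_studies=1000, share_real=0.5, power=0.21, alpha=0.05
)
print(f"Expected true positives:  {tp:.0f}")
print(f"Expected false positives: {fp:.0f}")
print(f"Share of significant results that are false: {false_share:.1%}")
```

Try rerunning it with a lower `share_real`, such as 0.1, to see how the false share changes when most tested hypotheses are wrong.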

3.3 Compare the small-sample problem in three different domains: (a) medical research, (b) social media analytics, and (c) startup performance evaluation. In each case: What is typically being measured? What is the source of variance? What is the consequence of acting on small-sample conclusions? Which domain do you think suffers most from the problem, and why?

3.4 The hot hand paper by Miller and Sanjurjo (2018) found that earlier analyses rejecting the hot hand contained a subtle statistical bias in how streaks were measured. Does this mean the original Gilovich et al. (1985) study was wrong to challenge intuitions about the hot hand? Evaluate the implications carefully: What did the bias correction show? What did it not show? What does this episode tell us about how science works and how small samples affect even expert statistical analysis?


Level 4: Synthesis and Evaluation

4.1 Design a 16-week content experiment for Nadia that would give her reliable information about the effect of posting time on engagement. Your design should include: (a) the specific hypothesis she is testing; (b) how she will control for confounding variables (content quality, topic, format); (c) how many observations per condition she needs; (d) how she will analyze the results; (e) what level of evidence would be convincing. Explain how your design addresses the small-sample and multiple-comparisons problems.

4.2 You are advising a startup that has had 4 months of data showing 15% month-over-month growth. The founder wants to raise a Series A round using this growth as the primary evidence of product-market fit. Write a memo arguing that 4 months is an insufficient sample to confidently claim sustained growth, using the law of large numbers. Then write a counter-memo arguing the case for why 4 months may be sufficient context for a decision of this type. Evaluate which memo is more persuasive and why.

4.3 Consider two content creators: Creator A has 2 years of weekly posting data showing consistent 10% above-average engagement on Thursdays. Creator B has 8 weeks of data showing 40% above-average Thursday engagement. Apply the law of large numbers to compare how confident you should be in each creator's pattern. Now introduce this complication: Creator A has been posting identical content formats every week, while Creator B has been experimenting with new formats. How does this change your analysis?

4.4 The medical replication crisis described in this chapter (and in Chapter 9's case study) has led some researchers to propose that the standard p < 0.05 significance threshold should be replaced with p < 0.005 as the default. Evaluate this proposal using the law of large numbers and statistical power concepts. What would be the benefits? What would be the costs? Is there a better solution?


Level 5: Creative and Personal Application

5.1 Keep a 4-week "Pattern Journal." Each day, record one pattern you noticed in your life (e.g., "I got more done on mornings when I skipped social media," "I felt better after eating breakfast," "My first chess move wins more often when I take a long pause"). At the end of 4 weeks, analyze each pattern: How many observations does each claim rest on? What is the variance? What would you need to establish it as reliable? Which patterns survived scrutiny?

5.2 Find one claim you believe strongly about your own productivity, relationships, or performance ("I'm a night person," "I study better with music," "I perform better under pressure"). How many data points is that belief based on? What would a proper test of that belief look like? Design the test.

5.3 Find a news article reporting on a study with fewer than 100 participants. (These are common in health, psychology, and business journalism.) Write a 300-word critique applying the law of large numbers and statistical power. What claims does the article make? How confident should the findings make you? What would you need to know before acting on the findings?

5.4 Dr. Yuki says: "Six weeks of data is not a pattern. It's a story you're telling yourself about a pattern." Write a one-page reflection on a time when you — or someone you know — made a significant decision based on a small sample. What pattern did you (or they) believe in? What happened next? In hindsight, how should the decision have been approached differently?

5.5 Simulate the law of large numbers using a physical experiment: Flip a coin 100 times. After every 10 flips, record the cumulative proportion of heads. Create a table and graph showing how the proportion evolves. What is the proportion after 10 flips? 50 flips? 100 flips? How does your personal simulation compare to the theoretical expectation?
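
Once you have done the physical experiment in 5.5, a software version makes it easy to repeat the run many times and compare. The sketch below is one minimal way to do it. The function name, the seed value, and the use of Python's pseudo-random generator in place of a real coin are all assumptions for illustration.

```python
import random


def cumulative_proportions(n_flips: int, step: int, seed=None):
    """Simulate fair coin flips; return (flip count, proportion of heads) every step flips."""
    rng = random.Random(seed)
    heads = 0
    checkpoints = []
    for flip in range(1, n_flips + 1):
        heads += rng.random() < 0.5  # True counts as 1 head
        if flip % step == 0:
            checkpoints.append((flip, heads / flip))
    return checkpoints


# Same layout as the exercise table: a reading after every 10 flips, up to 100.
for flips, proportion in cumulative_proportions(n_flips=100, step=10, seed=7):
    print(f"{flips:>3} flips: proportion of heads = {proportion:.2f}")
```

Rerunning with different seeds, or with a much larger `n_flips`, shows both halves of the law: early proportions wander widely, and later ones settle near 0.5.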