Further Reading: Inference for Proportions

Books

For Deeper Understanding

Charles Wheelan, Naked Statistics: Stripping the Dread from the Data (2013) Wheelan's chapters on polls, surveys, and inference are among the most accessible treatments of proportion inference available. His discussion of how polling margins of error work — and how they can be misinterpreted — is directly relevant to Section 14.7 and Case Study 1. If you want more examples of proportion inference in everyday contexts, this is the book to read. Previously recommended for hypothesis testing (Chapter 13) — the polling chapters are equally strong.

David Freedman, Robert Pisani, and Roger Purves, Statistics, 4th edition (2007) Chapter 21 covers tests of significance for proportions with exceptional clarity. Freedman was particularly careful about the distinction between the standard error under the null (using $p_0$) and the standard error for confidence intervals (using $\hat{p}$). His treatment of polling and election prediction in the exercises is a perfect companion to Case Study 1.
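Freedman's distinction is worth writing out explicitly. In standard notation (a conventional formulation, not quoted from the book):

$$
SE_{\text{test}} = \sqrt{\frac{p_0(1-p_0)}{n}}, \qquad SE_{\text{CI}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
$$

The test uses $p_0$ because the sampling distribution is computed under the assumption that the null is true; the confidence interval uses $\hat{p}$ because no null value is assumed.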

Andrew Gelman, Jennifer Hill, and Aki Vehtari, Regression and Other Stories (2020) Chapter 4 provides a modern treatment of proportion inference that includes discussion of the Wilson interval, Bayesian alternatives, and the limitations of the Wald interval. The authors are leading voices in the movement to improve statistical practice, and their perspective on proportion inference reflects current best practice.

Nate Silver, The Signal and the Noise: Why So Many Predictions Fail — but Some Don't (2012) Silver's book is the definitive popular treatment of election forecasting, polling methodology, and the challenges of prediction. Chapter 2 ("Are You Smarter Than a Television Pundit?") is particularly relevant to Case Study 1's discussion of how margins of error are misinterpreted in election coverage. Silver's framework for thinking about uncertainty in forecasting connects directly to Theme 4 (uncertainty is not failure).

For the Mathematically Curious

Larry Wasserman, All of Statistics: A Concise Course in Statistical Inference (2004) Chapter 11 covers inference for proportions with mathematical rigor, including formal derivations of the Wald interval, the Wilson interval, and the exact Clopper-Pearson interval. Wasserman also proves why the Wald interval has poor coverage — the mathematical argument is elegant and helps explain the practical guidance in Section 14.6.

George Casella and Roger Berger, Statistical Inference, 2nd edition (2002) For the mathematically inclined: Chapter 9 covers interval estimation with detailed treatment of the Wald, score (Wilson), and likelihood ratio intervals. The connection between hypothesis tests and confidence intervals (the "inversion" principle) is proved rigorously.

Articles and Papers

Brown, L. D., Cai, T. T., & DasGupta, A. (2001). "Interval Estimation for a Binomial Proportion." Statistical Science, 16(2), 101-133. The landmark paper that exposed the Wald interval's poor coverage properties and recommended the Wilson interval as a superior alternative. This is the paper referenced in Section 14.6. The authors show, with extensive simulation, that the Wald interval's actual coverage can be far below the nominal level, even for moderate sample sizes. The paper is surprisingly readable for a technical statistics journal and includes beautiful visualizations of coverage probability. Available free at: https://doi.org/10.1214/ss/1009213286

Agresti, A., & Coull, B. A. (1998). "Approximate Is Better Than 'Exact' for Interval Estimation of Binomial Proportions." The American Statistician, 52(2), 119-126. The paper that introduced the plus-four method (which the authors call the "adjusted Wald interval"). Agresti and Coull show that adding 2 successes and 2 failures before computing the Wald interval substantially improves coverage, with almost no increase in computational complexity. The title makes a provocative point: the "approximate" adjusted Wald interval actually has better coverage than many "exact" methods. Available at: https://doi.org/10.1080/00031305.1998.10480550
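The plus-four adjustment is simple enough to sketch in a few lines. The following is a minimal illustration (the function name and the example numbers are hypothetical, not from the paper); it clips the interval to $[0, 1]$, which matters for extreme proportions:

```python
import math

def plus_four_interval(successes, n, z=1.96):
    """Sketch of the Agresti-Coull 'plus four' interval: add 2 successes
    and 2 failures, then compute a Wald-style interval on the adjusted
    proportion, clipped to [0, 1]."""
    n_adj = n + 4
    p_adj = (successes + 2) / n_adj
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)

# With zero successes in 20 trials, the ordinary Wald interval collapses
# to (0, 0); the plus-four interval remains sensible.
lo, hi = plus_four_interval(0, 20)
print(lo, hi)
```

The zero-successes case shows why the adjustment helps: the unadjusted Wald interval has zero width there, while the adjusted one still expresses real uncertainty.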

Wilson, E. B. (1927). "Probable Inference, the Law of Succession, and Statistical Inference." Journal of the American Statistical Association, 22(158), 209-212. The original paper introducing the Wilson interval. At fewer than four pages, it's remarkably concise. Wilson derives the interval by "inverting" the hypothesis test — finding all values of $p_0$ that would not be rejected at a given significance level. This inversion principle is the formal version of the CI-test duality from Chapter 13.

Kennedy, C., Blumenthal, M., Clement, S., et al. (2018). "An Evaluation of the 2016 Election Polls in the United States." Public Opinion Quarterly, 82(1), 1-33. The American Association for Public Opinion Research (AAPOR) committee's official investigation of the 2016 polling errors. This is the definitive account of what went wrong and why, covering nonresponse bias, weighting failures, and late-deciding voters. Essential reading for anyone interested in the real-world application of proportion inference to elections. Available at: https://doi.org/10.1093/poq/nfx047

Shirani-Mehr, H., Rothschild, D., Goel, S., & Gelman, A. (2018). "Disentangling Bias and Variance in Election Polls." Journal of the American Statistical Association, 113(522), 607-614. A rigorous analysis that separates random sampling error (variance) from systematic error (bias) in election polls. The authors find that the average polling error is about 2-3 times larger than the reported margin of error, consistent with the analysis in Case Study 1. The implication: margins of error should be interpreted as lower bounds on uncertainty.

Online Resources

Interactive Tools

Seeing Theory — Frequentist Inference https://seeing-theory.brown.edu/frequentist-inference/ Brown University's interactive module lets you set up proportion tests and confidence intervals with visual feedback. You can change the sample size, observed proportion, and null hypothesis value and watch the p-value and CI update in real time.

R. Psychologist: Understanding P-Values https://rpsychologist.com/pvalue/ Kristoffer Magnusson's interactive visualization helps you understand the relationship between sample size, effect size, and p-values in the context of proportion tests. Watching the sampling distributions shift as you change parameters builds intuition that static diagrams can't match.

FiveThirtyEight: How Our Forecast Works https://projects.fivethirtyeight.com/2020-election-forecast/ Nate Silver's methodology article explains how FiveThirtyEight builds election forecasts from polls, including how they model uncertainty, adjust for correlated errors, and convert state-level polls into Electoral College probabilities. A master class in applied proportion inference at scale.

RealClearPolitics: Polling Average https://www.realclearpolitics.com/epolls/latest_polls/ A real-time aggregation of current political polls. Practice applying the concepts from this chapter: look at the sample sizes, margins of error, and differences between polls. Ask yourself: are these differences within the margin of error? Which polls have larger samples? Do the CIs overlap?

Video Resources

StatQuest with Josh Starmer: "One Proportion Z-Test" (YouTube) Josh Starmer's clear, step-by-step walkthrough of the one-sample z-test for proportions. He covers the hypotheses, conditions, test statistic, and p-value calculation with characteristic enthusiasm. About 12 minutes. If you want a second explanation after reading this chapter, this is it.

Khan Academy: "Hypothesis Test for a Proportion" (khanacademy.org) Sal Khan works through multiple proportion test examples with detailed arithmetic. His series covers one-tailed and two-tailed tests, conditions checking, and interpretation. Particularly good for students who want more practice with the mechanical steps.

3Blue1Brown: "Binomial distributions | Probabilities of probabilities" (YouTube) Grant Sanderson's beautiful visualization of the binomial distribution and its connection to inference about proportions. He develops the intuition for why Bayesian and frequentist answers converge as data accumulate — which helps illuminate why the Wilson interval (a frequentist procedure that behaves much like a Bayesian interval under a weak prior) outperforms the Wald interval.

Veritasium: "Is Most Published Research Wrong?" (YouTube) Derek Muller's accessible treatment of the replication crisis, base rates, and false positives. The false positive paradox he demonstrates is exactly the calculation in Case Study 2. A good companion to the chapter's theme of connecting proportion inference to Bayes' theorem.

CrashCourse Statistics: "Test Statistics" (YouTube) A quick visual overview of test statistics for proportions. About 12 minutes. Good for review or as a supplement to the textbook treatment.

Software Documentation

statsmodels: proportions_ztest https://www.statsmodels.org/stable/generated/statsmodels.stats.proportion.proportions_ztest.html Documentation for the proportions_ztest() function used in Section 14.8. Key parameters: count (number of successes), nobs (sample size), value (null hypothesis proportion), alternative ('two-sided', 'larger', 'smaller'). For one-sample tests, pass scalars. For two-sample tests (Chapter 16), pass arrays of length 2.
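A minimal usage sketch (the numbers are hypothetical, not from the chapter):

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical poll: 540 of 1,000 respondents favor a proposal; test H0: p = 0.5.
stat, pvalue = proportions_ztest(count=540, nobs=1000, value=0.5,
                                 alternative='two-sided')
print(f"z = {stat:.3f}, p = {pvalue:.4f}")
```

Note that by default the function uses the sample proportion in the standard error; the prop_var parameter can be used to put the null value $p_0$ in the SE instead, matching the hand formula with $p_0$.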

statsmodels: proportion_confint https://www.statsmodels.org/stable/generated/statsmodels.stats.proportion.proportion_confint.html Documentation for the proportion_confint() function. Key parameter: method — options include 'normal' (Wald), 'wilson', 'beta' (Clopper-Pearson exact), 'agresti_coull' (a generalization of the plus-four method to any confidence level), and others. This single function computes all the CI methods discussed in Section 14.6.
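Because everything goes through the method parameter, comparing the intervals side by side takes only a loop. A small sketch with hypothetical data:

```python
from statsmodels.stats.proportion import proportion_confint

# Hypothetical data: 12 successes in 50 trials, 95% confidence.
for method in ['normal', 'wilson', 'agresti_coull', 'beta']:
    lo, hi = proportion_confint(count=12, nobs=50, alpha=0.05, method=method)
    print(f"{method:>13}: ({lo:.3f}, {hi:.3f})")
```

Running this makes the chapter's point concrete: the Wald ('normal') interval is symmetric around $\hat{p} = 0.24$, while the Wilson and Clopper-Pearson intervals are pulled toward 0.5.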

SciPy: scipy.stats.binomtest https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binomtest.html Documentation for the exact binomial test, used when the success-failure condition fails (Section 14.10). Added in SciPy 1.7. Returns a BinomTestResult object with the p-value and a confidence interval (Clopper-Pearson by default). The alternative parameter accepts 'two-sided', 'greater', or 'less'.
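A short sketch of the exact test on a sample too small for the z-test (hypothetical numbers):

```python
from scipy.stats import binomtest

# Hypothetical small sample: 3 successes in 10 trials, H0: p = 0.5.
# The success-failure condition fails here, so we use the exact test.
result = binomtest(k=3, n=10, p=0.5, alternative='two-sided')
print(result.pvalue)

# Clopper-Pearson interval by default.
ci = result.proportion_ci(confidence_level=0.95)
print(ci.low, ci.high)
```

Note how wide the exact interval is for $n = 10$: a useful reminder of how little a small sample constrains a proportion.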

SciPy: scipy.stats.norm https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html The standard normal distribution object used for manual z-tests. Key methods: .cdf(z) for $P(Z \leq z)$, .sf(z) for $P(Z > z)$ (more numerically stable than 1 - cdf(z)), .ppf(q) for critical values. Previously referenced in Chapter 13.
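Putting these methods together, a manual z-test takes only a few lines. A sketch with hypothetical numbers:

```python
import math
from scipy.stats import norm

# Hypothetical data: p_hat = 0.54, n = 1000, testing H0: p0 = 0.5.
p0, p_hat, n = 0.50, 0.54, 1000
se0 = math.sqrt(p0 * (1 - p0) / n)   # SE under the null uses p0
z = (p_hat - p0) / se0
p_two_sided = 2 * norm.sf(abs(z))    # .sf(z) is more stable than 1 - cdf(z)
z_crit = norm.ppf(0.975)             # critical value for a 95% CI
print(z, p_two_sided, z_crit)
```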

Historical Notes

The Wilson Interval's Long Road to Adoption

Edwin Wilson published his improved confidence interval formula in 1927 — nearly a century ago. Yet introductory statistics textbooks continued to teach only the Wald interval for decades, and many still do.

Why the delay? Several factors:

  1. Computational simplicity. The Wald interval is easy to compute by hand. The Wilson interval requires solving a quadratic equation. Before computers, this mattered.

  2. Tradition. Textbooks copy previous textbooks. Once the Wald interval was established as "the" proportion CI, inertia kept it there.

  3. The coverage problem wasn't widely known. Although the Wald interval's poor coverage was known to theoretical statisticians, it wasn't until Brown, Cai, and DasGupta's 2001 paper (and Agresti and Coull's 1998 paper) that the extent of the problem was demonstrated clearly enough to change practice.

  4. Software. Modern software makes the Wilson interval just as easy to compute as the Wald interval (a single function call in Python or R). Once the computational barrier disappeared, there was no good reason to stick with the inferior method.

Today, most professional statisticians use the Wilson interval (or its variants) as the default for proportion inference. The Wald interval is still taught in introductory courses (this textbook included) because it's simpler and pedagogically transparent — but you should know its limitations and use Wilson or plus-four for real work.
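For reference, here are the two intervals in conventional notation (the chapter's symbols may differ slightly). The Wald interval is

$$
\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}},
$$

while the Wilson interval, obtained by solving the quadratic that results from inverting the z-test, is

$$
\frac{\hat{p} + \dfrac{z^{*2}}{2n} \pm z^* \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n} + \dfrac{z^{*2}}{4n^2}}}{1 + \dfrac{z^{*2}}{n}}.
$$

The extra terms involving $z^{*2}/n$ are what pull the Wilson interval's center toward $1/2$ and keep its endpoints inside $[0, 1]$.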

Abraham Wald: The Same Wald

The Wald interval is named after Abraham Wald (1902-1950), the same statistician who appeared in Chapter 4's discussion of survivorship bias (the story about armoring World War II bomber planes). Wald was a brilliant mathematical statistician who made fundamental contributions to decision theory, sequential analysis, and statistical inference. The "Wald test" and "Wald interval" are both based on his asymptotic theory of maximum likelihood estimation. The irony is that Wald — who was so insightful about survivorship bias — developed a confidence interval method that itself has a systematic flaw (poor coverage for small samples and extreme proportions).

What's Coming Next

Chapter 15 will introduce the t-test for means — the most commonly used hypothesis test in practice. The logic is identical to what you've learned here, but:

  • The test statistic uses the t-distribution instead of the standard normal
  • The standard error uses $s$ (estimated from the sample) instead of $\sigma$ (known)
  • The degrees of freedom determine which t-distribution to use

If you understood the proportion z-test in this chapter, the t-test will feel like a natural extension — same framework, different distribution.

Resources to preview:

  • StatQuest: "Student's t-test" (YouTube) — focused walkthrough of the one-sample t-test
  • Khan Academy: "One-sample t-test" (khanacademy.org) — multiple worked examples
  • Seeing Theory: Frequentist Inference module — interactive t-test visualization