Further Reading: Sampling, Estimation, and Confidence Intervals

Sampling and estimation are foundational to nearly everything in statistics and data science. If any part of this chapter sparked your curiosity — or if the confidence interval interpretation still feels a bit slippery — these resources will help deepen your understanding.

Tier 1: Verified Sources

David Spiegelhalter, The Art of Statistics: How to Learn from Data (Basic Books, 2019). Spiegelhalter is a master at explaining statistical ideas without drowning you in formulas. His chapters on sampling, estimation, and the interpretation of uncertainty are some of the clearest explanations in print. If the confidence interval interpretation section of our chapter left you wanting more clarity, start here.

Larry Wasserman, All of Statistics: A Concise Course in Statistical Inference (Springer, 2004). If you want the mathematical foundations behind everything in this chapter — sampling theory, the Central Limit Theorem, confidence interval construction, and the bootstrap — Wasserman covers it rigorously but accessibly. More technical than our treatment, but a fantastic reference for deeper understanding. The bootstrap chapter is particularly good.

Bradley Efron and Robert Tibshirani, An Introduction to the Bootstrap (Chapman & Hall/CRC, 1993). This is the book on the bootstrap, written by its inventor (Efron) and one of its most important developers (Tibshirani). It's technical but foundational. If you want to understand the theory behind why the bootstrap works — and its limitations — this is the definitive source.

Sharon Lohr, Sampling: Design and Analysis (Chapman & Hall/CRC, 3rd edition, 2022). The most comprehensive textbook on survey sampling. Covers simple random sampling, stratified sampling, cluster sampling, and much more, with real-world examples throughout. If you're planning to design a survey or work with survey data, this is the reference you need.

Nate Silver, The Signal and the Noise: Why So Many Predictions Fail — but Some Don't (Penguin, 2012). Silver's discussion of election polling — including why polls go wrong and how aggregation helps — is directly relevant to Case Study 1. The book more broadly addresses how to think about uncertainty in prediction, which connects to the confidence interval themes of this chapter.

Charles Wheelan, Naked Statistics: Stripping the Dread from the Data (W.W. Norton, 2013). A warm, funny introduction to statistics for people who are nervous about it. Wheelan's explanations of sampling, the Central Limit Theorem, and confidence intervals are excellent for building intuition. If our chapter moved too fast for you, Wheelan provides the same ideas at a gentler pace.

Tier 2: Attributed Resources

Seeing Theory (Brown University). An interactive website that visualizes statistical concepts, including sampling distributions, confidence intervals, and the Central Limit Theorem. You can watch sampling distributions form in real time as you draw samples. If the simulation sections of this chapter resonated with you, this will delight you. Search "Seeing Theory Brown University."

Pew Research Center, "What Our Declining Response Rates Mean for Polling." Pew has published several reports documenting the decline in survey response rates and what it means for the accuracy of polling. Their data on the drop in response rates from 36% to 6% is striking and directly relevant to Case Study 1. Search for Pew Research Center articles on response rates.

AAPOR (American Association for Public Opinion Research) post-election reports. After major election polling misses, AAPOR convenes committees to analyze what went wrong. Their reports are detailed, honest, and technically rigorous. They're the definitive post-mortems on polling failures. Search "AAPOR post-election report."

WHO/UNICEF Estimates of National Immunization Coverage (WUENIC). The methodology documents for the WUENIC estimation process described in Case Study 2 are publicly available. They describe how multiple data sources are reconciled to produce national coverage estimates. Search "WUENIC methodology" for technical documentation, or visit the WHO immunization data portal for the estimates themselves.

StatQuest with Josh Starmer (YouTube). Starmer's videos on confidence intervals, the bootstrap, and sampling distributions are clear, visual, and often entertaining. His step-by-step approach is excellent for reinforcing the concepts from this chapter. Search "StatQuest confidence intervals" or "StatQuest bootstrap."

Recommended Next Steps

If the confidence interval interpretation still feels confusing: Read Spiegelhalter's chapter on uncertainty, or watch the StatQuest video on confidence intervals. The subtlety takes time to sink in — most statisticians will tell you they had to encounter it several times before it clicked.
If you want to go deeper on the bootstrap: Read Efron and Tibshirani's book, or look up the "bias-corrected and accelerated" (BCa) bootstrap, which adjusts the percentile method we used in this chapter for bias and skewness in the bootstrap distribution.
If you're interested in survey design: Lohr's Sampling: Design and Analysis is the standard reference. If you're working with data from a complex survey (stratified, clustered), this will teach you how to compute standard errors correctly.
If the polling case study fascinated you: Read the AAPOR post-election reports and Nate Silver's work on poll aggregation. The intersection of sampling theory and real-world polling is endlessly interesting.
If you want practice: The scipy.stats documentation has worked examples of confidence intervals, including scipy.stats.bootstrap for bootstrap intervals, and scikit-learn's sklearn.utils.resample can draw the resamples if you prefer to build the procedure by hand. Building simulations yourself, even simple ones, is the best way to internalize sampling distributions.
If you're ready to move on: Chapter 23 takes the logical next step, using these tools to test specific claims about the world. You've learned how to estimate with uncertainty; now you'll learn how to decide based on that uncertainty.
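
As a starting point for the practice suggestion above, here is a minimal simulation sketch that uses only the Python standard library. It is an illustration under stated assumptions, not the chapter's own code: the function names (normal_ci, percentile_bootstrap_ci, coverage) and the population parameters (mean 10, standard deviation 2, sample size 50) are hypothetical choices. The first function builds a normal-approximation interval for a mean, the second builds a percentile bootstrap interval, and the third repeats the experiment many times to check how often the interval captures the true mean.

```python
import random
import statistics


def normal_ci(sample, confidence=0.95):
    """Normal-approximation confidence interval for the mean of a sample."""
    n = len(sample)
    mean = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / n ** 0.5
    # Two-sided critical value, about 1.96 when confidence is 0.95.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * standard_error, mean + z * standard_error


def percentile_bootstrap_ci(sample, n_resamples=2000, confidence=0.95, rng=None):
    """Percentile bootstrap interval for the mean (resampling with replacement)."""
    rng = rng or random.Random()
    n = len(sample)
    boot_means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    alpha = 1 - confidence
    low_index = round(n_resamples * alpha / 2)
    high_index = round(n_resamples * (1 - alpha / 2)) - 1
    return boot_means[low_index], boot_means[high_index]


def coverage(true_mean=10.0, sd=2.0, n=50, n_experiments=2000, seed=0):
    """Fraction of simulated samples whose interval contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        # Hypothetical population: normal with the given mean and sd.
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        low, high = normal_ci(sample)
        hits += low <= true_mean <= high
    return hits / n_experiments


if __name__ == "__main__":
    rng = random.Random(1)
    one_sample = [rng.gauss(10.0, 2.0) for _ in range(50)]
    print("normal-approximation CI:", normal_ci(one_sample))
    print("percentile bootstrap CI:", percentile_bootstrap_ci(one_sample, rng=rng))
    print("simulated coverage:", coverage())
```

The simulated coverage should land close to the nominal 95%, which is the "long-run" reading of a confidence interval in concrete form. Try shrinking n or swapping the normal population for a skewed one to see where the approximation starts to strain.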