Key Takeaways: Sampling, Estimation, and Confidence Intervals
This is your reference card for Chapter 22. The core message: you can learn a lot about a population from a well-chosen sample — but only if you're honest about what you know and what you don't.
Key Concepts
- You almost never have the whole population. The population is the group you care about; the sample is the subset you actually observe. The gap between them is the reason statistical inference exists.
- Sample quality matters more than sample size. A biased sample of 2 million is worse than an unbiased sample of 1,000. Randomness is the antidote to bias. Always ask: "Is my sample representative?"
- A point estimate is never the whole story. A single number (like "the average is 72%") hides how uncertain you are about that number. Always pair a point estimate with a measure of uncertainty.
- The sampling distribution is the key concept. If you could take many samples and compute the mean of each, those means would form a distribution centered at the true population mean. The spread of that distribution is the standard error.
- Confidence intervals communicate uncertainty honestly. A 95% CI means: if you repeated the process many times, about 95% of the intervals would contain the true value. It does not mean there's a 95% probability the true value is in this particular interval.
- The bootstrap is a powerful alternative to formulas. By resampling from your sample with replacement, you can build confidence intervals for any statistic — means, medians, percentiles, correlations, and more.
Core Formulas
Standard Error of the Mean: SE = s / √n
95% Confidence Interval: x̄ ± 1.96 × SE (using z, for large n)
x̄ ± t* × SE (using t, for any n)
Margin of Error: MOE = z* × SE or MOE = t* × SE
Required Sample Size: n = (z* × σ / MOE)²
Common z values:
- 90% confidence: z = 1.645
- 95% confidence: z = 1.960
- 99% confidence: z = 2.576
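The formulas above map directly to a few lines of Python. Here is a minimal sketch computing a t-based 95% CI two ways — by hand from SE = s / √n, and with `scipy.stats.t.interval` — on a small illustrative sample (the numbers are made up for demonstration, not taken from the chapter):

```python
import numpy as np
from scipy import stats

# Illustrative sample of exam scores (hypothetical data).
sample = np.array([68, 74, 71, 80, 65, 77, 72, 69, 75, 73])

n = len(sample)
xbar = sample.mean()
se = sample.std(ddof=1) / np.sqrt(n)        # SE = s / sqrt(n)

# By hand: x̄ ± t* × SE, with t* from the t distribution (n - 1 df).
t_star = stats.t.ppf(0.975, df=n - 1)
ci_hand = (xbar - t_star * se, xbar + t_star * se)

# Same interval via scipy.stats in one call.
ci_scipy = stats.t.interval(0.95, df=n - 1, loc=xbar, scale=se)

print(ci_hand)
print(ci_scipy)
```

For large n the z-based interval (x̄ ± 1.96 × SE) gives nearly the same answer; the t interval is the safe default because it also works for small samples.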
Types of Sampling
| Method | How It Works | Best For | Watch Out For |
|---|---|---|---|
| Simple random | Every individual has equal probability of selection | General-purpose, when no population structure is known | May under-represent small subgroups by chance |
| Stratified | Divide into subgroups, sample within each | When subgroups differ substantially | Requires knowing the subgroup structure in advance |
| Cluster | Randomly select groups, measure everyone in selected groups | When listing all individuals is impractical | Less precise than SRS for the same total sample size |
| Systematic | Select every k-th individual from a list | Assembly lines, ordered lists | Can miss patterns that match the sampling interval |
| Convenience | Sample whoever is easiest to reach | Quick pilot studies, preliminary exploration | Almost always biased; results not generalizable |
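To make the difference between simple random and stratified sampling concrete, here is a small sketch using NumPy. The population, subgroup names, and sizes are all hypothetical; the point is that proportional stratified sampling guarantees each subgroup appears in its true proportion, whereas simple random sampling only achieves that on average:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population with two subgroups of very different sizes.
population = {
    "urban": rng.normal(70, 10, size=9000),
    "rural": rng.normal(55, 12, size=1000),
}
total = sum(len(v) for v in population.values())

# Stratified sample of 100: draw from each stratum in proportion to its size.
stratified = {
    name: rng.choice(values, size=round(100 * len(values) / total), replace=False)
    for name, values in population.items()
}

# urban gets 90 draws, rural gets 10 — the true 90/10 split, by construction.
print({name: len(s) for name, s in stratified.items()})
```

A simple random sample of 100 from the pooled population would include roughly 10 rural observations on average, but could easily include 4 or 16 by chance — which is exactly the "may under-represent small subgroups" caveat in the table.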
Types of Bias to Watch For
| Bias Type | What Happens | Classic Example |
|---|---|---|
| Selection bias | Sampling process favors certain individuals | Literary Digest polling wealthy households only |
| Non-response bias | Responders differ from non-responders | Customer surveys answered mainly by angry or delighted customers |
| Survivorship bias | You only see the "survivors" of a process | Studying successful companies without studying the ones that failed |
| Convenience bias | You sample whoever is easy to reach | Polling people at a shopping mall on weekday afternoons |
| Voluntary response | People opt in to participate | Online "click to vote" polls attracting people with strong opinions |
The Bootstrap Procedure
1. Start with your sample of n observations
2. Draw a new sample of n observations WITH REPLACEMENT
3. Compute your statistic (mean, median, correlation, etc.)
4. Repeat steps 2-3 ten thousand times
5. The 2.5th and 97.5th percentiles of the results form your 95% confidence interval
The bootstrap works for virtually any statistic and requires no distributional assumptions, but it does need a reasonably sized sample (n ≥ 30 is a rough guideline).
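The five steps above can be sketched directly in NumPy. This is a percentile-bootstrap implementation (the function name and the sample data are illustrative, not from the chapter); note that it takes any statistic as an argument, here the median:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample; the procedure is identical for any dataset.
sample = rng.normal(loc=72, scale=8, size=50)

def bootstrap_ci(data, statistic, n_resamples=10_000, level=0.95):
    """Percentile bootstrap: resample with replacement, recompute, take percentiles."""
    boot_stats = np.array([
        statistic(rng.choice(data, size=len(data), replace=True))  # steps 2-3
        for _ in range(n_resamples)                                # step 4
    ])
    alpha = (1 - level) / 2
    return np.percentile(boot_stats, [100 * alpha, 100 * (1 - alpha)])  # step 5

lo, hi = bootstrap_ci(sample, np.median)   # median: no textbook formula needed
print(lo, hi)
```

Swapping `np.median` for `np.mean`, a percentile function, or a correlation wrapper changes nothing else — that flexibility is the bootstrap's main selling point.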
Critical Distinctions
| Concept | What It Is | What It Is NOT |
|---|---|---|
| Standard deviation | Spread of individual observations | Precision of the sample mean |
| Standard error | Spread of sample means across repeated samples | Spread of individual observations |
| Confidence level | Long-run success rate of the CI procedure | Probability that the truth is in this specific interval |
| Margin of error | Half-width of the CI; captures sampling variability | Total uncertainty (doesn't include bias or measurement error) |
| Point estimate | Your best single-number guess | The true value (which remains unknown) |
The Confidence Interval Interpretation
Correct: "If we repeated this study many times, about 95% of the resulting confidence intervals would contain the true parameter."
Incorrect: "There's a 95% probability that the true parameter is in this interval."
The difference is subtle but important. The truth is fixed. The interval is random. The 95% describes the procedure, not any single result.
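A short simulation makes the "95% describes the procedure" interpretation tangible. Here we play the role of nature: we fix a true mean (something we never know in practice, so this is purely illustrative), draw many samples, build a 95% CI from each, and count how often the intervals capture the truth:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_mu, sigma, n = 50.0, 10.0, 40   # known only because this is a simulation

trials, covered = 2000, 0
for _ in range(trials):
    sample = rng.normal(true_mu, sigma, size=n)
    se = sample.std(ddof=1) / np.sqrt(n)
    t_star = stats.t.ppf(0.975, df=n - 1)
    lo, hi = sample.mean() - t_star * se, sample.mean() + t_star * se
    covered += (lo <= true_mu <= hi)

print(covered / trials)   # close to 0.95
```

Any single interval either contains 50.0 or it doesn't; the 95% is the long-run hit rate across repeated intervals, which is exactly what the simulation measures.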
Key Trade-Offs
Higher confidence level (95% → 99%)
→ Wider interval (less precise)
→ More likely to capture the truth
Larger sample size (n = 50 → n = 200)
→ Narrower interval (more precise)
→ More expensive / time-consuming
Quadrupling n → Halves the margin of error
(Diminishing returns: each doubling of precision
requires 4x more data)
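Both trade-offs can be verified numerically with the MOE and sample-size formulas from above. A minimal sketch (σ and the target MOE are made-up illustrative values):

```python
import numpy as np
from scipy import stats

sigma = 10.0
z_star = stats.norm.ppf(0.975)            # ≈ 1.96 for 95% confidence

def moe(n):
    """Margin of error: z* × σ / √n."""
    return z_star * sigma / np.sqrt(n)

print(moe(50), moe(200))                  # quadrupling n halves the MOE

# Required sample size for a target MOE: n = (z* × σ / MOE)², rounded up.
target_moe = 1.0
n_needed = np.ceil((z_star * sigma / target_moe) ** 2)
print(n_needed)
```

Re-running with `stats.norm.ppf(0.995)` (the 99% z of 2.576) shows the other trade-off: the same n yields a wider interval.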
What You Should Be Able to Do Now
- [ ] Identify the population and sample in any research scenario
- [ ] Recognize common forms of sampling bias and assess whether a sample is likely to be representative
- [ ] Compute a confidence interval for a mean using both the formula and the bootstrap
- [ ] Interpret a confidence interval correctly, avoiding the most common misconception
- [ ] Explain why a large biased sample is worse than a small random sample
- [ ] Use `scipy.stats` to compute confidence intervals in Python
- [ ] Implement the bootstrap procedure from scratch for any statistic
- [ ] Determine the sample size needed to achieve a desired margin of error
- [ ] Communicate uncertainty to a non-technical audience
If you checked every box, you're ready for Chapter 23, where we'll use these same tools to test specific claims — hypothesis testing. The question shifts from "what's the plausible range?" to "is there enough evidence to reject this claim?"