Key Takeaways: Sampling, Estimation, and Confidence Intervals

This is your reference card for Chapter 22. The core message: you can learn a lot about a population from a well-chosen sample — but only if you're honest about what you know and what you don't.


Key Concepts

  • You almost never have the whole population. The population is the group you care about; the sample is the subset you actually observe. The gap between them is the reason statistical inference exists.

  • Sample quality matters more than sample size. A biased sample of 2 million is worse than an unbiased sample of 1,000. Randomness is the antidote to bias. Always ask: "Is my sample representative?"

  • A point estimate is never the whole story. A single number (like "the average is 72%") hides how uncertain you are about that number. Always pair a point estimate with a measure of uncertainty.

  • The sampling distribution is the key concept. If you could take many samples and compute the mean of each, those means would form a distribution centered at the true population mean. The spread of that distribution is the standard error.

  • Confidence intervals communicate uncertainty honestly. A 95% CI means: if you repeated the process many times, about 95% of the intervals would contain the true value. It does not mean there's a 95% probability the true value is in this particular interval.

  • The bootstrap is a powerful alternative to formulas. By resampling from your sample with replacement, you can build confidence intervals for any statistic — means, medians, percentiles, correlations, and more.


Core Formulas

Standard Error of the Mean:    SE = s / √n

95% Confidence Interval:       x̄ ± 1.96 × SE      (using z, for large n)
                               x̄ ± t* × SE        (using t, for any n)

Margin of Error:               MOE = z* × SE  or  MOE = t* × SE

Required Sample Size:          n = (z* × σ / MOE)²

Common z values:
    90% confidence: z = 1.645
    95% confidence: z = 1.960
    99% confidence: z = 2.576
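As a sketch, the formulas above map directly onto scipy.stats in Python. The sample data, the σ guess, and the target margin of error below are all hypothetical, chosen only to illustrate the calculations:

```python
import numpy as np
from scipy import stats

data = np.array([72, 68, 75, 80, 64, 71, 77, 69, 73, 70])  # hypothetical sample
n = len(data)
xbar = data.mean()
se = data.std(ddof=1) / np.sqrt(n)        # SE = s / sqrt(n)

# z-based 95% CI (appropriate for large n)
z = stats.norm.ppf(0.975)                 # 1.960
ci_z = (xbar - z * se, xbar + z * se)

# t-based 95% CI (valid for any n; wider when n is small)
tstar = stats.t.ppf(0.975, df=n - 1)
ci_t = (xbar - tstar * se, xbar + tstar * se)

# Required sample size for a desired margin of error,
# given a guess for the population sigma
sigma_guess, target_moe = 5.0, 1.0
n_required = int(np.ceil((z * sigma_guess / target_moe) ** 2))
```

Note that the t-interval is always at least as wide as the z-interval for the same data, which is why it is the safer default for small samples.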


Types of Sampling

  • Simple random — every individual has an equal probability of selection.
    Best for: general-purpose sampling when no population structure is known.
    Watch out for: may under-represent small subgroups by chance.

  • Stratified — divide the population into subgroups, then sample within each.
    Best for: populations whose subgroups differ substantially.
    Watch out for: requires knowing the subgroup structure in advance.

  • Cluster — randomly select groups, then measure everyone in the selected groups.
    Best for: situations where listing all individuals is impractical.
    Watch out for: less precise than SRS for the same total sample size.

  • Systematic — select every k-th individual from a list.
    Best for: assembly lines and other ordered lists.
    Watch out for: can miss patterns that match the sampling interval.

  • Convenience — sample whoever is easiest to reach.
    Best for: quick pilot studies and preliminary exploration only.
    Watch out for: almost always biased; results are not generalizable.
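The contrast between simple random and stratified sampling can be sketched with numpy. The population below is hypothetical: a small subgroup making up 10% of the population alongside a large one:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: a 10% "small" subgroup and a 90% "large" one
population = np.array(["small"] * 100 + ["large"] * 900)

# Simple random sample: the subgroup count varies by chance
srs = rng.choice(population, size=50, replace=False)

# Stratified sample: sample proportionally within each subgroup
strata = {label: np.flatnonzero(population == label)
          for label in ("small", "large")}
stratified = np.concatenate([
    rng.choice(idx, size=round(50 * len(idx) / len(population)), replace=False)
    for idx in strata.values()
])

# The stratified sample contains exactly 5 "small" members by construction;
# the SRS count merely fluctuates around 5.
print((srs == "small").sum(), (population[stratified] == "small").sum())
```

This is exactly the trade-off in the table: stratification guarantees subgroup representation, but only because the subgroup labels were known in advance.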

Types of Bias to Watch For

  • Selection bias — the sampling process favors certain individuals.
    Classic example: the Literary Digest poll, which sampled mainly wealthy households.

  • Non-response bias — responders differ systematically from non-responders.
    Classic example: customer surveys answered mainly by angry or delighted customers.

  • Survivorship bias — you only see the "survivors" of a process.
    Classic example: studying successful companies without studying the ones that failed.

  • Convenience bias — you sample whoever is easy to reach.
    Classic example: polling people at a shopping mall on weekday afternoons.

  • Voluntary response bias — people opt in to participate.
    Classic example: online "click to vote" polls attracting people with strong opinions.

The Bootstrap Procedure

1. Start with your sample of n observations
2. Draw a new sample of n observations WITH REPLACEMENT
3. Compute your statistic (mean, median, correlation, etc.)
4. Repeat steps 2-3 many times (10,000 is a common choice)
5. The 2.5th and 97.5th percentiles of the results
   form your 95% confidence interval

The bootstrap works for virtually any statistic. It doesn't require distributional assumptions. It needs a reasonably sized sample (n ≥ 30 is a rough guideline).
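The five steps above can be implemented from scratch in a few lines of numpy. The sample data and resample count here are illustrative, not from the chapter:

```python
import numpy as np

def bootstrap_ci(data, statistic, n_resamples=10_000, ci=0.95, seed=None):
    """Percentile bootstrap confidence interval for any statistic of a 1-D sample."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    # Steps 2-4: resample with replacement, compute the statistic each time
    boots = np.array([
        statistic(rng.choice(data, size=len(data), replace=True))
        for _ in range(n_resamples)
    ])
    # Step 5: take the tail percentiles of the bootstrap distribution
    alpha = (1 - ci) / 2
    return np.percentile(boots, [100 * alpha, 100 * (1 - alpha)])

# Works for the mean, and just as easily for the median or any other statistic
sample = np.array([72, 68, 75, 80, 64, 71, 77, 69, 73, 70])
lo, hi = bootstrap_ci(sample, np.mean, seed=42)
```

Because `statistic` is just a function argument, swapping in `np.median`, a percentile, or a correlation requires no new formula — which is the point of the bootstrap.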


Critical Distinctions

  • Standard deviation — the spread of individual observations.
    NOT the precision of the sample mean.

  • Standard error — the spread of sample means across repeated samples.
    NOT the spread of individual observations.

  • Confidence level — the long-run success rate of the CI procedure.
    NOT the probability that the truth is in this specific interval.

  • Margin of error — half the width of the CI; captures sampling variability.
    NOT total uncertainty (it doesn't include bias or measurement error).

  • Point estimate — your best single-number guess.
    NOT the true value (which remains unknown).
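A short simulation makes the standard-deviation-vs-standard-error distinction concrete: the spread of individual values stays put, while the spread of sample means is smaller by a factor of √n. The population parameters below (mean 50, SD 10) are made up for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_samples = 100, 5_000

# Draw many samples of size n from a hypothetical population (mean 50, SD 10)
samples = rng.normal(loc=50, scale=10, size=(n_samples, n))

sd_individuals = samples.std(ddof=1)            # ~10: spread of observations
se_of_means = samples.mean(axis=1).std(ddof=1)  # ~10 / sqrt(100) = 1

print(sd_individuals, se_of_means)
```

The second number is roughly a tenth of the first, matching SE = s / √n for n = 100.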

The Confidence Interval Interpretation

Correct: "If we repeated this study many times, about 95% of the resulting confidence intervals would contain the true parameter."

Incorrect: "There's a 95% probability that the true parameter is in this interval."

The difference is subtle but important. The truth is fixed. The interval is random. The 95% describes the procedure, not any single result.


Key Trade-Offs

Higher confidence level (95% → 99%)
  → Wider interval (less precise)
  → More likely to capture the truth

Larger sample size (n = 50 → n = 200)
  → Narrower interval (more precise)
  → More expensive / time-consuming

Quadrupling n → Halves the margin of error
  (Diminishing returns: each doubling of precision
   requires 4x more data)
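The quadrupling rule falls straight out of SE = s / √n. A quick check, using an assumed s = 10:

```python
import math

def moe(s, n, z=1.96):
    # Margin of error for a mean: z * s / sqrt(n)
    return z * s / math.sqrt(n)

m50, m200 = moe(10, 50), moe(10, 200)
# Quadrupling n from 50 to 200 halves the margin of error,
# because sqrt(200) / sqrt(50) = 2.
```

The ratio m50 / m200 is exactly 2 regardless of s or z, which is why each doubling of precision costs 4x the data.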

What You Should Be Able to Do Now

  • [ ] Identify the population and sample in any research scenario
  • [ ] Recognize common forms of sampling bias and assess whether a sample is likely to be representative
  • [ ] Compute a confidence interval for a mean using both the formula and the bootstrap
  • [ ] Interpret a confidence interval correctly, avoiding the most common misconception
  • [ ] Explain why a large biased sample is worse than a small random sample
  • [ ] Use scipy.stats to compute confidence intervals in Python
  • [ ] Implement the bootstrap procedure from scratch for any statistic
  • [ ] Determine the sample size needed to achieve a desired margin of error
  • [ ] Communicate uncertainty to a non-technical audience

If you checked every box, you're ready for Chapter 23, where we'll use these same tools to test specific claims — hypothesis testing. The question shifts from "what's the plausible range?" to "is there enough evidence to reject this claim?"