Key Takeaways: Sampling, Estimation, and Confidence Intervals

This is your reference card for Chapter 22. The core message: you can learn a lot about a population from a well-chosen sample — but only if you're honest about what you know and what you don't.


Key Concepts

  • You almost never have the whole population. The population is the group you care about; the sample is the subset you actually observe. The gap between them is the reason statistical inference exists.

  • Sample quality matters more than sample size. A biased sample of 2 million is worse than an unbiased sample of 1,000. Randomness is the antidote to bias. Always ask: "Is my sample representative?"

  • A point estimate is never the whole story. A single number (like "the average is 72%") hides how uncertain you are about that number. Always pair a point estimate with a measure of uncertainty.

  • The sampling distribution is the key concept. If you could take many samples and compute the mean of each, those means would form a distribution centered at the true population mean. The spread of that distribution is the standard error.

  • Confidence intervals communicate uncertainty honestly. A 95% CI means: if you repeated the process many times, about 95% of the intervals would contain the true value. It does not mean there's a 95% probability the true value is in this particular interval.

  • The bootstrap is a powerful alternative to formulas. By resampling from your sample with replacement, you can build confidence intervals for any statistic — means, medians, percentiles, correlations, and more.


Core Formulas

Standard Error of the Mean:    SE = s / √n

95% Confidence Interval:       x̄ ± 1.96 × SE      (using z, for large n)
                               x̄ ± t* × SE        (using t, for any n)

Margin of Error:               MOE = z* × SE  or  MOE = t* × SE

Required Sample Size:          n = (z* × σ / MOE)²

Common z values:
    90% confidence: z = 1.645
    95% confidence: z = 1.960
    99% confidence: z = 2.576
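As a sketch, the formulas above map directly onto scipy.stats in Python. The sample data, the σ guess, and the target margin of error below are all hypothetical, chosen only to illustrate the calculations:

```python
import numpy as np
from scipy import stats

data = np.array([72, 68, 75, 80, 64, 71, 77, 69, 73, 70])  # hypothetical sample
n = len(data)
xbar = data.mean()
se = data.std(ddof=1) / np.sqrt(n)        # SE = s / sqrt(n)

# z-based 95% CI (appropriate for large n)
z = stats.norm.ppf(0.975)                 # 1.960
ci_z = (xbar - z * se, xbar + z * se)

# t-based 95% CI (valid for any n; wider when n is small)
tstar = stats.t.ppf(0.975, df=n - 1)
ci_t = (xbar - tstar * se, xbar + tstar * se)

# Required sample size for a desired margin of error,
# given a guess for the population sigma
sigma_guess, target_moe = 5.0, 1.0
n_required = int(np.ceil((z * sigma_guess / target_moe) ** 2))
```

Note that the t-interval is always at least as wide as the z-interval for the same data, which is why it is the safer default for small samples.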


Types of Sampling

  • Simple random — every individual has an equal probability of selection.
    Best for: general-purpose sampling when no population structure is known.
    Watch out for: may under-represent small subgroups by chance.

  • Stratified — divide the population into subgroups, then sample within each.
    Best for: populations whose subgroups differ substantially.
    Watch out for: requires knowing the subgroup structure in advance.

  • Cluster — randomly select groups, then measure everyone in the selected groups.
    Best for: situations where listing all individuals is impractical.
    Watch out for: less precise than SRS for the same total sample size.

  • Systematic — select every k-th individual from a list.
    Best for: assembly lines and other ordered lists.
    Watch out for: can miss patterns that match the sampling interval.

  • Convenience — sample whoever is easiest to reach.
    Best for: quick pilot studies and preliminary exploration only.
    Watch out for: almost always biased; results are not generalizable.
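The contrast between simple random and stratified sampling can be sketched with numpy. The population below is hypothetical: a small subgroup making up 10% of the population alongside a large one:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: a 10% "small" subgroup and a 90% "large" one
population = np.array(["small"] * 100 + ["large"] * 900)

# Simple random sample: the subgroup count varies by chance
srs = rng.choice(population, size=50, replace=False)

# Stratified sample: sample proportionally within each subgroup
strata = {label: np.flatnonzero(population == label)
          for label in ("small", "large")}
stratified = np.concatenate([
    rng.choice(idx, size=round(50 * len(idx) / len(population)), replace=False)
    for idx in strata.values()
])

# The stratified sample contains exactly 5 "small" members by construction;
# the SRS count merely fluctuates around 5.
print((srs == "small").sum(), (population[stratified] == "small").sum())
```

This is exactly the trade-off in the table: stratification guarantees subgroup representation, but only because the subgroup labels were known in advance.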

Types of Bias to Watch For

  • Selection bias — the sampling process favors certain individuals.
    Classic example: the Literary Digest poll, which sampled mainly wealthy households.

  • Non-response bias — responders differ systematically from non-responders.
    Classic example: customer surveys answered mainly by angry or delighted customers.

  • Survivorship bias — you only see the "survivors" of a process.
    Classic example: studying successful companies without studying the ones that failed.

  • Convenience bias — you sample whoever is easy to reach.
    Classic example: polling people at a shopping mall on weekday afternoons.

  • Voluntary response bias — people opt in to participate.
    Classic example: online "click to vote" polls attracting people with strong opinions.

The Bootstrap Procedure

1. Start with your sample of n observations
2. Draw a new sample of n observations WITH REPLACEMENT
3. Compute your statistic (mean, median, correlation, etc.)
4. Repeat steps 2-3 many times (10,000 is a common choice)
5. The 2.5th and 97.5th percentiles of the results
   form your 95% confidence interval

The bootstrap works for virtually any statistic. It doesn't require distributional assumptions. It needs a reasonably sized sample (n ≥ 30 is a rough guideline).
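The five steps above can be implemented from scratch in a few lines of numpy. The sample data and resample count here are illustrative, not from the chapter:

```python
import numpy as np

def bootstrap_ci(data, statistic, n_resamples=10_000, ci=0.95, seed=None):
    """Percentile bootstrap confidence interval for any statistic of a 1-D sample."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    # Steps 2-4: resample with replacement, compute the statistic each time
    boots = np.array([
        statistic(rng.choice(data, size=len(data), replace=True))
        for _ in range(n_resamples)
    ])
    # Step 5: take the tail percentiles of the bootstrap distribution
    alpha = (1 - ci) / 2
    return np.percentile(boots, [100 * alpha, 100 * (1 - alpha)])

# Works for the mean, and just as easily for the median or any other statistic
sample = np.array([72, 68, 75, 80, 64, 71, 77, 69, 73, 70])
lo, hi = bootstrap_ci(sample, np.mean, seed=42)
```

Because `statistic` is just a function argument, swapping in `np.median`, a percentile, or a correlation requires no new formula — which is the point of the bootstrap.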


Critical Distinctions

  • Standard deviation — the spread of individual observations.
    NOT the precision of the sample mean.

  • Standard error — the spread of sample means across repeated samples.
    NOT the spread of individual observations.

  • Confidence level — the long-run success rate of the CI procedure.
    NOT the probability that the truth is in this specific interval.

  • Margin of error — half the width of the CI; captures sampling variability.
    NOT total uncertainty (it doesn't include bias or measurement error).

  • Point estimate — your best single-number guess.
    NOT the true value (which remains unknown).
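A short simulation makes the standard-deviation-vs-standard-error distinction concrete: the spread of individual values stays put, while the spread of sample means is smaller by a factor of √n. The population parameters below (mean 50, SD 10) are made up for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_samples = 100, 5_000

# Draw many samples of size n from a hypothetical population (mean 50, SD 10)
samples = rng.normal(loc=50, scale=10, size=(n_samples, n))

sd_individuals = samples.std(ddof=1)            # ~10: spread of observations
se_of_means = samples.mean(axis=1).std(ddof=1)  # ~10 / sqrt(100) = 1

print(sd_individuals, se_of_means)
```

The second number is roughly a tenth of the first, matching SE = s / √n for n = 100.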

The Confidence Interval Interpretation

Correct: "If we repeated this study many times, about 95% of the resulting confidence intervals would contain the true parameter."

Incorrect: "There's a 95% probability that the true parameter is in this interval."

The difference is subtle but important. The truth is fixed. The interval is random. The 95% describes the procedure, not any single result.


Key Trade-Offs

Higher confidence level (95% → 99%)
  → Wider interval (less precise)
  → More likely to capture the truth

Larger sample size (n = 50 → n = 200)
  → Narrower interval (more precise)
  → More expensive / time-consuming

Quadrupling n → Halves the margin of error
  (Diminishing returns: each doubling of precision
   requires 4x more data)
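The quadrupling rule falls straight out of SE = s / √n. A quick check, using an assumed s = 10:

```python
import math

def moe(s, n, z=1.96):
    # Margin of error for a mean: z * s / sqrt(n)
    return z * s / math.sqrt(n)

m50, m200 = moe(10, 50), moe(10, 200)
# Quadrupling n from 50 to 200 halves the margin of error,
# because sqrt(200) / sqrt(50) = 2.
```

The ratio m50 / m200 is exactly 2 regardless of s or z, which is why each doubling of precision costs 4x the data.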

What You Should Be Able to Do Now

  • [ ] Identify the population and sample in any research scenario
  • [ ] Recognize common forms of sampling bias and assess whether a sample is likely to be representative
  • [ ] Compute a confidence interval for a mean using both the formula and the bootstrap
  • [ ] Interpret a confidence interval correctly, avoiding the most common misconception
  • [ ] Explain why a large biased sample is worse than a small random sample
  • [ ] Use scipy.stats to compute confidence intervals in Python
  • [ ] Implement the bootstrap procedure from scratch for any statistic
  • [ ] Determine the sample size needed to achieve a desired margin of error
  • [ ] Communicate uncertainty to a non-technical audience

If you checked every box, you're ready for Chapter 23, where we'll use these same tools to test specific claims — hypothesis testing. The question shifts from "what's the plausible range?" to "is there enough evidence to reject this claim?"