Chapter 8 Key Takeaways

The Fundamental Lesson

In 1936, the Literary Digest collected 2.4 million responses and was catastrophically wrong; Gallup used about 50,000 and was right. Representativeness matters more than size.


Probability vs. Nonprobability Sampling

Feature               | Probability sampling                             | Nonprobability sampling
Selection probability | Known and nonzero for all population members     | Unknown or zero for some members
Statistical inference | Valid; margin of error calculable                | Technically invalid; MOE is an approximation
Coverage              | Can cover full population if frame is complete   | Depends on who self-selects
Common examples       | RDD, address-based sampling, voter file samples  | Online opt-in panels, convenience samples

The Four Sampling Methods

Simple Random Sample (SRS): Every sample of size n equally likely. Theoretical gold standard; requires complete sampling frame. MOE formula derived from SRS assumptions.

Systematic Sampling: Every k-th element from an ordered list. Practical; equivalent to SRS if list is randomly ordered. Watch for periodic patterns in the frame.

Stratified Sampling: Independent SRS within subgroups (strata). Reduces variance for heterogeneous populations; supports subgroup analysis via oversampling. Requires design weights when strata are sampled at different rates.

Cluster Sampling: Select groups, then individuals within groups. Reduces cost of face-to-face surveys. Statistical cost: design effect (DEFF) inflates variance due to within-cluster similarity.
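The four methods can be sketched in a few lines of Python. This is a toy illustration on a hypothetical population of 1,000 numbered units; the strata split, cluster size, and intra-class correlation are all made up for demonstration.

```python
import random

population = list(range(1, 1001))  # toy population of 1,000 unit IDs
n = 100

# Simple random sample: every size-n subset equally likely
srs = random.sample(population, n)

# Systematic sample: every k-th element after a random start
k = len(population) // n
start = random.randrange(k)
systematic = population[start::k]

# Stratified sample: independent SRS within each stratum
# (two hypothetical strata, split at ID 500, sampled equally)
strata = {"low": population[:500], "high": population[500:]}
stratified = [unit for group in strata.values()
              for unit in random.sample(group, n // 2)]

# Cluster sample: draw whole groups, then take everyone within them
clusters = [population[i:i + 50] for i in range(0, 1000, 50)]  # 20 clusters of 50
chosen = random.sample(clusters, 2)
cluster_sample = [unit for c in chosen for unit in c]

# Design effect for cluster samples: DEFF = 1 + (m - 1) * rho,
# where m is cluster size and rho is the intra-class correlation
m, rho = 50, 0.05            # hypothetical values
deff = 1 + (m - 1) * rho     # ≈ 3.45: variance ~3.45x an SRS of the same n
```

Note how a modest within-cluster correlation (rho = 0.05) more than triples the variance relative to an SRS of the same size; this is the "statistical cost" of cluster sampling named above.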


Margin of Error Quick Reference

Rule of thumb: MOE ≈ 1/√n at 95% confidence

n     | Approximate MOE
400   | ±5%
600   | ±4%
1,000 | ±3.2%
1,600 | ±2.5%

Key limitations of reported MOE:

  • Assumes probability sampling
  • Captures only random sampling error
  • Does NOT capture coverage bias, nonresponse bias, question wording effects, or weighting errors
  • Subgroup MOE is larger: use √(subgroup n) in the denominator
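The 1/√n rule of thumb is the worst case of the standard SRS formula for a proportion, MOE = z·√(p(1−p)/n), which peaks at p = 0.5. A minimal sketch:

```python
import math

def moe(n, p=0.5, z=1.96):
    """95% margin of error for a proportion, under SRS assumptions."""
    return z * math.sqrt(p * (1 - p) / n)

print(round(moe(1000), 3))             # 0.031, i.e. about ±3.1%
print(round(1 / math.sqrt(1000), 3))   # rule of thumb: 0.032

# Subgroup MOE uses the subgroup's n, not the full sample's
print(round(moe(250), 3))              # a 250-person subgroup: ±6.2%
```

The last line shows why subgroup estimates in a 1,000-person poll are far noisier than the headline number: the denominator shrinks to the subgroup's own size.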


The Sampling Frame Problem

Every frame has gaps. Common frames and their coverage gaps:

Frame                        | Good for                 | Coverage gaps
Voter file + phone match     | Registered voters        | ~29% unmatched phones; higher gaps in high-mobility, heavily Latino areas
RDD (landline)               | Formerly near-universal  | Misses the ~60% of households that are cell-phone-only
Online opt-in panel          | Speed, cost, scale       | Self-selection; misses non-internet users; unmeasured systematic differences
Address-based sampling (ABS) | Near-complete coverage   | No built-in contact info; must mail first; slower

Weighting Methods

Post-stratification: Assign weights so sample demographic composition matches population targets. Corrects for observed imbalances; cannot correct for unmeasured ones.

Raking (iterative proportional fitting): Simultaneously match multiple marginal distributions by cycling through weighting variables until convergence. Requires only marginal targets, not full joint distribution.

MRP (Multilevel Regression and Poststratification): Build regression model predicting opinion from demographics and geography; apply to Census population cells to generate geographic estimates. Best for small-geography estimation from large surveys.
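Raking is simple enough to sketch directly. This toy example uses a hypothetical 2×2 table (two age groups by two education groups, with made-up counts and targets) and cycles row and column adjustments until both margins match:

```python
# Hypothetical sample counts: rows = age groups, columns = education groups
counts = [[200.0, 100.0],
          [300.0, 400.0]]
row_targets = [0.40, 0.60]   # population shares for the age groups
col_targets = [0.55, 0.45]   # population shares for the education groups

total = sum(sum(row) for row in counts)
weights = [[c / total for c in row] for row in counts]  # start at sample shares

for _ in range(50):  # iterative proportional fitting: cycle until convergence
    # Scale each row so row margins match the age targets
    for i, row in enumerate(weights):
        s = sum(row)
        weights[i] = [w * row_targets[i] / s for w in row]
    # Scale each column so column margins match the education targets
    for j in range(2):
        s = weights[0][j] + weights[1][j]
        for i in range(2):
            weights[i][j] *= col_targets[j] / s

row_margins = [sum(row) for row in weights]
col_margins = [weights[0][j] + weights[1][j] for j in range(2)]
```

After convergence, both margins hit their targets even though we never specified the joint age-by-education distribution; that is raking's practical advantage over full post-stratification, which needs the joint table.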


The Declining Response Rate Crisis

  • 1970s-80s: Response rates 60-80%
  • 2020s: Response rates below 6% in commercial telephone polling
  • Low response rates are NOT automatically fatal — depends on whether response propensity is correlated with the outcome of interest
  • When it is correlated (as with education and candidate preference in recent cycles), systematic bias results
  • Remedy: appropriate demographic weighting, MRP, multi-mode approaches, transparency

Who Gets Counted: The Political Stakes

  • Likely voter screens systematically underweight lower-propensity voters (younger, more diverse, lower income)
  • Phone frames underrepresent cell-only households and non-English-dominant speakers
  • Online panels underrepresent non-internet users and lower-engagement adults
  • Sampling design is a political act: it embeds assumptions about whose preferences count and who will vote

Trish's Field Rules

  1. Know your frame before you design the sample — its gaps determine your biggest biases
  2. Oversample strategically, then weight back to population proportions
  3. Track response disposition from day one — early divergence predicts final sample profile problems
  4. Document everything — methodology transparency is professional and ethical
  5. The effective n after weighting is what matters for precision estimates, not the raw n
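Rule 5's "effective n" is conventionally computed with Kish's formula, n_eff = (Σw)² / Σw². A small sketch with hypothetical weights:

```python
def effective_n(weights):
    """Kish effective sample size: n_eff = (sum w)^2 / sum(w^2)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)

# Equal weights: no precision loss
print(effective_n([1.0] * 1000))              # 1000.0

# Heavy weighting (hypothetical): half the sample down-weighted to 0.5,
# half up-weighted to 2.0
print(effective_n([0.5] * 500 + [2.0] * 500)) # ≈ 735.3
```

Here a raw n of 1,000 delivers the precision of roughly 735 respondents, so the reported MOE should be computed from 735, not 1,000.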

The Core Question

"Who is in your sample?" — this question determines everything about what a poll can tell you and what it cannot. The best analysts ask it before reading the headline number.