Chapter 8 Key Takeaways

The Fundamental Lesson

In 1936, the Literary Digest collected 2.4 million responses and was catastrophically wrong; Gallup used about 50,000 and was right. Representativeness matters more than size.


Probability vs. Nonprobability Sampling

Feature               | Probability sampling                             | Nonprobability sampling
Selection probability | Known and nonzero for all population members     | Unknown or zero for some members
Statistical inference | Valid; margin of error calculable                | Technically invalid; MOE is an approximation
Coverage              | Can cover full population if frame is complete   | Depends on who self-selects
Common examples       | RDD, address-based sampling, voter file samples  | Online opt-in panels, convenience samples

The Four Sampling Methods

Simple Random Sample (SRS): Every sample of size n equally likely. Theoretical gold standard; requires complete sampling frame. MOE formula derived from SRS assumptions.

Systematic Sampling: Every k-th element from an ordered list. Practical; equivalent to SRS if list is randomly ordered. Watch for periodic patterns in the frame.

Stratified Sampling: Independent SRS within subgroups (strata). Reduces variance for heterogeneous populations; supports subgroup analysis via oversampling. Requires design weights when strata are sampled at different rates.

Cluster Sampling: Select groups, then individuals within groups. Reduces cost of face-to-face surveys. Statistical cost: design effect (DEFF) inflates variance due to within-cluster similarity.
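The four methods can be sketched in a few lines of Python. This is a toy illustration on a hypothetical population of 1,000 numbered units; the strata split, cluster size, and intra-class correlation are all made up for demonstration.

```python
import random

population = list(range(1, 1001))  # toy population of 1,000 unit IDs
n = 100

# Simple random sample: every size-n subset equally likely
srs = random.sample(population, n)

# Systematic sample: every k-th element after a random start
k = len(population) // n
start = random.randrange(k)
systematic = population[start::k]

# Stratified sample: independent SRS within each stratum
# (two hypothetical strata, split at ID 500, sampled equally)
strata = {"low": population[:500], "high": population[500:]}
stratified = [unit for group in strata.values()
              for unit in random.sample(group, n // 2)]

# Cluster sample: draw whole groups, then take everyone within them
clusters = [population[i:i + 50] for i in range(0, 1000, 50)]  # 20 clusters of 50
chosen = random.sample(clusters, 2)
cluster_sample = [unit for c in chosen for unit in c]

# Design effect for cluster samples: DEFF = 1 + (m - 1) * rho,
# where m is cluster size and rho is the intra-class correlation
m, rho = 50, 0.05            # hypothetical values
deff = 1 + (m - 1) * rho     # ≈ 3.45: variance ~3.45x an SRS of the same n
```

Note how a modest within-cluster correlation (rho = 0.05) more than triples the variance relative to an SRS of the same size; this is the "statistical cost" of cluster sampling named above.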


Margin of Error Quick Reference

Rule of thumb: MOE ≈ 1/√n at 95% confidence

n     | Approximate MOE
400   | ±5%
600   | ±4%
1,000 | ±3.2%
1,600 | ±2.5%

Key limitations of reported MOE:

  • Assumes probability sampling
  • Captures only random sampling error
  • Does NOT capture coverage bias, nonresponse bias, question wording effects, or weighting errors
  • Subgroup MOE is larger: use √(subgroup n) in the denominator
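The 1/√n rule of thumb is the worst case of the standard SRS formula for a proportion, MOE = z·√(p(1−p)/n), which peaks at p = 0.5. A minimal sketch:

```python
import math

def moe(n, p=0.5, z=1.96):
    """95% margin of error for a proportion, under SRS assumptions."""
    return z * math.sqrt(p * (1 - p) / n)

print(round(moe(1000), 3))             # 0.031, i.e. about ±3.1%
print(round(1 / math.sqrt(1000), 3))   # rule of thumb: 0.032

# Subgroup MOE uses the subgroup's n, not the full sample's
print(round(moe(250), 3))              # a 250-person subgroup: ±6.2%
```

The last line shows why subgroup estimates in a 1,000-person poll are far noisier than the headline number: the denominator shrinks to the subgroup's own size.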


The Sampling Frame Problem

Every frame has gaps. Common frames and their coverage gaps:

Frame                        | Good for                 | Coverage gaps
Voter file + phone match     | Registered voters        | ~29% unmatched phones; higher gaps in high-mobility, heavily Latino areas
RDD (landline)               | Formerly near-universal  | Misses the ~60% of households that are cell-phone-only
Online opt-in panel          | Speed, cost, scale       | Self-selection; misses non-internet users; unmeasured systematic differences
Address-based sampling (ABS) | Near-complete coverage   | No built-in contact info; must mail first; slower

Weighting Methods

Post-stratification: Assign weights so sample demographic composition matches population targets. Corrects for observed imbalances; cannot correct for unmeasured ones.

Raking (iterative proportional fitting): Simultaneously match multiple marginal distributions by cycling through weighting variables until convergence. Requires only marginal targets, not full joint distribution.

MRP (Multilevel Regression and Poststratification): Build regression model predicting opinion from demographics and geography; apply to Census population cells to generate geographic estimates. Best for small-geography estimation from large surveys.
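Raking is simple enough to sketch directly. This toy example uses a hypothetical 2×2 table (two age groups by two education groups, with made-up counts and targets) and cycles row and column adjustments until both margins match:

```python
# Hypothetical sample counts: rows = age groups, columns = education groups
counts = [[200.0, 100.0],
          [300.0, 400.0]]
row_targets = [0.40, 0.60]   # population shares for the age groups
col_targets = [0.55, 0.45]   # population shares for the education groups

total = sum(sum(row) for row in counts)
weights = [[c / total for c in row] for row in counts]  # start at sample shares

for _ in range(50):  # iterative proportional fitting: cycle until convergence
    # Scale each row so row margins match the age targets
    for i, row in enumerate(weights):
        s = sum(row)
        weights[i] = [w * row_targets[i] / s for w in row]
    # Scale each column so column margins match the education targets
    for j in range(2):
        s = weights[0][j] + weights[1][j]
        for i in range(2):
            weights[i][j] *= col_targets[j] / s

row_margins = [sum(row) for row in weights]
col_margins = [weights[0][j] + weights[1][j] for j in range(2)]
```

After convergence, both margins hit their targets even though we never specified the joint age-by-education distribution; that is raking's practical advantage over full post-stratification, which needs the joint table.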


The Declining Response Rate Crisis

  • 1970s-80s: Response rates 60-80%
  • 2020s: Response rates below 6% in commercial telephone polling
  • Low response rates are NOT automatically fatal — depends on whether response propensity is correlated with the outcome of interest
  • When it is correlated (as with education and candidate preference in recent cycles), systematic bias results
  • Remedy: appropriate demographic weighting, MRP, multi-mode approaches, transparency

Who Gets Counted: The Political Stakes

  • Likely voter screens systematically underweight lower-propensity voters (younger, more diverse, lower income)
  • Phone frames underrepresent cell-only households and non-English-dominant speakers
  • Online panels underrepresent non-internet users and lower-engagement adults
  • Sampling design is a political act: it embeds assumptions about whose preferences count and who will vote

Trish's Field Rules

  1. Know your frame before you design the sample — its gaps determine your biggest biases
  2. Oversample strategically, then weight back to population proportions
  3. Track response disposition from day one — early divergence predicts final sample profile problems
  4. Document everything — methodology transparency is professional and ethical
  5. The effective n after weighting is what matters for precision estimates, not the raw n
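Rule 5's "effective n" is conventionally computed with Kish's formula, n_eff = (Σw)² / Σw². A small sketch with hypothetical weights:

```python
def effective_n(weights):
    """Kish effective sample size: n_eff = (sum w)^2 / sum(w^2)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)

# Equal weights: no precision loss
print(effective_n([1.0] * 1000))              # 1000.0

# Heavy weighting (hypothetical): half the sample down-weighted to 0.5,
# half up-weighted to 2.0
print(effective_n([0.5] * 500 + [2.0] * 500)) # ≈ 735.3
```

Here a raw n of 1,000 delivers the precision of roughly 735 respondents, so the reported MOE should be computed from 735, not 1,000.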

The Core Question

"Who is in your sample?" — this question determines everything about what a poll can tell you and what it cannot. The best analysts ask it before reading the headline number.