Chapter 8 Key Takeaways
The Fundamental Lesson
The Literary Digest used 2.4 million responses and was catastrophically wrong. Gallup used 50,000 and was right. Representativeness matters more than size.
Probability vs. Nonprobability Sampling
| Feature | Probability Sampling | Nonprobability Sampling |
|---|---|---|
| Selection probability | Known and nonzero for all population members | Unknown or zero for some members |
| Statistical inference | Valid — margin of error calculable | Technically invalid — MOE is an approximation |
| Coverage | Can cover full population if frame is complete | Coverage depends on who self-selects |
| Common examples | RDD, address-based sampling, voter file samples | Online opt-in panels, convenience samples |
The Four Sampling Methods
Simple Random Sample (SRS): Every sample of size n equally likely. Theoretical gold standard; requires complete sampling frame. MOE formula derived from SRS assumptions.
Systematic Sampling: Every k-th element from an ordered list. Practical; equivalent to SRS if list is randomly ordered. Watch for periodic patterns in the frame.
Stratified Sampling: Independent SRS within subgroups (strata). Reduces variance for heterogeneous populations; supports subgroup analysis via oversampling. Requires design weights when strata are sampled at different rates.
Cluster Sampling: Select groups, then individuals within groups. Reduces cost of face-to-face surveys. Statistical cost: design effect (DEFF) inflates variance due to within-cluster similarity.
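The four designs above can be sketched with Python's standard library. A minimal illustration — the toy frame of 1,000 IDs, the stratum split, and the intracluster correlation value are all hypothetical, chosen only to show the mechanics:

```python
import random

random.seed(8)  # reproducible for illustration
population = list(range(1000))  # toy sampling frame of 1,000 IDs

# Simple random sample: every subset of size n is equally likely
srs = random.sample(population, 100)

# Systematic sample: every k-th element after a random start
k = len(population) // 100
start = random.randrange(k)
systematic = population[start::k]

# Stratified sample: independent SRS within each stratum
strata = {"a": population[:700], "b": population[700:]}
stratified = {name: random.sample(members, 50) for name, members in strata.items()}
# Stratum "b" is oversampled (50 of 300 vs 50 of 700), so design
# weights (population size / sampled size per stratum) are needed:
weights = {name: len(members) / 50 for name, members in strata.items()}

# Cluster sample: pick whole groups, then units inside them
clusters = [population[i:i + 50] for i in range(0, 1000, 50)]  # 20 clusters of 50
chosen = random.sample(clusters, 2)
cluster_sample = [unit for c in chosen for unit in c]

# Design effect for clustering: DEFF = 1 + (m - 1) * rho,
# where m is cluster size and rho is intracluster correlation
deff = 1 + (50 - 1) * 0.05  # 3.45 at a modest rho of 0.05

print(len(srs), len(systematic), len(cluster_sample))  # 100 100 100
```

The design-weight and DEFF lines show why the statistical bookkeeping differs across methods even when the final sample sizes are identical.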
Margin of Error Quick Reference
Rule of thumb: MOE ≈ 1/√n at 95% confidence
| n | Approximate MOE |
|---|---|
| 400 | ±5% |
| 600 | ±4% |
| 1,000 | ±3.2% |
| 1,600 | ±2.5% |
Key limitations of reported MOE:
- Assumes probability sampling
- Captures only random sampling error
- Does NOT capture coverage bias, nonresponse bias, question wording effects, or weighting errors
- Subgroup MOE is larger: use √(subgroup n) in the denominator
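The rule of thumb and the exact SRS formula can be checked side by side. A minimal sketch — z = 1.96 and the worst-case p = 0.5 are the standard assumptions behind the table above:

```python
import math

def moe_exact(n, p=0.5, z=1.96):
    """95% margin of error for a proportion under SRS assumptions."""
    return z * math.sqrt(p * (1 - p) / n)

def moe_rule_of_thumb(n):
    """Quick approximation: MOE is about 1/sqrt(n)."""
    return 1 / math.sqrt(n)

for n in (400, 600, 1000, 1600):
    print(n, f"{moe_exact(n):.3f}", f"{moe_rule_of_thumb(n):.3f}")

# Subgroup MOE uses the subgroup n: a 300-person subgroup of a
# 1,000-person poll has MOE of about 1/sqrt(300), roughly +/-5.8%,
# not the +/-3.2% quoted for the full sample.
```

Because 1.96 · √0.25 ≈ 0.98, the exact SRS value sits just under the 1/√n approximation at every n in the table.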
The Sampling Frame Problem
Every frame has gaps. Common frames and their coverage gaps:
| Frame | Good for | Coverage gaps |
|---|---|---|
| Voter file + phone match | Registered voters | ~29% unmatched phones; higher gaps in high-mobility, high-Latino areas |
| RDD (landline) | Once near-universal landline coverage | Misses the ~60% of households that are cell-phone-only |
| Online opt-in panel | Speed, cost, scale | Self-selection; misses non-internet users; unmeasured systematic differences |
| Address-based sampling (ABS) | Near-complete coverage | No built-in contact info; must mail first; slower |
Weighting Methods
Post-stratification: Assign weights so sample demographic composition matches population targets. Corrects for observed imbalances; cannot correct for unmeasured ones.
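The mechanics of post-stratification fit in a few lines: each cell's weight is its population share divided by its sample share. The age cells and targets below are hypothetical:

```python
# Hypothetical sample counts and population targets for one variable (age)
sample_counts = {"18-34": 100, "35-64": 500, "65+": 400}
pop_shares = {"18-34": 0.30, "35-64": 0.50, "65+": 0.20}

n = sum(sample_counts.values())
weights = {cell: pop_shares[cell] / (sample_counts[cell] / n)
           for cell in sample_counts}
# Respondents in underrepresented cells count for more than 1;
# here 18-34s get weight ~3.0, while the overrepresented 65+ get 0.5.
print(weights)
```

Note that the weights fix only the measured demographics — any unmeasured difference between respondents and nonrespondents within a cell survives weighting untouched, which is the limitation the paragraph above flags.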
Raking (iterative proportional fitting): Simultaneously match multiple marginal distributions by cycling through weighting variables until convergence. Requires only marginal targets, not full joint distribution.
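Raking can be sketched for a two-variable cross-tab: scale rows to their marginal targets, then columns to theirs, and repeat until both margins hold. The sex × education cells and targets below are hypothetical:

```python
def rake(table, row_targets, col_targets, iters=100, tol=1e-9):
    """Iterative proportional fitting on a dict-of-dicts cross-tab.
    Targets are marginal totals; only margins are needed, not the joint."""
    for _ in range(iters):
        # Scale each row to its marginal target
        for r in table:
            s = sum(table[r].values())
            for c in table[r]:
                table[r][c] *= row_targets[r] / s
        # Scale each column to its marginal target
        for c in col_targets:
            s = sum(table[r][c] for r in table)
            for r in table:
                table[r][c] *= col_targets[c] / s
        # Converged when row margins still match after the column pass
        if all(abs(sum(table[r].values()) - row_targets[r]) < tol for r in table):
            break
    return table

# Hypothetical cell counts (sex x education) and marginal targets
cells = {"men": {"college": 150, "no_college": 250},
         "women": {"college": 350, "no_college": 250}}
fitted = rake(cells,
              row_targets={"men": 490, "women": 510},
              col_targets={"college": 350, "no_college": 650})
```

Each pass disturbs the margin fixed by the previous pass slightly, which is why the procedure cycles until both sets of margins agree at once.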
MRP (Multilevel Regression and Poststratification): Build regression model predicting opinion from demographics and geography; apply to Census population cells to generate geographic estimates. Best for small-geography estimation from large surveys.
The Declining Response Rate Crisis
- 1970s-80s: Response rates 60-80%
- 2020s: Response rates below 6% in commercial telephone polling
- Low response rates are NOT automatically fatal — depends on whether response propensity is correlated with the outcome of interest
- When it is correlated (as with education and candidate preference in recent cycles), systematic bias results
- Remedy: appropriate demographic weighting, MRP, multi-mode approaches, transparency
Who Gets Counted: The Political Stakes
- Likely voter screens systematically underweight lower-propensity voters (younger, more diverse, lower income)
- Phone frames underrepresent cell-only households and non-English-dominant speakers
- Online panels underrepresent non-internet users and lower-engagement adults
- Sampling design is a political act: it embeds assumptions about whose preferences count and who will vote
Trish's Field Rules
- Know your frame before you design the sample — its gaps determine your biggest biases
- Oversample strategically, then weight back to population proportions
- Track response disposition from day one — early divergence predicts final sample profile problems
- Document everything — methodology transparency is professional and ethical
- The effective n after weighting is what matters for precision estimates, not the raw n
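The last rule can be made concrete with the Kish effective sample size, (Σw)² / Σw², which shrinks as weights become more variable. The weight distribution below is hypothetical:

```python
import math

def effective_n(weights):
    """Kish effective sample size: (sum of w)^2 / (sum of w^2)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)

# Hypothetical poll: 800 respondents at weight 1.0, plus 200 from a
# hard-to-reach group upweighted to 3.0
weights = [1.0] * 800 + [3.0] * 200
raw_n = len(weights)
eff_n = effective_n(weights)

# Precision should be quoted from the effective n, not the raw n
moe_raw = 1 / math.sqrt(raw_n)
moe_eff = 1 / math.sqrt(eff_n)
print(raw_n, round(eff_n))  # the effective n is well below 1,000
```

With equal weights the two quantities coincide; any weighting variance pushes the effective n below the raw n, widening the honest margin of error.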
The Core Question
"Who is in your sample?" — this question determines everything about what a poll can tell you and what it cannot. The best analysts ask it before reading the headline number.