Chapter 10 Key Takeaways: Reading and Evaluating Polls
Core Concepts
1. Reading a poll and evaluating a poll are different skills. Reading extracts the topline. Evaluation assesses whether the topline is trustworthy. Professional analysts always evaluate before they read.
2. The AAPOR Transparency Initiative provides a standard disclosure framework. ATI-certified polls must disclose sponsor, methodology, population definition, exact question wording, field dates, sample size, margin of error, weighting procedures, and response rate. Any poll that withholds these disclosures should be treated with skepticism proportional to the depth of concealment.
3. Population definition is one of the most consequential methodological choices. Adult, registered voter, and likely voter polls measure different populations with systematically different political preferences. The LV/RV gap typically runs 2–6 points in favor of Republicans in midterm elections. Treating these as interchangeable produces analytical errors.
4. Likely voter screens embed turnout predictions. Every LV model is a prediction about who will vote, operationalized as a screening rule. Different LV models — single-item self-report, multi-item Gallup scale, voter file validation — produce different electorates and different toplines for the same race.
5. The margin of error captures only sampling variability. The stated MOE does not capture nonresponse bias, coverage error, measurement error, weighting uncertainty, or LV model uncertainty. Total uncertainty in a modern political poll is always larger — often substantially larger — than the reported margin of error.
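The sampling-only margin of error can be computed directly. A minimal sketch, assuming simple random sampling and the worst-case proportion p = 0.5 (the function name and defaults are illustrative, not from the chapter):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Sampling margin of error for a proportion at ~95% confidence,
    assuming simple random sampling; p = 0.5 is the worst case."""
    return z * math.sqrt(p * (1 - p) / n)

# A typical n = 1,000 poll carries about +/-3.1 points of
# sampling error alone -- before any nonsampling error.
print(round(100 * margin_of_error(1000), 1))  # -> 3.1
```

Everything the takeaway lists beyond this (nonresponse, coverage, measurement, weighting, LV model uncertainty) sits on top of this number, which is why total error exceeds the stated MOE.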
6. "Within the margin of error" does not mean "too close to call." A 6-point lead with a ±4% MOE is still a 6-point lead — the best available estimate of the true margin. The MOE describes a range of plausible values consistent with the data, not a zone of complete ignorance. The leading candidate is genuinely more likely to be ahead.
7. House effects are systematic partisan biases in individual pollsters' estimates. They arise from consistent methodological choices: LV model stringency, weighting design, mode and frame, question order. House effects are estimable by comparing each pollster's results to concurrent averages excluding their own polls. Statistically significant house effects should be disclosed and accounted for in polling averages.
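The leave-one-out comparison described above can be sketched as follows. Column names (`pollster`, `margin`) are hypothetical; a real analysis would also restrict the comparison to concurrent field dates:

```python
import pandas as pd

def house_effects(df):
    """Estimate each pollster's house effect as its mean margin
    minus the mean margin of all *other* pollsters' polls
    (a leave-one-out comparison). Margins are Dem minus Rep, in points."""
    effects = {}
    for name, grp in df.groupby('pollster'):
        others = df.loc[df['pollster'] != name, 'margin']
        effects[name] = grp['margin'].mean() - others.mean()
    return pd.Series(effects).sort_values()

# Toy data: pollster A leans Democratic, B leans Republican
polls = pd.DataFrame({
    'pollster': ['A', 'A', 'B', 'B', 'C', 'C'],
    'margin':   [3.0, 4.0, 1.0, 2.0, 2.5, 2.5],
})
print(house_effects(polls))  # A: +1.5, B: -1.5, C: 0.0
```

A positive value indicates a pollster whose margins run more Democratic than the rest of the field.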
8. Herding artificially reduces apparent variance in the polling field. Pollsters who adjust results toward consensus destroy the statistical independence that makes aggregation meaningful. Herding is detectable by testing whether inter-poll variance is lower than sampling theory predicts.
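The variance test described above can be sketched in a few lines. This assumes a two-way race near 50/50 and independent simple random samples, so the theoretical variance of a margin in points is roughly 10000/n:

```python
import numpy as np

def herding_ratio(margins, sample_sizes):
    """Ratio of observed inter-poll variance of margins (in points)
    to the variance sampling theory alone predicts.
    Values well below 1 are consistent with herding.
    Assumes p ~ 0.5: Var(margin) ~ 4 * p(1-p)/n * 100^2 = 10000/n."""
    margins = np.asarray(margins, dtype=float)
    n = np.asarray(sample_sizes, dtype=float)
    observed = margins.var(ddof=1)
    expected = np.mean(10000.0 / n)
    return observed / expected

# Identical toplines from independent samples would be suspicious:
print(herding_ratio([2.0, 2.0, 2.0, 2.0], [1000] * 4))  # -> 0.0
```

A ratio near 1 is what honest, independent polls of this size should produce; a field of n = 1,000 polls should disagree with one another by roughly ±3 points from sampling error alone.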
9. Polling averages extract signal by canceling random errors. For n independent polls with the same MOE, the average has uncertainty ≈ MOE/√n. Quality-weighted averages — giving more weight to probability-based, larger-sample, independently sponsored polls — extract signal more efficiently than simple averages.
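Both claims in this takeaway are one-liners in code. A sketch, with hypothetical margins and quality weights:

```python
import numpy as np

def average_moe(moe, n_polls):
    """Uncertainty of a simple average of n independent polls
    that share the same margin of error: MOE / sqrt(n)."""
    return moe / np.sqrt(n_polls)

# Four independent +/-4 polls average to roughly +/-2
print(average_moe(4.0, 4))  # -> 2.0

# Quality-weighted average: illustrative weights favoring
# probability-based, larger-sample, independently sponsored polls
margins = np.array([3.0, 1.0, 2.0])
weights = np.array([2.0, 1.0, 1.5])
print(np.average(margins, weights=weights))
```

Note that the 1/√n improvement holds only if the polls' errors are independent; herding and shared systematic errors erode it.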
10. Trend analysis requires statistical discipline. A poll-to-poll change in margin must exceed approximately √2 × MOE to be statistically significant at 95% confidence. Most political journalism reports statistically insignificant changes as meaningful trend shifts. Minimal standards for responsible trend analysis: track 14-day rolling averages, and require consistent movement across multiple polls before concluding a trend is real.
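The √2 × MOE threshold for a difference of two independent estimates can be applied mechanically; a minimal sketch:

```python
import math

def change_is_significant(margin_change, moe):
    """A poll-to-poll change in margin is statistically significant
    at roughly 95% confidence only if it exceeds sqrt(2) * MOE,
    the margin of error of a difference of two independent estimates."""
    return abs(margin_change) > math.sqrt(2) * moe

# With +/-3.5 polls, the bar is about 4.95 points of movement:
print(change_is_significant(4.0, 3.5))  # -> False
print(change_is_significant(6.0, 3.5))  # -> True
```

By this standard, most headline "shifts" between consecutive polls do not clear the bar.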
Python Skills Developed
- Loading and inspecting polling data with pd.read_csv() and df.describe()
- Computing Democratic margins with df['pct_d'] - df['pct_r']
- Quality-weighted averages using np.average() with a weights parameter
- Computing size weights using np.sqrt(df['sample_size'])
- Rolling averages using df.rolling(window=14).mean() after resampling to daily frequency
- Scatter plots of polling trends using ax.scatter() with color-coded methodology
- Line plots of rolling averages using ax.plot()
- House effects calculation via a custom function with concurrent average comparison
- One-sample t-tests using scipy.stats.ttest_1samp()
- Grouped summaries using df.groupby().agg()
- Quality scoring using a custom apply() function
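The resample-then-roll workflow from the list above can be sketched end to end. The column names and data are hypothetical:

```python
import pandas as pd

# Hypothetical polling data: field end dates and Dem-minus-Rep margins
polls = pd.DataFrame({
    'end_date': pd.to_datetime(['2024-09-01', '2024-09-04', '2024-09-08',
                                '2024-09-12', '2024-09-15']),
    'margin':   [2.0, 3.0, 1.5, 2.5, 2.0],
})

# Resample to daily frequency (days without polls become NaN),
# then take a 14-day rolling mean over whatever polls fall in the window
daily = polls.set_index('end_date')['margin'].resample('D').mean()
rolling = daily.rolling(window=14, min_periods=1).mean()
print(rolling.tail())
```

Resampling first ensures the window is measured in calendar days rather than in number of polls, so sparse and dense polling periods are treated consistently.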
Practical Principles
- When a poll is dramatically different from all other concurrent polls, the null hypothesis is methodology, not movement.
- IVR polls of landline-only respondents with over-representation of 55+ voters should be scrutinized for coverage bias before inclusion in averages.
- Campaign-commissioned polls should receive reduced weight in any serious average — not because they are always wrong, but because the incentive structure for selective release systematically distorts what is available.
- A quality-weighted average is generally preferable to a simple average when polls vary substantially in methodological quality.
- Document all weighting and adjustment decisions and test sensitivity of conclusions to the most consequential assumptions.
Connections to Upcoming Chapters
- Chapter 11 (The American Voter) will examine what drives individual vote choice — the behavior that LV models are trying to predict.
- Chapter 14 (Turnout) extends the LV modeling discussion to the determinants of voter participation.
- Chapter 17 (Poll Aggregation) builds directly on this chapter, covering the sophisticated aggregation methods used by professional election forecasters.
- Chapter 18 (Fundamentals Models) introduces the class of models that blend polling averages with economic and historical predictors to improve election forecasting.
- Chapter 20 (When Models Fail) returns to the 2020 polling error as a case study in systematic aggregation failure.
The Map and the Territory
This chapter's organizing theme — the gap between map and territory — is a reminder that every poll is a representation, not the thing itself. The territory is the actual distribution of political preferences and voting intentions in the electorate. The map is the poll, constructed through a complex measurement process with known and unknown sources of error.
Good poll evaluation is the practice of understanding the map-making process well enough to know which maps to trust, which to discount, and how to combine multiple imperfect maps into the best available representation of terrain that cannot be directly observed until Election Night — when the territory finally reveals itself, and every map is judged against what was actually there.