Case Study 9-2: The 2020 Polling Error — A Nonresponse Post-Mortem

Background

The 2020 presidential election produced one of the largest systematic polling errors in modern American electoral history. National polling averages showed Biden leading Trump by approximately 8 percentage points on the eve of the election. Biden's actual margin was 4.5 points — a miss of roughly 3.5 points. More striking was the direction: virtually every major pre-election poll overestimated Biden's support. State-level polls were even worse: in Wisconsin, the FiveThirtyEight average showed Biden +8.2; he won by 0.6 points.

This was not a random miss. It was systematic: the same direction across virtually all pollsters, methodologies, and geographic regions. Errors of this type point not to sampling variability, which would produce misses in random directions, but to a bias built into the survey process itself.

The AAPOR Task Force Findings

AAPOR convened a task force that released its report in July 2021. The task force examined multiple candidate explanations:

Hypothesis 1: Likely Voter Modeling Failure. Were pollsters misidentifying who would turn out? The task force found some evidence of this (turnout models that underweighted less educated white voters may have contributed), but likely-voter-model errors alone could not explain the full magnitude of the error.

Hypothesis 2: Late Swing. Did a significant share of voters switch from Biden to Trump in the final days? Some evidence exists for late movement, but the task force found it insufficient to account for more than 1–1.5 points of the error.

Hypothesis 3: Social Desirability Bias. Were Trump supporters unwilling to reveal their preference to interviewers? The task force found mixed evidence. Mode comparisons (online vs. telephone) showed smaller Trump underestimates in online polls, consistent with social desirability bias reducing reported Trump support in interviewer-administered surveys, but the pattern was not consistent across all tests.

Hypothesis 4: Differential Nonresponse by Party. Were Republicans, and specifically Trump supporters, less likely to participate in polls than Democrats and Biden supporters? The task force found this to be the most credible primary explanation.

The Differential Nonresponse Mechanism

The differential nonresponse hypothesis holds that:

  1. Trump supporters in 2020 were more likely to distrust media, academic, and professional institutions than Biden supporters.
  2. Survey research is perceived, consciously or not, as part of that institutional apparatus.
  3. Institutional distrust therefore created a differential propensity to participate in polls: Trump supporters were systematically less likely to answer survey requests.
  4. Standard demographic weighting (on age, education, race, gender) corrected for measurable differences but could not correct for the unmeasured attitudinal variable (institutional distrust) that drove differential nonresponse.
  5. The result was a sample systematically missing a definable group: Trump-supporting, institutionally skeptical voters.

Evidence for this mechanism came from several sources:

Voter file validation studies: Post-election studies matching survey respondents to voter files found that Biden supporters were more likely to have participated in pre-election polls than Trump supporters with the same demographic profile.

Trust measures as weights: Pollsters that had included measures of institutional trust in their questionnaires found they could weight on trust as a proxy for the differential in response propensity; this correction reduced, but did not eliminate, the polling error.

Historical precedent: The same pattern appeared, at a smaller scale, in 2016. The 2020 magnitude may reflect the intensification of partisan sorting by institutional trust during the Trump presidency.

The Methodological Challenge

The 2020 error exposed a fundamental limitation of demographic weighting as the primary tool for nonresponse correction. Standard raking weights adjust samples on observable characteristics: age, race, education, gender, region. These variables are correlated with political preference, so weighting on them corrects skews that are demographic in origin.
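
A minimal raking (iterative proportional fitting) sketch makes the mechanics concrete. The respondent table, variable names, and target shares below are all hypothetical; this illustrates the technique, not any pollster's production code.

```python
import pandas as pd

def rake(df, targets, max_iter=50, tol=1e-6):
    """Iteratively scale weights until each variable's weighted shares
    match its population targets (raking / iterative proportional fitting)."""
    w = pd.Series(1.0, index=df.index)
    for _ in range(max_iter):
        max_shift = 0.0
        for var, shares in targets.items():
            current = w.groupby(df[var]).sum() / w.sum()   # weighted shares now
            factor = df[var].map(lambda c: shares[c] / current[c])
            w = w * factor                                 # pull toward targets
            max_shift = max(max_shift, float((factor - 1).abs().max()))
        if max_shift < tol:
            break
    return w * len(df) / w.sum()                           # mean weight = 1

# Hypothetical pool of 100 completes, skewed toward college graduates.
df = pd.DataFrame({
    "educ": ["college"] * 60 + ["no_college"] * 40,
    "age":  ["18-49"] * 30 + ["50+"] * 30 + ["18-49"] * 20 + ["50+"] * 20,
})
targets = {
    "educ": {"college": 0.40, "no_college": 0.60},  # census-style margins
    "age":  {"18-49": 0.45, "50+": 0.55},
}
df["weight"] = rake(df, targets)
```

Note that every adjustment factor is a function of observed cells only; nothing in the loop can distinguish two respondents who share the same demographics.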

But nonresponse driven by political attitudes — particularly attitudes about the institutions conducting the survey — is not fully captured by demographics. Two 45-year-old white men without college degrees may have very different survey participation propensities if one is a strong Trump supporter who distrusts professional researchers and the other is a Biden supporter who takes a civic view of survey participation. Demographics cannot separate them.
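
A toy simulation shows the failure mode directly. All numbers below are invented: a single demographic cell in which vote choice correlates with institutional trust, and response propensity depends on trust alone.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# One demographic cell (e.g., 45-year-old white men without degrees).
# Within the cell, vote choice correlates with institutional trust.
trump_voter = rng.random(N) < 0.60              # hypothetical 60% Trump cell
low_trust = np.where(trump_voter,
                     rng.random(N) < 0.70,      # Trump voters: mostly low trust
                     rng.random(N) < 0.30)      # Biden voters: mostly high trust

# Response propensity depends on trust, not on anything demographic.
p_respond = np.where(low_trust, 0.03, 0.06)
responded = rng.random(N) < p_respond

print(f"true Trump share:   {trump_voter.mean():.3f}")             # ~0.60
print(f"sample Trump share: {trump_voter[responded].mean():.3f}")  # ~0.53

# Every respondent shares the same demographics, so any demographic
# weight is constant within the cell and cannot move the estimate.
```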

Proposed solutions after 2020 included:

  • Partisan registration weights: Weighting samples to match voter-file party registration distributions. This corrects for partisan underrepresentation directly but introduces its own problems (registration is an imperfect proxy for vote choice; registration distributions vary by state and may not reflect election-day behavior). A minimal weight computation is sketched after this list.
  • Prior-vote weights: Asking respondents how they voted in previous elections and weighting on that. This approach is vulnerable to faulty recall: losers of elections are "remembered" as having had fewer supporters than they actually did.
  • Institutional trust weights: Including trust measures and weighting on them. Feasible, but it requires adding items to questionnaires and developing stable target distributions for the trust variable.
  • Fundamentals-adjusted forecasting: Moving away from pure polling aggregation toward models that blend polls with economic and historical variables less sensitive to nonresponse patterns.
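
For the first option, a partisan registration weight reduces to one factor per party: the voter-file share divided by the share among completed interviews. The shares below are hypothetical.

```python
# Hypothetical voter-file registration shares and completed-interview shares.
voter_file = {"DEM": 0.38, "REP": 0.36, "OTHER": 0.26}
sample     = {"DEM": 0.44, "REP": 0.29, "OTHER": 0.27}

# One factor per party: target share / achieved share.
weights = {party: voter_file[party] / sample[party] for party in voter_file}
print(weights)   # REP respondents get 0.36 / 0.29 ≈ 1.24

# The caveats in the bullet above still apply: registration is an
# imperfect proxy for vote choice, and its distribution varies by state.
```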

None of these solutions is fully satisfying. The 2020 error revealed that the field has not yet solved the problem of differential nonresponse by political attitude.

Applying the Framework to the Garza-Whitfield Race

Carlos brings the 2020 post-mortem findings to a methodology review meeting with Vivian and Trish. He raises the obvious question: if differential nonresponse by party explains 2020, what should Meridian do differently for the Garza-Whitfield race?

Vivian's answer is characteristically measured: "We can't solve the problem. We can document it, try to minimize it, and be honest about residual uncertainty."

Specifically, Meridian's Garza-Whitfield protocols adopted three practices in response to 2020 lessons:

  1. Partisan registration benchmarking: Daily reports compare the partisan registration composition of completed interviews to the voter file. If Republicans are underrepresented by more than 3 points relative to their voter-file share, the report triggers additional CATI call blocks targeted at Republican-registered numbers (a minimal version of this check is sketched after this list).

  2. Recall voting weights: The questionnaire asks respondents how they voted in the most recent general election. Completed interviews are checked against the actual vote distribution from that election, and weights are applied to correct overrepresentation of the winning candidate's voters in the sample.

  3. Uncertainty disclosure: Published results include an explicit statement acknowledging that differential nonresponse by political attitude is a known potential bias source not fully correctable through demographic weighting.
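
A minimal version of the item-1 benchmark check might look like the following; the field names, shares, and trigger wiring are illustrative, not Meridian's actual system.

```python
from collections import Counter

def registration_gap(interviews, voter_file_shares):
    """Shortfall, in percentage points, of each party's share of
    completed interviews relative to its voter-file share."""
    counts = Counter(i["party"] for i in interviews)
    total = sum(counts.values())
    return {party: 100 * (share - counts.get(party, 0) / total)
            for party, share in voter_file_shares.items()}

# Hypothetical day of fieldwork: 1,000 completes, Republicans light.
voter_file_shares = {"DEM": 0.38, "REP": 0.36, "OTHER": 0.26}
interviews = ([{"party": "DEM"}] * 430 + [{"party": "REP"}] * 310
              + [{"party": "OTHER"}] * 260)

gaps = registration_gap(interviews, voter_file_shares)
if gaps["REP"] > 3.0:   # more than 3 points under the voter-file share
    print("trigger additional CATI call blocks for REP-registered numbers")
```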

Discussion Questions

Question 1: The AAPOR task force identified differential nonresponse as the most credible primary explanation for the 2020 polling error. Explain in your own words why standard demographic weighting cannot fully correct for this type of nonresponse.

Question 2: One proposed solution is to weight polls by party registration to match voter-file distributions. Describe two specific problems this approach might introduce, and explain how each could distort a poll's topline.

Question 3: Suppose Meridian's Garza-Whitfield tracking poll consistently shows Garza leading by 5 points, but Nadia Osei (Garza's analytics director) notices that Whitfield supporters respond to their internal calls at a 40% lower rate than Garza supporters. Using the nonresponse bias formula, estimate the size and direction of potential bias if Whitfield supporters would have reported Whitfield +12 as a group and the sample underrepresents them by 6 percentage points of the total sample.
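
For reference in Question 3, a standard deterministic form of the nonresponse bias formula (as given in most survey-methods texts; the notation here is generic) is

    bias(ȳ_r) = (n_m / n) × (ȳ_r − ȳ_m)

where ȳ_r is the mean among respondents, ȳ_m the mean the nonrespondents would have reported, and n_m / n the nonrespondent share of the full sample.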

Question 4: Carlos proposes adding a single trust-in-institutions question to every Meridian poll and using it as a weighting variable. What would Vivian need to know before approving this change? What target distribution would you use for the institutional trust weight, and where would you obtain it?

Question 5: A journalist reads about the 2020 polling error and writes that "polls are useless — they just make up numbers." Write a 300-word response that acknowledges the genuine problem while defending the appropriate role of survey research in democratic discourse.

Key Lessons

The 2020 case teaches three enduring lessons about nonresponse in political polling:

Lesson 1: Systematic polling errors are diagnostic of bias in the survey process, not random sampling variability. When all polls miss in the same direction, the cause is in the shared methodology — nonresponse, coverage, measurement — not in sampling luck.

Lesson 2: The adequacy of standard demographic weighting is contingent on the assumption that nonresponse is driven by demographics rather than attitudes. When that assumption fails — as it apparently did in 2020 — demographic weights are insufficient, and the field must develop new correction strategies.

Lesson 3: Epistemic humility is a methodological requirement, not just a virtue. Polls should be reported with uncertainty ranges that account for known potential biases, not just sampling error. The margin of error reported in most polls reflects only sampling variability — it does not capture nonresponse bias, coverage error, or weighting uncertainty. A well-informed consumer of poll data understands this and calibrates their confidence accordingly.
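
For reference, the margin of error in question is the pure sampling term. Under simple random sampling it is approximately

    MoE ≈ 1.96 × √( p(1 − p) / n )

so a poll of n = 1,000 at p = 0.5 reports roughly ±3.1 points. Nothing in that expression accounts for nonresponse bias, coverage error, or weighting uncertainty, which is exactly the gap Lesson 3 describes.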