Case Study 19-1: The 2016 Presidential Forecast and the Correlated Error Problem

Background

The 2016 presidential election is the defining case study in probabilistic election forecasting, not because forecasters were catastrophically wrong, but because the variation in their predictions exposed fundamental differences in how they modeled uncertainty — and particularly how they handled correlated errors across states.

This case study examines what the major forecasters predicted, why they differed, what actually happened, and what lessons political analysts should take from the experience.

The Pre-Election Forecasting Landscape

In the final week before Election Day 2016, the major probabilistic forecasters showed a wide range of Clinton win probabilities:

Forecaster                         Clinton Win Probability
Princeton Election Consortium      99%
HuffPost Pollster                  98%
The Upshot (NYT)                   85%
Daily Kos Elections                92%
FiveThirtyEight (polls-plus)       71.4%
FiveThirtyEight (polls-only)       71.4%
Prediction markets (PredictIt)     ~85%

The range from 71% to 99% is extraordinary for forecasts of the same election, issued at the same time and built largely on the same underlying polling data. The variation was driven almost entirely by modeling choices — specifically, assumptions about the magnitude and correlation of polling errors.

The Methodological Differences

Princeton Election Consortium (Sam Wang): Wang's model assigned probabilities close to certainty by treating state-level polls as near-perfect signals. His model assumed that polling errors were small and essentially independent across states. Under these assumptions, the probability that Trump would win enough states to reach 270 electoral votes was vanishingly small — because doing so would require multiple independent polling errors all in the same direction simultaneously.

The Upshot (NYT/Nate Cohn): The Upshot's model incorporated larger historical polling errors and modeled some correlation between states, producing a more modest 85% Clinton probability.

FiveThirtyEight (Nate Silver): 538's model explicitly incorporated large historical polling errors and a substantial correlation structure between states. The model assumed that if Trump outperformed the polls in one Midwestern state, there was a meaningful probability that he'd outperform them in other demographically similar states — because the same underlying measurement problem (understating non-college white voter support) would affect all of them.

The key insight Silver used: in 2012, the polls had underestimated Barack Obama by roughly 2-3 points in several Midwestern states simultaneously. This isn't evidence of a conspiracy; it reflects the fact that polls in demographically similar states tend to share the same blind spots.
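
A small simulation makes the contrast between the two approaches concrete. The sketch below is illustrative rather than a reconstruction of any forecaster's actual model: it borrows the Clinton polling leads (+6, +5, +4) and the 3.5-point error SD from the quantitative extension at the end of this case study, assumes a between-state error correlation of 0.6, and compares how often Trump sweeps Wisconsin, Michigan, and Pennsylvania when errors are independent versus correlated.

```python
import numpy as np

# Illustrative sketch, not any forecaster's actual model. Leads and the
# 3.5-point error SD come from the quantitative extension below; the
# correlation of 0.6 is an assumed value chosen only to show the effect.
leads = np.array([6.0, 5.0, 4.0])    # Clinton's polling leads: WI, MI, PA
sigma = 3.5                          # SD of each state's polling error
rho = 0.6                            # assumed correlation between states
n_sims = 200_000
rng = np.random.default_rng(0)

def p_trump_sweeps(correlated: bool) -> float:
    """Share of simulated elections in which the polling error runs
    against Clinton by more than her lead in all three states."""
    if correlated:
        # Split each state's error into a shared component (one draw per
        # simulation, applied to every state) plus an independent
        # state-specific component, preserving the total SD.
        shared = rng.normal(0.0, sigma * np.sqrt(rho), size=(n_sims, 1))
        local = rng.normal(0.0, sigma * np.sqrt(1 - rho), size=(n_sims, 3))
        errors = shared + local
    else:
        errors = rng.normal(0.0, sigma, size=(n_sims, 3))
    # Positive error = Trump outperforming his polls.
    return float((errors > leads).all(axis=1).mean())

print("P(Trump sweeps all three), independent errors:", p_trump_sweeps(False))
print("P(Trump sweeps all three), correlated errors: ", p_trump_sweeps(True))
```

With the shared component included, the "all three states miss together" scenario is far more likely than the product of three independent tail probabilities would suggest, which is the essence of the disagreement between the models.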

What Actually Happened

Donald Trump won the election by carrying Wisconsin, Michigan, and Pennsylvania, each by less than one percentage point, after trailing in the final polling averages in all three. He outperformed the polls in all three states simultaneously, by roughly 3 points in Michigan and Pennsylvania and roughly 7 in Wisconsin, and he also beat his polling averages in Ohio and Iowa by several points.

This is exactly the "correlated error" scenario. The same underlying measurement problem — pollsters systematically underestimating Trump's support among non-college white voters — manifested in multiple states at the same time. Under an independence assumption, this was virtually impossible. Under a proper correlation model, it was a 25-30% probability scenario.

The Calibration Verdict:

  • Forecasters who gave Clinton a 98-99% win probability treated a Trump victory as a 1-in-50 to 1-in-100 event; given how little error their models allowed for, it was essentially impossible under their assumptions. Their models were severely miscalibrated.
  • Forecasters who gave Clinton 85% saw a roughly 1-in-7 event occur: unusual, but within the range of plausible outcomes their models acknowledged.
  • FiveThirtyEight's 71.4% Clinton probability meant a 28.6% Trump probability, and Trump won. That is close to a 2-in-7 event: not a failure, but a prediction that materialized within the range of outcomes the model expected.

The Post-Election Narrative Problem

Despite the calibration evidence, the post-election narrative in much of the media treated all forecasters equally as having "failed." The distinction between a model that said 1% (essentially impossible) and a model that said 29% (uncommon but plausible) was largely lost.

This narrative collapse had practical consequences:

  • It created cynicism about election forecasting broadly, lumping well-calibrated and poorly-calibrated models together
  • It fed into a "nobody can predict elections" discourse that undermines legitimate forecasting
  • It discouraged audiences from learning to read probabilistic statements correctly

The lesson is not that 2016 was unpredictable — it's that some forecasters modeled the uncertainty appropriately and others didn't. The outcome distinguished between the two.

The Structural Lesson: Model Assumptions Have Consequences

The 2016 case makes vivid a principle that can seem abstract in normal times: modeling assumptions have real consequences for predictions, and the assumptions most likely to matter are the ones most likely to be wrong.

The independence assumption — that state-level errors are uncorrelated — seems innocuous because in most elections it doesn't matter much. States tend to have similar overall environments, and errors tend to partially cancel rather than compound. But in an election where a specific measurement problem affects an entire region of the country, the independence assumption breaks down catastrophically.
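
A quick way to quantify the "partially cancel versus compound" point: if each state's polling error has standard deviation σ and every pair of states has correlation ρ, the standard deviation of the average error across k states is σ·sqrt((1 + (k − 1)·ρ)/k). With ρ = 0 the average error shrinks toward zero as more states are polled; with any positive ρ it can never fall below σ·sqrt(ρ). The short sketch below uses illustrative values to show the effect.

```python
import math

sigma = 3.5   # SD of a single state's polling error, in points (illustrative)
k = 10        # number of battleground states averaged together (illustrative)

for rho in (0.0, 0.3, 0.6):
    # SD of the average error across k equally correlated states
    sd_avg = sigma * math.sqrt((1 + (k - 1) * rho) / k)
    print(f"rho = {rho:.1f}: SD of the average error = {sd_avg:.2f} points")
```

With no correlation, the ten-state average error is barely more than a point; with ρ = 0.6 it stays near 2.8 points, which is why a single shared measurement problem can move an entire region, and the Electoral College with it.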

This is a general lesson in statistical modeling: assumptions that work in normal circumstances can fail precisely in the cases where they matter most. Good models should be stress-tested against scenarios where their assumptions are violated.

Discussion Questions

1. The Princeton Election Consortium gave Clinton a 99% win probability. What standard should we use to evaluate whether this was a forecasting failure? Is being wrong about a 1% event different from being wrong about a 30% event?

2. Silver's model gave Trump 28.6% — which he has defended as prescient. Critics argued the model was still overconfident (the true probability was 50/50, they claim, given how close the result was). How would you adjudicate this disagreement? What evidence would you need?

3. The media narrative collapsed all forecasters into a single "failed prediction" story. What structural features of political media coverage drove this? How could election forecasters communicate more effectively to prevent this kind of narrative collapse?

4. If you were advising the Trump campaign in October 2016, how would you interpret the range of forecasts? Would you trust the 71% or the 99%? What does this suggest about how campaigns should use probabilistic forecasts?

5. After 2016, how should probabilistic modelers update their correlation assumptions? What data would you use? How would you test whether your updated model is better-calibrated?

Quantitative Extension

Using a simplified model:

  • Clinton's polling average in Wisconsin: +6
  • Historical SD of the state polling error: 3.5 points
  • Independence assumption: What is the probability Trump wins Wisconsin? (Use z = -6/3.5 = -1.71; P(Z < -1.71) ≈ 4.4%)

Now apply the correlated error adjustment:

  • Assume there is a 25% probability of a 3-point national shift toward Republicans (e.g., due to a systematic polling miss)
  • Under this adjustment: adjusted Wisconsin average = 6 - 3 = +3; P(Trump wins Wisconsin | shift) = P(Z < -3/3.5) ≈ 19.5%
  • Combined probability: 0.75 × 4.4% + 0.25 × 19.5% = 3.3% + 4.9% = 8.2%
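
The same arithmetic as a short script, reproducing the figures above. This is a sketch under the exercise's simplifying assumptions (a normal polling error with a 3.5-point SD and a 25% chance of a uniform 3-point Republican shift); small differences from the rounded numbers above come from carrying more decimal places in z.

```python
from statistics import NormalDist

sd = 3.5          # historical SD of the state polling error, in points
lead_wi = 6.0     # Clinton's Wisconsin polling average

def p_trump_win(lead: float) -> float:
    """P(Trump wins a state) when the polling error is Normal(0, sd):
    he wins if the error against Clinton exceeds her polled lead."""
    return NormalDist(0.0, sd).cdf(-lead)

# Independence-only answer for Wisconsin (the ~4.4% above)
p_independent = p_trump_win(lead_wi)

# Correlated-error adjustment: mix the no-shift case with a 25% chance
# of a uniform 3-point shift toward the Republicans.
p_shift, shift = 0.25, 3.0
p_adjusted = (1 - p_shift) * p_trump_win(lead_wi) + p_shift * p_trump_win(lead_wi - shift)

print(f"Wisconsin, independent errors:    {p_independent:.1%}")
print(f"Wisconsin, with correlated shift: {p_adjusted:.1%}")
```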

a) How does the correlated error adjustment change the win probability for Trump in Wisconsin?
b) Apply the same adjustment to Michigan (Clinton +5) and Pennsylvania (Clinton +4). What are the adjusted Trump win probabilities?
c) If you assume all three states' errors are correlated (they all get the national shift), what is the joint probability that Trump wins all three?
d) How does this compare to the result under full independence?