Case Study 2: When the Data Said Clinton Would Win

The Consensus That Wasn't

On the morning of November 8, 2016, virtually every major election forecast gave Hillary Clinton a substantial probability of winning the presidency. FiveThirtyEight's model put her chances at approximately 71 percent. The New York Times' Upshot model estimated 85 percent. The Princeton Election Consortium gave her a 99 percent chance. Betting markets implied a probability of roughly 80 to 85 percent. The polling averages showed Clinton leading in enough states to win the Electoral College comfortably.

By midnight, Donald Trump had won.

The 2016 election became the defining cautionary tale of modern political analytics---not because the data was useless, but because the way data was interpreted, communicated, and consumed failed at almost every level. Understanding what went wrong is essential for anyone who wants to practice political analytics responsibly.

What the Polls Actually Showed

The first and most important point is that the national polls were not catastrophically wrong. Clinton won the national popular vote by approximately 2.1 percentage points; the final RealClearPolitics polling average had her ahead by 3.2 points. The error was roughly one point---within the historical range of polling error and well within most polls' margins of error.

The problem was in the state-level polls, particularly in three states that proved decisive: Michigan, Wisconsin, and Pennsylvania. In each of these states, the final polling averages showed Clinton with a small lead---typically 3 to 5 points. Trump won all three, by margins of 0.2, 0.7, and 0.7 points respectively.

Post-election analyses by the American Association for Public Opinion Research (AAPOR) identified several factors that contributed to state-level polling errors:

Late-deciding voters. A significant number of voters made their decision in the final days of the campaign, after most polls had completed their fieldwork. Many of these late deciders broke toward Trump, particularly in Midwestern states. The Comey letter---FBI Director James Comey's announcement on October 28 that the bureau was examining additional Clinton-related emails---may have contributed to this late movement, but its precise effect remains debated.

Education-based nonresponse bias. Voters without a college degree were less likely to participate in polls, and this group broke heavily for Trump in key states. Many state polls did not weight by education, which meant their samples overrepresented college-educated voters (who were more likely to support Clinton) and underrepresented non-college voters (who were more likely to support Trump).
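
To see how much this can matter, consider a stylized example (the numbers below are invented for illustration, not taken from any 2016 poll). Suppose college graduates make up 60 percent of a poll's respondents but only 45 percent of the electorate, and the two groups split differently between the candidates. A minimal Python sketch:

    # Illustrative post-stratification by education (hypothetical numbers).
    # The raw sample over-represents college graduates; reweighting each
    # group to its assumed share of the electorate shifts the topline.

    sample = {
        # group: (share of respondents, Clinton support within the group)
        "college":     (0.60, 0.55),
        "non_college": (0.40, 0.45),
    }
    electorate_share = {"college": 0.45, "non_college": 0.55}  # assumed

    unweighted = sum(share * support for share, support in sample.values())
    weighted = sum(electorate_share[g] * sample[g][1] for g in sample)

    print(f"Unweighted Clinton support: {unweighted:.1%}")  # 51.0%
    print(f"Weighted by education:      {weighted:.1%}")    # 49.5%

In this toy example, reweighting by education alone moves the topline by about a point and a half, which in a close state is the difference between a narrow lead and a toss-up.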

Turnout model assumptions. Some polls and forecasts assumed turnout patterns similar to 2012, when Obama's coalition of young voters, Black voters, and urban voters turned out at high rates. In 2016, turnout among these groups declined in several key states, while rural white turnout increased. Polls that used 2012 turnout models as their baseline were biased toward Clinton.

Correlated errors across states. Most forecasting models treated state-level polling errors as partially or fully independent. But the errors in 2016 were highly correlated: the same types of voters were underrepresented in polls across multiple Midwestern states. When Michigan was off, Wisconsin and Pennsylvania were off in the same direction and for the same reasons.
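
A small simulation makes the consequence concrete. The sketch below uses hypothetical leads and error sizes (it is not a reconstruction of the actual 2016 polls) and compares the chance that the trailing candidate sweeps three states when polling errors are independent with the chance when most of the error is shared across states:

    import math
    import random

    random.seed(42)

    # Hypothetical final leads (in points) for the front-runner in three
    # decisive states, and the total polling error per state (std. dev.).
    leads = [3.0, 4.0, 5.0]
    TOTAL_SD = 5.0
    N = 100_000  # simulated elections

    def underdog_sweep_probability(shared_sd):
        """Chance the trailing candidate carries all three states when each
        state's error is a shared component plus independent state noise,
        holding the total per-state error fixed at TOTAL_SD."""
        state_sd = math.sqrt(TOTAL_SD**2 - shared_sd**2)
        sweeps = 0
        for _ in range(N):
            shared = random.gauss(0, shared_sd)  # error common to all states
            if all(lead + shared + random.gauss(0, state_sd) < 0
                   for lead in leads):
                sweeps += 1
        return sweeps / N

    print(f"Errors independent:       {underdog_sweep_probability(0.0):.1%}")
    print(f"Errors mostly correlated: {underdog_sweep_probability(4.0):.1%}")

Holding each state's total uncertainty fixed, shifting most of the error into the shared component raises the sweep probability severalfold, because a single bad assumption (such as failing to weight by education) now moves every state in the same direction at once.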

The Communication Failure

The polling errors, while real, were not the whole story. Equally important was how the data was communicated to the public. The 2016 election exposed a fundamental gap between what forecasters intended to communicate and what audiences understood.

When FiveThirtyEight published a forecast giving Clinton a 71 percent chance of winning, the intended message was: "Clinton is favored, but there is a substantial---nearly one in three---chance that Trump wins." But many readers interpreted "71 percent" as meaning Clinton was almost certain to win. A 71 percent probability feels overwhelming when you are rooting for or against a particular outcome. The human brain is not naturally calibrated to distinguish between 70 percent and 95 percent; both feel like "probably."
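
One way to build intuition for what a number like 71 percent does and does not mean is to simulate it. The toy sketch below simply draws many hypothetical elections at that probability and counts how often the underdog wins; the point is that "favored" is a long way from "safe":

    import random

    random.seed(2016)

    # Simulate many hypothetical elections in which the favorite is given a
    # 71 percent win probability, and count how often the underdog wins.
    trials = 10_000
    underdog_wins = sum(random.random() >= 0.71 for _ in range(trials))
    print(f"Underdog wins {underdog_wins / trials:.0%} of the time")  # ~29%

Roughly three elections in ten go to the underdog, which is exactly the "substantial chance" the forecasters intended to communicate.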

Other forecasts compounded this problem. When the Princeton Election Consortium published a 99 percent probability, it communicated near-certainty. The New York Times' 85 percent figure was more cautious than Princeton's but still conveyed strong confidence. The proliferation of forecasts with different probability estimates confused the public and gave people permission to cherry-pick the number that confirmed their prior beliefs.

Media coverage amplified the problem. Most election reporting treated the forecasts as predictions rather than probability distributions. Headlines emphasized the horse race ("Clinton Maintains Lead") rather than the uncertainty ("Race Tighter Than It Appears, With Meaningful Chance of Trump Victory"). Pundits routinely stated that Clinton "would win" rather than that she "was favored to win"---a distinction that sounds pedantic but is analytically crucial.

Lessons for Political Analytics

The 2016 election offers several lessons that are central to this book's approach:

Uncertainty is not a weakness; it is the message. The most responsible forecast in 2016 was arguably FiveThirtyEight's, which gave Trump the highest probability of winning among the major forecasters. Nate Silver's team was criticized before the election for being too uncertain---but their model, which explicitly accounted for correlated polling errors and late-campaign volatility, proved more robust than models that assumed greater precision. The lesson is that communicating uncertainty is not hedging; it is accuracy.

Polls measure opinion at a moment in time, not destiny. A poll taken on October 25 captures opinion on October 25. It does not predict what will happen on November 8, especially in a volatile race where significant events (like the Comey letter) can occur between the last poll and Election Day. Treating polls as predictions is a category error.

The error is the story. After 2016, many commentators asked, "Why were the polls wrong?" But the more important question is, "What systematic biases produced the error, and how can they be corrected?" The AAPOR post-mortem identified education-based nonresponse bias as a key factor, and many pollsters subsequently began weighting by education. This is an example of how analytical failure, properly understood, leads to analytical improvement.

Correlation matters. Forecasting models that treated state polls as independent underestimated the probability of a systematic, multi-state polling miss. The lesson is that when the same methodology is used across multiple polls, errors are likely to be correlated---and models must account for this.

Context is king. The polls were not the only source of information about the 2016 race. Fundamentals models---based on economic indicators, presidential approval, and incumbency---suggested a closer race than the polls implied. Campaign reporting from the ground in Michigan and Wisconsin suggested that Clinton's operation there was weaker than it appeared. These signals were largely ignored in favor of the polling data. The lesson is that good analysis integrates multiple sources of evidence, not just the most quantitative one.

Connecting to the Garza-Whitfield Race

Imagine it is late October in the Garza-Whitfield race. Meridian Research Group's final poll shows Garza leading Whitfield by 3 points among likely voters. How should each of the following interpret this result, given what happened in 2016?

  • Vivian Park at Meridian would note the margin of error (likely plus or minus 3.5 points; a back-of-the-envelope version of that calculation appears after this list), meaning the race is statistically a toss-up. She would also worry about nonresponse bias---are there Whitfield supporters who are not answering Meridian's calls?
  • Nadia Osei on the Garza campaign would treat the poll as one data point among many, cross-referencing it with internal polling, early vote returns, and field reports. She would not relax her turnout operation based on a 3-point lead.
  • Jake Rourke on the Whitfield campaign would tell his candidate that the race is winnable: given historical polling errors in races with similar demographics, the 3-point margin in the polls may overstate Garza's actual lead.
  • Sam Harding at ODA would publish the poll with context: the margin of error, how it compares to other polls, and a reminder that polls are snapshots, not prophecies.
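
The margin of error Vivian cites can be reproduced with a standard back-of-the-envelope formula. The sketch below assumes a simple random sample of roughly 800 likely voters at 95 percent confidence; Meridian's actual sample size and design are not specified, so the numbers are illustrative:

    import math

    # Rough 95% margin of error for a single proportion under simple random
    # sampling (n is an assumed sample size, not Meridian's actual one).
    n = 800
    p = 0.5  # worst case; maximizes the standard error
    moe = 1.96 * math.sqrt(p * (1 - p) / n)

    print(f"Margin of error: +/- {moe:.1%}")  # about +/- 3.5 points

Because the uncertainty on the gap between two candidates is roughly double the uncertainty on either candidate's individual share, a 3-point lead inside a plus-or-minus 3.5-point margin of error really is a statistical toss-up.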

Discussion Questions

  1. In 2016, forecasters gave Clinton probabilities ranging from 71 percent to 99 percent. How should consumers evaluate competing forecasts with such different estimates? What questions would you ask to determine which forecast was most reliable?

  2. The chapter's theme of "The Gap Between the Map and the Territory" is directly relevant to 2016. In what sense were the polls and forecasts a "map" of the political landscape? Where did the map diverge from the territory, and why?

  3. After 2016, many pollsters began weighting by education to correct the nonresponse bias that had distorted their results. Is this a satisfactory solution? What new biases might it introduce? What other variables might be important to weight on that are not currently standard?

  4. Should election forecasters stop publishing probability estimates because the public misinterprets them? Or is the solution better education about what probabilities mean? What would you propose?

  5. Nadia Osei left a PhD program partly because she was drawn to the immediacy of campaign work. How might her academic training help her avoid the mistakes that forecasters made in 2016? How might the pressures of campaign work make those mistakes more likely?