Case Study 1: Why Political Polls Get It Wrong — Sampling in the Real World
Tier 2 — Attributed Findings: This case study discusses real polling events and documented methodological challenges. Statistics and examples are drawn from widely published post-election analyses by organizations including the American Association for Public Opinion Research (AAPOR), the Pew Research Center, and FiveThirtyEight. Specific poll results cited are from public reporting. Methodological details have been simplified for pedagogical purposes, and some illustrative examples are composites of documented patterns.
The Promise and Peril of Political Polling
Every election season, the same ritual plays out. Months before voters head to the polls, news organizations begin publishing poll results: "Candidate A leads Candidate B by 3 points." "The race is a dead heat." "Candidate C has surged to a 7-point lead." These numbers feel precise, authoritative, scientific.
And then sometimes the election happens, and the numbers are wrong.
Not always. Not usually, even. Most polls are reasonably accurate most of the time. But the spectacular failures — the ones where the polls confidently pointed in one direction and the voters went the other way — are what people remember. And those failures have everything to do with the sampling concepts you learned in Chapter 22.
This case study isn't about any one election or any one country. It's about a question that sits at the heart of sampling theory: How do you learn about millions of people by talking to a thousand of them, and what can go wrong?
The Golden Age: When Polling Seemed to Work
Let's start with why polling became trusted in the first place.
After George Gallup's triumph over the Literary Digest in 1936 (which we discussed in the chapter), polling became increasingly sophisticated. By the mid-20th century, polling organizations had developed rigorous methods:
- Probability sampling: Every person in the target population had a known, non-zero probability of being selected. Pollsters used techniques like random digit dialing (calling phone numbers generated randomly) to approximate this ideal.
- Weighting: If the raw sample over-represented some groups and under-represented others (say, too many college graduates and too few people without degrees), pollsters would weight the responses to match known population demographics.
- Likely voter screens: Not everyone who answers a poll actually votes. Pollsters developed questions to identify "likely voters" and reported results only for that subset.
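The weighting idea in the list above can be sketched in a few lines. The group shares and support rates below are invented for illustration (40% graduates in the sample vs. an assumed 33% in the population); each respondent's weight is simply their group's population share divided by its sample share.

```python
import numpy as np

np.random.seed(0)

# Toy sample of 1,000 respondents: graduates are over-represented
# (40% here vs. an assumed 33% in the population), and in this
# invented example graduates support Candidate A at a higher rate.
n = 1000
is_grad = np.random.rand(n) < 0.40
support = np.where(is_grad,
                   np.random.rand(n) < 0.55,   # grads: 55% support
                   np.random.rand(n) < 0.45)   # non-grads: 45% support

# Weight = population share of the group / sample share of the group
samp_grad = is_grad.mean()
weights = np.where(is_grad, 0.33 / samp_grad, 0.67 / (1 - samp_grad))

print(f"Raw estimate:      {support.mean():.1%}")
print(f"Weighted estimate: {np.average(support, weights=weights):.1%}")
```

By construction, the weighted sample now has exactly a 33% graduate share, so the weighted estimate leans less on the over-represented group.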
For decades, these methods worked well enough. National polls in US presidential elections typically predicted the popular vote within about 2 percentage points, and the vast majority of state-level polls correctly identified the winner.
The Cracks Appear: What Changed
Then the landscape shifted, gradually and then suddenly.
The Decline of Response Rates
In the 1970s, when a pollster called a random phone number, about 80% of people who were reached would agree to answer questions. By the 2000s, that rate had dropped to around 30%. By the late 2010s, it had fallen below 10% in many surveys. According to Pew Research Center reporting, their typical telephone survey response rate dropped from 36% in 1997 to about 6% by 2018.
Think about what that means in sampling terms. If 94% of the people you contact refuse to participate, the 6% who do agree are not a random sample of the population. They're a self-selected sample of people who are willing to spend 20 minutes answering questions from a stranger on the phone. And those people may systematically differ from the other 94% in ways that affect their political opinions.
This is non-response bias, and it's enormous.
The Cell Phone Problem
For decades, polling relied on landline telephones. Random digit dialing of landline numbers was a good approximation of random sampling because almost every household had a landline, and phone numbers were tied to geographic areas (so you could target specific regions).
But cell phones changed everything. By the 2020s, a large proportion of adults — especially younger adults — had abandoned landlines entirely. Calling only landlines systematically missed younger, more urban, and more mobile populations. Pollsters adapted by including cell phones in their samples, but cell phone samples are harder to work with: numbers aren't tied to geographic areas, calling cell phones has different legal restrictions, and people are less likely to answer unknown numbers on their cell phones.
The Rise of Online Polling
Many polling organizations shifted to online panels — pre-recruited groups of people who agree to take surveys. This solved the response rate problem (panel members agreed to participate), but introduced a different set of issues:
- Coverage bias: People without internet access are excluded
- Self-selection: People who join survey panels differ from those who don't
- Attention and quality: Some panel members rush through surveys without reading carefully
Online polls can produce good results if the panels are carefully constructed and weighted, but they're not probability samples in the classical sense. The margin of error formula assumes probability sampling; applying it to a non-probability online panel is, at best, an approximation.
Anatomy of a Polling Miss
Let's walk through how sampling problems compound to produce an incorrect prediction.
Step 1: The Sample Isn't Quite Random
A polling firm sends out 50,000 text invitations to participate in a survey. About 3,000 people respond. Already, the 3,000 responders differ from the 47,000 non-responders in unknown ways. Maybe more politically engaged people are more likely to respond. Maybe people who feel strongly about the current political moment respond more readily. Maybe people who distrust institutions are less likely to participate in surveys.
Step 2: Weighting Corrects Some Biases but Not All
The firm compares its 3,000 respondents to the known demographics of the population. There are too many college graduates (40% in the sample vs. 33% in the population), too few Hispanic respondents, and the age distribution is skewed older. The firm applies weights to correct these imbalances — giving each non-college-graduate's response a bit more influence and each college-graduate's response a bit less.
This helps. But it only corrects for biases along the dimensions you weight on. If non-response is related to something you don't measure or weight by — say, social trust, or community engagement, or whether someone has ever been surveyed before — the weighting won't fix it.
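This limit of weighting can be demonstrated directly. In the sketch below (all rates invented for illustration), response propensity depends on both education, which we weight on, and an unmeasured "social trust" trait, which we don't, so a residual bias survives the weighting.

```python
import numpy as np

np.random.seed(1)

# Invented population of 100,000 voters with two traits: education
# (measured, available for weighting) and social trust (unmeasured).
n_pop = 100_000
grad = np.random.rand(n_pop) < 0.33
trust = np.random.rand(n_pop) < 0.50

# Support for Candidate A depends on both traits (rates are illustrative)
support = np.random.rand(n_pop) < (0.45 + 0.06 * grad + 0.06 * trust)
true_mean = support.mean()

# Response propensity: graduates and high-trust people respond more often
responded = np.random.rand(n_pop) < (0.02 + 0.02 * grad + 0.04 * trust)

# Weight respondents so the education mix matches the population
resp_grad = grad[responded]
w = np.where(resp_grad,
             grad.mean() / resp_grad.mean(),
             (1 - grad.mean()) / (1 - resp_grad.mean()))

raw = support[responded].mean()
weighted = np.average(support[responded], weights=w)

print(f"True support:    {true_mean:.1%}")
print(f"Raw respondents: {raw:.1%}")
print(f"After weighting: {weighted:.1%}")  # trust-driven bias survives
```

Weighting pulls the education mix back to the population's 33% graduate share, but because high-trust people both respond more often and (in this example) support Candidate A more, the weighted estimate typically remains above the truth.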
Step 3: The "Margin of Error" Is Calculated as If the Sample Were Random
The firm reports: "48% of likely voters support Candidate A, with a margin of error of ±2.5 percentage points." That ±2.5 comes from the standard error formula you learned in this chapter: SE = √(p(1-p)/n).
But that formula assumes a simple random sample. The actual sample is a weighted, post-stratified, non-probability-ish sample of self-selected respondents. The formula's margin of error captures the sampling variability (the randomness from who happened to respond) but not the bias (the systematic tendency for responders to differ from non-responders).
Let's quantify this with a simulation:
import numpy as np
np.random.seed(42)
# True population: 100,000 voters, 51% support Candidate A
n_pop = 100000
true_support = 0.51 # True value we're trying to estimate
population = np.random.binomial(1, true_support, n_pop)
# Scenario 1: Perfect random sample
n_sample = 1000
random_sample = np.random.choice(population, size=n_sample, replace=False)
random_estimate = random_sample.mean()
random_se = np.sqrt(random_estimate * (1 - random_estimate) / n_sample)
random_moe = 1.96 * random_se
print("=== Perfect Random Sample ===")
print(f"Estimate: {random_estimate:.1%}")
print(f"Margin of error: ±{random_moe:.1%}")
print(f"True error: {abs(random_estimate - true_support):.1%}")
# Scenario 2: Biased sample (responders are 3 percentage points
# more likely to support Candidate A)
bias = 0.03
biased_support = true_support + bias
biased_sample = np.random.binomial(1, biased_support, n_sample)
biased_estimate = biased_sample.mean()
biased_se = np.sqrt(biased_estimate * (1 - biased_estimate) / n_sample)
biased_moe = 1.96 * biased_se
print("\n=== Biased Sample (3-point response bias) ===")
print(f"Estimate: {biased_estimate:.1%}")
print(f"Reported margin of error: ±{biased_moe:.1%}")
print(f"True error: {abs(biased_estimate - true_support):.1%}")
print(f"Bias is {abs(biased_estimate - true_support) / biased_moe:.1f}x "
f"the reported MOE")
The biased sample's reported margin of error looks fine — about ±3.1%. But the actual error hovers around the 3-point bias (plus or minus sampling noise) and can easily exceed the reported margin. The formula gives a false sense of precision.
Step 4: The Reported Margin of Error Doesn't Capture the Real Uncertainty
This is the fundamental problem. When a poll says "±3 points," that number is a lower bound on the true uncertainty. The real uncertainty includes:
- Sampling variability (what the formula measures)
- Non-response bias (who didn't answer?)
- Measurement error (did the question wording influence responses?)
- Weighting model error (did we weight on the right variables?)
- Likely voter model error (did we correctly identify who would actually vote?)
- Late changes in opinion (people who changed their mind after the poll)
Some analyses have suggested that the "total survey error" for a typical pre-election poll is roughly 2-3 times the reported margin of error. A poll reporting ±3% might have real uncertainty of ±6-9%.
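One back-of-the-envelope way to read that multiplier: since the margin of error scales as 1/√n, inflating the margin by a factor k is equivalent to shrinking the sample to an effective size of n/k². The poll size and support rate below are illustrative.

```python
import numpy as np

n = 1000          # illustrative poll size
p = 0.48          # illustrative support rate
reported_moe = 1.96 * np.sqrt(p * (1 - p) / n)

for k in (2, 3):  # rough total-survey-error multipliers from the text
    total_moe = k * reported_moe
    # Effective n: the random-sample size whose honest MOE equals total_moe
    n_eff = n / k**2
    print(f"k={k}: total MOE ±{total_moe:.1%}, effective n ≈ {n_eff:.0f}")
```

A 1,000-person poll with total survey error at 3x its reported margin carries roughly the information of a clean random sample of about 111 people.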
How Pollsters Try to Fix Things
The polling industry is well aware of these problems and has responded with increasingly sophisticated methods:
Multi-mode surveys: Combining phone calls, text messages, online panels, and even door-to-door interviews to reach different populations through different channels.
Advanced weighting: Weighting not just on demographics but on variables like past voting behavior, political engagement, and geographic characteristics.
Aggregation models: Sites like FiveThirtyEight and The Economist combine multiple polls, weight them by methodology and track record, and produce aggregated forecasts that are more stable than any individual poll.
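The core of such aggregation can be sketched as an inverse-variance weighted average: polls with smaller standard errors count for more, and the combined estimate has a smaller standard error than any single poll. Real aggregators additionally adjust for house effects, recency, and methodology; the four polls below are invented.

```python
import numpy as np

# Invented polls: (estimate for Candidate A, sample size)
polls = [(0.49, 800), (0.52, 1200), (0.47, 600), (0.51, 2000)]

est = np.array([p for p, _ in polls])
n = np.array([m for _, m in polls])
var = est * (1 - est) / n           # sampling variance of each poll

w = 1 / var                          # inverse-variance weights
avg = np.sum(w * est) / np.sum(w)    # combined estimate
se = np.sqrt(1 / np.sum(w))          # SE of the combined estimate

print(f"Aggregate: {avg:.1%} ± {1.96 * se:.1%}")
```

Note that this only shrinks sampling variability; if all four polls share the same non-response bias, the aggregate inherits it in full.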
Probabilistic forecasting: Instead of reporting "Candidate A leads by 3 points," reporting "Candidate A has a 70% chance of winning." This better communicates the uncertainty — though the public often misinterprets "70% chance" as "guaranteed."
The Lessons for Data Science
This case study isn't really about politics. It's about a set of principles that apply to every sampling problem you'll encounter:
Lesson 1: The Margin of Error Is a Floor, Not a Ceiling
The reported margin of error captures only one source of uncertainty — sampling variability. Real-world data collection introduces additional sources of error (bias, measurement error, processing error) that the formula doesn't see.
For your work: Whenever you compute a confidence interval, ask yourself: "What sources of error are not captured by this formula?" Be honest about the limitations.
Lesson 2: Non-Response Is the Silent Killer
When 90%+ of your target population doesn't respond, the 10% who do are not random. This applies far beyond polling:
- Survey research: Who responds to your customer satisfaction survey?
- Medical studies: Who volunteers for a clinical trial?
- Data collection: Which countries report vaccination data to the WHO?
For your work: Always ask: "Who is missing from my data, and how might they differ from those who are present?"
Lesson 3: Bigger Isn't Better If the Methodology Is Flawed
The Literary Digest lesson from 1936 keeps repeating. A biased sample of 2 million is worse than a well-designed sample of 2,000. Large datasets from convenience sources (website traffic, social media, app usage) can produce very precise but very wrong estimates.
For your work: Before computing standard errors and confidence intervals, ensure that your sample is plausibly representative of the population you're studying.
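The Literary Digest arithmetic can be checked with a quick simulation. Here a well-designed random sample of 2,000 is compared against a convenience sample of 2 million whose respondents lean 2 points away from the truth (the lean is invented for illustration).

```python
import numpy as np

np.random.seed(7)
true_support = 0.51
reps = 1_000

# Well-designed random samples of n=2,000: typical absolute error
small = np.random.binomial(2_000, true_support, reps) / 2_000
small_err = np.abs(small - true_support).mean()

# One huge convenience sample of n=2,000,000 whose respondents lean
# 2 points away from the truth (the bias figure is illustrative)
big = np.random.binomial(1, true_support - 0.02, 2_000_000).mean()
big_err = abs(big - true_support)

print(f"n=2,000 random:     typical error {small_err:.2%}")
print(f"n=2,000,000 biased: error         {big_err:.2%}")
```

The huge sample's error converges almost exactly to its 2-point bias, while the small random sample's typical error is well under a point: precision cannot rescue a biased design.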
Lesson 4: Transparency About Uncertainty Is a Feature, Not a Bug
The most trustworthy polls are the ones that honestly communicate their limitations. The most trustworthy data analyses do the same.
For your work: Report confidence intervals, not just point estimates. Discuss potential biases. Explain what your analysis can and cannot tell you. Your audience will trust you more, not less, for being honest about what you don't know.
Connecting to the Progressive Project
When you compute confidence intervals for vaccination rates by region in the progressive project, remember that your data has its own "non-response" problem: not all countries report vaccination data, and the ones that don't report are likely the ones with the weakest health systems and potentially the lowest vaccination rates. Your confidence intervals capture sampling variability, but they don't capture this systematic gap.
An honest analysis would note: "These confidence intervals reflect uncertainty from sampling, but they likely overestimate global vaccination coverage because countries with the weakest reporting systems — and potentially the lowest coverage — are underrepresented in the data."
That kind of transparent limitation statement is what separates good data science from misleading data science.
Discussion Questions
- Sampling design: If you were designing a national health survey and your budget allowed you to contact 5,000 households, how would you design the sampling plan to minimize bias? What demographic, geographic, and socioeconomic variables would you stratify on?
- Ethical considerations: Some polling experts have argued that publishing poll results can itself affect elections (by discouraging supporters of the trailing candidate from voting). Should polls be published before elections? Where do you draw the line between informing the public and influencing the outcome?
- Lessons for data science: Think of a dataset you've worked with (or one you plan to work with). Who or what is "missing" from that data? How might the missing data bias your conclusions? What would you do about it?
Key Takeaway: The precision of a confidence interval is only as good as the quality of the sample it's built on. Formulas calculate uncertainty from randomness, but they can't detect or correct for bias. The most important question is always: "Is my sample representative of the population I care about?"