Case Study 8.2: The 1936 Election and the Birth of Scientific Polling
Context
The 1936 presidential election produced what remains the single most dramatic demonstration of the difference between large but biased samples and small but representative ones. Understanding this episode in detail — not just its punchline — teaches lessons about sampling theory that are as relevant today as they were ninety years ago.
The story has three protagonists: The Literary Digest, the established polling institution of its era with a twenty-year track record of correct presidential predictions; George Gallup, a young researcher with a new, scientifically grounded approach; and the American electorate, emerging from the depths of the Great Depression with a political geography that didn't look like anything the Digest's frame could capture.
The Literary Digest's Method
The Literary Digest had developed its straw poll methodology in the 1910s and 1920s, refining it across four successful presidential election cycles. The approach rested on scale: send as many questionnaires as possible to as many people as possible, and let the sheer volume of responses average out any individual biases. By 1936, the operation involved mailing roughly ten million postcard ballots to names drawn from three sources: automobile registrations, telephone directories, and the Digest's own subscriber list.
In 1936, the Digest mailed approximately 10 million postcards and received 2.4 million back — a return rate of roughly 24%. Based on these responses, they projected Landon would win with 57% of the popular vote.
Roosevelt won with 61%.
Why the Digest Was Wrong
Post-election analysis identified two distinct but reinforcing sources of error.
Coverage bias: Automobile registration and telephone directory frames systematically excluded lower-income Americans. In 1936, the United States was in the sixth year of the Great Depression. Unemployment had reached 25% at its peak in 1932-33 and was still around 17% in 1936. Car ownership and telephone service were markers of middle-class or upper-class economic status — exactly the groups most resistant to Roosevelt's New Deal economic policies. The rural poor, the urban working class, and the unemployed — the core of Roosevelt's coalition — simply weren't on the mailing lists.
This is not a subtle bias. Estimates suggest that the Digest's frame covered roughly 60-65% of the U.S. adult population, and that the uncovered 35-40% were disproportionately Democratic. The coverage gap put a ceiling on the Roosevelt support the Digest's estimate could register, regardless of how many postcards were returned.
Nonresponse bias: Even among those who received postcards, return rates were not uniform. Evidence from later analyses suggests that Landon supporters were somewhat more likely to return their ballots than Roosevelt supporters — possibly because higher-income households, more likely to be Landon supporters, had more leisure time and saw more value in civic participation. This differential nonresponse amplified the coverage bias rather than correcting for it.
The combination was catastrophic. A large, precisely wrong estimate is worse than a smaller, appropriately uncertain one.
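The two mechanisms can be made concrete with a toy simulation. The numbers below are illustrative assumptions chosen to echo the 1936 episode (61% true Roosevelt support, a frame covering about 62% of the electorate that skews against Roosevelt supporters, and a slightly higher return rate among Landon supporters); they are not historical microdata.

```python
import random

random.seed(1936)

# Hypothetical electorate of 100,000 voters; 61% support Roosevelt
# (the actual 1936 popular-vote share). 1 = Roosevelt, 0 = Landon.
N = 100_000
electorate = [1 if random.random() < 0.61 else 0 for _ in range(N)]

# Coverage bias: the mailing frame reaches roughly 62% of adults, and
# the uncovered share leans pro-Roosevelt. Model this by making frame
# membership less likely for Roosevelt supporters (assumed rates).
frame = [v for v in electorate if random.random() < (0.50 if v else 0.82)]

# Nonresponse bias: Landon supporters return ballots a bit more often
# (assumed return rates of 28% vs 22%).
returns = [v for v in frame if random.random() < (0.22 if v else 0.28)]

print(f"true Roosevelt share:          {sum(electorate) / N:.3f}")
print(f"big biased poll (n={len(returns)}):   {sum(returns) / len(returns):.3f}")

# A far smaller simple random sample drawn from the full electorate:
srs = random.sample(electorate, 1000)
print(f"small random sample (n=1000):  {sum(srs) / len(srs):.3f}")
```

Under these assumptions the biased "poll" lands in the low 40s for Roosevelt — roughly the Digest's 57%-for-Landon projection — while the thousand-person random sample sits near the true 61%, despite being orders of magnitude smaller.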
What Gallup Did Differently
George Gallup had been developing a different approach at the American Institute of Public Opinion, which he founded in 1935. His method drew heavily on quota sampling — not probability sampling in the modern sense, but a systematic attempt to select respondents proportionate to the demographic composition of the electorate.
Gallup assigned interviewers specific quotas: interview a certain number of urban/rural respondents, a certain number of men/women, a certain number of upper/middle/lower income respondents, until the aggregate sample composition matched known Census proportions. Unlike the Digest's passive mail approach, Gallup's method actively sought respondents from the demographic groups that mattered.
With a sample of approximately 50,000 — one-fiftieth the size of the Digest's — Gallup correctly predicted Roosevelt's win. He also published a prediction of the Literary Digest's error before the Digest's own results were released, based on his analysis of their methodology. When the election results confirmed his critique, it established the scientific polling movement's credibility in a single news cycle.
What Gallup Still Got Wrong
It is worth noting that Gallup's method was also flawed by modern standards. Quota sampling is not probability sampling: it gives each respondent a known demographic classification, but it does not give each member of the population a known selection probability. Interviewers could fill their quotas with whichever respondents were convenient — urban, accessible, English-speaking, willing to talk to a stranger.
Gallup's method worked reasonably well through 1944. In 1948, quota sampling failed spectacularly: all major polling organizations predicted Dewey would defeat Truman, partly because they stopped polling too early (missing a late swing toward Truman) and partly because quota sampling was overrepresenting accessible, higher-income respondents even within demographic groups. The 1948 debacle led directly to the adoption of probability sampling methods in professional survey research.
The Methodological Revolution
The period from 1936 to 1952 represents the foundational era of scientific political polling. Several methodological advances from this period remain the basis of modern practice:
Probability sampling replaced quota sampling as the theoretical foundation for inference. The work of Leslie Kish at the University of Michigan Survey Research Center, building on Jerzy Neyman's earlier sampling theory, established the mathematical framework that survives today.
Question wording standardization emerged from comparative studies showing that different question formulations produced different results. The commitment to standardized, pretested questions became a professional norm.
The interview as social interaction was recognized as a source of systematic measurement error. Interviewer effects — the influence of an interviewer's race, gender, or communication style on respondent answers — were documented empirically and became the subject of methodological research.
The Contemporary Parallel
There is a discomfiting parallel between the Literary Digest's methodology and contemporary online opt-in polling. The Digest's frame was biased toward automobile owners and telephone subscribers — the internet-engaged, survey-taking population of the 1930s. Online opt-in panels are biased toward internet-engaged, survey-taking adults — today's equivalent.
The Digest relied on scale (10 million postcards) to compensate for frame bias. Some contemporary online pollsters rely on scale (large panel sizes, sophisticated weighting) to compensate for self-selection bias. As in 1936, scale does not cure coverage bias; it only produces a more precisely wrong answer.
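The "more precisely wrong" claim has a simple quantitative form: total root-mean-square error decomposes as RMSE = sqrt(bias² + variance), and sample size shrinks only the variance term. A minimal sketch, assuming an illustrative 5-point frame bias on a 50% proportion:

```python
import math

# Illustrative assumptions: estimating a true proportion p = 0.50 from
# a frame whose respondents lean 5 points away from the population.
p, bias = 0.50, 0.05

for n in (1_000, 100_000, 10_000_000):
    se = math.sqrt(p * (1 - p) / n)       # sampling variability
    rmse = math.sqrt(bias**2 + se**2)     # total error: bias persists
    print(f"n={n:>10,}  SE={se:.4f}  RMSE={rmse:.4f}")
```

As n grows from a thousand to ten million, the standard error collapses toward zero while the RMSE flattens out at the bias floor of 0.05: the estimate becomes ever more precise around the wrong value.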
The key methodological question for contemporary polling is not whether online panels can be made to work — under some conditions, with appropriate weighting, they can. The question is whether the conditions are typically met, and whether pollsters are sufficiently transparent about the cases where they are not.
Trish McGovern, who keeps the Digest's final poll result framed in her office, offers the practical lesson: "The question isn't whether you have a lot of responses. The question is whether the people who responded are the people you needed to hear from."
Discussion Questions
- The Literary Digest had a successful track record before 1936. Why did their method work in 1920, 1924, 1928, and 1932, but fail in 1936? What changed about the electorate that exposed the latent bias in their methodology?
- Gallup's quota sampling was not probability sampling, yet it produced better results than the Digest's massive sample. What does this tell us about the relative importance of representativeness versus sample size?
- The 1948 polling failure came from different sources than the 1936 Literary Digest failure. Compare the two errors: what do they have in common, and where do they differ? What methodological lessons follow from each?
- The "contemporary parallel" argument holds that online opt-in polling resembles the Digest's convenience sampling in key ways. Is this analogy fair? What are the strongest arguments for and against the claim that modern weighting methods can overcome self-selection bias in online panels?
- If Gallup had been wrong in 1936 and the Digest had been right, how might the history of political polling have unfolded? What does the 1936 episode suggest about the role of high-profile successes and failures in shaping professional methodology norms?