Case Study 4.2: The Turnout Model's Missing Universe
Background
The Garza campaign's field operation runs on a turnout model: a scored voter file that assigns every registered voter in the state a score from 0 to 100, representing the estimated probability of casting a ballot in the November general election. Voters with high turnout propensity scores receive minimal campaign resources (they will vote regardless); voters with very low scores also receive minimal resources (they are unlikely to vote regardless of contact). The campaign concentrates its organizing dollars on voters in the middle range: those likely to vote if contacted and motivated, but unlikely to vote if ignored.
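To make the targeting rule concrete, here is a minimal sketch in Python. The band cutoffs of 20 and 80 are illustrative assumptions; the case does not specify the campaign's actual thresholds.

```python
# A minimal sketch of score-band targeting, assuming a hypothetical scored
# voter file with one turnout score (0-100) per voter. The cutoffs of 20
# and 80 are illustrative, not the campaign's actual values.

def allocation_band(score: int) -> str:
    """Classify a voter by turnout score for field targeting."""
    if score >= 80:
        return "high"        # likely votes regardless -> minimal contact
    if score <= 20:
        return "low"         # unlikely to vote even if contacted -> minimal contact
    return "persuadable"     # middle range -> concentrate organizing dollars here

voters = [("A-1023", 91), ("B-4410", 12), ("C-7781", 53), ("D-2205", 45)]
targets = [vid for vid, score in voters if allocation_band(score) == "persuadable"]
print(targets)  # ['C-7781', 'D-2205']
```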
Nadia inherited this model from the previous analytics director. It was built using historical voting data from 2018, 2020, and 2022, cross-referenced with demographic variables from the voter file and a commercial data vendor. The model was validated against 2022 general election results, where 71% of voters scored above 50 went on to vote. (The campaign calls this "predictive accuracy," but strictly speaking it is the model's precision over the above-50 group, not overall accuracy.)
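The sketch below pins down how that validation metric is computed. The arrays are hypothetical stand-ins; the point is only that the 71% figure is measured over the above-50 group.

```python
# Hypothetical validation check against 2022 results: of voters the model
# scored above 50, what share actually voted? (This is precision over the
# above-50 group, not overall accuracy.) The data below are stand-ins.
import numpy as np

scores = np.array([72, 55, 88, 41, 63, 30, 95, 58, 66])  # model scores (0-100)
voted  = np.array([ 1,  0,  1,  0,  1,  1,  1,  1,  0])  # 1 = cast a 2022 ballot

predicted_voters = scores > 50
precision = voted[predicted_voters].mean()
print(f"{precision:.0%} of voters scored above 50 actually voted")  # 71%
```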
Three weeks before Election Day, a major problem surfaces.
The New Registrant Problem
The Garza campaign's Latino community organizing program, run in partnership with several community organizations, has driven 67,000 new voter registrations since January — a remarkable result that represents a significant expansion of the electorate. These new registrants are disproportionately young (median age 29), Latino, and first-time voters with no voting history.
The problem: because these voters have no voting history, the turnout model cannot score them reliably. The model was built on the assumption that past voting behavior is the strongest predictor of future voting behavior — which is true for habitual voters but provides no information about first-time registrants. The model assigns all 67,000 new registrants a default score of 45, placing them in the "moderate propensity" category by convention.
This default score is almost certainly wrong for a large share of these voters, but the error could run in either direction. Community organizers report high enthusiasm among new registrants; their own informal assessments suggest turnout propensity is much higher than 45. But first-time voters' actual turnout has historically run notoriously below their self-reported enthusiasm: enthusiasm surveys tend to overestimate first-timer turnout by 15 to 25 percentage points.
The Analytical Decision
The campaign must decide how to allocate its remaining $300,000 in field resources. There are three options:
Option A: Trust the model. Score all new registrants at 45 (moderate propensity) and allocate field resources accordingly — roughly proportional to new registrant volume in each region, but without prioritizing them over scored voters in the same range.
Option B: Trust the organizers. Treat new registrants as high-propensity voters based on community organizer reports of enthusiasm. Concentrate resources on contact and mobilization of new registrants, particularly in the areas with the largest Latino communities.
Option C: Build a quick update model. Use whatever behavioral data is available on new registrants (whether they have responded to prior campaign contact, whether they attended any campaign events, whether they match names in social media engagement data) to build a rough scoring supplement, as sketched after this list. Re-score the new registrant universe before allocating resources.
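One hedged sketch of what Option C's supplement might look like, using hypothetical behavioral flags and proxy labels. The feature set, the proxy-label source, and the model choice are all assumptions; the case does not describe the campaign's actual pipeline.

```python
# A rough scoring supplement for Option C: fit a simple model on the
# behavioral signals available for new registrants. Labels would have to
# come from a proxy (e.g., turnout among the 2018 drive's registrants);
# all names and data below are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: responded_to_contact, attended_event, social_media_match
X_hist = np.array([[1, 1, 0], [1, 0, 1], [0, 0, 0], [0, 1, 0],
                   [1, 0, 0], [0, 0, 1], [0, 0, 0], [1, 1, 1]])
y_hist = np.array([1, 1, 0, 0, 1, 0, 0, 1])  # proxy turnout labels (2018 drive)

supplement = LogisticRegression().fit(X_hist, y_hist)

# Re-score the current new-registrant universe (same three flags per row).
X_new = np.array([[1, 0, 0], [0, 0, 0], [1, 1, 0]])
scores = supplement.predict_proba(X_new)[:, 1] * 100
print(scores.round(1))  # supplement scores on the 0-100 scale
```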
Nadia's Analysis
Nadia begins with the decision tree. The decision: $300,000 field resource allocation, with 21 days until Election Day. Time is the binding constraint: there is not enough of it to field a proper survey of new registrants or to run a randomized experiment on contact effects.
She focuses on what data she actually has. The campaign's own contact records show that among new registrants who have already been contacted (approximately 22,000 of the 67,000), the response rate — measured as positive engagement during door-knocking and phone banking — is 41%. Among comparably scored habitual voters in the same moderate-propensity range, the response rate during the same contact period was 34%. This is a meaningful difference, and it is available without additional data collection.
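A quick significance check supports calling this difference meaningful. The habitual-voter sample size below is an assumed stand-in, since the case reports only the new-registrant contact count:

```python
# Two-proportion z-test: is the 41% response rate among contacted new
# registrants (n = 22,000) meaningfully above the 34% rate among
# comparably scored habitual voters? The habitual-voter sample size is a
# hypothetical stand-in; the case study does not report it.
from math import sqrt

n1, p1 = 22_000, 0.41   # contacted new registrants (from the case)
n2, p2 = 50_000, 0.34   # habitual voters in the same score range (assumed n)

pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
print(f"z = {z:.1f}")   # z far above 1.96 -> difference unlikely to be noise
```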
She also pulls the 2018 data, which was the last major Democratic mobilization cycle in the state. In 2018, the campaign ran a significant new registrant drive; those new registrants ultimately voted at a rate of approximately 38%, compared to the community organizers' enthusiasm-based estimates of 60 to 65%. The base rate for first-time voter turnout — even in a favorable mobilization environment — is considerably lower than enthusiasts expect.
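The 2018 gap is consistent with, and sits at the high end of, the 15 to 25 point enthusiasm overestimate cited earlier; a quick check:

```python
# The 2018 calibration gap: organizers estimated 60-65% turnout for that
# cycle's new registrants; actual turnout was about 38%. The resulting
# 22-27 point gap sits at and slightly above the high end of the 15-25
# point overestimate typical of enthusiasm surveys.
estimated_lo, estimated_hi = 0.60, 0.65
actual = 0.38
print(f"enthusiasm overestimate: {(estimated_lo - actual) * 100:.0f} "
      f"to {(estimated_hi - actual) * 100:.0f} points")
```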
Her conclusion is nuanced. She recommends a modified version of Option C: use the contact response rate data to identify "active" new registrants (those who have already positively engaged with the campaign) and score them as high-propensity. Score the remaining "passive" new registrants — those not yet contacted, or contacted without positive response — at a conservatively adjusted 35, lower than the default 45, based on the historical first-timer turnout base rate.
This hybrid approach improves on both pure options: it does not ignore the real enthusiasm signal from the active new registrant pool, and it does not over-invest in passive new registrants who, historical patterns suggest, may not vote.
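A minimal sketch of the hybrid rule follows. The score of 70 for active registrants is an assumption (the case says only "high-propensity"); the 35 for passive registrants comes from the text, and the contact-record fields are a hypothetical schema.

```python
# Nadia's modified Option C as a scoring rule. The 70 for "active" new
# registrants is an assumed stand-in for "high-propensity"; the case
# specifies only the 35 for passive registrants.

ACTIVE_SCORE = 70   # assumed high-propensity score for engaged registrants
PASSIVE_SCORE = 35  # conservative score anchored to the 2018 ~38% base rate

def rescore_new_registrant(contacted: bool, positive_response: bool) -> int:
    """Score a new registrant from contact-record fields (hypothetical schema)."""
    if contacted and positive_response:
        return ACTIVE_SCORE   # already engaging positively with the campaign
    return PASSIVE_SCORE      # not yet contacted, or contacted without response

print(rescore_new_registrant(True, True))    # 70
print(rescore_new_registrant(False, False))  # 35
```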
The Broader Lesson: Missing Data Is Not Neutral
Nadia's analysis surfaces a broader principle that applies throughout political data work: missing data is almost never missing at random. The 67,000 new registrants are absent from the turnout model not because of a technical oversight but because they were, by definition, not registered during the periods when the historical training data was collected. Their absence from the model is systematic and correlated with their characteristics — they are younger, newer to civic participation, and differently distributed geographically than the voters the model was trained on.
When a model is applied outside its training distribution, as this model is being applied to a population it never saw, its predictions are unreliable in ways that are difficult to estimate. The 71% figure from the 2022 validation tells you nothing about performance on this new population. The model is, in statistical terms, being asked to extrapolate, and extrapolation is always more uncertain than interpolation.
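The collapse to a single default score is easy to reproduce. In the toy example below, a model trained where vote-history features carry the signal returns an identical prediction for every all-zero-history row; the data and model are illustrative, not the campaign's.

```python
# Out-of-distribution illustration: train on voters whose vote-history
# features carry the signal, then score new registrants whose history
# columns are all zero. Every such row collapses to the same prediction,
# just as the 67,000 new registrants collapsed to a single default score.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
history = rng.integers(0, 2, size=(1_000, 3))        # voted in '18/'20/'22?
turnout = (history.sum(axis=1) + rng.random(1_000) > 2).astype(int)

model = LogisticRegression().fit(history, turnout)

new_registrants = np.zeros((5, 3))                   # no vote history at all
print(model.predict_proba(new_registrants)[:, 1])    # five identical scores
```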
This principle — that models applied outside their training distribution lose calibration — is one of the most important and underappreciated limitations of applied political analytics. The campaigns and consultants who forget it tend to produce the most costly analytical failures.
Jake Rourke's Parallel Problem
Interestingly, the Whitfield campaign faces a structurally identical problem. Whitfield's populist message has attracted what Jake calls "garage-door" voters: people who have never previously participated in politics but show up at Whitfield rallies, buy campaign signs, and are intensely enthusiastic. These voters are essentially invisible in any historical turnout model. Jake's instinct is to trust the rally crowds; Nadia's analysis of a parallel problem gives him reason to be cautiously skeptical. Rally enthusiasm is a notoriously unreliable predictor of actual turnout, particularly among first-time participants, and across American political history the turnout of "enthusiasm-driven but historically non-participating" segments has rarely matched the energy they display.
Discussion Questions
1. Why is missing data "not neutral" in this context? What would it mean for data to be "missing at random," and how does the new registrant problem differ from that case?
2. Nadia uses contact response rate as a proxy for turnout propensity among new registrants. What are the strengths and limitations of this proxy? What alternative proxies might be available?
3. The community organizers' enthusiasm estimates of 60–65% turnout appear substantially higher than the historical base rate of ~38%. How should Nadia weight these two data sources? What are the different error modes of each?
4. Nadia recommends a hybrid approach (Option C modified). What risks does this approach carry that pure Options A or B would not? Is the increased complexity worth it given the 21-day time constraint?
5. The principle "models applied outside their training distribution lose calibration" has implications beyond this specific case. What other political contexts might involve applying a model to a population that is systematically different from the population on which it was validated?
Key Analytical Concepts Illustrated
- Missing not at random (MNAR): systematic absence from a dataset correlated with the phenomenon of interest
- Out-of-distribution prediction: applying a model to a population it was not trained on
- Base rate discipline: anchoring enthusiasm-based estimates to historical first-timer turnout
- Proxy measurement: using available behavioral signals as stand-ins for unobservable quantities
- Decision under uncertainty: structuring field resource allocation when model inputs are unreliable