Chapter 30 Key Takeaways: Field Experiments in Politics

The Fundamental Logic of Causal Inference

1. Observational comparisons of contacted versus non-contacted voters are not valid estimates of canvassing effects. Campaigns contact voters who are already more likely to support the candidate and turn out. Comparing contacted voters to all non-contacted voters conflates the causal effect of contact with the selection effect of who gets contacted. This confounding makes it impossible to recover the true effect from observational data alone.

2. Random assignment creates the counterfactual that observational data cannot provide. When voters are randomly assigned to receive contact (treatment) or not (control), the groups are, in expectation, equivalent on all characteristics before the experiment. Any subsequent difference in outcomes can be attributed to the treatment, not to pre-existing differences. This is the fundamental logic of experimental causal inference.
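The contrast in points 1 and 2 can be made concrete with a small simulation. All numbers here are hypothetical (a 3-point true effect, propensities uniform on 0.2–0.8, and a campaign that contacts only high-propensity voters): the naive observational comparison wildly overstates the effect, while the randomized comparison recovers it.

```python
import random

random.seed(42)

# Hypothetical population: each voter has a baseline turnout propensity.
# High-propensity voters are the ones campaigns tend to contact.
N = 100_000
TRUE_EFFECT = 0.03  # contact raises turnout probability by 3 points (assumed)

voters = [random.uniform(0.2, 0.8) for _ in range(N)]

def votes(p):
    return random.random() < p

# Observational world: the campaign contacts voters with high propensity.
contacted = [p for p in voters if p > 0.5]
not_contacted = [p for p in voters if p <= 0.5]
obs_treat = sum(votes(p + TRUE_EFFECT) for p in contacted) / len(contacted)
obs_ctrl = sum(votes(p) for p in not_contacted) / len(not_contacted)
naive_estimate = obs_treat - obs_ctrl  # conflates selection with causation

# Experimental world: contact is randomly assigned with probability 1/2.
treat, ctrl = [], []
for p in voters:
    (treat if random.random() < 0.5 else ctrl).append(p)
exp_treat = sum(votes(p + TRUE_EFFECT) for p in treat) / len(treat)
exp_ctrl = sum(votes(p) for p in ctrl) / len(ctrl)
experimental_estimate = exp_treat - exp_ctrl  # recovers ~TRUE_EFFECT

print(f"naive: {naive_estimate:.3f}, experimental: {experimental_estimate:.3f}")
```

The naive comparison mostly measures who was contacted, not what contact did; randomization breaks that link by construction.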

3. The ITT and LATE answer different questions, both of which matter. The intent-to-treat (ITT) estimate — the difference in outcomes between treatment-assigned and control-assigned groups — answers the campaign's operational question: "If I allocate canvassing resources to this universe, what gain in turnout should I expect?" The local average treatment effect (LATE) — the ITT divided by the contact rate — answers the researcher's mechanism question: "What does actual contact do to a contacted voter's probability of voting?" Both are valid and valuable; they serve different purposes.
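The ITT/LATE arithmetic can be sketched with hypothetical counts (the 35% contact rate and turnout figures below are illustrative, not results from any study in the chapter):

```python
# Hypothetical experiment results (all numbers illustrative):
n_treat, n_ctrl = 10_000, 10_000
votes_treat, votes_ctrl = 4_810, 4_600   # turnout counts by *assignment*
contacted = 3_500                        # treatment voters actually reached

itt = votes_treat / n_treat - votes_ctrl / n_ctrl  # 0.481 - 0.460 = 0.021
contact_rate = contacted / n_treat                 # 0.35
late = itt / contact_rate                          # 0.021 / 0.35 = 0.060

print(f"ITT:  {itt:.3f} (gain per voter assigned to the program)")
print(f"LATE: {late:.3f} (gain per voter actually contacted)")
```

Note that the ITT uses everyone assigned to treatment, contacted or not; dividing by the contact rate rescales it to a per-contact effect without reintroducing selection bias.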

Experimental Design

4. Blocked randomization improves precision without sacrificing validity. Stratifying the sample on key background variables (past turnout history, support score tier, geography) and randomizing within strata guarantees balance on those variables and reduces outcome variance, producing more precise treatment effect estimates. In political experiments where effects are small and samples must be large, this precision gain is worth the added design complexity.
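A minimal sketch of blocking, using hypothetical turnout-history and support tiers: form strata from the blocking variables, then randomize within each stratum, so treatment and control counts in every stratum differ by at most one.

```python
import random
from collections import defaultdict

random.seed(7)

# Hypothetical voter file: (voter_id, turnout_history_tier, support_tier)
voters = [(i,
           random.choice(["0of4", "1-2of4", "3-4of4"]),
           random.choice(["low", "mid", "high"]))
          for i in range(1_000)]

# Form strata from the blocking variables.
strata = defaultdict(list)
for voter_id, history, support in voters:
    strata[(history, support)].append(voter_id)

# Randomize within each stratum: shuffle, then split in half.
assignment = {}
for ids in strata.values():
    random.shuffle(ids)
    half = len(ids) // 2
    for vid in ids[:half]:
        assignment[vid] = "treatment"
    for vid in ids[half:]:
        assignment[vid] = "control"
```

Balance on the blocking variables is now guaranteed by construction rather than merely expected, which is where the variance reduction comes from.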

5. Cluster randomization prevents spillover at a cost to statistical power. Randomizing entire precincts, blocks, or geographic clusters rather than individual voters prevents contamination of control groups through social-network transmission and incidental canvasser contact. But it requires more units (clusters) to achieve the same statistical power as individual-level randomization, because outcomes within clusters are correlated. The right level of clustering depends on the specific spillover pathways of the treatment.
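The power cost of clustering is usually summarized with the standard design effect formula, DEFF = 1 + (m − 1) × ICC, where m is the average cluster size and ICC is the intracluster correlation of the outcome. The cluster size, ICC, and individual-level sample requirement below are hypothetical:

```python
import math

def design_effect(cluster_size, icc):
    """Standard design effect: DEFF = 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc

# Hypothetical precinct-level GOTV design: 400 voters per precinct
# and a modest ICC of 0.02 for turnout.
deff = design_effect(400, 0.02)                # 1 + 399 * 0.02 = 8.98
n_individual = 6_000                           # assumed n per arm without clustering
n_clustered = math.ceil(n_individual * deff)   # effective requirement per arm

print(f"design effect: {deff:.2f}")
print(f"voters per arm with clustering: {n_clustered}")
```

Even a small ICC is costly when clusters are large, which is why the choice of clustering level should be driven by actual spillover pathways rather than administrative convenience.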

6. Statistical power requires large samples in political experiments. Political behavior effects are small relative to their variance. Detecting a 2–3 percentage point canvassing effect with 80% power requires thousands of voters per arm. The discipline of power analysis before running an experiment prevents the costly mistake of running an underpowered study that cannot distinguish "the treatment doesn't work" from "the experiment wasn't large enough to detect a real effect."
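A back-of-the-envelope power calculation using the normal approximation for two proportions (the 45% baseline turnout and 2.5-point effect are illustrative) shows why "thousands per arm" is the right order of magnitude:

```python
import math

def n_per_arm(p_control, effect, z_alpha=1.960, z_beta=0.842):
    """Two-proportion sample size via the normal approximation.

    Defaults: z_alpha for a two-sided 5% test, z_beta for 80% power.
    """
    p_treat = p_control + effect
    variance = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

# Detecting a 2.5-point lift from a hypothetical 45% baseline turnout:
n = n_per_arm(0.45, 0.025)
print(f"required n per arm: {n}")  # on the order of 6,000 per arm
```

Halving the detectable effect roughly quadruples the required sample, since the effect enters the denominator squared; this is why underpowered studies are such a common and expensive mistake.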

7. Operational fidelity is as important as statistical design. The most elegant randomization scheme fails if canvassers contact control-group voters, supervisors allocate better canvassers to treatment precincts, or protocol breaches contaminate the comparison. The gap between experimental design and execution is where most political field experiments encounter their largest problems. Trish McGovern's rigorous protocol management — canvassers blind to control lists, daily monitoring of approach flags, swift protocol breach response — illustrates what operational integrity looks like in practice.

Major Research Findings

8. Personal canvassing is the most effective voter contact mode on a per-contact basis. Meta-analyses find average canvassing effects of approximately 2–4 percentage points per contact, with substantial variation by population, canvasser quality, and context. Neighbor-to-neighbor canvassing (using geographically proximate volunteers) produces larger effects than stranger canvassing.

9. Social pressure mail can produce large effects but carries real backlash risks. The Gerber-Green-Larimer (2008) finding of an 8-point turnout effect from neighbor-comparison mailers has been replicated in various forms. However, recipients of the full neighbor-naming version often report anger that reduces their subsequent engagement with the sending organization. Modified versions (community norms without specific neighbor identification) produce smaller effects with less backlash.

10. Phone banking effects have declined substantially; text banking has somewhat higher efficacy. The collapse in answer rates for landline and cell phone calls has reduced the per-contact effectiveness of phone banking to approximately 0.5–1.5 percentage points. Text banking produces somewhat larger effects (1–2 percentage points), particularly among younger voters, at lower per-contact cost.

11. Direct mail effects are small but real and cost-competitive. Each piece of direct mail produces turnout effects in the range of 0.3–0.7 percentage points. At typical mail costs, this is competitive on a cost-per-vote basis, particularly for high-volume programs targeting low-propensity supporters.
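The cost-per-vote logic behind point 11 is simple division; the per-piece cost below is an assumed figure, not one from the chapter:

```python
# Cost-per-vote arithmetic for direct mail (dollar figure hypothetical):
cost_per_piece = 0.75      # printing + postage, assumed
effect_per_piece = 0.005   # a 0.5-point turnout effect, from the 0.3-0.7 range

pieces_per_vote = 1 / effect_per_piece            # 200 pieces per net vote
cost_per_vote = cost_per_piece * pieces_per_vote  # $150 per net vote

print(f"pieces per net vote: {pieces_per_vote:.0f}")
print(f"cost per net vote: ${cost_per_vote:.0f}")
```

The same division applies to any contact mode, which is how small per-contact effects can still win the cost-per-vote comparison when the per-contact cost is low.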

12. Persuasion effects are smaller and more context-dependent than mobilization effects. Experiments that test whether voter contact changes vote choice (rather than just turnout) generally find much smaller effects than GOTV experiments — often statistically indistinguishable from zero. Moving voters across partisan lines is genuinely difficult, and no single contact mode reliably produces large persuasion effects.

Non-Experimental Alternatives

13. Regression discontinuity exploits threshold-based treatment to approximate random assignment near the cutoff. Comparing outcomes for units just above and just below a treatment threshold (where assignment changes discontinuously) provides credible causal estimates for the units near the threshold. The method works for political questions with natural thresholds — close election winners versus losers, campaign finance reporting cutoffs, competitive race designations — but its estimates apply only near the threshold, not to the full distribution.
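A sketch of a sharp RD estimate on simulated data (the cutoff, jump, and bandwidth are all hypothetical): fit a local linear regression on each side of the cutoff and take the difference of the fitted values at the threshold.

```python
import random

random.seed(3)

# Hypothetical sharp RD: an outcome with a smooth trend in the running
# variable (vote share) plus a discontinuous jump at a 50% cutoff.
CUTOFF, JUMP, BANDWIDTH = 0.50, 0.08, 0.05

data = []
for _ in range(5_000):
    x = random.uniform(0.3, 0.7)
    y = 0.4 * x + (JUMP if x >= CUTOFF else 0.0) + random.gauss(0, 0.02)
    data.append((x, y))

def fitted_value_at(points, at):
    """Least-squares fit of y = a + b*(x - at); return a, the fit at x = at."""
    zs = [x - at for x, _ in points]
    ys = [y for _, y in points]
    n = len(points)
    mz, my = sum(zs) / n, sum(ys) / n
    b = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
         / sum((z - mz) ** 2 for z in zs))
    return my - b * mz

left = [(x, y) for x, y in data if CUTOFF - BANDWIDTH <= x < CUTOFF]
right = [(x, y) for x, y in data if CUTOFF <= x <= CUTOFF + BANDWIDTH]
rd_estimate = fitted_value_at(right, CUTOFF) - fitted_value_at(left, CUTOFF)
print(f"RD estimate of the jump: {rd_estimate:.3f}")  # close to JUMP = 0.08
```

Only observations inside the bandwidth contribute, which is the code-level expression of the caveat in the text: the estimate is credible near the threshold and silent everywhere else.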

14. Difference-in-differences requires strong assumptions about counterfactual trends. Comparing outcome changes over time between treatment and comparison groups can produce credible causal estimates when the parallel trends assumption holds — when, absent the treatment, the two groups would have changed at the same rate. This assumption should always be interrogated explicitly, because treatments are often applied to groups that were already on different trajectories.
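The DiD estimator itself is a double subtraction; with hypothetical group means:

```python
# Hypothetical turnout means: a mail program applied to treated
# precincts between two elections (all numbers illustrative).
treated_pre, treated_post = 0.42, 0.48
comparison_pre, comparison_post = 0.40, 0.44

# DiD nets out both the fixed gap between groups and the shared time
# trend -- but only under the parallel-trends assumption.
did = (treated_post - treated_pre) - (comparison_post - comparison_pre)
print(f"DiD estimate: {did:.3f}")  # 0.06 - 0.04 = 0.02
```

The subtraction is mechanical; the causal claim rests entirely on the assumption that the comparison group's 4-point change is what the treated group would have experienced without the program.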

Ethics and Practice

15. Field experiments in political contexts require ethical attention, even when legally permissible. The absence of an IRB requirement does not establish ethical permissibility. Experiments should be evaluated on: the harm potential of the treatment (are you treating voters differently in ways they would object to?), the adequacy of the comparison between treated and control conditions (is being in the control group a meaningful deprivation?), and the use of findings (do the findings benefit those who were subjected to the experiment, or only those who funded it?).

16. Experimental findings should be applied to new contexts with explicit attention to generalizability. Effect sizes from one election, population, and treatment context do not automatically translate to another. The 2.5-point meta-analytic canvassing estimate is an average across diverse conditions; any specific application may produce larger or smaller effects. Context-specific experimentation — as Meridian conducts for CEF — is more informative than applying literature averages.

17. Prediction and explanation are complementary, not competing. Campaign targeting models predict who will vote and who will support the candidate — enabling efficient resource allocation. Field experiments explain why interventions change behavior — enabling principled choice among contact strategies. The two approaches work together: experiments generate the causal knowledge that makes targeted programs effective; targeting ensures that experimentally validated strategies reach the right voters. Neither alone is sufficient for a well-designed campaign.