Chapter 28 Exercises: The Modern Data-Driven Campaign

Conceptual Exercises

Exercise 28.1 — The Data Audit

Imagine you have just joined a competitive congressional campaign as a junior analyst. Your first task is a data audit. List the five data sources you would check first, explain what you would be looking for in each, and describe one specific quality problem that would concern you most in each source. Present your audit plan as a short memo addressed to the campaign manager.

Exercise 28.2 — Voter File Comparison

Using publicly available information about Catalist, i360, and TargetSmart, compare the three vendors across the following dimensions: (a) partisan alignment, (b) geographic coverage, (c) types of data enrichment offered, and (d) typical client profile. Identify one campaign scenario in which each vendor would be the most appropriate choice.

Exercise 28.3 — The Pipeline Problem

A campaign's analytics director discovers midway through October that canvass data from two field regions has been misrouted: the data was entered into VAN but has not been syncing to the modeling team's database. The canvass covers approximately 15,000 voter contacts over six weeks. Describe: (a) the immediate operational impact, (b) the impact on the campaign's models, and (c) a remediation plan that minimizes further data loss without disrupting field operations.

Exercise 28.4 — Universe Segmentation Design

A statewide campaign has a registered voter file of 3.2 million voters. Design a universe segmentation for the campaign's GOTV program. Specify: (a) how many tiers you would create, (b) what criteria you would use to assign voters to each tier (support score ranges, turnout score ranges), (c) what resource allocation you would recommend for each tier, and (d) what you would do differently for persuasion vs. mobilization targeting.

Exercise 28.5 — Technology Philosophy Debate

Construct the strongest possible case for Jake Rourke's hybrid approach — the argument that experienced local political knowledge should systematically override model outputs in specific circumstances. Then construct the strongest possible rebuttal from Nadia Osei's perspective. Which argument do you find more persuasive, and what evidence would you need to adjudicate between them?

Applied Exercises

Exercise 28.6 — Historical Case Study

Select one of the following campaign analytics case studies and write a 600–800 word analysis: (a) the 2008 Obama campaign's data operation, (b) the 2012 Obama campaign's "Cave" analytics team, (c) the 2016 Clinton campaign's data operation and its post-election critiques, or (d) the 2022 midterm cycle's key data innovations. Your analysis should address what the campaign did, what worked, and what (if anything) failed.

Exercise 28.7 — Dashboard Design

Design a weekly analytics dashboard for a statewide campaign manager who is not a data expert. The manager needs to make resource allocation decisions every Monday morning. Specify: (a) which five metrics you would include (justify each), (b) how you would visualize each metric, (c) what thresholds or benchmarks you would show alongside each metric, and (d) what information you would deliberately exclude to avoid information overload.

Exercise 28.8 — The Overfit Problem

A campaign analytics team is presenting its support score model. The model was trained on the last four election cycles in the state and achieves 82% accuracy in predicting voter behavior on the training data. In testing on held-out data from previous cycles, it achieves 76% accuracy. When the team discusses the current cycle, however, there are signs that the electorate is behaving differently than in past cycles on two specific demographic dimensions. Write a memo from the analytics director to the campaign manager explaining: (a) what these numbers mean in plain language, (b) what the current-cycle anomaly suggests about model reliability, and (c) what additional data or tests would help assess the model's current-cycle accuracy.
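To build intuition for the scenario in this exercise, the following is a minimal simulation sketch. Everything in it — the coefficient values, sample sizes, and the sign-flip used to represent the two shifted demographic dimensions — is an illustrative assumption, not material from the chapter. It shows how a model that matches past-cycle relationships can look fine on held-out past data yet degrade when part of the electorate's behavior changes:

```python
import math
import random

def simulate(n, coefs, rng):
    """Draw n synthetic voters whose behavior is driven by the given coefficients."""
    data = []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in coefs]          # demographic features
        logit = sum(c * xi for c, xi in zip(coefs, x))
        y = 1 if rng.random() < 1 / (1 + math.exp(-logit)) else 0
        data.append((x, y))
    return data

def accuracy(model_coefs, data):
    """Share of voters whose behavior the model's sign prediction gets right."""
    hits = sum(
        (sum(c * xi for c, xi in zip(model_coefs, x)) > 0) == (y == 1)
        for x, y in data
    )
    return hits / len(data)

rng = random.Random(42)
past = [1.5, 1.0, 0.8, 0.5]        # relationships the model learned from old cycles
current = [1.5, 1.0, -0.8, -0.5]   # two dimensions now cut the other way (assumed shift)

past_acc = accuracy(past, simulate(5000, past, rng))     # held-out past-cycle data
curr_acc = accuracy(past, simulate(5000, current, rng))  # current-cycle data
print(f"held-out past-cycle accuracy: {past_acc:.2f}")
print(f"current-cycle accuracy:       {curr_acc:.2f}")
```

The held-out accuracy stays close to what training suggested, while current-cycle accuracy drops noticeably — the pattern the memo in this exercise needs to explain in plain language.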

Discussion Exercises

Exercise 28.9 — The Ethics of Voter File Commercialization

The voter file is created through public democratic administration and maintained at public expense. It is then enriched, packaged, and sold commercially by firms like Catalist and i360. In small groups or in writing, discuss: Is this arrangement appropriate? Who benefits from the current system? Who bears costs? What alternative arrangements are possible, and what would each alternative mean for campaign practice?

Exercise 28.10 — Who Gets Counted?

Nadia's universe segmentation, like all targeting universes, will produce a list of voters who receive campaign contact and voters who do not. Think through the following: Which categories of voters are most likely to be systematically excluded from campaign contact in a typical statewide race? What are the demographic and geographic correlates of exclusion? What, if anything, should campaigns do differently to address systematic non-contact of particular communities?

Quantitative Exercise

Exercise 28.11 — Match Rate Calculation

A campaign receives a new consumer data file containing 850,000 records that it wants to match to its 2.1 million voter file. After the matching process, 612,000 records are successfully matched. (a) Calculate the match rate. (b) The campaign's analytics director considers anything below 65% a quality concern. Is this match rate concerning? (c) Among the unmatched records, the campaign suspects that recent movers are disproportionately represented. What additional data would you collect to test this hypothesis, and how would you interpret a positive finding?
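Parts (a) and (b) can be checked with a few lines of arithmetic; the figures below come directly from the exercise, and the 65% threshold is the director's stated benchmark:

```python
matched = 612_000     # consumer records linked to a voter-file record
incoming = 850_000    # total records in the consumer data file

match_rate = matched / incoming
print(f"Match rate: {match_rate:.1%}")  # 72.0%

threshold = 0.65  # director's quality-concern cutoff
print("quality concern" if match_rate < threshold else "above the 65% threshold")
```

Note that the rate is computed against the incoming file's 850,000 records, not the 2.1 million-record voter file — the voter file's size matters for coverage, but not for the match rate itself.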

Exercise 28.12 — Score Distribution Analysis

A campaign's support score model produces scores ranging from 0 to 100 for 1.8 million registered voters. The distribution is roughly normal with a mean of 48 and standard deviation of 16. The campaign wants to define its "persuasion universe" as voters with support scores between 40 and 60. (a) Approximately what percentage of the voter file falls in this range? (b) If the campaign can afford to contact 200,000 voters in the persuasion universe, what additional criteria would you use to prioritize within that universe? (c) How would you validate that voters in the 40–60 range are genuinely more persuadable than those outside it?
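Part (a) is a standard normal-distribution calculation: the 40–60 band corresponds to z-scores of (40 − 48)/16 = −0.5 and (60 − 48)/16 = 0.75. A quick sketch using the stated mean and standard deviation:

```python
from statistics import NormalDist

scores = NormalDist(mu=48, sigma=16)  # distribution given in the exercise

# Share of the file with support scores between 40 and 60
share = scores.cdf(60) - scores.cdf(40)
print(f"Share in 40-60 band: {share:.1%}")  # about 46.5%

# Scaled to the 1.8 million-voter file
universe = share * 1_800_000
print(f"Approximate persuasion universe: {universe:,.0f} voters")
```

The implied universe of roughly 840,000 voters is about four times the 200,000-contact budget in part (b), which is exactly why the exercise asks for additional prioritization criteria within the band.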