Chapter 33 Exercises

DataField.Dev


All exercises assume you have a working Python environment with pandas, numpy, matplotlib, plotly, and scipy installed. Solutions are provided in code/exercise-solutions.py.

Exercise 1: Pace Sensitivity Analysis

The Garza campaign's voter contact program is currently running at 87% of the pace needed to reach its 87,000 contact goal by Election Day.

Part A — What-if analysis: Write code that computes the projected final contact count under four scenarios: current pace unchanged, +10% pace increase, +20% increase, and +50% increase. For each scenario, report:
- Final contacts by Election Day
- Whether the goal is reached
- If the goal is reached, how many days before Election Day it is reached
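A minimal sketch of the Part A projection, assuming the (boosted) daily pace stays constant for every remaining day. The helper name project_scenarios and the inputs (40,000 contacts so far, 50 days left) are hypothetical illustrations, not taken from exercise-solutions.py.

```python
import math

def project_scenarios(total_contacted, daily_pace, days_remaining, goal,
                      increases=(0.0, 0.10, 0.20, 0.50)):
    """Project Election Day contact totals for each fractional pace increase.

    Hypothetical helper: assumes the boosted daily pace holds constant
    for every remaining day.
    """
    needed = goal - total_contacted  # contacts still required
    results = []
    for inc in increases:
        pace = daily_pace * (1 + inc)
        final = total_contacted + pace * days_remaining
        reached = final >= goal
        # Days needed to close the gap at this pace, rounded up to whole days
        days_early = days_remaining - math.ceil(needed / pace) if reached else None
        results.append({
            "increase": inc,
            "final_contacts": round(final),
            "goal_reached": reached,
            "days_before_election": days_early,
        })
    return results

# Hypothetical inputs: 40,000 contacts so far and 50 days left, so the required
# pace is (87,000 - 40,000) / 50 = 940 per day; the current pace is 87% of that.
for row in project_scenarios(40_000, 0.87 * 940, 50, 87_000):
    print(row)
```

Rounding the days-to-goal up with math.ceil treats the goal as reached only at the end of a full day of canvassing.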

Part B — Break-even calculation: Write a function find_breakeven_pace_increase(total_contacted, remaining, days_remaining) that computes the minimum percentage increase in daily contact pace required to hit the goal exactly by Election Day.
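One possible sketch of the break-even calculation. The exercise's signature does not say how the current pace is obtained, so this version assumes that remaining is the number of contacts still needed and adds a hypothetical days_elapsed argument so the current pace can be taken as the average to date. The ratio of required pace to current pace, minus one, is the minimum fractional increase.

```python
def find_breakeven_pace_increase(total_contacted, remaining, days_remaining,
                                 days_elapsed):
    """Smallest fractional pace increase that finishes exactly on Election Day.

    Sketch assumptions: `remaining` is the number of contacts still needed,
    and the current pace is the average to date, total_contacted / days_elapsed.
    `days_elapsed` is a hypothetical extra argument, not part of the exercise's signature.
    """
    current_pace = total_contacted / days_elapsed
    required_pace = remaining / days_remaining
    # Already at or above the required pace: no increase needed
    return max(0.0, required_pace / current_pace - 1)

# Hypothetical inputs: 40,000 contacts over 50 days, 47,000 still needed in 50 days
print(f"{find_breakeven_pace_increase(40_000, 47_000, 50, 50):.1%}")
```

If "87% of the pace needed" is read as the current pace being 0.87 times the pace required from today onward, the same ratio reduces to 1 / 0.87 - 1 ≈ 0.149, an increase of about 14.9%, regardless of the raw counts.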

Part C — Visualization: Plot the cumulative contact curves for all four scenarios on a single chart, with the goal line marked. Color the scenarios that hit the goal in green and those that fall short in red.
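A sketch of the Part C chart with matplotlib, under the same constant-pace assumption as Part A. The function name plot_scenarios and its arguments are hypothetical; the Agg backend line only makes the code run without a display and can be dropped in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove in an interactive session
import matplotlib.pyplot as plt
import numpy as np

def plot_scenarios(total_contacted, daily_pace, days_remaining, goal,
                   increases=(0.0, 0.10, 0.20, 0.50)):
    """Cumulative contact curves per scenario: green if the goal is hit, red if not."""
    days = np.arange(days_remaining + 1)  # day 0 = today
    fig, ax = plt.subplots(figsize=(8, 5))
    for inc in increases:
        curve = total_contacted + daily_pace * (1 + inc) * days
        color = "green" if curve[-1] >= goal else "red"
        label = "current pace" if inc == 0 else f"+{inc:.0%} pace"
        ax.plot(days, curve, color=color, label=label)
    # Dashed horizontal goal line
    ax.axhline(goal, color="black", linestyle="--", label=f"Goal ({goal:,})")
    ax.set_xlabel("Days from today")
    ax.set_ylabel("Cumulative contacts")
    ax.legend()
    return fig, ax
```

For example, plot_scenarios(40_000, 0.87 * 940, 50, 87_000) draws the four curves for the hypothetical inputs used above; call fig.savefig(...) to export the chart.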

Discussion: The campaign's field director estimates that adding 5 volunteers per canvassing shift would increase daily contacts by approximately 15%. Based on your analysis, would this be sufficient? What would she need to commit to in order to close the gap entirely?

Exercise 2: Equity-Weighted Prioritization

The standard voter contact prioritization model in this chapter maximizes contact efficiency for persuadable swing voters. This exercise asks you to modify the prioritization model to explicitly account for mobilization of low-turnout base voters.

Part A: Define a new prioritization component called mobilization_need that takes a value of 1 for voters who meet all three criteria: (a) support score above 65, (b) vote frequency of 0 or 1, (c) age 18-35. Give this component a weight of 20% in the overall priority score, redistributing weight from the persuadability component.
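A minimal pandas sketch of the Part A change. It assumes hypothetical column names (support_score on a 0-100 scale, vote_frequency, age, and component columns such as persuadability already scaled to 0-1) and a hypothetical base weight dictionary; the chapter's actual components and weights may differ.

```python
import pandas as pd

def add_mobilization_need(df):
    """Flag likely supporters (score > 65) who rarely vote (0 or 1) and are 18-35."""
    need = (
        (df["support_score"] > 65)
        & df["vote_frequency"].isin([0, 1])
        & df["age"].between(18, 35)  # inclusive on both ends
    )
    return df.assign(mobilization_need=need.astype(int))

def equity_weighted_priority(df, base_weights, equity_weight=0.20):
    """Move `equity_weight` from persuadability to the new mobilization_need component.

    `base_weights` is a hypothetical {column: weight} dict for 0-1 scaled components.
    """
    weights = dict(base_weights)
    weights["persuadability"] -= equity_weight
    if weights["persuadability"] < 0:
        raise ValueError("persuadability weight cannot go negative")
    weights["mobilization_need"] = equity_weight
    df = add_mobilization_need(df)
    score = sum(w * df[col] for col, w in weights.items())
    return df.assign(priority_score=score)
```

Because the 20% is taken from persuadability rather than spread across all components, the total weight stays the same and the scores remain comparable to the standard model's.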

Part B: Compare the top 5,000 priority contacts from the standard model to the top 5,000 from the equity-weighted model. Report:
- How many voters appear in both lists
- The demographic profile (by race/ethnicity and age cohort) of voters who are added by the equity-weighted model
- The demographic profile of voters who are dropped
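A sketch of the Part B comparison using set operations on the index of each top-N list. The column names race_ethnicity and age_cohort, and the function name compare_top_n, are hypothetical; each profile is reported as within-group shares.

```python
import pandas as pd

def compare_top_n(df, standard_col, equity_col, n=5000,
                  profile_cols=("race_ethnicity", "age_cohort")):
    """Overlap, added/dropped voters, and their demographic shares."""
    top_std = set(df.nlargest(n, standard_col).index)
    top_eq = set(df.nlargest(n, equity_col).index)
    added = df.loc[sorted(top_eq - top_std)]    # only in the equity-weighted list
    dropped = df.loc[sorted(top_std - top_eq)]  # only in the standard list
    profiles = {
        col: pd.DataFrame({
            "added": added[col].value_counts(normalize=True),
            "dropped": dropped[col].value_counts(normalize=True),
        }).fillna(0.0)
        for col in profile_cols
    }
    return len(top_std & top_eq), added, dropped, profiles
```

Note that nlargest breaks ties by position by default, so voters with identical scores at the cutoff may move in or out of a list arbitrarily.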

Part C: Write a function compute_expected_value(priority_list, vote_multiplier=1.0) that estimates the expected number of additional votes generated by contacting the top N voters on the priority list, given an assumed contact-to-vote conversion rate. Compare the expected value of the standard vs. equity-weighted top 5,000 under different assumptions about conversion rates.
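A minimal sketch of Part C under loudly stated assumptions: every voter in the top N is contacted, the list is already sorted by priority, and conversion rates differ by a hypothetical segment column. The conversion_rates and top_n arguments are illustrative additions to the exercise's signature, and vote_multiplier is kept as a plain scale factor on the result.

```python
import pandas as pd

def compute_expected_value(priority_list, conversion_rates, top_n=5000,
                           vote_multiplier=1.0):
    """Expected additional votes from contacting the top `top_n` voters.

    Hypothetical inputs: `priority_list` is sorted by priority and has a
    `segment` column; `conversion_rates` maps segment -> contact-to-vote rate.
    """
    top = priority_list.head(top_n)
    per_voter = top["segment"].map(conversion_rates).fillna(0.0)  # unknown segment -> 0
    return vote_multiplier * per_voter.sum()

# Hypothetical rate assumptions to compare across both lists
scenarios = {
    "persuasion-favoring": {"persuadable": 0.05, "base": 0.03},
    "mobilization-favoring": {"persuadable": 0.03, "base": 0.06},
}
```

With a single constant rate the two lists would always tie, so the comparison is only informative when rates vary by segment; looping over scenarios shows how sensitive the ranking of the two models is to those assumptions.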

Discussion: Adaeze Nwosu argues that "the analytical methods should be available to any organization doing legitimate civic work — how you use them is a political choice." Do you agree that the choice of prioritization weights is a political choice rather than a technical one? What would it mean to make this choice more explicitly?

Exercise 3: Canvasser Performance Analysis

The Garza campaign's 40 active canvassers have varying performance records. The campaign data director wants to identify high and low performers to improve coaching and resource allocation.

Part A: Using the synthetic canvasser data generated in exercise-solutions.py, compute the following metrics for each canvasser:
- Total contacts
- Contacts per hour
- Positive outcome rate ((confirmed + soft support) / total attempted)
- Percent of contacts in persuadable segments (targeting quality)
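A groupby sketch of the four Part A metrics. It assumes one row per door attempt with hypothetical columns canvasser_id, contacted (bool), outcome (labels such as "confirmed" and "soft_support") and segment, plus a separate Series of hours worked per canvasser; the synthetic data in exercise-solutions.py may be shaped differently.

```python
import pandas as pd

POSITIVE = {"confirmed", "soft_support"}  # hypothetical outcome labels

def canvasser_metrics(attempts, hours):
    """Per-canvasser metrics from one row per door attempt.

    `hours` is a hypothetical Series of hours worked, indexed by canvasser_id.
    """
    df = attempts.assign(
        positive=attempts["outcome"].isin(POSITIVE),
        persuadable_contact=attempts["contacted"] & (attempts["segment"] == "persuadable"),
    )
    m = df.groupby("canvasser_id").agg(
        attempted=("outcome", "count"),
        total_contacts=("contacted", "sum"),
        positives=("positive", "sum"),
        persuadable_contacts=("persuadable_contact", "sum"),
    )
    m["contacts_per_hour"] = m["total_contacts"] / hours.reindex(m.index)
    m["positive_rate"] = m["positives"] / m["attempted"]  # denominator: all attempts
    m["pct_persuadable"] = m["persuadable_contacts"] / m["total_contacts"]
    return m
```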

Part B: Create a composite efficiency score that weights these four metrics. Justify your weighting choices. Who are the top three and bottom three performers?
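One way to build the composite, sketched here with min-max scaling so that metrics on different scales (counts, rates, shares) contribute comparably. The weights in the example are hypothetical placeholders for the ones you justify, not recommended values.

```python
import pandas as pd

def composite_score(metrics, weights):
    """Min-max scale each weighted metric to 0-1, then take the weighted sum."""
    cols = list(weights)
    lo, hi = metrics[cols].min(), metrics[cols].max()
    rng = (hi - lo).replace(0, 1.0)  # a constant column scales to 0 instead of NaN
    scaled = (metrics[cols] - lo) / rng
    score = sum(weights[c] * scaled[c] for c in cols)
    return metrics.assign(composite=score).sort_values("composite", ascending=False)

# Hypothetical weights; replace with your own justified choices
WEIGHTS = {"positive_rate": 0.35, "pct_persuadable": 0.25,
           "contacts_per_hour": 0.25, "total_contacts": 0.15}
```

After ranking, .head(3) and .tail(3) give the top and bottom three performers. Min-max scaling is sensitive to a single extreme canvasser; ranks or z-scores are alternatives if outliers dominate.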

Part C: Test whether there is a statistically significant difference in positive outcome rate between canvassers who score in the top quartile on contacts per hour and those who score in the bottom quartile. (Hint: use a chi-squared test or Fisher's exact test for 2x2 contingency tables.) What does this relationship — if any — tell you about the speed-quality tradeoff in canvassing?
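A sketch of the Part C test with scipy.stats, assuming a per-canvasser table with hypothetical columns contacts_per_hour, positives and attempted. It pools attempts within each speed quartile into a 2x2 table (positive vs. not positive) and runs both tests named in the hint.

```python
import pandas as pd
from scipy.stats import chi2_contingency, fisher_exact

def speed_quality_test(metrics):
    """Chi-squared and Fisher tests: top vs. bottom speed quartile, positive rate."""
    q1, q3 = metrics["contacts_per_hour"].quantile([0.25, 0.75])
    fast = metrics[metrics["contacts_per_hour"] >= q3]
    slow = metrics[metrics["contacts_per_hour"] <= q1]
    # Rows: fast, slow; columns: positive, not positive
    table = [
        [fast["positives"].sum(), fast["attempted"].sum() - fast["positives"].sum()],
        [slow["positives"].sum(), slow["attempted"].sum() - slow["positives"].sum()],
    ]
    chi2, p_chi2, dof, _ = chi2_contingency(table)  # Yates correction applies for 2x2
    _, p_fisher = fisher_exact(table)
    return table, chi2, p_chi2, p_fisher
```

Pooling treats every door as independent, although outcomes from the same canvasser are correlated, so these p-values are likely optimistic; comparing per-canvasser rates between the two groups is a more conservative check.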

Part D: Build a visualization that allows a field director to quickly identify canvassers who are fast but low quality, high quality but slow, and the ideal high quality / reasonable speed combination. (Suggestion: a scatter plot of contacts per hour vs. conversion rate, colored by targeting quality.)
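A sketch of the suggested scatter plot, plus a helper that labels each canvasser by a median split on speed and quality. The median cutoffs, the quadrant labels, and the column names (contacts_per_hour, positive_rate, pct_persuadable) are illustrative assumptions; a field director might prefer fixed targets instead of medians.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove in an interactive session
import matplotlib.pyplot as plt
import pandas as pd

def label_quadrants(metrics):
    """Median split on speed and quality (hypothetical cutoffs)."""
    fast = metrics["contacts_per_hour"] >= metrics["contacts_per_hour"].median()
    good = metrics["positive_rate"] >= metrics["positive_rate"].median()
    labels = pd.Series("slow, low quality", index=metrics.index)
    labels.loc[fast & ~good] = "fast, low quality"
    labels.loc[~fast & good] = "high quality, slow"
    labels.loc[fast & good] = "fast, high quality"
    return metrics.assign(quadrant=labels)

def speed_quality_scatter(metrics):
    """Contacts per hour vs. positive rate, colored by targeting quality."""
    fig, ax = plt.subplots(figsize=(7, 5))
    pts = ax.scatter(metrics["contacts_per_hour"], metrics["positive_rate"],
                     c=metrics["pct_persuadable"], cmap="viridis")
    fig.colorbar(pts, ax=ax, label="Share of contacts in persuadable segments")
    # Median guide lines split the plot into the four quadrants
    ax.axvline(metrics["contacts_per_hour"].median(), color="grey", linestyle=":")
    ax.axhline(metrics["positive_rate"].median(), color="grey", linestyle=":")
    for name, row in metrics.iterrows():
        ax.annotate(str(name), (row["contacts_per_hour"], row["positive_rate"]), fontsize=7)
    ax.set_xlabel("Contacts per hour")
    ax.set_ylabel("Positive outcome rate")
    return fig, ax
```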

Exercise 4: Contact Script A/B Test

The Garza campaign ran a field experiment in which canvassers used one of three contact scripts: a control script (standard economic message) and two treatment scripts (healthcare-focused and candidate biography). Canvassers were randomly assigned to scripts.

Part A: Using the A/B test data in exercise-solutions.py, compute the conversion rate (positive outcomes / total contacts) for each script version.
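A minimal sketch of the per-script conversion rate, assuming one row per contact with hypothetical script and outcome columns, where "confirmed" and "soft_support" count as positive outcomes.

```python
import pandas as pd

POSITIVE = {"confirmed", "soft_support"}  # hypothetical outcome labels

def script_conversion_rates(contacts):
    """Positive outcomes / total contacts for each script version."""
    positive = contacts["outcome"].isin(POSITIVE)
    return (contacts.assign(positive=positive)
            .groupby("script")["positive"]
            .agg(positives="sum", contacts="count", conversion_rate="mean"))
```

Keeping the raw positives and contacts alongside the rate makes the counts available for the significance tests in Part B.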

Part B: Test whether the differences in conversion rates are statistically significant using:
1. A chi-squared test across all three versions
2. Pairwise tests comparing each treatment to the control

Report the chi-squared statistic, degrees of freedom, p-value, and your conclusion about statistical significance at the 0.05 level.
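A sketch of both tests using scipy.stats.chi2_contingency on (positive, not positive) counts. The counts dictionary format is a hypothetical input, and the Bonferroni adjustment on the pairwise p-values is an optional addition, not something the exercise requires.

```python
from scipy.stats import chi2_contingency

def script_significance(counts, control="control"):
    """Omnibus and pairwise chi-squared tests.

    Hypothetical input: `counts` maps script name -> (positive outcomes, total contacts).
    """
    def row(name):
        pos, n = counts[name]
        return [pos, n - pos]  # [positive, not positive]

    # 3 x 2 table across all scripts: dof = (3 - 1) * (2 - 1) = 2
    chi2, p, dof, _ = chi2_contingency([row(s) for s in counts])
    omnibus = {"chi2": chi2, "dof": dof, "p": p}

    treatments = [s for s in counts if s != control]
    pairwise = {}
    for s in treatments:
        c2, p2, _, _ = chi2_contingency([row(control), row(s)])
        # Bonferroni: scale by the number of comparisons, capped at 1
        pairwise[s] = {"chi2": c2, "p": p2,
                       "p_bonferroni": min(1.0, p2 * len(treatments))}
    return omnibus, pairwise
```

For the 2x2 pairwise tables, chi2_contingency applies the Yates continuity correction by default, which makes those p-values slightly conservative.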

Part C: Compute 95% confidence intervals for the conversion rate in each script version. Visualize the confidence intervals on a single chart. Based on whether the intervals overlap, can you confidently say that any script outperforms the others?
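The exercise does not specify an interval method; this sketch uses the Wilson score interval, a standard choice for proportions that behaves better than the normal (Wald) approximation at small counts or at rates near 0 or 1. The function name wilson_interval is illustrative.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half
```

To chart the intervals, pass each script's rate with asymmetric error bars (rate - low, high - rate) to matplotlib's ax.errorbar. Keep in mind that overlap is a conservative check: two 95% intervals can overlap even when a direct test of the difference is significant at the 0.05 level.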

Part D: The campaign wants to know which script to use for the remaining 35 days. Write a brief (150-word) recommendation memo to the campaign manager. Your memo should address: (a) the statistical evidence, (b) any important caveats about interpreting the results, and (c) your specific recommendation.

Exercise 5: Week-over-Week KPI Report

Campaign managers need to track whether their programs are improving over time, not just whether they're on pace to hit the final goal.

Part A: Write a function compute_weekly_kpis(df) that groups contact records by campaign week and computes the following for each week:
- Total contacts
- Contacts per day (average for that week)
- Percent of contacts in persuadable segments
- Average support score of contacted voters
- Positive outcome conversion rate
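A minimal sketch of compute_weekly_kpis, assuming one row per contact attempt with hypothetical columns date (datetime), contacted (bool), segment, support_score and positive (bool). It also assumes week 1 starts on the earliest date in the data and that "contacts per day" divides by the days that had activity in that week.

```python
import pandas as pd

def compute_weekly_kpis(df):
    """Weekly KPIs; week 1 = the first 7 days starting at the earliest date."""
    start = df["date"].min()
    d = df.assign(
        week=(df["date"] - start).dt.days // 7 + 1,
        persuadable=df["segment"] == "persuadable",
    )
    contacts = d[d["contacted"]]  # completed contacts only
    weekly = contacts.groupby("week").agg(
        total_contacts=("contacted", "count"),
        active_days=("date", "nunique"),
        pct_persuadable=("persuadable", "mean"),
        avg_support_score=("support_score", "mean"),
        conversion_rate=("positive", "mean"),
    )
    weekly["contacts_per_day"] = weekly["total_contacts"] / weekly["active_days"]
    return weekly
```

Dividing by active days rather than 7 keeps a partial first or final week from looking artificially slow; use 7 instead if idle days should count against the pace.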

Part B: Compute the week-over-week percent change for each KPI. Which KPIs are trending in the right direction? Which are declining?
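A short sketch of the week-over-week change: each week's value divided by the previous week's, minus one, expressed in percent. The function name is illustrative; the first week has no predecessor and is left as NaN.

```python
import pandas as pd

def week_over_week_change(weekly):
    """Percent change from the previous week for every KPI column."""
    weekly = weekly.sort_index()
    return (weekly / weekly.shift(1) - 1) * 100
```

Which direction counts as "right" is a judgment per KPI: a rising average support score among contacted voters, for instance, could mean canvassers are drifting toward friendly doors rather than reaching persuadables.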

Part C: Build a 4-panel matplotlib chart showing the week-over-week trend for: total weekly contacts, percent persuadable, average support score, and conversion rate. Add a linear regression trend line to each panel and indicate the slope and R-squared value.
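The trend-line part of Part C can be sketched with scipy.stats.linregress, regressing each KPI on the week number; R-squared is the square of the returned correlation coefficient. The function name trend_stats is illustrative, and the input is assumed to be a weekly KPI table indexed by week number.

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress

def trend_stats(weekly):
    """Slope, intercept and R-squared of each KPI column against week number."""
    x = np.asarray(weekly.index, dtype=float)
    out = {}
    for col in weekly.columns:
        res = linregress(x, weekly[col].to_numpy(dtype=float))
        out[col] = {"slope": res.slope, "intercept": res.intercept,
                    "r_squared": res.rvalue ** 2}
    return out
```

In each panel, the trend line is ax.plot(x, s["intercept"] + s["slope"] * x), and the slope and R-squared can go in the panel title. With only a handful of campaign weeks, both numbers are noisy and worth reading cautiously.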

Part D: Adaeze Nwosu argues that "measurement shapes reality" — that tracking specific KPIs changes the behavior of the people being measured. For each of the four KPIs in Part C, describe a specific way that canvassers or phone bankers might optimize for that KPI in ways that inflate the metric without actually improving campaign performance. How would you design the KPI framework to reduce these perverse incentives?