Learning Objectives
- Design a voter contact dashboard that serves campaign decision-making needs
- Load, clean, and validate voter file data using pandas
- Compute key performance indicators for voter contact programs
- Visualize contact progress and KPI trends using matplotlib
- Build interactive geographic and demographic breakdowns using plotly
- Create a voter prioritization tool that ranks contact targets by score and persuadability
- Understand the relationship between data infrastructure and campaign strategy
- Evaluate the ethical implications of algorithmic prioritization in voter contact
In This Chapter
- The Meeting That Changed the Dashboard
- 33.1 Dashboard Design Principles for Campaign Analytics
- 33.2 Setting Up the Environment and Loading Data
- 33.3 Key Performance Indicators for Voter Contact
- 33.4 Geographic Breakdown: County-Level Analysis
- 33.5 Cumulative Progress and Pace Visualization
- 33.6 Support Score Distributions by Demographic Segment
- 33.7 Building the Prioritization Tool
- 33.8 Interactive Dashboard with Plotly
- 33.9 Nadia Presents the Dashboard
- 33.10 Adaeze Nwosu and ODA's Broader Framework
- 33.11 Generating the Daily Report
- 33.12 Ethical Dimensions: What the Dashboard Does to Democracy
- 33.13 Connecting to Live Canvassing Apps: Real-Time Data Integration
- 33.14 Dashboard Views for Different Audiences
- 33.15 Troubleshooting Common Data Quality Issues in Voter Contact Data
- 33.16 Connecting the Dashboard to Chapter Themes
- Chapter Summary
- Key Terms
Chapter 33: Building a Voter Contact Dashboard (Python Lab)
The Meeting That Changed the Dashboard
Nadia Osei had been building dashboards for three years, but she'd never built one quite like this before.
She was sitting in a conference room at the Garza campaign's field headquarters — a former insurance office, the walls still faintly marked where motivational posters had hung — across from Yolanda Torres, the campaign manager, and two regional field directors. It was five weeks before Election Day. They had 340 paid and volunteer canvassers, 18 active phone bank locations, and a voter contact goal that Nadia had been told, firmly, was non-negotiable: 87,000 unique voter contacts in the final 35 days.
Yolanda slid a printed sheet across the table. "This is what I get from the field team every morning," she said. "Doors knocked yesterday. Calls completed. Pledge cards." The sheet had three rows of numbers. "I need to know more than this."
Nadia understood immediately what Yolanda wanted. Not just what happened yesterday — but whether what was happening was enough. Whether the campaign was on track to hit 87,000. Whether the canvassers were hitting the right doors — the persuadable voters who needed contact, not the strong partisans who would vote regardless. Whether the phone bank results in Riverside County, where the race was tightest, were diverging from the results in other counties in ways that mattered.
"I can build that," Nadia said. "I need the voter file, the contact history data, and your support scores."
Yolanda looked at the data director, who nodded. "Adaeze Nwosu at ODA has a data integration framework," the data director said. "She's been offering to help us connect the pieces."
Two days later, Nadia had a meeting scheduled with Adaeze Nwosu, executive director of OpenDemocracy Analytics, and Sam Harding, ODA's data journalist. That meeting produced the dataset and the framework that this chapter's analysis is built on.
33.1 Dashboard Design Principles for Campaign Analytics
Before writing a single line of code, Nadia spent two hours on a whiteboard thinking through what the dashboard needed to accomplish. This design-first approach is not merely good practice — it is essential for campaign analytics, where the risk of building technically impressive but strategically useless tools is substantial.
The Core Questions the Dashboard Must Answer
Good dashboard design starts with the questions decision-makers need to answer, not with the data that is available. Nadia's first whiteboard list:
- Are we on track to hit 87,000 contacts by Election Day? (Progress to goal, with projected completion date at current pace)
- Where are we over- or under-performing our county-level targets? (Geographic breakdown of contacts vs. goals)
- Are we reaching the right voters? (Are contacts concentrated in high-persuadability segments, or are canvassers defaulting to easy doors?)
- What is the support score profile of the voters we've reached? (Are we convincing the persuadable, or just preaching to the choir?)
- Who should we be contacting next? (Prioritization: which voters, in which geographies, have the highest return on contact investment?)
- How are key metrics trending? (Are contacts per day improving as the campaign scales, or flattening?)
These six questions drove every design decision about the dashboard. Nadia's rule for the project: every visualization must answer at least one of these six questions. Visualizations that don't answer a question don't ship.
Data Integration: What Goes In
The Garza dashboard integrated three data sources:
The voter file (from ODA): Demographic information, party registration, vote history, geographic identifiers, and modeled scores (support score, persuadability score) for every registered voter in the relevant geographies. Nadia worked with Sam Harding to use ODA's data integration framework, which provided a clean, documented version of the state voter file with standardized field names and consistent encoding. This became oda_voters.csv.
Contact history: Every voter contact attempt (canvass door knock, phone call, text) logged by the campaign's field organization, with outcome codes (contacted, not home, refused, already voted, moved, etc.). This was extracted from the campaign's VAN (Voter Activation Network) database.
Digital engagement data: Email opens, donation history, event RSVPs, and volunteer activity from the campaign's CRM. Used primarily for segmenting supporters, not for targeting persuasion contacts.
The core of the technical challenge was joining these three sources on voter ID, handling discrepancies, and producing a unified analysis dataset. ODA's framework provided standardized voter IDs that matched across all three sources — a significant technical advantage that campaigns without ODA access typically have to solve themselves, expensively.
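In pandas terms, the join step reduces to left-merging the contact and CRM tables onto the voter file on the shared ID. A minimal sketch using hypothetical miniature tables — every column name beyond `voter_id` here is illustrative, not the actual campaign schema:

```python
import pandas as pd

# Hypothetical miniature versions of the three sources, already carrying
# standardized voter IDs of the kind ODA's framework provides.
voter_file = pd.DataFrame({
    'voter_id': ['V001', 'V002', 'V003'],
    'county': ['Riverside', 'Orange', 'Riverside'],
    'support_score': [62.0, 38.5, 51.2],
})
contact_history = pd.DataFrame({
    'voter_id': ['V001', 'V003'],
    'contact_outcome': ['Soft Support', 'Not Home'],
})
crm = pd.DataFrame({
    'voter_id': ['V002'],
    'email_opens_30d': [4],
})

# Left-join so every voter-file record survives; voters with no contact
# attempt or CRM activity simply carry NaN in those columns.
unified = (
    voter_file
    .merge(contact_history, on='voter_id', how='left')
    .merge(crm, on='voter_id', how='left')
)
print(unified)
```

The left joins matter: the voter file is the universe, and an inner join would silently drop every voter the campaign has not yet touched.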
💡 Intuition: Why Data Integration Is Harder Than It Sounds Campaign databases are notoriously messy. The voter file, the campaign VAN, and the email CRM all use different voter identifier systems. A voter might appear in the voter file as "Maria C. Rodriguez" with one address, in the contact log as "Maria Rodriguez" with a slightly different address, and in the email system by email address alone. Matching these records requires probabilistic record linkage — sophisticated deduplication that Nadia was relying on ODA's framework to have already solved. For campaigns without this infrastructure, data integration alone can consume weeks of staff time before any analysis is possible.
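For campaigns without pre-linked IDs, even a crude version of this matching illustrates the idea. The sketch below uses only the standard library: normalize names, then score a candidate pair on combined name and address similarity. The 0.8 threshold and the scoring weights are arbitrary illustrations, not production values — real record linkage uses probabilistic models over many fields.

```python
import difflib
import re

def normalize_name(name):
    """Lowercase, drop punctuation and single-letter middle initials."""
    tokens = re.sub(r'[^\w\s]', '', name.lower()).split()
    return ' '.join(t for t in tokens if len(t) > 1)

def match_score(name_a, name_b, addr_a, addr_b):
    """Crude linkage score: unweighted average of name and address similarity."""
    name_sim = difflib.SequenceMatcher(
        None, normalize_name(name_a), normalize_name(name_b)
    ).ratio()
    addr_sim = difflib.SequenceMatcher(
        None, addr_a.lower(), addr_b.lower()
    ).ratio()
    return (name_sim + addr_sim) / 2

# The "Maria C. Rodriguez" example from the intuition box above.
score = match_score(
    'Maria C. Rodriguez', 'Maria Rodriguez',
    '412 Elm St Apt 2', '412 Elm Street #2'
)
print(f"match score: {score:.2f}")  # well above an illustrative 0.8 threshold
```

Production systems add blocking (only comparing records that share, say, a ZIP code) so the comparison count stays tractable on millions of rows.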
Dashboard Architecture
Nadia planned three tiers of the dashboard:
Tier 1 (Daily summary, for Yolanda): Single-page KPI dashboard. Total contacts, contacts yesterday, pace vs. goal, county-level progress bars. Updated every morning. Designed to be read in two minutes.
Tier 2 (Weekly analytical view, for field directors): County and precinct breakdowns, support score distributions by contact status, persuadability segment analysis, contact quality metrics. Updated weekly.
Tier 3 (On-demand prioritization tool, for data director): Interactive tool for generating priority contact lists by geography, filtered by support score and persuadability, with exportable lists for VAN upload.
This architecture — tactical summaries for executives, analytical depth for field managers, operational tools for data staff — reflects a principle that Adaeze Nwosu had emphasized in the ODA framework: "The dashboard should match what each person needs to make their next decision, not show everyone everything at once."
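One way to make Nwosu's principle concrete in code is a declarative view registry keyed by audience, so each tier only ever renders what its reader needs. A hypothetical sketch — all tier, audience, and view names here are illustrative, not actual dashboard components:

```python
# Hypothetical declarative layout of the three tiers described above.
DASHBOARD_TIERS = {
    'tier1_daily_summary': {
        'audience': 'campaign manager',
        'refresh': 'daily',
        'views': ['kpi_cards', 'county_progress_bars', 'pace_vs_goal'],
    },
    'tier2_weekly_analytical': {
        'audience': 'field directors',
        'refresh': 'weekly',
        'views': ['county_precinct_breakdown', 'support_distributions',
                  'persuadability_segments', 'contact_quality'],
    },
    'tier3_prioritization_tool': {
        'audience': 'data director',
        'refresh': 'on_demand',
        'views': ['priority_list_builder', 'van_export'],
    },
}

def views_for(role):
    """Return every view a given audience should see."""
    return [v for tier in DASHBOARD_TIERS.values()
            if tier['audience'] == role
            for v in tier['views']]

print(views_for('field directors'))
```

Keeping the tier definitions as data rather than code means adding a fourth tier later is a configuration change, not a refactor.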
33.2 Setting Up the Environment and Loading Data
Let's set up the analysis environment and load the ODA voter dataset. All code in this chapter uses standard scientific Python libraries: pandas, numpy, matplotlib, and plotly.
# Install required libraries if needed:
# pip install pandas numpy matplotlib plotly scipy
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from scipy import stats
# Set consistent style for matplotlib plots
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 11
plt.rcParams['axes.titlesize'] = 14
plt.rcParams['axes.labelsize'] = 12
print("Environment ready.")
Loading and Inspecting the ODA Voter Dataset
# Load the ODA voter file
# In production, this would be a full state voter file joined with modeled scores.
# For this lab, oda_voters.csv contains a representative sample of 50,000 voters
# across the key counties in the Garza-Whitfield Senate race.
df = pd.read_csv('oda_voters.csv')
print(f"Dataset shape: {df.shape}")
print(f"\nColumn names:\n{df.columns.tolist()}")
print(f"\nData types:\n{df.dtypes}")
print(f"\nFirst 3 rows:\n{df.head(3).to_string()}")
The ODA dataset contains the following fields:
| Field | Description | Type |
|---|---|---|
| `voter_id` | Unique voter identifier | string |
| `state` | State abbreviation | string |
| `county` | County name | string |
| `age` | Voter age | integer |
| `gender` | Gender (M/F/Other/Unknown) | string |
| `race_ethnicity` | Race/ethnicity category | string |
| `education` | Education level | categorical |
| `income_bracket` | Household income bracket | categorical |
| `party_reg` | Party registration (D/R/Other/Unaffiliated) | string |
| `vote_history_2018` | Voted in 2018? (1/0) | integer |
| `vote_history_2020` | Voted in 2020? (1/0) | integer |
| `vote_history_2022` | Voted in 2022? (1/0) | integer |
| `urban_rural` | Urban/Suburban/Rural classification | string |
| `support_score` | Modeled probability of supporting Garza (0-100) | float |
| `persuadability_score` | Modeled likelihood to be persuaded (0-100) | float |
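Several of these fields are easy for pandas to mis-infer on load — IDs read as integers, 1/0 flags promoted to floats when values are missing. Pinning dtypes at read time surfaces bad values immediately instead of mid-analysis. A hedged sketch, with a two-row inline CSV standing in for oda_voters.csv and a deliberately partial dtype map:

```python
import io
import pandas as pd

# Two synthetic rows standing in for the real file; values are made up.
sample_csv = io.StringIO(
    "voter_id,county,age,party_reg,vote_history_2022,support_score\n"
    "A0001,Riverside,34,D,1,71.5\n"
    "A0002,Orange,58,Unaffiliated,0,44.2\n"
)
dtypes = {
    'voter_id': 'string',       # never let IDs become integers
    'county': 'category',
    'age': 'Int64',             # nullable integer tolerates missing ages
    'party_reg': 'category',
    'vote_history_2022': 'Int64',
    'support_score': 'float64',
}
df_typed = pd.read_csv(sample_csv, dtype=dtypes)
print(df_typed.dtypes)
```

The nullable `Int64` type is the key choice: with plain `int64`, a single missing vote-history value would force the whole column to float.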
# Basic data quality check
print("=== DATA QUALITY REPORT ===")
print(f"\nMissing values by column:")
missing = df.isnull().sum()
missing_pct = (missing / len(df) * 100).round(2)
quality_df = pd.DataFrame({'missing_count': missing, 'missing_pct': missing_pct})
print(quality_df[quality_df['missing_count'] > 0])
print(f"\nValue counts for key categorical variables:")
for col in ['party_reg', 'urban_rural', 'gender']:
print(f"\n{col}:")
print(df[col].value_counts())
print(f"\nSupport and persuadability score distributions:")
for col in ['support_score', 'persuadability_score']:
print(f"\n{col}: mean={df[col].mean():.1f}, std={df[col].std():.1f}, "
f"min={df[col].min():.1f}, max={df[col].max():.1f}")
Data Cleaning
def clean_voter_data(df):
"""
Clean and validate the ODA voter dataset.
Returns cleaned dataframe and a cleaning report.
"""
df_clean = df.copy()
report = {}
# 1. Standardize string fields
for col in ['gender', 'race_ethnicity', 'education', 'income_bracket',
'party_reg', 'urban_rural', 'county']:
df_clean[col] = df_clean[col].str.strip().str.title()
# 2. Validate numeric ranges
invalid_support = df_clean[
(df_clean['support_score'] < 0) | (df_clean['support_score'] > 100)
]
report['invalid_support_scores'] = len(invalid_support)
df_clean.loc[invalid_support.index, 'support_score'] = np.nan
invalid_persuade = df_clean[
(df_clean['persuadability_score'] < 0) |
(df_clean['persuadability_score'] > 100)
]
report['invalid_persuadability_scores'] = len(invalid_persuade)
df_clean.loc[invalid_persuade.index, 'persuadability_score'] = np.nan
# 3. Age validation (registered voters must be 18+)
invalid_age = df_clean[df_clean['age'] < 18]
report['invalid_ages'] = len(invalid_age)
df_clean = df_clean[df_clean['age'] >= 18].copy()
# 4. Validate vote history fields (should be 0 or 1)
for col in ['vote_history_2018', 'vote_history_2020', 'vote_history_2022']:
df_clean[col] = pd.to_numeric(df_clean[col], errors='coerce')
df_clean[col] = df_clean[col].where(df_clean[col].isin([0, 1]))
# 5. Create derived fields
# Voter frequency score: how many of the last 3 elections did they vote in?
df_clean['vote_frequency'] = (
df_clean['vote_history_2018'].fillna(0) +
df_clean['vote_history_2020'].fillna(0) +
df_clean['vote_history_2022'].fillna(0)
)
# Age cohort
df_clean['age_cohort'] = pd.cut(
df_clean['age'],
bins=[17, 29, 44, 59, 74, 120],
labels=['18-29', '30-44', '45-59', '60-74', '75+']
)
# Voter segment: classify by support and persuadability
def assign_segment(row):
s = row['support_score']
p = row['persuadability_score']
if pd.isna(s) or pd.isna(p):
return 'Unknown'
if s >= 60 and p >= 50:
return 'Soft Support - Persuadable'
elif s >= 60 and p < 50:
return 'Hard Support - Mobilize'
elif s < 40 and p < 50:
return 'Hard Opposition - Skip'
elif s < 40 and p >= 50:
return 'Soft Opposition - Persuadable'
else: # 40-60 support
return 'True Persuadable'
df_clean['voter_segment'] = df_clean.apply(assign_segment, axis=1)
report['records_after_cleaning'] = len(df_clean)
report['records_removed'] = len(df) - len(df_clean)
return df_clean, report
df_clean, cleaning_report = clean_voter_data(df)
print("=== CLEANING REPORT ===")
for key, value in cleaning_report.items():
print(f" {key}: {value}")
print(f"\nFinal dataset: {len(df_clean):,} voters across {df_clean['county'].nunique()} counties")
📊 Real-World Application: ODA's Data Integration Framework Adaeze Nwosu built ODA's voter data integration framework specifically because she observed that progressive campaigns in the state were each independently solving the same data cleaning problems, wasting collective staff time and producing inconsistent results. "We were watching five campaigns each spend three weeks cleaning the same voter file," Nwosu told Sam Harding in a 2024 interview. "That's 15 weeks of analytical capacity that could have been spent on actual analysis." ODA's open-source framework handles standardized encoding, voter ID crosswalks, and quality validation, and is made available to campaigns and advocacy organizations at cost. The dataset used in this chapter represents a representative sample built from ODA's production voter file integration.
33.3 Key Performance Indicators for Voter Contact
Before we can track progress, we need to define what we're tracking. Nadia and Yolanda agreed on a standard set of KPIs at the outset of the dashboard project.
# Simulate contact history data
# In production, this would come from the campaign's VAN database export.
# We generate realistic synthetic contact data joined to the voter file.
np.random.seed(42)
# Campaign parameters
CAMPAIGN_START = pd.Timestamp('2026-09-15')
TODAY = pd.Timestamp('2026-10-06') # 35 days before Election Day
ELECTION_DAY = pd.Timestamp('2026-11-10')
CONTACT_GOAL = 87_000
days_elapsed = (TODAY - CAMPAIGN_START).days + 1
days_remaining = (ELECTION_DAY - TODAY).days
# Simulate which voters have been contacted
# Campaigns typically prioritize moderate support scores and high persuadability
contact_prob = (
0.15 # base contact rate
+ 0.10 * (df_clean['persuadability_score'].fillna(50) / 100)
+ 0.05 * (df_clean['support_score'].fillna(50).between(30, 70).astype(int))
+ 0.10 * df_clean['vote_frequency'] / 3
)
contact_prob = contact_prob.clip(0.05, 0.60)
df_clean['contacted'] = np.random.binomial(1, contact_prob)
# Contact date (for those contacted)
# Generates a realistic distribution of contact dates
contact_dates = pd.date_range(CAMPAIGN_START, TODAY, freq='D')
weights = np.linspace(0.5, 1.5, len(contact_dates)) # ramping up over time
weights /= weights.sum()
df_clean['contact_date'] = np.where(
df_clean['contacted'] == 1,
np.random.choice(contact_dates, size=len(df_clean), p=weights),
pd.NaT
)
df_clean['contact_date'] = pd.to_datetime(df_clean['contact_date'])
# Contact method
df_clean['contact_method'] = np.where(
df_clean['contacted'] == 0,
'Not Contacted',
np.random.choice(
['Canvass', 'Phone', 'Text'],
size=len(df_clean),
p=[0.45, 0.35, 0.20]
)
)
# Contact outcome (for contacted voters)
def assign_outcome(row):
if row['contacted'] == 0:
return 'Not Attempted'
support = row['support_score'] if not pd.isna(row['support_score']) else 50
if support > 70:
return np.random.choice(
['Confirmed Support', 'Soft Support', 'Not Home'], p=[0.45, 0.30, 0.25]
)
elif support > 40:
return np.random.choice(
['Soft Support', 'Undecided', 'Soft Opposition', 'Not Home'],
p=[0.25, 0.30, 0.20, 0.25]
)
else:
return np.random.choice(
['Soft Opposition', 'Hard Opposition', 'Not Home'], p=[0.30, 0.40, 0.30]
)
df_clean['contact_outcome'] = df_clean.apply(assign_outcome, axis=1)
total_contacted = df_clean['contacted'].sum()
print(f"Total voters contacted: {total_contacted:,}")
print(f"Contact rate: {total_contacted/len(df_clean)*100:.1f}%")
print(f"\nContact method breakdown:")
print(df_clean[df_clean['contacted']==1]['contact_method'].value_counts())
print(f"\nContact outcome breakdown:")
print(df_clean[df_clean['contacted']==1]['contact_outcome'].value_counts())
Computing Core KPIs
def compute_campaign_kpis(df, contact_goal, today, election_day, campaign_start):
"""
Compute the core voter contact KPIs for the Garza campaign dashboard.
Returns a dictionary of KPI values and a daily trend dataframe.
"""
kpis = {}
# --- Overall Progress ---
kpis['total_contacted'] = int(df['contacted'].sum())
kpis['contact_goal'] = contact_goal
kpis['pct_of_goal'] = kpis['total_contacted'] / contact_goal * 100
kpis['remaining_to_goal'] = contact_goal - kpis['total_contacted']
# --- Pace Analysis ---
days_elapsed = (today - campaign_start).days + 1
days_remaining = (election_day - today).days
kpis['days_elapsed'] = days_elapsed
kpis['days_remaining'] = days_remaining
kpis['contacts_per_day_actual'] = kpis['total_contacted'] / days_elapsed
kpis['contacts_per_day_needed'] = kpis['remaining_to_goal'] / max(days_remaining, 1)
kpis['pace_ratio'] = kpis['contacts_per_day_actual'] / kpis['contacts_per_day_needed']
# Projected completion at current pace
if kpis['contacts_per_day_actual'] > 0:
days_to_goal = kpis['remaining_to_goal'] / kpis['contacts_per_day_actual']
kpis['projected_completion'] = today + pd.Timedelta(days=days_to_goal)
else:
kpis['projected_completion'] = None
# --- Quality Metrics ---
contacted = df[df['contacted'] == 1]
kpis['total_universe'] = len(df)
kpis['contact_rate'] = kpis['total_contacted'] / len(df) * 100
# Persuadability targeting efficiency:
# what % of contacts are in high-persuadability segments?
persuadable_segments = [
'True Persuadable', 'Soft Support - Persuadable', 'Soft Opposition - Persuadable'
]
kpis['pct_contacts_persuadable'] = (
contacted['voter_segment'].isin(persuadable_segments).sum() /
len(contacted) * 100
)
# Universe benchmark: what % of universe is persuadable?
kpis['pct_universe_persuadable'] = (
df['voter_segment'].isin(persuadable_segments).sum() /
len(df) * 100
)
kpis['persuadability_targeting_lift'] = (
kpis['pct_contacts_persuadable'] - kpis['pct_universe_persuadable']
)
# Support score average among contacted voters
kpis['avg_support_score_contacted'] = contacted['support_score'].mean()
kpis['avg_support_score_universe'] = df['support_score'].mean()
# --- Outcome rates (among attempted contacts) ---
attempted = contacted[contacted['contact_outcome'] != 'Not Attempted']
kpis['positive_outcomes'] = attempted['contact_outcome'].isin(
['Confirmed Support', 'Soft Support']
).sum()
kpis['conversion_rate'] = (
kpis['positive_outcomes'] / len(attempted) * 100 if len(attempted) > 0 else 0
)
# --- Daily Trend ---
daily_trend = (
df[df['contacted'] == 1]
.groupby('contact_date')
.size()
.reset_index(name='daily_contacts')
.sort_values('contact_date')
)
daily_trend['cumulative_contacts'] = daily_trend['daily_contacts'].cumsum()
daily_trend['pct_of_goal'] = daily_trend['cumulative_contacts'] / contact_goal * 100
return kpis, daily_trend
kpis, daily_trend = compute_campaign_kpis(
df_clean, CONTACT_GOAL, TODAY, ELECTION_DAY, CAMPAIGN_START
)
print("=== GARZA CAMPAIGN VOTER CONTACT KPIs ===")
print(f"As of: {TODAY.strftime('%B %d, %Y')}")
print(f"\n--- PROGRESS ---")
print(f"Total Contacts: {kpis['total_contacted']:>10,}")
print(f"Contact Goal: {kpis['contact_goal']:>10,}")
print(f"% of Goal: {kpis['pct_of_goal']:>9.1f}%")
print(f"Remaining to Goal: {kpis['remaining_to_goal']:>10,}")
print(f"\n--- PACE ---")
print(f"Days Elapsed: {kpis['days_elapsed']:>10}")
print(f"Days Remaining: {kpis['days_remaining']:>10}")
print(f"Actual Pace (c/day): {kpis['contacts_per_day_actual']:>10.0f}")
print(f"Needed Pace (c/day): {kpis['contacts_per_day_needed']:>10.0f}")
print(f"Pace Ratio: {kpis['pace_ratio']:>10.2f}x")
if kpis['projected_completion']:
print(f"Projected Completion: {kpis['projected_completion'].strftime('%B %d'):>10}")
print(f"\n--- QUALITY ---")
print(f"Contact Rate: {kpis['contact_rate']:>9.1f}%")
print(f"% Contacts Persuadable: {kpis['pct_contacts_persuadable']:>7.1f}%")
print(f"% Universe Persuadable: {kpis['pct_universe_persuadable']:>7.1f}%")
print(f"Targeting Lift: {kpis['persuadability_targeting_lift']:>+9.1f}pp")
print(f"Avg Support (contacted): {kpis['avg_support_score_contacted']:>7.1f}")
print(f"Conversion Rate: {kpis['conversion_rate']:>9.1f}%")
33.4 Geographic Breakdown: County-Level Analysis
The race's outcome will be determined at the county level, so Nadia needed a geographic breakdown showing where contacts were running ahead of, or falling behind, the county-level targets.
# County-level contact goals (allocated by vote universe size)
# In production, these come from field plan spreadsheets
county_goals = {
'Riverside': 28_000,
'San Bernardino': 22_000,
'Orange': 18_000,
'Los Angeles': 12_000,
'San Diego': 7_000
}
def county_level_analysis(df, county_goals):
"""Compute contact progress and quality by county."""
county_stats = []
for county in df['county'].unique():
county_df = df[df['county'] == county]
contacted_county = county_df[county_df['contacted'] == 1]
goal = county_goals.get(county, len(county_df) * 0.30) # default 30%
row = {
'county': county,
'universe': len(county_df),
'contacted': len(contacted_county),
'goal': goal,
'pct_of_goal': len(contacted_county) / goal * 100,
'contact_rate': len(contacted_county) / len(county_df) * 100,
'avg_support': county_df['support_score'].mean(),
'avg_persuadability': county_df['persuadability_score'].mean(),
'pct_persuadable_contacted': (
contacted_county['voter_segment'].isin([
'True Persuadable', 'Soft Support - Persuadable',
'Soft Opposition - Persuadable'
]).sum() / max(len(contacted_county), 1) * 100
),
'remaining_to_goal': max(0, goal - len(contacted_county))
}
county_stats.append(row)
county_df_out = pd.DataFrame(county_stats).sort_values('pct_of_goal', ascending=False)
return county_df_out
county_stats = county_level_analysis(df_clean, county_goals)
print("=== COUNTY-LEVEL CONTACT PROGRESS ===")
display_cols = ['county', 'contacted', 'goal', 'pct_of_goal',
'contact_rate', 'avg_support', 'remaining_to_goal']
print(county_stats[display_cols].to_string(
index=False,
float_format=lambda x: f'{x:.1f}'
))
Visualizing County Progress
def plot_county_progress(county_stats):
"""
Horizontal bar chart showing each county's progress toward contact goal.
Color-coded: green (>= 90% of goal), yellow (70-89%), red (below 70%).
"""
county_stats_sorted = county_stats.sort_values('pct_of_goal')
colors = []
for pct in county_stats_sorted['pct_of_goal']:
if pct >= 90:
colors.append('#2ecc71') # green: on track
elif pct >= 70:
colors.append('#f39c12') # yellow: caution
else:
colors.append('#e74c3c') # red: behind
fig, ax = plt.subplots(figsize=(12, 7))
bars = ax.barh(
county_stats_sorted['county'],
county_stats_sorted['pct_of_goal'],
color=colors,
edgecolor='white',
linewidth=0.5,
height=0.6
)
# Goal line
ax.axvline(x=100, color='black', linestyle='--', linewidth=1.5,
label='100% of Goal', alpha=0.7)
# Add value labels
for i, (bar, val) in enumerate(zip(bars, county_stats_sorted['pct_of_goal'])):
ax.text(
min(val + 1, 105), bar.get_y() + bar.get_height() / 2,
f'{val:.0f}%',
va='center', ha='left', fontsize=10, fontweight='bold'
)
# Add contact numbers as secondary labels
for i, (bar, row) in enumerate(zip(bars, county_stats_sorted.itertuples())):
ax.text(
2, bar.get_y() + bar.get_height() / 2,
f'{int(row.contacted):,} / {int(row.goal):,}',
va='center', ha='left', fontsize=9, color='white', fontweight='bold'
)
ax.set_xlim(0, 115)
ax.set_xlabel('% of Contact Goal', fontsize=12)
ax.set_title(
f'Garza Campaign: Voter Contact Progress by County\n'
f'As of {TODAY.strftime("%B %d, %Y")} | {kpis["days_remaining"]} days remaining',
fontsize=13, fontweight='bold'
)
ax.legend(loc='lower right')
ax.xaxis.set_major_formatter(mticker.PercentFormatter())
plt.tight_layout()
plt.savefig('county_progress.png', dpi=150, bbox_inches='tight')
plt.show()
print("Chart saved: county_progress.png")
plot_county_progress(county_stats)
33.5 Cumulative Progress and Pace Visualization
The chart that mattered most to Yolanda Torres — the one that answers "are we on track?" — plots cumulative contacts over time against the pacing curve needed to hit the goal.
def plot_contact_pace(daily_trend, contact_goal, campaign_start, today, election_day):
"""
Dual-panel plot:
- Panel 1: Cumulative contacts (actual) vs. goal pacing line
- Panel 2: Daily contacts with 7-day rolling average
"""
days_total = (election_day - campaign_start).days
# Generate goal pacing line
all_dates = pd.date_range(campaign_start, election_day, freq='D')
goal_pacing = pd.DataFrame({
'date': all_dates,
'goal_cumulative': np.linspace(0, contact_goal, len(all_dates))
})
# Project forward from today at current pace
actual_end = daily_trend[daily_trend['contact_date'] <= today]['cumulative_contacts'].iloc[-1]
current_pace = kpis['contacts_per_day_actual']
future_dates = pd.date_range(today + pd.Timedelta(days=1), election_day, freq='D')
projected_contacts = actual_end + np.arange(1, len(future_dates) + 1) * current_pace
projection_df = pd.DataFrame({
'date': future_dates,
'projected': projected_contacts
})
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(13, 10))
# --- Panel 1: Cumulative Progress ---
# Goal pacing line
ax1.plot(
goal_pacing['date'], goal_pacing['goal_cumulative'],
color='gray', linestyle='--', linewidth=2, label='Goal Pace', alpha=0.8
)
# Actual contacts
ax1.plot(
daily_trend['contact_date'], daily_trend['cumulative_contacts'],
color='#2c7bb6', linewidth=3, label='Actual Contacts', marker='o',
markersize=4
)
# Projection
proj_x = [today] + list(projection_df['date'])
proj_y = [actual_end] + list(projection_df['projected'])
ax1.plot(
proj_x, proj_y,
color='#2c7bb6', linewidth=2, linestyle=':', alpha=0.6,
label='Projected at Current Pace'
)
# Goal line
ax1.axhline(y=contact_goal, color='black', linewidth=1.5, alpha=0.5)
ax1.text(campaign_start, contact_goal * 1.01, f'Goal: {contact_goal:,}',
fontsize=10, color='black')
# Today marker
ax1.axvline(x=today, color='red', linewidth=1.5, linestyle='-', alpha=0.5,
label='Today')
ax1.yaxis.set_major_formatter(mticker.FuncFormatter(lambda x, _: f'{x:,.0f}'))
ax1.set_title('Cumulative Voter Contacts vs. Goal Pacing', fontsize=13, fontweight='bold')
ax1.set_ylabel('Total Contacts')
ax1.legend(loc='upper left', fontsize=10)
ax1.grid(True, alpha=0.3)
# --- Panel 2: Daily Contacts ---
ax2.bar(
daily_trend['contact_date'],
daily_trend['daily_contacts'],
color='#7fcdbb', alpha=0.7, label='Daily Contacts', width=0.8
)
# 7-day rolling average
daily_trend['rolling_7d'] = daily_trend['daily_contacts'].rolling(7, min_periods=1).mean()
ax2.plot(
daily_trend['contact_date'], daily_trend['rolling_7d'],
color='#225ea8', linewidth=2.5, label='7-Day Average'
)
# Needed daily pace line
ax2.axhline(
y=kpis['contacts_per_day_needed'],
color='red', linewidth=2, linestyle='--', alpha=0.8,
label=f'Needed Pace: {kpis["contacts_per_day_needed"]:,.0f}/day'
)
ax2.axvline(x=today, color='red', linewidth=1.5, alpha=0.5)
ax2.yaxis.set_major_formatter(mticker.FuncFormatter(lambda x, _: f'{x:,.0f}'))
ax2.set_title('Daily Voter Contacts', fontsize=13, fontweight='bold')
ax2.set_ylabel('Contacts per Day')
ax2.legend(loc='upper left', fontsize=10)
ax2.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('contact_pace.png', dpi=150, bbox_inches='tight')
plt.show()
print("Chart saved: contact_pace.png")
plot_contact_pace(daily_trend, CONTACT_GOAL, CAMPAIGN_START, TODAY, ELECTION_DAY)
⚠️ Common Pitfall: The Cumulative Chart Illusion Cumulative contact charts can be visually misleading. A line that curves upward looks like progress even if the rate of contacts is declining. Always pair the cumulative chart with the daily contacts chart — the daily view reveals whether the campaign is accelerating, decelerating, or plateauing. Yolanda Torres initially wanted only the cumulative chart ("it looks more impressive"), but Nadia insisted on the daily view: "The cumulative chart tells you where you are. The daily chart tells you where you're going."
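The pitfall is easy to demonstrate numerically: a cumulative series rises every day even while the daily series underneath it is falling. A tiny illustration with made-up numbers:

```python
import numpy as np

# A made-up week of daily contact counts that are steadily declining.
daily = np.array([400, 380, 350, 320, 300, 280, 260])

# The cumulative series still climbs every single day...
cumulative = daily.cumsum()
print(cumulative)  # strictly increasing

# ...while the day-over-day change shows the program decelerating.
print(np.diff(daily))  # every entry negative
```

Read only the first print and the program looks healthy; read the second and it is obviously slowing — which is exactly why the dashboard pairs the two views.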
33.6 Support Score Distributions by Demographic Segment
Understanding who is being contacted — and whether the support score profile of contacted voters matches what the campaign needs — is the dashboard's core quality question.
def plot_support_distributions(df, contacted_col='contacted'):
"""
Compare support score distributions for contacted vs. not-yet-contacted voters,
broken down by voter segment and key demographics.
"""
contacted = df[df[contacted_col] == 1]
not_contacted = df[df[contacted_col] == 0]
fig = plt.figure(figsize=(16, 12))
# --- Panel 1: Overall Support Score Distribution ---
ax1 = fig.add_subplot(2, 3, 1)
ax1.hist(
not_contacted['support_score'].dropna(),
bins=30, alpha=0.6, color='#e74c3c', label='Not Contacted',
density=True
)
ax1.hist(
contacted['support_score'].dropna(),
bins=30, alpha=0.6, color='#2ecc71', label='Contacted',
density=True
)
ax1.axvline(x=50, color='black', linestyle='--', alpha=0.5, label='Score = 50')
ax1.set_title('Support Score Distribution\n(All Voters)', fontsize=11)
ax1.set_xlabel('Support Score (0-100)')
ax1.set_ylabel('Density')
ax1.legend(fontsize=9)
# Statistical test: are these distributions different?
ks_stat, ks_p = stats.ks_2samp(
contacted['support_score'].dropna(),
not_contacted['support_score'].dropna()
)
ax1.text(0.05, 0.90, f'KS test p={ks_p:.4f}', transform=ax1.transAxes,
fontsize=8, color='navy')
# --- Panel 2: Support Distribution by Party Registration ---
ax2 = fig.add_subplot(2, 3, 2)
parties_to_show = ['D', 'R', 'Other', 'Unaffiliated']
party_colors = {'D': '#2c7bb6', 'R': '#d7191c',
'Other': '#756bb1', 'Unaffiliated': '#636363'}
for party in parties_to_show:
party_data = contacted[contacted['party_reg'] == party]['support_score'].dropna()
if len(party_data) > 10:
ax2.hist(
party_data, bins=20, alpha=0.55,
color=party_colors.get(party, 'gray'),
label=f'{party} (n={len(party_data):,})',
density=True
)
ax2.set_title('Support Score by Party Reg\n(Contacted Voters)', fontsize=11)
ax2.set_xlabel('Support Score')
ax2.legend(fontsize=8)
# --- Panel 3: Support by Urban/Rural ---
ax3 = fig.add_subplot(2, 3, 3)
ur_colors = {'Urban': '#2c7bb6', 'Suburban': '#1a9641', 'Rural': '#d7191c'}
for ur_type in ['Urban', 'Suburban', 'Rural']:
subset = df[df['urban_rural'] == ur_type]['support_score'].dropna()
if len(subset) > 10:
ax3.hist(
subset, bins=25, alpha=0.55,
color=ur_colors.get(ur_type, 'gray'),
label=f'{ur_type} (n={len(subset):,})',
density=True
)
ax3.set_title('Support Score by Urban/Rural\n(Full Universe)', fontsize=11)
ax3.set_xlabel('Support Score')
ax3.legend(fontsize=9)
# --- Panel 4: Voter Segment Breakdown (Contacted) ---
ax4 = fig.add_subplot(2, 3, 4)
segment_counts = contacted['voter_segment'].value_counts()
seg_colors = {
'Hard Support - Mobilize': '#1a9641',
'Soft Support - Persuadable': '#78c679',
'True Persuadable': '#fee08b',
'Soft Opposition - Persuadable': '#fdae61',
'Hard Opposition - Skip': '#d7191c',
'Unknown': '#999999'
}
colors_list = [seg_colors.get(s, '#999999') for s in segment_counts.index]
ax4.barh(
range(len(segment_counts)),
segment_counts.values,
color=colors_list
)
ax4.set_yticks(range(len(segment_counts)))
ax4.set_yticklabels(
[s.replace(' - ', '\n') for s in segment_counts.index],
fontsize=8
)
ax4.set_title('Voter Segments Among\nContacted Voters', fontsize=11)
ax4.set_xlabel('Count')
ax4.xaxis.set_major_formatter(mticker.FuncFormatter(lambda x, _: f'{x:,.0f}'))
# --- Panel 5: Support Score by Age Cohort ---
ax5 = fig.add_subplot(2, 3, 5)
age_support = df.groupby('age_cohort', observed=True)['support_score'].agg(
['mean', 'std', 'count']
).reset_index()
bars = ax5.bar(
age_support['age_cohort'].astype(str),
age_support['mean'],
yerr=age_support['std'] / np.sqrt(age_support['count']),
color='#2c7bb6', alpha=0.8, capsize=4
)
ax5.axhline(y=50, color='black', linestyle='--', alpha=0.5, label='Score = 50')
ax5.set_ylim(30, 70)
ax5.set_title('Average Support Score\nby Age Cohort', fontsize=11)
ax5.set_xlabel('Age Group')
ax5.set_ylabel('Mean Support Score')
for bar, (_, row) in zip(bars, age_support.iterrows()):
ax5.text(
bar.get_x() + bar.get_width() / 2, bar.get_height() + 1.5,
f'n={int(row["count"]):,}',
ha='center', va='bottom', fontsize=7
)
# --- Panel 6: Persuadability Score Distribution ---
ax6 = fig.add_subplot(2, 3, 6)
# Scatter plot: support vs persuadability for contacted voters
sample = contacted.sample(min(2000, len(contacted)), random_state=42)
scatter = ax6.scatter(
sample['support_score'],
sample['persuadability_score'],
c=sample['support_score'],
cmap='RdYlGn',
alpha=0.4,
s=15,
vmin=0, vmax=100
)
ax6.axvline(x=40, color='gray', linestyle='--', alpha=0.4)
ax6.axvline(x=60, color='gray', linestyle='--', alpha=0.4)
ax6.axhline(y=50, color='gray', linestyle='--', alpha=0.4)
ax6.set_xlabel('Support Score')
ax6.set_ylabel('Persuadability Score')
ax6.set_title('Support vs. Persuadability\n(Sample: Contacted Voters)', fontsize=11)
plt.colorbar(scatter, ax=ax6, label='Support Score', fraction=0.046)
plt.suptitle(
'Garza Campaign: Voter Contact Quality Analysis',
fontsize=14, fontweight='bold', y=1.01
)
plt.tight_layout()
plt.savefig('support_distributions.png', dpi=150, bbox_inches='tight')
plt.show()
print("Chart saved: support_distributions.png")
plot_support_distributions(df_clean)
33.7 Building the Prioritization Tool
The most operationally valuable part of the dashboard is the prioritization tool: given the voters who haven't yet been contacted, who should the campaign contact next?
The prioritization logic is where theory meets practice. Campaigns typically want to prioritize:
- Voters with high persuadability (most likely to be moved)
- Voters with support scores in the 40-65 range (genuinely persuadable — not already committed, not hard opposition)
- Voters with moderate-to-high vote history (likely to actually vote)
- Voters in geographies that are behind on their contact goals
def build_prioritization_tool(df, county_goals, county_stats):
"""
Build a ranked priority contact list for the campaign's field team.
Priority score formula:
- Base persuadability (50% weight)
- Support score modifier: highest for scores 45-65 (30% weight)
- Vote likelihood: based on vote frequency (15% weight)
- County priority: boost for counties behind on contact goals (5% weight)
Returns a ranked dataframe of uncontacted voters.
"""
uncontacted = df[df['contacted'] == 0].copy()
# --- Component 1: Persuadability (50% weight) ---
uncontacted['persuadability_component'] = (
uncontacted['persuadability_score'].fillna(50) / 100 * 50
)
# --- Component 2: Support Score Targeting (30% weight) ---
# Maximum value when support score is 55 (true swing voter)
# Decays toward 0 as score moves toward 0 or 100
def support_targeting_score(s):
if pd.isna(s):
return 15 # neutral default
# Gaussian centered at 55, std=15
return 30 * np.exp(-((s - 55) ** 2) / (2 * 15 ** 2))
uncontacted['support_component'] = uncontacted['support_score'].apply(
support_targeting_score
)
# --- Component 3: Vote Likelihood (15% weight) ---
uncontacted['vote_component'] = uncontacted['vote_frequency'] / 3 * 15
# --- Component 4: County Priority Boost (5% weight) ---
# Counties behind on goal get a boost; counties ahead get reduced priority
county_priority = {}
for _, row in county_stats.iterrows():
pct_of_goal = row['pct_of_goal'] / 100
        # Linear priority boost: 1.2 minus fraction of goal, clamped to [0, 1]
        # (a county at 20% of goal gets the max boost; one at 120%+ gets none)
        priority = min(1.0, max(0, (1.0 - pct_of_goal + 0.2)))
county_priority[row['county']] = priority
uncontacted['county_priority'] = uncontacted['county'].map(county_priority).fillna(0.5)
uncontacted['county_component'] = uncontacted['county_priority'] * 5
# --- Overall Priority Score ---
uncontacted['priority_score'] = (
uncontacted['persuadability_component'] +
uncontacted['support_component'] +
uncontacted['vote_component'] +
uncontacted['county_component']
)
# Rank within county (field teams often work county-by-county)
uncontacted['county_rank'] = uncontacted.groupby('county')['priority_score'].rank(
ascending=False, method='first'
).astype(int)
# Overall rank
uncontacted['overall_rank'] = uncontacted['priority_score'].rank(
ascending=False, method='first'
).astype(int)
# Select relevant columns for output
output_cols = [
'voter_id', 'county', 'age', 'gender', 'race_ethnicity',
'party_reg', 'urban_rural', 'vote_frequency',
'support_score', 'persuadability_score', 'voter_segment',
'priority_score', 'county_rank', 'overall_rank'
]
priority_list = uncontacted[output_cols].sort_values('overall_rank')
return priority_list
priority_list = build_prioritization_tool(df_clean, county_goals, county_stats)
print("=== VOTER CONTACT PRIORITY LIST ===")
print(f"Total uncontacted voters: {len(priority_list):,}")
print(f"\nTop 15 priority contacts:")
print(
priority_list.head(15)[
['voter_id', 'county', 'age', 'party_reg', 'support_score',
'persuadability_score', 'voter_segment', 'priority_score', 'overall_rank']
].to_string(index=False, float_format=lambda x: f'{x:.1f}')
)
print(f"\nTop priority contacts by county:")
for county in priority_list['county'].unique():
county_top = priority_list[priority_list['county'] == county].head(3)
print(f"\n {county} (top 3):")
for _, row in county_top.iterrows():
print(f" #{row['county_rank']:3d}: {row['voter_id'][:8]}... | "
f"Support: {row['support_score']:.0f} | "
f"Persuade: {row['persuadability_score']:.0f} | "
f"Segment: {row['voter_segment']}")
Visualizing the Priority Score Distribution
def plot_priority_analysis(priority_list, county_stats):
"""
Two-panel plot:
- Priority score distribution with segment breakdown
- Top uncontacted voters by county (count and priority profile)
"""
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 7))
# --- Panel 1: Priority Score by Voter Segment ---
seg_order = [
'True Persuadable',
'Soft Support - Persuadable',
'Soft Opposition - Persuadable',
'Hard Support - Mobilize',
'Hard Opposition - Skip',
'Unknown'
]
seg_colors_list = [
'#fee08b', '#78c679', '#fdae61',
'#1a9641', '#d7191c', '#999999'
]
for seg, color in zip(seg_order, seg_colors_list):
seg_data = priority_list[priority_list['voter_segment'] == seg]['priority_score']
if len(seg_data) > 10:
ax1.hist(
seg_data, bins=25, alpha=0.65, color=color,
label=f'{seg}\n(n={len(seg_data):,})',
density=True
)
ax1.set_title('Priority Score Distribution\nby Voter Segment (Uncontacted Voters)',
fontsize=12, fontweight='bold')
ax1.set_xlabel('Priority Score (0-100)')
ax1.set_ylabel('Density')
ax1.legend(fontsize=7, loc='upper left')
# --- Panel 2: Remaining Work by County ---
county_remaining = county_stats.sort_values('remaining_to_goal', ascending=False)
bar_colors = []
for pct in county_remaining['pct_of_goal']:
if pct >= 90:
bar_colors.append('#2ecc71')
elif pct >= 70:
bar_colors.append('#f39c12')
else:
bar_colors.append('#e74c3c')
bars = ax2.barh(
county_remaining['county'],
county_remaining['remaining_to_goal'],
color=bar_colors
)
for bar, (_, row) in zip(bars, county_remaining.iterrows()):
ax2.text(
bar.get_width() + 50,
bar.get_y() + bar.get_height() / 2,
f'{int(row["remaining_to_goal"]):,} voters\n({row["pct_of_goal"]:.0f}% done)',
va='center', fontsize=9
)
ax2.set_title('Remaining Contacts to Goal by County', fontsize=12, fontweight='bold')
ax2.set_xlabel('Contacts Still Needed')
ax2.xaxis.set_major_formatter(mticker.FuncFormatter(lambda x, _: f'{x:,.0f}'))
plt.tight_layout()
plt.savefig('priority_analysis.png', dpi=150, bbox_inches='tight')
plt.show()
print("Chart saved: priority_analysis.png")
plot_priority_analysis(priority_list, county_stats)
33.8 Interactive Dashboard with Plotly
The static matplotlib charts work well for PDF reports and email attachments. For the interactive campaign dashboard, Nadia used Plotly to build charts that field directors could filter and explore.
def build_interactive_dashboard(df, daily_trend, contact_goal, county_stats, kpis, today):
"""
Build an interactive Plotly dashboard with:
1. Overall KPI summary cards
2. Interactive cumulative progress chart
3. County breakdown (interactive bar chart)
4. Support score scatter (filterable by segment)
"""
# --- Subplot layout ---
fig = make_subplots(
rows=2, cols=2,
subplot_titles=(
'Cumulative Contact Progress vs. Goal',
'County Contact Progress (% of Goal)',
'Support vs. Persuadability by Segment',
'Daily Contacts with 7-Day Average'
),
specs=[
[{'type': 'scatter'}, {'type': 'bar'}],
[{'type': 'scatter'}, {'type': 'bar'}]
],
vertical_spacing=0.15,
horizontal_spacing=0.10
)
# --- Panel 1: Cumulative Progress ---
# Goal pacing
all_dates = pd.date_range(
daily_trend['contact_date'].min(),
today + pd.Timedelta(days=kpis['days_remaining']),
freq='D'
)
goal_values = np.linspace(0, contact_goal, len(all_dates))
fig.add_trace(
go.Scatter(
x=all_dates, y=goal_values,
mode='lines',
name='Goal Pace',
line=dict(color='gray', dash='dash', width=2),
showlegend=True
),
row=1, col=1
)
fig.add_trace(
go.Scatter(
x=daily_trend['contact_date'],
y=daily_trend['cumulative_contacts'],
mode='lines+markers',
name='Actual Contacts',
line=dict(color='#2c7bb6', width=3),
marker=dict(size=5),
hovertemplate='<b>%{x|%b %d}</b><br>Cumulative: %{y:,}<extra></extra>'
),
row=1, col=1
)
fig.add_hline(y=contact_goal, line_dash='dot', line_color='black',
annotation_text=f'Goal: {contact_goal:,}', row=1, col=1)
# --- Panel 2: County Progress ---
county_colors = [
'#2ecc71' if pct >= 90 else '#f39c12' if pct >= 70 else '#e74c3c'
for pct in county_stats['pct_of_goal']
]
fig.add_trace(
go.Bar(
x=county_stats['pct_of_goal'],
y=county_stats['county'],
orientation='h',
marker_color=county_colors,
text=[f'{p:.0f}%' for p in county_stats['pct_of_goal']],
textposition='inside',
name='County Progress',
hovertemplate='<b>%{y}</b><br>Progress: %{x:.1f}%<extra></extra>',
showlegend=False
),
row=1, col=2
)
fig.add_vline(x=100, line_dash='dash', line_color='black', row=1, col=2)
# --- Panel 3: Support vs. Persuadability Scatter ---
sample = df[df['contacted'] == 1].sample(
min(3000, df['contacted'].sum()), random_state=42
)
segment_colors_px = {
'True Persuadable': '#fee08b',
'Soft Support - Persuadable': '#78c679',
'Soft Opposition - Persuadable': '#fdae61',
'Hard Support - Mobilize': '#1a9641',
'Hard Opposition - Skip': '#d7191c',
'Unknown': '#999999'
}
for segment, color in segment_colors_px.items():
seg_data = sample[sample['voter_segment'] == segment]
if len(seg_data) > 0:
fig.add_trace(
go.Scatter(
x=seg_data['support_score'],
y=seg_data['persuadability_score'],
mode='markers',
name=segment,
marker=dict(color=color, size=4, opacity=0.6),
hovertemplate=(
'<b>%{text}</b><br>'
'Support: %{x:.0f}<br>'
'Persuadability: %{y:.0f}<extra></extra>'
),
text=seg_data['voter_id'].str[:8]
),
row=2, col=1
)
# --- Panel 4: Daily Contacts ---
fig.add_trace(
go.Bar(
x=daily_trend['contact_date'],
y=daily_trend['daily_contacts'],
name='Daily Contacts',
marker_color='#7fcdbb',
opacity=0.7,
hovertemplate='<b>%{x|%b %d}</b><br>Daily: %{y:,}<extra></extra>',
showlegend=False
),
row=2, col=2
)
daily_trend['rolling_7d'] = daily_trend['daily_contacts'].rolling(7, min_periods=1).mean()
fig.add_trace(
go.Scatter(
x=daily_trend['contact_date'],
y=daily_trend['rolling_7d'],
mode='lines',
name='7-Day Average',
line=dict(color='#225ea8', width=2.5),
showlegend=False
),
row=2, col=2
)
fig.add_hline(
y=kpis['contacts_per_day_needed'],
line_dash='dash', line_color='red',
annotation_text=f'Needed: {kpis["contacts_per_day_needed"]:.0f}/day',
row=2, col=2
)
# --- Layout ---
fig.update_layout(
title=dict(
text=f'<b>Garza Campaign — Voter Contact Dashboard</b><br>'
f'<sub>As of {today.strftime("%B %d, %Y")} | '
f'{kpis["days_remaining"]} days remaining | '
f'{kpis["total_contacted"]:,} of {contact_goal:,} contacts ({kpis["pct_of_goal"]:.1f}%)</sub>',
x=0.5, xanchor='center'
),
height=800,
width=1400,
showlegend=True,
legend=dict(x=0.60, y=0.45),
template='plotly_white',
font=dict(family='Arial', size=11)
)
fig.update_xaxes(title_text='Date', row=1, col=1)
fig.update_yaxes(title_text='Cumulative Contacts', row=1, col=1,
tickformat=',')
fig.update_xaxes(title_text='% of Goal', row=1, col=2)
fig.update_xaxes(title_text='Support Score (0-100)', row=2, col=1)
fig.update_yaxes(title_text='Persuadability Score', row=2, col=1)
fig.update_xaxes(title_text='Date', row=2, col=2)
fig.update_yaxes(title_text='Contacts per Day', row=2, col=2, tickformat=',')
    # Save interactive HTML; static PNG export additionally requires the
    # optional kaleido package
    fig.write_html('garza_dashboard.html')
    print("Interactive dashboard saved: garza_dashboard.html")
    try:
        fig.write_image('garza_dashboard.png', scale=2)
        print("Static image saved: garza_dashboard.png")
    except ValueError:
        print("Static image skipped: install kaleido to enable fig.write_image()")
return fig
dashboard_fig = build_interactive_dashboard(
df_clean, daily_trend, CONTACT_GOAL, county_stats, kpis, TODAY
)
# In Jupyter: dashboard_fig.show()
33.9 Nadia Presents the Dashboard
The morning after the dashboard went live, Nadia sat across from Yolanda Torres again. This time, instead of a three-row spreadsheet, Yolanda had the interactive dashboard open on her laptop.
Nadia walked through what the numbers meant.
"The pace ratio is 0.87," Nadia said. "That means we're contacting at 87% of the rate we need to hit 87,000 contacts. If we don't change something, we'll end Election Day at about 76,000. That's 11,000 short."
Yolanda looked at the county breakdown chart. "Riverside is at 69% of goal. That's our most competitive county."
"Right. And look at the persuadability targeting lift." Nadia pointed to the KPI summary. "We're contacting persuadable voters at a rate about 8 percentage points higher than their share of the universe. That's good — it means canvassers are actually using the contact priorities and not just defaulting to every door. But in Riverside, that lift is only 2 points. Canvassers there are less systematically targeting the priority voters."
Yolanda was quiet for a moment. "What would closing the gap in Riverside require?"
Nadia pulled up the priority list, filtered to Riverside, and sorted by county rank. "The top 4,000 uncontacted priority voters in Riverside have support scores between 45 and 65 and persuadability scores above 55. If we add one canvassing day focused exclusively on these 4,000 voters — splitting them between a weekend door shift and a weekday phone bank — we close most of the targeting gap."
This is what a campaign analytics dashboard is supposed to do. Not just report what happened, but make clear what needs to happen next.
🔴 Critical Thinking: What the Dashboard Can't Tell You The voter contact dashboard is a powerful tool, but it operates on modeled data. Support scores and persuadability scores are predictions, not measurements. When the dashboard tells Nadia that voter 00847523 has a support score of 52 and a persuadability score of 74, it is telling her what a model — trained on prior elections, demographic data, and survey results — predicts about this voter's likelihood of supporting Garza and being moved by contact. That model is not perfect. It may be systematically wrong about certain demographic segments. It reflects the data that went into building it, which may underrepresent certain communities. A canvasser dispatched to contact voter 00847523 based on her priority score may encounter someone whose actual views are completely different from what the model predicted.
This gap — between the map the dashboard presents and the territory of real voters — is the central limitation of algorithmic prioritization in voter contact. The dashboard is a tool for allocating scarce contact resources more effectively; it is not a substitute for the qualitative intelligence that experienced canvassers bring back from doors.
33.10 Adaeze Nwosu and ODA's Broader Framework
After the dashboard presentation, Nadia stayed in touch with Adaeze Nwosu at ODA. Their conversation about the technical implementation had turned into a broader conversation about what voter contact analytics could and should do.
"The technical capacity isn't the limiting factor anymore," Adaeze told Nadia. "Pretty much any campaign with a half-decent data person can build something like what you built. The limiting factor is what you do with it — whether the field director actually uses the prioritization list, whether the canvasser trusts the app over her own judgment, whether the daily KPI call is actually about the numbers or just a ritual."
Sam Harding, who was documenting the ODA framework for a data journalism project, had been in the room. "The campaigns that use data best," Sam said, "use it to structure what they were going to do anyway — give field directors a quantitative reason to make the resource allocation decision they already had intuitions about. The campaigns that use it worst treat the model output as a script."
Adaeze nodded. "Measurement shapes reality. When you start measuring contacts per day, canvassers start optimizing for contacts per day. Sometimes at the expense of contact quality. We had a campaign in 2022 that was hitting every door on their list — including clearly vacant houses and unregistered addresses — because the KPI was contacts, and they were going to hit their contact number."
This is the "Measurement Shapes Reality" theme at the operational level. The choice of what to count — doors knocked, voters contacted, pledge cards returned — shapes the behavior of the people doing the counting.
ODA's Open-Source Tools
ODA publishes its voter contact analytics framework as open-source software, available to any campaign or advocacy organization. The tools include:
- Data integration utilities: Standardized voter file import, VAN export parsing, and record linkage
- KPI computation library: The functions developed in this chapter, parameterized for different campaign contexts
- Dashboard templates: Pre-built matplotlib and plotly visualizations with documentation
- Prioritization models: Multiple prioritization approaches with documentation of their assumptions and limitations
"We're not neutral about who wins," Adaeze said. "ODA is a progressive organization. But we think the analytical methods should be available to any organization doing legitimate civic work. The voter contact prioritization tool doesn't have partisan politics baked into the math — it maximizes contact efficiency. How you use it is a political choice."
✅ Best Practice: Document the Model's Assumptions Every voter contact prioritization model makes assumptions. The model in this chapter assumes that support score and persuadability score are well-calibrated (that a score of 52 is actually more persuadable than a score of 48). It assumes that vote frequency is a reasonable proxy for vote likelihood. It assumes that county-level goal allocation is appropriate and that the same priority logic works across different geographic and demographic contexts. Document these assumptions and revisit them as the campaign accumulates contact data. A model built in August based on prior election data should be updated in October based on what the canvassers are actually finding at the doors.
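The recalibration advice above can be made concrete with a simple calibration check: bucket contacted voters by support-score decile and compare the model's implied support probability with the rate of positive outcomes canvassers actually recorded. The sketch below uses synthetic data so it runs standalone; the column names `positive_outcome` and `score_decile` are illustrative, not part of the chapter's voter file schema, so adapt them to the real contact data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in for contacted-voter data. A well-calibrated model would
# show positive-outcome rates rising in step with the support score.
demo = pd.DataFrame({'support_score': rng.uniform(0, 100, 5000)})
demo['positive_outcome'] = (
    rng.uniform(0, 100, 5000) < demo['support_score']
).astype(int)

# Bucket by support-score decile; compare the model's implied probability
# (score / 100) against the observed positive-outcome rate in each bucket.
demo['score_decile'] = pd.cut(demo['support_score'], bins=range(0, 101, 10))
calibration = demo.groupby('score_decile', observed=True).agg(
    mean_score=('support_score', 'mean'),
    observed_rate=('positive_outcome', 'mean'),
    n=('positive_outcome', 'size')
)
calibration['implied_rate'] = calibration['observed_rate'] * 0 + calibration['mean_score'] / 100
calibration['gap'] = calibration['observed_rate'] - calibration['implied_rate']
print(calibration.round(3))
```

Large, systematic gaps in particular deciles (or for particular demographic slices, if the groupby is extended) are exactly the kind of evidence that should trigger the October model refresh the best-practice box recommends.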
33.11 Generating the Daily Report
The final piece of the operational dashboard is a text-based daily summary that Nadia sends to Yolanda every morning at 7 AM.
def generate_daily_report(kpis, county_stats, today, election_day):
"""
Generate a text-based daily briefing for campaign leadership.
Format: brief, action-oriented, designed to be read in 90 seconds.
"""
report_lines = [
"=" * 60,
f"GARZA CAMPAIGN — VOTER CONTACT DAILY REPORT",
f"Morning of {today.strftime('%A, %B %d, %Y')}",
f"Days to Election: {(election_day - today).days}",
"=" * 60,
"",
"HEADLINE NUMBERS",
f" Total Contacts: {kpis['total_contacted']:>10,}",
f" Goal: {kpis['contact_goal']:>10,}",
f" % of Goal: {kpis['pct_of_goal']:>9.1f}%",
f" Remaining: {kpis['remaining_to_goal']:>10,}",
"",
"PACE",
f" Actual pace: {kpis['contacts_per_day_actual']:>8.0f} contacts/day",
f" Needed pace: {kpis['contacts_per_day_needed']:>8.0f} contacts/day",
f" Pace ratio: {kpis['pace_ratio']:>9.2f}x needed",
]
if kpis['projected_completion']:
proj_str = kpis['projected_completion'].strftime('%B %d')
        days_before = (election_day - kpis['projected_completion']).days
if days_before > 0:
report_lines.append(
f" At current pace: Goal reached {proj_str} "
f"({days_before} days before election) ✓"
)
else:
report_lines.append(
f" At current pace: Goal NOT reached before election ⚠️"
)
report_lines.extend([
"",
"COUNTY STATUS",
])
for _, row in county_stats.sort_values('pct_of_goal').iterrows():
status = "✓" if row['pct_of_goal'] >= 90 else "⚠" if row['pct_of_goal'] >= 70 else "✗"
report_lines.append(
f" {status} {row['county']:<16} "
f"{row['pct_of_goal']:>5.1f}% of goal "
f"({int(row['remaining_to_goal']):,} remaining)"
)
report_lines.extend([
"",
"QUALITY METRICS",
f" Persuadable targeting lift: {kpis['persuadability_targeting_lift']:>+.1f}pp "
f"({'good' if kpis['persuadability_targeting_lift'] > 5 else 'needs attention'})",
f" Conversion rate: {kpis['conversion_rate']:>5.1f}% positive outcomes",
f" Avg support (contacted): {kpis['avg_support_score_contacted']:>5.1f}",
"",
"KEY ACTIONS RECOMMENDED",
])
    # Automated recommendations, numbered in the order they fire
    # (the original hard-coded "1."/"2."/"3." labels could repeat or skip)
    recommendations = []
    if kpis['pace_ratio'] < 0.9:
        recommendations.append("URGENT: Increase canvassing and phone bank capacity")
    for _, row in county_stats.iterrows():
        if row['pct_of_goal'] < 75:
            recommendations.append(
                f"Priority: Redirect resources to {row['county']} "
                f"({row['pct_of_goal']:.0f}% of goal)"
            )
    if kpis['persuadability_targeting_lift'] < 3:
        recommendations.append(
            "Training: Canvassers not using priority lists — reinforce priority targeting"
        )
    for i, rec in enumerate(recommendations, start=1):
        report_lines.append(f"  {i}. {rec}")
report_lines.extend([
"",
"=" * 60,
"Dashboard: garza_dashboard.html | Questions: nadia@garzaforsenate.com",
"=" * 60
])
report = '\n'.join(report_lines)
print(report)
# In production: save to file or send via email API
with open('daily_report.txt', 'w') as f:
f.write(report)
return report
daily_report = generate_daily_report(kpis, county_stats, TODAY, ELECTION_DAY)
33.12 Ethical Dimensions: What the Dashboard Does to Democracy
Before closing this chapter, Nadia had a conversation with Sam Harding that is worth reporting in full.
"What worries you about this?" Sam asked. It was a journalist's question, designed to elicit the honest answer rather than the PR answer.
Nadia thought for a moment. "The prioritization logic assumes that persuadable voters are more worth contacting than committed supporters who haven't voted yet. But 'persuadable' in the model means likely to change their vote. What about the committed Garza supporter who needs a ride to the polls? She doesn't score high on persuadability, so she drops to the bottom of the priority list. But she's as important as the swing voter — maybe more, because her support is certain."
Sam nodded. "And who tends to score lower on persuadability?"
"Voters with strong partisan registration. In this dataset, that means both committed Democrats and committed Republicans." She paused. "But the committed Democrats with low turnout history — who in this state disproportionately include Black and Latino voters in low-income urban precincts — those are voters the priority model is deprioritizing in favor of white suburban swing voters. The model is making a political choice. It just dresses it up as math."
This exchange illustrates the "Data in Democracy" theme at its deepest level. The voter contact prioritization dashboard is not a neutral tool. It embeds choices — about whose vote matters, about what it means to be persuadable, about how to allocate scarce campaign resources — that have implications for whose voices get heard and whose get screened out. Those choices reflect, in part, whose data is well-modeled and whose is not, which communities have been surveyed and which haven't, which prior elections are used as training data and which are omitted.
None of this means the dashboard is wrong to build or wrong to use. It means the people who build and use it have an obligation to understand what assumptions it makes and whose interests those assumptions serve.
⚖️ Ethical Analysis: Algorithmic Prioritization and Voter Equity The voter contact prioritization model described in this chapter — and used by virtually every sophisticated campaign — makes mathematically defensible choices that have politically significant implications. Prioritizing persuadable voters in swing geographies concentrates campaign contact activity on a demographic profile that is often whiter, more suburban, and more affluent than the campaign's base. Base mobilization — contacting committed supporters who might not vote without encouragement — is typically left to later in the campaign, after persuasion resources have been deployed. Research by political scientists has consistently shown that mobilization can be as important as persuasion for electoral outcomes, yet mobilization receives systematically less resource attention from data-driven campaigns. ODA's framework includes an optional "equity weighting" parameter that adjusts the priority score to explicitly weight turnout history differently for communities with documented barriers to voting. Campaigns must choose whether to use it.
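ODA's actual equity-weighting implementation is not reproduced in this chapter. As a purely illustrative sketch of how such a parameter could work, the hypothetical function below blends the chapter's vote-likelihood component (`vote_frequency / 3 * 15`) toward its maximum for voters in communities flagged as facing documented turnout barriers, so that low past turnout is penalized less heavily. The `barrier_flag` input and the blending scheme are assumptions for illustration, not ODA's method.

```python
def vote_component_with_equity(vote_frequency, barrier_flag, equity_weight=0.5):
    """
    Hypothetical equity-weighted variant of the vote-likelihood component.
    equity_weight=0 reproduces the chapter's original formula;
    equity_weight=1 removes the turnout-history penalty entirely for
    voters flagged as facing documented barriers to voting.
    """
    base = vote_frequency / 3 * 15  # original component, 0-15 points
    if barrier_flag:
        return (1 - equity_weight) * base + equity_weight * 15
    return base

# A low-turnout voter (vote_frequency=1) in a flagged community moves from
# 5.0 to 10.0 points at equity_weight=0.5; unflagged voters are unchanged.
print(vote_component_with_equity(1, barrier_flag=False))  # 5.0
print(vote_component_with_equity(1, barrier_flag=True))   # 10.0
```

Note that choosing `equity_weight` is itself a political decision, which is precisely the point of the ethical analysis above: the parameter makes an implicit value judgment explicit and tunable.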
Chapter Summary
This chapter built a voter contact analytics dashboard in Python, working through the full pipeline from data loading and cleaning to KPI computation, geographic visualization, support score distribution analysis, prioritization model construction, and interactive dashboard deployment.
The technical implementation used pandas for data manipulation, matplotlib for static visualization, plotly for interactive charts, and scipy for statistical validation. The modular code structure — separate functions for cleaning, KPI computation, visualization, and prioritization — reflects production practices for campaign analytics systems.
The narrative through-line: Nadia Osei built this dashboard not to produce interesting charts, but to answer Yolanda Torres's operational questions. Are we on track? Where are we behind? Are we contacting the right voters? Who should we contact next? Good campaign analytics is always in service of those operational questions. When the dashboard answers them clearly and quickly, it changes decisions. When it produces charts for their own sake, it changes nothing.
The ethical dimension is inseparable from the technical: the prioritization model makes choices that determine which voters get campaign contact and which don't. Understanding and being honest about those choices is not optional for practitioners who want their data work to be compatible with democratic values.
Key Terms
Voter contact KPI: A key performance indicator for the voter contact program — total contacts, contacts per day, pace ratio, persuadability targeting lift, conversion rate.
Pace ratio: The ratio of actual contacts per day to the contacts per day needed to reach the goal by Election Day. A pace ratio below 1.0 means the campaign is behind target.
Persuadability targeting lift: The percentage-point difference between the share of persuadable voters among contacted voters and their share of the overall universe. Positive lift means the campaign is successfully concentrating contact on persuadable targets.
Voter segment: A classification of voters by support score and persuadability score quadrant: Hard Support, Soft Support - Persuadable, True Persuadable, Soft Opposition - Persuadable, Hard Opposition.
Priority score: A composite score for ranking uncontacted voters by the expected value of contacting them, weighted by persuadability, support score targeting, vote frequency, and county priority.
Context collapse: In this data context, the risk that a model score based on aggregate patterns is treated as an accurate description of an individual voter.
Equity weighting: An optional adjustment to prioritization models that explicitly accounts for historical underrepresentation or turnout barriers in specific communities.
33.13 Connecting to Live Canvassing Apps: Real-Time Data Integration
The dashboard built in this chapter assumes a daily export workflow: the campaign exports contact data from its voter activation database each morning, and Nadia's pipeline ingests it to produce the day's report. This is a practical and common approach, but it means the dashboard is always looking at yesterday. As campaigns have invested in mobile canvassing applications — where field staff log contacts in real time — the opportunity has emerged to build dashboards that update continuously rather than once daily.
33.13.1 The Architecture of Real-Time Integration
Most campaign voter activation platforms (VAN, Action Network, NGP) offer API access that allows programmatic queries against the contact database without waiting for a manual export. Real-time integration typically involves three components:
Polling or webhook connections: The dashboard can either poll the API on a schedule (every five minutes, every fifteen minutes) or, if the platform supports webhooks, receive push notifications whenever a new contact is logged. Polling is simpler to implement; webhooks are more efficient at scale.
Incremental update logic: Rather than reloading the entire dataset on each refresh, production dashboards ingest only new records since the last update. This requires tracking a "last updated" timestamp and issuing API queries with date-range filters.
Live KPI recomputation: As new contact records arrive, the KPI functions from Section 33.3 recompute on the updated dataset. In a well-architected pipeline, this recomputation takes seconds; in a poorly optimized one, it can create lag that undermines the "real-time" promise.
import time
from datetime import datetime
# Simplified illustration of a polling-based update loop
# In production, replace mock_api_call with actual VAN/NGP API client
def mock_api_call(since_timestamp):
"""
Placeholder for a real voter activation platform API call.
Returns new contact records since the given timestamp.
In production: replace with VAN SmartVAN API or NGP 8 API client.
"""
# Returns a list of new contact dictionaries
return []
def live_update_loop(df_existing, kpis, poll_interval_seconds=300):
"""
Illustration of a continuous polling loop for real-time dashboard updates.
Polls the canvassing app API every poll_interval_seconds and updates KPIs.
Parameters:
df_existing: The current voter contact DataFrame
kpis: Current KPI dictionary
poll_interval_seconds: How often to check for new data (default: 5 min)
Note: In a production environment, this loop would run in a background
process or serverless function, not interactively in a notebook.
"""
last_update = datetime.now()
while True:
print(f"[{datetime.now().strftime('%H:%M:%S')}] Polling for new contacts...")
new_records = mock_api_call(since_timestamp=last_update)
if new_records:
new_df = pd.DataFrame(new_records)
df_existing = pd.concat([df_existing, new_df], ignore_index=True)
kpis, daily_trend = compute_campaign_kpis(
df_existing, CONTACT_GOAL, TODAY, ELECTION_DAY, CAMPAIGN_START
)
print(f" Added {len(new_records)} new contacts. "
f"Total: {kpis['total_contacted']:,} ({kpis['pct_of_goal']:.1f}% of goal)")
last_update = datetime.now()
else:
print(f" No new contacts since last update.")
time.sleep(poll_interval_seconds)
# In production, this would be called from a scheduled job or background thread:
# live_update_loop(df_clean, kpis, poll_interval_seconds=300)
print("Live update loop illustrated. See production deployment notes above.")
💡 Implementation Note: The Right Tool for the Job
For campaigns with a field operation of a few dozen canvassers, daily exports are usually sufficient — the marginal value of real-time data rarely justifies the engineering overhead of API integration. For campaigns running thousands of doors per day across multiple counties, real-time integration can catch problems (a county falling behind pace, canvassers not logging contacts) while there is still time to intervene. The decision should be driven by operational need, not technical sophistication for its own sake.
33.13.2 Data Quality Challenges in Live Canvassing Streams
Live data introduces data quality problems that daily exports partially filter out. When a canvasser logs a contact on a mobile app in real time, the following issues are common:
Duplicate contact records: A canvasser may log a contact, lose connectivity, and re-submit when connectivity is restored — creating two records for the same interaction. Deduplication in a live stream requires either platform-level transaction IDs (preferred) or fuzzy matching on voter ID, contact date, and method — a more fragile approach.
Timestamp errors: Mobile devices with incorrect system clocks, or canvassers logging contacts hours after the interaction, produce contact timestamps that are inaccurate. Campaigns that use timestamps for pace analysis may see anomalous spikes and troughs if timestamp quality is not validated.
Out-of-universe contacts: Canvassers sometimes knock on doors not on their turf sheet — because a neighbor flagged them down, or because they misread a house number — and log contacts for voter IDs not in the target universe. These contacts do not count toward campaign goals, but they inflate total contact counts if not filtered.
Missing outcome codes: When canvassers are rushed, they sometimes log a contact without entering an outcome (confirmed support, soft support, not home) — creating a contact record with no useful qualitative information. Campaigns that rely on outcome-coded data for subsequent targeting decisions need a protocol for following up on outcome-missing records.
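The fuzzy-matching fallback for duplicates described above can be sketched with pandas' `drop_duplicates`; the column names (`voter_id`, `contact_date`, `contact_method`) and sample records here are illustrative assumptions, not the platform's actual schema:

```python
import pandas as pd

# Hypothetical batch of live contact records; the first two rows represent
# the same interaction re-submitted after a connectivity drop.
records = pd.DataFrame({
    'voter_id': [101, 101, 102, 103],
    'contact_date': pd.to_datetime(
        ['2024-10-01', '2024-10-01', '2024-10-01', '2024-10-02']),
    'contact_method': ['door', 'door', 'phone', 'door'],
    'outcome': ['Soft Support', 'Soft Support', None, 'Confirmed Support'],
})

# Fuzzy dedup fallback: treat identical (voter_id, date, method) tuples
# as re-submissions of one interaction and keep the first occurrence.
deduped = records.drop_duplicates(
    subset=['voter_id', 'contact_date', 'contact_method'], keep='first')

print(f"Removed {len(records) - len(deduped)} suspected duplicate(s).")
```

Platform-level transaction IDs remain preferable when available: two legitimate door knocks at the same address on the same day would be wrongly collapsed by this key.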
✅ Best Practice: Data Quality Flags in the Live Pipeline
The production version of Nadia's pipeline includes a data quality flag layer that runs on each incoming batch of records. Each flag is logged to a separate data_quality_report table that is included in the morning briefing:
- duplicate_flag: Records where voter_id + contact_date match an existing record
- timestamp_anomaly_flag: Contact timestamps more than 24 hours in the past or future
- out_of_universe_flag: Voter IDs not present in the master voter file
- missing_outcome_flag: Contacted records with null outcome code
Field directors receive a weekly data quality summary and are expected to address systematic problems — a canvasser consistently logging contacts without outcomes, a county showing anomalous timestamp distributions — through direct coaching.
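A minimal sketch of that four-flag layer, assuming columns named `voter_id`, `contact_date`, `contact_timestamp`, and `outcome` (the production pipeline's actual schema may differ):

```python
import pandas as pd
from datetime import datetime, timedelta

def flag_data_quality(batch, master_universe_ids, existing_keys, now=None):
    """
    Sketch of the four data quality flags. existing_keys is a set of
    (voter_id, date) pairs already recorded; column names are assumptions.
    """
    now = now or datetime.now()
    batch = batch.copy()
    keys = list(zip(batch['voter_id'], batch['contact_date'].dt.date))
    batch['duplicate_flag'] = [k in existing_keys for k in keys]
    batch['timestamp_anomaly_flag'] = (
        (batch['contact_timestamp'] < now - timedelta(hours=24)) |
        (batch['contact_timestamp'] > now + timedelta(hours=24))
    )
    batch['out_of_universe_flag'] = ~batch['voter_id'].isin(master_universe_ids)
    batch['missing_outcome_flag'] = batch['outcome'].isna()
    return batch

# Hypothetical incoming batch for illustration
now = datetime(2024, 10, 15, 12, 0)
batch = pd.DataFrame({
    'voter_id': [1, 2, 3],
    'contact_date': pd.to_datetime(['2024-10-15'] * 3),
    'contact_timestamp': pd.to_datetime(
        ['2024-10-15 11:00', '2024-10-13 09:00', '2024-10-15 10:30']),
    'outcome': ['Confirmed Support', None, 'Not Home'],
})
flagged = flag_data_quality(
    batch,
    master_universe_ids={1, 2},                          # voter 3 out of universe
    existing_keys={(1, datetime(2024, 10, 15).date())},  # voter 1 already logged
    now=now,
)
print(flagged[['voter_id', 'duplicate_flag', 'timestamp_anomaly_flag',
               'out_of_universe_flag', 'missing_outcome_flag']])
```

Each boolean column can then be appended to the data_quality_report table described above rather than dropped silently.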
33.14 Dashboard Views for Different Audiences
One of the most important insights Nadia gained from building the Garza campaign dashboard was that different users need to see different things. A single dashboard designed for everyone is often useful to no one.
33.14.1 The Field Director View
Yolanda Torres, the field director, needs operational information. She is making decisions about canvasser deployment, shift planning, and resource allocation. Her primary questions:
- Are we on pace? The cumulative progress chart and pace ratio KPI answer this.
- Where are we behind? The county breakdown answers this.
- Who should we contact today? The prioritization tool answers this.
- Are canvassers targeting the right voters? The persuadability targeting lift answers this.
Yolanda's ideal dashboard view: a single screen with the pace ratio prominently displayed, the county progress chart, and a one-click export of the day's priority contact list by county. She does not need the support score distribution histograms or the statistical test results. The field director view strips the dashboard to its operational essentials — the questions that need to be answered before the morning canvassing briefing, presented in language that requires no analytical training to interpret.
def build_field_director_view(kpis, county_stats, today):
    """
    A simplified dashboard view for field directors.
    Focuses on pace, county status, and a clear call to action.
    High information density, low analytical complexity.
    """
    import plotly.graph_objects as go
    from plotly.subplots import make_subplots

    fig = make_subplots(
        rows=1, cols=2,
        subplot_titles=('Campaign Pace', 'County Status (% of Goal)'),
        column_widths=[0.4, 0.6],
        # Indicator traces require an 'indicator'-type subplot cell
        specs=[[{'type': 'indicator'}, {'type': 'xy'}]]
    )

    # Pace gauge
    pace_pct = kpis['pace_ratio'] * 100
    pace_color = '#2ecc71' if pace_pct >= 90 else '#f39c12' if pace_pct >= 70 else '#e74c3c'
    fig.add_trace(
        go.Indicator(
            mode="gauge+number+delta",
            value=pace_pct,
            title={'text': f"Pace ({kpis['contacts_per_day_actual']:.0f} contacts/day)"},
            delta={'reference': 100, 'suffix': '%'},
            gauge={
                'axis': {'range': [0, 130]},
                'bar': {'color': pace_color},
                'steps': [
                    {'range': [0, 70], 'color': '#fadbd8'},
                    {'range': [70, 90], 'color': '#fef9e7'},
                    {'range': [90, 130], 'color': '#eafaf1'}
                ],
                'threshold': {'line': {'color': 'black', 'width': 3}, 'value': 100}
            },
            number={'suffix': '% of needed pace'}
        ),
        row=1, col=1
    )

    # County status bar chart
    county_sorted = county_stats.sort_values('pct_of_goal')
    bar_colors = [
        '#2ecc71' if p >= 90 else '#f39c12' if p >= 70 else '#e74c3c'
        for p in county_sorted['pct_of_goal']
    ]
    fig.add_trace(
        go.Bar(
            y=county_sorted['county'],
            x=county_sorted['pct_of_goal'],
            orientation='h',
            marker_color=bar_colors,
            text=[f"{p:.0f}% ({int(r):,} remain)"
                  for p, r in zip(county_sorted['pct_of_goal'],
                                  county_sorted['remaining_to_goal'])],
            textposition='inside',
            showlegend=False
        ),
        row=1, col=2
    )
    fig.add_vline(x=100, line_dash='dash', line_color='black', row=1, col=2)

    fig.update_layout(
        title=f"FIELD DIRECTOR VIEW — {today.strftime('%A, %B %d')} | "
              f"{kpis['days_remaining']} days remaining",
        height=400, template='plotly_white'
    )
    fig.write_html('field_director_view.html')
    print("Field director view saved: field_director_view.html")
    return fig

build_field_director_view(kpis, county_stats, TODAY)
33.14.2 The Analytics Director View
Nadia herself — the analytics director — needs the full picture. Her questions extend beyond the operational into the diagnostic: Is the persuadability targeting working as designed, or is there drift over time? Are the support score distributions shifting as contact accumulates? Are any demographic groups being systematically under-contacted relative to their share of the target universe? Is the model's calibration holding? If voters with support scores of 70+ are confirming support at much lower rates than predicted, the model may be miscalibrated on specific segments.
The analytics director view is the full four-panel dashboard from Section 33.8, supplemented by a model calibration panel that plots observed contact outcomes against predicted support scores. A well-calibrated model should show a monotonic relationship: voters with higher support scores should confirm support at higher rates. Deviations from monotonicity indicate calibration problems that may require updating the scoring model mid-campaign.
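The calibration panel described above can be sketched by binning predicted support scores and computing the observed positive-outcome rate per bin. The column names, outcome labels, and synthetic demo data below are assumptions for illustration, not the campaign's actual schema:

```python
import numpy as np
import pandas as pd

def calibration_table(df, score_col='support_score', outcome_col='outcome',
                      positive=('Confirmed Support', 'Soft Support'), n_bins=10):
    """
    Bin predicted support scores and compute the observed positive-outcome
    rate in each bin. A well-calibrated model shows rates that rise
    monotonically across the bins.
    """
    contacted = df[df[outcome_col].notna()].copy()
    contacted['score_bin'] = pd.cut(
        contacted[score_col], bins=np.linspace(0, 100, n_bins + 1),
        include_lowest=True)
    contacted['positive'] = contacted[outcome_col].isin(positive)
    table = (contacted.groupby('score_bin', observed=True)['positive']
             .agg(['mean', 'count'])
             .rename(columns={'mean': 'observed_positive_rate'}))
    # Flag any bin whose observed rate drops relative to the previous bin
    table['monotonic'] = table['observed_positive_rate'].diff().fillna(0) >= 0
    return table

# Synthetic illustration: outcomes generated so that higher scores
# genuinely confirm support more often, as a calibrated model predicts
rng = np.random.default_rng(42)
scores = rng.uniform(0, 100, 5000)
outcomes = np.where(rng.uniform(size=5000) < scores / 100,
                    'Confirmed Support', 'Not Home')
demo = pd.DataFrame({'support_score': scores, 'outcome': outcomes})
print(calibration_table(demo))
```

Rows where `monotonic` is False mark the deviations from monotonicity that signal calibration problems worth investigating mid-campaign.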
33.14.3 The Campaign Manager View
The campaign manager is making strategic rather than operational decisions. Their primary questions operate at a higher level of abstraction: Are we on track to win, not just to hit our contact number? Are we allocating resources correctly across field, phone, text, and digital? Where is the marginal return on contact investment highest?
def build_manager_summary(kpis, county_stats):
    """
    One-page strategic summary for the campaign manager.
    Plain-language status, county alerts, and recommended actions.
    """
    status = ('ON TRACK' if kpis['pace_ratio'] >= 0.9 else
              'CAUTION' if kpis['pace_ratio'] >= 0.75 else 'BEHIND')
    print("=== CAMPAIGN MANAGER STRATEGIC SUMMARY ===")
    print(f"STATUS: {status}")
    print(f"Progress: {kpis['pct_of_goal']:.1f}% of {kpis['contact_goal']:,} goal")
    print(f"Days Remaining: {kpis['days_remaining']}")
    print(f"Pace: Current {kpis['contacts_per_day_actual']:.0f}/day vs. "
          f"needed {kpis['contacts_per_day_needed']:.0f}/day")
    print(f"Targeting quality: {kpis['persuadability_targeting_lift']:+.1f}pp "
          f"persuadable lift vs. universe")
    county_alerts = county_stats[county_stats['pct_of_goal'] < 80]
    if len(county_alerts) > 0:
        print("\nCOUNTY ALERTS (below 80% of goal):")
        for _, row in county_alerts.iterrows():
            print(f"  {row['county']}: {row['pct_of_goal']:.0f}% done, "
                  f"{int(row['remaining_to_goal']):,} contacts remaining")

build_manager_summary(kpis, county_stats)
📊 Real-World Application: The "Single Source of Truth" Problem
One of the persistent challenges in campaign analytics is that different users often have different versions of the same data. The field director is looking at an export from last Tuesday; the campaign manager is looking at a Google Sheet someone built independently; the candidate is being briefed from a summary that combines numbers from both. "Whose numbers are right?" is a common and disruptive question in campaign data operations. ODA's standard recommendation: one pipeline, multiple views. Every audience gets a view built from the same underlying dataset, updated at the same time, by the same process. The views differ; the data source does not.
33.15 Troubleshooting Common Data Quality Issues in Voter Contact Data
Voter contact data is notoriously messy. Campaigns use data from multiple sources — state voter files, commercial data vendors, VAN exports, volunteer data entry — that were built for different purposes, at different times, under different quality standards. The data cleaning function in Section 33.2 handles the most common issues, but field campaigns generate additional data quality problems that merit systematic attention.
33.15.1 Diagnosing Volume Anomalies
When daily contact totals spike — 500 contacts on a day when typical volume is 150 — the most common explanations are: a batch entry of contacts accumulated over multiple days, an unusually large shift, a date-recording error (canvassers entered the wrong date), or a duplicate import. Diagnosis begins with comparing the spike-day total to the number of scheduled canvassers on that date and the average contacts-per-hour rate. If the implied productivity per canvasser is implausibly high (more than 15-20 contacts per hour for door canvassing), the data likely reflects accumulated entry, not a single-day effort.
def diagnose_volume_anomalies(daily_trend, sigma_threshold=2.5):
    """
    Flag days with anomalous contact volumes — more than sigma_threshold
    standard deviations above or below the rolling mean.
    Returns flagged records for investigation.
    """
    daily_trend = daily_trend.copy()
    daily_trend['rolling_mean'] = (
        daily_trend['daily_contacts'].rolling(7, min_periods=3).mean()
    )
    daily_trend['rolling_std'] = (
        daily_trend['daily_contacts'].rolling(7, min_periods=3).std()
    )
    daily_trend['zscore'] = (
        (daily_trend['daily_contacts'] - daily_trend['rolling_mean']) /
        daily_trend['rolling_std'].replace(0, 1)
    )
    anomalies = daily_trend[daily_trend['zscore'].abs() > sigma_threshold]
    if len(anomalies) > 0:
        print(f"⚠️ Volume anomalies detected ({len(anomalies)} days):")
        for _, row in anomalies.iterrows():
            direction = 'SPIKE' if row['zscore'] > 0 else 'DROP'
            print(f"  {row['contact_date'].strftime('%Y-%m-%d')}: "
                  f"{int(row['daily_contacts']):,} contacts "
                  f"({direction}, z={row['zscore']:.1f})")
        print("  Investigate: Check for batch entries, date errors, or missed shifts.")
    else:
        print("No volume anomalies detected.")
    return anomalies

diagnose_volume_anomalies(daily_trend)
33.15.2 Checking Score Calibration and Default-Fill Artifacts
If the support score distribution in a county looks suspiciously uniform — every decile represented at exactly the same rate — it may indicate that the commercial score vendor filled missing values with a default (often 50) rather than computing actual scores. Scores clustering exactly at 50 are a red flag: they are the most common default value, not a meaningful estimate.
def check_score_calibration(df):
    """
    Diagnose common score data quality issues:
    - Excess clustering at round numbers (default fill indicators)
    - Implausibly low variance (possible wholesale default filling)
    - County-level variation in score distributions (legitimate or artifact?)
    """
    print("=== SCORE CALIBRATION DIAGNOSTICS ===")
    for col in ['support_score', 'persuadability_score']:
        pct_at_50 = df[col].between(49, 51).mean() * 100
        pct_at_0 = (df[col] == 0).mean() * 100
        pct_at_100 = (df[col] == 100).mean() * 100
        std_dev = df[col].std()
        print(f"\n{col}:")
        print(f"  % in 49-51 range: {pct_at_50:.1f}% "
              f"{'⚠️ HIGH — possible default fill' if pct_at_50 > 10 else '✓'}")
        print(f"  % exactly 0: {pct_at_0:.1f}% "
              f"{'⚠️ HIGH' if pct_at_0 > 5 else '✓'}")
        print(f"  % exactly 100: {pct_at_100:.1f}% "
              f"{'⚠️ HIGH' if pct_at_100 > 5 else '✓'}")
        print(f"  Overall std dev: {std_dev:.1f} "
              f"{'⚠️ LOW — possible widespread default fill' if std_dev < 10 else '✓'}")
    county_var = df.groupby('county')['support_score'].std()
    print("\nCounty-level support score std dev:")
    print(county_var.round(1).to_string())
    if county_var.max() / (county_var.min() + 0.001) > 2.5:
        print("  ⚠️ High variance across counties — check whether the same "
              "vendor model was applied uniformly or different models per county.")

check_score_calibration(df_clean)
33.15.3 The Contact Quality Problem
A contact is typically defined as a successfully completed interaction — a voter who answered the door, answered the phone, or responded to a text. But campaigns vary in how they define "contact." Some require the voter to say something substantive; others count any door where someone answered; others count only interactions that resulted in an outcome code being logged.
This definitional ambiguity means that comparing raw contact counts across campaigns, counties, or time periods is often misleading. "We made 2,000 contacts yesterday" means something different if 1,800 of those contacts were five-second exchanges at the door versus 1,800 substantive conversations with outcome codes recorded.
The best proxy for contact quality is the outcome distribution. If 60% of contacts result in "Confirmed Support" or "Soft Support" across the full campaign, and one county shows only 25%, that county may have canvassers who are logging contacts too liberally, may have a less supportive electorate, or may have operational problems. The multi-hypothesis framing matters: do not assume data quality issues without ruling out substantive explanations.
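This outcome-distribution check can be sketched as a per-county comparison against the campaign-wide rate. The column names, outcome labels, and demo data below are assumptions for illustration:

```python
import pandas as pd

def outcome_rate_by_county(df, positive=('Confirmed Support', 'Soft Support'),
                           alert_gap_pp=15):
    """
    Compare each county's positive-outcome rate to the campaign-wide rate.
    A large negative gap is a prompt to investigate, not a verdict: it may
    reflect liberal contact logging, a genuinely less supportive electorate,
    or operational problems.
    """
    contacted = df[df['outcome'].notna()].copy()
    contacted['positive'] = contacted['outcome'].isin(positive)
    overall = contacted['positive'].mean() * 100
    by_county = (contacted.groupby('county')['positive'].mean() * 100).sort_values()
    print(f"Campaign-wide positive-outcome rate: {overall:.1f}%")
    for county, rate in by_county.items():
        gap = rate - overall
        flag = '  ⚠️ investigate' if gap < -alert_gap_pp else ''
        print(f"  {county}: {rate:.1f}% ({gap:+.1f}pp){flag}")
    return by_county

# Hypothetical illustration: County B's rate sits far below the baseline
demo = pd.DataFrame({
    'county': ['A'] * 10 + ['B'] * 10,
    'outcome': (['Confirmed Support'] * 8 + ['Not Home'] * 2
                + ['Soft Support'] * 2 + ['Not Home'] * 8),
})
rates = outcome_rate_by_county(demo)
```

The flagged counties then feed the multi-hypothesis investigation described above rather than being treated automatically as data quality failures.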
⚠️ Critical Warning: Do Not Impute Race/Ethnicity from Surname Campaigns sometimes impute race and ethnicity from surname and geographic data when the voter file does not contain explicit race fields. This practice produces errors at the individual level — particularly for voters with names common across ethnic groups — and should never be used to make contact decisions about individual voters. Surname-geography imputation may be appropriate for aggregate analysis (estimating the ethnic composition of a geographic area) with explicit uncertainty acknowledgment, but must not be used to classify individual voters for contact prioritization. The equity implications of wrongly classifying a voter's identity for targeting purposes are significant, and the practice has attracted legal scrutiny in several states.
33.15.4 The Common Pattern: Data Problems That Hide Behind Good-Looking Charts
The most dangerous data quality issue is not the one that produces an error message — it is the one that produces a plausible-looking but systematically wrong output. A dashboard that shows smooth, gently rising contact curves and sensible county breakdowns can still be built on:
- Support scores that are 30% default-filled
- Contact records where 15% have wrong dates from a bulk import
- A voter universe that is missing 8,000 records from a registration file update that did not get incorporated
Nadia's practice: before every new campaign cycle or after any major data update, run the full diagnostic suite from this section. Look for the anomalies you do not expect. The anomalies you expect are already caught in the cleaning function; the ones that matter are the ones that surprise you.
33.16 Connecting the Dashboard to Chapter Themes
This chapter's two themes — "Measurement Shapes Reality" and "Data in Democracy: Tool or Weapon?" — run through every design decision Nadia made in building the voter contact analytics dashboard.
Measurement Shapes Reality: The choice of what to count as a contact, how to define the contact goal, and what KPIs to display are not neutral technical decisions. They are choices that shape behavior. When pace ratio becomes the primary field director metric, canvassers optimize for volume. When persuadability targeting lift is tracked prominently, field coordinators push canvassers to use the priority list rather than their own intuitions. When a county-level breakdown makes geographic progress visible to senior staff, county coordinators feel accountability pressure that would not exist if progress were reported only as a campaign-wide total. The dashboard does not merely describe the campaign — it influences the campaign it describes.
Data in Democracy: The prioritization model embedded in the dashboard makes choices about whose vote matters. By optimizing contact toward persuadable swing voters, it implicitly prioritizes the political center over the partisan base, suburban swing districts over low-income urban neighborhoods with lower modeled persuadability scores. These choices are politically consequential, mathematically defensible, and ethically contestable. The analyst who builds the dashboard and the campaign manager who uses it share responsibility for understanding and owning those choices.
The voter contact analytics dashboard is, in this sense, a microcosm of the broader question that runs through political analytics: can sophisticated data tools strengthen democracy, or do they risk serving only the campaigns sophisticated enough to deploy them and the voter segments sophisticated enough to appear in their models? The answer depends not on the tools themselves, but on the values and accountability structures of the people who build and use them.
🔗 Connection to Chapter 34 The next chapter turns from voter contact to voter targeting — the statistical and machine learning models that generate the support scores and persuadability scores at the foundation of this dashboard. Understanding how those models are constructed illuminates both their power and their limitations, and connects the operational layer of the dashboard to the inferential layer that produces the scores driving prioritization decisions.