Case Study 28.1: The Garza Campaign's Data Audit — Foundations of an Analytics Operation
Background
When Nadia Osei joined the Garza for Senate campaign in early March, twelve months before Election Day, her title was Analytics Director. Her first task was deceptively straightforward: figure out what data the campaign actually had and whether it was any good.
The Garza campaign had been operating informally for nearly eight months before Nadia arrived — Maria Garza had announced her candidacy the previous summer, built a small staff, and begun raising money. During those eight months, data had accumulated without a coherent system to organize it: email sign-ups from events, donor records from the ActBlue platform, volunteer intake forms, and a VAN instance that the state Democratic Party had set up but that no one had used consistently.
Nadia had three weeks to complete the audit before she needed to present a data strategy to the campaign manager, Elena Cruz, and the candidate. This case study follows that audit and what it revealed.
The Audit Process
Nadia began by mapping the campaign's data ecosystem — identifying every place that data lived and who was responsible for it.
The VAN Instance: The state party had configured VAN for the campaign, preloaded with the state's current voter file (approximately 3.4 million registered voters, sourced from Catalist's latest update). The instance contained VAN's standard suite of contact history codes and survey fields. However, Nadia discovered that the previous eight months of event-based volunteer outreach had been logged inconsistently — some events had been entered, others had not, and the contact records that did exist used a mix of different coding conventions.
The Email List: The campaign had accumulated approximately 28,000 email addresses through event sign-ups and the campaign website. The email service provider (MailChimp) had clean engagement data — open rates, click rates, unsubscribes — but the email addresses had not been matched to VAN records. This meant the campaign had a substantial list of engaged supporters whose voter file identities were unknown.
The Donor File: ActBlue maintained clean records of approximately 4,200 donors who had contributed to the campaign, with donation amounts, dates, and contact information. Again, these records had not been matched to VAN.
The Catalist Connection: The campaign's Catalist subscription provided access to the enriched voter file with consumer data, previous cycle canvass scores, and Catalist's standard suite of modeled scores. However, the subscription had been set up six months earlier and the scores had not been refreshed.
Key Findings
Finding 1: Identity Resolution Gap
The most significant finding was that the campaign had three separate pools of engaged people — email subscribers, donors, and event attendees — with minimal overlap in the records. Someone who had donated, signed up for emails, and attended two events might appear three times in three different systems with no connection between the records.
Nadia estimated that the campaign's total pool of identified supporters was probably around 40,000 unique individuals. But because the records weren't matched, the campaign couldn't answer basic questions: What is the overlap between donors and volunteers? Are email subscribers more likely to have voted in past elections? Do our event attendees come from target precincts?
Finding 2: The Stale Score Problem
Catalist's modeled scores in the campaign's voter file were built on the previous cycle's data. The state had undergone notable demographic shifts in the two years since — significant growth in suburban Hispanic and Asian-American populations, shifts in educational composition in several counties — that weren't fully reflected in scores calibrated on older patterns.
More concretely: the campaign's current turnout propensity scores gave relatively low ratings to the exact demographic groups whose share of the electorate had grown most substantially. If the campaign built its GOTV universe on these scores, it would systematically underweight communities that were increasingly central to the Garza coalition's math.
Finding 3: Geographic Blind Spots
The campaign's eight months of event activity had been concentrated in the three largest metropolitan areas. The VAN instance contained essentially no canvass history for the state's second and third tiers of cities — mid-sized metros with substantial populations of exactly the kind of suburban voters who would be decisive in the race.
This wasn't a data problem per se — it was a reflection of where the early campaign had focused its energy. But it meant the campaign had almost no ground-truth signal from those areas: no canvass responses, no survey data, no events-based contacts. The analytics team would be making targeting decisions for those areas based entirely on modeled scores, with no empirical validation from actual voter contact.
Finding 4: Digital-Field Disconnect
The campaign was running a modest paid digital program — Facebook and Instagram ads, primarily targeted at email acquisition and small-dollar fundraising. The digital team was tracking ad performance through platform-native analytics (Facebook Ads Manager). The field team was tracking voter contact through VAN. The two systems were completely separate, with no mechanism for attributing digital engagement to specific voter file records or understanding whether digital-engaged voters were also canvass-engaged voters.
Implications for Strategy
Nadia presented her findings to Elena Cruz in a forty-minute meeting that she described later as "the most important conversation I had all cycle." The audit's findings had direct strategic implications.
On identity resolution: The campaign needed to invest immediately in matching its email list and donor file to VAN records. This would take two to three weeks of data work but would immediately give the campaign a much richer picture of its current supporter base — and would allow it to model those supporters' political characteristics.
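At its core, the matching step Nadia proposed is a join on normalized identifiers. Below is a minimal sketch of the deterministic version in Python; the field names, the first-initial/last-name/ZIP match key, and the `match_to_voter_file` helper are illustrative stand-ins for what a real VAN or Catalist match job does, which would add fuzzy name comparison, address history, and per-match confidence scoring.

```python
# Minimal sketch of deterministic identity resolution: matching supporter
# records (email sign-ups, donors) to a voter file extract. Field names
# and the match-key scheme are illustrative assumptions, not the
# campaign's actual pipeline.
import re

def normalize(s: str) -> str:
    """Lowercase and strip non-alphanumerics so "O'Brien" matches "OBrien"."""
    return re.sub(r"[^a-z0-9]", "", s.lower())

def match_key(record: dict) -> str:
    """Deterministic key: first initial + last name + ZIP5."""
    first = normalize(record["first_name"])[:1]
    last = normalize(record["last_name"])
    return f"{first}|{last}|{record['zip'][:5]}"

def match_to_voter_file(supporters: list[dict], voters: list[dict]) -> list[tuple]:
    """Return (supporter, voter) pairs whose keys match unambiguously."""
    index: dict[str, list] = {}
    for v in voters:
        index.setdefault(match_key(v), []).append(v)
    matches = []
    for s in supporters:
        candidates = index.get(match_key(s), [])
        if len(candidates) == 1:  # ambiguous keys are left unmatched
            matches.append((s, candidates[0]))
    return matches

email_list = [{"first_name": "Ana", "last_name": "Reyes",
               "zip": "78701-2204", "email": "ana@example.com"}]
voter_file = [{"first_name": "Ana Maria", "last_name": "Reyes",
               "zip": "78701", "van_id": "V-001042"}]

for supporter, voter in match_to_voter_file(email_list, voter_file):
    print(supporter["email"], "->", voter["van_id"])  # ana@example.com -> V-001042
```

Note the design choice in the sketch: accepting only unambiguous keys trades coverage for precision, which is one reason real match jobs, including the campaign's, land well short of 100%.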
On stale scores: Rather than using Catalist's existing modeled scores directly, the campaign should treat them as inputs to a fresh model trained on data from the current cycle. Nadia proposed a rapid-cycle model build: take the first six weeks of canvass results, overlay them on the Catalist features, and generate updated scores that reflected current-cycle patterns rather than those of past cycles. This would require getting canvassers in the field quickly in geographically diverse areas — not just to make voter contacts, but to generate the training data that would make the model valid.
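In code terms, the rapid-cycle build amounts to retraining a classifier on current-cycle canvass IDs, with the stale vendor scores demoted from final answer to input feature. A hypothetical sketch using scikit-learn; the file names, column names, and the choice of logistic regression are assumptions, not the campaign's actual stack.

```python
# Sketch of a rapid-cycle score refresh: retrain a support model on
# current-cycle canvass IDs, using the stale vendor scores as inputs.
# "canvass.csv", "voter_file.csv", and all column names are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# One row per canvassed voter, already joined to Catalist-style features;
# supports_garza is the 1/0 canvass result.
canvass = pd.read_csv("canvass.csv")
features = ["catalist_support_score",   # last cycle's score, now just an input
            "catalist_turnout_score",
            "age", "is_new_registrant", "urbanicity_code"]

X, y = canvass[features], canvass["supports_garza"]

model = LogisticRegression(max_iter=1000)
# Out-of-sample check before trusting the refreshed scores.
auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
print(f"cross-validated AUC: {auc:.3f}")

# Fit on all canvass data, then score the full registered-voter file.
model.fit(X, y)
voter_file = pd.read_csv("voter_file.csv")
voter_file["support_score_refreshed"] = model.predict_proba(voter_file[features])[:, 1]
```

Because the canvassed sample is not random (see Finding 3), a cross-validated score like this overstates how well the model travels to uncanvassed regions, which is exactly why Nadia wanted early canvassing to be geographically diverse.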
On geographic blind spots: The campaign should treat model outputs in the blind-spot areas as inherently less reliable and should allocate early field resources partly for intelligence-gathering rather than pure GOTV purposes. Low-cost voter contact in mid-sized cities — phone banking, not door-knocking — could generate survey responses that helped calibrate the model without requiring the full per-contact investment of a canvass.
On the digital-field disconnect: Nadia proposed building a regular data matching process — running the campaign's digital engaged universe (email openers, ad clickers, website visitors) through a voter file matching algorithm every two weeks. This wouldn't be perfect, but it would begin to close the gap between what the digital team knew and what the field team knew.
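A sketch of what one run of that biweekly sync might look like, assuming an exact-email join; `email_match`, the column names, and the `DIGITAL_ENGAGED` activist code are all hypothetical, and a production pipeline would fall back to name-and-address keys for the many engagers whose emails are absent from the voter file.

```python
# Sketch of one run of the biweekly digital-to-voter-file sync.
# All names, columns, and codes here are illustrative assumptions.
import csv
import datetime

def email_match(engagers: list[dict], voters: list[dict]) -> list[tuple]:
    """Exact-email join; real pipelines fall back to name/address keys."""
    by_email = {v["email"].lower(): v for v in voters if v.get("email")}
    return [(e, by_email[e["email"].lower()])
            for e in engagers if e["email"].lower() in by_email]

def run_biweekly_sync(engagers: list[dict], voters: list[dict]) -> None:
    matched = email_match(engagers, voters)
    stamp = datetime.date.today().isoformat()
    # Emit a VAN-loadable file tagging each matched voter as digitally engaged.
    with open(f"digital_engaged_{stamp}.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["van_id", "activist_code", "tag_date"])
        for engager, voter in matched:
            writer.writerow([voter["van_id"], "DIGITAL_ENGAGED", stamp])
    # Track the match rate run over run; a falling rate is an early warning
    # that acquisition is pulling in people who aren't on the voter file.
    rate = len(matched) / len(engagers) if engagers else 0.0
    print(f"{stamp}: matched {len(matched)}/{len(engagers)} ({rate:.0%})")
```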
The Model Build
Six weeks after Nadia's audit, the campaign had completed identity resolution on approximately 72% of its email list and 89% of its donor file. The matched records revealed several surprises. Donors were significantly more likely to be high-turnout voters than the broader supporter pool — which was expected — but they were also disproportionately concentrated in a handful of precincts that were already strong for Garza. The email list, by contrast, contained a larger fraction of low-turnout registrants, including a significant cohort of voters who had registered in the last two years but had not yet voted in a statewide election.
These newly registered, low-turnout email subscribers became a priority target for the campaign's early organizing — a group of potential Garza voters who were engaged enough to have signed up for emails but who might not vote without active mobilization.
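Once identities are resolved, the audit questions from Finding 1 reduce to simple aggregations. A hypothetical pandas sketch; `matched.csv` and every column name in it are assumptions about how the matched records might be laid out.

```python
# With records matched, the overlap and turnout questions become
# one-line group-bys. "matched.csv" and its columns are hypothetical.
import pandas as pd

matched = pd.read_csv("matched.csv")   # one row per matched supporter

# Turnout by acquisition source: are donors higher-propensity than
# email subscribers?
print(matched.groupby("source")["voted_last_statewide"].mean())

# Geographic concentration: what share of donors sit in their ten
# most common precincts?
donors = matched[matched["source"] == "actblue"]
print(donors["precinct"].value_counts(normalize=True).head(10).sum())

# The mobilization-priority cohort: email subscribers who registered
# recently but have never voted in a statewide election.
email = matched[matched["source"] == "email"]
cohort = email[(email["registered_within_2y"] == 1)
               & (email["voted_last_statewide"] == 0)]
print(f"newly registered, not-yet-voted email subscribers: {len(cohort)}")
```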
Discussion Questions
- Why is identity resolution (matching records across data systems) so foundational to campaign analytics? What strategic decisions become possible after matching that are impossible before?
- Nadia finds that Catalist's modeled scores are stale — calibrated on past cycles that don't fully reflect the current electorate. What does this suggest about the limits of third-party vendor data, and what can campaigns do to mitigate this problem?
- The audit reveals that the campaign's event activity has been geographically concentrated in major metros. How would you expect this to affect the campaign's data quality and analytical reliability across different parts of the state?
- The digital-field disconnect Nadia identifies is described as common across campaigns. Why might this disconnect persist despite its obvious costs? What organizational or incentive factors might maintain it?
- Imagine you are Jake Rourke's junior analyst, Marcus, conducting a similar audit of the Whitfield campaign's data infrastructure. Based on what the chapter describes about the Whitfield operation, what would you expect to find? What would be your top three priorities?