Case Study 33-2: ODA's Open-Source Framework and Data Democracy
Background
Three months before the Garza campaign hired Nadia Osei, Adaeze Nwosu had a conversation that changed how she thought about her organization's work.
ODA — OpenDemocracy Analytics — had been founded on the premise that better data tools produced better civic outcomes. Adaeze had spent six years building the infrastructure: the standardized voter file integration framework, the KPI computation library, the dashboard templates. ODA had helped 34 campaigns across three election cycles, and its data showed that campaigns using ODA tools averaged 11% higher contact efficiency than comparable campaigns that went without them.
The conversation was with Sam Harding, ODA's data journalist and the person who tracked the organization's impact data. Sam had been working on a longer analysis of who benefited from ODA's tools and had brought Adaeze a finding that made her uncomfortable.
"Of the 34 campaigns that have used our full toolkit," Sam said, sliding a spreadsheet across the table, "29 are in competitive districts. Of those 29, 26 are campaigns where the candidate is in a substantially advantaged resource position relative to their opponent. The three where the candidate is resource-disadvantaged all used ODA tools for voter file integration only — they didn't have the staff capacity to actually build and use the dashboard."
Adaeze stared at the spreadsheet. "So we're systematically more useful to campaigns that are already well-resourced."
"Not just more useful," Sam said. "We might be making the resource advantage larger. A campaign that can afford a data analyst can use our tools to get 11% more efficient. A campaign that can't afford a data analyst gets the voter file integration — which is useful — but not the efficiency gain. If we're both better at doing this, we're widening the gap."
ODA's Mission and the Equity Problem
ODA's stated mission was "making advanced civic analytics accessible to campaigns and organizations working toward a more equitable democracy." The equity framing was intentional; Adaeze had built ODA partly in response to her observation that sophisticated campaign analytics had been concentrated in well-funded campaigns, giving already-advantaged candidates an additional structural advantage.
But Sam's data suggested that the "accessible" part of the mission was succeeding better than the "equity" part. The tools were more accessible — a campaign could use ODA's voter file integration for a fraction of the cost of building it independently. But the campaigns that derived the greatest benefit were those with staff capacity to use the full analytical toolkit, and those campaigns tended to be better resourced to begin with.
"There's a deeper version of this problem," Sam told Adaeze. "The support score and persuadability score models are trained on prior election data and demographic surveys. Those surveys overrepresent certain communities. The model is better calibrated for white suburban voters than for voters in communities that have been less surveyed. So when our prioritization tool says 'contact this persuadable voter first,' it's more likely to be right about a suburban white voter than about a Black voter in a dense urban precinct."
Adaeze had known this theoretically. Seeing it articulated as a systematic bias in ODA's core product was different.
The Product Decision
ODA's board asked Adaeze to respond to Sam's analysis. The organization had four options:
Option 1: Do nothing different. The tools are available to any campaign; if resource-advantaged campaigns use them more effectively, that reflects their resource advantage, which ODA can't solve. The 11% efficiency gain is real and helps any campaign that can use it.
Option 2: Create a capacity-building program. ODA would offer subsidized staff time — essentially providing a part-time data analyst to under-resourced campaigns as an in-kind service. This would close the capacity gap but require significant additional funding and organizational capacity.
Option 3: Redesign the tools for lower-capacity users. Rather than building tools that require a data analyst to operate, ODA would invest in simpler, more automated versions of the dashboard that a field director without analytical training could run. The tradeoff: simpler tools would be less flexible and less powerful.
Option 4: Address the model bias problem directly. Invest in community-level data collection to improve model calibration in under-surveyed communities, which would make the prioritization tools more equitable in their accuracy across demographic groups.
Sam's Data on Model Calibration
Sam's analysis of model calibration showed the following correlations for the support score models (measured by how well each model's predictions matched actual voter behavior in post-election survey data):
| Demographic Group | Model Calibration (correlation) |
|---|---|
| White, suburban, bachelor's degree+ | 0.72 |
| White, rural | 0.61 |
| Hispanic/Latino, urban | 0.54 |
| Black/African American, urban | 0.49 |
| Asian/Pacific Islander | 0.58 |
| Young voters (18-29), any demographic | 0.47 |
Higher correlation means the model is more accurate in predicting individual voter support. The systematic differences in calibration reflected systematic differences in survey representation in the training data.
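Mechanically, each row of the table is a Pearson correlation computed within one demographic slice of the post-election survey data. A minimal pandas sketch of that computation, using hypothetical column names (demo_group, support_score, observed_support) since ODA's actual schema isn't shown:

```python
import pandas as pd

def calibration_by_group(df: pd.DataFrame) -> pd.Series:
    """Pearson correlation between the model's pre-election support score
    and post-election surveyed support, computed within each group."""
    out = {}
    for group, g in df.groupby("demo_group"):
        out[group] = g["support_score"].corr(g["observed_support"])
    return pd.Series(out).sort_values(ascending=False)

# Toy illustration (not real survey data):
survey = pd.DataFrame({
    "demo_group": ["white_suburban"] * 4 + ["black_urban"] * 4,
    "support_score": [0.8, 0.3, 0.6, 0.2, 0.7, 0.4, 0.5, 0.6],  # model prediction, 0-1
    "observed_support": [1, 0, 1, 0, 0, 1, 0, 1],               # post-election survey answer
})
print(calibration_by_group(survey))
```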
Practically: when a data director like Nadia Osei uses ODA's prioritization tool to identify the most persuadable voters, the tool is more reliable for some demographic groups than for others. Canvassers dispatched on the basis of ODA's scores to contact the "most persuadable" voters in predominantly Black urban precincts will find those scores wrong substantially more often than canvassers in predominantly white suburban precincts.
Discussion Questions
1. Sam's finding is that ODA's tools widen the advantage for already well-resourced campaigns. Does this finding undermine ODA's mission? Does it matter whether the campaigns benefiting most are of a particular political orientation?
2. Evaluate the four options Adaeze is considering. For each option, assess: What would it cost ODA organizationally? Who would benefit most from it? What problem does it fail to address?
3. The model calibration table shows systematic differences by demographic group. What is the ethical significance of a campaign analytics tool that is more accurate for white suburban voters than for Black urban voters? Who bears the cost of this inaccuracy?
4. Sam says: "The prioritization tool makes the 'contact this persuadable voter first' decision. It's more likely to be right about a suburban white voter than about a Black voter in a dense urban precinct." From the campaign's perspective, if you know the tool is less accurate for certain segments, how should this change how you use the tool? Is it appropriate to continue using the tool in those segments, and if so, with what caveats?
5. Adaeze has said that "how you use the tools is a political choice." But Sam's analysis suggests that the tools themselves encode choices — through training data selection, model architecture, calibration — that have differential effects on different communities. Is the distinction between "neutral tools" and "political choices about how to use them" sustainable? What would it mean to build a genuinely neutral voter contact analytics tool?
Quantitative Analysis
Using the calibration data in the table:
a) If the Garza campaign's prioritization tool ranks 1,000 Black urban voters as "high persuadability" (score above 65), and the model's correlation for this group is 0.49, how does this calibration level affect the expected accuracy of those 1,000 rankings compared with the expected accuracy under a correlation of 0.72 (the white suburban calibration)?
b) Assume the campaign allocates 500 canvassing contacts based on the priority ranking for each of the following groups: white suburban voters, Hispanic/Latino urban voters, and Black urban voters. Using the calibration correlations as a proxy for what percentage of "priority" contacts are genuinely the highest-priority voters (i.e., correlation of 0.72 means 72% of the top-ranked voters are genuinely among the most persuadable), compute the expected number of "correctly prioritized" contacts in each group.
c) What is the implied contact efficiency loss for the Hispanic/Latino urban and Black urban segments compared to white suburban? Express this as a percentage of contacts that are misallocated due to model inaccuracy.
d) If ODA invested in community surveys to improve the calibration for Black urban voters from 0.49 to 0.65, how many additional correctly prioritized contacts would the campaign achieve in a deployment of 500 canvass contacts to this segment?
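Under the proxy assumption stated in (b), parts (b) through (d) reduce to simple arithmetic. A minimal Python sketch of that arithmetic, useful for checking answers (part (a) is conceptual and left to discussion):

```python
# Proxy assumption from part (b): the calibration correlation is read as the
# fraction of top-ranked contacts that are genuinely among the most persuadable.
CONTACTS = 500
calibration = {
    "white_suburban": 0.72,
    "hispanic_latino_urban": 0.54,
    "black_urban": 0.49,
}

# (b) expected correctly prioritized contacts per group
for group, r in calibration.items():
    print(f"(b) {group}: {r * CONTACTS:.0f} of {CONTACTS} correctly prioritized")
# white_suburban: 360, hispanic_latino_urban: 270, black_urban: 245

# (c) extra misallocation relative to the white suburban baseline
baseline = calibration["white_suburban"]
for group in ("hispanic_latino_urban", "black_urban"):
    gap = baseline - calibration[group]
    print(f"(c) {group}: {gap:.0%} of contacts ({gap * CONTACTS:.0f}) "
          f"misallocated beyond the white-suburban rate")
# hispanic_latino_urban: 18% (90 contacts), black_urban: 23% (115 contacts)

# (d) improving black_urban calibration from 0.49 to 0.65
gain = (0.65 - calibration["black_urban"]) * CONTACTS
print(f"(d) additional correctly prioritized contacts: {gain:.0f}")  # 80
```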