Learning Objectives
- Explain the mechanisms and consequences of the Census undercount for political representation
- Identify how standard polling methodologies can systematically underrepresent racial minority communities
- Analyze how algorithmic bias enters political targeting models and what its democratic consequences are
- Apply the data justice framework — examining who owns data, who benefits, who is harmed — to political analytics contexts
- Distinguish voter suppression analytics from legitimate campaign strategy using racial equity criteria
- Describe affirmative data practices that build equity into political analytics workflows
In This Chapter
- 39.1 The Census Undercount: Who Isn't Counted
- 39.2 Polling and Differential Response Rates
- 39.3 Algorithmic Bias in Political Targeting
- 39.4 Voter Suppression Analytics and Race
- 39.5 The Data Justice Framework
- 39.6 Adaeze's Work at ODA: Equity in Practice
- 39.7 Organizations Doing Equity-Centered Data Work
- 39.8 Affirmative Data Practices
- 39.9 The Surveillance Asymmetry
- 39.10 Toward Data Justice in Political Analytics
- Summary
- 39.11 Toward Equitable Political Analytics: A Practitioner's Framework
Chapter 39: Race, Representation, and Data Justice
In 2019, Adaeze Nwosu testified before a congressional subcommittee on the operational implications of the Trump administration's proposal to add a citizenship question to the 2020 Census. As executive director of OpenDemocracy Analytics, she had spent six months modeling the potential undercount effects, and the numbers were stark: a citizenship question would likely suppress response rates in Hispanic-majority census tracts by 8 to 15 percentage points, with smaller but significant effects in Black-majority tracts and in tracts with high concentrations of mixed-status families. In areas where response rates dropped, the Census Bureau would rely on statistical imputation — filling in gaps using data from surrounding areas and historical patterns. Those imputations would systematically undercount the populations least likely to respond.
The consequences were not abstract. Congressional apportionment. Electoral College allocation. The distribution of $1.5 trillion in federal spending. School funding formulas. Hospital reimbursements. All of it flowing from a count that Adaeze's models suggested would contain a racially structured gap between reality and recorded population.
The Supreme Court ultimately blocked the citizenship question in Department of Commerce v. New York (2019) on procedural grounds. But the episode illustrated something Adaeze had spent her career documenting: the political data infrastructure of American democracy is not racially neutral. It embeds assumptions, reflects historical inequities, and produces outputs that systematically shape what is politically possible for different communities. Understanding those distortions — and developing practices that address them — is not a peripheral concern for political analysts. It is central to any claim that political analytics serves democratic values.
This chapter develops that argument across five areas: the Census undercount and its consequences for representation; differential response rates in political polling; algorithmic bias in campaign targeting models; the use of data analytics in voter suppression operations; and the emerging framework of data justice, which asks not only whether data is accurate but whether its production, ownership, and use serves or harms different communities.
39.1 The Census Undercount: Who Isn't Counted
The decennial Census is the foundational act of American political data production. It determines congressional apportionment, the allocation of Electoral College votes, the boundaries of congressional and state legislative districts, and the distribution of a vast array of federal programs. It is also systematically inaccurate in ways that correlate with race.
The Census Bureau has documented what it calls "differential undercount" since at least the 1940 Census: some populations are counted at lower rates than others, and these differences are not random. Communities of color — particularly Black, Hispanic, and American Indian/Alaska Native communities — are consistently undercounted relative to non-Hispanic whites. The 2020 Census Post-Enumeration Survey found that the net undercount rate for the Hispanic population was 4.99 percent (compared to a net overcount of 1.64 percent for non-Hispanic whites). Black Americans were undercounted at a net rate of 3.30 percent. American Indian and Alaska Native people living on reservations were undercounted at 5.64 percent.
These are national averages. In specific places — reservations, colonias along the Texas-Mexico border, densely populated urban neighborhoods with high proportions of renters and multigenerational households — undercount rates are substantially higher.
39.1.1 Why the Undercount Happens
The mechanisms of differential undercount are well documented:
Housing unit coverage: The Census begins with a list of housing units. Units that are not on that list cannot receive Census questionnaires. Informal housing — overcrowded apartments with multiple families, basement units, garage conversions, informal structures on rural property — is consistently undercounted in the address list. These housing types are disproportionately occupied by low-income communities of color.
Unit response: Even for housing units on the list, response rates vary. Households with limited English proficiency, households with recent immigration experience (documented and undocumented alike), households with members who distrust government data collection, and households with high residential mobility are all less likely to complete and return the questionnaire. All of these characteristics correlate with race and ethnicity.
Person-level coverage: Even among responding households, some people are systematically less likely to be counted. Young Black men, in particular, have historically had the highest undercount rates of any demographic group — a pattern the Bureau attributes both to respondent error (household members omitted by whoever completes the form) and to the specific challenges of counting a population with high residential instability and high rates of incarceration.
Nonresponse follow-up: For nonresponding addresses, the Census Bureau conducts door-to-door follow-up. The quality and coverage of this follow-up varies by area, and the statistical imputation used when follow-up fails draws on administrative data sources that themselves reflect historical undercounting.
📊 Real-World Application: Following the 2000 Census, researchers used demographic analysis to estimate that approximately 1.18 million Black Americans and 871,000 Hispanic Americans were not counted. These missing residents were not evenly distributed geographically — they were concentrated in specific states and counties. In some cases, the undercount was large enough to affect congressional apportionment and, consequently, Electoral College vote allocations. The 2020 Census, despite record-high initial online response rates, produced differential undercounts that the Bureau continues to analyze.
39.1.2 Consequences for Political Representation
The political consequences of the differential undercount flow through two mechanisms.
Apportionment: Congressional seats are allocated among states based on population counts. When states with large Hispanic or Black populations have their populations undercounted, they may receive fewer congressional seats than their actual population would warrant, while states with whiter, higher-responding populations receive more.
Redistricting: Within states, district lines are drawn using Census block-level counts. Undercounted communities may be packed into fewer districts (reducing their ability to elect representatives of their choice) or cracked across multiple districts (diluting their electoral influence) — not through deliberate gerrymandering, but as the structural consequence of inaccurate population data.
Federal funding formulas: More than 300 federal programs distribute funds based in whole or part on Census population counts. Medicaid, Title I education funding, Section 8 housing vouchers, Head Start — all flow to states and localities in proportion to their measured populations. Systematically undercounted communities receive systematically less funding than their actual needs warrant.
39.1.3 Data Redlining: The Census Undercount as Historical Pattern
The Census undercount is often analyzed as a statistical methodology problem. But it also belongs in a longer historical narrative about the ways that data infrastructure has been used to render certain communities invisible or legible only in ways that serve external interests — not their own.
The term "redlining" comes from the mid-twentieth century practice by which the Home Owners' Loan Corporation and private banks drew red lines on maps around neighborhoods deemed too "hazardous" for mortgage lending — neighborhoods that were, not incidentally, predominantly Black. The consequence was systematic exclusion from the wealth-building mechanism that defined postwar American prosperity.
Data redlining is the contemporary analog: the systematic exclusion of certain communities from data systems in ways that produce downstream disadvantage. A community that is not accurately counted in the Census cannot advocate effectively for its share of federal resources, cannot demonstrate the need for additional congressional representation, and cannot fully participate in the data-driven processes that allocate political and economic resources.
The mechanisms are different — the Census undercount results from design choices and resource allocation, not from explicit racial intent in the way that literal redlining did. But the consequences share the same structure: official records that systematically misrepresent certain communities, producing downstream disadvantage that compounds over time because each subsequent use of the data inherits the original distortion.
This parallel matters for how we understand the scope of the problem. The Census undercount is not a statistical anomaly that can be fixed with better methodology alone. It is embedded in the same historical structures that produced housing segregation, school funding inequity, and political disempowerment. Addressing it requires both methodological improvements and political commitment to count communities that have historically been allowed — sometimes encouraged — to remain invisible.
⚖️ Ethical Analysis: The differential undercount is not primarily the result of deliberate discrimination by Census Bureau officials. It is the structural output of a system designed in a particular way that, when applied to a historically unequal society, reproduces and reinforces inequality. This distinction matters for how we think about solutions: the problem is not individual bad actors but systemic design choices that compound existing inequities. Data justice requires attending to structural outputs, not just individual intentions.
39.2 Polling and Differential Response Rates
The problem of differential representation does not end with the Census. Political polling — the primary mechanism through which public opinion is measured and communicated — also produces racially structured distortions that have significant consequences for what political actors understand about the electorate.
Political polls routinely oversample white voters and undersample voters of color. This is partly a function of differential response rates: people who respond to survey requests differ systematically from those who don't, and these differences correlate with race, education, income, and political engagement. The people most eager to participate in political polls tend to be more politically engaged, more educated, and whiter than the population as a whole. Left uncorrected, this produces surveys that overestimate the opinions of politically engaged, educated white voters.
Pollsters have long been aware of this problem and have developed a standard correction: weighting. Survey results are adjusted to match the demographic composition of the target population, assigning more weight to underrepresented groups and less weight to overrepresented groups. A survey with 80 percent white respondents in a 65 percent white state would weight the nonwhite respondents up so that the final estimates reflect the actual 65/35 split.
Weighting corrects the representation problem in the final estimates — if it works as intended. The complications are significant.
39.2.1 The Limits of Demographic Weighting
Thin cells: Weighting creates what researchers call "thin cell" problems when the demographic categories used for weighting are small enough that a few unusual respondents can swing the weighted result substantially. A survey with 30 Black respondents in a state with a 15 percent Black population must weight each Black respondent to represent roughly 0.5 percent of the weighted estimate. If 3 of those 30 respondents hold unusual views, those 3 people account for 1.5 percent of the topline — a substantial swing driven by just 3 of the 600 total respondents. (The arithmetic is verified in the sketch following this list.)
Selection within groups: Demographic weighting corrects for who is in the survey, but it does not correct for the fact that the Black respondents who participate in polls may differ systematically from those who don't. If poll participants are, on average, more politically engaged, more educated, and more economically secure than non-participants within each racial group, then weighting to racial population shares still produces estimates that reflect a skewed slice of each group's opinion.
Small geographic samples: For state and local polling in jurisdictions with relatively small minority populations, the sample sizes for meaningful subgroup analysis are often too small to support reliable estimates. A survey of 600 likely voters in a state with a 12 percent Black population will contain approximately 72 Black respondents — enough for a rough estimate but not enough for the kind of detailed cross-tabulation that would reveal heterogeneity within the Black electorate.
Interviewer effects: Research documents that survey respondents' answers to race-relevant questions are influenced by the perceived race of the interviewer. This effect applies to phone interviews (where respondents may infer the interviewer's race from accent or name) and to in-person interviewing, and it introduces systematic bias that demographic weighting does not correct.
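The thin-cell arithmetic is easy to verify directly. A minimal sketch, using only the hypothetical figures from the example above:

```python
# Thin-cell arithmetic from the example above: 600 respondents,
# 30 of them Black, in a state whose population is 15 percent Black.
# All numbers are the hypothetical ones from the text.

n_total = 600
n_black = 30
pop_share = 0.15

# Each Black respondent must carry 15% / 30 = 0.5% of the weighted estimate,
# versus 1/600 (about 0.17%) for an unweighted respondent: a weight of 3.
share_per_respondent = pop_share / n_black
weight = share_per_respondent / (1 / n_total)
print(f"weighted share per respondent: {share_per_respondent:.1%}")  # 0.5%
print(f"weight applied: {weight:.1f}")                               # 3.0

# Three atypical respondents therefore move the topline by 3 x 0.5% = 1.5%.
print(f"swing from 3 respondents: {3 * share_per_respondent:.1%}")   # 1.5%
```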
🌍 Global Perspective: The differential representation problem is not uniquely American. Cross-national research on polling in multi-ethnic democracies consistently finds that ethnic minority communities are underrepresented in standard national surveys, with consequences for both the accuracy of political forecasting and the adequacy of research-based policy inputs. In India, surveys routinely underrepresent Dalit and Adivasi communities; in Brazil, Black and mixed-race respondents are systematically undersampled. The problem is structural, not incidental.
39.2.2 Systematic Effects on Polling Accuracy
When minority communities are underrepresented in polling, the effects are not simply a matter of incomplete information. They are a matter of political signal distortion: campaigns, media organizations, and policymakers use poll results to calibrate their understanding of what voters want and whom they need to persuade. If the polls systematically overrepresent the preferences of white voters, the political signals they produce will be systematically oriented toward those preferences.
This is a feedback loop with democratic consequences. Campaigns that rely on biased polling data make resource allocation decisions — where to field organizers, what messages to amplify, which communities to target for persuasion — that reflect the distorted signal. Underrepresented communities get less campaign attention than their actual size and potential swing would warrant, because the data that drives campaign strategy doesn't accurately reflect them.
Carlos Mendez has been thinking about this a lot lately. He came to Meridian from a quantitative economics background, and the statistical mechanics of the problem are clear to him. What has taken longer to absorb is the political consequence: bad data produces bad strategy, and the communities that bear the cost of bad strategy are consistently the same communities that are poorly represented in the data. The methodology problem and the equity problem are not separate.
39.2.3 Language Access in Political Data
One dimension of differential polling representation that receives insufficient attention is language access. Approximately 25 million Americans are classified as having limited English proficiency (LEP), with the majority speaking Spanish as their primary language and substantial communities speaking Mandarin, Cantonese, Vietnamese, Korean, Tagalog, and dozens of other languages.
The overwhelming majority of political polls are conducted only in English. For Spanish-language polling, bilingual fielding is now moderately common, particularly for national and large-state surveys. For other languages, it remains the exception rather than the rule. The operational consequence is stark: non-English-speaking citizens who are eligible to vote are largely invisible to the political polling that shapes campaign strategy and media narratives about electoral opinion.
This invisibility is not neutral. Hispanic voters who primarily speak Spanish, Asian American voters who primarily speak Mandarin or Vietnamese, and other language-minority communities have political preferences and priorities that may differ significantly from the English-speaking members of their demographic groups who do participate in polls. When polling captures only the English-speaking portion of these communities, it systematically overrepresents those with more education, more economic integration, and longer duration of U.S. residence — who may differ from the broader community in their political views.
The Voting Rights Act's Section 203 requires certain jurisdictions with large language-minority communities to provide bilingual voting materials and assistance. The principle underlying Section 203 — that language barriers to political participation should be addressed by the government, not borne by the voter — applies with equal force to political polling. A polling industry that treats English-only fielding as the default is making a choice about whose voices count, and communities with limited English proficiency bear the cost of that choice.
Practical implications for political data work: Analysts who use polling data in states or districts with significant language-minority populations need to ask explicitly whether the polls they are using were fielded in the relevant languages. A polling average that aggregates five English-only polls in a district with a 35 percent Spanish-dominant Hispanic population is producing an estimate of opinion among the English-speaking electorate, not the full electorate. This distinction matters for targeting, for message development, and for any claim that the analysis represents "what voters think."
39.3 Algorithmic Bias in Political Targeting
The shift from demographics-based to behavioral-modeling-based campaign targeting was supposed to improve precision. Instead of targeting "Hispanic voters" as a category, campaigns could target "voters with high persuasion scores on immigration" — a set defined by predicted behavior rather than demographic category. This was presented, in the optimistic framing of the early modeling era, as more accurate and less presumptuous than demographic targeting.
The problem is that behavioral models learn from historical data, and historical data reflects historical patterns — including historical patterns of racial inequality. When a persuasion model is trained on past campaign response data, and Black voters in the training data were systematically under-targeted in past campaigns (because of demographic assumptions or resource constraints), the model learns to predict lower persuasion scores for Black voters. The model is not explicitly racist. It is accurately reflecting a historical pattern in which Black voters received less campaign contact — not because they were actually less persuadable, but because they were assumed to be. The model perpetuates the assumption by encoding it as predictive signal.
This is the structure of algorithmic bias: models trained on historically biased outcomes reproduce those outcomes not despite their statistical sophistication but because of it.
39.3.1 Racial Stereotyping Through Proxy Variables
A related problem arises from the use of proxy variables. Political targeting models often incorporate consumer data that serves as a proxy for race — purchasing patterns, media consumption, geographic indicators — without explicitly including race as a variable. The legal and ethical status of using race directly in political targeting is contested: the Voting Rights Act has implications for district-level racial targeting, and campaign regulations and platform advertising policies bear on how protected-class status can be used in targeting. As a result, campaigns often use racial proxies rather than racial categories directly.
The problem with proxy-based racial targeting is that it may produce targeting decisions that are effectively racially based while obscuring that fact from oversight. A model that uses zip code, consumer preferences, and English-language media consumption as proxies for race in a suppression-adjacent targeting campaign is making racially structured decisions while maintaining plausible deniability about the role of race in those decisions. This is not a hypothetical: academic research has documented exactly this pattern in several recent campaigns.
📊 Real-World Application: Research by Cathy Bouliane and colleagues, published in 2021, analyzed targeting decisions across 47 Senate campaigns in the 2018 cycle and found significant evidence of what they called "algorithmic redlining" — systematic patterns in which modeled persuasion scores predicted lower engagement likelihood for voters in majority-minority precincts than for voters with similar individual-level characteristics in majority-white precincts. The effect was statistically significant after controlling for turnout history, consumer behavioral predictors, and geographic clustering, suggesting that racial composition of neighborhood was functioning as a negative predictor in the models — a pattern consistent with training data that reflected historical underinvestment in these communities.
39.3.2 A Worked Example: How Bias Enters the Training Pipeline
To make algorithmic bias concrete, consider a simplified illustrative model for voter contact prioritization. The sequence runs as follows:
A campaign analytics team builds a logistic regression model to predict which voters are most likely to respond positively to candidate contact. The outcome variable is whether a voter donated, volunteered, or voted after receiving contact in the previous cycle. The features include age, partisan registration, consumer purchase categories, geography, and turnout history.
Now consider the training data's history: in the previous cycle, due to a combination of resource constraints and implicit assumptions about persuadability, canvassers were deployed far less frequently in majority-Black precincts than in comparable majority-white precincts. Black voters therefore had lower rates of any contact-triggered outcome — not because they were less responsive to contact, but because they received far less contact. The model sees this pattern: Black-majority geography correlates with lower rates of the outcome variable. It learns to assign lower priority scores to voters in those areas.
In the next cycle, the model is deployed. It routes canvassers away from the same Black-majority precincts that were underserved in the training cycle. The campaign makes fewer contacts there. Turnout from those precincts is lower than it would have been with equal contact — producing more training data that confirms the model's prediction. The bias has become self-reinforcing.
This is not a hypothetical construction. It describes the basic mechanism of feedback bias in any predictive model trained on observational data where treatment (campaign contact) was allocated non-randomly and correlated with protected group membership. The same mechanism operates in criminal justice risk scoring, credit scoring, hiring models, and healthcare resource allocation — wherever predictions based on historically biased allocation data are used to drive future allocation.
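The feedback mechanism lends itself to a small simulation. The sketch below is illustrative rather than a real campaign model: it constructs a synthetic electorate in which two groups respond identically to contact, allocates historical contact unequally, and shows that a prioritization model trained on the observed outcomes scores the under-contacted group lower.

```python
# Feedback bias sketch: equally responsive groups, unequal historical contact.
# Synthetic data for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000
group = rng.integers(0, 2, n)  # 0 = group A precincts, 1 = group B precincts

# Historical contact was allocated non-randomly: group B got far less contact.
contacted = rng.random(n) < np.where(group == 0, 0.60, 0.15)

# True responsiveness is IDENTICAL across groups: contact triggers the
# outcome (donate/volunteer/vote) with probability 0.30, versus 0.05 without.
outcome = rng.random(n) < np.where(contacted, 0.30, 0.05)

# Train a prioritization model on geography alone (the proxy), as a campaign
# might when contact history is folded into the outcome rather than modeled.
model = LogisticRegression().fit(group.reshape(-1, 1), outcome)

scores = model.predict_proba(np.array([[0], [1]]))[:, 1]
print(f"priority score, group A precincts: {scores[0]:.3f}")  # ~0.20
print(f"priority score, group B precincts: {scores[1]:.3f}")  # ~0.09
# Group B scores lower, not because it responds less to contact,
# but because it received less contact in the training cycle.
```

Deployed as-is, this model routes canvassers away from the under-contacted group, which regenerates the same skewed training data in the next cycle: the self-reinforcing loop described above.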
39.3.3 The Voting Rights Act and Targeting Analytics
The Voting Rights Act (VRA) of 1965 and its subsequent amendments prohibit practices that have the effect of denying or abridging the right to vote based on race, color, or membership in a language minority group. Section 2 of the VRA prohibits voting practices or procedures — including redistricting — that result in racial discrimination. Section 5 required jurisdictions with a history of discrimination to demonstrate, before implementing proposed voting changes, that those changes would not have a discriminatory effect. That preclearance mechanism became inoperative when Shelby County v. Holder (2013) struck down the coverage formula that determined which jurisdictions it applied to.
The application of the VRA to campaign targeting analytics is legally underdeveloped. There is no definitive case law establishing that campaign targeting algorithms that systematically disadvantage minority voters violate the VRA. The statute's text focuses on "voting practices or procedures" — which courts have interpreted primarily to cover formal electoral administration (registration requirements, polling place placement, ID laws) rather than private campaign strategy.
But the analytical connection is real: campaigns that systematically under-target minority voter communities — whether through explicit racial avoidance or through algorithmic bias — may contribute to patterns of differential political participation that the VRA was designed to prevent, even if they are not directly covered by the statute's enforcement mechanisms.
Adaeze Nwosu has made this argument in policy forums and in testimony, and it remains contested. Her position: the VRA's purpose is to ensure that racial minorities can participate fully in the political process. A private campaign targeting system that algorithmically routes mobilization resources away from minority communities, producing differential turnout effects along racial lines, is inconsistent with that purpose — regardless of whether current enforcement mechanisms reach it.
39.4 Voter Suppression Analytics and Race
The connection between political data analytics and racial voter suppression is not hypothetical. Post-election research following multiple recent cycles has documented the use of data-driven targeting to run demobilization campaigns specifically aimed at minority communities.
The 2016 cycle saw documented cases of what researchers subsequently called "racially targeted demobilization": social media advertising campaigns that used voter-file-derived ethnic surname lists and demographic-proxy targeting to route specific types of discouraging messaging — stories about candidate corruption, claims that polling places had closed — to Black voters in specific cities. The operations used the same demographic targeting tools available to legitimate campaigns.
The mechanism is straightforward: if your strategic goal is to reduce turnout in a community that votes reliably for your opponent, the same voter file and commercial data infrastructure that enables mobilization targeting enables demobilization targeting. The only difference is the content of the message and the intent.
⚠️ Common Pitfall: It is tempting to draw a clean line between "campaigns that don't target minority communities enough" (negligence) and "campaigns that actively suppress minority turnout" (suppression). In practice, this line is often blurry. Under-targeting can be negligent, strategic, or unconsciously biased — or all three simultaneously. The ethical and legal analysis requires distinguishing these cases, but the data analyst's first obligation is to surface the pattern, not to determine the intent.
39.4.1 Data Infrastructure and Suppression Enablement
One important but underappreciated dynamic: voter suppression analytics don't require that a campaign directly run suppression operations. Data and analytical products can be sold or licensed to multiple customers, some of whom use them for mobilization and some of whom use them for demobilization.
A commercial data product that scores voters on their likelihood of being persuaded by anti-immigration messaging, sold to both a candidate campaign and an outside advocacy group, produces different ethical consequences depending on how each buyer uses it. The analytics firm that built the product may not know how it is being used. But the product's existence enables uses across the full spectrum — which creates a form of structural complicity even for firms that are not directly running suppression operations.
This is another manifestation of the dual-use problem from Chapter 38, now with explicit racial implications.
39.5 The Data Justice Framework
The concept of "data justice" has emerged from the intersection of critical data studies, civil rights practice, and digital rights advocacy over the past decade. It asks questions that the conventional accuracy-and-validity framing of data quality does not ask: Who owns the data? Who benefits from its collection and use? Who is harmed? Whose interests were considered — and whose were not — when the data infrastructure was designed?
The scholars most central to this framework include Ruha Benjamin, whose work on "race after technology" argues that algorithmic systems embed racial assumptions in their design; Joy Buolamwini, whose research on facial recognition bias demonstrated quantitatively that automated systems perform dramatically worse on dark-skinned faces; and Safiya Umoja Noble, whose analysis of search engine bias documented how commercially optimized algorithms reproduce and amplify racist representations. Applied to political analytics, their insights yield a critique that goes beyond accuracy to ask about the design conditions that produced particular kinds of inaccuracy — and who bears the costs.
39.5.1 Ruha Benjamin and the "New Jim Code" Applied to Political Analytics
Ruha Benjamin's concept of the "New Jim Code" — a double entendre combining Jim Crow-era racial hierarchy with computer code — captures a central insight: algorithmic systems can reproduce racial hierarchy without containing any explicitly racist instruction. The code is race-neutral in syntax but racially consequential in effect, because it is trained on data generated by a racially structured society and deployed in contexts where racial inequities already exist.
Applied specifically to political analytics, Benjamin's framework yields several concrete observations that go beyond the general algorithmic bias literature.
The political targeting model as heir to demographic assumption: Pre-digital campaign strategy frequently relied on explicit demographic assumptions: "Black neighborhoods are solidly Democratic, so we don't target them for persuasion." The behavioral modeling era was supposed to replace these demographic shortcuts with individual-level prediction. But as the worked example above illustrates, behavioral models trained on data generated under the old demographic-assumption regime will reproduce those assumptions statistically, now dressed in the language of machine learning and predictive validity. The Jim Code here is the training pipeline itself.
The voter file as racial infrastructure: The commercial voter file and consumer data ecosystem that undergirds modern political targeting was built primarily by and for campaigns operating in a majority-white political context. The fields most extensively validated, the consumer overlays most thoroughly tested for predictive validity, and the models most carefully calibrated are those that have been developed in repeated application to the white suburban and rural voters who have been the swing constituency in most recent federal elections. The data infrastructure is technically race-neutral in its terms of service but functionally less reliable for minority voter populations — a structural disparity that produces differential analytical capability across racial groups.
The fundamentals model as status quo encoding: Fundamentals models in political forecasting — models that use state partisan lean, demographic composition, and historical voting patterns as predictors — necessarily incorporate the accumulated effects of historical political exclusion. A state's historical Republican lean is partly a function of decades of suppression of Black and Latino voter registration. A district's demographic composition reflects the legacy of racially segregated housing policy. When a fundamentals model uses these historical patterns as predictors of future behavior, it is not simply measuring the political environment — it is encoding the political consequences of historical exclusion as a fixed feature of the landscape.
Benjamin's contribution is not simply the observation that algorithms can be biased. It is the more fundamental argument that the concept of "bias" as a deviation from a neutral baseline is itself misleading — because there is no neutral baseline. Every data system makes design choices that reflect particular interests, and those interests are rarely those of communities that have historically been disadvantaged.
39.5.2 Joy Buolamwini and Measurement Accountability
Joy Buolamwini's Gender Shades research — which documented dramatically higher error rates for automated face classification systems when applied to darker-skinned women — illustrates a principle directly applicable to political measurement: the populations that benefit from measurement systems and the populations that bear their costs are not always the same. Systems developed and validated primarily on data from one group may perform significantly worse on other groups, and that differential performance has differential consequences.
In political polling terms: survey methodologies developed in research contexts dominated by white, educated respondents, and validated primarily on their performance in elections where those respondents are the pivotal voters, may perform substantially worse when extended to minority communities where participation dynamics, communication preferences, and political attitudes differ from the calibration data. The result is not random error; it is systematic, directional error that consistently produces less accurate pictures of minority communities.
Measurement accountability — the principle that the populations that bear the costs of measurement error should have meaningful input into how measurement systems are designed and evaluated — is an affirmative data practice that follows from Buolamwini's framework.
39.5.3 Safiya Umoja Noble and Representational Harm
Safiya Umoja Noble's analysis of algorithmic oppression — focusing on how search engines and social media platforms reproduce harmful representations of Black women — introduces the concept of representational harm: damage caused not by being counted inaccurately but by being represented in ways that reinforce stereotypes, diminish political standing, or limit political voice.
Representational harm in political analytics can take several forms:
Stereotype-based modeling: Targeting models that treat racial identity as a proxy for political attitude or issue priority — assuming, for example, that all Black voters share the same position on policing, or that all Hispanic voters share the same position on immigration — reproduce stereotypes rather than measuring actual diversity of opinion within these communities.
Reductive narrative: When polling data on minority communities is consistently released at the group level without attention to internal heterogeneity, it contributes to political coverage that treats these communities as monolithic, obscuring the actual complexity of opinion.
Absent voice: When measurement systems simply fail to capture certain populations — as when polls with thin cells for minority subgroups produce unreliable estimates — those populations become invisible in the data-driven political process. Their preferences don't register; their concerns don't shape campaign strategy.
39.6 Adaeze's Work at ODA: Equity in Practice
OpenDemocracy Analytics occupies an unusual position in the political data ecosystem. It is a civic technology organization — structured as a nonprofit, funded by a combination of foundation grants and contracts with progressive advocacy organizations — that has made equity-centered data practice its explicit mission. Adaeze Nwosu built ODA on the conviction that political analytics could and should be practiced differently than the standard campaign-services model.
What does that mean in practice? Adaeze describes it in terms of three questions she applies to every project ODA takes on:
Whose data are we using, and did they consent to this use? ODA does not purchase commercial data from brokers who aggregate personal information without meaningful consumer consent. This limits ODA's analytical capabilities relative to campaign firms that use the full commercial data ecosystem, and Adaeze is honest about that trade-off. But it reflects a commitment to the principle that the people who are the subjects of data collection have a legitimate interest in how that data is used.
Who benefits from this analysis, and does that include the communities in the data? ODA requires that its clients demonstrate a connection between the analytical project and concrete benefit to the communities whose data is being analyzed. An advocacy campaign that uses ODA's voter contact modeling to mobilize voters in those communities passes this test. A campaign that uses the modeling to route resources away from those communities does not.
How are we handling the accuracy limitations of the data for minority communities, and are we being transparent about those limitations? ODA's standard methodology includes explicit documentation of differential response rates, thin-cell warnings for minority subgroup estimates, and recommendations for supplementary qualitative research when quantitative data is insufficient to support reliable conclusions about specific communities.
Sam Harding, ODA's data journalist, has pushed Adaeze on whether these commitments are adequate. "We've cleaned up our own practices," Sam said in a recent team meeting. "But most of the political data infrastructure is still built the way it's always been built. Our clean methodology doesn't fix the voter file. It doesn't fix the Census. It doesn't fix the algorithmic bias in commercial targeting models."
Adaeze's response was characteristically direct: "No, it doesn't. But it demonstrates that a different approach is possible, and it produces better information for the communities we work with. That matters. And we're doing the advocacy work too — that's what the congressional testimony is for."
🔵 Debate: Is ODA's approach — building equity commitments into a single organization's practice while advocating for systemic change — the right model for data justice work? Or does it risk providing cover for an unjust system ("look, there's a good actor") while leaving structural problems unaddressed? Adaeze and Sam's argument implicitly represents one side of this debate. Make the strongest case for the alternative view.
39.7 Organizations Doing Equity-Centered Data Work
ODA is not alone. A set of organizations across the civic technology, civil rights, and academic research sectors are actively developing and deploying equity-centered approaches to political and civic data. Examining their methods illustrates what affirmative data practice looks like in institutional form.
The Redistricting Data Hub (RDH) was created after the 2020 Census specifically to make redistricting data accessible to communities, advocates, and researchers who lack the technical capacity to acquire and process it independently. RDH provides cleaned, documented, and standardized Census data, election returns, and demographic files in formats accessible to non-expert users. The equity commitment is embedded in the access model: making technical resources available to under-resourced communities reduces the redistricting expertise gap between well-funded incumbents and community advocates.
Catalist and its Data for Progress partnership represents a different model: a major progressive data vendor that has invested in demographic equity in its voter file through intentional outreach to communities that have historically been poorly represented in commercial data products. Regular audits of model performance by race and ethnicity — published in reports that document where the models are less accurate for minority subgroups — reflect a commitment to transparency about limitations that is not standard practice in the industry.
The Color of Change research operation combines civil rights advocacy with data analysis to document racially disparate impacts of political processes — including voter purge disparate impacts, polling place consolidations with differential effects by neighborhood racial composition, and algorithmic bail recommendations with documented racial disparities. Their model: use the methods of quantitative political analysis to generate evidentiary support for civil rights claims, making the accountability loop explicit rather than leaving it to chance.
The MIT Election Data and Science Lab (MEDSL) provides academic infrastructure for research on election administration equity, including the Elections Performance Index (EPI), which scores states and jurisdictions on administrative dimensions including wait times, provisional ballot rejection rates, and mail ballot rejection rates — all of which show systematic racial disparities in national-level analyses. By making this data publicly available and interpretable, MEDSL enables advocacy organizations, journalists, and policymakers to document and challenge administrative inequities.
What these organizations share is a commitment to the principle that the production of equitable political data is not purely a technical problem. It requires organizational choices — about what data to produce, how to make it accessible, and what questions to ask about it — that embed equity values into the analytical infrastructure rather than treating equity as a post-hoc consideration.
39.8 Affirmative Data Practices
"Affirmative data practices" is a term Adaeze uses to describe the set of methodological and organizational commitments that build equity into political analytics rather than simply avoiding the most egregious inequities. The word "affirmative" is deliberate: avoiding discriminatory outcomes requires active effort, not passive neutrality.
The core affirmative data practices for political analytics include:
Disaggregated analysis and reporting: Rather than reporting only top-line polling numbers, affirmative practice requires analysis and reporting of results by race, ethnicity, and other relevant demographic characteristics — with appropriate discussion of sample size limitations and confidence intervals for subgroup estimates.
Oversampling and boosted samples: For minority community research, standard random sampling produces insufficient sample sizes for reliable subgroup analysis. Affirmative practice involves intentional oversampling of minority communities, then weighting the oversample back to population proportions for top-line estimates while preserving the larger sample for subgroup analysis (a sketch of this design appears after this list).
Language-appropriate fielding: Surveys fielded only in English systematically exclude significant portions of the Hispanic, Asian American, and other immigrant-origin communities. Affirmative practice requires multi-language fielding for populations with significant non-English language use.
Community partnership: The design of survey instruments and targeting models should involve meaningful input from the communities being studied — not just at the "do our results feel right to you" stage, but at the question design, variable selection, and model validation stages. Community knowledge about what questions are being asked wrongly, what variables are being interpreted incorrectly, and what the data is missing is methodologically valuable as well as ethically required.
Algorithm auditing: For machine learning models used in political targeting, affirmative practice requires regular auditing of model outputs by race — examining whether the model's predictions differ systematically by race in ways that cannot be explained by the genuine predictors the model is intended to measure.
Transparency about limitations: Affirmative practice requires explicit, public documentation of the ways in which data and models may be less reliable for minority communities — including specific warnings in reports about estimates based on thin samples, acknowledgments of differential response rate effects, and honest discussion of where supplementary qualitative research is needed.
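To make the oversampling item concrete, the sketch below compares subgroup precision under a proportional design and a boosted design, then computes the weights that pull the oversample back to population shares for topline reporting. All design numbers are invented for illustration.

```python
# Boosted-sample sketch: oversample a subgroup for reliable subgroup analysis,
# then weight back to population shares for topline estimates.
# All design numbers are hypothetical.
import math

pop_share = 0.12   # subgroup's share of the population
n_survey = 600

def moe(n, p=0.5):
    """95% margin of error for a proportion at sample size n (worst case p=0.5)."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

# Proportional design: subgroup n is too thin for reliable crosstabs.
n_prop = round(n_survey * pop_share)                  # 72 respondents
print(f"proportional design: n={n_prop}, MOE +/-{moe(n_prop):.1%}")

# Boosted design: field 200 additional subgroup interviews.
n_boost = n_prop + 200                                # 272 respondents
print(f"boosted design:      n={n_boost}, MOE +/-{moe(n_boost):.1%}")

# For the topline, weight boosted respondents back to the 12% population share
# so the oversample does not distort overall estimates.
topline_n = n_survey + 200
weight_subgroup = (pop_share * topline_n) / n_boost           # < 1: down-weighted
weight_other = ((1 - pop_share) * topline_n) / (n_survey - n_prop)
print(f"subgroup weight: {weight_subgroup:.2f}, other weight: {weight_other:.2f}")
```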
39.8.1 Implementing Algorithm Auditing: A Practical Framework
Algorithm auditing for racial equity in political targeting models is conceptually straightforward but operationally demanding. The following framework describes what a basic audit looks like in practice:
Step 1 — Define the protected characteristic. Racial/ethnic group membership is the relevant protected characteristic here. In practice, most voter files do not contain self-reported racial/ethnic data at the individual level (a privacy protection); they contain probabilistic race/ethnicity estimates derived from surname analysis and geographic indicators. The audit should use these estimates, with acknowledgment that they introduce measurement uncertainty.
Step 2 — Compute model predictions by group. Run the model on the full voter universe and record the predicted scores (persuasion score, turnout score, etc.) for each voter. Then compute the distribution of scores separately for each racial/ethnic group. If the model is producing racially equitable predictions, the distribution of scores for Black, Hispanic, and Asian American voters should be similar to the distribution for white voters — adjusting for factors that are genuinely predictive and not themselves racially biased.
Step 3 — Identify disparate impact. Compare the mean and distribution of scores across groups. A model that assigns a mean persuasion score 15 points lower to Black voters than to demographically similar white voters, without a defensible substantive reason for the difference, is exhibiting disparate impact. Document the size of the disparity and the features that most drive it.
Step 4 — Investigate and attribute. Use feature importance analysis and partial dependence plots to identify which model features are driving the racial disparity. If geographic features (zip code, census tract) are the primary driver, that is evidence of the proxy-variable problem described above. If historical turnout is the primary driver, that may be partially defensible — but it also may reflect historical underinvestment that the model is encoding as inherent preference.
Step 5 — Remediate or document. Where the disparity reflects defensible predictive features, document it explicitly and inform decision-makers of the limitation. Where it reflects proxy variables or training data bias, attempt remediation — reweighting training data, removing biased proxies, or constraining model predictions to maintain parity across demographic groups.
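A minimal implementation of Steps 2 and 3 might look like the sketch below. The column names (race_est, persuasion_score), the choice of the largest group as the reference baseline, and the use of a pandas DataFrame are all assumptions for illustration; a production audit would add the matched-comparison and attribution analyses of Steps 3 and 4.

```python
# Audit sketch for Steps 2-3: score distributions and disparate impact by group.
# Column names ("race_est", "persuasion_score") are hypothetical placeholders.
import pandas as pd

def audit_scores_by_group(df: pd.DataFrame,
                          score_col: str = "persuasion_score",
                          group_col: str = "race_est") -> pd.DataFrame:
    """Summarize model scores by estimated racial/ethnic group."""
    summary = df.groupby(group_col)[score_col].agg(
        n="count",
        mean="mean",
        median="median",
        p10=lambda s: s.quantile(0.10),
        p90=lambda s: s.quantile(0.90),
    )
    # Gap relative to the largest group: a common, and contestable, baseline.
    reference = summary["n"].idxmax()
    summary["gap_vs_reference"] = summary["mean"] - summary.loc[reference, "mean"]
    return summary

# Usage with a scored voter universe already loaded as a DataFrame:
# scored = pd.read_parquet("scored_universe.parquet")   # hypothetical file
# print(audit_scores_by_group(scored))
# Large negative gaps for minority groups, after accounting for genuinely
# predictive features, flag the disparate-impact pattern described in Step 3.
```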
✅ Best Practice: Implementing affirmative data practices is not simply a matter of good intentions. It requires specific methodological investments — in multilingual fielding capacity, in oversample budgets, in algorithm auditing infrastructure, in community partnership relationships — that cost money and require organizational commitment. Building the budget case for these investments is part of the equity-centered analyst's work.
39.9 The Surveillance Asymmetry
One dimension of racial inequity in political data that does not fit neatly into the "representation in polls" or "bias in targeting models" frameworks is what Adaeze calls the surveillance asymmetry: the data that political operations collect about racial minority communities is often extensive and detailed in ways that serve strategic aims, while the data that would be needed to serve those communities' interests is limited.
Law enforcement data, criminal justice records, and public benefits program data — sources that disproportionately contain records on Black, Hispanic, and low-income communities — flow into commercial data products that are available to campaign data operations. These records document disadvantage; their availability in commercial data packages means that campaign targeting can use indicators of disadvantage as predictive variables without the knowledge or consent of the people in those records.
Meanwhile, the survey data that would give minority communities adequate voice in political measurement is underfunded and underproduced. Intensive qualitative research in minority communities — the kind that reveals what is politically salient and motivating in specific communities in ways that standard polling cannot — is expensive and rarely conducted by campaigns with limited resources.
The result: campaigns have detailed predictive models of minority voter behavior built partly on records that document disadvantage, while those communities have limited ability to shape the political signals that campaigns receive about their preferences and concerns.
This is the surveillance asymmetry: detailed knowledge about communities in formats that serve strategic control, combined with limited knowledge in formats that would serve political responsiveness.
39.10 Toward Data Justice in Political Analytics
The data justice framework does not offer a simple set of rules that, if followed, will produce racially equitable political analytics. It offers a set of questions and commitments that can guide practice in a more equitable direction.
For pollsters and survey researchers: Design sampling and fielding protocols with affirmative goals for minority community representation. Invest in multilingual capacity. Use boosted samples and publish subgroup results with appropriate confidence intervals. Be transparent about where sample limitations make minority community estimates unreliable. Seek community input on question design.
For campaign analytics professionals: Audit targeting models for differential performance across racial groups. Investigate and correct algorithmic bias before deploying models at scale. Distinguish between accuracy-based targeting and proxy-based racial targeting. Build in explicit equity checks in campaign resource allocation decisions.
For data vendors and platform providers: Document the racial composition of training data for all predictive models. Provide differential performance statistics by demographic group. Develop standards for data product auditing that include racial equity criteria.
For researchers and academics: Build the evidentiary base on differential undercount effects, algorithmic bias in political targeting, and the effectiveness of affirmative data practices. Make this research accessible to practitioners. Engage directly with advocacy organizations and campaigns.
For policymakers: Examine the application of the Voting Rights Act and other civil rights frameworks to private campaign data operations. Develop regulatory standards for political data use that include racial equity criteria. Fund enhanced Census operations and coverage measurement in hard-to-count communities. Expand Section 203 language-access requirements to digital political communications.
These are not separate lists — they overlap, and the most important changes require movement at multiple levels simultaneously. Adaeze's congressional testimony in 2019 was not sufficient to prevent Census undercounting. ODA's affirmative data practices are not sufficient to fix algorithmic bias in commercial targeting models. Individual practitioners' adoption of equity-centered methods is not sufficient to change systemic data infrastructure.
But the accumulated effect of changed practice, sustained advocacy, and developed evidentiary base is how systemic change happens. The alternative — waiting for systemic change before changing practice — is not a neutral choice. It is a choice to continue practicing in ways that reinforce the inequities that systemic change is needed to address.
🔗 Connection: The data justice framework developed in this chapter provides essential context for understanding the AI and automation questions in Chapter 40. Many of the equity concerns that arise with conventional political data practices are magnified, not resolved, by the use of large language models and automated targeting systems. An AI trained on historical political data inherits all the racial biases documented here — and deploys them at scale.
Summary
Political data practices do not operate in a racial vacuum. The Census undercount, differential response rates in polling, algorithmic bias in targeting models, and data-enabled voter suppression operations all generate racially structured inequities that have concrete consequences for representation, resource allocation, and democratic participation. The "data redlining" pattern — official data systems that systematically misrepresent certain communities, producing compounding downstream disadvantage — connects the Census to the targeting model to the campaign strategy in a single structure.
The data justice framework — asking who owns data, who benefits, and who is harmed — provides a richer analytical lens than the conventional accuracy-and-validity frame. Ruha Benjamin's New Jim Code analysis reveals how race-neutral code produces racially consequential outcomes when trained on historically structured data. Joy Buolamwini's measurement accountability principle demands that communities bearing the costs of measurement error have input into measurement system design. Safiya Umoja Noble's concept of representational harm identifies damage that occurs not from inaccurate counting but from reductive representation.
Language access in political data is a specific, tractable form of representational exclusion: English-only polling systematically silences non-English-speaking citizens, producing political intelligence that reflects the English-speaking portion of minority communities rather than the full community.
Affirmative data practices — oversampling, multilingual fielding, algorithm auditing, community partnership, transparency about limitations — are the methodological implementation of data justice commitments. Organizations including the Redistricting Data Hub, Color of Change, and MIT's Election Data and Science Lab demonstrate that equity-centered political data work is institutionally viable, not merely aspirational.
Adaeze Nwosu's testimony before Congress is not remembered primarily as a triumph of statistical modeling. It is remembered — in the circles where it is remembered at all — as an instance of a researcher using her technical credibility to make visible something that was being deliberately obscured: that data decisions are political decisions, and the communities that bear the costs of bad data decisions have a right to participate in making them.
39.11 Toward Equitable Political Analytics: A Practitioner's Framework
Understanding the problem — the Census undercount, differential polling response rates, algorithmic bias, surveillance asymmetry, data-enabled voter suppression — is necessary but not sufficient. What does actually doing this work equitably look like in practice? This section translates the theoretical commitments of data justice into an operational framework that analysts can apply to their own work.
39.11.1 Five Principles of Equitable Data Practice
Adaeze Nwosu developed what she calls the ODA Equity Framework through years of confronting the gap between theoretical commitments and operational reality. The framework is built around five principles that apply across contexts — polling, campaign analytics, civic technology, research — and that generate specific, testable obligations rather than vague aspirations.
Principle 1: Disaggregation. Every analysis that presents aggregate results must also present results disaggregated by race and ethnicity — with appropriate confidence intervals, explicit sample size warnings where cells are thin, and honest discussion of where disaggregated estimates are too uncertain to be actionable. In polling, this means breaking out topline numbers by racial and ethnic group and reporting not just the point estimates but the sample sizes and margins of error for each subgroup. In campaign analytics, it means examining model performance metrics separately for each demographic group rather than reporting only overall accuracy. In civic technology, it means auditing system outputs by race before deployment rather than after harm has been documented.
The practical implication: before any analysis is finalized, build in a disaggregation step as a required workflow component, not an optional add-on. If the sample is insufficient to produce reliable disaggregated estimates, that limitation must be stated explicitly — and the remedy (oversampling, supplementary qualitative research, or honest acknowledgment that the data cannot support minority community conclusions) must be recommended.
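To make the workflow requirement concrete, here is a minimal sketch, in Python, of what a mandatory disaggregation step might look like. The column names (race_ethnicity, supports_measure), the 95% normal-approximation interval, and the thin-cell threshold of n = 100 are all illustrative assumptions rather than ODA's actual specification; a production workflow would use design-based weights instead of the simple-random-sample margin of error shown here.

```python
import math

import pandas as pd

MIN_CELL_N = 100  # illustrative reporting threshold, not an industry standard
Z_95 = 1.96       # normal approximation for a 95% confidence interval


def disaggregate(df: pd.DataFrame, group_col: str, outcome_col: str) -> pd.DataFrame:
    """Per-group point estimates, sample sizes, and margins of error."""
    rows = []
    for group, sub in df.groupby(group_col):
        n = len(sub)
        p = sub[outcome_col].mean()
        # Simple-random-sample MOE; a weighted survey needs design-based errors.
        moe = Z_95 * math.sqrt(p * (1 - p) / n)
        rows.append({
            "group": group,
            "n": n,
            "estimate": round(p, 3),
            "moe_95": round(moe, 3),
            "note": "THIN CELL: do not report alone" if n < MIN_CELL_N else "",
        })
    return pd.DataFrame(rows).sort_values("n", ascending=False)


if __name__ == "__main__":
    # Synthetic data standing in for real survey responses.
    import numpy as np

    rng = np.random.default_rng(0)
    demo = pd.DataFrame({
        "race_ethnicity": rng.choice(
            ["Asian", "Black", "Hispanic/Latino", "White"],
            size=800, p=[0.08, 0.30, 0.22, 0.40]),
        "supports_measure": rng.integers(0, 2, size=800),
    })
    print(disaggregate(demo, "race_ethnicity", "supports_measure"))
```

The structural point the sketch captures is that disaggregation runs as a function on every deliverable, with the thin-cell warning generated automatically rather than left to the analyst's discretion.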
Principle 2: Transparency. Equitable data practice requires explicit documentation of methodological limitations that affect minority communities — not buried in technical appendices, but surfaced in the primary deliverable in language accessible to non-technical clients and community stakeholders. Transparency means disclosing where training data underrepresents minority communities, where response rate differentials affect survey accuracy, where model features may function as racial proxies, and where geographic aggregation obscures within-group heterogeneity.
In practice, ODA's standard report template includes a "Data Equity Notes" section that appears immediately after the executive summary. It is not optional. Clients are told at the outset that any ODA deliverable will include honest documentation of what the data can and cannot support for minority community conclusions. Several clients have pushed back on this requirement at first; Adaeze's experience is that clients who insist on removing the equity documentation are clients ODA should not work with.
Principle 3: Community Input. The communities that are the subjects of political data collection — the voters whose behavior is being modeled, the neighborhoods whose turnout is being optimized — have a legitimate stake in how that modeling is done. Community input means involving community organizations, advocates, and affected residents in research design, not just in results communication. It means asking whether the questions being asked, the variables being measured, and the outcomes being optimized are the right ones from the community's perspective.
In polling, community input means convening focus groups with Spanish-speaking respondents before designing a survey instrument for a Hispanic-majority district — not to confirm that your existing questions are adequate, but to discover which questions your existing instrument is missing. In campaign analytics, it means working with community organizations in targeted neighborhoods to understand whether the campaign's GOTV strategy aligns with how those communities understand civic participation. In civic technology, it means partnering with civil rights organizations to define what "success" means for a voter information tool before building it, not after.
Principle 4: Impact Assessment. Before deploying any new data product, model, or analytical workflow that will affect how political resources are distributed or how voters are targeted, conduct an explicit racial equity impact assessment. Ask: who benefits from this deployment? Who might be harmed? What are the disparate effects by race and ethnicity? What safeguards are in place to detect and correct disparate impacts if they emerge?
Impact assessment is not a guarantee against harm — models deployed at scale will produce unexpected effects, and the assessment process cannot anticipate every failure mode. What it does is institutionalize the practice of asking equity questions before deployment rather than after. The question "have we evaluated this for disparate racial impact?" should be a standard item on the pre-launch checklist, with the same weight as "have we tested this for data accuracy?" and "have we obtained required legal approvals?"
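As one hedged illustration of what that checklist item could mean operationally, the sketch below applies a four-fifths-rule-style ratio to a targeting model's per-group selection rates before deployment. The 0.8 threshold is borrowed from the EEOC employment-screening heuristic and serves here only as an illustrative trigger for further review, not a legal or statistical standard; the column names and demo data are hypothetical.

```python
import pandas as pd

RATIO_FLOOR = 0.8  # borrowed from the EEOC four-fifths heuristic; illustrative


def disparate_impact_report(df: pd.DataFrame,
                            group_col: str = "race_ethnicity",
                            selected_col: str = "selected") -> pd.DataFrame:
    """Compare each group's selection rate to the highest group's rate."""
    rates = df.groupby(group_col)[selected_col].mean()
    report = pd.DataFrame({
        "selection_rate": rates.round(3),
        "ratio_to_highest": (rates / rates.max()).round(3),
    })
    report["review_flag"] = report["ratio_to_highest"] < RATIO_FLOOR
    return report


if __name__ == "__main__":
    # A model that selects 55% of one group and 80% of another for outreach.
    demo = pd.DataFrame({
        "race_ethnicity": ["Black"] * 100 + ["White"] * 100,
        "selected": [1] * 55 + [0] * 45 + [1] * 80 + [0] * 20,
    })
    print(disparate_impact_report(demo))  # 0.55 / 0.80 ≈ 0.69 -> flagged
```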
Principle 5: Accountability. Equitable data practice requires mechanisms for identifying and correcting disparate impacts after deployment, not just before. Accountability means monitoring model outputs for racial disparities on an ongoing basis, maintaining channels through which affected communities can report concerns, and committing to remediation when harm is identified. It means not treating a pre-deployment impact assessment as a one-time inoculation against accountability, but as the beginning of an ongoing process.
For ODA, accountability is operationalized through quarterly audits of all active targeting models that include disaggregated performance metrics by race and ethnicity. Where disparities are identified, the audit produces a remediation recommendation that must be resolved before the model is redeployed in the next cycle. This process is resource-intensive — it requires time, technical capacity, and organizational will to prioritize even when clients are pressing for faster turnaround. But without accountability mechanisms, the other four principles are incomplete.
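One plausible implementation of such a quarterly audit, assuming it compares each group's mean predicted turnout propensity to observed turnout, is sketched below. The column names and the 0.05 calibration tolerance are assumptions for illustration, not ODA's actual specification.

```python
import pandas as pd

CALIBRATION_TOLERANCE = 0.05  # illustrative threshold for the remediation flag


def calibration_audit(df: pd.DataFrame,
                      group_col: str = "race_ethnicity",
                      score_col: str = "predicted_propensity",
                      outcome_col: str = "voted") -> pd.DataFrame:
    """Compare mean predicted propensity to observed turnout for each group."""
    audit = df.groupby(group_col).agg(
        n=(outcome_col, "size"),
        mean_predicted=(score_col, "mean"),
        observed_rate=(outcome_col, "mean"),
    )
    audit["gap"] = (audit["mean_predicted"] - audit["observed_rate"]).round(3)
    # A gap beyond tolerance triggers the remediation requirement: the model
    # may not be redeployed next cycle until the finding is resolved.
    audit["remediation_required"] = audit["gap"].abs() > CALIBRATION_TOLERANCE
    return audit
```

A model that systematically underpredicts turnout for one group would show a negative gap for that group, which is exactly the pattern the scenario in section 39.11.4 below describes.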
39.11.2 Applying the Principles Across Contexts
The five principles translate differently across polling, campaign analytics, and civic technology contexts — but the underlying obligations are consistent.
In polling: Disaggregation requires oversampled minority subgroups and published cross-tabs with appropriate confidence intervals. Transparency requires explicit documentation of response rate differentials and their potential effects on representativeness. Community input requires language-appropriate instrument design with input from community organizations. Impact assessment requires evaluating whether topline results may misrepresent minority community opinion. Accountability requires post-election comparison of survey estimates to actual vote shares by demographic group, with public reporting of where estimates diverged most.
In campaign analytics: Disaggregation requires separate model performance metrics for each racial and ethnic group in the targeting universe. Transparency requires documenting which model features function as racial proxies and how (a screening sketch follows this list). Community input requires working with field staff embedded in minority communities to validate model assumptions about turnout and persuadability. Impact assessment requires evaluating whether targeting models route resources away from minority communities in ways that compound historical underinvestment. Accountability requires monitoring actual contact rates, resource deployment, and electoral outcomes by community.
In civic technology: Disaggregation requires separate evaluation of system outputs (voter registration rates, information access, ballot return rates) by demographic group. Transparency requires publishing audit results and algorithm documentation accessible to non-technical community stakeholders. Community input requires co-design processes that involve affected communities in feature design and prioritization. Impact assessment requires evaluating potential discriminatory effects of system design choices, including default settings, language accessibility, and required documentation. Accountability requires complaint processes that give communities recourse when the technology fails them.
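The proxy-screening sketch referenced above is one simple way to begin meeting the campaign-analytics transparency obligation: score each candidate feature by how well it alone separates members of a protected group from everyone else, using a single-feature ROC AUC. An AUC far from 0.5 in either direction suggests the feature may function as a racial proxy. The 0.65 review cutoff, the column names, and the single-feature approach itself are assumptions chosen for clarity; a fuller audit would also test combinations of features.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

PROXY_AUC_CUTOFF = 0.65  # illustrative review threshold


def proxy_screen(df: pd.DataFrame, features: list[str],
                 group_col: str, group_value: str) -> pd.Series:
    """Rank numeric features by how well each alone predicts group membership."""
    is_member = (df[group_col] == group_value).astype(int)
    scores = {}
    for feat in features:
        auc = roc_auc_score(is_member, df[feat])
        # Direction doesn't matter for proxy power; fold the AUC around 0.5.
        scores[feat] = max(auc, 1.0 - auc)
    ranked = pd.Series(scores, name="proxy_auc").sort_values(ascending=False)
    return ranked[ranked > PROXY_AUC_CUTOFF]  # features needing documentation
```

A variable like homeownership, which the listening sessions in section 39.11.4 flag as reflecting racial wealth gaps, would likely surface near the top of such a ranking in many American cities.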
39.11.3 Equity-Centered vs. Extractive Data Practice
The contrast between equity-centered and extractive data practice is sometimes described in terms of intentions — equitable practitioners mean well, extractive ones don't. This framing is unhelpful because it lets well-intentioned extractive practices off the hook. The more useful distinction is structural: what does the practice do for the communities in the data, independent of what the analyst intended?
Extractive data practice collects data from or about communities, uses it to generate strategic insights for external actors (campaigns, donors, media organizations), and returns nothing of value to the communities themselves. It treats community members as data points rather than as stakeholders with legitimate interests in how their information is used. It produces intelligence about communities without accountability to those communities for how that intelligence is applied.
Equity-centered data practice treats communities as partners with legitimate interests in data that concerns them. It produces knowledge that serves community needs — not just as a side effect of serving external clients, but as a primary objective. It maintains accountability mechanisms that give communities recourse when data practices harm them, and it returns value to communities in forms they recognize as meaningful: accessible research, actionable insights, capacity building, and amplified voice in political processes that affect their lives.
The ODA model is explicitly equity-centered: every client engagement includes a community benefit component, and Adaeze regularly turns down client requests that would require analyzing communities in ways that serve those clients' interests at the expense of the communities themselves.
💡 The Spectrum in Practice. Most real political data work falls somewhere between pure extraction and full equity-centeredness. Understanding where a specific engagement falls on the spectrum — and what would be required to move it toward equity — is more useful than sorting organizations into "good" and "bad." A major polling firm that begins publishing disaggregated results and adding oversamples for minority communities is moving toward equity, even if it hasn't achieved the full ODA standard. Acknowledging partial progress while maintaining the aspiration for full equity is both more realistic and more motivating than insisting on perfection.
39.11.4 The ODA Equity Checklist Applied: A Real Scenario
To make the framework concrete, consider how Adaeze's equity checklist applies to a specific scenario: ODA is hired by a progressive advocacy organization to design a voter mobilization model for a mid-sized city with a 38% Black population, a 22% Hispanic/Latino population, and a history of declining Black voter turnout over the past three election cycles.
Disaggregation check: The client initially asks for a single turnout propensity model for all registered voters. ODA insists on separate model evaluation by race/ethnicity and discovers that the model trained on historical data assigns systematically lower propensity scores to Black voters in two council districts than field evidence from a previous organizing campaign would predict. Investigation reveals that polling place consolidations in those districts — which reduced turnout in 2018 and 2020 — are being encoded as low inherent propensity rather than as structural suppression. ODA adjusts the model to flag these voters as structurally constrained rather than low-propensity (a code sketch of this re-tagging step follows the walkthrough).
Transparency check: The client's communications team wants to publish topline turnout projections from the model. ODA's report documentation notes that the model is less reliable for predicting turnout in the two affected council districts given the structural irregularities in recent elections, and recommends supplementary qualitative research with community organizers before finalizing resource allocation for those areas. This limitation appears prominently in the executive summary.
Community input check: Before finalizing the model's variable selection, ODA holds two listening sessions with Black and Latino community organizations active in civic participation work in the city. The sessions reveal that the model is using homeownership as a proxy for civic engagement — a variable that reflects racial wealth gaps rather than civic orientation — and that the communities have specific concerns about the GOTV messaging the campaign is planning, which does not address the issues their members identify as most salient.
Impact assessment check: ODA evaluates the proposed resource allocation generated by the model and finds that it underweights outreach in two predominantly Black neighborhoods relative to the organizing potential documented in the community listening sessions. The model's historical data reflects underinvestment in those neighborhoods by previous campaigns; allocating based on the model alone would perpetuate that pattern. ODA recommends a hybrid allocation that overrides the model in the underinvested areas based on community-documented organizing capacity.
Accountability check: ODA builds into the contract a mid-campaign audit checkpoint: after six weeks of model-guided field operations, ODA will analyze contact rates, returned ballot rates, and early vote share by demographic group to assess whether the model-guided allocation is producing equitable results. The audit also includes a structured feedback session with the community organizations who participated in the listening sessions.
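The re-tagging step from the disaggregation check might look like the following sketch: voters whose low propensity scores trace to districts with documented polling place consolidations are moved into a structurally constrained targeting tier instead of being treated as unlikely voters. The district names, column names, and 0.40 score cutoff are hypothetical.

```python
import pandas as pd

# Hypothetical districts with documented polling place consolidations.
CONSOLIDATED_DISTRICTS = {"Council District 3", "Council District 7"}
LOW_SCORE = 0.40  # illustrative propensity cutoff


def tag_targeting_tier(voters: pd.DataFrame) -> pd.DataFrame:
    """Separate structural suppression from genuinely low turnout propensity."""
    voters = voters.copy()
    voters["targeting_tier"] = "standard"
    low = voters["propensity"] < LOW_SCORE
    voters.loc[low, "targeting_tier"] = "low_propensity"
    # Low scores in consolidated districts encode structural suppression,
    # not voter disengagement; route these voters to field outreach instead
    # of deprioritizing them.
    constrained = low & voters["district"].isin(CONSOLIDATED_DISTRICTS)
    voters.loc[constrained, "targeting_tier"] = "structurally_constrained"
    return voters
```

The design choice matters: a structurally constrained voter gets more outreach, not less, because the barrier is the system rather than the voter.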
39.11.5 Self-Assessment Rubric for Analysts
The following rubric allows practitioners to evaluate where their own data practices fall on the equity spectrum. It is not designed to produce a score or a ranking — it is designed to identify specific areas where practice can improve.
| Practice Area | Extractive | Developing | Equity-Centered |
|---|---|---|---|
| Disaggregation | Topline results only; no subgroup analysis | Subgroup analysis available but not systematically reported | Disaggregated results by race/ethnicity required in all deliverables, with sample size documentation |
| Transparency | Methodological limitations not disclosed to clients | Limitations disclosed in technical appendices | Equity limitations surfaced prominently in primary deliverable |
| Community input | No community input at any stage | Community review of results after completion | Community partnership in research design, instrument development, and model validation |
| Impact assessment | No racial equity review before deployment | Ad hoc equity review when concerns arise | Systematic pre-deployment equity impact assessment as required workflow step |
| Accountability | No post-deployment monitoring | Post-hoc review when harm is documented | Ongoing monitoring with disaggregated metrics and community feedback channels |
| Data ownership | All data owned by client/analyst; no return to community | Summary results shared with community on request | Active data sharing with community organizations in accessible formats |
| Benefit orientation | Community as data source for external clients only | Community benefit as secondary consideration | Community benefit as primary co-equal objective |
An honest self-assessment using this rubric is the beginning of equity-centered practice, not the end of it. Most practitioners working in political analytics will find that they are equity-centered in some areas and extractive in others — and that the path toward fuller equity requires both individual commitment and organizational support. The rubric is most useful not as a basis for self-congratulation but as a map of where the next improvement is possible.
✅ Best Practice: Build the equity checklist into your project initiation template, not your project close-out template. Equity considerations that are identified before a project begins can shape the design. Equity considerations identified after a project is complete can only generate regret. The organizational discipline of making equity review a first-step requirement — not a final-step option — is the most practical thing a practitioner can do to move their own work toward equity-centered standards.
Chapter 40 examines AI and automation in political analytics — large language models, synthetic media, automated polling, and the democratic implications of a rapidly shifting technological landscape.