Chapter 39 Exercises: Race, Representation, and Data Justice

DataField.Dev

Exercise 39.1 — Census Undercount Impact Analysis (Individual, 60 minutes)

Using publicly available data from the Census Bureau's Post-Enumeration Survey (available at census.gov), complete the following:

Part A: Identify the five states with the largest estimated net undercount for Hispanic residents and the five states with the largest estimated net undercount for Black residents in the 2020 Census. Are there any states that appear on both lists?

Part B: For each of those states, look up the current congressional delegation and determine whether any of the state's congressional districts are majority-minority districts (districts where a racial or ethnic minority group constitutes a majority of the voting-age population).

Part C: Write a 400-word analysis explaining how the undercount in those specific states could affect: (a) congressional apportionment; (b) the drawing of majority-minority districts; (c) federal funding distributions. Be specific about the mechanisms of each effect.
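
For the apportionment mechanism in Part C: House seats are divided among states by the method of equal proportions (Huntington-Hill). Every state receives one seat, and each further seat goes to the state with the highest priority value P / sqrt(n(n + 1)), where P is the state's population and n is the number of seats it already holds. The Python sketch below implements that rule. The three states, their populations, and the five-seat chamber are hypothetical toy inputs, not real apportionment data; they are chosen so that a seat sits near the margin.

```python
import heapq
import math


def huntington_hill(populations: dict[str, int], house_size: int) -> dict[str, int]:
    """Apportion seats by the method of equal proportions (Huntington-Hill)."""
    if house_size < len(populations):
        raise ValueError("house_size must give every state at least one seat")
    seats = {state: 1 for state in populations}  # every state starts with one seat
    # heapq is a min-heap, so priority values are stored negated.
    heap = [(-pop / math.sqrt(1 * 2), state) for state, pop in populations.items()]
    heapq.heapify(heap)
    for _ in range(house_size - len(populations)):
        _, state = heapq.heappop(heap)  # state with the highest priority value
        seats[state] += 1
        n = seats[state]
        # Priority for this state's next seat: P / sqrt(n * (n + 1)).
        heapq.heappush(heap, (-populations[state] / math.sqrt(n * (n + 1)), state))
    return seats


# Hypothetical three-state, five-seat example (not real data).
true_pops = {"State A": 4_000_000, "State B": 2_320_000, "State C": 1_600_000}
# A 5% undercount of a group that is 15% of State B's residents misses
# 0.05 * 0.15 = 0.75% of the state's population.
counted_pops = {**true_pops, "State B": round(true_pops["State B"] * (1 - 0.05 * 0.15))}
print(huntington_hill(true_pops, 5))     # {'State A': 2, 'State B': 2, 'State C': 1}
print(huntington_hill(counted_pops, 5))  # {'State A': 3, 'State B': 1, 'State C': 1}
```

In this toy case the 0.75 percent shortfall drops State B's priority value for its second seat just below State A's value for a third, so the seat moves. Real apportionment follows the same logic at a much larger scale, which is why undercounts matter most for states near a seat boundary.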

Part D: Research one federal program with a funding formula based directly on Census counts. Describe the formula and estimate the per-person impact of a 5% undercount in a state where the affected population constitutes 15% of the total.
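
The Part D arithmetic can be sketched in Python. A 5% undercount of a group that is 15% of a state's population misses 0.05 × 0.15 = 0.0075, or 0.75 percent, of the state's residents. The function below assumes a hypothetical formula that splits a fixed national pot by each state's share of the national count, and assumes no other state is miscounted. The pot, state population, and national population are illustrative placeholders, not figures for any real program; substitute the formula you actually research.

```python
def share_allocation(pot: float, state_count: float, national_count: float) -> float:
    """Hypothetical formula: a state's share of a fixed national pot."""
    return pot * state_count / national_count


def undercount_impact(pot, state_pop, national_pop, group_share, undercount_rate):
    """Compare allocations under a full count and under a group-specific undercount.

    Assumes the missed residents are the only counting error anywhere, so the
    national count shrinks by the same number of people as the state count.
    """
    missed = state_pop * group_share * undercount_rate
    full = share_allocation(pot, state_pop, national_pop)
    under = share_allocation(pot, state_pop - missed, national_pop - missed)
    loss = full - under
    return {
        "missed_residents": missed,
        "funding_loss": loss,
        "loss_per_missed_resident": loss / missed,
        "loss_per_actual_resident": loss / state_pop,
    }


# Hypothetical inputs, not real program data.
result = undercount_impact(
    pot=10e9, state_pop=10_000_000, national_pop=330_000_000,
    group_share=0.15, undercount_rate=0.05,
)
for key, value in result.items():
    print(f"{key}: {value:,.2f}")
```

With these placeholder inputs, 75,000 residents go uncounted and the state loses roughly $2.2 million a year, a little under $30 per missed resident. That is slightly less than the pot divided by the national population, because the undercount also shrinks the national denominator. A pure per-capita formula would instead lose exactly its per-person rate for each missed resident.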

Exercise 39.2 — Polling Methodology Audit (Pairs, 45 minutes)

Locate two publicly released polls from the same state and the same 30-day period from a recent election cycle. The polls should ideally be on the same race or ballot measure. Try to find polls with different levels of methodological transparency.

For each poll, answer the following questions if the information is available:

What was the racial/ethnic composition of the sample?
What weighting variables were used?
Were results reported separately for any racial/ethnic subgroups?
What was the margin of error for any reported subgroup results?
Was the survey fielded in languages other than English?
What was the overall response rate, and was a differential response rate by subgroup reported?
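
When a poll does not report subgroup margins of error, you can approximate them. The Python sketch below uses the standard 95% margin for a proportion, z * sqrt(deff * p(1 - p) / n), assuming simple random sampling inflated by a design effect (deff). The sample sizes of 800 and 120 and the design effect of 1.5 are hypothetical values for illustration.

```python
import math


def margin_of_error(n, p=0.5, z=1.96, design_effect=1.0):
    """Approximate 95% margin of error for a proportion estimated from n respondents.

    Assumes simple random sampling inflated by a design effect; p = 0.5 gives
    the largest (most conservative) margin.
    """
    return z * math.sqrt(design_effect * p * (1 - p) / n)


# Hypothetical poll: 800 respondents overall, 120 in one reported subgroup.
for label, n in [("full sample", 800), ("subgroup", 120)]:
    print(f"{label:>11} (n={n}): +/- {100 * margin_of_error(n):.1f} points")
# A design effect of 1.5 from weighting widens the subgroup margin further.
print(f"subgroup with deff 1.5: +/- {100 * margin_of_error(120, design_effect=1.5):.1f} points")
```

The full sample's margin is about ±3.5 points, while the subgroup's is about ±8.9 points before any design effect. This is why a poll that reports only the topline margin can overstate the precision of its subgroup results.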

Compare the two polls on these dimensions. Which is more methodologically transparent? Which is likely to produce more reliable estimates for minority subgroups? Write a 300-word memo addressed to a campaign analytics director recommending which poll's results they should weight more heavily in their decision-making, and why.

Exercise 39.3 — Algorithmic Bias Detection Exercise (Individual, 75 minutes)

This exercise uses a simplified simulation. Using a dataset of your choosing (or a provided synthetic dataset), build a simple logistic regression model predicting voter turnout. Your model should use at least five predictor variables drawn from a voter file analog.

Part A: Train the model on a training split of the dataset and report its overall accuracy on a held-out test set.

Part B: Report the model's accuracy separately for: (a) white voters; (b) Black voters; (c) Hispanic voters; (d) other racial/ethnic categories.
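
Parts A and B can be sketched in plain Python using only the standard library, so the whole pipeline is visible. Everything here is an assumption for illustration: the synthetic generator, its five predictors, the coefficients that produce turnout, and the group shares are all made up. Group labels are drawn independently of the predictors, so any accuracy gaps in this toy output are sampling noise. With real or provided data you would replace the hypothetical make_synthetic_voters function and would typically use a library such as scikit-learn.

```python
import math
import random

GROUPS = ["white", "Black", "Hispanic", "other"]


def make_synthetic_voters(n, seed=0):
    """Hypothetical voter-file analog: five predictors, a turnout label, a group label."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [
            rng.uniform(18, 90) / 100,    # age, rescaled
            rng.randint(0, 4) / 4,        # share of last four elections voted in
            rng.uniform(0, 30) / 30,      # years registered, rescaled
            float(rng.random() < 0.6),    # homeowner flag
            float(rng.random() < 0.3),    # contacted-by-campaign flag
        ]
        logit = -2.0 + 1.5 * x[0] + 3.0 * x[1] + 0.8 * x[2] + 0.4 * x[3] + 0.6 * x[4]
        y = int(rng.random() < 1 / (1 + math.exp(-logit)))
        # Group is drawn independently of the predictors in this toy data.
        group = rng.choices(GROUPS, weights=[0.60, 0.13, 0.19, 0.08])[0]
        rows.append((x, y, group))
    return rows


def predict_proba(w, x):
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1 / (1 + math.exp(-z))


def train_logistic(rows, lr=0.5, epochs=200, sample_weights=None):
    """Fit logistic regression by batch gradient descent on (weighted) log loss."""
    k = len(rows[0][0])
    w = [0.0] * (k + 1)  # w[0] is the intercept
    sw = sample_weights if sample_weights is not None else [1.0] * len(rows)
    total = sum(sw)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for (x, y, _), s in zip(rows, sw):
            err = s * (predict_proba(w, x) - y)
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - lr * g / total for wj, g in zip(w, grad)]
    return w


def accuracy_by_group(w, rows, threshold=0.5):
    """Overall accuracy plus accuracy within each group label."""
    hits, counts = {}, {}
    for x, y, g in rows:
        correct = int(predict_proba(w, x) >= threshold) == y
        hits[g] = hits.get(g, 0) + correct
        counts[g] = counts.get(g, 0) + 1
    overall = sum(hits.values()) / sum(counts.values())
    return overall, {g: hits[g] / counts[g] for g in counts}


rows = make_synthetic_voters(2000, seed=42)
train, test = rows[:1500], rows[1500:]  # rows are i.i.d., so slicing is a random split
weights = train_logistic(train)
overall, by_group = accuracy_by_group(weights, test)
print(f"overall accuracy: {overall:.3f}")
for g in GROUPS:
    print(f"{g:>9}: {by_group[g]:.3f} (n={sum(1 for r in test if r[2] == g)})")
```

Print each group's test-set size next to its accuracy. A gap measured on a few dozen respondents can easily be noise, which matters when you interpret Part B.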

Part C: If there are accuracy differences across racial groups, investigate potential causes:
- Are certain variables missing or less complete for some groups?
- Are certain variables functioning as racial proxies?
- Is the training data balanced across racial groups?
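
The missingness and balance questions in Part C can be checked mechanically before any modeling. The Python helpers below assume each record is a dictionary with a group field and uses None for a missing value. The field names (past_votes, phone_on_file) and the toy records are hypothetical. The proxy check is only a crude first screen that compares one variable's mean across groups. A variable can also act as a proxy through combinations with other variables, which single-variable means will miss.

```python
from collections import Counter, defaultdict


def group_balance(groups):
    """Share of training rows in each group."""
    counts = Counter(groups)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}


def missingness_by_group(records, group_key, fields):
    """Fraction of records with a missing (None) value, per group and per field."""
    missing = defaultdict(Counter)
    totals = Counter()
    for r in records:
        g = r[group_key]
        totals[g] += 1
        for f in fields:
            if r.get(f) is None:
                missing[g][f] += 1
    return {g: {f: missing[g][f] / totals[g] for f in fields} for g in totals}


def proxy_gap(records, group_key, field):
    """Mean of a numeric field by group; large gaps flag a possible proxy."""
    sums, counts = defaultdict(float), Counter()
    for r in records:
        v = r.get(field)
        if v is not None:
            sums[r[group_key]] += v
            counts[r[group_key]] += 1
    return {g: sums[g] / counts[g] for g in counts}


# Made-up toy records for illustration only.
toy = [
    {"group": "white", "past_votes": 3, "phone_on_file": 1},
    {"group": "Black", "past_votes": None, "phone_on_file": 1},
    {"group": "Hispanic", "past_votes": 2, "phone_on_file": None},
    {"group": "other", "past_votes": None, "phone_on_file": None},
]
print(group_balance([r["group"] for r in toy]))
print(missingness_by_group(toy, "group", ["past_votes", "phone_on_file"]))
print(proxy_gap(toy, "group", "past_votes"))
```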

Part D: Apply at least one bias mitigation technique (e.g., reweighting training data, removing identified proxy variables, using a fairness-aware modeling approach) and report whether it narrows the accuracy gaps between groups.
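
One reweighting technique that fits Part D is reweighing (Kamiran and Calders). Each training example in group g with label y receives weight P(g) · P(y) / P(g, y), which makes group membership and the outcome label statistically independent in the weighted training data. The minimal Python sketch below computes those weights. The six-row toy example is hypothetical.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Reweighing: weight = P(group) * P(label) / P(group, label) for each row."""
    n = len(labels)
    g_counts = Counter(groups)
    y_counts = Counter(labels)
    gy_counts = Counter(zip(groups, labels))
    # (g/n) * (y/n) / (gy/n) simplifies to g * y / (n * gy).
    return [
        g_counts[g] * y_counts[y] / (n * gy_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


# Hypothetical toy data: the turnout rate differs by group before weighting.
groups = ["white"] * 4 + ["Black"] * 2
labels = [1, 1, 1, 0, 1, 0]
print([round(w, 3) for w in reweighing_weights(groups, labels)])
```

The weights are then passed as per-sample weights to any trainer that accepts them, for example the sample_weight argument of scikit-learn's LogisticRegression.fit. Equalizing outcome rates in the training data does not guarantee equal accuracy, so you still need Part D's before-and-after comparison of group accuracies.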

Part E: Write a 400-word "algorithm audit report" suitable for sharing with a client, explaining what you found and what the implications are for using this model in a targeting campaign.

Exercise 39.4 — Data Justice Framework Application (Individual, 45 minutes)

Apply the three questions Adaeze uses in her work to the following three scenarios: (1) Whose data is this, and did the people it describes consent to its use? (2) Who benefits? (3) How are accuracy limitations being handled?

Scenario A: A campaign analytics firm builds a persuasion model using voter file data merged with consumer purchase data from a commercial broker. The model is used to identify and contact likely persuadable voters in a Senate race.

Scenario B: A university researcher, with IRB approval, conducts a survey of 1,200 registered voters in three counties with high Hispanic populations. The survey is fielded in English only and asks about support for a bilingual education ballot measure. The researcher publishes results for the full sample but notes that Hispanic subgroup results are based on n=180 respondents.

Scenario C: A civic technology nonprofit builds a voter registration assistance tool that uses address data to identify likely eligible but unregistered people in a city. The tool uses commercial address data that was originally collected for direct mail marketing.

For each scenario, write 200 words applying the three data justice questions. Identify the most significant equity concern and propose one affirmative practice that would address it.

Exercise 39.5 — Community Partnership Design (Groups of 3–4, 90 minutes)

Your team has been hired to conduct a poll of registered voters in a congressional district that is 38 percent Black, 22 percent Hispanic, 8 percent Asian American, and 32 percent white. The district includes both urban neighborhoods and small rural communities. Your client wants to understand opinion on three ballot propositions and on the incumbent congressmember's favorability.

Design a polling methodology that incorporates affirmative data practices. Your design should address:

Sample design: What is your target total sample size? What is your target sample size for each racial/ethnic group, and how do you justify those targets?
Fielding approach: What modes will you use (phone, online, mail, in-person)? Why?
Language accessibility: Which languages will you offer the survey in? How will you ensure translation quality?
Community input: How and when will you seek community input on the questionnaire design?
Reporting: How will you report subgroup results? What caveats will you include for thin-cell estimates?
Budget implications: Estimate the additional cost of your affirmative practices compared to a standard approach.
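
One way to set the sample design targets is proportional allocation with a minimum number of interviews per group, followed by weighting back to the district's composition. The Python sketch below assumes a total of 1,000 interviews and a floor of 200 per group. Both numbers are hypothetical choices for illustration, not recommendations. For comparison, proportional allocation alone would yield only 80 Asian American interviews. The sketch also reports Kish's approximate design effect from unequal weights, n · Σw² / (Σw)², which measures how much precision the district-wide estimate gives up in exchange for larger subgroup samples.

```python
def allocate_with_floor(total_n, shares, floor):
    """Proportional allocation, except every group gets at least `floor` interviews.

    Groups whose proportional share falls below the floor are fixed at the floor;
    the remaining interviews are re-split proportionally among the other groups,
    repeating until no group falls below the floor. Rounding can leave the
    total off by one in other cases.
    """
    if floor * len(shares) > total_n:
        raise ValueError("floor is too high for the total sample size")
    fixed = {}
    while True:
        free = {g: s for g, s in shares.items() if g not in fixed}
        remaining = total_n - floor * len(fixed)
        free_total = sum(free.values())
        alloc = {g: remaining * s / free_total for g, s in free.items()}
        below = [g for g, n in alloc.items() if n < floor]
        if not below:
            break
        for g in below:
            fixed[g] = floor
    result = {**{g: float(floor) for g in fixed}, **alloc}
    return {g: round(result[g]) for g in shares}


def kish_design_effect(pop_shares, sample_sizes):
    """Weight each respondent by pop_share / sample_share; return weights and Kish deff."""
    n = sum(sample_sizes.values())
    weights = {g: pop_shares[g] / (sample_sizes[g] / n) for g in sample_sizes}
    sum_w = sum(sample_sizes[g] * weights[g] for g in sample_sizes)
    sum_w2 = sum(sample_sizes[g] * weights[g] ** 2 for g in sample_sizes)
    return weights, n * sum_w2 / sum_w ** 2


# District composition from the exercise.
POP_SHARES = {"Black": 0.38, "Hispanic": 0.22, "Asian American": 0.08, "white": 0.32}
sizes = allocate_with_floor(1000, POP_SHARES, floor=200)
weights, deff = kish_design_effect(POP_SHARES, sizes)
print(sizes)  # {'Black': 326, 'Hispanic': 200, 'Asian American': 200, 'white': 274}
print(f"design effect {deff:.2f}, effective n {1000 / deff:.0f}")
```

Under these assumptions the Hispanic and Asian American samples each rise to 200. The design effect is about 1.09, so district-wide estimates behave roughly like a simple random sample of about 917. Weigh that precision cost, together with the extra fielding cost of the oversample, in your budget discussion.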

Present your methodology in a 10-minute team presentation to the class, including your justification for each design choice.

Exercise 39.6 — Critical Reading: Ruha Benjamin (Individual, 60 minutes)

Read at least one chapter from Ruha Benjamin's Race After Technology (2019) or a substantive review/excerpt of the book available online.

Write a 600-word critical application essay responding to this prompt: How does Benjamin's concept of the "New Jim Code" apply to political targeting analytics specifically? Identify two concrete practices in campaign data operations that fit her framework, and identify one aspect of political analytics that you believe her framework does not adequately account for.

Your essay should demonstrate engagement with Benjamin's specific arguments, not just a general paraphrase of "algorithms can be biased."

Exercise 39.7 — Surveillance Asymmetry Mapping (Individual, 30 minutes)

In section 39.8, the chapter describes the "surveillance asymmetry" — detailed data about minority communities in formats that serve strategic control, combined with limited data in formats that would serve political responsiveness.

Create a two-column table. In the left column, list five types of data that are commonly available about minority communities in commercial data packages used by campaigns. In the right column, list five types of information about minority community political interests and concerns that are rarely captured in standard political data systems.

Then write a 300-word reflection: Who controls the flow of each type of information? Who benefits from the asymmetry? What would it take to rebalance it?