Exercises: Bias in Data, Bias in Machines

These exercises progress from concept checks to challenging applications, including Python coding challenges. Estimated completion time: 4-5 hours.

Difficulty Guide:

  • ⭐ Foundational (5-10 min each)
  • ⭐⭐ Intermediate (10-20 min each)
  • ⭐⭐⭐ Challenging (20-40 min each)
  • ⭐⭐⭐⭐ Advanced/Research (40+ min each)

Python Note: Exercises marked with [PYTHON] require coding. You will need Python 3.7+ and the pandas library.
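A quick way to confirm your environment meets these requirements before starting (the checks below are a convenience, not part of any exercise):

```python
import sys
import pandas as pd

# Both lines should print True; if not, upgrade Python or install pandas.
print(sys.version_info >= (3, 7))
print(hasattr(pd, "DataFrame"))
```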


Part A: Conceptual Understanding ⭐

Test your grasp of core concepts from Chapter 14.

A.1. Section 14.1.1 defines algorithmic bias as systematic, disadvantaging, and unjustified. Explain why each of these three elements is necessary. What would happen if you removed any one element from the definition? Provide an example for each element that illustrates its importance.

A.2. The chapter identifies six types of bias: historical, representation, measurement, aggregation, evaluation, and deployment bias (Section 14.2.1). For each of the following scenarios, identify the primary type of bias present and explain your reasoning:

  • (a) A language translation model trained primarily on European-language text performs poorly on African languages.
  • (b) A clinical trial for a new drug is conducted exclusively with male participants, but the drug is prescribed to both men and women.
  • (c) A hiring algorithm trained on a company's historical hiring data recommends fewer women for engineering roles.
  • (d) A credit-scoring model uses zip code as a feature, which correlates with race due to residential segregation.
  • (e) A facial recognition system is tested on a benchmark dataset that is 85% white faces and reports 97% overall accuracy.
  • (f) A student dropout prediction model trained on data from a large urban university is deployed at a small rural college.

A.3. Explain the concept of a proxy variable as described in Section 14.3.1. Why is removing a protected attribute (e.g., race, gender) from a model's features insufficient to prevent bias? What is "redundant encoding," and why does it make bias resilient to simple feature removal?

A.4. The chapter argues that "the myth of algorithmic objectivity" rests on a "category error" (Section 14.1.2). Reconstruct the flawed syllogism presented in the text and explain precisely where the logical error occurs. Then rewrite the syllogism to reflect what actually happens when algorithms process data.

A.5. Section 14.1.3 distinguishes between bias, discrimination, and unfairness. Construct a scenario in which an algorithmic system exhibits bias but not discrimination (the bias does not track protected characteristics). Then construct a scenario in which a system is discriminatory without anyone intending it.

A.6. Explain the concept of a feedback loop as described in Section 14.8. Using the predictive policing example, trace how an initial bias in the data can be amplified through subsequent cycles of prediction, action, and data generation. Why are feedback loops particularly dangerous in algorithmic systems?

A.7. Section 14.7.1 describes the four-fifths rule (80% rule) for disparate impact. A company's hiring algorithm approves 55% of male applicants and 38% of female applicants. Calculate the disparate impact ratio and determine whether the system triggers the four-fifths threshold. Show your work.
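As a reminder of the mechanics (the rates in this sketch are hypothetical, not the exercise's, so it does not give the answer away): the disparate impact ratio divides the lower selection rate by the higher one, and the four-fifths rule is triggered when that ratio falls below 0.8.

```python
def disparate_impact_ratio(rate_x: float, rate_y: float) -> float:
    """Ratio of the lower selection rate to the higher one."""
    low, high = sorted((rate_x, rate_y))
    return low / high

# Hypothetical rates: 45% vs. 50% selected.
ratio = disparate_impact_ratio(0.45, 0.50)
print(round(ratio, 2), ratio >= 0.8)  # 0.9 True -> does not trigger the rule
```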


Part B: Applied Analysis ⭐⭐

Analyze scenarios, arguments, and real-world situations using concepts from Chapter 14.

B.1. Consider the following scenario:

A large health insurance company develops an algorithm to identify members who would benefit from a diabetes prevention program. The program has limited capacity (5,000 slots per year). The algorithm uses claims data, prescription history, BMI, age, and geographic region to predict which members are most likely to develop Type 2 diabetes. The company reports that the algorithm has 85% accuracy overall.

Using the bias pipeline framework from Section 14.3, analyze this system at each of the six stages (problem formulation, data collection, feature engineering, model training, evaluation, deployment). At each stage, identify at least one potential source of bias and explain how it could lead to disparate impact across racial or socioeconomic groups.

B.2. The Amazon hiring algorithm case (Section 14.6) reveals that removing the explicit gender feature did not eliminate gender bias. Explain why this happened, and connect the explanation to the concepts of proxy variables, redundant encoding, and historical bias. What would a more effective approach to bias mitigation look like in this context?

B.3. Mira discovers that VitraMed's patient risk model uses healthcare utilization as a dominant feature (Section 14.5.4). She connects this to the Obermeyer et al. (2019) finding that healthcare spending is a biased proxy for health need. If you were advising Mira, what specific steps would you recommend she take to (a) verify whether the bias exists in VitraMed's model, (b) quantify its magnitude, and (c) propose alternatives? Be specific about what data she would need and what analyses she would run.

B.4. A university uses an algorithm to predict which first-year students are at risk of dropping out. The algorithm's features include: high school GPA, SAT/ACT scores, first-generation college student status, financial aid amount, and the number of times the student swiped their meal card in the first two weeks. Analyze each feature for potential bias. Which features might serve as proxies for race or socioeconomic status? Which might introduce measurement bias? What would you change?

B.5. The chapter presents three frameworks for understanding the COMPAS case (Section 14.4.4): the technical failure reading (the algorithm needs better calibration), the structural reading (the algorithm encodes the consequences of systemic racism), and the accountability reading (no single actor is responsible). Write a paragraph defending each reading. Which do you find most compelling, and why?

B.6. [PYTHON] Using the BiasAuditor class from Section 14.7, create a dataset that represents a hiring scenario where 100 applicants from Group A and 80 applicants from Group B apply for positions. Set selection rates such that the system would not trigger the four-fifths rule. Then modify the rates so that it would trigger. Print both audit reports and explain the difference.

from dataclasses import dataclass, field
from typing import Any
import pandas as pd

# Paste the BiasAuditor class from Section 14.7.2 below (it relies on the
# imports above), then create your test data and run the audit for both scenarios.
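One way to lay out the first scenario's data as parallel lists. The specific counts below are an illustrative assumption; any pair of rates whose ratio stays at or above 0.8 will satisfy the non-triggering scenario.

```python
# Scenario 1: should NOT trigger the four-fifths rule.
# Group A: 50 of 100 selected (rate 0.50); Group B: 44 of 80 selected (rate 0.55).
# Ratio = 0.50 / 0.55 ~= 0.91, above the 0.8 threshold.
predictions = [1] * 50 + [0] * 50 + [1] * 44 + [0] * 36
groups = ["A"] * 100 + ["B"] * 80

rate_a = sum(predictions[:100]) / 100
rate_b = sum(predictions[100:]) / 80
print(rate_a, rate_b, round(min(rate_a, rate_b) / max(rate_a, rate_b), 2))
```

For the triggering scenario, lower Group B's selections until the ratio drops below 0.8.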

Part C: Python Coding Challenges ⭐⭐-⭐⭐⭐

These exercises require you to write and run Python code. All exercises build on or extend the BiasAuditor class from Section 14.7.

C.1. ⭐⭐ [PYTHON] Selection Rate Calculator. Write a function called calculate_selection_rates that takes two lists — predictions (list of 0s and 1s) and groups (list of group labels) — and returns a dictionary mapping each group to its selection rate. Do not use the BiasAuditor class; write the logic from scratch. Test it with the following data:

predictions = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0]
groups = ["A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
          "B", "B", "B", "B", "B", "B", "B", "B", "B", "B"]

Expected output: Group A selection rate = 0.60, Group B selection rate = 0.40.
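The exercise asks for from-scratch logic, but you can cross-check your result with pandas (which the chapter already uses):

```python
import pandas as pd

predictions = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0]
groups = ["A"] * 10 + ["B"] * 10

# groupby/mean gives the selection rate per group directly.
rates = pd.DataFrame({"pred": predictions, "group": groups}) \
          .groupby("group")["pred"].mean().to_dict()
print(rates)  # {'A': 0.6, 'B': 0.4}
```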

C.2. ⭐⭐ [PYTHON] Extending BiasAuditor: Multiple Groups. The BiasAuditor class in the chapter computes the disparate impact ratio between the most-selected and least-selected groups. Extend the class (or write a standalone function) to compute the disparate impact ratio for every pair of groups when there are three or more groups. Test with the following data:

predictions = ([1]*60 + [0]*40 +   # Group A: 60% selected
               [1]*45 + [0]*55 +   # Group B: 45% selected
               [1]*30 + [0]*70)    # Group C: 30% selected
groups = (["A"]*100 + ["B"]*100 + ["C"]*100)

Your output should show all pairwise disparate impact ratios (A vs. B, A vs. C, B vs. C) and flag any pair that falls below 0.8.
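A starting point for the pairwise iteration is itertools.combinations. The rates below are hard-coded for illustration (they are the ones implied by the test data above); your version should compute them from predictions and groups:

```python
from itertools import combinations

rates = {"A": 0.60, "B": 0.45, "C": 0.30}  # implied by the test data above

for g1, g2 in combinations(rates, 2):
    ratio = min(rates[g1], rates[g2]) / max(rates[g1], rates[g2])
    flag = "  <- below 0.8" if ratio < 0.8 else ""
    print(f"{g1} vs {g2}: {ratio:.2f}{flag}")
```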

C.3. ⭐⭐⭐ [PYTHON] Simulating a Feedback Loop. Write a Python simulation that demonstrates how a biased policing algorithm creates a feedback loop. Your simulation should:

  1. Initialize two neighborhoods, A and B, each with 1000 residents and an equal true crime rate of 5%.
  2. In Round 1, due to historical over-policing, Neighborhood A has 50% more police patrols than Neighborhood B.
  3. More patrols lead to more detected crimes (not more actual crimes). The detection rate is proportional to patrol intensity: if patrol intensity is 1.5x, detection rate is 1.5x.
  4. The algorithm updates its prediction of "high-crime areas" based on detected crime rates.
  5. Patrol allocation for the next round is based on the algorithm's prediction.
  6. Run the simulation for 10 rounds and plot (or print) how the detected crime rates diverge between the two neighborhoods, even though the actual crime rates remain equal.

Print the actual vs. detected crime rates for each neighborhood at each round. Write a paragraph interpreting the results.
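Steps 2-5 can be sketched for a few rounds. The amplification rule below is an assumption for illustration (the exercise leaves the update rule to you): each neighborhood's patrol intensity is scaled by how its detected crime rate compares to the average detected rate.

```python
true_rate = 0.05
patrols = {"A": 1.5, "B": 1.0}  # Round 1: historical over-policing of A

for rnd in range(1, 4):
    # Step 3: detection tracks patrol intensity, not actual crime.
    detected = {n: true_rate * patrols[n] for n in patrols}
    mean_det = sum(detected.values()) / len(detected)
    # Steps 4-5: shift patrols toward areas that *look* high-crime.
    patrols = {n: patrols[n] * detected[n] / mean_det for n in patrols}
    print(rnd, {n: round(r, 4) for n, r in detected.items()})
```

Under this rule the detected rates pull apart each round while the true rate never changes; other update rules will amplify faster or slower.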

C.4. ⭐⭐⭐ [PYTHON] Intersectional Bias Analysis. The chapter discusses intersectionality in Section 14.9 — the idea that bias against Black women may be worse than the sum of bias against Black people and bias against women separately. Write a Python function that takes predictions, actual outcomes, and two protected attributes (e.g., race and gender) and computes selection rates for all intersectional subgroups (e.g., Black women, Black men, white women, white men). Test with the following data:

import random
random.seed(42)

n = 400  # 100 per intersectional group
groups_race = (["Black"]*200 + ["White"]*200)
groups_gender = (["Female"]*100 + ["Male"]*100) * 2

# Simulate biased predictions
# White Male: 75% selected, White Female: 65%, Black Male: 55%, Black Female: 35%
predictions = []
for race, gender in zip(groups_race, groups_gender):
    if race == "White" and gender == "Male":
        predictions.append(1 if random.random() < 0.75 else 0)
    elif race == "White" and gender == "Female":
        predictions.append(1 if random.random() < 0.65 else 0)
    elif race == "Black" and gender == "Male":
        predictions.append(1 if random.random() < 0.55 else 0)
    else:  # Black Female
        predictions.append(1 if random.random() < 0.35 else 0)

Compute and display: (a) overall selection rate, (b) selection rates by race only, (c) selection rates by gender only, (d) selection rates by intersectional group (race x gender). Then compute the disparate impact ratio between the most-selected and least-selected intersectional groups. Write a paragraph explaining why single-axis analysis (race alone or gender alone) would miss the severity of the bias.
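One way to tabulate rates for every intersectional subgroup is a two-column groupby. The helper and its toy labels below are illustrative assumptions, not the chapter's API:

```python
import pandas as pd

def intersectional_rates(predictions, attr_a, attr_b):
    """Selection rate for every combination of two protected attributes."""
    df = pd.DataFrame({"a": attr_a, "b": attr_b, "pred": predictions})
    return df.groupby(["a", "b"])["pred"].mean().to_dict()

# Toy check with made-up labels (not the exercise's data):
print(intersectional_rates([1, 0, 1, 1],
                           ["X", "X", "Y", "Y"],
                           ["F", "M", "F", "M"]))
```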

C.5. ⭐⭐⭐ [PYTHON] BiasAuditor Visualization. Extend the BiasAuditor to produce a text-based bar chart (using characters like # or |) that visually represents the selection rates for each group. The visualization should clearly show when the four-fifths threshold is violated. Test with a scenario of your choosing that includes at least three groups.
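The bar itself can be as simple as repeating a character in proportion to the rate; the rates below are a hypothetical three-group scenario:

```python
def bar(rate: float, width: int = 40) -> str:
    """Render a selection rate as a row of '#' characters."""
    return "#" * int(rate * width)

rates = {"A": 0.60, "B": 0.45, "C": 0.30}   # hypothetical test scenario
threshold = 0.8 * max(rates.values())       # four-fifths line vs. the top group
for group, rate in rates.items():
    marker = "  * below four-fifths threshold" if rate < threshold else ""
    print(f"{group} {rate:.2f} |{bar(rate)}{marker}")
```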


Part D: Synthesis & Critical Thinking ⭐⭐⭐

These questions require you to integrate multiple concepts from Chapter 14 and think beyond the material presented.

D.1. The chapter argues that "the most dangerous biases are structural — embedded in the data, the features, and the optimization objectives by the same social forces that produce inequality in the first place" (Section 14.1.1). Some critics respond that if algorithms simply reflect social reality, then the problem is society, not algorithms, and fixing algorithms is treating a symptom rather than a cause. Evaluate this argument. Is it valid? Is it complete? What does the chapter's analysis of feedback loops (Section 14.8) add to this debate?

D.2. The Obermeyer healthcare study (Section 14.5) found that using healthcare spending as a proxy for health need systematically disadvantaged Black patients. The chapter notes that this proxy was chosen because it was "readily available, continuously updated, predictive of future costs, and operationally useful." Imagine you are the data scientist who built this system. Write a reflective essay (300-500 words) in which you explain: How might a well-intentioned data scientist have arrived at this proxy? At what point should they have recognized the problem? What institutional conditions would have helped them catch it earlier?

D.3. Section 14.9 discusses intersectional bias. Crenshaw's concept of intersectionality holds that the experience of a Black woman is not simply the sum of "being Black" + "being female" — it is a distinct experience that can involve unique forms of discrimination. Apply this concept to algorithmic systems. Why might a facial recognition system have higher error rates for dark-skinned women specifically, even if it performs reasonably well for dark-skinned men and for light-skinned women? What does this imply about how bias audits should be structured?

D.4. Dr. Adeyemi asks: "The problem only shows up when you look for it. And nobody was looking. Ask yourself: why wasn't anyone looking?" (Section 14.5.4). Write a response to this question that draws on at least three concepts from the chapter. Consider: organizational incentives, evaluation bias, the composition of data science teams, and the distribution of power between algorithm designers and the populations affected.


Part E: Research & Extension ⭐⭐⭐⭐

These are open-ended projects for students seeking deeper engagement. Each requires independent research beyond the textbook.

E.1. [PYTHON] Replicating ProPublica's Analysis. ProPublica released the dataset used in their COMPAS investigation. Download the dataset (available at ProPublica's GitHub repository). Using Python (pandas), replicate the core finding: calculate the false positive rates and false negative rates for Black and white defendants. Compare your results to those reported in the chapter. Write a 1,000-word report discussing your methodology, findings, and reflections.

E.2. The Gender Shades Study. Research Buolamwini and Gebru's "Gender Shades" study (2018) in depth. Write a 1,000-word analysis covering: (a) what the study measured and how, (b) what it found across three commercial facial recognition systems, (c) why the intersectional methodology was essential, (d) how the companies responded, and (e) what changes occurred in the field as a result. Use at least three sources beyond this textbook.

E.3. Bias in Your Own Domain. Choose a professional domain you are interested in or studying (healthcare, education, criminal justice, marketing, HR, etc.). Research a specific documented case of algorithmic bias in that domain. Write a 1,200-word case analysis structured around the bias pipeline: trace how bias entered at each stage of the system's development. Propose three specific interventions and evaluate their feasibility.


Solutions

Selected solutions are available in appendices/answers-to-selected.md.