Exercises: Fairness — Definitions, Tensions, and Trade-offs
These exercises progress from concept checks to challenging applications, including Python coding challenges that use the FairnessCalculator class. Estimated completion time: 4-5 hours.
Difficulty Guide:
- ⭐ Foundational (5-10 min each)
- ⭐⭐ Intermediate (10-20 min each)
- ⭐⭐⭐ Challenging (20-40 min each)
- ⭐⭐⭐⭐ Advanced/Research (40+ min each)
Python Note: Exercises marked with [PYTHON] require coding. You will need Python 3.7+ and the pandas library.
Part A: Conceptual Understanding ⭐
Test your grasp of core concepts from Chapter 15.
A.1. Section 15.1.1 opens with a puzzle: ProPublica said COMPAS was unfair, and Northpointe said COMPAS was fair — and both were right. Explain how this is possible, identifying which fairness definition each side was using and what each definition measures.
A.2. Define demographic parity (Section 15.2). Then explain the moral intuition it captures and identify its most significant limitation. In what scenario might demographic parity produce harmful outcomes for the group it appears to protect?
A.3. Define equalized odds (Section 15.3). Explain the difference between full equalized odds (equalizing both TPR and FPR) and the relaxed version called equal opportunity (equalizing only TPR). Why might someone prefer one version over the other? In what context would equalizing FPR matter more than equalizing TPR?
A.4. Define calibration (Section 15.4). Explain the moral intuition it captures. Then explain why a perfectly calibrated system can still produce racially disparate outcomes that many would consider unfair.
A.5. State the impossibility theorem (Section 15.6) in your own words. What does it prove? Under what condition does the theorem not apply? Why is this condition rarely met in practice?
A.6. Explain the distinction between group fairness and individual fairness (Section 15.5). Why does the chapter argue that individual fairness does not "escape politics" — that it merely "hides them in the definition of similarity"?
A.7. Dr. Adeyemi says: "Obvious — until you try to define what 'equally' means" (Section 15.1.1). Using the four fairness definitions from this chapter, write four different sentences that each complete the phrase "A fair algorithm treats everyone equally by..." Show that each sentence implies a different — and potentially conflicting — standard.
Part B: Applied Analysis ⭐⭐
Analyze scenarios, arguments, and real-world situations using concepts from Chapter 15.
B.1. Consider a loan approval algorithm that operates on two groups:
| | Group A | Group B |
|---|---|---|
| Applicants | 1,000 | 1,000 |
| Approved | 600 (60%) | 400 (40%) |
| Actually repaid (among approved) | 540 (90%) | 360 (90%) |
| Actually would have repaid (among denied) | 200 (50%) | 300 (50%) |
- (a) Does this system satisfy demographic parity? Calculate the selection rates and assess.
- (b) Is this system calibrated? Among those approved, is the repayment rate the same across groups?
- (c) Is this system fair? Construct arguments from at least two different fairness perspectives.
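The arithmetic for (a) and (b) can be checked with a short sketch (counts copied from the table above; the fairness arguments in (c) remain the substance of the exercise):

```python
# Selection and repayment rates from the B.1 table
applicants = {"A": 1000, "B": 1000}
approved = {"A": 600, "B": 400}
repaid_among_approved = {"A": 540, "B": 360}

for g in ("A", "B"):
    selection_rate = approved[g] / applicants[g]             # for part (a)
    repayment_rate = repaid_among_approved[g] / approved[g]  # for part (b)
    print(f"Group {g}: selection rate {selection_rate:.2f}, "
          f"repayment rate among approved {repayment_rate:.2f}")
```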
B.2. A hospital has 100 slots in an intensive cardiac monitoring program. An algorithm assigns risk scores to patients. The data shows:
| | Group A (n=500) | Group B (n=500) |
|---|---|---|
| Base rate (actually develop cardiac event) | 20% | 10% |
| Predicted positive (flagged for program) | 80 | 20 |
| True positives | 60 | 15 |
| False positives | 20 | 5 |
| False negatives | 40 | 35 |
Calculate the TPR, FPR, and PPV for each group. Then determine whether the system satisfies: (a) demographic parity, (b) equalized odds, (c) calibration. Which fairness definition, if any, is satisfied? Which is violated most severely?
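As a sanity check on your hand calculations, the rates can be computed directly from the table's counts (the assessment of which fairness definitions are satisfied is still yours to make):

```python
# Confusion-matrix counts from the B.2 table (n = 500 per group)
counts = {
    "A": {"TP": 60, "FP": 20, "FN": 40},
    "B": {"TP": 15, "FP": 5,  "FN": 35},
}
n = 500

for g, c in counts.items():
    tn = n - c["TP"] - c["FP"] - c["FN"]  # true negatives by subtraction
    tpr = c["TP"] / (c["TP"] + c["FN"])   # true positive rate
    fpr = c["FP"] / (c["FP"] + tn)        # false positive rate
    ppv = c["TP"] / (c["TP"] + c["FP"])   # positive predictive value
    print(f"Group {g}: TPR={tpr:.3f}, FPR={fpr:.3f}, PPV={ppv:.3f}")
```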
B.3. A school district uses an algorithm to identify students "at risk of academic failure" for early intervention tutoring. The algorithm achieves demographic parity — the same percentage of students in each racial group is identified for intervention. A parent objects: "My child was placed in intervention even though her grades are good. This is because the algorithm is filling a quota." Apply the concepts from this chapter to evaluate this objection. Is the parent's concern legitimate? What does it reveal about the tension between demographic parity and individual fairness?
B.4. Mira proposes that VitraMed's patient risk model should satisfy equalized odds — specifically, that the true positive rate (correctly identifying patients who need intervention) should be equal across racial groups. A colleague objects: "If we equalize TPR, we'll have to lower the threshold for one group, which means more false positives in that group — more patients flagged who don't actually need the program. That wastes resources." Evaluate this objection using the fairness trade-off framework from Section 15.6. Is the colleague right? What values are in tension?
B.5. The chapter argues that "choosing a fairness metric is a political and ethical decision, not merely a technical one" (Section 15.6.2). A technology company responds: "We let stakeholders choose the fairness metric — we just implement whatever they decide." Evaluate this response. Does delegating the fairness decision to "stakeholders" resolve the political problem? Who are the relevant stakeholders? Whose voice is most likely to be absent?
Part C: Python Coding Challenges ⭐⭐-⭐⭐⭐
These exercises require you to write and run Python code using the FairnessCalculator class from Section 15.7.
C.1. ⭐⭐ [PYTHON] Demonstrating the Impossibility Theorem. Create a dataset with two groups (A and B) that have different base rates. Use the FairnessCalculator to show that the system can satisfy calibration but violates equalized odds. Then modify the predictions to satisfy equalized odds and show that calibration is now violated. Print both reports and write a paragraph explaining why both cannot be satisfied simultaneously.
```python
# Suggested setup:
# Group A: 200 individuals, 40% base rate (80 positive, 120 negative)
# Group B: 200 individuals, 20% base rate (40 positive, 160 negative)
# Create predictions that achieve calibration, then adjust for equalized odds.
```
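One way to build the suggested dataset without assuming anything about the FairnessCalculator interface (its API is defined in Section 15.7) is to choose confusion-matrix counts directly. The counts below equalize PPV across groups, which is what calibration amounts to for binary predictions, while leaving TPR and FPR unequal:

```python
# Sketch: build (prediction, actual) lists from chosen confusion counts.
# Counts are picked so PPV is 0.8 in both groups (calibration holds)
# while TPR and FPR differ (equalized odds is violated).

def make_group(tp, fp, fn, tn):
    """Return parallel prediction/actual lists for the given counts."""
    preds = [1] * (tp + fp) + [0] * (fn + tn)
    actuals = [1] * tp + [0] * fp + [1] * fn + [0] * tn
    return preds, actuals

# Group A: 80 positives / 120 negatives (40% base rate)
# Group B: 40 positives / 160 negatives (20% base rate)
groups = {"A": make_group(60, 15, 20, 105), "B": make_group(20, 5, 20, 155)}

for g, (preds, actuals) in groups.items():
    tp = sum(p == a == 1 for p, a in zip(preds, actuals))
    fp = sum(p == 1 and a == 0 for p, a in zip(preds, actuals))
    fn = sum(p == 0 and a == 1 for p, a in zip(preds, actuals))
    tn = sum(p == a == 0 for p, a in zip(preds, actuals))
    print(f"Group {g}: PPV={tp/(tp+fp):.2f} "
          f"TPR={tp/(tp+fn):.2f} FPR={fp/(fp+tn):.3f}")
```

Feed these lists to the FairnessCalculator (or your own metric code), then redistribute the positive predictions to equalize TPR and FPR and watch PPV diverge.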
C.2. ⭐⭐ [PYTHON] Confusion Matrix Builder. Write a function build_confusion_matrix(predictions, actuals, group_label) that takes lists of predictions, actual outcomes, and a group label, and prints a formatted confusion matrix with TP, FP, TN, FN counts plus TPR, FPR, and PPV. Test it with the COMPAS-like data from the chapter:
```python
# Group A (Black defendants): n=3175
# Base rate: ~51%
# FPR: ~45%, FNR: ~28%
# Group B (White defendants): n=2103
# Base rate: ~39%
# FPR: ~24%, FNR: ~48%
# Construct synthetic data consistent with these approximate rates.
```
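A minimal sketch of the synthetic-data step, working backward from the quoted rates to integer confusion counts (rounding means the rates are matched only approximately):

```python
# Build (prediction, actual) lists whose aggregate rates approximate
# the COMPAS-like figures above. Helper name is illustrative.

def synth_group(n, base_rate, fpr, fnr):
    """Return prediction and actual lists matching the given rates (rounded)."""
    pos = round(n * base_rate)   # actual positives
    neg = n - pos                # actual negatives
    fn = round(pos * fnr)        # missed positives
    tp = pos - fn
    fp = round(neg * fpr)        # falsely flagged negatives
    tn = neg - fp
    preds = [1] * tp + [0] * fn + [1] * fp + [0] * tn
    actuals = [1] * (tp + fn) + [0] * (fp + tn)
    return preds, actuals

preds_a, actuals_a = synth_group(3175, 0.51, 0.45, 0.28)
preds_b, actuals_b = synth_group(2103, 0.39, 0.24, 0.48)
# Pass these lists to your build_confusion_matrix function.
```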
C.3. ⭐⭐⭐ [PYTHON] Threshold Explorer. A risk scoring system assigns scores from 0 to 100 to each individual. The threshold determines who is classified as positive. Write a Python program that:
- Generates synthetic risk scores for two groups with different base rates (Group A: 35% positive, Group B: 15% positive)
- For each threshold from 10 to 90 (in steps of 10), calculates: selection rate, TPR, FPR, and PPV for each group
- Identifies which threshold (if any) achieves approximate demographic parity (selection rate difference < 0.05)
- Identifies which threshold (if any) achieves approximate equalized odds (TPR and FPR difference < 0.05)
- Identifies which threshold (if any) achieves approximate calibration (PPV difference < 0.05)
- Reports whether any single threshold satisfies all three
```python
import random

random.seed(42)

# Generate scores: higher scores for positive cases, with noise
def generate_scores(n, positive_rate, positive_mean=65, negative_mean=35, std=15):
    """Generate risk scores for a group."""
    scores, actuals = [], []
    for _ in range(n):
        is_positive = random.random() < positive_rate
        actuals.append(1 if is_positive else 0)
        mean = positive_mean if is_positive else negative_mean
        score = max(0, min(100, random.gauss(mean, std)))
        scores.append(score)
    return scores, actuals

# Group A: 500 people, 35% base rate
# Group B: 500 people, 15% base rate
# Your code here...
```
C.4. ⭐⭐⭐ [PYTHON] Fairness Dashboard. Extend the FairnessCalculator class to include a dashboard() method that prints a comprehensive, human-readable report covering:
- Per-group confusion matrix
- Per-group metrics (base rate, selection rate, TPR, FPR, PPV)
- Demographic parity assessment
- Equalized odds assessment
- Calibration assessment
- A final summary stating which criteria are satisfied and which are violated
Format the output so that a non-technical stakeholder (e.g., a hospital administrator or a judge) could understand the results. Test with a dataset of your choosing.
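Since the FairnessCalculator internals are defined in Section 15.7, here is a class-independent sketch of the kind of report `dashboard()` might produce; every name below is illustrative, and you would adapt it to the actual class's attributes:

```python
# Class-independent dashboard sketch. Input format is an assumption:
# {group_label: (predictions, actuals)} with parallel 0/1 lists.

def dashboard(groups, tolerance=0.05):
    """Print per-group metrics and a plain-language fairness summary."""
    metrics = {}
    for g, (preds, actuals) in groups.items():
        tp = sum(p == a == 1 for p, a in zip(preds, actuals))
        fp = sum(p == 1 and a == 0 for p, a in zip(preds, actuals))
        fn = sum(p == 0 and a == 1 for p, a in zip(preds, actuals))
        tn = sum(p == a == 0 for p, a in zip(preds, actuals))
        n = len(preds)
        metrics[g] = {
            "base rate": (tp + fn) / n,
            "selection rate": (tp + fp) / n,
            "TPR": tp / (tp + fn) if tp + fn else 0.0,
            "FPR": fp / (fp + tn) if fp + tn else 0.0,
            "PPV": tp / (tp + fp) if tp + fp else 0.0,
        }
        print(f"Group {g}: TP={tp} FP={fp} FN={fn} TN={tn}")
        for name, value in metrics[g].items():
            print(f"  {name}: {value:.3f}")

    def gap(key):  # largest between-group difference for a metric
        vals = [m[key] for m in metrics.values()]
        return max(vals) - min(vals)

    checks = {
        "Demographic parity": gap("selection rate"),
        "Equalized odds (TPR)": gap("TPR"),
        "Equalized odds (FPR)": gap("FPR"),
        "Calibration (PPV)": gap("PPV"),
    }
    print(f"\nSummary (between-group difference, tolerance {tolerance}):")
    for name, diff in checks.items():
        verdict = "satisfied" if diff <= tolerance else "VIOLATED"
        print(f"  {name}: gap {diff:.3f} -> {verdict}")
    return checks

# Tiny demo with made-up data
example = {
    "A": ([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0]),
    "B": ([1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0]),
}
results = dashboard(example)
```

A full solution would add the plain-language framing the exercise asks for (e.g., "of every 100 patients flagged in Group A, about N actually needed care").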
C.5. ⭐⭐⭐ [PYTHON] Fairness-Accuracy Trade-off. Write a program that demonstrates the trade-off between overall accuracy and group fairness. Start with predictions that maximize overall accuracy for a dataset with two groups and different base rates. Then iteratively adjust predictions to move toward demographic parity. At each step, record the overall accuracy and the demographic parity difference. Print a table or text chart showing how accuracy decreases as demographic parity improves. Write a paragraph interpreting the trade-off.
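One possible shape for the sweep, under simplifying assumptions (hard 0/1 predictions; the accuracy-maximizing start is simply "predict the truth"; parity is approached by flipping negatives to positives in the lower-selection group):

```python
# Sketch of the accuracy-vs-parity sweep. Data and strategy are
# illustrative, not the chapter's canonical method.

# Two groups with different base rates (A: 35%, B: 15%)
actuals = {"A": [1] * 70 + [0] * 130, "B": [1] * 30 + [0] * 170}
# Accuracy-maximizing start: predictions equal the actual outcomes
preds = {g: list(a) for g, a in actuals.items()}

def accuracy(preds, actuals):
    correct = total = 0
    for g in actuals:
        correct += sum(p == a for p, a in zip(preds[g], actuals[g]))
        total += len(actuals[g])
    return correct / total

def parity_gap(preds):
    rates = [sum(p) / len(p) for p in preds.values()]
    return max(rates) - min(rates)

print(f"{'step':>4} {'accuracy':>9} {'parity gap':>11}")
step = 0
while parity_gap(preds) > 0.005:
    # Flip one negative prediction to positive in the lower-selection group;
    # each flip narrows the parity gap but creates one more wrong prediction.
    low = min(preds, key=lambda g: sum(preds[g]) / len(preds[g]))
    preds[low][preds[low].index(0)] = 1
    step += 1
    if step % 5 == 0:
        print(f"{step:>4} {accuracy(preds, actuals):>9.3f} "
              f"{parity_gap(preds):>11.3f}")
```

The printed table is the text chart the exercise asks for; your interpretation paragraph should explain who bears the cost of each flipped prediction.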
Part D: Synthesis & Critical Thinking ⭐⭐⭐
These questions require you to integrate multiple concepts from Chapter 15 and think beyond the material presented.
D.1. The impossibility theorem proves that multiple fairness definitions cannot be simultaneously satisfied when base rates differ. Some scholars respond: "Then the solution is to equalize base rates — address the structural causes of differential outcomes, so that the impossibility constraint relaxes." Evaluate this response. Is equalizing base rates a realistic goal? What would it require in criminal justice? In healthcare? In hiring? Is it fair to hold an algorithm to a fairness standard that depends on solving social inequality first?
D.2. The chapter presents fairness as a choice among competing values. But in practice, who makes this choice? Consider the following stakeholders in a criminal justice risk assessment: the algorithm's developers, the deploying jurisdiction, the judge, the defendant, the victim, the community. Each might prefer a different fairness definition. Write a short essay (300-500 words) proposing a process for making the fairness choice — one that is transparent, inclusive, and accountable.
D.3. An AI company markets a "fair ML" product that claims to satisfy all fairness criteria. Based on the impossibility theorem, evaluate this claim. Under what (very limited) conditions could it be true? What questions would you ask the company to assess whether their claim is legitimate or misleading?
D.4. The chapter notes that fairness definitions can be applied to different types of errors. In healthcare, equalizing false negatives (ensuring sick patients from all groups are equally likely to be identified) protects the most vulnerable — sick people who need care. In criminal justice, equalizing false positives (ensuring innocent people from all groups are equally unlikely to be falsely flagged) protects those who least deserve harm — innocent people who do not deserve detention. Write a paragraph explaining why the choice of which error to equalize is a moral question, not a technical one. How should this choice be made, and by whom?
Part E: Research & Extension ⭐⭐⭐⭐
These are open-ended projects for students seeking deeper engagement.
E.1. [PYTHON] The COMPAS Fairness Analysis. Download the ProPublica COMPAS dataset (available on GitHub). Using the FairnessCalculator class (or your own code), compute: (a) demographic parity, (b) equalized odds, and (c) calibration for the system, disaggregated by race. Confirm the chapter's claim that the system satisfies calibration but violates equalized odds. Write a 1,000-word report with your code, results, and interpretation.
E.2. Fairness in Your Domain. Choose a domain (healthcare, education, hiring, lending, criminal justice) and research how fairness definitions have been applied in practice. Identify at least one real-world case where different stakeholders preferred different fairness definitions. Write a 1,200-word analysis explaining: What definitions were in play? Who advocated for each? What was the outcome? Was the process for choosing a definition transparent?
E.3. The Philosophy of Fairness. The chapter connects fairness definitions to political philosophy: demographic parity maps to equality of outcome, equalized odds maps to equality of process, calibration maps to meritocracy. Research one philosophical tradition (Rawlsian justice, libertarianism, capabilities approach, or Ubuntu philosophy) and write a 1,000-word essay arguing which fairness definition that tradition would endorse, and why.
Solutions
Selected solutions are available in appendices/answers-to-selected.md.