Quiz: Fairness — Definitions, Tensions, and Trade-offs

Test your understanding before moving to the next chapter. Target: 70% or higher to proceed.


Section 1: Multiple Choice (1 point each)

1. Demographic parity requires that:

  • A) The algorithm's overall accuracy be the same across groups.
  • B) The selection rate (proportion predicted positive) be equal across groups.
  • C) The false positive rate be equal across groups.
  • D) Individuals with similar characteristics receive similar predictions.
Answer **B)** The selection rate (proportion predicted positive) be equal across groups. *Explanation:* Section 15.2.1 defines demographic parity (statistical parity) as requiring equal selection rates across groups: P(Predicted Positive | Group A) = P(Predicted Positive | Group B). Demographic parity focuses on outcomes — the same proportion of each group should be selected — without considering whether the selections are *correct*. Options A, C, and D describe different fairness criteria (accuracy parity, equalized odds, and individual fairness, respectively).
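The check this definition implies can be sketched in a few lines. A minimal sketch with hypothetical predictions (not data from the chapter):

```python
# Hedged sketch (hypothetical predictions, not chapter data): demographic
# parity compares only the rate of positive predictions per group.

def selection_rate(predictions):
    """Fraction of individuals predicted positive (1)."""
    return sum(predictions) / len(predictions)

group_a_preds = [1, 1, 0, 1, 0, 1, 0, 0, 1, 1]  # selection rate 0.6
group_b_preds = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]  # selection rate 0.3

rate_a = selection_rate(group_a_preds)
rate_b = selection_rate(group_b_preds)
gap = abs(rate_a - rate_b)  # 0.3 gap: demographic parity is violated,
# regardless of whether either group's predictions were correct.
```

Note that the check never looks at ground-truth outcomes, which is exactly the limitation discussed in question 6.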

2. Which fairness definition was ProPublica implicitly using when it argued that COMPAS was unfair?

  • A) Demographic parity — the selection rates were different across racial groups.
  • B) Equalized odds — the false positive and false negative rates were different across racial groups.
  • C) Calibration — defendants with the same score had different recidivism rates across groups.
  • D) Individual fairness — similar defendants received different scores.
Answer **B)** Equalized odds — the false positive and false negative rates were different across racial groups. *Explanation:* Section 15.1.1 explains that ProPublica's analysis focused on error rate disparities: Black defendants who did not reoffend were nearly twice as likely to be falsely flagged as high-risk (different FPR), and white defendants who did reoffend were more likely to be falsely labeled low-risk (different FNR). This corresponds to a violation of equalized odds, which requires equal TPR and FPR across groups.
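The disparity ProPublica measured can be checked numerically. A hedged sketch with hypothetical labels and predictions (these are not ProPublica's figures):

```python
# Hedged sketch (hypothetical labels/predictions, not ProPublica's data):
# equalized odds compares per-group TPR and FPR.

def error_rates(y_true, y_pred):
    """Return (TPR, FPR) for binary labels and predictions."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), fp / (fp + tn)

# 1 = reoffended (labels) / flagged high-risk (predictions).
tpr_a, fpr_a = error_rates([1, 1, 0, 0, 0], [1, 0, 1, 0, 0])  # TPR 0.5, FPR 1/3
tpr_b, fpr_b = error_rates([1, 1, 0, 0, 0], [1, 0, 0, 0, 0])  # TPR 0.5, FPR 0.0
# Equal TPRs but unequal FPRs: equal opportunity holds, equalized odds fails.
```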

3. Which fairness definition was Northpointe using when it argued that COMPAS was fair?

  • A) Demographic parity
  • B) Equalized odds
  • C) Calibration (predictive parity) — defendants with the same score had similar recidivism rates regardless of race.
  • D) Individual fairness
Answer **C)** Calibration (predictive parity) — defendants with the same score had similar recidivism rates regardless of race. *Explanation:* Section 15.4.2 explains that Northpointe defended COMPAS by arguing it was calibrated: among defendants who received the same risk score, the actual recidivism rates were similar across racial groups. A score of "7" meant approximately the same probability of reoffending for Black and white defendants. This is the definition of calibration.
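Calibration can be checked by comparing observed outcome rates at each score level across groups. A small sketch on hypothetical scores and outcomes:

```python
# Hedged sketch (hypothetical scores/outcomes): calibration asks whether a
# given score predicts the same observed outcome rate in every group.
from collections import defaultdict

def outcome_rate_by_score(scores, outcomes):
    """Mean observed outcome (e.g., recidivism) for each distinct score."""
    totals, counts = defaultdict(float), defaultdict(int)
    for s, y in zip(scores, outcomes):
        totals[s] += y
        counts[s] += 1
    return {s: totals[s] / counts[s] for s in totals}

# Hypothetical risk scores and observed outcomes (1 = reoffended).
scores_a, outcomes_a = [7, 7, 7, 3, 3], [1, 1, 0, 0, 0]
scores_b, outcomes_b = [7, 7, 7, 3, 3], [1, 1, 0, 1, 0]

rates_a = outcome_rate_by_score(scores_a, outcomes_a)  # score 7 -> 2/3, score 3 -> 0.0
rates_b = outcome_rate_by_score(scores_b, outcomes_b)  # score 7 -> 2/3, score 3 -> 0.5
# Score 7 "means the same thing" in both groups; score 3 does not.
```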

4. The impossibility theorem (Chouldechova, 2017; Kleinberg et al., 2016) proves that:

  • A) All algorithmic systems are inherently unfair and should not be used.
  • B) When base rates differ across groups, a classifier cannot simultaneously achieve equal false positive rates, equal false negative rates, and equal positive predictive values.
  • C) Fairness can always be achieved by using enough training data.
  • D) Calibration is always superior to equalized odds as a fairness metric.
Answer **B)** When base rates differ across groups, a classifier cannot simultaneously achieve equal false positive rates, equal false negative rates, and equal positive predictive values. *Explanation:* Section 15.6.1 presents the impossibility theorem: when base rates differ across groups, calibration and equalized odds are mathematically incompatible. This means that every fairness-constrained system must *choose* which definition to prioritize — a choice that is political and ethical, not technical. Option A overstates the implication; Option C is incorrect because the incompatibility is a mathematical constraint that more training data cannot remove; Option D imposes a value judgment the theorem does not support.

5. Individual fairness, as proposed by Dwork et al. (2012), requires that:

  • A) Every individual receives the same prediction regardless of group membership.
  • B) Similar individuals, according to a task-relevant similarity metric, receive similar predictions.
  • C) The algorithm treats all groups as if they have the same base rate.
  • D) No demographic information is used in the model's features.
Answer **B)** Similar individuals, according to a task-relevant similarity metric, receive similar predictions. *Explanation:* Section 15.5.1 defines individual fairness as requiring that individuals who are "similar" by some task-relevant metric receive similar treatment. The appeal is obvious — equals should be treated equally — but the challenge lies in defining "similar," which is itself a value-laden choice. Option A describes demographic blindness, not individual fairness. Option C describes base rate adjustment. Option D describes fairness through unawareness.

6. A key limitation of demographic parity is that:

  • A) It is too difficult to compute mathematically.
  • B) It can be achieved by a completely random system (e.g., a coin flip), which demonstrates that equal selection rates do not guarantee useful or accurate predictions.
  • C) It requires access to proprietary algorithms.
  • D) It is incompatible with all other fairness metrics under all conditions.
Answer **B)** It can be achieved by a completely random system (e.g., a coin flip), which demonstrates that equal selection rates do not guarantee useful or accurate predictions. *Explanation:* Section 15.2.4 identifies this as a key limitation: demographic parity asks only about *outcomes* (selection rates), not about whether those outcomes are *correct*. A coin flip that approves 50% of each group achieves perfect demographic parity — but provides zero useful information. This means demographic parity is a necessary but not sufficient condition for a fair system.
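The coin-flip point can be demonstrated directly. A hedged sketch using random "predictions":

```python
# Hedged sketch: random "predictions" achieve demographic parity while
# carrying no information about who will actually repay, reoffend, etc.
import random

random.seed(0)  # fixed seed for reproducibility
n = 100_000
flips_a = [random.randint(0, 1) for _ in range(n)]
flips_b = [random.randint(0, 1) for _ in range(n)]

rate_a = sum(flips_a) / n
rate_b = sum(flips_b) / n
parity_gap = abs(rate_a - rate_b)  # tiny: demographic parity (nearly) holds,
# yet the flips are statistically independent of any real outcome.
```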

7. The "base rate" in the context of fairness definitions refers to:

  • A) The minimum acceptable accuracy for the algorithm.
  • B) The actual rate of the positive outcome in a group (the ground truth prevalence).
  • C) The rate at which the algorithm makes predictions.
  • D) The cost of operating the algorithmic system.
Answer **B)** The actual rate of the positive outcome in a group (the ground truth prevalence). *Explanation:* Section 15.1.2 defines the base rate as the actual prevalence of the target outcome in a group. In criminal justice, it is the actual recidivism rate; in healthcare, it is the actual rate of a medical condition; in lending, it is the actual default rate. Different base rates across groups are what trigger the impossibility theorem — when base rates are equal, the fairness definitions converge.

8. The chapter argues that choosing a fairness metric is:

  • A) A straightforward technical decision that can be made by data scientists alone.
  • B) Unnecessary because the impossibility theorem shows fairness is unachievable.
  • C) A political and ethical decision, not merely a technical one, because different metrics encode different values.
  • D) Always determined by applicable law, leaving no room for choice.
Answer **C)** A political and ethical decision, not merely a technical one, because different metrics encode different values. *Explanation:* Section 15.6.2 states this directly: "The choice is political." Prioritizing demographic parity reflects a commitment to equal outcomes. Prioritizing equalized odds reflects a commitment to equal error rates. Prioritizing calibration reflects a commitment to predictive accuracy. These are different value commitments, and the impossibility theorem means you must choose among them. No purely technical process can make this choice.

9. Equal opportunity, a relaxed version of equalized odds, requires:

  • A) Equal selection rates across groups.
  • B) Equal true positive rates across groups, without constraining false positive rates.
  • C) Equal positive predictive values across groups.
  • D) Equal overall accuracy across groups.
Answer **B)** Equal true positive rates across groups, without constraining false positive rates. *Explanation:* Section 15.3.1 explains that equal opportunity is a relaxation of equalized odds that requires only that the TPR (sensitivity/recall) be equal across groups. This ensures that qualified/positive individuals are equally likely to be correctly identified, regardless of group membership. FPR is not constrained, which makes equal opportunity easier to achieve but provides less protection against false positives.

10. The chapter's concrete illustration of the impossibility theorem (Section 15.6.3) uses two groups with base rates of 40% and 20%. If the system achieves calibration, what necessarily follows?

  • A) The selection rates will be equal across groups, satisfying demographic parity.
  • B) The false positive rates will be equal across groups, satisfying equalized odds.
  • C) The selection rates and false positive rates will differ across groups, violating demographic parity and equalized odds.
  • D) The overall accuracy will decline below acceptable levels.
Answer **C)** The selection rates and false positive rates will differ across groups, violating demographic parity and equalized odds. *Explanation:* Section 15.6.3 demonstrates that when a system is calibrated and base rates differ, more members of the higher-base-rate group will be predicted positive (violating demographic parity), and the false positive rates will differ (violating equalized odds). The mathematics is unforgiving — calibration forces unequal selection and unequal error rates when the underlying reality differs across groups.
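The arithmetic behind this can be sketched with Bayes' rule. Assuming, hypothetically, that both groups share the same TPR (0.8) and PPV (0.7), the identity PPV = b·TPR / (b·TPR + (1−b)·FPR) pins down each group's FPR:

```python
# Hedged sketch of the arithmetic (hypothetical TPR/PPV values): from
#   PPV = b*TPR / (b*TPR + (1-b)*FPR)
# solving for FPR shows that equal TPR and equal PPV force unequal FPRs
# whenever the base rates b differ.

def implied_fpr(base_rate, tpr, ppv):
    """FPR implied by a group's base rate, TPR, and PPV."""
    return base_rate * tpr * (1 - ppv) / ((1 - base_rate) * ppv)

tpr, ppv = 0.8, 0.7                      # assumed shared by both groups
fpr_high = implied_fpr(0.40, tpr, ppv)   # base rate 40% -> FPR ~0.229
fpr_low = implied_fpr(0.20, tpr, ppv)    # base rate 20% -> FPR ~0.086

sel_high = 0.40 * tpr + 0.60 * fpr_high  # selection rates differ too,
sel_low = 0.20 * tpr + 0.80 * fpr_low    # so demographic parity also fails
```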

Section 2: True/False with Justification (1 point each)

11. "A system that satisfies calibration cannot produce racially disparate outcomes."

Answer **False.** *Explanation:* Section 15.4.4 explicitly states that a perfectly calibrated system *can* coexist with dramatically disparate outcomes — higher selection rates, more detentions, or more resource allocation for one group over another. Calibration ensures only that the system's predictions are equally *accurate* across groups, not that outcomes are *equal*. If base rates differ, a calibrated system will produce different selection rates that mirror those base rate differences.

12. "The impossibility theorem applies only when base rates differ across groups. If base rates are equal, multiple fairness definitions can be simultaneously satisfied."

Answer **True.** *Explanation:* Section 15.6.4 explicitly confirms this. The impossibility result holds when base rates differ across groups. If the base rates are equal, the fairness definitions converge and can be simultaneously satisfied. This is one reason why addressing underlying social inequality matters even from a purely technical perspective — equalizing base rates relaxes the impossibility constraint.

13. "Individual fairness avoids the political challenges of group fairness because it does not require thinking about groups at all."

Answer **False.** *Explanation:* Section 15.5.3 directly challenges this claim: "Individual fairness does not escape politics. It hides them in the definition of similarity." Defining what makes two individuals "similar" for a given task is itself a value-laden decision. If the similarity metric includes features shaped by structural inequality (credit history, educational background, zip code), individual fairness will encode those inequalities as neutral facts.

14. "The four-fifths rule and demographic parity are essentially the same concept, differing only in the threshold used."

Answer **True** (with qualification). *Explanation:* Section 15.2.3 notes that the four-fifths rule (disparate impact framework from Chapter 14) is "essentially a relaxed form of demographic parity." Demographic parity requires equal selection rates; the four-fifths rule permits selection rates to differ by up to 20% (a ratio above 0.8). The conceptual foundation is the same — comparing selection rates across groups — but the four-fifths rule allows a margin of tolerance.
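The relationship can be expressed as a one-line check. A sketch; the 0.8 threshold is the only number the rule fixes, everything else here is illustrative:

```python
# Sketch of the four-fifths check; the 0.8 threshold comes from the
# disparate impact framework, the example rates are hypothetical.

def passes_four_fifths(rate_disadvantaged, rate_advantaged):
    """True if the selection-rate ratio is at least 0.8."""
    return rate_disadvantaged / rate_advantaged >= 0.8

ok = passes_four_fifths(0.52, 0.62)   # ratio ~0.839 -> passes
bad = passes_four_fifths(0.35, 0.50)  # ratio 0.70   -> fails
# Strict demographic parity is the limiting case where the ratio must be 1.0.
```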

15. "The fairness-accuracy trade-off means that any intervention to improve fairness necessarily reduces overall accuracy."

Answer **False.** *Explanation:* The word "necessarily" makes the statement too strong. The chapter discusses the fairness-accuracy trade-off: when a system is optimized purely for accuracy and the underlying data reflects group differences, imposing fairness constraints typically requires adjusting predictions in ways that reduce overall accuracy. But this is a general tendency, not an absolute rule — in some cases, addressing bias can improve both fairness and accuracy (e.g., when bias causes the model to ignore relevant information about underrepresented groups). The trade-off is most severe when base rates differ significantly.

Section 3: Short Answer (2 points each)

16. Explain the impossibility theorem in a way that a non-technical person (e.g., a judge or a hospital administrator) could understand. Use a concrete example — do not rely on mathematical notation. Your explanation should convey why the theorem matters for real-world decisions.

Sample Answer: Imagine a risk assessment tool used to decide which defendants should be held before trial. There are two goals that both seem obviously fair:

**Goal 1 (Calibration):** If the tool says a defendant is "high risk," that should mean the same thing regardless of the defendant's race. A "7 out of 10" risk score should represent the same probability of reoffending for a Black defendant as for a white defendant.

**Goal 2 (Equal error rates):** The tool should make the same types of mistakes for all racial groups. Specifically, innocent people who will not reoffend should face the same chance of being wrongly labeled "high risk," regardless of race.

Both goals seem obviously right. But mathematically, you cannot achieve both at the same time — unless the actual rate of reoffending is identical across racial groups. In practice, these rates differ (for reasons rooted in poverty, policing patterns, and historical discrimination), which means any tool will inevitably violate one goal to satisfy the other. This matters because it means there is no "perfectly fair" algorithm. Every system involves a trade-off, and the choice of which trade-off to accept is a decision about values — not a technical problem that engineers can solve alone.

*Key points for full credit:*

  • Explains the theorem without mathematical notation
  • Uses a concrete example
  • Conveys the key insight: you must choose, and the choice is about values

17. The chapter distinguishes between group fairness and individual fairness. Construct a specific scenario in which a system satisfies individual fairness (similar individuals get similar treatment) but violates demographic parity (selection rates differ dramatically across groups). Explain why the two definitions diverge in this case.

Sample Answer: Consider a loan approval system. The similarity metric for individual fairness is based on credit score, income, and debt-to-income ratio. Two individuals with identical credit scores, incomes, and debt ratios receive identical decisions, regardless of race. The system satisfies individual fairness — similar individuals are treated similarly.

However, due to historical redlining, wealth disparities, and employment discrimination, Black applicants as a group have lower average credit scores and higher average debt-to-income ratios than white applicants. The system, applying the same criteria to everyone, approves 65% of white applicants and 35% of Black applicants — a clear violation of demographic parity.

The definitions diverge because individual fairness treats structural inequality as a given and asks only whether the system treats similarly-situated individuals similarly. Demographic parity asks whether the system produces equitable group-level outcomes. When the "similarity" criteria are themselves shaped by historical discrimination, treating individuals "similarly" based on those criteria perpetuates the group-level disparities.

*Key points for full credit:*

  • Constructs a clear, specific scenario
  • Shows individual fairness satisfied and demographic parity violated
  • Explains the structural reason for the divergence
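The mechanics of this divergence can be sketched with hypothetical numbers: one deterministic approval rule, two groups whose score distributions differ.

```python
# Hedged sketch of the loan scenario above (hypothetical scores): one
# deterministic rule treats identical applicants identically, yet group
# selection rates diverge because the score distributions differ.

def approve(credit_score):
    """Identical rule for every applicant."""
    return credit_score >= 650

# Hypothetical distributions shaped by historical inequality.
scores_white = [700, 680, 720, 640, 660, 690, 710, 630, 670, 700]
scores_black = [600, 640, 660, 580, 620, 700, 610, 590, 630, 680]

rate_white = sum(approve(s) for s in scores_white) / len(scores_white)  # 0.8
rate_black = sum(approve(s) for s in scores_black) / len(scores_black)  # 0.3
# Individual fairness holds (same inputs -> same output); demographic
# parity fails (0.8 vs. 0.3 approval rates).
```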

18. The chapter argues that the impossibility theorem means "technical solutions are insufficient" (Section 15.6.2). What does it mean for a problem to be beyond technical solution? What kind of process does the chapter suggest is needed instead of — or in addition to — technical fairness interventions?

Sample Answer: A problem is "beyond technical solution" when no algorithm, model, or mathematical technique can resolve it — because the problem involves competing values that require a choice between them, and no mathematical formula can determine which values should take priority.

The impossibility theorem proves that multiple fairness definitions cannot be simultaneously satisfied. Therefore, building a "fair" system requires *choosing* which fairness definition to prioritize — and that choice involves moral and political judgment about what matters most: equal outcomes, equal accuracy, equal error rates, or equal treatment of similar individuals. No amount of engineering can substitute for this deliberation.

The chapter suggests that what is needed is *transparent deliberation among stakeholders* — a process in which affected communities, domain experts, policymakers, and system designers discuss which trade-offs are acceptable and document the rationale for their choices. This process should be inclusive (involving those affected by the system), transparent (publicly documented), and accountable (subject to review and revision).

*Key points for full credit:*

  • Explains what "beyond technical solution" means
  • Connects to the impossibility theorem
  • Describes the kind of deliberative process needed

19. Using a specific example, explain how two different fairness definitions can lead to contradictory recommendations for the same system. What does this contradiction imply for policymakers who want to mandate "fair algorithms" through regulation?

Sample Answer: Consider a hiring algorithm that produces different selection rates for men and women: 50% of men and 35% of women are selected. By demographic parity, the system is unfair — it selects a lower proportion of women. The remedy would be to adjust the system to select equal proportions of each group.

Now suppose we examine the system's calibration: among those selected, 80% of both men and women prove to be successful employees. The system is calibrated — its predictions are equally accurate for both groups. By calibration, the system is fair, and adjusting selection rates (to achieve demographic parity) would reduce calibration, making the predictions *less* accurate for one group. The recommendations contradict: demographic parity demands adjustment; calibration demands none.

This implies that regulations mandating "fair algorithms" cannot simply require "fairness" — they must specify *which definition* of fairness they mean. A regulation requiring demographic parity produces different algorithmic designs than one requiring calibration. Policymakers must confront the impossibility theorem and make an explicit value judgment, which many regulatory frameworks have been reluctant to do.

*Key points for full credit:*

  • Provides a specific example with contradictory recommendations
  • Names the two fairness definitions in play
  • Draws out the regulatory implication

Section 4: Applied Scenario (5 points)

20. Read the following scenario and answer all parts.

Scenario: CreditFair Pro

A fintech company launches "CreditFair Pro," a loan approval algorithm designed with fairness in mind. The system processes applications from two demographic groups (Group A and Group B). After one year of operation, the company publishes the following results:

| Metric | Group A (n=5,000) | Group B (n=5,000) |
| --- | --- | --- |
| Base rate (actually repaid loan) | 75% | 60% |
| Selection rate (approved for loan) | 62% | 52% |
| True positive rate (qualified applicants correctly approved) | 73% | 75% |
| False positive rate (unqualified applicants incorrectly approved) | 30% | 17% |
| Positive predictive value (approved applicants who repay) | 88% | 87% |

(a) Evaluate whether CreditFair Pro satisfies demographic parity. Show your calculation and explain whether the difference is meaningful. (1 point)

(b) Evaluate whether CreditFair Pro satisfies equalized odds. Examine both the TPR and FPR differences. (1 point)

(c) Evaluate whether CreditFair Pro satisfies calibration. Examine the PPV difference. (1 point)

(d) Given the base rate difference (75% vs. 60%), how does the impossibility theorem predict the relationship between these metrics? Is the CreditFair Pro data consistent with what the theorem predicts? Explain. (1 point)

(e) A consumer advocacy group argues that the 10-percentage-point difference in selection rates (62% vs. 52%) is unfair because Group B members are denied loans at a higher rate. The company responds that the system is calibrated: approved applicants repay at nearly identical rates in both groups. Write a paragraph evaluating both positions and explaining what the chapter would say about this disagreement. (1 point)

Sample Answer:

**(a)** Demographic parity requires equal selection rates. Group A: 62%, Group B: 52%. The difference is 10 percentage points, and the ratio is 52/62 = 0.839 — above the four-fifths threshold (0.8) but well below perfect parity. **Demographic parity is not satisfied** (the difference far exceeds the common 0.05 tolerance), though the system narrowly clears the legal disparate impact threshold.

**(b)** Equalized odds requires equal TPR and equal FPR across groups. TPR difference: |73% - 75%| = 2 percentage points, within a reasonable tolerance. FPR difference: |30% - 17%| = 13 percentage points, a large gap: unqualified Group A applicants are far more likely to be incorrectly approved than unqualified Group B applicants. **Equalized odds is not satisfied**, although equal opportunity (which constrains only the TPR) approximately holds.

**(c)** Calibration requires equal PPV across groups. PPV difference: |88% - 87%| = 1 percentage point, within tolerance. **Calibration is approximately satisfied.** Among approved applicants, repayment rates are nearly identical across groups.

**(d)** The impossibility theorem predicts that when base rates differ (75% vs. 60%), a calibrated system cannot also satisfy equalized odds or demographic parity. The CreditFair Pro data is consistent with this prediction: the system achieves approximate calibration, but at the cost of unequal false positive rates (violating equalized odds) and unequal selection rates (violating demographic parity). The 13-point FPR gap and the 10-point selection rate gap are the "price" of making approval decisions equally reliable across groups with different base rates.

**(e)** The consumer advocacy group's concern is legitimate under the framework of demographic parity: Group B members are approved at a lower rate, which means fewer Group B members receive the economic benefits of credit access. The company's defense is also legitimate, but narrower than it sounds: its approvals are equally reliable across groups (calibration), yet its errors are not equally distributed, since unqualified Group A applicants are approved far more often than unqualified Group B applicants.

The chapter would say that this disagreement is *not* a failure of analysis — it is the impossibility theorem in action. The two positions reflect different value commitments: the advocacy group prioritizes equal outcomes (demographic parity), while the company prioritizes predictive accuracy (calibration). Neither is objectively correct. The choice between them is political and ethical, and it should be made through transparent, inclusive deliberation — not by the algorithm's designers alone.
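For readers who want to audit tables like the one in this scenario, the reported rates are not independent: the selection rate and PPV follow from the base rate, TPR, and FPR. A hedged sketch with hypothetical inputs (not the scenario's exact figures) shows why equal error rates force unequal PPVs when base rates differ:

```python
# Hedged sketch (hypothetical groups): given a base rate, TPR, and FPR,
# the selection rate and PPV are fixed by definition, so any published
# fairness table can be cross-checked for internal consistency.

def derived_metrics(base_rate, tpr, fpr):
    """Selection rate and PPV implied by base rate, TPR, and FPR."""
    selection = base_rate * tpr + (1 - base_rate) * fpr
    ppv = base_rate * tpr / selection
    return selection, ppv

# Equal TPR and FPR across groups, but different base rates:
sel_a, ppv_a = derived_metrics(0.75, 0.75, 0.20)  # PPV ~0.92
sel_b, ppv_b = derived_metrics(0.60, 0.75, 0.20)  # PPV ~0.85
# Equal error rates force unequal PPVs: equalized odds and calibration
# cannot both hold when base rates differ.
```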

Scoring & Review Recommendations

| Score Range | Assessment | Next Steps |
| --- | --- | --- |
| Below 50% (< 14 pts) | Needs review | Re-read Sections 15.1-15.4 carefully, redo Part A exercises |
| 50-69% (14-19 pts) | Partial understanding | Review the impossibility theorem and work through the concrete illustration |
| 70-85% (20-23 pts) | Solid understanding | Ready to proceed to Chapter 16; review any missed topics |
| Above 85% (24-28 pts) | Strong mastery | Proceed to Chapter 16: Transparency, Explainability, and the Black Box Problem |
| Section | Points Available |
| --- | --- |
| Section 1: Multiple Choice | 10 points (10 questions x 1 pt) |
| Section 2: True/False with Justification | 5 points (5 questions x 1 pt) |
| Section 3: Short Answer | 8 points (4 questions x 2 pts) |
| Section 4: Applied Scenario | 5 points (5 parts x 1 pt) |
| **Total** | **28 points** |