Chapter 31: Key Takeaways

  1. Fairness is a family of incompatible mathematical criteria, not a single property. Demographic parity, equalized odds, equal opportunity, predictive parity, and calibration by group each encode a different ethical commitment about what "fair" means. The impossibility theorem (Chouldechova, 2017; Kleinberg, Mullainathan, and Raghavan, 2016) proves that calibration and equalized odds cannot hold simultaneously when base rates differ across groups, barring a perfect predictor — and base rates differ in virtually every real-world setting. Every deployment must explicitly choose which criterion to enforce, accept the tradeoffs the impossibility theorem imposes, and document the reasoning. "We optimized for fairness" is meaningless without specifying which fairness, for which groups, at what threshold.
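A minimal synthetic sketch (invented data, not from the case studies) of why the criteria are distinct: the same predictor can satisfy demographic parity while violating equal opportunity.

```python
def selection_rate(y_pred):
    """Fraction of the group that receives the positive decision."""
    return sum(y_pred) / len(y_pred)

def true_positive_rate(y_true, y_pred):
    """Fraction of truly positive cases the model accepts."""
    positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(positives) / len(positives)

# Group A: the predictor recovers the labels exactly.
y_true_a, y_pred_a = [1, 1, 0, 0], [1, 1, 0, 0]
# Group B: the same selection rate, but half the true positives are missed.
y_true_b, y_pred_b = [1, 1, 0, 0], [1, 0, 1, 0]

# Demographic parity holds (equal selection rates)...
assert selection_rate(y_pred_a) == selection_rate(y_pred_b) == 0.5
# ...while equal opportunity fails (unequal true positive rates).
assert true_positive_rate(y_true_a, y_pred_a) == 1.0
assert true_positive_rate(y_true_b, y_pred_b) == 0.5
```

Optimizing the first criterion would leave the second violation untouched, which is why "which fairness" must be stated explicitly.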

  2. Removing protected attributes from features does not prevent discrimination. Proxy features — zip code, employer, credit history patterns, language — correlate with protected attributes because of structural inequality. A model that has never seen a race variable can reproduce the statistical footprint of racial discrimination through these proxies. Fairness requires examining model outcomes disaggregated by protected group, not merely auditing the feature list. The Meridian Financial case study demonstrated this: zip code was the leading predictor of credit risk for Black applicants despite race being absent from the model.
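A hypothetical sketch of the disaggregated-outcome audit this takeaway calls for: the protected attribute is kept alongside the feature matrix purely for auditing, never as a model input. Data and names are illustrative.

```python
from collections import defaultdict

def approval_rates_by_group(groups, approved):
    """Approval rate per protected group. The group labels live alongside
    the feature matrix for auditing; they are never model inputs."""
    totals, hits = defaultdict(int), defaultdict(int)
    for g, a in zip(groups, approved):
        totals[g] += 1
        hits[g] += a
    return {g: hits[g] / totals[g] for g in totals}

groups   = ["A", "A", "A", "A", "B", "B", "B", "B"]
approved = [1, 1, 1, 0, 1, 0, 0, 0]      # decisions from an attribute-blind model
rates = approval_rates_by_group(groups, approved)
assert rates == {"A": 0.75, "B": 0.25}   # disparity visible only when disaggregated
```

A feature-list audit would pass this model; only the outcome audit exposes the proxy-driven gap.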

  3. The technical toolkit spans three intervention points, each with distinct tradeoffs. Pre-processing (reweighing, disparate impact removal) modifies the training data — model-agnostic but limited to correcting marginal statistical dependence. In-processing (Fairlearn's ExponentiatedGradient, adversarial debiasing) modifies the training algorithm — more powerful but model-specific and harder to explain to regulators. Post-processing (threshold adjustment, reject option classification) modifies predictions after training — preserves model validation artifacts and is the most transparent, but cannot address root causes. Layered interventions — post-processing for immediate relief, pre-processing for the next retraining, feature engineering for root-cause resolution — are more effective than any single-point fix.
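A sketch of the simplest post-processing intervention, group-specific thresholds, on invented scores. In practice the cutoffs are searched on validation data (Fairlearn's ThresholdOptimizer automates that search); the values here are illustrative.

```python
def apply_group_thresholds(scores, groups, thresholds):
    """Post-processing: convert scores to decisions with per-group cutoffs,
    leaving the trained model and its validation artifacts untouched."""
    return [int(s >= thresholds[g]) for s, g in zip(scores, groups)]

# Group B's score distribution sits lower, so a single 0.50 cutoff would
# select it far less often; a group-specific cutoff equalizes selection rates.
scores = [0.62, 0.55, 0.48, 0.52, 0.45, 0.38]
groups = ["A", "A", "A", "B", "B", "B"]
decisions = apply_group_thresholds(scores, groups, {"A": 0.50, "B": 0.40})
assert decisions == [1, 1, 0, 1, 1, 0]   # equal selection rates per group
```

The transparency is the appeal: the intervention is a pair of numbers a regulator can inspect. The limitation is equally clear — the underlying scores, and whatever proxies shaped them, are unchanged.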

  4. The fairness-accuracy tradeoff is real but often modest. Imposing a fairness constraint on an unconstrained model cannot improve accuracy and typically reduces it — a mathematical consequence of shrinking the feasible set of the optimization. But empirically, the first several percentage points of fairness improvement are often nearly free. Meridian Financial lost 0.01 AUC (0.83 to 0.82) while moving from a four-fifths ratio of 0.69 to 0.86. StreamRec lost 3.2% relative Hit@10 while nearly doubling the exposure equity for the most underserved creator group. The severe tradeoffs feared by engineering teams rarely materialize in practice.
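For reference, the four-fifths ratio cited above is the selection rate of the least-selected group divided by that of the most-selected group. The rates below are invented solely to reproduce the cited ratios, not Meridian's actual figures.

```python
def four_fifths_ratio(selection_rates):
    """Selection rate of the least-selected group over the most-selected."""
    return min(selection_rates.values()) / max(selection_rates.values())

# Invented selection rates, chosen only to reproduce the ratios cited above.
before = {"group_a": 0.29, "group_b": 0.20}
after  = {"group_a": 0.28, "group_b": 0.24}
assert round(four_fifths_ratio(before), 2) == 0.69
assert round(four_fifths_ratio(after), 2) == 0.86
```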

  5. Intersectional analysis is essential because single-attribute audits can mask compound disparities. A model may satisfy fairness criteria for race (averaged over gender) and for gender (averaged over race) while being profoundly unfair to specific intersectional groups. The StreamRec audit found that new Arabic-language creators had an exposure equity ratio of 0.11 — a disparity invisible in either the language-only or tenure-only analysis. Intersectional fairness requires computing metrics for cross-product groups, managing the statistical challenge of small group sizes, and reporting at multiple granularities.
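A sketch of the cross-product computation, with a minimum-cell-size guard for the small-group problem the takeaway mentions. The records are invented: each language marginal and each tenure marginal looks mild, while one intersectional cell is at zero.

```python
from collections import defaultdict

def exposure_by_intersection(records, min_n=2):
    """Mean exposure per (language, tenure) cell; None flags cells too small
    for a stable estimate, which should be reported rather than hidden."""
    cells = defaultdict(list)
    for lang, tenure, exposed in records:
        cells[(lang, tenure)].append(exposed)
    return {c: (sum(v) / len(v) if len(v) >= min_n else None)
            for c, v in cells.items()}

records = [("ar", "new", 0), ("ar", "new", 0), ("ar", "old", 1),
           ("en", "new", 1), ("en", "new", 0), ("en", "old", 1), ("en", "old", 1)]
rates = exposure_by_intersection(records)
assert rates[("ar", "new")] == 0.0      # disparity only the cross-product reveals
assert rates[("ar", "old")] is None     # one record: too small to report
```

Here the "ar" and "new" marginals each sit at 1/3, yet the ("ar", "new") cell is 0.0 — the compound disparity a single-attribute audit averages away.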

  6. Organizational structures sustain fairness; technical fixes do not. A fairness review board (FRB) selects the criteria, sets thresholds, reviews audits, and makes the ethical tradeoffs that no algorithm can make. A metric selection framework maps harms to criteria. Ongoing monitoring — automated metric computation at every retraining, alerting thresholds, quarterly FRB review — catches drift before it causes harm. Without these structures, a one-time fairness fix erodes as the data shifts, the population changes, and the model retrains. Fairness is a continuous practice, not a one-time assessment.
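A sketch of the retraining-time monitoring check described above. The metric name, floor value, and escalation field are illustrative, not the book's specified configuration.

```python
ALERT_FLOOR = 0.80   # illustrative: minimum acceptable four-fifths ratio

def check_fairness(metric_name, value, floor=ALERT_FLOOR):
    """Run at every retraining; return an alert record on breach, else None."""
    if value < floor:
        return {"metric": metric_name, "value": value,
                "floor": floor, "action": "escalate to FRB"}
    return None

assert check_fairness("four_fifths_ratio", 0.86) is None   # healthy model
alert = check_fairness("four_fifths_ratio", 0.74)          # drift caught
assert alert["action"] == "escalate to FRB"
```

The point is the wiring, not the arithmetic: the check runs automatically on every retrain, and a breach produces a record routed to the review board rather than a silent log line.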

  7. Fairness in marketplace systems (recommendations, hiring, lending) has multiple stakeholders whose interests can conflict. StreamRec's recommendation fairness has two sides: creator fairness (equitable exposure) and user fairness (equitable recommendation quality). Maximizing one can harm the other — surfacing underexposed content may reduce short-term engagement for users. Meridian Financial's credit scoring fairness affects applicants (who bear the cost of denial), the lender (who bears the cost of default), and the broader community (who bears the cost of credit access inequality). The FRB's role is to balance these interests explicitly, not to optimize a single metric.