Chapter 31: Fairness in Machine Learning — Definitions, Impossibility Results, Mitigation Strategies, and Organizational Practice

DataField.Dev

27 min read

> "It is impossible to be fair in all ways at once. But it is possible to be explicit about which ways you have chosen to be fair, and to defend that choice."

In This Chapter

Learning Objectives
31.1 There Is No "Fair" Algorithm — Only Specific Fairness Criteria
31.2 The Setup: Protected Attributes, Outcomes, and Predictions
31.3 Group Fairness Metrics
31.4 Individual Fairness
31.5 Intersectionality and Multi-Group Extensions
31.6 The Impossibility Theorem
31.7 The Fairness-Accuracy Tradeoff
31.8 Pre-Processing Interventions
31.9 In-Processing Interventions
31.10 Post-Processing Interventions
31.11 Fairness Auditing with Fairlearn and AIF360
31.12 Organizational Practice
31.13 Progressive Project M15: Fairness Audit of StreamRec
31.14 Synthesis: The Fairness Audit Framework
31.15 Summary

Exercises Quiz Case Study 01 Case Study 02 Key Takeaways Further Reading

Chapter 31: Fairness in Machine Learning — Definitions, Impossibility Results, Mitigation Strategies, and Organizational Practice

"It is impossible to be fair in all ways at once. But it is possible to be explicit about which ways you have chosen to be fair, and to defend that choice." — Arvind Narayanan, "21 Fairness Definitions and Their Politics" (FAT* Tutorial, 2018)

Learning Objectives

By the end of this chapter, you will be able to:

Define and compute the major fairness metrics — demographic parity, equalized odds, equal opportunity, predictive parity, calibration by group, and individual fairness — and explain why they cannot all hold simultaneously
State and prove the impossibility theorem (Chouldechova, 2017; Kleinberg, Mullainathan, and Raghavan, 2016) and articulate its implications for practice
Implement pre-processing, in-processing, and post-processing fairness interventions using Fairlearn and AIF360
Conduct a fairness audit of a production ML system using the Fairlearn dashboard and programmatic metric computation
Design organizational processes — including fairness review boards, metric selection frameworks, and ongoing monitoring — for sustained fairness practice

31.1 There Is No "Fair" Algorithm — Only Specific Fairness Criteria

A credit scoring model at Meridian Financial achieves AUC 0.83 on held-out data. It does not use race, ethnicity, gender, or any other protected attribute as a feature. The model is "blind" to protected characteristics. It is also, by any nontrivial fairness metric, unfair.

The approval rate for white applicants is 74%. The approval rate for Black applicants is 51%. The model does not know an applicant's race — but it knows their zip code, their employer, and their credit history. These features correlate with race because of historical redlining, employment discrimination, and differential access to credit. The model has learned the statistical footprint of structural racism without ever seeing a race variable. This is proxy discrimination: the use of facially neutral features that correlate with protected attributes strongly enough to reproduce discriminatory outcomes.

This chapter begins with an uncomfortable truth: there is no algorithm that is "fair" in any universal sense. Fairness is not a single property that a model either has or lacks. It is a family of incompatible mathematical criteria, each encoding a different ethical commitment. Choosing a fairness criterion is an ethical decision, not a technical one. The role of the data scientist is to make that choice explicit, to quantify the tradeoffs, and to implement the chosen criterion with engineering rigor.

Know How Your Model Is Wrong: This chapter operationalizes the Part VI theme at its most consequential. Knowing how your model is wrong about aggregate accuracy (calibration, uncertainty) is important. Knowing how your model is wrong about specific groups of people — and whether those errors correlate with protected characteristics — is both an ethical imperative and, in regulated industries, a legal requirement. The impossibility theorem guarantees that your model is wrong about fairness in at least one way. The question is which way you choose, and whether you have made that choice deliberately.

The chapter proceeds in five movements. Section 31.2-31.5 define the major fairness metrics, organized by group fairness (demographic parity, equalized odds, equal opportunity, predictive parity, calibration) and individual fairness. Section 31.6-31.7 present the impossibility theorem and its implications. Section 31.8-31.11 cover mitigation strategies: pre-processing, in-processing, and post-processing interventions with Fairlearn and AIF360. Section 31.12-31.13 address organizational practice: fairness review boards, metric selection, monitoring, and the StreamRec progressive project. Section 31.14 synthesizes the chapter's lessons into a fairness audit framework.

31.2 The Setup: Protected Attributes, Outcomes, and Predictions

Before defining fairness metrics, we establish notation. A model takes features $X$ and produces a prediction $\hat{Y} \in \{0, 1\}$ (binary classification) or a score $S \in [0, 1]$ (probability). The true outcome is $Y \in \{0, 1\}$. A protected attribute $A \in \{0, 1\}$ identifies group membership — for example, $A = 1$ might denote membership in a historically disadvantaged group.

In practice, the protected attribute may be multi-valued (race with multiple categories), intersectional (race $\times$ gender), or unobserved (the model must operate without access to $A$ at inference time). We start with the binary case for clarity and extend to multi-valued and intersectional settings in Section 31.5.

The fundamental quantities are the confusion matrix entries, conditioned on group membership:

Quantity	Definition	Notation
True positive rate (TPR)	$P(\hat{Y}=1 \mid Y=1, A=a)$	$\text{TPR}_a$
False positive rate (FPR)	$P(\hat{Y}=1 \mid Y=0, A=a)$	$\text{FPR}_a$
Positive predictive value (PPV)	$P(Y=1 \mid \hat{Y}=1, A=a)$	$\text{PPV}_a$
False discovery rate (FDR)	$P(Y=0 \mid \hat{Y}=1, A=a)$	$\text{FDR}_a$
Selection rate	$P(\hat{Y}=1 \mid A=a)$	$\text{SR}_a$
Base rate	$P(Y=1 \mid A=a)$	$\text{BR}_a$

Every group fairness metric is a constraint on a specific combination of these quantities across groups.

from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Optional
import numpy as np
import pandas as pd


@dataclass
class FairnessMetrics:
    """Compute and store group fairness metrics for binary classification.

    Computes confusion-matrix-derived fairness metrics conditioned on a
    protected attribute. Supports binary and multi-valued protected attributes.

    Attributes:
        y_true: Ground truth binary labels (0 or 1).
        y_pred: Predicted binary labels (0 or 1).
        y_score: Predicted probabilities (optional, for calibration metrics).
        sensitive: Protected attribute values.
    """

    y_true: np.ndarray
    y_pred: np.ndarray
    y_score: Optional[np.ndarray]
    sensitive: np.ndarray

    def _group_mask(self, group: int) -> np.ndarray:
        """Return boolean mask for a specific group."""
        return self.sensitive == group

    def _confusion_rates(self, group: int) -> Dict[str, float]:
        """Compute confusion-matrix rates for a single group."""
        mask = self._group_mask(group)
        y_t = self.y_true[mask]
        y_p = self.y_pred[mask]

        n_pos = y_t.sum()
        n_neg = len(y_t) - n_pos

        tpr = (y_p[y_t == 1].sum() / n_pos) if n_pos > 0 else 0.0
        fpr = (y_p[y_t == 0].sum() / n_neg) if n_neg > 0 else 0.0
        ppv_denom = y_p.sum()
        ppv = (y_t[y_p == 1].sum() / ppv_denom) if ppv_denom > 0 else 0.0
        selection_rate = y_p.mean()
        base_rate = y_t.mean()

        return {
            "tpr": float(tpr),
            "fpr": float(fpr),
            "ppv": float(ppv),
            "fdr": float(1.0 - ppv),
            "selection_rate": float(selection_rate),
            "base_rate": float(base_rate),
            "n": int(mask.sum()),
        }

    def group_metrics(self) -> Dict[int, Dict[str, float]]:
        """Compute confusion-matrix rates for all groups.

        Returns:
            Dictionary mapping group label to its confusion-matrix rates.
        """
        groups = np.unique(self.sensitive)
        return {int(g): self._confusion_rates(g) for g in groups}

    def demographic_parity_difference(self) -> float:
        """Compute demographic parity difference (max - min selection rate).

        Returns:
            Maximum absolute difference in selection rates across groups.
        """
        metrics = self.group_metrics()
        rates = [m["selection_rate"] for m in metrics.values()]
        return float(max(rates) - min(rates))

    def demographic_parity_ratio(self) -> float:
        """Compute demographic parity ratio (min / max selection rate).

        This is the four-fifths rule metric: ratio < 0.8 indicates
        potential disparate impact under EEOC guidelines.

        Returns:
            Ratio of minimum to maximum selection rate across groups.
        """
        metrics = self.group_metrics()
        rates = [m["selection_rate"] for m in metrics.values()]
        return float(min(rates) / max(rates)) if max(rates) > 0 else 0.0

    def equalized_odds_difference(self) -> float:
        """Compute equalized odds difference.

        Maximum of |TPR_a - TPR_b| and |FPR_a - FPR_b| across all
        group pairs. Equalized odds requires both TPR and FPR to be
        equal across groups.

        Returns:
            Maximum equalized odds violation across groups.
        """
        metrics = self.group_metrics()
        groups = list(metrics.keys())
        max_diff = 0.0
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                tpr_diff = abs(metrics[groups[i]]["tpr"] - metrics[groups[j]]["tpr"])
                fpr_diff = abs(metrics[groups[i]]["fpr"] - metrics[groups[j]]["fpr"])
                max_diff = max(max_diff, tpr_diff, fpr_diff)
        return float(max_diff)

    def equal_opportunity_difference(self) -> float:
        """Compute equal opportunity difference (max TPR gap).

        Equal opportunity requires equal TPR across groups — the model
        is equally likely to correctly identify positive cases regardless
        of group membership.

        Returns:
            Maximum absolute TPR difference across groups.
        """
        metrics = self.group_metrics()
        tprs = [m["tpr"] for m in metrics.values()]
        return float(max(tprs) - min(tprs))

    def predictive_parity_difference(self) -> float:
        """Compute predictive parity difference (max PPV gap).

        Returns:
            Maximum absolute PPV difference across groups.
        """
        metrics = self.group_metrics()
        ppvs = [m["ppv"] for m in metrics.values()]
        return float(max(ppvs) - min(ppvs))

    def summary(self) -> pd.DataFrame:
        """Return a DataFrame summarizing all fairness metrics by group."""
        metrics = self.group_metrics()
        df = pd.DataFrame(metrics).T
        df.index.name = "group"
        return df

This FairnessMetrics class will serve as our computational substrate throughout the chapter. We now define each fairness criterion it measures, beginning with group-level metrics.

31.3 Group Fairness Metrics

31.3.1 Demographic Parity (Statistical Parity)

Definition: A classifier satisfies demographic parity if the probability of receiving a positive prediction is the same across all groups:

$$P(\hat{Y} = 1 \mid A = 0) = P(\hat{Y} = 1 \mid A = 1)$$

Equivalently, the selection rate is independent of group membership: $\hat{Y} \perp A$.

Intuition: Demographic parity requires that the model selects positive outcomes at equal rates across groups. If 30% of group 0 applicants are approved, then 30% of group 1 applicants must also be approved.

Strengths: Demographic parity is intuitive, easy to compute, and directly addresses representation. It captures the idea that outcomes should not systematically differ across groups.

Weaknesses: Demographic parity ignores the ground truth label $Y$ entirely. If the base rate genuinely differs across groups (e.g., default rates differ due to legitimate economic factors), enforcing demographic parity requires approving applicants the model predicts will default — or denying applicants the model predicts will repay — solely to equalize rates. It can also be satisfied by a model that is randomly correct: a coin flip satisfies demographic parity perfectly.

The Four-Fifths Rule. U.S. employment law provides a practical threshold. The Uniform Guidelines on Employee Selection Procedures (1978) codify the four-fifths rule: if the selection rate for any protected group is less than four-fifths (80%) of the selection rate for the group with the highest rate, the selection procedure is considered to have disparate impact and the burden of proof shifts to the employer to demonstrate business necessity. For credit scoring, ECOA (Equal Credit Opportunity Act) applies an analogous standard, though the four-fifths rule is not the sole test.

def check_four_fifths_rule(
    y_pred: np.ndarray,
    sensitive: np.ndarray,
    threshold: float = 0.8,
) -> Dict[str, object]:
    """Check the four-fifths (80%) rule for disparate impact.

    The four-fifths rule states that the selection rate for any group
    should be at least 80% of the highest group's selection rate.

    Args:
        y_pred: Binary predictions (0 or 1).
        sensitive: Protected attribute values (integer-encoded).
        threshold: Disparate impact ratio threshold (default 0.8).

    Returns:
        Dictionary with selection rates, ratios, and pass/fail status.
    """
    groups = np.unique(sensitive)
    selection_rates = {}
    for g in groups:
        mask = sensitive == g
        selection_rates[int(g)] = float(y_pred[mask].mean())

    max_rate = max(selection_rates.values())
    ratios = {
        g: rate / max_rate if max_rate > 0 else 0.0
        for g, rate in selection_rates.items()
    }
    violations = {g: r for g, r in ratios.items() if r < threshold}

    return {
        "selection_rates": selection_rates,
        "disparate_impact_ratios": ratios,
        "threshold": threshold,
        "violations": violations,
        "passes": len(violations) == 0,
    }

31.3.2 Equalized Odds

Definition: A classifier satisfies equalized odds (Hardt, Price, and Srebro, 2016) if the prediction $\hat{Y}$ is conditionally independent of the protected attribute $A$ given the true outcome $Y$:

$$P(\hat{Y} = 1 \mid Y = y, A = 0) = P(\hat{Y} = 1 \mid Y = y, A = 1) \quad \text{for } y \in \{0, 1\}$$

This requires both equal TPR and equal FPR across groups.

Intuition: Equalized odds requires that the model makes errors at equal rates across groups, conditioned on the true outcome. Among applicants who would actually repay the loan, the approval rate must be the same regardless of group. Among applicants who would actually default, the false approval rate must also be the same.

Strengths: Equalized odds accounts for the true label $Y$. If the base rate genuinely differs across groups, equalized odds permits different selection rates — it requires equal error rates, not equal selection rates. This makes it more compatible with accuracy than demographic parity.

Weaknesses: Equalized odds requires access to the true label $Y$ for evaluation, which is often delayed (e.g., loan defaults occur 6-12 months after origination). It is also sensitive to label bias: if the ground truth labels themselves reflect historical discrimination (e.g., past lending decisions were discriminatory, so the "default" labels are conditioned on biased approvals), equalized odds perpetuates that bias.

31.3.3 Equal Opportunity

Definition: A classifier satisfies equal opportunity (Hardt, Price, and Srebro, 2016) if the TPR is equal across groups:

$$P(\hat{Y} = 1 \mid Y = 1, A = 0) = P(\hat{Y} = 1 \mid Y = 1, A = 1)$$

Equal opportunity is the relaxation of equalized odds that constrains only the true positive rate, not the false positive rate.

Intuition: Among qualified individuals (those with $Y = 1$), the model should identify them at equal rates regardless of group. A qualified Black applicant should have the same probability of approval as a qualified white applicant. Equal opportunity says nothing about false positives — unqualified applicants may be approved at different rates across groups.

Strengths: Equal opportunity focuses on the most ethically salient error: denying a positive outcome to a qualified individual. In credit scoring, this means a creditworthy applicant is not denied because of their group membership. It is less restrictive than equalized odds and therefore easier to achieve while maintaining accuracy.

31.3.4 Predictive Parity

Definition: A classifier satisfies predictive parity if the positive predictive value is equal across groups:

$$P(Y = 1 \mid \hat{Y} = 1, A = 0) = P(Y = 1 \mid \hat{Y} = 1, A = 1)$$

Intuition: Among applicants the model approves, the fraction who actually repay should be the same across groups. If the model approves a pool of white applicants with a 95% repayment rate, the model should also approve a pool of Black applicants with a 95% repayment rate. Predictive parity ensures that a positive prediction means the same thing regardless of group.

Strengths: Predictive parity is directly relevant to decision-makers. A lender wants to know that an approved applicant has the same expected performance regardless of demographic group. It aligns the model's semantics across groups.

Weaknesses: Predictive parity is compatible with large disparities in selection rates and error rates. A model could approve very few members of group 1 while maintaining equal PPV — it simply rejects all borderline cases in that group, achieving high precision at the cost of low recall.

31.3.5 Calibration by Group

Definition: A model is calibrated by group if, for any predicted probability $s$, the true probability of a positive outcome is the same across groups:

$$P(Y = 1 \mid S = s, A = 0) = P(Y = 1 \mid S = s, A = 1) = s$$

This is a stronger condition than predictive parity. It requires that the model's probability estimates are accurate for every group at every threshold — not just at the decision boundary.

Intuition: When the model says "this applicant has a 70% probability of repayment," it should be correct for white applicants and Black applicants alike. Calibration by group ensures that scores are comparable across groups and that any threshold applied to scores treats groups fairly in a predictive sense.

Strengths: Calibration is a property of the probability estimates, independent of the threshold. A well-calibrated model allows decision-makers to choose any threshold and know that the PPV will be (approximately) equal across groups at that threshold. It is also compatible with differing base rates — a well-calibrated model assigns different distributions of scores to groups with different base rates, and that is not unfair.

Verification in practice:

def check_group_calibration(
    y_true: np.ndarray,
    y_score: np.ndarray,
    sensitive: np.ndarray,
    n_bins: int = 10,
) -> pd.DataFrame:
    """Check calibration by group using binned calibration analysis.

    For each group and each score bin, compares the mean predicted
    probability to the observed positive rate. Well-calibrated models
    show agreement across all bins and groups.

    Args:
        y_true: Ground truth binary labels.
        y_score: Predicted probabilities.
        sensitive: Protected attribute values.
        n_bins: Number of calibration bins.

    Returns:
        DataFrame with columns: group, bin_center, mean_predicted,
        observed_rate, bin_count.
    """
    bin_edges = np.linspace(0, 1, n_bins + 1)
    rows = []
    for g in np.unique(sensitive):
        mask = sensitive == g
        y_t_g = y_true[mask]
        y_s_g = y_score[mask]
        for i in range(n_bins):
            lo, hi = bin_edges[i], bin_edges[i + 1]
            bin_mask = (y_s_g >= lo) & (y_s_g < hi)
            if bin_mask.sum() == 0:
                continue
            rows.append({
                "group": int(g),
                "bin_center": float((lo + hi) / 2),
                "mean_predicted": float(y_s_g[bin_mask].mean()),
                "observed_rate": float(y_t_g[bin_mask].mean()),
                "bin_count": int(bin_mask.sum()),
            })
    return pd.DataFrame(rows)

31.4 Individual Fairness

31.4.1 Lipschitz Individual Fairness

Group fairness metrics constrain average behavior across groups. They do not prevent the model from treating two similar individuals from different groups very differently. Individual fairness (Dwork, Zemel, et al., 2012) addresses this gap.

Definition: A model satisfies individual fairness if similar individuals receive similar predictions:

$$d_{\text{output}}(\hat{Y}(x_i), \hat{Y}(x_j)) \leq L \cdot d_{\text{input}}(x_i, x_j)$$

where $d_{\text{input}}$ is a distance metric on the feature space, $d_{\text{output}}$ is a distance metric on predictions, and $L$ is a Lipschitz constant.

The fundamental challenge is defining $d_{\text{input}}$. What does it mean for two loan applicants to be "similar"? If two applicants differ only in their zip code, are they similar? The answer depends on whether you believe zip code reflects a legitimate creditworthiness signal (neighborhood stability) or a proxy for race (redlining). Individual fairness pushes the ethical question into the metric design.

31.4.2 Counterfactual Fairness

Definition (Kusner et al., 2017): A prediction $\hat{Y}$ is counterfactually fair if it would remain the same in a counterfactual world where the individual belonged to a different group:

$$P(\hat{Y}_{A \leftarrow a}(U) = y \mid X = x, A = a) = P(\hat{Y}_{A \leftarrow a'}(U) = y \mid X = x, A = a)$$

where $U$ represents unobserved background variables and $A \leftarrow a'$ denotes a counterfactual intervention setting $A$ to value $a'$.

Intuition: Counterfactual fairness asks: "Would this person have received the same prediction if they had been a different race/gender/etc., with everything else held constant?" This requires a causal model (Chapter 17) that specifies which features are causally downstream of the protected attribute and which are not. Features caused by the protected attribute (e.g., income, if historical discrimination affected earning potential) would be adjusted in the counterfactual; features not caused by the protected attribute (e.g., innate ability, if measurable) would remain unchanged.

Strengths: Counterfactual fairness provides the strongest intuitive alignment with individual-level fairness. It directly formalizes the legal concept of disparate treatment: the outcome should not have changed if the protected attribute had been different.

Challenges: Counterfactual fairness requires a causal graph specifying the causal relationships between the protected attribute, features, and the outcome. This graph is often contested. Whether education quality is causally downstream of race (via school funding inequality) or an independent factor is an empirical and philosophical question with no universally agreed answer. Different causal graphs yield different definitions of counterfactual fairness.

31.5 Intersectionality and Multi-Group Extensions

Real-world fairness analysis cannot examine one protected attribute at a time. A model that is fair with respect to race (when averaged over gender) and fair with respect to gender (when averaged over race) may be profoundly unfair to Black women — a phenomenon documented in facial recognition systems (Buolamwini and Gebru, 2018) and hiring algorithms.

Intersectional fairness requires computing metrics for all relevant cross-groups: race $\times$ gender, race $\times$ age, gender $\times$ disability status, and higher-order combinations. The challenge is statistical: as the number of cross-groups grows, the sample size per group shrinks, and metric estimates become noisy.

def intersectional_fairness_audit(
    y_true: np.ndarray,
    y_pred: np.ndarray,
    sensitive_features: pd.DataFrame,
    min_group_size: int = 30,
) -> pd.DataFrame:
    """Compute fairness metrics for intersectional groups.

    Creates cross-product groups from multiple sensitive features and
    computes per-group metrics. Groups below min_group_size are flagged
    but still reported for transparency.

    Args:
        y_true: Ground truth binary labels.
        y_pred: Predicted binary labels.
        sensitive_features: DataFrame with one column per sensitive attribute.
        min_group_size: Minimum group size for reliable metric estimation.

    Returns:
        DataFrame with intersectional group metrics and reliability flags.
    """
    # Create composite group labels
    group_labels = sensitive_features.apply(
        lambda row: "_".join(str(v) for v in row), axis=1
    )

    rows = []
    for group_label in group_labels.unique():
        mask = group_labels == group_label
        n = mask.sum()
        y_t = y_true[mask]
        y_p = y_pred[mask]

        n_pos = y_t.sum()
        n_neg = n - n_pos

        tpr = float(y_p[y_t == 1].sum() / n_pos) if n_pos > 0 else np.nan
        fpr = float(y_p[y_t == 0].sum() / n_neg) if n_neg > 0 else np.nan
        selection_rate = float(y_p.mean())
        base_rate = float(y_t.mean())
        ppv_denom = y_p.sum()
        ppv = float(y_t[y_p == 1].sum() / ppv_denom) if ppv_denom > 0 else np.nan

        rows.append({
            "group": group_label,
            "n": int(n),
            "base_rate": base_rate,
            "selection_rate": selection_rate,
            "tpr": tpr,
            "fpr": fpr,
            "ppv": ppv,
            "reliable": n >= min_group_size,
        })

    df = pd.DataFrame(rows).sort_values("selection_rate")
    return df

A practical approach is to compute metrics at multiple granularities — single-attribute, pairwise intersections, and the finest intersectional level — and use the coarser levels for summary reporting and the finer levels for detecting patterns masked by aggregation.

31.6 The Impossibility Theorem

We have now defined five group fairness metrics. A natural question: can a model satisfy all of them simultaneously? The answer, in almost all practical settings, is no.

31.6.1 Statement

Theorem (Chouldechova, 2017; Kleinberg, Mullainathan, and Raghavan, 2016): Consider a binary classifier applied to two groups with unequal base rates ($\text{BR}_0 \neq \text{BR}_1$). The following three conditions cannot hold simultaneously unless the classifier is perfect (zero error rate):

Calibration by group (predictive parity): $\text{PPV}_0 = \text{PPV}_1$ and $\text{FDR}_0 = \text{FDR}_1$
Equal false positive rates: $\text{FPR}_0 = \text{FPR}_1$
Equal false negative rates: $\text{FNR}_0 = \text{FNR}_1$

Conditions 2 and 3 together constitute equalized odds. Therefore, calibration and equalized odds are incompatible when base rates differ — which is the case in virtually every real-world setting.

31.6.2 Proof Sketch

The proof follows from the algebraic relationship between the quantities. Define for each group $a$:

PPV: $\text{PPV}_a = P(Y=1 | \hat{Y}=1, A=a)$
FDR: $\text{FDR}_a = 1 - \text{PPV}_a = P(Y=0 | \hat{Y}=1, A=a)$
FPR: $\text{FPR}_a = P(\hat{Y}=1 | Y=0, A=a)$
FNR: $\text{FNR}_a = P(\hat{Y}=0 | Y=1, A=a)$

By Bayes' theorem, we can express the PPV in terms of the other quantities:

$$\text{PPV}_a = \frac{\text{TPR}_a \cdot \text{BR}_a}{\text{TPR}_a \cdot \text{BR}_a + \text{FPR}_a \cdot (1 - \text{BR}_a)}$$

where $\text{TPR}_a = 1 - \text{FNR}_a$.

Suppose conditions 2 and 3 hold: $\text{FPR}_0 = \text{FPR}_1 = f$ and $\text{FNR}_0 = \text{FNR}_1 = m$ (and therefore $\text{TPR}_0 = \text{TPR}_1 = 1 - m$). Then:

$$\text{PPV}_a = \frac{(1 - m) \cdot \text{BR}_a}{(1 - m) \cdot \text{BR}_a + f \cdot (1 - \text{BR}_a)}$$

This is a function of $\text{BR}_a$ (for fixed $f$ and $m$). If $\text{BR}_0 \neq \text{BR}_1$, then $\text{PPV}_0 \neq \text{PPV}_1$, violating condition 1.

The only escape is if $f = 0$ and $m = 0$ (the classifier is perfect). In that case, $\text{PPV}_a = 1$ for all groups regardless of base rate. But a perfect classifier does not exist for any nontrivial prediction problem.

31.6.3 Numerical Demonstration

def demonstrate_impossibility(
    br_0: float = 0.10,
    br_1: float = 0.25,
    fpr: float = 0.05,
    fnr: float = 0.20,
) -> pd.DataFrame:
    """Demonstrate the impossibility theorem numerically.

    Given equal FPR and FNR across two groups with different base rates,
    shows that PPV must differ.

    Args:
        br_0: Base rate for group 0.
        br_1: Base rate for group 1.
        fpr: Common false positive rate.
        fnr: Common false negative rate.

    Returns:
        DataFrame showing PPV, NPV, and selection rates for both groups.
    """
    tpr = 1.0 - fnr
    rows = []
    for group, br in [(0, br_0), (1, br_1)]:
        ppv_num = tpr * br
        ppv_den = tpr * br + fpr * (1 - br)
        ppv = ppv_num / ppv_den if ppv_den > 0 else 0.0

        npv_num = (1 - fpr) * (1 - br)
        npv_den = (1 - fpr) * (1 - br) + fnr * br
        npv = npv_num / npv_den if npv_den > 0 else 0.0

        selection_rate = tpr * br + fpr * (1 - br)

        rows.append({
            "group": group,
            "base_rate": br,
            "tpr": tpr,
            "fpr": fpr,
            "fnr": fnr,
            "ppv": round(ppv, 4),
            "npv": round(npv, 4),
            "selection_rate": round(selection_rate, 4),
        })

    df = pd.DataFrame(rows)
    print("=== Impossibility Theorem Demonstration ===")
    print(f"FPR = {fpr:.2f} (equal), FNR = {fnr:.2f} (equal)")
    print(f"Base rates: group 0 = {br_0:.2f}, group 1 = {br_1:.2f}")
    print(f"\nPPV group 0: {df.loc[0, 'ppv']:.4f}")
    print(f"PPV group 1: {df.loc[1, 'ppv']:.4f}")
    print(f"PPV difference: {abs(df.loc[0, 'ppv'] - df.loc[1, 'ppv']):.4f}")
    print(f"\nSelection rate group 0: {df.loc[0, 'selection_rate']:.4f}")
    print(f"Selection rate group 1: {df.loc[1, 'selection_rate']:.4f}")
    print("\nConclusion: Equal FPR and FNR (equalized odds) with unequal")
    print("base rates necessarily produces unequal PPV (calibration violation).")
    return df


# Example: credit scoring with different default rates by group
# Group 0: 10% default rate, Group 1: 25% default rate
result = demonstrate_impossibility(br_0=0.10, br_1=0.25, fpr=0.05, fnr=0.20)

Running this produces:

=== Impossibility Theorem Demonstration ===
FPR = 0.05 (equal), FNR = 0.20 (equal)
Base rates: group 0 = 0.10, group 1 = 0.25

PPV group 0: 0.6400
PPV group 1: 0.8421
PPV difference: 0.2021

Selection rate group 0: 0.1250
Selection rate group 1: 0.2375

Conclusion: Equal FPR and FNR (equalized odds) with unequal
base rates necessarily produces unequal PPV (calibration violation).

The numbers are stark. With equalized odds enforced (FPR = 0.05, FNR = 0.20 for both groups), the PPV for group 0 is 0.64 while the PPV for group 1 is 0.84. A positive prediction means something very different for the two groups. For group 0, an approved applicant has a 64% probability of being truly creditworthy. For group 1, an approved applicant has an 84% probability. The model's semantic meaning is group-dependent — even though its error rates are group-independent.

31.6.4 Implications for Practice

The impossibility theorem has four practical implications:

Fairness is a choice, not a discovery. There is no metric that captures "fairness" universally. Every deployment requires a deliberate selection among incompatible criteria. The choice depends on the domain, the legal framework, the stakeholders, and the specific harm you are trying to prevent.
The choice must be documented and defended. Because the choice is ethical, not technical, it must be made by people with the authority and context to make ethical decisions — not by the data scientist alone, and certainly not by the optimizer. Section 31.13 covers organizational processes for making and documenting this choice.
"We optimized for fairness" is meaningless without specifying which criterion. Claims of algorithmic fairness that do not specify the exact criterion, the protected attributes, and the measured values are incomplete at best and misleading at worst.
Base rate differences are the root of the tension. The impossibility theorem only binds when base rates differ. Understanding why base rates differ — legitimate differences in qualification, historical discrimination affecting outcomes, measurement bias in the labels — is essential for choosing the right criterion. If base rate differences are themselves the product of historical injustice, calibrating to those base rates perpetuates the injustice. If base rate differences reflect genuine risk differences, ignoring them imposes costs that may harm the population you intend to help (e.g., higher interest rates for all, because the lender cannot differentiate risk).

31.7 The Fairness-Accuracy Tradeoff

Imposing any fairness constraint on an unconstrained model generally reduces overall accuracy. This is not a defect of the method — it is a mathematical consequence of constraining an optimization problem. The unconstrained model maximizes accuracy by definition; any additional constraint can only reduce the feasible set.

The question is not whether there is a tradeoff, but how severe it is. Empirically, the tradeoff is often modest. Hardt et al. (2016) showed that post-processing for equalized odds typically reduces accuracy by 1-3 percentage points on standard benchmarks. In credit scoring, Meridian Financial found that enforcing equal opportunity reduced AUC from 0.83 to 0.81 — within the 0.78 floor set by the risk team.

def plot_fairness_accuracy_frontier(
    y_true: np.ndarray,
    y_score: np.ndarray,
    sensitive: np.ndarray,
    n_thresholds: int = 100,
) -> pd.DataFrame:
    """Compute the fairness-accuracy frontier across thresholds.

    For each threshold, computes accuracy and demographic parity difference.
    The resulting frontier shows the tradeoff between overall accuracy and
    fairness as the decision threshold varies.

    Args:
        y_true: Ground truth binary labels.
        y_score: Predicted probabilities.
        sensitive: Protected attribute values.
        n_thresholds: Number of thresholds to evaluate.

    Returns:
        DataFrame with threshold, accuracy, dp_difference, and dp_ratio.
    """
    thresholds = np.linspace(0.01, 0.99, n_thresholds)
    rows = []
    for t in thresholds:
        y_pred = (y_score >= t).astype(int)
        accuracy = float((y_pred == y_true).mean())

        groups = np.unique(sensitive)
        rates = {
            int(g): float(y_pred[sensitive == g].mean())
            for g in groups
        }
        rate_values = list(rates.values())
        dp_diff = max(rate_values) - min(rate_values)
        dp_ratio = (
            min(rate_values) / max(rate_values)
            if max(rate_values) > 0 else 0.0
        )

        rows.append({
            "threshold": float(t),
            "accuracy": accuracy,
            "dp_difference": dp_diff,
            "dp_ratio": dp_ratio,
            "selection_rates": rates,
        })
    return pd.DataFrame(rows)

The frontier reveals that the tradeoff is typically nonlinear: the first few percentage points of fairness improvement are often "free" (they can be achieved by adjusting the threshold with minimal accuracy loss), while the last few percentage points are expensive (they require substantial accuracy sacrifice). This motivates the search for interventions that improve the Pareto frontier itself, rather than simply sliding along it.

31.8 Pre-Processing Interventions

Pre-processing interventions modify the training data before the model is trained, with the goal of reducing the statistical dependence between the protected attribute and the features or labels.

31.8.1 Reweighing

Reweighing (Kamiran and Calders, 2012) assigns sample weights that correct for the disparity between the observed joint distribution $P(Y, A)$ and the expected joint distribution under independence:

$$w(y, a) = \frac{P(Y = y) \cdot P(A = a)}{P(Y = y, A = a)}$$

This upweights underrepresented (label, group) combinations and downweights overrepresented ones. The reweighed dataset has (approximately) equal base rates across groups.

def compute_reweighing_weights(
    y_true: np.ndarray,
    sensitive: np.ndarray,
) -> np.ndarray:
    """Compute sample weights for the reweighing pre-processing method.

    Adjusts sample weights so that the joint distribution P(Y, A) matches
    the product of marginals P(Y) * P(A), removing statistical dependence
    between the label and the protected attribute.

    Args:
        y_true: Ground truth binary labels.
        sensitive: Protected attribute values.

    Returns:
        Array of sample weights, one per instance.
    """
    n = len(y_true)
    weights = np.ones(n, dtype=float)

    groups = np.unique(sensitive)
    labels = np.unique(y_true)

    p_y = {}
    p_a = {}
    p_ya = {}

    for y_val in labels:
        p_y[y_val] = (y_true == y_val).sum() / n

    for a_val in groups:
        p_a[a_val] = (sensitive == a_val).sum() / n

    for y_val in labels:
        for a_val in groups:
            mask = (y_true == y_val) & (sensitive == a_val)
            p_ya[(y_val, a_val)] = mask.sum() / n

    for i in range(n):
        y_val = y_true[i]
        a_val = sensitive[i]
        joint = p_ya[(y_val, a_val)]
        if joint > 0:
            weights[i] = (p_y[y_val] * p_a[a_val]) / joint

    return weights

Strengths: Reweighing is model-agnostic — any model that accepts sample weights can use it. It does not modify the features or labels, preserving the original data for auditing.

Weaknesses: Reweighing can only correct for marginal statistical dependence between $Y$ and $A$. It cannot correct for proxy discrimination through features that are correlated with $A$ but are not $A$ itself.

31.8.2 Disparate Impact Remover

The disparate impact remover (Feldman et al., 2015) transforms features to remove their correlation with the protected attribute while preserving their rank ordering within each group. For each feature, it maps each group's distribution to a common median distribution, ensuring that the feature's marginal distribution is the same regardless of group membership.

This is a stronger intervention than reweighing: it modifies the feature space itself, making it harder for any downstream model to recover group membership from features.

Tradeoff: The transformation necessarily reduces the feature's predictive power if the feature's relationship with the outcome genuinely differs across groups. In credit scoring, if average income differs across groups due to structural inequality, the disparate impact remover reduces income's ability to predict default — which may increase overall model error while reducing disparate impact.

31.9 In-Processing Interventions

In-processing interventions modify the training algorithm itself to incorporate fairness constraints.

31.9.1 Constrained Optimization with Fairlearn

Fairlearn's ExponentiatedGradient implements constrained optimization for fairness. The key idea (Agarwal et al., 2018) is to reduce the constrained optimization problem to a sequence of cost-sensitive classification problems, where the costs encode the fairness constraint. The algorithm alternates between training a classifier with specific costs and updating the costs based on the constraint violation.

from fairlearn.reductions import (
    ExponentiatedGradient,
    DemographicParity,
    EqualizedOdds,
    TruePositiveRateParity,
)
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier


def train_with_fairness_constraint(
    X_train: np.ndarray,
    y_train: np.ndarray,
    sensitive_train: np.ndarray,
    constraint_name: str = "equalized_odds",
    base_estimator: Optional[object] = None,
) -> object:
    """Train a model with a fairness constraint using Fairlearn.

    Uses ExponentiatedGradient to solve the constrained optimization
    problem: minimize error subject to a fairness constraint.

    Args:
        X_train: Training features.
        y_train: Training labels.
        sensitive_train: Protected attribute for training data.
        constraint_name: One of 'demographic_parity', 'equalized_odds',
            'true_positive_rate_parity'.
        base_estimator: Base classifier (default: LogisticRegression).

    Returns:
        Fitted ExponentiatedGradient model.
    """
    constraints = {
        "demographic_parity": DemographicParity(),
        "equalized_odds": EqualizedOdds(),
        "true_positive_rate_parity": TruePositiveRateParity(),
    }

    if constraint_name not in constraints:
        raise ValueError(
            f"Unknown constraint: {constraint_name}. "
            f"Choose from {list(constraints.keys())}"
        )

    if base_estimator is None:
        base_estimator = LogisticRegression(
            solver="liblinear",
            max_iter=1000,
            random_state=42,
        )

    mitigator = ExponentiatedGradient(
        estimator=base_estimator,
        constraints=constraints[constraint_name],
        max_iter=50,
        nu=1e-6,
    )

    mitigator.fit(
        X_train,
        y_train,
        sensitive_features=sensitive_train,
    )

    return mitigator

31.9.2 Adversarial Debiasing

Adversarial debiasing (Zhang, Lemoine, and Mitchell, 2018) adds an adversary network that tries to predict the protected attribute from the model's predictions. The primary model is trained to maximize prediction accuracy while minimizing the adversary's ability to recover the protected attribute. The resulting model's predictions are approximately independent of the protected attribute, conditional on the features.

import torch
import torch.nn as nn
import torch.optim as optim


class AdversarialDebiasing(nn.Module):
    """Adversarial debiasing model with predictor and adversary networks.

    The predictor learns to predict the outcome Y from features X.
    The adversary learns to predict the sensitive attribute A from
    the predictor's output. The predictor is trained to maximize
    accuracy while minimizing the adversary's ability to recover A.

    Args:
        input_dim: Number of input features.
        hidden_dim: Hidden layer dimension for both networks.
        adversary_loss_weight: Weight of adversary loss in combined
            objective (lambda). Higher values enforce stronger fairness
            at the cost of accuracy.
    """

    def __init__(
        self,
        input_dim: int,
        hidden_dim: int = 64,
        adversary_loss_weight: float = 1.0,
    ):
        super().__init__()
        self.adversary_loss_weight = adversary_loss_weight

        # Predictor network: X -> Y
        self.predictor = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid(),
        )

        # Adversary network: predictor_output -> A
        self.adversary = nn.Sequential(
            nn.Linear(1, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid(),
        )

    def forward(
        self, x: torch.Tensor
    ) -> tuple[torch.Tensor, torch.Tensor]:
        """Forward pass through both networks.

        Args:
            x: Input features tensor.

        Returns:
            Tuple of (prediction logits, adversary logits).
        """
        y_pred = self.predictor(x)
        a_pred = self.adversary(y_pred)
        return y_pred, a_pred


def train_adversarial_debiasing(
    model: AdversarialDebiasing,
    X_train: torch.Tensor,
    y_train: torch.Tensor,
    a_train: torch.Tensor,
    n_epochs: int = 100,
    lr_predictor: float = 1e-3,
    lr_adversary: float = 1e-3,
    batch_size: int = 256,
) -> Dict[str, List[float]]:
    """Train adversarial debiasing model with alternating optimization.

    Each epoch:
    1. Train adversary to predict A from predictor output (adversary step).
    2. Train predictor to predict Y while fooling adversary (predictor step).

    Args:
        model: AdversarialDebiasing model.
        X_train: Training features.
        y_train: Training labels.
        a_train: Protected attribute.
        n_epochs: Number of training epochs.
        lr_predictor: Learning rate for predictor.
        lr_adversary: Learning rate for adversary.
        batch_size: Mini-batch size.

    Returns:
        Dictionary with training history (losses per epoch).
    """
    pred_criterion = nn.BCELoss()
    adv_criterion = nn.BCELoss()

    pred_optimizer = optim.Adam(model.predictor.parameters(), lr=lr_predictor)
    adv_optimizer = optim.Adam(model.adversary.parameters(), lr=lr_adversary)

    dataset = torch.utils.data.TensorDataset(X_train, y_train, a_train)
    loader = torch.utils.data.DataLoader(
        dataset, batch_size=batch_size, shuffle=True
    )

    history = {"pred_loss": [], "adv_loss": [], "combined_loss": []}

    for epoch in range(n_epochs):
        epoch_pred_loss = 0.0
        epoch_adv_loss = 0.0
        n_batches = 0

        for x_batch, y_batch, a_batch in loader:
            # --- Adversary step: maximize adversary accuracy ---
            adv_optimizer.zero_grad()
            with torch.no_grad():
                y_pred = model.predictor(x_batch)
            a_pred = model.adversary(y_pred.detach())
            adv_loss = adv_criterion(a_pred.squeeze(), a_batch)
            adv_loss.backward()
            adv_optimizer.step()

            # --- Predictor step: maximize accuracy, minimize adversary ---
            pred_optimizer.zero_grad()
            y_pred = model.predictor(x_batch)
            a_pred = model.adversary(y_pred)

            pred_loss = pred_criterion(y_pred.squeeze(), y_batch)
            adv_loss_pred = adv_criterion(a_pred.squeeze(), a_batch)

            # Predictor wants to MINIMIZE pred_loss and MAXIMIZE adv_loss
            # (fool the adversary), so we subtract the adversary loss
            combined_loss = pred_loss - model.adversary_loss_weight * adv_loss_pred
            combined_loss.backward()
            pred_optimizer.step()

            epoch_pred_loss += pred_loss.item()
            epoch_adv_loss += adv_loss.item()
            n_batches += 1

        history["pred_loss"].append(epoch_pred_loss / n_batches)
        history["adv_loss"].append(epoch_adv_loss / n_batches)
        history["combined_loss"].append(
            history["pred_loss"][-1]
            - model.adversary_loss_weight * history["adv_loss"][-1]
        )

    return history

The adversary_loss_weight ($\lambda$) parameter controls the fairness-accuracy tradeoff. Setting $\lambda = 0$ recovers the unconstrained model. Increasing $\lambda$ enforces stronger fairness at the cost of accuracy. In practice, $\lambda$ is tuned on a validation set using the fairness metric of interest, and the final value represents the team's explicit tradeoff choice.

31.10 Post-Processing Interventions

Post-processing interventions modify the model's predictions after training, leaving the model itself unchanged. This separation is valuable in regulated settings: the model is validated once, and the post-processing layer can be adjusted independently.

31.10.1 Threshold Adjustment (Hardt et al., 2016)

The simplest and most widely used post-processing method is group-specific threshold adjustment. Instead of applying a single threshold $t$ to the model's score $S$, we apply group-specific thresholds $t_0$ and $t_1$ chosen to satisfy the desired fairness criterion.

For equalized odds, we find thresholds $(t_0, t_1)$ that minimize the equalized odds violation subject to accuracy constraints. For demographic parity, we find thresholds that equalize the selection rates across groups.

from scipy.optimize import minimize_scalar, minimize
from sklearn.metrics import accuracy_score


def find_equalized_odds_thresholds(
    y_true: np.ndarray,
    y_score: np.ndarray,
    sensitive: np.ndarray,
    accuracy_floor: float = 0.0,
) -> Dict[str, object]:
    """Find group-specific thresholds that minimize equalized odds violation.

    Searches over thresholds for each group to minimize the maximum of
    |TPR_0 - TPR_1| and |FPR_0 - FPR_1|, subject to an overall accuracy
    floor.

    Args:
        y_true: Ground truth binary labels.
        y_score: Predicted probabilities.
        sensitive: Protected attribute values (binary: 0 or 1).
        accuracy_floor: Minimum acceptable overall accuracy.

    Returns:
        Dictionary with optimal thresholds, metrics before and after.
    """
    groups = np.unique(sensitive)
    assert len(groups) == 2, "Binary protected attribute required."
    g0, g1 = groups[0], groups[1]

    mask_0 = sensitive == g0
    mask_1 = sensitive == g1

    def _rates(threshold_0: float, threshold_1: float):
        y_pred = np.zeros_like(y_true)
        y_pred[mask_0] = (y_score[mask_0] >= threshold_0).astype(int)
        y_pred[mask_1] = (y_score[mask_1] >= threshold_1).astype(int)

        tpr_0 = y_pred[mask_0 & (y_true == 1)].mean()
        tpr_1 = y_pred[mask_1 & (y_true == 1)].mean()
        fpr_0 = y_pred[mask_0 & (y_true == 0)].mean()
        fpr_1 = y_pred[mask_1 & (y_true == 0)].mean()
        acc = accuracy_score(y_true, y_pred)

        return tpr_0, tpr_1, fpr_0, fpr_1, acc, y_pred

    def _objective(params):
        t0, t1 = params
        tpr_0, tpr_1, fpr_0, fpr_1, acc, _ = _rates(t0, t1)
        if acc < accuracy_floor:
            return 1e6  # Penalty for violating accuracy floor
        eq_odds_violation = max(abs(tpr_0 - tpr_1), abs(fpr_0 - fpr_1))
        return eq_odds_violation

    # Grid search for initialization
    best_params = (0.5, 0.5)
    best_obj = _objective(best_params)
    for t0 in np.linspace(0.1, 0.9, 17):
        for t1 in np.linspace(0.1, 0.9, 17):
            obj = _objective((t0, t1))
            if obj < best_obj:
                best_obj = obj
                best_params = (t0, t1)

    # Refine with Nelder-Mead
    result = minimize(
        _objective,
        x0=best_params,
        method="Nelder-Mead",
        options={"xatol": 1e-4, "fatol": 1e-6, "maxiter": 1000},
    )
    opt_t0, opt_t1 = result.x

    # Compute before/after metrics
    default_threshold = 0.5
    _, _, _, _, acc_before, y_pred_before = _rates(
        default_threshold, default_threshold
    )
    tpr_0_a, tpr_1_a, fpr_0_a, fpr_1_a, acc_after, y_pred_after = _rates(
        opt_t0, opt_t1
    )

    fm_before = FairnessMetrics(y_true, y_pred_before, y_score, sensitive)
    fm_after = FairnessMetrics(y_true, y_pred_after, y_score, sensitive)

    return {
        "thresholds": {int(g0): float(opt_t0), int(g1): float(opt_t1)},
        "before": {
            "accuracy": float(acc_before),
            "eq_odds_diff": float(fm_before.equalized_odds_difference()),
            "dp_diff": float(fm_before.demographic_parity_difference()),
        },
        "after": {
            "accuracy": float(acc_after),
            "eq_odds_diff": float(fm_after.equalized_odds_difference()),
            "dp_diff": float(fm_after.demographic_parity_difference()),
        },
    }

31.10.2 Reject Option Classification

Reject option classification (Kamiran, Karim, and Zhang, 2012) applies fairness corrections only to instances in the uncertainty region — where the model's score is close to the decision boundary. For instances far from the boundary (very high or very low scores), the model's prediction is accepted as-is. For instances near the boundary, the prediction is flipped in favor of the disadvantaged group.

The intuition is that the model is most confident about instances far from the boundary, and the fairness correction has the least accuracy cost when applied to borderline cases. This limits the fairness-accuracy tradeoff to the region where the model is already uncertain.

def reject_option_classification(
    y_score: np.ndarray,
    sensitive: np.ndarray,
    threshold: float = 0.5,
    margin: float = 0.1,
    favorable_group: int = 0,
) -> np.ndarray:
    """Apply reject option classification for fairness.

    In the uncertainty region [threshold - margin, threshold + margin],
    assigns favorable outcomes to the disadvantaged group and unfavorable
    outcomes to the advantaged group. Outside this region, predictions
    are determined by the threshold alone.

    Args:
        y_score: Predicted probabilities.
        sensitive: Protected attribute values.
        threshold: Decision threshold.
        margin: Half-width of the uncertainty region.
        favorable_group: The group that currently receives favorable
            treatment (will be constrained in the uncertainty region).

    Returns:
        Adjusted binary predictions.
    """
    y_pred = (y_score >= threshold).astype(int)

    # Identify the uncertainty region
    in_margin = (y_score >= threshold - margin) & (y_score < threshold + margin)

    # In the uncertainty region:
    # - Disadvantaged group gets favorable outcome (1)
    # - Advantaged group gets unfavorable outcome (0)
    disadvantaged = sensitive != favorable_group

    y_pred[in_margin & disadvantaged] = 1
    y_pred[in_margin & ~disadvantaged] = 0

    return y_pred

31.11 Fairness Auditing with Fairlearn and AIF360

31.11.1 Fairlearn: The MetricFrame

Fairlearn's MetricFrame is the primary tool for fairness auditing. It computes any scikit-learn-compatible metric disaggregated by protected attribute.

from fairlearn.metrics import (
    MetricFrame,
    demographic_parity_difference,
    demographic_parity_ratio,
    equalized_odds_difference,
    selection_rate,
)
from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
    roc_auc_score,
)


def fairlearn_audit(
    y_true: np.ndarray,
    y_pred: np.ndarray,
    y_score: np.ndarray,
    sensitive: np.ndarray,
    group_names: Optional[Dict[int, str]] = None,
) -> Dict[str, object]:
    """Conduct a fairness audit using Fairlearn's MetricFrame.

    Computes standard performance metrics disaggregated by protected
    attribute, plus aggregate fairness metrics.

    Args:
        y_true: Ground truth binary labels.
        y_pred: Binary predictions.
        y_score: Predicted probabilities.
        sensitive: Protected attribute values.
        group_names: Optional mapping from group codes to readable names.

    Returns:
        Dictionary with 'per_group' (DataFrame), 'aggregate' (dict),
        and 'disparity' (dict) results.
    """
    metrics_dict = {
        "accuracy": accuracy_score,
        "precision": precision_score,
        "recall": recall_score,
        "f1": f1_score,
        "selection_rate": selection_rate,
    }

    mf = MetricFrame(
        metrics=metrics_dict,
        y_true=y_true,
        y_pred=y_pred,
        sensitive_features=sensitive,
    )

    per_group = mf.by_group.copy()
    if group_names is not None:
        per_group.index = per_group.index.map(
            lambda g: group_names.get(g, str(g))
        )

    aggregate_fairness = {
        "demographic_parity_difference": float(
            demographic_parity_difference(y_true, y_pred,
                                          sensitive_features=sensitive)
        ),
        "demographic_parity_ratio": float(
            demographic_parity_ratio(y_true, y_pred,
                                     sensitive_features=sensitive)
        ),
        "equalized_odds_difference": float(
            equalized_odds_difference(y_true, y_pred,
                                      sensitive_features=sensitive)
        ),
    }

    disparity = mf.difference(method="between_groups")

    return {
        "per_group": per_group,
        "aggregate": mf.overall.to_dict(),
        "fairness_metrics": aggregate_fairness,
        "disparity": disparity.to_dict(),
    }

31.11.2 AIF360: Bias Detection and Mitigation Pipeline

IBM's AI Fairness 360 (AIF360) provides a more comprehensive pipeline, including pre-processing, in-processing, and post-processing algorithms, plus a rich set of dataset-level and classifier-level fairness metrics.

from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric, ClassificationMetric
from aif360.algorithms.preprocessing import Reweighing as AIF360Reweighing
from aif360.algorithms.postprocessing import (
    CalibratedEqOddsPostprocessing,
    RejectOptionClassification as AIF360ROC,
)


def aif360_audit(
    df: pd.DataFrame,
    label_col: str,
    favorable_label: int,
    protected_col: str,
    privileged_value: int,
    score_col: Optional[str] = None,
    pred_col: Optional[str] = None,
) -> Dict[str, float]:
    """Conduct a fairness audit using AIF360.

    Computes dataset-level and (optionally) classifier-level fairness
    metrics using AIF360's metric classes.

    Args:
        df: DataFrame containing labels, predictions, and protected attribute.
        label_col: Column name for true labels.
        favorable_label: The favorable outcome value (e.g., 1 for approved).
        protected_col: Column name for protected attribute.
        privileged_value: Value of protected attribute for privileged group.
        score_col: Column name for predicted scores (optional).
        pred_col: Column name for predicted labels (optional).

    Returns:
        Dictionary of fairness metric values.
    """
    dataset = BinaryLabelDataset(
        df=df,
        label_names=[label_col],
        protected_attribute_names=[protected_col],
        favorable_label=favorable_label,
        unfavorable_label=1 - favorable_label,
    )

    privileged_groups = [{protected_col: privileged_value}]
    unprivileged_groups = [{protected_col: 1 - privileged_value}]

    dataset_metric = BinaryLabelDatasetMetric(
        dataset,
        unprivileged_groups=unprivileged_groups,
        privileged_groups=privileged_groups,
    )

    results = {
        "statistical_parity_difference": float(
            dataset_metric.statistical_parity_difference()
        ),
        "disparate_impact_ratio": float(
            dataset_metric.disparate_impact()
        ),
    }

    if pred_col is not None:
        pred_df = df.copy()
        pred_dataset = BinaryLabelDataset(
            df=pred_df,
            label_names=[pred_col],
            protected_attribute_names=[protected_col],
            favorable_label=favorable_label,
            unfavorable_label=1 - favorable_label,
        )

        clf_metric = ClassificationMetric(
            dataset,
            pred_dataset,
            unprivileged_groups=unprivileged_groups,
            privileged_groups=privileged_groups,
        )

        results.update({
            "equal_opportunity_difference": float(
                clf_metric.equal_opportunity_difference()
            ),
            "average_odds_difference": float(
                clf_metric.average_odds_difference()
            ),
            "theil_index": float(clf_metric.theil_index()),
        })

    return results

31.11.3 Choosing Between Fairlearn and AIF360

Dimension	Fairlearn	AIF360
Integration	Native scikit-learn compatibility	Requires AIF360 dataset wrappers
Metric computation	`MetricFrame` — any sklearn metric, disaggregated	Specialized metric classes, more metrics
Mitigation	`ExponentiatedGradient`, `ThresholdOptimizer`	Reweighing, adversarial debiasing, calibrated eq. odds
Dashboard	Interactive Fairlearn Dashboard (deprecated) / `MetricFrame` + matplotlib	No built-in dashboard
Maintenance	Actively maintained (Microsoft)	Less active maintenance
Recommendation	Preferred for most production workflows	Useful for its broader algorithm library

In practice, many teams use Fairlearn for day-to-day metric computation and monitoring and draw on AIF360's algorithm library for specific mitigation strategies not available in Fairlearn.

31.12 Organizational Practice

Technical interventions are necessary but not sufficient. Fairness in ML requires organizational structures that ensure the right questions are asked, the right criteria are chosen, and the right monitoring is sustained.

31.12.1 The Fairness Review Board

A fairness review board (FRB) is a cross-functional body that reviews ML systems for fairness before deployment and periodically after deployment. Its composition typically includes:

Data scientists (technical assessment)
Legal/compliance (regulatory requirements)
Domain experts (business context)
Ethicists or external advisors (values alignment)
Representatives from affected communities (lived experience)

The FRB's core responsibilities:

Metric selection. For each model, the FRB selects the fairness criteria to enforce. This decision is based on the domain (what harm are we trying to prevent?), the legal framework (what does the law require?), and stakeholder input (what do affected communities prioritize?). The impossibility theorem guarantees that this choice involves tradeoffs, and the FRB documents the reasoning.
Threshold setting. The FRB sets acceptable bounds for fairness metrics. For example: demographic parity ratio above 0.80 (the four-fifths rule) and equal opportunity difference below 0.05.
Pre-deployment review. Before any model serving protected-attribute-sensitive decisions reaches production, the FRB reviews the fairness audit report.
Ongoing monitoring review. The FRB reviews fairness monitoring dashboards quarterly, or sooner if alerts trigger.

31.12.2 The Fairness Metric Selection Framework

Selecting the right fairness criterion is the most important — and most difficult — decision in fairness practice. The following framework provides structure:

Question	Criteria Favored
Is the primary harm denying a benefit to qualified individuals?	Equal opportunity
Is the primary harm differential error rates?	Equalized odds
Is the primary harm underrepresentation in outcomes?	Demographic parity
Must predictions mean the same thing across groups?	Calibration / predictive parity
Is the decision subject to disparate impact law?	Four-fifths rule (demographic parity ratio)
Is individual-level treatment the core concern?	Individual fairness / counterfactual fairness

In practice, teams often monitor multiple metrics and set binding constraints on one or two. Meridian Financial, operating under ECOA, sets a binding constraint on the four-fifths rule (demographic parity ratio $\geq 0.80$) and monitors equalized odds difference as a secondary metric.

31.12.3 Ongoing Monitoring

Fairness is not a one-time check. Model fairness can degrade over time as the data distribution shifts, the population changes, or upstream features evolve. Ongoing monitoring requires:

Automated metric computation. Fairness metrics computed at every model retraining and every scoring batch.
Alerting thresholds. Alerts when any fairness metric crosses a predefined threshold.
Dashboard. A dashboard showing fairness metrics over time, disaggregated by protected attribute.
Quarterly review. The FRB reviews the monitoring dashboard and the last quarter's metric trends.

@dataclass
class FairnessMonitorConfig:
    """Configuration for ongoing fairness monitoring.

    Defines the metrics to track, the thresholds for alerting,
    and the monitoring cadence.

    Attributes:
        model_name: Name of the monitored model.
        protected_attributes: List of protected attribute column names.
        metrics: Mapping from metric name to alert threshold.
        monitoring_cadence: How often to compute metrics ('daily',
            'weekly', 'per_retrain').
        escalation_policy: Who to notify when thresholds are breached.
    """

    model_name: str
    protected_attributes: List[str]
    metrics: Dict[str, float] = field(default_factory=lambda: {
        "demographic_parity_ratio": 0.80,
        "equalized_odds_difference": 0.10,
        "equal_opportunity_difference": 0.05,
    })
    monitoring_cadence: str = "per_retrain"
    escalation_policy: Dict[str, List[str]] = field(default_factory=lambda: {
        "warning": ["ml-team@company.com"],
        "critical": ["ml-team@company.com", "legal@company.com",
                      "fairness-review-board@company.com"],
    })

    def check_metrics(
        self,
        computed_metrics: Dict[str, float],
    ) -> Dict[str, str]:
        """Check computed metrics against thresholds.

        Args:
            computed_metrics: Dictionary of metric_name -> measured_value.

        Returns:
            Dictionary of metric_name -> status ('pass', 'warning', 'critical').
        """
        results = {}
        for metric_name, threshold in self.metrics.items():
            if metric_name not in computed_metrics:
                results[metric_name] = "missing"
                continue

            value = computed_metrics[metric_name]

            # For ratio metrics, lower is worse
            if "ratio" in metric_name:
                if value >= threshold:
                    results[metric_name] = "pass"
                elif value >= threshold * 0.9:
                    results[metric_name] = "warning"
                else:
                    results[metric_name] = "critical"
            # For difference metrics, higher is worse
            else:
                if value <= threshold:
                    results[metric_name] = "pass"
                elif value <= threshold * 1.1:
                    results[metric_name] = "warning"
                else:
                    results[metric_name] = "critical"

        return results

31.13 Progressive Project M15: Fairness Audit of StreamRec

The StreamRec recommendation system, developed throughout Parts I-V, serves content recommendations to millions of users. In this milestone, we conduct a fairness audit along two dimensions: creator fairness (are content creators given equitable exposure?) and user fairness (do users from different demographic groups receive equally good recommendations?).

31.13.1 Creator Fairness: Exposure Equity

Recommendation systems create winner-take-all dynamics. A small fraction of creators receive the vast majority of recommendations, while the long tail receives almost none. This is partly a reflection of quality and popularity — but it is also a reflection of the recommendation algorithm's optimization objective (maximize user engagement), which systematically favors established creators with proven engagement metrics.

Fairness definition for creators: We adopt a demographic parity-like criterion for creator exposure. For each creator demographic group (defined by region, language, and account age), the fraction of total impressions received should be roughly proportional to the fraction of content produced.

@dataclass
class CreatorFairnessAudit:
    """Audit creator fairness in recommendation exposure.

    Computes whether recommendation exposure is distributed equitably
    across creator demographic groups, relative to their content
    production share.

    Attributes:
        creator_impressions: DataFrame with columns 'creator_id',
            'impressions', and demographic columns.
        creator_catalog: DataFrame with columns 'creator_id',
            'n_items', and demographic columns.
        demographic_cols: List of column names defining creator groups.
    """

    creator_impressions: pd.DataFrame
    creator_catalog: pd.DataFrame
    demographic_cols: List[str]

    def compute_exposure_equity(self) -> pd.DataFrame:
        """Compute exposure-to-production ratio by demographic group.

        For each group, computes:
        - production_share: fraction of total items in catalog
        - impression_share: fraction of total impressions
        - equity_ratio: impression_share / production_share
            (1.0 = perfectly proportional)

        Returns:
            DataFrame with group-level exposure equity metrics.
        """
        # Aggregate catalog by demographic group
        catalog_agg = (
            self.creator_catalog
            .groupby(self.demographic_cols)
            .agg(n_items=("n_items", "sum"), n_creators=("creator_id", "nunique"))
            .reset_index()
        )
        total_items = catalog_agg["n_items"].sum()
        catalog_agg["production_share"] = catalog_agg["n_items"] / total_items

        # Aggregate impressions by demographic group
        impressions_agg = (
            self.creator_impressions
            .groupby(self.demographic_cols)
            .agg(
                total_impressions=("impressions", "sum"),
                n_creators=("creator_id", "nunique"),
            )
            .reset_index()
        )
        total_impressions = impressions_agg["total_impressions"].sum()
        impressions_agg["impression_share"] = (
            impressions_agg["total_impressions"] / total_impressions
        )

        # Merge and compute equity ratio
        merged = catalog_agg.merge(
            impressions_agg,
            on=self.demographic_cols,
            suffixes=("_catalog", "_impressions"),
        )
        merged["equity_ratio"] = (
            merged["impression_share"] / merged["production_share"]
        )

        return merged[
            self.demographic_cols
            + ["production_share", "impression_share", "equity_ratio",
               "n_creators_catalog", "n_creators_impressions"]
        ]

    def flag_underexposed_groups(
        self,
        equity_threshold: float = 0.5,
    ) -> pd.DataFrame:
        """Identify creator groups receiving disproportionately low exposure.

        A group is flagged if its equity_ratio (impression_share /
        production_share) falls below the threshold.

        Args:
            equity_threshold: Minimum acceptable equity ratio.

        Returns:
            Filtered DataFrame of underexposed groups.
        """
        equity = self.compute_exposure_equity()
        return equity[equity["equity_ratio"] < equity_threshold]

31.13.2 User Fairness: Recommendation Quality Across Demographics

Fairness definition for users: We adopt an equal opportunity criterion. The recommendation quality (measured by Hit@10, NDCG@10, and completion rate) should be approximately equal across user demographic groups.

@dataclass
class UserFairnessAudit:
    """Audit user fairness in recommendation quality.

    Computes whether recommendation quality is equitable across
    user demographic groups.

    Attributes:
        user_metrics: DataFrame with columns 'user_id', 'hit_at_10',
            'ndcg_at_10', 'completion_rate', and demographic columns.
        demographic_cols: List of column names defining user groups.
    """

    user_metrics: pd.DataFrame
    demographic_cols: List[str]

    def compute_quality_by_group(self) -> pd.DataFrame:
        """Compute recommendation quality metrics by user demographic group.

        Returns:
            DataFrame with group-level mean metrics and group sizes.
        """
        quality_cols = ["hit_at_10", "ndcg_at_10", "completion_rate"]
        grouped = (
            self.user_metrics
            .groupby(self.demographic_cols)
            .agg(
                n_users=("user_id", "nunique"),
                **{
                    f"mean_{col}": (col, "mean")
                    for col in quality_cols
                },
                **{
                    f"std_{col}": (col, "std")
                    for col in quality_cols
                },
            )
            .reset_index()
        )
        return grouped

    def compute_quality_disparity(self) -> Dict[str, float]:
        """Compute the maximum disparity in each quality metric.

        For each quality metric, computes the difference between the
        best-served and worst-served group.

        Returns:
            Dictionary of metric_name -> max group disparity.
        """
        grouped = self.compute_quality_by_group()
        quality_cols = ["hit_at_10", "ndcg_at_10", "completion_rate"]
        disparities = {}
        for col in quality_cols:
            mean_col = f"mean_{col}"
            disparity = grouped[mean_col].max() - grouped[mean_col].min()
            disparities[col] = float(disparity)
        return disparities

    def generate_audit_report(self) -> str:
        """Generate a human-readable fairness audit report.

        Returns:
            Formatted string with group metrics and disparity analysis.
        """
        grouped = self.compute_quality_by_group()
        disparities = self.compute_quality_disparity()

        lines = ["=== StreamRec User Fairness Audit ===", ""]
        lines.append("Quality Metrics by User Group:")
        lines.append(grouped.to_string(index=False))
        lines.append("")
        lines.append("Maximum Disparities:")
        for metric, value in disparities.items():
            status = "PASS" if value < 0.05 else "REVIEW"
            lines.append(f"  {metric}: {value:.4f} [{status}]")
        return "\n".join(lines)

31.13.3 Post-Processing Adjustments

After auditing, the team applies post-processing adjustments to the re-ranking stage of the StreamRec pipeline (Chapter 24, Section 24.8). The re-ranker already applies diversity and freshness constraints; the fairness module adds an exposure equity boost for underexposed creator groups and a quality floor for underserved user groups.

def fairness_aware_reranking(
    candidate_scores: np.ndarray,
    creator_groups: np.ndarray,
    underexposed_groups: set,
    exposure_boost: float = 0.15,
    top_k: int = 10,
) -> np.ndarray:
    """Apply fairness-aware re-ranking to recommendation candidates.

    Boosts scores for items from underexposed creator groups to
    improve exposure equity. The boost is applied before final
    top-k selection.

    Args:
        candidate_scores: Model scores for each candidate item.
        creator_groups: Group label for each candidate item's creator.
        underexposed_groups: Set of group labels to boost.
        exposure_boost: Additive score boost for underexposed groups.
        top_k: Number of items to return.

    Returns:
        Indices of selected items after fairness-aware re-ranking.
    """
    adjusted_scores = candidate_scores.copy()

    for i, group in enumerate(creator_groups):
        if group in underexposed_groups:
            adjusted_scores[i] += exposure_boost

    top_indices = np.argsort(-adjusted_scores)[:top_k]
    return top_indices

The exposure_boost parameter is calibrated empirically: increase the boost until the equity ratio for underexposed groups exceeds the threshold (e.g., 0.7), then verify that the user-side quality metrics (Hit@10, NDCG@10) do not degrade beyond the acceptable floor.

31.14 Synthesis: The Fairness Audit Framework

The chapter's technical and organizational components assemble into a five-stage fairness audit framework:

Stage 1: Scope and Context. Define the decision being made, the protected attributes, the potential harms, and the legal framework. For Meridian Financial: credit decisions under ECOA, protected attributes of race, gender, age, and national origin, potential harm of credit denial to qualified applicants.

Stage 2: Metric Selection. Using the metric selection framework (Section 31.12.2), choose the primary fairness criterion and secondary monitoring metrics. Document the reasoning and the tradeoffs accepted. For Meridian: primary criterion is four-fifths rule (demographic parity ratio $\geq 0.80$), secondary monitoring of equalized odds and equal opportunity.

Stage 3: Baseline Assessment. Compute all fairness metrics on the current model using FairnessMetrics, Fairlearn's MetricFrame, or AIF360's ClassificationMetric. Compute intersectional metrics. Identify the worst-served groups.

Stage 4: Intervention. If the baseline fails the selected criterion, apply interventions in order of increasing invasiveness: (1) post-processing threshold adjustment, (2) pre-processing reweighing, (3) in-processing constrained optimization, (4) model redesign (feature removal, alternative architectures). Evaluate each intervention against both the fairness criterion and the accuracy floor.

Stage 5: Monitoring and Review. Deploy the FairnessMonitorConfig with automated alerting. Schedule quarterly FRB reviews. Re-audit when the model is retrained, when data sources change, or when monitoring alerts trigger.

Production ML = Software Engineering: The fairness audit framework is not a research exercise — it is a production system with monitoring, alerting, and incident response, integrated into the ML pipeline alongside the data validation (Chapter 28), model validation (Chapter 28), and monitoring (Chapter 30) infrastructure. Fairness metrics are just another category of model quality metrics, computed at every retraining and every scoring batch, with the same alerting and escalation policies as any other production metric. The difference is that fairness metrics require human judgment for threshold setting and incident response — a judgment that the FRB provides.

31.15 Summary

Fairness in machine learning is not a property that a model either has or lacks. It is a family of incompatible mathematical criteria, each encoding a different ethical commitment. The impossibility theorem (Chouldechova, 2017; Kleinberg, Mullainathan, and Raghavan, 2016) proves that calibration and equalized odds cannot hold simultaneously when base rates differ — forcing every deployment to make an explicit choice among fairness definitions.

The choice is not arbitrary. It is guided by the domain (what harm are we preventing?), the legal framework (what does the law require?), and the affected stakeholders (what do communities prioritize?). The data scientist's role is to make the choice explicit, quantify the tradeoffs, implement the selected criterion with engineering rigor, and monitor it over time.

The technical toolkit spans three intervention points: pre-processing (reweighing, disparate impact removal), in-processing (constrained optimization with Fairlearn, adversarial debiasing), and post-processing (threshold adjustment, reject option classification). The organizational toolkit includes fairness review boards, metric selection frameworks, and ongoing monitoring integrated into the production ML pipeline.

For StreamRec, the M15 fairness audit examined both creator fairness (exposure equity) and user fairness (recommendation quality across demographics), using Fairlearn's MetricFrame for user-side metrics and a custom CreatorFairnessAudit for the supply side. Post-processing adjustments to the re-ranking layer improved exposure equity without degrading user-side quality below the acceptable floor.

There is no "fair" algorithm. There are specific fairness criteria, specific measurements, specific tradeoffs, and specific organizational commitments. The discipline of fairness in ML is the discipline of making those specifics explicit.