
Chapter 33: Fairness, Bias, and Responsible ML

When Your Model Works Better for Some People Than Others


Learning Objectives

By the end of this chapter, you will be able to:

  1. Identify sources of bias in ML systems (historical, representation, measurement, aggregation)
  2. Define and compute fairness metrics (demographic parity, equalized odds, predictive parity)
  3. Understand the impossibility theorem: you cannot satisfy all fairness criteria simultaneously
  4. Apply bias mitigation techniques (pre-processing, in-processing, post-processing)
  5. Create model cards for responsible deployment documentation

Fairness Is Not a Feature You Add at the End

Core Principle --- Fairness is not a feature you add at the end. It is a constraint you design into the system from the start. I have watched teams build excellent models --- AUC above 0.90, precision that would make a textbook proud --- and then discover, three months into production, that the model works significantly worse for Black patients than white patients. Not because anyone intended harm, but because the training data had 2.4 times as many white patients as Black patients, and the model learned to predict well where the data was dense and poorly where it was sparse. The result: a care coordination team that allocates follow-up resources based on model scores is systematically under-serving the patients who need help most.

This is not a hypothetical. It is a documented pattern in healthcare, criminal justice, lending, hiring, and insurance. And it is the central concern of this chapter.

Here is the uncomfortable truth about ML fairness: your model inherits every bias in your data. If the historical data reflects decades of systemic inequality --- and it does --- then the model will learn to reproduce that inequality, confidently and at scale. A model trained on historical hiring decisions will learn that men are better candidates, because men were historically hired more often. A model trained on historical lending decisions will learn that certain zip codes are higher risk, because those zip codes were historically redlined. A model trained on historical clinical data will predict better for populations that received better care, because more data was collected about them.

The model is not wrong. The model is faithfully reproducing the patterns in the data. The problem is that those patterns encode injustice.

This chapter gives you the tools to detect, measure, and mitigate that injustice. We cover four areas:

  1. Sources of bias --- where bias enters the ML pipeline
  2. Fairness metrics --- how to quantify whether a model is fair (and for whom)
  3. The impossibility theorem --- why you cannot satisfy all fairness definitions simultaneously
  4. Mitigation strategies --- pre-processing, in-processing, and post-processing techniques to reduce bias

The running examples are Metro General Hospital (where a readmission prediction model has different error rates across racial groups) and StreamFlow (where a churn model may predict differently for different demographics). Both are real patterns. Both demand a response.


Part 1: Where Bias Comes From

Bias does not appear from nowhere. It enters the ML pipeline at specific points, through specific mechanisms. Understanding these entry points is the first step toward mitigation.

Historical Bias

Definition: The training data reflects historical patterns of discrimination or inequality, and the model learns to reproduce them.

Example: Metro General Hospital's readmission dataset contains 10 years of patient records. During that period, Black and Hispanic patients in the surrounding community had lower rates of health insurance, fewer primary care providers per capita, and longer average distances to pharmacies. As a result, their readmission rates are higher --- not because of inherent health differences, but because of systemic barriers to post-discharge care. A model trained on this data learns that race (or its proxies: zip code, insurance type) is predictive of readmission. It is predictive --- because the system was inequitable. The model then allocates follow-up resources based on these predictions, reinforcing the existing disparity.

Key insight: Removing the protected attribute (race) from the feature set does not solve historical bias. The information leaks through correlated features --- zip code, insurance type, language, admission source. This is called the problem of proxy variables.
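One way to quantify proxy leakage is to check how well the remaining "neutral" features predict the protected attribute itself. A minimal sketch on synthetic data (the feature names and effect sizes below are hypothetical, chosen only to illustrate the mechanism):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 5000

# A binary protected attribute and two "neutral" features that are
# correlated with it (hypothetical effect sizes, for illustration only)
protected = rng.binomial(1, 0.4, n)
insurance = rng.binomial(1, 0.3 + 0.4 * protected)   # correlated proxy
zip_risk = rng.normal(0.8 * protected, 1.0)          # correlated proxy
X = np.column_stack([insurance, zip_risk])

# If the "non-protected" features predict the protected attribute well
# above chance (AUC 0.5), a downstream model can recover group
# membership even though race was never a feature.
auc = cross_val_score(LogisticRegression(), X, protected,
                      cv=5, scoring='roc_auc').mean()
print(f"AUC for predicting the protected attribute from proxies: {auc:.2f}")
```

An AUC well above 0.5 on this check is a warning sign that "fairness through unawareness" will not hold in practice.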

Representation Bias

Definition: The training data does not adequately represent all subgroups, leading to worse model performance for underrepresented groups.

Example: Metro General's dataset contains 14,200 patient records. The breakdown by race:

| Group         | Count | Percentage |
|---------------|-------|------------|
| White         | 6,390 | 45.0%      |
| Black         | 3,550 | 25.0%      |
| Hispanic      | 2,840 | 20.0%      |
| Asian         | 710   | 5.0%       |
| Other/Unknown | 710   | 5.0%       |

The model has seen 6,390 white patients and 710 Asian patients. It has learned far more about the patterns that predict readmission for white patients. The result: AUC of 0.84 for white patients, but only 0.71 for Asian patients. The model is not biased in intent --- it simply has less information about some groups than others.
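Gaps like this are invisible if you only report a single overall AUC. A minimal sketch of a per-group AUC check, shown with synthetic stand-in data (the chapter's `y_test`, `y_prob`, and `race_test` arrays from Part 2 would slot in the same way):

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

def auc_by_group(y_true, y_prob, groups):
    """AUC computed separately within each group."""
    out = {}
    for g in np.unique(groups):
        mask = groups == g
        if len(np.unique(y_true[mask])) == 2:   # AUC needs both classes present
            out[g] = roc_auc_score(y_true[mask], y_prob[mask])
    return pd.Series(out, name='AUC').round(3)

# Synthetic stand-in: group 'B' is smaller and gets noisier scores
rng = np.random.default_rng(1)
groups = rng.choice(['A', 'B'], 2000, p=[0.9, 0.1])
y_true = rng.binomial(1, 0.2, 2000)
noise_sd = np.where(groups == 'A', 0.5, 1.5)
y_prob = 1 / (1 + np.exp(-(2.0 * y_true + rng.normal(0, noise_sd))))

per_group_auc = auc_by_group(y_true, y_prob, groups)
print(per_group_auc)
```

Reporting this breakdown alongside the headline metric makes representation gaps visible before deployment.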

Measurement Bias

Definition: The outcome variable or the features are measured differently for different groups.

Example: Metro General uses the LACE index (Length of stay, Acuity of admission, Comorbidities, Emergency department visits in the past 6 months) as one of its input features. But "emergency department visits in the past 6 months" is measured from Metro General's records only. Patients who use a different hospital's ED --- a pattern more common among patients without a regular primary care provider, who are disproportionately low-income and minority --- will have artificially low ED visit counts. The feature systematically underestimates risk for exactly the patients who are at highest risk.
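The mechanism can be sketched with a small simulation. The capture rates below are hypothetical, chosen only to illustrate how a differentially measured feature understates risk for one group:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000

has_pcp = rng.binomial(1, 0.65, n)
true_ed_visits = rng.poisson(1.5, n)   # same true visit rate in both groups

# Hypothetical capture rates: patients without a regular PCP more often
# use another hospital's ED, so fewer of their visits appear in
# Metro General's own records.
capture_rate = np.where(has_pcp == 1, 0.9, 0.5)
observed_ed_visits = rng.binomial(true_ed_visits, capture_rate)

for g, label in [(1, 'has PCP'), (0, 'no PCP ')]:
    m = has_pcp == g
    print(f"{label}: true mean = {true_ed_visits[m].mean():.2f}, "
          f"observed mean = {observed_ed_visits[m].mean():.2f}")
```

Both groups have the same true visit rate, but the recorded feature makes the no-PCP group look lower-risk: exactly the inversion described above.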

Aggregation Bias

Definition: A single model is built for all groups, but the relationship between features and outcome differs across groups.

Example: For elderly patients (age > 75) at Metro General, the strongest predictor of readmission is discharge disposition (whether the patient goes home alone vs. to a skilled nursing facility). For younger patients (age 25--45), the strongest predictor is number of prior admissions in the last 12 months. A single model averages across these distinct patterns, performing adequately for both but optimally for neither.
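A toy demonstration of this effect, assuming synthetic data where a different feature drives the outcome in each group (the features and coefficients are illustrative, not Metro General's):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n = 6000
elderly = rng.binomial(1, 0.5, n)
x1 = rng.normal(0, 1, n)   # stand-in for a discharge-disposition signal
x2 = rng.normal(0, 1, n)   # stand-in for a prior-admissions signal

# The informative feature differs by group: x1 drives risk for the
# elderly group, x2 for the younger group.
logit = np.where(elderly == 1, 2.0 * x1, 2.0 * x2) - 1.0
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))
X = np.column_stack([x1, x2])

idx_tr, idx_te = train_test_split(np.arange(n), test_size=0.3, random_state=0)
pooled = LogisticRegression().fit(X[idx_tr], y[idx_tr])

aucs = {}
for g in (0, 1):
    tr = idx_tr[elderly[idx_tr] == g]
    te = idx_te[elderly[idx_te] == g]
    split_model = LogisticRegression().fit(X[tr], y[tr])
    aucs[g] = (
        roc_auc_score(y[te], pooled.predict_proba(X[te])[:, 1]),
        roc_auc_score(y[te], split_model.predict_proba(X[te])[:, 1]),
    )
    print(f"group {g}: pooled AUC = {aucs[g][0]:.3f}, "
          f"per-group AUC = {aucs[g][1]:.3f}")
```

The pooled model averages the two patterns and underperforms a per-group model in both groups. Whether to actually deploy per-group models raises its own fairness questions, but the diagnostic is worth running.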

The Bias Pipeline

The following diagram shows where each type of bias enters the ML workflow:

Real World         Data Collection       Feature Engineering     Model Training       Deployment
    |                    |                       |                     |                  |
 Historical         Representation          Measurement          Aggregation         Feedback
   bias               bias                    bias                 bias               loops
    |                    |                       |                     |                  |
 Systemic           Who is in             How variables         One model for       Model predictions
 inequality         the dataset?          are measured?         all groups?         affect future data

Key Insight --- Bias is not a single problem with a single solution. It is a family of problems that enter at different stages and require different interventions. A team that only checks for bias at the evaluation stage will miss representation bias (data collection), measurement bias (feature engineering), and historical bias (the world itself).


Part 2: Fairness Metrics --- Quantifying What "Fair" Means

The word "fair" is dangerously vague. In everyday language, it means "just" or "equitable." In ML, it has multiple precise mathematical definitions, and they frequently contradict each other. This section defines the five most important fairness metrics, shows how to compute them, and demonstrates the tension between them.

Setup: The Metro General Readmission Model

We will use a simulated version of Metro General's readmission model throughout this section. The model predicts whether a patient will be readmitted within 30 days. The protected attribute is race.

import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score,
    confusion_matrix, roc_auc_score
)

np.random.seed(42)

n = 14200

# Demographics (the table's Other/Unknown group is folded into Asian
# here, so the simulation uses four groups)
race = np.random.choice(
    ['White', 'Black', 'Hispanic', 'Asian'],
    size=n,
    p=[0.45, 0.25, 0.20, 0.10]
)

age = np.random.normal(62, 15, n).clip(18, 99).astype(int)
length_of_stay = np.random.exponential(4.5, n).round(1).clip(1, 45)
num_prior_admissions = np.random.poisson(1.2, n)
num_medications = np.random.poisson(8, n).clip(1, 30)
comorbidity_index = np.random.poisson(2, n)
ed_visits_6m = np.random.poisson(1.5, n)
has_pcp = np.random.binomial(1, 0.65, n)
insurance_type = np.random.choice(
    ['Medicare', 'Medicaid', 'Commercial', 'Uninsured'],
    size=n, p=[0.38, 0.22, 0.31, 0.09]
)
discharge_to_snf = np.random.binomial(1, 0.15, n)

# Encode insurance
insurance_map = {'Medicare': 0, 'Medicaid': 1, 'Commercial': 2, 'Uninsured': 3}
insurance_encoded = np.array([insurance_map[i] for i in insurance_type])

# Encode race for data generation (NOT as a model feature)
race_risk = {'White': 0.0, 'Black': 0.06, 'Hispanic': 0.04, 'Asian': -0.02}
race_offset = np.array([race_risk[r] for r in race])

# Generate readmission outcome with group-varying base rates
logit = (
    -2.0
    + 0.015 * age
    + 0.08 * length_of_stay
    + 0.25 * num_prior_admissions
    + 0.03 * num_medications
    + 0.15 * comorbidity_index
    + 0.10 * ed_visits_6m
    - 0.30 * has_pcp
    - 0.20 * discharge_to_snf
    + 0.10 * (insurance_encoded == 3).astype(int)  # uninsured penalty
    + race_offset  # differential base rates
)
prob = 1 / (1 + np.exp(-logit))
readmitted = np.random.binomial(1, prob)

df = pd.DataFrame({
    'age': age,
    'length_of_stay': length_of_stay,
    'num_prior_admissions': num_prior_admissions,
    'num_medications': num_medications,
    'comorbidity_index': comorbidity_index,
    'ed_visits_6m': ed_visits_6m,
    'has_pcp': has_pcp,
    'insurance_encoded': insurance_encoded,
    'discharge_to_snf': discharge_to_snf,
    'race': race,
    'readmitted': readmitted,
})

# Base rates by group
print("Readmission rates by race:")
print(df.groupby('race')['readmitted'].mean().round(3))
print(f"\nOverall readmission rate: {df['readmitted'].mean():.3f}")

Notice what this data encodes: different racial groups have different base rates of readmission. These base rate differences are the source of almost every fairness tension that follows.

Training the Model

The model uses clinical and administrative features only --- no race variable. This is a common and insufficient approach to fairness.

feature_cols = [
    'age', 'length_of_stay', 'num_prior_admissions',
    'num_medications', 'comorbidity_index', 'ed_visits_6m',
    'has_pcp', 'insurance_encoded', 'discharge_to_snf'
]

X = df[feature_cols]
y = df['readmitted']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Keep race for fairness analysis (not used in training)
race_test = df.loc[X_test.index, 'race'].values

model = GradientBoostingClassifier(
    n_estimators=200, max_depth=4, learning_rate=0.1, random_state=42
)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]

print(f"Overall Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(f"Overall AUC:      {roc_auc_score(y_test, y_prob):.3f}")

Warning --- Removing the protected attribute from the feature set does not make the model fair. Correlated features (insurance type, zip code, comorbidity patterns) carry much of the same information. This is sometimes called "fairness through unawareness," and it is the most common --- and least effective --- approach to fairness.

Metric 1: Demographic Parity (Statistical Parity)

Definition: A model satisfies demographic parity if the probability of a positive prediction is the same across all groups.

$$P(\hat{Y} = 1 \mid A = a) = P(\hat{Y} = 1 \mid A = b) \quad \forall \, a, b$$

In plain language: the model should flag the same proportion of patients in each racial group for follow-up.

def demographic_parity(y_pred, groups):
    """Compute positive prediction rate for each group."""
    results = {}
    for group in np.unique(groups):
        mask = groups == group
        results[group] = y_pred[mask].mean()
    return pd.Series(results, name='P(Y_hat=1)')

dp = demographic_parity(y_pred, race_test)
print("Demographic Parity (positive prediction rate by group):")
print(dp.round(3))
print(f"\nMax disparity: {dp.max() - dp.min():.3f}")

Interpretation: If one group has a positive prediction rate of 0.22 and another has 0.15, the model is flagging more patients in the first group for follow-up. Whether this is a problem depends on context. If the first group genuinely has a higher readmission rate (different base rates), demographic parity would require the model to ignore that difference --- which means under-intervening for the high-risk group or over-intervening for the low-risk group.

When demographic parity matters: In settings where the decision itself should be independent of group membership, such as hiring or lending. If you believe that equally qualified applicants should receive offers at the same rate regardless of race, demographic parity is the right metric.

When demographic parity is misleading: In settings where base rates genuinely differ and the model's purpose is to allocate resources based on risk. If Black patients at Metro General genuinely have a higher readmission rate (due to systemic barriers to post-discharge care), then a model that satisfies demographic parity would flag fewer Black patients than their actual risk warrants --- resulting in less follow-up care for the group that needs it most.

Metric 2: Equalized Odds

Definition: A model satisfies equalized odds if the true positive rate (TPR) and false positive rate (FPR) are the same across all groups.

$$P(\hat{Y} = 1 \mid Y = 1, A = a) = P(\hat{Y} = 1 \mid Y = 1, A = b)$$ $$P(\hat{Y} = 1 \mid Y = 0, A = a) = P(\hat{Y} = 1 \mid Y = 0, A = b)$$

In plain language: among patients who will be readmitted, the model catches them at the same rate in every group. And among patients who will not be readmitted, the model incorrectly flags them at the same rate in every group.

def equalized_odds(y_true, y_pred, groups):
    """Compute TPR and FPR for each group."""
    results = {}
    for group in np.unique(groups):
        mask = groups == group
        tn, fp, fn, tp = confusion_matrix(
            y_true[mask], y_pred[mask], labels=[0, 1]
        ).ravel()
        tpr = tp / (tp + fn) if (tp + fn) > 0 else 0
        fpr = fp / (fp + tn) if (fp + tn) > 0 else 0
        results[group] = {'TPR': round(tpr, 3), 'FPR': round(fpr, 3)}
    return pd.DataFrame(results).T

eo = equalized_odds(y_test.values, y_pred, race_test)
print("Equalized Odds:")
print(eo)
print(f"\nTPR range: {eo['TPR'].max() - eo['TPR'].min():.3f}")
print(f"FPR range: {eo['FPR'].max() - eo['FPR'].min():.3f}")

Interpretation: If the model has a TPR of 0.72 for white patients and 0.58 for Black patients, it is catching 72% of white readmissions but only 58% of Black readmissions. The model is systematically missing more Black patients who will be readmitted. This directly translates to inequitable care: Black patients who need follow-up are less likely to receive it.

When equalized odds matters: In any setting where errors have consequences that should be equally distributed. A criminal justice risk assessment that has a higher false positive rate for Black defendants than white defendants means more Black defendants are unnecessarily detained. A medical screening tool with a lower TPR for one group means that group's diseases are caught less often.

Metric 3: Equal Opportunity

Definition: A relaxed version of equalized odds. Only the true positive rate must be equal across groups.

$$P(\hat{Y} = 1 \mid Y = 1, A = a) = P(\hat{Y} = 1 \mid Y = 1, A = b)$$

In plain language: among patients who will actually be readmitted, the model catches them at the same rate in every group. This is equalized odds without the FPR constraint.

When to use: When the cost of false negatives is much higher than the cost of false positives. In readmission prediction, missing a patient who will be readmitted (false negative) has a higher human cost than unnecessarily following up with a patient who will not be readmitted (false positive). Equal opportunity ensures the model's misses are equally distributed.
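The other metrics in this section come with code; a matching helper for equal opportunity follows, shown here with synthetic stand-ins (the chapter's `y_test.values`, `y_pred`, and `race_test` would slot in directly):

```python
import numpy as np
import pandas as pd

def equal_opportunity(y_true, y_pred, groups):
    """Per-group true positive rate: P(Y_hat = 1 | Y = 1, A = a)."""
    out = {}
    for g in np.unique(groups):
        mask = (groups == g) & (y_true == 1)
        out[g] = y_pred[mask].mean() if mask.sum() > 0 else np.nan
    return pd.Series(out, name='TPR').round(3)

# Synthetic stand-in: predictions agree with the truth 70% of the time
rng = np.random.default_rng(4)
groups = rng.choice(['A', 'B'], 1000)
y_true = rng.binomial(1, 0.2, 1000)
y_pred = np.where(rng.random(1000) < 0.7, y_true, 1 - y_true)

tpr = equal_opportunity(y_true, y_pred, groups)
print(tpr)
print(f"TPR gap: {tpr.max() - tpr.min():.3f}")
```

A large TPR gap means the model's misses fall disproportionately on one group, which is exactly what equal opportunity is meant to rule out.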

Metric 4: Predictive Parity

Definition: A model satisfies predictive parity if the positive predictive value (precision) is the same across groups.

$$P(Y = 1 \mid \hat{Y} = 1, A = a) = P(Y = 1 \mid \hat{Y} = 1, A = b)$$

In plain language: when the model flags a patient as high-risk, the probability that the patient is actually readmitted is the same regardless of which group the patient belongs to.

def predictive_parity(y_true, y_pred, groups):
    """Compute precision (PPV) for each group."""
    results = {}
    for group in np.unique(groups):
        mask = groups == group
        tp = ((y_pred[mask] == 1) & (y_true.values[mask] == 1)).sum()
        pp = (y_pred[mask] == 1).sum()
        ppv = tp / pp if pp > 0 else 0
        results[group] = round(ppv, 3)
    return pd.Series(results, name='PPV (Precision)')

pp = predictive_parity(y_test, y_pred, race_test)
print("Predictive Parity (precision by group):")
print(pp)
print(f"\nPrecision range: {pp.max() - pp.min():.3f}")

When predictive parity matters: When the consequences of acting on a positive prediction are significant. If a "high-risk" label triggers an expensive intervention (a home visit, a loan denial, a parole revocation), predictive parity ensures that the intervention is equally well-targeted across groups. Without predictive parity, a "high-risk" label for one group might mean a 50% chance of actual readmission, while for another group it means a 30% chance --- the same label carries different information depending on who receives it.

Metric 5: Calibration by Group

Definition: A model is calibrated by group if, for a given predicted probability, the actual outcome rate is the same across groups.

$$P(Y = 1 \mid S = s, A = a) = P(Y = 1 \mid S = s, A = b) = s$$

In plain language: if the model predicts a 25% readmission probability for a patient, it should not matter whether that patient is white, Black, Hispanic, or Asian --- the actual readmission rate among patients with a 25% prediction should be approximately 25% in every group.

def calibration_by_group(y_true, y_prob, groups, n_bins=5):
    """Check calibration across groups using probability bins."""
    bins = np.linspace(0, 1, n_bins + 1)
    results = []
    for group in np.unique(groups):
        mask = groups == group
        for i in range(n_bins):
            # include the right edge in the last bin so a probability of 1.0 is counted
            upper = (y_prob <= bins[i+1]) if i == n_bins - 1 else (y_prob < bins[i+1])
            bin_mask = mask & (y_prob >= bins[i]) & upper
            if bin_mask.sum() > 10:
                results.append({
                    'group': group,
                    'bin': f'{bins[i]:.1f}-{bins[i+1]:.1f}',
                    'mean_predicted': y_prob[bin_mask].mean(),
                    'mean_actual': y_true.values[bin_mask].mean(),
                    'count': bin_mask.sum()
                })
    return pd.DataFrame(results)

cal = calibration_by_group(y_test, y_prob, race_test)
print("Calibration by Group (selected bins):")
print(cal.to_string(index=False))

Why calibration matters: Calibration determines whether the model's predicted probabilities can be used as actual risk estimates. If a clinician sees "35% readmission risk" and uses that number to make a treatment decision, the number needs to mean the same thing for every patient. Miscalibration by group means the model is systematically overstating or understating risk for certain populations.

The Disparate Impact Ratio

A regulatory shorthand often used in practice is the disparate impact ratio (also called the 80% rule or four-fifths rule, originating from U.S. employment law):

$$\text{Disparate Impact Ratio} = \frac{P(\hat{Y} = 1 \mid A = \text{disadvantaged})}{P(\hat{Y} = 1 \mid A = \text{advantaged})}$$

A ratio below 0.80 is generally considered evidence of disparate impact.

def disparate_impact_ratio(y_pred, groups, reference_group):
    """Compute disparate impact ratio relative to a reference group."""
    ref_rate = y_pred[groups == reference_group].mean()
    results = {}
    for group in np.unique(groups):
        group_rate = y_pred[groups == group].mean()
        results[group] = round(group_rate / ref_rate, 3) if ref_rate > 0 else np.nan
    return pd.Series(results, name='DI Ratio')

di = disparate_impact_ratio(y_pred, race_test, reference_group='White')
print("Disparate Impact Ratio (reference: White):")
print(di)

Part 3: The Impossibility Theorem --- You Cannot Have It All

This is the most important result in the chapter, and the one most practitioners have never heard of.

The Impossibility Theorem (Chouldechova, 2017; Kleinberg, Mullainathan, & Raghavan, 2016) --- If base rates differ between groups, it is mathematically impossible to simultaneously satisfy:

  1. Equal false positive rates across groups (part of equalized odds)
  2. Equal false negative rates across groups (part of equalized odds)
  3. Calibration by group (equal predictive value across groups)

You must choose which fairness criterion to prioritize. There is no model, no algorithm, no technique that satisfies all three when the underlying base rates are unequal.

This is not a limitation of current algorithms. It is a mathematical fact. Let us see why with a concrete example.

The Arithmetic of Impossibility

Consider two groups at Metro General:

|                                     | Group A (White) | Group B (Black) |
|-------------------------------------|-----------------|-----------------|
| Population                          | 1,000           | 1,000           |
| Base rate (actual readmission rate) | 15%             | 22%             |
| Actually readmitted                 | 150             | 220             |
| Not readmitted                      | 850             | 780             |

Suppose the model achieves perfect calibration: when it predicts a 20% probability, 20% of patients in that bucket are actually readmitted, in both groups.

Now suppose we also want equal false positive rates. The false positive rate is:

$$\text{FPR} = \frac{\text{False Positives}}{\text{True Negatives} + \text{False Positives}}$$

Because the base rates differ (15% vs. 22%), the model must set different thresholds to achieve the same FPR in both groups --- which breaks calibration. Or it can maintain calibration, but the FPR will differ.

# Demonstration: impossibility with different base rates
from sklearn.metrics import confusion_matrix

np.random.seed(42)

# Simulate calibrated scores for two groups with different base rates
n_per_group = 2000

# Group A: 15% base rate
y_true_a = np.random.binomial(1, 0.15, n_per_group)
# Group B: 22% base rate
y_true_b = np.random.binomial(1, 0.22, n_per_group)

# Generate approximately calibrated probability scores
noise_a = np.random.normal(0, 0.8, n_per_group)
score_a = 1 / (1 + np.exp(-(np.log(0.15/0.85) + noise_a + 1.5 * y_true_a)))
noise_b = np.random.normal(0, 0.8, n_per_group)
score_b = 1 / (1 + np.exp(-(np.log(0.22/0.78) + noise_b + 1.5 * y_true_b)))

# Apply a single threshold to both groups
threshold = 0.20
pred_a = (score_a >= threshold).astype(int)
pred_b = (score_b >= threshold).astype(int)

def group_metrics(y_true, y_pred, label):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    ppv = tp / (tp + fp) if (tp + fp) > 0 else 0
    return {'Group': label, 'TPR': round(tpr, 3), 'FPR': round(fpr, 3),
            'PPV': round(ppv, 3), 'Pos Rate': round(y_pred.mean(), 3)}

results = pd.DataFrame([
    group_metrics(y_true_a, pred_a, 'A (15% base)'),
    group_metrics(y_true_b, pred_b, 'B (22% base)'),
])
print("Same threshold, different base rates:")
print(results.to_string(index=False))
print("\nNotice: TPR, FPR, and PPV all differ between groups.")
print("A single threshold cannot equalize all three simultaneously.")

What This Means in Practice

The impossibility theorem forces a choice. For Metro General's readmission model, the team must decide:

  • Prioritize equal TPR (equal opportunity): Ensure the model catches the same percentage of readmissions in every racial group, even if this means different false positive rates. This means some groups will receive more unnecessary follow-up calls than others.

  • Prioritize equal FPR: Ensure the model incorrectly flags patients at the same rate in every group, even if this means missing more readmissions in the high-base-rate group. This means the model will catch fewer Black readmissions.

  • Prioritize calibration: Ensure the model's probabilities are accurate for every group, and let the TPR and FPR fall where they may. This means clinicians can trust the numbers, but error rates will differ.

Key Insight --- The choice between fairness criteria is not a technical decision. It is a values decision. It requires input from domain experts (clinicians, ethicists, patient advocates), not just data scientists. Your job is to compute the metrics and present the tradeoffs. The decision about which tradeoff to accept belongs to the people affected by the system.


Part 4: Bias Mitigation Strategies

Once you have identified and measured bias, you need tools to reduce it. Mitigation strategies fall into three categories based on where in the ML pipeline they intervene.

Pre-Processing: Fix the Data Before Training

Pre-processing techniques modify the training data to reduce bias before the model ever sees it.

Reweighting

The simplest pre-processing technique. Assign sample weights that correct for underrepresentation or differential base rates.

def compute_fairness_weights(df, protected_col, target_col):
    """
    Compute sample weights so that every group contributes equal total
    weight within the positive class and within the negative class,
    while preserving the overall positive/negative balance. Small groups
    receive larger weights, counteracting representation bias.
    """
    overall_pos_rate = df[target_col].mean()
    overall_neg_rate = 1 - overall_pos_rate
    weights = np.ones(len(df))

    for group in df[protected_col].unique():
        group_mask = df[protected_col] == group
        group_size = group_mask.sum()
        group_pos = df.loc[group_mask, target_col].sum()
        group_neg = group_size - group_pos

        # Weight positive samples in this group
        pos_mask = group_mask & (df[target_col] == 1)
        if group_pos > 0:
            weights[pos_mask] = (overall_pos_rate * len(df)) / (2 * group_pos)

        # Weight negative samples in this group
        neg_mask = group_mask & (df[target_col] == 0)
        if group_neg > 0:
            weights[neg_mask] = (overall_neg_rate * len(df)) / (2 * group_neg)

    return weights

# Compute weights
sample_weights = compute_fairness_weights(df, 'race', 'readmitted')

# Train with sample weights
model_reweighted = GradientBoostingClassifier(
    n_estimators=200, max_depth=4, learning_rate=0.1, random_state=42
)
model_reweighted.fit(X_train, y_train,
                     sample_weight=sample_weights[X_train.index])

y_pred_rw = model_reweighted.predict(X_test)

# Compare fairness metrics
print("Original model - Equalized Odds:")
print(equalized_odds(y_test.values, y_pred, race_test))
print("\nReweighted model - Equalized Odds:")
print(equalized_odds(y_test.values, y_pred_rw, race_test))

Oversampling Underrepresented Groups

If representation bias is the primary issue, targeted oversampling (using SMOTE or random oversampling) for underrepresented groups can help the model learn their patterns better.

from sklearn.utils import resample

def oversample_minority_groups(X, y, groups, target_size=None):
    """Oversample each group to have the same number of samples."""
    if target_size is None:
        group_counts = pd.Series(groups).value_counts()
        target_size = group_counts.max()

    X_resampled = []
    y_resampled = []

    for group in np.unique(groups):
        mask = groups == group
        X_group = X[mask]
        y_group = y[mask]

        if len(X_group) < target_size:
            X_up, y_up = resample(
                X_group, y_group,
                replace=True, n_samples=target_size, random_state=42
            )
        else:
            X_up, y_up = X_group, y_group

        X_resampled.append(X_up)
        y_resampled.append(y_up)

    return np.vstack(X_resampled), np.concatenate(y_resampled)

In-Processing: Constrain the Model During Training

In-processing techniques modify the training algorithm itself to optimize for both accuracy and fairness simultaneously.

Fairness-Constrained Optimization

The idea is to add a fairness penalty to the loss function. Instead of minimizing loss alone, the model minimizes:

$$\mathcal{L}_{\text{fair}} = \mathcal{L}_{\text{accuracy}} + \lambda \cdot \mathcal{L}_{\text{fairness}}$$

where $\lambda$ controls the strength of the fairness constraint. Higher $\lambda$ means more fairness at the cost of accuracy.
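A minimal numpy sketch of this idea, using the squared demographic parity gap as the fairness term on a logistic model (the data, the λ values, and the learning rate are all illustrative, not a production recipe):

```python
import numpy as np

rng = np.random.default_rng(5)
n, d = 4000, 3
group = rng.binomial(1, 0.4, n)                       # protected attribute
X = rng.normal(0, 1, (n, d)) + 0.6 * group[:, None]   # features correlate with group
w_true = np.array([1.0, -0.5, 0.8])
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ w_true - 0.5))))

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def train(lam, lr=0.1, steps=500):
    """Minimize log-loss + lam * (mean score gap between groups)^2."""
    w, b = np.zeros(d), 0.0
    g1, g0 = group == 1, group == 0
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        err = p - y                                    # gradient of the log-loss term
        gw, gb = X.T @ err / n, err.mean()
        gap = p[g1].mean() - p[g0].mean()              # demographic parity gap
        dp = p * (1 - p)                               # d(sigmoid)/d(logit)
        dgap_w = (X[g1] * dp[g1, None]).mean(0) - (X[g0] * dp[g0, None]).mean(0)
        dgap_b = dp[g1].mean() - dp[g0].mean()
        gw += 2 * lam * gap * dgap_w                   # gradient of the penalty
        gb += 2 * lam * gap * dgap_b
        w, b = w - lr * gw, b - lr * gb
    p = sigmoid(X @ w + b)
    return p[g1].mean() - p[g0].mean()

for lam in (0.0, 5.0):
    print(f"lambda={lam}: score gap between groups = {train(lam):+.3f}")
```

Raising λ shrinks the gap in mean predicted scores between groups at some cost in log-loss; in practice λ is tuned to whatever fairness-accuracy tradeoff the organization has decided to accept.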

Using AI Fairness 360 (AIF360)

IBM's AIF360 library provides a comprehensive suite of in-processing algorithms. The most commonly used is prejudice remover, which adds a regularization term based on mutual information between the predicted outcome and the protected attribute.

# Note: AIF360 installation: pip install aif360
# This demonstrates the API pattern; install the library to run

# from aif360.datasets import BinaryLabelDataset
# from aif360.algorithms.inprocessing import PrejudiceRemover
# from aif360.metrics import BinaryLabelDatasetMetric, ClassificationMetric
#
# # Convert to AIF360 format
# aif_dataset = BinaryLabelDataset(
#     df=df_train_with_race,
#     label_names=['readmitted'],
#     protected_attribute_names=['race_encoded'],
#     favorable_label=0,          # not readmitted is favorable
#     unfavorable_label=1,        # readmitted is unfavorable
# )
#
# # Train with prejudice remover
# pr = PrejudiceRemover(sensitive_attr='race_encoded', eta=1.0)
# pr_model = pr.fit(aif_dataset)
#
# # Predict and evaluate
# aif_test = BinaryLabelDataset(...)
# predictions = pr_model.predict(aif_test)

Practical Note --- AIF360 is powerful but has a steep learning curve. Its data structures (BinaryLabelDataset, StructuredDataset) are different from pandas DataFrames, and the conversion can be error-prone. For many production systems, threshold adjustment (post-processing) is simpler to implement, easier to explain to stakeholders, and nearly as effective.

Post-Processing: Adjust the Predictions After Training

Post-processing techniques modify the model's outputs to satisfy fairness constraints, without changing the model itself.

Threshold Adjustment (Group-Specific Thresholds)

The simplest and most widely used post-processing technique. Instead of applying a single threshold to all groups, use different thresholds for each group to equalize the desired fairness metric.

from sklearn.metrics import roc_curve

def find_equalized_thresholds(y_true, y_prob, groups, target_tpr=0.70):
    """
    Find per-group thresholds that achieve approximately equal TPR.
    Uses ROC curve to find the threshold closest to target_tpr for each group.
    """
    thresholds = {}
    for group in np.unique(groups):
        mask = groups == group
        fpr, tpr, thresh = roc_curve(y_true[mask], y_prob[mask])
        # Find threshold closest to target TPR
        idx = np.argmin(np.abs(tpr - target_tpr))
        thresholds[group] = thresh[idx]
    return thresholds

# Find thresholds for equal TPR of ~0.70
eq_thresholds = find_equalized_thresholds(
    y_test.values, y_prob, race_test, target_tpr=0.70
)
print("Group-specific thresholds for ~70% TPR:")
for group, thresh in eq_thresholds.items():
    print(f"  {group}: {thresh:.3f}")

# Apply group-specific thresholds
y_pred_eq = np.zeros(len(y_test))
for group, thresh in eq_thresholds.items():
    mask = race_test == group
    y_pred_eq[mask] = (y_prob[mask] >= thresh).astype(int)

print("\nEqualized Odds after threshold adjustment:")
print(equalized_odds(y_test.values, y_pred_eq.astype(int), race_test))

The tradeoff: Group-specific thresholds equalize error rates at the cost of overall accuracy. The model's total accuracy will drop, because some groups now have thresholds that are not at the accuracy-maximizing point. This is the fairness-accuracy tradeoff in action.

# Measure the accuracy cost
from sklearn.metrics import accuracy_score

acc_original = accuracy_score(y_test, y_pred)
acc_equalized = accuracy_score(y_test, y_pred_eq)

print(f"Original accuracy:   {acc_original:.3f}")
print(f"Equalized accuracy:  {acc_equalized:.3f}")
print(f"Accuracy cost:       {acc_original - acc_equalized:.3f}")

Key Insight --- The accuracy cost of fairness is usually smaller than people expect. In most real-world settings, equalizing error rates across groups reduces overall accuracy by 1--3 percentage points. The question is whether the organization is willing to accept that cost. If the answer is no, then the organization is implicitly saying that overall accuracy is worth more than equitable treatment of underserved populations. That is a values statement, and it should be made explicitly, not hidden inside a default threshold.
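To see who actually pays that cost, break accuracy down by group. A minimal sketch on toy arrays --- the variable names mirror the test-set quantities above, but the data is invented for illustration: the global threshold serves group A perfectly and group B poorly, and group-specific thresholds shrink the gap at a small overall cost.

```python
import numpy as np
from sklearn.metrics import accuracy_score

def accuracy_by_group(y_true, y_pred, groups):
    """Accuracy computed separately for each group."""
    return {g: accuracy_score(y_true[groups == g], y_pred[groups == g])
            for g in np.unique(groups)}

y_true = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0])
groups = np.array(["A"] * 8 + ["B"] * 4)
# One global threshold: perfect for A, but it misses every positive in B
y_pred_single  = np.array([1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0])
# Group-specific thresholds: B's positives recovered, A picks up false positives
y_pred_grouped = np.array([1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0])

for name, pred in [("single threshold ", y_pred_single),
                   ("group thresholds ", y_pred_grouped)]:
    by_group = accuracy_by_group(y_true, pred, groups)
    overall = accuracy_score(y_true, pred)
    print(f"{name} overall={overall:.3f}  by group={by_group}")
```

In this toy example the overall accuracy drops from 0.833 to 0.750, but group B's accuracy rises from 0.50 to 0.75 --- the per-group view makes the tradeoff concrete in a way the single aggregate number cannot.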

Comparison of Mitigation Approaches

| Approach | When to Use | Pros | Cons |
|---|---|---|---|
| Reweighting | Representation bias is the primary issue | Simple, model-agnostic | Does not address measurement or historical bias |
| Oversampling | Underrepresented groups in training data | Easy to implement | Can cause overfitting for small groups |
| Prejudice Remover | Need fairness built into the model | Principled optimization | Complex, library-dependent, hard to explain |
| Adversarial Debiasing | Protected info leaks through features | Theoretically elegant | Unstable training, requires careful tuning |
| Threshold Adjustment | Post-hoc fairness correction | Simple, interpretable, model-agnostic | Does not improve model quality for underserved groups |
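Reweighting, the first row above, is simple enough to sketch directly. This is a minimal illustration of one common reweighting scheme --- w(g, y) = P(g) * P(y) / P(g, y) --- which makes group membership and the label statistically independent in the weighted data. The data and column names here are invented for illustration.

```python
import numpy as np
import pandas as pd

def reweighing_weights(groups, labels):
    """Per-sample weights that decorrelate group membership from the label."""
    df = pd.DataFrame({"g": groups, "y": labels})
    n = len(df)
    p_g = df["g"].value_counts() / n          # P(group)
    p_y = df["y"].value_counts() / n          # P(label)
    p_gy = df.groupby(["g", "y"]).size() / n  # P(group, label)
    weights = df.apply(
        lambda r: p_g[r["g"]] * p_y[r["y"]] / p_gy[(r["g"], r["y"])], axis=1
    )
    return weights.to_numpy()

# Toy data: group A is 80% of samples with a 75% positive rate;
# group B is 20% of samples with a 25% positive rate
groups = np.array(["A"] * 80 + ["B"] * 20)
labels = np.array([1] * 60 + [0] * 20 + [1] * 5 + [0] * 15)

weights = reweighing_weights(groups, labels)

# After weighting, both groups sit at the overall positive rate (0.65)
for g in ["A", "B"]:
    m = groups == g
    rate = np.average(labels[m], weights=weights[m])
    print(f"{g}: weighted positive rate = {rate:.3f}")  # 0.650 for both
```

The resulting array can be passed as `sample_weight` to the `fit` method of most scikit-learn estimators, making this mitigation model-agnostic, exactly as the table claims.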

Part 5: Model Cards --- Documenting What Your Model Does and Doesn't Do

A model card is a standardized document that accompanies a deployed model, describing its intended use, performance characteristics, limitations, and fairness properties. The concept was introduced by Mitchell et al. (2019) at Google and has become a best practice for responsible ML deployment.

Operational Reality --- A model without a model card is like a drug without a label. It might be effective, but nobody knows the dosage, the side effects, or the contraindications. Model cards are not bureaucratic overhead. They are the documentation that prevents your model from being used in contexts it was never designed for, on populations it was never evaluated on, with consequences nobody anticipated.

Model Card Template

def create_model_card(
    model_name,
    model_version,
    description,
    intended_use,
    out_of_scope,
    training_data,
    evaluation_data,
    metrics,
    fairness_metrics,
    limitations,
    ethical_considerations,
):
    """Generate a structured model card as a dictionary."""
    return {
        "model_details": {
            "name": model_name,
            "version": model_version,
            "description": description,
            "date": pd.Timestamp.now().strftime("%Y-%m-%d"),
        },
        "intended_use": {
            "primary_use": intended_use,
            "out_of_scope": out_of_scope,
        },
        "data": {
            "training_data": training_data,
            "evaluation_data": evaluation_data,
        },
        "performance": {
            "overall_metrics": metrics,
            "fairness_metrics": fairness_metrics,
        },
        "limitations_and_risks": limitations,
        "ethical_considerations": ethical_considerations,
    }

# Example: Metro General readmission model card
readmission_card = create_model_card(
    model_name="Metro General 30-Day Readmission Predictor",
    model_version="2.1",
    description=(
        "Gradient boosted classifier predicting 30-day all-cause "
        "readmission for patients discharged from Metro General Hospital."
    ),
    intended_use=(
        "Risk-stratify discharged patients for care coordination follow-up. "
        "Patients above the risk threshold receive a call within 48 hours."
    ),
    out_of_scope=[
        "Pediatric patients (model trained on adults 18+)",
        "Psychiatric admissions (excluded from training data)",
        "Patients transferred to long-term acute care facilities",
        "Use as the sole basis for clinical decisions",
    ],
    training_data=(
        "14,200 adult discharges from Metro General, Jan 2020 -- Dec 2024. "
        "Demographics: 45% White, 25% Black, 20% Hispanic, 10% Asian."
    ),
    evaluation_data="30% holdout stratified by readmission status.",
    metrics={
        "AUC": 0.832,
        "Accuracy": 0.761,
        "Precision": 0.388,
        "Recall": 0.674,
    },
    fairness_metrics={
        "Demographic Parity (max disparity)": 0.07,
        "Equalized Odds TPR range": 0.14,
        "Equalized Odds FPR range": 0.04,
        "Calibration (max deviation)": 0.06,
    },
    limitations=[
        "Model underperforms for Asian patients (AUC 0.71 vs. 0.84 overall) "
        "due to small sample size (5% of training data).",
        "Does not capture post-discharge medication adherence or "
        "follow-up appointment attendance (data not available at prediction time).",
        "Performance has not been validated on patients from other hospitals.",
        "Fairness metrics were computed on held-out test data; "
        "production fairness monitoring is required.",
    ],
    ethical_considerations=[
        "Readmission rates reflect systemic healthcare disparities, not "
        "inherent patient characteristics. Model predictions should not be "
        "interpreted as measures of patient compliance or personal responsibility.",
        "Group-specific thresholds are used to equalize TPR across racial groups, "
        "accepting a 2.1% reduction in overall accuracy.",
        "The model must not be used to deny or reduce care. Its purpose is to "
        "identify patients who need MORE follow-up, not patients who deserve less.",
    ],
)

import json
print(json.dumps(readmission_card, indent=2))

What a Model Card Should Include

| Section | What It Answers | Who Reads It |
|---|---|---|
| Model Details | What is this model? When was it built? | Everyone |
| Intended Use | What should this model be used for? | Deployers, end users |
| Out of Scope | What should this model NOT be used for? | Deployers, governance |
| Training Data | What data was the model trained on? | Data scientists, auditors |
| Performance | How well does the model work? | Data scientists, stakeholders |
| Fairness Metrics | Does the model work equally well for everyone? | Auditors, governance, affected communities |
| Limitations | Where does the model fail? | Everyone |
| Ethical Considerations | What are the risks of using this model? | Governance, ethics review, leadership |

Production Practice --- Model cards are living documents. Update them when you retrain the model, when you discover new failure modes, and when fairness metrics change in production. Version the model card alongside the model artifact. A model card that describes a previous version of the model is worse than useless --- it is actively misleading.
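A minimal sketch of that versioning practice --- writing the card to a version-stamped JSON file in the same directory as the model artifact. The paths, helper name, and card structure here are illustrative assumptions, not a standard API.

```python
import json
import tempfile
from pathlib import Path

def save_model_card(card: dict, artifacts_dir: str) -> Path:
    """Write the card to a version-stamped JSON file next to the model artifact."""
    version = card["model_details"]["version"]
    out_dir = Path(artifacts_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"model_card_v{version}.json"
    path.write_text(json.dumps(card, indent=2))
    return path

# Stand-in for the directory where the model artifact itself is stored
artifacts_dir = tempfile.mkdtemp()
card = {
    "model_details": {"name": "Readmission Predictor", "version": "2.1"},
    "performance": {"AUC": 0.832},
}
path = save_model_card(card, artifacts_dir)
print(path.name)  # model_card_v2.1.json
```

Because the filename carries the model version, a stale card for an old model version can never silently masquerade as documentation for the current one.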


Part 6: Putting It All Together --- A Fairness Audit Workflow

Here is the complete workflow for auditing an ML model for fairness. This is what you should do for every model that makes decisions about people.

Step 1: Identify Protected Attributes

Determine which demographic attributes are relevant. In the U.S., federal law protects race, color, religion, sex, national origin, age, disability, and genetic information. Your domain may have additional constraints (healthcare: HIPAA; finance: ECOA and Fair Housing Act; employment: Title VII).

Step 2: Compute Group-Level Performance

import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

def fairness_audit(y_true, y_pred, y_prob, groups, group_name='Group'):
    """Run a complete fairness audit.

    Relies on the metric helpers defined earlier in this chapter:
    demographic_parity, equalized_odds, predictive_parity,
    and disparate_impact_ratio.
    """
    print(f"{'='*60}")
    print(f"FAIRNESS AUDIT REPORT — Protected Attribute: {group_name}")
    print(f"{'='*60}\n")

    # 1. Base rates
    print("1. BASE RATES")
    for group in np.unique(groups):
        mask = groups == group
        rate = y_true[mask].mean()
        print(f"   {group}: {rate:.3f} (n={mask.sum()})")

    # 2. Demographic parity
    print("\n2. DEMOGRAPHIC PARITY (positive prediction rate)")
    dp = demographic_parity(y_pred, groups)
    print(dp.to_string())
    print(f"   Max disparity: {dp.max() - dp.min():.3f}")

    # 3. Equalized odds
    print("\n3. EQUALIZED ODDS (TPR and FPR by group)")
    eo = equalized_odds(y_true, y_pred, groups)
    print(eo.to_string())

    # 4. Predictive parity
    print("\n4. PREDICTIVE PARITY (precision by group)")
    pp = predictive_parity(pd.Series(y_true), y_pred, groups)
    print(pp.to_string())

    # 5. Disparate impact
    print("\n5. DISPARATE IMPACT RATIO")
    # Reference group defaults to the first alphabetically; choose it deliberately in practice
    di = disparate_impact_ratio(y_pred, groups, np.unique(groups)[0])
    print(di.to_string())
    flagged = di[di < 0.80]
    if len(flagged) > 0:
        print(f"   WARNING: Groups below 0.80 threshold: {list(flagged.index)}")

    # 6. AUC by group
    print("\n6. AUC BY GROUP")
    for group in np.unique(groups):
        mask = groups == group
        if len(np.unique(y_true[mask])) == 2:
            auc = roc_auc_score(y_true[mask], y_prob[mask])
            print(f"   {group}: {auc:.3f}")

    print(f"\n{'='*60}")
    print("END OF FAIRNESS AUDIT REPORT")
    print(f"{'='*60}")

# Run the audit
fairness_audit(y_test.values, y_pred, y_prob, race_test, group_name='Race')

Step 3: Choose a Fairness Criterion

Based on the domain context and stakeholder input, decide which fairness metric to prioritize. Document the decision and the reasoning.

Step 4: Apply Mitigation

Use the appropriate technique (reweighting, threshold adjustment, etc.) and re-run the audit.

Step 5: Document in a Model Card

Record the audit results, the chosen fairness criterion, the mitigation applied, and the residual disparities in the model card.

Step 6: Monitor in Production

Fairness metrics can drift just like accuracy metrics. If the demographic composition of incoming patients changes, or if the model degrades differently across groups, your production fairness guarantees may no longer hold. Add group-level performance monitoring to the monitoring pipeline from Chapter 32.
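A minimal sketch of what group-level monitoring can look like, assuming a prediction log with window, group, outcome, and prediction columns --- all names and the drift threshold here are illustrative assumptions, not the Chapter 32 pipeline itself.

```python
import pandas as pd

def tpr_by_group_and_window(log: pd.DataFrame) -> pd.DataFrame:
    """Per-window, per-group true positive rate from a prediction log."""
    positives = log[log["y_true"] == 1]
    return (positives.groupby(["window", "group"])["y_pred"]
                     .mean()
                     .unstack("group"))

def flag_tpr_drift(tpr_table: pd.DataFrame, max_gap: float = 0.10) -> pd.Series:
    """Flag windows where the TPR gap between groups exceeds max_gap."""
    gap = tpr_table.max(axis=1) - tpr_table.min(axis=1)
    return gap > max_gap

# Toy log: in February the model degrades for group B but not group A
log = pd.DataFrame({
    "window": ["2025-01"] * 8 + ["2025-02"] * 8,
    "group":  ["A", "A", "A", "A", "B", "B", "B", "B"] * 2,
    "y_true": [1] * 16,
    "y_pred": [1, 1, 1, 0,  1, 1, 1, 0,    # Jan: TPR 0.75 for both groups
               1, 1, 1, 1,  1, 0, 0, 0],   # Feb: TPR 1.00 (A) vs 0.25 (B)
})

tpr = tpr_by_group_and_window(log)
print(flag_tpr_drift(tpr))  # January not flagged; February flagged
```

The same pattern extends to FPR, precision, and calibration per group; the key design choice is that the alert fires on the gap between groups, not only on any single group's absolute performance.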


StreamFlow Fairness: A Preview

The Hospital Readmission example makes the fairness stakes vivid because the consequences are clinical. But fairness applies to any model that makes decisions about people --- including StreamFlow's churn model.

Consider: if StreamFlow's churn model predicts differently for subscribers in different age groups or geographic regions, and those predictions drive retention offers (discounts, personalized content, outreach calls), then the model is allocating business resources inequitably. Older subscribers might receive fewer retention offers. Subscribers in rural regions might be deprioritized. This may not have legal consequences, but it has business and ethical consequences.

The Progressive Project (M12) in Case Study 2 walks you through a fairness audit of the StreamFlow churn model. The patterns are the same: compute group-level metrics, identify disparities, apply threshold adjustment, and document the tradeoff.


Chapter Summary

Fairness in ML is not optional. It is not a nice-to-have that you add if you have time at the end of a project. It is a constraint that shapes the entire ML lifecycle --- from data collection (who is represented?) to feature engineering (are features measured equitably?) to model evaluation (do error rates differ across groups?) to deployment (are predictions used equitably?).

The impossibility theorem tells you that you cannot have it all. You must choose which fairness criterion matters most in your context, and that choice requires input from people beyond the data science team --- domain experts, ethicists, affected communities. Your role is to compute the metrics, present the tradeoffs honestly, and implement the chosen mitigation. The model card documents these decisions for everyone who comes after you.

Fairness is not a feature you add at the end. It is a constraint you design into the system from the start.


Next chapter: Chapter 34: The Business of Data Science --- Stakeholder communication, ROI of ML projects, and building a data-driven culture.