Appendix B: Python Reference for AI Fairness Auditing

A Practical Guide to Bias Detection, Fairness Metrics, and Explainability


Introduction

This appendix is a hands-on companion to the textbook's core chapters on algorithmic bias, fairness, and explainability. It provides working Python code you can run immediately to audit a model, compute fairness metrics, and generate explanations for individual predictions.

What this appendix covers:

  • Setting up a Python environment for fairness analysis
  • Loading and exploring datasets with known demographic disparities
  • Computing industry-standard fairness metrics using Fairlearn
  • Detecting proxy discrimination using SHAP (SHapley Additive exPlanations)
  • Explaining individual decisions using LIME (Local Interpretable Model-agnostic Explanations)
  • Applying bias mitigation using IBM's AI Fairness 360 toolkit
  • Structuring an end-to-end fairness audit

What this appendix does not cover:

This appendix does not teach Python fundamentals, machine learning from scratch, or statistical theory. It assumes you are comfortable reading Python code, understand what a pandas DataFrame is, and have used scikit-learn at least once. If you need a refresher, the Further Resources section lists recommended Python primers.

Assumed knowledge: Basic Python, pandas DataFrames, scikit-learn fit/predict patterns, and basic concepts of classification (accuracy, confusion matrix).

A note on ethics before you begin: The tools in this appendix are measurement instruments, not solutions. A demographic parity score tells you that a disparity exists. It does not tell you whether that disparity is legally permissible, morally acceptable, or technically fixable without creating new harms. Every number produced by the code below requires human interpretation, domain expertise, and organizational judgment. Chapter 9 (Measuring Fairness) and Chapter 19 (Auditing AI Systems) discuss this interpretive responsibility in depth. Do not treat a passing fairness metric as a certificate of ethical compliance.


Section 1: Setting Up Your Environment

Required Libraries

The following libraries form the core toolkit for this appendix. Each serves a distinct purpose in the fairness auditing workflow.

Library        Purpose                              Maintained By
-------------  -----------------------------------  ---------------------------------------------
scikit-learn   Model training, baseline metrics     Scikit-learn community
pandas         Data manipulation and exploration    Pandas community
numpy          Numerical operations                 NumPy community
matplotlib     Plotting and visualization           Matplotlib community
seaborn        Statistical data visualization       Seaborn community
fairlearn      Fairness metrics and mitigation      Fairlearn community (originated at Microsoft)
aif360         Bias detection and mitigation suite  IBM Research
shap           Model-agnostic feature attribution   Scott Lundberg / Microsoft
lime           Local interpretable explanations     Marco Ribeiro / UW

Installation

Using pip (standard Python package manager):

pip install scikit-learn pandas numpy matplotlib seaborn
pip install fairlearn
pip install aif360
pip install shap
pip install lime

Using conda (recommended if you use Anaconda or Miniconda):

conda install -c conda-forge scikit-learn pandas numpy matplotlib seaborn
conda install -c conda-forge fairlearn
pip install aif360          # AIF360 is not on conda-forge; use pip
pip install shap lime

Note on AIF360 installation: AIF360 has a larger set of dependencies than the other libraries and may take several minutes to install. If you encounter errors, try pip install aif360 --no-deps and then install dependencies manually. The AIF360 GitHub repository (linked in Further Resources) maintains a troubleshooting guide.

Version compatibility: All code in this appendix was written and tested against Python 3.10, scikit-learn 1.4, fairlearn 0.10, aif360 0.6, shap 0.45, and lime 0.2. Minor API differences may appear in earlier or later versions.
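Because metric implementations change between releases, it helps to record the installed versions in the audit output itself. A small sketch using only the standard library (the package list mirrors the table above):

```python
import importlib.metadata as md

# Print the exact version of each audit dependency, or flag it as missing
for pkg in ['scikit-learn', 'pandas', 'numpy', 'matplotlib',
            'seaborn', 'fairlearn', 'aif360', 'shap', 'lime']:
    try:
        print(f"{pkg:>14}: {md.version(pkg)}")
    except md.PackageNotFoundError:
        print(f"{pkg:>14}: NOT INSTALLED")
```

Keeping this output alongside the audit results makes the numbers reproducible months later.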

For exploratory fairness auditing, Jupyter Notebook or JupyterLab is strongly recommended. The ability to run individual cells, see output inline, and iterate quickly on visualizations makes the audit process far more practical. Install with:

pip install jupyterlab
jupyter lab

For production audit scripts intended for version control and team collaboration, VS Code with the Python extension and the Jupyter extension provides a good balance of interactivity and engineering discipline.

Sample Datasets

The examples in this appendix use three publicly available datasets that are commonly used in fairness research:

  • Adult Census Income dataset (also called "UCI Adult"): Predicts whether individual income exceeds $50K/year based on census data. Contains age, sex, race, education, and occupation. Available via sklearn.datasets or the UCI Machine Learning Repository.
  • German Credit dataset: Predicts credit risk (good/bad) for loan applicants. Contains age, sex, and various financial attributes. Available via AIF360 or the UCI repository.
  • COMPAS Recidivism dataset: The dataset at the center of the ProPublica investigation into racial bias in criminal risk scores (see Chapter 30). Available from ProPublica's GitHub: https://github.com/propublica/compas-analysis

Standard Import Block

Save this at the top of every fairness analysis notebook:

# Standard imports for AI fairness analysis
# Run this cell first in every fairness audit notebook

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.metrics import (
    confusion_matrix,
    classification_report,
    accuracy_score
)
import matplotlib.pyplot as plt
import seaborn as sns

# Set consistent plot styling throughout the notebook
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("colorblind")  # Use colorblind-friendly palette

# Suppress warnings from library version mismatches
import warnings
warnings.filterwarnings('ignore')

# Set random seed for reproducibility
RANDOM_STATE = 42

Section 2: Loading and Exploring Data for Fairness Analysis

Before training any model or computing any metric, you must understand the data. Fairness audits begin not with code but with questions: Who is in this dataset? Who is not? Are demographic groups represented in proportions that reflect the population the model will serve? These questions connect directly to Chapter 8 (Sources of Bias), where we examine how historical data encodes historical discrimination.

The following example uses the Adult Census Income dataset, which predicts whether a person earns more than $50,000 per year. This dataset has well-documented demographic disparities and is widely used in fairness benchmarking.

# =============================================================
# SECTION 2: Loading and Exploring the Adult Census Income Dataset
# =============================================================

# Load the dataset directly from the UCI repository.
# Column names are defined manually because the raw file lacks headers.
column_names = [
    'age', 'workclass', 'fnlwgt', 'education', 'education_num',
    'marital_status', 'occupation', 'relationship', 'race', 'sex',
    'capital_gain', 'capital_loss', 'hours_per_week', 'native_country',
    'income'
]

url = (
    "https://archive.ics.uci.edu/ml/machine-learning-databases/"
    "adult/adult.data"
)
df = pd.read_csv(url, names=column_names, sep=r',\s*', engine='python',
                 na_values='?')   # raw string avoids an invalid-escape warning

# Alternatively, load a local copy if you have one:
# df = pd.read_csv('data/adult.csv', names=column_names, na_values='?')

print(f"Dataset shape: {df.shape}")
print(f"Rows: {df.shape[0]:,}, Columns: {df.shape[1]}")
# --- STEP 1: Inspect the target variable and protected attributes ---

# The target variable: income (binary classification)
# Clean the label: strip whitespace and standardize
df['income'] = df['income'].str.strip()
df['income_binary'] = (df['income'] == '>50K').astype(int)

print("Target variable distribution:")
print(df['income_binary'].value_counts(normalize=True).round(3))
# Expected: roughly 76% earn <=50K, 24% earn >50K
# This class imbalance itself can introduce bias — see Chapter 8

# Protected attributes in this dataset:
# 'sex': binary (Male/Female in the original encoding — a limitation)
# 'race': five categories (White, Black, Asian-Pac-Islander, etc.)
print("\nSex distribution:")
print(df['sex'].value_counts(normalize=True).round(3))

print("\nRace distribution:")
print(df['race'].value_counts(normalize=True).round(3))
# --- STEP 2: Check representation by group ---
# The key question: does each group have enough representation to train on
# and to generate statistically meaningful fairness metrics?

# Cross-tabulate income by sex
sex_income = pd.crosstab(df['sex'], df['income_binary'],
                          normalize='index').round(3)
print("High-income rate by sex:")
print(sex_income)
# This already reveals the disparity: males earn >50K at a far higher rate

# Cross-tabulate income by race
race_income = pd.crosstab(df['race'], df['income_binary'],
                           normalize='index').round(3)
print("\nHigh-income rate by race:")
print(race_income)
# --- STEP 3: Visualize the disparities before modeling ---
# Visualization reveals patterns that tables obscure.
# Always visualize before training. The model will learn these patterns.

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Bar chart: high-income rate by sex
sex_income[1].plot(kind='bar', ax=axes[0], color=['#4878CF', '#D65F5F'])
axes[0].set_title('High-Income Rate by Sex\n(Adult Census Data)',
                   fontsize=13, fontweight='bold')
axes[0].set_xlabel('Sex')
axes[0].set_ylabel('Proportion earning >$50K')
axes[0].set_ylim(0, 0.5)
axes[0].tick_params(axis='x', rotation=0)

# Bar chart: high-income rate by race
race_income[1].sort_values(ascending=False).plot(
    kind='bar', ax=axes[1], color='steelblue'
)
axes[1].set_title('High-Income Rate by Race\n(Adult Census Data)',
                   fontsize=13, fontweight='bold')
axes[1].set_xlabel('Race')
axes[1].set_ylabel('Proportion earning >$50K')
axes[1].set_ylim(0, 0.5)
axes[1].tick_params(axis='x', rotation=45)

import os
os.makedirs('figures', exist_ok=True)   # savefig fails if the folder is absent

plt.tight_layout()
plt.savefig('figures/data_disparity_exploration.png', dpi=150,
            bbox_inches='tight')
plt.show()

# What you see here is the dataset's historical pattern —
# a pattern the model will learn unless we intervene (Chapter 9, Chapter 19)
# --- STEP 4: Handle missing values and prepare features ---

# Check for missing values
print("Missing values per column:")
print(df.isnull().sum()[df.isnull().sum() > 0])

# Drop rows with missing values for this introductory example.
# In a real audit, document and justify your missing-data strategy.
df_clean = df.dropna().copy()
print(f"\nRows after dropping NAs: {len(df_clean):,}")

# Encode categorical features
# We encode everything except the protected attributes for now —
# we will pass them separately to the fairness metrics
categorical_cols = ['workclass', 'education', 'marital_status',
                    'occupation', 'relationship', 'native_country']

df_encoded = pd.get_dummies(df_clean, columns=categorical_cols,
                             drop_first=True)

# Encode sex as binary (1 = Male, 0 = Female)
df_encoded['sex_binary'] = (df_encoded['sex'] == 'Male').astype(int)

# Store the protected attribute series separately for fairness metrics
sensitive_sex = df_encoded['sex_binary']
sensitive_race = df_encoded['race']

print("Features prepared. Shape:", df_encoded.shape)

The exploration above typically reveals two things. First, the demographic groups are not equally represented — there are far fewer non-White individuals in the dataset. Second, the outcome rates differ substantially across groups before any model is trained. These baseline disparities are the starting point for any fairness audit, not an afterthought.


Section 3: Computing Fairness Metrics

Fairness metrics quantify the extent to which a model's decisions differ across demographic groups. Chapter 9 introduces the main fairness criteria conceptually. This section shows how to compute them.

The four metrics computed here:

  • Demographic Parity Difference: The difference between the highest and lowest group positive prediction rates. A value of 0 means the model predicts a positive outcome at the same rate for all groups regardless of their actual outcomes. See Chapter 9 for the debate about when this is the right criterion.
  • Equalized Odds Difference: Measures whether true positive rates AND false positive rates are equal across groups. A value of 0 means the model is equally accurate and equally wrong across groups.
  • Equal Opportunity Difference: A relaxed version of equalized odds that focuses only on true positive rates. A value of 0 means qualified individuals have an equal chance of a positive prediction regardless of group membership.
  • Disparate Impact Ratio: The ratio of the lowest group's positive rate to the highest group's positive rate. The 80% rule (ratio >= 0.8) is a common legal threshold in employment contexts (EEOC guidelines). Below 0.8 is a prima facie indicator of disparate impact.
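Before reaching for library calls, the four definitions can be checked by hand. The counts below are invented purely for illustration (two hypothetical groups of 100 people each); the formulas follow the definitions listed above:

```python
# Hypothetical group A: 40/100 predicted positive; 30 of 50 actual
# positives caught, 10 of 50 actual negatives wrongly flagged.
# Hypothetical group B: 20/100 predicted positive; 15 of 50 caught,
# 5 of 50 wrongly flagged.
sel_a, sel_b = 40 / 100, 20 / 100   # selection (positive prediction) rates
tpr_a, tpr_b = 30 / 50, 15 / 50     # true positive rates
fpr_a, fpr_b = 10 / 50, 5 / 50      # false positive rates

dpd  = abs(sel_a - sel_b)                           # demographic parity diff
eod  = max(abs(tpr_a - tpr_b), abs(fpr_a - fpr_b))  # equalized odds diff
eopp = abs(tpr_a - tpr_b)                           # equal opportunity diff
di   = min(sel_a, sel_b) / max(sel_a, sel_b)        # disparate impact ratio

print(f"DPD={dpd:.2f}  EOD={eod:.2f}  EOpp={eopp:.2f}  DI={di:.2f}")
# DI of 0.50 fails the four-fifths (0.8) rule for this toy example
```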
# =============================================================
# SECTION 3: Training a Classifier and Computing Fairness Metrics
# =============================================================

from fairlearn.metrics import (
    demographic_parity_difference,
    equalized_odds_difference,
    MetricFrame
)
from sklearn.metrics import accuracy_score, recall_score, precision_score

# --- STEP 1: Define features and split the data ---

# Features used for prediction (excluding protected attributes directly)
feature_cols = [col for col in df_encoded.columns
                if col not in ['income', 'income_binary', 'sex',
                                'sex_binary', 'race']]

X = df_encoded[feature_cols]
y = df_encoded['income_binary']
sensitive = df_encoded['sex_binary']   # We'll audit by sex first

X_train, X_test, y_train, y_test, sens_train, sens_test = train_test_split(
    X, y, sensitive,
    test_size=0.3,
    random_state=RANDOM_STATE,
    stratify=y       # Stratify to preserve class balance in both splits
)

print(f"Training set: {len(X_train):,} rows")
print(f"Test set:     {len(X_test):,} rows")
# --- STEP 2: Train a logistic regression classifier ---
# We use logistic regression because it is interpretable and commonly
# deployed in high-stakes settings like credit and hiring.
# The fairness problems that appear here also appear in more complex models.

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)   # Use training set statistics

model = LogisticRegression(max_iter=1000, random_state=RANDOM_STATE)
model.fit(X_train_scaled, y_train)

# Generate predictions on the test set
y_pred = model.predict(X_test_scaled)
y_prob = model.predict_proba(X_test_scaled)[:, 1]  # Probability of the >50K class

print(f"Overall accuracy: {accuracy_score(y_test, y_pred):.3f}")
print("\nClassification report:")
print(classification_report(y_test, y_pred, target_names=['<=50K', '>50K']))
# --- STEP 3: Compute core fairness metrics using Fairlearn ---
# sensitive_features receives the protected attribute column
# from the TEST SET — never the training set

# DEMOGRAPHIC PARITY DIFFERENCE
# Question: Does the model predict high income at the same rate
# for male and female individuals?
dpd = demographic_parity_difference(
    y_true=y_test,
    y_pred=y_pred,
    sensitive_features=sens_test
)
print(f"Demographic Parity Difference (sex): {dpd:.4f}")
# Interpretation: a value of 0 = perfect demographic parity
# Positive value = the privileged group receives more positive predictions
# A value above 0.1 is generally considered problematic

# EQUALIZED ODDS DIFFERENCE
# Question: Does the model make errors at equal rates across groups?
eod = equalized_odds_difference(
    y_true=y_test,
    y_pred=y_pred,
    sensitive_features=sens_test
)
print(f"Equalized Odds Difference (sex):     {eod:.4f}")
# --- STEP 4: Use MetricFrame for a detailed breakdown ---
# MetricFrame is Fairlearn's most powerful object.
# It computes any metric you specify, broken down by group.

# Define the metrics you want to compute per group
metrics_dict = {
    'accuracy':   accuracy_score,
    'recall':     recall_score,      # = true positive rate
    'precision':  precision_score,
    'selection_rate': lambda y_true, y_pred: y_pred.mean()  # positive rate
}

# Create a MetricFrame for the sex attribute
mf = MetricFrame(
    metrics=metrics_dict,
    y_true=y_test,
    y_pred=y_pred,
    sensitive_features=sens_test
)

print("Overall metrics:")
print(mf.overall.round(3))

print("\nMetrics by group (0=Female, 1=Male):")
print(mf.by_group.round(3))

print("\nDifference between groups (max - min):")
print(mf.difference().round(3))
# Recall difference is the Equal Opportunity Difference —
# the gap in true positive rates between groups.
# Chapter 9 explains why this metric matters in hiring and lending.
# --- STEP 5: Compute the Disparate Impact Ratio manually ---
# Fairlearn does not include disparate impact ratio directly,
# but it is required for regulatory analysis in many jurisdictions

# Positive prediction rate per group
rates = mf.by_group['selection_rate']
print("Positive prediction rates by sex:")
print(rates.round(4))

disparate_impact_ratio = rates.min() / rates.max()
print(f"\nDisparate Impact Ratio: {disparate_impact_ratio:.4f}")

if disparate_impact_ratio < 0.8:
    print("WARNING: Ratio below 0.8 threshold.")
    print("This model may produce disparate impact under EEOC guidelines.")
    print("Consult legal counsel before deployment in employment contexts.")
else:
    print("Ratio >= 0.8. Passes the four-fifths rule for this attribute.")
    print("Note: passing this threshold does not preclude disparate impact")
    print("claims or mean the model is fair — see Chapter 9.")

These metrics should be computed for every protected attribute in your dataset, not just sex. Repeat the MetricFrame computation with sensitive_features=race_test, where race_test is the race column carried through the same train/test split (pass df_encoded['race'] as an additional array to train_test_split so it stays aligned with y_test and y_pred).
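The per-group breakdown can also be cross-checked without Fairlearn, using a plain pandas groupby. The sketch below runs on synthetic stand-in data (the group names and the y_true/y_pred columns are hypothetical); in a real audit you would build the frame from race_test, y_test, and y_pred:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 600
audit = pd.DataFrame({
    'group':  rng.choice(['White', 'Black', 'Asian-Pac-Islander'], size=n),
    'y_true': rng.integers(0, 2, size=n),
    'y_pred': rng.integers(0, 2, size=n),
})

def group_metrics(g):
    """Per-group selection rate, recall (TPR), and accuracy."""
    tp = int(((g['y_true'] == 1) & (g['y_pred'] == 1)).sum())
    positives = int((g['y_true'] == 1).sum())
    return pd.Series({
        'selection_rate': g['y_pred'].mean(),
        'recall': tp / positives if positives else float('nan'),
        'accuracy': (g['y_true'] == g['y_pred']).mean(),
    })

by_group = audit.groupby('group')[['y_true', 'y_pred']].apply(group_metrics)
print(by_group.round(3))
print((by_group.max() - by_group.min()).round(3))   # max-min differences
```

Agreement between this table and MetricFrame's by_group output is a useful sanity check before reporting results.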


Section 4: Visualizing Fairness

Numbers alone are difficult to communicate to non-technical stakeholders. Visualizations make fairness disparities concrete and actionable. Chapter 15 (Communicating AI Decisions) discusses how to present these findings to executives and affected communities.

# =============================================================
# SECTION 4: Visualizing Fairness Metrics
# =============================================================

# --- PLOT 1: MetricFrame bar charts (built-in Fairlearn visualization) ---
from fairlearn.metrics import MetricFrame

fig, axes = plt.subplots(2, 2, figsize=(12, 8))

metrics_to_plot = ['accuracy', 'recall', 'precision', 'selection_rate']
titles = ['Accuracy by Sex', 'Recall (TPR) by Sex',
          'Precision by Sex', 'Selection Rate by Sex']
colors = ['#4878CF', '#D65F5F']
group_labels = ['Female', 'Male']

for ax, metric, title in zip(axes.flatten(), metrics_to_plot, titles):
    values = mf.by_group[metric]
    bars = ax.bar(group_labels, values.values, color=colors, edgecolor='black',
                  linewidth=0.7)
    ax.set_title(title, fontweight='bold', fontsize=11)
    ax.set_ylabel('Score')
    ax.set_ylim(0, 1.0)
    # Add value labels on each bar
    for bar, val in zip(bars, values.values):
        ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 0.02,
                f'{val:.3f}', ha='center', va='bottom', fontsize=10)
    # Draw a reference line at the overall metric value
    overall_val = mf.overall[metric]
    ax.axhline(overall_val, color='gray', linestyle='--', linewidth=1,
               label=f'Overall: {overall_val:.3f}')
    ax.legend(fontsize=9)

plt.suptitle('Fairness Metrics by Sex — Logistic Regression on Adult Data',
             fontsize=13, fontweight='bold', y=1.01)
plt.tight_layout()
plt.savefig('figures/fairness_metrics_by_sex.png', dpi=150,
            bbox_inches='tight')
plt.show()
# --- PLOT 2: Confusion matrices side-by-side by group ---
# Separate confusion matrices show where errors concentrate

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

group_names = {0: 'Female', 1: 'Male'}
for group_val, ax in zip([0, 1], axes):
    mask = (sens_test == group_val)
    cm = confusion_matrix(y_test[mask], y_pred[mask])
    # Normalize by row (true label) to get rates
    cm_normalized = cm.astype(float) / cm.sum(axis=1, keepdims=True)
    sns.heatmap(cm_normalized, annot=True, fmt='.2%', cmap='Blues',
                ax=ax, cbar=False,
                xticklabels=['Pred <=50K', 'Pred >50K'],
                yticklabels=['Actual <=50K', 'Actual >50K'])
    ax.set_title(f'Confusion Matrix — {group_names[group_val]}',
                 fontweight='bold')
    ax.set_xlabel('Predicted Label')
    ax.set_ylabel('True Label')

plt.suptitle('Normalized Confusion Matrices by Sex', fontsize=12,
             fontweight='bold')
plt.tight_layout()
plt.savefig('figures/confusion_matrix_by_group.png', dpi=150,
            bbox_inches='tight')
plt.show()

# Reading these plots: A higher false negative rate for one group
# means that group's qualified members are more often wrongly denied.
# A higher false positive rate means that group is more often wrongly
# flagged — the error type matters enormously (Chapter 9, Chapter 30).

When reading these plots, pay attention to which type of error concentrates in which group. In lending models, a higher false negative rate for a protected group means qualified applicants from that group are disproportionately denied. In recidivism risk tools, a higher false positive rate for Black defendants — as documented by ProPublica in their COMPAS investigation — means innocent people from that group are more often incorrectly labeled as high-risk. The same confusion matrix value means different things depending on the decision context.
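To make the error-type distinction concrete, both rates can be read directly off a 2x2 confusion matrix in scikit-learn's layout. The counts here are invented for illustration:

```python
import numpy as np

# Hypothetical confusion matrix for one group, sklearn layout:
# rows = actual label (0 then 1), columns = predicted label (0 then 1)
cm = np.array([[80, 20],    # actual negatives: 80 TN, 20 FP
               [10, 40]])   # actual positives: 10 FN, 40 TP
tn, fp = cm[0]
fn, tp = cm[1]

fpr = fp / (fp + tn)   # wrongly flagged (the COMPAS-style harm)
fnr = fn / (fn + tp)   # wrongly denied (the lending-style harm)
print(f"FPR = {fpr:.2f}, FNR = {fnr:.2f}")
```

Computing fpr and fnr per group and comparing the gaps gives the same information as the heatmaps above, in a form that is easy to tabulate across many groups.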


Section 5: AI Fairness 360 (IBM)

IBM's AI Fairness 360 (AIF360) toolkit provides a broader set of bias metrics and mitigation algorithms than Fairlearn. Where Fairlearn ships a compact set of mitigation techniques (most prominently constrained optimization during training), AIF360 includes pre-processing, in-processing, and post-processing mitigation strategies, along with over 70 bias metrics from the academic literature.

AIF360 uses its own data container (BinaryLabelDataset), whose strict structure forces you to declare labels and protected attributes explicitly.

# =============================================================
# SECTION 5: Bias Detection and Mitigation with AIF360
# =============================================================

from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric, ClassificationMetric
from aif360.algorithms.preprocessing import Reweighing
from aif360.algorithms.postprocessing import EqOddsPostprocessing

# --- STEP 1: Convert pandas DataFrame to AIF360 format ---
# AIF360 requires its own BinaryLabelDataset container.
# You must specify which groups are 'privileged' and 'unprivileged'.
# These terms are analytical labels (historical advantage), not value judgments.

# Prepare a combined dataframe for AIF360
# AIF360 expects the label and sensitive feature to be in the dataframe
aif_df = df_encoded[feature_cols + ['income_binary', 'sex_binary']].copy()
aif_df = aif_df.dropna()

# Create AIF360 dataset
aif_dataset = BinaryLabelDataset(
    df=aif_df,
    label_names=['income_binary'],
    protected_attribute_names=['sex_binary'],
    favorable_label=1,       # 1 = high income (the 'good' outcome)
    unfavorable_label=0,
    privileged_protected_attributes=[[1]],    # 1 = Male (historically advantaged)
    unprivileged_protected_attributes=[[0]]   # 0 = Female
)

print("AIF360 dataset created.")
print(f"Number of instances: {aif_dataset.features.shape[0]:,}")
print(f"Number of features:  {aif_dataset.features.shape[1]}")
# --- STEP 2: Measure pre-mitigation bias in the raw data ---
# Before training anything, measure the bias in the labels themselves.
# If the historical data is biased, a model trained on it will be too.

privileged_groups   = [{'sex_binary': 1}]   # Male
unprivileged_groups = [{'sex_binary': 0}]   # Female

dataset_metric = BinaryLabelDatasetMetric(
    aif_dataset,
    privileged_groups=privileged_groups,
    unprivileged_groups=unprivileged_groups
)

print("Dataset-level bias metrics (before any model is trained):")
print(f"  Mean difference (base rate gap): "
      f"{dataset_metric.mean_difference():.4f}")
print(f"  Disparate impact ratio:          "
      f"{dataset_metric.disparate_impact():.4f}")

# A mean difference < 0 means the unprivileged group has a lower
# positive rate in the raw labels — i.e., the historical data itself
# reflects the very discrimination we are trying to avoid propagating.
# --- STEP 3: Apply Reweighing (pre-processing mitigation) ---
# Reweighing assigns different sample weights to individuals so that
# the effective positive rates become equal across groups.
# This is a PRE-PROCESSING technique — it adjusts the training data
# before the model sees it, rather than modifying the model itself.
# See Chapter 9 for the trade-offs between pre/in/post-processing.

# Split the AIF360 dataset into train and test
aif_train, aif_test = aif_dataset.split([0.7], shuffle=True,
                                         seed=RANDOM_STATE)

# Apply the Reweighing algorithm to the training set
RW = Reweighing(
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups
)
aif_train_rw = RW.fit_transform(aif_train)

# The reweighted dataset assigns higher weight to underrepresented
# (group, outcome) combinations. Train a model using these weights.
X_train_rw = aif_train_rw.features
y_train_rw = aif_train_rw.labels.ravel()
sample_weights = aif_train_rw.instance_weights

model_rw = LogisticRegression(max_iter=1000, random_state=RANDOM_STATE)
model_rw.fit(X_train_rw, y_train_rw, sample_weight=sample_weights)

print("Reweighed model trained.")
# --- STEP 4: Evaluate the reweighed model using AIF360 metrics ---

# Generate predictions on the test set
X_test_rw = aif_test.features
y_pred_rw = model_rw.predict(X_test_rw)

# Create AIF360 dataset objects from predictions
# (AIF360 wraps predictions in the same BinaryLabelDataset format)
aif_test_pred = aif_test.copy()
aif_test_pred.labels = y_pred_rw.reshape(-1, 1)

# Compute AIF360 ClassificationMetric
class_metric = ClassificationMetric(
    aif_test,                  # true labels
    aif_test_pred,             # predicted labels
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups
)

print("Post-mitigation fairness metrics (Reweighing):")
print(f"  Equal opportunity difference: "
      f"{class_metric.equal_opportunity_difference():.4f}")
print(f"  Average odds difference:      "
      f"{class_metric.average_odds_difference():.4f}")
print(f"  Disparate impact ratio:       "
      f"{class_metric.disparate_impact():.4f}")
print(f"  Overall accuracy:             "
      f"{class_metric.accuracy():.4f}")

# Important: improved fairness metrics often come at a cost to overall
# accuracy. Document both and let stakeholders make the trade-off decision.
# Chapter 9 and Chapter 19 discuss the accuracy-fairness trade-off.

AIF360's value is in its breadth: it implements algorithms such as Adversarial Debiasing, Calibrated Equalized Odds, Reject Option Classification, and Learning Fair Representations, among others. The GitHub repository includes detailed tutorials for each. For the German Credit and COMPAS datasets, AIF360 provides built-in loaders (GermanDataset, CompasDataset) that handle preprocessing automatically, making it easier to replicate published fairness benchmarks.


Section 6: SHAP for Bias Detection

SHAP (SHapley Additive exPlanations) is grounded in cooperative game theory and provides a principled way to attribute each feature's contribution to a specific prediction. For fairness auditing, SHAP is particularly useful for identifying proxy variables — features that are not protected attributes themselves but are strongly correlated with them. Chapter 14 (Explainable AI Techniques) and Chapter 8 (Sources of Bias) both discuss the proxy problem in detail.

A model trained without sex or race as explicit features can still discriminate based on zip code, last name, or occupation — features that correlate with protected attributes in historical data. SHAP helps surface these proxies.

# =============================================================
# SECTION 6: SHAP for Proxy Variable Detection
# =============================================================

import shap

# --- STEP 1: Compute SHAP values for the trained model ---
# We use LinearExplainer for logistic regression.
# For tree-based models (RandomForest, XGBoost), use TreeExplainer.
# For arbitrary models, use KernelExplainer (slower but universal).

# Use a background dataset (sample of training data) to anchor explanations
# A sample of 100-500 rows is usually sufficient for linear models
background = shap.sample(X_train_scaled, 200, random_state=RANDOM_STATE)

# LinearExplainer takes the fitted model and a background sample directly
explainer = shap.LinearExplainer(
    model,
    background,
    feature_perturbation='interventional'
)

# Compute SHAP values for the test set (use a sample for speed)
X_test_sample = X_test_scaled[:500]
shap_values = explainer.shap_values(X_test_sample)

# shap_values is a matrix: one row per instance, one column per feature
# Positive SHAP values push the prediction toward high income (class 1)
# Negative SHAP values push it toward low income (class 0)
print(f"SHAP values shape: {shap_values.shape}")
print("Shape matches: (samples, features)")
# --- STEP 2: Global summary plot — identify the top features ---
# The summary plot shows both feature importance (how often each feature
# matters across all predictions) and feature effect direction.

feature_names = X_test.columns.tolist()

plt.figure(figsize=(10, 8))
shap.summary_plot(
    shap_values,
    X_test_sample,
    feature_names=feature_names,
    show=False,
    max_display=20    # Show the top 20 features by mean |SHAP|
)
plt.title('SHAP Summary Plot — Top 20 Features\n(Income Prediction Model)',
          fontsize=12, fontweight='bold')
plt.tight_layout()
plt.savefig('figures/shap_summary_plot.png', dpi=150, bbox_inches='tight')
plt.show()

# Reading this plot:
# - Y-axis: features ranked by mean |SHAP value| (importance)
# - X-axis: SHAP value for each instance
# - Color: feature value (red = high, blue = low)
# Look for features that correlate with sex or race at the top of the chart
# (e.g., marital status, occupation, relationship type in this dataset)
# --- STEP 3: Compute mean absolute SHAP value per feature ---
# A ranked table makes it easy to identify potential proxy variables

mean_abs_shap = pd.DataFrame({
    'feature': feature_names,
    'mean_abs_shap': np.abs(shap_values).mean(axis=0)
}).sort_values('mean_abs_shap', ascending=False)

print("Top 15 features by mean |SHAP value|:")
print(mean_abs_shap.head(15).to_string(index=False))

# Cross-reference with correlation to protected attributes.
# If 'relationship_Wife' or 'marital_status_Married' appear near the top,
# that is a signal that the model is using features that are strongly
# correlated with sex — a proxy discrimination pattern.
# --- STEP 4: Waterfall plot for a single prediction ---
# After identifying global patterns, inspect individual predictions.
# This is the foundation of individual right-to-explanation requirements
# discussed in Chapter 17.

# Select a single instance for explanation
instance_index = 5
instance = X_test_sample[instance_index].reshape(1, -1)
instance_shap = shap_values[instance_index]

# Create a SHAP Explanation object for the waterfall plot
explanation = shap.Explanation(
    values=instance_shap,
    base_values=explainer.expected_value,
    data=X_test_sample[instance_index],
    feature_names=feature_names
)

plt.figure(figsize=(10, 6))
shap.plots.waterfall(explanation, show=False, max_display=12)
plt.title(f'SHAP Waterfall — Instance #{instance_index}',
          fontsize=12, fontweight='bold')
plt.tight_layout()
plt.savefig('figures/shap_waterfall.png', dpi=150, bbox_inches='tight')
plt.show()

# Reading the waterfall plot:
# - The base value (E[f(x)]) is the model's average prediction
# - Each bar shows how much that feature pushed the prediction up or down
# - The final value (f(x)) is the model's prediction for this instance
# - Features in red push toward high income; blue push toward low income

SHAP values alone cannot tell you whether proxy discrimination is occurring — that requires comparing the feature importance rankings against the correlation structure of the data. If the top SHAP features are also strongly correlated with a protected attribute (which you can check with df_clean.groupby('sex')[top_features].mean()), you have a signal worth investigating further and flagging in your audit documentation.
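The cross-check described above can be sketched in a few lines. The snippet below uses a toy DataFrame and a hypothetical top_features ranking so it runs standalone; in the audit itself you would substitute df_clean and the top rows of mean_abs_shap.

```python
import pandas as pd

# Toy stand-in for df_clean (hypothetical values, for illustration only)
df_toy = pd.DataFrame({
    'sex': ['Male', 'Male', 'Female', 'Female', 'Female', 'Male'],
    'relationship_Wife': [0, 0, 1, 1, 0, 0],
    'hours_per_week': [45, 50, 35, 30, 40, 55],
})

# Suppose these were the top features by mean |SHAP| (hypothetical ranking)
top_features = ['relationship_Wife', 'hours_per_week']

# Compare group means: a large gap between groups flags a potential proxy
group_means = df_toy.groupby('sex')[top_features].mean()
print(group_means)

# Quantify the gap per feature (absolute difference of group means)
gap = (group_means.loc['Male'] - group_means.loc['Female']).abs()
print(gap.sort_values(ascending=False))
```

A feature that ranks high on mean |SHAP| and shows a large group-mean gap is exactly the pattern worth flagging in the audit documentation.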


Section 7: LIME for Individual Explanation

Where SHAP provides consistent, theoretically grounded attributions across the entire dataset, LIME (Local Interpretable Model-agnostic Explanations) focuses exclusively on explaining a single prediction. LIME works by perturbing the input around a specific instance and fitting a simple linear model to the perturbations. This makes it model-agnostic and easy to interpret for non-technical audiences.

LIME is particularly suited for the scenario described in Chapter 17: a person denied a loan wants to understand which specific factors led to their denial, in terms they can act on.
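LIME's perturb-and-fit idea can be illustrated without the library itself. The sketch below assumes nothing from the sections above: it perturbs an instance of a made-up black-box model, weights the perturbations with an RBF proximity kernel, and fits a weighted linear surrogate. This is the core of what LimeTabularExplainer does internally, though the real implementation adds discretization and feature selection.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# A hypothetical black box: the prediction depends on feature 0 only
def black_box_proba(X):
    return 1 / (1 + np.exp(-3 * X[:, 0]))

x0 = np.array([0.5, -1.0])            # the instance to explain

# 1. Perturb around the instance
Z = x0 + rng.normal(scale=0.5, size=(500, 2))

# 2. Weight perturbations by proximity to x0 (RBF kernel, as LIME does)
dist2 = ((Z - x0) ** 2).sum(axis=1)
weights = np.exp(-dist2 / 0.5)

# 3. Fit a weighted linear surrogate to the black-box outputs
surrogate = Ridge(alpha=1.0).fit(Z, black_box_proba(Z), sample_weight=weights)
print(surrogate.coef_)   # local attribution: feature 0 dominates
```

The surrogate's coefficients play the role of LIME's feature weights: they describe the model's behavior in the neighborhood of x0, not globally.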

# =============================================================
# SECTION 7: LIME for Individual Prediction Explanations
# =============================================================

import lime
import lime.lime_tabular

# --- STEP 1: Create the LIME explainer ---
# LIME needs the training data to understand the feature distributions
# it will use for perturbation

lime_explainer = lime.lime_tabular.LimeTabularExplainer(
    training_data=X_train_scaled,
    feature_names=feature_names,
    class_names=['<=50K', '>50K'],   # Human-readable class labels
    mode='classification',
    discretize_continuous=True,      # Makes continuous features interpretable
    random_state=RANDOM_STATE
)

print("LIME explainer created.")
print(f"Training data shape: {X_train_scaled.shape}")

# --- STEP 2: Explain a single denied application ---
# Select an instance that was predicted as low income (class 0).
# In a lending context, this would represent a credit denial.

# Find an instance where the prediction is 0 (denied)
denied_indices = np.where(y_pred == 0)[0]
# Select the first denied instance for explanation
explain_idx = denied_indices[0]
instance_to_explain = X_test_scaled[explain_idx]

print(f"Explaining instance {explain_idx}")
print(f"  True label: {'High income' if y_test.iloc[explain_idx]==1 else 'Low income'}")
print(f"  Predicted:  {'High income' if y_pred[explain_idx]==1 else 'Low income'}")
print(f"  Predicted probability of high income: "
      f"{y_prob[explain_idx]:.3f}")

# Generate the LIME explanation
# num_features controls how many features appear in the explanation
lime_exp = lime_explainer.explain_instance(
    data_row=instance_to_explain,
    predict_fn=model.predict_proba,
    num_features=10,
    num_samples=5000    # More samples = more stable explanation, but slower
)

# --- STEP 3: Visualize the LIME explanation ---

fig = lime_exp.as_pyplot_figure()
fig.set_size_inches(10, 6)
plt.title(f'LIME Explanation — Instance {explain_idx}\n'
          f'Predicted: Low Income (Probability: {y_prob[explain_idx]:.3f})',
          fontsize=12, fontweight='bold')
plt.tight_layout()
plt.savefig('figures/lime_explanation.png', dpi=150, bbox_inches='tight')
plt.show()

# The explanation shows (in LIME's matplotlib figure):
# - Green bars: features that support the 'high income' prediction
# - Red bars: features that push toward 'low income'
# - Bar length: the size of each feature's local influence
# - Feature conditions: LIME expresses contributions as rules
#   (e.g., "capital_gain <= 0" pushes strongly toward low income)

# Print the explanation as a list of (feature, weight) pairs
print("\nLIME feature contributions:")
for feature, weight in lime_exp.as_list():
    direction = "supports High Income" if weight > 0 else "supports Low Income"
    print(f"  {feature:50s}  weight: {weight:+.4f}  ({direction})")

Limitations to communicate alongside LIME results: LIME explanations are local approximations and may not be stable. Running LIME twice on the same instance with different random seeds can produce slightly different explanations. The features selected by LIME's local linear model are the features that matter most in the neighborhood of this specific point — they may not be globally important features. Never present a LIME explanation as definitive proof of why a model made a decision. Use it as a starting point for investigation and conversation, not as a final answer. Chapter 14 discusses the difference between approximation-based explanations and mechanistic explanations in more depth.
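One way to act on this caveat is to quantify stability across runs. The helper below (topk_jaccard is a hypothetical name, not part of LIME) compares the top-k features from two explanation runs; each argument is a dict of feature-to-weight pairs such as you would get from dict(lime_exp.as_list()). The example weights are illustrative.

```python
def topk_jaccard(weights_a, weights_b, k=5):
    """Overlap of the top-k features (by |weight|) between two explanation runs.

    A value near 1.0 suggests a stable explanation; a low value means the
    explanation should be reported with an explicit stability caveat.
    """
    def top(w):
        return {f for f, _ in sorted(w.items(), key=lambda kv: -abs(kv[1]))[:k]}
    a, b = top(weights_a), top(weights_b)
    return len(a & b) / len(a | b)

# Two hypothetical LIME runs on the same instance with different seeds
run1 = {'capital_gain': -0.31, 'education_num': 0.12, 'age': 0.08, 'hours': 0.05}
run2 = {'capital_gain': -0.29, 'education_num': 0.10, 'hours': 0.07, 'sex': 0.02}
print(topk_jaccard(run1, run2, k=3))   # → 0.5: two of the top three agree
```

Reporting a stability score alongside the explanation is a simple way to communicate the approximation honestly.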


Section 8: A Complete Fairness Audit Workflow

The following is a condensed end-to-end workflow that brings together all the tools introduced in this appendix. This mirrors the audit process described in Chapter 19 (Auditing AI Systems). For full implementations of each step, refer to the chapter code files: ch09 for metrics, ch11 for financial services applications, ch14 for explainability, ch19 for the audit framework, and ch27 for privacy-aware auditing.

# =============================================================
# SECTION 8: End-to-End Fairness Audit — Condensed Workflow
# =============================================================

print("=" * 60)
print("FAIRNESS AUDIT REPORT")
print("Model: Logistic Regression — Income Prediction")
print("Dataset: Adult Census Income")
print("Auditor: [Your name]")
print("Date: [Today's date]")
print("=" * 60)

# --- STEP 1: DATA AUDIT ---
print("\n[1] DATA AUDIT")
print(f"  Total records (cleaned): {len(df_clean):,}")
print(f"  Protected attributes:    sex, race")
print(f"  Positive outcome rate:   {y.mean():.3f} (overall)")
print(f"  Male positive rate:      "
      f"{df_clean[df_clean['sex']=='Male']['income_binary'].mean():.3f}")
print(f"  Female positive rate:    "
      f"{df_clean[df_clean['sex']=='Female']['income_binary'].mean():.3f}")

# --- STEP 2: MODEL PERFORMANCE ---
print("\n[2] MODEL PERFORMANCE")
print(f"  Overall accuracy:  {accuracy_score(y_test, y_pred):.4f}")
print(f"  Overall recall:    {recall_score(y_test, y_pred):.4f}")
print(f"  Overall precision: {precision_score(y_test, y_pred):.4f}")

# --- STEP 3: FAIRNESS METRICS ---
print("\n[3] FAIRNESS METRICS — SEX")
print(f"  Demographic parity difference: {dpd:.4f}")
print(f"  Equalized odds difference:     {eod:.4f}")
print(f"  Disparate impact ratio:        "
      f"{disparate_impact_ratio:.4f} "
      f"({'PASS' if disparate_impact_ratio >= 0.8 else 'FAIL'} 4/5 rule)")
print("\n  Metrics by group:")
print(mf.by_group[['accuracy', 'recall', 'selection_rate']].round(3))

# --- STEP 4: PROXY VARIABLE ANALYSIS ---
print("\n[4] PROXY VARIABLE ANALYSIS (SHAP)")
print("  Top 5 features by mean |SHAP value|:")
for _, row in mean_abs_shap.head(5).iterrows():
    print(f"    {row['feature']:45s}  {row['mean_abs_shap']:.4f}")
print("  --> Cross-reference with sex/race correlation for proxy risk")

# --- STEP 5: MITIGATION APPLIED ---
print("\n[5] MITIGATION APPLIED")
print("  Method: Reweighing (AIF360 pre-processing)")
print("  Status: Applied to training set")
print("  Post-mitigation equalized odds difference: (re-run metrics above)")

# --- STEP 6: RE-EVALUATION ---
print("\n[6] RECOMMENDATION")
if abs(dpd) > 0.1 or abs(eod) > 0.1 or disparate_impact_ratio < 0.8:
    print("  FINDING: Significant fairness disparities detected.")
    print("  ACTION:  Do not deploy without stakeholder review.")
    print("           Consider reweighing, constraint-based training,")
    print("           or data collection improvements.")
else:
    print("  FINDING: Metrics within acceptable ranges.")
    print("  ACTION:  Proceed to human review before deployment.")
    print("           Document this audit and schedule re-audit quarterly.")

# --- STEP 7: DOCUMENTATION ---
print("\n[7] AUDIT DOCUMENTATION")
print("  Save this report and all figures to the model card.")
print("  Files to archive: data_disparity_exploration.png,")
print("                    fairness_metrics_by_sex.png,")
print("                    confusion_matrix_by_group.png,")
print("                    shap_summary_plot.png")
print("  Model card template: see Chapter 19, Exhibit 19.3")
print("=" * 60)

The output of this workflow is not a pass or fail verdict — it is structured evidence for a human decision. Chapter 19 discusses how to present this evidence to an audit committee, legal team, or regulatory body.


Section 9: Interpreting Results Responsibly

Fairness metrics are tools for investigation, not arbiters of ethical acceptability. Every number produced in the preceding sections requires careful interpretation.

What fairness metrics tell you: That a measurable disparity exists in model behavior across groups, as defined by the specific metric you chose. A demographic parity difference of 0.15 tells you that the model predicts the favorable outcome 15 percentage points more often for one group than another.
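For two groups, the metric reduces to a difference of selection rates, which you can verify by hand; Fairlearn's demographic_parity_difference generalizes this to the max-minus-min across all groups. A toy sketch with synthetic predictions:

```python
import numpy as np

# Hypothetical predictions and group labels for ten test instances
y_pred_demo = np.array([1, 0, 1, 1, 0, 1, 0, 0, 0, 0])
group       = np.array(['M', 'M', 'M', 'M', 'M', 'F', 'F', 'F', 'F', 'F'])

rate_m = y_pred_demo[group == 'M'].mean()   # selection rate for group M
rate_f = y_pred_demo[group == 'F'].mean()   # selection rate for group F
dpd_demo = rate_m - rate_f
print(f"Demographic parity difference: {dpd_demo:.2f}")   # → 0.40
```

Here the model selects 60% of one group and 20% of the other: a 40-percentage-point gap, which is the number the metric reports. What to do about it is the interpretive question this section addresses.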

What fairness metrics do not tell you:

  • Whether the disparity is caused by the model or reflects real-world inequality (and whether that distinction matters for your use case)
  • Whether the disparity is legally actionable under applicable law in your jurisdiction
  • Whether the disparity is practically significant enough to cause harm at your deployment scale
  • Whether fixing this disparity will create new disparities for other groups or other definitions of fairness (as Chapter 9's impossibility results show, it is mathematically impossible to satisfy all fairness criteria simultaneously when base rates differ)
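The impossibility result in the last bullet can be checked numerically. Rearranging the identity PPV = TPR·p / (TPR·p + FPR·(1−p)) gives the false positive rate implied by a prevalence p, PPV, and TPR; holding PPV and TPR equal across two groups with different base rates then forces their FPRs apart, so predictive parity and equalized odds cannot both hold. A sketch with illustrative numbers:

```python
def implied_fpr(prevalence, ppv, tpr):
    # From PPV = TPR*p / (TPR*p + FPR*(1-p)), solved for FPR
    return (prevalence / (1 - prevalence)) * ((1 - ppv) / ppv) * tpr

# Same PPV and TPR for both groups, but different base rates:
fpr_a = implied_fpr(prevalence=0.30, ppv=0.70, tpr=0.80)
fpr_b = implied_fpr(prevalence=0.10, ppv=0.70, tpr=0.80)
print(round(fpr_a, 4), round(fpr_b, 4))   # → 0.1469 0.0381
```

The two false positive rates differ by construction, which is the arithmetic core of the Chouldechova (2017) result cited in Further Resources.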

Statistical significance with small groups: If any demographic group in your test set has fewer than 100 instances, treat the fairness metrics for that group with extreme caution. Small samples produce high-variance estimates. Use confidence intervals. Bootstrap the metrics across multiple test set samples. If a group is too small for reliable estimation, document this limitation prominently and consider targeted data collection before deployment.
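A minimal bootstrap sketch, using synthetic predictions and a deliberately small group B; in practice you would resample your actual test set and report the interval alongside the point estimate:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic test set: group B is only ~10% of the data (hypothetical)
n = 400
group = rng.choice(['A', 'B'], size=n, p=[0.9, 0.1])
y_pred_bs = rng.binomial(1, np.where(group == 'A', 0.35, 0.25))

def selection_rate_gap(pred, grp):
    return pred[grp == 'A'].mean() - pred[grp == 'B'].mean()

# Bootstrap the gap to see how uncertain the point estimate is
boots = []
for _ in range(2000):
    idx = rng.integers(0, n, size=n)
    g = group[idx]
    if (g == 'A').any() and (g == 'B').any():   # guard against empty groups
        boots.append(selection_rate_gap(y_pred_bs[idx], g))

lo, hi = np.percentile(boots, [2.5, 97.5])
print(f"Selection-rate gap 95% CI: [{lo:.3f}, {hi:.3f}]")
```

If the interval is wide, or spans zero, a single point estimate of the disparity is not a sound basis for a deployment decision, and the report should say so.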

When mitigation is and is not appropriate: Technical mitigation (reweighing, constraint-based training) is appropriate when the disparity is attributable to the model or training data, the mitigation does not introduce new harms, stakeholders have reviewed and accepted the trade-offs, and the underlying business objective is ethically sound. Mitigation is not appropriate as a substitute for addressing the root causes of inequality in your data or decision context, or when the model should not be deployed at all due to fundamental ethical concerns about the use case.
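For reference, the reweighing idea itself is simple enough to sketch directly: each (group, label) combination receives weight P(group)·P(label) / P(group, label), which equalizes label rates across groups in the weighted data. This is the Kamiran–Calders scheme that AIF360's Reweighing pre-processor implements; the toy version below is illustrative, not a substitute for the library.

```python
import pandas as pd

# Toy training data: 'sex' is the protected attribute, 'y' the label
df_rw = pd.DataFrame({
    'sex': ['M', 'M', 'M', 'M', 'F', 'F', 'F', 'F'],
    'y':   [1,   1,   1,   0,   1,   0,   0,   0],
})

n = len(df_rw)
p_s = df_rw['sex'].value_counts(normalize=True)
p_y = df_rw['y'].value_counts(normalize=True)
p_sy = df_rw.groupby(['sex', 'y']).size() / n

# weight(s, y) = P(s) * P(y) / P(s, y)  — upweights under-represented combos
weights = df_rw.apply(
    lambda r: p_s[r['sex']] * p_y[r['y']] / p_sy[(r['sex'], r['y'])], axis=1)
print(weights.round(3).tolist())
```

In this toy data, 3 of 4 men but only 1 of 4 women have the positive label; the weights (2.0 on the rare combinations, 2/3 on the common ones) make the weighted positive rate 0.5 for both groups. The weights are then passed as sample_weight during training.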

Documenting your analysis: Every fairness audit should produce a written record that includes: the dataset used and its known limitations, the protected attributes examined, the metrics computed and their values, any mitigation applied, the accuracy-fairness trade-off accepted, who reviewed the results, and when the next audit is scheduled. Chapter 19 provides an audit documentation template.
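A minimal machine-readable version of such a record, with hypothetical field names and metric values; adapt the schema to your organization's model card template:

```python
import json

# All field names and values here are illustrative placeholders
audit_record = {
    'model': 'logistic_regression_income_v1',
    'dataset': 'Adult Census Income (cleaned)',
    'dataset_limitations': '1994 US census; labels encode historical inequality',
    'protected_attributes': ['sex', 'race'],
    'metrics': {
        'demographic_parity_difference': 0.15,
        'equalized_odds_difference': 0.09,
        'disparate_impact_ratio': 0.72,
    },
    'mitigation': 'reweighing (pre-processing)',
    'tradeoff_accepted': 'accuracy -0.8pp for DPD -0.06',
    'reviewed_by': ['[Your name]'],
    'audit_date': '[Today\'s date]',
    'next_audit_due': '[quarterly]',
}

record_json = json.dumps(audit_record, indent=2)
print(record_json)
```

Archiving the JSON alongside the figures makes the audit reproducible and diffable across quarterly re-audits.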

The human judgment requirement: No code can substitute for the judgment required to decide whether a disparity is acceptable, who was harmed, and what remedy is appropriate. These are ethical, legal, and organizational questions that require human accountability. Use these tools to inform that judgment, not to replace it.


Section 10: Common Errors and Troubleshooting

Installation Errors

ImportError: cannot import name 'X' from 'fairlearn' Fairlearn has undergone API changes across versions. Run pip show fairlearn to check your version. This appendix targets fairlearn 0.10. If you are on 0.7 or earlier, some metric names differ. Consult the migration guide in the Fairlearn changelog.
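A quick way to record the versions you audited with (useful for the audit documentation as well as for debugging) is importlib.metadata from the standard library:

```python
from importlib import metadata

def installed_version(pkg):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(pkg)
    except metadata.PackageNotFoundError:
        return None

for pkg in ('fairlearn', 'aif360', 'shap', 'lime'):
    print(f"{pkg}: {installed_version(pkg) or 'NOT INSTALLED'}")
```

Pinning these versions in the audit record prevents a re-audit from silently running against a different API.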

ModuleNotFoundError: No module named 'cvxpy' AIF360's post-processing algorithms depend on cvxpy for optimization. Install it separately: pip install cvxpy. On Windows, this may require Visual C++ build tools.

pip install aif360 fails with dependency conflicts Try pip install aif360 --no-deps then pip install BlackBoxAuditing. The AIF360 repository README lists minimum required dependencies.

Data Format Issues

ValueError: sensitive_features must be a 1-D array Fairlearn expects sensitive_features as a 1-D array or pandas Series. If you pass a DataFrame column, ensure it is a Series: df['sex_binary'] not df[['sex_binary']]. Two square brackets returns a DataFrame; one returns a Series.
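The distinction is easy to see directly:

```python
import pandas as pd

df_demo = pd.DataFrame({'sex_binary': [0, 1, 1, 0]})

print(type(df_demo['sex_binary']).__name__)     # Series — what Fairlearn wants
print(type(df_demo[['sex_binary']]).__name__)   # DataFrame — causes the error
```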

ValueError: y_true and y_pred must have the same shape This usually occurs after a train-test split where index alignment is lost. Reset indices before passing to metrics: y_test.reset_index(drop=True).

AIF360 BinaryLabelDataset rejects your DataFrame AIF360 is strict about column types. Ensure all feature columns are numeric (use df.dtypes to check) and that the label column contains only 0 and 1. Categorical columns not encoded as integers will cause errors.
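A quick pre-flight check along these lines, on a toy DataFrame with one deliberately unencoded column:

```python
import pandas as pd

df_chk = pd.DataFrame({
    'age': [25, 40, 31],
    'occupation': ['Sales', 'Tech', 'Sales'],   # object dtype — will be rejected
    'income_binary': [0, 1, 0],
})

# Any non-numeric feature columns must be encoded before dataset creation
non_numeric = df_chk.select_dtypes(exclude='number').columns.tolist()
print("Columns needing encoding:", non_numeric)

# The label column must contain only 0 and 1
bad_labels = set(df_chk['income_binary'].unique()) - {0, 1}
print("Unexpected label values:", bad_labels or 'none')
```

Running this before constructing the BinaryLabelDataset turns an opaque AIF360 error into an explicit to-do list.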

Sensitive Features Parameter Conventions

Different libraries use different parameter names and formats:

  • Fairlearn: sensitive_features= — accepts a pandas Series, 1-D numpy array, or DataFrame with one column
  • AIF360: sensitive attributes are defined at dataset creation time in protected_attribute_names= — not passed to metric functions separately
  • SHAP: SHAP itself is not fairness-aware; you analyze feature attributions by group manually after computing SHAP values

NaN Handling

All three libraries are intolerant of NaN values. Before passing data to any fairness library, run:

assert X_test.isnull().sum().sum() == 0, "NaNs in features"
assert y_test.isnull().sum() == 0, "NaNs in labels"
assert sensitive.isnull().sum() == 0, "NaNs in sensitive features"

Missing values in the sensitive features column are a particularly common source of errors because NaN handling during preprocessing often drops rows inconsistently, causing length mismatches.
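A defensive pattern is to build one row mask and apply it to all three objects at once, so the drops can never go out of sync. A toy sketch:

```python
import numpy as np
import pandas as pd

X = pd.DataFrame({'f1': [1.0, np.nan, 3.0, 4.0]})
y = pd.Series([0, 1, 1, 0])
sensitive = pd.Series(['M', 'F', np.nan, 'F'])

# Build ONE mask and apply it everywhere so lengths stay aligned
keep = X.notnull().all(axis=1) & y.notnull() & sensitive.notnull()
X, y, sensitive = X[keep], y[keep], sensitive[keep]
print(len(X), len(y), len(sensitive))   # all three have the same length
```

Dropping rows from X, y, and the sensitive column in separate steps is exactly how the length mismatches described above arise.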

Small Sample Size Warnings

If Fairlearn prints UserWarning: The number of samples in one or more groups is small, the metric estimates for those groups may be unreliable. Do not include those group-level metrics in a report without flagging the sample size limitation. Consider aggregating rare categories (e.g., grouping race categories with fewer than 200 test instances under "Other" with an explicit methodological note) and documenting the aggregation decision.
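One way to implement that aggregation (the 200-instance threshold follows the note above; tune it to your data, and document the choice):

```python
import pandas as pd

# Synthetic category counts for illustration
race = pd.Series(['White'] * 500 + ['Black'] * 250
                 + ['Amer-Indian-Eskimo'] * 40 + ['Other'] * 30)

counts = race.value_counts()
rare = counts[counts < 200].index                  # categories too small to estimate
race_grouped = race.where(~race.isin(rare), 'Other')
print(race_grouped.value_counts().to_dict())
```

Group-level metrics computed on race_grouped will be more stable, at the documented cost of losing per-category visibility for the aggregated groups.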


Further Resources

Library Documentation

  • Fairlearn: https://fairlearn.org/main/user_guide/ — The user guide covers all metrics and mitigation algorithms with worked examples. The "Fairness in Machine Learning" tutorial on the Fairlearn website is the best starting point.

  • AI Fairness 360 (AIF360): https://aif360.mybluemix.net/ — IBM's interactive demo lets you explore bias metrics before writing code. The GitHub repository at https://github.com/Trusted-AI/AIF360 includes dataset-specific notebooks for Adult, German Credit, COMPAS, and several medical datasets.

  • SHAP: https://shap.readthedocs.io/ — The official documentation includes notebooks for tree models, linear models, and deep learning models. The "TabularExamples" section is most relevant for the use cases in this textbook.

  • LIME: https://lime-ml.readthedocs.io/ — The repository at https://github.com/marcotcr/lime includes notebooks for tabular, text, and image explanations.

Key Papers

  • Barocas, S., Hardt, M., and Narayanan, A. (2023). Fairness and Machine Learning: Limitations and Opportunities. MIT Press. (Available free online at https://fairmlbook.org) — This is the definitive technical reference for the concepts underlying all the metrics in this appendix.

  • Hardt, M., Price, E., and Srebro, N. (2016). "Equality of Opportunity in Supervised Learning." Advances in Neural Information Processing Systems 29. — The paper that introduced equalized odds and equal opportunity as formal criteria.

  • Ribeiro, M.T., Singh, S., and Guestrin, C. (2016). "'Why Should I Trust You?': Explaining the Predictions of Any Classifier." KDD 2016. — The original LIME paper.

  • Lundberg, S.M. and Lee, S.I. (2017). "A Unified Approach to Interpreting Model Predictions." Advances in Neural Information Processing Systems 30. — The original SHAP paper.

  • Chouldechova, A. (2017). "Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments." Big Data, 5(2). — Demonstrates the impossibility of simultaneous fairness criteria with the COMPAS dataset.

Additional Tools

Microsoft Responsible AI Toolbox: A unified interface for fairness assessment, error analysis, counterfactual analysis, and causal inference. Integrates Fairlearn and InterpretML in a single dashboard. Available at https://responsibleaitoolbox.ai/ and installable via pip install raiwidgets.

Google What-If Tool: A browser-based tool that requires no coding. Upload a model and dataset and interactively explore fairness metrics, counterfactuals, and individual predictions. Accessible at https://pair-code.github.io/what-if-tool/. Particularly useful for communicating fairness concepts to stakeholders who do not write code — a scenario discussed in Chapter 15.

Aequitas: A bias and fairness audit toolkit from the Center for Data Science and Public Policy at the University of Chicago. Provides an interactive web interface and a Python API. Well-suited for public sector and criminal justice applications. Available at https://github.com/dssg/aequitas.

Scikit-fairness (skfair): A scikit-learn compatible library with additional preprocessing steps for fairness-aware pipelines. Available via pip install scikit-fairness.


This appendix provides starting-point implementations. Real-world fairness audits require deeper engagement with the specific regulatory environment, affected communities, and deployment context of each system. The code here will help you identify what to investigate — human judgment, legal expertise, and community engagement determine what to do about it.