Chapter 39 Exercises: AI Safety, Ethics, and Governance
Conceptual Exercises
Exercise 1: Identifying Bias Sources
For each scenario, identify the type(s) of bias present (selection, historical, measurement, algorithmic, deployment) and propose a mitigation strategy.
a) A resume screening model trained on a tech company's hiring history (80% male engineers) ranks male candidates higher.
b) A medical AI trained on data from major hospitals performs poorly for rural populations.
c) A credit scoring model uses zip code as a feature, which correlates strongly with race.
d) A content recommendation system shows users increasingly extreme content because engagement metrics reward it.
e) A recidivism prediction tool uses arrest records, which reflect over-policing of minority neighborhoods.
Exercise 2: Fairness Definitions
Consider a loan approval model with the following confusion matrix for two groups:
| Group | TP | FP | FN | TN |
|---|---|---|---|---|
| Group A (privileged) | 800 | 50 | 100 | 1050 |
| Group B (unprivileged) | 150 | 30 | 120 | 700 |
a) Compute the positive prediction rate, TPR, and FPR for each group.
b) Does the model satisfy demographic parity? Equalized odds? Equal opportunity?
c) Which fairness definition would you prioritize for a loan approval system, and why?
d) Can you adjust the classification threshold for Group B to achieve equal opportunity? What would the new threshold look like?
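A minimal computation sketch for part (a), assuming the counts in the table above; it uses the standard definitions (positive prediction rate = (TP + FP)/N, TPR = TP/(TP + FN), FPR = FP/(FP + TN)) and leaves the fairness judgments in (b)–(d) to you.

```python
# Sketch: per-group rates from the confusion-matrix counts in Exercise 2.
def group_rates(tp, fp, fn, tn):
    n = tp + fp + fn + tn
    return {
        "positive_prediction_rate": (tp + fp) / n,  # P(y_hat = 1)
        "tpr": tp / (tp + fn),                      # true positive rate
        "fpr": fp / (fp + tn),                      # false positive rate
    }

for name, counts in {"Group A": (800, 50, 100, 1050),
                     "Group B": (150, 30, 120, 700)}.items():
    print(name, group_rates(*counts))
```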
Exercise 3: Impossibility Theorem
a) State the impossibility theorem for fairness metrics in your own words.
b) Give a concrete numeric example where calibration and equalized odds cannot both hold.
c) How should practitioners handle this impossibility in practice? Who should decide which fairness criterion to prioritize?
d) Are there situations where no fairness criterion should apply? Discuss.
Exercise 4: EU AI Act Classification
Classify each of the following AI systems under the EU AI Act risk categories (unacceptable, high-risk, limited risk, minimal risk) and justify your answer:
a) An AI-powered spam filter
b) A facial recognition system for law enforcement
c) An AI chatbot for customer service
d) An AI system that scores students' exam essays
e) A social media recommendation algorithm
f) A self-driving car's perception system
g) An AI that generates music in the style of a specific artist
Exercise 5: Differential Privacy
a) Explain the definition of $(\epsilon, \delta)$-differential privacy in plain language.
b) If $\epsilon = 0$, what does this mean about the algorithm's privacy guarantee?
c) If $\epsilon = \infty$, what does this mean?
d) In DP-SGD, explain why gradient clipping is necessary before adding noise.
e) How does the privacy budget ($\epsilon$) accumulate over multiple training steps?
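For reference in part (a): a randomized mechanism $M$ is $(\epsilon, \delta)$-differentially private if for all neighboring datasets $D, D'$ (differing in a single record) and all sets of outputs $S$,

$$\Pr[M(D) \in S] \le e^{\epsilon} \, \Pr[M(D') \in S] + \delta.$$

Your plain-language explanation should convey what this bound means for an individual whose record may or may not be in the dataset.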
Exercise 6: Deepfakes and Generative AI
a) List three categories of harm caused by deepfakes.
b) Why is deepfake detection fundamentally challenging (hint: adversarial dynamics)?
c) Describe the C2PA content provenance standard. What are its strengths and limitations?
d) Should generative AI companies be held liable for harmful content generated by their models? Argue both sides.
Exercise 7: Environmental Impact
a) Why does training a large language model consume so much energy?
b) Compare the environmental impact of training a model from scratch vs. fine-tuning a pre-trained model.
c) What is "carbon-aware computing" and how can it reduce AI's carbon footprint?
d) Should researchers be required to report the energy consumption and carbon emissions of their experiments? Justify your position.
Exercise 8: Responsible AI Framework
Design a responsible AI framework for a specific application (choose one):
a) An AI-powered hiring tool
b) A medical diagnosis assistant
c) A criminal risk assessment tool
For your chosen application, address:
1. Fairness: Which groups might be affected? What fairness metrics apply?
2. Transparency: How will decisions be explained?
3. Privacy: What data is collected and how is it protected?
4. Safety: What could go wrong? What safeguards are needed?
5. Accountability: Who is responsible when the system makes a mistake?
6. Monitoring: How will you track performance and bias in production?
Programming Exercises
Exercise 9: Bias Detection Pipeline
Build a complete bias detection pipeline:
a) Generate a synthetic dataset with two groups, where the model's accuracy differs significantly between groups.
b) Compute all fairness metrics from Section 39.1.3.
c) Implement threshold adjustment (post-processing) to achieve equal opportunity.
d) Compare the model's overall accuracy before and after adjustment. What is the cost of fairness?
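A possible starting point for parts (a)–(c), assuming scikit-learn; the synthetic data generation, the group-dependent shift, and the grid search for an equal-opportunity threshold are illustrative choices, not the only valid design.

```python
# Sketch: synthetic biased data, per-group metrics, and a per-group threshold
# search that matches Group B's TPR to Group A's (equal opportunity).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
group = rng.integers(0, 2, n)                        # protected attribute (0 = A, 1 = B)
x = rng.normal(size=(n, 3)) + group[:, None] * 0.8   # group-dependent feature shift
y = (x[:, 0] + 0.5 * x[:, 1] - 0.7 * group + rng.normal(scale=0.5, size=n) > 0).astype(int)

clf = LogisticRegression().fit(np.c_[x, group], y)
scores = clf.predict_proba(np.c_[x, group])[:, 1]

def tpr(y_true, y_pred):
    mask = y_true == 1
    return y_pred[mask].mean() if mask.any() else 0.0

# Baseline metrics at a shared 0.5 threshold.
pred = (scores >= 0.5).astype(int)
for g in (0, 1):
    m = group == g
    print(f"group {g}: selection rate={pred[m].mean():.3f}, TPR={tpr(y[m], pred[m]):.3f}")

# Post-processing: find a separate threshold for Group B that matches Group A's TPR.
target_tpr = tpr(y[group == 0], pred[group == 0])
best_t = min(np.linspace(0.05, 0.95, 91),
             key=lambda t: abs(tpr(y[group == 1], (scores[group == 1] >= t).astype(int)) - target_tpr))
print("suggested threshold for group B:", round(best_t, 2))
```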
Exercise 10: Adversarial Debiasing
Implement adversarial debiasing:
a) Train a classifier on a biased dataset (where the protected attribute is correlated with the label).
b) Add an adversary that tries to predict the protected attribute from the classifier's output.
c) Train the system with the adversarial loss. Compare fairness metrics before and after.
d) Experiment with different adversary strengths (lambda values). Plot the accuracy-fairness trade-off.
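One way to structure parts (a)–(c), assuming PyTorch and alternating updates (rather than a gradient-reversal layer); the data generation and `lam` value are placeholders you will want to vary for part (d).

```python
# Sketch: adversarial debiasing with two alternating optimizers. The classifier
# is penalized when the adversary can recover the protected attribute from the
# classifier's output; lam controls the adversary's strength.
import torch, torch.nn as nn

torch.manual_seed(0)
n = 4000
a = torch.randint(0, 2, (n, 1)).float()             # protected attribute
x = torch.randn(n, 4) + a                           # features correlated with a
y = ((x[:, :1] + 0.5 * a + 0.3 * torch.randn(n, 1)) > 0.5).float()  # biased labels

clf = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
adv = nn.Sequential(nn.Linear(1, 8), nn.ReLU(), nn.Linear(8, 1))
opt_clf = torch.optim.Adam(clf.parameters(), lr=1e-2)
opt_adv = torch.optim.Adam(adv.parameters(), lr=1e-2)
bce = nn.BCEWithLogitsLoss()
lam = 1.0

for step in range(500):
    # (1) Update the adversary: predict the protected attribute from the logit.
    logits = clf(x).detach()
    opt_adv.zero_grad()
    adv_loss = bce(adv(logits), a)
    adv_loss.backward()
    opt_adv.step()

    # (2) Update the classifier: fit y while making the adversary fail.
    opt_clf.zero_grad()
    logits = clf(x)
    loss = bce(logits, y) - lam * bce(adv(logits), a)
    loss.backward()
    opt_clf.step()
```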
Exercise 11: Differentially Private Training
Implement DP-SGD from scratch:
a) Train a model normally and record its accuracy.
b) Implement per-sample gradient clipping and Gaussian noise addition.
c) Train the same model with DP-SGD for different values of the noise multiplier.
d) Plot the privacy-accuracy trade-off curve.
e) Estimate the total privacy budget ($\epsilon$) using the basic composition theorem.
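A minimal NumPy sketch of part (b) on logistic regression, so the per-sample clipping and noise addition are visible without framework machinery; `clip_norm` and `noise_multiplier` are the hyperparameters to sweep in parts (c)–(d).

```python
# Sketch: DP-SGD for logistic regression (per-sample gradient clipping + Gaussian noise).
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 10
x = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
y = (x @ true_w + rng.normal(scale=0.5, size=n) > 0).astype(float)

def dp_sgd(clip_norm=1.0, noise_multiplier=1.0, lr=0.1, epochs=20, batch=100):
    w = np.zeros(d)
    for _ in range(epochs):
        idx = rng.permutation(n)
        for start in range(0, n, batch):
            b = idx[start:start + batch]
            p = 1 / (1 + np.exp(-x[b] @ w))
            per_sample_grads = (p - y[b])[:, None] * x[b]   # one gradient row per example
            norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
            clipped = per_sample_grads / np.maximum(1.0, norms / clip_norm)
            noise = rng.normal(scale=noise_multiplier * clip_norm, size=d)
            w -= lr * (clipped.sum(axis=0) + noise) / len(b)
    return w

w = dp_sgd(noise_multiplier=1.0)
acc = (((x @ w) > 0).astype(float) == y).mean()
print("accuracy with DP-SGD:", round(acc, 3))
```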
Exercise 12: Membership Inference Attack
Implement a simple membership inference attack:
a) Train a target model on a dataset (the "training set").
b) Train a shadow model with a similar architecture on separate data.
c) Train an attack model on the shadow model's outputs (where membership is known) to distinguish members from non-members, then apply it to the target model's outputs.
d) Report the attack accuracy. Does differential privacy reduce the attack's success?
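A shadow-model attack sketch assuming scikit-learn; the dataset, model choices, and the sorted-probability attack features are illustrative, and part (d) is left for you to test with a DP-trained target.

```python
# Sketch: shadow-model membership inference. The attack model sees a model's
# confidence vector and guesses "member" vs "non-member".
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

x, y = make_classification(n_samples=8000, n_features=20, random_state=0)
# Disjoint splits: target in/out and shadow in/out.
x_tt, y_tt = x[:2000], y[:2000]           # target train (members)
x_to, y_to = x[2000:4000], y[2000:4000]   # target out (non-members)
x_st, y_st = x[4000:6000], y[4000:6000]   # shadow train
x_so, y_so = x[6000:], y[6000:]           # shadow out

target = RandomForestClassifier(random_state=0).fit(x_tt, y_tt)
shadow = RandomForestClassifier(random_state=1).fit(x_st, y_st)

def attack_features(model, xs):
    # Sorted class probabilities make the attack label-agnostic.
    return np.sort(model.predict_proba(xs), axis=1)

# Train the attack model on the shadow model's behaviour (membership is known there).
attack_x = np.vstack([attack_features(shadow, x_st), attack_features(shadow, x_so)])
attack_y = np.concatenate([np.ones(len(x_st)), np.zeros(len(x_so))])
attack = LogisticRegression().fit(attack_x, attack_y)

# Evaluate the attack against the target model.
test_x = np.vstack([attack_features(target, x_tt), attack_features(target, x_to)])
test_y = np.concatenate([np.ones(len(x_tt)), np.zeros(len(x_to))])
print("attack accuracy:", round(attack.score(test_x, test_y), 3))
```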
Exercise 13: Model Card Generator
Write a Python function that generates a Model Card:
a) Accept model metadata (name, architecture, training data description).
b) Accept evaluation results (overall and per-group metrics).
c) Accept fairness metrics and ethical considerations.
d) Output a formatted Markdown document.
e) Test it on a model you train on a synthetic dataset with demographic groups.
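A minimal shape for parts (a)–(d); the field names and sections here are an illustrative subset of a Model Card, not a fixed schema.

```python
# Sketch: generate a Model Card as a Markdown string from plain dictionaries.
def generate_model_card(name, architecture, training_data, overall_metrics,
                        group_metrics, fairness_metrics, ethical_considerations):
    metric_names = list(next(iter(group_metrics.values())))
    lines = [
        f"# Model Card: {name}",
        "## Model Details",
        f"- Architecture: {architecture}",
        f"- Training data: {training_data}",
        "## Evaluation",
        *[f"- {k}: {v:.3f}" for k, v in overall_metrics.items()],
        "## Per-Group Performance",
        "| Group | " + " | ".join(metric_names) + " |",
        "|---|" + "---|" * len(metric_names),
        *[f"| {g} | " + " | ".join(f"{v:.3f}" for v in m.values()) + " |"
          for g, m in group_metrics.items()],
        "## Fairness Metrics",
        *[f"- {k}: {v:.3f}" for k, v in fairness_metrics.items()],
        "## Ethical Considerations",
        *[f"- {c}" for c in ethical_considerations],
    ]
    return "\n".join(lines)

card = generate_model_card(
    name="demo-classifier", architecture="logistic regression",
    training_data="synthetic data with two demographic groups",
    overall_metrics={"accuracy": 0.87},
    group_metrics={"Group A": {"accuracy": 0.90, "TPR": 0.88},
                   "Group B": {"accuracy": 0.81, "TPR": 0.74}},
    fairness_metrics={"TPR gap": 0.14},
    ethical_considerations=["Not validated for real-world deployment."],
)
print(card)
```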
Exercise 14: Fairness-Constrained Optimization
Implement a fairness-constrained training loop:
a) Define a loss function that combines cross-entropy with a fairness penalty.
b) Use a Lagrangian relaxation approach: $\mathcal{L} = \mathcal{L}_{\text{task}} + \lambda \cdot \mathcal{L}_{\text{fairness}}$
c) Implement $\mathcal{L}_{\text{fairness}}$ as the squared difference in TPR between groups.
d) Train with automatic adjustment of $\lambda$ (increase when the constraint is violated, decrease when it is satisfied).
e) Compare with unconstrained training and post-processing approaches.
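A sketch of parts (a)–(d) assuming PyTorch. The TPR gap is made differentiable by using predicted probabilities on the true-positive examples, and the dual variable `lam` follows a simple additive update; the tolerance `tol` and step size `lam_lr` are illustrative hyperparameters.

```python
# Sketch: Lagrangian fairness-constrained training. The fairness penalty is the
# squared soft-TPR gap between groups; lam rises when the gap exceeds a
# tolerance and decays when the constraint is satisfied.
import torch, torch.nn as nn

torch.manual_seed(0)
n = 4000
g = torch.randint(0, 2, (n,)).float()                  # protected attribute
x = torch.randn(n, 5) + g[:, None] * 0.6
y = ((x[:, 0] - 0.5 * g + 0.3 * torch.randn(n)) > 0).float()

model = nn.Sequential(nn.Linear(5, 16), nn.ReLU(), nn.Linear(16, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
bce = nn.BCEWithLogitsLoss()
lam, tol, lam_lr = 0.0, 0.02, 0.05

for step in range(500):
    opt.zero_grad()
    logits = model(x).squeeze(1)
    p = torch.sigmoid(logits)
    # Soft TPR per group: mean predicted probability on true positives.
    tpr_0 = p[(y == 1) & (g == 0)].mean()
    tpr_1 = p[(y == 1) & (g == 1)].mean()
    fairness = (tpr_0 - tpr_1) ** 2                    # squared TPR gap
    loss = bce(logits, y) + lam * fairness
    loss.backward()
    opt.step()
    # Dual update: increase lam on violation, decay otherwise (kept non-negative).
    lam = max(0.0, lam + lam_lr * (fairness.item() - tol))
```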