
Learning Objectives

  • Compare major responsible AI frameworks (Microsoft, Google, OECD AI Principles) and evaluate their strengths and limitations
  • Explain the purpose and components of model cards (Mitchell et al. 2019) and datasheets for datasets (Gebru et al. 2021)
  • Implement a ModelCard dataclass in Python that generates a formatted model card document for an AI system
  • Design and execute a red-teaming exercise to identify vulnerabilities and failure modes in AI systems
  • Develop a monitoring plan for detecting model drift, performance degradation, and emergent bias in deployed AI systems
  • Create model documentation for VitraMed's clinical prediction model that meets responsible AI standards
  • Evaluate real-world cases of responsible AI success and failure

Chapter 29: Responsible AI Development

"A model without documentation is a decision without justification." — Margaret Mitchell, co-author of the Model Cards paper

Chapter Overview

Chapter 28 established the assessment processes — PIAs, DPIAs, AIAs — that evaluate data practices before deployment. This chapter complements those assessments with the documentation and development practices that make responsible AI possible from the inside out.

The challenge is this: an AI model deployed in production makes thousands or millions of decisions, each affecting real people. A credit scoring model determines who receives a loan. A clinical risk model determines which patients receive preventive interventions. A content moderation model determines what speech is amplified and what is suppressed. The people affected by these decisions have a right to know what the model does, how it was built, where it works well, where it fails, and what ethical considerations shaped its development.

Yet most deployed AI models are poorly documented. Their training data is undescribed. Their limitations are unstated. Their intended use is undefined, making out-of-scope use undetectable. Their performance across demographic groups is unmeasured or unreported. They are, as Chapter 16 described, black boxes — and they remain black boxes by design or by neglect.

This chapter introduces the practices that open the box: model cards, datasheets for datasets, red-teaming, and deployment monitoring. It provides a Python implementation of a ModelCard dataclass that demonstrates how documentation can be systematized. And it follows VitraMed as it creates its first model card for the clinical prediction model assessed in Chapter 28's DPIA.

In this chapter, you will learn to:

  • Navigate the landscape of responsible AI frameworks and evaluate their practical value
  • Create model cards and datasheets that meet transparency and accountability standards
  • Implement a ModelCard in Python that generates structured documentation
  • Plan and execute adversarial testing (red-teaming) for AI systems
  • Design a deployment monitoring plan that detects drift, degradation, and emergent bias
  • Apply these practices to VitraMed's clinical prediction model


29.1 Responsible AI Frameworks: A Comparative Analysis

29.1.1 The Proliferation of Principles

Between 2016 and 2023, more than 160 sets of AI ethics principles were published by corporations, governments, international organizations, and civil society groups. This proliferation reflects genuine concern — but it also creates confusion. Which principles matter? How do they differ? And do any of them actually change what organizations build?

29.1.2 Three Influential Frameworks

The OECD AI Principles (2019)

Adopted by 46 countries, the OECD principles represent the closest thing to an international consensus on responsible AI:

  1. Inclusive growth, sustainable development, and well-being. AI should benefit people and the planet.
  2. Human-centred values and fairness. AI should respect human rights, democratic values, and fairness; safeguards should enable human intervention.
  3. Transparency and explainability. People should understand when they interact with an AI system and be able to challenge its output.
  4. Robustness, security, and safety. AI systems should function safely, and risks should be continuously assessed.
  5. Accountability. Organizations and individuals developing AI should be accountable for its functioning.

The OECD principles shaped the EU AI Act (Chapter 21), influenced national AI strategies, and provided a common vocabulary for international discussions.

Limitation: The principles are high-level and non-binding. They provide a compass but not a map.

Microsoft's Responsible AI Standard (2022)

Microsoft's internal standard translates high-level principles into specific engineering requirements. It specifies:

  • Impact assessments (RAIA) at defined development milestones
  • Fairness requirements: Models must be tested for disparate impact; results must be documented
  • Reliability and safety requirements: Systems must meet defined performance thresholds; failure modes must be identified
  • Privacy and security: Data practices must comply with privacy-by-design principles
  • Inclusiveness: Systems must be designed for diverse users; accessibility must be evaluated
  • Transparency: Users must be informed when they interact with AI; model cards or transparency notes must be published

Strength: Specificity. Unlike abstract principles, Microsoft's standard includes testable requirements that engineers can implement.

Limitation: It is an internal corporate document. External parties cannot verify compliance. The 2023 layoffs of Microsoft's ethics and society team raised questions about the standard's enforceability even within Microsoft.

Google's AI Principles (2018)

Published after the Project Maven controversy (military AI applications), Google's principles state that AI should:

  1. Be socially beneficial
  2. Avoid creating or reinforcing unfair bias
  3. Be built and tested for safety
  4. Be accountable to people
  5. Incorporate privacy design principles
  6. Uphold high standards of scientific excellence
  7. Be made available for uses that accord with these principles

Google also identified four AI applications it "will not pursue": technologies likely to cause overall harm; weapons designed principally to injure people; surveillance violating internationally accepted norms; and technologies whose purpose "contravenes widely accepted principles of international law and human rights."

Strength: The explicit exclusion of specific application areas is rare among corporate principles.

Limitation: The principles were published without a governance mechanism for enforcement. The subsequent departure of AI ethics leaders Timnit Gebru and Margaret Mitchell, and the dissolution of the Ethical AI team's leadership, raised fundamental questions about whether the principles constrain Google's behavior or merely describe its aspirations.

29.1.3 The Principles-to-Practice Gap

"I've read every major AI ethics framework published since 2016," Dr. Adeyemi told her class. "They all say roughly the same things: fairness, transparency, accountability, safety, privacy. The problem is not that we don't know what's right. The problem is that principles without implementation mechanisms are wishes, not governance."

The gap between principles and practice is well-documented:

  • An analysis by AlgorithmWatch found that of 160+ AI ethics guidelines, fewer than 15% included specific implementation mechanisms.
  • Brent Mittelstadt's 2019 analysis, "Principles Alone Cannot Guarantee Ethical AI," found that corporate AI ethics principles typically lack enforcement mechanisms, accountability structures, and meaningful consequences for violations.
  • Industry surveys consistently find that engineers and data scientists report that ethical principles have "little to no impact" on their day-to-day work.

The remainder of this chapter focuses on the practices that bridge this gap: documentation (model cards, datasheets), testing (red-teaming), and monitoring (deployment surveillance). These are not principles — they are tools.


29.2 Model Cards: Documenting What You Build

29.2.1 The Origin of Model Cards

In 2019, Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and Timnit Gebru published "Model Cards for Model Reporting" — a paper that proposed a standardized framework for documenting machine learning models.

The insight was simple and powerful: AI models are products, and products should come with documentation. A pharmaceutical drug has a label specifying its ingredients, intended use, dosage, side effects, contraindications, and patient population. An AI model should have equivalent documentation specifying its purpose, training data, performance characteristics, limitations, and ethical considerations.

29.2.2 Model Card Components

A model card includes:

Model Details. What is the model? Who built it? What version is this? When was it created?

Intended Use. What is the model designed to do? What population is it designed for? In what context should it be deployed?

Out-of-Scope Uses. What should the model not be used for? This is as important as the intended use — many AI harms arise from deploying models in contexts they were not designed for.

Training Data. What data was the model trained on? How was it collected? What populations does it represent? What populations are underrepresented or absent?

Evaluation Metrics. How was the model's performance measured? What are the key metrics (accuracy, precision, recall, F1, AUC)? How does performance vary across demographic groups?

Ethical Considerations. What ethical risks does this model present? What mitigations have been implemented? What risks remain unresolved?

Limitations. Where does the model fail? What are its known weaknesses? Under what conditions should its output not be trusted?
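Before committing to a full implementation (Section 29.4), the components above can be captured as a simple completeness checklist. The sketch below is illustrative: the section names follow this chapter, but the draft content and helper are assumptions, not part of Mitchell et al.'s proposal.

```python
# Minimal completeness check for a draft model card held as a dict.
# Section names follow this chapter; the draft content is illustrative.
REQUIRED_SECTIONS = [
    "model_details", "intended_use", "out_of_scope_uses",
    "training_data", "evaluation_metrics",
    "ethical_considerations", "limitations",
]

def missing_sections(draft: dict) -> list[str]:
    """Return the required sections that are absent or empty."""
    return [s for s in REQUIRED_SECTIONS if not draft.get(s)]

draft = {
    "model_details": "Risk predictor v2.1",
    "intended_use": "Clinical decision support",
    "evaluation_metrics": {"AUC-ROC": 0.847},
}
print(missing_sections(draft))
# Flags out_of_scope_uses, training_data,
# ethical_considerations, and limitations as still missing
```

A check like this can run in a CI pipeline, blocking release until every section has content.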

29.2.3 Why Model Cards Matter

For developers: Model cards force developers to articulate what their model does and doesn't do, surfacing assumptions and blindspots that informal documentation misses.

For decision-makers: Model cards enable informed deployment decisions. A product manager reading a model card that states "performance degrades significantly for patients over 75" can make a governance decision about whether to deploy the model for that population.

For affected communities: Model cards provide a basis for accountability. If a model card states "not intended for use in hiring decisions" and the model is later used for hiring, the documentation demonstrates that the deployment was out-of-scope.

For regulators: Model cards create an audit trail. Regulators can evaluate whether a deployed model aligns with its documentation, whether the documented limitations were communicated to users, and whether ethical considerations were adequately addressed.

"A model without a model card is like a prescription drug without a label," Ray Zhao observed. "You might be taking the right medication. But you have no way to know."


29.3 Datasheets for Datasets

29.3.1 The Proposal

In 2021, Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford published "Datasheets for Datasets" — a companion to model cards, focused on the data that models are trained on.

The motivation parallels model cards: just as models should be documented, so should the datasets that shape them. A model trained on biased data will produce biased outputs. A model trained on unrepresentative data will fail for underrepresented populations. But without dataset documentation, these problems are invisible until the model is deployed and harm occurs.

29.3.2 Datasheet Components

A datasheet answers seven categories of questions:

Motivation. Why was the dataset created? For what purpose? By whom? Was it funded, and if so, by whom?

Composition. What do the instances represent? How many are there? What data is included? Is any information missing? Does the dataset contain data that might be considered sensitive?

Collection Process. How was the data collected? Who collected it? Over what timeframe? Were data subjects aware of the collection? Did they consent?

Preprocessing/Cleaning. Was the raw data processed? What transformations were applied? Was any data excluded, and if so, why?

Uses. What tasks has the dataset been used for? Are there tasks it should not be used for? Has it been used for purposes beyond its original intent?

Distribution. How is the dataset distributed? Are there restrictions on access? Is it subject to any regulatory requirements?

Maintenance. Who is responsible for maintaining the dataset? Will it be updated? How will errors be corrected? Is there a mechanism for data subjects to request removal?
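The same systematization applied to model cards in Section 29.4 can be applied to datasheets. The following Datasheet dataclass is an illustrative sketch, not part of Gebru et al.'s proposal; a production version would break each category into its individual questions.

```python
from dataclasses import dataclass

@dataclass
class Datasheet:
    """Illustrative sketch of a datasheet following Gebru et al. (2021).

    One free-text field per question category.
    """
    dataset_name: str
    motivation: str = ""
    composition: str = ""
    collection_process: str = ""
    preprocessing: str = ""
    uses: str = ""
    distribution: str = ""
    maintenance: str = ""

    def unanswered(self) -> list[str]:
        """List the question categories left blank."""
        categories = [
            "motivation", "composition", "collection_process",
            "preprocessing", "uses", "distribution", "maintenance",
        ]
        return [c for c in categories if not getattr(self, c)]

sheet = Datasheet(
    dataset_name="VitraMed EHR extract 2018-2025",
    motivation="Training data for chronic disease risk prediction.",
    composition="142,000 de-identified patient records, 47 features.",
)
print(sheet.unanswered())  # the categories still needing answers
```

As with the model card, the unanswered() check makes incompleteness visible before the dataset is distributed.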

29.3.3 The Dataset-Ethics Connection

"Every dataset is a political document," Dr. Adeyemi said. "It reflects choices about who counts, what matters, and whose reality is represented. A datasheet makes those choices visible."

Consider VitraMed's training data for the clinical prediction model. Without a datasheet, the following facts remain hidden:

  • The training data overrepresents patients from suburban clinics (70% of the dataset) and underrepresents patients from rural clinics (12%) and urban safety-net clinics (18%)
  • The dataset spans 2018-2025, but data from 2020-2021 is anomalous due to COVID-19 (delayed diagnoses, reduced clinic visits, atypical patterns)
  • Patients who left VitraMed's network are excluded from follow-up data, creating survivorship bias
  • Race and ethnicity data is missing for approximately 30% of records

A datasheet would document each of these limitations, enabling developers to address them and users to understand the model's blindspots.
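Imbalances like these can be surfaced mechanically when preparing a datasheet. A minimal sketch — the field names, the 20% threshold, and the toy record counts are illustrative assumptions, not VitraMed's actual schema:

```python
from collections import Counter

def representation_report(records, group_field, sensitive_field,
                          min_share=0.2):
    """Summarize group shares and missingness in a list of record dicts.

    Flags groups whose share of the dataset falls below min_share and
    reports the fraction of records missing the sensitive field.
    """
    n = len(records)
    shares = {g: c / n
              for g, c in Counter(r[group_field] for r in records).items()}
    underrepresented = [g for g, s in shares.items() if s < min_share]
    missing_rate = sum(1 for r in records
                       if r.get(sensitive_field) is None) / n
    return {"shares": shares,
            "underrepresented": underrepresented,
            "missing_rate": missing_rate}

# Toy records mirroring the clinic imbalance listed above (100 total)
records = ([{"clinic": "suburban", "ethnicity": "reported"}] * 70
           + [{"clinic": "urban", "ethnicity": None}] * 18
           + [{"clinic": "rural", "ethnicity": "reported"}] * 12)
report = representation_report(records, "clinic", "ethnicity")
print(report["underrepresented"])  # ['urban', 'rural']
```

Running a report like this at collection time turns "who is missing from this dataset?" from a judgment call into a measurable, documentable fact.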

Common Pitfall: Many organizations publish model cards without datasheets — documenting the model but not the data. This is like publishing a drug label without disclosing the clinical trial population. The model card says "accuracy: 92%," but the datasheet would reveal "accuracy: 92% for suburban patients aged 35-65; accuracy: 78% for urban patients over 75." The aggregate hides the disparity.
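The aggregate-hides-disparity effect is easy to demonstrate numerically. The sketch below uses invented results (not VitraMed's actual data) in which the better-served group dominates the evaluation set:

```python
def accuracy(pairs):
    """Fraction of (y_true, y_pred) pairs that agree."""
    return sum(t == p for t, p in pairs) / len(pairs)

# Invented per-group results: the majority group dominates the data
groups = {
    "suburban 35-65": [(1, 1)] * 92 + [(1, 0)] * 8,   # 92% correct
    "urban over 75":  [(1, 1)] * 39 + [(1, 0)] * 11,  # 78% correct
}
overall = accuracy([p for pairs in groups.values() for p in pairs])
print(f"overall: {overall:.2f}")              # looks strong
for group, pairs in groups.items():
    print(f"{group}: {accuracy(pairs):.2f}")  # the gap shows only here
```

The overall figure sits between the two subgroup figures, pulled toward the majority group — which is exactly why disaggregated reporting belongs in both the datasheet and the model card.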


29.4 The ModelCard: A Python Implementation

29.4.1 Design Rationale

The following Python implementation demonstrates how model documentation can be systematized. The ModelCard dataclass captures the essential components of a model card and generates a formatted report that can be reviewed, published, and audited.

Python Context: This implementation connects to Chapter 27's DataLineageTracker (which tracks data assets) and Chapter 22's DataQualityAuditor (which assesses data quality). Together, these three tools form a documentation pipeline: the datasheet documents the training data, the quality auditor assesses its integrity, the lineage tracker follows it through the pipeline, and the model card documents the resulting model.

from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ModelCard:
    """
    A structured model card following Mitchell et al. (2019).

    Documents an ML model's purpose, performance, limitations,
    and ethical considerations in a format suitable for review,
    publication, and audit.

    Usage:
        card = ModelCard(
            model_name="Patient Risk Predictor",
            version="2.1",
            ...
        )
        print(card.generate_report())
    """
    # Model details
    model_name: str
    version: str
    description: str
    author: str
    date: str
    model_type: str = ""  # e.g., "gradient boosted classifier"
    license: str = ""

    # Use specification
    intended_use: str = ""
    out_of_scope_uses: str = ""
    primary_users: str = ""

    # Data
    training_data_summary: str = ""
    training_data_size: str = ""
    training_data_timeframe: str = ""
    preprocessing_steps: str = ""

    # Performance
    evaluation_metrics: dict = field(default_factory=dict)
    disaggregated_metrics: dict = field(default_factory=dict)

    # Ethics and limitations
    ethical_considerations: str = ""
    limitations: str = ""
    fairness_considerations: str = ""
    risks_and_harms: str = ""

    # Governance
    review_status: str = ""  # e.g., "Approved by Ethics Committee 2026-02-15"
    contact: str = ""
    update_schedule: str = ""

    def generate_report(self) -> str:
        """
        Generate a complete, formatted model card document.

        The report is structured for multiple audiences:
        - Technical reviewers can assess performance and methodology
        - Ethics committees can evaluate risks and safeguards
        - Regulators can verify compliance and accountability
        - Affected communities can understand how the model works
          and where it may fail

        Returns a formatted string ready for publication or review.
        """
        lines = []
        separator = "=" * 65

        # Header
        lines.append(separator)
        lines.append(f"MODEL CARD: {self.model_name}")
        lines.append(separator)
        lines.append("")

        # Model details
        lines.append("1. MODEL DETAILS")
        lines.append("-" * 40)
        lines.append(f"   Name:        {self.model_name}")
        lines.append(f"   Version:     {self.version}")
        lines.append(f"   Author:      {self.author}")
        lines.append(f"   Date:        {self.date}")
        if self.model_type:
            lines.append(f"   Type:        {self.model_type}")
        if self.license:
            lines.append(f"   License:     {self.license}")
        lines.append("")
        lines.append("   Description:")
        for line in self._wrap_text(self.description, 55):
            lines.append(f"   {line}")
        lines.append("")

        # Intended use
        lines.append("2. INTENDED USE")
        lines.append("-" * 40)
        if self.intended_use:
            lines.append("   Primary intended use:")
            for line in self._wrap_text(self.intended_use, 55):
                lines.append(f"   {line}")
        if self.primary_users:
            lines.append("")
            lines.append(f"   Primary users: {self.primary_users}")
        if self.out_of_scope_uses:
            lines.append("")
            lines.append("   OUT-OF-SCOPE USES (do not use for):")
            for line in self._wrap_text(self.out_of_scope_uses, 55):
                lines.append(f"   ! {line}")
        lines.append("")

        # Training data
        lines.append("3. TRAINING DATA")
        lines.append("-" * 40)
        if self.training_data_summary:
            for line in self._wrap_text(
                self.training_data_summary, 55
            ):
                lines.append(f"   {line}")
        if self.training_data_size:
            lines.append(f"   Dataset size: {self.training_data_size}")
        if self.training_data_timeframe:
            lines.append(
                f"   Timeframe:    {self.training_data_timeframe}"
            )
        if self.preprocessing_steps:
            lines.append("")
            lines.append("   Preprocessing:")
            for line in self._wrap_text(
                self.preprocessing_steps, 55
            ):
                lines.append(f"   {line}")
        lines.append("")

        # Evaluation metrics
        lines.append("4. EVALUATION METRICS")
        lines.append("-" * 40)
        if self.evaluation_metrics:
            lines.append("   Overall performance:")
            max_key_len = max(
                len(str(k)) for k in self.evaluation_metrics
            )
            for metric, value in self.evaluation_metrics.items():
                padding = " " * (max_key_len - len(str(metric)))
                if isinstance(value, float):
                    lines.append(
                        f"   {metric}:{padding} {value:.4f}"
                    )
                else:
                    lines.append(
                        f"   {metric}:{padding} {value}"
                    )
        else:
            lines.append("   No evaluation metrics provided.")
        lines.append("")

        # Disaggregated metrics
        if self.disaggregated_metrics:
            lines.append("   Disaggregated performance:")
            for group, metrics in self.disaggregated_metrics.items():
                lines.append(f"   [{group}]")
                for metric, value in metrics.items():
                    if isinstance(value, float):
                        lines.append(
                            f"     {metric}: {value:.4f}"
                        )
                    else:
                        lines.append(f"     {metric}: {value}")
            lines.append("")

        # Ethical considerations
        lines.append("5. ETHICAL CONSIDERATIONS")
        lines.append("-" * 40)
        if self.ethical_considerations:
            for line in self._wrap_text(
                self.ethical_considerations, 55
            ):
                lines.append(f"   {line}")
        else:
            lines.append(
                "   WARNING: No ethical considerations documented."
            )
        lines.append("")

        if self.fairness_considerations:
            lines.append("   Fairness considerations:")
            for line in self._wrap_text(
                self.fairness_considerations, 55
            ):
                lines.append(f"   {line}")
            lines.append("")

        if self.risks_and_harms:
            lines.append("   Risks and potential harms:")
            for line in self._wrap_text(self.risks_and_harms, 55):
                lines.append(f"   ! {line}")
            lines.append("")

        # Limitations
        lines.append("6. LIMITATIONS")
        lines.append("-" * 40)
        if self.limitations:
            for line in self._wrap_text(self.limitations, 55):
                lines.append(f"   {line}")
        else:
            lines.append(
                "   WARNING: No limitations documented. All models "
                "have limitations."
            )
        lines.append("")

        # Governance
        lines.append("7. GOVERNANCE")
        lines.append("-" * 40)
        if self.review_status:
            lines.append(f"   Review status: {self.review_status}")
        if self.contact:
            lines.append(f"   Contact:       {self.contact}")
        if self.update_schedule:
            lines.append(
                f"   Updates:       {self.update_schedule}"
            )
        lines.append("")

        # Footer
        lines.append(separator)
        lines.append(
            f"Generated: {datetime.now().strftime('%Y-%m-%d %H:%M')}"
        )
        lines.append(separator)

        return "\n".join(lines)

    @staticmethod
    def _wrap_text(text: str, width: int) -> list[str]:
        """Word-wrap text for display (a minimal stand-in for
        the standard library's textwrap.wrap)."""
        words = text.split()
        lines = []
        current_line = []
        current_length = 0

        for word in words:
            if current_length + len(word) + 1 > width and current_line:
                lines.append(" ".join(current_line))
                current_line = [word]
                current_length = len(word)
            else:
                current_line.append(word)
                current_length += len(word) + 1

        if current_line:
            lines.append(" ".join(current_line))

        return lines if lines else [""]

29.4.2 Applying the ModelCard: VitraMed's Patient Risk Model

Following the DPIA conducted in Chapter 28, VitraMed's data science team — working with Dr. Khoury (DPO) and the ethics advisory group — created the first model card for their clinical prediction model.

vitramed_card = ModelCard(
    model_name="VitraMed Chronic Disease Risk Predictor",
    version="2.1.0",
    description=(
        "A gradient boosted classifier that predicts the probability "
        "of a patient developing Type 2 diabetes, hypertension, or "
        "cardiovascular disease within a 5-year horizon. The model "
        "generates a risk score (0.0 to 1.0) and a risk tier "
        "(Low/Medium/High/Critical) for each condition. Predictions "
        "are intended to support — not replace — clinical judgment "
        "in recommending preventive interventions."
    ),
    author="VitraMed Data Science Team (lead: Dr. Ravi Mehta)",
    date="2026-02-01",
    model_type="Gradient Boosted Classifier (XGBoost 2.0)",
    license="Proprietary — VitraMed Inc.",
    intended_use=(
        "Clinical decision support for licensed healthcare providers "
        "at VitraMed partner clinics. Risk scores are presented to "
        "clinicians alongside patient records to flag patients who "
        "may benefit from preventive care interventions, including "
        "lifestyle counseling, screening tests, and medication review. "
        "The model is intended to augment clinical judgment, not "
        "automate clinical decisions."
    ),
    primary_users=(
        "Licensed physicians, nurse practitioners, and physician "
        "assistants at VitraMed partner clinics"
    ),
    out_of_scope_uses=(
        "This model must NOT be used for: (1) Insurance underwriting "
        "or coverage decisions. (2) Employment screening. (3) Direct "
        "patient-facing recommendations without clinician review. "
        "(4) Populations not represented in the training data "
        "(pediatric patients under 18, patients outside the US). "
        "(5) Diagnostic purposes — the model predicts risk, not "
        "diagnosis. (6) Any automated decision-making without human "
        "oversight."
    ),
    training_data_summary=(
        "De-identified electronic health records from 142,000 patients "
        "across 480 VitraMed partner clinics in 32 US states. Records "
        "include demographics (age range, gender), medical history, "
        "diagnoses (ICD-10), medications, lab results (HbA1c, lipid "
        "panels, blood pressure), vital signs, and visit frequency. "
        "Race/ethnicity data is available for ~70% of records. "
        "IMPORTANT: Training data overrepresents suburban clinics (70%) "
        "relative to urban safety-net (18%) and rural (12%) clinics. "
        "COVID-era data (2020-2021) included but flagged as anomalous."
    ),
    training_data_size="142,000 patient records; 47 features",
    training_data_timeframe="January 2018 - December 2025",
    preprocessing_steps=(
        "ICD-9 codes converted to ICD-10. Missing values imputed "
        "using multiple imputation (MICE). Continuous variables "
        "normalized. Zip code and insurance type EXCLUDED as features "
        "after bias analysis revealed correlation with race and "
        "socioeconomic status. Age binned into 5-year ranges. "
        "COVID-era records (2020-2021) downweighted by 0.5 to reduce "
        "influence of anomalous patterns."
    ),
    evaluation_metrics={
        "AUC-ROC (overall)": 0.847,
        "Precision": 0.812,
        "Recall": 0.789,
        "F1 Score": 0.800,
        "Calibration (Brier)": 0.142,
    },
    disaggregated_metrics={
        "Age 18-44": {
            "AUC-ROC": 0.871,
            "Precision": 0.835,
            "Recall": 0.821,
        },
        "Age 45-64": {
            "AUC-ROC": 0.858,
            "Precision": 0.824,
            "Recall": 0.803,
        },
        "Age 65+": {
            "AUC-ROC": 0.792,
            "Precision": 0.761,
            "Recall": 0.723,
        },
        "Suburban clinics": {
            "AUC-ROC": 0.862,
            "Precision": 0.831,
            "Recall": 0.808,
        },
        "Urban safety-net clinics": {
            "AUC-ROC": 0.803,
            "Precision": 0.774,
            "Recall": 0.741,
        },
        "Rural clinics": {
            "AUC-ROC": 0.811,
            "Precision": 0.782,
            "Recall": 0.756,
        },
    },
    ethical_considerations=(
        "This model processes sensitive health data and generates "
        "predictions that influence clinical care decisions. Key "
        "ethical considerations: (1) Risk scores may create anchoring "
        "bias — clinicians may over-rely on the model's prediction "
        "rather than exercising independent judgment. (2) Patients "
        "are not directly informed that their data is used for risk "
        "prediction (remediation in progress per DPIA findings). "
        "(3) Performance disparities across demographic groups mean "
        "that some populations receive less accurate predictions, "
        "potentially leading to under-referral for preventive care."
    ),
    fairness_considerations=(
        "The model exhibits lower performance for patients age 65+ "
        "(AUC-ROC 0.792 vs. 0.847 overall) and for patients at urban "
        "safety-net clinics (AUC-ROC 0.803 vs. 0.862 at suburban "
        "clinics). These disparities reflect training data imbalances. "
        "Remediation plan: (1) Active recruitment of additional urban "
        "and rural clinic data for v3.0. (2) Calibration adjustment "
        "for underperforming subgroups. (3) Mandatory clinician "
        "notification when predictions are generated for populations "
        "with lower model confidence. Zip code and insurance type "
        "were excluded as features to prevent proxy discrimination."
    ),
    risks_and_harms=(
        "RISK 1: False negatives — patients incorrectly classified as "
        "low-risk may miss preventive interventions. Mitigation: "
        "model is decision-support only; clinical judgment remains "
        "primary. RISK 2: False positives — patients incorrectly "
        "classified as high-risk may undergo unnecessary testing, "
        "causing anxiety and cost. Mitigation: clinician review "
        "required before any intervention. RISK 3: If risk scores "
        "are shared with insurance partners (currently under ethics "
        "committee review), patients could face coverage or premium "
        "consequences. Mitigation: insurance sharing paused pending "
        "ethics review; if resumed, only aggregate clinic-level "
        "data will be shared. RISK 4: Model drift — as patient "
        "populations and medical practices change, model accuracy "
        "may degrade. Mitigation: quarterly revalidation and "
        "automated drift detection."
    ),
    limitations=(
        "This model has the following known limitations: "
        "(1) Not validated for pediatric populations (under 18). "
        "(2) Not validated for populations outside the United States. "
        "(3) Lower accuracy for patients age 65+ and patients at "
        "urban safety-net and rural clinics. (4) Training data "
        "includes COVID-era anomalies that may affect predictions "
        "for patients with pandemic-related care gaps. (5) The model "
        "does not account for social determinants of health (housing, "
        "food security, environmental exposures) — these factors "
        "significantly influence chronic disease risk but are not "
        "captured in EHR data. (6) Race/ethnicity data is missing "
        "for ~30% of training records, limiting the ability to "
        "assess racial disparities in model performance."
    ),
    review_status=(
        "DPIA completed 2026-02-10 (DPIA #VTM-2026-001). Ethics "
        "Advisory Group review completed 2026-02-15. Approved for "
        "clinical decision support use with conditions: (1) patient "
        "notification to be implemented by 2026-06-01, (2) insurance "
        "partner data sharing paused, (3) quarterly fairness audit "
        "required."
    ),
    contact="Dr. Amina Khoury, DPO (dpo@vitramed.example.com)",
    update_schedule=(
        "Model retrained quarterly. Model card updated with each "
        "retraining. Next scheduled update: 2026-05-01."
    ),
)

print(vitramed_card.generate_report())
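Because model cards serve as audit evidence, each generated report is worth persisting in a tamper-evident way. A minimal sketch — the filename scheme, output directory, and hashing choice are illustrative assumptions, not part of VitraMed's pipeline:

```python
import hashlib
from pathlib import Path

def archive_report(report: str, model_name: str, version: str,
                   out_dir: str = "model_cards") -> str:
    """Write the report to a versioned file and return its SHA-256,
    giving reviewers a fixed reference for the published text."""
    digest = hashlib.sha256(report.encode("utf-8")).hexdigest()
    path = Path(out_dir)
    path.mkdir(parents=True, exist_ok=True)
    fname = f"{model_name.replace(' ', '_')}_v{version}.txt"
    (path / fname).write_text(report, encoding="utf-8")
    return digest

# e.g. digest = archive_report(vitramed_card.generate_report(),
#                              "Chronic Disease Risk Predictor", "2.1.0")
```

Recording the digest alongside the review approval lets an auditor later verify that the card on file is the card that was approved.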

29.4.3 Reading the Model Card: What It Reveals

The VitraMed model card demonstrates several principles of responsible documentation:

Honesty about limitations. The card does not claim universal accuracy. It documents specific populations where the model underperforms — age 65+, urban safety-net clinics, rural clinics — and explains why (training data imbalance). This honesty enables informed deployment decisions.

Explicit out-of-scope uses. The card clearly states what the model must not be used for, including insurance underwriting and automated decision-making. If VitraMed later uses the model for insurance purposes, the model card serves as evidence that this use was known to be out-of-scope.

Disaggregated metrics. Overall metrics (AUC-ROC 0.847) look strong. But disaggregated by population, the picture is more nuanced: AUC-ROC drops to 0.792 for patients over 65 and 0.803 for urban safety-net clinics. Publishing disaggregated metrics is an act of transparency that many organizations resist — because it reveals that a model that works well on average may work poorly for the populations most in need.
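Disaggregation of this kind is straightforward to compute. A minimal sketch, using the pairwise-concordance definition of AUC-ROC and hypothetical scored patients (the data, group names, and numbers below are illustrative, not VitraMed's actual figures):

```python
from itertools import product

def auc_roc(labels, scores):
    """AUC-ROC via pairwise concordance: the probability that a randomly
    chosen positive case outranks a randomly chosen negative case."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))

def disaggregated_auc(records):
    """Compute AUC-ROC per subgroup. Each record: (group, label, score)."""
    groups = {}
    for group, label, score in records:
        groups.setdefault(group, ([], []))
        groups[group][0].append(label)
        groups[group][1].append(score)
    return {g: round(auc_roc(ys, ss), 3) for g, (ys, ss) in groups.items()}

# Hypothetical scored patients: (clinic type, actual outcome, risk score)
records = [
    ("suburban", 1, 0.9), ("suburban", 0, 0.2), ("suburban", 1, 0.8),
    ("suburban", 0, 0.3), ("safety_net", 1, 0.6), ("safety_net", 0, 0.5),
    ("safety_net", 1, 0.4), ("safety_net", 0, 0.3),
]
print(disaggregated_auc(records))  # the per-group gap is what the card reports
```

Reporting the per-group dictionary rather than a single pooled number is the whole point: the aggregate can look strong while one subgroup's metric tells a different story.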

Connection to governance. The card references the DPIA (Chapter 28), the ethics advisory group review, and specific conditions attached to approval. It is not an isolated document — it is embedded in an organizational governance structure.

Eli, reviewing the model card for a class assignment, noticed something: "The limitations section says the model doesn't account for social determinants of health — housing, food security, environmental exposures. But those are the factors that drive chronic disease in communities like mine. So the model is best at predicting risk for people who are already well-served by the healthcare system, and worst at predicting risk for people who face the most barriers. That's the power asymmetry showing up in the metrics."

The Power Asymmetry in Model Performance: Eli's observation highlights a structural pattern: AI models trained on existing institutional data tend to perform best for populations that are best served by existing institutions — and worst for populations that are marginalized, underrepresented, or poorly served. Model cards make this pattern visible. But visibility alone does not resolve it.


29.5 Red-Teaming and Adversarial Testing

29.5.1 What Is Red-Teaming?

Red-teaming — borrowed from military and cybersecurity practice — involves a dedicated team attempting to find ways to make a system fail, produce harmful outputs, or be exploited. In the AI context, red-teaming means systematically probing a model to discover:

  • Failure modes. Under what inputs does the model produce incorrect, absurd, or harmful outputs?
  • Bias vulnerabilities. Can the model be made to produce discriminatory outputs through specific input patterns?
  • Adversarial attacks. Can an attacker manipulate inputs to cause the model to produce a desired (incorrect) output?
  • Misuse potential. Can the model be used for purposes other than its intended use, including harmful purposes?

29.5.2 Red-Teaming in Practice

A structured red-teaming process follows six steps:

  1. Define scope. What system is being tested? What are the boundaries of the exercise?
  2. Assemble a diverse team. Include people with different backgrounds, perspectives, and technical skills. People from affected communities are essential — they can identify failure modes that developers, who typically are not members of affected communities, would miss.
  3. Develop attack scenarios. Generate specific scenarios designed to test the system's weaknesses. These should include both technical attacks (adversarial inputs, boundary conditions) and social attacks (misuse scenarios, unintended applications).
  4. Execute tests. Systematically probe the system using the defined scenarios. Document every finding, including attacks the system handles well (these provide confidence in the system's robustness).
  5. Report and remediate. Compile findings into a structured report. Prioritize issues by severity and likelihood. Develop and implement remediations for critical findings.
  6. Re-test. After remediations are implemented, re-test to verify they are effective without introducing new vulnerabilities.
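Steps 4 and 5 call for findings to be documented and prioritized by severity and likelihood. A minimal sketch of such a findings log (the class, field names, and severity-times-likelihood scoring rule are illustrative assumptions, not a standard):

```python
from dataclasses import dataclass

@dataclass
class RedTeamFinding:
    scenario: str
    description: str
    severity: int      # 1 (low) .. 5 (critical)
    likelihood: int    # 1 (rare) .. 5 (frequent)
    status: str = "open"

    @property
    def priority(self) -> int:
        # Simple risk score: severity x likelihood
        return self.severity * self.likelihood

def triage(findings):
    """Order findings for remediation, highest priority first."""
    return sorted(findings, key=lambda f: f.priority, reverse=True)

findings = [
    RedTeamFinding("medication gaming", "Spurious med entry inflates score", 4, 1),
    RedTeamFinding("visit frequency bias", "Routine visits read as decline", 3, 4),
    RedTeamFinding("missing-lab imputation", "Imputed labs skew toward high risk", 4, 4),
]
for f in triage(findings):
    print(f"[{f.priority:>2}] {f.scenario}")
```

A structured log like this also feeds step 6: after remediation, each finding's scenario becomes a regression test, and its status field tracks whether the re-test passed.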

29.5.3 Red-Teaming VitraMed's Model

VitraMed's ethics advisory group recommended a red-teaming exercise for the clinical prediction model. The exercise surfaced several findings:

Finding 1: Medication-based gaming. If a patient's medication list is updated to include medications typically prescribed for high-risk conditions (even if the patient doesn't actually take them), the risk score increases dramatically. A data entry error — entering the wrong medication — could trigger unnecessary interventions.

Finding 2: Visit frequency bias. Patients who visit the clinic more frequently receive higher risk scores, because the model interprets frequent visits as a signal of declining health. But some patients visit frequently for routine care (pregnancy, chronic condition management), not because they are at higher risk.

Finding 3: Missing data pattern. When lab results are missing (because the patient hasn't had recent bloodwork), the model imputes values that systematically bias toward higher risk. This means patients with less access to care (fewer lab results) are flagged as higher risk — a pattern that could lead to overintervention for already-underserved populations.

Each finding was documented, prioritized, and added to the model card's limitations section. Remediation plans were developed for findings 2 and 3 (finding 1 was deemed low-probability given clinical workflow safeguards).
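A finding like medication-based gaming can be turned into a reusable probe: perturb a single field and flag the model if its score swings disproportionately. A sketch under stated assumptions — `toy_model` is a hypothetical stand-in for the real risk model, and the 0.15 threshold is illustrative:

```python
def perturbation_probe(model, record, feature, new_value, threshold=0.15):
    """Flip one feature and flag a finding if the risk score moves more
    than `threshold` -- a large swing from a single field suggests the
    model can be gamed (or broken) by a data entry error."""
    baseline = model(record)
    perturbed = dict(record, **{feature: new_value})
    delta = model(perturbed) - baseline
    return {"feature": feature, "delta": round(delta, 3),
            "flagged": abs(delta) > threshold}

# Toy stand-in for the risk model: a weighted sum, NOT the real model.
def toy_model(record):
    score = 0.3
    if record.get("on_insulin"):
        score += 0.35          # medication list dominates the score
    score += 0.02 * record.get("visits_per_year", 0)
    return min(score, 1.0)

patient = {"on_insulin": False, "visits_per_year": 4}
print(perturbation_probe(toy_model, patient, "on_insulin", True))
```

Run across many records and features, a probe like this converts a one-off red-team discovery into an automated check that can be re-executed after every retraining.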

Reflection: If you were red-teaming an AI hiring system, what attack scenarios would you design? What failure modes would you look for? Who would you include on the red team?


29.6 Monitoring Deployed Models

29.6.1 The Deployment Paradox

Many organizations invest heavily in model development — training, evaluation, documentation, review — and then deploy the model with minimal ongoing monitoring. This is the deployment paradox: the model is treated as a finished product when, in fact, deployment is where the hardest problems begin.

Deployed models face challenges that laboratory conditions do not:

Data drift. The real-world data the model encounters may differ from the training data — new diseases, changing demographics, evolving treatment practices.

Concept drift. The underlying relationships the model learned may change. The factors that predicted diabetes risk in 2020 may not be the same factors that predict it in 2027.

Population shift. The population served by the model may change as VitraMed expands to new regions or clinic types.

Feedback loops. The model's own outputs can change the data it receives. If high-risk patients receive interventions that reduce their risk, the model may appear to be working well (fewer high-risk patients develop conditions) even as it becomes less accurate (because the intervention, not the prediction, is driving outcomes).

29.6.2 A Monitoring Framework

Effective deployment monitoring requires:

Performance monitoring. Track key metrics (AUC-ROC, precision, recall) on a rolling basis. Set thresholds: if AUC-ROC drops below 0.80, trigger a model review.

Drift detection. Compare the statistical distribution of incoming data to the training data distribution. Significant shifts in feature distributions or label rates indicate drift.

Fairness monitoring. Track disaggregated performance metrics continuously. If the performance gap between demographic groups widens, investigate and remediate.

Outcome monitoring. Track actual outcomes — did the high-risk patients actually develop the predicted conditions? This requires longer time horizons (years, for chronic disease prediction) but is the ultimate test of model validity.

Anomaly detection. Flag unusual patterns — sudden spikes in high-risk classifications, clusters of predictions that don't match clinical expectations, unexpected correlations between inputs and outputs.
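Drift detection is often implemented with the Population Stability Index (PSI), which compares binned feature distributions between training and production. A self-contained sketch — the age feature and sample sizes are illustrative; common rules of thumb are PSI below 0.1 stable, above 0.25 significant drift:

```python
import math
import random

def psi(expected, actual, bins=10):
    """Population Stability Index between a training-time feature sample
    (`expected`) and the same feature observed in production (`actual`)."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(bins + 1)]
    edges[-1] = float("inf")  # catch production values above the training max

    def bin_fracs(sample):
        counts = [0] * bins
        for x in sample:
            for i in range(bins):
                if x < edges[i + 1]:
                    counts[i] += 1
                    break
        return [max(c / len(sample), 1e-6) for c in counts]  # avoid log(0)

    e, a = bin_fracs(expected), bin_fracs(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Hypothetical feature: patient age. A stable production sample vs. one
# where the served population has aged (mean shift of one std. dev.).
random.seed(0)
train = [random.gauss(50, 10) for _ in range(1000)]
stable = [random.gauss(50, 10) for _ in range(1000)]
shifted = [random.gauss(60, 10) for _ in range(1000)]
print(f"stable PSI:  {psi(train, stable):.3f}")   # typically < 0.1
print(f"shifted PSI: {psi(train, shifted):.3f}")  # well above 0.25
```

Computed on a rolling basis per feature, PSI values plug directly into the escalation thresholds discussed in the next subsection.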

29.6.3 When to Intervene

Not every drift or degradation requires immediate action. A monitoring framework should define escalation thresholds:

| Signal | Response |
| --- | --- |
| Minor drift (feature distribution shift <5%) | Log and monitor; include in next quarterly review |
| Moderate drift (5-15%) | Investigate root cause; consider model retraining |
| Significant drift (>15%) | Trigger model review; consider pausing deployment |
| Performance below threshold | Mandatory retraining and re-evaluation |
| Fairness gap widening | Investigate immediately; implement corrective measures |
| New failure mode discovered | Red-team the specific failure; update model card |
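Escalation thresholds like these can be encoded as a small dispatch function so responses are applied consistently and remain auditable. A sketch using the table's example values (the thresholds and the 0.80 AUC floor should be tuned per deployment):

```python
def escalate(drift_pct=0.0, auc=None, auc_floor=0.80,
             fairness_gap_widening=False, new_failure_mode=False):
    """Map monitoring signals to escalation responses; returns the list
    of required actions, most urgent categories first."""
    actions = []
    if new_failure_mode:
        actions.append("Red-team the specific failure; update model card")
    if fairness_gap_widening:
        actions.append("Investigate immediately; implement corrective measures")
    if auc is not None and auc < auc_floor:
        actions.append("Mandatory retraining and re-evaluation")
    if drift_pct > 15:
        actions.append("Trigger model review; consider pausing deployment")
    elif drift_pct >= 5:
        actions.append("Investigate root cause; consider model retraining")
    elif drift_pct > 0:
        actions.append("Log and monitor; include in next quarterly review")
    return actions

print(escalate(drift_pct=8, auc=0.79))
```

Encoding the policy in code has a side benefit: the escalation rules themselves become versioned, reviewable artifacts, just like the model card.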

"Monitoring is the unglamorous cousin of model development," Ray Zhao observed. "Nobody gets promoted for monitoring a model that's working correctly. But when a model drifts and nobody notices, people get hurt. The monitoring team is the last line of defense."


29.7 Case Studies

29.7.1 Google's Model Cards in Practice

Background: Google was among the first major technology companies to adopt model cards, publishing them for several AI services including Face Detection, Object Detection, and various Cloud AI APIs. These model cards follow the Mitchell et al. (2019) framework and are publicly available.

What Google's model cards do well:

  • Disaggregated performance reporting. Google's face detection model card reports performance separately by skin tone (using the Fitzpatrick skin type scale), age, and gender — revealing, for example, that detection accuracy is lower for darker skin tones.
  • Explicit intended use and limitations. The model cards clearly state what the model is designed for and what it should not be used for.
  • Evaluation methodology. Detailed description of how performance was measured, including the evaluation datasets used.

What critics note:

  • Limited scope. Model cards are published for selected services, not all AI products. The most controversial applications may lack documentation.
  • Static documentation. Model cards represent a snapshot at publication time. If the model is updated, the card may lag behind, creating a gap between documentation and reality.
  • No accountability mechanism. The model card states limitations, but there is no mechanism to prevent the model from being used in ways the card identifies as out-of-scope.
  • Internal vs. external transparency. Google's internal model documentation is reportedly more detailed than the public-facing model cards. The public sees a curated summary, not the full picture.

Key lesson: Model cards are a necessary but not sufficient transparency mechanism. They make model characteristics visible, but visibility without accountability produces awareness without change.

29.7.2 When Models Drift: Real-World Deployment Failures

Case: The UK's A-Level Algorithm (2020)

In August 2020, the UK government deployed an algorithm to assign A-level grades to students who could not sit exams due to the COVID-19 pandemic. The algorithm used schools' historical performance data and teachers' predicted grades.

The drift problem: The algorithm was developed using historical data from normal exam years. But 2020 was not a normal year. The algorithm did not account for the disruption caused by the pandemic — months of remote learning, unequal access to technology, and the psychological toll on students.

The failure: The algorithm systematically downgraded students at lower-performing schools (disproportionately attended by students from disadvantaged backgrounds) and upgraded students at higher-performing schools (disproportionately attended by affluent students). Approximately 40% of students received grades below their teachers' predictions.

Why monitoring failed: The algorithm was not deployed with meaningful monitoring. There was no mechanism to detect that the algorithm's outputs diverged systematically from teacher assessments. There was no fairness analysis comparing outcomes across school types. There was no process for students to challenge their algorithmically assigned grades before university admissions decisions were made.

Aftermath: After widespread protests and media coverage, the government reversed course and awarded teacher-assessed grades instead. The episode damaged public trust in algorithmic decision-making in education and led to an inquiry by the UK Parliament's Education Select Committee.

Key lessons:

  1. Models trained on historical data fail when the present diverges from the past. The algorithm assumed continuity with previous years — an assumption that COVID-19 invalidated.
  2. Deployment without monitoring is deployment without accountability. If the algorithm's outputs had been monitored for fairness and compared against teacher assessments in real time, the disparate impact could have been identified before grades were issued.
  3. Model cards matter. If a model card had been published for the A-level algorithm — documenting its training data, intended use, limitations, and fairness analysis — the mismatch between the model's design and its deployment context would have been immediately apparent.

Case: Zillow's iBuying Algorithm

In 2021, Zillow announced it was shutting down its iBuying division — an AI-powered home purchasing operation — after losing over $880 million. The company's algorithm predicted home values to determine purchase prices. When the housing market shifted rapidly, the model's predictions became unreliable: Zillow purchased homes at prices that exceeded their actual market value.

The drift: The algorithm was trained on a housing market characterized by steady price appreciation. When conditions changed — supply chain disruptions, interest rate signals, pandemic-driven migration — the model's assumptions broke down. The algorithm continued to make aggressive purchase recommendations even as market conditions deteriorated.

Why monitoring failed: Zillow reportedly had monitoring systems in place, but the feedback loop was too slow. By the time declining resale prices signaled that the algorithm's purchase recommendations were too high, Zillow had already acquired thousands of overvalued properties.

Key lesson: Monitoring must be designed for the speed of the decision-making process. If a model makes decisions daily but monitoring operates monthly, dangerous drift can accumulate before it's detected.

The Accountability Gap in Deployment: When a deployed model causes harm — a patient receives a wrong risk score, a student receives an unfair grade, a company loses $880 million — who is accountable? The model developer? The deployment decision-maker? The monitoring team that missed the drift? The organizational leadership that approved deployment without adequate safeguards? Effective responsible AI practice requires closing this gap by assigning clear accountability at each stage of the deployment lifecycle.


29.8 Chapter Summary

Key Concepts

  • Responsible AI frameworks (OECD, Microsoft, Google) provide principles, but the principles-to-practice gap remains the central challenge. Documentation, testing, and monitoring are the tools that bridge the gap.
  • Model cards (Mitchell et al. 2019) standardize AI model documentation, covering intended use, out-of-scope uses, training data, performance metrics, ethical considerations, and limitations. The ModelCard Python dataclass demonstrates how this documentation can be systematized.
  • Datasheets for datasets (Gebru et al. 2021) document the training data underlying AI models, making data provenance, composition, and limitations visible.
  • Red-teaming systematically probes AI systems for failure modes, bias vulnerabilities, and misuse potential. Diverse red teams — including members of affected communities — produce the most valuable findings.
  • Deployment monitoring detects data drift, concept drift, performance degradation, and emergent fairness problems in production AI systems. Monitoring must operate at the speed of the decision-making process.
  • VitraMed's model card documents the clinical prediction model's capabilities, limitations, and ethical considerations — connecting to the DPIA (Chapter 28) and the ethics governance structure (Chapter 26).

Key Debates

  • Should model cards be legally required for AI systems that affect fundamental rights (healthcare, credit, employment, criminal justice)?
  • Who should red-team AI systems — the developers, an internal team, or independent external auditors?
  • How should organizations balance transparency (publishing detailed model cards) against competitive concerns (revealing proprietary model details)?
  • Can deployment monitoring detect emergent harms that the model's developers did not anticipate — or only the harms they thought to look for?

Applied Framework

When evaluating any deployed AI system, apply the Documentation Test:

  1. Does a model card exist? Is it comprehensive and honest?
  2. Does a datasheet exist for the training data? Does it document limitations and composition?
  3. Has the system been red-teamed? Were findings documented and addressed?
  4. Is deployment monitoring active? Are there defined thresholds and escalation procedures?
  5. Is there a feedback loop from deployment experience back to model documentation?
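The Documentation Test can be automated as a simple checklist gate, for example in a deployment review pipeline. A minimal sketch with hypothetical field names:

```python
def documentation_test(system):
    """Apply the five-question Documentation Test; `system` is a dict of
    booleans (illustrative keys). Any missing answer fails the standard."""
    checks = [
        ("model_card", "Comprehensive, honest model card"),
        ("datasheet", "Datasheet for training data"),
        ("red_teamed", "Red-teamed, findings addressed"),
        ("monitoring", "Active monitoring with thresholds"),
        ("feedback_loop", "Deployment feedback updates documentation"),
    ]
    failures = [label for key, label in checks if not system.get(key)]
    return {"passes": not failures, "missing": failures}

print(documentation_test({"model_card": True, "datasheet": True,
                          "red_teamed": True, "monitoring": False,
                          "feedback_loop": False}))
```

A gate like this does not judge quality — a dishonest model card still "exists" — so it supplements, rather than replaces, human review.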

If any answer is "no," the system is not meeting responsible AI standards.


What's Next

In Chapter 30: When Things Go Wrong: Breach Response and Crisis Ethics, we confront the moment that every organization dreads — and that every organization must prepare for. A data breach at VitraMed exposes patient data, testing everything Part 5 has built: the ethics program, the stewardship structures, the assessment processes, the documentation practices. How Vikram Chakravarti responds to the crisis will determine whether VitraMed's ethics infrastructure is genuine or decorative — and what the cost of the answer will be.

Before moving on, complete the exercises and quiz to practice creating model cards, designing red-teaming exercises, and evaluating deployment monitoring frameworks.


Chapter 29 Exercises → exercises.md

Chapter 29 Quiz → quiz.md

Case Study: Google's Model Cards in Practice → case-study-01.md

Case Study: When Models Drift: Real-World Deployment Failures → case-study-02.md