Exercises — Chapter 30: The EU AI Act and Algorithmic Accountability


Exercise 1: AI System Classification Under the Four Risk Tiers

Instructions

For each of the six AI systems described below, (a) identify the most likely risk tier under the EU AI Act (Prohibited / High-Risk / Limited Risk / Minimal Risk / Uncertain — requires legal review), (b) cite the specific legal provision or Annex III category that supports your classification, and (c) identify any ambiguity or classification dispute that might arise.

Systems to Classify

System A: Automated Overdraft Refusal Model
A UK retail bank operates an ML model that automatically refuses overdraft requests for EU-resident customers when the predicted probability of default exceeds a set threshold. No human reviews the refusal before it is communicated to the customer. The customer receives an automated email: "Your overdraft request has been declined."

System B: Executive Compensation Benchmarking Tool
A financial services firm uses an AI tool to analyze market salary data and recommend compensation bands for senior executives. The tool processes publicly available salary surveys and generates recommended pay ranges. It is used internally by HR leadership; individual employees are not affected by its outputs directly.

System C: Real-Time Facial Expression Analysis in Video Interviews
A financial services firm uses an AI tool to analyze facial expressions and micro-expressions during recorded video interviews with job applicants. The tool generates an "engagement and authenticity score" that is provided to hiring managers alongside the video recording.

System D: Customer Sentiment Monitoring in Call Centre
A financial firm uses voice analytics AI to classify the emotional tone of customer service calls (frustrated, satisfied, neutral) in real time. The outputs are used to prioritize escalations and are shown on the supervising team leader's dashboard. The system runs continuously in the background of all calls without customer notification.

System E: Generative AI for Compliance Document Drafting
A compliance team uses a large language model to draft initial versions of internal policies, procedure documents, and regulatory correspondence. A human compliance officer always reviews and approves the final document before it is issued. The LLM is not used for customer-facing decisions or credit assessments.

System F: AML Risk Score Contributing to Account Restrictions
A bank's AML monitoring system generates a risk score for each customer account. When the score exceeds a threshold, the system automatically places a restriction on the account (blocks outbound payments) and generates a SAR referral queue item for a compliance analyst to review. The restriction takes effect before human review.


Exercise 2: Designing an Article 9 Risk Management System for a Credit Scoring AI

Instructions

You have been appointed as the EU AI Act compliance officer for Meridian Bank's retail credit scoring system, MeridianScore v3.1. The system is a gradient boosting model used to evaluate creditworthiness for personal loans, credit card approvals, and mortgage pre-qualification for EU-resident customers.

Design a risk management system for MeridianScore v3.1 that satisfies Article 9 of the EU AI Act. Your design should:

(a) Identify and categorize the known and foreseeable risks associated with MeridianScore, including risks to individuals (discriminatory outcomes, inaccurate assessments, automation bias in decision-making), risks to the firm (regulatory penalty, reputational harm, model drift), and risks to financial stability (systemic credit misallocation).

(b) Specify the risk evaluation methodology. How will identified risks be estimated and evaluated? What quantitative metrics will be used for accuracy, fairness, and stability monitoring? How will the evaluation framework address risks that emerge post-deployment that were not identified during development?

(c) Specify risk mitigation measures. For each major risk category identified in (a), describe at least one concrete mitigation measure. Include measures addressing: (i) discriminatory outcomes by protected characteristic; (ii) model drift over time; (iii) adversarial or fraudulent input manipulation; (iv) data quality failures.

(d) Describe the residual risk disclosure process. After mitigation, what residual risks remain? How will these be disclosed to deployers (the relationship managers and credit decision teams who use MeridianScore's outputs) in accordance with Article 13's transparency requirements?

(e) Define the lifecycle scope. How and when will the risk management system be reviewed and updated? What triggers a mandatory review (model update, significant performance change, new regulatory guidance, adverse event)?

Your answer should be structured as a policy framework outline — not a narrative essay. Use numbered sections, bullet points for specific requirements, and brief explanatory notes where needed.
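For the fairness component of part (b), one concrete metric your methodology could specify is the demographic parity gap across protected groups. A minimal sketch follows; the function name and the group/outcome encoding are illustrative choices, not requirements of the Act:

```python
from collections import defaultdict


def demographic_parity_gap(decisions: list[tuple[str, bool]]) -> float:
    """Largest approval-rate difference between any two protected groups.

    `decisions` pairs a protected-group label with an approve (True) /
    decline (False) outcome for each scored applicant.
    """
    approved: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for group, is_approved in decisions:
        total[group] += 1
        approved[group] += is_approved
    rates = [approved[g] / total[g] for g in total]
    return max(rates) - min(rates)


# Group A approves 3 of 4; group B approves 1 of 4 -> gap of 0.50
sample = [("A", True), ("A", True), ("A", True), ("A", False),
          ("B", True), ("B", False), ("B", False), ("B", False)]
gap = demographic_parity_gap(sample)
```

A full evaluation framework would pair a metric like this with an alert threshold and a monitoring cadence, so that post-deployment drift in the gap triggers the review process you define in (e).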


Exercise 3: Drafting an Article 13 Instructions-for-Use Document for a Fraud Detection System

Instructions

Article 13 of the EU AI Act requires that high-risk AI systems be accompanied by instructions for use that provide sufficient transparency for deployers to understand outputs and use the system appropriately.

You are the provider of FraudSentinel, an ML-based real-time transaction fraud scoring system sold to financial institutions. FraudSentinel outputs a fraud probability score (0.0–1.0) and a risk category (Low / Medium / High / Block) for each transaction. High scores trigger automatic payment holds; Block-category transactions are rejected automatically.

Draft the key sections of an Article 13-compliant instructions-for-use document for FraudSentinel. Your document must include the following sections:

Section 1: System Description and Intended Purpose
What FraudSentinel does; what it is designed to be used for; what it must not be used for.

Section 2: Performance Metrics
The accuracy metrics that characterize FraudSentinel's performance. Include: overall precision and recall; false positive rate (legitimate transactions blocked); false negative rate (fraudulent transactions missed); performance variation across transaction types, geographic regions, and customer segments. Specify the conditions under which stated metrics are valid.

Section 3: Known Risks and Limitations
Document at least four known risks or limitations that deployers must be aware of. For each, describe the risk and specify what deployers should do to mitigate it.

Section 4: Human Oversight Requirements
Specify the human oversight measures that deployers must implement in accordance with Article 14. Include: the minimum qualification and training required for oversight personnel; the circumstances in which human review is required before a Hold decision becomes a Block; the process for customer dispute resolution; and the reporting requirements for detected system malfunctions.

Section 5: Input Data Requirements
What input data does FraudSentinel require? What are the minimum data quality standards for each input field? What should deployers do when required input data is missing or of poor quality?

Section 6: Monitoring and Logging
What events does FraudSentinel automatically log? What additional monitoring should deployers implement? What performance thresholds should trigger escalation to the provider?

Your document should be professionally structured — written as a real technical compliance document, not an academic exercise.
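The Section 2 metrics all derive from the same confusion-matrix counts, so it helps to pin down the exact definitions your document will state. A minimal sketch, with invented example counts for illustration:

```python
def fraud_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Core performance metrics from confusion-matrix counts.

    tp: fraud correctly flagged    fp: legitimate transactions flagged
    fn: fraud missed               tn: legitimate transactions passed
    """
    return {
        "precision": tp / (tp + fp),             # flagged transactions that were fraud
        "recall": tp / (tp + fn),                # fraud that was caught
        "false_positive_rate": fp / (fp + tn),   # legitimate transactions blocked
        "false_negative_rate": fn / (fn + tp),   # fraud that slipped through
    }


# Invented example: 80 frauds caught, 20 legitimate holds,
# 20 frauds missed, 9,880 clean transactions passed
m = fraud_metrics(tp=80, fp=20, fn=20, tn=9880)
```

Note how the false positive rate stays tiny even when one flagged transaction in five is a false alarm; the instructions-for-use document should state both figures so deployers are not misled by class imbalance.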


Exercise 4: Code Exercise — Implementing a classification_wizard() Method

Instructions

Extend the AIActComplianceRegister class from the chapter's Python section by implementing a classification_wizard() method. The method should:

  1. Accept a system record identifier or prompt the user to identify a system from the register
  2. Ask a structured sequence of classification questions based on the EU AI Act's risk tier logic
  3. Based on answers, suggest the most likely risk tier and the relevant legal provision
  4. Output a classification recommendation with rationale

Starting Code

from __future__ import annotations
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Optional


class AIRiskTier(Enum):
    PROHIBITED = "Prohibited — Must cease use"
    HIGH_RISK = "High Risk — Full compliance required"
    LIMITED_RISK = "Limited Risk — Transparency obligations"
    MINIMAL_RISK = "Minimal Risk — General law only"
    UNCERTAIN = "Classification Uncertain — Legal review needed"


class ComplianceStatus(Enum):
    COMPLIANT = "Compliant"
    PARTIAL = "Partially Compliant — Gaps Identified"
    NON_COMPLIANT = "Non-Compliant"
    NOT_YET_ASSESSED = "Not Yet Assessed"


@dataclass
class AISystemRecord:
    system_id: str
    name: str
    description: str
    provider: str
    deployer: str
    use_case: str
    affects_eu_customers: bool
    decision_type: str
    autonomous_decision: bool
    risk_tier: AIRiskTier = AIRiskTier.UNCERTAIN
    compliance_status: ComplianceStatus = ComplianceStatus.NOT_YET_ASSESSED
    classification_rationale: str = ""
    ce_marking: bool = False
    eu_database_registered: bool = False
    technical_documentation_complete: bool = False
    risk_management_system: bool = False
    human_oversight_implemented: bool = False
    logging_implemented: bool = False
    next_review_date: Optional[date] = None

    def compliance_gaps(self) -> list[str]:
        if self.risk_tier != AIRiskTier.HIGH_RISK:
            return []
        gaps = []
        if not self.technical_documentation_complete:
            gaps.append("Technical documentation not complete (Article 11 + Annex IV)")
        if not self.risk_management_system:
            gaps.append("Risk management system not implemented (Article 9)")
        if not self.human_oversight_implemented:
            gaps.append("Human oversight measures not implemented (Article 14)")
        if not self.logging_implemented:
            gaps.append("Automatic logging not implemented (Article 12)")
        if not self.ce_marking:
            gaps.append("CE marking and Declaration of Conformity not completed (Article 48)")
        if not self.eu_database_registered:
            gaps.append("Not registered in EU database (Article 49)")
        return gaps

    def readiness_score(self) -> float:
        if self.risk_tier != AIRiskTier.HIGH_RISK:
            return 1.0
        requirements = [
            self.technical_documentation_complete,
            self.risk_management_system,
            self.human_oversight_implemented,
            self.logging_implemented,
            self.ce_marking,
            self.eu_database_registered,
        ]
        return sum(requirements) / len(requirements)


class AIActComplianceRegister:
    def __init__(self, firm_name: str):
        self.firm_name = firm_name
        self._systems: dict[str, AISystemRecord] = {}

    def register(self, system: AISystemRecord) -> None:
        self._systems[system.system_id] = system

    def high_risk_systems(self) -> list[AISystemRecord]:
        return [s for s in self._systems.values() if s.risk_tier == AIRiskTier.HIGH_RISK]

    def systems_with_gaps(self) -> list[tuple[AISystemRecord, list[str]]]:
        result = []
        for system in self.high_risk_systems():
            gaps = system.compliance_gaps()
            if gaps:
                result.append((system, gaps))
        return result

    def unclassified_systems(self) -> list[AISystemRecord]:
        return [s for s in self._systems.values() if s.risk_tier == AIRiskTier.UNCERTAIN]

    def inventory_summary(self) -> dict:
        all_systems = list(self._systems.values())
        return {
            "total_systems": len(all_systems),
            "prohibited": sum(1 for s in all_systems if s.risk_tier == AIRiskTier.PROHIBITED),
            "high_risk": sum(1 for s in all_systems if s.risk_tier == AIRiskTier.HIGH_RISK),
            "limited_risk": sum(1 for s in all_systems if s.risk_tier == AIRiskTier.LIMITED_RISK),
            "minimal_risk": sum(1 for s in all_systems if s.risk_tier == AIRiskTier.MINIMAL_RISK),
            "unclassified": sum(1 for s in all_systems if s.risk_tier == AIRiskTier.UNCERTAIN),
            "fully_compliant": sum(
                1 for s in all_systems if s.compliance_status == ComplianceStatus.COMPLIANT
            ),
            "eu_affecting": sum(1 for s in all_systems if s.affects_eu_customers),
        }

    def readiness_dashboard(self) -> list[dict]:
        # Sort by the numeric readiness score; sorting the formatted "50%"
        # strings would order lexicographically (e.g. "100%" before "17%").
        systems = sorted(
            (s for s in self._systems.values() if s.risk_tier == AIRiskTier.HIGH_RISK),
            key=lambda s: s.readiness_score(),
        )
        return [
            {
                "system_id": s.system_id,
                "name": s.name,
                "tier": s.risk_tier.value.split(" —")[0],
                "readiness": f"{s.readiness_score():.0%}",
                "gaps": len(s.compliance_gaps()),
                "eu_affecting": s.affects_eu_customers,
            }
            for s in systems
        ]

    def classification_wizard(self, system_id: str) -> dict:
        """
        TODO: Implement this method.

        The method should:
        1. Retrieve the system record from self._systems by system_id
        2. Ask a structured sequence of YES/NO classification questions
        3. Apply the EU AI Act decision tree:
           - Does the system engage in a prohibited practice? (Article 5) -> PROHIBITED
           - Does the system affect EU customers? -> If no, limited Act scope
           - Does the system perform creditworthiness assessment? -> HIGH_RISK (Annex III(5)(b))
           - Does the system perform insurance risk assessment? -> HIGH_RISK (Annex III(5)(c))
           - Does the system perform recruitment/selection? -> HIGH_RISK (Annex III(4)(a))
           - Does the system perform biometric identification? -> HIGH_RISK (Annex III(1))
           - Does the system interact with humans without disclosing it is AI? -> LIMITED_RISK
           - Is the system a deepfake generator? -> LIMITED_RISK
           - Otherwise: MINIMAL_RISK with note to review Annex III completeness
        4. Return a dict with keys: system_id, name, suggested_tier, rationale, provision,
           requires_legal_review (bool), classification_confidence ("High"/"Medium"/"Low")

        The method should NOT use input() — instead accept a responses dict parameter
        for testability, with fallback to interactive prompting.

        Hint: Structure as a decision tree with early exits. Keep the question sequence
        logical and explainable — each question should correspond to a specific
        Article 5 prohibition or Annex III category.
        """
        raise NotImplementedError("Implement classification_wizard()")

Requirements

Your implementation must:

  1. Accept an optional responses: dict[str, bool] | None = None parameter alongside system_id — if responses are provided (keyed to question identifiers), use them instead of prompting; if not provided, fall back to input() prompts
  2. Cover at least the following classification branches: Article 5 prohibited practices, Annex III(5)(b) credit, Annex III(5)(c) insurance, Annex III(4) employment, Annex III(1) biometric, Article 50 chatbot disclosure, and minimal risk
  3. Return a typed dict with keys: system_id, name, suggested_tier (an AIRiskTier value), rationale (string), provision (string citing specific Act article), requires_legal_review (bool), classification_confidence (one of "High", "Medium", "Low")
  4. After generating the recommendation, optionally update the system record's risk_tier and classification_rationale fields if the user confirms

Write a test harness demonstrating the wizard with at least three different AI systems (one that classifies as high-risk, one as limited-risk, one as uncertain requiring legal review).


Exercise 5: Building an EU AI Act Compliance Timeline for an EU-Serving Firm

Instructions

You are the Head of RegTech Compliance at Atlas Financial Services, a German asset manager with €12 billion AUM that serves retail and institutional clients across Germany, France, and the Netherlands. Atlas uses twelve AI systems across its operations. Following an initial inventory, the relevant systems have been classified as follows:

System ID  Name                                        Classification                EU Customers Affected
AFS-001    InvestScore (retail suitability scoring)    High-Risk (Annex III(5)(b))   Yes
AFS-002    HiringBot (recruitment screening)           High-Risk (Annex III(4))      Yes
AFS-003    SentinelAML (AML transaction monitoring)    Uncertain — legal review      Yes
AFS-004    AtlasChat (customer chatbot)                Limited Risk                  Yes
AFS-005    ReportDrafter (LLM for compliance reports)  Minimal Risk                  No
AFS-006    PortfolioOptimizer (internal quant model)   Minimal Risk                  No

The current date is 1 March 2026. The 2 August 2026 deadline, when the Act's obligations for high-risk systems take effect, is five months away.

Task A: Build a Compliance Timeline

Create a structured compliance timeline for Atlas Financial Services from 1 March 2026 through 31 August 2026. The timeline should cover:

  • Legal review completion for AFS-003
  • Technical documentation preparation for AFS-001 and AFS-002
  • Risk management system design and implementation (Article 9) for AFS-001 and AFS-002
  • Human oversight framework design and implementation (Article 14) for AFS-001 and AFS-002
  • Data governance review (Article 10) including bias testing for AFS-001 and AFS-002
  • Logging implementation (Article 12) for any high-risk systems lacking it
  • Disclosure language implementation for AFS-004
  • Internal conformity assessment for AFS-001 and AFS-002
  • CE marking and Declaration of Conformity execution
  • EU database registration
  • Post-deadline monitoring plan

For each milestone, estimate the duration (in weeks), identify the responsible function (legal, compliance, model risk, IT, HR, external counsel), and identify any dependencies that affect sequencing.
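One way to sanity-check your sequencing before committing to dates is to encode each milestone with a duration and its prerequisites and compute earliest finish dates. A minimal sketch, with placeholder milestones and durations rather than Atlas's actual plan:

```python
from datetime import date, timedelta

# milestone -> (duration in weeks, prerequisite milestones); values are placeholders
PLAN = {
    "tech_documentation": (6, []),
    "risk_mgmt_system": (8, []),
    "conformity_assessment": (4, ["tech_documentation", "risk_mgmt_system"]),
    "ce_marking": (1, ["conformity_assessment"]),
    "eu_db_registration": (1, ["ce_marking"]),
}


def earliest_finish(plan: dict, start: date) -> dict[str, date]:
    """Earliest finish date per milestone, honouring prerequisite ordering."""
    finish: dict[str, date] = {}

    def resolve(name: str) -> date:
        if name not in finish:
            weeks, prereqs = plan[name]
            begin = max((resolve(p) for p in prereqs), default=start)
            finish[name] = begin + timedelta(weeks=weeks)
        return finish[name]

    for name in plan:
        resolve(name)
    return finish


finish = earliest_finish(PLAN, date(2026, 3, 1))
critical_path_end = finish["eu_db_registration"]
```

With these placeholder durations the last milestone lands in early June 2026, leaving slack before 2 August; your real timeline should make the critical path and its slack explicit in the same way.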

Task B: Risk Assessment of Timeline Feasibility

Five months is a short runway for two full conformity assessments plus an AML legal review and two high-risk compliance implementations. Identify the three biggest risks to the timeline and propose mitigation strategies for each.

Task C: Budget Estimate

Based on the compliance work described in the timeline, provide an order-of-magnitude budget estimate for the Atlas Financial Services AI Act compliance program. Break down costs by category: external legal counsel, external technical consultants, internal staff time, IT infrastructure changes, and ongoing monitoring costs post-August 2026. Justify your estimates with reference to comparable regulatory implementation programs (GDPR, SR 11-7 model validation, DORA).