Chapter 26: Quiz — Explainable AI (XAI) and Model Governance
Instructions: Choose the best answer for each multiple-choice question. Short-answer questions require written responses of 2–4 sentences. Review the answer key at the end only after completing the quiz independently.
Multiple Choice
1. Federal Reserve Supervisory Guidance SR 11-7 identifies three pillars of model risk management. Which of the following correctly lists all three pillars?
A) Data governance, independent validation, and retirement documentation
B) Conceptual soundness, ongoing monitoring, and outcomes analysis
C) Risk tiering, PSI monitoring, and explainability tooling
D) Model inventory, validation report, and fairness testing
2. A financial institution's model governance team calculates a Population Stability Index of 0.31 for a consumer credit scorecard that has been in production for 14 months. According to standard PSI thresholds, the correct response is:
A) Continue monitoring — PSI below 0.35 is within acceptable limits
B) Increase reporting frequency to weekly but take no other action
C) Treat this as a critical breach — suspend the model from high-stakes decisions and initiate a retraining investigation
D) Recalibrate the model's threshold upward by 0.10 to compensate for drift
3. GDPR Article 22 is relevant to financial services model governance because it:
A) Requires all AI systems in financial services to be registered with the European Banking Authority
B) Creates a right not to be subject to solely automated decisions with significant effects, and a right to a meaningful explanation of such decisions
C) Prohibits the use of machine learning models in credit decisions within the European Union
D) Mandates that all models producing consumer-facing outputs undergo third-party CE marking assessment
4. Which of the following best describes the theoretical foundation of SHAP (SHapley Additive exPlanations)?
A) SHAP trains a local linear regression model around each prediction instance and uses the linear coefficients as explanations
B) SHAP uses gradient information from the model's loss function to attribute importance to input features
C) SHAP applies Shapley values from cooperative game theory, computing each feature's average marginal contribution to the prediction across all possible feature orderings
D) SHAP generates counterfactual instances and measures feature sensitivity by comparing predicted outputs for the original and counterfactual instances
5. Under the Equal Credit Opportunity Act and its implementing Regulation B, when a creditor takes an adverse action on a credit application, it must:
A) Provide a general statement that the application did not meet the institution's credit standards
B) Provide specific reasons for the adverse action that reflect the actual factors driving the decision
C) Offer the applicant the opportunity to reapply with additional documentation within 30 days
D) Submit the declination and its reasons to the Consumer Financial Protection Bureau within 90 days
6. A data science team is choosing between SHAP and LIME to generate explanations for a credit model. The primary regulator has asked the firm to provide per-decision adverse action reasons that can withstand legal challenge. The most appropriate choice is:
A) LIME, because it is model-agnostic and requires no access to model internals
B) LIME, because it is faster and more computationally efficient for production use
C) SHAP, because it satisfies formal theoretical axioms and produces stable, exact attributions suitable for regulatory documentation
D) Either LIME or SHAP — both are equivalent in stability and theoretical rigor
7. Under the EU AI Act (2024), which of the following financial services AI applications is most likely to be classified as a high-risk AI system under Annex III?
A) A chatbot that answers customer questions about branch opening hours
B) A data pipeline that aggregates transaction records for management reporting
C) A credit scoring system that determines creditworthiness for retail loan applicants
D) A tool that automatically formats and sends monthly account statements
8. A fairness researcher proposes that a credit scoring model should simultaneously satisfy demographic parity (equal approval rates across groups) and equalized odds (equal true positive and false positive rates across groups). A model governance expert's most accurate response is:
A) These metrics are always simultaneously achievable with sufficient training data
B) These metrics can be simultaneously achieved only if the model's AUC exceeds 0.85
C) These metrics are mathematically incompatible when base rates differ across groups, which is nearly always the case in practice
D) These metrics are simultaneously achievable by adjusting the decision threshold independently for each group
9. A model that has been in production for 30 months shows sustained AUC degradation from 0.83 at validation to 0.71 currently, and the business purpose for which it was originally developed has been substantially modified. According to model governance best practice, the appropriate action is:
A) Recalibrate the model's coefficients and continue using it in production
B) Retire the model, document the retirement trigger, identify a replacement, and establish a transition plan
C) Submit the model for enhanced monitoring with weekly PSI reporting until performance recovers
D) Continue using the model until the next scheduled annual review
10. The "shadow model" problem in model inventory management refers to:
A) A model that produces predictions inconsistent with its validation results
B) A vendor-supplied model that the firm uses without having access to its methodology
C) Models that are in production and informing decisions but have never been registered in the model inventory
D) A backup version of a model that is maintained but not actively used
11. Which of the following describes the primary advantage of counterfactual explanations over SHAP waterfall plots for adverse action communication?
A) Counterfactual explanations are more computationally efficient than SHAP
B) Counterfactual explanations tell the applicant what would need to change for a different outcome, providing actionable information rather than retrospective attribution
C) Counterfactual explanations satisfy SHAP's formal theoretical axioms, whereas SHAP waterfall plots do not
D) Counterfactual explanations are model-agnostic, whereas SHAP requires access to model internals
12. Under SR 11-7, the requirement for "independent" model validation means:
A) The validation must be performed by an external consultancy rather than by internal staff
B) The validation team must be genuinely separate from the development team, with no shared reporting lines or project incentives
C) The validation must be completed without access to the model's training data
D) The validation report must be approved by the firm's external auditor before the model can enter production
13. A partial dependence plot (PDP) for a credit model shows that predicted approval probability decreases as a borrower's income increases above £80,000. A model validator reviewing this PDP should:
A) Accept this finding as confirmation that high-income borrowers are a known credit risk
B) Flag this as a potential anomaly inconsistent with economic theory and investigate whether the model has learned a spurious relationship in the training data
C) Adjust the model's income feature to cap at £80,000 before the validation is complete
D) Report this to the FCA as evidence of discriminatory lending against high-income borrowers
14. Which of the following is NOT a stated requirement for high-risk AI systems under the EU AI Act?
A) A risk management system that identifies and mitigates foreseeable risks throughout the model lifecycle
B) Technical documentation sufficient for a competent national authority to verify compliance
C) Replacement of human decision-makers with AI for all routine financial decisions
D) Human oversight measures enabling persons responsible for oversight to monitor for anomalies and intervene
Short Answer
15. Explain in your own words why the black box problem exists — why the machine learning models with the best predictive performance tend to be the hardest to interpret. How does XAI address this tension without sacrificing model performance?
16. A compliance officer at a UK bank says: "We don't need to worry about the EU AI Act — we're based in London." Is this statement accurate? What UK-specific regulatory considerations are relevant to model governance and explainability for a UK-based financial institution?
Answer Key
1. B — SR 11-7's three pillars are conceptual soundness, ongoing monitoring, and outcomes analysis. The other options include elements of good governance but do not represent SR 11-7's three-pillar framework.
2. C — PSI above 0.25 is a critical breach requiring immediate action: suspension from high-stakes decisions and initiation of a retraining investigation. There is no acceptable practice of simply adjusting the threshold to compensate for population drift; the model's learned relationships may no longer apply to the current population.
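The PSI calculation behind this answer can be sketched in a few lines. This is an illustrative implementation assuming the common decile-binned PSI formula, with synthetic baseline and drifted score samples (the distributions are invented, not the chapter's data):

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline ('expected') score
    distribution and a current ('actual') one. Bin edges come from the
    baseline deciles so both samples are compared on the same grid; a
    small epsilon guards the log term against empty bins."""
    edges = np.percentile(expected, np.linspace(0, 100, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf   # catch out-of-range scores
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    eps = 1e-6
    return float(np.sum((a_pct - e_pct) * np.log((a_pct + eps) / (e_pct + eps))))

rng = np.random.default_rng(0)
baseline = rng.normal(600, 50, 10_000)   # scores at validation
drifted = rng.normal(550, 65, 10_000)    # shifted current population
print(psi(baseline, drifted))            # lands well above the 0.25 critical threshold
```

Under the conventional thresholds, a result below 0.10 is stable, 0.10–0.25 warrants investigation, and above 0.25 is the critical range the answer describes.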
3. B — Article 22 of the GDPR creates the right not to be subject to solely automated decisions with legal or similarly significant effects, and the right to obtain human intervention and to challenge the decision. It does not prohibit automated decisions but requires that meaningful explanations be available.
4. C — SHAP applies Shapley values from cooperative game theory. Option A describes LIME. Option B describes gradient-based attribution methods. Option D describes counterfactual explanation methods.
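The "average marginal contribution across all possible feature orderings" can be computed exactly for a tiny model. The scoring function, baseline, and instance below are invented for illustration; real SHAP libraries approximate this enumeration, which grows factorially in the number of features:

```python
from itertools import permutations

# Toy "model": a hand-written scorer with one interaction term
# (an assumption for illustration, not a trained credit model).
def model(income, debt, history):
    return 0.3 * income - 0.5 * debt + 0.2 * history + 0.1 * income * history

baseline = {"income": 0.0, "debt": 0.0, "history": 0.0}  # reference input
instance = {"income": 1.0, "debt": 1.0, "history": 1.0}  # applicant to explain

def payoff(present):
    """Model output with features in `present` taken from the instance
    and all other features held at the baseline."""
    x = {f: (instance[f] if f in present else baseline[f]) for f in baseline}
    return model(**x)

features = list(baseline)
shap_values = {f: 0.0 for f in features}
orderings = list(permutations(features))
for order in orderings:
    seen = set()
    for f in order:
        before = payoff(seen)   # value before this feature "arrives"
        seen.add(f)
        shap_values[f] += (payoff(seen) - before) / len(orderings)

print(shap_values)
# The attributions sum exactly to model(instance) - model(baseline):
# SHAP's efficiency axiom.
```

Note how the 0.1 interaction between income and history is split evenly between the two features (0.05 each), which is Shapley's symmetry property at work.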
5. B — Regulation B requires specific, meaningful reasons that reflect the actual factors driving the adverse decision. Generic statements are insufficient. The other options describe practices that are not required by Regulation B.
6. C — SHAP is the appropriate choice for regulatory documentation due to its theoretical stability and exact attribution properties. LIME's instability — where different runs can produce different explanations — makes it unsuitable for documents that may be reviewed by regulators or challenged in legal proceedings.
7. C — A credit scoring system determining creditworthiness for retail loan applicants is explicitly listed in Annex III of the EU AI Act as a high-risk AI application. The other options describe systems that do not involve consequential determinations about individuals' access to financial resources.
8. C — This is a well-established mathematical result in the algorithmic fairness literature. Demographic parity and equalized odds are mutually incompatible when base rates differ across groups, which is nearly always the case in real-world applications. The incompatibility is a mathematical constraint, not a limitation of current techniques.
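The incompatibility follows from a single identity: under equalized odds both groups share one TPR and one FPR, so each group's approval rate is TPR * p + FPR * (1 - p), where p is that group's base rate. A minimal numeric sketch (all rates invented):

```python
# Under equalized odds the approval (positive-prediction) rate for a
# group follows mechanically from its base rate:
#   P(approve) = TPR * base_rate + FPR * (1 - base_rate)

def approval_rate(tpr, fpr, base_rate):
    return tpr * base_rate + fpr * (1 - base_rate)

tpr, fpr = 0.80, 0.20            # shared across groups by equalized odds
base_a, base_b = 0.50, 0.30      # groups differ in true repayment rate

rate_a = approval_rate(tpr, fpr, base_a)   # 0.50
rate_b = approval_rate(tpr, fpr, base_b)   # 0.38
print(rate_a, rate_b)            # unequal, so demographic parity fails

# Equal approval rates would require (tpr - fpr) * (base_a - base_b) == 0:
# either identical base rates, or a classifier no better than chance.
```

The closing comment is the whole theorem in miniature: with differing base rates, demographic parity and equalized odds can coexist only for a useless classifier.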
9. B — Sustained performance degradation below acceptable thresholds and a material change in the business purpose for which the model was developed are both named retirement triggers under governance best practice. Retirement requires documentation of the trigger, a replacement model, and a transition plan.
10. C — Shadow models are models that are in production and influencing decisions but have never been registered in the firm's model inventory. They cannot be subject to validation, monitoring, or review if the governance function does not know they exist.
11. B — Counterfactual explanations are actionable: they tell the applicant what would need to change (lower debt-to-income ratio, longer account age) for the outcome to be different. SHAP waterfall plots tell the applicant what features drove the decision, which is informative but not necessarily actionable. Note that counterfactuals are not more computationally efficient and do not satisfy SHAP's axioms.
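A counterfactual of the kind this answer describes can be found by brute-force search against a simple scorecard. The logistic coefficients, feature grid, and L1 cost function below are assumptions for illustration; production counterfactual methods add constraints such as plausibility and immutability of protected features:

```python
import math
from itertools import product

# Toy logistic scorecard (coefficients are invented for illustration).
def approve_prob(debt_ratio, account_age_yrs):
    z = 2.0 - 4.0 * debt_ratio + 0.3 * account_age_yrs
    return 1.0 / (1.0 + math.exp(-z))

applicant = {"debt_ratio": 0.9, "account_age_yrs": 1.0}
# Declined: approve_prob(0.9, 1.0) is roughly 0.21, below the 0.5 threshold.

# Search a feature grid for the smallest change (by summed absolute
# delta) that crosses the approval threshold.
best = None
for dr, age in product([x / 100 for x in range(0, 101)], range(0, 21)):
    if approve_prob(dr, age) >= 0.5:
        cost = (abs(dr - applicant["debt_ratio"])
                + abs(age - applicant["account_age_yrs"]))
        if best is None or cost < best[0]:
            best = (cost, dr, age)

_, dr, age = best
print(f"Counterfactual: debt_ratio={dr}, account_age_yrs={age}")
```

The output is the actionable statement the answer contrasts with a SHAP waterfall: "reduce your debt-to-income ratio to this level and the decision flips", rather than "these features drove the decline".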
12. B — Independence under SR 11-7 means genuine organizational separation from the development team — separate reporting lines, no shared project accountability, and professional obligation to find problems rather than to approve. External consultancy is not required; internal teams with genuine independence satisfy the requirement.
13. B — A PDP showing that predicted approval probability falls as income rises above a threshold is anomalous and inconsistent with basic credit economics. This should be flagged as a potential spurious correlation learned from training data — perhaps income correlates with some other variable in the training dataset that is actually predictive of default, and the model has captured the spurious relationship rather than the genuine economic one.
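The PDP computation itself is simple to sketch: for each grid value of income, set every row's income to that value, score the whole sample, and average the predictions. The data and scorer below are synthetic, deliberately built to reproduce the anomalous above-£80,000 decline a validator should flag:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic portfolio (assumed for illustration).
n = 5_000
income = rng.uniform(20_000, 120_000, n)
account_age = rng.uniform(0, 10, n)

def model(inc, age):
    # Toy scorer: approval probability rises with account age and,
    # above 80k, falls with income -- the spurious pattern in question.
    base = 0.05 * age - np.where(inc > 80_000, (inc - 80_000) / 100_000, 0.0)
    return 1 / (1 + np.exp(-base))

def partial_dependence(grid):
    """PDP for income: substitute each grid value into every row,
    score, and average the predictions."""
    return np.array([model(np.full(n, g), account_age).mean() for g in grid])

grid = np.linspace(20_000, 120_000, 11)
pdp = partial_dependence(grid)
print(np.round(pdp, 3))   # flat up to 80k, then declining: the anomaly
```

Seeing the curve decline past £80,000 does not by itself say why; the validator's job, as the answer notes, is to trace the pattern back to a confounder in the training data before the model is trusted.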
14. C — The EU AI Act requires human oversight measures but explicitly does not require replacement of human decision-makers. Indeed, the human oversight requirement is designed to ensure that AI supports rather than supplants human judgment in high-risk contexts. All other options accurately describe EU AI Act requirements for high-risk systems.
15. A strong answer should explain that complex models such as gradient-boosted trees and neural networks achieve high predictive performance by learning high-order, non-linear interactions among many features — the very complexity that makes them powerful is what makes them opaque. A logistic regression's interpretability comes from its simplicity, which also limits its ability to capture complex patterns. XAI methods like SHAP do not simplify the model; they provide mathematically principled approximations of the model's behavior for specific instances or across the population, allowing human interpretation without requiring model simplification.
16. The statement is not fully accurate. While the EU AI Act does not have direct legal force in the UK following Brexit, a UK financial institution that provides services to EU customers or that operates in EU markets may be subject to the AI Act's requirements for systems affecting EU persons. More significantly, the FCA's Consumer Duty, ICO guidance on automated decision-making (derived from UK GDPR Article 22), and growing FCA supervisory expectations for ML model explainability create a substantial domestic regulatory framework for model governance and explainability even without a standalone UK AI Act.