Chapter 14: Quiz — Explainable AI (XAI) Techniques
Total questions: 20
Format: 8 Multiple Choice | 5 True/False | 4 Short Answer | 3 Applied Scenario
Recommended time: 45-60 minutes
Note: The Answer Key appears at the end of this document.
Part 1: Multiple Choice (2 points each)
Question 1
Which of the following best distinguishes interpretability from explainability?
A) Interpretability applies to simple models; explainability applies only to complex models.
B) Interpretability refers to understanding a model's internal logic directly; explainability refers to understanding the cause of a decision, potentially through post-hoc tools applied to an opaque model.
C) Interpretability is a property of the model; explainability is a property of the user.
D) Interpretability and explainability are synonymous terms used interchangeably in the literature.
Question 2
LIME (Local Interpretable Model-Agnostic Explanations) produces explanations by:
A) Computing the average contribution of each feature across all possible feature orderings, using Shapley values from cooperative game theory.
B) Retraining a smaller, simpler version of the original model on the same training data.
C) Generating perturbations of a specific input, observing the model's responses, and fitting a simple linear model to approximate local behavior.
D) Computing the gradient of the model's output with respect to each input feature at the point of the prediction.
Question 3
Which of the following is a mathematical property guaranteed by SHAP values but NOT by LIME explanations?
A) The explanation is always human-interpretable.
B) The feature contributions are always non-negative.
C) The feature contributions sum to the difference between the prediction and the baseline.
D) The explanation is computed in constant time regardless of model complexity.
Question 4
The Slack et al. (2020) finding that "adversarial classifiers can fool LIME and SHAP" means:
A) LIME and SHAP contain software bugs that cause them to produce random outputs.
B) A classifier can be deliberately engineered to produce fair-looking explanations when queried by explanation tools while still discriminating on actual applicant data.
C) LIME and SHAP are only reliable when applied to linear models.
D) SHAP values become unreliable when the model has more than 100 features.
Question 5
Adebayo et al. (2018) tested gradient-based saliency methods by comparing their outputs on trained models versus randomly initialized models. The finding was that:
A) Randomly initialized models produced higher-quality saliency maps than trained models.
B) Many saliency methods produced similar maps regardless of whether the model was trained, suggesting the maps reflected input data structure rather than learned model behavior.
C) Saliency maps are always faithful representations of model behavior.
D) Gradient-based saliency is more reliable than SHAP for image classification tasks.
Question 6
A counterfactual explanation for a loan denial would most appropriately say:
A) "Your predicted default probability was 0.74, which is above our approval threshold of 0.45."
B) "The top features contributing to your denial were: debt-to-income ratio (+0.18 SHAP), prior late payments (+0.11 SHAP)."
C) "If your credit score had been 680 instead of 621, and you had no late payments in the past 12 months, your application would have been approved."
D) "Our model's AUC-ROC is 0.82 on the validation set, indicating high predictive accuracy."
Question 7
The "Rashomon set" concept, as applied to the argument for interpretable models in high-stakes settings, refers to:
A) The set of all possible explanations for a given prediction, of which only one is correct.
B) The collection of models that achieve similar accuracy on a given problem, often including interpretable models, suggesting that accuracy rarely requires opacity.
C) The set of regulatory jurisdictions with conflicting explainability requirements.
D) Multiple conflicting counterfactual explanations for the same prediction.
Question 8
Which type of SHAP is most appropriate for computing fast, exact SHAP values for a gradient-boosted tree model (such as XGBoost)?
A) KernelSHAP, because it is model-agnostic and works with any architecture.
B) DeepSHAP, because gradient-boosted trees use gradient descent during training.
C) TreeSHAP, because it exploits the structure of tree-based models to compute exact values efficiently.
D) AttentionSHAP, because XGBoost uses attention mechanisms in its boosting rounds.
Part 2: True / False (2 points each)
Question 9
A biased machine learning model that is accurately explained by SHAP is no longer a biased model. The explanation corrects the bias.
True / False
Question 10
LIME is "model-agnostic," meaning it can be applied to explain predictions from any type of model regardless of architecture — random forests, neural networks, linear models, and others.
True / False
Question 11
Attention weights in transformer-based language models are reliable indicators of which input tokens most influenced the model's output, and can be used as straightforward explanations of model reasoning.
True / False
Question 12
The EU AI Act's transparency requirements for high-risk AI systems require that organizations use a specific approved explanation method (either LIME or SHAP) for all automated decisions affecting individuals.
True / False
Question 13
The "faithfulness problem" in XAI refers to the risk that post-hoc explanations may inaccurately represent the model's actual decision logic — that the explanation is itself an approximation that can be wrong.
True / False
Part 3: Short Answer (6 points each)
Question 14
Explain in plain language why the instability of LIME explanations (the fact that two runs of LIME on the same input can produce different explanations) is a governance concern, not just a technical inconvenience. Your answer should reference a specific compliance context.
[4-6 sentences]
Question 15
A health insurance company uses an AI model to determine which policyholders are flagged for wellness program outreach. A data scientist runs a global SHAP analysis and finds that "county_health_index" — a score derived from county-level public health data — is the second most important feature. Describe the steps the data scientist should take to determine whether this feature is functioning as a proxy for race or socioeconomic status, and what the company should do if it finds evidence of proxy discrimination.
[5-8 sentences]
Question 16
The chapter distinguishes three claims that are sometimes conflated: (a) explanation is not justification, (b) explanation is not fairness, and (c) explanation is not accountability. Using a specific, concrete example — not just abstract statements — illustrate the distinction among these three claims. Your example can be drawn from credit lending, healthcare, hiring, or criminal justice.
[6-9 sentences]
Question 17
What is the Explainable Boosting Machine (EBM), and why does it represent a different approach to the accuracy-interpretability tradeoff compared to deploying a black-box model (like XGBoost) with SHAP explanations? What governance advantages does the EBM approach offer?
[4-7 sentences]
Part 4: Applied Scenario (10 points each)
Question 18 — Applied Scenario: The SHAP Waterfall Interpretation
A parole board uses an AI risk assessment tool to inform — not determine — parole decisions. A parole officer is reviewing the SHAP-based risk report for an applicant named James. The report shows the following:
| Feature | Value | SHAP Value |
|---|---|---|
| age_at_first_arrest | 16 | +0.14 |
| num_prior_convictions | 3 | +0.11 |
| years_since_last_offense | 4 | -0.09 |
| employment_at_release | No | +0.08 |
| education_level | High school | -0.03 |
| social_support_score | 42/100 | +0.04 |
| zip_code_reentry_risk | 78/100 | +0.09 |
| substance_use_history | Yes | +0.06 |
Baseline recidivism probability: 0.38. Final predicted probability: 0.78.
a) What does James's SHAP report tell the parole officer about why the model assessed James as high-risk? Identify the three largest contributors to his above-baseline score. (3 points)
b) "Age at first arrest" has the highest SHAP value (+0.14). James was 16 when he was first arrested. What ethical concerns does the use of this feature raise in a criminal justice risk assessment context? How might this feature encode systemic inequities in policing and prosecution? (4 points)
c) "Zip code reentry risk" has a SHAP value of +0.09. Based on your understanding of proxy variables from this chapter, what follow-up analysis should the parole board require before relying on this feature? What would constitute evidence that this feature is discriminatory? (3 points)
Question 19 — Applied Scenario: Designing for Different Audiences
CompleteCare Health System has deployed an AI model to predict which patients with diabetes are at highest 90-day risk of emergency hospitalization. The model is used to trigger proactive nurse outreach. The model is an XGBoost classifier with approximately 40 input features, trained on 5 years of electronic health record data from CompleteCare's patient population.
The following stakeholders want explanations of the model's behavior:
- Dr. Chen, the Chief Medical Officer, who wants to understand the model's overall clinical logic and whether it reflects current evidence-based medicine.
- Nurse Rodriguez, who manages the outreach program and needs to understand why individual patients were flagged so she can have informed conversations with them.
- Patient Marcus Thompson, who received an outreach call and wants to know why he was identified as high-risk.
- The State Health Department, which is auditing the system for potential health equity concerns under state anti-discrimination law.
- CompleteCare's Board of Directors, which has approved the AI initiative and has oversight responsibility.
For each of the five stakeholders above:
a) Identify the most appropriate type of explanation (global, local, SHAP, LIME, counterfactual, summary) and explain why it fits this stakeholder's needs. (6 points)
b) Identify one specific governance concern that each stakeholder should raise based on what they learn from their explanation — not just "is the model accurate" but a deeper accountability question. (4 points)
Question 20 — Applied Scenario: The Compliance Challenge
Fairway Mortgage Corporation, a mid-sized mortgage lender, has received a Matter Requiring Attention (MRA) from its bank examiner citing deficiencies in its ML credit model governance. The MRA specifically requires that Fairway: (1) conduct a disparate impact analysis of its current model; (2) validate the faithfulness of its LIME-based adverse action notice system; and (3) develop an ongoing monitoring plan for proxy variable usage.
You are a consultant engaged to design Fairway's remediation plan. Drawing on Chapter 14, design a credible remediation framework that addresses all three requirements.
Your framework should include:
a) The analytical approach for the disparate impact analysis, specifying what data you will need, what SHAP-based analysis you will run, what statistical tests you will use, and what thresholds will constitute a finding of disparate impact. (4 points)
b) Your methodology for testing whether the LIME-based adverse action notices are faithfully representing the model's actual behavior, including at least two specific faithfulness tests you will apply and what results would indicate a failure. (3 points)
c) An ongoing monitoring plan that specifies: (i) the frequency and method of proxy variable analysis; (ii) who receives the results; (iii) what thresholds trigger escalation; and (iv) what remediation process is activated if a proxy is detected. (3 points)
Answer Key
Part 1: Multiple Choice
Question 1 — Answer: B Interpretability refers to directly understanding the model's internal logic (as in a linear regression or decision tree). Explainability refers to understanding the cause of a decision, which can be achieved through post-hoc tools even for opaque models. The distinction is not about model complexity per se (A), not about whose property it is (C), and the terms are not synonymous (D) — though they are frequently conflated.
Question 2 — Answer: C LIME generates perturbations of a specific input, queries the black-box model for predictions on those perturbations, and fits a simple local linear model to approximate the black-box model's behavior in the neighborhood of the instance. Option A describes SHAP. Option B describes model distillation. Option D describes gradient-based saliency methods.
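For readers who want to see the perturb-query-fit recipe concretely, here is a minimal numpy sketch. The toy `black_box` function, the kernel width, and the sample count are illustrative assumptions, not part of LIME itself:

```python
import numpy as np

# Toy "black box": we can only query it for predictions, not inspect it.
def black_box(X):
    logits = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * X[:, 0] * X[:, 1]
    return 1.0 / (1.0 + np.exp(-logits))

rng = np.random.default_rng(0)
x0 = np.array([1.0, 0.5])                      # instance to explain

# 1. Generate perturbations of the input in a local neighborhood.
Z = x0 + rng.normal(scale=0.3, size=(500, 2))
# 2. Observe the black-box model's responses.
y = black_box(Z)
# 3. Weight samples by proximity to x0 (exponential kernel).
w = np.exp(-np.sum((Z - x0) ** 2, axis=1) / 0.25)
# 4. Fit a weighted linear surrogate (weighted least squares).
A = np.column_stack([np.ones(len(Z)), Z])
sw = np.sqrt(w)
coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
intercept, w1, w2 = coef   # w1, w2: approximate local effect of each feature near x0
```

The signs of `w1` and `w2` indicate the local direction of each feature's effect; a different neighborhood scale or kernel width can change the result, which is exactly the instability discussed under Question 14.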
Question 3 — Answer: C SHAP's additivity property guarantees that SHAP values for all features sum to the difference between the model's prediction for the instance and the baseline (average) prediction. LIME does not have this guarantee. Option A is desirable but not unique to SHAP. Option B is incorrect — SHAP values can be negative. Option D is incorrect — SHAP computation time varies by model type.
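The additivity property can be verified by brute force on a small toy model (the model and baseline below are assumptions for illustration). The exact Shapley value for each feature is a weighted sum of its marginal contributions over all coalitions of the other features:

```python
import itertools
import math

# Toy model over three features (illustrative only).
def model(z):
    return 3.0 * z[0] + 2.0 * z[1] * z[2] + 1.0

x = (1.0, 2.0, 0.5)          # instance to explain
baseline = (0.0, 0.0, 0.0)   # reference ("feature absent") values

def value(S):
    """Model output with features in S taken from x and the rest from the baseline."""
    return model([x[i] if i in S else baseline[i] for i in range(3)])

n = 3
phi = [0.0] * n
for i in range(n):
    others = [j for j in range(n) if j != i]
    for k in range(n):
        for S in itertools.combinations(others, k):
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            phi[i] += weight * (value(set(S) | {i}) - value(set(S)))

# Additivity: the phi values sum to prediction minus baseline prediction.
# Here model(x) = 6.0 and model(baseline) = 1.0, so phi sums to 5.0.
```

Note how the interaction term `2*z[1]*z[2]` is split evenly between features 1 and 2, while feature 0's purely linear effect is attributed to it exactly; the sum still reproduces the prediction gap, which is the guarantee LIME lacks.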
Question 4 — Answer: B The Slack et al. finding is that classifiers can be deliberately engineered to produce fair-looking explanations to explanation tools (which query out-of-distribution inputs) while still discriminating on real applicant data (in-distribution inputs). The attack is not a software bug (A), does not require linear models (C), and is unrelated to feature count (D).
Question 5 — Answer: B Adebayo et al. found that many gradient-based saliency methods produced similar maps for trained and randomly initialized models, which should not be the case if the maps were reflecting learned model behavior. This suggests many methods reflect input data structure rather than model behavior — a serious faithfulness failure.
Question 6 — Answer: C A counterfactual explanation specifies what would have had to be different about the input for a different outcome. Option A describes a raw model output. Option B describes a SHAP-based feature importance explanation. Option D describes overall model performance — not an explanation of any individual decision.
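For a linear score, a counterfactual of the kind in option C can even be computed in closed form. The weights, threshold, and applicant below are toy assumptions chosen only to echo the numbers in the question:

```python
# Toy linear approval score (weights and threshold are illustrative assumptions).
weights = {"credit_score": 0.01, "late_payments": -0.2, "dti": -1.0}
threshold = 6.0

def score(applicant):
    return sum(weights[k] * v for k, v in applicant.items())

applicant = {"credit_score": 621, "late_payments": 2, "dti": 0.42}

def minimal_change(feature, applicant):
    """Smallest change to one feature that lifts the score to the threshold."""
    gap = threshold - score(applicant)
    return 0.0 if gap <= 0 else gap / weights[feature]

delta = minimal_change("credit_score", applicant)
# delta is about +61, i.e. "if your credit score had been about 682,
# your application would have been approved."
```

Real counterfactual generators must also enforce plausibility and actionability constraints (you cannot ask an applicant to lower their age), which is why the closed-form version above is only a sketch.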
Question 7 — Answer: B The Rashomon set (named for the film about multiple accounts of the same event) is the collection of models achieving similar accuracy. If interpretable models are often in this set, the argument that accuracy requires opacity is weakened. The Rashomon set is not about explanations (A), regulatory jurisdictions (C), or competing counterfactuals (D).
Question 8 — Answer: C TreeSHAP is designed specifically for tree-based models (random forests, gradient-boosted trees) and computes exact SHAP values in polynomial time by exploiting the tree structure. KernelSHAP (A) is model-agnostic and approximate. DeepSHAP (B) is for deep neural networks. AttentionSHAP (D) does not exist as a standard method.
Part 2: True / False
Question 9 — Answer: FALSE Explanation describes the model's behavior; it does not change it. A biased model with an accurate SHAP explanation is still biased. The explanation creates information that could be used to remediate the bias — but the remediation requires changing the model, the data, or the deployment decision. This is one of the chapter's central points.
Question 10 — Answer: TRUE LIME is explicitly designed to be model-agnostic: it interacts with any model only through the model's prediction function (querying outputs for given inputs), without requiring access to the model's internal architecture or parameters. This means it can explain any model that exposes a prediction function.
Question 11 — Answer: FALSE Jain and Wallace (2019) showed that attention weights are not reliable explanations. Different attention distributions can produce the same model output, and adversarial attention distributions can be constructed that look very different from the original while producing the same prediction. Attention is not explanation.
Question 12 — Answer: FALSE The EU AI Act does not mandate any specific explanation method. It requires that high-risk AI systems be capable of providing certain types of transparency and that documentation be maintained, but it does not specify LIME, SHAP, or any other particular technical method. This is a common misconception about AI regulation.
Question 13 — Answer: TRUE The faithfulness problem is precisely this: post-hoc explanations are approximations of model behavior, not direct readings of model logic, and these approximations can be inaccurate. Adebayo et al. documented this for saliency methods, and the problem applies in varying degrees to LIME and KernelSHAP as well.
Part 3: Short Answer — Model Answers
Question 14 — Model Answer LIME's instability means that running the same explanation algorithm twice on the same loan application can produce different lists of adverse factors. In the context of ECOA and Regulation B compliance, this creates a serious legal problem: if a denied applicant requests their adverse action notice and receives a different list of reasons than was initially generated and recorded, the institution may be unable to defend the consistency of its process. If the same application is reviewed by both the institution and a regulator, they may be looking at different explanations of the same decision, making compliance verification impossible. At a minimum, institutions using LIME for adverse action notices must implement reproducibility controls — fixing random seeds, averaging across multiple runs, or switching to a more stable method such as TreeSHAP — and must log and retain the specific explanation generated at decision time.
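The reproducibility controls mentioned above can be as simple as threading a fixed, logged seed through the explainer and averaging across several seeded runs. The toy `predict` function and perturbation scheme below are assumptions standing in for a real LIME pipeline:

```python
import numpy as np

def predict(X):
    # Stand-in for the credit model's probability output (illustrative).
    return 1.0 / (1.0 + np.exp(-(X[:, 0] - 2.0 * X[:, 1])))

def lime_like(x0, seed, n=300):
    """Perturbation-based local attribution; same seed => identical output."""
    rng = np.random.default_rng(seed)            # control 1: fixed, logged seed
    Z = x0 + rng.normal(scale=0.2, size=(n, x0.size))
    A = np.column_stack([np.ones(n), Z])
    coef, *_ = np.linalg.lstsq(A, predict(Z), rcond=None)
    return coef[1:]                              # per-feature local weights

x0 = np.array([0.4, 0.7])
a = lime_like(x0, seed=42)
b = lime_like(x0, seed=42)       # reproducible: identical to `a`
# control 2: average across several seeds to damp run-to-run variance
avg = np.mean([lime_like(x0, seed=s) for s in range(10)], axis=0)
```

Logging the seed alongside the decision lets the institution regenerate the exact notice later; averaging reduces variance but does not replace retaining the explanation actually sent to the applicant.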
Question 15 — Model Answer The first step is to obtain demographic data — either from public sources (ACS census data) or from HMDA data if applicable — that provides demographic composition by county. The data scientist should then compute the Pearson or Spearman correlation between the county_health_index SHAP values across all model predictions and the demographic composition of each applicant's county. A significant positive correlation between high SHAP values (pushing toward non-selection for outreach, or selection for the wrong reasons) and minority population percentage would be evidence of proxy discrimination. The data scientist should also compare denial-equivalent rates across demographic groups after controlling for legitimate health variables. If evidence of proxy discrimination is found, the company should convene a cross-functional team including legal, compliance, and data science to assess the legal exposure (the ACA prohibits certain forms of health-status discrimination; state laws may prohibit race-based disparities in health programs), evaluate alternative features with lower demographic correlation, retrain without the problematic feature if alternatives exist, and document the entire process for regulators.
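The correlation step can be made concrete as follows. The synthetic SHAP values, demographic shares, and the 0.3 escalation threshold are all assumptions for illustration, not findings:

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic illustration: per-applicant SHAP values for county_health_index
# and the minority population share of each applicant's county.
minority_pct = rng.uniform(0.05, 0.60, size=500)
shap_vals = 0.3 * minority_pct + rng.normal(scale=0.02, size=500)

# Pearson correlation between the feature's SHAP values and demographics.
r = float(np.corrcoef(shap_vals, minority_pct)[0, 1])
proxy_flag = abs(r) > 0.3       # example escalation threshold
```

A strong correlation is evidence, not proof, of proxying; the follow-up controls for legitimate health variables described above are what distinguish a proxy from a genuinely clinical signal.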
Question 16 — Model Answer Consider a hiring AI that has been shown through SHAP analysis to rely heavily on the prestige of an applicant's undergraduate institution. The explanation is accurate: the model does use institutional prestige, and SHAP faithfully identifies this. This accurate explanation shows why a specific candidate was rejected — but it does not justify the decision (a). Prestige is a legitimate predictor of some skills, but it also correlates strongly with wealth and race, and using it likely produces disparate impact on candidates from lower socioeconomic backgrounds. Knowing that prestige was the key factor does not make the decision fair (b). And even with a perfect explanation, it remains unclear who is responsible: the data scientist who chose to include the variable? The HR director who approved the model? The vendor who sold it? The board that oversaw the deployment? The explanation identifies the mechanism of potential harm; it does not assign responsibility or provide any remedy to the affected candidate (c). All three failures — unjustified decision, unfair outcome, unclear accountability — coexist with an accurate explanation.
Question 17 — Model Answer The Explainable Boosting Machine (EBM), developed by Microsoft Research, extends Generalized Additive Models (GAMs) using gradient boosting to learn complex, nonlinear shapes for each feature's contribution, while maintaining the additive structure that makes each feature's effect independently visualizable and interpretable. Unlike deploying XGBoost with SHAP explanations, the EBM's interpretability is built in — the feature shapes learned during training are themselves the explanation, rather than a post-hoc approximation of a black-box model's behavior. This means the EBM explanation is faithful by construction: there is no approximation error, no instability, and no risk of adversarial manipulation of a separate explanation layer. From a governance perspective, the EBM approach eliminates the faithfulness problem for the features it can represent, provides more reliable documentation for model validation and regulatory examination, and does not require a separate explanation infrastructure — the model's own structure is the documentation.
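The "the model is its own explanation" point can be seen in miniature: an additive model's prediction decomposes exactly into per-feature terms. The hand-written shape functions below stand in for the shapes an EBM would learn from data (they are assumptions, not EBM output):

```python
# Hand-written 1-D shape functions standing in for learned EBM shapes.
def age_shape(age):
    return 0.02 * (age - 40)            # risk rises linearly with age here

def bmi_shape(bmi):
    return 0.05 * max(0.0, bmi - 30)    # hinge: extra risk above BMI 30

intercept = 0.10

def predict(age, bmi):
    return intercept + age_shape(age) + bmi_shape(bmi)

# The explanation IS the model: each term below is the exact contribution,
# with no post-hoc approximation layer in between.
contribs = {"age": age_shape(55), "bmi": bmi_shape(33)}
pred = predict(55, 33)                  # 0.10 + 0.30 + 0.15 = 0.55
```

Contrast this with the SHAP-on-XGBoost pipeline, where the contributions are estimated by a second algorithm and can disagree with the model they describe.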
Part 4: Applied Scenario — Grading Guidance
Question 18 — Key Points
Part a (3 points): The three largest positive SHAP contributors are: age_at_first_arrest (+0.14), num_prior_convictions (+0.11), and zip_code_reentry_risk (+0.09). All eight SHAP values sum to +0.40, which added to the 0.38 baseline yields exactly the predicted 0.78 (the negative SHAP values partially offset the positives). Full credit requires correct identification of the three features and the observation that the baseline plus all SHAP values sums to 0.78.
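The additivity check in part (a) can be verified directly from the table (values transcribed from James's report above):

```python
# SHAP values transcribed from James's report.
shap_values = {
    "age_at_first_arrest": 0.14,
    "num_prior_convictions": 0.11,
    "years_since_last_offense": -0.09,
    "employment_at_release": 0.08,
    "education_level": -0.03,
    "social_support_score": 0.04,
    "zip_code_reentry_risk": 0.09,
    "substance_use_history": 0.06,
}
baseline = 0.38
prediction = baseline + sum(shap_values.values())   # 0.38 + 0.40 = 0.78
```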
Part b (4 points): Age at first arrest encodes when law enforcement first intersected with James's life — which is heavily influenced by policing intensity in the neighborhood where he grew up. Research consistently shows that low-income, minority neighborhoods are more heavily policed, producing higher arrest rates that do not necessarily reflect higher rates of offending. A high SHAP value for this feature systematically disadvantages individuals from over-policed communities, potentially tracking race and socioeconomic status. Using age at first arrest (an arrest, not even a conviction) as a high-weight feature in a recidivism tool raises due process concerns and may violate equal protection principles depending on jurisdiction.
Part c (3 points): Follow-up analysis should match zip codes to demographic data (census) and test whether high zip_code_reentry_risk values correlate with minority population concentration. If yes, the feature may be a proxy for race or national origin. Evidence of discrimination would include: statistically significant correlation between SHAP values for this feature and demographic composition; higher recidivism predictions for similarly situated individuals from majority-minority zip codes after controlling for individual criminal history features; comparison of feature construction methodology to identify if it encodes housing segregation patterns.
Question 19 — Key Points
Award points for matching explanation types to stakeholder needs with clear reasoning, and for substantive governance questions (not just "is the model accurate?"):
- Dr. Chen: Global SHAP beeswarm or bar plot to see overall feature importance and clinical logic. Governance question: Do the top features reflect evidence-based clinical predictors, or are they encoding demographic shortcuts?
- Nurse Rodriguez: Local SHAP waterfall or counterfactual for each flagged patient, translated into clinical language. Governance question: Is the reason for flagging clinically actionable and something she can discuss with the patient?
- Marcus Thompson: Plain-English local explanation, ideally counterfactual (what could he do to reduce his risk?). Governance question: Can Marcus meaningfully understand and challenge this decision? Does he have a right to opt out of outreach?
- State Health Department: Aggregate SHAP audit cross-referenced with demographic data, plus disparate impact analysis. Governance question: Are minority patients systematically being over- or under-identified relative to their actual risk?
- Board of Directors: Model card summary with performance metrics, known limitations, fairness metrics, and escalation protocols. Governance question: What is the liability exposure if the model produces health equity disparities, and what oversight do directors have?
Question 20 — Key Points
Part a (4 points): Disparate impact analysis should use: HMDA data (if available) and census data to obtain demographic characteristics; TreeSHAP run on full application population for the relevant time period; correlation analysis between SHAP values for each feature and demographic variables; statistical tests (chi-square, logistic regression controlling for creditworthiness factors) to compare approval rates across protected class groups; four-fifths rule application. A finding is triggered if approval rates for a protected class are less than 80% of the majority group rate after controlling for legitimate creditworthiness factors.
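The four-fifths rule computation itself is straightforward to operationalize; the approval rates below are made-up numbers for illustration:

```python
def adverse_impact_ratio(protected_rate, reference_rate):
    """Selection (approval) rate of the protected class divided by that of
    the most-favored group; a ratio below 0.80 flags potential disparate impact."""
    return protected_rate / reference_rate

# Illustrative approval rates (assumed, not real data).
ratio = adverse_impact_ratio(protected_rate=0.44, reference_rate=0.62)
finding = ratio < 0.80        # ratio is about 0.71, so a finding is triggered
```

The hard analytical work is upstream of this division: controlling for legitimate creditworthiness factors so that the rates being compared are for similarly situated applicants.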
Part b (3 points): Faithfulness tests should include: (1) SHAP consistency check — verify that adverse action notices list the same features with consistent directionality as the SHAP waterfall for a sample of denials; (2) Feature perturbation test — for a sample, manually change the top LIME-cited adverse factor and verify that the model prediction changes in the expected direction; (3) Randomization check — verify that LIME explanations change substantially when the model is retrained on randomized labels (following Adebayo et al.). Failure is indicated by notices that consistently fail to match the SHAP values, or by LIME explanations that are insensitive to changes in the model.
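Test (2) can be sketched as follows; the stand-in `model`, the applicant, and the cited factor are assumptions for illustration:

```python
import math

def model(x):
    # Stand-in risk score: higher debt-to-income should mean higher risk.
    logit = 3.0 * x["dti"] + 0.5 * x["late_payments"] - 2.0
    return 1.0 / (1.0 + math.exp(-logit))

applicant = {"dti": 0.55, "late_payments": 2}

# The LIME notice cites "dti" as the top adverse factor. Improving it
# should lower the predicted risk; if it does not, the notice is unfaithful.
before = model(applicant)
after = model(dict(applicant, dti=0.35))
faithful = after < before
```

In production this check runs over a sample of denials, and any instance where improving the cited factor fails to move the prediction in the stated direction is logged as a faithfulness failure.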
Part c (3 points): Monitoring plan should specify: quarterly SHAP audit with automated alerts for features exceeding SHAP importance thresholds; demographic correlation analysis run with each quarterly audit; results reported to model risk committee and fair lending compliance officer; escalation triggered if demographic correlation r > 0.3 or denial rate disparity exceeds 1.25x; remediation process includes data science review within 30 days, outside counsel engagement within 60 days, and voluntary regulatory disclosure if disparate impact is confirmed.