Chapter 35: Quiz

Test your understanding of interpretability and explainability at scale. Answers follow each question.


Question 1

Distinguish interpretability from explainability. Give one example of an interpretable model and one example of a post-hoc explanation method for a non-interpretable model.

Answer **Interpretability** is a property of the model itself: a model is interpretable if a human can understand the entire mapping from inputs to outputs — not an approximation, but the actual function. **Explainability** is a property of the explanation method: a model is explainable if we can produce a post-hoc explanation — a secondary, simplified account of why the model made a prediction. An **interpretable model** example: logistic regression with 10 features, where each coefficient has a precise meaning and the prediction is a weighted sum. A **post-hoc explanation method** example: SHAP values applied to a gradient-boosted tree ensemble with 500 trees, producing feature attributions that approximate the model's behavior without requiring the user to understand the full ensemble.

Question 2

State the four axioms that uniquely characterize the Shapley value. For each axiom, explain its practical significance in the context of feature attribution.

Answer

1. **Efficiency:** The Shapley values sum to $v(N) - v(\emptyset)$ — the total prediction minus the baseline. Practically, this means every unit of prediction difference is attributed to some feature; nothing is lost or created. The explanation accounts for the full prediction.
2. **Symmetry:** If two features contribute equally to every coalition, they receive equal Shapley values. Practically, features that behave identically in the model are treated identically in the explanation — no arbitrary bias toward one feature over another.
3. **Dummy:** If a feature contributes nothing to any coalition, its Shapley value is zero. Practically, irrelevant features get zero attribution — the explanation does not blame features the model ignores.
4. **Linearity (Additivity):** If the value function is a sum of two games, the Shapley values of the sum are the sums of the Shapley values. Practically, this enables decomposing complex models into simpler components and combining their attributions consistently.

No other attribution method satisfies all four axioms simultaneously.
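The efficiency, symmetry, and dummy axioms can be checked directly by brute-force enumeration on a toy game; a minimal sketch in which the players and value function are purely illustrative:

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Exact Shapley values by enumerating every coalition.

    v maps a frozenset of players to a real value.
    """
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                S = frozenset(S)
                # Standard Shapley coalition weight |S|!(n-|S|-1)!/n!
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                total += weight * (v(S | {i}) - v(S))
        phi[i] = total
    return phi

# Toy game: "a" and "b" each add 1 when present, plus a bonus of 2
# when both are present; "c" is a dummy that never changes anything.
def v(S):
    score = ("a" in S) + ("b" in S)
    if "a" in S and "b" in S:
        score += 2
    return score

phi = shapley_values(["a", "b", "c"], v)
# Efficiency: values sum to v(N) - v(empty) = 4
# Symmetry:   a and b are interchangeable, so phi["a"] == phi["b"] == 2
# Dummy:      c contributes nothing, so phi["c"] == 0
```

Running the same enumeration with 200 players would require $2^{200}$ coalition evaluations, which is the intractability discussed in Question 3.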

Question 3

Why is computing exact Shapley values intractable for models with many features? How does TreeSHAP solve this problem for tree-based models?

Answer The Shapley value formula requires evaluating $2^p$ coalitions (subsets of features) for each feature, where $p$ is the number of features. For a model with 200 features, this requires $2^{200}$ evaluations — computationally intractable. **TreeSHAP** solves this by exploiting the tree's internal structure. Instead of treating the model as a black box, TreeSHAP recursively propagates instance weights through each tree, using the tree's splits to efficiently compute the conditional expectation $E[f(x) \mid x_S]$ for every feature subset simultaneously. Features not in the current branch's path are marginalized by following both child nodes weighted by the training data fraction at each split. This reduces the computation to $O(TLD^2)$ where $T$ is the number of trees, $L$ is the maximum number of leaves, and $D$ is the maximum depth — milliseconds per instance instead of years.

Question 4

Explain the difference between TreeSHAP's tree-path-dependent and interventional modes. When would you prefer each?

Answer **Tree-path-dependent** (the default): marginalizes absent features by following the tree's internal data distribution. When a feature is absent from the coalition, the algorithm follows both children at any split on that feature, weighted by the fraction of training data going to each child. This respects the correlations in the training data. **Interventional**: replaces absent features with values sampled from a reference distribution, breaking correlations. This answers a different question: "What would the prediction be if we *intervened* to set the feature to a random value?" **Prefer interventional** when you need to identify which features the model functionally relies on: it is "true to the model," aligns with the do-calculus from causal inference, and assigns zero attribution to features the model never uses. **Prefer tree-path-dependent** when no background dataset is available, or when explanations should reflect the correlations in the training data ("true to the data"). For correlated features (e.g., income and credit limit), the two modes produce different attributions: tree-path-dependent can spread credit across correlated features even when the model splits on only one of them, while interventional concentrates attribution on the features the model actually uses.
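One way to see the two value functions diverge (a toy sketch of the conditional-vs-interventional distinction, not the TreeSHAP algorithm itself): with two perfectly correlated features and a model that reads only the first, the interventional value function credits only the used feature, while the conditional one splits the credit. All data and names here are illustrative.

```python
data = [(0, 0), (0, 0), (1, 1), (1, 1)]  # x1 and x2 perfectly correlated

def f(x1, x2):
    return float(x1)  # the model reads only x1

def v_interventional(S, x):
    """Absent features drawn from the background marginal (correlation broken)."""
    preds = [f(*[x[j] if j in S else bg[j] for j in (0, 1)]) for bg in data]
    return sum(preds) / len(preds)

def v_conditional(S, x):
    """Absent features follow their distribution conditional on present ones."""
    rows = [r for r in data if all(r[j] == x[j] for j in S)]
    return sum(f(*r) for r in rows) / len(rows)

def shapley_2feature(v, x):
    phi = []
    for i in (0, 1):
        other = 1 - i
        # With n = 2 features, both coalition orderings carry weight 1/2.
        phi.append(0.5 * (v({i}, x) - v(set(), x))
                   + 0.5 * (v({0, 1}, x) - v({other}, x)))
    return phi

print(shapley_2feature(v_interventional, (1, 1)))  # [0.5, 0.0]
print(shapley_2feature(v_conditional, (1, 1)))     # [0.25, 0.25]
```

The interventional attribution gives the unused feature exactly zero; the conditional one splits credit because knowing x2 pins down x1.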

Question 5

What is KernelSHAP, and why is it not suitable for real-time production serving?

Answer **KernelSHAP** is a model-agnostic method that approximates Shapley values by formulating the computation as a weighted linear regression. It samples coalition vectors (binary masks indicating which features are "present"), evaluates the model on perturbed inputs for each coalition, and fits a weighted regression using the SHAP kernel to recover approximate Shapley values. It works with any model as a black box. It is **not suitable for real-time serving** because each instance requires $O(n_{\text{samples}} \times n_{\text{background}})$ model evaluations. For 2,048 coalition samples, 100 background instances, and a model with 100ms inference time, one explanation takes approximately 204,800 evaluations — roughly 5.7 hours. Even with faster models, KernelSHAP is orders of magnitude slower than TreeSHAP (1-10ms) or DeepSHAP (10-100ms). It is appropriate for offline auditing and research, not serving-time explanations.
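The arithmetic behind the 5.7-hour figure, using the numbers from the text:

```python
# Back-of-envelope cost of one KernelSHAP explanation.
n_coalition_samples = 2_048
n_background = 100
inference_seconds = 0.100  # 100 ms per model call

model_evaluations = n_coalition_samples * n_background   # 204,800
wall_clock_hours = model_evaluations * inference_seconds / 3600
print(model_evaluations, round(wall_clock_hours, 1))     # 204800 5.7
```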

Question 6

Describe two documented limitations of LIME. How does SHAP address each limitation?

Answer **Limitation 1: Instability.** Alvarez-Melis and Jaakkola (2018) showed that running LIME twice on the same instance with different random seeds can produce substantially different explanations. The random perturbation sampling causes the LASSO fit to vary across runs. **SHAP addresses this** because exact SHAP methods (TreeSHAP) are deterministic — the same input always produces the same explanation. Sampling-based SHAP methods (KernelSHAP) converge to a unique solution as the number of samples increases. **Limitation 2: No principled kernel width selection.** LIME's proximity kernel width — controlling how "local" the neighborhood is — has no principled selection criterion. A narrow kernel may not sample enough of the decision boundary; a wide kernel produces less locally faithful explanations. The default works in many cases but can fail silently. **SHAP addresses this** through the SHAP kernel, which has a principled derivation from the Shapley value axioms. The SHAP kernel weights are uniquely determined by the Shapley formula — there is no free parameter analogous to LIME's kernel width.

Question 7

Explain the difference between Partial Dependence Plots (PDP) and Accumulated Local Effects (ALE). When does PDP produce misleading results?

Answer **PDP** shows the marginal effect of a feature by averaging the model's prediction over all values of the other features in the dataset: for each value of the feature of interest, replace that feature for every training instance and average the predictions. **ALE** shows the local effect by computing how the prediction changes as the feature moves within small intervals, using the actual conditional distribution of other features within each interval. **PDP is misleading when features are correlated.** PDP averages over all possible values of other features, including combinations that never occur in the data. For example, if income and credit limit are highly correlated, the PDP for income at $200,000 averages over all credit limits — including $2,000, which no $200,000-income applicant would have. This produces predictions based on unrealistic feature combinations that distort the plot. ALE avoids this extrapolation by computing only local changes within conditional bands, so it never evaluates the model on out-of-distribution feature combinations.
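Both estimators can be sketched in a few lines. Here `model` is assumed to take a dict of feature values, and both functions are illustrative simplifications (real ALE implementations also center the curve to mean zero):

```python
def partial_dependence(model, rows, feature, grid):
    """PDP: for each grid value, overwrite `feature` in every row and average.
    Averages over ALL rows, even where the combination is unrealistic."""
    curve = []
    for g in grid:
        preds = [model({**row, feature: g}) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve

def accumulated_local_effects(model, rows, feature, edges):
    """First-order ALE (uncentered): accumulate the average local difference
    within each interval, using only rows whose value falls in that interval."""
    ale, total = [0.0], 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = [r for r in rows if lo <= r[feature] < hi]  # conditional band
        if band:
            diffs = [model({**r, feature: hi}) - model({**r, feature: lo})
                     for r in band]
            total += sum(diffs) / len(diffs)
        ale.append(total)
    return ale

model = lambda r: 2 * r["income"]
rows = [{"income": 0}, {"income": 1}]
print(partial_dependence(model, rows, "income", [0, 1]))  # [0.0, 2.0]
```

The key structural difference is visible in the code: PDP substitutes the grid value into every row, while ALE only ever moves a row within the interval its actual value occupies.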

Question 8

Describe the integrated gradients method. State its two key axioms and explain why vanilla gradient saliency fails to satisfy them.

Answer **Integrated gradients** computes feature attribution by accumulating gradients along a straight-line path from a baseline $x'$ to the actual input $x$: $\text{IG}_j(x) = (x_j - x_j') \times \int_0^1 \frac{\partial f(x' + \alpha(x - x'))}{\partial x_j} d\alpha$. The two key axioms are: (1) **Completeness** — the attributions sum to $f(x) - f(x')$. Like Shapley's efficiency axiom, every unit of prediction difference is accounted for. (2) **Sensitivity** — if changing feature $j$ from the baseline value to the input value changes the output, feature $j$ receives nonzero attribution. A feature that matters always gets credit. **Vanilla gradients fail both:** they compute only the local gradient $\frac{\partial f}{\partial x_j}$ at the input $x$. (1) The gradient magnitudes do not generally sum to $f(x) - f(x')$ — there is no completeness guarantee. (2) In the flat region of a ReLU or sigmoid, the gradient is near zero even if the feature was critical to reaching that region — violating sensitivity (the "gradient saturation" problem).
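A Riemann-sum approximation of the integral makes the completeness axiom directly checkable on a toy function (the model and its analytic gradient below are illustrative):

```python
def integrated_gradients(grad_f, x, baseline, steps=1000):
    """Riemann-sum (midpoint rule) approximation of integrated gradients
    along the straight line from `baseline` to `x`."""
    n = len(x)
    attributions = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps
        z = [baseline[j] + alpha * (x[j] - baseline[j]) for j in range(n)]
        g = grad_f(z)
        for j in range(n):
            attributions[j] += (x[j] - baseline[j]) * g[j] / steps
    return attributions

# Toy model f(x) = x0^2 + x0*x1, with its analytic gradient.
f = lambda z: z[0] ** 2 + z[0] * z[1]
grad_f = lambda z: [2 * z[0] + z[1], z[0]]

x, baseline = [2.0, 3.0], [0.0, 0.0]
ig = integrated_gradients(grad_f, x, baseline)
# Completeness: attributions sum to f(x) - f(baseline) = 10 - 0
```

For this function the exact attributions are 7 for $x_0$ and 3 for $x_1$; the vanilla gradient at $x$ would instead report $[7, 2]$, which does not sum to the prediction difference.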

Question 9

What is TCAV? How does it produce concept-level explanations rather than feature-level explanations?

Answer **TCAV** (Testing with Concept Activation Vectors, Kim et al., 2018) quantifies the importance of a human-defined *concept* — such as "striped texture," "fibrotic tissue," or "inflammation pattern" — to a model's predictions. The method: (1) Collect positive examples (containing the concept) and negative examples (random). (2) Train a linear classifier in the activation space of an internal layer to separate them; the decision boundary normal is the **Concept Activation Vector (CAV)**. (3) Compute the directional derivative of the model's output with respect to the CAV direction for target-class inputs. (4) Report the fraction of target-class inputs with positive conceptual sensitivity (the TCAV score). TCAV produces concept-level rather than feature-level explanations because the CAV represents a **high-level semantic direction** in activation space, not an individual input feature. "Striped texture" is a concept that emerges from many pixels in many configurations — no single pixel or feature captures it. By probing the model's internal representations at an intermediate layer, TCAV measures whether the model's decision is sensitive to a concept that a domain expert recognizes, regardless of how that concept is encoded at the feature level.
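The four steps can be sketched with a cheap stand-in for step 2: instead of training a linear classifier, the difference of class means serves as the CAV direction. Everything here (activations, gradients, dimensions) is illustrative:

```python
def mean(vectors):
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]

def concept_activation_vector(pos_acts, neg_acts):
    """Cheap CAV proxy: difference of class means in activation space.
    (The paper trains a linear classifier; its normal plays this role.)"""
    mp, mn = mean(pos_acts), mean(neg_acts)
    return [a - b for a, b in zip(mp, mn)]

def tcav_score(gradients, cav):
    """Fraction of target-class inputs whose output gradient (w.r.t. the
    layer activations) has a positive directional derivative along the CAV."""
    pos = sum(1 for g in gradients
              if sum(gj * cj for gj, cj in zip(g, cav)) > 0)
    return pos / len(gradients)

# "Striped" concept examples activate dimension 0; random ones do not.
pos_acts = [[1.0, 0.1], [0.9, 0.0]]
neg_acts = [[0.0, 0.2], [0.1, 0.1]]
cav = concept_activation_vector(pos_acts, neg_acts)

grads = [[0.8, -0.1], [0.5, 0.3], [-0.2, 0.1]]  # one per target-class input
print(tcav_score(grads, cav))
```

Note that the score is a property of a direction in activation space, not of any single input feature, which is what makes the explanation concept-level.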

Question 10

What is a concept bottleneck model? What are its advantages and disadvantages compared to post-hoc concept explanations like TCAV?

Answer A **concept bottleneck model** (CBM, Koh et al., 2020) has two stages: (1) a concept predictor that maps raw inputs to a vector of human-defined concept activations ($c = g(x)$), and (2) a task predictor that maps concept activations to the final prediction ($\hat{y} = h(c)$). The prediction must pass through the concept bottleneck, so the model's reasoning is inherently constrained to operate through interpretable concepts. **Advantages over TCAV:** (1) Genuinely interpretable — the model *is* constrained to use concepts, not just tested against them. (2) Supports concept intervention — a domain expert can override a concept value and get a revised prediction. (3) The task predictor is typically linear, so concept contributions are transparent. **Disadvantages:** (1) Accuracy loss — the bottleneck prevents the model from using features that don't correspond to defined concepts. (2) Requires concept annotations during training, which is expensive. (3) The set of concepts may be incomplete, missing important predictive signals. (4) TCAV can analyze any existing model; CBMs must be designed and trained as CBMs from the start.

Question 11

What is a counterfactual explanation? List three practical constraints that counterfactual generation must satisfy.

Answer A **counterfactual explanation** answers: "What is the smallest change to the input that would change the model's decision?" Rather than explaining why a prediction was made (attribution), counterfactuals explain what would need to be different for the outcome to change. For a denied credit applicant, a counterfactual might say: "If your debt-to-income ratio were 32% instead of 48% and your credit history were 5 years instead of 2 years, the application would have been approved." Three practical constraints: (1) **Immutability** — some features cannot be changed (age, race, sex). The optimization must hold these fixed. (2) **Actionability** — the changes should represent actions the person can actually take. "Increase your age by 10 years" is mathematically valid but useless. (3) **Plausibility** — the counterfactual should lie within the data manifold. A counterfactual with $200,000 income and $500 credit limit may flip the prediction but represents an impossible person. Other important constraints include causal consistency (changes should respect causal relationships between features) and sparsity (humans prefer explanations with few changes).

Question 12

Why did Jain and Wallace (2019) argue that "attention is not explanation"? Under what circumstances is attention visualization still useful?

Answer Jain and Wallace showed that: (1) Attention weights do not correlate reliably with gradient-based feature importance — the most-attended tokens are not necessarily the most influential. (2) **Alternative attention distributions** can produce identical predictions — adversarial attention patterns, dramatically different from the learned attention, yield nearly identical outputs because the value vectors compensate. (3) Multi-head attention distributes information across heads, so any single head's pattern is partial and potentially misleading. **Attention is still useful** as a *communication tool* for non-technical audiences (users understand "the model focused on X" more easily than SHAP values), as a *debugging tool* for model developers (to check whether the model is attending to expected positions), and as a *weak explanation* that provides meaningful but imprecise information about model behavior. It should **not** be used as the sole basis for regulatory compliance, fairness audits, or any context requiring faithful causal attribution.

Question 13

Under ECOA Regulation B, what must a creditor provide when denying a credit application? How does SHAP help satisfy this requirement?

Answer ECOA Regulation B requires a "statement of specific reasons for the action taken" — the reasons must be **specific** (identify particular contributing factors, not generic statements like "did not meet our criteria"), **accurate** (reflect the factors that actually influenced the decision), and **actionable** (where possible, indicate what the applicant could change). Typically 2-4 principal reasons are provided. **SHAP helps** by producing feature attributions that are: (1) specific — each SHAP value identifies a particular feature and its quantitative contribution, (2) ranked — the top-$k$ features by absolute SHAP value are the "principal reasons," (3) directional — positive SHAP values indicate features pushing toward denial, (4) theoretically grounded — Shapley values are the unique attribution satisfying the efficiency, symmetry, dummy, and linearity axioms. TreeSHAP computes these in milliseconds, enabling real-time generation of adverse action notices. The CFPB has indicated that SHAP-based explanations are consistent with Regulation B requirements, though the specific implementation must accurately reflect the model's behavior.
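Turning SHAP values into principal reasons can be sketched as a top-$k$ selection over the denial-direction attributions; the feature names and values below are hypothetical:

```python
def principal_reasons(shap_values, k=4):
    """Top-k adverse action reasons: features with positive SHAP values
    (pushing toward denial), ranked by contribution."""
    toward_denial = {f: v for f, v in shap_values.items() if v > 0}
    ranked = sorted(toward_denial.items(), key=lambda kv: kv[1], reverse=True)
    return [f for f, _ in ranked[:k]]

shap_values = {              # hypothetical attributions for one applicant
    "debt_to_income": 0.21,
    "credit_history_years": 0.12,
    "recent_inquiries": 0.05,
    "income": -0.08,         # pushed toward approval; not an adverse reason
}
print(principal_reasons(shap_values, k=2))
# ['debt_to_income', 'credit_history_years']
```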

Question 14

What does GDPR Article 22 require regarding automated decision-making? How does Recital 71 extend this requirement?

Answer **Article 22** gives individuals the right not to be subject to decisions based solely on automated processing that produce legal effects or similarly significantly affect them. When automated processing is permitted (under explicit consent, contractual necessity, or member state law), Article 22(3) requires "suitable measures to safeguard the data subject's rights and freedoms and legitimate interests, at least the right to obtain human intervention, to express his or her point of view, and to contest the decision." **Recital 71** (interpretive, not legally binding) extends this by stating the data subject should have the right "to obtain an explanation of the decision reached after such assessment." The scope of this "right to explanation" is debated among legal scholars. The Article 29 Working Party (now EDPB) interpreted it to require "meaningful information about the logic involved" — most practitioners interpret this as requiring some form of feature importance or factor-based explanation. The EU AI Act (2024) adds transparency and logging requirements for high-risk AI systems (Annex III) that further strengthen the de facto need for explanation infrastructure.

Question 15

Describe three honest limitations of post-hoc explanation methods.

Answer (1) **Post-hoc explanations are not the model.** Every post-hoc method (SHAP, LIME, integrated gradients) is a separate model that approximates the original model's behavior. The explanation is faithful only to the degree that the approximation is good. For TreeSHAP the approximation is exact; for LIME and KernelSHAP it can be imperfect, especially in regions of high nonlinearity or out-of-distribution inputs. (2) **Explanations can be manipulated.** Slack et al. (2020) demonstrated that adversarial "scaffolded models" can produce any desired SHAP or LIME explanation while using different features for actual prediction. A model could attribute its decision to "credit history" while actually relying on "zip code" (a proxy for race). (3) **Explanations do not explain causation.** A SHAP value tells which features the model *used* — not which features *caused* the outcome. High SHAP importance for "credit inquiries" means the model's prediction would differ if inquiries were different — not that inquiries caused the default. Confusing attribution with causation leads to incorrect actionable advice.

Question 16

Design the key components of an explanation audit trail for a regulated lending system. What must the audit log contain, and how long must it be retained?

Answer The audit log must contain, for every prediction that results in an adverse action: (1) the **request ID** and **timestamp**, (2) the **model ID and version** (with a hash of the model artifact for integrity verification), (3) a **hash of the input features** (not raw PII, for privacy), (4) the **prediction** and **prediction label**, (5) the **explanation method** used, (6) the **explanation hash** for integrity verification, (7) the **top contributing factors** with their attributions, (8) any **counterfactual summary**, (9) the **audience** for whom the explanation was generated, (10) the **computation time**. The log must be **immutable** (append-only with tamper-evident hashing — each entry includes a hash of the previous entry), **complete** (every adverse action has a corresponding entry), and **reproducible** (the same input, model version, and method must produce the same explanation). **Retention:** ECOA requires 25 months for adverse action records. GDPR records should be retained for the processing duration plus the limitation period. EU AI Act requires logs throughout the AI system's lifecycle. Internal policy should specify by jurisdiction; a common default is 7 years for financial records.
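The tamper-evident hash chaining can be sketched with SHA-256; the field names are illustrative, and a production log would also persist entries durably:

```python
import hashlib
import json

class ExplanationAuditLog:
    """Append-only sketch: each entry carries a hash of the previous entry,
    so editing any past entry breaks the chain."""

    def __init__(self):
        self.entries = []

    def append(self, record):
        prev_hash = self.entries[-1]["entry_hash"] if self.entries else "0" * 64
        body = {"prev_hash": prev_hash, **record}
        digest = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        self.entries.append({**body, "entry_hash": digest})

    def verify(self):
        prev = "0" * 64
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "entry_hash"}
            digest = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if e["prev_hash"] != prev or digest != e["entry_hash"]:
                return False
            prev = e["entry_hash"]
        return True

log = ExplanationAuditLog()
log.append({"request_id": "r-1", "model_version": "v3.2",
            "method": "treeshap",
            "top_factors": ["debt_to_income", "credit_history_years"]})
assert log.verify()
log.entries[0]["model_version"] = "v9.9"  # tampering breaks the chain
assert not log.verify()
```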

Question 17

Compare the computational cost, faithfulness, and production viability of TreeSHAP, DeepSHAP, KernelSHAP, and LIME.

Answer

| Method | Cost per Instance | Faithfulness | Production Viable? |
|--------|-------------------|--------------|--------------------|
| **TreeSHAP** | 1-10 ms | Exact (tree-path-dependent mode) — provably correct Shapley values for tree models | Yes — fast, deterministic, suitable for real-time serving |
| **DeepSHAP** | 10-100 ms | Approximate — exact for ReLU networks, introduces error for sigmoid/tanh/GELU activations via layer-by-layer decomposition | Yes, with batching — fast enough for near-real-time |
| **KernelSHAP** | Minutes to hours | Approximate — converges to exact Shapley values in the limit but requires many samples; accuracy depends on sample count and background size | No — too slow for serving; suitable for offline audit only |
| **LIME** | Seconds | Not guaranteed — unstable across runs (different random seeds produce different explanations); no convergence guarantee; kernel width has no principled selection | No — instability makes it unsuitable for regulatory-grade explanations |

For production deployment: TreeSHAP for tree models, DeepSHAP/integrated gradients for neural networks, KernelSHAP for auditing any model offline.

Question 18

How does the natural language explanation for an "applicant" audience differ from one for a "regulator" audience? Why is this distinction important?

Answer **Applicant audience:** Receives the top 4 factors in plain English ("Your debt-to-income ratio contributed to this decision"), without raw SHAP values, model version numbers, or technical terminology. The explanation is designed to be understandable by a non-technical person and actionable (indicating what they could change). It includes a notice of the right to request additional information. **Regulator audience:** Receives the full technical explanation: model ID and version, SHAP variant used, exact prediction value, complete SHAP attribution for all features with numerical values, statistical context, and links to model documentation. The explanation is designed for a technically sophisticated reader performing an audit or compliance review. This distinction is important because *different audiences need different information to make different decisions.* An applicant needs to understand what to change; a regulator needs to verify that the model complies with ECOA. Providing raw SHAP values to an applicant would be confusing and potentially misleading; providing only plain-language summaries to a regulator would be insufficient for compliance verification. The explanation infrastructure must support multiple formats from the same underlying attribution.
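Audience-conditional rendering from a single attribution dict can be sketched as follows; the templates and field names are hypothetical:

```python
def render_explanation(attributions, audience, model_version=None, prediction=None):
    """Render the same SHAP attribution for different audiences."""
    ranked = sorted(attributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    if audience == "applicant":
        # Plain English, top factors only, no raw SHAP values.
        factors = [f.replace("_", " ") for f, v in ranked[:4] if v > 0]
        lines = [f"Your {f} contributed to this decision." for f in factors]
        lines.append("You have the right to request additional information.")
        return "\n".join(lines)
    if audience == "regulator":
        # Full technical detail: model version, prediction, every attribution.
        header = f"model={model_version} prediction={prediction} method=treeshap"
        rows = [f"{f}: {v:+.4f}" for f, v in ranked]
        return "\n".join([header] + rows)
    raise ValueError(f"unknown audience: {audience}")
```

The point of the design is that both renderings are derived from the same underlying attribution, so they cannot drift apart.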

Question 19

What is the StreamRec progressive project milestone (M16) for this chapter? Describe the four deliverables.

Answer

M16 adds explanation infrastructure to the StreamRec system. The four deliverables are:

1. **Global SHAP importance for the ranking model:** Compute TreeSHAP values for the gradient-boosted re-ranking model on 10,000 user-item pairs. Produce a global summary plot showing the top 20 features. Identify the three features that most differentiate high- from low-engagement predictions.
2. **Attention visualization for the session transformer:** Extract attention weights for 100 user sessions from the transformer session model. Visualize which historical items the model attends to when scoring the top recommendation. Identify patterns (recency, similarity, category).
3. **Natural language explanation generation:** Build a template-based NL generator that translates SHAP values and attention patterns into user-facing text with three templates: watch-history-based ("Because you watched X and Y"), category-based ("Popular in [category] with viewers like you"), and trending-based ("Trending in [region]").
4. **Audit logging:** Integrate the `ExplanationAuditLogger` into the serving pipeline. Every shown recommendation gets an audit entry with user ID hash, item ID, model version, explanation method, top-3 factors, and timestamp. The log is queryable by user, item, model version, and time range.
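Deliverable 3's template-based generator might look like this sketch; the template names, priority order, and signal format are assumptions, not the actual StreamRec API:

```python
# Hypothetical templates keyed by signal type.
TEMPLATES = {
    "watch_history": "Because you watched {item_a} and {item_b}",
    "category": "Popular in {category} with viewers like you",
    "trending": "Trending in {region}",
}

def generate_explanation(signals):
    """Pick the template matching the strongest available signal
    (assumed priority: history > category > trending)."""
    for name in ("watch_history", "category", "trending"):
        if name in signals:
            return TEMPLATES[name].format(**signals[name])
    return "Recommended for you"  # fallback when no signal is available

print(generate_explanation(
    {"watch_history": {"item_a": "Show X", "item_b": "Show Y"}}))
# Because you watched Show X and Show Y
```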

Question 20

Cynthia Rudin argues that interpretable models should always be preferred over post-hoc explanations of black-box models for high-stakes decisions. What is the strongest argument for her position, and what is the strongest argument against it?

Answer **Strongest argument for:** Post-hoc explanations can be *unfaithful* — they present a simplified story that does not accurately reflect the model's actual reasoning. A SHAP explanation attributing 40% of a credit denial to "debt-to-income ratio" might mask a nonlinear interaction with zip code that is the actual marginal driver. Slack et al. (2020) showed that adversarial models can produce any desired explanation while using different features for prediction. With an inherently interpretable model (logistic regression, decision tree, scoring card), the explanation IS the model — there is no gap between what the model does and what the explanation says. For decisions with legal consequences (credit, sentencing, healthcare), this gap is unacceptable. **Strongest argument against:** Interpretable models often have lower predictive accuracy, and lower accuracy has real costs. A credit scoring model that is 2 AUC points worse means more qualified applicants denied credit and more unqualified applicants granted credit (increasing defaults). A clinical model that is less accurate misses treatable conditions. When the performance gap is material — when it affects real outcomes for real people — using a less accurate model to gain interpretability may cause more harm than the risk of unfaithful explanations. The decision depends on the specific performance gap, the stakes of the decision, and the quality of available explanation methods.