Chapter 35: Key Takeaways

  1. Interpretability and explainability are different things — and the difference matters for trust. Interpretability is a property of the model: a model is interpretable if a human can understand the entire mapping from inputs to outputs (linear regression, decision trees, scorecards). Explainability is a property of the explanation method: post-hoc methods (SHAP, LIME, integrated gradients) produce approximate accounts of why a model made a prediction. Interpretable models guarantee faithful explanations by construction — the explanation IS the model. Post-hoc explanations approximate the model and can be unfaithful, especially in regions of high nonlinearity, feature interaction, or adversarial manipulation. For high-stakes decisions where the performance gap between interpretable and complex models is small, prefer interpretable models. When the gap is material, use complex models with rigorous post-hoc explanations — but never deploy without one or the other.
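The claim that an interpretable model's explanation is the model itself can be made concrete with a minimal sketch (the weights and feature names below are hypothetical, purely for illustration): a linear model's prediction decomposes exactly into per-feature contributions, so there is no approximation gap for a post-hoc method to introduce.

```python
def linear_predict_and_explain(weights, bias, x):
    """For an interpretable linear model the explanation IS the model:
    the prediction decomposes exactly into per-feature contributions."""
    contribs = {feat: weights[feat] * x[feat] for feat in weights}
    return bias + sum(contribs.values()), contribs

# Hypothetical credit-scoring weights (illustrative only)
weights = {"income": 0.8, "debt_to_income": -1.5, "tenure": 0.3}
score, contribs = linear_predict_and_explain(
    weights, bias=0.1, x={"income": 2.0, "debt_to_income": 0.5, "tenure": 1.0}
)
# The contributions sum back to the prediction -- faithful by construction
assert abs(score - (0.1 + sum(contribs.values()))) < 1e-12
```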

  2. SHAP derives from cooperative game theory and is the unique attribution satisfying efficiency, symmetry, dummy, and linearity. The Shapley value is not one method among many — it is the only attribution that satisfies all four fairness axioms simultaneously. This theoretical foundation makes SHAP the principled default for feature attribution. The practical question is computational: exact Shapley values require $2^p$ evaluations (intractable for production models). TreeSHAP solves this for tree ensembles in $O(TLD^2)$ time (milliseconds per instance). DeepSHAP approximates for neural networks via layer-wise backpropagation. KernelSHAP handles any model via weighted linear regression over sampled coalitions but is too slow for serving (minutes to hours per instance). Choose the SHAP variant that matches your model type: TreeSHAP for trees, DeepSHAP or GradientSHAP for neural networks, KernelSHAP for offline audit only.
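A brute-force sketch makes the $2^p$ cost visible: exact Shapley values enumerate every coalition of features. The toy value function here (absent features dropped to a zero baseline, additive model) is purely illustrative; the efficiency axiom holds by construction.

```python
from itertools import combinations
from math import factorial

def exact_shapley(value_fn, p):
    """Exact Shapley attributions by brute force: enumerates all 2^p
    coalitions, which is why exact computation is intractable at scale."""
    phi = [0.0] * p
    for i in range(p):
        others = [j for j in range(p) if j != i]
        for size in range(p):
            for S in combinations(others, size):
                weight = factorial(size) * factorial(p - size - 1) / factorial(p)
                phi[i] += weight * (value_fn(set(S) | {i}) - value_fn(set(S)))
    return phi

# Toy value function (illustrative): model output with absent features at zero
x = [3.0, 1.0, 2.0]
v = lambda S: sum(x[j] for j in S)

phi = exact_shapley(v, p=3)
# Efficiency axiom: attributions sum to v(all features) - v(empty coalition)
assert abs(sum(phi) - (v({0, 1, 2}) - v(set()))) < 1e-9
```

For this additive toy model each feature's attribution equals its own contribution; the interesting cases are interacting models, where the coalition enumeration is what makes the averaging fair.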

  3. LIME is unstable, PDP extrapolates with correlated features, and attention is not faithful attribution — know the limitations of every method. LIME's random perturbation sampling produces different explanations across repeated runs on the same instance, making it unsuitable for regulatory-grade explanations. PDP averages over unrealistic feature combinations when features are correlated; use ALE instead. Attention weights do not correlate reliably with gradient-based importance and can be replaced by adversarial alternatives that produce identical predictions — attention is a communication tool, not a rigorous attribution. Every explanation method has failure modes. Using multiple methods and comparing their results (do SHAP and integrated gradients agree on the top features?) is the most reliable approach.
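The cross-method comparison suggested above can be sketched as a simple top-k overlap check (the attribution vectors below are made up for illustration; any pair of methods' outputs would plug in the same way):

```python
def topk_agreement(attr_a, attr_b, k):
    """Cross-method sanity check: overlap between the top-k features
    (ranked by absolute attribution) of two methods, as a fraction of k."""
    top = lambda attr: set(sorted(range(len(attr)), key=lambda i: -abs(attr[i]))[:k])
    return len(top(attr_a) & top(attr_b)) / k

# Hypothetical attributions from two methods on the same instance
shap_vals = [0.42, -0.31, 0.05, 0.02]
ig_vals = [0.39, -0.28, 0.01, 0.06]
assert topk_agreement(shap_vals, ig_vals, k=2) == 1.0  # same top-2 features
```

Low agreement does not say which method is wrong, only that at least one is unfaithful on this instance — a useful trigger for manual review.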

  4. Concept-based explanations bridge the gap between feature-level attributions and human reasoning. Feature-level explanations ("pixel (142, 87) contributed +0.03") are useless to domain experts who think in terms of concepts ("inflammation pattern," "payment reliability"). TCAV quantifies model sensitivity to human-defined concepts by training concept activation vectors in the model's internal representation space. Concept bottleneck models go further: they force the model to reason through defined concepts, enabling concept-level inspection and human-in-the-loop intervention. The cost is accuracy (the bottleneck constrains the model to known concepts) and annotation effort (concept labels must be provided). For clinical, financial, and other domain-expert-facing applications, concept-level explanations are often more valuable than feature-level attributions.
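A heavily simplified TCAV sketch, assuming activations and class-logit gradients have already been extracted from the model: it uses the difference of mean activations as the concept direction (TCAV proper fits a linear classifier to get the CAV) and scores the fraction of instances with a positive directional derivative along it. All numbers are toy data.

```python
def concept_direction(concept_acts, random_acts):
    """Simplified CAV: difference of mean activations between concept and
    random examples. (TCAV proper trains a linear classifier; this is a sketch.)"""
    dim = len(concept_acts[0])
    mean = lambda rows: [sum(r[d] for r in rows) / len(rows) for d in range(dim)]
    return [c - r for c, r in zip(mean(concept_acts), mean(random_acts))]

def tcav_score(gradients, cav):
    """TCAV score: fraction of instances whose gradient has a positive
    directional derivative along the concept direction."""
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    return sum(dot(g, cav) > 0 for g in gradients) / len(gradients)

# Toy 2-D activations: concept examples activate dimension 0
cav = concept_direction([[1.0, 0.1], [0.9, 0.0]], [[0.0, 0.1], [0.1, 0.0]])
assert tcav_score([[0.5, 0.0], [0.3, 0.1], [-0.2, 0.0]], cav) == 2 / 3
```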

  5. Counterfactual explanations provide actionable recourse — but only with domain constraints. Rather than explaining why a prediction was made, counterfactuals explain what would need to change for a different outcome. For credit denials, this directly satisfies the spirit of ECOA: "If your debt-to-income ratio were 32% instead of 48%, your application would have been approved." Unconstrained counterfactuals produce mathematically valid but practically useless recommendations. Immutability constraints (age cannot change), causal consistency (changing income should change debt-to-income), actionability (changes must be achievable), and plausibility (the counterfactual must look like a real person) are essential. Well-constrained counterfactuals complement SHAP-based explanations: SHAP tells the applicant why they were denied; the counterfactual tells them what to do about it.
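A toy version of constrained counterfactual search, with a hypothetical approval rule and hand-picked candidate values: it enforces only the immutability constraint; real systems layer causal-consistency, actionability, and plausibility checks on top.

```python
def nearest_counterfactual(predict, x, candidates, immutable):
    """Toy counterfactual search: try candidate values for each mutable
    feature and return the smallest single-feature change that flips the
    decision. Immutable features (e.g. age) are never modified."""
    original = predict(x)
    best, best_cost = None, float("inf")
    for feat, values in candidates.items():
        if feat in immutable:  # immutability constraint
            continue
        for v in values:
            x_cf = dict(x, **{feat: v})
            cost = abs(v - x[feat])
            if predict(x_cf) != original and cost < best_cost:
                best, best_cost = x_cf, cost
    return best

# Hypothetical approval rule and applicant (illustrative only)
approve = lambda applicant: applicant["dti"] < 0.40
applicant = {"dti": 0.48, "age": 30}
cf = nearest_counterfactual(
    approve, applicant,
    candidates={"dti": [0.30, 0.32, 0.38], "age": [25]},
    immutable={"age"},
)
assert cf == {"dti": 0.38, "age": 30}  # minimal actionable change
```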

  6. Explanation infrastructure is a software engineering problem, not an analysis problem. Generating one SHAP explanation in a notebook takes seconds. Generating 15,000 per day (Meridian Financial) or 400 million per day (StreamRec) with audit logging, version tracking, regulatory compliance, and sub-100ms latency requires: an explanation API that runs the same model version that made the prediction, deterministic explanation methods (ruling out LIME for regulated use), immutable audit trails with hash chain integrity, natural language generation for multiple audiences (applicant, underwriter, regulator, internal), and monitoring of explanation quality over time. Organizations that treat explanation as infrastructure — reusable across models, integrated into the serving pipeline, monitored like any other production component — deliver compliant explanations faster than those that treat it as a one-time analysis task.
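The hash-chain idea behind the immutable audit trail can be sketched in a few lines (the record fields and version strings are illustrative, not a production schema): each log entry commits to the previous entry's hash, so altering any past explanation record breaks verification.

```python
import hashlib
import json

GENESIS = "0" * 64

def append_record(chain, record):
    """Append an explanation record to a hash-chained audit log: each entry
    commits to the previous entry's hash, making tampering detectable."""
    prev = chain[-1]["hash"] if chain else GENESIS
    payload = json.dumps(record, sort_keys=True)
    digest = hashlib.sha256((prev + payload).encode()).hexdigest()
    chain.append({"record": record, "prev": prev, "hash": digest})

def verify_chain(chain):
    """Recompute every hash; returns False if any entry was altered."""
    prev = GENESIS
    for entry in chain:
        payload = json.dumps(entry["record"], sort_keys=True)
        if entry["prev"] != prev:
            return False
        if entry["hash"] != hashlib.sha256((prev + payload).encode()).hexdigest():
            return False
        prev = entry["hash"]
    return True

# Hypothetical log entries (field names are illustrative)
log = []
append_record(log, {"model_version": "v1.4.2", "decision": "deny", "top_feature": "dti"})
append_record(log, {"model_version": "v1.4.2", "decision": "approve", "top_feature": "income"})
assert verify_chain(log)
log[0]["record"]["decision"] = "approve"  # tamper with an earlier entry
assert not verify_chain(log)
```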

  7. The regulatory landscape makes explanations mandatory, not optional, for high-stakes AI systems. ECOA Regulation B requires specific, accurate, individualized reasons for credit denials — boilerplate reason codes reflecting global feature importance do not suffice (CFPB Circular 2022-03). GDPR Article 22 requires "meaningful information about the logic involved" for automated decisions with significant effects. The EU AI Act requires transparency, logging, and traceability for high-risk AI systems. These requirements are not aspirational — they are enforceable, with material penalties. The practical compliance strategy: TreeSHAP + counterfactuals for ECOA, global model description + local SHAP for GDPR, full audit logging + documentation for EU AI Act. Even in unregulated settings, explanations improve user trust (StreamRec A/B test: +4.0% CTR, +7.9% satisfaction), catch model pathologies (explanation monitoring detected proxy discrimination shift at Meridian Financial), and enable domain expert oversight.