Further Reading: Chapter 19

Model Interpretation


Foundational Papers

1. "A Unified Approach to Interpreting Model Predictions" --- Scott Lundberg and Su-In Lee (2017) The paper that introduced SHAP (SHapley Additive exPlanations), unifying six existing explanation methods under a single theoretical framework based on Shapley values. Lundberg and Lee showed that SHAP is the only method satisfying three desirable properties simultaneously: local accuracy (the explanation must match the prediction), missingness (features that are absent contribute zero), and consistency (if a feature's contribution increases in the underlying model, its SHAP value must not decrease). Published at NeurIPS 2017. This is the paper to read if you want to understand why SHAP is theoretically superior to ad hoc explanation methods.
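The Shapley machinery behind SHAP is easy to make concrete with a brute-force sketch. The toy "game" below (its payoffs and the player names) is invented for illustration; real SHAP implementations avoid this exponential enumeration:

```python
from itertools import combinations
from math import factorial

def shapley_values(value, players):
    """Exact Shapley values by enumerating all coalitions.

    `value` maps a frozenset of players to a payoff. The loop is
    exponential in len(players), so this is only viable for tiny games.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Shapley weight for a coalition of size k.
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi

# Toy "model": prediction is 10 if feature A is present, 15 if both A and B are.
def v(coalition):
    if {"A", "B"} <= coalition:
        return 15.0
    if "A" in coalition:
        return 10.0
    return 0.0

phi = shapley_values(v, ["A", "B"])
# phi["A"] = 12.5, phi["B"] = 2.5; they sum to v({A,B}) - v({}) = 15,
# which is exactly the local-accuracy (efficiency) property.
```

The efficiency check at the end is the "local accuracy" property from the paper: attributions always sum to the difference between the prediction and the baseline.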

2. "From Local Explanations to Global Understanding with Explainable AI for Trees" --- Lundberg, Erion, Chen, DeGrave, Prutkin, Nair, and colleagues (2020) The paper introducing TreeSHAP, the polynomial-time exact algorithm for computing SHAP values on tree-based models. TreeSHAP exploits the recursive structure of decision trees to compute Shapley values without exponential brute-force enumeration over feature coalitions. The paper also introduces SHAP interaction values and demonstrates the method on clinical and financial datasets. Published in Nature Machine Intelligence, Vol. 2. This is the algorithmic foundation for the TreeSHAP computations used throughout this chapter.

3. "'Why Should I Trust You?': Explaining the Predictions of Any Classifier" --- Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin (2016) The paper that introduced LIME (Local Interpretable Model-agnostic Explanations). Ribeiro et al. proposed explaining any classifier's prediction by approximating the model locally with an interpretable model (typically linear regression). The paper is notable for its focus on user trust and human evaluation of explanations. Published at KDD 2016. Read this for the original LIME algorithm and the insight that explanations must be designed for human consumption, not just mathematical correctness.
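LIME's core move, approximating a black box near one point with a weighted linear model, fits in a few lines. This is a minimal sketch, not the library's algorithm; the `black_box` function, kernel width, and sample count are invented for illustration:

```python
import numpy as np

def black_box(X):
    # Hypothetical nonlinear model we want to explain locally.
    return np.sin(X[:, 0]) + X[:, 1] ** 2

def lime_style_explanation(predict, x0, n_samples=5000, scale=0.1, seed=0):
    """Fit a weighted linear surrogate around x0 (a LIME-like sketch)."""
    rng = np.random.default_rng(seed)
    # Perturb the instance and query the black box.
    X = x0 + rng.normal(0.0, scale, size=(n_samples, x0.size))
    y = predict(X)
    # Proximity kernel: nearby perturbations get higher weight.
    d2 = ((X - x0) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2 * scale ** 2))
    # Weighted least squares with an intercept column.
    A = np.column_stack([np.ones(n_samples), X - x0])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[1:]  # local feature effects (slopes)

x0 = np.array([0.0, 1.0])
slopes = lime_style_explanation(black_box, x0)
# Near x0, the true local effects are cos(0) = 1 and 2 * 1 = 2,
# and the fitted slopes land close to (1, 2).
```

The fitted slopes approximate the black box's local gradient, which is exactly what makes the surrogate "locally faithful" in Ribeiro et al.'s sense.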


SHAP Theory and Extensions

4. "Consistent Individualized Feature Attribution for Tree Ensembles" --- Lundberg, Erion, and Lee (2018) A deeper treatment of TreeSHAP's consistency guarantees. This paper formalizes why gain-based feature importance (the default in XGBoost and Random Forest) can be inconsistent --- increasing a feature's true importance can sometimes decrease its gain-based ranking --- while SHAP is provably consistent. Circulated as an arXiv preprint (arXiv:1802.03888). Read this if you need to argue to a colleague that SHAP is preferable to built-in feature importance.

5. "The Shapley Value in Machine Learning" --- Rozemberczki, Watson, Bayer, Yang, Kiss, Nilsson, and Sarkar (2022) A comprehensive survey of how Shapley values are used across machine learning, covering feature attribution, data valuation, model selection, and federated learning. Sections 3-4 provide the clearest overview of the different SHAP variants (KernelSHAP, TreeSHAP, DeepSHAP, GradientSHAP) and when to use each. Published at the International Joint Conference on Artificial Intelligence (IJCAI 2022).

6. "Problems with Shapley-Value-Based Explanations as Feature Importance Measures" --- Kumar, Venkatasubramanian, Scheidegger, and Friedler (2020) An important critique of SHAP that practitioners should understand. Kumar et al. show that SHAP values can be sensitive to the choice of background distribution, that interventional and observational SHAP values can disagree, and that SHAP does not directly measure causal influence. Published at ICML 2020. Read this to understand the limitations of SHAP and avoid over-interpreting SHAP values as causal claims.


Partial Dependence and ICE

7. "Greedy Function Approximation: A Gradient Boosting Machine" --- Jerome Friedman (2001) The paper that introduced both gradient boosting and partial dependence plots. Section 8.2 describes PDPs as a tool for understanding the "partial dependence" of the response on a subset of features, marginalizing over the remaining features. Published in the Annals of Statistics. The PDP concept is simple, but Friedman's original description is precise and worth reading.
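Friedman's definition translates almost directly into code: for each grid value, clamp the feature of interest for every row and average the predictions. A minimal sketch, where the additive `model` is an invented toy:

```python
import numpy as np

def partial_dependence(predict, X, feature, grid):
    """Friedman's partial dependence: for each grid value, set the chosen
    feature to that value for every row and average the predictions."""
    pd_vals = []
    for v in grid:
        Xv = X.copy()
        Xv[:, feature] = v
        pd_vals.append(predict(Xv).mean())
    return np.array(pd_vals)

# Hypothetical model, additive in feature 0: its PDP should trace 2*x
# up to a constant offset contributed by feature 1.
def model(X):
    return 2.0 * X[:, 0] + np.sin(X[:, 1])

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
grid = np.linspace(-1.0, 1.0, 5)
pdp = partial_dependence(model, X, feature=0, grid=grid)
# Consecutive PDP values differ by exactly 2 * 0.5 = 1.0.
```

For additive models the PDP recovers the feature's effect exactly; the interesting (and dangerous) cases are interactions, which is where ICE plots come in.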

8. "Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation" --- Goldstein, Kapelner, Bleich, and Pitkin (2015) The paper that introduced ICE plots as a solution to the averaging problem in PDPs. Goldstein et al. showed that PDPs can hide heterogeneous effects by averaging across subgroups, and that disaggregating the PDP into individual curves (ICE) reveals patterns that the average obscures. Published in the Journal of Computational and Graphical Statistics. This is a short and practical paper with clear examples.
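The disaggregation Goldstein et al. propose is a small change to the PDP computation: keep one curve per instance instead of averaging. The interaction model below is invented precisely so that the average hides everything:

```python
import numpy as np

def ice_curves(predict, X, feature, grid):
    """One curve per instance: vary one feature over a grid, hold the rest
    fixed. The PDP is the pointwise mean of these curves."""
    curves = np.empty((X.shape[0], len(grid)))
    for j, v in enumerate(grid):
        Xv = X.copy()
        Xv[:, feature] = v
        curves[:, j] = predict(Xv)
    return curves

# Hypothetical model with an interaction: feature 0's effect flips sign
# with feature 1, so the averaged PDP is flat while the ICE curves are not.
def model(X):
    return X[:, 0] * np.sign(X[:, 1])

X = np.array([[0.0, 1.0], [0.0, -1.0]])   # two instances, opposite subgroups
grid = np.linspace(-1.0, 1.0, 5)
ice = ice_curves(model, X, feature=0, grid=grid)
pdp = ice.mean(axis=0)
# pdp is identically zero; the two ICE curves have slopes +1 and -1.
```

This is the averaging problem in miniature: a flat PDP here would suggest feature 0 does nothing, while the individual curves show two strong, opposite effects.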


Permutation Importance

9. "Random Forests" --- Leo Breiman (2001) Breiman introduced permutation importance as part of the Random Forest algorithm. The idea --- shuffle a feature's values and measure the performance drop --- is simple, powerful, and model-agnostic (though Breiman applied it only to Random Forests). Published in Machine Learning, Vol. 45. The permutation importance section is in Section 10 and is only a few pages.
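Breiman's shuffle-and-measure idea is model-agnostic enough to sketch in a few lines of plain Python; the toy `predict` function and MSE metric below are illustrative assumptions:

```python
import numpy as np

def permutation_importance(predict, X, y, metric, n_repeats=10, seed=0):
    """Breiman-style permutation importance: shuffle one column at a time
    and record the resulting increase in error over the baseline."""
    rng = np.random.default_rng(seed)
    baseline = metric(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])
            drops.append(metric(y, predict(Xp)) - baseline)
        importances[j] = np.mean(drops)
    return importances

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

# Hypothetical model that uses only feature 0; feature 1 is a bystander.
predict = lambda X: 3.0 * X[:, 0]
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 2))
y = predict(X)
imp = permutation_importance(predict, X, y, mse)
# imp[0] is large (shuffling destroys the signal); imp[1] is exactly 0.
```

Averaging over several shuffles, as above, reduces the variance of the estimate; scikit-learn's implementation does the same thing via `n_repeats`.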

10. "Conditional Variable Importance for Random Forests" --- Strobl, Boulesteix, Kneib, Augustin, and Zeileis (2008) This paper demonstrates the bias in standard permutation importance when features are correlated and proposes conditional permutation importance as a solution. The bias is exactly the problem discussed in this chapter: shuffling one correlated feature does not destroy the signal because the other carries the same information. Published in BMC Bioinformatics.
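The bias Strobl et al. describe can be reproduced in a few lines. The two toy "models" below are invented for illustration: one spreads its weight across two perfectly correlated copies of a signal, the other relies on a single copy:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5000)
X = np.column_stack([x, x])   # feature 1 is a perfect copy of feature 0
y = x                         # the target is the shared signal

def permuted_error(predict, X, y, j, seed=1):
    """MSE after shuffling column j (one permutation pass)."""
    Xp = X.copy()
    Xp[:, j] = np.random.default_rng(seed).permutation(Xp[:, j])
    return np.mean((y - predict(Xp)) ** 2)

# A model that splits its weight across the two correlated copies...
split = lambda X: 0.5 * X[:, 0] + 0.5 * X[:, 1]
# ...versus one that relies on feature 0 alone.
solo = lambda X: X[:, 0]

err_split = permuted_error(split, X, y, j=0)   # ~0.5: the copy keeps the signal
err_solo = permuted_error(solo, X, y, j=0)     # ~2.0: the signal is destroyed
```

Both models fit the data perfectly, yet shuffling feature 0 hurts the weight-splitting model far less, so standard permutation importance understates how much the shared signal matters. Conditional permutation importance addresses this by permuting within strata of the correlated features.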


Interpretability in Practice

11. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable --- Christoph Molnar (2nd edition, 2022) The definitive practitioner's guide to model interpretation. Molnar covers PDPs, ICE, SHAP, LIME, permutation importance, feature interaction measures, counterfactual explanations, and more. Each chapter includes clear explanations, limitations, and Python code. The book is freely available at christophm.github.io/interpretable-ml-book and is updated regularly. If this chapter left you wanting more, Molnar is the next resource to read.

12. "Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead" --- Cynthia Rudin (2019) A provocative and important paper arguing that for high-stakes decisions (criminal justice, healthcare), post-hoc explanations of black-box models are fundamentally insufficient, and inherently interpretable models (logistic regression, decision lists, scoring systems) should be used instead. Rudin's argument is that explanations of complex models are always approximations, and for life-altering decisions, approximations are not good enough. Published in Nature Machine Intelligence. Read this to understand the strongest counterargument to the approach in this chapter.

13. "The Mythos of Model Interpretability" --- Zachary Lipton (2018) Lipton argues that "interpretability" is an overloaded term that means different things to different audiences (transparency, simulatability, decomposability, algorithmic transparency). He distinguishes between interpretability as a property of the model and post-hoc explanations as a separate activity. Published in Communications of the ACM. This short paper clarifies the vocabulary used in the interpretability literature.


Clinical and Domain Applications

14. "Explainable Machine Learning in Deployment" --- Bhatt, Xiang, Sharma, Weller, Taly, Jia, Ghosh, Puri, Moura, and Eckersley (2020) An interview study of roughly 50 practitioners across some 30 organizations on how explainable ML is deployed in practice. Key findings: most teams use feature-importance methods (SHAP, LIME) for model debugging during development, not for end-user explanations, and internal explanations for data scientists are far more common than external explanations for end users. Published at the ACM Conference on Fairness, Accountability, and Transparency (FAccT). Read this for a reality check on how interpretation is used in industry.

15. "Clinical Artificial Intelligence Quality Improvement: Towards Continual Monitoring and Updating of AI Algorithms in Healthcare" --- Feng et al. (2022) A framework for deploying AI in healthcare that includes interpretability as a core requirement. The paper describes how SHAP explanations are used at a large academic medical center to build clinician trust, enable override workflows, and drive model improvement through feedback. Published in npj Digital Medicine. This paper provides evidence for the approach used in Case Study 2 of this chapter.


Software and Tools

16. SHAP Library Documentation --- Scott Lundberg The official documentation for the shap Python library. Includes tutorials for TreeSHAP, KernelSHAP, DeepSHAP, and all plot types (summary, waterfall, dependence, force, decision). The examples section covers tabular data, text, images, and model-specific implementations. Available at shap.readthedocs.io.

17. LIME Library Documentation --- Marco Tulio Ribeiro The official documentation for the lime Python library. Covers tabular data, text, and image explanations. The tabular explainer tutorial is the most relevant for this chapter. Available at github.com/marcotcr/lime.

18. scikit-learn User Guide --- Inspection Module scikit-learn's documentation for PartialDependenceDisplay, permutation_importance, and related inspection tools. Includes practical examples with gradient boosting, random forests, and neural networks. Available at scikit-learn.org in the User Guide under "Inspection."
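For orientation, a minimal example of the inspection API on synthetic data (the dataset, model choice, and hyperparameters here are arbitrary illustrations, not recommendations):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# Small synthetic regression task; only the first feature matters.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = 4.0 * X[:, 0] + rng.normal(scale=0.1, size=300)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Shuffle each feature n_repeats times and measure the score drop.
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
# result.importances_mean ranks feature 0 far above the two noise features.
```

`PartialDependenceDisplay.from_estimator(model, X, features=[0])` produces the corresponding PDP/ICE plot from the same module.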


Responsible AI and Regulation

19. "Assessment List for Trustworthy Artificial Intelligence (ALTAI)" --- European Commission High-Level Expert Group on AI (2020) The assessment checklist derived from the EU's Ethics Guidelines for Trustworthy AI. Section 3 covers transparency and explainability, including specific requirements for explanations at different levels (user-facing, developer-facing, regulator-facing). The document predates the EU AI Act but is a practical operationalization of the transparency principles the Act later codified. Available at ec.europa.eu.

20. "Model Risk Management" --- SR 11-7, Federal Reserve and OCC (2011) The U.S. banking regulators' guidance on model risk management. While predating modern ML, it establishes principles directly relevant to interpretation: models must be understood by their users, assumptions must be documented, and limitations must be communicated. Section 5 covers model validation, which in practice requires the kind of interpretation methods covered in this chapter. Available at federalreserve.gov.


Video and Interactive Resources

21. Scott Lundberg --- "SHAP: A Game Theoretic Approach to Explain the Output of any Machine Learning Model" (NeurIPS 2017 presentation) Lundberg's 20-minute conference presentation of the original SHAP paper. Clearer and more accessible than the paper itself, with visual explanations of Shapley values, the unification framework, and TreeSHAP. Available on YouTube.

22. Christoph Molnar --- "Interpretable Machine Learning" (Conference Tutorials) Molnar regularly gives tutorials at ML conferences (ICML, NeurIPS) that walk through the full interpretation toolkit. His tutorials are hands-on, with live coding and clear comparisons between methods. Search for his most recent tutorial on YouTube.

23. StatQuest with Josh Starmer --- "SHAP Values Explained" A visual, intuition-first walkthrough of Shapley values and SHAP. Starmer explains the game theory foundation with simple examples before connecting to machine learning. 18 minutes on YouTube. If the mathematical notation in Lundberg and Lee (item 1) is intimidating, start here.


How to Use This List

If you read nothing else, read Lundberg and Lee (item 1) on SHAP and Molnar (item 11) on interpretable ML. Together they provide the theoretical foundation and practical toolkit for everything in this chapter.

If you use tree-based models (and you probably do), read Lundberg et al. (item 2) on TreeSHAP. It explains why TreeSHAP is exact and fast, and introduces SHAP interaction values that go beyond what this chapter covers.

If you are concerned about SHAP's limitations, read Kumar et al. (item 6) on problems with Shapley-based explanations and Rudin (item 12) on the case for inherently interpretable models. These are the strongest critiques of the SHAP-centric approach, and understanding them makes you a more rigorous practitioner.

If you are deploying models in healthcare, finance, or other regulated domains, read Bhatt et al. (item 14) on how interpretation is used in practice and the relevant regulatory guidance (items 19-20). Interpretation requirements differ by domain, and knowing the regulatory landscape is as important as knowing the methods.

If you want to improve your stakeholder communication, Molnar's book (item 11) has the most practical advice on translating interpretation results into non-technical language. Ribeiro et al. (item 3) also emphasizes the human factors in explanation design.


This reading list supports Chapter 19: Model Interpretation. Return to the chapter to review the concepts before diving into these readings.