Quiz: Chapter 19
Model Interpretation
Instructions: Answer all questions. Multiple-choice questions have one correct answer unless otherwise stated. Short-answer questions should be answered in 2-4 sentences.
Question 1 (Multiple Choice)
What is the fundamental property that distinguishes SHAP values from other feature importance methods?
- A) SHAP values are always positive
- B) SHAP values are computed faster than any other method
- C) SHAP values for all features sum to the difference between the prediction and the expected value
- D) SHAP values require a linear model
Answer: C) SHAP values for all features sum to the difference between the prediction and the expected value. This additivity property is the defining characteristic of Shapley values from game theory. It guarantees that the contribution of each feature to a prediction is "fair" in the sense that the individual contributions account for the entire prediction exactly, without overlap or gaps. Other methods (gain-based importance, permutation importance) do not provide this decomposition.
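The additivity property can be verified directly with a brute-force Shapley computation. The sketch below uses a tiny hypothetical model and background dataset (all names and values are illustrative, not from the chapter) and enumerates every feature subset, so it needs only the standard library:

```python
from itertools import combinations
from math import factorial

# Hypothetical toy model: linear terms plus one interaction.
def model(x):
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[2]

# Small background dataset that defines the "expected value" baseline.
background = [
    (0.0, 0.0, 0.0),
    (1.0, 1.0, 1.0),
    (2.0, 0.0, 1.0),
]

def value(instance, subset):
    """v(S): average model output with features in S fixed to the
    instance's values and the rest drawn from the background data."""
    total = 0.0
    for b in background:
        x = [instance[i] if i in subset else b[i] for i in range(len(instance))]
        total += model(x)
    return total / len(background)

def shapley_values(instance):
    n = len(instance)
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contrib = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                s = set(subset)
                weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
                contrib += weight * (value(instance, s | {i}) - value(instance, s))
        phi.append(contrib)
    return phi

x = (3.0, 2.0, 1.0)
phi = shapley_values(x)
expected = value(x, set())   # E[f] over the background
prediction = model(x)
# Additivity: sum(phi) equals prediction - expected (up to float error).
print(sum(phi), prediction - expected)
```

In practice the `shap` library computes these values efficiently (e.g. TreeSHAP for tree models); this enumeration is exponential in the number of features and is meant only to make the additivity guarantee concrete.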
Question 2 (Multiple Choice)
A SHAP summary (dot) plot shows red dots for days_since_last_session clustered on the right side of the plot. What does this mean?
- A) The feature has a negative correlation with the target
- B) High values of days_since_last_session push the model's prediction higher (toward the positive class)
- C) The feature is the most important feature in the model
- D) Low values of days_since_last_session increase churn risk
Answer: B) High values of days_since_last_session push the model's prediction higher (toward the positive class). In the SHAP summary dot plot, the horizontal axis represents the SHAP value (contribution to the prediction), and the color represents the feature value (red = high, blue = low). Red dots on the right mean that high feature values produce positive SHAP values, pushing the prediction toward the positive class (churn, in this context). This indicates that customers who have not logged in for many days are more likely to be predicted as churners.
Question 3 (Multiple Choice)
Which SHAP plot type is most appropriate for answering the question: "Why did the model predict that this specific customer will churn?"
- A) SHAP summary plot (bar)
- B) SHAP summary plot (dot)
- C) SHAP waterfall plot
- D) SHAP dependence plot
Answer: C) SHAP waterfall plot. The waterfall plot shows, for a single observation, the base value (expected prediction), each feature's SHAP contribution (positive or negative), and the final prediction. It answers the "why this prediction" question by decomposing the prediction into feature-level contributions. The summary plots (A and B) are global visualizations showing patterns across all observations. The dependence plot (D) shows the relationship between one feature's values and its SHAP values across all observations.
Question 4 (Short Answer)
Explain the difference between a Partial Dependence Plot (PDP) and an Individual Conditional Expectation (ICE) plot. When would the PDP be misleading without the ICE plot?
Answer: A PDP shows the average prediction as one feature varies, averaging across all observations in the dataset. An ICE plot shows the prediction curve for each individual observation separately, revealing how the feature's effect varies across the population. A PDP would be misleading when the feature's effect is heterogeneous --- for example, if increasing plan_price increases churn for new customers but decreases churn for long-tenured customers (perhaps because long-tenured customers on expensive plans are highly engaged). The PDP would show a flat or mildly positive slope, averaging the two opposing effects, while the ICE plot would reveal the two distinct patterns.
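The plan_price example can be made concrete with a toy scorer whose price effect flips sign with tenure. Everything below is hypothetical (the function, the grid, and the customer tenures are invented for illustration); the point is that the PDP, being the average of the ICE curves, comes out flat even though every individual curve has a clear slope:

```python
# Hypothetical churn scorer: price raises risk for new customers
# (tenure < 12 months) and lowers it for long-tenured ones.
def churn_score(plan_price, tenure_months):
    slope = 0.02 if tenure_months < 12 else -0.02
    return 0.5 + slope * (plan_price - 10)

customers = [3, 6, 9, 24, 36, 48]   # tenure in months: 3 new, 3 long-tenured
grid = [5, 10, 15, 20]              # plan_price values to sweep

# ICE: one prediction curve per customer as plan_price varies.
ice = {t: [churn_score(p, t) for p in grid] for t in customers}

# PDP: the average of the ICE curves at each grid point.
pdp = [sum(churn_score(p, t) for t in customers) / len(customers) for p in grid]

print(pdp)  # flat at 0.5: the opposing slopes cancel in the average
```

Reading only the flat PDP, one would conclude plan_price does not matter; the ICE curves reveal two strong, opposing effects.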
Question 5 (Multiple Choice)
Permutation importance measures feature importance by:
- A) Computing how much the feature improves splits in tree-based models
- B) Computing the Shapley value of each feature
- C) Randomly shuffling the feature's values and measuring the drop in model performance
- D) Fitting a linear model to approximate the feature's contribution
Answer: C) Randomly shuffling the feature's values and measuring the drop in model performance. Permutation importance breaks the relationship between a feature and the target by shuffling the feature's values across observations, then measures how much the model's evaluation metric (e.g., AUC) degrades. A large drop indicates the model relies heavily on that feature. This method is model-agnostic, meaning it works with any model type, not just tree-based models.
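The shuffle-and-rescore procedure is simple enough to sketch from scratch. The "model" below is a stand-in function (not a fitted model) so that the importance values are easy to predict; in real work you would use a trained model and a library routine such as scikit-learn's `permutation_importance`:

```python
import random

# Toy data: y depends strongly on x0, weakly on x1; x2 is pure noise.
random.seed(0)
X = [[random.random(), random.random(), random.random()] for _ in range(500)]
y = [2.0 * r[0] + 0.5 * r[1] for r in X]

# Stand-in "model": here just the true function, since the point is
# the importance procedure, not the fit.
def predict(row):
    return 2.0 * row[0] + 0.5 * row[1]

def mse(X, y):
    return sum((predict(r) - t) ** 2 for r, t in zip(X, y)) / len(y)

def permutation_importance(X, y, col, n_repeats=5):
    """Shuffle one column, re-score, and report the average error increase."""
    base = mse(X, y)
    increases = []
    for _ in range(n_repeats):
        shuffled = [row[col] for row in X]
        random.shuffle(shuffled)
        Xp = [row[:col] + [v] + row[col + 1:] for v, row in zip(shuffled, X)]
        increases.append(mse(Xp, y) - base)
    return sum(increases) / n_repeats

scores = [permutation_importance(X, y, c) for c in range(3)]
print(scores)  # x0 largest, x1 smaller, x2 essentially zero
```

Because shuffling x2 changes nothing the model uses, its importance is exactly zero, while the heavily used x0 shows the largest error increase.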
Question 6 (Multiple Choice)
Which of the following is a known limitation of permutation importance when features are correlated?
- A) It overestimates the importance of correlated features
- B) It underestimates the importance of correlated features because shuffling one does not destroy the signal carried by the other
- C) It cannot be computed when features are correlated
- D) It assigns identical importance values to all correlated features
Answer: B) It underestimates the importance of correlated features because shuffling one does not destroy the signal carried by the other. When two features carry similar information (e.g., monthly_hours_watched and sessions_last_30d are both engagement measures), shuffling one feature leaves the other intact, so the model can still partially reconstruct the signal. Both features may show lower permutation importance than either would show if it were the sole representative of that information. This can lead to underestimating the importance of an entire group of correlated features.
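The redundancy effect can be demonstrated with two perfectly correlated features. In this deliberately contrived sketch (all names and models are invented), one model spreads its reliance across both copies while another relies on a single copy; shuffling the first column hurts the split model far less, which is exactly the underestimation described above:

```python
import random

random.seed(1)
vals = [random.random() for _ in range(500)]
# x1 is an exact duplicate of x0: two fully redundant features.
X = [[v, v] for v in vals]
y = [2.0 * v for v in vals]

# A model that spreads its reliance across both redundant features...
def model_split(row):
    return row[0] + row[1]

# ...versus one that relies on x0 alone.
def model_solo(row):
    return 2.0 * row[0]

def perm_drop(model, col):
    """Error increase from shuffling one column."""
    base = sum((model(r) - t) ** 2 for r, t in zip(X, y)) / len(y)
    shuffled = [r[col] for r in X]
    random.shuffle(shuffled)
    Xp = [r[:col] + [v] + r[col + 1:] for v, r in zip(shuffled, X)]
    return sum((model(r) - t) ** 2 for r, t in zip(Xp, y)) / len(y) - base

drop_split = perm_drop(model_split, 0)  # the intact duplicate softens the hit
drop_solo = perm_drop(model_solo, 0)    # full reliance, full hit
print(drop_split, drop_solo)
```

Both models make identical predictions on the clean data, yet permutation importance rates x0 much lower in the split model, so a correlated pair can look individually unimportant even when it is jointly essential.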
Question 7 (Short Answer)
A data scientist presents a SHAP waterfall plot to a product manager. The product manager says: "I do not understand this chart. What are these bars?" Rewrite the waterfall explanation in one paragraph, using no technical terms (no "SHAP," no "log-odds," no "base value").
Answer: Think of the model as starting from the average churn rate across all customers --- about 8%. Then it adjusts up or down for each factor it knows about this specific customer. Red bars are factors that increase this customer's risk above average, and blue bars are factors that decrease it. For example, the biggest red bar says "has not logged in for 38 days," which pushes the risk up significantly because the average customer logs in every 5 days. The final number on the right --- 61% --- is the model's overall risk estimate after considering all factors together. The longer the bar, the bigger the factor's influence on the final number.
Question 8 (Multiple Choice)
LIME explains a prediction by:
- A) Computing exact Shapley values using tree structure
- B) Generating perturbed versions of the observation, getting the model's predictions for each, and fitting a simple linear model to the perturbation-prediction pairs
- C) Computing the gradient of the model's output with respect to each feature
- D) Comparing the observation to the K nearest neighbors in the training set
Answer: B) Generating perturbed versions of the observation, getting the model's predictions for each, and fitting a simple linear model to the perturbation-prediction pairs. LIME creates many slightly modified versions of the original observation, asks the complex model to predict each one, weights the perturbed observations by their proximity to the original, and then fits an interpretable model (typically linear regression) to this weighted dataset. The coefficients of the linear model are the LIME explanation. This approach is model-agnostic but stochastic --- different perturbation samples can produce different explanations.
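The perturb-weight-fit loop can be sketched in a few lines for a one-dimensional case. This is a simplified illustration of the idea, not the `lime` library's implementation: the black-box model, kernel width, and sample count are all arbitrary choices made for the example:

```python
import math
import random

# Black-box model to explain (stands in for a complex model).
def blackbox(x):
    return x * x

x0 = 2.0
random.seed(0)

# 1. Perturb around the instance and get the black box's predictions.
xs = [x0 + random.gauss(0, 0.5) for _ in range(1000)]
ys = [blackbox(x) for x in xs]

# 2. Weight each perturbation by its proximity to the original instance.
ws = [math.exp(-((x - x0) ** 2) / (2 * 0.25 ** 2)) for x in xs]

# 3. Fit a weighted linear model y ~ a + b*x (closed-form weighted
#    least squares); b is the local explanation.
sw = sum(ws)
mx = sum(w * x for w, x in zip(ws, xs)) / sw
my = sum(w * y for w, y in zip(ws, ys)) / sw
b = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys)) \
    / sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
a = my - b * mx

print(b)  # local slope, roughly f'(2) = 4 for this black box
```

The recovered slope approximates the model's local behavior around x0, and rerunning with a different seed gives a slightly different answer, which is the stochasticity noted above.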
Question 9 (Multiple Choice)
Which statement best describes the relationship between SHAP and LIME for tree-based models?
- A) LIME is always preferred because it is faster
- B) SHAP (TreeSHAP) is preferred because it is exact, deterministic, and provides both global and local explanations
- C) They always produce identical explanations
- D) LIME is more theoretically grounded than SHAP
Answer: B) SHAP (TreeSHAP) is preferred because it is exact, deterministic, and provides both global and local explanations. TreeSHAP exploits the tree structure to compute exact Shapley values in polynomial time. LIME relies on random perturbation sampling, making it stochastic (explanations can vary between runs) and providing no consistency guarantees. Additionally, SHAP values can be aggregated for global interpretation (summary plots, bar plots), while LIME is inherently local. For non-tree models where TreeSHAP is unavailable, LIME becomes a practical alternative.
Question 10 (Short Answer)
Explain the PDP assumption of feature independence. Give an example of when this assumption would produce misleading results.
Answer: PDPs compute the average prediction by varying one feature while holding all others at their observed values. This implicitly assumes that the feature being varied is independent of the others --- that any value of the varied feature can co-occur with any value of the held-constant features. This is misleading when features are correlated. For example, in a housing model, a PDP for square_footage might extrapolate to houses with 5,000 square feet but only 1 bathroom --- a combination that does not exist in reality. The PDP would show the model's prediction for this impossible scenario, potentially producing misleading conclusions about how square footage affects price.
Question 11 (Multiple Choice)
When presenting model interpretation results to a non-technical stakeholder, which approach is most effective?
- A) Show the SHAP summary dot plot and explain the color encoding
- B) Present the AUC, precision, and recall metrics alongside permutation importance
- C) Use the "three-slide" framework: what the model does (no math), what drives it (one ranked list), and a specific example with top reasons
- D) Provide the complete SHAP waterfall for every observation in the test set
Answer: C) Use the "three-slide" framework: what the model does (no math), what drives it (one ranked list), and a specific example with top reasons. Non-technical stakeholders do not need to see SHAP dot plots, statistical metrics, or per-observation waterfalls. They need to understand three things: what the model does in business terms, what factors drive its decisions (as a ranked list in plain English), and a concrete example showing the top reasons for a specific prediction. This framework builds trust through simplicity, specificity, and an invitation for the stakeholder to apply their domain knowledge.
Question 12 (Short Answer)
A colleague says: "SHAP showed that zip_code is the most important feature in our loan approval model. That means we should definitely keep it." What concerns should you raise?
Answer: A feature being important in SHAP does not automatically mean it should be kept. zip_code likely correlates with race, income level, and other protected characteristics, so relying on it could introduce or perpetuate discriminatory lending patterns, potentially violating fair lending regulations (ECOA, Fair Housing Act). The model may be learning a proxy for demographic characteristics rather than legitimate credit risk factors. You should investigate what signal zip_code is capturing (using SHAP dependence plots to see which zip codes push predictions in which direction), test whether removing it meaningfully degrades performance, and consult with your compliance and legal teams about whether its use is appropriate.
Question 13 (Multiple Choice)
You compute SHAP values for an XGBoost model and the SHAP summary plot shows that feature_X has consistently small SHAP values (close to zero) for all observations. You then compute permutation importance and find that feature_X has the third-highest permutation importance. Which explanation is most likely?
- A) The SHAP computation is incorrect
- B) The permutation importance computation is incorrect
- C) feature_X is highly correlated with another feature; SHAP distributes the credit between them while permutation importance measures the unique contribution
- D) feature_X has a strong non-linear effect that SHAP cannot detect
Answer: C) feature_X is highly correlated with another feature; SHAP distributes the credit between them while permutation importance measures the unique contribution. When two features carry overlapping information, SHAP splits the Shapley credit between them on every observation, so feature_X's individual SHAP values can look small even though the pair is jointly important. Permutation importance instead measures the total performance loss when feature_X's relationship with the rest of the data is destroyed: shuffling it breaks not only its direct signal but also any interactions and joint patterns it participates in, and the shuffled rows can fall outside the joint distribution the model was trained on, so the measured drop can remain large. When the two methods disagree this sharply, investigate the correlation and interaction structure (for example, with a correlation matrix and SHAP interaction values) before trusting either ranking alone.
Question 14 (Short Answer)
You are building a hospital readmission model. A clinician reviews the SHAP waterfall for a flagged patient and says: "The model says length of stay is the top reason, but I think the real issue is that the patient lives alone and has no transportation. The model is wrong." How do you respond?
Answer: The clinician may well be right. The model can only use the information it was trained on, and if lives_alone and transportation_access are not features in the model (or are poorly measured), the model cannot weight them appropriately. Length of stay may be serving as a noisy proxy for the true underlying risk factors the clinician is identifying. This is exactly the kind of feedback that improves models --- the clinician's domain knowledge reveals a signal the model misses. The right next step is to record this feedback, investigate whether social determinants of health features can be added to the model, and emphasize that the model's explanation is its best guess given the data it has, not a claim about causation.
Question 15 (Multiple Choice)
Which of the following correctly describes when to use each interpretation method?
- A) Use SHAP for global interpretation, LIME for local interpretation, PDP for feature interactions
- B) Use permutation importance for global ranking, SHAP waterfall for local explanation, PDP + ICE for feature relationship visualization
- C) Use PDP for all interpretation tasks because it is the most model-agnostic method
- D) Use LIME for all interpretation tasks because it provides both global and local explanations
Answer: B) Use permutation importance for global ranking, SHAP waterfall for local explanation, PDP + ICE for feature relationship visualization. This represents a practical toolkit: permutation importance provides a quick, model-agnostic global ranking (complemented by SHAP bar/summary plots for direction information); SHAP waterfall plots decompose individual predictions for stakeholder communication; and PDP + ICE plots reveal the shape and consistency of feature-prediction relationships. SHAP can serve all three roles, but using multiple methods provides cross-validation and catches issues that any single method might miss.
This quiz covers Chapter 19: Model Interpretation. Return to the chapter to review concepts.