Case Study 1: StreamFlow SHAP Deep Dive --- Three Customer Stories


Background

StreamFlow's churn model has been through the full pipeline. Chapters 11-14 built and compared models. Chapter 16 evaluated them properly. Chapter 17 addressed class imbalance. Chapter 18 tuned the hyperparameters. The final model is a tuned XGBoost classifier with an AUC of 0.938 and an average precision of 0.54 on a dataset with an 8.2% churn rate.

Now the Customer Success team has a practical problem. The model scores 50,000 active subscribers daily and flags roughly 4,100 (about 8%) as high-risk (predicted churn probability above 40%). The team has capacity to make proactive outreach calls to about 200 customers per day. They need to prioritize --- and more importantly, they need to know what to say when they call.

"Tell me the customer's risk score" is not enough. The customer success representative needs to know: "This customer is at risk because of X, Y, and Z --- here is what you can offer to address each one."

This case study builds the SHAP-based explanation system that powers those conversations.
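Before any explanation work, the capacity constraint itself suggests a simple first-pass prioritization: rank the flagged customers by predicted probability and keep only as many as the team can call. A minimal sketch, using random stand-in scores instead of real model output (CAPACITY and THRESHOLD are illustrative names, not part of the chapter's code):

```python
import numpy as np

rng = np.random.default_rng(0)
probs = rng.random(50_000)        # stand-in for the model's daily churn scores

CAPACITY = 200                    # outreach calls the CS team can make per day
THRESHOLD = 0.40                  # "high-risk" cutoff used by the team

flagged = np.where(probs > THRESHOLD)[0]
# Rank the flagged customers by risk and keep only what the team can handle.
queue = flagged[np.argsort(probs[flagged])[::-1]][:CAPACITY]

print(f"Flagged: {len(flagged)}, queued for outreach: {len(queue)}")
```

This answers "who to call" but not "what to say" --- the rest of the case study is about the second question.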


The Data and Model

import numpy as np
import pandas as pd
import shap
import matplotlib.pyplot as plt
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split

np.random.seed(42)
n = 50000

streamflow = pd.DataFrame({
    'monthly_hours_watched': np.random.exponential(18, n).round(1),
    'sessions_last_30d': np.random.poisson(14, n),
    'avg_session_minutes': np.random.exponential(28, n).round(1),
    'unique_titles_watched': np.random.poisson(8, n),
    'content_completion_rate': np.random.beta(3, 2, n).round(3),
    'binge_sessions_30d': np.random.poisson(2, n),
    'weekend_ratio': np.random.beta(2.5, 3, n).round(3),
    'peak_hour_ratio': np.random.beta(3, 2, n).round(3),
    'hours_change_pct': np.random.normal(0, 30, n).round(1),
    'sessions_change_pct': np.random.normal(0, 25, n).round(1),
    'months_active': np.random.randint(1, 60, n),
    'plan_price': np.random.choice(
        [9.99, 14.99, 19.99, 24.99], n, p=[0.35, 0.35, 0.20, 0.10]
    ),
    'devices_used': np.random.randint(1, 6, n),
    'profiles_active': np.random.randint(1, 5, n),
    'payment_failures_6m': np.random.poisson(0.3, n),
    'support_tickets_90d': np.random.poisson(0.8, n),
    'days_since_last_session': np.random.exponential(5, n).round(0).clip(0, 60),
    'recommendation_click_rate': np.random.beta(2, 8, n).round(3),
    'search_frequency_30d': np.random.poisson(6, n),
    'download_count_30d': np.random.poisson(3, n),
    'share_count_30d': np.random.poisson(1, n),
    'rating_count_30d': np.random.poisson(2, n),
    'free_trial_convert': np.random.binomial(1, 0.65, n),
    'referral_source': np.random.choice(
        [0, 1, 2, 3], n, p=[0.50, 0.25, 0.15, 0.10]
    ),
})

churn_logit = (
    -3.0
    + 0.08 * streamflow['days_since_last_session']
    - 0.02 * streamflow['monthly_hours_watched']
    - 0.04 * streamflow['sessions_last_30d']
    + 0.15 * streamflow['payment_failures_6m']
    + 0.10 * streamflow['support_tickets_90d']
    - 0.03 * streamflow['content_completion_rate'] * 10
    + 0.05 * (streamflow['hours_change_pct'] < -30).astype(int)
    - 0.01 * streamflow['months_active']
    + 0.08 * (streamflow['plan_price'] > 19.99).astype(int)
    - 0.02 * streamflow['unique_titles_watched']
    + np.random.normal(0, 0.3, n)
)
churn_prob = 1 / (1 + np.exp(-churn_logit))
streamflow['churned'] = np.random.binomial(1, churn_prob)

X = streamflow.drop(columns=['churned'])
y = streamflow['churned']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = XGBClassifier(
    n_estimators=500, learning_rate=0.05, max_depth=5,
    subsample=0.8, colsample_bytree=0.8, min_child_weight=3,
    eval_metric='logloss', random_state=42, n_jobs=-1
)
model.fit(X_train, y_train)

# Compute SHAP values (for XGBoost, TreeExplainer attributes the log-odds output)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
probs = model.predict_proba(X_test)[:, 1]

The Three Customers

We will examine three customers who represent different risk profiles and different churn stories. In production, these would be selected by the customer success team from their daily queue. Here, we select them programmatically.

# Customer A: The Ghost — high engagement that suddenly stopped
ghost_mask = (
    (probs > 0.55) &
    (X_test['days_since_last_session'] > 25) &
    (X_test['monthly_hours_watched'] > 15) &
    (X_test['months_active'] > 24)
)
ghost_idx = np.where(ghost_mask)[0][0] if ghost_mask.any() else np.where(probs > 0.55)[0][0]

# Customer B: The Frustrated User — payment and support issues
frustrated_mask = (
    (probs > 0.45) &
    (X_test['payment_failures_6m'] >= 2) &
    (X_test['support_tickets_90d'] >= 3)
)
frustrated_idx = np.where(frustrated_mask)[0][0] if frustrated_mask.any() else np.where(probs > 0.45)[0][0]

# Customer C: The Drifter — slow disengagement
drifter_mask = (
    (probs > 0.35) & (probs < 0.55) &
    (X_test['hours_change_pct'] < -25) &
    (X_test['sessions_change_pct'] < -20)
)
drifter_idx = np.where(drifter_mask)[0][0] if drifter_mask.any() else np.where((probs > 0.35) & (probs < 0.55))[0][0]

customers = {
    'The Ghost': ghost_idx,
    'The Frustrated User': frustrated_idx,
    'The Drifter': drifter_idx,
}

for name, idx in customers.items():
    print(f"\n{name} (index {idx}):")
    print(f"  Churn probability: {probs[idx]:.1%}")
    print(f"  days_since_last_session: {X_test.iloc[idx]['days_since_last_session']}")
    print(f"  monthly_hours_watched: {X_test.iloc[idx]['monthly_hours_watched']}")
    print(f"  months_active: {X_test.iloc[idx]['months_active']}")
    print(f"  payment_failures_6m: {X_test.iloc[idx]['payment_failures_6m']}")
    print(f"  support_tickets_90d: {X_test.iloc[idx]['support_tickets_90d']}")
    print(f"  hours_change_pct: {X_test.iloc[idx]['hours_change_pct']}")

Customer A: The Ghost

Profile

The Ghost is a long-tenured subscriber (over 24 months) who historically watched a substantial number of hours each month. But they have not logged in for over 25 days. The model flags them because of the sudden inactivity --- not because of any billing or support issue.

ghost = X_test.iloc[ghost_idx]
ghost_prob = probs[ghost_idx]
ghost_shap = shap_values[ghost_idx]

print(f"Churn probability: {ghost_prob:.1%}")
print(f"\nKey metrics:")
print(f"  Days since last session: {ghost['days_since_last_session']:.0f}")
print(f"  Monthly hours watched:   {ghost['monthly_hours_watched']:.1f}")
print(f"  Months active:           {ghost['months_active']:.0f}")
print(f"  Payment failures (6m):   {ghost['payment_failures_6m']:.0f}")
print(f"  Support tickets (90d):   {ghost['support_tickets_90d']:.0f}")
print(f"  Content completion rate: {ghost['content_completion_rate']:.3f}")

SHAP Waterfall

shap.plots.waterfall(
    shap.Explanation(
        values=ghost_shap,
        base_values=explainer.expected_value,
        data=ghost.values,
        feature_names=X_test.columns.tolist()
    ),
    max_display=12, show=False
)
plt.title(f"The Ghost --- Churn Probability: {ghost_prob:.1%}")
plt.tight_layout()
plt.savefig("cs1_ghost_waterfall.png", dpi=150, bbox_inches='tight')
plt.show()

Interpretation

The waterfall tells a clear story. The dominant feature is days_since_last_session, with a large positive SHAP value pushing the prediction toward churn. This is the Ghost's defining characteristic: they simply stopped showing up.

Interestingly, several features push against churn. High months_active (long tenure) provides a protective effect. Decent historical monthly_hours_watched also pushes the prediction down. The model recognizes that this was an engaged subscriber --- but the recent inactivity overrides that history.
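One detail worth keeping in mind when reading these waterfalls: because TreeExplainer attributes the XGBoost classifier's log-odds output, the base value plus all SHAP values equals the model's raw margin, and the sigmoid of that sum is the predicted probability. A quick illustration of the arithmetic with made-up numbers (not taken from this run):

```python
import numpy as np

# Hypothetical values for illustration only:
base_value = -2.45                # explainer.expected_value, in log-odds
shap_vector = np.array([1.9, -0.4, 0.3, -0.2, 0.15])  # per-feature contributions

margin = base_value + shap_vector.sum()   # the model's raw (log-odds) output
prob = 1 / (1 + np.exp(-margin))          # what predict_proba reports

print(f"log-odds: {margin:.3f} -> probability: {prob:.1%}")
```

This is why a single large SHAP value (like days_since_last_session here) can swing the probability substantially even when several smaller contributions push the other way.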

Top 3 Reasons (For Customer Success)

top3 = np.argsort(np.abs(ghost_shap))[::-1][:3]
print("Top 3 reasons the model flagged The Ghost:")
for rank, feat_idx in enumerate(top3, 1):
    feat_name = X_test.columns[feat_idx]
    feat_val = ghost.iloc[feat_idx]
    sv = ghost_shap[feat_idx]
    direction = "increases" if sv > 0 else "decreases"
    print(f"  {rank}. {feat_name} = {feat_val} "
          f"(SHAP = {sv:+.4f}, {direction} churn risk)")

What the Customer Success Rep Should Say

"Hi [Customer Name], I noticed you have not been on StreamFlow in a few weeks. You have been with us for over two years and used to watch regularly --- we want to make sure everything is okay. Is there something we can help with? We recently added [relevant content based on their viewing history] that I think you would enjoy. Can I extend your next billing cycle by a week so you have time to check it out?"

The conversation is personalized because the SHAP explanation reveals why this customer was flagged: inactivity, not frustration or billing issues. The intervention targets the actual reason.
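One way to operationalize this "reason, then offer" pattern is a lookup from the top SHAP driver to a suggested talk track. A sketch with hypothetical offer text --- the feature names match the dataset, but the offers are placeholders the CS team would write themselves:

```python
# Hypothetical mapping from the model's primary churn driver to an opening offer.
TALK_TRACKS = {
    'days_since_last_session': "Check in, recommend new content, offer a billing-cycle extension.",
    'payment_failures_6m':     "Resolve the billing issue before anything else.",
    'support_tickets_90d':     "Follow up on recent support tickets.",
    'hours_change_pct':        "Offer a personalized watchlist to rekindle interest.",
}
DEFAULT_TRACK = "Generic retention check-in."

def suggest_opening(top_feature: str) -> str:
    """Return the suggested talk track for a customer's primary SHAP driver."""
    return TALK_TRACKS.get(top_feature, DEFAULT_TRACK)

print(suggest_opening('days_since_last_session'))
```

A falls through to the inactivity track here; the next two customers would land on different entries of the same table.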


Customer B: The Frustrated User

Profile

The Frustrated User has a different story. They are still somewhat active, but they have had multiple payment failures and have filed several support tickets. The model flags them not because they stopped watching, but because they are showing signs of operational friction.

frustrated = X_test.iloc[frustrated_idx]
frustrated_prob = probs[frustrated_idx]
frustrated_shap = shap_values[frustrated_idx]

print(f"Churn probability: {frustrated_prob:.1%}")
print(f"\nKey metrics:")
print(f"  Days since last session:  {frustrated['days_since_last_session']:.0f}")
print(f"  Monthly hours watched:    {frustrated['monthly_hours_watched']:.1f}")
print(f"  Months active:            {frustrated['months_active']:.0f}")
print(f"  Payment failures (6m):    {frustrated['payment_failures_6m']:.0f}")
print(f"  Support tickets (90d):    {frustrated['support_tickets_90d']:.0f}")
print(f"  Content completion rate:  {frustrated['content_completion_rate']:.3f}")

SHAP Waterfall

shap.plots.waterfall(
    shap.Explanation(
        values=frustrated_shap,
        base_values=explainer.expected_value,
        data=frustrated.values,
        feature_names=X_test.columns.tolist()
    ),
    max_display=12, show=False
)
plt.title(f"The Frustrated User --- Churn Probability: {frustrated_prob:.1%}")
plt.tight_layout()
plt.savefig("cs1_frustrated_waterfall.png", dpi=150, bbox_inches='tight')
plt.show()

Interpretation

The Frustrated User's waterfall looks different from the Ghost's. Here, payment_failures_6m and support_tickets_90d are the dominant positive SHAP contributors. The model is not reacting to disengagement (this customer is still watching) --- it is reacting to friction signals.

This distinction matters enormously for the customer success team. Offering the Ghost a content recommendation makes sense. Offering the Frustrated User a content recommendation would be tone-deaf. Their problem is not content --- it is billing and service quality.

Top 3 Reasons (For Customer Success)

top3 = np.argsort(np.abs(frustrated_shap))[::-1][:3]
print("Top 3 reasons the model flagged The Frustrated User:")
for rank, feat_idx in enumerate(top3, 1):
    feat_name = X_test.columns[feat_idx]
    feat_val = frustrated.iloc[feat_idx]
    sv = frustrated_shap[feat_idx]
    direction = "increases" if sv > 0 else "decreases"
    print(f"  {rank}. {feat_name} = {feat_val} "
          f"(SHAP = {sv:+.4f}, {direction} churn risk)")

What the Customer Success Rep Should Say

"Hi [Customer Name], I am reaching out because I see your account has had some billing issues recently, and I wanted to make sure we get those sorted out for you. I can also see you contacted our support team a few times in the last three months --- I want to make sure those issues were resolved to your satisfaction. Let me [fix the billing issue] and [follow up on the support tickets]. Is there anything else I can help with?"

The conversation addresses the actual pain points: billing and support. The SHAP explanation ensures the rep does not waste the customer's time with generic retention offers.

Comparing the Two Waterfalls

# shap.plots.waterfall draws on the current axes in recent SHAP versions,
# so plt.sca() selects each panel; older versions may open their own figure.
fig, axes = plt.subplots(1, 2, figsize=(20, 8))

plt.sca(axes[0])
shap.plots.waterfall(
    shap.Explanation(
        values=ghost_shap,
        base_values=explainer.expected_value,
        data=ghost.values,
        feature_names=X_test.columns.tolist()
    ),
    max_display=8, show=False
)
axes[0].set_title(f"The Ghost ({ghost_prob:.1%})")

plt.sca(axes[1])
shap.plots.waterfall(
    shap.Explanation(
        values=frustrated_shap,
        base_values=explainer.expected_value,
        data=frustrated.values,
        feature_names=X_test.columns.tolist()
    ),
    max_display=8, show=False
)
axes[1].set_title(f"The Frustrated User ({frustrated_prob:.1%})")

plt.tight_layout()
plt.savefig("cs1_comparison_ghost_frustrated.png", dpi=150, bbox_inches='tight')
plt.show()

Two customers with similar churn probabilities, but completely different reasons. Without SHAP, they would receive the same generic retention email. With SHAP, they receive targeted interventions that address their actual problems.
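The same contrast can also be read from a table rather than side-by-side plots, which is often easier to scan in a review meeting. A sketch with invented SHAP values standing in for ghost_shap and frustrated_shap, ranking shared features by how differently they drive the two customers:

```python
import pandas as pd

# Illustrative SHAP values (made up; in the chapter these would come from
# ghost_shap and frustrated_shap for a few shared features).
features = ['days_since_last_session', 'payment_failures_6m',
            'support_tickets_90d', 'months_active']
comparison = pd.DataFrame({
    'ghost_shap':      [1.90, -0.05, -0.02, -0.60],
    'frustrated_shap': [0.10,  1.20,  0.85, -0.10],
}, index=features)

# Rank features by how differently they drive the two customers' risk.
comparison['gap'] = (comparison['ghost_shap'] - comparison['frustrated_shap']).abs()
print(comparison.sort_values('gap', ascending=False))
```

The largest gaps are exactly the features that distinguish the two stories: inactivity for the Ghost, friction for the Frustrated User.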


Customer C: The Drifter

Profile

The Drifter is the subtlest case. They are still active, they have not had billing problems, and they have not filed support tickets. But their engagement is declining: hours watched are down, sessions are down, and they are watching fewer unique titles. The model detects this slow disengagement.

drifter = X_test.iloc[drifter_idx]
drifter_prob = probs[drifter_idx]
drifter_shap = shap_values[drifter_idx]

print(f"Churn probability: {drifter_prob:.1%}")
print(f"\nKey metrics:")
print(f"  Days since last session:  {drifter['days_since_last_session']:.0f}")
print(f"  Monthly hours watched:    {drifter['monthly_hours_watched']:.1f}")
print(f"  Months active:            {drifter['months_active']:.0f}")
print(f"  Payment failures (6m):    {drifter['payment_failures_6m']:.0f}")
print(f"  Support tickets (90d):    {drifter['support_tickets_90d']:.0f}")
print(f"  Hours change %:           {drifter['hours_change_pct']:.1f}")
print(f"  Sessions change %:        {drifter['sessions_change_pct']:.1f}")
print(f"  Unique titles watched:    {drifter['unique_titles_watched']:.0f}")

SHAP Waterfall

shap.plots.waterfall(
    shap.Explanation(
        values=drifter_shap,
        base_values=explainer.expected_value,
        data=drifter.values,
        feature_names=X_test.columns.tolist()
    ),
    max_display=12, show=False
)
plt.title(f"The Drifter --- Churn Probability: {drifter_prob:.1%}")
plt.tight_layout()
plt.savefig("cs1_drifter_waterfall.png", dpi=150, bbox_inches='tight')
plt.show()

Interpretation

The Drifter's waterfall is more distributed than the other two. No single feature dominates. Instead, several features contribute moderate positive SHAP values: declining hours (hours_change_pct strongly negative), declining sessions (sessions_change_pct negative), and relatively low unique_titles_watched. The model is picking up on a pattern of gradual disengagement across multiple dimensions.

This is the hardest customer to intervene on because there is no single problem to fix. The customer is simply losing interest. The intervention needs to be about re-engagement: discovering new content, refreshing the experience.
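The "dominated versus distributed" distinction can be quantified rather than eyeballed: compute the share of total absolute SHAP mass carried by the single largest feature. A sketch with illustrative vectors (in the chapter this would be applied to ghost_shap and drifter_shap):

```python
import numpy as np

def top_feature_share(shap_vector: np.ndarray) -> float:
    """Fraction of total |SHAP| mass carried by the single largest feature."""
    mass = np.abs(shap_vector)
    return float(mass.max() / mass.sum())

# Illustrative vectors: one dominated by a single driver, one diffuse.
ghost_like   = np.array([2.0, -0.3, 0.2, -0.1, 0.1])
drifter_like = np.array([0.4,  0.35, 0.3,  0.25, 0.2])

print(f"Ghost-like concentration:   {top_feature_share(ghost_like):.0%}")
print(f"Drifter-like concentration: {top_feature_share(drifter_like):.0%}")
```

A low concentration score is itself a signal: it tells the CS team that no single fix will work and the intervention should be broad re-engagement.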

Top 3 Reasons (For Customer Success)

top3 = np.argsort(np.abs(drifter_shap))[::-1][:3]
print("Top 3 reasons the model flagged The Drifter:")
for rank, feat_idx in enumerate(top3, 1):
    feat_name = X_test.columns[feat_idx]
    feat_val = drifter.iloc[feat_idx]
    sv = drifter_shap[feat_idx]
    direction = "increases" if sv > 0 else "decreases"
    print(f"  {rank}. {feat_name} = {feat_val} "
          f"(SHAP = {sv:+.4f}, {direction} churn risk)")

What the Customer Success Rep Should Say

"Hi [Customer Name], I noticed your viewing has slowed down a bit recently, and I wanted to reach out personally. We have some new titles coming out this week that based on your history I think you would really enjoy --- [specific recommendation]. We also recently launched [new feature] that makes it easier to find shows you will love. Would you like me to set up a personalized watchlist for you?"

The conversation addresses the actual problem --- fading interest --- with content discovery, not billing fixes or generic retention offers.


Building the Production Explanation Pipeline

In production, you do not generate waterfalls manually. You build a pipeline that scores every customer, computes SHAP values, and produces a structured explanation for each flagged customer.

def generate_customer_explanations(model, explainer, X, threshold=0.4, top_n=3):
    """
    Generate structured SHAP explanations for all customers
    above the risk threshold.

    Returns a DataFrame with one row per flagged customer,
    including top N reasons.
    """
    probs = model.predict_proba(X)[:, 1]
    shap_vals = explainer.shap_values(X)

    flagged_mask = probs > threshold
    flagged_indices = np.where(flagged_mask)[0]

    records = []
    for idx in flagged_indices:
        sv = shap_vals[idx]
        top_feat_indices = np.argsort(np.abs(sv))[::-1][:top_n]

        record = {
            'customer_index': idx,
            'churn_probability': round(probs[idx], 3),
        }

        for rank, feat_idx in enumerate(top_feat_indices, 1):
            feat_name = X.columns[feat_idx]
            feat_val = X.iloc[idx, feat_idx]
            record[f'reason_{rank}_feature'] = feat_name
            record[f'reason_{rank}_value'] = feat_val
            record[f'reason_{rank}_shap'] = round(sv[feat_idx], 4)
            record[f'reason_{rank}_direction'] = (
                'increases risk' if sv[feat_idx] > 0 else 'decreases risk'
            )

        records.append(record)

    return pd.DataFrame(records)


# Generate explanations
explanations = generate_customer_explanations(
    model, explainer, X_test, threshold=0.4, top_n=3
)

print(f"Flagged customers: {len(explanations)}")
print(f"\nSample explanations (first 5):")
print(explanations[[
    'customer_index', 'churn_probability',
    'reason_1_feature', 'reason_1_value', 'reason_1_direction',
    'reason_2_feature', 'reason_2_value', 'reason_2_direction',
    'reason_3_feature', 'reason_3_value', 'reason_3_direction',
]].head().to_string(index=False))

Categorizing Churn Drivers

# What are the most common "reason 1" features across all flagged customers?
reason_1_counts = explanations['reason_1_feature'].value_counts()
print("\nMost common primary churn driver:")
print(reason_1_counts.to_string())

# Categorize customers by their primary churn driver
def categorize_churn_type(row):
    r1 = row['reason_1_feature']
    if r1 == 'days_since_last_session':
        return 'Inactivity'
    elif r1 in ('payment_failures_6m', 'support_tickets_90d'):
        return 'Friction'
    elif r1 in ('hours_change_pct', 'sessions_change_pct'):
        return 'Declining Engagement'
    else:
        return 'Other'

explanations['churn_type'] = explanations.apply(categorize_churn_type, axis=1)

print("\nChurn type distribution:")
print(explanations['churn_type'].value_counts().to_string())
print(f"\nThis categorization helps the CS team route customers to "
      f"the right intervention track.")
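Routing can then respect the team's daily capacity on each track: take the highest-risk customers within each churn type, up to that track's limit. A sketch with an invented flagged-customer table and hypothetical per-track capacities:

```python
import pandas as pd

# Toy flagged-customer table mirroring the pipeline's output (values invented).
flagged = pd.DataFrame({
    'customer_index':    range(8),
    'churn_probability': [0.81, 0.77, 0.72, 0.66, 0.61, 0.55, 0.48, 0.42],
    'churn_type':        ['Inactivity', 'Friction', 'Inactivity',
                          'Declining Engagement', 'Friction', 'Inactivity',
                          'Other', 'Declining Engagement'],
})

# Hypothetical daily call capacity per intervention track.
CAPACITY = {'Inactivity': 2, 'Friction': 2, 'Declining Engagement': 1, 'Other': 1}

queues = {
    track: grp.nlargest(CAPACITY.get(track, 0), 'churn_probability')
    for track, grp in flagged.groupby('churn_type')
}
for track, q in queues.items():
    print(track, q['customer_index'].tolist())
```

Per-track quotas keep one dominant churn type (usually inactivity) from crowding every other intervention out of the daily queue.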

Key Lessons

  1. Same risk score, different stories. The Ghost, the Frustrated User, and the Drifter all have elevated churn probabilities, but for completely different reasons. Without SHAP, they are all just "high-risk customers." With SHAP, they are three distinct problems with three distinct solutions.

  2. SHAP enables personalized interventions. A generic "we miss you" email is cheap but ineffective. A personalized outreach call that addresses the specific reasons the model flagged this customer is expensive but effective. SHAP is what makes personalization possible at scale.

  3. The production pipeline is a DataFrame, not a plot. Waterfall plots are great for analysis and presentations. But in production, the customer success team needs a structured table: customer ID, risk score, top 3 reasons, feature values. Build the pipeline that produces this table automatically.

  4. Churn type categorization enables routing. By categorizing the primary driver (inactivity, friction, declining engagement), you can route customers to different intervention tracks. Inactivity customers get re-engagement campaigns. Friction customers get billing and support resolution. Declining engagement customers get content recommendations.

  5. The model is a conversation starter, not a decision. The customer success representative's judgment still matters. The model cannot see that the Ghost just had a baby, that the Frustrated User's billing issue was already resolved yesterday, or that the Drifter just signed up for a competitor. SHAP provides the starting point; the human provides the context.


This case study supports Chapter 19: Model Interpretation. Return to the chapter for the complete treatment of SHAP, PDP, permutation importance, and LIME.