Case Study 1: StreamFlow ROI and Stakeholder Presentation
Background
StreamFlow's churn prediction model has been in production for three months. The model --- a LightGBM classifier trained on 50,000 subscribers with 19 features --- was built in Chapters 11--19, deployed as a FastAPI endpoint in Chapter 31, monitored for drift in Chapter 32, and audited for fairness in Chapter 33. The customer success team receives a weekly "high-risk" list: subscribers with predicted churn probability above 0.20 get a retention intervention.
The VP of Customer Success, Rachel Torres, loves the model. Her team has been using it to prioritize outreach, and anecdotally, it seems to be working. But anecdotes are not evidence, and Rachel has a budget meeting next week. The CFO wants to know whether the data science infrastructure is worth its cost. Rachel has asked the data science team for a single slide that proves it.
This case study walks through the full ROI analysis, the stakeholder presentation, and the hard questions that follow.
Phase 1: Gathering the Numbers
The first step is collecting the actual business outcomes. For this case study, we simulate three months of production data with the same shape the monitoring pipeline and CRM would provide.
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix
np.random.seed(42)
# Production data from three months of deployment
# In practice, these come from the monitoring pipeline (Chapter 32)
# and the CRM system that tracks intervention outcomes.
n_subscribers = 10000
# True outcomes: did the subscriber churn within 60 days?
churn_rate = 0.12
y_true = np.random.binomial(1, churn_rate, n_subscribers)
# Model predictions (simulated to match AUC ~ 0.88)
probs = np.where(
    y_true == 1,
    np.random.beta(4, 2, n_subscribers),
    np.random.beta(1.5, 5, n_subscribers),
)
# Current threshold: 0.20 (set in Chapter 16)
threshold = 0.20
y_pred = (probs >= threshold).astype(int)
# Confusion matrix
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
print(f"True Positives: {tp:>6,}")
print(f"False Positives: {fp:>6,}")
print(f"False Negatives: {fn:>6,}")
print(f"True Negatives: {tn:>6,}")
print(f"Precision: {tp / (tp + fp):.3f}")
print(f"Recall: {tp / (tp + fn):.3f}")
Establishing the Cost Structure
Rachel provides the cost structure from the customer success team:
# Annual subscriber values by plan tier (weighted average)
plan_distribution = {
    "basic": {"price": 9.99, "pct": 0.35},
    "standard": {"price": 14.99, "pct": 0.35},
    "premium": {"price": 24.99, "pct": 0.20},
    "family": {"price": 29.99, "pct": 0.10},
}
weighted_monthly_value = sum(
    tier["price"] * tier["pct"] for tier in plan_distribution.values()
)
weighted_annual_value = weighted_monthly_value * 12
print(f"Weighted average monthly subscriber value: ${weighted_monthly_value:.2f}")
print(f"Weighted average annual subscriber value: ${weighted_annual_value:.2f}")
# Intervention costs
intervention_costs = {
    "personalized_email": 2.50,  # automated, low cost
    "loyalty_discount": 15.00,   # average discount value
    "cs_agent_outreach": 12.50,  # 15 minutes of agent time
}
total_intervention_cost = sum(intervention_costs.values())
print(f"\nTotal intervention cost per subscriber: ${total_intervention_cost:.2f}")
# Intervention success rate (from A/B test of intervention program)
# This was measured by comparing churn rate of contacted high-risk
# subscribers vs. matched controls who were not contacted.
intervention_success_rate = 0.58 # 58% of contacted churners are retained
print(f"Intervention success rate: {intervention_success_rate:.0%}")
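The success rate from an A/B comparison like this is the relative reduction in churn among contacted high-risk subscribers versus matched controls. A sketch of the arithmetic, using hypothetical counts (not StreamFlow's actual data) chosen to land near the 58% figure:

```python
# Hypothetical A/B counts for illustration only.
contacted = {"n": 1200, "churned": 252}  # high-risk subscribers who got the intervention
control = {"n": 1200, "churned": 600}    # matched high-risk subscribers, no contact

churn_treated = contacted["churned"] / contacted["n"]  # 0.21
churn_control = control["churned"] / control["n"]      # 0.50

# Of the subscribers who would have churned, the fraction the intervention retained:
success_rate = (churn_control - churn_treated) / churn_control
print(f"Intervention success rate: {success_rate:.0%}")  # 58%
```

The matched-control comparison matters: a raw churn rate among contacted subscribers would conflate the intervention's effect with the model's targeting.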
Computing the Expected Value per Cell
# Net value per True Positive
# If we correctly identify a churner and the intervention succeeds,
# we save the annual subscription revenue minus the intervention cost.
# Expected value accounts for the success rate.
tp_expected_value = (
    weighted_annual_value * intervention_success_rate
    - total_intervention_cost
)
print(f"Expected value per TP: ${tp_expected_value:.2f}")
# Cost per False Positive
# We contact a loyal subscriber unnecessarily. Cost = intervention cost.
# There is also a small brand damage risk, which we estimate at $5.
fp_cost = total_intervention_cost + 5.0
print(f"Cost per FP: ${fp_cost:.2f}")
# Cost per False Negative
# We miss a churner. They cancel. We lose annual revenue.
# There is also a customer acquisition cost to replace them (~$45).
fn_cost = weighted_annual_value + 45.0
print(f"Cost per FN: ${fn_cost:.2f}")
# True Negatives
tn_value = 0.0
print(f"Value per TN: ${tn_value:.2f}")
Phase 2: The ROI Calculation
# Monthly model economics
gross_benefit = tp * tp_expected_value
gross_cost_errors = (fp * fp_cost) + (fn * fn_cost)
# Operational costs
infrastructure_cost = 2500.0 # AWS (EC2, S3, monitoring)
team_cost = 4000.0 # 10% of senior DS salary + benefits
operational_cost = infrastructure_cost + team_cost
# Net value
net_value = gross_benefit - gross_cost_errors - operational_cost
# Baseline: no model (no interventions at all)
baseline_loss = sum(y_true) * fn_cost # every churner is a missed FN
# Model lift over baseline
lift = net_value + baseline_loss  # baseline net value is -baseline_loss
print("=" * 60)
print("STREAMFLOW CHURN MODEL --- MONTHLY ROI ANALYSIS")
print("=" * 60)
print(f"\nConfusion Matrix:")
print(f" True Positives: {tp:>8,} (churners caught)")
print(f" False Positives: {fp:>8,} (loyal flagged)")
print(f" False Negatives: {fn:>8,} (churners missed)")
print(f" True Negatives: {tn:>8,} (loyal, no action)")
print(f"\nGross Benefit (TPs): ${gross_benefit:>12,.2f}")
print(f"Gross Cost (FPs+FNs): ${gross_cost_errors:>12,.2f}")
print(f"Operational Cost: ${operational_cost:>12,.2f}")
print(f" Infrastructure: ${infrastructure_cost:>12,.2f}")
print(f" Team time: ${team_cost:>12,.2f}")
print(f"\nNet Monthly Value: ${net_value:>12,.2f}")
print(f"Baseline Loss: ${-baseline_loss:>12,.2f}")
print(f"Model Lift (vs none): ${lift:>12,.2f}")
print(f"\nAnnual Projection: ${net_value * 12:>12,.2f}")
Threshold Sensitivity
Rachel asks: "What if we changed the threshold? Could we catch more churners?"
def compute_value_at_threshold(y_true, y_proba, threshold, tp_val, fp_val, fn_val):
    """Confusion-matrix counts and net value at a given decision threshold."""
    y_pred = (y_proba >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "threshold": threshold,
        "tp": tp,
        "fp": fp,
        "fn": fn,
        "precision": tp / (tp + fp) if (tp + fp) > 0 else 0,
        "recall": tp / (tp + fn) if (tp + fn) > 0 else 0,
        "net_value": (tp * tp_val) - (fp * fp_val) - (fn * fn_val),
    }
thresholds = np.arange(0.05, 0.60, 0.05)
results = [
    compute_value_at_threshold(
        y_true, probs, t, tp_expected_value, fp_cost, fn_cost
    )
    for t in thresholds
]
results_df = pd.DataFrame(results)
results_df["net_value_formatted"] = results_df["net_value"].apply(
    lambda x: f"${x:,.0f}"
)
print("\nThreshold Sensitivity Analysis")
print("-" * 80)
print(
    results_df[["threshold", "tp", "fp", "fn", "precision", "recall",
                "net_value_formatted"]].to_string(index=False)
)
# Find the threshold with the highest net value
best = results_df.loc[results_df["net_value"].idxmax()]
print(f"\nOptimal threshold: {best['threshold']:.2f}")
print(f"Net value at optimal: ${best['net_value']:,.0f}")
# np.arange produces floats, so compare with np.isclose rather than ==
current_value = results_df.loc[
    np.isclose(results_df["threshold"], 0.20), "net_value"
].values[0]
print(f"Net value at current (0.20): ${current_value:,.0f}")
Finding --- The optimal threshold is typically between 0.10 and 0.20 for this cost structure because missing a churner ($220+) is much more expensive than a false alarm ($35). The current threshold of 0.20 is close to optimal but may be slightly conservative. Lowering it to 0.15 would catch additional churners at the cost of more false alarms --- the net effect depends on the exact precision/recall tradeoff at that threshold.
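One way to see why the optimum sits below 0.20 is marginal analysis: compare what one extra catch is worth against what one extra false alarm costs. A back-of-envelope sketch using the cost structure above:

```python
# Figures from the cost structure above.
weighted_annual_value = 200.88   # weighted average annual subscriber value
intervention_success = 0.58
intervention_cost = 30.00        # total cost of the intervention bundle
fp_cost = 35.00                  # intervention cost + brand-damage estimate
fn_cost = weighted_annual_value + 45.00  # lost revenue + replacement CAC

# Lowering the threshold converts some FNs into TPs (and some TNs into FPs).
# One converted churner adds the TP's expected value AND removes the FN's cost.
gain_per_extra_catch = (
    weighted_annual_value * intervention_success - intervention_cost
) + fn_cost
breakeven_fps_per_catch = gain_per_extra_catch / fp_cost

print(f"Gain per extra churner caught: ${gain_per_extra_catch:.2f}")
print(f"Break-even false alarms per extra catch: {breakeven_fps_per_catch:.1f}")
```

As long as each additional catch brings fewer than roughly nine or ten new false alarms, lowering the threshold adds value, which is why the optimum lands in the 0.10--0.20 range for this cost structure.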
Phase 3: The One-Slide Executive Summary
Rachel needs one slide for the CFO. Here is the content:
def build_executive_slide(
    model_name: str,
    monthly_net_value: float,
    annual_projection: float,
    monthly_cost: float,
    catch_rate: float,
    flag_accuracy: float,
    subscribers_flagged: int,
    subscribers_total: int,
    months_in_production: int,
    cumulative_value: float,
) -> str:
    """Build a plain-text executive summary slide."""
    lines = [
        "=" * 65,
        f" {model_name.upper()} --- EXECUTIVE SUMMARY",
        "=" * 65,
        "",
        " THE PROBLEM",
        # Assumes 12% churn and ~$220 total cost per lost subscriber.
        f" StreamFlow loses approximately ${subscribers_total * 0.12 * 220:,.0f}/year",
        " to preventable subscriber churn.",
        "",
        " THE SOLUTION",
        " An ML model identifies subscribers likely to cancel within",
        " 60 days. The customer success team contacts flagged subscribers",
        " with personalized retention offers.",
        "",
        " RESULTS",
        f" Catches {catch_rate:.0%} of subscribers who would have canceled",
        f" Of those flagged, {flag_accuracy:.0%} are genuine churn risks",
        f" Flags {subscribers_flagged:,} of {subscribers_total:,} subscribers "
        f"({subscribers_flagged / subscribers_total:.1%}) per month",
        "",
        " FINANCIAL IMPACT",
        f" Monthly net value: ${monthly_net_value:>12,.0f}",
        f" Monthly operating cost: ${monthly_cost:>12,.0f}",
        f" Annual projection: ${annual_projection:>12,.0f}",
        f" Cumulative ({months_in_production} months): "
        f"${cumulative_value:>12,.0f}",
        # ROI = net value / operating cost
        f" ROI: {monthly_net_value / monthly_cost * 100:>11.0f}%",
        "",
        " RECOMMENDATION",
        " Continue deployment. Evaluate threshold optimization to",
        " capture an additional estimated $3,000--$8,000/month.",
        "=" * 65,
    ]
    return "\n".join(lines)
slide = build_executive_slide(
    model_name="StreamFlow Churn Predictor",
    monthly_net_value=net_value,
    annual_projection=net_value * 12,
    monthly_cost=operational_cost,
    catch_rate=tp / (tp + fn),
    flag_accuracy=tp / (tp + fp),
    subscribers_flagged=tp + fp,
    subscribers_total=n_subscribers,
    months_in_production=3,
    cumulative_value=net_value * 3,
)
print(slide)
Phase 4: The Hard Questions
The CFO reviews the slide and asks three questions. The data scientist must be prepared for all of them.
Question 1: "How confident are you in the 58% intervention success rate?"
This is the most sensitive assumption. A sensitivity analysis is essential.
success_rates = np.arange(0.20, 0.85, 0.05)
sensitivity = []
for rate in success_rates:
    adj_tp_value = weighted_annual_value * rate - total_intervention_cost
    adj_net = (tp * adj_tp_value) - (fp * fp_cost) - (fn * fn_cost) - operational_cost
    sensitivity.append({
        "success_rate": f"{rate:.0%}",
        "tp_value": f"${adj_tp_value:.2f}",
        "monthly_net": f"${adj_net:,.0f}",
        "annual_net": f"${adj_net * 12:,.0f}",
        "profitable": "Yes" if adj_net > 0 else "NO",
    })
sensitivity_df = pd.DataFrame(sensitivity)
print("Sensitivity: Intervention Success Rate")
print("-" * 65)
print(sensitivity_df.to_string(index=False))
Key Finding --- The model remains profitable at success rates well below the measured 58%. Even if the intervention success rate drops to 30--35%, the model generates positive net value because the cost of missing churners ($220+ each) dominates the cost of unnecessary outreach ($35 each). This is the message for the CFO: the model is robust to pessimistic assumptions.
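A quick sanity check on this robustness claim is the per-true-positive break-even: the success rate at which an intervention no longer pays for itself on a correctly flagged churner. Using the figures from the cost structure above:

```python
# Figures from the cost structure above.
weighted_annual_value = 200.88   # weighted average annual subscriber value
total_intervention_cost = 30.00  # email + discount + agent outreach

# A true positive's expected value is annual_value * rate - intervention_cost.
# Setting that to zero gives the per-TP break-even success rate.
breakeven_rate = total_intervention_cost / weighted_annual_value
print(f"Per-TP break-even success rate: {breakeven_rate:.1%}")  # ~14.9%
```

The program-level break-even is higher than this, because the true positives must also cover false-positive, false-negative, and operational costs; that is what the sensitivity table captures.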
Question 2: "What happens if the churn rate changes?"
churn_rates = [0.08, 0.10, 0.12, 0.15, 0.18, 0.20]
churn_sensitivity = []
recall = tp / (tp + fn)  # hold recall fixed; scale TP/FN with the churn rate
for cr in churn_rates:
    n_churners = int(n_subscribers * cr)
    adj_tp = int(n_churners * recall)
    adj_fn = n_churners - adj_tp
    adj_net = (
        (adj_tp * tp_expected_value)
        - (fp * fp_cost)
        - (adj_fn * fn_cost)
        - operational_cost
    )
    churn_sensitivity.append({
        "churn_rate": f"{cr:.0%}",
        "churners": n_churners,
        "caught": adj_tp,
        "missed": adj_fn,
        "monthly_net": f"${adj_net:,.0f}",
    })
print("\nSensitivity: Churn Rate")
print("-" * 55)
print(pd.DataFrame(churn_sensitivity).to_string(index=False))
Question 3: "What is the payback period?"
# One-time costs
model_development_cost = 40000.0 # 2 months of DS time for initial build
infrastructure_setup = 8000.0 # initial cloud setup, Docker, CI/CD
total_initial_investment = model_development_cost + infrastructure_setup
# Monthly net value (from the ROI calculation)
payback_months = total_initial_investment / net_value if net_value > 0 else float("inf")
print(f"\nInitial investment: ${total_initial_investment:>12,.0f}")
print(f"Monthly net value: ${net_value:>12,.0f}")
print(f"Payback period: {payback_months:>12.1f} months")
print(f"12-month net return: ${(net_value * 12 - total_initial_investment):>12,.0f}")
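A follow-up a CFO may raise is discounted payback, which weights later months less. A sketch, assuming a 12% annual discount rate (a placeholder cost of capital) and an assumed round monthly net value; substitute the actual figure from the ROI calculation:

```python
# Investment figures from the case study; monthly value and rate are ASSUMED.
total_initial_investment = 48_000.0  # development ($40K) + setup ($8K)
monthly_net_value = 30_000.0         # placeholder; use your computed net value
annual_discount_rate = 0.12          # assumed cost of capital
monthly_rate = (1 + annual_discount_rate) ** (1 / 12) - 1

# Accumulate discounted monthly value until the investment is recovered.
cumulative = -total_initial_investment
month = 0
while cumulative < 0:
    month += 1
    cumulative += monthly_net_value / (1 + monthly_rate) ** month

print(f"Discounted payback reached in month {month}")
print(f"Cumulative discounted value: ${cumulative:,.0f}")
```

At these figures the discount barely moves the answer; it matters more when the monthly net value is small relative to the initial investment.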
Phase 5: What the Data Science Team Learned
Three months of running the model in production taught the team lessons that no holdout set could:
Lesson 1: The intervention success rate is the lever. The biggest driver of ROI is not model accuracy --- it is how well the customer success team converts flagged subscribers into retained ones. Improving the intervention (better offers, better timing, better messaging) has a larger impact on ROI than improving the model.
Lesson 2: Stakeholders do not care about your model. They care about their workflow. Rachel does not think about confusion matrices. She thinks about how many names are on the weekly list, how many of those names respond to outreach, and how her team's retention numbers look in the quarterly review. The model is invisible. The list is visible.
Lesson 3: The one-slide summary was worth more than the model documentation. The 15-page model card (Chapter 33) is essential for governance. The fairness audit is essential for ethics. But the one-slide ROI summary is what kept the project funded. Data science teams that cannot articulate their value in business terms will lose their budget to teams that can.
Lesson 4: Sensitivity analysis builds trust. When the CFO asked about the intervention success rate, having the sensitivity table ready --- showing the model is profitable even at pessimistic assumptions --- built more trust than any ROC curve ever could. Presenting a range ("$28K--$65K per month depending on assumptions") is more credible than a single number.
Discussion Questions
- If the intervention success rate dropped to 25%, would you recommend keeping the model in production? What additional information would you need to decide?
- The customer success team reports that some "false positives" are actually subscribers who were considering canceling but had not yet shown strong signals in the data. Should these be reclassified as true positives? How would this change the ROI calculation?
- StreamFlow is considering a premium intervention (personal phone call from a senior retention specialist) for the top 50 highest-risk subscribers each month. This costs $75 per call but has a 78% success rate. How would you evaluate whether this tiered intervention strategy improves ROI?
- The fairness audit from Chapter 33 revealed that the model has lower recall for subscribers on the basic plan. These subscribers have lower annual value. Should the threshold be adjusted by plan tier to equalize recall, even if it reduces overall ROI?
This case study supports Chapter 34: The Business of Data Science. Return to the chapter for full context.