Case Study 1: StreamFlow Churn --- Framing the Right Question
Background
StreamFlow's VP of Product, Jenna Park, walks into a meeting with the data science team and says: "We're bleeding subscribers. I need a model that tells us who's going to churn."
That sentence contains at least four ambiguities that will determine whether the project succeeds or fails. This case study walks through the process of turning a vague business request into a precise, actionable machine learning problem --- and shows how different framings of the "same" problem lead to fundamentally different solutions.
The Business Context
StreamFlow is a B2C subscription streaming analytics platform. Content creators --- YouTubers, podcasters, Twitch streamers --- pay between $9.99 and $49.99 per month for dashboards, audience insights, and growth analytics.
The company's vital signs:
| Metric | Value |
|---|---|
| Total subscribers | 2.4 million |
| Monthly churn rate | 8.2% |
| Industry benchmark | 5-7% monthly |
| Annual recurring revenue | $180M |
| Customer acquisition cost | $62 |
| Average revenue per user | $18.40/month |
| Customer lifetime value (at current churn) | $224 |
| Monthly churners | ~197,000 |
The 8.2% churn rate is above the industry benchmark of 5-7%. Reducing it by even one percentage point would retain an additional 24,000 subscribers per month, worth approximately $5.3M in annual revenue.
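The arithmetic behind that estimate can be sketched directly from the table's figures (2.4M subscribers, $18.40 monthly ARPU):

```python
# Back-of-envelope value of a churn reduction, using the figures above.
SUBSCRIBERS = 2_400_000
ARPU_MONTHLY = 18.40

def value_of_churn_reduction(points: float) -> tuple[int, float]:
    """Subscribers retained per month, and annual revenue retained,
    for a churn-rate reduction of `points` percentage points."""
    retained_per_month = int(SUBSCRIBERS * points / 100)
    annual_revenue = retained_per_month * ARPU_MONTHLY * 12
    return retained_per_month, annual_revenue

retained, revenue = value_of_churn_reduction(1.0)
print(retained)   # 24000 subscribers per month
print(revenue)    # 5299200.0 -> roughly $5.3M per year
```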
The Framing Exercise: Four Versions of "Churn Prediction"
The phrase "predict churn" is not a machine learning problem. It is the starting point for at least four distinct ML problems, each with different targets, features, evaluation criteria, and business implications.
Framing A: Who Will Churn? (Classification)
Target: Binary. churned_within_30_days (1 = canceled subscription within 30 days of the prediction date, 0 = retained).
Observation unit: Subscriber-month. One row per active subscriber per prediction cycle (monthly).
Output: Probability of churn for each subscriber. Rank by risk.
Action: Retention team contacts the top 5% highest-risk subscribers with targeted offers.
Evaluation: AUC-ROC (discrimination), precision at top 5% (are we targeting the right people?).
This is the most common framing and the one we will use for the progressive project. It is straightforward, actionable, and well-suited to standard classification algorithms.
Limitation: It tells you who will churn but not why or when within the 30-day window.
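The "precision at top 5%" metric maps directly to the retention team's capacity: of the subscribers the model would have them contact, how many actually churn? A minimal sketch, with synthetic churn scores standing in for a real model's output (the score distributions here are invented for illustration):

```python
import numpy as np

def precision_at_top_k(y_true, scores, frac=0.05):
    """Precision among the top `frac` of subscribers ranked by churn
    score --- the ones the retention team would actually contact."""
    k = max(1, int(len(scores) * frac))
    top = np.argsort(scores)[::-1][:k]   # highest-risk subscribers first
    return y_true[top].mean()

# Synthetic stand-in for a real model: churners tend (imperfectly)
# to receive higher scores than non-churners.
rng = np.random.default_rng(0)
n = 10_000
churned = rng.random(n) < 0.082                                # ~8.2% base rate
scores = np.where(churned, rng.beta(4, 2, n), rng.beta(2, 4, n))
print(precision_at_top_k(churned, scores))   # well above the 8.2% base rate
```

A random ranking would score about 0.082 here; any useful model should concentrate churners in the top bucket.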
Framing B: When Will They Churn? (Survival Analysis / Regression)
Target: Continuous. days_until_churn for subscribers who eventually cancel; censored observations for those who are still active.
Observation unit: Subscriber at a point in time. One row per subscriber, with a time-to-event target.
Output: Expected time until cancellation. Survival curve per subscriber.
Action: Prioritize intervention by urgency. A subscriber expected to churn in 3 days gets a phone call; one expected to churn in 25 days gets an email.
Evaluation: Concordance index (C-index), calibration of predicted survival curves.
This framing is more informative than binary classification --- it captures not just if but when. The tradeoff: survival models are harder to build, harder to evaluate, and harder to explain to stakeholders.
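The concordance index mentioned above needs nothing more than pairwise comparisons: among comparable pairs, how often does the model assign the higher risk score to the subscriber who churns first? A toy sketch (the times and risk scores are invented):

```python
import itertools

def concordance_index(times, events, risk_scores):
    """Fraction of comparable pairs where the model ranks the earlier
    churner as higher risk. 1.0 = perfect ordering, 0.5 = random.
    events[i] is 1 if subscriber i churned, 0 if censored (still active)."""
    concordant = comparable = 0
    for i, j in itertools.combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier observation is censored: ordering unknowable
        comparable += 1
        if risk_scores[first] > risk_scores[second]:
            concordant += 1
        elif risk_scores[first] == risk_scores[second]:
            concordant += 0.5  # ties count as half
    return concordant / comparable

# Three observed churners and one still-active (censored) subscriber
times  = [5, 20, 40, 60]
events = [1, 1, 1, 0]
risk   = [0.9, 0.7, 0.4, 0.2]   # model ranks earlier churners as riskier
print(concordance_index(times, events, risk))  # 1.0
```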
Framing C: Why Are They Churning? (Causal Inference)
Target: Causal effect of specific interventions (discount, feature unlock, customer success call) on churn probability.
Observation unit: Subscriber exposed to an intervention vs. matched control.
Output: Estimated treatment effect. "Offering a 20% discount reduces churn probability by 4.2 percentage points for subscribers in months 3-6 of their tenure."
Action: Allocate intervention budget to the subscriber-intervention pairs with the highest expected lift.
Evaluation: Average treatment effect (ATE), conditional average treatment effect (CATE), heterogeneous treatment effects.
This is the framing that answers the real business question: not "who will churn?" but "who will churn that we can save?" A subscriber who will churn regardless of intervention is not a useful prediction target --- targeting them wastes the retention budget. A subscriber who would have stayed regardless does not need an offer. The valuable prediction is the subscriber whose behavior changes because of the intervention.
Limitation: Requires experimental or quasi-experimental data (A/B tests on retention offers). You cannot estimate causal effects from observational data alone without strong assumptions.
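Given A/B-test data, one common way to operationalize this framing is a two-model ("T-learner") uplift estimate: model churn separately for treated and control subscribers, then subtract. A deliberately tiny sketch using segment-level churn rates as the "models" (all data and names here are hypothetical):

```python
import numpy as np

def t_learner_uplift(segment, treated, churned, segments):
    """Two-model ('T-learner') uplift sketch on one categorical feature:
    estimate P(churn) for treated and control subscribers within each
    segment, and report the difference as the lift from the offer."""
    uplift = {}
    for s in segments:
        mask = segment == s
        p_control = churned[mask & ~treated].mean()  # churn rate, no offer
        p_treated = churned[mask & treated].mean()   # churn rate, offer sent
        uplift[s] = float(p_control - p_treated)     # churn reduction
    return uplift

# Toy A/B-test data: the offer saves new subscribers, but long-tenured
# subscribers churn either way (the "lost causes" from the text).
segment = np.array(["new"] * 4 + ["tenured"] * 4)
treated = np.array([False, False, True, True] * 2)
churned = np.array([True, True, False, False,   # new: offer prevents churn
                    True, True, True, True])    # tenured: churn regardless
print(t_learner_uplift(segment, treated, churned, ["new", "tenured"]))
# {'new': 1.0, 'tenured': 0.0}
```

In practice each "model" would be a real classifier over many features, but the budget logic is the same: spend on segments with high estimated lift, not simply high churn risk.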
Framing D: What Segments Are at Risk? (Clustering + Descriptive)
Target: None (unsupervised). Discover natural groups among churned subscribers.
Observation unit: Churned subscriber. One row per subscriber who canceled in the last 90 days.
Output: Cluster labels and cluster profiles. "Cluster 1: new subscribers (< 3 months) who never engaged with the advanced analytics features. Cluster 2: long-tenured subscribers who downgraded from Pro to Basic before canceling."
Action: Design different retention strategies for different churn segments. One-size-fits-all offers become tailored interventions.
Evaluation: Silhouette score (cluster quality), business interpretability, actionability of segments.
This framing does not predict anything. It describes. But the descriptions can be more valuable than predictions if they reveal distinct churn pathways that require distinct interventions.
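To make the mechanics concrete, here is a minimal k-means sketch over two invented churner features (tenure in months, feature-usage rate). In practice you would use a library implementation such as scikit-learn's KMeans, standardized features, and many more of them; this only shows the loop:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Minimal Lloyd's algorithm: alternate between assigning points to
    the nearest center and moving each center to its members' mean."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each churned subscriber to the nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its members (keep it if empty)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels, centers

# Hypothetical churner features: (tenure_months, feature_usage_rate)
rng = np.random.default_rng(1)
new_unengaged = rng.normal([2, 0.1], 0.3, size=(50, 2))   # "Cluster 1"-style
tenured = rng.normal([30, 0.8], 0.3, size=(50, 2))        # "Cluster 2"-style
X = np.vstack([new_unengaged, tenured])
labels, centers = kmeans(X, k=2)
```

The output is exactly the raw material for the cluster profiles described above: each label group gets summarized and named by a human.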
Comparing the Framings
| Dimension | A: Classification | B: Survival | C: Causal | D: Clustering |
|---|---|---|---|---|
| Question | Who? | When? | Why does intervention work? | What types? |
| Target | Binary | Time-to-event | Treatment effect | None |
| Data needed | Observational | Observational (censored) | Experimental (A/B test) | Observational (churners) |
| Complexity | Low | Medium | High | Low-Medium |
| Actionability | Medium (who to target) | High (urgency) | Very high (ROI per offer) | Medium (strategy design) |
| Model types | Logistic regression, GBM, RF | Cox regression, Random Survival Forests | Uplift models, causal forests | K-Means, DBSCAN, hierarchical |
The right framing depends on:
- What data you have. If you have no A/B test data on retention offers, Framing C is not feasible yet (but you should start collecting that data).
- What operational capacity you have. If the retention team can handle 5,000 calls per month, you need a ranking (Framing A). If they need segment-specific playbooks, you need Framing D.
- What stage the company is at. A company with no churn model starts with Framing A. A company that already has a churn model and wants to optimize retention ROI evolves to Framing C.
The StreamFlow Decision
For this book's progressive project, we adopt Framing A (binary classification) as the primary problem, with elements of Framing D (we will segment churners in Chapter 20 when we cover clustering).
Here is the formal problem specification:
```python
# StreamFlow Churn Prediction --- Problem Specification
PROBLEM_SPEC = {
    # Business context
    'business_question': (
        'Which active subscribers should the retention team target '
        'with offers in the next monthly campaign?'
    ),

    # ML specification
    'task_type': 'binary_classification',
    'target_variable': 'churned_within_30_days',
    'target_definition': (
        '1 if subscriber cancels within 30 calendar days of prediction date; '
        '0 otherwise. "Cancel" means explicit cancellation or non-renewal after '
        'failed payment recovery (3 retry attempts over 14 days).'
    ),
    'observation_unit': 'subscriber-month',
    'prediction_frequency': 'First business day of each month',
    'prediction_horizon': '30 calendar days',

    # Feature constraints
    'feature_cutoff': (
        'All features computed using data available BEFORE the prediction date. '
        'No future information. Billing events processed with a 2-day lag.'
    ),
    'excluded_features': [
        'cancellation_reason (only known post-churn)',
        'exit_survey_responses (only known post-churn)',
        'retention_offer_sent (would create feedback loop)',
    ],

    # Evaluation
    'primary_metric': 'AUC-ROC',
    'secondary_metrics': [
        'Precision at top 5% (operational capacity of retention team)',
        'Recall at top 5% (coverage of actual churners)',
        'Log-loss (probability calibration)',
    ],
    'baseline': 'Predict majority class (no churn): accuracy = 91.8%, AUC = 0.50',

    # Operational constraints
    'max_subscribers_targeted': 120_000,  # 5% of 2.4M
    'retention_offer_budget': '$15/subscriber/month',
    'model_refresh_frequency': 'Monthly retrain on rolling 12-month window',
}
```
Notice the precision of this specification. The target definition handles edge cases (what counts as a cancellation?). The feature cutoff prevents leakage. The excluded features list catches the most dangerous variables. The evaluation metrics map to both model quality (AUC) and operational reality (precision at the retention team's capacity).
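The feature cutoff is the kind of rule worth enforcing in code rather than by convention. One way to sketch it, following the spec's "before the prediction date" and "2-day billing lag" rules (the event format and helper names here are illustrative, not part of the spec):

```python
from datetime import date, timedelta

def feature_window(prediction_date: date, billing_lag_days: int = 2):
    """Per the spec: features may use events strictly before the
    prediction date, and billing events only up to the processing lag."""
    return {
        "usage_cutoff": prediction_date,                                   # exclusive
        "billing_cutoff": prediction_date - timedelta(days=billing_lag_days),
    }

def usable_events(events, cutoffs):
    """Keep only events the model is allowed to see on the prediction date."""
    kept = []
    for e in events:   # e = {"ts": date, "kind": "usage" | "billing"}
        cutoff = (cutoffs["billing_cutoff"] if e["kind"] == "billing"
                  else cutoffs["usage_cutoff"])
        if e["ts"] < cutoff:
            kept.append(e)
    return kept

cutoffs = feature_window(date(2024, 6, 3))   # example first-business-day run
events = [
    {"ts": date(2024, 6, 2), "kind": "usage"},     # allowed
    {"ts": date(2024, 6, 2), "kind": "billing"},   # inside the 2-day lag: excluded
    {"ts": date(2024, 5, 30), "kind": "billing"},  # allowed
]
print(len(usable_events(events, cutoffs)))  # 2
```

Filtering at the event level, before any aggregation, means every derived feature inherits the cutoff automatically.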
What Goes Wrong Without Good Framing
To appreciate why this level of precision matters, consider three framing failures that have occurred at real companies (names changed):
Failure 1: The infinite time horizon. A SaaS company defined their target as "will this customer ever churn?" Since all customers eventually leave (die, go out of business, switch products), this is trivially true for everyone. The model learned to predict tenure rather than imminent churn, and its predictions had no operational value.
Failure 2: The leaky feature. An e-commerce company included "number of items returned this month" as a feature for predicting "will this customer make a purchase this month?" Returns only happen after purchases --- so the feature was a near-perfect proxy for the target, giving the model a fraudulent AUC of 0.97.
Failure 3: The unactionable prediction. A telecom company built a churn model with excellent performance. The top predictor was "customer is in the final month of their contract." This is true, predictive, and completely useless --- the retention team already knew this. The model added no information beyond what a simple calendar query provided.
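Failure 2 is easy to reproduce. The sketch below scores a synthetic leaky "returns" feature with a rank-based AUC and contrasts it with an uninformative feature; the data is invented, so the exact number will differ from the 0.97 in the anecdote, but the inflation is the point:

```python
import numpy as np

def auc(y, scores):
    """Rank-based AUC: probability that a random positive example
    outranks a random negative one (ties count half)."""
    pos, neg = scores[y == 1], scores[y == 0]
    wins = ((pos[:, None] > neg[None, :]).sum()
            + 0.5 * (pos[:, None] == neg[None, :]).sum())
    return wins / (len(pos) * len(neg))

rng = np.random.default_rng(0)
n = 2_000
purchased = rng.random(n) < 0.3
# Leaky feature: returns can only follow purchases, so nonzero counts
# appear exclusively among the positives --- a proxy for the target.
returns = np.where(purchased, rng.poisson(1.5, n), 0)
honest = rng.normal(size=n)   # an uninformative feature, for contrast

print(auc(purchased.astype(int), returns))  # inflated, far above chance
print(auc(purchased.astype(int), honest))   # ~0.50
```

The leaky feature looks brilliant in backtesting and is worthless at prediction time, when the returns have not happened yet.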
Discussion Questions
- Framing tradeoffs. StreamFlow's VP of Product originally asked for "a model that tells us who's going to churn." We translated that into four different ML framings. In what situations would Framing B (survival analysis) be clearly preferable to Framing A (classification)? What additional data would you need?
- Target definition edge cases. The problem specification defines cancellation as "explicit cancellation or non-renewal after failed payment recovery." What about subscribers who downgrade from Pro ($49.99) to Basic ($9.99) --- is that churn? How would including or excluding downgrades change the model's behavior? Propose a target definition that captures "meaningful churn" and explain your reasoning.
- Feature availability. The specification excludes retention_offer_sent as a feature because it would "create a feedback loop." Explain what this means. What happens if you include this feature and then deploy the model to decide who receives retention offers?
- Business case sensitivity. The Chapter 1 business case estimated $9.7M in annual net value. Identify the three assumptions that most affect this estimate. For each, describe a scenario where the assumption is optimistic and one where it is pessimistic.
- Framing C in practice. Suppose StreamFlow runs an A/B test on retention offers (50% of high-risk subscribers get an offer, 50% do not). After 3 months, you have outcome data. How would you use this data to evolve from Framing A (predict who will churn) to Framing C (predict who will churn AND respond to an offer)? What is the new target variable?
- The observation unit problem. Why is "subscriber-month" a better observation unit than "subscriber" for this problem? Construct a concrete example where using "subscriber" as the observation unit would bias the model's predictions.
This case study supports Chapter 1: From Analysis to Prediction. Return to the chapter or continue to Case Study 2: Metro General Hospital.