Case Study 2: StreamFlow Content Recommendations for Churn Reduction
Background
StreamFlow, the SaaS streaming platform tracked throughout Part IV, has 2.1 million active subscribers and a monthly churn rate of 4.8%. The churn model from Chapter 17 predicts who is at risk. The association rule analysis from Chapter 23 identified which content combinations predict retention. This case study closes the loop: build a recommender system that steers at-risk subscribers toward content combinations that reduce churn.
Elena Vasquez, VP of Content Strategy, frames the objective: "We are not optimizing for clicks or watch time. We are optimizing for subscriber retention. The recommender should surface content that increases the probability of a subscriber staying next month. If that means recommending a documentary to an action-movie fan because the action+documentary combination predicts lower churn, we do it."
This is a fundamentally different objective from that of a standard recommender. Most recommenders optimize for engagement (clicks, views, time spent); StreamFlow's recommender optimizes for a downstream business outcome (retention), which requires a different evaluation framework.
The Data
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from surprise import Dataset, Reader, SVD
import matplotlib.pyplot as plt
np.random.seed(42)
# --- Simulate StreamFlow viewing data ---
n_subscribers = 8000
n_shows = 500
# Show metadata
genres = ['action', 'comedy', 'drama', 'documentary', 'sci_fi',
          'thriller', 'romance', 'horror', 'animation', 'true_crime',
          'cooking', 'reality', 'sports', 'kids', 'foreign']
show_genre = np.random.choice(genres, size=n_shows)
show_descriptions = []
desc_pools = {
    'action': ['intense battles high stakes adrenaline military combat'],
    'comedy': ['funny hilarious sitcom laughs witty humor'],
    'drama': ['emotional powerful character-driven storytelling compelling'],
    'documentary': ['informative educational real-world investigation reporting'],
    'sci_fi': ['futuristic technology space exploration alien worlds'],
    'thriller': ['suspense mystery tension psychological twists dark'],
    'romance': ['love relationship heartfelt emotional chemistry'],
    'horror': ['scary supernatural terrifying haunted psychological'],
    'animation': ['animated colorful creative family imaginative'],
    'true_crime': ['investigation murder mystery forensic detective'],
    'cooking': ['culinary chef recipes competition kitchen gourmet'],
    'reality': ['competition lifestyle entertainment unscripted social'],
    'sports': ['athletic competition training documentary team'],
    'kids': ['children educational fun animated adventure'],
    'foreign': ['international subtitled cultural diverse cinema'],
}
for i in range(n_shows):
    g = show_genre[i]
    base = desc_pools[g][0]
    extra = np.random.choice(['award-winning', 'binge-worthy', 'critically-acclaimed',
                              'fan-favorite', 'new-release', 'classic', 'trending',
                              'exclusive', 'limited-series', 'multi-season'], size=2,
                             replace=False)
    show_descriptions.append(f"{g} {base} {' '.join(extra)}")
# Define "sticky" genre combinations (from Chapter 23 association rules)
sticky_combos = [
    frozenset(['drama', 'documentary']),
    frozenset(['action', 'sci_fi']),
    frozenset(['comedy', 'animation', 'kids']),
    frozenset(['thriller', 'true_crime']),
    frozenset(['cooking', 'documentary', 'reality']),
    frozenset(['drama', 'foreign']),
]
# Generate viewing data with churn labels
n_latent = 6
sub_factors = np.random.randn(n_subscribers, n_latent)
show_factors = np.random.randn(n_shows, n_latent)
views = []
for sub in range(n_subscribers):
    # Number of shows watched: varies by engagement
    n_watched = max(1, int(np.random.exponential(scale=15)))
    scores = sub_factors[sub] @ show_factors.T + np.random.randn(n_shows) * 0.3
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    watched = np.random.choice(n_shows, size=min(n_watched, 60),
                               replace=False, p=probs)
    for show_id in watched:
        # Engagement score (1-5, like a rating)
        raw = sub_factors[sub] @ show_factors[show_id] + np.random.randn() * 0.4
        engagement = np.clip(round(raw * 0.5 + 3.5), 1, 5)
        views.append({
            'subscriber_id': sub,
            'show_id': show_id,
            'engagement': float(engagement)
        })
views_df = pd.DataFrame(views)
views_df = views_df.drop_duplicates(
    subset=['subscriber_id', 'show_id']
).reset_index(drop=True)
# Generate churn labels based on:
# 1. Total engagement (more watching = less churn)
# 2. Genre diversity (more genres = less churn)
# 3. Sticky combos (watching sticky pairs = less churn)
sub_stats = views_df.groupby('subscriber_id').agg(
    n_shows=('show_id', 'count'),
    mean_engagement=('engagement', 'mean'),
    shows_watched=('show_id', list)
).reset_index()
def compute_churn_probability(row):
    """Lower probability = more likely to retain."""
    base_prob = 0.15
    # More shows watched -> lower churn
    base_prob -= min(row['n_shows'] * 0.003, 0.08)
    # Higher engagement -> lower churn
    base_prob -= (row['mean_engagement'] - 3) * 0.02
    # Genre diversity -> lower churn
    genres_watched = set(show_genre[s] for s in row['shows_watched'] if s < n_shows)
    base_prob -= min(len(genres_watched) * 0.01, 0.05)
    # Sticky combos -> lower churn (strongest effect)
    for combo in sticky_combos:
        if combo.issubset(genres_watched):
            base_prob -= 0.04
    return np.clip(base_prob, 0.01, 0.30)
sub_stats['churn_prob'] = sub_stats.apply(compute_churn_probability, axis=1)
sub_stats['churned'] = (np.random.rand(len(sub_stats)) < sub_stats['churn_prob']).astype(int)
print(f"Viewing data: {len(views_df)} subscriber-show pairs")
print(f"Subscribers: {views_df['subscriber_id'].nunique()}")
print(f"Shows: {views_df['show_id'].nunique()}")
print(f"Churn rate: {sub_stats['churned'].mean():.1%}")
print(f"\nChurn rate by engagement level:")
engagement_bins = pd.cut(sub_stats['n_shows'], bins=[0, 5, 10, 20, 100],
                         labels=['1-5', '6-10', '11-20', '21+'])
print(sub_stats.groupby(engagement_bins, observed=True)['churned'].mean().round(3))
Practical Note --- The churn rate varies by engagement level, confirming the hypothesis: subscribers who watch more are less likely to churn. But the question is what to recommend, not just how much. The sticky genre combinations from Chapter 23 provide the "what."
Step 1: Identify At-Risk Subscribers
# Subscribers at risk: top quartile of churn probability
# In production, this comes from the churn model (Chapter 17)
at_risk = sub_stats[sub_stats['churn_prob'] > sub_stats['churn_prob'].quantile(0.75)]
print(f"At-risk subscribers: {len(at_risk)} ({len(at_risk)/len(sub_stats):.0%})")
print(f"Mean churn prob (at-risk): {at_risk['churn_prob'].mean():.3f}")
print(f"Mean churn prob (others): "
      f"{sub_stats[~sub_stats['subscriber_id'].isin(at_risk['subscriber_id'])]['churn_prob'].mean():.3f}")
# What genres are at-risk subscribers watching?
at_risk_subs = set(at_risk['subscriber_id'])
at_risk_views = views_df[views_df['subscriber_id'].isin(at_risk_subs)]
at_risk_genres = []
for _, row in at_risk_views.iterrows():
    sid = int(row['show_id'])
    if sid < n_shows:
        at_risk_genres.append(show_genre[sid])
genre_dist = pd.Series(at_risk_genres).value_counts(normalize=True)
print(f"\nTop genres among at-risk subscribers:")
print(genre_dist.head(10).round(3))
Step 2: Standard Recommender (Engagement-Optimized)
First, build a standard SVD recommender that optimizes for predicted engagement. This is what most teams would build.
# Train-test split
def train_test_split_per_user(df, test_frac=0.2, random_state=42):
    rng = np.random.RandomState(random_state)
    train_parts, test_parts = [], []
    for uid, group in df.groupby('subscriber_id'):
        n_test = max(1, int(len(group) * test_frac))
        test_idx = rng.choice(group.index, size=n_test, replace=False)
        train_parts.append(group.drop(test_idx))
        test_parts.append(group.loc[test_idx])
    return pd.concat(train_parts), pd.concat(test_parts)
train_df, test_df = train_test_split_per_user(views_df)
test_items = test_df.groupby('subscriber_id')['show_id'].apply(set).to_dict()
train_items = train_df.groupby('subscriber_id')['show_id'].apply(set).to_dict()
# Standard SVD
reader = Reader(rating_scale=(1, 5))
train_surprise = Dataset.load_from_df(
    train_df[['subscriber_id', 'show_id', 'engagement']], reader
)
trainset = train_surprise.build_full_trainset()
svd_standard = SVD(n_factors=50, n_epochs=30, lr_all=0.005,
                   reg_all=0.02, random_state=42)
svd_standard.fit(trainset)
# Generate standard recommendations
all_shows = set(views_df['show_id'].unique())
standard_recs = {}
for uid in test_items:
    seen = train_items.get(uid, set())
    unseen = all_shows - seen
    preds = [(sid, svd_standard.predict(uid=uid, iid=sid).est) for sid in unseen]
    preds.sort(key=lambda x: x[1], reverse=True)
    standard_recs[uid] = [sid for sid, _ in preds[:20]]
Step 3: Retention-Aware Recommender
The retention-aware recommender modifies the ranking to boost shows from genres that form sticky combinations with the subscriber's existing viewing history.
def get_subscriber_genres(uid, train_df, show_genre):
    """Get the set of genres a subscriber has watched."""
    history = train_df[train_df['subscriber_id'] == uid]['show_id'].values
    return set(show_genre[sid] for sid in history if sid < len(show_genre))
def retention_boost(uid, show_id, train_df, show_genre, sticky_combos):
    """
    Compute a retention boost score for a show.

    If the show's genre completes a sticky combination with the subscriber's
    existing genre history, it gets a boost. The boost is higher for shows
    that complete a combo the subscriber hasn't activated yet.

    Parameters
    ----------
    uid : int
        Subscriber ID.
    show_id : int
        Candidate show.
    train_df : DataFrame
        Training data.
    show_genre : ndarray
        Genre per show.
    sticky_combos : list of frozensets
        Genre combinations that predict retention.

    Returns
    -------
    float
        Boost score (0.0 if no sticky combo completed, up to 1.0).
    """
    if show_id >= len(show_genre):
        return 0.0
    current_genres = get_subscriber_genres(uid, train_df, show_genre)
    candidate_genre = show_genre[show_id]
    extended_genres = current_genres | {candidate_genre}
    boost = 0.0
    for combo in sticky_combos:
        # Combo not yet activated, but this show would activate it?
        if not combo.issubset(current_genres) and combo.issubset(extended_genres):
            boost += 0.5  # high boost for completing a new combo
        elif combo.issubset(current_genres) and candidate_genre in combo:
            boost += 0.1  # mild boost for reinforcing an active combo
    return min(boost, 1.0)
def retention_aware_recommend(uid, train_df, svd_model, show_genre,
                              sticky_combos, all_shows, n=20,
                              retention_weight=0.3):
    """
    Hybrid recommender that blends engagement prediction with retention boost.

    score = (1 - retention_weight) * cf_score + retention_weight * retention_boost

    Parameters
    ----------
    uid : int
        Subscriber ID.
    train_df : DataFrame
        Training data.
    svd_model : surprise SVD
        Trained engagement model.
    show_genre : ndarray
        Genre per show.
    sticky_combos : list of frozensets
        Sticky genre combinations.
    all_shows : set
        All show IDs.
    n : int
        Number of recommendations.
    retention_weight : float
        How much to weight the retention boost
        (0 = pure engagement, 1 = pure retention).

    Returns
    -------
    list of int
        Recommended show IDs.
    """
    seen = set(train_df[train_df['subscriber_id'] == uid]['show_id'].values)
    candidates = all_shows - seen
    scored = []
    for sid in candidates:
        # CF engagement score (normalized from the 1-5 scale to 0-1)
        cf_pred = svd_model.predict(uid=uid, iid=sid).est
        cf_score = (cf_pred - 1) / 4
        # Retention boost
        ret_boost = retention_boost(uid, sid, train_df, show_genre, sticky_combos)
        # Blend
        final_score = (1 - retention_weight) * cf_score + retention_weight * ret_boost
        scored.append((sid, final_score))
    scored.sort(key=lambda x: x[1], reverse=True)
    return [sid for sid, _ in scored[:n]]
# Generate retention-aware recommendations
retention_recs = {}
for uid in test_items:
    retention_recs[uid] = retention_aware_recommend(
        uid, train_df, svd_standard, show_genre, sticky_combos, all_shows,
        n=20, retention_weight=0.3
    )
print("Retention-aware recommendations generated for all test users.")
Step 4: Evaluate Both Recommenders
Standard Ranking Metrics
def hit_rate_at_k(recs, test_items, k=10):
    hits, total = 0, 0
    for uid in test_items:
        if uid not in recs:
            continue
        if set(recs[uid][:k]) & test_items[uid]:
            hits += 1
        total += 1
    return hits / total if total > 0 else 0.0
def ndcg_at_k(recs, test_items, k=10):
    scores = []
    for uid in test_items:
        if uid not in recs:
            continue
        relevant = test_items[uid]
        ranked = recs[uid][:k]
        dcg = sum(1.0 / np.log2(i + 2) for i, item in enumerate(ranked)
                  if item in relevant)
        n_rel = min(len(relevant), k)
        idcg = sum(1.0 / np.log2(i + 2) for i in range(n_rel))
        scores.append(dcg / idcg if idcg > 0 else 0.0)
    return np.mean(scores) if scores else 0.0
print("Ranking Metric Comparison:")
print(f"{'Method':<30} {'HR@5':>8} {'HR@10':>8} {'NDCG@10':>8}")
print("-" * 55)
for name, recs in [('Standard SVD', standard_recs),
                   ('Retention-Aware Hybrid', retention_recs)]:
    hr5 = hit_rate_at_k(recs, test_items, k=5)
    hr10 = hit_rate_at_k(recs, test_items, k=10)
    ndcg = ndcg_at_k(recs, test_items, k=10)
    print(f"{name:<30} {hr5:>8.4f} {hr10:>8.4f} {ndcg:>8.4f}")
Practical Note --- The retention-aware recommender may score slightly lower on standard ranking metrics. This is expected and acceptable. It is intentionally sacrificing some engagement optimization to promote genre diversity that predicts retention. The question is whether the retention benefit outweighs the engagement cost.
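The tradeoff shows up directly in the blend formula. The sketch below uses made-up scores for five hypothetical shows (none drawn from the StreamFlow simulation) and illustrates how the top of the ranking shifts as retention_weight grows: at low weights the engagement-optimized action titles dominate, while at higher weights the combo-completing documentary and drama titles rise.

```python
# Toy sketch (invented scores, not StreamFlow data).
# The blend matches the hybrid's formula:
#   final = (1 - w) * cf_score + w * retention_boost, both on a 0-1 scale.
candidates = [
    ('action_hit',    0.95, 0.0),   # high engagement, no new sticky combo
    ('action_seq',    0.80, 0.0),
    ('documentary_a', 0.55, 0.5),   # modest engagement, completes a combo
    ('drama_b',       0.60, 0.5),
    ('reality_c',     0.40, 0.1),
]

def rank(w):
    """Rank the toy candidates under retention weight w."""
    scored = [(sid, (1 - w) * cf + w * boost) for sid, cf, boost in candidates]
    return [sid for sid, _ in sorted(scored, key=lambda x: x[1], reverse=True)]

for w in (0.0, 0.3, 0.6):
    print(f"w={w}: top 3 = {rank(w)[:3]}")
```

At w=0 the ranking is pure collaborative filtering; by w=0.6 the combo-completing shows outrank the engagement favorites. The production setting of 0.3 sits between the two extremes.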
Genre Diversity Analysis
def recommendation_genre_diversity(recs, show_genre, k=10):
    """
    Measure genre diversity: average number of unique genres
    in the top-K recommendations per user.
    """
    diversities = []
    for uid, rec_list in recs.items():
        genres_in_recs = set()
        for sid in rec_list[:k]:
            if sid < len(show_genre):
                genres_in_recs.add(show_genre[sid])
        diversities.append(len(genres_in_recs))
    return np.mean(diversities)
def sticky_combo_coverage(recs, show_genre, train_df, sticky_combos, k=10):
    """
    Measure what fraction of users would activate at least one new
    sticky combo if they watched all top-K recommended shows.
    """
    activated = 0
    total = 0
    for uid, rec_list in recs.items():
        current_genres = set()
        history = train_df[train_df['subscriber_id'] == uid]['show_id'].values
        for sid in history:
            if sid < len(show_genre):
                current_genres.add(show_genre[sid])
        # Genres if the user watches all top-K recs
        extended_genres = current_genres.copy()
        for sid in rec_list[:k]:
            if sid < len(show_genre):
                extended_genres.add(show_genre[sid])
        # Did we activate a new combo?
        new_combo = False
        for combo in sticky_combos:
            if not combo.issubset(current_genres) and combo.issubset(extended_genres):
                new_combo = True
                break
        if new_combo:
            activated += 1
        total += 1
    return activated / total if total > 0 else 0.0
print("\nDiversity and Retention Metrics:")
print(f"{'Method':<30} {'Avg Genres@10':>15} {'Sticky Combo %':>15}")
print("-" * 62)
for name, recs in [('Standard SVD', standard_recs),
                   ('Retention-Aware Hybrid', retention_recs)]:
    div = recommendation_genre_diversity(recs, show_genre, k=10)
    sticky_pct = sticky_combo_coverage(recs, show_genre, train_df, sticky_combos, k=10)
    print(f"{name:<30} {div:>15.2f} {sticky_pct:>14.1%}")
Step 5: Focus on At-Risk Subscribers
The real test: how do the recommenders perform for subscribers the churn model identifies as at-risk?
# Filter to at-risk subscribers
at_risk_test = {uid: test_items[uid] for uid in at_risk['subscriber_id']
                if uid in test_items}
print(f"\nAt-Risk Subscriber Analysis ({len(at_risk_test)} subscribers):")
print(f"{'Method':<30} {'NDCG@10':>10} {'Diversity':>12} {'Sticky %':>12}")
print("-" * 66)
for name, recs in [('Standard SVD', standard_recs),
                   ('Retention-Aware Hybrid', retention_recs)]:
    at_risk_recs = {uid: recs[uid] for uid in at_risk_test if uid in recs}
    ndcg = ndcg_at_k(at_risk_recs, at_risk_test, k=10)
    div = recommendation_genre_diversity(at_risk_recs, show_genre, k=10)
    sticky = sticky_combo_coverage(at_risk_recs, show_genre, train_df, sticky_combos, k=10)
    print(f"{name:<30} {ndcg:>10.4f} {div:>12.2f} {sticky:>11.1%}")
Step 6: Simulated Retention Impact
# Estimate the retention impact of the sticky combo activation
# Assumption: activating a sticky combo reduces churn probability by 4 percentage points
# (from the data generation model)
def estimate_retention_impact(recs, train_df, sub_stats, show_genre,
                              sticky_combos, churn_reduction_per_combo=0.04, k=10):
    """
    Estimate how many churned subscribers would have been retained if they
    watched the recommended shows and activated sticky combos.
    """
    retained = 0
    total_churned = 0
    for uid in sub_stats[sub_stats['churned'] == 1]['subscriber_id']:
        if uid not in recs:
            continue
        total_churned += 1
        current_genres = set()
        history = train_df[train_df['subscriber_id'] == uid]['show_id'].values
        for sid in history:
            if sid < len(show_genre):
                current_genres.add(show_genre[sid])
        extended_genres = current_genres.copy()
        for sid in recs[uid][:k]:
            if sid < len(show_genre):
                extended_genres.add(show_genre[sid])
        # Count new sticky combos activated
        new_combos = 0
        for combo in sticky_combos:
            if not combo.issubset(current_genres) and combo.issubset(extended_genres):
                new_combos += 1
        # Estimate retention probability increase
        retention_increase = new_combos * churn_reduction_per_combo
        if np.random.rand() < retention_increase:
            retained += 1
    return retained, total_churned
np.random.seed(42) # for reproducibility of simulation
for name, recs in [('Standard SVD', standard_recs),
                   ('Retention-Aware Hybrid', retention_recs)]:
    retained, total = estimate_retention_impact(
        recs, train_df, sub_stats, show_genre, sticky_combos, k=10
    )
    print(f"{name}:")
    print(f"  Churned subscribers: {total}")
    print(f"  Estimated saves: {retained} ({retained/max(total, 1):.1%})")
    # Scale to full platform
    full_scale_churners = int(2_100_000 * 0.048)
    full_scale_saves = int(full_scale_churners * retained / max(total, 1))
    monthly_revenue_per_sub = 14.99
    print(f"  At platform scale (2.1M subs, 4.8% churn):")
    print(f"    Monthly churners: {full_scale_churners:,}")
    print(f"    Estimated saves: {full_scale_saves:,}")
    print(f"    Monthly revenue saved: ${full_scale_saves * monthly_revenue_per_sub:,.0f}")
    print()
Elena's Recommendation to the CEO
Elena presents the retention-aware recommender to the StreamFlow leadership team:
- The standard recommender optimizes for engagement but misses retention. It recommends more of what subscribers already watch, reinforcing the narrow viewing patterns that correlate with higher churn.
- The retention-aware recommender deliberately promotes genre diversity, specifically the genre combinations (from Chapter 23's association rule analysis) that predict lower churn. It sacrifices some short-term engagement prediction accuracy to increase the probability that subscribers activate sticky viewing patterns.
- For at-risk subscribers, the retention-aware model increases sticky combo activation significantly compared to the standard model. The estimated churn reduction, if even partially realized, would save substantial monthly recurring revenue.
- Proposed A/B test design:
  - Population: subscribers identified as at-risk by the churn model (top 25% of churn probability)
  - Control: standard SVD recommender (engagement-optimized)
  - Treatment: retention-aware hybrid recommender
  - Primary metric: 30-day churn rate
  - Secondary metrics: weekly active hours, genre diversity of consumption, NPS survey
  - Duration: 8 weeks (to capture a full churn cycle with statistical power)
  - Sample size: 50,000 per arm (powered to detect a 0.5 percentage point difference in churn rate)
- If the A/B test confirms a 0.5+ percentage point reduction in churn among at-risk subscribers, the lifetime value impact justifies the engineering investment by a wide margin.
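The sample size and lifetime value claims above can be sanity-checked with a standard two-proportion power calculation and a geometric-lifetime LTV estimate. The 8% baseline 30-day churn rate among at-risk subscribers is an illustrative assumption (the at-risk segment churns faster than the platform-wide 4.8%); the $14.99 ARPU and 4.8% platform churn come from the case study.

```python
import math
from scipy.stats import norm

# Normal-approximation sample size for a two-proportion test.
# Assumed baseline: 8% 30-day churn among at-risk subscribers (illustrative),
# detecting a 0.5 pp drop to 7.5% at alpha=0.05 (two-sided), 80% power.
p1, p2 = 0.080, 0.075
alpha, power = 0.05, 0.80
z_a = norm.ppf(1 - alpha / 2)   # critical value for two-sided alpha
z_b = norm.ppf(power)           # critical value for target power

p_bar = (p1 + p2) / 2
n_per_arm = math.ceil(
    (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
     + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    / (p1 - p2) ** 2
)
print(f"Required sample size per arm: {n_per_arm:,}")

# Back-of-envelope LTV under a geometric retention model:
# expected lifetime in months = 1 / monthly churn, so LTV = ARPU / churn.
arpu, churn = 14.99, 0.048
ltv = arpu / churn
print(f"LTV per subscriber: ${ltv:,.2f}")
```

Under these assumptions the required sample size comes out below the proposed 50,000 per arm, so the design has headroom; if the baseline churn rate were materially higher, the calculation should be rerun before fixing the arm size.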
Core Principle --- The StreamFlow case illustrates a critical distinction: optimizing for the metric that is easiest to measure (engagement) is not the same as optimizing for the metric that matters most to the business (retention). The retention-aware recommender makes an explicit tradeoff --- slightly less engaging recommendations in exchange for viewing patterns that keep subscribers on the platform. This tradeoff is invisible to standard ranking metrics but critical to the P&L.
Connection to Part IV
This case study threads through all five chapters of Part IV:
- Chapter 20 (Clustering): Subscriber segments with different churn rates informed the at-risk targeting
- Chapter 21 (Dimensionality Reduction): Visualized the separation between retained and churned subscribers in feature space
- Chapter 22 (Anomaly Detection): Flagged unusual usage pattern drops as early churn signals
- Chapter 23 (Association Rules): Identified the sticky genre combinations that predict retention
- Chapter 24 (Recommender Systems): Built the system that translates all of the above into personalized content suggestions
Each chapter contributed a piece of the puzzle. The recommender is the piece that faces the subscriber directly.