Case Study 2: StreamFlow Sticky Content Combinations


Background

StreamFlow, the streaming platform from earlier chapters, has 2.1 million active subscribers and a monthly churn rate of 4.8%. The churn model (Chapter 17) identifies who is likely to churn. But the Content Strategy team has a different question: Which combinations of content genres, when watched together, predict subscriber retention?

The hypothesis comes from Elena Vasquez, VP of Content Strategy: "We know that subscribers who watch more hours per month are less likely to churn. But we suspect it is not just how much they watch --- it is what combination they watch. A subscriber who watches only one genre might run out of content they like. A subscriber who watches across specific genre pairs might discover enough variety to stick around."

If the hypothesis is correct, the implication is concrete: StreamFlow can proactively recommend content from a second genre to single-genre subscribers, nudging them into a stickier viewing pattern. The business frames this as "sticky content combinations" --- pairs and triples of genres that, when a subscriber watches all of them, predict lower churn.

This is not a standard market basket problem. The "transaction" is a subscriber's monthly viewing history. The "items" are genres watched. The metric of interest is not just co-occurrence frequency (support) but whether co-occurrence is associated with retention. Association rules find the co-occurrence patterns; a follow-up analysis checks whether those patterns correlate with lower churn.
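As a quick refresher on the vocabulary used below, support, confidence, and lift can be computed by hand from a few toy baskets (illustrative data, not StreamFlow's):

```python
# Toy baskets to illustrate support, confidence, and lift for {drama} -> {documentary}.
baskets = [
    {'drama', 'documentary'},
    {'drama', 'documentary', 'thriller'},
    {'drama', 'romance'},
    {'action', 'sci_fi'},
    {'documentary', 'cooking'},
]
n = len(baskets)

support_both = sum('drama' in b and 'documentary' in b for b in baskets) / n  # 2/5
support_drama = sum('drama' in b for b in baskets) / n                        # 3/5
support_doc = sum('documentary' in b for b in baskets) / n                    # 3/5

confidence = support_both / support_drama  # P(documentary | drama) = 2/3
lift = confidence / support_doc            # (2/3) / (3/5), slightly above 1

print(f"support={support_both:.2f}  confidence={confidence:.2f}  lift={lift:.2f}")
```

A lift above 1 means the pair co-occurs more often than independence would predict; the question this case study adds is whether that co-occurrence also tracks retention.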


The Data

StreamFlow logs every viewing session. The Content Analytics team aggregates this into a monthly genre profile per subscriber: which genres did the subscriber watch at least 2 hours of in the past 30 days? The 2-hour threshold filters out accidental clicks and trailers.
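The aggregation itself is straightforward. A minimal sketch of how such a profile might be derived from a session log, assuming hypothetical `subscriber_id`, `genre`, and `hours` columns (not StreamFlow's actual schema):

```python
import pandas as pd

# Hypothetical session log: one row per viewing session.
sessions = pd.DataFrame({
    'subscriber_id': ['SUB_000001'] * 3 + ['SUB_000002'] * 2,
    'genre': ['drama', 'drama', 'documentary', 'action', 'action'],
    'hours': [1.5, 1.0, 0.2, 3.0, 0.1],
})

# Sum hours per subscriber/genre, then keep only genres with >= 2 hours.
hours = sessions.groupby(['subscriber_id', 'genre'])['hours'].sum()
profile = (hours[hours >= 2.0]
           .reset_index()
           .groupby('subscriber_id')['genre']
           .apply(list))

print(profile)
```

Here SUB_000001's 0.2 hours of documentary falls below the threshold and is dropped, which is exactly the accidental-click filtering the threshold is meant to provide.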

import numpy as np
import pandas as pd
from mlxtend.frequent_patterns import fpgrowth, association_rules
from mlxtend.preprocessing import TransactionEncoder
import matplotlib.pyplot as plt

np.random.seed(42)

n_subscribers = 30_000
genres = [
    'action', 'comedy', 'drama', 'documentary', 'sci_fi',
    'thriller', 'romance', 'horror', 'animation', 'reality_tv',
    'true_crime', 'cooking', 'sports', 'kids', 'foreign_language'
]

# Define genre affinities (genres that tend to co-occur in viewing)
genre_affinities = {
    'action': ['sci_fi', 'thriller'],
    'comedy': ['romance', 'animation', 'reality_tv'],
    'drama': ['documentary', 'thriller', 'foreign_language'],
    'documentary': ['true_crime', 'cooking'],
    'sci_fi': ['action', 'animation'],
    'thriller': ['true_crime', 'horror'],
    'romance': ['comedy', 'drama'],
    'horror': ['thriller'],
    'true_crime': ['documentary', 'thriller'],
    'cooking': ['reality_tv', 'documentary'],
    'kids': ['animation', 'comedy'],
}

# Define "sticky" combinations (associated with lower churn)
sticky_combos = [
    frozenset(['drama', 'documentary']),
    frozenset(['action', 'sci_fi']),
    frozenset(['comedy', 'kids', 'animation']),
    frozenset(['thriller', 'true_crime']),
    frozenset(['cooking', 'documentary', 'reality_tv']),
    frozenset(['drama', 'foreign_language']),
]

# Generate subscriber viewing profiles
subscribers = []
for i in range(n_subscribers):
    # Number of genres watched (1-6, skewed toward 2-3)
    n_genres = np.random.choice([1, 2, 3, 4, 5, 6], p=[0.15, 0.30, 0.25, 0.15, 0.10, 0.05])
    primary = np.random.choice(genres)
    watched = {primary}

    # Add affinity genres (duplicates collapse in the set, so realized
    # breadth can fall below n_genres)
    for _ in range(n_genres - 1):
        if primary in genre_affinities:
            candidates = genre_affinities[primary]
            if np.random.random() < 0.6:
                watched.add(np.random.choice(candidates))
            else:
                watched.add(np.random.choice(genres))
        else:
            watched.add(np.random.choice(genres))

    # Determine churn (lower for sticky combos)
    base_churn_prob = 0.12 - 0.015 * len(watched)  # more genres = lower churn
    base_churn_prob = max(base_churn_prob, 0.01)

    for combo in sticky_combos:
        if combo.issubset(watched):
            base_churn_prob *= 0.4  # significant retention boost

    churned = np.random.random() < base_churn_prob

    subscribers.append({
        'subscriber_id': f'SUB_{i:06d}',
        'genres_watched': list(watched),
        'n_genres': len(watched),
        'churned': churned
    })

sub_df = pd.DataFrame(subscribers)
print(f"Subscribers: {len(sub_df):,}")
print(f"Overall churn rate: {sub_df['churned'].mean():.1%}")
print(f"Avg genres per subscriber: {sub_df['n_genres'].mean():.1f}")

Step 1: Exploratory Analysis

# Genre frequency
genre_counts = {}
for genres_list in sub_df['genres_watched']:
    for g in genres_list:
        genre_counts[g] = genre_counts.get(g, 0) + 1

genre_freq = pd.Series(genre_counts).sort_values(ascending=False)
genre_pct = (genre_freq / len(sub_df) * 100).round(1)
print("Genre penetration (% of subscribers):")
print(genre_pct.to_string())

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

axes[0].barh(genre_freq.index[::-1], genre_freq.values[::-1],
             color='steelblue', edgecolor='black')
axes[0].set_xlabel('Subscriber Count')
axes[0].set_title('Genre Frequency (subscribers watching 2+ hrs/month)')

# Churn rate by number of genres
churn_by_n = sub_df.groupby('n_genres')['churned'].agg(['mean', 'count'])
churn_by_n.columns = ['churn_rate', 'n_subscribers']

axes[1].bar(churn_by_n.index, churn_by_n['churn_rate'],
            color='coral', edgecolor='black')
axes[1].set_xlabel('Number of Genres Watched')
axes[1].set_ylabel('Churn Rate')
axes[1].set_title('Churn Rate by Genre Breadth')
axes[1].axhline(y=sub_df['churned'].mean(), color='black',
                linestyle='--', label='Overall churn')
axes[1].legend()

plt.tight_layout()
plt.show()

print("\nChurn rate by number of genres:")
print(churn_by_n.to_string())

Key Observation --- Churn rate drops as genre breadth increases. But this is the obvious result --- more engagement correlates with lower churn. The interesting question is whether specific genre combinations predict retention beyond what breadth alone explains.
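One way to probe that question is to hold breadth fixed: within each genre-count bucket, compare churn between subscribers who watch a candidate combination and those who do not. A minimal sketch, using a small stand-in frame in place of the full `sub_df` built above:

```python
import pandas as pd

# Small stand-in for sub_df (the real frame is built in the simulation above).
mini_df = pd.DataFrame({
    'genres_watched': [['drama', 'documentary'], ['drama', 'thriller'],
                       ['action', 'sci_fi'], ['drama', 'documentary', 'romance'],
                       ['comedy', 'horror', 'reality_tv']],
    'n_genres': [2, 2, 2, 3, 3],
    'churned': [False, True, True, False, True],
})

pair = {'drama', 'documentary'}  # combination under test
has_pair = mini_df['genres_watched'].apply(lambda g: pair.issubset(set(g)))

# Churn rate within each breadth bucket, split by combo presence.
within_breadth = (
    mini_df.assign(has_pair=has_pair)
           .groupby(['n_genres', 'has_pair'])['churned']
           .mean()
           .unstack('has_pair')
)
print(within_breadth)
```

If the combination column shows lower churn than the non-combination column at the same breadth level, the effect is not just engagement volume.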


Step 2: Association Rules on Genre Viewing Patterns

# Create the transaction data (list of genre lists)
genre_transactions = sub_df['genres_watched'].tolist()

# One-hot encode
te = TransactionEncoder()
te_array = te.fit(genre_transactions).transform(genre_transactions)
genre_basket = pd.DataFrame(te_array, columns=te.columns_)

print(f"Genre basket shape: {genre_basket.shape}")

# Mine frequent genre combinations
freq_genres = fpgrowth(
    genre_basket,
    min_support=0.02,    # Genre combo appears in at least 2% of subscribers
    use_colnames=True
)

print(f"Frequent genre sets (min_support=0.02): {len(freq_genres)}")
print(f"  1-genre: {len(freq_genres[freq_genres['itemsets'].apply(len) == 1])}")
print(f"  2-genre: {len(freq_genres[freq_genres['itemsets'].apply(len) == 2])}")
print(f"  3-genre: {len(freq_genres[freq_genres['itemsets'].apply(len) == 3])}")

# Generate rules
genre_rules = association_rules(
    freq_genres,
    metric="lift",
    min_threshold=1.2
)

print(f"\nAssociation rules (lift > 1.2): {len(genre_rules)}")

# Top rules by lift
top_genre_rules = genre_rules.sort_values('lift', ascending=False).head(15)
for _, row in top_genre_rules.iterrows():
    ant = ', '.join(sorted(row['antecedents']))
    con = ', '.join(sorted(row['consequents']))
    print(f"  {{{ant}}} -> {{{con}}}  "
          f"sup={row['support']:.3f}  conf={row['confidence']:.2f}  "
          f"lift={row['lift']:.2f}")

Step 3: Linking Rules to Churn Outcomes

This is where the analysis goes beyond standard market basket. For each frequent genre combination, compute the churn rate of subscribers who watch that combination versus those who do not.

def compute_combo_churn(sub_df, genre_sets_df, min_subscribers=100):
    """
    For each frequent itemset, compute the churn rate of subscribers
    who watch that combination vs. those who do not.

    Uses the one-hot `genre_basket` frame from Step 2 (module-level);
    every genre in a frequent itemset is guaranteed to be one of its
    columns, since the itemsets were mined from that frame.

    Returns DataFrame with churn ratio (lower = stickier).
    """
    results = []

    for _, row in genre_sets_df.iterrows():
        combo = row['itemsets']
        if len(combo) < 2:  # skip single genres
            continue

        # Subscribers who watch every genre in the combo
        mask = genre_basket[list(combo)].all(axis=1)

        n_watchers = mask.sum()
        if n_watchers < min_subscribers:
            continue

        churn_watchers = sub_df.loc[mask, 'churned'].mean()
        churn_others = sub_df.loc[~mask, 'churned'].mean()

        # Churn ratio: < 1 means the combo is "sticky"
        churn_ratio = churn_watchers / churn_others if churn_others > 0 else np.nan

        results.append({
            'genre_combo': combo,
            'n_subscribers': n_watchers,
            'churn_rate': churn_watchers,
            'churn_rate_others': churn_others,
            'churn_ratio': churn_ratio,
            'support': row['support']
        })

    return pd.DataFrame(results).sort_values('churn_ratio')


combo_churn = compute_combo_churn(sub_df, freq_genres, min_subscribers=50)

print("=== Genre Combinations Ranked by Retention Impact ===")
print("(churn_ratio < 1.0 = lower churn than non-watchers)\n")

for _, row in combo_churn.head(15).iterrows():
    genres_str = ', '.join(sorted(row['genre_combo']))
    print(f"  {{{genres_str}}}")
    print(f"    Subscribers: {row['n_subscribers']:,}  "
          f"Churn: {row['churn_rate']:.1%} vs {row['churn_rate_others']:.1%}  "
          f"Ratio: {row['churn_ratio']:.2f}")

Key Insight --- Not all high-support genre pairs are sticky. Some genre combinations co-occur frequently but do not predict lower churn. The sticky combinations are those with both high lift (in the association rules sense) and low churn ratio (in the retention sense). The intersection of these two properties identifies the actionable patterns.
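Churn ratios for low-support combinations are noisy, so it is worth attaching a significance check before treating a combination as sticky. A sketch using a standard two-proportion z-test (the counts below are illustrative, not taken from the analysis above):

```python
import math

def two_prop_z(churn_w, n_w, churn_o, n_o):
    """Two-proportion z-test: do combo-watchers churn less than others?"""
    # Pooled churn rate under the null hypothesis of no difference
    p_pool = (churn_w * n_w + churn_o * n_o) / (n_w + n_o)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_w + 1 / n_o))
    z = (churn_w - churn_o) / se
    # One-sided p-value for "watchers churn less": Phi(z) via the error function
    p_one_sided = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return z, p_one_sided

# Illustrative counts: 400 combo-watchers at 3.0% churn vs 29,600 others at 7.5%.
z, p = two_prop_z(0.030, 400, 0.075, 29_600)
print(f"z = {z:.2f}, one-sided p = {p:.4f}")
```

Combinations that pass both the churn-ratio filter and a test like this are far less likely to be mining artifacts.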


Step 4: Identifying Actionable Sticky Combinations

# Merge association rule metrics with churn data
# For 2-genre combos, find the corresponding rules
actionable = []

for _, churn_row in combo_churn.iterrows():
    if churn_row['churn_ratio'] >= 0.85:  # skip combos without a meaningful retention benefit
        continue
    if churn_row['n_subscribers'] < 100:
        continue

    combo = churn_row['genre_combo']

    # Find association rules whose antecedent and consequent together
    # form exactly this combo
    matching_rules = genre_rules[genre_rules.apply(
        lambda r: (r['antecedents'] | r['consequents']) == combo, axis=1
    )]

    best_lift = matching_rules['lift'].max() if len(matching_rules) > 0 else np.nan

    actionable.append({
        'genre_combo': combo,
        'n_subscribers': churn_row['n_subscribers'],
        'churn_rate': churn_row['churn_rate'],
        'churn_ratio': churn_row['churn_ratio'],
        'support': churn_row['support'],
        'max_lift': best_lift,
        'combo_size': len(combo)
    })

actionable_df = pd.DataFrame(actionable).sort_values('churn_ratio')

print("=== Actionable Sticky Combinations ===")
print("(churn_ratio < 0.85 AND n_subscribers >= 100)\n")

for _, row in actionable_df.iterrows():
    genres_str = ', '.join(sorted(row['genre_combo']))
    print(f"  {{{genres_str}}}")
    print(f"    Subscribers: {row['n_subscribers']:,}  "
          f"Churn: {row['churn_rate']:.1%}  "
          f"Ratio: {row['churn_ratio']:.2f}  "
          f"Lift: {row['max_lift']:.2f}")

Step 5: Building the Recommendation Strategy

The Content Strategy team translates sticky combinations into a recommendation plan:

def recommend_sticky_genres(subscriber_genres, actionable_combos, top_n=3):
    """
    For a subscriber who watches a set of genres, recommend additional
    genres that would complete a sticky combination.

    Parameters
    ----------
    subscriber_genres : set of str, genres the subscriber currently watches
    actionable_combos : pd.DataFrame with 'genre_combo' and 'churn_ratio'
    top_n : int, max recommendations

    Returns
    -------
    list of dicts with 'recommended_genre', 'completes_combo', 'churn_ratio'
    """
    recommendations = []

    for _, row in actionable_combos.iterrows():
        combo = row['genre_combo']
        missing = combo - subscriber_genres
        present = combo & subscriber_genres

        # Recommend if subscriber watches at least one genre in the combo
        # but is missing one or two genres
        if len(present) >= 1 and 0 < len(missing) <= 2:
            for genre in missing:
                recommendations.append({
                    'recommended_genre': genre,
                    'completes_combo': combo,
                    'churn_ratio': row['churn_ratio'],
                    'already_watching': present
                })

    if not recommendations:
        return []

    # Deduplicate: keep recommendation with lowest churn_ratio
    rec_df = pd.DataFrame(recommendations)
    rec_df = rec_df.sort_values('churn_ratio').drop_duplicates(
        subset='recommended_genre', keep='first'
    ).head(top_n)

    return rec_df.to_dict('records')


# Example subscribers
example_subscribers = [
    {'id': 'SUB_001', 'genres': {'drama'}},
    {'id': 'SUB_002', 'genres': {'action'}},
    {'id': 'SUB_003', 'genres': {'comedy', 'kids'}},
    {'id': 'SUB_004', 'genres': {'cooking'}},
    {'id': 'SUB_005', 'genres': {'thriller'}},
]

print("=== Genre Recommendations for Retention ===\n")
for sub in example_subscribers:
    recs = recommend_sticky_genres(sub['genres'], actionable_df)
    print(f"Subscriber {sub['id']} watches: {sub['genres']}")
    if recs:
        for r in recs:
            combo_str = ', '.join(sorted(r['completes_combo']))
            print(f"  -> Recommend: {r['recommended_genre']}  "
                  f"(completes {{{combo_str}}}, churn_ratio={r['churn_ratio']:.2f})")
    else:
        print("  -> No sticky-combo recommendations available")
    print()

Step 6: Measuring Impact

# Simulate the intervention: for single-genre subscribers, nudge
# them toward a sticky second genre. Measure churn difference.

single_genre_subs = sub_df[sub_df['n_genres'] == 1].copy()
print(f"Single-genre subscribers: {len(single_genre_subs):,}")
print(f"Their churn rate: {single_genre_subs['churned'].mean():.1%}")

# Among multi-genre subscribers, compare sticky vs. non-sticky combos
multi_genre_subs = sub_df[sub_df['n_genres'] >= 2].copy()

def has_sticky_combo(genres_list, sticky_combos_list):
    watched = set(genres_list)
    for combo in sticky_combos_list:
        if combo.issubset(watched):
            return True
    return False

multi_genre_subs['has_sticky'] = multi_genre_subs['genres_watched'].apply(
    lambda g: has_sticky_combo(g, sticky_combos)
)

sticky_churn = multi_genre_subs.groupby('has_sticky')['churned'].agg(['mean', 'count'])
sticky_churn.columns = ['churn_rate', 'n_subscribers']
sticky_churn.index = ['No Sticky Combo', 'Has Sticky Combo']

print("\n=== Churn by Sticky Combo Presence (Multi-Genre Subscribers) ===")
print(sticky_churn.to_string())
print(f"\nRelative churn reduction: "
      f"{1 - sticky_churn.loc['Has Sticky Combo', 'churn_rate'] / sticky_churn.loc['No Sticky Combo', 'churn_rate']:.1%}")

Outcome

Elena Vasquez's team used the analysis to design a six-week A/B test. The treatment group received personalized genre nudges based on the sticky-combination recommendations. The control group received StreamFlow's standard "popular this week" recommendations.

Metric                 Control (Popular Recs)   Treatment (Sticky Combos)
30-day genre breadth   1.8 genres               2.3 genres
Churn rate (6-week)    9.2%                     6.8%
Hours watched/week     7.1                      8.4
Recommendation CTR     4.1%                     6.9%

The churn reduction from 9.2% to 6.8% represented approximately 5,000 fewer cancellations over six weeks. At StreamFlow's average revenue per subscriber of $13.99/month, that is roughly $70,000 in retained monthly revenue --- or $840,000 annualized --- from a single recommendation change.
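The back-of-envelope arithmetic, using only the figures stated above:

```python
cancellations_avoided = 5_000
arpu_monthly = 13.99  # average revenue per subscriber, from the text

monthly_retained = cancellations_avoided * arpu_monthly
annualized = monthly_retained * 12

print(f"Monthly retained revenue: ${monthly_retained:,.0f}")  # roughly $70,000
print(f"Annualized:               ${annualized:,.0f}")        # roughly $840,000
```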

Three findings that surprised the Content Strategy team:

  1. Drama + documentary was the stickiest pair. The team had expected action + sci_fi to dominate (it was second). Drama + documentary subscribers had the lowest churn rate of any two-genre pair, possibly because both genres have deep catalogs that reward exploration.

  2. Comedy + kids + animation was a triple, not a pair. The retention effect required all three genres. Comedy + kids alone did not predict meaningfully lower churn. This suggests family households where both parents and children watch --- a subscriber profile that is inherently stickier because cancellation affects multiple viewers.

  3. Single-genre subscribers were the highest-opportunity group. They had the highest churn rate and the most room for improvement. Nudging even 20% of them into a sticky second genre would have outsized retention impact.

Practical Takeaway --- Association rules are traditionally a retail technique. But any domain with "basket-like" data --- streaming catalogs, insurance product bundles, SaaS feature usage, course enrollment patterns --- can be analyzed with the same framework. The trick is linking the co-occurrence patterns to a business outcome (churn, upsell, completion rate) rather than treating support and lift as ends in themselves.


Return to Chapter 23 | Next: Key Takeaways