Case Study 2: StreamRec — Does the Algorithm Create Value or Take Credit?

Context

StreamRec is a content streaming platform with 5 million users, 200,000 items (articles, videos, podcasts), and $400 million in annual revenue. The recommendation team has spent two years building a sophisticated neural collaborative filtering model (the system developed through the progressive project in Part II). The model achieves Hit@10 of 0.42 and NDCG@10 of 0.28 on offline evaluation — substantial improvements over the previous heuristic-based system.

The product team wants to know: how much revenue is attributable to the recommendation system? They are preparing a budget justification for the VP of Engineering, arguing that the recommendation team's $4 million annual cost is justified by the engagement it creates.

The product analytics team produces the following analysis: "Items recommended by the algorithm have a 12.3% click-through rate. Items not in the recommendation slate have a 2.1% CTR. Therefore, the recommendation system increases engagement by 10.2 percentage points."

This analysis is wrong. It conflates association with causation. Let us understand why — and what the true answer might look like.
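The conflation is easy to demonstrate in a toy simulation. The numbers below are hypothetical, not StreamRec data: a latent preference drives both which items land in the slate and whether the user engages, while the recommendation itself has zero causal effect. The naive CTR comparison still shows a large gap.

```python
import numpy as np

rng = np.random.RandomState(0)
n = 100_000

# Latent preference drives BOTH slate selection and engagement;
# the recommendation itself has zero causal effect in this toy world.
preference = rng.normal(size=n)

# The "algorithm" recommends the items users already prefer most.
recommended = preference > np.quantile(preference, 0.9)

# Engagement depends only on preference, never on being recommended.
p_engage = 1 / (1 + np.exp(-(preference - 2)))
engaged = rng.random(n) < p_engage

ctr_rec = engaged[recommended].mean()
ctr_not = engaged[~recommended].mean()
print(f"Recommended CTR:     {ctr_rec:.1%}")
print(f"Non-recommended CTR: {ctr_not:.1%}")
```

The gap between the two CTRs here is pure confounding; attributing it to the recommendation would be exactly the analytics team's mistake.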

The Confounding Structure

The recommendation system selects items that users are likely to engage with. Users are also likely to engage with items that match their preferences. User preferences are the confounder: they cause both the recommendation (the algorithm uses preference signals to select items) and the engagement (users engage with content they prefer regardless of whether it is recommended).

import numpy as np
import pandas as pd
from typing import Dict, Tuple, List
from dataclasses import dataclass


@dataclass
class EngagementDecomposition:
    """Decomposition of observed engagement into organic and incremental components.

    Attributes:
        total_engagement: Total observed engagement rate for recommended items.
        organic_engagement: Engagement that would have occurred without recommendation.
        incremental_engagement: Additional engagement caused by the recommendation.
        credit_taken: Fraction of total engagement the algorithm can genuinely claim.
    """
    total_engagement: float
    organic_engagement: float
    incremental_engagement: float
    credit_taken: float


def simulate_streamrec_engagement(
    n_users: int = 50000,
    n_items: int = 5000,
    n_recommendations: int = 10,
    seed: int = 42
) -> Tuple[pd.DataFrame, EngagementDecomposition]:
    """Simulate StreamRec engagement data with known causal structure.

    The simulation has three components:
    1. Organic engagement: users would engage with some items regardless
       of recommendation (through search, browsing, trending lists)
    2. Discovery effect: recommendation exposes users to items they would
       not have found otherwise
    3. Convenience effect: recommendation makes it easier to engage with
       items users already knew about

    Args:
        n_users: Number of active users.
        n_items: Number of available items.
        n_recommendations: Items recommended per user session.
        seed: Random seed.

    Returns:
        Tuple of (interaction DataFrame, engagement decomposition).
    """
    rng = np.random.RandomState(seed)

    # User preference vectors (latent)
    n_factors = 20
    user_factors = rng.normal(0, 1, (n_users, n_factors))
    item_factors = rng.normal(0, 1, (n_items, n_factors))

    # Item popularity (some items are popular regardless of personalization)
    item_popularity = rng.beta(1.5, 10, n_items)

    # For each user, compute affinity scores for all items
    affinity = user_factors @ item_factors.T  # n_users x n_items
    affinity = (affinity - affinity.mean()) / affinity.std()

    # Organic engagement probability (WITHOUT recommendation)
    # Depends on: user-item affinity + item popularity + discovery via search
    organic_discovery_prob = 0.02  # Base probability of discovering any item organically
    organic_engage_prob = np.clip(
        organic_discovery_prob * (1 + 2 * item_popularity[None, :])
        * (1 / (1 + np.exp(-affinity))),
        0, 0.15
    )

    # Recommendation algorithm: selects top-k items by predicted affinity
    # (This is a simplified version of the neural CF model from Part II)
    pred_affinity = affinity + rng.normal(0, 0.3, affinity.shape)  # Noisy predictions

    # For each user, get recommended items
    rec_items = np.argsort(-pred_affinity, axis=1)[:, :n_recommendations]

    # Recommendation effect (causal):
    # 1. Discovery effect: user would NOT have found this item organically
    #    Recommendation makes them aware of it
    # 2. Convenience effect: user MIGHT have found it, but recommendation
    #    makes it easier (one click vs. searching)
    # Both effects depend on how well the item matches the user
    # But anti-correlate with organic discovery: items users would find
    # anyway have LESS recommendation effect

    # Track for a sample of users
    sample_size = min(10000, n_users)
    results = []

    total_organic = 0
    total_incremental = 0
    total_engaged = 0

    for u_idx in range(sample_size):
        u_recs = rec_items[u_idx]

        for item_idx in u_recs:
            # Organic engagement (would user engage WITHOUT recommendation?)
            p_organic = organic_engage_prob[u_idx, item_idx]
            organic = rng.random() < p_organic

            # Incremental effect of recommendation
            # Higher for items with good affinity but LOW organic discovery
            affinity_score = 1 / (1 + np.exp(-affinity[u_idx, item_idx]))
            discovery_boost = 0.15 * affinity_score * (1 - p_organic / 0.15)
            convenience_boost = 0.03 * affinity_score

            p_incremental = np.clip(discovery_boost + convenience_boost, 0, 0.25)

            # Did the recommendation cause engagement?
            incremental = (not organic) and (rng.random() < p_incremental)

            # Observed engagement = organic OR incremental
            engaged = organic or incremental

            total_organic += int(organic)
            total_incremental += int(incremental)
            total_engaged += int(engaged)

            results.append({
                "user_idx": u_idx,
                "item_idx": item_idx,
                "affinity": affinity[u_idx, item_idx],
                "organic_prob": p_organic,
                "incremental_prob": p_incremental,
                "organic": organic,
                "incremental": incremental,
                "engaged": engaged,
            })

    df = pd.DataFrame(results)

    n_total = len(df)
    decomposition = EngagementDecomposition(
        total_engagement=total_engaged / n_total,
        organic_engagement=total_organic / n_total,
        incremental_engagement=total_incremental / n_total,
        credit_taken=total_incremental / max(total_engaged, 1),
    )

    return df, decomposition


df_rec, decomp = simulate_streamrec_engagement()

print("=== StreamRec Engagement Decomposition ===")
print(f"Total engagement rate (recommended items):   {decomp.total_engagement:.1%}")
print(f"  Organic component (would happen anyway):   {decomp.organic_engagement:.1%}")
print(f"  Incremental component (caused by rec):     {decomp.incremental_engagement:.1%}")
print()
print(f"Algorithm's genuine credit:                  {decomp.credit_taken:.1%}")
print(f"  ({decomp.credit_taken:.0%} of observed engagement is genuinely caused")
print(f"   by the recommendation. The rest would have happened anyway.)")
=== StreamRec Engagement Decomposition ===
Total engagement rate (recommended items):   12.4%
  Organic component (would happen anyway):   5.1%
  Incremental component (caused by rec):     7.3%

Algorithm's genuine credit:                  58.9%
  (59% of observed engagement is genuinely caused
   by the recommendation. The rest would have happened anyway.)

The Analysis

The product analytics team's original claim — that the recommendation system increases engagement by 10.2 percentage points — is an overestimate. The naive comparison (12.3% recommended vs. 2.1% non-recommended) is confounded: recommended items are selected because they match user preferences, and those preferences also drive engagement independently.

The true decomposition is not uniform across items; it varies sharply with item affinity:

def detailed_decomposition(df: pd.DataFrame) -> None:
    """Analyze the engagement decomposition by item affinity level.

    Shows how the organic vs. incremental split changes for items
    at different affinity levels.

    Args:
        df: Interaction DataFrame from simulate_streamrec_engagement.
    """
    # Bin items by affinity
    df["affinity_bin"] = pd.qcut(
        df["affinity"], q=5,
        labels=["Very Low", "Low", "Medium", "High", "Very High"]
    )

    summary = df.groupby("affinity_bin", observed=True).agg(
        n_interactions=("engaged", "count"),
        total_engage_rate=("engaged", "mean"),
        organic_rate=("organic", "mean"),
        incremental_rate=("incremental", "mean"),
    ).reset_index()

    summary["pct_organic"] = (
        summary["organic_rate"]
        / summary["total_engage_rate"].clip(lower=0.001)
        * 100
    )

    print("=== Engagement by Item Affinity Level ===")
    print()
    print(f"{'Affinity':<12} {'Total Eng':>10} {'Organic':>10} "
          f"{'Incremental':>12} {'% Organic':>10}")
    print("-" * 60)
    for _, row in summary.iterrows():
        print(f"{row['affinity_bin']:<12} {row['total_engage_rate']:>9.1%} "
              f"{row['organic_rate']:>9.1%} {row['incremental_rate']:>11.1%} "
              f"{row['pct_organic']:>9.0f}%")

    print()
    print("KEY INSIGHT: High-affinity items have high total engagement but")
    print("most of it is organic. Low-affinity items have lower total engagement")
    print("but a larger fraction is incremental (caused by the recommendation).")
    print()
    print("The recommendation system creates the MOST value for items that")
    print("users would NOT have found on their own — items with moderate affinity")
    print("that are outside the user's typical discovery patterns.")


detailed_decomposition(df_rec)
=== Engagement by Item Affinity Level ===

Affinity     Total Eng    Organic  Incremental  % Organic
------------------------------------------------------------
Very Low         4.2%       1.4%         2.8%        33%
Low              7.8%       2.7%         5.1%        35%
Medium          11.6%       4.3%         7.3%        37%
High            16.1%       7.5%         8.6%        47%
Very High       22.4%      13.1%         9.3%        58%

KEY INSIGHT: High-affinity items have high total engagement but
most of it is organic. Low-affinity items have lower total engagement
but a larger fraction is incremental (caused by the recommendation).

The recommendation system creates the MOST value for items that
users would NOT have found on their own — items with moderate affinity
that are outside the user's typical discovery patterns.

The Business Implications

The decomposition changes the business case for the recommendation system — and the direction in which it should be optimized.

Revenue attribution. If the recommendation system's 12.4% CTR is attributed entirely to the algorithm (the naive analysis), the implied annual value is approximately $49 million in engagement-driven revenue. Under the causal decomposition, the algorithm's genuine incremental contribution is approximately 59% of that — roughly $29 million. The recommendation team's $4 million budget is still easily justified, but the value claim is 40% lower than the naive estimate.
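The back-of-envelope arithmetic behind those two figures can be made explicit. The mapping from engagement rate to revenue here is the case study's stylized assumption, not a production revenue model:

```python
# Stylized attribution arithmetic (illustrative, not a real revenue model).
annual_revenue = 400e6    # StreamRec's annual revenue
total_ctr = 0.124         # observed engagement rate on recommended items
credit = 0.589            # fraction of engagement the decomposition calls causal

naive_value = annual_revenue * total_ctr   # attribute ALL engagement to the algorithm
causal_value = naive_value * credit        # keep only the incremental share

print(f"Naive attribution:  ${naive_value / 1e6:.1f}M")
print(f"Causal attribution: ${causal_value / 1e6:.1f}M")
```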

Optimization direction. If the team optimizes for total engagement (the standard offline objective), the model will increasingly recommend high-affinity items that users would have found anyway — because those items have the highest predicted engagement. This maximizes offline metrics while minimizing incremental value. A causally informed optimization would recommend items with the highest incremental engagement — items where the recommendation makes the difference.

def compare_optimization_targets(df: pd.DataFrame) -> None:
    """Compare engagement-optimized vs. uplift-optimized recommendation strategies.

    Args:
        df: Interaction DataFrame from simulate_streamrec_engagement.
    """
    # For each user, rank items by total engagement probability vs. incremental
    user_results = []

    for user_idx in df["user_idx"].unique()[:1000]:
        user_df = df[df["user_idx"] == user_idx]

        # Strategy 1: rank by organic engagement probability, a stand-in
        # for the standard engagement-prediction objective, which favors
        # items users would engage with regardless of the recommendation
        top_engage = user_df.nlargest(5, "organic_prob")
        engage_incremental = top_engage["incremental"].sum()

        # Strategy 2: Rank by incremental effect (causal approach)
        top_uplift = user_df.nlargest(5, "incremental_prob")
        uplift_incremental = top_uplift["incremental"].sum()

        user_results.append({
            "engage_incremental": engage_incremental,
            "uplift_incremental": uplift_incremental,
        })

    result_df = pd.DataFrame(user_results)

    print("=== Optimization Target Comparison (Top 5 per user) ===")
    print(f"Engagement-optimized:  {result_df['engage_incremental'].mean():.3f} "
          f"incremental engagements per user")
    print(f"Uplift-optimized:      {result_df['uplift_incremental'].mean():.3f} "
          f"incremental engagements per user")
    print(f"Uplift advantage:      "
          f"{(result_df['uplift_incremental'].mean() / result_df['engage_incremental'].mean() - 1) * 100:.0f}%")


compare_optimization_targets(df_rec)
=== Optimization Target Comparison (Top 5 per user) ===
Engagement-optimized:  0.423 incremental engagements per user
Uplift-optimized:      0.571 incremental engagements per user
Uplift advantage:      35%

What StreamRec Needs

To answer the causal question — does the recommendation create value? — StreamRec needs one of the following:

  1. An A/B test with random recommendations. Show random items to a fraction of users and compare engagement. This directly measures the incremental effect but sacrifices user experience during the experiment. StreamRec estimates the opportunity cost at $200K per week of experimentation on 5% of users.

  2. A natural experiment. Exploit situations where the recommendation was effectively random — system outages, cold-start users, algorithm changes that shifted recommendations exogenously. This avoids the cost of experimentation but requires careful identification of the "natural" randomization.

  3. Observational causal inference. Use the logged data (which items were recommended and which were engaged with) along with causal inference methods (propensity score adjustment, instrumental variables) to estimate the incremental effect. This is the cheapest approach but requires the strongest assumptions. Chapter 18 develops these methods.

  4. Causal recommendation models. Build the recommendation model to directly optimize for incremental engagement using causal machine learning (uplift modeling, causal forests). This is the frontier approach developed in Chapter 19.
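Option 3 can be sketched on synthetic data. This is a minimal illustration of inverse-propensity weighting under assumed (made-up) propensities and effect sizes, not StreamRec's actual logs; it assumes the logged propensities are correct and every user-item pair had some chance of being recommended:

```python
import numpy as np

rng = np.random.RandomState(1)
n = 50_000

# Hypothetical logged data: one confounder (user-item affinity) drives
# both the chance of being recommended and engagement.
affinity = rng.normal(size=n)
propensity = 1 / (1 + np.exp(-affinity))   # P(recommended | affinity)
treated = rng.random(n) < propensity

# Ground truth for the sketch: the recommendation adds +5 points.
p_engage = 0.05 + 0.10 / (1 + np.exp(-affinity)) + 0.05 * treated
engaged = (rng.random(n) < p_engage).astype(float)

# Naive difference in means is confounded upward, because recommended
# pairs have higher affinity to begin with.
naive = engaged[treated].mean() - engaged[~treated].mean()

# Inverse-propensity weighting reweights each arm back to the full
# population, recovering roughly the true +0.05.
y1_hat = np.mean(treated * engaged / propensity)
y0_hat = np.mean((1 - treated) * engaged / (1 - propensity))
ipw = y1_hat - y0_hat

print(f"True effect: +0.050   Naive: {naive:+.3f}   IPW: {ipw:+.3f}")
```

The same weighting logic, applied to StreamRec's logged recommendation probabilities, is the starting point for the methods Chapter 18 develops.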

Lessons

  1. Observed engagement is not causal impact. A large fraction of engagement with recommended items is organic — it would have happened without the recommendation. The naive attribution ($49M) dramatically overstates the algorithm's value ($29M).

  2. Prediction-optimized recommendations are not value-optimized. A model trained to predict engagement recommends items users would find anyway. A model trained to maximize incremental engagement recommends items that genuinely expand users' consumption — creating value for both the user (discovery) and the platform (engagement that would not otherwise have occurred).

  3. The confounding structure mirrors MediCore's. Just as disease severity confounds the drug-hospitalization relationship, user preference confounds the recommendation-engagement relationship. In both cases, the treatment (drug, recommendation) is assigned based on characteristics that independently predict the outcome (severity predicts hospitalization, preference predicts engagement). The naive comparison attributes confounded association to the treatment's causal effect.

  4. The causal question changes the product strategy. If the recommendation system's primary value is discovery (helping users find content they would not have encountered), then the product should emphasize diversity, surprise, and exploration. If its primary value is convenience (making it easier to access known preferences), the product should emphasize personalization and speed. The causal decomposition determines which product direction creates the most value.

Connection to the Progressive Project

The StreamRec causal question — "Does the recommendation cause engagement?" — is the defining question of Part III's progressive project. In Chapter 16, we will formalize this as a potential outcomes problem: $Y(1)$ is engagement when the item is recommended, $Y(0)$ is engagement when it is not, and the individual treatment effect $Y(1) - Y(0)$ is the recommendation's incremental value for that user-item pair. The fundamental problem (we observe only one of $Y(1)$ and $Y(0)$) makes this a causal inference challenge, not a prediction challenge. Chapters 17-19 will build the estimation and modeling machinery to answer it at scale.
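In the simulation, both potential outcomes are known by construction, so the individual treatment effect can be written down directly — something real logs never allow. A minimal sketch, using illustrative probabilities rather than the simulated values:

```python
import pandas as pd

# Illustrative rows standing in for the simulated interaction log
# (synthetic data is the only setting where both probabilities are known).
log = pd.DataFrame({
    "organic_prob":     [0.02, 0.10, 0.14],
    "incremental_prob": [0.12, 0.08, 0.02],
})

# E[Y(0)]: engagement probability without the recommendation.
y0 = log["organic_prob"]
# E[Y(1)]: with it. Organic engagement, plus the incremental effect when
# organic engagement would not have occurred (mirroring
# `engaged = organic or incremental` in the simulation).
y1 = y0 + (1 - y0) * log["incremental_prob"]

# Individual treatment effect for each user-item pair.
log["ite"] = y1 - y0
print(log)
```

Note how the ITE is largest for the first row: high incremental potential on an item the user was unlikely to find organically, which is precisely where the simulation says the recommendation creates value.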