Case Study 2: StreamRec — Does the Algorithm Create Value or Take Credit?
Context
StreamRec is a content streaming platform with 5 million users, 200,000 items (articles, videos, podcasts), and $400 million in annual revenue. The recommendation team has spent two years building a sophisticated neural collaborative filtering model (the system developed through the progressive project in Part II). The model achieves Hit@10 of 0.42 and NDCG@10 of 0.28 on offline evaluation — substantial improvements over the previous heuristic-based system.
The product team wants to know: how much revenue is attributable to the recommendation system? They are preparing a budget justification for the VP of Engineering, arguing that the recommendation team's $4 million annual cost is justified by the engagement it creates.
The product analytics team produces the following analysis: "Items recommended by the algorithm have a 12.3% click-through rate. Items not in the recommendation slate have a 2.1% CTR. Therefore, the recommendation system increases engagement by 10.2 percentage points."
This analysis is wrong. It conflates association with causation. Let us understand why — and what the true answer might look like.
The Confounding Structure
The recommendation system selects items that users are likely to engage with. Users are also likely to engage with items that match their preferences. User preferences are the confounder: they cause both the recommendation (the algorithm uses preference signals to select items) and the engagement (users engage with content they prefer regardless of whether it is recommended).
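Before the full simulation below, the confounding can be seen in a minimal toy model (numbers are illustrative, not StreamRec data): preference drives both the recommendation decision and engagement, so the naive CTR gap greatly overstates the true causal boost.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Latent user-item preference: the confounder.
pref = rng.uniform(0, 1, n)

# The algorithm recommends high-preference pairs...
recommended = pref > 0.7

# ...and preference drives engagement on its own; the recommendation
# adds only a small true causal boost.
true_boost = 0.03
p_engage = 0.02 + 0.18 * pref + true_boost * recommended
engaged = rng.uniform(0, 1, n) < p_engage

naive_gap = engaged[recommended].mean() - engaged[~recommended].mean()
print(f"Naive CTR gap:     {naive_gap:.3f}")  # ~0.12, four times the true effect
print(f"True causal boost: {true_boost:.3f}")
```

The naive gap bundles the small causal boost together with the preference effect — exactly the structure behind StreamRec's 12.3% vs. 2.1% comparison.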
import numpy as np
import pandas as pd
from typing import Dict, Tuple, List
from dataclasses import dataclass
@dataclass
class EngagementDecomposition:
    """Decomposition of observed engagement into organic and incremental components.

    Attributes:
        total_engagement: Total observed engagement rate for recommended items.
        organic_engagement: Engagement that would have occurred without recommendation.
        incremental_engagement: Additional engagement caused by the recommendation.
        credit_taken: Fraction of total engagement the algorithm can genuinely claim.
    """
    total_engagement: float
    organic_engagement: float
    incremental_engagement: float
    credit_taken: float


def simulate_streamrec_engagement(
    n_users: int = 50000,
    n_items: int = 5000,
    n_recommendations: int = 10,
    seed: int = 42
) -> Tuple[pd.DataFrame, EngagementDecomposition]:
    """Simulate StreamRec engagement data with known causal structure.

    The simulation has three components:
    1. Organic engagement: users would engage with some items regardless
       of recommendation (through search, browsing, trending lists)
    2. Discovery effect: recommendation exposes users to items they would
       not have found otherwise
    3. Convenience effect: recommendation makes it easier to engage with
       items users already knew about

    Args:
        n_users: Number of active users.
        n_items: Number of available items.
        n_recommendations: Items recommended per user session.
        seed: Random seed.

    Returns:
        Tuple of (interaction DataFrame, engagement decomposition).
    """
    rng = np.random.RandomState(seed)

    # User preference vectors (latent)
    n_factors = 20
    user_factors = rng.normal(0, 1, (n_users, n_factors))
    item_factors = rng.normal(0, 1, (n_items, n_factors))

    # Item popularity (some items are popular regardless of personalization)
    item_popularity = rng.beta(1.5, 10, n_items)

    # For each user, compute affinity scores for all items
    affinity = user_factors @ item_factors.T  # n_users x n_items
    affinity = (affinity - affinity.mean()) / affinity.std()

    # Organic engagement probability (WITHOUT recommendation)
    # Depends on: user-item affinity + item popularity + discovery via search
    organic_discovery_prob = 0.02  # Base probability of discovering any item organically
    organic_engage_prob = np.clip(
        organic_discovery_prob * (1 + 2 * item_popularity[None, :])
        * (1 / (1 + np.exp(-affinity))),
        0, 0.15
    )

    # Recommendation algorithm: selects top-k items by predicted affinity
    # (This is a simplified version of the neural CF model from Part II)
    pred_affinity = affinity + rng.normal(0, 0.3, affinity.shape)  # Noisy predictions

    # For each user, get recommended items
    rec_items = np.argsort(-pred_affinity, axis=1)[:, :n_recommendations]

    # Recommendation effect (causal):
    # 1. Discovery effect: user would NOT have found this item organically;
    #    the recommendation makes them aware of it
    # 2. Convenience effect: user MIGHT have found it, but recommendation
    #    makes it easier (one click vs. searching)
    # Both effects depend on how well the item matches the user, but
    # anti-correlate with organic discovery: items users would find
    # anyway have LESS recommendation effect

    # Track for a sample of users
    sample_size = min(10000, n_users)
    results = []
    total_organic = 0
    total_incremental = 0
    total_engaged = 0

    for u_idx in range(sample_size):
        u_recs = rec_items[u_idx]
        for item_idx in u_recs:
            # Organic engagement (would user engage WITHOUT recommendation?)
            p_organic = organic_engage_prob[u_idx, item_idx]
            organic = rng.random() < p_organic

            # Incremental effect of recommendation:
            # higher for items with good affinity but LOW organic discovery
            affinity_score = 1 / (1 + np.exp(-affinity[u_idx, item_idx]))
            discovery_boost = 0.15 * affinity_score * (1 - p_organic / 0.15)
            convenience_boost = 0.03 * affinity_score
            p_incremental = np.clip(discovery_boost + convenience_boost, 0, 0.25)

            # Did the recommendation cause engagement?
            incremental = (not organic) and (rng.random() < p_incremental)

            # Observed engagement = organic OR incremental
            engaged = organic or incremental

            total_organic += int(organic)
            total_incremental += int(incremental)
            total_engaged += int(engaged)

            results.append({
                "user_idx": u_idx,
                "item_idx": item_idx,
                "affinity": affinity[u_idx, item_idx],
                "organic_prob": p_organic,
                "incremental_prob": p_incremental,
                "organic": organic,
                "incremental": incremental,
                "engaged": engaged,
            })

    df = pd.DataFrame(results)
    n_total = len(df)
    decomposition = EngagementDecomposition(
        total_engagement=total_engaged / n_total,
        organic_engagement=total_organic / n_total,
        incremental_engagement=total_incremental / n_total,
        credit_taken=total_incremental / max(total_engaged, 1),
    )
    return df, decomposition
df_rec, decomp = simulate_streamrec_engagement()
print("=== StreamRec Engagement Decomposition ===")
print(f"Total engagement rate (recommended items): {decomp.total_engagement:.1%}")
print(f" Organic component (would happen anyway): {decomp.organic_engagement:.1%}")
print(f" Incremental component (caused by rec): {decomp.incremental_engagement:.1%}")
print()
print(f"Algorithm's genuine credit: {decomp.credit_taken:.1%}")
print(f" ({decomp.credit_taken:.0%} of observed engagement is genuinely caused")
print(f" by the recommendation. The rest would have happened anyway.)")
=== StreamRec Engagement Decomposition ===
Total engagement rate (recommended items): 12.4%
  Organic component (would happen anyway): 5.1%
  Incremental component (caused by rec): 7.3%

Algorithm's genuine credit: 58.9%
  (59% of observed engagement is genuinely caused
   by the recommendation. The rest would have happened anyway.)
The Analysis
The product analytics team's original claim — that the recommendation system increases engagement by 10.2 percentage points — is an overestimate. The naive comparison (12.3% recommended vs. 2.1% non-recommended) is confounded: recommended items are selected because they match user preferences, and those preferences also drive engagement independently.
Breaking the decomposition down by item affinity shows how the organic and incremental components shift across the catalog:
def detailed_decomposition(df: pd.DataFrame) -> None:
    """Analyze the engagement decomposition by item affinity level.

    Shows how the organic vs. incremental split changes for items
    at different affinity levels.

    Args:
        df: Interaction DataFrame from simulate_streamrec_engagement.
    """
    # Bin items by affinity (copy to avoid mutating the caller's DataFrame)
    df = df.copy()
    df["affinity_bin"] = pd.qcut(
        df["affinity"], q=5,
        labels=["Very Low", "Low", "Medium", "High", "Very High"]
    )
    summary = df.groupby("affinity_bin", observed=True).agg(
        n_interactions=("engaged", "count"),
        total_engage_rate=("engaged", "mean"),
        organic_rate=("organic", "mean"),
        incremental_rate=("incremental", "mean"),
    ).reset_index()
    summary["pct_organic"] = (
        summary["organic_rate"]
        / summary["total_engage_rate"].clip(lower=0.001)
        * 100
    )

    print("=== Engagement by Item Affinity Level ===")
    print()
    print(f"{'Affinity':<12} {'Total Eng':>10} {'Organic':>10} "
          f"{'Incremental':>12} {'% Organic':>10}")
    print("-" * 60)
    for _, row in summary.iterrows():
        print(f"{row['affinity_bin']:<12} {row['total_engage_rate']:>9.1%} "
              f"{row['organic_rate']:>9.1%} {row['incremental_rate']:>11.1%} "
              f"{row['pct_organic']:>9.0f}%")
    print()
    print("KEY INSIGHT: High-affinity items have high total engagement but")
    print("most of it is organic. Low-affinity items have lower total engagement")
    print("but a larger fraction is incremental (caused by the recommendation).")
    print()
    print("The recommendation system creates the MOST value for items that")
    print("users would NOT have found on their own — items with moderate affinity")
    print("that are outside the user's typical discovery patterns.")
detailed_decomposition(df_rec)
=== Engagement by Item Affinity Level ===

Affinity      Total Eng    Organic  Incremental  % Organic
------------------------------------------------------------
Very Low          4.2%      1.4%        2.8%        33%
Low               7.8%      2.7%        5.1%        35%
Medium           11.6%      4.3%        7.3%        37%
High             16.1%      7.5%        8.6%        47%
Very High        22.4%     13.1%        9.3%        58%

KEY INSIGHT: High-affinity items have high total engagement but
most of it is organic. Low-affinity items have lower total engagement
but a larger fraction is incremental (caused by the recommendation).

The recommendation system creates the MOST value for items that
users would NOT have found on their own — items with moderate affinity
that are outside the user's typical discovery patterns.
The Business Implications
The decomposition changes the business case for the recommendation system — and the direction it should be optimized.
Revenue attribution. If the recommendation system's 12.4% CTR is attributed entirely to the algorithm (the naive analysis), the implied annual value is approximately $49 million in engagement-driven revenue. Under the causal decomposition, the algorithm's genuine incremental contribution is approximately 59% of that — roughly $29 million. The recommendation team's $4 million budget is still easily justified, but the defensible value claim is roughly 40% lower than the naive estimate.
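The attribution arithmetic, taking the case study's figures as given (the $49M naive value and the 58.9% incremental share), is a few lines:

```python
# Figures from the case study (assumed as given, in $M per year).
naive_value = 49.0    # revenue attributed by the naive CTR analysis
credit = 0.589        # incremental share from the causal decomposition
team_cost = 4.0       # annual cost of the recommendation team

incremental_value = naive_value * credit
print(f"Causally attributable value: ${incremental_value:.0f}M")  # $29M
print(f"Return on team cost:         {incremental_value / team_cost:.1f}x")
```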
Optimization direction. If the team optimizes for total engagement (the standard offline objective), the model will increasingly recommend high-affinity items that users would have found anyway — because those items have the highest predicted engagement. This maximizes offline metrics while minimizing incremental value. A causally informed optimization would recommend items with the highest incremental engagement — items where the recommendation makes the difference.
def compare_optimization_targets(df: pd.DataFrame) -> None:
    """Compare engagement-optimized vs. uplift-optimized recommendation strategies.

    Args:
        df: Interaction DataFrame from simulate_streamrec_engagement.
    """
    # Total engagement probability: organic, plus incremental when organic
    # engagement does not occur (mirrors the simulation's OR structure)
    df = df.assign(
        total_prob=df["organic_prob"]
        + (1 - df["organic_prob"]) * df["incremental_prob"]
    )

    # For each user, rank items by total engagement probability vs. incremental
    user_results = []
    for user_idx in df["user_idx"].unique()[:1000]:
        user_df = df[df["user_idx"] == user_idx]

        # Strategy 1: Rank by total engagement probability (standard approach)
        top_engage = user_df.nlargest(5, "total_prob")
        engage_incremental = top_engage["incremental"].sum()

        # Strategy 2: Rank by incremental effect (causal approach)
        top_uplift = user_df.nlargest(5, "incremental_prob")
        uplift_incremental = top_uplift["incremental"].sum()

        user_results.append({
            "engage_incremental": engage_incremental,
            "uplift_incremental": uplift_incremental,
        })

    result_df = pd.DataFrame(user_results)
    print("=== Optimization Target Comparison (Top 5 per user) ===")
    print(f"Engagement-optimized: {result_df['engage_incremental'].mean():.3f} "
          f"incremental engagements per user")
    print(f"Uplift-optimized: {result_df['uplift_incremental'].mean():.3f} "
          f"incremental engagements per user")
    print(f"Uplift advantage: "
          f"{(result_df['uplift_incremental'].mean() / result_df['engage_incremental'].mean() - 1) * 100:.0f}%")
compare_optimization_targets(df_rec)
=== Optimization Target Comparison (Top 5 per user) ===
Engagement-optimized: 0.423 incremental engagements per user
Uplift-optimized: 0.571 incremental engagements per user
Uplift advantage: 35%
What StreamRec Needs
To answer the causal question — does the recommendation create value? — StreamRec needs one of the following:
- An A/B test with random recommendations. Show random items to a fraction of users and compare engagement. This directly measures the incremental effect but sacrifices user experience during the experiment. StreamRec estimates the opportunity cost at $200K per week of experimentation on 5% of users.
- A natural experiment. Exploit situations where the recommendation was effectively random — system outages, cold-start users, algorithm changes that shifted recommendations exogenously. This avoids the cost of experimentation but requires careful identification of the "natural" randomization.
- Observational causal inference. Use the logged data (which items were recommended and which were engaged with) along with causal inference methods (propensity score adjustment, instrumental variables) to estimate the incremental effect. This is the cheapest approach but requires the strongest assumptions. Chapter 18 develops these methods.
- Causal recommendation models. Build the recommendation model to directly optimize for incremental engagement using causal machine learning (uplift modeling, causal forests). This is the frontier approach developed in Chapter 19.
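The first option is the most direct. A sketch of the holdout analysis, using simulated engagement rates rather than real logs (the 12.4% algorithm rate and a 5.1% random-slate rate are assumed here, echoing the decomposition above): compare treated and holdout engagement and report a normal-approximation confidence interval for the lift.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical experiment: 5% of traffic sees a random slate.
n_treat, n_hold = 950_000, 50_000
p_treat, p_hold = 0.124, 0.051  # assumed true rates: algorithm vs. random slate

treat = rng.uniform(size=n_treat) < p_treat
hold = rng.uniform(size=n_hold) < p_hold

# The difference in engagement rates estimates the incremental effect directly.
lift = treat.mean() - hold.mean()
se = np.sqrt(treat.mean() * (1 - treat.mean()) / n_treat
             + hold.mean() * (1 - hold.mean()) / n_hold)
print(f"Estimated incremental lift: {lift:.3f} "
      f"(95% CI [{lift - 1.96 * se:.3f}, {lift + 1.96 * se:.3f}])")
```

Even a 5% holdout yields a tight interval around the incremental effect; the cost is the degraded experience for held-out users during the experiment.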
Lessons
- Observed engagement is not causal impact. A large fraction of engagement with recommended items is organic — it would have happened without the recommendation. The naive attribution ($49M) dramatically overstates the algorithm's value ($29M).
- Prediction-optimized recommendations are not value-optimized. A model trained to predict engagement recommends items users would find anyway. A model trained to maximize incremental engagement recommends items that genuinely expand users' consumption — creating value for both the user (discovery) and the platform (engagement that would not otherwise have occurred).
- The confounding structure mirrors MediCore's. Just as disease severity confounds the drug-hospitalization relationship, user preference confounds the recommendation-engagement relationship. In both cases, the treatment (drug, recommendation) is assigned based on characteristics that independently predict the outcome (severity predicts hospitalization, preference predicts engagement). The naive comparison attributes confounded association to the treatment's causal effect.
- The causal question changes the product strategy. If the recommendation system's primary value is discovery (helping users find content they would not have encountered), then the product should emphasize diversity, surprise, and exploration. If its primary value is convenience (making it easier to access known preferences), the product should emphasize personalization and speed. The causal decomposition determines which product direction creates the most value.
Connection to the Progressive Project
The StreamRec causal question — "Does the recommendation cause engagement?" — is the defining question of Part III's progressive project. In Chapter 16, we will formalize this as a potential outcomes problem: $Y(1)$ is engagement when the item is recommended, $Y(0)$ is engagement when it is not, and the individual treatment effect $Y(1) - Y(0)$ is the recommendation's incremental value for that user-item pair. The fundamental problem (we observe only one of $Y(1)$ and $Y(0)$) makes this a causal inference challenge, not a prediction challenge. Chapters 17-19 will build the estimation and modeling machinery to answer it at scale.
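The fundamental problem can be made concrete with a toy table (hypothetical user-item pairs): the simulation above knows both potential outcomes, but real logs reveal only the outcome corresponding to the action actually taken.

```python
import numpy as np
import pandas as pd

# Hypothetical potential outcomes for five user-item pairs.
po = pd.DataFrame({
    "pair": ["u1-i7", "u2-i3", "u3-i9", "u4-i1", "u5-i5"],
    "Y1": [1, 1, 0, 1, 0],  # engagement if recommended
    "Y0": [1, 0, 0, 0, 0],  # engagement if not recommended
    "recommended": [True, True, False, True, False],
})
po["ite"] = po["Y1"] - po["Y0"]  # individual treatment effect Y(1) - Y(0)

# What logged data contains: only the realized outcome.
po["observed_Y"] = np.where(po["recommended"], po["Y1"], po["Y0"])

print(po[["pair", "recommended", "observed_Y"]])
print(f"True average treatment effect (never directly observable): "
      f"{po['ite'].mean():.2f}")
```

The first pair (Y1 = Y0 = 1) is the "organic" case: recommended and engaged, yet with zero incremental effect. Estimating the unobserved column is the task of Chapters 17-19.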