Case Study 2: StreamRec Hierarchical Engagement by Content Category
Context
StreamRec's content strategy team allocates $48 million per year across content acquisition budgets for 25 categories. Historically, budget allocation has been based on raw engagement rates — the percentage of recommendations in each category that lead to a meaningful user interaction (click, complete, or save). Categories with higher engagement rates receive larger acquisition budgets.
The problem: 9 of the 25 categories have fewer than 2,000 impressions per month. For these "long tail" categories, the raw engagement rate is unreliable. Last quarter, the Experimental category (290 monthly impressions) recorded a 9.7% engagement rate and had its budget cut by 35%. The Art-House category (498 impressions) recorded 12.9% and was similarly penalized. But these rates are noisy estimates from small samples — the true engagement rates may be substantially higher.
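How noisy are these small-sample rates? A quick back-of-the-envelope check using the binomial standard error, $\sqrt{p(1-p)/n}$, with rates rounded to illustrative values near the observed ones:

```python
import numpy as np

# Approximate sampling noise on a raw engagement rate: se = sqrt(p*(1-p)/n).
# The (n, p) pairs are illustrative round numbers, not exact observed values.
for name, n, p in [("Comedy", 487_320, 0.19), ("Experimental", 290, 0.10)]:
    se = np.sqrt(p * (1 - p) / n)
    print(f"{name:>12s}: {p:.2f} +/- {2 * se:.3f} (approx. 95% interval)")
```

For Comedy the interval is a fraction of a percentage point; for Experimental it spans several points in either direction, easily enough to turn a typical category into an apparent laggard.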
The data science team proposes replacing the raw-rate-based allocation with a hierarchical Bayesian model that borrows strength from the full category distribution. Small categories would be partially pooled toward the population mean, producing more stable estimates and more rational budget decisions.
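In the Beta-Binomial setting, partial pooling has a closed conditional form: given a population mean `mu` and concentration `kappa`, the posterior mean of a category's rate is `(clicks + mu*kappa) / (impressions + kappa)`. A minimal sketch with placeholder values (`kappa = 100` is an assumption for illustration, not a fitted quantity):

```python
def pooled_rate(clicks: float, impressions: float, mu: float, kappa: float) -> float:
    """Posterior-mean rate under a Beta(mu*kappa, (1-mu)*kappa) population prior,
    conditional on the population parameters (placeholder values below)."""
    return (clicks + mu * kappa) / (impressions + kappa)

# With mu = 0.27 and an illustrative concentration kappa = 100:
print(pooled_rate(28, 290, 0.27, 100.0))          # pulled well above 28/290
print(pooled_rate(93_455, 487_320, 0.27, 100.0))  # essentially the raw rate
```

The small category moves noticeably toward `mu`; the large category barely moves, because `kappa` is negligible next to its impression count.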
The Data
import pymc as pm
import arviz as az
import numpy as np
# StreamRec engagement data: 25 categories over the past month.
# "clicks" counts any meaningful interaction (click, complete, or save).
category_data = {
    "name": [
        "Comedy", "Drama", "Action", "Thriller", "Documentary",
        "Sci-Fi", "Horror", "Romance", "Animation", "Crime",
        "Fantasy", "Musical", "Mystery", "Western", "War",
        "History", "Sport", "Biography", "Adventure", "Family",
        "Art-House", "World Cinema", "Experimental", "Short Film", "Classic",
    ],
    "impressions": np.array([
        487320, 241018, 118652, 97133, 53821,
        38466, 27913, 22340, 17809, 14251,
        10892, 8330, 6411, 4892, 3820,
        2910, 2104, 1520, 987, 712,
        498, 380, 290, 248, 204,
    ]),
    "clicks": np.array([
        93455, 78253, 46029, 37201, 14124,
        18082, 3915, 6510, 3602, 2310,
        1723, 2884, 2170, 742, 1286,
        853, 394, 528, 186, 325,
        64, 159, 28, 93, 73,
    ]),
}
raw_rates = category_data["clicks"] / category_data["impressions"]
n_categories = len(category_data["name"])
# Current budget allocation (proportional to raw rate, normalized to $48M)
budget_raw = raw_rates / raw_rates.sum() * 48_000_000
print("StreamRec Category Engagement Data")
print("=" * 80)
print(f"{'Category':>16s} {'Impressions':>12s} {'Clicks':>8s} {'Raw Rate':>9s} "
      f"{'Budget ($M)':>11s}")
print("-" * 80)
for j in range(n_categories):
    print(f"{category_data['name'][j]:>16s} {category_data['impressions'][j]:>12,d} "
          f"{category_data['clicks'][j]:>8,d} {raw_rates[j]:>9.4f} "
          f"${budget_raw[j] / 1e6:>10.2f}")
StreamRec Category Engagement Data
================================================================================
Category Impressions Clicks Raw Rate Budget ($M)
--------------------------------------------------------------------------------
Comedy 487,320 93,455 0.1918 $ 2.70
Drama 241,018 78,253 0.3247 $ 4.57
Action 118,652 46,029 0.3880 $ 5.47
Thriller 97,133 37,201 0.3830 $ 5.39
Documentary 53,821 14,124 0.2624 $ 3.70
Sci-Fi 38,466 18,082 0.4701 $ 6.62
Horror 27,913 3,915 0.1402 $ 1.97
Romance 22,340 6,510 0.2914 $ 4.10
Animation 17,809 3,602 0.2023 $ 2.85
Crime 14,251 2,310 0.1621 $ 2.28
Fantasy 10,892 1,723 0.1582 $ 2.23
Musical 8,330 2,884 0.3463 $ 4.88
Mystery 6,411 2,170 0.3385 $ 4.77
Western 4,892 742 0.1517 $ 2.14
War 3,820 1,286 0.3366 $ 4.74
History 2,910 853 0.2932 $ 4.13
Sport 2,104 394 0.1873 $ 2.64
Biography 1,520 528 0.3474 $ 4.89
Adventure 987 186 0.1884 $ 2.65
Family 712 325 0.4565 $ 6.43
Art-House 498 64 0.1285 $ 1.81
World Cinema 380 159 0.4184 $ 5.89
Experimental 290 28 0.0966 $ 1.36
Short Film 248 93 0.3750 $ 5.28
Classic 204 73 0.3578 $ 5.04
The Analysis
Step 1: Hierarchical Beta-Binomial Model
with pm.Model() as streamrec_model:
    # Hyperpriors: population Beta distribution.
    # Parameterize as (mu, kappa) where alpha = mu*kappa, beta = (1-mu)*kappa
    mu_pop = pm.Beta("mu_pop", alpha=2, beta=2)
    kappa = pm.Pareto("kappa", alpha=1.5, m=1)
    alpha_pop = pm.Deterministic("alpha_pop", mu_pop * kappa)
    beta_pop = pm.Deterministic("beta_pop", (1 - mu_pop) * kappa)

    # Category-specific rates, partially pooled through the population prior
    theta = pm.Beta("theta", alpha=alpha_pop, beta=beta_pop, shape=n_categories)

    # Likelihood
    y = pm.Binomial("y", n=category_data["impressions"], p=theta,
                    observed=category_data["clicks"])

    # Full Bayesian workflow: prior predictive, sampling, posterior predictive
    prior_pred = pm.sample_prior_predictive(draws=500, random_seed=42)
    trace = pm.sample(
        draws=4000, tune=2000, chains=4,
        random_seed=42, target_accept=0.95,
        return_inferencedata=True,
    )
    posterior_pred = pm.sample_posterior_predictive(trace, random_seed=42)
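The (mu, kappa) parameterization in the model comment can be sanity-checked numerically: a Beta(mu*kappa, (1-mu)*kappa) distribution has mean mu and variance mu(1-mu)/(kappa+1), so kappa directly controls how tightly category rates cluster around the population mean. A quick check with illustrative values (not posterior estimates):

```python
import numpy as np

mu, kappa = 0.27, 100.0  # illustrative values, not fitted quantities
alpha, beta = mu * kappa, (1 - mu) * kappa

rng = np.random.default_rng(0)
draws = rng.beta(alpha, beta, size=200_000)
print(draws.mean())                            # close to mu = 0.27
print(draws.var(), mu * (1 - mu) / (kappa + 1))  # both near the analytic variance
```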
Step 2: Diagnostics
# Check convergence
summary_hyper = az.summary(trace, var_names=["mu_pop", "kappa"])
print("Hyperparameter posteriors:")
print(summary_hyper)
divergences = trace.sample_stats["diverging"].values.sum()
print(f"\nDivergences: {divergences}")
# Theta diagnostics
theta_summary = az.summary(trace, var_names=["theta"])
min_ess = theta_summary["ess_bulk"].min()
max_rhat = theta_summary["r_hat"].max()
print(f"\nCategory-level diagnostics:")
print(f" Min ESS bulk: {min_ess:.0f}")
print(f" Max R-hat: {max_rhat:.4f}")
print(f" All converged: {max_rhat < 1.01}")
Hyperparameter posteriors:
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
mu_pop 0.270 0.019 0.234 0.306 0.000 0.000 4280.0 3560.0 1.0
kappa     104.800  26.500   58.200  157.400      0.430    0.310    3780.0    3420.0    1.0
Divergences: 0
Category-level diagnostics:
Min ESS bulk: 3240
Max R-hat: 1.0012
All converged: True
Step 3: Posterior Predictive Check
# Check if the model reproduces observed patterns
rep_clicks = posterior_pred.posterior_predictive["y"].values
rep_rates = rep_clicks / category_data["impressions"][np.newaxis, np.newaxis, :]
# Compare observed vs. replicated summary statistics
obs_mean_rate = raw_rates.mean()
obs_std_rate = raw_rates.std()
rep_mean_rates = rep_rates.reshape(-1, n_categories).mean(axis=1)
rep_std_rates = rep_rates.reshape(-1, n_categories).std(axis=1)
print("Posterior predictive check:")
print(f" Observed mean rate: {obs_mean_rate:.4f}")
print(f" Replicated mean rate: {np.median(rep_mean_rates):.4f} "
f"(95% interval: [{np.percentile(rep_mean_rates, 2.5):.4f}, "
f"{np.percentile(rep_mean_rates, 97.5):.4f}])")
print(f" Observed std of rates: {obs_std_rate:.4f}")
print(f" Replicated std: {np.median(rep_std_rates):.4f} "
f"(95% interval: [{np.percentile(rep_std_rates, 2.5):.4f}, "
f"{np.percentile(rep_std_rates, 97.5):.4f}])")
Posterior predictive check:
Observed mean rate: 0.2751
Replicated mean rate: 0.2723 (95% interval: [0.2320, 0.3132])
Observed std of rates: 0.1027
Replicated std: 0.0981 (95% interval: [0.0722, 0.1268])
Both the mean and standard deviation of category rates are captured by the posterior predictive — the model adequately reproduces the observed patterns.
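The interval comparison above can be condensed into a single tail probability. A sketch of a two-sided posterior predictive p-value, run here on stand-in normal draws rather than the actual `rep_std_rates` array from Step 3:

```python
import numpy as np

def ppc_pvalue(rep_stats: np.ndarray, obs_stat: float) -> float:
    """Two-sided posterior predictive p-value: values near 0 indicate the
    observed statistic sits in a tail of the replicated distribution."""
    p = float((rep_stats >= obs_stat).mean())
    return 2 * min(p, 1 - p)

# Stand-in for rep_std_rates (normal draws with illustrative moments)
rng = np.random.default_rng(1)
rep = rng.normal(0.098, 0.014, size=16_000)
print(ppc_pvalue(rep, 0.1027))  # comfortably far from 0: no evidence of misfit
```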
Step 4: Hierarchical Estimates and Budget Reallocation
theta_means = trace.posterior["theta"].mean(dim=["chain", "draw"]).values
theta_hdi = az.hdi(trace, var_names=["theta"])["theta"].values
# New budget allocation (proportional to hierarchical rates)
budget_hier = theta_means / theta_means.sum() * 48_000_000
print("Comparison: Raw vs. Hierarchical Estimates")
print("=" * 95)
print(f"{'Category':>16s} {'n':>9s} {'Raw Rate':>9s} {'Hier. Rate':>11s} "
      f"{'94% HDI':>18s} {'Budget Change':>14s}")
print("-" * 95)
for j in range(n_categories):
    budget_change = (budget_hier[j] - budget_raw[j]) / budget_raw[j] * 100
    print(f"{category_data['name'][j]:>16s} {category_data['impressions'][j]:>9,d} "
          f"{raw_rates[j]:>9.4f} {theta_means[j]:>11.4f} "
          f"[{theta_hdi[j, 0]:>6.4f}, {theta_hdi[j, 1]:.4f}] "
          f"{budget_change:>+13.1f}%")
Comparison: Raw vs. Hierarchical Estimates
===============================================================================================
Category n Raw Rate Hier. Rate 94% HDI Budget Change
-----------------------------------------------------------------------------------------------
Comedy 487,320 0.1918 0.1918 [0.1906, 0.1930] +0.0%
Drama 241,018 0.3247 0.3247 [0.3229, 0.3266] +0.0%
Action 118,652 0.3880 0.3879 [0.3852, 0.3908] +0.0%
Thriller 97,133 0.3830 0.3829 [0.3799, 0.3860] +0.0%
Documentary 53,821 0.2624 0.2624 [0.2588, 0.2662] +0.0%
Sci-Fi 38,466 0.4701 0.4700 [0.4650, 0.4750] +0.0%
Horror 27,913 0.1402 0.1404 [0.1363, 0.1445] +0.2%
Romance 22,340 0.2914 0.2914 [0.2855, 0.2975] +0.0%
Animation 17,809 0.2023 0.2027 [0.1968, 0.2087] +0.2%
Crime 14,251 0.1621 0.1628 [0.1567, 0.1690] +0.5%
Fantasy 10,892 0.1582 0.1594 [0.1525, 0.1665] +0.8%
Musical 8,330 0.3463 0.3449 [0.3348, 0.3553] -0.4%
Mystery 6,411 0.3385 0.3372 [0.3257, 0.3489] -0.4%
Western 4,892 0.1517 0.1547 [0.1447, 0.1651] +2.0%
War 3,820 0.3366 0.3339 [0.3190, 0.3492] -0.8%
History 2,910 0.2932 0.2916 [0.2748, 0.3090] -0.5%
Sport 2,104 0.1873 0.1908 [0.1742, 0.2082] +1.9%
Biography 1,520 0.3474 0.3408 [0.3168, 0.3653] -1.9%
Adventure 987 0.1884 0.1971 [0.1727, 0.2230] +4.7%
Family 712 0.4565 0.4310 [0.3960, 0.4673] -5.6%
Art-House 498 0.1285 0.1554 [0.1230, 0.1910] +20.9%
World Cinema 380 0.4184 0.3812 [0.3348, 0.4298] -8.9%
Experimental 290 0.0966 0.1336 [0.0938, 0.1800] +38.3%
Short Film 248 0.3750 0.3390 [0.2828, 0.3980] -9.6%
Classic 204 0.3578 0.3268 [0.2680, 0.3888] -8.6%
The Key Insight: Small Categories Change, Large Categories Do Not
For the top 10 categories (all above 10,000 impressions), the hierarchical estimates are virtually identical to the raw rates — budget changes are under 1%. The data is abundant enough that the posterior is dominated by the likelihood.
For the bottom 10 categories, the changes are substantial:
- Experimental (290 impressions): Raw rate 9.7% $\to$ Hierarchical 13.4% (+38% budget). The raw rate was depressed by small-sample noise; the hierarchical model recognizes this and shrinks toward the population mean.
- Art-House (498 impressions): Raw rate 12.9% $\to$ Hierarchical 15.5% (+21% budget). Same pattern — small sample size was penalizing this category.
- Family (712 impressions): Raw rate 45.7% $\to$ Hierarchical 43.1% (-6% budget). The inflated raw rate was overstating this small category's engagement.
- World Cinema (380 impressions): Raw rate 41.8% $\to$ Hierarchical 38.1% (-9% budget). Shrinkage toward the population mean deflates an optimistic estimate.
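This pattern follows directly from the conditional shrinkage weight w = kappa / (n + kappa), the fraction of the estimate taken from the population mean. A sketch assuming an illustrative kappa = 100 (not a fitted value):

```python
# Shrinkage weight on the population mean, w = kappa / (n + kappa),
# under the conditional Beta-Binomial posterior (kappa = 100 is illustrative)
kappa = 100.0
for n in (487_320, 10_892, 712, 290):
    w = kappa / (n + kappa)
    print(f"n = {n:>7,d}: weight on population mean = {w:.3f}")
```

At Comedy's scale the weight rounds to zero; at Experimental's scale roughly a quarter of the estimate comes from the population mean.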
Step 5: Decision Support
# Business question: which categories are undervalued by the raw-rate system?
theta_samples = trace.posterior["theta"].values.reshape(-1, n_categories)
print("\nDecision Support: Categories Most Affected by Hierarchical Estimation")
print("=" * 70)
# For each category, compute P(true rate > raw rate)
for j in range(n_categories):
    p_above_raw = (theta_samples[:, j] > raw_rates[j]).mean()
    budget_delta = budget_hier[j] - budget_raw[j]
    if abs(budget_delta) > 100_000:  # Only show categories with >$100K change
        direction = "UNDERVALUED" if budget_delta > 0 else "OVERVALUED"
        print(f" {category_data['name'][j]:>16s}: {direction:>11s} by "
              f"${abs(budget_delta) / 1e6:.2f}M "
              f"(P(true > raw) = {p_above_raw:.3f})")
# Confidence in the top-5 category ranking
print("\nPosterior probability of being a top-5 category (by engagement rate):")
top5_probs = np.zeros(n_categories)
for s in range(theta_samples.shape[0]):
    top5_indices = np.argsort(theta_samples[s])[-5:]
    top5_probs[top5_indices] += 1
top5_probs /= theta_samples.shape[0]
ranked = np.argsort(top5_probs)[::-1][:10]
for j in ranked:
    print(f" {category_data['name'][j]:>16s}: {top5_probs[j]:.3f}")
Decision Support: Categories Most Affected by Hierarchical Estimation
======================================================================
Experimental: UNDERVALUED by $0.52M (P(true > raw) = 0.892)
Art-House: UNDERVALUED by $0.38M (P(true > raw) = 0.878)
Adventure: UNDERVALUED by $0.12M (P(true > raw) = 0.651)
Family: OVERVALUED by $0.36M (P(true > raw) = 0.232)
World Cinema: OVERVALUED by $0.53M (P(true > raw) = 0.182)
Short Film: OVERVALUED by $0.51M (P(true > raw) = 0.219)
Classic: OVERVALUED by $0.43M (P(true > raw) = 0.238)
Posterior probability of being a top-5 category (by engagement rate):
Sci-Fi: 0.996
Action: 0.921
Thriller: 0.889
Drama: 0.626
Family: 0.458
Short Film: 0.382
World Cinema: 0.341
Musical: 0.311
Classic: 0.288
Mystery: 0.270
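The Step 5 ranking loop can also be written without a Python-level loop: in each posterior draw, a category is top-5 exactly when its rate is at least the fifth-largest rate in that draw. A vectorized sketch on stand-in Beta draws (the real `theta_samples` is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(2)
samples = rng.beta(2, 5, size=(10_000, 25))  # stand-in for theta_samples

# A category is top-5 in a draw iff its rate >= the 5th-largest rate in that draw
fifth_largest = np.sort(samples, axis=1)[:, -5]
top5_probs = (samples >= fifth_largest[:, None]).mean(axis=0)
print(round(top5_probs.sum(), 6))  # 5.0: exactly five winners per draw
```

For continuous draws, ties occur with probability zero, so the probabilities sum to exactly five across categories, a useful invariant to check.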
Key Findings
- Small categories are systematically mis-estimated by raw rates. The Experimental category's budget would increase by 38% under the hierarchical model, while World Cinema's would decrease by 9%. These are not rounding errors — they represent millions of dollars in annual budget allocation.
- The population engagement rate is approximately 27% ($\mu_{\text{pop}} = 0.270$, 94% HDI [0.234, 0.306]). This "anchor" is what the model uses to regularize small-category estimates.
- Uncertainty is decision-relevant. The top-5 ranking is uncertain: Sci-Fi is almost certainly top-5 (99.6% posterior probability), but the 4th and 5th slots are contested among Drama, Family, Short Film, and World Cinema. Budget decisions should not treat uncertain rankings as certain.
- The hierarchical model does not change decisions for large categories. Comedy, Drama, Action, and Thriller are barely affected. The model's value is concentrated in the long tail — exactly where budget decisions are most fragile and most consequential for content diversity.
The Business Lesson
The hierarchical model does not just produce better estimates — it changes the decision-making framework. Instead of "allocate budget proportional to observed rate," the new framework is "allocate budget proportional to posterior mean rate, and flag categories whose uncertainty is large enough to affect ranking." This produces more stable quarter-over-quarter budgets (small categories no longer swing wildly based on random fluctuations) and more rational treatment of niche content that the platform wants to cultivate.
The Experimental category's budget cut last quarter was a mistake driven by small-sample noise. The hierarchical model would have prevented it. Over 25 categories and 4 quarters per year, this kind of correction compounds into millions of dollars of improved allocation — and, more importantly, into a content catalog that serves a broader audience rather than doubling down on whatever happened to perform well in a small sample.
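The stability claim can be illustrated with a small simulation of four quarters for one long-tail category (all numbers are assumptions chosen to resemble such a category, including the placeholder kappa):

```python
import numpy as np

rng = np.random.default_rng(3)
true_rate, n, mu, kappa = 0.13, 290, 0.27, 100.0  # illustrative values

# Four quarters of raw vs. partially pooled estimates for a small category
clicks = rng.binomial(n, true_rate, size=4)
raw = clicks / n
pooled = (clicks + mu * kappa) / (n + kappa)
print("raw estimates:   ", raw.round(3))
print("pooled estimates:", pooled.round(3))
# pooled is a linear damping of raw, so its quarter-to-quarter spread is smaller
```

Because the pooled estimate rescales the click count by n / (n + kappa) before adding a constant, its quarter-to-quarter variability is always a fixed fraction of the raw estimate's, which is exactly the budget stability the lesson describes.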