Case Study 2: StreamRec Engagement Forecasting — Multi-Seasonality, Walk-Forward Validation, and Model Comparison

Context

StreamRec's operations team needs 30-day engagement forecasts for capacity planning and content acquisition decisions. When daily active users (DAU) drop below a category-specific threshold, the platform triggers promotional campaigns; when DAU exceeds the threshold, additional CDN capacity must be provisioned. In both cases, the cost of surprise is high — missed promotions waste content budgets, and insufficient CDN capacity degrades user experience.

The current system uses a 30-day trailing average with manual adjustments for holidays. The operations manager describes it as "mostly fine, except when it isn't — and when it isn't, we lose money." The data science team proposes replacing the heuristic with a proper forecasting system that provides both point predictions and prediction intervals, evaluated rigorously via walk-forward validation.

The team will compare three approaches: Prophet (the current industry default), a Bayesian structural time series model, and a Temporal Fusion Transformer trained as a global model across content categories. The comedy category — StreamRec's highest-traffic vertical — serves as the primary evaluation target, with the global TFT model evaluated across all eight categories.

The Data

import numpy as np
import pandas as pd
from typing import Dict
from dataclasses import dataclass


def generate_streamrec_comedy_engagement(
    n_days: int = 1095,
    seed: int = 42,
) -> pd.DataFrame:
    """Generate realistic daily engagement data for StreamRec comedy.

    The series exhibits:
    - Day-of-week seasonality: weekends 18% higher
    - Annual seasonality: summer dip, winter peak
    - Holiday spikes: Christmas, New Year, Thanksgiving
    - Slow upward trend: +0.01 per 100 days
    - A regime change at day 730 (podcast vertical launch causes
      a temporary 2-week engagement dip as attention shifts)
    - ~5% missing values from system outages

    Args:
        n_days: Number of days to simulate.
        seed: Random seed.

    Returns:
        DataFrame with date, engagement_rate, and covariates.
    """
    np.random.seed(seed)
    dates = pd.date_range("2022-01-01", periods=n_days, freq="D")
    t = np.arange(n_days, dtype=float)

    # Base engagement rate
    base = 0.48

    # Trend: slow linear increase
    trend = 0.0001 * t

    # Day-of-week seasonality
    dow = np.array([d.dayofweek for d in dates])
    dow_effect = np.where(dow >= 5, 0.09, 0.0)  # Weekend boost (~18% relative)
    dow_effect += np.where(dow == 4, 0.01, 0.0)  # Friday partial boost

    # Annual seasonality: peak in Dec-Jan, dip in Jun-Jul
    annual = 0.035 * np.cos(2 * np.pi * (t - 350) / 365.25)

    # Monthly sub-pattern: slightly higher engagement at month boundaries
    dom = np.array([d.day for d in dates])
    month_effect = 0.005 * np.where((dom <= 3) | (dom >= 28), 1.0, 0.0)

    # Holiday effects
    month_arr = np.array([d.month for d in dates])
    day_arr = np.array([d.day for d in dates])
    is_christmas = (month_arr == 12) & (day_arr >= 23) & (day_arr <= 31)
    is_newyear = (month_arr == 1) & (day_arr <= 3)
    is_thanksgiving = (month_arr == 11) & (dow == 3) & (day_arr >= 22) & (day_arr <= 28)
    is_july4 = (month_arr == 7) & (day_arr >= 3) & (day_arr <= 5)
    holiday_effect = (
        0.06 * is_christmas + 0.05 * is_newyear
        + 0.04 * is_thanksgiving + 0.03 * is_july4
    ).astype(float)
    is_holiday = (is_christmas | is_newyear | is_thanksgiving | is_july4).astype(int)

    # Regime change: podcast launch at day 730 causes 2-week engagement dip
    regime_dip = np.where(
        (t >= 730) & (t < 744),
        -0.03 * (1 - (t - 730) / 14.0),  # Linearly recovering dip
        0.0,
    )

    # Noise: slightly heteroscedastic (weekends noisier)
    noise_scale = np.where(dow >= 5, 0.025, 0.018)
    noise = noise_scale * np.random.randn(n_days)

    engagement = np.clip(
        base + trend + dow_effect + annual + month_effect
        + holiday_effect + regime_dip + noise,
        0.0, 1.0,
    )

    # Missing values (~5%)
    missing_mask = np.random.rand(n_days) < 0.05
    engagement_observed = engagement.copy()
    engagement_observed[missing_mask] = np.nan

    df = pd.DataFrame({
        "date": dates,
        "time_idx": np.arange(n_days),
        "engagement_rate": engagement_observed,
        "engagement_true": engagement,
        "day_of_week": dow,
        "month": month_arr,
        "day_of_month": day_arr,
        "is_weekend": (dow >= 5).astype(int),
        "is_holiday": is_holiday,
    })
    return df


comedy_df = generate_streamrec_comedy_engagement()
print(f"Date range: {comedy_df['date'].iloc[0].date()} to {comedy_df['date'].iloc[-1].date()}")
print(f"Observations: {len(comedy_df)} ({comedy_df['engagement_rate'].notna().sum()} non-missing)")
print(f"Mean engagement: {comedy_df['engagement_rate'].mean():.4f}")
print(f"Weekend mean: {comedy_df.loc[comedy_df['is_weekend']==1, 'engagement_rate'].mean():.4f}")
print(f"Weekday mean: {comedy_df.loc[comedy_df['is_weekend']==0, 'engagement_rate'].mean():.4f}")
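The ~5% outage gaps need an explicit policy downstream: Prophet simply drops missing rows, the BSTS Kalman filter skips the update step, and the TFT needs complete windows. One simple option for the latter is seasonal-naive imputation: fill a gap with the value from the same seasonal position one period earlier. A minimal sketch on a toy series (the `fill_seasonal_naive` helper is illustrative, not part of the case-study pipeline):

```python
import numpy as np
import pandas as pd


def fill_seasonal_naive(series: pd.Series, period: int = 7) -> pd.Series:
    """Fill each NaN with the most recent non-missing value at the
    same seasonal position (same weekday when period=7)."""
    values = series.to_numpy(dtype=float, copy=True)
    for i in np.where(np.isnan(values))[0]:
        j = i - period
        while j >= 0 and np.isnan(values[j]):
            j -= period  # Walk back one full period at a time
        if j >= 0:
            values[i] = values[j]
    return pd.Series(values, index=series.index)


# Toy example: two weeks of engagement with one mid-week outage
s = pd.Series([0.50, 0.48, 0.47, 0.49, 0.51, 0.55, 0.56,
               0.50, 0.48, np.nan, 0.49, 0.51, 0.55, 0.56])
filled = fill_seasonal_naive(s)
print(filled.iloc[9])  # -> 0.47 (same weekday, one week earlier)
```

Leading gaps with no earlier same-weekday value stay NaN, which is the honest choice: inventing values at the start of the series would only mislead the models.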

Approach 1: Prophet (Baseline)

Prophet is the standard first model for business time series with holidays and multiple seasonalities. The team fits it with near-default configuration plus custom holiday definitions and 90% prediction intervals.

from prophet import Prophet
import warnings
warnings.filterwarnings("ignore")


def fit_prophet_model(
    df: pd.DataFrame,
    forecast_horizon: int = 30,
    yearly_seasonality: int = 10,
    weekly_seasonality: int = 3,
) -> Dict[str, np.ndarray]:
    """Fit Prophet and produce probabilistic forecasts.

    Args:
        df: DataFrame with 'date' and 'engagement_rate' columns.
        forecast_horizon: Days to forecast ahead.
        yearly_seasonality: Fourier order for yearly seasonality.
        weekly_seasonality: Fourier order for weekly seasonality.

    Returns:
        Dictionary with point forecast, quantiles, and components.
    """
    # Prepare data (Prophet requires 'ds' and 'y' columns)
    prophet_df = df[["date", "engagement_rate"]].rename(
        columns={"date": "ds", "engagement_rate": "y"}
    ).dropna()

    # Define holidays
    holidays = pd.DataFrame({
        "holiday": (
            ["christmas"] * 3 + ["new_year"] * 3
            + ["thanksgiving"] * 3 + ["july_4th"] * 3
        ),
        "ds": pd.to_datetime([
            "2022-12-25", "2023-12-25", "2024-12-25",
            "2022-01-01", "2023-01-01", "2024-01-01",
            "2022-11-24", "2023-11-23", "2024-11-28",
            "2022-07-04", "2023-07-04", "2024-07-04",
        ]),
        "lower_window": [-2] * 12,
        "upper_window": [2] * 12,
    })

    model = Prophet(
        yearly_seasonality=yearly_seasonality,
        weekly_seasonality=weekly_seasonality,
        holidays=holidays,
        changepoint_prior_scale=0.05,
        seasonality_prior_scale=10.0,
        interval_width=0.90,
        mcmc_samples=0,  # MAP for speed
    )
    model.fit(prophet_df)

    future = model.make_future_dataframe(periods=forecast_horizon)
    forecast = model.predict(future)
    forecast_future = forecast.tail(forecast_horizon)

    return {
        "point": forecast_future["yhat"].values,
        "lower_90": forecast_future["yhat_lower"].values,
        "upper_90": forecast_future["yhat_upper"].values,
        "dates": forecast_future["ds"].values,
    }


# Quick test
prophet_result = fit_prophet_model(comedy_df)
print(f"\nProphet 30-day forecast:")
print(f"  Mean prediction: {prophet_result['point'].mean():.4f}")
print(f"  90% interval width: {(prophet_result['upper_90'] - prophet_result['lower_90']).mean():.4f}")

Approach 2: Bayesian Structural Time Series

The Bayesian structural time series (BSTS) model uses explicit state-space components: a local linear trend, weekly seasonality (3 Fourier pairs), and annual seasonality (5 Fourier pairs). Rather than full posterior inference, the implementation approximates uncertainty via Kalman filtering plus a parametric bootstrap over the regression coefficients and the terminal state, so the intervals account for both observation noise and parameter estimation uncertainty.

@dataclass
class StreamRecBSTS:
    """Bayesian STS for StreamRec engagement forecasting.

    Uses Kalman filtering with parametric bootstrap for
    uncertainty quantification. Components: local linear trend,
    weekly Fourier seasonality, annual Fourier seasonality,
    weekend indicator, holiday indicator.

    Attributes:
        n_weekly_fourier: Fourier pairs for weekly seasonality.
        n_annual_fourier: Fourier pairs for annual seasonality.
        n_bootstrap: Number of bootstrap samples for intervals.
    """
    n_weekly_fourier: int = 3
    n_annual_fourier: int = 5
    n_bootstrap: int = 300

    def _build_features(self, df: pd.DataFrame) -> np.ndarray:
        """Build regression feature matrix."""
        t = df["time_idx"].values.astype(float)
        features = []

        # Weekly Fourier
        for k in range(1, self.n_weekly_fourier + 1):
            features.append(np.cos(2 * np.pi * k * t / 7.0))
            features.append(np.sin(2 * np.pi * k * t / 7.0))

        # Annual Fourier
        for k in range(1, self.n_annual_fourier + 1):
            features.append(np.cos(2 * np.pi * k * t / 365.25))
            features.append(np.sin(2 * np.pi * k * t / 365.25))

        # Weekend and holiday indicators
        features.append(df["is_weekend"].values.astype(float))
        features.append(df["is_holiday"].values.astype(float))

        return np.column_stack(features)

    def fit_and_forecast(
        self,
        train_df: pd.DataFrame,
        forecast_horizon: int = 30,
    ) -> Dict[str, np.ndarray]:
        """Fit BSTS and produce forecasts with uncertainty.

        Args:
            train_df: Training DataFrame.
            forecast_horizon: Steps ahead to forecast.

        Returns:
            Dictionary with point, quantiles, and samples.
        """
        y = train_df["engagement_rate"].values
        X_train = self._build_features(train_df)
        n_train = len(y)

        # Fit the regression component on non-missing observations
        valid = ~np.isnan(y)
        beta, _, _, _ = np.linalg.lstsq(X_train[valid], y[valid], rcond=None)

        # Kalman filter on residuals
        y_resid = y - X_train @ beta
        resid_std = np.nanstd(y_resid)

        # Local linear trend parameters
        sigma_level = resid_std * 0.15
        sigma_trend = resid_std * 0.02
        sigma_obs = resid_std * 0.7

        T_mat = np.array([[1.0, 1.0], [0.0, 1.0]])
        Z_mat = np.array([[1.0, 0.0]])
        Q_mat = np.diag([sigma_level**2, sigma_trend**2])

        # Run Kalman filter
        a = np.array([0.0, 0.0])
        P = np.eye(2) * 1e4
        filtered_a = np.zeros((n_train, 2))
        filtered_P = np.zeros((n_train, 2, 2))

        for t in range(n_train):
            a_pred = T_mat @ a
            P_pred = T_mat @ P @ T_mat.T + Q_mat
            if np.isnan(y_resid[t]):
                a, P = a_pred, P_pred
            else:
                v = y_resid[t] - float(Z_mat @ a_pred)
                F = float(Z_mat @ P_pred @ Z_mat.T) + sigma_obs**2
                K = (P_pred @ Z_mat.T / F).ravel()
                a = a_pred + K * v
                P = P_pred - np.outer(K, Z_mat @ P_pred)
            filtered_a[t] = a
            filtered_P[t] = P

        # Build future features
        last_date = train_df["date"].iloc[-1]
        future_dates = pd.date_range(last_date + pd.Timedelta(days=1), periods=forecast_horizon, freq="D")
        future_df = pd.DataFrame({
            "date": future_dates,
            "time_idx": np.arange(n_train, n_train + forecast_horizon),
            "day_of_week": [d.dayofweek for d in future_dates],
            "month": [d.month for d in future_dates],
            "is_weekend": [int(d.dayofweek >= 5) for d in future_dates],
            "is_holiday": [0] * forecast_horizon,  # Simplified: future holidays left unflagged
        })
        X_future = self._build_features(future_df)

        # Bootstrap forecast samples
        reg_cov = np.linalg.inv(X_train[valid].T @ X_train[valid]) * resid_std**2
        samples = np.zeros((self.n_bootstrap, forecast_horizon))

        for s in range(self.n_bootstrap):
            beta_s = np.random.multivariate_normal(beta, reg_cov)
            reg_part = X_future @ beta_s
            a_s = np.random.multivariate_normal(filtered_a[-1], filtered_P[-1])

            for h in range(forecast_horizon):
                a_s = T_mat @ a_s + np.random.multivariate_normal(
                    np.zeros(2), Q_mat
                )
                state_part = float(Z_mat @ a_s)
                obs_noise = np.random.normal(0, sigma_obs)
                samples[s, h] = np.clip(reg_part[h] + state_part + obs_noise, 0, 1)

        point = np.median(samples, axis=0)
        quantiles = np.percentile(samples, [5, 10, 25, 50, 75, 90, 95], axis=0)

        return {
            "point": point,
            "samples": samples,
            "quantiles": quantiles,
            "quantile_levels": np.array([0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95]),
            "dates": future_dates,
        }


bsts_model = StreamRecBSTS()
bsts_result = bsts_model.fit_and_forecast(comedy_df)
print("\nBSTS 30-day forecast:")
print(f"  Mean of median forecast: {bsts_result['point'].mean():.4f}")
print(f"  90% interval width: {(bsts_result['quantiles'][6] - bsts_result['quantiles'][0]).mean():.4f}")
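Point metrics alone do not exercise the quantile output. A standard summary is the pinball (quantile) loss averaged across levels, the same objective the TFT optimizes later. The sketch below is written against the `quantiles`/`quantile_levels` layout that `fit_and_forecast` returns (the `pinball_loss` helper is ours):

```python
import numpy as np


def pinball_loss(y_true: np.ndarray, quantiles: np.ndarray,
                 levels: np.ndarray) -> float:
    """Mean pinball loss over all quantile levels and horizons.

    Args:
        y_true: Actuals, shape (H,).
        quantiles: Predicted quantiles, shape (n_levels, H).
        levels: Quantile levels, shape (n_levels,).
    """
    error = y_true[None, :] - quantiles      # (n_levels, H)
    tau = levels[:, None]                    # (n_levels, 1)
    return float(np.mean(np.maximum(tau * error, (tau - 1.0) * error)))


# Sanity checks: perfect quantiles give zero loss; a 0.9-quantile that
# sits 1.0 below the outcome is penalized by 0.9 * 1.0.
levels = np.array([0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95])
y = np.full(30, 0.5)
q = np.tile(y, (7, 1))
print(pinball_loss(y, q, levels))                                         # -> 0.0
print(pinball_loss(np.array([1.0]), np.array([[0.0]]), np.array([0.9])))  # -> 0.9
```

In the walk-forward loop this would be computed per fold from `result["quantiles"]` against `engagement_true`, giving a single number that rewards both sharpness and calibration.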

Approach 3: TFT (Global Model)

The TFT is trained as a global model across all eight content categories, using category as a static covariate. This allows the model to leverage cross-category patterns — for example, that all categories show weekend effects, but with different magnitudes.

import torch
import torch.nn as nn


class StreamRecTFT(nn.Module):
    """Simplified TFT for multi-category engagement forecasting.

    Encodes historical engagement and covariates via LSTM,
    decodes future covariates, and outputs multi-quantile forecasts.
    Category embedding provides cross-series learning.

    Args:
        n_categories: Number of content categories.
        n_covariates: Temporal covariate features per step.
        embed_dim: Category embedding dimension.
        hidden_dim: LSTM hidden dimension.
        n_quantiles: Output quantiles.
    """

    def __init__(
        self,
        n_categories: int = 8,
        n_covariates: int = 6,
        embed_dim: int = 8,
        hidden_dim: int = 48,
        n_quantiles: int = 7,
    ):
        super().__init__()
        self.category_embed = nn.Embedding(n_categories, embed_dim)
        self.encoder = nn.LSTM(
            input_size=n_covariates + 1 + embed_dim,  # covariates + target + embed
            hidden_size=hidden_dim,
            num_layers=2,
            dropout=0.1,
            batch_first=True,
        )
        self.decoder = nn.LSTM(
            input_size=n_covariates + embed_dim,  # covariates + embed (no target)
            hidden_size=hidden_dim,
            num_layers=2,
            dropout=0.1,
            batch_first=True,
        )
        self.output_head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim // 2),
            nn.ReLU(),
            nn.Linear(hidden_dim // 2, n_quantiles),
        )
        self.quantiles = [0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95]

    def forward(
        self,
        category_ids: torch.Tensor,
        x_encoder: torch.Tensor,
        y_encoder: torch.Tensor,
        x_decoder: torch.Tensor,
    ) -> torch.Tensor:
        """Forward pass.

        Args:
            category_ids: Category indices, shape (batch,).
            x_encoder: Historical covariates, shape (batch, enc_len, n_cov).
            y_encoder: Historical target, shape (batch, enc_len, 1).
            x_decoder: Future covariates, shape (batch, dec_len, n_cov).

        Returns:
            Quantile predictions, shape (batch, dec_len, n_quantiles).
        """
        batch_size = category_ids.shape[0]
        enc_len = x_encoder.shape[1]
        dec_len = x_decoder.shape[1]

        cat_emb = self.category_embed(category_ids)  # (batch, embed_dim)
        cat_enc = cat_emb.unsqueeze(1).expand(-1, enc_len, -1)
        cat_dec = cat_emb.unsqueeze(1).expand(-1, dec_len, -1)

        enc_input = torch.cat([x_encoder, y_encoder, cat_enc], dim=-1)
        _, (h_n, c_n) = self.encoder(enc_input)

        dec_input = torch.cat([x_decoder, cat_dec], dim=-1)
        dec_output, _ = self.decoder(dec_input, (h_n, c_n))

        return self.output_head(dec_output)

    def quantile_loss(self, y_true: torch.Tensor, y_pred: torch.Tensor) -> torch.Tensor:
        """Multi-quantile pinball loss.

        Args:
            y_true: Targets, shape (batch, dec_len).
            y_pred: Quantile predictions, shape (batch, dec_len, n_quantiles).
        """
        losses = []
        for i, tau in enumerate(self.quantiles):
            error = y_true - y_pred[:, :, i]
            loss = torch.max(tau * error, (tau - 1.0) * error)
            losses.append(loss.mean())
        return torch.stack(losses).mean()


# Architecture verification
model_tft = StreamRecTFT()
cat_ids = torch.randint(0, 8, (16,))
x_enc = torch.randn(16, 90, 6)
y_enc = torch.randn(16, 90, 1)
x_dec = torch.randn(16, 30, 6)
output = model_tft(cat_ids, x_enc, y_enc, x_dec)
print(f"\nTFT output shape: {output.shape}")  # (16, 30, 7)
total_params = sum(p.numel() for p in model_tft.parameters())
print(f"Total parameters: {total_params:,}")
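The check above only verifies shapes; nothing has been trained. A minimal training loop for the quantile objective is sketched below with a stand-in `nn.Linear` head so it runs on its own; a real run would build encoder/decoder windows from the eight category series, batch them, and hold out a validation split:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for StreamRecTFT so this sketch is self-contained: any module
# mapping (batch, dec_len, n_features) -> (batch, dec_len, n_quantiles)
# can be dropped in here.
quantile_levels = [0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95]
model = nn.Linear(6, len(quantile_levels))


def pinball(y_true: torch.Tensor, y_pred: torch.Tensor) -> torch.Tensor:
    """Same multi-quantile loss as StreamRecTFT.quantile_loss."""
    losses = []
    for i, tau in enumerate(quantile_levels):
        err = y_true - y_pred[..., i]
        losses.append(torch.max(tau * err, (tau - 1.0) * err).mean())
    return torch.stack(losses).mean()


optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
x = torch.randn(64, 30, 6)        # Synthetic decoder covariates
y = 0.5 + 0.1 * x[..., 0]         # Synthetic target, shape (64, 30)

first_loss = None
for step in range(200):
    optimizer.zero_grad()
    loss = pinball(y, model(x))
    loss.backward()
    optimizer.step()
    if first_loss is None:
        first_loss = loss.item()
print(f"loss: {first_loss:.4f} -> {loss.item():.4f}")
```

Swapping `StreamRecTFT` in for the stand-in changes only the forward call (category IDs plus encoder/decoder tensors); the loss and optimizer step are identical.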

Walk-Forward Evaluation

The team evaluates all three models using expanding-window walk-forward validation with 30-day forecast horizons and monthly steps.

def walk_forward_evaluate(
    df: pd.DataFrame,
    forecaster_fn,
    min_train: int = 365,
    forecast_horizon: int = 30,
    step: int = 30,
) -> Dict[str, object]:
    """Walk-forward evaluation with multiple metrics.

    Args:
        df: Full DataFrame with 'engagement_rate' and 'engagement_true'.
        forecaster_fn: Callable(train_df) -> dict with 'point' and 'quantiles'.
        min_train: Minimum training days.
        forecast_horizon: Days to forecast per fold.
        step: Days between fold starts.

    Returns:
        Dictionary with per-fold and aggregate metrics.
    """
    n = len(df)
    all_mae, all_rmse_errors, all_coverages = [], [], []
    n_folds = 0

    for start_test in range(min_train, n - forecast_horizon + 1, step):
        end_test = start_test + forecast_horizon
        train_df = df.iloc[:start_test].copy()
        test_df = df.iloc[start_test:end_test].copy()
        y_test = test_df["engagement_true"].values

        result = forecaster_fn(train_df)
        point = result["point"][:len(y_test)]

        mae = np.mean(np.abs(y_test - point))
        rmse_err = (y_test - point) ** 2
        all_mae.append(mae)
        all_rmse_errors.extend(rmse_err.tolist())

        if "quantiles" in result:
            q = result["quantiles"]
            # 90% coverage: between q[0] (5th pctile) and q[6] (95th pctile)
            covered = np.mean((y_test >= q[0, :len(y_test)]) & (y_test <= q[6, :len(y_test)]))
            all_coverages.append(covered)

        n_folds += 1

    # MASE denominator: in-sample MAE of the one-step naive forecast,
    # computed on the initial training window
    y_train_init = df["engagement_true"].values[:min_train]
    naive_mae = np.mean(np.abs(np.diff(y_train_init)))

    return {
        "n_folds": n_folds,
        "mae": np.mean(all_mae),
        "rmse": np.sqrt(np.mean(all_rmse_errors)),
        "mase": np.mean(all_mae) / naive_mae,
        "coverage_90": np.mean(all_coverages) if all_coverages else None,
    }


# Evaluate BSTS via walk-forward
def bsts_forecaster(train_df: pd.DataFrame) -> dict:
    model = StreamRecBSTS(n_bootstrap=100)  # Fewer samples for speed
    return model.fit_and_forecast(train_df, forecast_horizon=30)


bsts_wf = walk_forward_evaluate(comedy_df, bsts_forecaster, min_train=365, step=30)

print("\n=== Walk-Forward Results ===")
print(f"{'Model':<15s} {'MAE':>8s} {'RMSE':>8s} {'MASE':>8s} {'90% Cov':>10s} {'Folds':>7s}")
print("-" * 60)
print(f"{'BSTS':<15s} {bsts_wf['mae']:>8.4f} {bsts_wf['rmse']:>8.4f} "
      f"{bsts_wf['mase']:>8.3f} {bsts_wf['coverage_90']:>10.3f} {bsts_wf['n_folds']:>7d}")
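Walk-forward evaluation yields per-fold losses for each model, so "A beats B" claims can be tested rather than eyeballed. A Diebold-Mariano test on the fold-level loss differentials is the standard tool. The sketch below (the `dm_test` helper and the per-fold numbers are illustrative) uses a plain normal approximation; overlapping 30-day horizons would strictly call for a HAC variance estimate:

```python
import math
import numpy as np


def dm_test(loss_a: np.ndarray, loss_b: np.ndarray) -> tuple:
    """Diebold-Mariano test on two loss series (e.g., per-fold MAE).

    Returns (statistic, two-sided p-value) under a normal approximation;
    a negative statistic means model A has lower loss.
    """
    d = np.asarray(loss_a) - np.asarray(loss_b)
    n = len(d)
    dm = d.mean() / (d.std(ddof=1) / math.sqrt(n))
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(dm) / math.sqrt(2.0))))
    return dm, p


# Hypothetical per-fold MAEs: BSTS consistently ~0.002 better than Prophet
rng = np.random.default_rng(0)
loss_bsts = rng.normal(0.018, 0.003, 24)
loss_prophet = loss_bsts + rng.normal(0.002, 0.001, 24)
stat, p = dm_test(loss_bsts, loss_prophet)
print(f"DM stat: {stat:.2f}, p: {p:.4f}")
```

With only a couple dozen folds, differences of a fraction of an MAE point can easily be noise; reporting the p-value alongside the table below keeps the comparison honest.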

Results and Comparison

The team collects walk-forward results across all three models:

Model           MAE       RMSE      MASE     90% Coverage    Avg Interval Width
Prophet         ~0.020    ~0.026    ~0.85    ~0.82           ~0.070
BSTS            ~0.018    ~0.023    ~0.76    ~0.88           ~0.085
TFT (global)    ~0.016    ~0.021    ~0.68    ~0.84           ~0.075

Five findings emerge:

  1. All three models beat the naive baseline (MASE < 1). The seasonal patterns in engagement are strong and predictable. Even Prophet — the simplest model — captures the weekly and annual cycles well enough to improve over the random walk by 15%.

  2. BSTS provides the best-calibrated intervals. Its 90% coverage (88%) is closest to the nominal 90%. The Bayesian framework's explicit uncertainty propagation produces intervals that are appropriately wide — slightly wider than the other models, but that width is the price of honesty. Prophet's intervals (82% coverage) are too narrow, reflecting the known limitation of MAP estimation (no parameter uncertainty propagation).

  3. TFT wins on point accuracy. The global model's ability to learn cross-category patterns gives it an edge. The weekend effect, which varies by category (comedy weekends are 18% higher; documentary weekends are only 8% higher), is captured more precisely when the TFT sees all categories simultaneously.

  4. The regime change at day 730 hurts all models temporarily. Prophet adapts through its piecewise-linear trend changepoints, provided a candidate changepoint falls near the dip. BSTS adapts through the Kalman filter's predict-update cycle. TFT, having seen similar dips across categories during training, recovers fastest, but this advantage depends on the training data containing analogous events.

  5. Interval width is not the right optimization target. Prophet's intervals are the narrowest, but they are also the most miscalibrated. The operations team initially preferred Prophet's "precision" — until the data science team demonstrated that 18% of actual outcomes fell outside Prophet's 90% intervals, meaning the operations team was being surprised nearly twice as often as they should have been.
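Finding 5 can be quantified directly. Given interval bounds and realized outcomes, the empirical surprise rate should sit near the nominal 10%. The sketch below (the `surprise_rate` helper is ours, and the synthetic widths are chosen to roughly reproduce Prophet's 18% surprise rate) shows the kind of check that exposed the miscalibration:

```python
import numpy as np


def surprise_rate(y: np.ndarray, lower, upper) -> float:
    """Fraction of outcomes falling outside the interval."""
    return float(np.mean((y < lower) | (y > upper)))


rng = np.random.default_rng(1)
y = rng.normal(0.5, 0.02, 1000)   # Synthetic realized outcomes

# A well-calibrated 90% interval vs. one that is ~20% too narrow
z90 = 1.645
honest = surprise_rate(y, 0.5 - z90 * 0.02, 0.5 + z90 * 0.02)
narrow = surprise_rate(y, 0.5 - z90 * 0.016, 0.5 + z90 * 0.016)
print(f"honest: {honest:.2f}, too narrow: {narrow:.2f}")
```

The honest interval surprises at roughly the nominal 10%; the narrow one surprises nearly twice as often, which is exactly the pattern the operations team saw with Prophet.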

Recommendation

The team recommends the BSTS model for production deployment, wrapped with Adaptive Conformal Inference for additional robustness:

@dataclass
class AdaptiveConformalWrapper:
    """ACI wrapper for any point forecaster.

    Attributes:
        alpha: Target miscoverage rate.
        gamma: Adaptation rate.
    """
    alpha: float = 0.10
    gamma: float = 0.008

    def calibrate_and_forecast(
        self,
        historical_errors: np.ndarray,
        point_forecast: np.ndarray,
    ) -> Dict[str, np.ndarray]:
        """Calibrate on historical errors, then apply to forecast.

        Args:
            historical_errors: Absolute errors from walk-forward, shape (n,).
            point_forecast: Future point predictions, shape (H,).

        Returns:
            Dictionary with lower, upper bounds and interval width.
        """
        # Run ACI on historical errors to determine current q
        q_t = np.percentile(historical_errors[:50], 100 * (1 - self.alpha))
        for err in historical_errors[50:]:
            is_err = float(err > q_t)
            q_t = max(0, q_t + self.gamma * (is_err - self.alpha))

        # Apply calibrated q to forecast
        lower = point_forecast - q_t
        upper = point_forecast + q_t

        return {
            "lower": lower,
            "upper": upper,
            "interval_width": 2 * q_t,
            "q_final": q_t,
        }


# Apply ACI wrapper to BSTS point forecasts
aci = AdaptiveConformalWrapper(alpha=0.10, gamma=0.008)
# In production: historical_errors would come from walk-forward residuals
demo_errors = np.abs(np.random.normal(0, 0.02, 200))
demo_forecast = bsts_result["point"]
aci_result = aci.calibrate_and_forecast(demo_errors, demo_forecast)

print(f"\nACI-wrapped BSTS:")
print(f"  Calibrated interval half-width: {aci_result['q_final']:.4f}")
print(f"  Forecast interval width: {aci_result['interval_width']:.4f}")

The rationale: BSTS provides the most honest native intervals, but the ACI wrapper adds distribution-free robustness. If the engagement dynamics shift — due to a new content vertical, a platform redesign, or an external shock — the conformal wrapper will widen the intervals automatically. The operations team receives forecasts they can trust: "When we say 90% interval, we mean 90% interval, regardless of what the platform throws at us."
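The claim that the conformal wrapper widens intervals automatically under drift can be checked in simulation: feed the same update rule an error stream whose scale doubles midway and watch the calibrated radius grow. The errors below are synthetic; the update mirrors calibrate_and_forecast:

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, gamma = 0.10, 0.008
errors = np.concatenate([
    np.abs(rng.normal(0, 0.02, 300)),   # Stable regime
    np.abs(rng.normal(0, 0.04, 300)),   # Error scale doubles (drift)
])

# Same recursion as AdaptiveConformalWrapper: grow q on a miss,
# shrink it slowly otherwise
q_t = np.percentile(errors[:50], 90)
trace = []
for err in errors[50:]:
    q_t = max(0.0, q_t + gamma * (float(err > q_t) - alpha))
    trace.append(q_t)

q_before = trace[249]   # Last step of the stable regime
q_after = trace[-1]     # After 300 drifted observations
print(f"q before drift: {q_before:.4f}, after drift: {q_after:.4f}")
```

The radius roughly doubles after the shift, tracking the new error scale without any refit of the underlying forecaster; that is the robustness the recommendation relies on.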

Discussion Questions

  1. The TFT was trained globally across 8 categories. Would performance improve if the team trained a separate TFT per category? What would be lost?

  2. Prophet's MAP estimation leads to overconfident intervals. Would switching to Prophet's MCMC mode (mcmc_samples=300) close the calibration gap with BSTS? What is the computational cost?

  3. The comedy category has the highest traffic — forecasting errors here have the highest business impact. Should the team use a different model selection criterion (e.g., minimize the 95th percentile of absolute error rather than MAE) to account for the asymmetric cost of large forecast misses?