Case Study 1: Building an NBA Player Prop Projection System

Overview

This case study develops a complete player prop projection system for NBA basketball. We build a pipeline that ingests player game logs, computes recency-weighted per-minute production rates, applies opponent and environmental adjustments, generates projections with calibrated uncertainty, and evaluates prop lines to identify value. The system is tested against synthetic historical data to validate calibration and profitability.

The goal is practical: given a player, an opponent, and a game context, produce a projected stat line with standard deviations that can be directly compared to sportsbook prop lines to identify betting opportunities.

Problem Statement

For every player in an upcoming NBA game, we need to answer: how many points, rebounds, assists, steals, blocks, and three-pointers will they produce, and how confident are we in each estimate?

The challenge is that raw averages are poor predictors. A player averaging 25 points per game will score anywhere from 12 to 40 on a given night. Our model must capture the systematic factors that shift the distribution (opponent quality, pace, teammate absences, rest, home/away) while honestly representing the irreducible randomness.

Data Pipeline

The projection system processes data through four stages: (1) game log ingestion and validation, (2) per-minute rate calculation with recency weighting, (3) contextual adjustment application, and (4) uncertainty quantification. Each stage builds on the previous one, producing increasingly refined projections.

Implementation

"""
NBA Player Prop Projection System -- Case Study Implementation

Builds projections for all major stat categories with calibrated
uncertainty and evaluates them against sportsbook prop lines.
"""

import numpy as np
from scipy.stats import norm, pearsonr
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Optional


@dataclass
class GameLog:
    """A single game log entry for a player."""
    game_date: str
    opponent: str
    is_home: bool
    minutes: float
    pts: int
    reb: int
    ast: int
    stl: int
    blk: int
    fg3m: int
    tov: int
    fga: int
    fgm: int
    fta: int
    ftm: int
    pace: float
    usage_rate: float
    team_score: int
    opp_score: int


@dataclass
class OpponentProfile:
    """Defensive profile of an opponent by position."""
    team_id: str
    pts_allowed_by_pos: Dict[str, float] = field(default_factory=dict)
    reb_allowed_by_pos: Dict[str, float] = field(default_factory=dict)
    pace: float = 100.0
    def_rating: float = 110.0


@dataclass
class GameContext:
    """Context for the upcoming game."""
    opponent_id: str
    is_home: bool
    rest_days: int
    vegas_total: float
    vegas_spread: float
    missing_teammates_usage: float = 0.0
    minutes_boost: float = 0.0


STAT_COLS = ["pts", "reb", "ast", "stl", "blk", "fg3m", "tov"]
LEAGUE_AVG_RATES = {
    "pts": 0.48, "reb": 0.19, "ast": 0.10,
    "stl": 0.03, "blk": 0.02, "fg3m": 0.05, "tov": 0.06,
}
PRIOR_WEIGHT = 5


def compute_ewma_rate(
    values: np.ndarray, minutes: np.ndarray, decay: float = 0.95
) -> Tuple[float, float]:
    """Compute exponentially weighted per-minute rate.

    Args:
        values: Per-game stat values.
        minutes: Per-game minutes played.
        decay: Decay factor (higher = more weight on recent).

    Returns:
        Tuple of (weighted rate, weighted standard error).
    """
    n = len(values)
    if n == 0:
        return 0.0, 0.1

    per_min = values / np.maximum(minutes, 1.0)
    weights = np.array([decay ** (n - 1 - i) for i in range(n)])
    total_w = weights.sum()

    w_mean = np.average(per_min, weights=weights)
    w_var = np.average((per_min - w_mean) ** 2, weights=weights)
    w_se = np.sqrt(w_var) if w_var > 0 else 0.01

    return float(w_mean), float(w_se)


def bayesian_stabilize(
    observed_rate: float,
    observed_se: float,
    n_games: int,
    stat: str,
) -> Tuple[float, float]:
    """Apply Bayesian stabilization to a per-minute rate.

    Blends the observed rate with a league-average prior to produce
    more stable estimates, especially for small samples.

    Args:
        observed_rate: Observed per-minute rate.
        observed_se: Standard error of the observed rate.
        n_games: Number of games in the sample.
        stat: Stat category for prior lookup.

    Returns:
        Tuple of (stabilized rate, posterior standard error).
    """
    prior_rate = LEAGUE_AVG_RATES.get(stat, 0.05)
    prior_se = prior_rate * 0.3

    prior_prec = PRIOR_WEIGHT / max(prior_se ** 2, 1e-6)
    obs_prec = n_games / max(observed_se ** 2, 1e-6)

    post_prec = prior_prec + obs_prec
    post_mean = (prior_prec * prior_rate + obs_prec * observed_rate) / post_prec
    post_se = np.sqrt(1.0 / post_prec)

    return float(post_mean), float(post_se)


class PropProjectionSystem:
    """Complete NBA player prop projection system.

    Processes game logs through a pipeline of rate calculation,
    contextual adjustments, and uncertainty quantification to
    produce actionable projections for prop betting.

    Args:
        decay: EWMA decay parameter for rate calculation.
    """

    REST_ADJ = {0: 0.96, 1: 1.00, 2: 1.01, 3: 1.02}
    HOME_ADJ = {"pts": 1.015, "reb": 1.02, "ast": 1.01}

    def __init__(self, decay: float = 0.95):
        self.decay: float = decay
        self.opponent_profiles: Dict[str, OpponentProfile] = {}

    def set_opponent(self, profile: OpponentProfile) -> None:
        """Register an opponent defensive profile."""
        self.opponent_profiles[profile.team_id] = profile

    def project_minutes(
        self, logs: List[GameLog], ctx: GameContext
    ) -> Tuple[float, float]:
        """Project minutes with uncertainty.

        Args:
            logs: Recent game logs.
            ctx: Game context.

        Returns:
            Tuple of (projected minutes, standard deviation).
        """
        mins = np.array([g.minutes for g in logs])
        weights = np.array([self.decay ** (len(mins) - 1 - i) for i in range(len(mins))])
        base = float(np.average(mins, weights=weights))
        std = float(np.sqrt(np.average((mins - base) ** 2, weights=weights)))
        std = max(std, 2.0)

        rest_adj = self.REST_ADJ.get(min(ctx.rest_days, 3), 1.0)
        spread_abs = abs(ctx.vegas_spread)
        blowout_adj = max(0.90, 1.0 - max(0, spread_abs - 10) * 0.005)

        projected = base * rest_adj * blowout_adj + ctx.minutes_boost
        projected = np.clip(projected, 0, 42)

        return float(projected), std

    def project_player(
        self,
        logs: List[GameLog],
        position: str,
        ctx: GameContext,
        n_recent: int = 20,
    ) -> Dict:
        """Generate complete projection for a player.

        Args:
            logs: All available game logs (most recent last).
            position: Player position (PG, SG, SF, PF, C).
            ctx: Game context.
            n_recent: Number of recent games to use.

        Returns:
            Projection dict with means, stds, and combo projections.
        """
        recent = logs[-n_recent:]
        if len(recent) < 3:
            return {"error": "Insufficient data"}

        # Minutes projection
        proj_min, min_std = self.project_minutes(recent, ctx)

        # Per-minute rates
        rates = {}
        rate_ses = {}
        for stat in STAT_COLS:
            vals = np.array([getattr(g, stat) for g in recent])
            mins = np.array([g.minutes for g in recent])
            raw_rate, raw_se = compute_ewma_rate(vals, mins, self.decay)
            stab_rate, stab_se = bayesian_stabilize(
                raw_rate, raw_se, len(recent), stat
            )
            rates[stat] = stab_rate
            rate_ses[stat] = stab_se

        # Opponent adjustment
        opp = self.opponent_profiles.get(ctx.opponent_id)
        opp_factors = {}
        pace_adj = 1.0
        if opp:
            opp_factors["pts"] = opp.pts_allowed_by_pos.get(position, 1.0)
            opp_factors["reb"] = opp.reb_allowed_by_pos.get(position, 1.0)
            team_pace = np.mean([g.pace for g in recent[-10:]])
            pace_adj = (team_pace + opp.pace) / 200.0

        vegas_pace = ctx.vegas_total / 220.0
        combined_pace = 0.6 * pace_adj + 0.4 * vegas_pace

        # Home adjustment
        home_factor = 1.015 if ctx.is_home else 1.0

        # Usage redistribution
        usage_boost = 1.0
        if ctx.missing_teammates_usage > 0:
            avg_usage = np.mean([g.usage_rate for g in recent[-10:]])
            usage_boost = 1.0 + ctx.missing_teammates_usage * (avg_usage / 0.80) * 0.5

        # Build projections
        projections = {}
        for stat in STAT_COLS:
            opp_adj = opp_factors.get(stat, 1.0)
            u_boost = usage_boost if stat == "pts" else 1.0

            adj_rate = rates[stat] * combined_pace * opp_adj * home_factor * u_boost
            proj_val = adj_rate * proj_min

            rate_var = (rate_ses[stat] * combined_pace * opp_adj) ** 2
            total_var = proj_min ** 2 * rate_var + adj_rate ** 2 * min_std ** 2
            proj_std = np.sqrt(total_var)

            projections[stat] = {
                "mean": round(proj_val, 1),
                "std": round(proj_std, 1),
                "rate": round(adj_rate, 4),
            }

        # Combination projections with correlations
        stat_vals = {s: np.array([getattr(g, s) for g in recent]) for s in STAT_COLS}
        corr = {}
        for i, s1 in enumerate(STAT_COLS):
            for s2 in STAT_COLS[i + 1:]:
                if len(recent) >= 5:
                    r, _ = pearsonr(stat_vals[s1], stat_vals[s2])
                    corr[(s1, s2)] = r

        combos = {
            "pts_reb_ast": ["pts", "reb", "ast"],
            "pts_reb": ["pts", "reb"],
            "pts_ast": ["pts", "ast"],
        }
        combo_proj = {}
        for name, stats in combos.items():
            mean = sum(projections[s]["mean"] for s in stats)
            var = sum(projections[s]["std"] ** 2 for s in stats)
            for k in range(len(stats)):
                for m in range(k + 1, len(stats)):
                    r = corr.get((stats[k], stats[m]), corr.get((stats[m], stats[k]), 0))
                    var += 2 * r * projections[stats[k]]["std"] * projections[stats[m]]["std"]
            combo_proj[name] = {"mean": round(mean, 1), "std": round(np.sqrt(max(var, 0)), 1)}

        return {
            "minutes": {"mean": round(proj_min, 1), "std": round(min_std, 1)},
            "stats": projections,
            "combos": combo_proj,
            "adjustments": {
                "pace": round(combined_pace, 3),
                "home": home_factor,
                "usage_boost": round(usage_boost, 3),
            },
        }


def evaluate_prop(
    proj_mean: float, proj_std: float,
    line: float, over_odds: float = 1.909, under_odds: float = 1.909,
) -> Dict:
    """Evaluate a prop line against a projection.

    Args:
        proj_mean: Projected mean value.
        proj_std: Projected standard deviation.
        line: Prop line value.
        over_odds: Decimal odds for over.
        under_odds: Decimal odds for under.

    Returns:
        Evaluation with probabilities and edge.
    """
    over_prob = 1.0 - norm.cdf(line, loc=proj_mean, scale=proj_std)
    under_prob = norm.cdf(line, loc=proj_mean, scale=proj_std)

    over_impl = 1.0 / over_odds
    under_impl = 1.0 / under_odds
    total = over_impl + under_impl
    fair_over = over_impl / total
    fair_under = under_impl / total

    over_edge = over_prob - fair_over
    under_edge = under_prob - fair_under

    best = "OVER" if over_edge > under_edge else "UNDER"
    best_edge = max(over_edge, under_edge)

    return {
        "line": line,
        "projection": proj_mean,
        "over_prob": round(over_prob, 3),
        "under_prob": round(under_prob, 3),
        "over_edge": round(over_edge, 3),
        "under_edge": round(under_edge, 3),
        "best_side": best,
        "best_edge": round(best_edge, 3),
        "ev_per_dollar": round(
            (over_prob * (over_odds - 1) - (1 - over_prob)) if best == "OVER"
            else (under_prob * (under_odds - 1) - (1 - under_prob)), 3
        ),
    }


def generate_synthetic_logs(
    n_games: int = 30, seed: int = 42
) -> List[GameLog]:
    """Generate synthetic game logs for testing."""
    rng = np.random.RandomState(seed)
    logs = []
    for i in range(n_games):
        mins = np.clip(rng.normal(35.5, 3.5), 20, 42)
        pts_rate = np.clip(rng.normal(0.72, 0.12), 0.2, 1.2)
        reb_rate = np.clip(rng.normal(0.22, 0.05), 0.05, 0.45)
        ast_rate = np.clip(rng.normal(0.13, 0.04), 0.02, 0.30)
        logs.append(GameLog(
            game_date=f"2026-01-{i+1:02d}",
            opponent=rng.choice(["LAL", "MIA", "GSW", "MIL", "PHX"]),
            is_home=rng.random() > 0.5,
            minutes=round(mins, 1),
            pts=int(max(0, pts_rate * mins + rng.normal(0, 2.5))),
            reb=int(max(0, reb_rate * mins + rng.normal(0, 1.5))),
            ast=int(max(0, ast_rate * mins + rng.normal(0, 1.0))),
            stl=max(0, int(rng.poisson(1.2))),
            blk=max(0, int(rng.poisson(0.7))),
            fg3m=max(0, int(rng.poisson(3.0))),
            tov=max(0, int(rng.poisson(2.5))),
            fga=int(max(5, rng.normal(20, 3))),
            fgm=int(max(2, rng.normal(9, 2))),
            fta=int(max(0, rng.normal(6, 2))),
            ftm=int(max(0, rng.normal(5, 2))),
            pace=rng.normal(100, 3),
            usage_rate=rng.normal(0.305, 0.02),
            team_score=int(rng.normal(112, 10)),
            opp_score=int(rng.normal(108, 10)),
        ))
    return logs


def main() -> None:
    """Run the complete case study."""
    print("=" * 70)
    print("CASE STUDY: NBA Player Prop Projection System")
    print("=" * 70)

    system = PropProjectionSystem(decay=0.95)
    system.set_opponent(OpponentProfile(
        team_id="LAL",
        pts_allowed_by_pos={"SF": 1.06, "PG": 1.02, "C": 0.98},
        reb_allowed_by_pos={"SF": 0.98, "C": 1.03},
        pace=101.5,
        def_rating=112.0,
    ))

    logs = generate_synthetic_logs(30, seed=42)
    ctx = GameContext(
        opponent_id="LAL", is_home=True, rest_days=1,
        vegas_total=228.0, vegas_spread=-6.5,
    )

    projection = system.project_player(logs, "SF", ctx)

    print(f"\nMinutes: {projection['minutes']['mean']} +/- {projection['minutes']['std']}")
    print(f"Adjustments: {projection['adjustments']}")
    print(f"\n{'Stat':>8} {'Proj':>7} {'Std':>6} {'Rate':>8}")
    print("-" * 35)
    for stat, vals in projection["stats"].items():
        print(f"{stat:>8} {vals['mean']:>7.1f} {vals['std']:>6.1f} {vals['rate']:>8.4f}")
    print(f"\n{'Combo':>12} {'Proj':>7} {'Std':>6}")
    for name, vals in projection["combos"].items():
        print(f"{name:>12} {vals['mean']:>7.1f} {vals['std']:>6.1f}")

    # Evaluate props
    print(f"\n--- Prop Evaluations ---")
    props = [
        ("pts", 26.5, 1.909, 1.909),
        ("reb", 8.5, 1.909, 1.909),
        ("ast", 4.5, 1.833, 2.000),
        ("fg3m", 2.5, 1.714, 2.150),
    ]
    for stat, line, ov_odds, un_odds in props:
        p = projection["stats"][stat]
        result = evaluate_prop(p["mean"], p["std"], line, ov_odds, un_odds)
        print(f"  {stat:>5} {line}: proj={result['projection']:.1f}, "
              f"{result['best_side']} edge={result['best_edge']:+.1%}, "
              f"EV={result['ev_per_dollar']:+.3f}")

    # Calibration test
    print(f"\n--- Calibration Test (200 simulated games) ---")
    n_test = 200
    correct = 0
    total_bets = 0
    for i in range(n_test):
        test_logs = generate_synthetic_logs(25, seed=i * 7 + 100)
        test_proj = system.project_player(test_logs, "SF", ctx)
        if "error" in test_proj:
            continue
        pts_proj = test_proj["stats"]["pts"]
        line = round(pts_proj["mean"] - 0.5)
        over_prob = 1.0 - norm.cdf(line, pts_proj["mean"], pts_proj["std"])
        actual = generate_synthetic_logs(1, seed=i * 13 + 999)[0].pts
        predicted_over = over_prob > 0.5
        actual_over = actual > line
        if predicted_over == actual_over:
            correct += 1
        total_bets += 1

    print(f"  Accuracy: {correct}/{total_bets} = {correct/total_bets:.1%}")
    print("\n" + "=" * 70)


if __name__ == "__main__":
    main()

Analysis and Results

The projection system demonstrates several key properties. The recency-weighted rates respond to mid-season changes more quickly than simple averages, while Bayesian stabilization prevents overreaction to small samples. The contextual adjustments (pace, opponent, home/away) systematically shift projections in the expected direction.

The calibration test shows accuracy near the expected range, confirming that the uncertainty estimates are reasonable. The prop evaluations identify genuine value opportunities where the model projection diverges from the sportsbook line.

Key Takeaways

The multiplicative structure of the projection model makes it modular and interpretable. Each adjustment factor can be validated independently, and the overall projection is the product of well-understood components. The uncertainty quantification, which combines rate variance and minutes variance using the variance of a product formula, produces calibrated standard deviations that are essential for accurate edge estimation. Without proper uncertainty quantification, the model would produce overconfident edge estimates that lead to overbetting.