Chapter 22: Player Performance Prediction

Introduction

Predicting future player performance represents one of the most challenging and valuable applications of basketball analytics. Whether evaluating free agent signings, trade acquisitions, or draft prospects, front offices must project how players will perform in coming seasons. This chapter explores the statistical foundations of player projection systems, from simple regression models to sophisticated systems like CARMELO and RAPTOR.

The difficulty of player projection stems from multiple sources of uncertainty: natural performance variation, injury risk, age-related decline, team context changes, and the inherent randomness in basketball outcomes. Effective projection systems must account for all these factors while remaining interpretable and actionable for decision-makers.

We will develop a comprehensive framework for player projections, starting with foundational regression techniques and building toward complete projection systems. Along the way, we examine aging curves, similarity scores, and methods for quantifying projection uncertainty.


22.1 Foundations of Player Projection

The Projection Problem

At its core, player projection asks: given everything we know about a player today, what is our best estimate of their future performance? This question involves several sub-problems:

  1. Skill estimation: What are the player's true underlying abilities?
  2. Aging adjustment: How will those abilities change over time?
  3. Context adjustment: How will team/role changes affect observed performance?
  4. Uncertainty quantification: How confident are we in our projections?

Historical Performance as a Starting Point

The simplest projection approach uses historical performance directly. If a player averaged 20 points per game last season, we might project 20 points next season. This naive approach ignores regression to the mean, aging, and changes in team context, but it is a useful baseline against which more sophisticated methods can be measured:

import numpy as np
import pandas as pd
from typing import List, Tuple, Dict, Optional
from dataclasses import dataclass
from scipy import stats
from sklearn.linear_model import Ridge, LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import cross_val_score, TimeSeriesSplit
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error
import warnings
warnings.filterwarnings('ignore')

@dataclass
class PlayerSeason:
    """Represents a single season of player statistics."""
    player_id: str
    player_name: str
    season: int
    age: float
    games_played: int
    minutes_per_game: float
    points_per_game: float
    rebounds_per_game: float
    assists_per_game: float
    steals_per_game: float
    blocks_per_game: float
    turnovers_per_game: float
    fg_pct: float
    three_pct: float
    ft_pct: float
    true_shooting_pct: float
    usage_rate: float
    per: float  # Player Efficiency Rating
    box_plus_minus: float
    vorp: float
    win_shares: float

class NaiveProjection:
    """
    Naive projection using only last season's statistics.
    Serves as a baseline for more sophisticated methods.
    """

    def __init__(self):
        self.last_season_stats = {}

    def fit(self, player_seasons: List[PlayerSeason]) -> None:
        """Store most recent season for each player."""
        for ps in sorted(player_seasons, key=lambda x: x.season):
            self.last_season_stats[ps.player_id] = ps

    def project(self, player_id: str, target_stat: str) -> float:
        """Project by returning last season's value."""
        if player_id not in self.last_season_stats:
            raise ValueError(f"No data for player {player_id}")
        return getattr(self.last_season_stats[player_id], target_stat)

    def evaluate(self, test_seasons: List[PlayerSeason],
                 target_stat: str) -> Dict[str, float]:
        """Evaluate projection accuracy on test data."""
        predictions = []
        actuals = []

        for ps in test_seasons:
            if ps.player_id in self.last_season_stats:
                pred = self.project(ps.player_id, target_stat)
                actual = getattr(ps, target_stat)
                predictions.append(pred)
                actuals.append(actual)

        predictions = np.array(predictions)
        actuals = np.array(actuals)

        return {
            'rmse': np.sqrt(mean_squared_error(actuals, predictions)),
            'mae': mean_absolute_error(actuals, predictions),
            'correlation': np.corrcoef(actuals, predictions)[0, 1],
            'n_players': len(predictions)
        }
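
As a quick illustration of how such a baseline is scored, the following minimal sketch carries last season's value forward and computes RMSE and MAE using only the standard library. The player labels and numbers are invented for illustration and are not real data; `naive_errors` is a hypothetical helper, not part of the classes above.

```python
import math

# Hypothetical points-per-game values: (last season, next season).
# These numbers are invented for illustration only.
toy_pairs = {
    "player_a": (20.0, 18.0),
    "player_b": (12.0, 13.0),
    "player_c": (8.0, 10.0),
    "player_d": (25.0, 22.0),
}

def naive_errors(pairs):
    """Score the carry-forward baseline: prediction = last season's value."""
    errors = [nxt - last for last, nxt in pairs.values()]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"rmse": rmse, "mae": mae, "n_players": len(errors)}

baseline = naive_errors(toy_pairs)
print(baseline)
```

In this toy data the high scorers decline and the low scorers improve, which is the pattern the next subsection addresses.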

Regression to the Mean

One of the most important concepts in player projection is regression to the mean. Extreme performances in one season tend to move toward average performances in subsequent seasons. This occurs because:

  1. Measurement noise: Single-season statistics contain random variation
  2. True talent estimation: Observed performance = True talent + Luck
  3. Sample size limitations: Even a full 82-game season provides limited information

The degree of regression depends on the reliability of each statistic:

class RegressionToMean:
    """
    Apply regression to the mean based on stat reliability.
    More reliable stats regress less toward population mean.
    """

    # Reliability coefficients (year-to-year correlations)
    # Higher values = more reliable = less regression
    STAT_RELIABILITY = {
        'points_per_game': 0.85,
        'rebounds_per_game': 0.80,
        'assists_per_game': 0.82,
        'steals_per_game': 0.55,
        'blocks_per_game': 0.65,
        'turnovers_per_game': 0.70,
        'fg_pct': 0.60,
        'three_pct': 0.50,
        'ft_pct': 0.85,
        'true_shooting_pct': 0.65,
        'usage_rate': 0.80,
        'per': 0.75,
        'box_plus_minus': 0.70,
        'win_shares': 0.65,
    }

    # League average values (approximate)
    LEAGUE_AVERAGES = {
        'points_per_game': 11.0,
        'rebounds_per_game': 4.5,
        'assists_per_game': 2.5,
        'steals_per_game': 0.7,
        'blocks_per_game': 0.5,
        'turnovers_per_game': 1.3,
        'fg_pct': 0.450,
        'three_pct': 0.360,
        'ft_pct': 0.770,
        'true_shooting_pct': 0.560,
        'usage_rate': 0.200,
        'per': 15.0,
        'box_plus_minus': 0.0,
        'win_shares': 4.0,
    }

    def __init__(self, custom_reliability: Optional[Dict[str, float]] = None):
        self.reliability = self.STAT_RELIABILITY.copy()
        if custom_reliability:
            self.reliability.update(custom_reliability)

    def regress(self, observed_value: float, stat_name: str,
                games_played: int = 82,
                minutes_per_game: float = 30.0) -> float:
        """
        Regress observed value toward mean based on reliability
        and sample size.

        Formula: Regressed = r * Observed + (1-r) * Mean
        where r is adjusted reliability based on playing time
        """
        if stat_name not in self.reliability:
            return observed_value

        base_reliability = self.reliability[stat_name]
        league_mean = self.LEAGUE_AVERAGES.get(stat_name, observed_value)

        # Adjust reliability for sample size
        # Full reliability assumes ~2000 minutes played
        expected_minutes = 2000
        actual_minutes = games_played * minutes_per_game
        sample_adjustment = min(1.0, actual_minutes / expected_minutes)

        adjusted_reliability = base_reliability * sample_adjustment

        # Apply regression
        regressed_value = (adjusted_reliability * observed_value +
                          (1 - adjusted_reliability) * league_mean)

        return regressed_value

    def regress_player_season(self, player_season: PlayerSeason) -> Dict[str, float]:
        """Regress all statistics for a player season."""
        regressed_stats = {}

        # Iterate over self.reliability so that any custom stats passed
        # to __init__ are regressed as well
        for stat_name in self.reliability:
            observed = getattr(player_season, stat_name, None)
            if observed is not None:
                regressed_stats[stat_name] = self.regress(
                    observed, stat_name,
                    player_season.games_played,
                    player_season.minutes_per_game
                )

        return regressed_stats
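
To make the formula concrete, here is a minimal standalone sketch of the same calculation for one hypothetical shooter. It reuses the three-point reliability (0.50), league average (0.360), and 2,000-minute threshold from the class above; the player and the function name `regress_to_mean` are assumptions made for illustration.

```python
def regress_to_mean(observed, league_mean, base_reliability,
                    games_played, minutes_per_game,
                    full_sample_minutes=2000):
    """Shrink an observed stat toward the league mean.

    Mirrors the formula in the text: r * observed + (1 - r) * mean,
    where r is the base reliability scaled down for small samples.
    """
    minutes = games_played * minutes_per_game
    sample_adjustment = min(1.0, minutes / full_sample_minutes)
    r = base_reliability * sample_adjustment
    return r * observed + (1 - r) * league_mean

# Hypothetical shooter: 45% from three in 40 games at 25 minutes per game
# (1,000 minutes, so the reliability is halved to 0.25).
part_season = regress_to_mean(0.45, 0.36, 0.50, 40, 25.0)

# The same percentage over a full season keeps the base reliability of 0.50.
full_season = regress_to_mean(0.45, 0.36, 0.50, 82, 30.0)

print(round(part_season, 4), round(full_season, 4))
```

The partial-season estimate lands much closer to the league average than the full-season one, because less playing time means less evidence about true talent.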

22.2 Regression Models for Player Projections

Simple Linear Regression

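Linear regression provides a foundation for understanding the relationship between past and future performance. The model below uses ridge regression (linear regression with an L2 penalty) on up to three prior seasons: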

class LinearProjectionModel:
    """
    Linear regression model for player projections.
    Uses previous seasons to predict next season.
    """

    def __init__(self, n_prior_seasons: int = 3,
                 regularization: float = 1.0):
        self.n_prior_seasons = n_prior_seasons
        self.regularization = regularization
        self.models = {}
        self.scalers = {}

    def prepare_features(self, player_history: List[PlayerSeason],
                        target_season: int) -> Optional[np.ndarray]:
        """
        Create feature vector from prior seasons.
        Returns None if insufficient history.
        """
        # Get seasons before target
        prior_seasons = [ps for ps in player_history
                        if ps.season < target_season]
        prior_seasons = sorted(prior_seasons,
                              key=lambda x: x.season, reverse=True)

        if len(prior_seasons) < 1:
            return None

        features = []

        # Use up to n_prior_seasons
        for i in range(self.n_prior_seasons):
            if i < len(prior_seasons):
                ps = prior_seasons[i]
                features.extend([
                    ps.points_per_game,
                    ps.rebounds_per_game,
                    ps.assists_per_game,
                    ps.true_shooting_pct,
                    ps.usage_rate,
                    ps.per,
                    ps.box_plus_minus,
                    ps.games_played / 82.0,  # Health proxy
                    ps.minutes_per_game / 36.0,  # Role proxy
                ])
            else:
                # Pad with zeros for missing seasons. Zeros are a simple
                # placeholder: the model cannot tell them apart from real values.
                features.extend([0.0] * 9)

        # Add age at target season
        if prior_seasons:
            current_age = prior_seasons[0].age
            target_age = current_age + (target_season - prior_seasons[0].season)
            features.append(target_age)

        return np.array(features)

    def fit(self, player_histories: Dict[str, List[PlayerSeason]],
            target_stat: str) -> None:
        """
        Fit model to predict target_stat.

        Args:
            player_histories: Dict mapping player_id to list of seasons
            target_stat: Statistic to predict
        """
        X = []
        y = []

        for player_id, seasons in player_histories.items():
            seasons = sorted(seasons, key=lambda x: x.season)

            for i, target_season in enumerate(seasons):
                if i == 0:
                    continue  # Need at least one prior season

                features = self.prepare_features(seasons[:i],
                                                target_season.season)
                if features is not None:
                    X.append(features)
                    y.append(getattr(target_season, target_stat))

        X = np.array(X)
        y = np.array(y)

        # Standardize features
        self.scalers[target_stat] = StandardScaler()
        X_scaled = self.scalers[target_stat].fit_transform(X)

        # Fit ridge regression
        self.models[target_stat] = Ridge(alpha=self.regularization)
        self.models[target_stat].fit(X_scaled, y)

    def predict(self, player_history: List[PlayerSeason],
                target_season: int, target_stat: str) -> Optional[float]:
        """Predict target_stat for target_season."""
        if target_stat not in self.models:
            raise ValueError(f"Model not fitted for {target_stat}")

        features = self.prepare_features(player_history, target_season)
        if features is None:
            return None

        features_scaled = self.scalers[target_stat].transform(
            features.reshape(1, -1)
        )
        return self.models[target_stat].predict(features_scaled)[0]
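
The link between regression and projection is easiest to see with a single feature. The sketch below fits ordinary least squares in closed form to a few invented (previous season, next season) pairs; the data and the helper `fit_simple_ols` are assumptions for illustration, not part of the model above.

```python
def fit_simple_ols(xs, ys):
    """Closed-form least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented (previous season, next season) points-per-game pairs.
prev_ppg = [10.0, 20.0, 30.0]
next_ppg = [13.0, 20.0, 27.0]

slope, intercept = fit_simple_ols(prev_ppg, next_ppg)

# Project a hypothetical player who scored 25 points per game last season.
projection = intercept + slope * 25.0
print(round(slope, 3), round(intercept, 3), round(projection, 3))
```

A fitted slope below 1 is how regression to the mean appears in a model: a player above the sample average is projected to stay above it, but by less than before.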

Feature Engineering for Prediction

Effective prediction requires thoughtful feature engineering. Key features for player projections include performance trajectories, composite skill measures, and a player's distance from his prime age:

class AdvancedFeatureEngineering:
    """
    Advanced feature engineering for player projections.
    Creates informative features from raw statistics.
    """

    @staticmethod
    def calculate_trajectory(seasons: List[PlayerSeason],
                            stat_name: str) -> Dict[str, float]:
        """
        Calculate performance trajectory over recent seasons.
        Returns trend indicators.
        """
        if len(seasons) < 2:
            return {'trend': 0.0, 'volatility': 0.0, 'acceleration': 0.0}

        seasons = sorted(seasons, key=lambda x: x.season)
        values = [getattr(s, stat_name) for s in seasons]

        # Linear trend (slope of recent performance)
        x = np.arange(len(values))
        slope, _, _, _, _ = stats.linregress(x, values)

        # Volatility (standard deviation of year-to-year changes)
        changes = np.diff(values)
        volatility = np.std(changes) if len(changes) > 1 else 0.0

        # Acceleration (change in slope)
        if len(values) >= 3:
            early_slope, _, _, _, _ = stats.linregress(
                x[:len(x)//2+1], values[:len(values)//2+1]
            )
            late_slope, _, _, _, _ = stats.linregress(
                x[len(x)//2:], values[len(values)//2:]
            )
            acceleration = late_slope - early_slope
        else:
            acceleration = 0.0

        return {
            'trend': slope,
            'volatility': volatility,
            'acceleration': acceleration
        }

    @staticmethod
    def create_composite_features(player_season: PlayerSeason) -> Dict[str, float]:
        """
        Create composite features from raw statistics.
        These capture skill interactions and playing style.
        """
        features = {}

        # Scoring efficiency adjusted for volume
        features['volume_adjusted_efficiency'] = (
            player_season.true_shooting_pct *
            np.log1p(player_season.usage_rate * 100)
        )

        # Playmaking ratio
        if player_season.turnovers_per_game > 0:
            features['ast_to_ratio'] = (
                player_season.assists_per_game /
                player_season.turnovers_per_game
            )
        else:
            features['ast_to_ratio'] = player_season.assists_per_game * 2

        # Stocks (steals + blocks) per minute
        if player_season.minutes_per_game > 0:
            features['stocks_per_minute'] = (
                (player_season.steals_per_game +
                 player_season.blocks_per_game) /
                player_season.minutes_per_game
            )
        else:
            features['stocks_per_minute'] = 0.0

        # Box score productivity
        features['box_productivity'] = (
            player_season.points_per_game +
            1.2 * player_season.rebounds_per_game +
            1.5 * player_season.assists_per_game +
            2.0 * player_season.steals_per_game +
            2.0 * player_season.blocks_per_game -
            player_season.turnovers_per_game
        )

        # Minutes share (indicator of role importance)
        features['minutes_share'] = player_season.minutes_per_game / 48.0

        # Health factor
        features['games_rate'] = player_season.games_played / 82.0

        return features

    @staticmethod
    def calculate_prime_distance(age: float, position: str = 'unknown') -> float:
        """
        Calculate distance from statistical prime.
        Different positions have different prime ages.
        """
        prime_ages = {
            'PG': 27.5,
            'SG': 27.0,
            'SF': 27.0,
            'PF': 26.5,
            'C': 26.0,
            'unknown': 27.0
        }

        prime_age = prime_ages.get(position, 27.0)
        return age - prime_age
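
As a worked example of the trajectory features, the sketch below computes trend, volatility, and acceleration for a three-season history with the standard library. The values are invented, and `three_season_trajectory` is a hypothetical simplification that handles exactly three seasons, where each half of the history is a two-point segment.

```python
import statistics

def three_season_trajectory(values):
    """Trend, volatility, and acceleration for exactly three seasons."""
    if len(values) != 3:
        raise ValueError("this sketch handles exactly three seasons")
    xs = [0, 1, 2]
    mean_x = sum(xs) / 3
    mean_y = sum(values) / 3
    # Least-squares slope across all three seasons.
    trend = (sum((x - mean_x) * (v - mean_y) for x, v in zip(xs, values))
             / sum((x - mean_x) ** 2 for x in xs))
    # Population standard deviation of year-to-year changes
    # (matches np.std with its default ddof=0).
    changes = [values[1] - values[0], values[2] - values[1]]
    volatility = statistics.pstdev(changes)
    # With two-point halves, the early and late slopes are just the changes.
    acceleration = changes[1] - changes[0]
    return {"trend": trend, "volatility": volatility,
            "acceleration": acceleration}

# Invented points-per-game history: a big jump followed by a small one.
traj = three_season_trajectory([10.0, 14.0, 15.0])
print(traj)
```

Here the trend is positive but the acceleration is negative, which describes a player who is still improving, only more slowly.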

22.3 Aging Curves in Basketball

Aging curves describe how player performance changes with age. Understanding these patterns is crucial for projections:

class AgingCurveAnalyzer:
    """
    Analyze and model aging curves for basketball statistics.
    Uses the delta method to reduce (not eliminate) survivor bias.
    """

    def __init__(self):
        self.aging_curves = {}
        self.confidence_intervals = {}

    def calculate_delta_aging(self, player_histories: Dict[str, List[PlayerSeason]],
                              stat_name: str,
                              min_minutes: float = 500) -> Dict[int, List[float]]:
        """
        Calculate year-over-year changes at each age.

        The delta method compares each player to themselves,
        controlling for selection effects.
        """
        age_deltas = {}

        for player_id, seasons in player_histories.items():
            # Filter for minimum playing time
            seasons = [s for s in seasons
                      if s.minutes_per_game * s.games_played >= min_minutes]
            seasons = sorted(seasons, key=lambda x: x.season)

            for i in range(1, len(seasons)):
                prev_season = seasons[i-1]
                curr_season = seasons[i]

                # Only use consecutive seasons
                if curr_season.season - prev_season.season != 1:
                    continue

                age = int(round(curr_season.age))
                prev_value = getattr(prev_season, stat_name)
                curr_value = getattr(curr_season, stat_name)

                delta = curr_value - prev_value

                if age not in age_deltas:
                    age_deltas[age] = []
                age_deltas[age].append(delta)

        return age_deltas

    def fit_aging_curve(self, player_histories: Dict[str, List[PlayerSeason]],
                        stat_name: str,
                        smooth: bool = True) -> None:
        """
        Fit aging curve for a statistic.
        Stores cumulative aging effect by age.
        """
        age_deltas = self.calculate_delta_aging(player_histories, stat_name)

        # Calculate mean delta at each age
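        ages = sorted(age_deltas.keys())
        if not ages:
            # Without this guard, max(ages) below fails with an opaque error
            raise ValueError(f"No qualifying consecutive seasons for {stat_name}")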
        mean_deltas = {}
        std_deltas = {}

        for age in ages:
            if len(age_deltas[age]) >= 10:  # Minimum sample size
                deltas = age_deltas[age]
                mean_deltas[age] = np.mean(deltas)
                # Standard error of the mean delta (not a full interval)
                std_deltas[age] = np.std(deltas) / np.sqrt(len(deltas))

        if smooth:
            # Apply Gaussian smoothing to reduce noise
            smoothed_deltas = self._smooth_deltas(mean_deltas)
            mean_deltas = smoothed_deltas

        # Convert to cumulative aging curve (relative to age 27)
        reference_age = 27
        cumulative = {reference_age: 0.0}

        # Forward from reference age
        for age in range(reference_age + 1, max(ages) + 1):
            if age in mean_deltas:
                cumulative[age] = cumulative[age - 1] + mean_deltas[age]
            elif age - 1 in cumulative:
                cumulative[age] = cumulative[age - 1]

        # Backward from reference age
        for age in range(reference_age - 1, min(ages) - 1, -1):
            if age + 1 in mean_deltas:
                cumulative[age] = cumulative[age + 1] - mean_deltas[age + 1]
            elif age + 1 in cumulative:
                cumulative[age] = cumulative[age + 1]

        self.aging_curves[stat_name] = cumulative
        self.confidence_intervals[stat_name] = std_deltas

    def _smooth_deltas(self, deltas: Dict[int, float],
                       window: int = 3) -> Dict[int, float]:
        """Apply Gaussian kernel smoothing to age deltas.

        `window` is the kernel's standard deviation in years.
        """
        ages = sorted(deltas.keys())
        values = [deltas[a] for a in ages]

        # Gaussian kernel smoothing
        smoothed = {}
        for i, age in enumerate(ages):
            weights = []
            weighted_values = []
            for j, other_age in enumerate(ages):
                weight = np.exp(-0.5 * ((age - other_age) / window) ** 2)
                weights.append(weight)
                weighted_values.append(weight * values[j])

            smoothed[age] = sum(weighted_values) / sum(weights)

        return smoothed

    def get_aging_adjustment(self, stat_name: str,
                            from_age: float, to_age: float) -> float:
        """
        Get expected change in statistic from one age to another.
        """
        if stat_name not in self.aging_curves:
            return 0.0

        curve = self.aging_curves[stat_name]

        # Interpolate for non-integer ages
        from_adjustment = self._interpolate_curve(curve, from_age)
        to_adjustment = self._interpolate_curve(curve, to_age)

        return to_adjustment - from_adjustment

    def _interpolate_curve(self, curve: Dict[int, float], age: float) -> float:
        """Linearly interpolate aging curve at non-integer age."""
        lower_age = int(np.floor(age))
        upper_age = int(np.ceil(age))

        if lower_age == upper_age:
            return curve.get(lower_age, 0.0)

        lower_val = curve.get(lower_age, 0.0)
        upper_val = curve.get(upper_age, 0.0)

        weight = age - lower_age
        return lower_val + weight * (upper_val - lower_val)

    def visualize_aging_curve(self, stat_name: str) -> Dict:
        """Return data for visualizing aging curve."""
        if stat_name not in self.aging_curves:
            return {}

        curve = self.aging_curves[stat_name]
        ages = sorted(curve.keys())
        values = [curve[a] for a in ages]

        return {
            'ages': ages,
            'cumulative_change': values,
            'stat_name': stat_name
        }
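
The delta method is easier to follow on a tiny example. The sketch below averages within-player changes by age and accumulates them into a curve anchored at age 27, following the same forward and backward logic as `fit_aging_curve` but without smoothing or a minimum sample size. The three histories are invented, and both helper names are hypothetical.

```python
from collections import defaultdict

# Invented (age, value) histories for three hypothetical players.
histories = {
    "p1": [(25, 10.0), (26, 12.0), (27, 13.0), (28, 12.0)],
    "p2": [(26, 18.0), (27, 19.0), (28, 18.0)],
    "p3": [(27, 8.0), (28, 7.0), (29, 5.0)],
}

def mean_deltas_by_age(histories):
    """Average within-player change, keyed by the age the player reached."""
    buckets = defaultdict(list)
    for seasons in histories.values():
        for (age0, v0), (age1, v1) in zip(seasons, seasons[1:]):
            if age1 - age0 == 1:  # consecutive seasons only
                buckets[age1].append(v1 - v0)
    return {age: sum(d) / len(d) for age, d in sorted(buckets.items())}

def cumulative_curve(mean_deltas, reference_age=27):
    """Accumulate mean deltas into a curve that is zero at reference_age."""
    curve = {reference_age: 0.0}
    # Forward: add each age's mean change to the previous age's level.
    for age in range(reference_age + 1, max(mean_deltas) + 1):
        curve[age] = curve[age - 1] + mean_deltas.get(age, 0.0)
    # Backward: undo the change that led into the following age.
    for age in range(reference_age - 1, min(mean_deltas) - 1, -1):
        curve[age] = curve[age + 1] - mean_deltas.get(age + 1, 0.0)
    return dict(sorted(curve.items()))

deltas = mean_deltas_by_age(histories)
curve = cumulative_curve(deltas)
print(deltas)
print(curve)
```

Each player is compared only with himself, so a strong player and a weak player contribute equally to the estimated change at a given age.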

Position-Specific Aging Patterns

Different positions and skill types age differently:

class PositionAgingAnalyzer:
    """
    Analyze position-specific and skill-specific aging patterns.
    """

    # Skill categories and their aging characteristics
    SKILL_AGING_PROFILES = {
        'athleticism': {
            'peak_age': 25,
            'decline_start': 27,
            'decline_rate': 0.03,  # 3% per year after decline starts
            'skills': ['dunks', 'fast_break_pts', 'drives_per_game']
        },
        'shooting': {
            'peak_age': 29,
            'decline_start': 34,
            'decline_rate': 0.01,
            'skills': ['three_pct', 'mid_range_pct', 'ft_pct']
        },
        'playmaking': {
            'peak_age': 28,
            'decline_start': 32,
            'decline_rate': 0.015,
            'skills': ['assists_per_game', 'ast_to_ratio']
        },
        'defense': {
            'peak_age': 26,
            'decline_start': 29,
            'decline_rate': 0.025,
            'skills': ['steals_per_game', 'blocks_per_game', 'def_rating']
        },
        'rebounding': {
            'peak_age': 26,
            'decline_start': 30,
            'decline_rate': 0.015,
            'skills': ['rebounds_per_game', 'reb_pct']
        }
    }

    POSITION_PROFILES = {
        'PG': {
            'primary_skills': ['playmaking', 'shooting'],
            'secondary_skills': ['athleticism'],
            'typical_peak': (26, 30),
            'typical_decline': 32
        },
        'SG': {
            'primary_skills': ['shooting', 'athleticism'],
            'secondary_skills': ['playmaking', 'defense'],
            'typical_peak': (25, 29),
            'typical_decline': 31
        },
        'SF': {
            'primary_skills': ['shooting', 'defense'],
            'secondary_skills': ['athleticism', 'playmaking'],
            'typical_peak': (25, 29),
            'typical_decline': 31
        },
        'PF': {
            'primary_skills': ['rebounding', 'defense'],
            'secondary_skills': ['shooting', 'athleticism'],
            'typical_peak': (25, 28),
            'typical_decline': 30
        },
        'C': {
            'primary_skills': ['rebounding', 'defense'],
            'secondary_skills': ['shooting'],
            'typical_peak': (24, 28),
            'typical_decline': 30
        }
    }

    def estimate_position_aging(self, position: str, age: float,
                               reference_age: float = 27) -> Dict[str, float]:
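        """
        Estimate aging effects by skill category at a given age.

        Returns the fractional change in each skill category relative
        to reference_age (negative = below the reference level). The
        skill curves are shared across positions; position only sets
        the default weights in get_composite_aging_adjustment.
        """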
        if position not in self.POSITION_PROFILES:
            position = 'SF'  # Default to SF profile

        profile = self.POSITION_PROFILES[position]
        aging_effects = {}

        for skill_cat, skill_profile in self.SKILL_AGING_PROFILES.items():
            peak = skill_profile['peak_age']
            decline_start = skill_profile['decline_start']
            decline_rate = skill_profile['decline_rate']

            # Calculate effect at age vs reference_age
            effect_at_age = self._calculate_skill_level(
                age, peak, decline_start, decline_rate
            )
            effect_at_reference = self._calculate_skill_level(
                reference_age, peak, decline_start, decline_rate
            )

            aging_effects[skill_cat] = effect_at_age - effect_at_reference

        return aging_effects

    def _calculate_skill_level(self, age: float, peak: float,
                               decline_start: float, decline_rate: float) -> float:
        """
        Calculate skill level at a given age.
        Models asymmetric improvement/decline pattern.
        """
        if age <= peak:
            # Improvement phase: linear, 2% per year below peak
            years_to_peak = peak - age
            improvement = 0.02 * years_to_peak
            return -improvement  # Negative because the player is still below peak

        elif age <= decline_start:
            # Plateau phase
            return 0.0

        else:
            # Decline phase
            years_past_decline = age - decline_start
            return -decline_rate * years_past_decline

    def get_composite_aging_adjustment(self, position: str,
                                       from_age: float, to_age: float,
                                       player_style: Optional[Dict[str, float]] = None) -> float:
        """
        Calculate composite aging adjustment based on player style.

        Args:
            position: Player's position
            from_age: Current age
            to_age: Target age
            player_style: Optional dict mapping skill categories to weights
        """
        if player_style is None:
            # Use position-based default weights
            profile = self.POSITION_PROFILES.get(position, self.POSITION_PROFILES['SF'])
            player_style = {skill: 0.3 for skill in profile['primary_skills']}
            player_style.update({skill: 0.1 for skill in profile['secondary_skills']})
            # Normalize weights
            total = sum(player_style.values())
            player_style = {k: v/total for k, v in player_style.items()}

        from_effects = self.estimate_position_aging(position, from_age)
        to_effects = self.estimate_position_aging(position, to_age)

        composite_change = 0.0
        for skill, weight in player_style.items():
            if skill in from_effects and skill in to_effects:
                composite_change += weight * (to_effects[skill] - from_effects[skill])

        return composite_change
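
A short worked example shows how the piecewise skill curve behaves. The sketch below reimplements the improve, plateau, and decline logic as a standalone function and applies the athleticism and shooting parameters from `SKILL_AGING_PROFILES` to a player aging from 27 to 30. The function name `skill_level` and its `growth_rate` parameter are assumptions made for this illustration.

```python
def skill_level(age, peak, decline_start, decline_rate, growth_rate=0.02):
    """Piecewise skill level relative to peak (0.0 = peak level)."""
    if age <= peak:
        # Still improving: below peak by growth_rate per remaining year.
        return -growth_rate * (peak - age)
    if age <= decline_start:
        # Plateau between peak and the start of decline.
        return 0.0
    # Declining: decline_rate per year past decline_start.
    return -decline_rate * (age - decline_start)

# Athleticism profile from the text: peak 25, decline from 27 at 3% per year.
athleticism_change = (skill_level(30, 25, 27, 0.03)
                      - skill_level(27, 25, 27, 0.03))

# Shooting profile from the text: peak 29, decline from 34 at 1% per year.
shooting_change = (skill_level(30, 29, 34, 0.01)
                   - skill_level(27, 29, 34, 0.01))

print(round(athleticism_change, 3), round(shooting_change, 3))
```

Under these parameters the same three years cost about nine percent in athleticism while shooting improves by about four percent, which is why a player's style matters for the composite adjustment.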

22.4 Similarity Scores and Comparable Players

Finding Similar Players

Identifying historically similar players helps validate projections and provides additional context:
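
Before the full engine, the core scoring idea can be sketched on its own: standardize each difference, take a weighted average, and map it to a score between 0 and 1 with a negative exponential. The two player-seasons, standard deviations, and weights below are invented for illustration, and `similarity` is a hypothetical standalone function.

```python
import math

def similarity(stats_a, stats_b, stds, weights):
    """exp(-weighted mean of standardized absolute differences)."""
    total_diff = 0.0
    total_weight = 0.0
    for name, weight in weights.items():
        diff = abs(stats_a[name] - stats_b[name]) / stds[name]
        total_diff += weight * diff
        total_weight += weight
    return math.exp(-total_diff / total_weight)

# Invented values for two hypothetical player-seasons.
stds = {"points_per_game": 6.0, "assists_per_game": 2.0}
weights = {"points_per_game": 1.0, "assists_per_game": 1.0}
player_a = {"points_per_game": 20.0, "assists_per_game": 5.0}
player_b = {"points_per_game": 17.0, "assists_per_game": 6.0}

# Each stat differs by half a standard deviation, so the score is exp(-0.5).
score = similarity(player_a, player_b, stds, weights)
identical = similarity(player_a, player_a, stds, weights)
print(round(score, 4), identical)
```

The class below applies the same idea across nine box-score features and adds a penalty for age differences.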

class PlayerSimilarityEngine:
    """
    Calculate similarity scores between players.
    Used to find comparable players for projection validation.
    """

    # Features used for similarity calculation
    SIMILARITY_FEATURES = [
        'points_per_game', 'rebounds_per_game', 'assists_per_game',
        'steals_per_game', 'blocks_per_game', 'true_shooting_pct',
        'usage_rate', 'per', 'box_plus_minus'
    ]

    # Feature weights (importance for similarity)
    FEATURE_WEIGHTS = {
        'points_per_game': 1.0,
        'rebounds_per_game': 1.0,
        'assists_per_game': 1.0,
        'steals_per_game': 0.7,
        'blocks_per_game': 0.7,
        'true_shooting_pct': 1.2,
        'usage_rate': 1.0,
        'per': 1.5,
        'box_plus_minus': 1.5
    }

    def __init__(self, historical_seasons: List[PlayerSeason]):
        """Initialize with historical data for comparison."""
        self.historical_data = self._prepare_historical_data(historical_seasons)
        self.feature_stats = self._calculate_feature_statistics()

    def _prepare_historical_data(self, seasons: List[PlayerSeason]) -> pd.DataFrame:
        """Convert player seasons to DataFrame for efficient computation."""
        records = []
        for ps in seasons:
            record = {
                'player_id': ps.player_id,
                'player_name': ps.player_name,
                'season': ps.season,
                'age': ps.age
            }
            for feature in self.SIMILARITY_FEATURES:
                record[feature] = getattr(ps, feature, None)
            records.append(record)

        return pd.DataFrame(records)

    def _calculate_feature_statistics(self) -> Dict[str, Dict[str, float]]:
        """Calculate mean and std for each feature for standardization."""
        # Named feature_stats to avoid shadowing the imported scipy.stats module
        feature_stats = {}
        for feature in self.SIMILARITY_FEATURES:
            values = self.historical_data[feature].dropna()
            feature_stats[feature] = {
                'mean': values.mean(),
                'std': values.std()
            }
        return feature_stats

    def calculate_similarity(self, player1: PlayerSeason,
                            player2: PlayerSeason,
                            age_weight: float = 0.1) -> float:
        """
        Calculate similarity score between two player-seasons.
        Returns score from 0 to 1 (1 = identical).
        """
        total_weighted_diff = 0.0
        total_weight = 0.0

        for feature in self.SIMILARITY_FEATURES:
            val1 = getattr(player1, feature, None)
            val2 = getattr(player2, feature, None)

            if val1 is None or val2 is None:
                continue

            # Standardize difference
            std = self.feature_stats[feature]['std']
            if std > 0:
                standardized_diff = abs(val1 - val2) / std
            else:
                standardized_diff = 0.0

            weight = self.FEATURE_WEIGHTS.get(feature, 1.0)
            total_weighted_diff += weight * standardized_diff
            total_weight += weight

        # Add age difference penalty
        age_diff = abs(player1.age - player2.age)
        total_weighted_diff += age_weight * age_diff
        total_weight += age_weight

        if total_weight == 0:
            return 0.0

        # Convert to similarity score (0 to 1)
        avg_diff = total_weighted_diff / total_weight
        similarity = np.exp(-avg_diff)

        return similarity

    def find_similar_players(self, target_player: PlayerSeason,
                            n_similar: int = 10,
                            age_range: Optional[Tuple[float, float]] = None,
                            min_season: Optional[int] = None,
                            exclude_self: bool = True) -> List[Dict]:
        """
        Find most similar historical player-seasons.

        Args:
            target_player: Player-season to find comparables for
            n_similar: Number of similar players to return
            age_range: Optional (min_age, max_age) filter
            min_season: Optional minimum season year
            exclude_self: Whether to exclude the target player
        """
        similarities = []

        for _, row in self.historical_data.iterrows():
            # Apply filters
            if exclude_self and row['player_id'] == target_player.player_id:
                continue

            if age_range:
                if row['age'] < age_range[0] or row['age'] > age_range[1]:
                    continue

            if min_season and row['season'] < min_season:
                continue

            # Create PlayerSeason for comparison
            comp_player = self._row_to_player_season(row)

            similarity = self.calculate_similarity(target_player, comp_player)

            similarities.append({
                'player_id': row['player_id'],
                'player_name': row['player_name'],
                'season': row['season'],
                'age': row['age'],
                'similarity': similarity
            })

        # Sort by similarity and return top N
        similarities.sort(key=lambda x: x['similarity'], reverse=True)
        return similarities[:n_similar]

    def _row_to_player_season(self, row: pd.Series) -> PlayerSeason:
        """Convert DataFrame row to PlayerSeason object."""
        return PlayerSeason(
            player_id=row['player_id'],
            player_name=row['player_name'],
            season=int(row['season']),
            age=row['age'],
            games_played=row.get('games_played', 70),
            minutes_per_game=row.get('minutes_per_game', 30),
            points_per_game=row.get('points_per_game', 0),
            rebounds_per_game=row.get('rebounds_per_game', 0),
            assists_per_game=row.get('assists_per_game', 0),
            steals_per_game=row.get('steals_per_game', 0),
            blocks_per_game=row.get('blocks_per_game', 0),
            turnovers_per_game=row.get('turnovers_per_game', 0),
            fg_pct=row.get('fg_pct', 0.45),
            three_pct=row.get('three_pct', 0.35),
            ft_pct=row.get('ft_pct', 0.75),
            true_shooting_pct=row.get('true_shooting_pct', 0.55),
            usage_rate=row.get('usage_rate', 0.20),
            per=row.get('per', 15),
            box_plus_minus=row.get('box_plus_minus', 0),
            vorp=row.get('vorp', 1),
            win_shares=row.get('win_shares', 3)
        )

    def project_from_comparables(self, target_player: PlayerSeason,
                                target_stat: str,
                                years_ahead: int = 1,
                                n_comparables: int = 20) -> Dict[str, float]:
        """
        Project future performance based on how similar players developed.
        """
        # Find similar players within one year of the target's age
        comparables = self.find_similar_players(
            target_player,
            n_similar=n_comparables,
            age_range=(target_player.age - 1, target_player.age + 1)
        )

        future_values = []
        weights = []

        for comp in comparables:
            # Find this player's performance years_ahead later
            future_data = self.historical_data[
                (self.historical_data['player_id'] == comp['player_id']) &
                (self.historical_data['season'] == comp['season'] + years_ahead)
            ]

            if not future_data.empty:
                future_value = future_data[target_stat].iloc[0]
                future_values.append(future_value)
                weights.append(comp['similarity'])

        if not future_values:
            # Same keys as the successful return so callers can rely on them
            return {'projection': None, 'std': None,
                    'confidence_interval': None, 'n_comparables': 0}

        weights = np.array(weights)
        weights = weights / weights.sum()  # Normalize

        weighted_mean = np.average(future_values, weights=weights)
        weighted_std = np.sqrt(np.average(
            (np.array(future_values) - weighted_mean) ** 2, weights=weights
        ))

        return {
            'projection': weighted_mean,
            'std': weighted_std,
            'confidence_interval': (
                weighted_mean - 1.96 * weighted_std,
                weighted_mean + 1.96 * weighted_std
            ),
            'n_comparables': len(future_values)
        }
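
To make the scoring rule concrete, the following standalone sketch reproduces the arithmetic of calculate_similarity on two made-up players. The dictionaries FEATURE_STD and FEATURE_WEIGHT, the helper similarity_score, and every number here are illustrative assumptions, not values taken from the engine.

```python
import math

# Hypothetical league-wide standard deviations and feature weights
# (illustrative only; the real engine derives these from its data).
FEATURE_STD = {"points_per_game": 6.0, "assists_per_game": 2.0}
FEATURE_WEIGHT = {"points_per_game": 2.0, "assists_per_game": 1.0}


def similarity_score(a: dict, b: dict, age_weight: float = 0.1) -> float:
    """Weighted mean of standardized gaps, mapped into (0, 1] by exp(-x)."""
    total_diff = 0.0
    total_weight = 0.0
    for feature, std in FEATURE_STD.items():
        # Standardize each gap so features on different scales are comparable
        diff = abs(a[feature] - b[feature]) / std if std > 0 else 0.0
        weight = FEATURE_WEIGHT.get(feature, 1.0)
        total_diff += weight * diff
        total_weight += weight
    # The age gap enters in raw years, not standard deviations
    total_diff += age_weight * abs(a["age"] - b["age"])
    total_weight += age_weight
    return math.exp(-total_diff / total_weight)


player_a = {"points_per_game": 20.0, "assists_per_game": 5.0, "age": 25}
player_b = {"points_per_game": 17.0, "assists_per_game": 7.0, "age": 26}
score = similarity_score(player_a, player_b)
print(f"similarity: {score:.3f}")
```

Here the weighted gaps sum to 2.1 over a total weight of 3.1, so the score is exp(-2.1 / 3.1). A player compared with himself scores exactly 1. Because the age gap is measured in years while the other gaps are in standard deviations, the effective influence of age depends on both age_weight and the spread of the other features.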

22.5 CARMELO and RAPTOR Projection Systems

CARMELO Methodology

FiveThirtyEight's CARMELO (Career-Arc Regression Model Estimator with Local Optimization) system projects player careers based on statistical similarity to historical players:

class CARMELOProjection:
    """
    Implementation of CARMELO-style projection methodology.

    CARMELO projects careers by:
    1. Finding similar historical players (comparables)
    2. Weighting comparables by similarity
    3. Tracking how comparables' careers evolved
    4. Projecting based on weighted average outcomes
    """

    def __init__(self, historical_data: Dict[str, List[PlayerSeason]],
                 similarity_engine: PlayerSimilarityEngine):
        self.historical_data = historical_data
        self.similarity_engine = similarity_engine
        self.aging_analyzer = AgingCurveAnalyzer()

    def generate_projection(self, player: PlayerSeason,
                           projection_years: int = 5,
                           n_comparables: int = 10) -> Dict:
        """
        Generate multi-year projection for a player.

        Returns projections, comparables used, and uncertainty estimates.
        """
        # Find comparable players
        comparables = self.similarity_engine.find_similar_players(
            player,
            n_similar=n_comparables,
            age_range=(player.age - 1.5, player.age + 1.5)
        )

        projections = []

        for year in range(1, projection_years + 1):
            year_projection = self._project_year(
                player, comparables, year
            )
            projections.append({
                'year': year,
                'season': player.season + year,
                'age': player.age + year,
                **year_projection
            })

        # Calculate career value projections
        career_metrics = self._calculate_career_metrics(projections)

        return {
            'player_id': player.player_id,
            'player_name': player.player_name,
            'base_season': player.season,
            'base_age': player.age,
            'comparables': comparables[:5],  # Top 5 for display
            'yearly_projections': projections,
            'career_metrics': career_metrics
        }

    def _project_year(self, player: PlayerSeason,
                     comparables: List[Dict],
                     years_ahead: int) -> Dict[str, float]:
        """Project statistics for a specific year."""

        stat_projections = {}

        for stat in ['points_per_game', 'rebounds_per_game', 'assists_per_game',
                     'per', 'box_plus_minus', 'win_shares', 'vorp']:

            weighted_values = []
            weights = []

            for comp in comparables:
                # Get comparable's value at years_ahead from their similar season
                comp_player_id = comp['player_id']
                comp_season = comp['season']

                if comp_player_id in self.historical_data:
                    future_seasons = [
                        s for s in self.historical_data[comp_player_id]
                        if s.season == comp_season + years_ahead
                    ]

                    if future_seasons:
                        future_value = getattr(future_seasons[0], stat)
                        weighted_values.append(future_value)
                        weights.append(comp['similarity'])

            if weighted_values:
                weights = np.array(weights)
                weights = weights / weights.sum()

                projection = np.average(weighted_values, weights=weights)
                std = np.sqrt(np.average(
                    (np.array(weighted_values) - projection) ** 2,
                    weights=weights
                ))

                stat_projections[stat] = projection
                stat_projections[f'{stat}_std'] = std
            else:
                # Fall back to aging curve adjustment
                current_value = getattr(player, stat, 0)
                aging_adj = self.aging_analyzer.get_aging_adjustment(
                    stat, player.age, player.age + years_ahead
                )
                stat_projections[stat] = current_value + aging_adj
                # abs() keeps the spread positive for negative stats such as BPM
                stat_projections[f'{stat}_std'] = abs(current_value) * 0.15

        # Estimate probability of still being in league
        stat_projections['probability_in_league'] = self._estimate_survival_probability(
            player, years_ahead, comparables
        )

        return stat_projections

    def _estimate_survival_probability(self, player: PlayerSeason,
                                       years_ahead: int,
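                                       comparables: List[Dict]) -> float:
        """
        Estimate probability player is still in league.

        Note: a comparable whose matched season is too recent for the data to
        contain a season years_ahead later is counted as having left the
        league, which biases the estimate downward.
        """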
        still_active = 0
        total_weight = 0

        for comp in comparables:
            comp_player_id = comp['player_id']
            comp_season = comp['season']

            if comp_player_id in self.historical_data:
                future_seasons = [
                    s for s in self.historical_data[comp_player_id]
                    if s.season >= comp_season + years_ahead
                ]

                if future_seasons:
                    still_active += comp['similarity']
                total_weight += comp['similarity']

        if total_weight == 0:
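            # Base rate by age, clamped to [0.1, 1.0] so that players
            # younger than 30 never receive a probability above 1
            age = player.age + years_ahead
            return min(1.0, max(0.1, 1.0 - 0.1 * (age - 30)))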

        return still_active / total_weight

    def _calculate_career_metrics(self, projections: List[Dict]) -> Dict[str, float]:
        """Calculate summary career metrics from projections."""

        # Expected remaining WAR (wins above replacement)
        expected_war = 0
        for proj in projections:
            # Approximate WAR from box plus minus and minutes
            prob = proj.get('probability_in_league', 1.0)
            bpm = proj.get('box_plus_minus', 0)
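            # Rough WAR approximation: BPM * 2.7 * (minutes/48) * games / 100,
            # with 30 minutes per game and 75 games assumed for every season.
            # BPM is measured against league average (0.0), not replacement
            # level, so this understates value above replacement.
            war_estimate = bpm * 2.7 * (30/48) * 75 / 100  # Simplified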
            expected_war += prob * war_estimate

        # Market value estimate (based on projected WAR)
        # Very rough: ~$3M per WAR in current market
        market_value_estimate = expected_war * 3_000_000

        return {
            'expected_remaining_war': expected_war,
            'market_value_estimate': market_value_estimate,
            'peak_projection_year': max(
                projections,
                key=lambda x: x.get('box_plus_minus', 0) * x.get('probability_in_league', 0)
            )['year']
        }
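
The core of the comparables approach is a similarity-weighted mean together with a similarity-weighted spread. The sketch below isolates that step. The function name weighted_projection and the three comparables are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
import math
from typing import List, Tuple


def weighted_projection(future_values: List[float],
                        similarities: List[float]) -> Tuple[float, float]:
    """Return the similarity-weighted mean and standard deviation."""
    total = sum(similarities)
    weights = [s / total for s in similarities]  # normalize to sum to 1
    mean = sum(w * v for w, v in zip(weights, future_values))
    variance = sum(w * (v - mean) ** 2 for w, v in zip(weights, future_values))
    return mean, math.sqrt(variance)


# Hypothetical next-season BPM for three comparables and their similarity scores
mean, spread = weighted_projection([2.0, 4.0, -1.0], [0.5, 0.3, 0.2])
print(f"projection: {mean:.2f} +/- {spread:.2f}")
```

With these inputs the weighted mean is 2.0 and the weighted variance is 0.3 * 4 + 0.2 * 9 = 3.0. The spread describes how much the comparables' outcomes differ from one another. With only a handful of comparables it is a noisy estimate of true projection uncertainty.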

RAPTOR Projections

RAPTOR (Robust Algorithm using Player Tracking and On/Off Ratings) builds on play-by-play and tracking data:

class RAPTORProjection:
    """
    RAPTOR-style projection system.

    RAPTOR differs from box-score based systems by:
    1. Incorporating on/off court impact
    2. Using player tracking data where available
    3. Separating offensive and defensive contributions
    4. Adjusting for teammate and opponent quality
    """

    def __init__(self):
        self.offensive_model = None
        self.defensive_model = None
        self.war_model = None

    def calculate_raptor_components(self, player_season: PlayerSeason,
                                   on_off_data: Optional[Dict] = None,
                                   tracking_data: Optional[Dict] = None) -> Dict:
        """
        Calculate RAPTOR offensive and defensive ratings.

        If tracking data unavailable, estimates from box score.
        """
        # Box score component
        box_offense = self._estimate_box_offense(player_season)
        box_defense = self._estimate_box_defense(player_season)

        # On/off component (if available)
        if on_off_data:
            onoff_offense = on_off_data.get('offensive_on_off', 0)
            onoff_defense = on_off_data.get('defensive_on_off', 0)
        else:
            # Estimate from box score with more uncertainty
            onoff_offense = box_offense * 0.3
            onoff_defense = box_defense * 0.3

        # Combine components
        # RAPTOR uses roughly 50/50 box/on-off blend
        raptor_offense = 0.5 * box_offense + 0.5 * onoff_offense
        raptor_defense = 0.5 * box_defense + 0.5 * onoff_defense

        # Total RAPTOR
        raptor_total = raptor_offense + raptor_defense

        # WAR calculation (rough heuristic):
        # rating per 100 possessions * share of minutes * games played,
        # scaled by 100 / 2000 as a simplified points-to-wins conversion
        minutes_pct = player_season.minutes_per_game / 48.0
        war = (raptor_total * minutes_pct * player_season.games_played *
               100 / 2000)

        return {
            'raptor_offense': raptor_offense,
            'raptor_defense': raptor_defense,
            'raptor_total': raptor_total,
            'war': war,
            'box_component': {
                'offense': box_offense,
                'defense': box_defense
            }
        }

    def _estimate_box_offense(self, ps: PlayerSeason) -> float:
        """Estimate offensive RAPTOR component from box score."""
        # Simplified offensive RAPTOR approximation
        # Based on scoring, efficiency, playmaking
        if ps.minutes_per_game <= 0:
            return 0.0  # No minutes played: avoid dividing by zero

        per_36 = 36 / ps.minutes_per_game  # Per-game to per-36-minute scale

        scoring_value = (ps.points_per_game * per_36 - 11) * 0.5

        efficiency_value = (ps.true_shooting_pct - 0.55) * 15

        playmaking_value = (ps.assists_per_game * per_36 - 2.5) * 0.8

        turnover_penalty = (ps.turnovers_per_game * per_36 - 1.5) * -0.5

        return scoring_value + efficiency_value + playmaking_value + turnover_penalty

    def _estimate_box_defense(self, ps: PlayerSeason) -> float:
        """Estimate defensive RAPTOR component from box score."""
        # Box score is poor at capturing defense
        # Use stocks (steals + blocks) and rebounds as proxy
        if ps.minutes_per_game <= 0:
            return 0.0  # No minutes played: avoid dividing by zero

        per_36 = 36 / ps.minutes_per_game  # Per-game to per-36-minute scale

        stocks_value = ((ps.steals_per_game + ps.blocks_per_game) * per_36 - 1.2) * 1.0

        # Defensive rebounds (estimate as 75% of total rebounds)
        dreb_value = (ps.rebounds_per_game * 0.75 * per_36 - 3.5) * 0.3

        # Default slight positive for starters, slight negative otherwise
        baseline = 0.5 if ps.minutes_per_game >= 25 else -0.5

        return stocks_value + dreb_value + baseline

    def project_raptor(self, player_season: PlayerSeason,
                      years_ahead: int = 1,
                      aging_analyzer: Optional[AgingCurveAnalyzer] = None) -> Dict:
        """
        Project future RAPTOR ratings.
        """
        current_raptor = self.calculate_raptor_components(player_season)

        # Apply aging adjustments
        if aging_analyzer:
            offense_aging = aging_analyzer.get_aging_adjustment(
                'box_plus_minus', player_season.age,
                player_season.age + years_ahead
            ) * 0.6  # Offense ages slightly slower

            defense_aging = aging_analyzer.get_aging_adjustment(
                'box_plus_minus', player_season.age,
                player_season.age + years_ahead
            ) * 1.2  # Defense ages faster
        else:
            # Default aging assumptions
            age_factor = player_season.age + years_ahead
            if age_factor <= 27:
                offense_aging = 0.3 * (27 - age_factor)
                defense_aging = 0.2 * (27 - age_factor)
            else:
                offense_aging = -0.15 * (age_factor - 27)
                defense_aging = -0.25 * (age_factor - 27)

        projected_offense = current_raptor['raptor_offense'] + offense_aging
        projected_defense = current_raptor['raptor_defense'] + defense_aging

        # Apply regression to mean
        regression_factor = 0.8 ** years_ahead  # More regression further out
        projected_offense = regression_factor * projected_offense
        projected_defense = regression_factor * projected_defense

        projected_total = projected_offense + projected_defense

        # Projected WAR (assuming similar minutes)
        minutes_pct = player_season.minutes_per_game / 48.0
        # Reduce expected games for older players/further projections
        expected_games = min(75, player_season.games_played) * (0.95 ** years_ahead)
        projected_war = projected_total * minutes_pct * expected_games * 100 / 2000

        return {
            'projected_raptor_offense': projected_offense,
            'projected_raptor_defense': projected_defense,
            'projected_raptor_total': projected_total,
            'projected_war': projected_war,
            'projection_year': player_season.season + years_ahead,
            'projected_age': player_season.age + years_ahead
        }
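
As a worked illustration of how the aging term and regression to the mean interact, the sketch below mirrors the default offensive branch of project_raptor (peak at 27, +0.3 per year before, -0.15 per year after, 0.8 retention per projected year). The function name project_rating and the sample player are assumptions for illustration only.

```python
def project_rating(current: float, age: float, years_ahead: int,
                   peak_age: float = 27.0,
                   growth_per_year: float = 0.3,
                   decline_per_year: float = 0.15,
                   retention: float = 0.8) -> float:
    """Apply the default aging term, then shrink toward 0 (league average)."""
    target_age = age + years_ahead
    if target_age <= peak_age:
        aging = growth_per_year * (peak_age - target_age)
    else:
        aging = -decline_per_year * (target_age - peak_age)
    # Retention compounds, so projections further out regress more
    return (retention ** years_ahead) * (current + aging)


# Hypothetical 29-year-old with a +4.0 offensive rating
for years in (1, 2, 3):
    print(years, round(project_rating(4.0, 29, years), 3))
```

For this player the one-year projection is 0.8 * (4.0 - 0.45) = 2.84 and the two-year projection is 0.64 * (4.0 - 0.60) = 2.176. Note that the aging term depends on how far the target age is from 27 rather than on the number of years projected, so it is worth checking that behavior against an empirical aging curve before relying on it.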

22.6 Marcel-Style Projections

Weighted Average Methodology

Marcel projections, adapted from the baseball forecasting system created by analyst Tom Tango, use a simple but effective weighted-average approach:

class MarcelProjection:
    """
    Marcel-style projection system.

    The Marcel method uses:
    1. Weighted average of last 3 seasons (5/4/3 weighting)
    2. Regression toward mean based on playing time
    3. Age adjustment
    4. Simple and transparent
    """

    # Season weights (most recent first)
    SEASON_WEIGHTS = [5, 4, 3]

    # Regression targets (approximate league-average values unless noted)
    REGRESSION_TARGETS = {
        'points_per_game': 10.0,
        'rebounds_per_game': 4.0,
        'assists_per_game': 2.0,
        'steals_per_game': 0.7,
        'blocks_per_game': 0.4,
        'fg_pct': 0.450,
        'three_pct': 0.350,
        'ft_pct': 0.760,
        'true_shooting_pct': 0.550,
        'per': 13.0,
        # Below league average (0.0): players with little playing time
        # tend toward replacement level (about -2.0)
        'box_plus_minus': -1.5,
        'win_shares': 2.0,
    }

    # Playing time for full reliability (in minutes per season)
    RELIABILITY_MINUTES = 1500

    def __init__(self, aging_analyzer: Optional[AgingCurveAnalyzer] = None):
        self.aging_analyzer = aging_analyzer

    def project(self, player_seasons: List[PlayerSeason],
                target_season: int) -> Dict[str, float]:
        """
        Generate Marcel projection for all stats.

        Args:
            player_seasons: List of player's historical seasons
            target_season: Season year to project
        """
        # Sort seasons, most recent first
        seasons = sorted(player_seasons, key=lambda x: x.season, reverse=True)

        # Only use seasons before target
        seasons = [s for s in seasons if s.season < target_season]

        if not seasons:
            return {}

        # Limit to 3 most recent seasons
        seasons = seasons[:3]

        projections = {}

        for stat in self.REGRESSION_TARGETS.keys():
            projection = self._project_stat(seasons, stat, target_season)
            projections[stat] = projection

        # Add metadata
        projections['_base_seasons'] = [s.season for s in seasons]
        projections['_projected_season'] = target_season
        projections['_projected_age'] = seasons[0].age + (target_season - seasons[0].season)

        return projections

    def _project_stat(self, seasons: List[PlayerSeason],
                     stat: str, target_season: int) -> float:
        """Project a single statistic."""

        weighted_sum = 0.0
        weight_sum = 0.0
        total_minutes = 0.0

        for i, season in enumerate(seasons):
            value = getattr(season, stat, None)
            if value is None:
                continue

            # Weight by season recency
            season_weight = self.SEASON_WEIGHTS[i] if i < len(self.SEASON_WEIGHTS) else 1

            # Weight by playing time
            season_minutes = season.games_played * season.minutes_per_game
            minutes_weight = min(1.0, season_minutes / self.RELIABILITY_MINUTES)

            combined_weight = season_weight * minutes_weight

            weighted_sum += value * combined_weight
            weight_sum += combined_weight
            total_minutes += season_minutes * season_weight

        if weight_sum == 0:
            return self.REGRESSION_TARGETS.get(stat, 0)

        # Weighted average
        weighted_avg = weighted_sum / weight_sum

        # Regression to mean
        # More regression with less playing time
        full_minutes = self.RELIABILITY_MINUTES * sum(self.SEASON_WEIGHTS[:len(seasons)])
        reliability = min(1.0, total_minutes / full_minutes)
        regression_target = self.REGRESSION_TARGETS.get(stat, weighted_avg)

        regressed = reliability * weighted_avg + (1 - reliability) * regression_target

        # Age adjustment
        if self.aging_analyzer and seasons:
            current_age = seasons[0].age
            target_age = current_age + (target_season - seasons[0].season)

            aging_adj = self.aging_analyzer.get_aging_adjustment(
                stat, current_age, target_age
            )
            regressed += aging_adj
        else:
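            # Simple age adjustment: 2% decline per projected year after age 30.
            # Only years beyond the current age count, since observed stats
            # already reflect any decline up to that age.
            current_age = seasons[0].age
            target_age = current_age + (target_season - seasons[0].season)
            years_declining = max(0, target_age - max(current_age, 30))

            if years_declining > 0:
                # Subtract a share of the magnitude so that negative values
                # (e.g. box plus/minus) also get worse rather than better
                regressed -= abs(regressed) * 0.02 * years_declining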

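        # Floor at 0 only for stats that cannot be negative;
        # box plus/minus and win shares can legitimately fall below zero
        if stat in ('box_plus_minus', 'win_shares'):
            return regressed
        return max(0, regressed)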

    def project_with_uncertainty(self, player_seasons: List[PlayerSeason],
                                target_season: int,
                                historical_data: Optional[Dict[str, List[PlayerSeason]]] = None) -> Dict:
        """
        Generate projection with uncertainty estimates.
        """
        point_projection = self.project(player_seasons, target_season)

        # Uncertainty uses default levels; historical_data is accepted so that
        # past projection errors can inform the estimate, but it is not used yet
        uncertainty = {}

        for stat in self.REGRESSION_TARGETS.keys():
            if stat in point_projection:
                # Default: uncertainty is 15% of the stat's magnitude
                # (player reliability is not factored in here)
                base_uncertainty = abs(point_projection[stat]) * 0.15

                # Increase uncertainty for volatile stats
                volatility_multiplier = {
                    'three_pct': 1.5,
                    'steals_per_game': 1.3,
                    'blocks_per_game': 1.3,
                    'box_plus_minus': 1.2,
                }.get(stat, 1.0)

                uncertainty[stat] = base_uncertainty * volatility_multiplier

        return {
            'projection': point_projection,
            'uncertainty': uncertainty,
            'confidence_intervals': {
                stat: (
                    point_projection[stat] - 1.96 * uncertainty.get(stat, 0),
                    point_projection[stat] + 1.96 * uncertainty.get(stat, 0)
                )
                for stat in point_projection if not stat.startswith('_')
            }
        }
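
The arithmetic behind the Marcel estimate is easiest to follow on a single statistic. The sketch below repeats the weighting and regression steps of _project_stat without the age adjustment. The function marcel_estimate and the sample player are hypothetical; the 5/4/3 weights and the 1,500-minute threshold are the constants used in the class above.

```python
from typing import List

SEASON_WEIGHTS = [5, 4, 3]    # most recent season first
RELIABILITY_MINUTES = 1500    # minutes for a season to count at full weight


def marcel_estimate(values: List[float], minutes: List[float],
                    league_mean: float) -> float:
    """Recency- and minutes-weighted average, regressed toward league_mean."""
    weighted_sum = weight_sum = weighted_minutes = 0.0
    for w, value, mins in zip(SEASON_WEIGHTS, values, minutes):
        combined = w * min(1.0, mins / RELIABILITY_MINUTES)
        weighted_sum += value * combined
        weight_sum += combined
        weighted_minutes += mins * w
    if weight_sum == 0:
        return league_mean
    average = weighted_sum / weight_sum
    full_minutes = RELIABILITY_MINUTES * sum(SEASON_WEIGHTS[:len(values)])
    reliability = min(1.0, weighted_minutes / full_minutes)
    return reliability * average + (1 - reliability) * league_mean


# Hypothetical bench player: points per game and total minutes, newest first
estimate = marcel_estimate([12.0, 10.0, 8.0], [1200, 750, 300], league_mean=10.0)
print(round(estimate, 2))
```

For this player the combined weights are 4.0, 2.0, and 0.6, giving a weighted average of 72.8 / 6.6, or about 11.03 points. Weighted minutes of 9,900 against a full-reliability total of 18,000 give a reliability of 0.55, so the estimate lands at roughly 10.57, between the player's own average and the league mean of 10.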

22.7 Projection Uncertainty and Confidence Intervals

Quantifying Uncertainty

All projections carry uncertainty. Properly quantifying this uncertainty is crucial for decision-making:

class ProjectionUncertainty:
    """
    Methods for quantifying and communicating projection uncertainty.
    """

    def __init__(self):
        # Historical standard errors by stat (from validation studies)
        self.base_standard_errors = {
            'points_per_game': 2.5,
            'rebounds_per_game': 1.2,
            'assists_per_game': 1.0,
            'true_shooting_pct': 0.03,
            'per': 3.0,
            'box_plus_minus': 1.8,
            'win_shares': 2.0,
            'vorp': 1.0,
        }

    def calculate_prediction_interval(self, projection: float,
                                     stat: str,
                                     player_reliability: float = 1.0,
                                     years_ahead: int = 1,
                                     confidence: float = 0.90) -> Tuple[float, float]:
        """
        Calculate prediction interval for a projected value.

        Args:
            projection: Point projection
            stat: Statistic being projected
            player_reliability: 0-1 score of how reliable player's stats are
            years_ahead: How many years into the future
            confidence: Coverage level (e.g., 0.90 for a 90% prediction interval)
        """
        # abs() keeps the fallback standard error positive for negative projections
        base_se = self.base_standard_errors.get(stat, abs(projection) * 0.2)

        # Adjust for reliability (less reliable = wider interval)
        reliability_adjustment = 1 / max(0.3, player_reliability)

        # Adjust for projection distance (further = wider)
        distance_adjustment = 1 + 0.2 * (years_ahead - 1)

        adjusted_se = base_se * reliability_adjustment * distance_adjustment

        # Get z-score for confidence level
        z = stats.norm.ppf((1 + confidence) / 2)

        lower = projection - z * adjusted_se
        upper = projection + z * adjusted_se

        return (lower, upper)

    def monte_carlo_projection(self, base_projection: Dict[str, float],
                              covariance_matrix: np.ndarray,
                              n_simulations: int = 10000) -> Dict:
        """
        Generate distribution of possible outcomes using Monte Carlo simulation.

        This captures correlations between statistics (e.g., if points up,
        likely usage is also up).
        """
        # Named stat_names so the local does not shadow the `stats` module
        # used in calculate_prediction_interval
        stat_names = list(base_projection.keys())
        means = np.array([base_projection[s] for s in stat_names])

        # Generate correlated samples
        samples = np.random.multivariate_normal(means, covariance_matrix, n_simulations)

        results = {
            'stats': stat_names,
            'samples': samples,
            'percentiles': {},
            'probability_above': {}
        }

        for i, stat in enumerate(stat_names):
            stat_samples = samples[:, i]
            results['percentiles'][stat] = {
                '10th': np.percentile(stat_samples, 10),
                '25th': np.percentile(stat_samples, 25),
                '50th': np.percentile(stat_samples, 50),
                '75th': np.percentile(stat_samples, 75),
                '90th': np.percentile(stat_samples, 90)
            }

        return results

    def calculate_player_reliability(self, player_seasons: List[PlayerSeason]) -> float:
        """
        Calculate reliability score based on sample size and consistency.
        """
        if not player_seasons:
            return 0.0

        # Factor 1: Total minutes played
        total_minutes = sum(s.games_played * s.minutes_per_game for s in player_seasons)
        minutes_factor = min(1.0, total_minutes / 5000)  # Full reliability at 5000 minutes

        # Factor 2: Consistency across seasons
        if len(player_seasons) >= 2:
            ppg_values = [s.points_per_game for s in player_seasons]
            consistency = 1 - min(1, np.std(ppg_values) / (np.mean(ppg_values) + 1))
        else:
            consistency = 0.5

        # Factor 3: Number of seasons
        seasons_factor = min(1.0, len(player_seasons) / 4)

        reliability = (minutes_factor * 0.4 + consistency * 0.3 + seasons_factor * 0.3)

        return reliability
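

A short numeric example shows how reliability and projection distance widen the interval. This sketch follows the same adjustments as calculate_prediction_interval but uses the standard library's NormalDist in place of scipy. The function name prediction_interval and the sample inputs are assumptions for illustration.

```python
from statistics import NormalDist
from typing import Tuple


def prediction_interval(projection: float, base_se: float,
                        reliability: float = 1.0, years_ahead: int = 1,
                        confidence: float = 0.90) -> Tuple[float, float]:
    """Symmetric normal interval, widened for low reliability and long horizons."""
    reliability_adjustment = 1 / max(0.3, reliability)
    distance_adjustment = 1 + 0.2 * (years_ahead - 1)
    adjusted_se = base_se * reliability_adjustment * distance_adjustment
    # Two-sided z-score, e.g. about 1.645 for 90% coverage
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return projection - z * adjusted_se, projection + z * adjusted_se


# Hypothetical: 18.0 points per game, two years out, reliability 0.5
low, high = prediction_interval(18.0, base_se=2.5, reliability=0.5, years_ahead=2)
print(f"90% interval: {low:.1f} to {high:.1f}")
```

The base standard error of 2.5 is doubled by the reliability adjustment and multiplied by 1.2 for the second year, giving 6.0. With z near 1.645 the interval spans roughly 8.1 to 27.9 points, which is a useful reminder of how little a two-year projection says about a player with a thin track record.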


class ProjectionEnsemble:
    """
    Combine multiple projection methods into ensemble prediction.

    Ensemble methods typically outperform individual methods by
    averaging out individual model errors.
    """

    def __init__(self, models: List[Tuple[str, object, float]]):
        """
        Initialize with list of (name, model, weight) tuples.
        """
        self.models = models
        self._normalize_weights()

    def _normalize_weights(self):
        """Ensure weights sum to 1."""
        total_weight = sum(w for _, _, w in self.models)
        self.models = [(n, m, w/total_weight) for n, m, w in self.models]

    def project(self, player_seasons: List[PlayerSeason],
                target_season: int, target_stat: str) -> Dict:
        """
        Generate ensemble projection from all models.
        """
        projections = []
        weights = []

        for name, model, weight in self.models:
            try:
                if hasattr(model, 'project'):
                    result = model.project(player_seasons, target_season)
                    if isinstance(result, dict):
                        # A dict without the stat yields None and is skipped below
                        proj = result.get(target_stat)
                    else:
                        proj = result
                else:
                    proj = model.predict(player_seasons, target_season, target_stat)

                if proj is not None:
                    projections.append((name, proj))
                    weights.append(weight)
            except Exception as e:
                print(f"Warning: {name} failed: {e}")

        if not projections:
            return {'ensemble_projection': None, 'individual_projections': {}}

        # Normalize weights for successful models
        weights = np.array(weights)
        weights = weights / weights.sum()

        # Weighted average
        ensemble_proj = sum(p * w for (_, p), w in zip(projections, weights))

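        # Uncertainty from model disagreement (unweighted std across models).
        # This is zero when only one model succeeds, so the interval below
        # is best read as a lower bound on the true uncertainty.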
        proj_values = [p for _, p in projections]
        model_disagreement = np.std(proj_values)

        return {
            'ensemble_projection': ensemble_proj,
            'individual_projections': dict(projections),
            'model_weights': dict(zip([n for n, _ in projections], weights)),
            'model_disagreement': model_disagreement,
            'confidence_interval': (
                ensemble_proj - 1.96 * model_disagreement,
                ensemble_proj + 1.96 * model_disagreement
            )
        }
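
The renormalization step matters whenever a model cannot produce a projection. The sketch below shows the weighted average over the models that did succeed, along with the disagreement measure. The function combine, the model names, and the numbers are hypothetical.

```python
from statistics import pstdev
from typing import Dict, Optional


def combine(projections: Dict[str, Optional[float]],
            weights: Dict[str, float]) -> Optional[dict]:
    """Weighted average over models that returned a value."""
    available = {name: p for name, p in projections.items() if p is not None}
    if not available:
        return None
    total = sum(weights[name] for name in available)
    # Re-normalize so the surviving weights sum to 1
    projection = sum(p * weights[name] / total for name, p in available.items())
    values = list(available.values())
    # Population standard deviation, matching numpy's default for np.std
    disagreement = pstdev(values) if len(values) > 1 else 0.0
    return {"projection": projection, "disagreement": disagreement}


# Hypothetical: the third model has no projection for this player
result = combine(
    {"marcel": 18.0, "comparables": 20.0, "raptor": None},
    {"marcel": 0.5, "comparables": 0.3, "raptor": 0.2},
)
print(result)
```

With the third model missing, the remaining weights of 0.5 and 0.3 rescale to 0.625 and 0.375, so the ensemble projection is 18.75 and the disagreement is 1.0.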

22.8 Evaluating Projection Accuracy

Backtesting Projection Systems

Rigorous evaluation requires testing on held-out historical data:

class ProjectionEvaluator:
    """
    Evaluate projection system accuracy through backtesting.
    """

    def __init__(self, projection_model):
        self.model = projection_model
        self.evaluation_results = []

    def backtest(self, all_player_data: Dict[str, List[PlayerSeason]],
                test_seasons: List[int],
                target_stat: str,
                min_games: int = 40) -> Dict:
        """
        Backtest projection model on historical seasons.

        Args:
            all_player_data: All historical player data
            test_seasons: Seasons to use as test set
            target_stat: Statistic to evaluate
            min_games: Minimum games to include in evaluation
        """
        predictions = []
        actuals = []
        player_info = []

        for test_season in test_seasons:
            for player_id, seasons in all_player_data.items():
                # Find test season data
                test_data = [s for s in seasons if s.season == test_season
                            and s.games_played >= min_games]

                if not test_data:
                    continue

                test_actual = test_data[0]

                # Get training data (seasons before test)
                train_data = [s for s in seasons if s.season < test_season]

                if not train_data:
                    continue

                # Generate projection
                try:
                    projection = self.model.project(train_data, test_season)

                    if isinstance(projection, dict):
                        pred_value = projection.get(target_stat)
                    else:
                        pred_value = projection

                    if pred_value is not None:
                        actual_value = getattr(test_actual, target_stat)

                        predictions.append(pred_value)
                        actuals.append(actual_value)
                        player_info.append({
                            'player_id': player_id,
                            'player_name': test_actual.player_name,
                            'season': test_season,
                            'age': test_actual.age
                        })
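                except Exception:
                    # Skip players the model cannot project (e.g., too little
                    # history). This also hides genuine bugs, so count or log
                    # skipped cases when debugging a new model.
                    continue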

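        # Fail with a clear message instead of an obscure NumPy error
        if not predictions:
            raise ValueError(
                "No projections were generated; check test_seasons and min_games"
            )

        predictions = np.array(predictions)
        actuals = np.array(actuals)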

        # Calculate metrics
        metrics = self._calculate_metrics(predictions, actuals)

        # Analyze by subgroups
        subgroup_analysis = self._analyze_subgroups(
            predictions, actuals, player_info
        )

        return {
            'overall_metrics': metrics,
            'subgroup_analysis': subgroup_analysis,
            'n_predictions': len(predictions),
            'test_seasons': test_seasons,
            'target_stat': target_stat
        }

    def _calculate_metrics(self, predictions: np.ndarray,
                          actuals: np.ndarray) -> Dict[str, float]:
        """Calculate comprehensive evaluation metrics."""

        errors = predictions - actuals

        return {
            'rmse': np.sqrt(np.mean(errors ** 2)),
            'mae': np.mean(np.abs(errors)),
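            # abs() in the denominator keeps MAPE well-defined for stats that
            # can be negative (e.g., box_plus_minus)
            'mape': np.mean(np.abs(errors) / (np.abs(actuals) + 0.001)) * 100,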
            'correlation': np.corrcoef(predictions, actuals)[0, 1],
            'r_squared': 1 - np.sum(errors ** 2) / np.sum((actuals - actuals.mean()) ** 2),
            'mean_error': np.mean(errors),  # Bias
            'median_error': np.median(errors),
            'error_std': np.std(errors),
            'max_abs_error': np.max(np.abs(errors))
        }

    def _analyze_subgroups(self, predictions: np.ndarray,
                          actuals: np.ndarray,
                          player_info: List[Dict]) -> Dict:
        """Analyze prediction accuracy by subgroup."""

        # By age group
        age_groups = {
            'young (25 and under)': [],
            'prime (26-30)': [],
            'veteran (31+)': []
        }

        for i, info in enumerate(player_info):
            age = info['age']
            if age <= 25:
                age_groups['young (25 and under)'].append(i)
            elif age <= 30:
                age_groups['prime (26-30)'].append(i)
            else:
                age_groups['veteran (31+)'].append(i)

        subgroup_metrics = {}
        for group_name, indices in age_groups.items():
            if len(indices) >= 10:
                group_preds = predictions[indices]
                group_actuals = actuals[indices]
                subgroup_metrics[group_name] = self._calculate_metrics(
                    group_preds, group_actuals
                )

        # By performance tier
        median_actual = np.median(actuals)
        above_median = actuals >= median_actual
        below_median = ~above_median

        # Skip an empty tier (possible when many actuals tie at the median)
        for tier_name, mask in (('above_median', above_median),
                                ('below_median', below_median)):
            if mask.any():
                subgroup_metrics[tier_name] = self._calculate_metrics(
                    predictions[mask], actuals[mask]
                )

        return subgroup_metrics

    def compare_models(self, models: Dict[str, object],
                      all_player_data: Dict[str, List[PlayerSeason]],
                      test_seasons: List[int],
                      target_stat: str) -> pd.DataFrame:
        """
        Compare multiple projection models on same test set.
        """
        results = []

        for model_name, model in models.items():
            evaluator = ProjectionEvaluator(model)
            metrics = evaluator.backtest(
                all_player_data, test_seasons, target_stat
            )

            result = {'model': model_name}
            result.update(metrics['overall_metrics'])
            results.append(result)

        return pd.DataFrame(results).sort_values('rmse')
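
Backtest numbers are easier to interpret next to a naive baseline. The sketch below is self-contained and does not use the classes above. The `evaluate` helper, the five toy players, the league mean of 12.0, and the regression weight of 0.7 are all illustrative assumptions, and the toy data were chosen so that the regressed forecast wins.

```python
import numpy as np

def evaluate(predictions, actuals):
    """Return RMSE, MAE, and bias (mean error) for paired values."""
    predictions = np.asarray(predictions, dtype=float)
    actuals = np.asarray(actuals, dtype=float)
    errors = predictions - actuals
    return {
        "rmse": float(np.sqrt(np.mean(errors ** 2))),
        "mae": float(np.mean(np.abs(errors))),
        "bias": float(np.mean(errors)),
    }

# Hypothetical points per game for five players (toy data, not real players)
last_season = np.array([25.0, 18.0, 12.0, 8.0, 4.0])
next_season = np.array([22.0, 17.0, 12.5, 9.0, 6.0])

league_mean = 12.0  # assumed league average
r = 0.7             # assumed year-to-year reliability

# Baseline 1: repeat last season. Baseline 2: regress toward the mean.
naive = last_season
regressed = r * last_season + (1 - r) * league_mean

results = {
    "naive": evaluate(naive, next_season),
    "regressed": evaluate(regressed, next_season),
}

# Rank the candidates by RMSE, lowest first
for name, metrics in sorted(results.items(), key=lambda kv: kv[1]["rmse"]):
    print(f"{name:10s} rmse={metrics['rmse']:.3f} "
          f"mae={metrics['mae']:.3f} bias={metrics['bias']:+.3f}")
```

A projection system that cannot beat both baselines on held-out seasons is probably not adding information, however sophisticated its components.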

22.9 Advanced Topics in Player Projection

Injury Risk Modeling

Injuries represent a major source of projection uncertainty:

class InjuryRiskModel:
    """
    Model injury risk and its impact on projections.
    """

    # Base injury rates by age
    BASE_INJURY_RATES = {
        21: 0.08, 22: 0.08, 23: 0.09, 24: 0.09, 25: 0.10,
        26: 0.11, 27: 0.12, 28: 0.13, 29: 0.14, 30: 0.15,
        31: 0.17, 32: 0.19, 33: 0.22, 34: 0.25, 35: 0.28,
        36: 0.32, 37: 0.36, 38: 0.40, 39: 0.45, 40: 0.50
    }

    def estimate_injury_probability(self, player_season: PlayerSeason,
                                   injury_history: Optional[List[Dict]] = None) -> float:
        """
        Estimate probability of significant injury next season.

        Args:
            player_season: Current player statistics
            injury_history: List of past injuries with severity
        """
        # Base rate by age
        age = int(round(player_season.age))
        base_rate = self.BASE_INJURY_RATES.get(age, 0.20)

        # Adjust for injury history
        if injury_history:
            recent_injuries = [inj for inj in injury_history
                             if inj.get('seasons_ago', 10) <= 3]

            # Each recent injury increases risk
            history_adjustment = len(recent_injuries) * 0.05

            # Major injuries have lasting impact
            major_injuries = [inj for inj in recent_injuries
                            if inj.get('severity', 'minor') == 'major']
            major_adjustment = len(major_injuries) * 0.08

            base_rate += history_adjustment + major_adjustment

        # Adjust for playing time (more minutes = more exposure)
        minutes_adjustment = (player_season.minutes_per_game - 25) * 0.005
        base_rate += max(-0.05, min(0.05, minutes_adjustment))

        # Cap at reasonable range
        return max(0.05, min(0.60, base_rate))

    def adjust_projection_for_injury_risk(self, projection: Dict[str, float],
                                         injury_probability: float) -> Dict[str, float]:
        """
        Adjust projected statistics for injury risk.

        Returns expected values accounting for injury probability.
        """
        adjusted = {}

        # Assume an injury costs 30% of a stat's magnitude on average
        injury_penalty = 0.30

        for stat, value in projection.items():
            if stat.startswith('_'):
                adjusted[stat] = value
            else:
                # Expected value = P(healthy) * full_value + P(injured) * reduced_value.
                # The penalty is taken from abs(value) so that a negative stat
                # (e.g., a below-average BPM) gets worse instead of moving toward zero.
                reduced_value = value - injury_penalty * abs(value)
                expected_value = (
                    (1 - injury_probability) * value +
                    injury_probability * reduced_value
                )
                adjusted[stat] = expected_value

        adjusted['injury_probability'] = injury_probability
        adjusted['healthy_projection'] = projection

        return adjusted


class ContextAdjustedProjection:
    """
    Adjust projections for context changes (team, role, etc.).
    """

    def adjust_for_team_change(self, projection: Dict[str, float],
                              old_team_pace: float,
                              new_team_pace: float,
                              old_team_off_rating: float,
                              new_team_off_rating: float) -> Dict[str, float]:
        """
        Adjust counting stats for team pace and quality differences.
        """
        adjusted = projection.copy()

        # Pace adjustment for counting stats
        pace_ratio = new_team_pace / old_team_pace

        pace_adjusted_stats = ['points_per_game', 'rebounds_per_game',
                              'assists_per_game', 'steals_per_game',
                              'blocks_per_game', 'turnovers_per_game']

        for stat in pace_adjusted_stats:
            if stat in adjusted:
                adjusted[stat] = adjusted[stat] * pace_ratio

        # Quality adjustment for efficiency stats
        quality_diff = new_team_off_rating - old_team_off_rating

        # Better team = slightly better efficiency
        efficiency_boost = quality_diff * 0.002  # 0.2% TS per point of ORtg

        if 'true_shooting_pct' in adjusted:
            adjusted['true_shooting_pct'] += efficiency_boost

        return adjusted

    def adjust_for_role_change(self, projection: Dict[str, float],
                              expected_usage_change: float,
                              expected_minutes_change: float) -> Dict[str, float]:
        """
        Adjust for expected changes in role/usage.

        Usage increase typically leads to efficiency decrease.
        """
        adjusted = projection.copy()

        # Usage-efficiency tradeoff
        # ~0.01 TS (one percentage point) lost per 5-point usage increase
        if 'true_shooting_pct' in adjusted:
            ts_adjustment = -0.01 * (expected_usage_change / 5)
            adjusted['true_shooting_pct'] += ts_adjustment

        # Volume stats scale with usage
        if expected_usage_change != 0:
            usage_ratio = 1 + expected_usage_change / 100

            volume_stats = ['points_per_game', 'turnovers_per_game']
            for stat in volume_stats:
                if stat in adjusted:
                    adjusted[stat] *= usage_ratio

        # Win shares and VORP scale with minutes
        if expected_minutes_change != 0:
            minutes_ratio = 1 + expected_minutes_change / 100

            cumulative_stats = ['win_shares', 'vorp']
            for stat in cumulative_stats:
                if stat in adjusted:
                    adjusted[stat] *= minutes_ratio

        return adjusted
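
To make the context adjustments concrete, the arithmetic can be worked for a single player. This is a minimal sketch that mirrors the pace ratio and the usage-efficiency rule of thumb from the class above. The helper names and the player's numbers are hypothetical.

```python
def pace_ratio(old_pace: float, new_pace: float) -> float:
    """Multiplier applied to per-game counting stats after a team change."""
    return new_pace / old_pace

def ts_after_usage_change(ts_pct: float, usage_change: float) -> float:
    """Apply the rule of thumb: about -0.01 TS per +5 points of usage."""
    return ts_pct - 0.01 * (usage_change / 5)

# Hypothetical player: 18.0 PPG on .580 true shooting
ppg = 18.0
ts_pct = 0.580

# Moving from a 97-possession team to a 101-possession team
ratio = pace_ratio(old_pace=97.0, new_pace=101.0)
ppg_new_team = ppg * ratio

# Taking on 5 more points of usage in the new role
usage_change = 5.0
ts_new_role = ts_after_usage_change(ts_pct, usage_change)
ppg_new_role = ppg_new_team * (1 + usage_change / 100)

print(f"pace ratio:           {ratio:.4f}")
print(f"PPG after team change: {ppg_new_team:.2f}")
print(f"PPG after role change: {ppg_new_role:.2f}")
print(f"TS% after role change: {ts_new_role:.3f}")
```

Note that the class reads `expected_usage_change` as percentage points in the efficiency rule but as a relative percent change in the volume rule. Callers should decide which convention they intend before passing a value.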

Complete Projection Pipeline

Bringing together all components into a complete system:

class ComprehensiveProjectionSystem:
    """
    Complete player projection pipeline combining all methods.
    """

    def __init__(self, historical_data: Dict[str, List[PlayerSeason]]):
        self.historical_data = historical_data

        # Initialize component models
        self.aging_analyzer = AgingCurveAnalyzer()
        self.similarity_engine = None  # Initialize with historical seasons
        self.marcel = MarcelProjection(self.aging_analyzer)
        self.uncertainty_calculator = ProjectionUncertainty()
        self.injury_model = InjuryRiskModel()

        # Fit aging curves
        self._fit_aging_curves()

    def _fit_aging_curves(self):
        """Fit aging curves to historical data."""
        stats_to_fit = ['points_per_game', 'rebounds_per_game',
                       'assists_per_game', 'per', 'box_plus_minus']

        for stat in stats_to_fit:
            self.aging_analyzer.fit_aging_curve(self.historical_data, stat)

    def generate_complete_projection(self, player_id: str,
                                    target_season: int,
                                    context_adjustments: Optional[Dict] = None) -> Dict:
        """
        Generate comprehensive projection with all components.
        """
        # Also treat an empty season list as missing data, since max() below
        # would fail on it
        if not self.historical_data.get(player_id):
            return {'error': f'No data for player {player_id}'}

        player_seasons = self.historical_data[player_id]
        recent_season = max(player_seasons, key=lambda x: x.season)

        # 1. Base projection (Marcel method)
        base_projection = self.marcel.project_with_uncertainty(
            player_seasons, target_season
        )

        # 2. Similarity-based projection (if engine available)
        similarity_projection = None
        if self.similarity_engine:
            for stat in ['per', 'box_plus_minus']:
                sim_proj = self.similarity_engine.project_from_comparables(
                    recent_season, stat,
                    years_ahead=target_season - recent_season.season
                )
                if similarity_projection is None:
                    similarity_projection = {}
                similarity_projection[stat] = sim_proj

        # 3. Injury risk adjustment
        injury_prob = self.injury_model.estimate_injury_probability(recent_season)
        injury_adjusted = self.injury_model.adjust_projection_for_injury_risk(
            base_projection['projection'], injury_prob
        )

        # 4. Context adjustments (if provided)
        if context_adjustments:
            final_projection = self._apply_context_adjustments(
                injury_adjusted, context_adjustments
            )
        else:
            final_projection = injury_adjusted

        # 5. Calculate confidence intervals
        reliability = self.uncertainty_calculator.calculate_player_reliability(
            player_seasons
        )

        confidence_intervals = {}
        years_ahead = target_season - recent_season.season

        for stat in ['points_per_game', 'rebounds_per_game', 'assists_per_game',
                     'per', 'box_plus_minus']:
            if stat in final_projection:
                ci = self.uncertainty_calculator.calculate_prediction_interval(
                    final_projection[stat], stat, reliability, years_ahead
                )
                confidence_intervals[stat] = ci

        return {
            'player_id': player_id,
            'player_name': recent_season.player_name,
            'base_season': recent_season.season,
            'target_season': target_season,
            'projected_age': recent_season.age + years_ahead,
            'projection': final_projection,
            'base_projection': base_projection,
            'similarity_projection': similarity_projection,
            'injury_probability': injury_prob,
            'confidence_intervals': confidence_intervals,
            'reliability_score': reliability
        }

    def _apply_context_adjustments(self, projection: Dict,
                                  adjustments: Dict) -> Dict:
        """Apply team/role context adjustments."""
        context_adjuster = ContextAdjustedProjection()

        adjusted = projection.copy()

        if 'team_change' in adjustments:
            tc = adjustments['team_change']
            adjusted = context_adjuster.adjust_for_team_change(
                adjusted,
                tc.get('old_pace', 100), tc.get('new_pace', 100),
                tc.get('old_off_rating', 110), tc.get('new_off_rating', 110)
            )

        if 'role_change' in adjustments:
            rc = adjustments['role_change']
            adjusted = context_adjuster.adjust_for_role_change(
                adjusted,
                rc.get('usage_change', 0),
                rc.get('minutes_change', 0)
            )

        return adjusted

    def generate_multi_year_projection(self, player_id: str,
                                      start_season: int,
                                      n_years: int = 5) -> List[Dict]:
        """Generate projections for multiple future seasons."""
        projections = []

        for year in range(n_years):
            target_season = start_season + year
            proj = self.generate_complete_projection(player_id, target_season)
            projections.append(proj)

        return projections
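
The shape of `context_adjustments` is implied by the keys that `_apply_context_adjustments` reads. Because that method falls back to defaults through `.get`, a misspelled key is silently ignored. The sketch below shows the expected dictionary layout and a small checker for unknown keys. The `EXPECTED_KEYS` table and `find_unknown_keys` are hypothetical helpers, not part of the pipeline, and the numeric values are made up.

```python
from typing import Dict, List

# Keys mirror those read by _apply_context_adjustments
EXPECTED_KEYS = {
    "team_change": {"old_pace", "new_pace", "old_off_rating", "new_off_rating"},
    "role_change": {"usage_change", "minutes_change"},
}

def find_unknown_keys(adjustments: Dict[str, Dict[str, float]]) -> List[str]:
    """Return section names or 'section.key' entries the pipeline would ignore."""
    unknown = []
    for section, values in adjustments.items():
        allowed = EXPECTED_KEYS.get(section)
        if allowed is None:
            unknown.append(section)
            continue
        unknown.extend(f"{section}.{key}" for key in values if key not in allowed)
    return sorted(unknown)

# A well-formed adjustment dictionary (illustrative values)
context_adjustments = {
    "team_change": {
        "old_pace": 97.5, "new_pace": 101.0,
        "old_off_rating": 111.0, "new_off_rating": 115.0,
    },
    "role_change": {"usage_change": 3.0, "minutes_change": 10.0},
}

# A typo that would otherwise fall back to the default pace of 100
typo_example = {"team_change": {"old_pace": 97.5, "new_pase": 101.0}}

print(find_unknown_keys(context_adjustments))  # []
print(find_unknown_keys(typo_example))         # ['team_change.new_pase']
```

Running a check like this before calling `generate_complete_projection` makes a mistyped adjustment visible instead of letting it quietly become a no-op.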

22.10 Practical Applications

Contract Valuation

Projections enable objective contract valuation:

class ContractValuation:
    """
    Use projections to estimate fair contract value.
    """

    # Approximate dollars per WAR in current market
    DOLLARS_PER_WAR = 3_500_000

    def __init__(self, projection_system: ComprehensiveProjectionSystem):
        self.projector = projection_system

    def estimate_contract_value(self, player_id: str,
                               contract_years: int,
                               salary_cap: float = 140_000_000) -> Dict:
        """
        Estimate fair contract value based on projected performance.
        """
        current_season = 2024  # Would be dynamic in practice
        projections = self.projector.generate_multi_year_projection(
            player_id, current_season + 1, contract_years
        )

        total_value = 0
        yearly_values = []

        for proj in projections:
            if 'error' in proj:
                continue

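            # Estimate WAR from projection
            bpm = proj['projection'].get('box_plus_minus', 0)
            injury_prob = proj['injury_probability']

            # Rough WAR estimate via VORP:
            # VORP = (BPM - replacement level) * share of minutes * games / 82,
            # with replacement level at -2.0 BPM and about 2.7 wins per point of VORP
            expected_minutes_pct = 0.6  # ~29 mpg / 48
            expected_games = 75 * (1 - 0.3 * injury_prob)  # Injury-adjusted games

            estimated_vorp = (bpm + 2.0) * expected_minutes_pct * expected_games / 82
            estimated_war = estimated_vorp * 2.7

            # Further discount for reduced effectiveness after injury
            # (lost games are already captured in expected_games)
            adjusted_war = estimated_war * (1 - 0.2 * injury_prob)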

            year_value = adjusted_war * self.DOLLARS_PER_WAR

            yearly_values.append({
                'season': proj['target_season'],
                'projected_war': adjusted_war,
                'value': year_value,
                'as_pct_of_cap': year_value / salary_cap * 100
            })

            total_value += year_value

        return {
            'player_id': player_id,
            'contract_years': contract_years,
            'total_value': total_value,
            'avg_annual_value': total_value / contract_years if contract_years > 0 else 0,
            'yearly_breakdown': yearly_values,
            'cap_percentage': (total_value / contract_years / salary_cap * 100)
                             if contract_years > 0 else 0
        }
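
For a sense of scale, the conversion from BPM to dollars can be worked by hand. The sketch below assumes the VORP convention published by Basketball-Reference (replacement level at -2.0 BPM, about 2.7 wins per point of VORP, 82-game season) and reuses the `DOLLARS_PER_WAR` figure from the class above. The `estimate_war` helper and the player inputs are hypothetical.

```python
REPLACEMENT_LEVEL_BPM = -2.0   # assumed replacement level
WINS_PER_VORP = 2.7            # assumed VORP-to-wins conversion
DOLLARS_PER_WAR = 3_500_000    # same market rate as ContractValuation

def estimate_war(bpm: float, minutes_pct: float, games: float,
                 season_games: int = 82) -> float:
    """Convert a BPM projection into wins above replacement via VORP."""
    vorp = (bpm - REPLACEMENT_LEVEL_BPM) * minutes_pct * games / season_games
    return vorp * WINS_PER_VORP

# Hypothetical player: +3.0 BPM, 60% of available minutes, 70 games
war = estimate_war(bpm=3.0, minutes_pct=0.6, games=70)
value = war * DOLLARS_PER_WAR

# A replacement-level player is worth zero wins by construction
replacement_war = estimate_war(bpm=-2.0, minutes_pct=0.6, games=70)

print(f"Projected WAR:   {war:.2f}")
print(f"Estimated value: ${value:,.0f}")
print(f"Replacement WAR: {replacement_war:.2f}")
```

Measuring from replacement level rather than from zero matters here: an average player (0.0 BPM) still produces positive wins above replacement, so a valuation anchored at zero would price average starters as worthless.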

Summary

Player performance projection combines statistical modeling, domain knowledge about aging and player development, and careful uncertainty quantification. Key principles include:

  1. Regression to the mean: Extreme performances tend to move back toward the league average
  2. Aging effects: On average, performance follows a broadly predictable arc with age
  3. Sample size matters: More data leads to more reliable projections
  4. Context adjustment: Team and role changes affect observed statistics
  5. Ensemble methods: Combining models often improves accuracy
  6. Uncertainty quantification: Every projection should carry an interval, not just a point estimate

Effective projection systems like CARMELO and RAPTOR integrate these principles with sophisticated similarity matching and extensive historical databases. The Marcel method demonstrates that simple, transparent approaches can perform remarkably well.

The practical value of projections extends to contract valuation, trade analysis, and roster construction. However, projections should always be viewed as probabilistic estimates rather than certainties, and decision-makers must account for the full range of possible outcomes.


Key Equations

Regression to Mean: $$\text{Regressed} = r \cdot \text{Observed} + (1-r) \cdot \text{Mean}$$

Marcel Weighted Average: $$\text{Projection} = \frac{5 \cdot Y_1 + 4 \cdot Y_2 + 3 \cdot Y_3}{5 + 4 + 3}$$

Similarity Score: $$S = \exp\left(-\frac{1}{n}\sum_{i=1}^{n} w_i \cdot \left|\frac{x_i - y_i}{\sigma_i}\right|\right)$$

Aging Adjustment: $$\text{Projected} = \text{Current} + \text{AgingCurve}(\text{age}_{\text{target}}) - \text{AgingCurve}(\text{age}_{\text{current}})$$

Expected Value with Injury: $$E[V] = P(\text{healthy}) \cdot V_{\text{full}} + P(\text{injured}) \cdot V_{\text{reduced}}$$
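
Each of the equations above translates into a few lines of Python. The following is a minimal sketch, not the chapter's fitted system. The function names, the toy quadratic aging curve, and every input number are illustrative assumptions. It also assumes that $Y_1$ is the most recent season, as in the usual Marcel weighting.

```python
import math
from typing import Callable, Sequence

def regress_to_mean(observed: float, mean: float, r: float) -> float:
    """Regressed = r * Observed + (1 - r) * Mean."""
    return r * observed + (1 - r) * mean

def marcel_weighted_average(y1: float, y2: float, y3: float) -> float:
    """5/4/3 weighted average; y1 is assumed to be the most recent season."""
    return (5 * y1 + 4 * y2 + 3 * y3) / (5 + 4 + 3)

def similarity_score(x: Sequence[float], y: Sequence[float],
                     sigma: Sequence[float], w: Sequence[float]) -> float:
    """S = exp(-(1/n) * sum(w_i * |x_i - y_i| / sigma_i))."""
    n = len(x)
    total = sum(wi * abs((xi - yi) / si)
                for xi, yi, si, wi in zip(x, y, sigma, w))
    return math.exp(-total / n)

def aging_adjustment(current: float, age_current: float, age_target: float,
                     curve: Callable[[float], float]) -> float:
    """Projected = Current + AgingCurve(target age) - AgingCurve(current age)."""
    return current + curve(age_target) - curve(age_current)

def expected_value_with_injury(v_full: float, v_reduced: float,
                               p_injured: float) -> float:
    """E[V] = P(healthy) * V_full + P(injured) * V_reduced."""
    return (1 - p_injured) * v_full + p_injured * v_reduced

def toy_curve(age: float) -> float:
    """Hypothetical aging curve peaking at 27 (not a fitted curve)."""
    return -0.05 * (age - 27) ** 2

regressed = regress_to_mean(observed=25.0, mean=12.0, r=0.7)
marcel = marcel_weighted_average(20.0, 18.0, 16.0)
identical = similarity_score([20.0, 5.0], [20.0, 5.0], [5.0, 2.0], [1.0, 1.0])
different = similarity_score([20.0, 5.0], [15.0, 7.0], [5.0, 2.0], [1.0, 1.0])
aged = aging_adjustment(20.0, age_current=30, age_target=31, curve=toy_curve)
expected = expected_value_with_injury(v_full=20.0, v_reduced=14.0, p_injured=0.15)

print(f"regressed={regressed:.2f} marcel={marcel:.2f} aged={aged:.2f}")
print(f"similarity identical={identical:.3f} different={different:.3f}")
print(f"expected with injury={expected:.2f}")
```

Two identical stat lines give a similarity of exactly 1, and the score decays toward 0 as the weighted, standardized differences grow.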


References

  1. Silver, N. (2015). "CARMELO NBA Player Projections Methodology." FiveThirtyEight.
  2. Silver, N., & Fischer-Baum, R. (2019). "How Our NBA Predictions Work." FiveThirtyEight.
  3. Tango, T., Lichtman, M., & Dolphin, A. (2007). The Book: Playing the Percentages in Baseball.
  4. Kubatko, J., Oliver, D., Pelton, K., & Rosenbaum, D. (2007). "A Starting Point for Analyzing Basketball Statistics."
  5. Myers, D. (2012). "About Box Plus/Minus." Basketball-Reference.
  6. Rosenbaum, D. (2004). "Measuring How NBA Players Help Their Teams Win."