In This Chapter
- Introduction
- 22.1 Foundations of Player Projection
- 22.2 Regression Models for Player Projections
- 22.3 Aging Curves in Basketball
- 22.4 Similarity Scores and Comparable Players
- 22.5 CARMELO and RAPTOR Projection Systems
- 22.6 Marcel-Style Projections
- 22.7 Projection Uncertainty and Confidence Intervals
- 22.8 Evaluating Projection Accuracy
- 22.9 Advanced Topics in Player Projection
- 22.10 Practical Applications
- Summary
- Key Equations
- References
Chapter 22: Player Performance Prediction
Introduction
Predicting future player performance represents one of the most challenging and valuable applications of basketball analytics. Whether evaluating free agent signings, trade acquisitions, or draft prospects, front offices must project how players will perform in coming seasons. This chapter explores the statistical foundations of player projection systems, from simple regression models to sophisticated systems like CARMELO and RAPTOR.
The difficulty of player projection stems from multiple sources of uncertainty: natural performance variation, injury risk, age-related decline, team context changes, and the inherent randomness in basketball outcomes. Effective projection systems must account for all these factors while remaining interpretable and actionable for decision-makers.
We will develop a comprehensive framework for player projections, starting with foundational regression techniques and building toward complete projection systems. Along the way, we examine aging curves, similarity scores, and methods for quantifying projection uncertainty.
22.1 Foundations of Player Projection
The Projection Problem
At its core, player projection asks: given everything we know about a player today, what is our best estimate of their future performance? This question involves several sub-problems:
- Skill estimation: What are the player's true underlying abilities?
- Aging adjustment: How will those abilities change over time?
- Context adjustment: How will team/role changes affect observed performance?
- Uncertainty quantification: How confident are we in our projections?
Historical Performance as a Starting Point
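The simplest projection approach uses historical performance directly. If a player averaged 20 points per game last season, we might project 20 points next season. However, this naive approach ignores regression to the mean, aging, and changes in team context or role. It is still worth implementing, because it provides the baseline that any more sophisticated method must beat: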
import numpy as np
import pandas as pd
from typing import List, Tuple, Dict, Optional
from dataclasses import dataclass
from scipy import stats
from sklearn.linear_model import Ridge, LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import cross_val_score, TimeSeriesSplit
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error
import warnings

warnings.filterwarnings('ignore')


@dataclass
class PlayerSeason:
    """Represents a single season of player statistics."""
    player_id: str
    player_name: str
    season: int
    age: float
    games_played: int
    minutes_per_game: float
    points_per_game: float
    rebounds_per_game: float
    assists_per_game: float
    steals_per_game: float
    blocks_per_game: float
    turnovers_per_game: float
    fg_pct: float
    three_pct: float
    ft_pct: float
    true_shooting_pct: float
    usage_rate: float
    per: float  # Player Efficiency Rating
    box_plus_minus: float
    vorp: float
    win_shares: float


class NaiveProjection:
    """
    Naive projection using only last season's statistics.
    Serves as a baseline for more sophisticated methods.
    """

    def __init__(self):
        self.last_season_stats = {}

    def fit(self, player_seasons: List[PlayerSeason]) -> None:
        """Store most recent season for each player."""
        # Sorting by season means later seasons overwrite earlier ones
        for ps in sorted(player_seasons, key=lambda x: x.season):
            self.last_season_stats[ps.player_id] = ps

    def project(self, player_id: str, target_stat: str) -> float:
        """Project by returning last season's value."""
        if player_id not in self.last_season_stats:
            raise ValueError(f"No data for player {player_id}")
        return getattr(self.last_season_stats[player_id], target_stat)

    def evaluate(self, test_seasons: List[PlayerSeason],
                 target_stat: str) -> Dict[str, float]:
        """Evaluate projection accuracy on test data."""
        predictions = []
        actuals = []
        for ps in test_seasons:
            if ps.player_id in self.last_season_stats:
                pred = self.project(ps.player_id, target_stat)
                actual = getattr(ps, target_stat)
                predictions.append(pred)
                actuals.append(actual)
        predictions = np.array(predictions)
        actuals = np.array(actuals)
        return {
            'rmse': np.sqrt(mean_squared_error(actuals, predictions)),
            'mae': mean_absolute_error(actuals, predictions),
            'correlation': np.corrcoef(actuals, predictions)[0, 1],
            'n_players': len(predictions)
        }
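To make the baseline concrete, the following is a minimal standalone sketch of the same carry-forward idea, scored with MAE and RMSE. The player names and values are hypothetical toy data, and `naive_errors` is an illustrative helper rather than part of the class above.

```python
import math

# Hypothetical toy data: points per game in two consecutive seasons.
last_season = {"player_a": 20.0, "player_b": 8.0, "player_c": 14.0}
next_season = {"player_a": 17.0, "player_b": 10.0, "player_c": 14.0}


def naive_errors(previous, actual):
    """Score the carry-forward forecast (next season = last season)."""
    shared = sorted(set(previous) & set(actual))
    errors = [previous[p] - actual[p] for p in shared]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {"mae": mae, "rmse": rmse, "n_players": len(shared)}


baseline = naive_errors(last_season, next_season)
print(baseline)
```

Any projection method introduced later in the chapter can be judged against these two numbers computed on the same players.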
Regression to the Mean
One of the most important concepts in player projection is regression to the mean. Extreme performances in one season tend to move toward average performances in subsequent seasons. This occurs because:
- Measurement noise: Single-season statistics contain random variation
- True talent estimation: Observed performance = True talent + Luck, so an extreme season usually includes an unusual amount of luck
- Sample size limitations: Even 82 games provide limited information
The degree of regression depends on the reliability of each statistic:
class RegressionToMean:
    """
    Apply regression to the mean based on stat reliability.
    More reliable stats regress less toward population mean.
    """

    # Reliability coefficients (year-to-year correlations)
    # Higher values = more reliable = less regression
    STAT_RELIABILITY = {
        'points_per_game': 0.85,
        'rebounds_per_game': 0.80,
        'assists_per_game': 0.82,
        'steals_per_game': 0.55,
        'blocks_per_game': 0.65,
        'turnovers_per_game': 0.70,
        'fg_pct': 0.60,
        'three_pct': 0.50,
        'ft_pct': 0.85,
        'true_shooting_pct': 0.65,
        'usage_rate': 0.80,
        'per': 0.75,
        'box_plus_minus': 0.70,
        'win_shares': 0.65,
    }

    # League average values (approximate)
    LEAGUE_AVERAGES = {
        'points_per_game': 11.0,
        'rebounds_per_game': 4.5,
        'assists_per_game': 2.5,
        'steals_per_game': 0.7,
        'blocks_per_game': 0.5,
        'turnovers_per_game': 1.3,
        'fg_pct': 0.450,
        'three_pct': 0.360,
        'ft_pct': 0.770,
        'true_shooting_pct': 0.560,
        'usage_rate': 0.200,
        'per': 15.0,
        'box_plus_minus': 0.0,
        'win_shares': 4.0,
    }

    def __init__(self, custom_reliability: Optional[Dict[str, float]] = None):
        self.reliability = self.STAT_RELIABILITY.copy()
        if custom_reliability:
            self.reliability.update(custom_reliability)

    def regress(self, observed_value: float, stat_name: str,
                games_played: int = 82,
                minutes_per_game: float = 30.0) -> float:
        """
        Regress observed value toward mean based on reliability
        and sample size.
        Formula: Regressed = r * Observed + (1-r) * Mean
        where r is adjusted reliability based on playing time
        """
        if stat_name not in self.reliability:
            return observed_value
        base_reliability = self.reliability[stat_name]
        # Stats without a known league mean are left unchanged
        league_mean = self.LEAGUE_AVERAGES.get(stat_name, observed_value)
        # Adjust reliability for sample size
        # Full reliability assumes ~2000 minutes played
        expected_minutes = 2000
        actual_minutes = games_played * minutes_per_game
        sample_adjustment = min(1.0, actual_minutes / expected_minutes)
        adjusted_reliability = base_reliability * sample_adjustment
        # Apply regression
        regressed_value = (adjusted_reliability * observed_value +
                           (1 - adjusted_reliability) * league_mean)
        return regressed_value

    def regress_player_season(self, player_season: PlayerSeason) -> Dict[str, float]:
        """Regress all statistics for a player season."""
        regressed_stats = {}
        # Iterate over the instance's table so custom reliabilities are honored
        for stat_name in self.reliability:
            observed = getattr(player_season, stat_name, None)
            if observed is not None:
                regressed_stats[stat_name] = self.regress(
                    observed, stat_name,
                    player_season.games_played,
                    player_season.minutes_per_game
                )
        return regressed_stats
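A short worked example may help show how playing time changes the amount of regression. The sketch below restates the formula as a standalone function (`regress_to_mean` is an illustrative name) and reuses the chapter's illustrative values for points per game: reliability 0.85 and league mean 11.0.

```python
def regress_to_mean(observed, league_mean, reliability, minutes,
                    full_minutes=2000.0):
    """Regressed = r * Observed + (1 - r) * Mean, with r scaled by minutes."""
    sample_adjustment = min(1.0, minutes / full_minutes)
    r = reliability * sample_adjustment
    return r * observed + (1.0 - r) * league_mean


# A 25 PPG scorer with a full workload keeps most of his observed value:
# r = 0.85, so 0.85 * 25 + 0.15 * 11 = 22.9
full_season = regress_to_mean(25.0, 11.0, 0.85, minutes=2400)

# The same scoring rate over 1000 minutes is trusted half as much:
# r = 0.85 * 0.5 = 0.425, so 0.425 * 25 + 0.575 * 11 = 16.95
half_season = regress_to_mean(25.0, 11.0, 0.85, minutes=1000)

print(round(full_season, 2), round(half_season, 2))
```

The linear scaling of reliability with minutes is the chapter's simplification; other shrinkage schemes would give different numbers for the partial season.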
22.2 Regression Models for Player Projections
Simple Linear Regression
Linear regression provides a foundation for understanding the relationship between past and future performance:
class LinearProjectionModel:
    """
    Linear regression model for player projections.
    Uses previous seasons to predict next season.
    """

    def __init__(self, n_prior_seasons: int = 3,
                 regularization: float = 1.0):
        self.n_prior_seasons = n_prior_seasons
        self.regularization = regularization
        self.models = {}
        self.scalers = {}

    def prepare_features(self, player_history: List[PlayerSeason],
                         target_season: int) -> Optional[np.ndarray]:
        """
        Create feature vector from prior seasons.
        Returns None if insufficient history.
        """
        # Get seasons before target, most recent first
        prior_seasons = [ps for ps in player_history
                         if ps.season < target_season]
        prior_seasons = sorted(prior_seasons,
                               key=lambda x: x.season, reverse=True)
        if len(prior_seasons) < 1:
            return None
        features = []
        # Use up to n_prior_seasons
        for i in range(self.n_prior_seasons):
            if i < len(prior_seasons):
                ps = prior_seasons[i]
                features.extend([
                    ps.points_per_game,
                    ps.rebounds_per_game,
                    ps.assists_per_game,
                    ps.true_shooting_pct,
                    ps.usage_rate,
                    ps.per,
                    ps.box_plus_minus,
                    ps.games_played / 82.0,      # Health proxy
                    ps.minutes_per_game / 36.0,  # Role proxy
                ])
            else:
                # Pad with zeros for missing seasons
                features.extend([0.0] * 9)
        # Add age at target season
        if prior_seasons:
            current_age = prior_seasons[0].age
            target_age = current_age + (target_season - prior_seasons[0].season)
            features.append(target_age)
        return np.array(features)

    def fit(self, player_histories: Dict[str, List[PlayerSeason]],
            target_stat: str) -> None:
        """
        Fit model to predict target_stat.
        Args:
            player_histories: Dict mapping player_id to list of seasons
            target_stat: Statistic to predict
        """
        X = []
        y = []
        for player_id, seasons in player_histories.items():
            seasons = sorted(seasons, key=lambda x: x.season)
            for i, target_season in enumerate(seasons):
                if i == 0:
                    continue  # Need at least one prior season
                features = self.prepare_features(seasons[:i],
                                                 target_season.season)
                if features is not None:
                    X.append(features)
                    y.append(getattr(target_season, target_stat))
        X = np.array(X)
        y = np.array(y)
        # Standardize features
        self.scalers[target_stat] = StandardScaler()
        X_scaled = self.scalers[target_stat].fit_transform(X)
        # Fit ridge regression
        self.models[target_stat] = Ridge(alpha=self.regularization)
        self.models[target_stat].fit(X_scaled, y)

    def predict(self, player_history: List[PlayerSeason],
                target_season: int, target_stat: str) -> Optional[float]:
        """Predict target_stat for target_season."""
        if target_stat not in self.models:
            raise ValueError(f"Model not fitted for {target_stat}")
        features = self.prepare_features(player_history, target_season)
        if features is None:
            return None
        features_scaled = self.scalers[target_stat].transform(
            features.reshape(1, -1)
        )
        return self.models[target_stat].predict(features_scaled)[0]
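One detail of prepare_features deserves a caveat: padding a missing season with zeros makes a second-year player look like someone who posted all-zero seasons earlier in his career. A common alternative, shown here as a sketch and not as the chapter's method, is to pad with a neutral value and add a flag marking the slot as missing. The helper name `pad_with_indicator` and the single-statistic layout are assumptions made for illustration.

```python
from typing import List, Sequence

# Same illustrative league mean for points per game used earlier in the chapter.
LEAGUE_AVERAGE_PPG = 11.0


def pad_with_indicator(prior_values: Sequence[float],
                       n_slots: int = 3,
                       fill: float = LEAGUE_AVERAGE_PPG) -> List[float]:
    """Return [value, is_missing] pairs for each prior-season slot.

    prior_values is ordered most recent first. Missing slots are filled
    with a neutral value and flagged, so a linear model can learn a
    separate effect for "no data" instead of reading it as zero output.
    """
    features: List[float] = []
    for i in range(n_slots):
        if i < len(prior_values):
            features.extend([prior_values[i], 0.0])
        else:
            features.extend([fill, 1.0])
    return features


second_year_player = pad_with_indicator([14.2])
veteran = pad_with_indicator([21.0, 19.5, 18.0])
print(second_year_player)
print(veteran)
```

The cost is a wider feature vector; whether it improves accuracy is an empirical question to settle with the evaluation methods later in the chapter.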
Feature Engineering for Prediction
Effective prediction requires thoughtful feature engineering. Key features for player projections include:
class AdvancedFeatureEngineering:
    """
    Advanced feature engineering for player projections.
    Creates informative features from raw statistics.
    """

    @staticmethod
    def calculate_trajectory(seasons: List[PlayerSeason],
                             stat_name: str) -> Dict[str, float]:
        """
        Calculate performance trajectory over recent seasons.
        Returns trend indicators.
        """
        if len(seasons) < 2:
            return {'trend': 0.0, 'volatility': 0.0, 'acceleration': 0.0}
        seasons = sorted(seasons, key=lambda x: x.season)
        values = [getattr(s, stat_name) for s in seasons]
        # Linear trend (slope of recent performance)
        x = np.arange(len(values))
        slope, _, _, _, _ = stats.linregress(x, values)
        # Volatility (standard deviation of year-to-year changes)
        changes = np.diff(values)
        volatility = np.std(changes) if len(changes) > 1 else 0.0
        # Acceleration (change in slope between early and late halves)
        if len(values) >= 3:
            early_slope, _, _, _, _ = stats.linregress(
                x[:len(x)//2+1], values[:len(values)//2+1]
            )
            late_slope, _, _, _, _ = stats.linregress(
                x[len(x)//2:], values[len(values)//2:]
            )
            acceleration = late_slope - early_slope
        else:
            acceleration = 0.0
        return {
            'trend': slope,
            'volatility': volatility,
            'acceleration': acceleration
        }

    @staticmethod
    def create_composite_features(player_season: PlayerSeason) -> Dict[str, float]:
        """
        Create composite features from raw statistics.
        These capture skill interactions and playing style.
        """
        features = {}
        # Scoring efficiency adjusted for volume
        features['volume_adjusted_efficiency'] = (
            player_season.true_shooting_pct *
            np.log1p(player_season.usage_rate * 100)
        )
        # Playmaking ratio
        if player_season.turnovers_per_game > 0:
            features['ast_to_ratio'] = (
                player_season.assists_per_game /
                player_season.turnovers_per_game
            )
        else:
            features['ast_to_ratio'] = player_season.assists_per_game * 2
        # Stocks (steals + blocks) per minute
        if player_season.minutes_per_game > 0:
            features['stocks_per_minute'] = (
                (player_season.steals_per_game +
                 player_season.blocks_per_game) /
                player_season.minutes_per_game
            )
        else:
            features['stocks_per_minute'] = 0.0
        # Box score productivity
        features['box_productivity'] = (
            player_season.points_per_game +
            1.2 * player_season.rebounds_per_game +
            1.5 * player_season.assists_per_game +
            2.0 * player_season.steals_per_game +
            2.0 * player_season.blocks_per_game -
            player_season.turnovers_per_game
        )
        # Minutes share (indicator of role importance)
        features['minutes_share'] = player_season.minutes_per_game / 48.0
        # Health factor
        features['games_rate'] = player_season.games_played / 82.0
        return features

    @staticmethod
    def calculate_prime_distance(age: float, position: str = 'unknown') -> float:
        """
        Calculate distance from statistical prime.
        Different positions have different prime ages.
        """
        prime_ages = {
            'PG': 27.5,
            'SG': 27.0,
            'SF': 27.0,
            'PF': 26.5,
            'C': 26.0,
            'unknown': 27.0
        }
        prime_age = prime_ages.get(position, 27.0)
        return age - prime_age
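The trend and volatility features are easiest to check on a tiny example. The sketch below recomputes them with the standard library only, using a least-squares slope and the population standard deviation (which matches the default of np.std). The three-season scoring line is hypothetical and `trajectory` is an illustrative name.

```python
from statistics import mean, pstdev


def trajectory(values):
    """Least-squares slope and volatility of year-to-year changes."""
    n = len(values)
    if n < 2:
        return {"trend": 0.0, "volatility": 0.0}
    x = list(range(n))
    x_bar, y_bar = mean(x), mean(values)
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, values))
             / sum((xi - x_bar) ** 2 for xi in x))
    changes = [b - a for a, b in zip(values, values[1:])]
    volatility = pstdev(changes) if len(changes) > 1 else 0.0
    return {"trend": slope, "volatility": volatility}


# Hypothetical scoring line: 15.0, then 18.0, then 22.0 points per game.
# Slope = 3.5 points per season; changes are +3 and +4, so volatility = 0.5.
rising_scorer = trajectory([15.0, 18.0, 22.0])
print(rising_scorer)
```

With only two or three seasons the slope is a noisy estimate, which is one reason to feed it to a regularized model instead of extrapolating it directly.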
22.3 Aging Curves in Basketball
Understanding Age-Related Changes
Aging curves describe how player performance changes with age. Understanding these patterns is crucial for projections:
class AgingCurveAnalyzer:
    """
    Analyze and model aging curves for basketball statistics.
    Uses the delta method, which reduces selection effects but does not
    eliminate survivor bias: players who decline sharply often drop
    below the minutes threshold or leave the league.
    """

    def __init__(self):
        self.aging_curves = {}
        self.confidence_intervals = {}

    def calculate_delta_aging(self, player_histories: Dict[str, List[PlayerSeason]],
                              stat_name: str,
                              min_minutes: float = 500) -> Dict[int, List[float]]:
        """
        Calculate year-over-year changes at each age.
        The delta method compares each player to themselves,
        controlling for selection effects.
        """
        age_deltas = {}
        for player_id, seasons in player_histories.items():
            # Filter for minimum playing time
            seasons = [s for s in seasons
                       if s.minutes_per_game * s.games_played >= min_minutes]
            seasons = sorted(seasons, key=lambda x: x.season)
            for i in range(1, len(seasons)):
                prev_season = seasons[i-1]
                curr_season = seasons[i]
                # Only use consecutive seasons
                if curr_season.season - prev_season.season != 1:
                    continue
                age = int(round(curr_season.age))
                prev_value = getattr(prev_season, stat_name)
                curr_value = getattr(curr_season, stat_name)
                delta = curr_value - prev_value
                if age not in age_deltas:
                    age_deltas[age] = []
                age_deltas[age].append(delta)
        return age_deltas

    def fit_aging_curve(self, player_histories: Dict[str, List[PlayerSeason]],
                        stat_name: str,
                        smooth: bool = True) -> None:
        """
        Fit aging curve for a statistic.
        Stores cumulative aging effect by age.
        """
        age_deltas = self.calculate_delta_aging(player_histories, stat_name)
        reference_age = 27
        # Calculate mean delta (and its standard error) at each age
        ages = sorted(age_deltas.keys())
        mean_deltas = {}
        std_deltas = {}
        for age in ages:
            if len(age_deltas[age]) >= 10:  # Minimum sample size
                mean_deltas[age] = np.mean(age_deltas[age])
                std_deltas[age] = np.std(age_deltas[age]) / np.sqrt(len(age_deltas[age]))
        if not mean_deltas:
            # No age has enough observations: store a flat curve
            self.aging_curves[stat_name] = {reference_age: 0.0}
            self.confidence_intervals[stat_name] = {}
            return
        if smooth:
            # Apply Gaussian smoothing to reduce noise
            smoothed_deltas = self._smooth_deltas(mean_deltas)
            mean_deltas = smoothed_deltas
        # Convert to cumulative aging curve (relative to age 27)
        cumulative = {reference_age: 0.0}
        # Forward from reference age
        for age in range(reference_age + 1, max(ages) + 1):
            if age in mean_deltas:
                cumulative[age] = cumulative[age - 1] + mean_deltas[age]
            elif age - 1 in cumulative:
                cumulative[age] = cumulative[age - 1]
        # Backward from reference age
        for age in range(reference_age - 1, min(ages) - 1, -1):
            if age + 1 in mean_deltas:
                cumulative[age] = cumulative[age + 1] - mean_deltas[age + 1]
            elif age + 1 in cumulative:
                cumulative[age] = cumulative[age + 1]
        self.aging_curves[stat_name] = cumulative
        self.confidence_intervals[stat_name] = std_deltas
    def _smooth_deltas(self, deltas: Dict[int, float],
                       window: int = 3) -> Dict[int, float]:
        """Apply Gaussian smoothing to age deltas."""
        ages = sorted(deltas.keys())
        values = [deltas[a] for a in ages]
        # Gaussian kernel smoothing (window is the kernel bandwidth in years)
        smoothed = {}
        for i, age in enumerate(ages):
            weights = []
            weighted_values = []
            for j, other_age in enumerate(ages):
                weight = np.exp(-0.5 * ((age - other_age) / window) ** 2)
                weights.append(weight)
                weighted_values.append(weight * values[j])
            smoothed[age] = sum(weighted_values) / sum(weights)
        return smoothed

    def get_aging_adjustment(self, stat_name: str,
                             from_age: float, to_age: float) -> float:
        """
        Get expected change in statistic from one age to another.
        """
        if stat_name not in self.aging_curves:
            return 0.0
        curve = self.aging_curves[stat_name]
        # Interpolate for non-integer ages
        from_adjustment = self._interpolate_curve(curve, from_age)
        to_adjustment = self._interpolate_curve(curve, to_age)
        return to_adjustment - from_adjustment

    def _interpolate_curve(self, curve: Dict[int, float], age: float) -> float:
        """Linearly interpolate aging curve at non-integer age."""
        if not curve:
            return 0.0
        # Clamp to the fitted range so ages outside it reuse the nearest
        # fitted value instead of snapping back to the reference level (0.0)
        age = min(max(age, min(curve)), max(curve))
        lower_age = int(np.floor(age))
        upper_age = int(np.ceil(age))
        if lower_age == upper_age:
            return curve.get(lower_age, 0.0)
        lower_val = curve.get(lower_age, 0.0)
        upper_val = curve.get(upper_age, 0.0)
        weight = age - lower_age
        return lower_val + weight * (upper_val - lower_val)

    def visualize_aging_curve(self, stat_name: str) -> Dict:
        """Return data for visualizing aging curve."""
        if stat_name not in self.aging_curves:
            return {}
        curve = self.aging_curves[stat_name]
        ages = sorted(curve.keys())
        values = [curve[a] for a in ages]
        return {
            'ages': ages,
            'cumulative_change': values,
            'stat_name': stat_name
        }
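The delta method is simple enough to trace by hand. The sketch below uses two hypothetical careers, averages the year-over-year changes at each age, and chains them into a cumulative curve anchored at age 27. It omits the minimum-sample and smoothing steps of the class above, and the function names are illustrative.

```python
from collections import defaultdict

# Hypothetical careers as (age, value) pairs for some per-game statistic.
careers = {
    "a": [(25, 10.0), (26, 12.0), (27, 13.0), (28, 12.5)],
    "b": [(26, 8.0), (27, 9.0), (28, 8.0)],
}


def mean_deltas_by_age(careers):
    """Average within-player change, keyed by the age in the later season."""
    deltas = defaultdict(list)
    for seasons in careers.values():
        seasons = sorted(seasons)
        for (prev_age, prev_val), (age, val) in zip(seasons, seasons[1:]):
            if age - prev_age == 1:  # consecutive seasons only
                deltas[age].append(val - prev_val)
    return {age: sum(d) / len(d) for age, d in sorted(deltas.items())}


def cumulative_curve(mean_deltas, reference_age=27):
    """Chain mean deltas into a curve that equals 0.0 at reference_age."""
    curve = {reference_age: 0.0}
    for age in range(reference_age + 1, max(mean_deltas) + 1):
        curve[age] = curve[age - 1] + mean_deltas.get(age, 0.0)
    # The delta at the youngest age describes the step from the year before,
    # so the curve extends one year below min(mean_deltas).
    for age in range(reference_age - 1, min(mean_deltas) - 2, -1):
        curve[age] = curve[age + 1] - mean_deltas.get(age + 1, 0.0)
    return dict(sorted(curve.items()))


deltas = mean_deltas_by_age(careers)
curve = cumulative_curve(deltas)
print(deltas)
print(curve)
```

Because each delta compares a player with himself, the curve is not distorted by differences in talent between the players who happen to be active at each age.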
Position-Specific Aging Patterns
Different positions and skill types age differently:
class PositionAgingAnalyzer:
    """
    Analyze position-specific and skill-specific aging patterns.
    """

    # Skill categories and their aging characteristics
    SKILL_AGING_PROFILES = {
        'athleticism': {
            'peak_age': 25,
            'decline_start': 27,
            'decline_rate': 0.03,  # 3% per year after decline starts
            'skills': ['dunks', 'fast_break_pts', 'drives_per_game']
        },
        'shooting': {
            'peak_age': 29,
            'decline_start': 34,
            'decline_rate': 0.01,
            'skills': ['three_pct', 'mid_range_pct', 'ft_pct']
        },
        'playmaking': {
            'peak_age': 28,
            'decline_start': 32,
            'decline_rate': 0.015,
            'skills': ['assists_per_game', 'ast_to_ratio']
        },
        'defense': {
            'peak_age': 26,
            'decline_start': 29,
            'decline_rate': 0.025,
            'skills': ['steals_per_game', 'blocks_per_game', 'def_rating']
        },
        'rebounding': {
            'peak_age': 26,
            'decline_start': 30,
            'decline_rate': 0.015,
            'skills': ['rebounds_per_game', 'reb_pct']
        }
    }

    POSITION_PROFILES = {
        'PG': {
            'primary_skills': ['playmaking', 'shooting'],
            'secondary_skills': ['athleticism'],
            'typical_peak': (26, 30),
            'typical_decline': 32
        },
        'SG': {
            'primary_skills': ['shooting', 'athleticism'],
            'secondary_skills': ['playmaking', 'defense'],
            'typical_peak': (25, 29),
            'typical_decline': 31
        },
        'SF': {
            'primary_skills': ['shooting', 'defense'],
            'secondary_skills': ['athleticism', 'playmaking'],
            'typical_peak': (25, 29),
            'typical_decline': 31
        },
        'PF': {
            'primary_skills': ['rebounding', 'defense'],
            'secondary_skills': ['shooting', 'athleticism'],
            'typical_peak': (25, 28),
            'typical_decline': 30
        },
        'C': {
            'primary_skills': ['rebounding', 'defense'],
            'secondary_skills': ['shooting'],
            'typical_peak': (24, 28),
            'typical_decline': 30
        }
    }
    def estimate_position_aging(self, position: str, age: float,
                                reference_age: float = 27) -> Dict[str, float]:
        """
        Estimate aging effects for each skill category.
        Returns the skill level at age minus the level at reference_age,
        as a fraction of peak. The skill curves are shared across
        positions; position only determines the default weights used in
        get_composite_aging_adjustment.
        """
        if position not in self.POSITION_PROFILES:
            position = 'SF'  # Default to SF profile
        aging_effects = {}
        for skill_cat, skill_profile in self.SKILL_AGING_PROFILES.items():
            peak = skill_profile['peak_age']
            decline_start = skill_profile['decline_start']
            decline_rate = skill_profile['decline_rate']
            # Calculate effect at age vs reference_age
            effect_at_age = self._calculate_skill_level(
                age, peak, decline_start, decline_rate
            )
            effect_at_reference = self._calculate_skill_level(
                reference_age, peak, decline_start, decline_rate
            )
            aging_effects[skill_cat] = effect_at_age - effect_at_reference
        return aging_effects

    def _calculate_skill_level(self, age: float, peak: float,
                               decline_start: float, decline_rate: float) -> float:
        """
        Calculate skill level at a given age, relative to peak (0.0).
        Models asymmetric improvement/decline pattern.
        """
        if age <= peak:
            # Improvement phase: linear, 2% below peak per year remaining
            years_to_peak = peak - age
            improvement = 0.02 * years_to_peak
            return -improvement  # Negative because below peak
        elif age <= decline_start:
            # Plateau phase
            return 0.0
        else:
            # Decline phase
            years_past_decline = age - decline_start
            return -decline_rate * years_past_decline

    def get_composite_aging_adjustment(self, position: str,
                                       from_age: float, to_age: float,
                                       player_style: Optional[Dict[str, float]] = None) -> float:
        """
        Calculate composite aging adjustment based on player style.
        Args:
            position: Player's position
            from_age: Current age
            to_age: Target age
            player_style: Optional dict mapping skill categories to weights
        """
        if player_style is None:
            # Use position-based default weights
            profile = self.POSITION_PROFILES.get(position, self.POSITION_PROFILES['SF'])
            player_style = {skill: 0.3 for skill in profile['primary_skills']}
            player_style.update({skill: 0.1 for skill in profile['secondary_skills']})
            # Normalize weights
            total = sum(player_style.values())
            player_style = {k: v/total for k, v in player_style.items()}
        from_effects = self.estimate_position_aging(position, from_age)
        to_effects = self.estimate_position_aging(position, to_age)
        composite_change = 0.0
        for skill, weight in player_style.items():
            if skill in from_effects and skill in to_effects:
                composite_change += weight * (to_effects[skill] - from_effects[skill])
        return composite_change
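To see how differently two skill categories age under this piecewise model, the sketch below evaluates the same improve, plateau, decline function for the athleticism and shooting profiles defined above, moving a player from age 27 to age 31. The standalone function names are illustrative.

```python
def skill_level(age, peak, decline_start, decline_rate, growth_rate=0.02):
    """Skill relative to peak (0.0): linear rise, plateau, linear decline."""
    if age <= peak:
        return -growth_rate * (peak - age)
    if age <= decline_start:
        return 0.0
    return -decline_rate * (age - decline_start)


# Profiles copied from SKILL_AGING_PROFILES above.
ATHLETICISM = {"peak": 25, "decline_start": 27, "decline_rate": 0.03}
SHOOTING = {"peak": 29, "decline_start": 34, "decline_rate": 0.01}


def change(profile, from_age, to_age):
    """Expected change in skill level between two ages."""
    return (skill_level(to_age, **profile)
            - skill_level(from_age, **profile))


# Athleticism: plateau at 27 (0.0), four years of 3% decline by 31 (-0.12).
athleticism_change = change(ATHLETICISM, 27, 31)
# Shooting: still two years short of peak at 27 (-0.04), on plateau at 31 (0.0).
shooting_change = change(SHOOTING, 27, 31)

print(round(athleticism_change, 3), round(shooting_change, 3))
```

Under these illustrative parameters, a player who relies on athleticism is projected to lose ground over the same four years in which a shooter is projected to improve slightly.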
22.4 Similarity Scores and Comparable Players
Finding Similar Players
Identifying historically similar players helps validate projections and provides additional context:
class PlayerSimilarityEngine:
    """
    Calculate similarity scores between players.
    Used to find comparable players for projection validation.
    """

    # Features used for similarity calculation
    SIMILARITY_FEATURES = [
        'points_per_game', 'rebounds_per_game', 'assists_per_game',
        'steals_per_game', 'blocks_per_game', 'true_shooting_pct',
        'usage_rate', 'per', 'box_plus_minus'
    ]

    # Feature weights (importance for similarity)
    FEATURE_WEIGHTS = {
        'points_per_game': 1.0,
        'rebounds_per_game': 1.0,
        'assists_per_game': 1.0,
        'steals_per_game': 0.7,
        'blocks_per_game': 0.7,
        'true_shooting_pct': 1.2,
        'usage_rate': 1.0,
        'per': 1.5,
        'box_plus_minus': 1.5
    }

    def __init__(self, historical_seasons: List[PlayerSeason]):
        """Initialize with historical data for comparison."""
        self.historical_data = self._prepare_historical_data(historical_seasons)
        self.feature_stats = self._calculate_feature_statistics()

    def _prepare_historical_data(self, seasons: List[PlayerSeason]) -> pd.DataFrame:
        """Convert player seasons to DataFrame for efficient computation."""
        records = []
        for ps in seasons:
            record = {
                'player_id': ps.player_id,
                'player_name': ps.player_name,
                'season': ps.season,
                'age': ps.age
            }
            for feature in self.SIMILARITY_FEATURES:
                record[feature] = getattr(ps, feature, None)
            records.append(record)
        return pd.DataFrame(records)

    def _calculate_feature_statistics(self) -> Dict[str, Dict[str, float]]:
        """Calculate mean and std for each feature for standardization."""
        # Named feature_stats to avoid shadowing the imported scipy.stats
        feature_stats = {}
        for feature in self.SIMILARITY_FEATURES:
            values = self.historical_data[feature].dropna()
            feature_stats[feature] = {
                'mean': values.mean(),
                'std': values.std()
            }
        return feature_stats
    def calculate_similarity(self, player1: PlayerSeason,
                             player2: PlayerSeason,
                             age_weight: float = 0.1) -> float:
        """
        Calculate similarity score between two player-seasons.
        Returns score from 0 to 1 (1 = identical).
        """
        total_weighted_diff = 0.0
        total_weight = 0.0
        for feature in self.SIMILARITY_FEATURES:
            val1 = getattr(player1, feature, None)
            val2 = getattr(player2, feature, None)
            if val1 is None or val2 is None:
                continue
            # Standardize difference
            std = self.feature_stats[feature]['std']
            if std > 0:
                standardized_diff = abs(val1 - val2) / std
            else:
                standardized_diff = 0.0
            weight = self.FEATURE_WEIGHTS.get(feature, 1.0)
            total_weighted_diff += weight * standardized_diff
            total_weight += weight
        # Add age difference penalty (in years, not standardized)
        age_diff = abs(player1.age - player2.age)
        total_weighted_diff += age_weight * age_diff
        total_weight += age_weight
        if total_weight == 0:
            return 0.0
        # Convert to similarity score (0 to 1)
        avg_diff = total_weighted_diff / total_weight
        similarity = np.exp(-avg_diff)
        return similarity

    def find_similar_players(self, target_player: PlayerSeason,
                             n_similar: int = 10,
                             age_range: Optional[Tuple[float, float]] = None,
                             min_season: Optional[int] = None,
                             exclude_self: bool = True) -> List[Dict]:
        """
        Find most similar historical player-seasons.
        Args:
            target_player: Player-season to find comparables for
            n_similar: Number of similar players to return
            age_range: Optional (min_age, max_age) filter
            min_season: Optional minimum season year
            exclude_self: Whether to exclude every season of the target player
        """
        similarities = []
        for _, row in self.historical_data.iterrows():
            # Apply filters
            if exclude_self and row['player_id'] == target_player.player_id:
                continue
            if age_range is not None:
                if row['age'] < age_range[0] or row['age'] > age_range[1]:
                    continue
            if min_season is not None and row['season'] < min_season:
                continue
            # Create PlayerSeason for comparison
            comp_player = self._row_to_player_season(row)
            similarity = self.calculate_similarity(target_player, comp_player)
            similarities.append({
                'player_id': row['player_id'],
                'player_name': row['player_name'],
                'season': row['season'],
                'age': row['age'],
                'similarity': similarity
            })
        # Sort by similarity and return top N
        similarities.sort(key=lambda x: x['similarity'], reverse=True)
        return similarities[:n_similar]
    def _row_to_player_season(self, row: pd.Series) -> PlayerSeason:
        """Convert DataFrame row to PlayerSeason object."""
        # Columns not stored in historical_data fall back to the defaults below
        return PlayerSeason(
            player_id=row['player_id'],
            player_name=row['player_name'],
            season=int(row['season']),
            age=row['age'],
            games_played=row.get('games_played', 70),
            minutes_per_game=row.get('minutes_per_game', 30),
            points_per_game=row.get('points_per_game', 0),
            rebounds_per_game=row.get('rebounds_per_game', 0),
            assists_per_game=row.get('assists_per_game', 0),
            steals_per_game=row.get('steals_per_game', 0),
            blocks_per_game=row.get('blocks_per_game', 0),
            turnovers_per_game=row.get('turnovers_per_game', 0),
            fg_pct=row.get('fg_pct', 0.45),
            three_pct=row.get('three_pct', 0.35),
            ft_pct=row.get('ft_pct', 0.75),
            true_shooting_pct=row.get('true_shooting_pct', 0.55),
            usage_rate=row.get('usage_rate', 0.20),
            per=row.get('per', 15),
            box_plus_minus=row.get('box_plus_minus', 0),
            vorp=row.get('vorp', 1),
            win_shares=row.get('win_shares', 3)
        )
    def project_from_comparables(self, target_player: PlayerSeason,
                                 target_stat: str,
                                 years_ahead: int = 1,
                                 n_comparables: int = 20) -> Dict:
        """
        Project future performance based on how similar players developed.
        """
        # historical_data only stores the similarity features
        if target_stat not in self.historical_data.columns:
            raise ValueError(
                f"{target_stat} is not stored in the historical data"
            )
        # Find similar players at same age
        comparables = self.find_similar_players(
            target_player,
            n_similar=n_comparables,
            age_range=(target_player.age - 1, target_player.age + 1)
        )
        future_values = []
        weights = []
        for comp in comparables:
            # Find this player's performance years_ahead later
            future_data = self.historical_data[
                (self.historical_data['player_id'] == comp['player_id']) &
                (self.historical_data['season'] == comp['season'] + years_ahead)
            ]
            if not future_data.empty:
                future_value = future_data[target_stat].iloc[0]
                if pd.notna(future_value):  # Skip missing values
                    future_values.append(future_value)
                    weights.append(comp['similarity'])
        if not future_values:
            # Same keys as the normal return so callers can rely on them
            return {
                'projection': None,
                'std': None,
                'confidence_interval': None,
                'n_comparables': 0
            }
        weights = np.array(weights)
        weights = weights / weights.sum()  # Normalize
        weighted_mean = np.average(future_values, weights=weights)
        weighted_std = np.sqrt(np.average(
            (np.array(future_values) - weighted_mean) ** 2, weights=weights
        ))
        return {
            'projection': weighted_mean,
            'std': weighted_std,
            'confidence_interval': (
                weighted_mean - 1.96 * weighted_std,
                weighted_mean + 1.96 * weighted_std
            ),
            'n_comparables': len(future_values)
        }
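The similarity score itself reduces to a few lines of arithmetic. The sketch below applies the same recipe (a weighted average of standardized absolute differences plus an age penalty, mapped through exp(-x)) to two hypothetical player-seasons. The feature standard deviations are made-up values for illustration, not league estimates.

```python
import math

# Hypothetical standard deviations and weights for two features.
FEATURE_STD = {"points_per_game": 6.0, "assists_per_game": 2.0}
FEATURE_WEIGHT = {"points_per_game": 1.0, "assists_per_game": 1.0}


def similarity(a, b, age_weight=0.1):
    """exp(-weighted mean standardized difference); 1.0 means identical."""
    total_diff = 0.0
    total_weight = 0.0
    for feature, std in FEATURE_STD.items():
        weight = FEATURE_WEIGHT[feature]
        total_diff += weight * abs(a[feature] - b[feature]) / std
        total_weight += weight
    # Age enters in raw years, scaled only by age_weight.
    total_diff += age_weight * abs(a["age"] - b["age"])
    total_weight += age_weight
    return math.exp(-total_diff / total_weight)


player_a = {"points_per_game": 20.0, "assists_per_game": 5.0, "age": 25.0}
player_b = {"points_per_game": 17.0, "assists_per_game": 6.0, "age": 26.0}

# Differences: 3/6 = 0.5 and 1/2 = 0.5, plus 0.1 * 1 year of age penalty,
# so the score is exp(-1.1 / 2.1).
score = similarity(player_a, player_b)
print(round(score, 4))
```

Because the score decays exponentially, a comparable that is one standard deviation away on every feature still receives a weight of roughly exp(-1), so distant comparables are down-weighted but never fully excluded.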
22.5 CARMELO and RAPTOR Projection Systems
CARMELO Methodology
FiveThirtyEight's CARMELO (Career-Arc Regression Model Estimator with Local Optimization) system projects player careers based on statistical similarity to historical players:
class CARMELOProjection:
    """
    Implementation of CARMELO-style projection methodology.
    CARMELO projects careers by:
    1. Finding similar historical players (comparables)
    2. Weighting comparables by similarity
    3. Tracking how comparables' careers evolved
    4. Projecting based on weighted average outcomes
    """

    def __init__(self, historical_data: Dict[str, List[PlayerSeason]],
                 similarity_engine: PlayerSimilarityEngine):
        self.historical_data = historical_data
        self.similarity_engine = similarity_engine
        self.aging_analyzer = AgingCurveAnalyzer()

    def generate_projection(self, player: PlayerSeason,
                            projection_years: int = 5,
                            n_comparables: int = 10) -> Dict:
        """
        Generate multi-year projection for a player.
        Returns projections, comparables used, and uncertainty estimates.
        """
        # Find comparable players
        comparables = self.similarity_engine.find_similar_players(
            player,
            n_similar=n_comparables,
            age_range=(player.age - 1.5, player.age + 1.5)
        )
        projections = []
        for year in range(1, projection_years + 1):
            year_projection = self._project_year(
                player, comparables, year
            )
            projections.append({
                'year': year,
                'season': player.season + year,
                'age': player.age + year,
                **year_projection
            })
        # Calculate career value projections
        career_metrics = self._calculate_career_metrics(projections)
        return {
            'player_id': player.player_id,
            'player_name': player.player_name,
            'base_season': player.season,
            'base_age': player.age,
            'comparables': comparables[:5],  # Top 5 for display
            'yearly_projections': projections,
            'career_metrics': career_metrics
        }
    def _project_year(self, player: PlayerSeason,
                      comparables: List[Dict],
                      years_ahead: int) -> Dict[str, float]:
        """Project statistics for a specific year."""
        stat_projections = {}
        for stat in ['points_per_game', 'rebounds_per_game', 'assists_per_game',
                     'per', 'box_plus_minus', 'win_shares', 'vorp']:
            weighted_values = []
            weights = []
            for comp in comparables:
                # Get comparable's value at years_ahead from their similar season
                comp_player_id = comp['player_id']
                comp_season = comp['season']
                if comp_player_id in self.historical_data:
                    future_seasons = [
                        s for s in self.historical_data[comp_player_id]
                        if s.season == comp_season + years_ahead
                    ]
                    if future_seasons:
                        future_value = getattr(future_seasons[0], stat)
                        weighted_values.append(future_value)
                        weights.append(comp['similarity'])
            if weighted_values:
                weights = np.array(weights)
                weights = weights / weights.sum()
                projection = np.average(weighted_values, weights=weights)
                std = np.sqrt(np.average(
                    (np.array(weighted_values) - projection) ** 2,
                    weights=weights
                ))
                stat_projections[stat] = projection
                stat_projections[f'{stat}_std'] = std
            else:
                # Fall back to aging curve adjustment
                current_value = getattr(player, stat, 0)
                aging_adj = self.aging_analyzer.get_aging_adjustment(
                    stat, player.age, player.age + years_ahead
                )
                stat_projections[stat] = current_value + aging_adj
                stat_projections[f'{stat}_std'] = current_value * 0.15
        # Estimate probability of still being in league
        stat_projections['probability_in_league'] = self._estimate_survival_probability(
            player, years_ahead, comparables
        )
        return stat_projections
def _estimate_survival_probability(self, player: PlayerSeason,
years_ahead: int,
comparables: List[Dict]) -> float:
"""Estimate probability player is still in league."""
still_active = 0
total_weight = 0
for comp in comparables:
comp_player_id = comp['player_id']
comp_season = comp['season']
if comp_player_id in self.historical_data:
future_seasons = [
s for s in self.historical_data[comp_player_id]
if s.season >= comp_season + years_ahead
]
if future_seasons:
still_active += comp['similarity']
total_weight += comp['similarity']
if total_weight == 0:
# Base rate by age
age = player.age + years_ahead
return min(1.0, max(0.1, 1.0 - 0.1 * (age - 30)))  # Clamp to [0.1, 1.0]
return still_active / total_weight
def _calculate_career_metrics(self, projections: List[Dict]) -> Dict[str, float]:
"""Calculate summary career metrics from projections."""
# Expected remaining WAR (wins above replacement)
expected_war = 0
for proj in projections:
# Approximate WAR from box plus minus and minutes
prob = proj.get('probability_in_league', 1.0)
bpm = proj.get('box_plus_minus', 0)
# Rough WAR approximation via VORP: (BPM + 2.0) * (minutes/48) * (games/82) * 2.7,
# where -2.0 BPM is replacement level; assumes 30 mpg over 75 games
war_estimate = (bpm + 2.0) * (30/48) * (75/82) * 2.7  # Simplified
expected_war += prob * war_estimate
# Market value estimate (based on projected WAR)
# Very rough: ~$3M per WAR in current market
market_value_estimate = expected_war * 3_000_000
return {
'expected_remaining_war': expected_war,
'market_value_estimate': market_value_estimate,
'peak_projection_year': max(
projections,
key=lambda x: x.get('box_plus_minus', 0) * x.get('probability_in_league', 0)
)['year']
}
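The comparable-based step in `_project_year` reduces to a similarity-weighted mean and spread of what each comparable did next. The following is a minimal sketch of that arithmetic, assuming three made-up comparables; the function name `weighted_projection` and all numbers are hypothetical and exist only for illustration.

```python
import math

def weighted_projection(values, similarities):
    """Similarity-weighted mean and standard deviation of comparables' outcomes."""
    total = sum(similarities)
    if total <= 0:
        raise ValueError("similarities must sum to a positive number")
    weights = [s / total for s in similarities]  # normalize so weights sum to 1
    mean = sum(w * v for w, v in zip(weights, values))
    variance = sum(w * (v - mean) ** 2 for w, v in zip(weights, values))
    return mean, math.sqrt(variance)

# Hypothetical comparables: BPM one season after each comparable's matched season
future_bpm = [2.0, 4.0, 1.0]
similarity = [0.9, 0.6, 0.5]

mean_bpm, std_bpm = weighted_projection(future_bpm, similarity)
print(f"projected BPM: {mean_bpm:.2f} +/- {std_bpm:.2f}")
```

The most similar comparable pulls the estimate toward its own outcome, while the weighted spread gives a rough sense of how much the comparables disagree.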
RAPTOR Projections
RAPTOR (Robust Algorithm using Player Tracking and On/Off Ratings) builds on play-by-play and tracking data:
class RAPTORProjection:
"""
RAPTOR-style projection system.
RAPTOR differs from box-score based systems by:
1. Incorporating on/off court impact
2. Using player tracking data where available
3. Separating offensive and defensive contributions
4. Adjusting for teammate and opponent quality
"""
def __init__(self):
self.offensive_model = None
self.defensive_model = None
self.war_model = None
def calculate_raptor_components(self, player_season: PlayerSeason,
on_off_data: Optional[Dict] = None,
tracking_data: Optional[Dict] = None) -> Dict:
"""
Calculate RAPTOR offensive and defensive ratings.
If tracking data unavailable, estimates from box score.
"""
# Box score component
box_offense = self._estimate_box_offense(player_season)
box_defense = self._estimate_box_defense(player_season)
# On/off component (if available)
if on_off_data:
onoff_offense = on_off_data.get('offensive_on_off', 0)
onoff_defense = on_off_data.get('defensive_on_off', 0)
else:
# Estimate from box score with more uncertainty
onoff_offense = box_offense * 0.3
onoff_defense = box_defense * 0.3
# Combine components
# RAPTOR uses roughly 50/50 box/on-off blend
raptor_offense = 0.5 * box_offense + 0.5 * onoff_offense
raptor_defense = 0.5 * box_defense + 0.5 * onoff_defense
# Total RAPTOR
raptor_total = raptor_offense + raptor_defense
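# WAR calculation (simplified heuristic):
# rating per 100 possessions * share of minutes * games, scaled by 100 / 2000.
# The scaling constant is a rough points-to-wins conversion, and no
# replacement-level offset is applied.
minutes_pct = player_season.minutes_per_game / 48.0
war = (raptor_total * minutes_pct * player_season.games_played *
100 / 2000)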
return {
'raptor_offense': raptor_offense,
'raptor_defense': raptor_defense,
'raptor_total': raptor_total,
'war': war,
'box_component': {
'offense': box_offense,
'defense': box_defense
}
}
def _estimate_box_offense(self, ps: PlayerSeason) -> float:
"""Estimate offensive RAPTOR component from box score."""
# Simplified offensive RAPTOR approximation
# Based on scoring, efficiency, playmaking
scoring_value = (ps.points_per_game / ps.minutes_per_game * 36 - 11) * 0.5
efficiency_value = (ps.true_shooting_pct - 0.55) * 15
playmaking_value = (ps.assists_per_game / ps.minutes_per_game * 36 - 2.5) * 0.8
turnover_penalty = (ps.turnovers_per_game / ps.minutes_per_game * 36 - 1.5) * -0.5
return scoring_value + efficiency_value + playmaking_value + turnover_penalty
def _estimate_box_defense(self, ps: PlayerSeason) -> float:
"""Estimate defensive RAPTOR component from box score."""
# Box score is poor at capturing defense
# Use stocks (steals + blocks) and rebounds as proxy
stocks_value = ((ps.steals_per_game + ps.blocks_per_game) /
ps.minutes_per_game * 36 - 1.2) * 1.0
# Defensive rebounds (estimate as 75% of total rebounds)
dreb_value = (ps.rebounds_per_game * 0.75 / ps.minutes_per_game * 36 - 3.5) * 0.3
# Default slight positive for starters, slight negative otherwise
baseline = 0.5 if ps.minutes_per_game >= 25 else -0.5
return stocks_value + dreb_value + baseline
def project_raptor(self, player_season: PlayerSeason,
years_ahead: int = 1,
aging_analyzer: Optional[AgingCurveAnalyzer] = None) -> Dict:
"""
Project future RAPTOR ratings.
"""
current_raptor = self.calculate_raptor_components(player_season)
# Apply aging adjustments
if aging_analyzer:
offense_aging = aging_analyzer.get_aging_adjustment(
'box_plus_minus', player_season.age,
player_season.age + years_ahead
) * 0.6 # Offense ages slightly slower
defense_aging = aging_analyzer.get_aging_adjustment(
'box_plus_minus', player_season.age,
player_season.age + years_ahead
) * 1.2 # Defense ages faster
else:
# Default aging assumptions
age_factor = player_season.age + years_ahead
if age_factor <= 27:
offense_aging = 0.3 * (27 - age_factor)
defense_aging = 0.2 * (27 - age_factor)
else:
offense_aging = -0.15 * (age_factor - 27)
defense_aging = -0.25 * (age_factor - 27)
projected_offense = current_raptor['raptor_offense'] + offense_aging
projected_defense = current_raptor['raptor_defense'] + defense_aging
# Apply regression to mean
regression_factor = 0.8 ** years_ahead # More regression further out
projected_offense = regression_factor * projected_offense
projected_defense = regression_factor * projected_defense
projected_total = projected_offense + projected_defense
# Projected WAR (assuming similar minutes)
minutes_pct = player_season.minutes_per_game / 48.0
# Reduce expected games for older players/further projections
expected_games = min(75, player_season.games_played) * (0.95 ** years_ahead)
projected_war = projected_total * minutes_pct * expected_games * 100 / 2000
return {
'projected_raptor_offense': projected_offense,
'projected_raptor_defense': projected_defense,
'projected_raptor_total': projected_total,
'projected_war': projected_war,
'projection_year': player_season.season + years_ahead,
'projected_age': player_season.age + years_ahead
}
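To see how the default aging slopes and the `0.8 ** years_ahead` shrinkage interact in `project_raptor`, it can help to trace one player by hand. This is a sketch that copies only the fallback branch (no fitted aging curve); the helper `project_rating` and the example player are hypothetical.

```python
def project_rating(current, age, years_ahead, up_slope, down_slope,
                   peak_age=27, shrink=0.8):
    """Age-adjust a per-100 rating, then shrink it toward the league mean of 0."""
    target_age = age + years_ahead
    if target_age <= peak_age:
        aging = up_slope * (peak_age - target_age)      # still improving
    else:
        aging = -down_slope * (target_age - peak_age)   # past the assumed peak
    return (shrink ** years_ahead) * (current + aging)

# Hypothetical 29-year-old with +3.0 offense and +1.0 defense
for years in (1, 2, 3):
    offense = project_rating(3.0, 29, years, up_slope=0.3, down_slope=0.15)
    defense = project_rating(1.0, 29, years, up_slope=0.2, down_slope=0.25)
    print(years, round(offense, 2), round(defense, 2))
```

Because shrinkage compounds each year, the three-year projection sits much closer to zero than the aging term alone would suggest, and the defensive rating crosses below average sooner than the offensive one.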
22.6 Marcel-Style Projections
Weighted Average Methodology
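Marcel projections adapt a baseball forecasting system created by analyst Tom Tango, who named it after Marcel the monkey to signal that it is the simplest baseline any forecaster should beat. The method uses a simple but effective weighted-average approach: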
class MarcelProjection:
"""
Marcel-style projection system.
The Marcel method uses:
1. Weighted average of last 3 seasons (5/4/3 weighting)
2. Regression toward mean based on playing time
3. Age adjustment
4. Simple and transparent
"""
# Season weights (most recent first)
SEASON_WEIGHTS = [5, 4, 3]
# Regression targets (league average talent level)
REGRESSION_TARGETS = {
'points_per_game': 10.0,
'rebounds_per_game': 4.0,
'assists_per_game': 2.0,
'steals_per_game': 0.7,
'blocks_per_game': 0.4,
'fg_pct': 0.450,
'three_pct': 0.350,
'ft_pct': 0.760,
'true_shooting_pct': 0.550,
'per': 13.0,
'box_plus_minus': -1.5, # Below league average (0.0); BPM replacement level is about -2.0
'win_shares': 2.0,
}
# Playing time for full reliability (in minutes per season)
RELIABILITY_MINUTES = 1500
def __init__(self, aging_analyzer: Optional[AgingCurveAnalyzer] = None):
self.aging_analyzer = aging_analyzer
def project(self, player_seasons: List[PlayerSeason],
target_season: int) -> Dict[str, float]:
"""
Generate Marcel projection for all stats.
Args:
player_seasons: List of player's historical seasons
target_season: Season year to project
"""
# Sort seasons, most recent first
seasons = sorted(player_seasons, key=lambda x: x.season, reverse=True)
# Only use seasons before target
seasons = [s for s in seasons if s.season < target_season]
if not seasons:
return {}
# Limit to 3 most recent seasons
seasons = seasons[:3]
projections = {}
for stat in self.REGRESSION_TARGETS.keys():
projection = self._project_stat(seasons, stat, target_season)
projections[stat] = projection
# Add metadata
projections['_base_seasons'] = [s.season for s in seasons]
projections['_projected_season'] = target_season
projections['_projected_age'] = seasons[0].age + (target_season - seasons[0].season)
return projections
def _project_stat(self, seasons: List[PlayerSeason],
stat: str, target_season: int) -> float:
"""Project a single statistic."""
weighted_sum = 0.0
weight_sum = 0.0
total_minutes = 0.0
for i, season in enumerate(seasons):
value = getattr(season, stat, None)
if value is None:
continue
# Weight by season recency
season_weight = self.SEASON_WEIGHTS[i] if i < len(self.SEASON_WEIGHTS) else 1
# Weight by playing time
season_minutes = season.games_played * season.minutes_per_game
minutes_weight = min(1.0, season_minutes / self.RELIABILITY_MINUTES)
combined_weight = season_weight * minutes_weight
weighted_sum += value * combined_weight
weight_sum += combined_weight
total_minutes += season_minutes * season_weight
if weight_sum == 0:
return self.REGRESSION_TARGETS.get(stat, 0)
# Weighted average
weighted_avg = weighted_sum / weight_sum
# Regression to mean
# More regression with less playing time
reliability = min(1.0, total_minutes / (self.RELIABILITY_MINUTES * sum(self.SEASON_WEIGHTS[:len(seasons)])))
regression_target = self.REGRESSION_TARGETS.get(stat, weighted_avg)
regressed = reliability * weighted_avg + (1 - reliability) * regression_target
# Age adjustment
if self.aging_analyzer and seasons:
current_age = seasons[0].age
target_age = current_age + (target_season - seasons[0].season)
aging_adj = self.aging_analyzer.get_aging_adjustment(
stat, current_age, target_age
)
regressed += aging_adj
else:
# Simple age adjustment
current_age = seasons[0].age if seasons else 27
target_age = current_age + (target_season - seasons[0].season)
if target_age > 30:
regressed *= (1 - 0.02 * (target_age - 30)) # 2% decline per year after 30
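# Floor at 0 for non-negative stats; BPM can legitimately be negative
return regressed if stat == 'box_plus_minus' else max(0, regressed)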
def project_with_uncertainty(self, player_seasons: List[PlayerSeason],
target_season: int,
historical_data: Optional[Dict[str, List[PlayerSeason]]] = None) -> Dict:
"""
Generate projection with uncertainty estimates.
"""
point_projection = self.project(player_seasons, target_season)
# Estimate uncertainty from historical projection errors
# or use default uncertainty levels
uncertainty = {}
for stat in self.REGRESSION_TARGETS.keys():
if stat in point_projection:
# Default: uncertainty proportional to stat magnitude
# and inversely proportional to reliability
base_uncertainty = abs(point_projection[stat]) * 0.15
# Increase uncertainty for volatile stats
volatility_multiplier = {
'three_pct': 1.5,
'steals_per_game': 1.3,
'blocks_per_game': 1.3,
'box_plus_minus': 1.2,
}.get(stat, 1.0)
uncertainty[stat] = base_uncertainty * volatility_multiplier
return {
'projection': point_projection,
'uncertainty': uncertainty,
'confidence_intervals': {
stat: (
point_projection[stat] - 1.96 * uncertainty.get(stat, 0),
point_projection[stat] + 1.96 * uncertainty.get(stat, 0)
)
for stat in point_projection if not stat.startswith('_')
}
}
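A hand-sized version of the weighting and regression steps in `_project_stat` makes the effect of limited playing time easier to see. The sketch below reuses the 5/4/3 weights and the 1,500-minute reliability threshold from the listing, but it omits the age adjustment; `marcel_estimate` and the sample seasons are hypothetical.

```python
def marcel_estimate(values, minutes, league_mean,
                    season_weights=(5, 4, 3), full_minutes=1500):
    """Weighted average regressed to the mean; inputs are most recent season first."""
    weighted_sum = weight_sum = weighted_minutes = 0.0
    for value, mins, w in zip(values, minutes, season_weights):
        combined = w * min(1.0, mins / full_minutes)  # recency x playing time
        weighted_sum += value * combined
        weight_sum += combined
        weighted_minutes += mins * w
    if weight_sum == 0:
        return league_mean
    weighted_avg = weighted_sum / weight_sum
    used_weights = season_weights[:len(values)]
    reliability = min(1.0, weighted_minutes / (full_minutes * sum(used_weights)))
    return reliability * weighted_avg + (1 - reliability) * league_mean

# Hypothetical part-time player: 20, 18, 15 PPG in 1200, 900, 300 minutes
ppg = marcel_estimate([20.0, 18.0, 15.0], [1200, 900, 300], league_mean=10.0)
print(round(ppg, 2))
```

With these inputs the recency- and minutes-weighted average is about 18.9 points per game, but the reliability of roughly 0.58 pulls the final estimate down to about 15.2, well toward the regression target of 10.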
22.7 Projection Uncertainty and Confidence Intervals
Quantifying Uncertainty
All projections carry uncertainty. Properly quantifying this uncertainty is crucial for decision-making:
class ProjectionUncertainty:
"""
Methods for quantifying and communicating projection uncertainty.
"""
def __init__(self):
# Historical standard errors by stat (from validation studies)
self.base_standard_errors = {
'points_per_game': 2.5,
'rebounds_per_game': 1.2,
'assists_per_game': 1.0,
'true_shooting_pct': 0.03,
'per': 3.0,
'box_plus_minus': 1.8,
'win_shares': 2.0,
'vorp': 1.0,
}
def calculate_prediction_interval(self, projection: float,
stat: str,
player_reliability: float = 1.0,
years_ahead: int = 1,
confidence: float = 0.90) -> Tuple[float, float]:
"""
Calculate prediction interval for a projected value.
Args:
projection: Point projection
stat: Statistic being projected
player_reliability: 0-1 score of how reliable player's stats are
years_ahead: How many years into the future
confidence: Confidence level (e.g., 0.90 for 90% CI)
"""
# Use abs() so a negative projection (e.g., BPM) cannot produce a negative SE
base_se = self.base_standard_errors.get(stat, abs(projection) * 0.2)
# Adjust for reliability (less reliable = wider interval)
reliability_adjustment = 1 / max(0.3, player_reliability)
# Adjust for projection distance (further = wider)
distance_adjustment = 1 + 0.2 * (years_ahead - 1)
adjusted_se = base_se * reliability_adjustment * distance_adjustment
# Get z-score for confidence level
z = stats.norm.ppf((1 + confidence) / 2)
lower = projection - z * adjusted_se
upper = projection + z * adjusted_se
return (lower, upper)
def monte_carlo_projection(self, base_projection: Dict[str, float],
covariance_matrix: np.ndarray,
n_simulations: int = 10000) -> Dict:
"""
Generate distribution of possible outcomes using Monte Carlo simulation.
This captures correlations between statistics (e.g., if points up,
likely usage is also up).
"""
# Use stat_names rather than stats to avoid shadowing the scipy.stats module
stat_names = list(base_projection.keys())
means = np.array([base_projection[s] for s in stat_names])
# Generate correlated samples
samples = np.random.multivariate_normal(means, covariance_matrix, n_simulations)
results = {
'stats': stat_names,
'samples': samples,
'percentiles': {},
'probability_above': {}
}
for i, stat in enumerate(stat_names):
stat_samples = samples[:, i]
results['percentiles'][stat] = {
'10th': np.percentile(stat_samples, 10),
'25th': np.percentile(stat_samples, 25),
'50th': np.percentile(stat_samples, 50),
'75th': np.percentile(stat_samples, 75),
'90th': np.percentile(stat_samples, 90)
}
return results
def calculate_player_reliability(self, player_seasons: List[PlayerSeason]) -> float:
"""
Calculate reliability score based on sample size and consistency.
"""
if not player_seasons:
return 0.0
# Factor 1: Total minutes played
total_minutes = sum(s.games_played * s.minutes_per_game for s in player_seasons)
minutes_factor = min(1.0, total_minutes / 5000) # Full reliability at 5000 minutes
# Factor 2: Consistency across seasons
if len(player_seasons) >= 2:
ppg_values = [s.points_per_game for s in player_seasons]
consistency = 1 - min(1, np.std(ppg_values) / (np.mean(ppg_values) + 1))
else:
consistency = 0.5
# Factor 3: Number of seasons
seasons_factor = min(1.0, len(player_seasons) / 4)
reliability = (minutes_factor * 0.4 + consistency * 0.3 + seasons_factor * 0.3)
return reliability
class ProjectionEnsemble:
"""
Combine multiple projection methods into ensemble prediction.
Ensemble methods typically outperform individual methods by
averaging out individual model errors.
"""
def __init__(self, models: List[Tuple[str, object, float]]):
"""
Initialize with list of (name, model, weight) tuples.
"""
self.models = models
self._normalize_weights()
def _normalize_weights(self):
"""Ensure weights sum to 1."""
total_weight = sum(w for _, _, w in self.models)
self.models = [(n, m, w/total_weight) for n, m, w in self.models]
def project(self, player_seasons: List[PlayerSeason],
target_season: int, target_stat: str) -> Dict:
"""
Generate ensemble projection from all models.
"""
projections = []
weights = []
for name, model, weight in self.models:
try:
if hasattr(model, 'project'):
result = model.project(player_seasons, target_season)
if isinstance(result, dict):
# A dict missing the stat yields None, so the model is skipped below
proj = result.get(target_stat)
else:
proj = result
else:
proj = model.predict(player_seasons, target_season, target_stat)
if proj is not None:
projections.append((name, proj))
weights.append(weight)
except Exception as e:
print(f"Warning: {name} failed: {e}")
if not projections:
return {'ensemble_projection': None, 'individual_projections': {}}
# Normalize weights for successful models
weights = np.array(weights)
weights = weights / weights.sum()
# Weighted average
ensemble_proj = sum(p * w for (_, p), w in zip(projections, weights))
# Uncertainty from model disagreement
proj_values = [p for _, p in projections]
model_disagreement = np.std(proj_values)
return {
'ensemble_projection': ensemble_proj,
'individual_projections': dict(projections),
'model_weights': dict(zip([n for n, _ in projections], weights)),
'model_disagreement': model_disagreement,
'confidence_interval': (
ensemble_proj - 1.96 * model_disagreement,
ensemble_proj + 1.96 * model_disagreement
)
}
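The interval logic in `calculate_prediction_interval` can be checked on a single number. The sketch below restates it using only the standard library (`statistics.NormalDist` in place of `scipy.stats`); the function name `prediction_interval` and the example inputs are hypothetical.

```python
from statistics import NormalDist

def prediction_interval(projection, base_se, reliability=1.0,
                        years_ahead=1, confidence=0.90):
    """Widen the base standard error for low reliability and long horizons."""
    reliability_adjustment = 1 / max(0.3, reliability)
    distance_adjustment = 1 + 0.2 * (years_ahead - 1)
    se = base_se * reliability_adjustment * distance_adjustment
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # two-sided z-score
    return projection - z * se, projection + z * se

# Hypothetical: 18.0 PPG projected three seasons out for a low-reliability player
low, high = prediction_interval(18.0, base_se=2.5, reliability=0.5, years_ahead=3)
print(f"90% interval: {low:.1f} to {high:.1f}")
```

Here the standard error grows from 2.5 to 7.0 points, so the 90% interval spans roughly 6.5 to 29.5 points per game. An interval that wide is a useful signal that the point estimate alone should not drive a decision.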
22.8 Evaluating Projection Accuracy
Backtesting Projection Systems
Rigorous evaluation requires testing on held-out historical data:
class ProjectionEvaluator:
"""
Evaluate projection system accuracy through backtesting.
"""
def __init__(self, projection_model):
self.model = projection_model
self.evaluation_results = []
def backtest(self, all_player_data: Dict[str, List[PlayerSeason]],
test_seasons: List[int],
target_stat: str,
min_games: int = 40) -> Dict:
"""
Backtest projection model on historical seasons.
Args:
all_player_data: All historical player data
test_seasons: Seasons to use as test set
target_stat: Statistic to evaluate
min_games: Minimum games to include in evaluation
"""
predictions = []
actuals = []
player_info = []
for test_season in test_seasons:
for player_id, seasons in all_player_data.items():
# Find test season data
test_data = [s for s in seasons if s.season == test_season
and s.games_played >= min_games]
if not test_data:
continue
test_actual = test_data[0]
# Get training data (seasons before test)
train_data = [s for s in seasons if s.season < test_season]
if not train_data:
continue
# Generate projection
try:
projection = self.model.project(train_data, test_season)
if isinstance(projection, dict):
pred_value = projection.get(target_stat)
else:
pred_value = projection
if pred_value is not None:
actual_value = getattr(test_actual, target_stat)
predictions.append(pred_value)
actuals.append(actual_value)
player_info.append({
'player_id': player_id,
'player_name': test_actual.player_name,
'season': test_season,
'age': test_actual.age
})
except Exception:
continue
predictions = np.array(predictions)
actuals = np.array(actuals)
# Calculate metrics
metrics = self._calculate_metrics(predictions, actuals)
# Analyze by subgroups
subgroup_analysis = self._analyze_subgroups(
predictions, actuals, player_info
)
return {
'overall_metrics': metrics,
'subgroup_analysis': subgroup_analysis,
'n_predictions': len(predictions),
'test_seasons': test_seasons,
'target_stat': target_stat
}
def _calculate_metrics(self, predictions: np.ndarray,
actuals: np.ndarray) -> Dict[str, float]:
"""Calculate comprehensive evaluation metrics."""
errors = predictions - actuals
return {
'rmse': np.sqrt(np.mean(errors ** 2)),
'mae': np.mean(np.abs(errors)),
'mape': np.mean(np.abs(errors) / (np.abs(actuals) + 0.001)) * 100,  # abs() keeps the denominator positive for stats like BPM
'correlation': np.corrcoef(predictions, actuals)[0, 1],
'r_squared': 1 - np.sum(errors ** 2) / np.sum((actuals - actuals.mean()) ** 2),
'mean_error': np.mean(errors), # Bias
'median_error': np.median(errors),
'error_std': np.std(errors),
'max_abs_error': np.max(np.abs(errors))
}
def _analyze_subgroups(self, predictions: np.ndarray,
actuals: np.ndarray,
player_info: List[Dict]) -> Dict:
"""Analyze prediction accuracy by subgroup."""
# By age group
age_groups = {
'young (25 and under)': [],
'prime (26-30)': [],
'veteran (31+)': []
}
for i, info in enumerate(player_info):
age = info['age']
if age <= 25:
age_groups['young (25 and under)'].append(i)
elif age <= 30:
age_groups['prime (26-30)'].append(i)
else:
age_groups['veteran (31+)'].append(i)
subgroup_metrics = {}
for group_name, indices in age_groups.items():
if len(indices) >= 10:
group_preds = predictions[indices]
group_actuals = actuals[indices]
subgroup_metrics[group_name] = self._calculate_metrics(
group_preds, group_actuals
)
# By performance tier
median_actual = np.median(actuals)
above_median = actuals >= median_actual
below_median = ~above_median
subgroup_metrics['above_median'] = self._calculate_metrics(
predictions[above_median], actuals[above_median]
)
subgroup_metrics['below_median'] = self._calculate_metrics(
predictions[below_median], actuals[below_median]
)
return subgroup_metrics
def compare_models(self, models: Dict[str, object],
all_player_data: Dict[str, List[PlayerSeason]],
test_seasons: List[int],
target_stat: str) -> pd.DataFrame:
"""
Compare multiple projection models on same test set.
"""
results = []
for model_name, model in models.items():
evaluator = ProjectionEvaluator(model)
metrics = evaluator.backtest(
all_player_data, test_seasons, target_stat
)
result = {'model': model_name}
result.update(metrics['overall_metrics'])
results.append(result)
return pd.DataFrame(results).sort_values('rmse')
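The headline metrics from `_calculate_metrics` are easy to verify by hand on a tiny sample. The following sketch computes RMSE, MAE, and bias with the standard library only; `summarize_errors` and the four prediction/actual pairs are hypothetical.

```python
import math

def summarize_errors(predictions, actuals):
    """RMSE, MAE, and mean error (bias) for paired predictions and outcomes."""
    errors = [p - a for p, a in zip(predictions, actuals)]
    n = len(errors)
    return {
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "mae": sum(abs(e) for e in errors) / n,
        "bias": sum(errors) / n,  # negative means the model under-predicts
    }

# Hypothetical projected vs. actual points per game for four players
metrics = summarize_errors([12.0, 8.0, 20.0, 15.0], [10.0, 9.0, 22.0, 15.0])
print(metrics)
```

RMSE exceeds MAE whenever the errors are uneven in size, because squaring penalizes the larger misses more heavily. Bias can sit near zero even when both are large, which is why the evaluator reports all three.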
22.9 Advanced Topics in Player Projection
Injury Risk Modeling
Injuries represent a major source of projection uncertainty:
class InjuryRiskModel:
"""
Model injury risk and its impact on projections.
"""
# Base injury rates by age
BASE_INJURY_RATES = {
21: 0.08, 22: 0.08, 23: 0.09, 24: 0.09, 25: 0.10,
26: 0.11, 27: 0.12, 28: 0.13, 29: 0.14, 30: 0.15,
31: 0.17, 32: 0.19, 33: 0.22, 34: 0.25, 35: 0.28,
36: 0.32, 37: 0.36, 38: 0.40, 39: 0.45, 40: 0.50
}
def estimate_injury_probability(self, player_season: PlayerSeason,
injury_history: Optional[List[Dict]] = None) -> float:
"""
Estimate probability of significant injury next season.
Args:
player_season: Current player statistics
injury_history: List of past injuries with severity
"""
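# Base rate by age (clamp to the table's 21-40 range so that teenagers and
# players over 40 use the nearest tabulated rate)
age = min(40, max(21, int(round(player_season.age))))
base_rate = self.BASE_INJURY_RATES[age]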
# Adjust for injury history
if injury_history:
recent_injuries = [inj for inj in injury_history
if inj.get('seasons_ago', 10) <= 3]
# Each recent injury increases risk
history_adjustment = len(recent_injuries) * 0.05
# Major injuries have lasting impact
major_injuries = [inj for inj in recent_injuries
if inj.get('severity', 'minor') == 'major']
major_adjustment = len(major_injuries) * 0.08
base_rate += history_adjustment + major_adjustment
# Adjust for playing time (more minutes = more exposure)
minutes_adjustment = (player_season.minutes_per_game - 25) * 0.005
base_rate += max(-0.05, min(0.05, minutes_adjustment))
# Cap at reasonable range
return max(0.05, min(0.60, base_rate))
def adjust_projection_for_injury_risk(self, projection: Dict[str, float],
injury_probability: float) -> Dict[str, float]:
"""
Adjust projected statistics for injury risk.
Returns expected values accounting for injury probability.
"""
adjusted = {}
# Assume injury causes 30% reduction in season value on average
injury_impact = 0.70
for stat, value in projection.items():
if stat.startswith('_'):
adjusted[stat] = value
else:
# Expected value = P(healthy) * full_value + P(injured) * reduced_value
expected_value = (
(1 - injury_probability) * value +
injury_probability * value * injury_impact
)
adjusted[stat] = expected_value
adjusted['injury_probability'] = injury_probability
adjusted['healthy_projection'] = projection
return adjusted
class ContextAdjustedProjection:
"""
Adjust projections for context changes (team, role, etc.).
"""
def adjust_for_team_change(self, projection: Dict[str, float],
old_team_pace: float,
new_team_pace: float,
old_team_off_rating: float,
new_team_off_rating: float) -> Dict[str, float]:
"""
Adjust counting stats for team pace and quality differences.
"""
adjusted = projection.copy()
# Pace adjustment for counting stats
pace_ratio = new_team_pace / old_team_pace
pace_adjusted_stats = ['points_per_game', 'rebounds_per_game',
'assists_per_game', 'steals_per_game',
'blocks_per_game', 'turnovers_per_game']
for stat in pace_adjusted_stats:
if stat in adjusted:
adjusted[stat] = adjusted[stat] * pace_ratio
# Quality adjustment for efficiency stats
quality_diff = new_team_off_rating - old_team_off_rating
# Better team = slightly better efficiency
efficiency_boost = quality_diff * 0.002 # 0.2% TS per point of ORtg
if 'true_shooting_pct' in adjusted:
adjusted['true_shooting_pct'] += efficiency_boost
return adjusted
def adjust_for_role_change(self, projection: Dict[str, float],
expected_usage_change: float,
expected_minutes_change: float) -> Dict[str, float]:
"""
Adjust for expected changes in role/usage.
Usage increase typically leads to efficiency decrease.
"""
adjusted = projection.copy()
# Usage-efficiency tradeoff
# ~1% TS decrease per 5% usage increase
if 'true_shooting_pct' in adjusted:
ts_adjustment = -0.01 * (expected_usage_change / 5)
adjusted['true_shooting_pct'] += ts_adjustment
# Volume stats scale with usage
if expected_usage_change != 0:
usage_ratio = 1 + expected_usage_change / 100
volume_stats = ['points_per_game', 'turnovers_per_game']
for stat in volume_stats:
if stat in adjusted:
adjusted[stat] *= usage_ratio
# Win shares and WAR scale with minutes
if expected_minutes_change != 0:
minutes_ratio = 1 + expected_minutes_change / 100
cumulative_stats = ['win_shares', 'vorp']
for stat in cumulative_stats:
if stat in adjusted:
adjusted[stat] *= minutes_ratio
return adjusted
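Two of the adjustments above are single-line formulas, and running them on one number shows their scale. This sketch assumes the same 0.70 injury impact used in `adjust_projection_for_injury_risk` and the pace ratio from `adjust_for_team_change`; the helper names and inputs are hypothetical.

```python
def injury_expected_value(value, injury_probability, injury_impact=0.70):
    """E[V] = P(healthy) * full value + P(injured) * reduced value."""
    healthy = (1 - injury_probability) * value
    injured = injury_probability * value * injury_impact
    return healthy + injured

def pace_adjust(value, old_pace, new_pace):
    """Scale a counting stat by the ratio of new to old team pace."""
    return value * new_pace / old_pace

# Hypothetical 20.0 PPG projection
ppg_injury = injury_expected_value(20.0, injury_probability=0.15)
ppg_pace = pace_adjust(20.0, old_pace=96.0, new_pace=100.0)
print(round(ppg_injury, 2), round(ppg_pace, 2))
```

A 15% injury probability trims the projection by under one point per game, and moving from a 96-possession team to a 100-possession team adds a similar amount. Both effects are small next to the projection's own uncertainty.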
Complete Projection Pipeline
Bringing together all components into a complete system:
class ComprehensiveProjectionSystem:
"""
Complete player projection pipeline combining all methods.
"""
def __init__(self, historical_data: Dict[str, List[PlayerSeason]]):
self.historical_data = historical_data
# Initialize component models
self.aging_analyzer = AgingCurveAnalyzer()
self.similarity_engine = None # Initialize with historical seasons
self.marcel = MarcelProjection(self.aging_analyzer)
self.uncertainty_calculator = ProjectionUncertainty()
self.injury_model = InjuryRiskModel()
# Fit aging curves
self._fit_aging_curves()
def _fit_aging_curves(self):
"""Fit aging curves to historical data."""
stats_to_fit = ['points_per_game', 'rebounds_per_game',
'assists_per_game', 'per', 'box_plus_minus']
for stat in stats_to_fit:
self.aging_analyzer.fit_aging_curve(self.historical_data, stat)
def generate_complete_projection(self, player_id: str,
target_season: int,
context_adjustments: Optional[Dict] = None) -> Dict:
"""
Generate comprehensive projection with all components.
"""
if player_id not in self.historical_data:
return {'error': f'No data for player {player_id}'}
player_seasons = self.historical_data[player_id]
recent_season = max(player_seasons, key=lambda x: x.season)
# 1. Base projection (Marcel method)
base_projection = self.marcel.project_with_uncertainty(
player_seasons, target_season
)
# 2. Similarity-based projection (if engine available)
similarity_projection = None
if self.similarity_engine:
for stat in ['per', 'box_plus_minus']:
sim_proj = self.similarity_engine.project_from_comparables(
recent_season, stat,
years_ahead=target_season - recent_season.season
)
if similarity_projection is None:
similarity_projection = {}
similarity_projection[stat] = sim_proj
# 3. Injury risk adjustment
injury_prob = self.injury_model.estimate_injury_probability(recent_season)
injury_adjusted = self.injury_model.adjust_projection_for_injury_risk(
base_projection['projection'], injury_prob
)
# 4. Context adjustments (if provided)
if context_adjustments:
final_projection = self._apply_context_adjustments(
injury_adjusted, context_adjustments
)
else:
final_projection = injury_adjusted
# 5. Calculate confidence intervals
reliability = self.uncertainty_calculator.calculate_player_reliability(
player_seasons
)
confidence_intervals = {}
years_ahead = target_season - recent_season.season
for stat in ['points_per_game', 'rebounds_per_game', 'assists_per_game',
'per', 'box_plus_minus']:
if stat in final_projection:
ci = self.uncertainty_calculator.calculate_prediction_interval(
final_projection[stat], stat, reliability, years_ahead
)
confidence_intervals[stat] = ci
return {
'player_id': player_id,
'player_name': recent_season.player_name,
'base_season': recent_season.season,
'target_season': target_season,
'projected_age': recent_season.age + years_ahead,
'projection': final_projection,
'base_projection': base_projection,
'similarity_projection': similarity_projection,
'injury_probability': injury_prob,
'confidence_intervals': confidence_intervals,
'reliability_score': reliability
}
def _apply_context_adjustments(self, projection: Dict,
adjustments: Dict) -> Dict:
"""Apply team/role context adjustments."""
context_adjuster = ContextAdjustedProjection()
adjusted = projection.copy()
if 'team_change' in adjustments:
tc = adjustments['team_change']
adjusted = context_adjuster.adjust_for_team_change(
adjusted,
tc.get('old_pace', 100), tc.get('new_pace', 100),
tc.get('old_off_rating', 110), tc.get('new_off_rating', 110)
)
if 'role_change' in adjustments:
rc = adjustments['role_change']
adjusted = context_adjuster.adjust_for_role_change(
adjusted,
rc.get('usage_change', 0),
rc.get('minutes_change', 0)
)
return adjusted
def generate_multi_year_projection(self, player_id: str,
start_season: int,
n_years: int = 5) -> List[Dict]:
"""Generate projections for multiple future seasons."""
projections = []
for year in range(n_years):
target_season = start_season + year
proj = self.generate_complete_projection(player_id, target_season)
projections.append(proj)
return projections
22.10 Practical Applications
Contract Valuation
Projections give contract valuation a quantitative basis:
class ContractValuation:
"""
Use projections to estimate fair contract value.
"""
# Approximate dollars per WAR in current market
DOLLARS_PER_WAR = 3_500_000
def __init__(self, projection_system: ComprehensiveProjectionSystem):
self.projector = projection_system
def estimate_contract_value(self, player_id: str,
contract_years: int,
salary_cap: float = 140_000_000) -> Dict:
"""
Estimate fair contract value based on projected performance.
"""
current_season = 2024 # Would be dynamic in practice
projections = self.projector.generate_multi_year_projection(
player_id, current_season + 1, contract_years
)
total_value = 0
yearly_values = []
for proj in projections:
if 'error' in proj:
continue
# Estimate WAR from projection
bpm = proj['projection'].get('box_plus_minus', 0)
injury_prob = proj['injury_probability']
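# Rough WAR estimate via VORP: (BPM + 2.0) * minutes share * (games / 82) * 2.7,
# where -2.0 BPM is replacement level
expected_minutes_pct = 0.6 # ~29 mpg / 48
expected_games = 75 * (1 - 0.3 * injury_prob) # Injury-adjusted games
estimated_war = (bpm + 2.0) * expected_minutes_pct * (expected_games / 82) * 2.7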
# Adjust for injury probability (already partially captured above)
adjusted_war = estimated_war * (1 - 0.2 * injury_prob)
year_value = adjusted_war * self.DOLLARS_PER_WAR
yearly_values.append({
'season': proj['target_season'],
'projected_war': adjusted_war,
'value': year_value,
'as_pct_of_cap': year_value / salary_cap * 100
})
total_value += year_value
return {
'player_id': player_id,
'contract_years': contract_years,
'total_value': total_value,
'avg_annual_value': total_value / contract_years if contract_years > 0 else 0,
'yearly_breakdown': yearly_values,
'cap_percentage': (total_value / contract_years / salary_cap * 100)
if contract_years > 0 else 0
}
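The valuation logic can be traced end to end on a short contract. The sketch below converts BPM to wins with the Basketball-Reference-style VORP formula (replacement level of -2.0 BPM, about 2.7 wins per point of VORP). Those constants are assumptions that may differ from the constants in the listing above, and the function names, BPM path, and dollars-per-win figure are hypothetical.

```python
REPLACEMENT_BPM = -2.0   # assumed replacement level on the BPM scale
WINS_PER_VORP = 2.7      # assumed conversion from VORP to wins

def war_from_bpm(bpm, minutes_share, games, season_games=82):
    """Approximate wins above replacement from BPM and playing time."""
    vorp = (bpm - REPLACEMENT_BPM) * minutes_share * games / season_games
    return vorp * WINS_PER_VORP

def contract_value(bpm_by_year, minutes_share, games, dollars_per_war):
    """Dollar value of each contract year given a projected BPM path."""
    return [war_from_bpm(b, minutes_share, games) * dollars_per_war
            for b in bpm_by_year]

# Hypothetical three-year deal for a player declining from +3.0 to +2.0 BPM
yearly = contract_value([3.0, 2.5, 2.0], minutes_share=0.6, games=75,
                        dollars_per_war=3_500_000)
total = sum(yearly)
print([round(v / 1e6, 1) for v in yearly], round(total / 1e6, 1))
```

Under these assumptions the contract is worth roughly $70 million over three years. The result is very sensitive to the dollars-per-win figure, so a range of values is more informative than a single estimate.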
Summary
Player performance projection combines statistical modeling, domain knowledge about aging and player development, and careful uncertainty quantification. Key principles include:
- Regression to the mean: Extreme performances regress toward average
- Aging effects: Performance changes predictably with age
- Sample size matters: More data leads to more reliable projections
- Context adjustment: Team and role changes affect statistics
- Ensemble methods: Combining models improves accuracy
- Uncertainty quantification: All projections carry uncertainty
Effective projection systems like CARMELO and RAPTOR integrate these principles with sophisticated similarity matching and extensive historical databases. The Marcel method demonstrates that simple, transparent approaches can perform remarkably well.
The practical value of projections extends to contract valuation, trade analysis, and roster construction. However, projections should always be viewed as probabilistic estimates rather than certainties, and decision-makers must account for the full range of possible outcomes.
Key Equations
Regression to Mean: $$\text{Regressed} = r \cdot \text{Observed} + (1-r) \cdot \text{Mean}$$
Marcel Weighted Average: $$\text{Projection} = \frac{5 \cdot Y_1 + 4 \cdot Y_2 + 3 \cdot Y_3}{5 + 4 + 3}$$
Similarity Score: $$S = \exp\left(-\frac{1}{n}\sum_{i=1}^{n} w_i \cdot \left|\frac{x_i - y_i}{\sigma_i}\right|\right)$$
Aging Adjustment: $$\text{Projected} = \text{Current} + \text{AgingCurve}(age_{target}) - \text{AgingCurve}(age_{current})$$
Expected Value with Injury: $$E[V] = P(\text{healthy}) \cdot V_{full} + P(\text{injured}) \cdot V_{reduced}$$
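Worked example: the equations above can be chained with illustrative numbers, not taken from any real player. Assume seasons of 20, 18, and 15 points per game, a reliability of $r = 0.6$, a league mean of 10, a 15% injury probability, and a 30% value reduction when injured.
Marcel Weighted Average (example): $$\frac{5 \cdot 20 + 4 \cdot 18 + 3 \cdot 15}{5 + 4 + 3} = \frac{217}{12} \approx 18.08$$
Regression to Mean (example): $$0.6 \cdot 18.08 + (1 - 0.6) \cdot 10 \approx 14.85$$
Expected Value with Injury (example): $$E[V] = 0.85 \cdot 14.85 + 0.15 \cdot (0.70 \cdot 14.85) \approx 14.18$$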
References
- Silver, N. (2015). "CARMELO NBA Player Projections Methodology." FiveThirtyEight.
- Silver, N., & Fischer-Baum, R. (2019). "How Our NBA Predictions Work." FiveThirtyEight.
- Tango, T., Lichtman, M., & Dolphin, A. (2007). The Book: Playing the Percentages in Baseball.
- Kubatko, J., Oliver, D., Pelton, K., & Rosenbaum, D. (2007). "A Starting Point for Analyzing Basketball Statistics."
- Myers, D. (2012). "About Box Plus/Minus." Basketball-Reference.
- Rosenbaum, D. (2004). "Measuring How NBA Players Help Their Teams Win."