NBA Age Curves

Beginner · 10 min read · Nov 27, 2025
# NBA Player Age Curves: Understanding Athletic Aging and Performance Projections

Age curves are fundamental tools in basketball analytics that model how player performance changes over the course of a career. Understanding these patterns is crucial for player evaluation, contract decisions, and building sustainable rosters.

## Table of Contents

1. [What Are Age Curves?](#what-are-age-curves)
2. [Methodology for Building Age Curves](#methodology)
3. [Peak Ages by Skill Category](#peak-ages)
4. [Position-Specific Aging Patterns](#position-differences)
5. [Decline Patterns and Career Trajectories](#decline-patterns)
6. [Using Age Curves for Projections](#projections)
7. [Python Implementation](#python-code)
8. [R Implementation](#r-code)
9. [Advanced Considerations](#advanced)

---

## What Are Age Curves? {#what-are-age-curves}

Age curves represent the typical trajectory of player performance as a function of age. They help answer critical questions:

- At what age do players typically peak?
- How quickly do players decline after their peak?
- Do different skills age differently?
- How do aging patterns vary by position?

### Key Concepts

**Delta Method**: Tracks individual player changes year-over-year, controlling for selection bias where worse players leave the league earlier.

**Survivor Bias**: Traditional averages are inflated at older ages because only the best players remain in the league.

**Aging vs. Development**: Separating physical decline from skill improvement and experience gains.

---

## Methodology for Building Age Curves {#methodology}

### The Delta Method

The most robust approach tracks how individual players change from one age to the next:

```
Age Adjustment = Average(Player_Performance_Age_X - Player_Performance_Age_(X-1))
```

This controls for survivor bias by only comparing players who played in consecutive seasons.

### Steps to Build Age Curves

1. **Collect Player-Season Data**: Gather performance metrics for multiple seasons
2. **Calculate Year-Over-Year Changes**: Compute deltas for each player
3. **Group by Age**: Aggregate changes for all players at each age
4. **Apply Cumulative Adjustments**: Build the full curve from cumulative deltas
5. **Normalize to Peak**: Set peak age performance as baseline (100%)

### Data Requirements

- Minimum minutes/games played threshold (e.g., 500+ minutes)
- Multiple seasons of data (10+ years recommended)
- Consistent metric definitions across seasons
- Age calculated to decimal precision (date of birth to season midpoint)

---

## Peak Ages by Skill Category {#peak-ages}

Different skills peak and decline at different rates:

### Typical Peak Ages

| Skill Category | Peak Age | Notes |
|---------------|----------|-------|
| **Athleticism** | 23-25 | Speed, vertical leap, explosive movements |
| **Scoring Volume** | 27-28 | Raw points per game peaks here |
| **Shooting Efficiency** | 28-30 | Experience improves shot selection |
| **Three-Point Shooting** | 29-31 | Continues improving with experience |
| **Passing/Playmaking** | 28-30 | Court vision and decision-making |
| **Rebounding** | 26-28 | Positioning improves but athleticism declines |
| **Defense** | 27-29 | Balance of athleticism and experience |
| **Overall Impact** | 27-29 | Composite of all skills |

### Skill-Specific Patterns

**Early Peak Skills**:

- Fast break points
- Transition defense
- Contest rate at rim
- Offensive rebounding

**Late Peak Skills**:

- Free throw percentage
- Three-point percentage
- Assist-to-turnover ratio
- Defensive positioning

---

## Position-Specific Aging Patterns {#position-differences}

### Point Guards

- **Peak Age**: 28-30
- **Longevity**: Often highest among positions
- **Key Factor**: Shift from athletic to cerebral game
- **Decline Pattern**: Gradual, especially for shooters

### Shooting Guards

- **Peak Age**: 27-29
- **Longevity**: Above average for good shooters
- **Key Factor**: Shooting ability ages well
- **Decline Pattern**: Moderate, athleticism matters
more than PG

### Small Forwards

- **Peak Age**: 26-28
- **Longevity**: Moderate
- **Key Factor**: Two-way versatility most affected
- **Decline Pattern**: Steeper than guards, especially defensively

### Power Forwards

- **Peak Age**: 26-27
- **Longevity**: Varies widely by playstyle
- **Key Factor**: Stretch-4s age better than traditional bigs
- **Decline Pattern**: Shooting-dependent players last longer

### Centers

- **Peak Age**: 25-27
- **Longevity**: Traditionally shortest, improving with modern play
- **Key Factor**: Athletic centers decline faster
- **Decline Pattern**: Steep for rim-runners, gradual for skilled bigs

---

## Decline Patterns and Career Trajectories {#decline-patterns}

### Typical Decline Rates

- **Age 27-30**: 0-2% decline per year (plateau phase)
- **Age 30-33**: 3-5% decline per year (gradual decline)
- **Age 33-36**: 5-10% decline per year (steep decline)
- **Age 36+**: 10-15%+ decline per year (sharp decline)

### Career Trajectory Phases

1. **Entry (Age 19-22)**: Rapid improvement, high variance
2. **Development (Age 22-25)**: Continued improvement, stabilizing
3. **Peak (Age 25-29)**: Maximum performance plateau
4. **Gradual Decline (Age 29-33)**: Slow erosion of skills
5. **Steep Decline (Age 33+)**: Accelerating performance loss

### Factors Affecting Decline Rate

- **Injury History**: Major injuries accelerate decline
- **Playstyle**: Athleticism-dependent players decline faster
- **Usage Rate**: High-usage players may decline earlier
- **Positional Fit**: Changing role can extend careers
- **Skill Development**: Adding new skills (e.g., 3PT shooting) helps

---

## Using Age Curves for Projections {#projections}

### Basic Projection Formula

```
Projected_Performance = Current_Performance × Age_Adjustment_Factor
```

### Multi-Year Projections

For projecting multiple years ahead:

```
Year_N_Performance = Current_Performance × Π(Age_Factors[age:age+N])
```

### Regression to Mean

Combine age adjustments with regression, applying the age factor to the blended estimate rather than to the last season alone:

```
Projected = (Weighted_Career_Average × Regression_Weight + Last_Season × Recent_Weight) × Age_Factor
```

### Confidence Intervals

- Wider intervals for younger/older players
- Narrower intervals during peak years
- Account for injury risk (increases with age)
- Consider sample size and consistency

### Practical Applications

1. **Contract Valuation**: Estimate future production over contract length
2. **Trade Analysis**: Project remaining value for each player
3. **Draft Strategy**: Compare rookie potential to veteran certainty
4. **Roster Construction**: Balance age distribution for sustained success
5. **Playing Time Allocation**: Anticipate when younger players overtake veterans

---

## Python Implementation {#python-code}

### Complete Age Curve Analysis Pipeline

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import UnivariateSpline
import seaborn as sns

# Set style for visualizations
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 8)


class AgeAnalyzer:
    """
    NBA Age Curve Analyzer using the delta method to control for survivor bias.
    """

    def __init__(self, min_minutes=500):
        """
        Initialize analyzer with minimum playing time threshold.

        Parameters:
        -----------
        min_minutes : int
            Minimum minutes played to include a player-season
        """
        self.min_minutes = min_minutes
        self.age_curve = None
        self.peak_age = None

    def load_data(self, filepath):
        """
        Load player-season data from CSV.

        Expected columns: player_id, season, age, minutes, [performance_metrics]
        """
        self.df = pd.read_csv(filepath)

        # Filter by minimum minutes
        self.df = self.df[self.df['minutes'] >= self.min_minutes].copy()

        # Calculate age to decimal precision if birth_date available
        if 'birth_date' in self.df.columns and 'season' in self.df.columns:
            self.df['age_decimal'] = self.calculate_age_decimal(
                self.df['birth_date'], self.df['season']
            )

        # Sort by player and season
        self.df.sort_values(['player_id', 'season'], inplace=True)

        print(f"Loaded {len(self.df)} player-seasons")
        print(f"Age range: {self.df['age'].min()} to {self.df['age'].max()}")

        return self.df

    @staticmethod
    def calculate_age_decimal(birth_dates, seasons):
        """
        Calculate precise age at season midpoint (typically January 1st).
        """
        birth_dates = pd.to_datetime(birth_dates)
        # January 1st of the year stored in `season`
        season_midpoints = pd.to_datetime(seasons.astype(str) + '-01-01')

        age_days = (season_midpoints - birth_dates).dt.days
        return age_days / 365.25

    def calculate_deltas(self, metric='per'):
        """
        Calculate year-over-year changes for each player using delta method.
        Parameters:
        -----------
        metric : str
            Performance metric to analyze (e.g., 'per', 'bpm', 'ws_per_48')
        """
        # shift(-1) below relies on each player's rows being in season order
        self.df = self.df.sort_values(['player_id', 'season'])

        # Create a column for next season's value
        self.df[f'{metric}_next'] = self.df.groupby('player_id')[metric].shift(-1)
        self.df['age_next'] = self.df.groupby('player_id')['age'].shift(-1)

        # Calculate delta (change from this season to next)
        self.df[f'{metric}_delta'] = self.df[f'{metric}_next'] - self.df[metric]

        # Only keep rows where player played consecutive seasons
        valid_deltas = self.df[
            (self.df['age_next'] == self.df['age'] + 1) &
            (self.df[f'{metric}_delta'].notna())
        ].copy()

        print(f"Calculated {len(valid_deltas)} valid year-over-year changes")

        return valid_deltas

    def build_age_curve(self, metric='per', smoothing=0.3):
        """
        Build age curve using delta method and spline smoothing.

        Parameters:
        -----------
        metric : str
            Performance metric to analyze
        smoothing : float
            Smoothing parameter for spline (0 = no smoothing, higher = more smooth)
        """
        # Calculate deltas
        deltas = self.calculate_deltas(metric)

        # Group by the age the player reached (age_next), so each mean delta is
        # Performance(age X) - Performance(age X-1), as in the delta formula.
        # Grouping by the starting age would shift the curve one year early.
        age_deltas = deltas.groupby('age_next').agg({
            f'{metric}_delta': ['mean', 'std', 'count']
        }).reset_index()

        age_deltas.columns = ['age', 'delta_mean', 'delta_std', 'count']
        # Ages are whole numbers here (the consecutive-season check requires it)
        age_deltas['age'] = age_deltas['age'].astype(int)

        # Filter ages with sufficient sample size
        age_deltas = age_deltas[age_deltas['count'] >= 10].copy()

        # Build cumulative curve starting from the youngest age in the data
        age_deltas.sort_values('age', inplace=True)
        age_deltas['cumulative_change'] = age_deltas['delta_mean'].cumsum()

        # Normalize so the peak age (largest cumulative change) sits at 100
        peak_idx = age_deltas['cumulative_change'].idxmax()
        peak_value = age_deltas.loc[peak_idx, 'cumulative_change']
        self.peak_age = age_deltas.loc[peak_idx, 'age']

        # The factor of 10 is an arbitrary display scale: one unit of the
        # metric maps to 10 index points, so this is not a true percentage
        age_deltas['normalized_performance'] = (
            100 + (age_deltas['cumulative_change'] - peak_value) * 10
        )

        # Apply smoothing with spline
        if smoothing > 0:
            spline = UnivariateSpline(
                age_deltas['age'],
                age_deltas['normalized_performance'],
                s=smoothing
            )
            age_deltas['smoothed_performance'] = spline(age_deltas['age'])
        else:
            age_deltas['smoothed_performance'] = age_deltas['normalized_performance']

        self.age_curve = age_deltas

        print(f"Peak age: {self.peak_age}")
        print(f"Age curve built for ages {age_deltas['age'].min()} to {age_deltas['age'].max()}")

        return self.age_curve

    def plot_age_curve(self, metric='per', save_path=None):
        """
        Visualize the age curve with confidence intervals.
        """
        if self.age_curve is None:
            raise ValueError("Must build age curve first using build_age_curve()")

        fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 10))

        # Plot 1: Age Curve
        ax1.plot(
            self.age_curve['age'],
            self.age_curve['smoothed_performance'],
            linewidth=3,
            color='#1f77b4',
            label='Age Curve (Smoothed)'
        )

        ax1.scatter(
            self.age_curve['age'],
            self.age_curve['normalized_performance'],
            alpha=0.5,
            s=50,
            color='#ff7f0e',
            label='Actual Data'
        )

        # Mark peak age
        ax1.axvline(
            self.peak_age,
            color='red',
            linestyle='--',
            alpha=0.7,
            label=f'Peak Age: {self.peak_age:.0f}'
        )

        ax1.axhline(100, color='gray', linestyle=':', alpha=0.5)

        ax1.set_xlabel('Age', fontsize=12, fontweight='bold')
        ax1.set_ylabel('Relative Performance (%)', fontsize=12, fontweight='bold')
        ax1.set_title(
            f'NBA Age Curve - {metric.upper()}',
            fontsize=14,
            fontweight='bold'
        )
        ax1.legend(loc='best')
        ax1.grid(True, alpha=0.3)

        # Plot 2: Year-over-Year Change
        ax2.bar(
            self.age_curve['age'],
            self.age_curve['delta_mean'],
            color=['green' if x >= 0 else 'red' for x in self.age_curve['delta_mean']],
            alpha=0.6
        )

        # Add error bars (standard error of the mean delta)
        ax2.errorbar(
            self.age_curve['age'],
            self.age_curve['delta_mean'],
            yerr=self.age_curve['delta_std'] / np.sqrt(self.age_curve['count']),
            fmt='none',
            ecolor='black',
            alpha=0.3,
            capsize=3
        )

        ax2.axhline(0, color='black', linestyle='-', linewidth=1)
        ax2.set_xlabel('Age', fontsize=12, fontweight='bold')
        ax2.set_ylabel('Average Year-over-Year Change', fontsize=12, fontweight='bold')
        ax2.set_title('Annual Performance Change by Age',
                      fontsize=14, fontweight='bold')
        ax2.grid(True, alpha=0.3)

        plt.tight_layout()

        if save_path:
            plt.savefig(save_path, dpi=300, bbox_inches='tight')
            print(f"Plot saved to {save_path}")

        plt.show()

        return fig

    def get_age_adjustment(self, from_age, to_age):
        """
        Get performance adjustment factor from one age to another.

        Parameters:
        -----------
        from_age : float
            Current age
        to_age : float
            Target age for projection

        Returns:
        --------
        float
            Multiplicative adjustment factor
        """
        if self.age_curve is None:
            raise ValueError("Must build age curve first")

        # Get performance at both ages (interpolate if needed)
        from_perf = np.interp(
            from_age,
            self.age_curve['age'],
            self.age_curve['smoothed_performance']
        )

        to_perf = np.interp(
            to_age,
            self.age_curve['age'],
            self.age_curve['smoothed_performance']
        )

        adjustment = to_perf / from_perf

        return adjustment

    def project_performance(self, current_age, current_performance,
                            target_age, regression_factor=0.3):
        """
        Project future performance using age curve and regression to mean.
        Parameters:
        -----------
        current_age : float
            Player's current age
        current_performance : float
            Player's current performance level
        target_age : float
            Age to project to
        regression_factor : float
            Weight of regression to mean (0 = no regression, 1 = full regression)

        Returns:
        --------
        dict
            Projection results including point estimate and confidence interval
        """
        # Get age adjustment
        age_factor = self.get_age_adjustment(current_age, target_age)

        # Calculate league average for regression (PER-specific; falls back to 15.0)
        league_avg = self.df['per'].mean() if 'per' in self.df.columns else 15.0

        # Apply regression to mean
        regressed_performance = (
            current_performance * (1 - regression_factor) +
            league_avg * regression_factor
        )

        # Apply age adjustment
        projected = regressed_performance * age_factor

        # Estimate confidence interval (simplified approach)
        age_diff = abs(target_age - current_age)
        uncertainty = 0.05 * age_diff  # 5% uncertainty per year

        lower_bound = projected * (1 - uncertainty)
        upper_bound = projected * (1 + uncertainty)

        return {
            'projected_performance': projected,
            'age_adjustment_factor': age_factor,
            'lower_bound': lower_bound,
            'upper_bound': upper_bound,
            'confidence_level': max(0.5, 0.95 - age_diff * 0.05)
        }


class PositionAgeCurves:
    """
    Analyze age curves separately by position.
    """

    def __init__(self, df, positions=('PG', 'SG', 'SF', 'PF', 'C')):
        """
        Initialize with player data including position.
        """
        self.df = df
        # A tuple default avoids sharing one mutable list across instances
        self.positions = list(positions)
        self.curves = {}

    def build_position_curves(self, metric='per'):
        """
        Build separate age curves for each position.
        """
        for pos in self.positions:
            pos_df = self.df[self.df['position'] == pos].copy()

            analyzer = AgeAnalyzer()
            analyzer.df = pos_df

            curve = analyzer.build_age_curve(metric)
            self.curves[pos] = {
                'curve': curve,
                'peak_age': analyzer.peak_age
            }

        return self.curves

    def plot_position_comparison(self, save_path=None):
        """
        Plot age curves for all positions on same chart.
        """
        fig, ax = plt.subplots(figsize=(14, 8))

        colors = {
            'PG': '#1f77b4',
            'SG': '#ff7f0e',
            'SF': '#2ca02c',
            'PF': '#d62728',
            'C': '#9467bd'
        }

        for pos in self.positions:
            if pos in self.curves:
                curve = self.curves[pos]['curve']
                peak = self.curves[pos]['peak_age']

                ax.plot(
                    curve['age'],
                    curve['smoothed_performance'],
                    linewidth=3,
                    color=colors[pos],
                    label=f'{pos} (Peak: {peak:.0f})'
                )

        ax.axhline(100, color='gray', linestyle=':', alpha=0.5)
        ax.set_xlabel('Age', fontsize=12, fontweight='bold')
        ax.set_ylabel('Relative Performance (%)', fontsize=12, fontweight='bold')
        ax.set_title(
            'NBA Age Curves by Position',
            fontsize=14,
            fontweight='bold'
        )
        ax.legend(loc='best', fontsize=10)
        ax.grid(True, alpha=0.3)

        plt.tight_layout()

        if save_path:
            plt.savefig(save_path, dpi=300, bbox_inches='tight')

        plt.show()

        return fig

    def get_peak_age_summary(self):
        """
        Return summary of peak ages by position.
        """
        summary = pd.DataFrame([
            {
                'position': pos,
                'peak_age': self.curves[pos]['peak_age'],
                'peak_performance': self.curves[pos]['curve'][
                    self.curves[pos]['curve']['age'] == self.curves[pos]['peak_age']
                ]['smoothed_performance'].values[0]
            }
            for pos in self.positions if pos in self.curves
        ])

        return summary


# Example usage
if __name__ == "__main__":
    # Example 1: Basic Age Curve Analysis
    print("=" * 60)
    print("EXAMPLE 1: Basic Age Curve Analysis")
    print("=" * 60)

    # Create sample data (in practice, load from CSV or database)
    np.random.seed(42)

    sample_data = []
    for player_id in range(1, 201):
        # Simulate a career starting between ages 19 and 22
        career_length = np.random.randint(6, 16)
        start_age = np.random.randint(19, 23)

        # Simulate skill level with peak around 27
        base_skill = np.random.normal(15, 3)

        for year in range(career_length):
            age = start_age + year

            # Age curve effect: rises toward age 27, then declines
            if age < 27:
                age_effect = 1 - (27 - age) * 0.02
            else:
                age_effect = 1 - (age - 27) * 0.03

            per = base_skill * age_effect + np.random.normal(0, 1.5)
            minutes = max(200, 2000 - abs(age - 27) * 50 + np.random.normal(0, 200))

            sample_data.append({
                'player_id': player_id,
                'season': 2010 + year,
                'age': age,
                'minutes': minutes,
                'per': per,
                'position': np.random.choice(['PG', 'SG', 'SF', 'PF', 'C'])
            })

    df = pd.DataFrame(sample_data)

    # Give each player one position for the whole career. The per-row draw
    # above changes position every season, which leaves too few consecutive
    # seasons per position to build the position curves in Example 3.
    df['position'] = df.groupby('player_id')['position'].transform('first')

    # Initialize analyzer
    analyzer = AgeAnalyzer(min_minutes=500)
    analyzer.df = df

    # Build age curve
    age_curve = analyzer.build_age_curve(metric='per', smoothing=0.5)

    print("\nAge Curve Summary:")
    print(age_curve[['age', 'normalized_performance', 'delta_mean', 'count']].to_string(index=False))

    # Plot results
    analyzer.plot_age_curve(metric='per')

    # Example 2: Player Projection
    print("\n" + "=" * 60)
    print("EXAMPLE 2: Multi-Year Player Projection")
    print("=" * 60)

    current_age = 25
    current_per = 20.5

    print(f"\nPlayer Stats: Age {current_age}, PER {current_per:.1f}")
    print("\nProjections:")
    print("-" * 60)

    for target_age in range(26, 36):
        projection = analyzer.project_performance(
            current_age=current_age,
            current_performance=current_per,
            target_age=target_age,
            regression_factor=0.25
        )

        print(f"Age {target_age}: {projection['projected_performance']:.2f} PER "
              f"({projection['lower_bound']:.2f} - {projection['upper_bound']:.2f}) "
              f"[{projection['confidence_level']:.0%} confidence]")

    # Example 3: Position-Specific Analysis
    print("\n" + "=" * 60)
    print("EXAMPLE 3: Position-Specific Age Curves")
    print("=" * 60)

    pos_analyzer = PositionAgeCurves(df)
    pos_analyzer.build_position_curves(metric='per')

    print("\nPeak Ages by Position:")
    peak_summary = pos_analyzer.get_peak_age_summary()
    print(peak_summary.to_string(index=False))

    pos_analyzer.plot_position_comparison()

    # Example 4: Skill-Specific Aging
    print("\n" + "=" * 60)
    print("EXAMPLE 4: Skill-Specific Age Curves")
    print("=" * 60)

    # Simulate different skills with different aging patterns
    skills_data = df.copy()

    # Shooting ages well
    skills_data['three_pt_pct'] = (
        0.30 + skills_data['age'] * 0.005 +
        np.random.normal(0, 0.03, len(skills_data))
    )
    skills_data['three_pt_pct'] = skills_data['three_pt_pct'].clip(0.20, 0.45)

    # Athleticism declines early
    skills_data['athleticism'] = (
        100 - (skills_data['age'] - 23) * 2.5 +
        np.random.normal(0, 5, len(skills_data))
    )
    skills_data['athleticism'] = skills_data['athleticism'].clip(50, 100)

    fig, axes = plt.subplots(1, 2, figsize=(16, 6))

    # Plot skill comparisons
    for skill, ax in zip(['three_pt_pct', 'athleticism'], axes):
        skill_by_age = skills_data.groupby('age')[skill].mean()

        ax.plot(skill_by_age.index, skill_by_age.values, linewidth=3, marker='o')
        ax.set_xlabel('Age', fontsize=12, fontweight='bold')
        ax.set_ylabel(skill.replace('_', ' ').title(), fontsize=12, fontweight='bold')
        ax.set_title(f'{skill.replace("_", " ").title()} vs Age',
                     fontsize=14, fontweight='bold')
        ax.grid(True, alpha=0.3)

    plt.tight_layout()
    plt.show()

    print("\nSkill-specific analysis complete!")
    print("- Three-point shooting improves with age and experience")
    print("- Athleticism peaks early and declines steadily")
```

### Simple Projection Function

```python
def simple_age_projection(current_stats, current_age, target_age,
                          position='ALL', metric='per'):
    """
    Quick projection using pre-built age curve coefficients.
    Age adjustment factors based on historical NBA data (2000-2023):
    - Relative performance indexed to age 27 peak
    """
    # Simplified age factors (indexed to 100 at peak)
    age_factors = {
        19: 85, 20: 88, 21: 91, 22: 94, 23: 96, 24: 98,
        25: 99, 26: 100, 27: 100, 28: 100, 29: 98, 30: 96,
        31: 93, 32: 90, 33: 86, 34: 82, 35: 77, 36: 72,
        37: 67, 38: 62, 39: 57, 40: 52
    }

    # Position adjustments (guards age better)
    position_modifiers = {
        'PG': 1.02,   # Age slower
        'SG': 1.01,
        'SF': 1.00,   # Baseline
        'PF': 0.99,
        'C': 0.98,    # Age faster
        'ALL': 1.00
    }

    # Ages outside 19-40 fall back to the peak factor of 100
    current_factor = age_factors.get(int(current_age), 100)
    target_factor = age_factors.get(int(target_age), 100)
    position_mod = position_modifiers.get(position, 1.00)

    # Calculate adjustment. Note: the position modifier is applied once as a
    # flat multiplier, regardless of how many years are projected.
    adjustment = (target_factor / current_factor) * position_mod

    # Project stats
    projected_stats = {k: v * adjustment for k, v in current_stats.items()}

    return projected_stats, adjustment


# Example usage
player_stats = {
    'points': 24.5,
    'rebounds': 7.2,
    'assists': 5.8,
    'per': 22.3
}

projected, factor = simple_age_projection(
    current_stats=player_stats,
    current_age=26,
    target_age=31,
    position='PG'
)

print(f"Age 26 Stats: {player_stats}")
print(f"Age 31 Projection: {projected}")
print(f"Adjustment Factor: {factor:.3f}")
```

---

## R Implementation {#r-code}

### Complete Age Curve Analysis in R

```r
# Load required libraries
library(tidyverse)
library(mgcv)       # For GAM smoothing
library(ggplot2)
library(gridExtra)
library(scales)

# Age Curve Analyzer Class (R6)
library(R6)

AgeAnalyzer <- R6Class("AgeAnalyzer",
  public = list(
    data = NULL,
    age_curve = NULL,
    peak_age = NULL,
    min_minutes = NULL,

    initialize = function(min_minutes = 500) {
      self$min_minutes <- min_minutes
      message("Age Analyzer initialized")
    },

    load_data = function(filepath) {
      # Load player-season data
      self$data <- read_csv(filepath) %>%
        filter(minutes >= self$min_minutes) %>%
        arrange(player_id, season)

      message(sprintf("Loaded %d player-seasons", nrow(self$data)))
      message(sprintf("Age range: %d to %d",
                      min(self$data$age), max(self$data$age)))

      return(self$data)
    },

    calculate_deltas = function(metric = "per") {
      # Calculate year-over-year changes using delta method
      deltas <- self$data %>%
        group_by(player_id) %>%
        arrange(season) %>%
        mutate(
          metric_next = lead(!!sym(metric)),
          age_next = lead(age),
          metric_delta = metric_next - !!sym(metric),
          consecutive = (age_next == age + 1)
        ) %>%
        ungroup() %>%
        filter(consecutive == TRUE, !is.na(metric_delta))

      message(sprintf("Calculated %d valid year-over-year changes", nrow(deltas)))

      return(deltas)
    },

    build_age_curve = function(metric = "per", smooth_method = "gam") {
      # Build age curve using delta method

      # Calculate deltas
      deltas <- self$calculate_deltas(metric)

      # Aggregate by the age the player reached (age_next), so each mean delta
      # is Performance(age X) - Performance(age X-1), as in the delta formula.
      # Grouping by the starting age would shift the curve one year early.
      age_deltas <- deltas %>%
        mutate(age = age_next) %>%
        group_by(age) %>%
        summarise(
          delta_mean = mean(metric_delta, na.rm = TRUE),
          delta_sd = sd(metric_delta, na.rm = TRUE),
          delta_se = delta_sd / sqrt(n()),
          count = n(),
          .groups = 'drop'
        ) %>%
        filter(count >= 10) %>%
        arrange(age) %>%
        mutate(
          cumulative_change = cumsum(delta_mean)
        )

      # Normalize to peak age
      peak_idx <- which.max(age_deltas$cumulative_change)
      peak_value <- age_deltas$cumulative_change[peak_idx]
      self$peak_age <- age_deltas$age[peak_idx]

      # The factor of 10 is an arbitrary display scale, not a true percentage
      age_deltas <- age_deltas %>%
        mutate(
          normalized_performance = 100 + (cumulative_change - peak_value) * 10
        )

      # Apply smoothing
      if (smooth_method == "gam") {
        gam_model <- gam(normalized_performance ~ s(age, k = 8),
                         data = age_deltas)
        age_deltas$smoothed_performance <- predict(gam_model, newdata = age_deltas)
      } else if (smooth_method == "loess") {
        loess_model <- loess(normalized_performance ~ age,
                             data = age_deltas, span = 0.3)
        age_deltas$smoothed_performance <- predict(loess_model, newdata = age_deltas)
      } else {
        age_deltas$smoothed_performance <- age_deltas$normalized_performance
      }

      self$age_curve <- age_deltas

      message(sprintf("Peak age: %.0f", self$peak_age))
      message(sprintf("Age curve built for ages %d to %d",
                      min(age_deltas$age), max(age_deltas$age)))

      return(self$age_curve)
    },

    plot_age_curve
      = function(metric = "per", save_path = NULL) {
      # Visualize age curve
      if (is.null(self$age_curve)) {
        stop("Must build age curve first using build_age_curve()")
      }

      # Plot 1: Age Curve
      p1 <- ggplot(self$age_curve, aes(x = age)) +
        geom_line(aes(y = smoothed_performance),
                  color = "#1f77b4", linewidth = 1.5) +
        geom_point(aes(y = normalized_performance),
                   color = "#ff7f0e", alpha = 0.6, size = 3) +
        geom_vline(xintercept = self$peak_age,
                   linetype = "dashed", color = "red", alpha = 0.7) +
        geom_hline(yintercept = 100,
                   linetype = "dotted", color = "gray", alpha = 0.5) +
        labs(
          title = sprintf("NBA Age Curve - %s", toupper(metric)),
          subtitle = sprintf("Peak Age: %.0f", self$peak_age),
          x = "Age",
          y = "Relative Performance (%)"
        ) +
        theme_minimal() +
        theme(
          plot.title = element_text(size = 16, face = "bold"),
          axis.title = element_text(size = 12, face = "bold")
        )

      # Plot 2: Year-over-Year Change
      p2 <- ggplot(self$age_curve, aes(x = age, y = delta_mean)) +
        geom_col(aes(fill = delta_mean >= 0), alpha = 0.7) +
        geom_errorbar(aes(ymin = delta_mean - delta_se,
                          ymax = delta_mean + delta_se),
                      width = 0.3, alpha = 0.5) +
        geom_hline(yintercept = 0, linetype = "solid", color = "black") +
        scale_fill_manual(values = c("TRUE" = "green", "FALSE" = "red")) +
        labs(
          title = "Annual Performance Change by Age",
          x = "Age",
          y = "Average Year-over-Year Change"
        ) +
        theme_minimal() +
        theme(
          plot.title = element_text(size = 16, face = "bold"),
          axis.title = element_text(size = 12, face = "bold"),
          legend.position = "none"
        )

      # Combine plots
      combined <- grid.arrange(p1, p2, ncol = 1)

      if (!is.null(save_path)) {
        ggsave(save_path, combined, width = 12, height = 10, dpi = 300)
        message(sprintf("Plot saved to %s", save_path))
      }

      return(combined)
    },

    get_age_adjustment = function(from_age, to_age) {
      # Get performance adjustment factor
      if (is.null(self$age_curve)) {
        stop("Must build age curve first")
      }

      # Interpolate performance at both ages
      from_perf <- approx(
        self$age_curve$age,
        self$age_curve$smoothed_performance,
        xout =
          from_age,
        rule = 2  # outside the observed ages, use the nearest end value (matches np.interp in the Python version)
      )$y

      to_perf <- approx(
        self$age_curve$age,
        self$age_curve$smoothed_performance,
        xout = to_age,
        rule = 2
      )$y

      adjustment <- to_perf / from_perf
      return(adjustment)
    },

    project_performance = function(current_age, current_performance,
                                   target_age, regression_factor = 0.3) {
      # Project future performance

      # Age adjustment
      age_factor <- self$get_age_adjustment(current_age, target_age)

      # Regression to mean
      league_avg <- mean(self$data$per, na.rm = TRUE)
      regressed_performance <- current_performance * (1 - regression_factor) +
        league_avg * regression_factor

      # Apply age adjustment
      projected <- regressed_performance * age_factor

      # Confidence interval (simplified: 5% uncertainty per year)
      age_diff <- abs(target_age - current_age)
      uncertainty <- 0.05 * age_diff

      list(
        projected_performance = projected,
        age_adjustment_factor = age_factor,
        lower_bound = projected * (1 - uncertainty),
        upper_bound = projected * (1 + uncertainty),
        confidence_level = max(0.5, 0.95 - age_diff * 0.05)
      )
    }
  )
)

# Position-Specific Age Curves
analyze_position_curves <- function(data,
                                    positions = c('PG', 'SG', 'SF', 'PF', 'C'),
                                    metric = 'per') {
  # Build age curves for each position
  curves_list <- list()

  for (pos in positions) {
    pos_data <- data %>% filter(position == pos)

    if (nrow(pos_data) < 100) {
      message(sprintf("Skipping %s - insufficient data", pos))
      next
    }

    analyzer <- AgeAnalyzer$new()
    analyzer$data <- pos_data

    curve <- analyzer$build_age_curve(metric)

    curves_list[[pos]] <- list(
      curve = curve,
      peak_age = analyzer$peak_age
    )
  }

  return(curves_list)
}

plot_position_comparison <- function(curves_list, save_path = NULL) {
  # Plot all position curves together

  # Combine all curves into one dataframe
  combined_curves <- bind_rows(
    lapply(names(curves_list), function(pos) {
      curves_list[[pos]]$curve %>%
        mutate(position = pos)
    })
  )

  # Create plot
  p <- ggplot(combined_curves,
              aes(x = age, y = smoothed_performance,
                  color = position, group = position)) +
    geom_line(linewidth = 1.5) +
    geom_hline(yintercept = 100, linetype = "dotted",
               color = "gray", alpha = 0.5) +
    scale_color_manual(
      values = c(
        'PG' = '#1f77b4',
        'SG' = '#ff7f0e',
        'SF' = '#2ca02c',
        'PF' = '#d62728',
        'C' = '#9467bd'
      ),
      labels = function(x) {
        peak_ages <- sapply(x, function(pos) {
          sprintf("%s (Peak: %.0f)", pos, curves_list[[pos]]$peak_age)
        })
        return(peak_ages)
      }
    ) +
    labs(
      title = "NBA Age Curves by Position",
      x = "Age",
      y = "Relative Performance (%)",
      color = "Position"
    ) +
    theme_minimal() +
    theme(
      plot.title = element_text(size = 16, face = "bold"),
      axis.title = element_text(size = 12, face = "bold"),
      legend.position = "right"
    )

  if (!is.null(save_path)) {
    ggsave(save_path, p, width = 14, height = 8, dpi = 300)
  }

  print(p)
  return(p)
}

# Example Usage
if (interactive()) {
  # Generate sample data
  set.seed(42)

  sample_data <- tibble()

  for (player_id in 1:200) {
    career_length <- sample(6:16, 1)
    start_age <- sample(19:23, 1)
    base_skill <- rnorm(1, mean = 15, sd = 3)
    position <- sample(c('PG', 'SG', 'SF', 'PF', 'C'), 1)

    for (year in 0:(career_length - 1)) {
      age <- start_age + year

      # Age curve effect: rises toward age 27, then declines
      if (age < 27) {
        age_effect <- 1 - (27 - age) * 0.02
      } else {
        age_effect <- 1 - (age - 27) * 0.03
      }

      per <- base_skill * age_effect + rnorm(1, 0, 1.5)
      minutes <- max(200, 2000 - abs(age - 27) * 50 + rnorm(1, 0, 200))

      sample_data <- bind_rows(sample_data, tibble(
        player_id = player_id,
        season = 2010 + year,
        age = age,
        minutes = minutes,
        per = per,
        position = position
      ))
    }
  }

  # Example 1: Basic Age Curve
  cat("\n", strrep("=", 60), "\n")
  cat("EXAMPLE 1: Basic Age Curve Analysis\n")
  cat(strrep("=", 60), "\n\n")

  analyzer <- AgeAnalyzer$new(min_minutes = 500)
  analyzer$data <- sample_data

  age_curve <- analyzer$build_age_curve(metric = "per")
  print(age_curve)

  analyzer$plot_age_curve(metric = "per")

  # Example 2: Player Projection
  cat("\n", strrep("=", 60), "\n")
  cat("EXAMPLE 2: Multi-Year Player Projection\n")
  cat(strrep("=", 60), "\n\n")

  current_age <- 25
  current_per <- 20.5

  cat(sprintf("Player Stats: Age %d, PER %.1f\n\n", current_age, current_per))
  cat("Projections:\n")
  cat(strrep("-", 60), "\n")

  for (target_age in 26:35) {
    projection <- analyzer$project_performance(
      current_age = current_age,
      current_performance = current_per,
      target_age = target_age,
      regression_factor = 0.25
    )

    cat(sprintf("Age %d: %.2f PER (%.2f - %.2f) [%.0f%% confidence]\n",
                target_age,
                projection$projected_performance,
                projection$lower_bound,
                projection$upper_bound,
                projection$confidence_level * 100))
  }

  # Example 3: Position-Specific Analysis
  cat("\n", strrep("=", 60), "\n")
  cat("EXAMPLE 3: Position-Specific Age Curves\n")
  cat(strrep("=", 60), "\n\n")

  pos_curves <- analyze_position_curves(sample_data, metric = "per")

  cat("Peak Ages by Position:\n")
  peak_summary <- tibble(
    Position = names(pos_curves),
    Peak_Age = sapply(pos_curves, function(x) x$peak_age)
  )
  print(peak_summary)

  plot_position_comparison(pos_curves)
}

# Simple projection function
simple_age_projection <- function(current_stats, current_age, target_age,
                                  position = 'ALL') {
  # Quick projection using pre-built coefficients
  age_factors <- c(
    '19' = 85, '20' = 88, '21' = 91, '22' = 94, '23' = 96, '24' = 98,
    '25' = 99, '26' = 100, '27' = 100, '28' = 100, '29' = 98, '30' = 96,
    '31' = 93, '32' = 90, '33' = 86, '34' = 82, '35' = 77, '36' = 72,
    '37' = 67, '38' = 62, '39' = 57, '40' = 52
  )

  position_modifiers <- c(
    'PG' = 1.02, 'SG' = 1.01, 'SF' = 1.00,
    'PF' = 0.99, 'C' = 0.98, 'ALL' = 1.00
  )

  current_factor <- age_factors[as.character(round(current_age))]
  target_factor <- age_factors[as.character(round(target_age))]
  position_mod <- position_modifiers[position]

  adjustment <- (target_factor / current_factor) * position_mod

  projected_stats <- lapply(current_stats, function(x) x * adjustment)

  list(
    projected = projected_stats,
    adjustment_factor = adjustment
  )
}

# Example
player_stats <- list(
  points = 24.5,
  rebounds = 7.2,
  assists = 5.8,
  per = 22.3
)

result <- simple_age_projection(
  current_stats = player_stats,
  current_age = 26,
  target_age = 31,
  position = 'PG'
)

cat("Current Stats:", paste(names(player_stats),
                            unlist(player_stats), sep = "=", collapse = ", "), "\n")
cat("Projected Stats:", paste(names(result$projected),
                              round(unlist(result$projected), 2),
                              sep = "=", collapse = ", "), "\n")
cat("Adjustment Factor:", result$adjustment_factor, "\n")
```

---

## Advanced Considerations {#advanced}

### Survivor Bias Correction

The delta method reduces survivor bias by comparing each player only with himself, but it does not remove it entirely: a player who leaves the league after a poor season contributes no following-year delta. Additional adjustments to consider:

1. **Minutes-weighted adjustments**: Weight by playing time for more accurate curves
2. **Role changes**: Account for players changing positions or roles
3. **League evolution**: Adjust for era effects and rule changes

### Individual Variation

Not all players follow the average curve:

- **Elite players** often peak later and decline more gradually
- **Athletic specialists** peak earlier but decline faster
- **Injury history** dramatically affects individual trajectories
- **Playing style adaptations** can extend careers

### Modern NBA Considerations

Recent trends affecting age curves:

1. **Three-point revolution**: How shooting ages matters more now that the three-pointer is central to offenses
2. **Load management**: May extend peak years
3. **Sports science**: Better conditioning and injury prevention
4. **Versatility premium**: Multi-skilled players age better
5. **Pace and space**: Less physical play may benefit older players

### Contract Valuation

Use age curves to estimate a contract's surplus value, summed over the contract years:

```
Surplus Value = Σ(Projected_WAR[year] × $/WAR - Salary[year])
```

Account for:

- Risk of injury (increases with age)
- Replacement level rising over the life of the contract
- Team option years vs. guaranteed money
- Trade value depreciation

### Limitations and Cautions

1. **Small sample sizes** at extreme ages
2. **Selection bias** for players who change teams/roles
3. **Era effects** when using historical data
4. **Individual variance** can be high
5. **Injury risk** not fully captured by age alone

---

## Best Practices

### For Analysis

1. Use minimum playing time thresholds (500+ minutes)
2. Calculate age to decimal precision
3. Apply appropriate smoothing (GAM or spline)
4. Validate with out-of-sample testing
5. Update curves regularly with new data

### For Projections

1. Combine age adjustments with regression to the mean
2. Provide confidence intervals
3. Account for position and playstyle
4. Consider recent performance trends
5. Adjust for injury history

### For Decision-Making

1. Don't over-rely on average curves for elite players
2. Consider role changes and team fit
3. Weight recent seasons more heavily for older players
4. Factor in contract structure and team timeline
5. Use an ensemble of projection methods

---

## Summary

Age curves are powerful tools for understanding player development and decline:

- **Typical peak**: Ages 27-29 for overall performance
- **Skill variation**: Different abilities age at different rates
- **Position matters**: Guards age better than big men on average
- **Individual paths vary**: Elite players and specific skills deviate from average
- **Projection applications**: Contract valuation, roster construction, playing time decisions

By properly implementing age curve analysis with methods like the delta approach, teams can make more informed decisions about player acquisition, development, and long-term roster planning.

---

## References and Further Reading

### Academic Papers

- Vaci et al. (2019) - "Large Data and Bayesian Modeling—Aging Curves of NBA Players"
- Bradbury (2009) - "Peak Athletic Performance and Ageing"
- Wakim & Jin (2014) - "Fitting NBA Career Trajectories Using Nonparametric Models"

### Industry Resources

- Basketball-Reference.com - Historical player data
- Dunc'd On Podcast - Age curve discussions and applications
- Cleaning the Glass - Modern age curve analysis
- FiveThirtyEight - CARMELO projection system (uses age curves)

### Tools and Data

- NBA Stats API - Official player statistics
- Basketball Reference Scraper - Python package for historical data
- nbastatR - R package for NBA data access
- Synergy Sports - Advanced video-based metrics
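
## Appendix: Contract Surplus Value Sketch

As a companion to the Contract Valuation formula under Advanced Considerations, the sketch below combines an age-factor projection with the surplus-value sum. It is a minimal illustration under stated assumptions, not a production model: the age factors are copied from the `age_factors` vector in `simple_age_projection`, while the function names (`project_war`, `surplus_value`), the dollars-per-WAR figure, the salary schedule, and the example player are all hypothetical placeholders.

```python
# Age factors copied from the age_factors vector in simple_age_projection
# (index values, peak = 100). Only the ages needed for the example are listed.
AGE_FACTORS = {
    26: 100, 27: 100, 28: 100, 29: 98, 30: 96, 31: 93,
    32: 90, 33: 86, 34: 82, 35: 77,
}


def project_war(current_war, current_age, target_age):
    """Scale current WAR by the ratio of target-age to current-age factors."""
    return current_war * AGE_FACTORS[target_age] / AGE_FACTORS[current_age]


def surplus_value(current_war, current_age, salaries, dollars_per_war):
    """Sum (projected WAR * $/WAR - salary) over each contract year.

    salaries[0] is paid at current_age + 1, salaries[1] at current_age + 2,
    and so on. This is an assumed convention for the sketch.
    """
    total = 0.0
    for offset, salary in enumerate(salaries, start=1):
        war = project_war(current_war, current_age, current_age + offset)
        total += war * dollars_per_war - salary
    return total


if __name__ == "__main__":
    # Hypothetical inputs: a 6.0-WAR player aged 29 signing a three-year deal.
    salaries = [20_000_000, 21_000_000, 22_000_000]
    value = surplus_value(6.0, 29, salaries, dollars_per_war=4_000_000)
    print(f"Estimated surplus value: ${value:,.0f}")
```

The sketch deliberately omits the risk adjustments listed under Contract Valuation (injury risk, rising replacement level, option years, trade value), so its output is best read as an upper-level starting point rather than a valuation.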
