Nov 27, 2025
# NBA Player Age Curves: Understanding Athletic Aging and Performance Projections
Age curves are fundamental tools in basketball analytics that model how player performance changes over the course of a career. Understanding these patterns is crucial for player evaluation, contract decisions, and building sustainable rosters.
## Table of Contents
1. [What Are Age Curves?](#what-are-age-curves)
2. [Methodology for Building Age Curves](#methodology)
3. [Peak Ages by Skill Category](#peak-ages)
4. [Position-Specific Aging Patterns](#position-differences)
5. [Decline Patterns and Career Trajectories](#decline-patterns)
6. [Using Age Curves for Projections](#projections)
7. [Python Implementation](#python-code)
8. [R Implementation](#r-code)
9. [Advanced Considerations](#advanced)
---
## What Are Age Curves? {#what-are-age-curves}
Age curves represent the typical trajectory of player performance as a function of age. They help answer critical questions:
- At what age do players typically peak?
- How quickly do players decline after their peak?
- Do different skills age differently?
- How do aging patterns vary by position?
### Key Concepts
- **Delta Method**: Tracks individual player changes year-over-year, controlling for selection bias where worse players leave the league earlier.
- **Survivor Bias**: Traditional averages are inflated at older ages because only the best players remain in the league.
- **Aging vs. Development**: Separating physical decline from skill improvement and experience gains.
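A tiny worked example may make the survivor-bias point concrete. The ratings below are invented, and the names `careers`, `naive_30`, `naive_33`, and `paired_change` are illustrative only. The sketch compares a naive per-age average with a change measured on the same player at both ages (a three-year gap is used for brevity; the delta method itself works on consecutive seasons).

```python
from statistics import mean

# Invented ratings: the two weaker players are out of the league by age 33.
careers = {
    "star":    {30: 22.0, 33: 19.0},
    "starter": {30: 15.0},
    "reserve": {30: 11.0},
}

# Naive approach: average everyone who is still playing at each age.
naive_30 = mean(c[30] for c in careers.values() if 30 in c)
naive_33 = mean(c[33] for c in careers.values() if 33 in c)

# Paired approach: only players observed at both ages contribute a change.
paired_change = mean(
    c[33] - c[30] for c in careers.values() if 30 in c and 33 in c
)

print(f"Naive average at 30: {naive_30:.1f}, at 33: {naive_33:.1f}")
print(f"Paired change from 30 to 33: {paired_change:+.1f}")
```

In this toy data the naive average rises from 16.0 to 19.0 even though the only player observed at both ages lost 3.0 points, which is the distortion the delta method is meant to limit.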
---
## Methodology for Building Age Curves {#methodology}
### The Delta Method
The most robust approach tracks how individual players change from one age to the next:
```
Age Adjustment = Average(Player_Performance_Age_X - Player_Performance_Age_(X-1))
```
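This reduces survivor bias because every comparison uses the same player at two adjacent ages, so a player who leaves the league simply stops contributing deltas instead of inflating the average of those who remain.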
### Steps to Build Age Curves
1. **Collect Player-Season Data**: Gather performance metrics for multiple seasons
2. **Calculate Year-Over-Year Changes**: Compute deltas for each player
3. **Group by Age**: Aggregate changes for all players at each age
4. **Apply Cumulative Adjustments**: Build the full curve from cumulative deltas
5. **Normalize to Peak**: Set peak age performance as baseline (100%)
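The steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the full pipeline: the `seasons` data is invented, `delta_curve` is a hypothetical helper, each delta is filed under the age the player arrives at, and the result is reported in metric points relative to the peak (converting to a percentage would need a reference performance level).

```python
from collections import defaultdict

# Hypothetical player-seasons: (player_id, age, metric value). Values are invented.
seasons = [
    ("a", 24, 14.0), ("a", 25, 15.0), ("a", 26, 15.5),
    ("b", 25, 18.0), ("b", 26, 18.5), ("b", 27, 18.0),
    ("c", 24, 10.0), ("c", 25, 11.5),  # leaves the league after age 25
]

def delta_curve(rows):
    """Return {age: level relative to the peak age} built from yearly deltas."""
    # Step 1: index each player's seasons by age.
    by_player = defaultdict(dict)
    for player, age, value in rows:
        by_player[player][age] = value

    # Steps 2-3: collect changes from age X to X + 1, grouped under X + 1.
    deltas = defaultdict(list)
    for ages in by_player.values():
        for age, value in ages.items():
            if age + 1 in ages:
                deltas[age + 1].append(ages[age + 1] - value)

    # Step 4: chain the average deltas into a cumulative curve.
    level = 0.0
    curve = {min(deltas) - 1: level}
    for age in sorted(deltas):
        level += sum(deltas[age]) / len(deltas[age])
        curve[age] = level

    # Step 5: express every age relative to the peak (peak = 0).
    peak = max(curve.values())
    return {age: value - peak for age, value in curve.items()}

curve = delta_curve(seasons)
peak_age = max(curve, key=curve.get)
print(curve, peak_age)
```

Player `c` contributes a delta for age 25 and then disappears, without dragging down or inflating the estimates for later ages.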
### Data Requirements
- Minimum minutes/games played threshold (e.g., 500+ minutes)
- Multiple seasons of data (10+ years recommended)
- Consistent metric definitions across seasons
- Age calculated to decimal precision (date of birth to season midpoint)
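As a rough sketch of the last two requirements, the snippet below computes a decimal age and applies a minutes threshold. It assumes the season is labelled by the year it starts in and that 1 January of the following year serves as the midpoint; `age_at_season_midpoint`, `MIN_MINUTES`, and the sample rows are hypothetical names and values chosen for illustration.

```python
from datetime import date

MIN_MINUTES = 500  # example threshold from the list above

def age_at_season_midpoint(birth_date, season_start_year):
    """Decimal age on 1 January of the season starting in season_start_year."""
    midpoint = date(season_start_year + 1, 1, 1)
    return (midpoint - birth_date).days / 365.25

# Invented player-seasons for illustration.
rows = [
    {"player": "x", "birth": date(1995, 7, 1), "season": 2022, "minutes": 1840},
    {"player": "y", "birth": date(1999, 2, 15), "season": 2022, "minutes": 310},
]

# Keep only seasons that clear the playing-time threshold.
qualified = [r for r in rows if r["minutes"] >= MIN_MINUTES]
for r in qualified:
    age = age_at_season_midpoint(r["birth"], r["season"])
    print(f"{r['player']}: age {age:.2f}")
```

A different season-labelling convention or midpoint date would shift every age by a constant, so the choice matters less than applying it consistently.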
---
## Peak Ages by Skill Category {#peak-ages}
Different skills peak and decline at different rates:
### Typical Peak Ages
| Skill Category | Peak Age | Notes |
|---------------|----------|-------|
| **Athleticism** | 23-25 | Speed, vertical leap, explosive movements |
| **Scoring Volume** | 27-28 | Raw points per game peaks here |
| **Shooting Efficiency** | 28-30 | Experience improves shot selection |
| **Three-Point Shooting** | 29-31 | Continues improving with experience |
| **Passing/Playmaking** | 28-30 | Court vision and decision-making |
| **Rebounding** | 26-28 | Positioning improves but athleticism declines |
| **Defense** | 27-29 | Balance of athleticism and experience |
| **Overall Impact** | 27-29 | Composite of all skills |
### Skill-Specific Patterns
**Early Peak Skills**:
- Fast break points
- Transition defense
- Contest rate at rim
- Offensive rebounding
**Late Peak Skills**:
- Free throw percentage
- Three-point percentage
- Assist-to-turnover ratio
- Defensive positioning
---
## Position-Specific Aging Patterns {#position-differences}
### Point Guards
- **Peak Age**: 28-30
- **Longevity**: Often highest among positions
- **Key Factor**: Shift from athletic to cerebral game
- **Decline Pattern**: Gradual, especially for shooters
### Shooting Guards
- **Peak Age**: 27-29
- **Longevity**: Above average for good shooters
- **Key Factor**: Shooting ability ages well
- **Decline Pattern**: Moderate; athleticism matters more than for point guards
### Small Forwards
- **Peak Age**: 26-28
- **Longevity**: Moderate
- **Key Factor**: Two-way versatility most affected
- **Decline Pattern**: Steeper than guards, especially defensively
### Power Forwards
- **Peak Age**: 26-27
- **Longevity**: Varies widely by playstyle
- **Key Factor**: Stretch-4s age better than traditional bigs
- **Decline Pattern**: Shooting-dependent players last longer
### Centers
- **Peak Age**: 25-27
- **Longevity**: Traditionally shortest, improving with modern play
- **Key Factor**: Athletic centers decline faster
- **Decline Pattern**: Steep for rim-runners, gradual for skilled bigs
---
## Decline Patterns and Career Trajectories {#decline-patterns}
### Typical Decline Rates
- **Age 27-30**: 0-2% decline per year (plateau phase)
- **Age 30-33**: 3-5% decline per year (gradual decline)
- **Age 33-36**: 5-10% decline per year (steep decline)
- **Age 36+**: 10-15%+ decline per year (sharp decline)
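Because these rates compound, it can help to see what they imply over several years. The sketch below uses the midpoint of each range quoted above as an assumed annual rate (1%, 4%, 7.5%); `ANNUAL_DECLINE` and `retained_share` are hypothetical names, and ages outside the table are treated as having no decline.

```python
# Assumed annual decline rates: midpoints of the ranges quoted above.
ANNUAL_DECLINE = [
    (27, 30, 0.01),   # plateau phase
    (30, 33, 0.04),   # gradual decline
    (33, 36, 0.075),  # steep decline
]

def retained_share(from_age, to_age):
    """Share of performance retained when aging from from_age to to_age."""
    share = 1.0
    for age in range(from_age, to_age):
        for start, end, rate in ANNUAL_DECLINE:
            if start <= age < end:
                share *= 1 - rate
    return share

for target in (30, 33, 36):
    print(f"Age 27 -> {target}: {retained_share(27, target):.1%} retained")
```

Under these assumed midpoints, a player keeps roughly 97% of age-27 performance at 30, about 86% at 33, and about 68% at 36.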
### Career Trajectory Phases
1. **Entry (Age 19-22)**: Rapid improvement, high variance
2. **Development (Age 22-25)**: Continued improvement, stabilizing
3. **Peak (Age 25-29)**: Maximum performance plateau
4. **Gradual Decline (Age 29-33)**: Slow erosion of skills
5. **Steep Decline (Age 33+)**: Accelerating performance loss
### Factors Affecting Decline Rate
- **Injury History**: Major injuries accelerate decline
- **Playstyle**: Athleticism-dependent players decline faster
- **Usage Rate**: High-usage players may decline earlier
- **Positional Fit**: Changing role can extend careers
- **Skill Development**: Adding new skills (e.g., 3PT shooting) helps
---
## Using Age Curves for Projections {#projections}
### Basic Projection Formula
```
Projected_Performance = Current_Performance × Age_Adjustment_Factor
```
### Multi-Year Projections
For projecting multiple years ahead:
```
Year_N_Performance = Current_Performance × Π(Age_Factors[age:age+N])
```
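A short sketch of the product form follows. The year-over-year factors in `FACTORS` are invented for illustration, and `project` is a hypothetical helper; `FACTORS[a]` is taken to scale performance from age `a` to age `a + 1`.

```python
from math import prod

# Hypothetical year-over-year factors: FACTORS[a] applies from age a to a + 1.
FACTORS = {28: 1.00, 29: 0.98, 30: 0.98, 31: 0.97}

def project(current_performance, age, years):
    """Multiply current performance by each yearly factor over the horizon."""
    return current_performance * prod(FACTORS[a] for a in range(age, age + years))

one_year = project(20.0, 28, 1)    # single-year case: one factor
three_year = project(20.0, 28, 3)  # factors for ages 28, 29, and 30
print(f"{one_year:.2f} {three_year:.2f}")
```

The single-year formula is just the special case with one factor in the product.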
### Regression to Mean
Combine age adjustments with regression:
```
Projected = [(Weighted_Career_Average × Regression_Weight) +
             (Last_Season × Recent_Weight)] × Age_Factor
```
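As a worked illustration, the function below reads the age factor as applying to the whole blended baseline, which is an assumption about the intended grouping. `blended_projection` and its default weight are hypothetical, and the two weights are assumed to sum to 1.

```python
def blended_projection(career_avg, last_season, age_factor,
                       regression_weight=0.3):
    """Blend career average and last season, then apply the age factor."""
    recent_weight = 1.0 - regression_weight  # weights assumed to sum to 1
    baseline = career_avg * regression_weight + last_season * recent_weight
    return baseline * age_factor

# Invented inputs: career average 18.0, last season 21.0, age factor 0.97.
projected = blended_projection(18.0, 21.0, 0.97)
print(f"{projected:.3f}")
```

With these inputs the blended baseline is 20.1, and the age factor lowers it to about 19.5. A larger `regression_weight` pulls the projection further toward the career average.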
### Confidence Intervals
- Wider intervals for younger/older players
- Narrower intervals during peak years
- Account for injury risk (increases with age)
- Consider sample size and consistency
### Practical Applications
1. **Contract Valuation**: Estimate future production over contract length
2. **Trade Analysis**: Project remaining value for each player
3. **Draft Strategy**: Compare rookie potential to veteran certainty
4. **Roster Construction**: Balance age distribution for sustained success
5. **Playing Time Allocation**: Anticipate when younger players overtake veterans
---
## Python Implementation {#python-code}
### Complete Age Curve Analysis Pipeline
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import UnivariateSpline
import seaborn as sns

# Set style for visualizations
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 8)


class AgeAnalyzer:
    """
    NBA Age Curve Analyzer using the delta method to control for survivor bias.
    """

    def __init__(self, min_minutes=500):
        """
        Initialize analyzer with minimum playing time threshold.

        Parameters:
        -----------
        min_minutes : int
            Minimum minutes played to include a player-season
        """
        self.min_minutes = min_minutes
        self.age_curve = None
        self.peak_age = None

    def load_data(self, filepath):
        """
        Load player-season data from CSV.

        Expected columns: player_id, season, age, minutes, [performance_metrics]
        """
        self.df = pd.read_csv(filepath)

        # Filter by minimum minutes
        self.df = self.df[self.df['minutes'] >= self.min_minutes].copy()

        # Calculate age to decimal precision if birth_date available
        if 'birth_date' in self.df.columns and 'season' in self.df.columns:
            self.df['age_decimal'] = self.calculate_age_decimal(
                self.df['birth_date'],
                self.df['season']
            )

        # Sort by player and season
        self.df.sort_values(['player_id', 'season'], inplace=True)

        print(f"Loaded {len(self.df)} player-seasons")
        print(f"Age range: {self.df['age'].min()} to {self.df['age'].max()}")
        return self.df

    @staticmethod
    def calculate_age_decimal(birth_dates, seasons):
        """
        Calculate precise age at season midpoint (typically January 1st).
        """
        birth_dates = pd.to_datetime(birth_dates)
        season_midpoints = pd.to_datetime(seasons.astype(str) + '-01-01')
        age_days = (season_midpoints - birth_dates).dt.days
        return age_days / 365.25

    def calculate_deltas(self, metric='per'):
        """
        Calculate year-over-year changes for each player using delta method.

        Parameters:
        -----------
        metric : str
            Performance metric to analyze (e.g., 'per', 'bpm', 'ws_per_48')
        """
        # Create a column for next season's value
        self.df[f'{metric}_next'] = self.df.groupby('player_id')[metric].shift(-1)
        self.df['age_next'] = self.df.groupby('player_id')['age'].shift(-1)

        # Calculate delta (change from this season to next)
        self.df[f'{metric}_delta'] = self.df[f'{metric}_next'] - self.df[metric]

        # Only keep rows where player played consecutive seasons
        valid_deltas = self.df[
            (self.df['age_next'] == self.df['age'] + 1) &
            (self.df[f'{metric}_delta'].notna())
        ].copy()

        print(f"Calculated {len(valid_deltas)} valid year-over-year changes")
        return valid_deltas

    def build_age_curve(self, metric='per', smoothing=0.3):
        """
        Build age curve using delta method and spline smoothing.

        Parameters:
        -----------
        metric : str
            Performance metric to analyze
        smoothing : float
            Smoothing parameter for spline (0 = no smoothing, higher = more smooth)
        """
        # Calculate deltas
        deltas = self.calculate_deltas(metric)

        # Group by age and calculate average delta
        age_deltas = deltas.groupby('age').agg({
            f'{metric}_delta': ['mean', 'std', 'count']
        }).reset_index()
        age_deltas.columns = ['age', 'delta_mean', 'delta_std', 'count']

        # Filter ages with sufficient sample size
        age_deltas = age_deltas[age_deltas['count'] >= 10].copy()

        # Build cumulative curve starting from the youngest age in the data.
        # delta_mean on the row for age X is the change from X to X + 1, so the
        # running total is shifted by one row: the level at age X is the sum of
        # the deltas at all younger ages (0 at the first age). Without the
        # shift, the reported peak age would be one year too young.
        age_deltas.sort_values('age', inplace=True)
        age_deltas['cumulative_change'] = (
            age_deltas['delta_mean'].cumsum().shift(1, fill_value=0.0)
        )

        # Normalize to peak age (typically 27-28)
        peak_idx = age_deltas['cumulative_change'].idxmax()
        peak_value = age_deltas.loc[peak_idx, 'cumulative_change']
        self.peak_age = age_deltas.loc[peak_idx, 'age']

        # The factor of 10 is an arbitrary scaling from metric points to
        # index points; it should be tuned to the metric being analyzed.
        age_deltas['normalized_performance'] = (
            100 + (age_deltas['cumulative_change'] - peak_value) * 10
        )

        # Apply smoothing with spline
        if smoothing > 0:
            spline = UnivariateSpline(
                age_deltas['age'],
                age_deltas['normalized_performance'],
                s=smoothing
            )
            age_deltas['smoothed_performance'] = spline(age_deltas['age'])
        else:
            age_deltas['smoothed_performance'] = age_deltas['normalized_performance']

        self.age_curve = age_deltas

        print(f"Peak age: {self.peak_age}")
        print(f"Age curve built for ages {age_deltas['age'].min()} to {age_deltas['age'].max()}")
        return self.age_curve

    def plot_age_curve(self, metric='per', save_path=None):
        """
        Visualize the age curve with confidence intervals.
        """
        if self.age_curve is None:
            raise ValueError("Must build age curve first using build_age_curve()")

        fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 10))

        # Plot 1: Age Curve
        ax1.plot(
            self.age_curve['age'],
            self.age_curve['smoothed_performance'],
            linewidth=3,
            color='#1f77b4',
            label='Age Curve (Smoothed)'
        )
        ax1.scatter(
            self.age_curve['age'],
            self.age_curve['normalized_performance'],
            alpha=0.5,
            s=50,
            color='#ff7f0e',
            label='Actual Data'
        )

        # Mark peak age
        ax1.axvline(
            self.peak_age,
            color='red',
            linestyle='--',
            alpha=0.7,
            label=f'Peak Age: {self.peak_age:.0f}'
        )
        ax1.axhline(
            100,
            color='gray',
            linestyle=':',
            alpha=0.5
        )
        ax1.set_xlabel('Age', fontsize=12, fontweight='bold')
        ax1.set_ylabel('Relative Performance (%)', fontsize=12, fontweight='bold')
        ax1.set_title(
            f'NBA Age Curve - {metric.upper()}',
            fontsize=14,
            fontweight='bold'
        )
        ax1.legend(loc='best')
        ax1.grid(True, alpha=0.3)

        # Plot 2: Year-over-Year Change
        ax2.bar(
            self.age_curve['age'],
            self.age_curve['delta_mean'],
            color=['green' if x >= 0 else 'red' for x in self.age_curve['delta_mean']],
            alpha=0.6
        )

        # Add error bars (standard error of the mean delta)
        ax2.errorbar(
            self.age_curve['age'],
            self.age_curve['delta_mean'],
            yerr=self.age_curve['delta_std'] / np.sqrt(self.age_curve['count']),
            fmt='none',
            ecolor='black',
            alpha=0.3,
            capsize=3
        )
        ax2.axhline(0, color='black', linestyle='-', linewidth=1)
        ax2.set_xlabel('Age', fontsize=12, fontweight='bold')
        ax2.set_ylabel('Average Year-over-Year Change', fontsize=12, fontweight='bold')
        ax2.set_title('Annual Performance Change by Age', fontsize=14, fontweight='bold')
        ax2.grid(True, alpha=0.3)

        plt.tight_layout()

        if save_path:
            plt.savefig(save_path, dpi=300, bbox_inches='tight')
            print(f"Plot saved to {save_path}")

        plt.show()
        return fig

    def get_age_adjustment(self, from_age, to_age):
        """
        Get performance adjustment factor from one age to another.

        Parameters:
        -----------
        from_age : float
            Current age
        to_age : float
            Target age for projection

        Returns:
        --------
        float
            Multiplicative adjustment factor
        """
        if self.age_curve is None:
            raise ValueError("Must build age curve first")

        # Get performance at both ages (interpolate if needed)
        from_perf = np.interp(
            from_age,
            self.age_curve['age'],
            self.age_curve['smoothed_performance']
        )
        to_perf = np.interp(
            to_age,
            self.age_curve['age'],
            self.age_curve['smoothed_performance']
        )

        adjustment = to_perf / from_perf
        return adjustment

    def project_performance(self, current_age, current_performance, target_age,
                            regression_factor=0.3):
        """
        Project future performance using age curve and regression to mean.

        Parameters:
        -----------
        current_age : float
            Player's current age
        current_performance : float
            Player's current performance level
        target_age : float
            Age to project to
        regression_factor : float
            Weight of regression to mean (0 = no regression, 1 = full regression)

        Returns:
        --------
        dict
            Projection results including point estimate and confidence interval
        """
        # Get age adjustment
        age_factor = self.get_age_adjustment(current_age, target_age)

        # Calculate league average for regression
        league_avg = self.df['per'].mean() if 'per' in self.df.columns else 15.0

        # Apply regression to mean
        regressed_performance = (
            current_performance * (1 - regression_factor) +
            league_avg * regression_factor
        )

        # Apply age adjustment
        projected = regressed_performance * age_factor

        # Estimate confidence interval (simplified approach)
        age_diff = abs(target_age - current_age)
        uncertainty = 0.05 * age_diff  # 5% uncertainty per year
        lower_bound = projected * (1 - uncertainty)
        upper_bound = projected * (1 + uncertainty)

        return {
            'projected_performance': projected,
            'age_adjustment_factor': age_factor,
            'lower_bound': lower_bound,
            'upper_bound': upper_bound,
            'confidence_level': max(0.5, 0.95 - age_diff * 0.05)
        }


class PositionAgeCurves:
    """
    Analyze age curves separately by position.
    """

    def __init__(self, df, positions=('PG', 'SG', 'SF', 'PF', 'C')):
        """
        Initialize with player data including position.
        """
        self.df = df
        # Copy into a list so the default argument is never shared or mutated
        self.positions = list(positions)
        self.curves = {}

    def build_position_curves(self, metric='per'):
        """
        Build separate age curves for each position.
        """
        for pos in self.positions:
            pos_df = self.df[self.df['position'] == pos].copy()

            analyzer = AgeAnalyzer()
            analyzer.df = pos_df
            curve = analyzer.build_age_curve(metric)

            self.curves[pos] = {
                'curve': curve,
                'peak_age': analyzer.peak_age
            }
        return self.curves

    def plot_position_comparison(self, save_path=None):
        """
        Plot age curves for all positions on same chart.
        """
        fig, ax = plt.subplots(figsize=(14, 8))

        colors = {
            'PG': '#1f77b4',
            'SG': '#ff7f0e',
            'SF': '#2ca02c',
            'PF': '#d62728',
            'C': '#9467bd'
        }

        for pos in self.positions:
            if pos in self.curves:
                curve = self.curves[pos]['curve']
                peak = self.curves[pos]['peak_age']
                ax.plot(
                    curve['age'],
                    curve['smoothed_performance'],
                    linewidth=3,
                    color=colors[pos],
                    label=f'{pos} (Peak: {peak:.0f})'
                )

        ax.axhline(100, color='gray', linestyle=':', alpha=0.5)
        ax.set_xlabel('Age', fontsize=12, fontweight='bold')
        ax.set_ylabel('Relative Performance (%)', fontsize=12, fontweight='bold')
        ax.set_title(
            'NBA Age Curves by Position',
            fontsize=14,
            fontweight='bold'
        )
        ax.legend(loc='best', fontsize=10)
        ax.grid(True, alpha=0.3)

        plt.tight_layout()

        if save_path:
            plt.savefig(save_path, dpi=300, bbox_inches='tight')

        plt.show()
        return fig

    def get_peak_age_summary(self):
        """
        Return summary of peak ages by position.
        """
        summary = pd.DataFrame([
            {
                'position': pos,
                'peak_age': self.curves[pos]['peak_age'],
                'peak_performance': self.curves[pos]['curve'][
                    self.curves[pos]['curve']['age'] == self.curves[pos]['peak_age']
                ]['smoothed_performance'].values[0]
            }
            for pos in self.positions if pos in self.curves
        ])
        return summary


# Example usage
if __name__ == "__main__":
    # Example 1: Basic Age Curve Analysis
    print("=" * 60)
    print("EXAMPLE 1: Basic Age Curve Analysis")
    print("=" * 60)

    # Create sample data (in practice, load from CSV or database)
    np.random.seed(42)
    sample_data = []

    for player_id in range(1, 201):
        # Simulate a career starting between ages 19 and 22
        career_length = np.random.randint(6, 16)
        start_age = np.random.randint(19, 23)

        # Simulate skill level with peak around 27
        base_skill = np.random.normal(15, 3)

        # Draw the position once per player so it stays fixed across seasons
        position = np.random.choice(['PG', 'SG', 'SF', 'PF', 'C'])

        for year in range(career_length):
            age = start_age + year

            # Age curve effect: rise by 2% per year up to 27, then fall 3% per year
            if age < 27:
                age_effect = 1 - (27 - age) * 0.02
            else:
                age_effect = 1 - (age - 27) * 0.03

            per = base_skill * age_effect + np.random.normal(0, 1.5)
            minutes = max(200, 2000 - abs(age - 27) * 50 + np.random.normal(0, 200))

            sample_data.append({
                'player_id': player_id,
                'season': 2010 + year,
                'age': age,
                'minutes': minutes,
                'per': per,
                'position': position
            })

    df = pd.DataFrame(sample_data)

    # Initialize analyzer and apply the minutes threshold that load_data() would apply
    analyzer = AgeAnalyzer(min_minutes=500)
    analyzer.df = df[df['minutes'] >= analyzer.min_minutes].copy()

    # Build age curve
    age_curve = analyzer.build_age_curve(metric='per', smoothing=0.5)

    print("\nAge Curve Summary:")
    print(age_curve[['age', 'normalized_performance', 'delta_mean', 'count']].to_string(index=False))

    # Plot results
    analyzer.plot_age_curve(metric='per')

    # Example 2: Player Projection
    print("\n" + "=" * 60)
    print("EXAMPLE 2: Multi-Year Player Projection")
    print("=" * 60)

    current_age = 25
    current_per = 20.5

    print(f"\nPlayer Stats: Age {current_age}, PER {current_per:.1f}")
    print("\nProjections:")
    print("-" * 60)

    for target_age in range(26, 36):
        projection = analyzer.project_performance(
            current_age=current_age,
            current_performance=current_per,
            target_age=target_age,
            regression_factor=0.25
        )
        print(f"Age {target_age}: {projection['projected_performance']:.2f} PER "
              f"({projection['lower_bound']:.2f} - {projection['upper_bound']:.2f}) "
              f"[{projection['confidence_level']:.0%} confidence]")

    # Example 3: Position-Specific Analysis
    print("\n" + "=" * 60)
    print("EXAMPLE 3: Position-Specific Age Curves")
    print("=" * 60)

    pos_analyzer = PositionAgeCurves(df)
    pos_analyzer.build_position_curves(metric='per')

    print("\nPeak Ages by Position:")
    peak_summary = pos_analyzer.get_peak_age_summary()
    print(peak_summary.to_string(index=False))

    pos_analyzer.plot_position_comparison()

    # Example 4: Skill-Specific Aging
    print("\n" + "=" * 60)
    print("EXAMPLE 4: Skill-Specific Age Curves")
    print("=" * 60)

    # Simulate different skills with different aging patterns
    skills_data = df.copy()

    # Shooting ages well
    skills_data['three_pt_pct'] = 0.30 + skills_data['age'] * 0.005 + np.random.normal(0, 0.03, len(skills_data))
    skills_data['three_pt_pct'] = skills_data['three_pt_pct'].clip(0.20, 0.45)

    # Athleticism declines early
    skills_data['athleticism'] = 100 - (skills_data['age'] - 23) * 2.5 + np.random.normal(0, 5, len(skills_data))
    skills_data['athleticism'] = skills_data['athleticism'].clip(50, 100)

    fig, axes = plt.subplots(1, 2, figsize=(16, 6))

    # Plot skill comparisons
    for skill, ax in zip(['three_pt_pct', 'athleticism'], axes):
        skill_by_age = skills_data.groupby('age')[skill].mean()
        ax.plot(skill_by_age.index, skill_by_age.values, linewidth=3, marker='o')
        ax.set_xlabel('Age', fontsize=12, fontweight='bold')
        ax.set_ylabel(skill.replace('_', ' ').title(), fontsize=12, fontweight='bold')
        ax.set_title(f'{skill.replace("_", " ").title()} vs Age', fontsize=14, fontweight='bold')
        ax.grid(True, alpha=0.3)

    plt.tight_layout()
    plt.show()

    print("\nSkill-specific analysis complete!")
    print("- Three-point shooting improves with age and experience")
    print("- Athleticism peaks early and declines steadily")
```
### Simple Projection Function
```python
def simple_age_projection(current_stats, current_age, target_age,
                          position='ALL', metric='per'):
    """
    Quick projection using pre-built age curve coefficients.

    Age adjustment factors based on historical NBA data (2000-2023):
    - Relative performance indexed to age 27 peak

    Note: `metric` is accepted for interface symmetry but is not used here.
    """
    # Simplified age factors (indexed to 100 at peak)
    age_factors = {
        19: 85, 20: 88, 21: 91, 22: 94, 23: 96, 24: 98,
        25: 99, 26: 100, 27: 100, 28: 100, 29: 98, 30: 96,
        31: 93, 32: 90, 33: 86, 34: 82, 35: 77, 36: 72,
        37: 67, 38: 62, 39: 57, 40: 52
    }

    # Position adjustments (guards age better).
    # The modifier is applied once, regardless of how many years are projected.
    position_modifiers = {
        'PG': 1.02,  # Age slower
        'SG': 1.01,
        'SF': 1.00,  # Baseline
        'PF': 0.99,
        'C': 0.98,   # Age faster
        'ALL': 1.00
    }

    # Ages outside the table fall back to the peak index of 100
    current_factor = age_factors.get(int(current_age), 100)
    target_factor = age_factors.get(int(target_age), 100)
    position_mod = position_modifiers.get(position, 1.00)

    # Calculate adjustment
    adjustment = (target_factor / current_factor) * position_mod

    # Project stats
    projected_stats = {k: v * adjustment for k, v in current_stats.items()}
    return projected_stats, adjustment


# Example usage
player_stats = {
    'points': 24.5,
    'rebounds': 7.2,
    'assists': 5.8,
    'per': 22.3
}

projected, factor = simple_age_projection(
    current_stats=player_stats,
    current_age=26,
    target_age=31,
    position='PG'
)

print(f"Age 26 Stats: {player_stats}")
print(f"Age 31 Projection: {projected}")
print(f"Adjustment Factor: {factor:.3f}")
```
---
## R Implementation {#r-code}
### Complete Age Curve Analysis in R
```r
# Load required libraries
library(tidyverse)
library(mgcv) # For GAM smoothing
library(ggplot2)
library(gridExtra)
library(scales)
# Age Curve Analyzer Class (R6)
library(R6)
AgeAnalyzer <- R6Class("AgeAnalyzer",
  public = list(
    data = NULL,
    age_curve = NULL,
    peak_age = NULL,
    min_minutes = NULL,

    initialize = function(min_minutes = 500) {
      self$min_minutes <- min_minutes
      message("Age Analyzer initialized")
    },

    load_data = function(filepath) {
      # Load player-season data
      self$data <- read_csv(filepath) %>%
        filter(minutes >= self$min_minutes) %>%
        arrange(player_id, season)

      message(sprintf("Loaded %d player-seasons", nrow(self$data)))
      message(sprintf("Age range: %d to %d",
                      min(self$data$age), max(self$data$age)))
      return(self$data)
    },

    calculate_deltas = function(metric = "per") {
      # Calculate year-over-year changes using delta method
      deltas <- self$data %>%
        group_by(player_id) %>%
        arrange(season) %>%
        mutate(
          metric_next = lead(!!sym(metric)),
          age_next = lead(age),
          metric_delta = metric_next - !!sym(metric),
          consecutive = (age_next == age + 1)
        ) %>%
        ungroup() %>%
        filter(consecutive == TRUE, !is.na(metric_delta))

      message(sprintf("Calculated %d valid year-over-year changes", nrow(deltas)))
      return(deltas)
    },

    build_age_curve = function(metric = "per", smooth_method = "gam") {
      # Build age curve using delta method

      # Calculate deltas
      deltas <- self$calculate_deltas(metric)

      # Aggregate by age.
      # delta_mean on the row for age X is the change from X to X + 1, so the
      # running total is lagged by one row: the level at age X is the sum of
      # the deltas at all younger ages (0 at the first age).
      age_deltas <- deltas %>%
        group_by(age) %>%
        summarise(
          delta_mean = mean(metric_delta, na.rm = TRUE),
          delta_sd = sd(metric_delta, na.rm = TRUE),
          delta_se = delta_sd / sqrt(n()),
          count = n(),
          .groups = 'drop'
        ) %>%
        filter(count >= 10) %>%
        arrange(age) %>%
        mutate(
          cumulative_change = dplyr::lag(cumsum(delta_mean), default = 0)
        )

      # Normalize to peak age
      peak_idx <- which.max(age_deltas$cumulative_change)
      peak_value <- age_deltas$cumulative_change[peak_idx]
      self$peak_age <- age_deltas$age[peak_idx]

      # The factor of 10 is an arbitrary scaling from metric points to index points
      age_deltas <- age_deltas %>%
        mutate(
          normalized_performance = 100 + (cumulative_change - peak_value) * 10
        )

      # Apply smoothing
      if (smooth_method == "gam") {
        gam_model <- gam(normalized_performance ~ s(age, k = 8), data = age_deltas)
        age_deltas$smoothed_performance <- predict(gam_model, newdata = age_deltas)
      } else if (smooth_method == "loess") {
        loess_model <- loess(normalized_performance ~ age, data = age_deltas, span = 0.3)
        age_deltas$smoothed_performance <- predict(loess_model, newdata = age_deltas)
      } else {
        age_deltas$smoothed_performance <- age_deltas$normalized_performance
      }

      self$age_curve <- age_deltas

      message(sprintf("Peak age: %.0f", self$peak_age))
      message(sprintf("Age curve built for ages %d to %d",
                      min(age_deltas$age), max(age_deltas$age)))
      return(self$age_curve)
    },

    plot_age_curve = function(metric = "per", save_path = NULL) {
      # Visualize age curve
      if (is.null(self$age_curve)) {
        stop("Must build age curve first using build_age_curve()")
      }

      # Plot 1: Age Curve
      p1 <- ggplot(self$age_curve, aes(x = age)) +
        geom_line(aes(y = smoothed_performance),
                  color = "#1f77b4", linewidth = 1.5) +
        geom_point(aes(y = normalized_performance),
                   color = "#ff7f0e", alpha = 0.6, size = 3) +
        geom_vline(xintercept = self$peak_age,
                   linetype = "dashed", color = "red", alpha = 0.7) +
        geom_hline(yintercept = 100,
                   linetype = "dotted", color = "gray", alpha = 0.5) +
        labs(
          title = sprintf("NBA Age Curve - %s", toupper(metric)),
          subtitle = sprintf("Peak Age: %.0f", self$peak_age),
          x = "Age",
          y = "Relative Performance (%)"
        ) +
        theme_minimal() +
        theme(
          plot.title = element_text(size = 16, face = "bold"),
          axis.title = element_text(size = 12, face = "bold")
        )

      # Plot 2: Year-over-Year Change
      p2 <- ggplot(self$age_curve, aes(x = age, y = delta_mean)) +
        geom_col(aes(fill = delta_mean >= 0), alpha = 0.7) +
        geom_errorbar(aes(ymin = delta_mean - delta_se,
                          ymax = delta_mean + delta_se),
                      width = 0.3, alpha = 0.5) +
        geom_hline(yintercept = 0, linetype = "solid", color = "black") +
        scale_fill_manual(values = c("TRUE" = "green", "FALSE" = "red")) +
        labs(
          title = "Annual Performance Change by Age",
          x = "Age",
          y = "Average Year-over-Year Change"
        ) +
        theme_minimal() +
        theme(
          plot.title = element_text(size = 16, face = "bold"),
          axis.title = element_text(size = 12, face = "bold"),
          legend.position = "none"
        )

      # Combine plots
      combined <- grid.arrange(p1, p2, ncol = 1)

      if (!is.null(save_path)) {
        ggsave(save_path, combined, width = 12, height = 10, dpi = 300)
        message(sprintf("Plot saved to %s", save_path))
      }
      return(combined)
    },

    get_age_adjustment = function(from_age, to_age) {
      # Get performance adjustment factor
      if (is.null(self$age_curve)) {
        stop("Must build age curve first")
      }

      # Interpolate performance at both ages
      from_perf <- approx(
        self$age_curve$age,
        self$age_curve$smoothed_performance,
        xout = from_age
      )$y
      to_perf <- approx(
        self$age_curve$age,
        self$age_curve$smoothed_performance,
        xout = to_age
      )$y

      adjustment <- to_perf / from_perf
      return(adjustment)
    },

    project_performance = function(current_age, current_performance, target_age,
                                   regression_factor = 0.3) {
      # Project future performance

      # Age adjustment
      age_factor <- self$get_age_adjustment(current_age, target_age)

      # Regression to mean
      league_avg <- mean(self$data$per, na.rm = TRUE)
      regressed_performance <- current_performance * (1 - regression_factor) +
        league_avg * regression_factor

      # Apply age adjustment
      projected <- regressed_performance * age_factor

      # Confidence interval
      age_diff <- abs(target_age - current_age)
      uncertainty <- 0.05 * age_diff

      list(
        projected_performance = projected,
        age_adjustment_factor = age_factor,
        lower_bound = projected * (1 - uncertainty),
        upper_bound = projected * (1 + uncertainty),
        confidence_level = max(0.5, 0.95 - age_diff * 0.05)
      )
    }
  )
)

# Position-Specific Age Curves
analyze_position_curves <- function(data, positions = c('PG', 'SG', 'SF', 'PF', 'C'),
                                    metric = 'per') {
  # Build age curves for each position
  curves_list <- list()

  for (pos in positions) {
    pos_data <- data %>% filter(position == pos)

    if (nrow(pos_data) < 100) {
      message(sprintf("Skipping %s - insufficient data", pos))
      next
    }

    analyzer <- AgeAnalyzer$new()
    analyzer$data <- pos_data
    curve <- analyzer$build_age_curve(metric)

    curves_list[[pos]] <- list(
      curve = curve,
      peak_age = analyzer$peak_age
    )
  }
  return(curves_list)
}

plot_position_comparison <- function(curves_list, save_path = NULL) {
  # Plot all position curves together

  # Combine all curves into one dataframe
  combined_curves <- bind_rows(
    lapply(names(curves_list), function(pos) {
      curves_list[[pos]]$curve %>%
        mutate(position = pos)
    })
  )

  # Create plot
  p <- ggplot(combined_curves, aes(x = age, y = smoothed_performance,
                                   color = position, group = position)) +
    geom_line(linewidth = 1.5) +
    geom_hline(yintercept = 100, linetype = "dotted", color = "gray", alpha = 0.5) +
    scale_color_manual(
      values = c(
        'PG' = '#1f77b4',
        'SG' = '#ff7f0e',
        'SF' = '#2ca02c',
        'PF' = '#d62728',
        'C' = '#9467bd'
      ),
      labels = function(x) {
        peak_ages <- sapply(x, function(pos) {
          sprintf("%s (Peak: %.0f)", pos, curves_list[[pos]]$peak_age)
        })
        return(peak_ages)
      }
    ) +
    labs(
      title = "NBA Age Curves by Position",
      x = "Age",
      y = "Relative Performance (%)",
      color = "Position"
    ) +
    theme_minimal() +
    theme(
      plot.title = element_text(size = 16, face = "bold"),
      axis.title = element_text(size = 12, face = "bold"),
      legend.position = "right"
    )

  if (!is.null(save_path)) {
    ggsave(save_path, p, width = 14, height = 8, dpi = 300)
  }

  print(p)
  return(p)
}

# Example Usage
if (interactive()) {
  # Generate sample data
  set.seed(42)
  sample_data <- tibble()

  for (player_id in 1:200) {
    career_length <- sample(6:16, 1)
    start_age <- sample(19:23, 1)
    base_skill <- rnorm(1, mean = 15, sd = 3)
    position <- sample(c('PG', 'SG', 'SF', 'PF', 'C'), 1)

    for (year in 0:(career_length - 1)) {
      age <- start_age + year

      # Age curve effect: rise by 2% per year up to 27, then fall 3% per year
      if (age < 27) {
        age_effect <- 1 - (27 - age) * 0.02
      } else {
        age_effect <- 1 - (age - 27) * 0.03
      }

      per <- base_skill * age_effect + rnorm(1, 0, 1.5)
      minutes <- max(200, 2000 - abs(age - 27) * 50 + rnorm(1, 0, 200))

      sample_data <- bind_rows(sample_data, tibble(
        player_id = player_id,
        season = 2010 + year,
        age = age,
        minutes = minutes,
        per = per,
        position = position
      ))
    }
  }

  # Example 1: Basic Age Curve
  cat("\n", strrep("=", 60), "\n")
  cat("EXAMPLE 1: Basic Age Curve Analysis\n")
  cat(strrep("=", 60), "\n\n")

  analyzer <- AgeAnalyzer$new(min_minutes = 500)
  analyzer$data <- sample_data
  age_curve <- analyzer$build_age_curve(metric = "per")
  print(age_curve)
  analyzer$plot_age_curve(metric = "per")

  # Example 2: Player Projection
  cat("\n", strrep("=", 60), "\n")
  cat("EXAMPLE 2: Multi-Year Player Projection\n")
  cat(strrep("=", 60), "\n\n")

  current_age <- 25
  current_per <- 20.5

  cat(sprintf("Player Stats: Age %d, PER %.1f\n\n", current_age, current_per))
  cat("Projections:\n")
  cat(strrep("-", 60), "\n")

  for (target_age in 26:35) {
    projection <- analyzer$project_performance(
      current_age = current_age,
      current_performance = current_per,
      target_age = target_age,
      regression_factor = 0.25
    )
    cat(sprintf("Age %d: %.2f PER (%.2f - %.2f) [%.0f%% confidence]\n",
                target_age,
                projection$projected_performance,
                projection$lower_bound,
                projection$upper_bound,
                projection$confidence_level * 100))
  }

  # Example 3: Position-Specific Analysis
  cat("\n", strrep("=", 60), "\n")
  cat("EXAMPLE 3: Position-Specific Age Curves\n")
  cat(strrep("=", 60), "\n\n")

  pos_curves <- analyze_position_curves(sample_data, metric = "per")

  cat("Peak Ages by Position:\n")
  peak_summary <- tibble(
    Position = names(pos_curves),
    Peak_Age = sapply(pos_curves, function(x) x$peak_age)
  )
  print(peak_summary)

  plot_position_comparison(pos_curves)
}

# Simple projection function
simple_age_projection <- function(current_stats, current_age, target_age,
                                  position = 'ALL') {
  # Quick projection using pre-built coefficients
  age_factors <- c(
    '19' = 85, '20' = 88, '21' = 91, '22' = 94, '23' = 96, '24' = 98,
    '25' = 99, '26' = 100, '27' = 100, '28' = 100, '29' = 98, '30' = 96,
    '31' = 93, '32' = 90, '33' = 86, '34' = 82, '35' = 77, '36' = 72,
    '37' = 67, '38' = 62, '39' = 57, '40' = 52
  )

  # The position modifier is applied once, regardless of the projection horizon
  position_modifiers <- c(
    'PG' = 1.02, 'SG' = 1.01, 'SF' = 1.00, 'PF' = 0.99, 'C' = 0.98, 'ALL' = 1.00
  )

  # unname() drops the lookup names so the result is a plain number;
  # ages outside 19-40 or unknown positions yield NA
  current_factor <- unname(age_factors[as.character(round(current_age))])
  target_factor <- unname(age_factors[as.character(round(target_age))])
  position_mod <- unname(position_modifiers[position])

  adjustment <- (target_factor / current_factor) * position_mod
  projected_stats <- lapply(current_stats, function(x) x * adjustment)

  list(
    projected = projected_stats,
    adjustment_factor = adjustment
  )
}

# Example
player_stats <- list(
  points = 24.5,
  rebounds = 7.2,
  assists = 5.8,
  per = 22.3
)

result <- simple_age_projection(
  current_stats = player_stats,
  current_age = 26,
  target_age = 31,
  position = 'PG'
)

cat("Current Stats:", paste(names(player_stats), unlist(player_stats), sep = "=", collapse = ", "), "\n")
cat("Projected Stats:", paste(names(result$projected), unlist(result$projected), sep = "=", collapse = ", "), "\n")
cat("Adjustment Factor:", result$adjustment_factor, "\n")
```
---
## Advanced Considerations {#advanced}
### Survivor Bias Correction
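The delta method reduces survivor bias but does not remove it entirely: players whose performance drops sharply often lose minutes or leave the league, so their declines go unobserved. Additional refinements to consider: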
1. **Minutes-weighted adjustments**: Weight by playing time for more accurate curves
2. **Role changes**: Account for players changing positions or roles
3. **League evolution**: Adjust for era effects and rule changes
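
The minutes-weighted adjustment can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the tuple layout `(player_id, age, minutes, metric)` and the choice of the harmonic mean of minutes as the weight are assumptions made for this example.

```python
from collections import defaultdict


def weighted_age_deltas(seasons):
    """Minutes-weighted delta method.

    seasons: iterable of (player_id, age, minutes, metric) tuples, one per
    player-season (hypothetical layout). Returns {age: weighted mean change
    in the metric from age - 1 to age}.
    """
    by_player = defaultdict(dict)
    for player_id, age, minutes, metric in seasons:
        by_player[player_id][age] = (minutes, metric)

    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for ages in by_player.values():
        for age, (minutes, metric) in ages.items():
            prev = ages.get(age - 1)
            if prev is None:
                continue  # no consecutive season, so no delta to measure
            prev_minutes, prev_metric = prev
            if minutes + prev_minutes == 0:
                continue  # nothing to weight by
            # Harmonic mean of minutes (an assumption): a pair counts for
            # little if either season was a small sample.
            weight = 2 * minutes * prev_minutes / (minutes + prev_minutes)
            weighted_sum[age] += weight * (metric - prev_metric)
            weight_total[age] += weight

    return {age: weighted_sum[age] / weight_total[age]
            for age in sorted(weight_total) if weight_total[age] > 0}


# Made-up sample: player "a" improves, player "b" declines on fewer minutes.
sample = [
    ("a", 25, 2000, 18.0), ("a", 26, 2000, 19.0),
    ("b", 25, 500, 12.0), ("b", 26, 2000, 11.0),
]
deltas = weighted_age_deltas(sample)
print(deltas)
```

Because player "b" logged only 500 minutes at age 25, that pair contributes less to the age-26 delta than player "a" does, which is the point of the weighting.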
### Individual Variation
Not all players follow the average curve:
- **Elite players** often peak later and decline more gradually
- **Athletic specialists** peak earlier but decline faster
- **Injury history** dramatically affects individual trajectories
- **Playing style adaptations** can extend careers
### Modern NBA Considerations
Recent trends affecting age curves:
1. **Three-point revolution**: How shooting ages matters more now that three-point volume is higher
2. **Load management**: May extend peak years
3. **Sports science**: Better conditioning and injury prevention
4. **Versatility premium**: Multi-skilled players age better
5. **Pace and space**: Less physical play may benefit older players
### Contract Valuation
Use age curves to estimate surplus value:
```
Surplus Value = Σ(Projected_WAR[year] × $/WAR - Salary[year])
```
Account for:
- Risk of injury (increases with age)
- Replacement level rising over contract
- Team option years vs. guaranteed money
- Trade value depreciation
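
As a rough sketch of the surplus-value formula above, the Python function below sums projected value minus salary over the contract years. The `availability` multipliers are a hypothetical way to discount for injury risk, and every number in the example (WAR, salaries, price per win) is made up for illustration.

```python
def surplus_value(projected_war, salaries, dollars_per_war, availability=None):
    """Sum of (expected WAR x price per WAR - salary) over contract years.

    availability: optional per-year multipliers between 0 and 1 that shrink
    projected WAR to reflect injury risk (an assumption of this sketch).
    """
    if len(projected_war) != len(salaries):
        raise ValueError("need one salary per projected season")
    if availability is None:
        availability = [1.0] * len(projected_war)
    if len(availability) != len(projected_war):
        raise ValueError("need one availability value per projected season")

    total = 0.0
    for war, salary, avail in zip(projected_war, salaries, availability):
        total += war * avail * dollars_per_war - salary
    return total


# Illustrative three-year deal, all figures in millions of dollars.
value = surplus_value(
    projected_war=[5.0, 4.0, 3.0],   # declining with age
    salaries=[20.0, 21.0, 22.0],     # rising salary
    dollars_per_war=4.0,             # hypothetical price of one win
    availability=[1.0, 0.9, 0.8],    # injury risk grows each year
)
print(f"Surplus value: {value:.1f}M")
```

A negative total suggests the contract is expected to cost more than the production it buys under these assumptions. Option years and trade value are not modeled here.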
### Limitations and Cautions
1. **Small sample sizes** at extreme ages
2. **Selection bias** for players who change teams/roles
3. **Era effects** when using historical data
4. **Individual variance** can be high
5. **Injury risk** not fully captured by age alone
---
## Best Practices
### For Analysis
1. Use minimum playing time thresholds (500+ minutes)
2. Calculate age to decimal precision
3. Apply appropriate smoothing (GAM or spline)
4. Validate with out-of-sample testing
5. Update curves regularly with new data
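
Decimal ages can be computed from a birth date and a fixed reference date within each season. The sketch below assumes February 1 of the season's ending year as the reference, the convention Basketball-Reference uses for listed season ages. The function name and parameters are illustrative.

```python
from datetime import date

DAYS_PER_YEAR = 365.25  # average year length, leap years included


def decimal_age(birth_date, season_end_year, ref_month=2, ref_day=1):
    """Age in fractional years on the season's reference date."""
    reference = date(season_end_year, ref_month, ref_day)
    return (reference - birth_date).days / DAYS_PER_YEAR


# A player born in mid-July 1998, evaluated for the 2024-25 season.
age = decimal_age(date(1998, 7, 15), 2025)
print(round(age, 2))
```

Using a fractional age keeps a player who turns 27 in October distinct from one who turns 27 in June, which smooths the curve when deltas are later binned or fitted with a spline.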
### For Projections
1. Combine age adjustments with regression to mean
2. Provide confidence intervals
3. Account for position and playstyle
4. Consider recent performance trends
5. Adjust for injury history
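
The first two points can be combined in a short Python sketch. It reuses the `age_factors` table from `simple_age_projection` above, regresses the current value toward the league mean once, and widens the interval with the square root of the projection horizon. The regression step, the `base_sd` value, and the interval formula are assumptions for illustration and are not necessarily how `project_performance` works.

```python
import math

# Age factors copied from the simple_age_projection example (peak = 100).
AGE_FACTORS = {
    19: 85, 20: 88, 21: 91, 22: 94, 23: 96, 24: 98, 25: 99, 26: 100,
    27: 100, 28: 100, 29: 98, 30: 96, 31: 93, 32: 90, 33: 86, 34: 82,
    35: 77, 36: 72, 37: 67, 38: 62, 39: 57, 40: 52,
}


def project_with_interval(current_value, current_age, target_age, league_mean,
                          regression_factor=0.25, base_sd=1.5):
    """Return (point, lower, upper) for a metric at target_age.

    base_sd is a hypothetical one-year standard deviation of projection error.
    """
    years_ahead = target_age - current_age
    if years_ahead < 0:
        raise ValueError("target_age must not be below current_age")

    # Step 1: pull the observed value part of the way toward the league mean.
    regressed = current_value + regression_factor * (league_mean - current_value)
    # Step 2: scale by the ratio of age factors.
    point = regressed * AGE_FACTORS[target_age] / AGE_FACTORS[current_age]
    # Step 3: rough 95% interval that widens with the horizon.
    half_width = 1.96 * base_sd * math.sqrt(years_ahead)
    return point, point - half_width, point + half_width


# PER is scaled so that the league average is 15.
point, lower, upper = project_with_interval(20.5, 25, 30, league_mean=15.0)
print(f"Age 30: {point:.2f} PER ({lower:.2f} to {upper:.2f})")
```

Regressing before applying the age factor means an outlier season is discounted first, so the age curve acts on an estimate of true talent rather than on a possibly lucky year.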
### For Decision-Making
1. Don't over-rely on average curves for elite players
2. Consider role changes and team fit
3. Weight recent seasons more heavily for older players
4. Factor in contract structure and team timeline
5. Use an ensemble of projection methods
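
Two of these practices, weighting recent seasons more heavily and combining several projection methods, are easy to sketch in Python. The decay value, the method names, and the numbers below are hypothetical and chosen only to show the mechanics.

```python
def recency_weighted_mean(values_recent_first, decay=0.5):
    """Weighted mean where each older season counts `decay` times the previous one."""
    if not values_recent_first:
        raise ValueError("need at least one season")
    weights = [decay ** i for i in range(len(values_recent_first))]
    total = sum(w * v for w, v in zip(weights, values_recent_first))
    return total / sum(weights)


def ensemble_projection(projections, weights=None):
    """Combine point projections from several methods into one number.

    projections: {method_name: projected_value}
    weights: optional {method_name: weight}; equal weights if omitted.
    """
    if weights is None:
        weights = {name: 1.0 for name in projections}
    weight_sum = sum(weights[name] for name in projections)
    return sum(projections[name] * weights[name] for name in projections) / weight_sum


# PER over the last three seasons, most recent first (made-up values).
baseline = recency_weighted_mean([18.0, 20.0, 22.0], decay=0.5)

combined = ensemble_projection({
    "age_curve": 17.5,        # e.g. output of an age-curve projection
    "recent_form": baseline,  # recency-weighted baseline from above
    "similar_players": 18.5,  # e.g. average of comparable players
})
print(round(baseline, 2), round(combined, 2))
```

A smaller `decay` leans harder on the latest season, which suits older players whose most recent form is the best signal of where they are on the curve.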
---
## Summary
Age curves are powerful tools for understanding player development and decline:
- **Typical peak**: Ages 27-29 for overall performance
- **Skill variation**: Different abilities age at different rates
- **Position matters**: Guards age better than big men on average
- **Individual paths vary**: Elite players and specific skills deviate from average
- **Projection applications**: Contract valuation, roster construction, playing time decisions
By properly implementing age curve analysis with methods like the delta approach, teams can make more informed decisions about player acquisition, development, and long-term roster planning.
---
## References and Further Reading
### Academic Papers
- Vaci et al. (2019) - "Large Data and Bayesian Modeling—Aging Curves of NBA Players"
- Bradbury (2009) - "Peak Athletic Performance and Ageing"
- Wakim & Jin (2014) - "Fitting NBA Career Trajectories Using Nonparametric Models"
### Industry Resources
- Basketball-Reference.com - Historical player data
- Dunc'd On Podcast - Age curve discussions and applications
- Cleaning the Glass - Modern age curve analysis
- FiveThirtyEight - CARMELO projection system (uses age curves)
### Tools and Data
- NBA Stats API - Official player statistics
- Basketball Reference Scraper - Python package for historical data
- nbastatR - R package for NBA data access
- Synergy Sports - Advanced video-based metrics