In This Chapter
- Introduction
- 10.1 Raw Plus-Minus: Definition and Calculation
- 10.2 On-Court/Off-Court Splits
- 10.3 On/Off Differential Deep Dive
- 10.4 Lineup Analysis Basics
- 10.5 Net Rating Calculation
- 10.6 Why Raw Plus-Minus Is Noisy
- 10.7 Teammate and Opponent Effects
- 10.8 Sample Size Considerations
- 10.9 Introduction to Adjustment Methods
- 10.10 Practical Plus-Minus Analysis
- 10.11 Plus-Minus in Different Contexts
- 10.12 Summary and Bridge to Adjusted Metrics
- Chapter 10 Python Code Reference
- References
Chapter 10: Plus-Minus and On/Off Analysis
Introduction
In the preceding chapters, we explored individual counting statistics and rate-based metrics that capture what players do on the court—points scored, rebounds grabbed, assists made. These metrics tell us about production but leave a fundamental question unanswered: How much does a player's presence actually help their team?
Plus-minus and on/off analysis attempt to answer this question by measuring outcomes when a player is on the court versus when they are off. This conceptually simple approach—crediting or debiting players for what happens during their minutes—serves as the foundation for some of the most sophisticated player evaluation methods in modern basketball analytics.
This chapter introduces the family of plus-minus metrics, from raw calculations to lineup-based analysis. We will examine both the power and limitations of these approaches, setting the stage for the adjusted plus-minus methods covered in Part 3 of this textbook.
10.1 Raw Plus-Minus: Definition and Calculation
The Basic Concept
Raw plus-minus (sometimes called "box plus-minus" in older literature, though this term now typically refers to a different metric) measures the point differential while a player is on the court during a game.
Definition: A player's raw plus-minus equals the points scored by their team minus the points allowed by their team during the minutes that player was on the court.
$$\text{Plus-Minus} = \text{Team Points Scored}_{\text{player on court}} - \text{Team Points Allowed}_{\text{player on court}}$$
Calculation Example
Consider a simple game scenario:
| Time Period | Player A Status | Team Score | Opponent Score |
|---|---|---|---|
| Q1 (12:00-6:00) | On Court | 14 | 10 |
| Q1 (6:00-0:00) | Bench | 8 | 12 |
| Q2 (12:00-8:00) | On Court | 10 | 8 |
| Q2 (8:00-0:00) | Bench | 6 | 10 |
| Q3 (12:00-4:00) | On Court | 16 | 14 |
| Q3 (4:00-0:00) | Bench | 8 | 6 |
| Q4 (12:00-0:00) | On Court | 22 | 18 |
Player A's Plus-Minus Calculation:
- Q1 (first 6 min): +4 (14-10)
- Q2 (first 4 min): +2 (10-8)
- Q3 (first 8 min): +2 (16-14)
- Q4 (full 12 min): +4 (22-18)
Total Plus-Minus: +12
Player A was on court for 30 minutes during which their team outscored opponents by 12 points.
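The calculation above is a filtered sum over stints. A minimal Python sketch, assuming the game has already been split into stints with a known on/off flag for Player A (the tuple layout and the `raw_plus_minus` helper are illustrative choices, not a standard format):

```python
# Each stint: (player_on_court, team_points, opponent_points, minutes),
# transcribed from the Player A table above.
stints = [
    (True, 14, 10, 6),   # Q1 12:00-6:00
    (False, 8, 12, 6),   # Q1 6:00-0:00
    (True, 10, 8, 4),    # Q2 12:00-8:00
    (False, 6, 10, 8),   # Q2 8:00-0:00
    (True, 16, 14, 8),   # Q3 12:00-4:00
    (False, 8, 6, 4),    # Q3 4:00-0:00
    (True, 22, 18, 12),  # Q4 12:00-0:00
]


def raw_plus_minus(stints):
    """Sum point differential and minutes over stints with the player on court."""
    plus_minus = sum(team - opp for on, team, opp, _ in stints if on)
    minutes = sum(mins for on, _, _, mins in stints if on)
    return plus_minus, minutes


pm, mins = raw_plus_minus(stints)
print(pm, mins)  # 12 30
```

Bench stints are simply skipped; they matter only later, when on/off splits compare the two groups.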
Game-Level vs. Season-Level Plus-Minus
Plus-minus can be aggregated at multiple levels:
- Single Game: The point differential for one game
- Season Total: Sum of all game plus-minus values
- Career Total: Cumulative plus-minus across seasons
Important Note: Season totals are easy to compare, but raw totals favor players with more minutes. A player with +500 over 3,000 minutes is not necessarily more impactful than one with +300 over 1,500 minutes; per minute played, the second player's margin is actually larger.
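Normalizing to a per-48-minute rate makes that comparison explicit. A minimal sketch (the helper name `plus_minus_per_48` is our own):

```python
def plus_minus_per_48(plus_minus: float, minutes: float) -> float:
    """Scale a raw plus-minus total to a per-48-minute rate."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    # Multiply before dividing to keep the arithmetic exact for round inputs
    return plus_minus * 48 / minutes


print(plus_minus_per_48(500, 3000))  # 8.0
print(plus_minus_per_48(300, 1500))  # 9.6
```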
The Data Source Challenge
Calculating plus-minus requires play-by-play data that tracks:
- Every scoring event with timestamp
- Substitution patterns
- Which five players are on court at each moment
Before play-by-play data became widely available (roughly 2000-2001 for the NBA), plus-minus was nearly impossible to calculate systematically. The Dallas Mavericks and a few other forward-thinking franchises began tracking this data independently in the late 1990s.
10.2 On-Court/Off-Court Splits
Expanding Beyond Simple Plus-Minus
While raw plus-minus tells us the point differential during a player's minutes, on/off splits provide richer context by examining team performance rates with and without each player.
Key On/Off Metrics
Offensive Rating (On/Off):
- Points scored per 100 possessions with player on court
- Points scored per 100 possessions with player off court

Defensive Rating (On/Off):
- Points allowed per 100 possessions with player on court
- Points allowed per 100 possessions with player off court

Net Rating (On/Off):
- Net Rating On = ORtg On - DRtg On
- Net Rating Off = ORtg Off - DRtg Off
Calculation Process
To compute on/off splits, we need:
- Possession counts for player-on and player-off periods
- Points scored/allowed during each period
- Pace adjustment to normalize to per-100-possession rates
```python
def calculate_on_off_ratings(player_on_data, player_off_data):
    """
    Calculate on/off offensive and defensive ratings.

    Parameters:
    -----------
    player_on_data : dict
        Contains 'points_for', 'points_against', 'possessions' when player on court
    player_off_data : dict
        Contains 'points_for', 'points_against', 'possessions' when player off court

    Returns:
    --------
    dict : On/off ratings and differentials
    """
    # On-court ratings (per 100 possessions)
    ortg_on = (player_on_data['points_for'] / player_on_data['possessions']) * 100
    drtg_on = (player_on_data['points_against'] / player_on_data['possessions']) * 100
    net_on = ortg_on - drtg_on

    # Off-court ratings
    ortg_off = (player_off_data['points_for'] / player_off_data['possessions']) * 100
    drtg_off = (player_off_data['points_against'] / player_off_data['possessions']) * 100
    net_off = ortg_off - drtg_off

    return {
        'ortg_on': ortg_on,
        'ortg_off': ortg_off,
        'ortg_diff': ortg_on - ortg_off,
        'drtg_on': drtg_on,
        'drtg_off': drtg_off,
        'drtg_diff': drtg_on - drtg_off,  # Note: negative is good for defense
        'net_on': net_on,
        'net_off': net_off,
        'on_off_diff': net_on - net_off
    }
```
Interpreting On/Off Data
Consider these on/off splits for a hypothetical player:
| Metric | On Court | Off Court | Differential |
|---|---|---|---|
| Offensive Rating | 112.5 | 108.2 | +4.3 |
| Defensive Rating | 106.8 | 110.4 | -3.6 |
| Net Rating | +5.7 | -2.2 | +7.9 |
Interpretation:
- The team scores 4.3 more points per 100 possessions with this player on court
- The team allows 3.6 fewer points per 100 possessions (defensive improvement)
- The combined on/off differential of +7.9 suggests significant positive impact
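To connect the table to the on/off formulas, here is a self-contained sketch. `on_off_summary` is a condensed, hypothetical variant of `calculate_on_off_ratings`, and the point and possession totals (1,000 on-court and 500 off-court possessions) are invented solely so that the ratings match the table:

```python
def on_off_summary(on, off):
    """Per-100-possession on/off differentials from raw totals."""
    ortg_on = on["points_for"] / on["possessions"] * 100
    drtg_on = on["points_against"] / on["possessions"] * 100
    ortg_off = off["points_for"] / off["possessions"] * 100
    drtg_off = off["points_against"] / off["possessions"] * 100
    return {
        "ortg_diff": ortg_on - ortg_off,
        "drtg_diff": drtg_on - drtg_off,  # negative = better defense on court
        "net_on": ortg_on - drtg_on,
        "net_off": ortg_off - drtg_off,
        "on_off_diff": (ortg_on - drtg_on) - (ortg_off - drtg_off),
    }


# Hypothetical totals chosen to reproduce the table above
on_court = {"points_for": 1125, "points_against": 1068, "possessions": 1000}
off_court = {"points_for": 541, "points_against": 552, "possessions": 500}

summary = on_off_summary(on_court, off_court)
print({k: round(v, 1) for k, v in summary.items()})
# {'ortg_diff': 4.3, 'drtg_diff': -3.6, 'net_on': 5.7, 'net_off': -2.2, 'on_off_diff': 7.9}
```

Note that +7.9 is exactly 4.3 - (-3.6): the offensive gain plus the defensive improvement.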
Historical On/Off Leaders
Some of the largest single-season on/off differentials in NBA history:
| Player | Season | On/Off Differential | Context |
|---|---|---|---|
| LeBron James | 2008-09 | +17.1 | MVP season, weak supporting cast |
| Kevin Garnett | 2003-04 | +16.3 | MVP season, Minnesota |
| Chris Paul | 2008-09 | +14.8 | Elite floor general |
| Tim Duncan | 2002-03 | +13.9 | Championship season |
| Nikola Jokic | 2021-22 | +15.2 | MVP season |
These extreme values often occur when an elite player logs heavy minutes and the players who replace them are much weaker, which drags down the off-court side of the comparison.
10.3 On/Off Differential Deep Dive
The Math Behind On/Off Differential
On/off differential measures the swing in team performance between player-on and player-off minutes:
$$\text{On/Off Differential} = \text{Net Rating}_{\text{on}} - \text{Net Rating}_{\text{off}}$$
This can be decomposed:
$$\text{On/Off} = (\text{ORtg}_{\text{on}} - \text{DRtg}_{\text{on}}) - (\text{ORtg}_{\text{off}} - \text{DRtg}_{\text{off}})$$
Rearranging:
$$\text{On/Off} = (\text{ORtg}_{\text{on}} - \text{ORtg}_{\text{off}}) - (\text{DRtg}_{\text{on}} - \text{DRtg}_{\text{off}})$$
This shows that on/off differential equals the change in offensive rating minus the change in defensive rating. Because a lower defensive rating is better, a drop in DRtg with the player on court increases the differential.
Offensive vs. Defensive On/Off
Breaking down on/off into components reveals different player profiles:
Offensive Specialist Example:
- ORtg On: 115.0, ORtg Off: 105.0 (Offensive Diff: +10.0)
- DRtg On: 112.0, DRtg Off: 108.0 (Defensive Diff: +4.0, meaning worse)
- On/Off: +10.0 - 4.0 = +6.0

Defensive Specialist Example:
- ORtg On: 108.0, ORtg Off: 110.0 (Offensive Diff: -2.0)
- DRtg On: 102.0, DRtg Off: 112.0 (Defensive Diff: -10.0, meaning much better)
- On/Off: -2.0 - (-10.0) = +8.0
The two players post similar on/off differentials (+6.0 and +8.0), but through completely different paths.
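The decomposition can be checked directly. In this sketch (the function name is hypothetical), we flip the sign of the defensive change so that both components read "positive = helps the team"; the total is unchanged because subtracting the DRtg change and adding its negative are the same operation:

```python
def on_off_components(ortg_on, ortg_off, drtg_on, drtg_off):
    """Split an on/off differential into offensive and defensive parts."""
    offensive = ortg_on - ortg_off      # positive = better offense with player
    defensive = -(drtg_on - drtg_off)   # sign flipped: positive = better defense
    return offensive, defensive, offensive + defensive


print(on_off_components(115.0, 105.0, 112.0, 108.0))  # (10.0, -4.0, 6.0)
print(on_off_components(108.0, 110.0, 102.0, 112.0))  # (-2.0, 10.0, 8.0)
```

The first tuple is the offensive specialist and the second the defensive specialist from the examples above.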
The Replacement Level Question
On/off differential implicitly compares a player to their backup—whoever plays when they sit. This creates interpretive challenges:
Scenario A: Star player with terrible backup
- Net Rating On: +8.0
- Net Rating Off: -10.0
- On/Off: +18.0

Scenario B: Star player with solid backup
- Net Rating On: +8.0
- Net Rating Off: +2.0
- On/Off: +6.0

The player in Scenario A appears three times as valuable by on/off differential, yet both stars post the same +8.0 net rating while on the court and might be equally talented. The difference is backup quality.
10.4 Lineup Analysis Basics
Why Analyze Lineups?
Individual on/off analysis treats players in isolation, but basketball is fundamentally about combinations. Lineup analysis examines how specific five-player groups perform together.
Five-Man Lineup Data
For any five-player combination that shares the court, we can calculate:
- Minutes played together
- Points scored and allowed
- Net rating
- Offensive/defensive efficiency
- Individual statistics within the lineup
The Combinatorial Challenge
With a 15-man roster, the number of possible five-player combinations is:
$$\binom{15}{5} = \frac{15!}{5!(15-5)!} = 3{,}003$$
In practice, most teams use far fewer combinations with significant minutes. A typical NBA team might have:
- 10-15 lineups with 100+ minutes
- 30-50 lineups with 50+ minutes
- 100+ lineups with any recorded time
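Python's `math.comb` (available since Python 3.8) computes these counts directly. The 10-man rotation below is a hypothetical comparison point, not a figure from the text:

```python
from math import comb

roster_lineups = comb(15, 5)     # every five-man group from a 15-man roster
rotation_lineups = comb(10, 5)   # hypothetical 10-man rotation
with_one_player = comb(14, 4)    # roster lineups that include one given player

print(roster_lineups, rotation_lineups, with_one_player)  # 3003 252 1001
```

Even restricted to a rotation, the number of possible units far exceeds the handful that accumulate meaningful minutes.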
Identifying Key Lineups
```python
import pandas as pd


def analyze_lineups(lineup_data, min_minutes=100):
    """
    Analyze five-man lineup performance.

    Parameters:
    -----------
    lineup_data : list of dict
        Each dict contains lineup info: players, minutes, pts_for, pts_against, possessions
    min_minutes : int
        Minimum minutes threshold for analysis

    Returns:
    --------
    DataFrame : Filtered lineups sorted by net rating (best first)
    """
    results = []
    for lineup in lineup_data:
        # Skip small samples and lineups with no recorded possessions
        if lineup['minutes'] < min_minutes or lineup['possessions'] <= 0:
            continue
        ortg = (lineup['pts_for'] / lineup['possessions']) * 100
        drtg = (lineup['pts_against'] / lineup['possessions']) * 100
        net_rtg = ortg - drtg
        results.append({
            'players': lineup['players'],
            'minutes': lineup['minutes'],
            'possessions': lineup['possessions'],
            'ortg': round(ortg, 1),
            'drtg': round(drtg, 1),
            'net_rtg': round(net_rtg, 1)
        })

    # Explicit columns keep sort_values working even when no lineup qualifies
    columns = ['players', 'minutes', 'possessions', 'ortg', 'drtg', 'net_rtg']
    df = pd.DataFrame(results, columns=columns)
    return df.sort_values('net_rtg', ascending=False).reset_index(drop=True)
```
The "Death Lineup" Concept
Teams occasionally discover lineup combinations with extreme effectiveness. The term "Death Lineup" was popularized by the Golden State Warriors' small-ball unit featuring Stephen Curry, Klay Thompson, Andre Iguodala, Harrison Barnes, and Draymond Green (2015-16).
What made it work:
- Elite spacing (multiple shooters)
- Switchable defense
- Fast pace advantage
- Mismatch creation
Such lineups demonstrate how plus-minus analysis can identify tactical advantages not apparent from individual statistics.
Lineup Context Matters
High-performing lineups often share court time against specific opponent groupings. A "bench mob" lineup might dominate opposing benches but struggle against starters. Net rating without opponent context can be misleading.
10.5 Net Rating Calculation
Team Net Rating
Net Rating is the difference between offensive and defensive rating:
$$\text{Net Rating} = \text{Offensive Rating} - \text{Defensive Rating}$$
Where:
- Offensive Rating = Points scored per 100 possessions
- Defensive Rating = Points allowed per 100 possessions
Possession Estimation
Accurate net rating requires possession counts. The standard formula:
$$\text{Possessions} \approx \text{FGA} - \text{OREB} + \text{TOV} + 0.44 \times \text{FTA}$$
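The 0.44 coefficient reflects that free throw attempts do not map one-to-one onto possessions: a two-shot foul ends one possession with two attempts, a three-shot foul uses three, and and-one or technical free throws end no additional possession at all.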
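A worked example of the estimate, using an invented single-game team box score (88 FGA, 10 OREB, 14 TOV, 22 FTA, 112 points; all hypothetical):

```python
def estimate_possessions(fga, oreb, tov, fta):
    """Possessions ~= FGA - OREB + TOV + 0.44 * FTA."""
    return fga - oreb + tov + 0.44 * fta


# Hypothetical team box score for one game
poss = estimate_possessions(fga=88, oreb=10, tov=14, fta=22)
ortg = 112 / poss * 100  # offensive rating: points per 100 possessions

print(round(poss, 2), round(ortg, 1))  # 101.68 110.1
```

Here 22 free throw attempts count as only 9.68 possessions, which is the correction the 0.44 coefficient is meant to supply.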
Player-Level Net Rating
For individual players, net rating is calculated from their on-court time:
```python
def calculate_player_net_rating(player_stats):
    """
    Calculate a player's on-court net rating.

    Parameters:
    -----------
    player_stats : dict
        Must contain: minutes, team_pts_for, team_pts_against,
        team_fga, team_oreb, team_tov, team_fta
        (team totals accumulated only while the player was on court;
        'minutes' is carried for reference and not used below)

    Returns:
    --------
    dict : Offensive, defensive, and net ratings
    """
    # Estimate possessions during player's minutes from the team's own
    # offensive stats; the same count is reused for defense because both
    # teams have roughly equal possessions over any stretch of play
    possessions = (
        player_stats['team_fga']
        - player_stats['team_oreb']
        + player_stats['team_tov']
        + 0.44 * player_stats['team_fta']
    )
    if possessions <= 0:
        raise ValueError("Estimated possessions must be positive")

    # Calculate ratings per 100 possessions
    ortg = (player_stats['team_pts_for'] / possessions) * 100
    drtg = (player_stats['team_pts_against'] / possessions) * 100
    net_rtg = ortg - drtg

    return {
        'offensive_rating': round(ortg, 1),
        'defensive_rating': round(drtg, 1),
        'net_rating': round(net_rtg, 1),
        'possessions': round(possessions, 0)
    }
```
Contextualizing Net Rating
A player's net rating should be compared to:
- Team average: How does the player compare to team baseline?
- League average: Typically around 0 by definition (points for = points against across league)
- Position average: Centers may have different baseline expectations than guards
Net Rating Limitations
Net rating shares all the limitations of raw plus-minus:
- Dependent on teammates
- Affected by opponent quality
- Subject to noise in small samples
- Confounded by role and usage
10.6 Why Raw Plus-Minus Is Noisy
The Fundamental Problem
Raw plus-minus and on/off statistics are inherently noisy measures of individual player value. This section explores why.
Source 1: Small Sample Sizes
Consider a starter who plays 34 minutes per game for 82 games:
- Total minutes: 2,788
- Approximate possessions: ~5,800 for each team (at a pace of roughly 100 possessions per 48 minutes)

While nearly 6,000 possessions seems substantial, scoring is highly variable. A team averaging 110 points per 100 possessions has a game-to-game standard deviation of roughly 12-15 points per 100 possessions.
The math of sample size:
Standard error of net rating estimate: $$SE = \frac{\sigma}{\sqrt{n}}$$
Where $\sigma$ is the standard deviation of net rating and $n$ is the number of independent observations (possessions or games).
For a single game (~100 possessions), even a true +10 net rating team might easily have a -5 game due to variance.
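The standard-error formula can be made concrete with a short sketch. It assumes a game-to-game standard deviation of 14 points per 100 possessions (inside the 12-15 range quoted above, applied here to net rating purely for illustration) and treats games as independent observations:

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """SE = sigma / sqrt(n) for n independent observations."""
    return sigma / math.sqrt(n)


# Assumed game-to-game SD of net rating, per 100 possessions (illustrative)
SIGMA_PER_GAME = 14.0

for games in (1, 10, 49, 82):
    se = standard_error(SIGMA_PER_GAME, games)
    # About 95% of observed averages fall within +/- 1.96 SE of the true value
    print(f"{games:>2} games: SE = {se:4.1f}, 95% range = +/-{1.96 * se:4.1f}")
```

Under this assumption, a single game carries a 95% range of roughly +/-27 points, so a true +10 team could plausibly post anything from about -17 to +37 on a given night.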
Source 2: Lineup Confounding
Players don't appear in random combinations. Starters play with starters; bench players play with bench players. This creates confounding:
Example:
- Player A only plays with elite teammates
- Player B only plays with replacement-level teammates
If both have identical true impact, Player A will have better raw plus-minus because their teammates are better.
Source 3: Opponent Quality Variation
Not all minutes are equal:
- Some players get "garbage time" minutes against tired benches
- Others face opponents' best lineups
- Matchup assignments affect difficulty
A player who excels against second units but sits against elite closers might have inflated plus-minus without commensurate value.
Source 4: Score and Time Effects
Game context affects play style:
- Teams play differently when ahead vs. behind
- Late-game situations have different incentive structures
- Clutch time has outsized importance but tiny sample size
If a player primarily plays in blowouts (either direction), their plus-minus may not reflect how they'd perform in competitive situations.
Quantifying the Noise
Research suggests that single-season raw plus-minus has a correlation of roughly 0.30-0.40 with subsequent season values. This means:
- Only about 9-16% of the variance (the squared correlation) carries over year-to-year
- Much of single-season plus-minus is noise, not signal
- Multi-year samples provide more stable estimates
Comparison to Other Metrics
| Metric | Year-to-Year Correlation | Signal Stability |
|---|---|---|
| Points per game | 0.85-0.90 | Very High |
| True Shooting % | 0.60-0.70 | High |
| Assist Rate | 0.80-0.85 | Very High |
| Raw Plus-Minus | 0.30-0.40 | Low |
| Adjusted Plus-Minus | 0.50-0.60 | Moderate |
This demonstrates why raw plus-minus should not be the primary player evaluation tool despite its conceptual appeal.
10.7 Teammate and Opponent Effects
The Core Confound
Basketball is a team sport played against opponents. A player's plus-minus reflects:
- Their own contributions
- Their teammates' quality
- Their opponents' quality
- The interaction of all three
Disentangling these effects is the central challenge of plus-minus analysis.
Teammate Quality Bias
Mathematical Illustration:
Let's say Team A has five players with true individual contributions of:
- Player 1: +5.0 per 100 possessions
- Player 2: +3.0
- Player 3: +1.0
- Player 4: -1.0
- Player 5: -3.0
If they always play together, the team net rating would be +5.0 (+5+3+1-1-3 = +5), and every player would have +5.0 raw plus-minus regardless of their individual contribution.
Raw plus-minus cannot distinguish between the +5.0 player and the -3.0 player in this scenario.
The LeBron James Problem
Elite players like LeBron James demonstrate teammate effect challenges:
- LeBron's presence elevates teammate performance
- Teammates shoot better due to his gravity and passing
- When LeBron sits, those same teammates regress
Should the offensive improvement count as LeBron's contribution or as his teammates' inflated statistics? On/off analysis credits it entirely to LeBron, which may overstate his impact while understating players who benefit from his presence.
Opponent Quality Issues
Scenario Analysis:
Team X has two rotations:
- Starters face opponents' starters (better players)
- Bench faces opponents' bench (weaker players)
Even if the bench unit has inferior players, they might post similar plus-minus because their opponents are proportionally weaker.
Controlling for Context
Some analysts attempt manual adjustments:
```python
def context_adjusted_plus_minus(player_pm, teammate_quality, opponent_quality,
                                league_avg_teammate=0, league_avg_opponent=0):
    """
    Rough adjustment for teammate and opponent quality.

    This is a simplified illustration; real adjustments require
    regression-based approaches (covered in Chapter 14).

    Parameters:
    -----------
    player_pm : float
        Raw plus-minus per 100 possessions
    teammate_quality : float
        Average plus-minus of teammates during player's minutes
    opponent_quality : float
        Average plus-minus of opponents faced
    league_avg_teammate : float
        League average teammate quality (typically 0)
    league_avg_opponent : float
        League average opponent quality (typically 0)

    Returns:
    --------
    float : Context-adjusted plus-minus estimate
    """
    teammate_adjustment = teammate_quality - league_avg_teammate
    opponent_adjustment = opponent_quality - league_avg_opponent

    # Subtract teammate boost, add opponent difficulty; the 0.5 weights are
    # arbitrary illustrative values, not estimated from data
    adjusted_pm = player_pm - (teammate_adjustment * 0.5) + (opponent_adjustment * 0.5)
    return adjusted_pm
```
This simplistic approach illustrates the concept but lacks the rigor of proper statistical adjustment. Chapter 14 will introduce regression-based methods (RAPM) that address these issues more systematically.
Interaction Effects
Beyond simple additive effects, player combinations create synergies:
- Pick-and-roll duos
- Shooting around a dominant post player
- Defensive switching schemes requiring specific personnel
These interactions mean that player value is partially context-dependent—a player might be +3.0 with one set of teammates and +1.0 with another, even against identical opponents.
10.8 Sample Size Considerations
The Stability Threshold
How many minutes or possessions do we need for reliable plus-minus estimates?
Research findings suggest:
- 500 minutes: Extremely noisy, not reliable for individual comparison
- 1,000 minutes: Still substantial noise, broad conclusions only
- 2,000 minutes: Moderate reliability, clear patterns emerge
- 3,000+ minutes: Reasonable stability, suitable for analysis
For lineup analysis, thresholds are even more demanding because five-player combinations have less data.
Minimum Thresholds by Analysis Type
| Analysis Type | Minimum Sample | Ideal Sample |
|---|---|---|
| Individual On/Off | 500 min | 2,000+ min |
| Two-man combinations | 300 min | 1,000+ min |
| Five-man lineups | 100 min | 500+ min |
| Player vs. specific opponent | 100 min | 300+ min |
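One way to reason about thresholds like these is to invert the margin-of-error formula: if the 95% half-width is z·σ/√(possessions/100), then the possessions needed for a target half-width m are 100·(z·σ/m)². The sketch below assumes σ = 12 points per 100 possessions, a rough illustrative value rather than a measured constant, and the function name is hypothetical:

```python
from statistics import NormalDist


def possessions_for_margin(target_margin, sigma_per_100=12.0, confidence=0.95):
    """
    Possessions needed for a confidence-interval half-width of target_margin.

    Inverts margin = z * sigma_per_100 / sqrt(possessions / 100).
    sigma_per_100 = 12 is an assumed noise level, not a fitted value.
    """
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return 100 * (z * sigma_per_100 / target_margin) ** 2


for margin in (10.0, 5.0, 2.0):
    poss = possessions_for_margin(margin)
    print(f"+/-{margin:>4}: ~{poss:,.0f} possessions "
          f"(~{poss / 2:,.0f} minutes at ~2 possessions per minute)")
```

Because the required sample grows with the square of the desired precision, halving the margin of error quadruples the possessions needed.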
The Lineup Sample Size Problem
With five-man lineups, sample sizes become critically limited:
```python
import numpy as np
from scipy import stats


def lineup_sample_analysis(lineup_minutes, net_rating, confidence_level=0.95,
                           sigma_per_100=12.0):
    """
    Estimate confidence interval for lineup net rating.

    Parameters:
    -----------
    lineup_minutes : float
        Minutes played by this five-man unit
    net_rating : float
        Observed net rating per 100 possessions
    confidence_level : float
        Desired confidence level (default 0.95)
    sigma_per_100 : float
        Assumed standard deviation of net rating over a 100-possession
        sample; the default of 12 is a rough assumption taken from the low
        end of the 12-15 range quoted in Section 10.6, not a fitted constant

    Returns:
    --------
    dict : Net rating with confidence interval
    """
    # Approximate possessions (using ~2.0 possessions per minute)
    possessions = lineup_minutes * 2.0

    # Standard error shrinks with the square root of sample size
    se_per_100 = sigma_per_100 / np.sqrt(possessions / 100)

    # Z-score for confidence level
    z = stats.norm.ppf((1 + confidence_level) / 2)
    margin = float(z * se_per_100)

    return {
        'net_rating': net_rating,
        'lower_bound': round(net_rating - margin, 1),
        'upper_bound': round(net_rating + margin, 1),
        'margin_of_error': round(margin, 1),
        'possessions': round(possessions, 0)
    }


# Example: Lineup with 200 minutes and +15 net rating
result = lineup_sample_analysis(200, 15.0)
# Returns: {'net_rating': 15.0, 'lower_bound': 3.2, 'upper_bound': 26.8,
#           'margin_of_error': 11.8, 'possessions': 400.0}
```
A lineup with 200 minutes and a +15 net rating might have a 95% confidence interval of roughly +3 to +27—too wide for confident conclusions.
Multi-Year Analysis
To improve reliability, analysts often aggregate across seasons:
Advantages:
- Larger samples reduce noise
- Career plus-minus more stable than single-season

Disadvantages:
- Players improve and decline over time
- Team context changes (new teammates, systems)
- Older data may not represent current ability
A common compromise: Use 2-3 year rolling windows with recency weighting.
Bayesian Approaches
When sample sizes are small, Bayesian methods can incorporate prior information:
```python
def bayesian_plus_minus_estimate(observed_pm, sample_size,
                                 prior_mean=0, prior_strength=500):
    """
    Bayesian estimate of true plus-minus using conjugate prior.

    Parameters:
    -----------
    observed_pm : float
        Observed plus-minus per 100 possessions
    sample_size : float
        Number of possessions observed
    prior_mean : float
        Prior expectation (typically league average = 0)
    prior_strength : float
        Effective sample size of prior (higher = more conservative)

    Returns:
    --------
    float : Posterior estimate (shrunk toward prior)
    """
    # Weighted average of prior and observed
    total_weight = sample_size + prior_strength
    posterior = (
        (prior_strength * prior_mean + sample_size * observed_pm) /
        total_weight
    )
    return round(posterior, 2)


# Example: Player with +8.0 in 1000 possessions
# With prior_strength=500, estimate shrinks toward 0
estimate = bayesian_plus_minus_estimate(8.0, 1000, prior_mean=0, prior_strength=500)
# Returns approximately +5.3 (shrunk from +8.0 toward 0)
```
This approach acknowledges that extreme observed values likely contain substantial noise and "regresses to the mean."
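How strongly the estimate regresses depends on the sample size. This sketch reuses the same weighted-average formula with the same illustrative `prior_strength=500` (not a calibrated value) and holds the observed +8.0 fixed while varying the number of possessions:

```python
def shrink(observed, n, prior_mean=0.0, prior_strength=500.0):
    """Posterior mean: prior and observation weighted by their sample sizes."""
    return (prior_strength * prior_mean + n * observed) / (n + prior_strength)


for n in (250, 1000, 4000):
    print(n, round(shrink(8.0, n), 2))
# 250 2.67
# 1000 5.33
# 4000 7.11
```

With few possessions the prior dominates; as the sample grows, the estimate moves toward the observed value.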
10.9 Introduction to Adjustment Methods
The Need for Adjustment
We have established that raw plus-minus suffers from:
1. Teammate quality confounding
2. Opponent quality confounding
3. Sample size noise
4. Lineup interaction effects
Adjusted plus-minus methods attempt to isolate individual contributions by statistically controlling for these factors.
The Basic Intuition
Imagine we could construct a mathematical model:
$$\text{Team Net Rating} = \sum_{i \in \text{teammates}} \text{Contribution}_i - \sum_{j \in \text{opponents}} \text{Contribution}_j + \epsilon$$
If we observe thousands of different lineup combinations, we can use regression to estimate each player's individual contribution while controlling for who else is on the court.
Adjusted Plus-Minus (APM) Overview
Adjusted Plus-Minus (APM) uses regression to estimate individual contributions:
- Each possession or time segment is a data point
- Predictors: indicator variables for each player (on court or not)
- Outcome: point differential for that segment
- Coefficients: estimated player contributions
```python
import numpy as np


def apm_regression_setup(game_segments):
    """
    Illustrate APM regression data structure.

    Parameters:
    -----------
    game_segments : list of dict
        Each segment contains:
        - home_players: list of 5 player IDs
        - away_players: list of 5 player IDs
        - home_margin: point differential (home perspective)
        - possessions: number of possessions in segment

    Returns:
    --------
    X, y, weights, player_list :
        Design matrix, target vector (home margin per 100 possessions),
        possession weights for a weighted fit, and the player ID for
        each column of X
    """
    # Collect all unique player IDs
    all_players = set()
    for seg in game_segments:
        all_players.update(seg['home_players'])
        all_players.update(seg['away_players'])

    player_list = sorted(all_players)
    player_to_idx = {p: i for i, p in enumerate(player_list)}
    n_players = len(player_list)

    # Build design matrix
    X = np.zeros((len(game_segments), n_players))
    y = np.zeros(len(game_segments))
    weights = np.zeros(len(game_segments))

    for i, seg in enumerate(game_segments):
        # Segments with no possessions keep an all-zero row and weight 0,
        # so they drop out of a weighted fit instead of dividing by zero
        if seg['possessions'] <= 0:
            continue

        # Home players get +1, away players get -1
        for p in seg['home_players']:
            X[i, player_to_idx[p]] = 1
        for p in seg['away_players']:
            X[i, player_to_idx[p]] = -1

        # Target: margin per 100 possessions
        y[i] = (seg['home_margin'] / seg['possessions']) * 100
        weights[i] = seg['possessions']

    return X, y, weights, player_list
```
From APM to RAPM
Basic APM suffers from multicollinearity—players who always play together have perfectly correlated indicators, making individual effects unidentifiable.
Regularized Adjusted Plus-Minus (RAPM) addresses this through:
- Ridge regression (L2 penalty)
- Prior information (shrinkage toward zero or expected value)
- Multi-year data pooling
Chapter 14 will cover RAPM methodology in detail.
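As a small preview of why the ridge penalty helps, here is a minimal numpy sketch on invented toy data: six players in two-on-two segments, chosen for brevity, so the rows are not real five-man lineups. The helper `ridge_apm`, the data, and the penalty `lam=50` are all hypothetical. Players 0 and 1 always share the court, so their columns are identical; every row also has as many +1s as -1s. Either fact alone makes the plain least-squares normal equations singular. Adding `lam * I` makes the system solvable, and because the penalty treats players 0 and 1 symmetrically, they receive identical estimates.

```python
import numpy as np


def ridge_apm(X, y, weights, lam):
    """Weighted ridge fit: solve (X^T W X + lam * I) beta = X^T W y."""
    XtW = X.T * weights  # same as X.T @ np.diag(weights)
    gram = XtW @ X
    return np.linalg.solve(gram + lam * np.eye(X.shape[1]), XtW @ y)


# Hypothetical segments: +1 = home player, -1 = away player
X = np.array([
    [ 1,  1, -1, -1,  0,  0],
    [ 1,  1,  0, -1, -1,  0],
    [ 1,  1, -1,  0,  0, -1],
    [ 0,  0,  1,  1, -1, -1],
    [-1, -1,  1,  0,  1,  0],
    [ 0,  0,  1, -1, -1,  1],
    [-1, -1,  0,  1,  0,  1],
], dtype=float)
y = np.array([6.0, 2.0, -4.0, 3.0, -5.0, 1.0, -2.0])   # home margin per 100
w = np.array([20.0, 15.0, 10.0, 12.0, 18.0, 14.0, 16.0])  # possessions

gram = (X.T * w) @ X
print("rank of X^T W X:", np.linalg.matrix_rank(gram), "of", X.shape[1])

beta = ridge_apm(X, y, w, lam=50.0)
print(np.round(beta, 2))  # players 0 and 1 get the same coefficient
```

Ridge cannot tell us how to split credit between two players who never separate; it simply splits it evenly, which is one reason RAPM pools multiple seasons and uses priors.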
The Adjustment Landscape
Modern adjusted plus-minus variants include:
| Method | Key Feature | Chapter |
|---|---|---|
| APM | Basic regression adjustment | 14 |
| RAPM | Ridge regularization | 14 |
| RPM | ESPN's prior-informed version | 14 |
| RAPTOR | FiveThirtyEight's hybrid approach | 15 |
| EPM | Dunks & Threes tracking-informed | 15 |
| LEBRON | BBall-Index's comprehensive model | 15 |
All these methods build on the foundational concepts of this chapter—understanding team performance with and without specific players on the court.
What Adjustments Accomplish
Well-designed adjustments achieve:
1. Deconfounding: Separate player contribution from teammate/opponent effects
2. Noise reduction: Regularization reduces extreme values
3. Stability: Adjusted metrics have higher year-to-year correlation
4. Interpretability: Coefficients approximate points added per 100 possessions
Limitations Remain
Even sophisticated adjustments cannot fully solve:
- Role specificity: A player's value depends on their role
- System effects: Some players fit certain systems better
- Injury/fatigue: Performance varies within seasons
- Playoff adjustments: Regular season patterns may not transfer
10.10 Practical Plus-Minus Analysis
Building a Complete Analysis Pipeline
Let's construct a full plus-minus analysis workflow using Python:
"""
Complete Plus-Minus Analysis Module
Chapter 10: Plus-Minus and On/Off Analysis
"""
import pandas as pd
import numpy as np
from typing import Dict, List, Tuple, Optional
from dataclasses import dataclass
@dataclass
class PlayerOnOffStats:
"""Container for player on/off statistics."""
player_id: str
player_name: str
minutes_on: float
minutes_off: float
team_pts_on: int
team_pts_off: int
opp_pts_on: int
opp_pts_off: int
possessions_on: float
possessions_off: float
class PlusMinusAnalyzer:
"""
Comprehensive plus-minus and on/off analysis toolkit.
"""
def __init__(self, play_by_play_data: pd.DataFrame):
"""
Initialize with play-by-play data.
Expected columns:
- game_id, period, time_remaining
- home_players (list of 5), away_players (list of 5)
- event_type, points_scored, scoring_team
"""
self.pbp = play_by_play_data
self._preprocess_data()
def _preprocess_data(self):
"""Convert raw PBP to possession-level summaries."""
# Implementation would parse play-by-play into segments
# with known lineups and point differentials
pass
def calculate_raw_plus_minus(self, player_id: str,
game_id: Optional[str] = None) -> Dict:
"""
Calculate raw plus-minus for a player.
Parameters:
-----------
player_id : str
Player identifier
game_id : str, optional
Specific game or None for season total
Returns:
--------
dict : Plus-minus statistics
"""
# Filter data for this player's on-court time
if game_id:
data = self.pbp[self.pbp['game_id'] == game_id]
else:
data = self.pbp
# Identify possessions where player was on court
player_on_mask = data.apply(
lambda x: player_id in x['home_players'] or
player_id in x['away_players'], axis=1
)
on_court_data = data[player_on_mask]
# Calculate plus-minus
# Determine if player was home or away for each segment
# Sum point differentials from their perspective
# Placeholder calculation
plus_minus = 0
minutes = 0
return {
'player_id': player_id,
'plus_minus': plus_minus,
'minutes': minutes,
'games': len(on_court_data['game_id'].unique()) if not game_id else 1
}
def calculate_on_off_splits(self, player_id: str) -> Dict:
"""
Calculate comprehensive on/off splits for a player.
Returns offensive rating, defensive rating, and net rating
for both on-court and off-court periods.
"""
# Get on-court and off-court statistics
on_stats = self._get_on_court_stats(player_id)
off_stats = self._get_off_court_stats(player_id)
# Calculate ratings
ortg_on = (on_stats['team_pts'] / on_stats['possessions']) * 100
drtg_on = (on_stats['opp_pts'] / on_stats['possessions']) * 100
ortg_off = (off_stats['team_pts'] / off_stats['possessions']) * 100
drtg_off = (off_stats['opp_pts'] / off_stats['possessions']) * 100
return {
'player_id': player_id,
'on_court': {
'minutes': on_stats['minutes'],
                'possessions': on_stats['possessions'],
                'ortg': round(ortg_on, 1),
                'drtg': round(drtg_on, 1),
                'net_rtg': round(ortg_on - drtg_on, 1)
            },
            'off_court': {
                'minutes': off_stats['minutes'],
                'possessions': off_stats['possessions'],
                'ortg': round(ortg_off, 1),
                'drtg': round(drtg_off, 1),
                'net_rtg': round(ortg_off - drtg_off, 1)
            },
            'differentials': {
                'ortg_diff': round(ortg_on - ortg_off, 1),
                'drtg_diff': round(drtg_on - drtg_off, 1),
                'net_diff': round((ortg_on - drtg_on) - (ortg_off - drtg_off), 1)
            }
        }

    def _get_on_court_stats(self, player_id: str) -> Dict:
        """Extract statistics when player is on court."""
        # Implementation details
        pass

    def _get_off_court_stats(self, player_id: str) -> Dict:
        """Extract statistics when player is off court."""
        # Implementation details
        pass

    def analyze_lineup(self, player_ids: List[str]) -> Dict:
        """
        Analyze a specific five-player lineup.

        Parameters:
        -----------
        player_ids : list
            Exactly 5 player identifiers

        Returns:
        --------
        dict : Lineup performance statistics
        """
        if len(player_ids) != 5:
            raise ValueError("Must provide exactly 5 player IDs")
        player_set = set(player_ids)
        # Find all possessions where this exact five-man unit was on court
        # for either the home or the away team
        lineup_mask = self.pbp.apply(
            lambda x: set(x['home_players']) == player_set or
                      set(x['away_players']) == player_set, axis=1
        )
        lineup_data = self.pbp[lineup_mask]
        if len(lineup_data) == 0:
            return {'error': 'Lineup never played together'}
        # Calculate lineup statistics
        # ...
        return {
            'players': player_ids,
            'minutes': 0,  # Placeholder
            'possessions': 0,
            'ortg': 0.0,
            'drtg': 0.0,
            'net_rtg': 0.0
        }

    def find_best_lineups(self, min_minutes: int = 100,
                          top_n: int = 10) -> pd.DataFrame:
        """
        Find the best-performing lineups by net rating.

        Parameters:
        -----------
        min_minutes : int
            Minimum minutes threshold
        top_n : int
            Number of lineups to return

        Returns:
        --------
        DataFrame : Top lineups sorted by net rating
        """
        # Extract all unique lineups
        all_lineups = self._extract_unique_lineups()
        results = []
        for lineup in all_lineups:
            stats = self.analyze_lineup(lineup)
            # Skip error results so they never enter the ranking table
            if 'error' not in stats and stats.get('minutes', 0) >= min_minutes:
                results.append(stats)
        if not results:
            # No lineup met the threshold; an empty frame has no
            # 'net_rtg' column, so nlargest() would raise a KeyError
            return pd.DataFrame()
        df = pd.DataFrame(results)
        return df.nlargest(top_n, 'net_rtg')

    def _extract_unique_lineups(self) -> List[List[str]]:
        """Get all unique five-player combinations from data."""
        # Implementation
        pass

    def calculate_two_man_stats(self, player1_id: str,
                                player2_id: str) -> Dict:
        """
        Calculate statistics for a two-player combination.

        Useful for analyzing pick-and-roll pairs, backcourt duos, etc.
        """
        # Find possessions where both players were on court for the same team
        both_on_mask = self.pbp.apply(
            lambda x: (player1_id in x['home_players'] and
                       player2_id in x['home_players']) or
                      (player1_id in x['away_players'] and
                       player2_id in x['away_players']), axis=1
        )
        # Calculate combined statistics
        # ...
        return {
            'players': [player1_id, player2_id],
            'minutes_together': 0,
            'net_rtg_together': 0.0,
            'net_rtg_only_p1': 0.0,  # P1 on, P2 off
            'net_rtg_only_p2': 0.0,  # P2 on, P1 off
            'net_rtg_neither': 0.0   # Both off
        }
def possession_estimator(fga: int, oreb: int, tov: int, fta: int) -> float:
    """
    Estimate possessions using the standard formula.

    Parameters:
    -----------
    fga : int
        Field goal attempts
    oreb : int
        Offensive rebounds
    tov : int
        Turnovers
    fta : int
        Free throw attempts

    Returns:
    --------
    float : Estimated possessions
    """
    return fga - oreb + tov + 0.44 * fta


def net_rating_confidence_interval(net_rtg: float,
                                   possessions: float,
                                   confidence: float = 0.95) -> Tuple[float, float]:
    """
    Calculate confidence interval for observed net rating.

    Parameters:
    -----------
    net_rtg : float
        Observed net rating per 100 possessions
    possessions : float
        Number of possessions observed
    confidence : float
        Confidence level (default 0.95)

    Returns:
    --------
    tuple : (lower_bound, upper_bound)
    """
    from scipy import stats
    # Empirical standard deviation of net rating ~35-40 per 100 possessions
    sigma = 37
    # Standard error decreases with sqrt of sample size
    se = sigma / np.sqrt(possessions / 100)
    # Z-score for confidence level
    z = stats.norm.ppf((1 + confidence) / 2)
    margin = z * se
    return (round(net_rtg - margin, 1), round(net_rtg + margin, 1))
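Here is a quick worked example of the formula inside possession_estimator(). It uses a hypothetical team box score (illustrative numbers, not a real game) with 88 field goal attempts, 10 offensive rebounds, 14 turnovers, and 22 free throw attempts:

```latex
\text{Poss} \approx \text{FGA} - \text{OREB} + \text{TOV} + 0.44 \times \text{FTA}
            = 88 - 10 + 14 + 0.44 \times 22
            = 101.68
```

The 0.44 multiplier exists because free throw attempts do not map one-to-one onto possessions. A two-shot foul uses two attempts on a single possession, and and-one or technical free throws do not add a separate possession at all.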
Interpreting Results
When analyzing plus-minus data, always consider:
- Sample size: Is there enough data to trust the numbers?
- Context: Who were the teammates and opponents?
- Role: Does the player have meaningful minutes?
- Trend: Is this consistent with other evidence?
Red Flags in Plus-Minus Analysis
Watch for these warning signs:
- Extreme values with small samples: A +25 net rating in 150 minutes is most likely noise
- Contradictory evidence: Great plus-minus but poor box score stats (or vice versa)
- Garbage time inflation: High plus-minus driven by blowout minutes
- Unsustainable teammate shooting: On-court three-point percentage far above teammates' averages
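The first and last red flags can be checked numerically. The minimal sketch below makes several assumptions. It converts minutes to possessions at an assumed pace of about 100 possessions per 48 minutes. It reuses the sigma of 37 points per 100 possessions from net_rating_confidence_interval(). It uses the standard library's statistics.NormalDist in place of scipy. Both helper names are hypothetical.

```python
import math
from statistics import NormalDist

# Assumptions for illustration only:
# - roughly 100 possessions per 48 minutes (a rough modern NBA pace)
# - sigma of 37 net-rating points per 100 possessions, as in
#   net_rating_confidence_interval()
POSS_PER_48 = 100.0
SIGMA_PER_100 = 37.0


def net_rating_interval_from_minutes(net_rtg: float, minutes: float,
                                     confidence: float = 0.95) -> tuple[float, float]:
    """Approximate confidence interval for a net rating observed over `minutes`."""
    possessions = minutes / 48.0 * POSS_PER_48
    # Standard error shrinks with the square root of the sample size
    se = SIGMA_PER_100 / math.sqrt(possessions / 100.0)
    # Two-sided z-score for the requested confidence level
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return (net_rtg - z * se, net_rtg + z * se)


def shooting_luck_gap(on_court_3p_pct: float, teammates_season_3p_pct: float) -> float:
    """Percentage-point gap between on-court 3P% and the same teammates' season 3P%."""
    return on_court_3p_pct - teammates_season_3p_pct


low, high = net_rating_interval_from_minutes(25.0, 150.0)
print(f"+25 over 150 min: 95% CI roughly ({low:.1f}, {high:.1f})")
print(f"3P% gap: {shooting_luck_gap(41.5, 35.0):+.1f} points")
```

Under these assumptions, the 95% interval for +25 over 150 minutes runs from about -16 to +66. The sample cannot even rule out a negative impact. The shooting gap is only a flag. A 6.5-point jump in teammates' three-point accuracy should be checked against larger samples before the player gets credit for it.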
10.11 Plus-Minus in Different Contexts
Regular Season vs. Playoffs
Plus-minus dynamics change in the playoffs:
- Rotations tighten (fewer players, more minutes for starters)
- Opponent quality increases
- Game preparation intensifies
- Sample sizes shrink dramatically
A player's playoff plus-minus may differ substantially from their regular-season figure because of:
- A different role or minutes load
- Facing only elite opponents
- Higher variance in short series
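The square-root law for standard errors puts a number on the higher variance of short series. As a rough sketch, assume possessions scale with games played. This ignores the extra playoff minutes starters often receive. Comparing a seven-game series with an 82-game regular season then gives:

```latex
\frac{SE_{\text{series}}}{SE_{\text{season}}}
  = \sqrt{\frac{n_{\text{season}}}{n_{\text{series}}}}
  = \sqrt{\frac{82}{7}}
  \approx 3.4
```

Under that assumption, the uncertainty around a single series' net rating is roughly three and a half times wider than around a full season's.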
Clutch Time Analysis
Plus-minus in "clutch" situations (typically defined as the final 5 minutes of the fourth quarter or overtime, with the score within 5 points) receives special attention:
def clutch_plus_minus(player_data, clutch_definition='standard'):
    """
    Calculate plus-minus specifically in clutch situations.

    Parameters:
    -----------
    player_data : DataFrame
        Play-by-play data with score and time information
    clutch_definition : str
        'standard' (5 min, 5 pts), 'tight' (2 min, 3 pts),
        'expanded' (5 min, 10 pts)

    Returns:
    --------
    dict : Clutch plus-minus statistics
    """
    definitions = {
        'standard': {'time': 5 * 60, 'margin': 5},
        'tight': {'time': 2 * 60, 'margin': 3},
        'expanded': {'time': 5 * 60, 'margin': 10}
    }
    if clutch_definition not in definitions:
        # Fail loudly instead of silently using 'standard' while
        # reporting the unrecognized name in the result
        raise ValueError(f"Unknown clutch_definition: {clutch_definition!r}")
    params = definitions[clutch_definition]
    # Filter for clutch situations: fourth quarter or overtime,
    # time_remaining (in seconds) inside the window, margin within the threshold
    clutch_mask = (
        (player_data['time_remaining'] <= params['time']) &
        (player_data['period'] >= 4) &
        (abs(player_data['score_margin']) <= params['margin'])
    )
    clutch_data = player_data[clutch_mask]
    # Calculate plus-minus in these situations
    # ...
    return {
        'clutch_minutes': 0,
        'clutch_plus_minus': 0,
        'clutch_net_rtg': 0.0,
        'definition': clutch_definition
    }
Caution: Clutch samples are extremely small. Even over a full season, most players have only 50-100 clutch possessions, making these statistics highly unreliable for individual assessment.
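The computation elided in clutch_plus_minus() depends on the play-by-play schema. The sketch below shows one possible approach. It assumes a hypothetical possession-level table with one row per possession. None of the column names in the docstring come from the chapter's module. The sketch applies the 'standard' thresholds and reports plus-minus and net rating over the player's on-court clutch possessions.

```python
import pandas as pd


def clutch_summary(pbp: pd.DataFrame, time_cutoff: int = 300,
                   margin_cutoff: int = 5) -> dict:
    """Clutch plus-minus and net rating while the player is on court.

    Assumes one row per possession with hypothetical columns:
      period, time_remaining (seconds left in the period), score_margin,
      on_court (bool), offense (bool: player's team has the ball),
      team_pts, opp_pts (points scored by each side on that possession).
    """
    clutch = pbp[
        (pbp['period'] >= 4)
        & (pbp['time_remaining'] <= time_cutoff)
        & (pbp['score_margin'].abs() <= margin_cutoff)
        & pbp['on_court']
    ]
    off = clutch[clutch['offense']]    # player's team on offense
    dfn = clutch[~clutch['offense']]   # player's team on defense
    plus_minus = int(clutch['team_pts'].sum() - clutch['opp_pts'].sum())
    # Points per 100 possessions on each end (NaN if no possessions)
    ortg = 100 * off['team_pts'].sum() / len(off) if len(off) else float('nan')
    drtg = 100 * dfn['opp_pts'].sum() / len(dfn) if len(dfn) else float('nan')
    return {'clutch_possessions': len(clutch),
            'clutch_plus_minus': plus_minus,
            'clutch_net_rtg': round(ortg - drtg, 1)}


# Toy data: the last two rows fall outside the clutch window
toy = pd.DataFrame({
    'period':         [4, 4, 4, 4, 4, 3],
    'time_remaining': [280, 250, 200, 150, 400, 100],
    'score_margin':   [-2, 1, 3, -1, 2, 0],
    'on_court':       [True, True, True, True, True, True],
    'offense':        [True, False, True, False, True, False],
    'team_pts':       [2, 0, 3, 0, 2, 0],
    'opp_pts':        [0, 2, 0, 0, 0, 3],
})
print(clutch_summary(toy))
```

The toy data yields a +150.0 net rating from just four possessions. That is exactly the small-sample problem described in the caution above.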
Position-Specific Considerations
Plus-minus interpretation varies by position:
Point Guards:
- Often have high offensive on/off due to playmaking influence
- May show team-wide effects (assists create teammate scoring)
Centers:
- Defensive on/off often most significant
- Rim protection creates team-wide defensive improvement
Wings:
- More balanced offensive/defensive contributions
- Switching ability affects defensive versatility
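The offense/defense split described for each position can be read from the ortg_diff and drtg_diff values returned by the on/off code earlier, with one sign flip. A negative drtg_diff is good, because opponents score less with the player on the floor. In the minimal sketch below, the function name, the 2-point "balanced" tolerance, and the sample ratings are all hypothetical.

```python
def decompose_on_off(ortg_on: float, drtg_on: float,
                     ortg_off: float, drtg_off: float,
                     balance_tol: float = 2.0) -> dict:
    """Split an on/off net-rating swing into offensive and defensive parts."""
    # Offensive impact: rise in team offensive rating with the player on court
    off_impact = ortg_on - ortg_off
    # Defensive impact: drop in opponent scoring, so positive means better defense
    def_impact = drtg_off - drtg_on
    # By construction, off_impact + def_impact equals the net on/off swing
    net_diff = (ortg_on - drtg_on) - (ortg_off - drtg_off)
    if abs(off_impact - def_impact) < balance_tol:
        profile = 'balanced'
    elif off_impact > def_impact:
        profile = 'offense-driven'
    else:
        profile = 'defense-driven'
    return {'off_impact': round(off_impact, 1),
            'def_impact': round(def_impact, 1),
            'net_diff': round(net_diff, 1),
            'profile': profile}


# Hypothetical rim-protecting center: small offensive dip, large defensive gain
print(decompose_on_off(ortg_on=112.0, drtg_on=106.0,
                       ortg_off=113.0, drtg_off=114.5))
```

Here the +7.5 net swing comes almost entirely from the defensive side. This matches the pattern described for centers, though a single season's split is subject to the same noise as the total.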
10.12 Summary and Bridge to Adjusted Metrics
Chapter Summary
This chapter introduced the family of plus-minus metrics:
- Raw Plus-Minus: Simple point differential during a player's minutes
- On/Off Splits: Team performance rates with vs. without the player
- On/Off Differential: Net rating swing between the player's on-court and off-court minutes
- Lineup Analysis: Five-man combination performance
- Net Rating: Per-100-possession efficiency measure
We explored the fundamental limitations:
- Small sample sizes create noise
- Teammate quality confounds individual measurement
- Opponent quality varies non-randomly
- Context (score, time, garbage time) affects interpretation
Key Takeaways
- Plus-minus captures outcomes but not necessarily individual contribution
- On/off differential compares to replacement (backup), not league average
- Lineup analysis requires substantial minutes for reliability
- Single-season raw plus-minus has low year-to-year stability
- Adjustment methods (RAPM) attempt to isolate individual effects
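Two hypothetical rosters show most clearly why on/off differential compares a player to their replacement rather than to league average. In the sketch below, all ratings are invented for illustration. The same player posts an identical +4.0 on-court net rating on both rosters, and only the quality of the backup minutes changes.

```python
def on_off_diff(net_on: float, net_off: float) -> float:
    """On/off differential: team net rating with the player minus without."""
    return net_on - net_off


# Same player, same on-court result; only the backup differs (hypothetical)
weak_bench = on_off_diff(net_on=4.0, net_off=-6.0)    # backup minutes go badly
strong_bench = on_off_diff(net_on=4.0, net_off=2.0)   # capable backup
print(weak_bench, strong_bench)
```

The player did not change, but the differential swings from +10.0 to +2.0 because the baseline is whoever replaces them.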
Looking Ahead
Part 3 of this textbook addresses the limitations identified here through adjusted plus-minus methods:
- Chapter 14: Regularized Adjusted Plus-Minus (RAPM)
- Chapter 15: Modern all-in-one metrics (RPM, RAPTOR, EPM)
- Chapter 16: Tracking-enhanced impact metrics
These approaches use the foundation established here—measuring team performance with different player combinations—but apply sophisticated statistical techniques to isolate individual contributions from the noise of teammates, opponents, and random variation.
The Analytical Mindset
Raw plus-minus analysis teaches a crucial lesson: outcomes matter, but attribution is hard. A team outscoring opponents by 10 points during a player's minutes is a real result—the challenge is determining how much credit that specific player deserves versus their teammates, luck, and circumstances.
This tension between capturing real outcomes and properly attributing them runs throughout basketball analytics. Plus-minus methods sit at the heart of this challenge, which makes them an essential foundation of any serious analyst's toolkit.
Chapter 10 Python Code Reference
The complete Python module for this chapter is available in code/chapter_10_examples.py. Key functions include:
- calculate_raw_plus_minus(): Basic plus-minus calculation
- calculate_on_off_splits(): Comprehensive on/off analysis
- analyze_lineup(): Five-man lineup statistics
- possession_estimator(): Standard possession formula
- net_rating_confidence_interval(): Statistical uncertainty quantification
- bayesian_plus_minus_estimate(): Shrinkage estimation for small samples
References
- Oliver, D. (2004). Basketball on Paper: Rules and Tools for Performance Analysis. Potomac Books.
- Kubatko, J., Oliver, D., Pelton, K., & Rosenbaum, D. T. (2007). A starting point for analyzing basketball statistics. Journal of Quantitative Analysis in Sports, 3(3).
- Rosenbaum, D. T. (2004). Measuring how NBA players help their teams win. 82games.com.
- Sill, J. (2010). Improved NBA adjusted +/- using regularization and out-of-sample testing. MIT Sloan Sports Analytics Conference.
- Engelmann, J. (2017). Possession-based player performance analysis in basketball. MIT Sloan Sports Analytics Conference.
- Grassetti, L., Bellio, R., Di Gaspero, L., Fonseca, G., & Vidoni, P. (2021). An extended regularized adjusted plus-minus analysis for lineup management in basketball. Journal of Quantitative Analysis in Sports, 17(2), 85-100.