> "Basketball is a game of runs, matchups, and minutes. If you understand all three, you understand the game."
Learning Objectives
- Calculate and apply Dean Oliver's Four Factors of basketball success to predict team win totals and game outcomes
- Evaluate and apply player impact metrics (BPM, RPM, RAPTOR, EPM) for team-level projections
- Quantify the impact of rest, travel, altitude, and schedule density on NBA game outcomes
- Build lineup-based models using net rating data and minutes-weighted projections
- Identify and exploit NBA-specific betting market inefficiencies including total movement, live betting, and player props
Chapter 16: Modeling the NBA
"Basketball is a game of runs, matchups, and minutes. If you understand all three, you understand the game." --- Daryl Morey (attributed)
The NBA presents a fundamentally different modeling challenge from the NFL. Where the NFL offers 17 regular-season games per team with enormous single-game variance, the NBA provides 82 games with a more predictable relationship between team quality and outcomes. Where the NFL is dominated by the quarterback position, the NBA distributes influence more evenly across five starters and a rotation of eight to ten players. Where the NFL plays once per week with full rest, the NBA grinds through a schedule of three to four games per week, creating fatigue effects that move the needle on nightly outcomes.
These structural differences create a distinct set of modeling opportunities. The NBA's large sample size means team-level metrics stabilize quickly, allowing mid-season model updates with real predictive power. The schedule's density creates measurable rest and travel effects that the market does not always fully price. The sport's fluid, possession-based structure lends itself naturally to a decomposition into offensive and defensive efficiency. And the growing player prop market offers a new frontier of opportunities for quantitative bettors who can model individual performance distributions.
This chapter builds on the general modeling framework from Chapters 9 and 10 and the sport-specific modeling approach introduced in Chapter 15 for the NFL. We will adapt those principles to basketball's unique data landscape, metrics, and market dynamics.
In this chapter, you will learn to:
- Decompose team quality into the Four Factors and use them to predict wins
- Evaluate and apply modern player impact metrics to team-level projections
- Quantify how rest, travel, and schedule density affect performance
- Build lineup-based models that account for who is actually on the floor
- Exploit NBA-specific market patterns in spreads, totals, and player props
16.1 The Four Factors of Basketball Success
Oliver's Framework
In his landmark 2004 book Basketball on Paper, Dean Oliver identified the four statistical factors that most strongly determine a basketball team's success. These "Four Factors" have become the foundation of modern basketball analytics, and they remain the starting point for any serious NBA model. The factors, listed in approximate order of importance, are:
- Effective Field Goal Percentage (eFG%): Measures shooting efficiency, adjusting for the extra value of three-pointers
- Turnover Percentage (TOV%): Measures how often a team turns the ball over per possession
- Offensive Rebound Percentage (OREB%): Measures how often a team recovers its own missed shots
- Free Throw Rate (FT Rate): Measures a team's ability to get to and convert free throws
Each factor is calculated on both offense and defense (i.e., what a team achieves on offense and what it allows on defense), creating eight total inputs for modeling.
The Formulas
Effective Field Goal Percentage:
$$\text{eFG\%} = \frac{\text{FG} + 0.5 \times \text{3P}}{\text{FGA}}$$
The 0.5 multiplier on three-pointers accounts for the fact that a made three is worth 1.5 times a made two. A team that shoots 45% on twos and 35% on threes (with 40% of attempts from three) has:
$$\text{eFG\%} = \frac{(0.60 \times 0.45 \times \text{FGA}) + (0.40 \times 0.35 \times \text{FGA}) + 0.5 \times (0.40 \times 0.35 \times \text{FGA})}{\text{FGA}}$$
Simplifying: if we assume 80 FGA, 48 two-point attempts (making 21.6), 32 three-point attempts (making 11.2), total FG = 32.8, 3P = 11.2:
$$\text{eFG\%} = \frac{32.8 + 0.5 \times 11.2}{80} = \frac{38.4}{80} = 0.480$$
Turnover Percentage:
$$\text{TOV\%} = \frac{\text{TOV}}{\text{FGA} + 0.44 \times \text{FTA} + \text{TOV}}$$
The denominator approximates possessions. The 0.44 coefficient on free throw attempts accounts for the fact that not all free throws end a possession (and-ones, technical fouls, three-shot fouls).
Offensive Rebound Percentage:
$$\text{OREB\%} = \frac{\text{OREB}}{\text{OREB} + \text{Opp DREB}}$$
This measures the percentage of available offensive rebounds that the team captures.
Free Throw Rate:
$$\text{FT Rate} = \frac{\text{FTA}}{\text{FGA}}$$
Definitions vary: some sources use free throws made per field goal attempt (FT/FGA), which also captures conversion at the line, while this chapter uses attempts (FTA/FGA) to isolate how often a team gets to the line.
Oliver's Weights
Oliver estimated the relative importance of each factor through regression analysis:
| Factor | Approximate Weight |
|---|---|
| eFG% | 40% |
| TOV% | 25% |
| OREB% | 20% |
| FT Rate | 15% |
Shooting efficiency dominates. This is intuitive: basketball is fundamentally about putting the ball in the basket, and eFG% directly measures how efficiently a team does this. Turnovers are the next most important because they forfeit entire possessions. Offensive rebounding creates additional possessions, and free throw rate provides "free" points.
Implementing the Four Factors in Python
# pip install nba_api pandas numpy scikit-learn
from nba_api.stats.endpoints import leaguedashteamstats, teamestimatedmetrics
from nba_api.stats.endpoints import leaguestandings
from nba_api.stats.static import teams
import pandas as pd
import numpy as np
import time
def get_four_factors(season='2024-25', season_type='Regular Season'):
"""
Retrieve and calculate the Four Factors for all NBA teams.
Parameters:
-----------
season : str
NBA season in 'YYYY-YY' format (e.g., '2024-25')
season_type : str
'Regular Season' or 'Playoffs'
Returns:
--------
DataFrame with Four Factors for each team (offense and defense)
"""
# Offensive Four Factors
time.sleep(0.6) # Rate limiting for NBA API
off_stats = leaguedashteamstats.LeagueDashTeamStats(
season=season,
season_type_all_star=season_type,
per_mode_detailed='PerGame'
).get_data_frames()[0]
# Defensive Four Factors (opponent stats)
time.sleep(0.6)
opp_stats = leaguedashteamstats.LeagueDashTeamStats(
season=season,
season_type_all_star=season_type,
per_mode_detailed='PerGame',
measure_type_detailed_defense='Opponent'
).get_data_frames()[0]
# Calculate offensive Four Factors
four_factors = pd.DataFrame()
four_factors['team'] = off_stats['TEAM_NAME']
four_factors['team_id'] = off_stats['TEAM_ID']
four_factors['wins'] = off_stats['W']
four_factors['losses'] = off_stats['L']
four_factors['win_pct'] = off_stats['W_PCT']
    # Helper: the 'Opponent' measure type prefixes columns with 'OPP_'
    def opp_col(df, col):
        return df[f'OPP_{col}'] if f'OPP_{col}' in df.columns else df[col]

    # Offensive factors
    four_factors['off_efg'] = (
        (off_stats['FGM'] + 0.5 * off_stats['FG3M']) / off_stats['FGA']
    )
    four_factors['off_tov_pct'] = (
        off_stats['TOV'] /
        (off_stats['FGA'] + 0.44 * off_stats['FTA'] + off_stats['TOV'])
    )
    four_factors['off_oreb_pct'] = (
        off_stats['OREB'] /
        (off_stats['OREB'] + opp_col(opp_stats, 'DREB'))
    )
    four_factors['off_ft_rate'] = off_stats['FTA'] / off_stats['FGA']

    # Defensive factors (the opponent's offensive factors are your defensive factors)
    four_factors['def_efg'] = (
        (opp_col(opp_stats, 'FGM') + 0.5 * opp_col(opp_stats, 'FG3M')) /
        opp_col(opp_stats, 'FGA')
    )
    four_factors['def_tov_pct'] = (
        opp_col(opp_stats, 'TOV') /
        (opp_col(opp_stats, 'FGA') + 0.44 * opp_col(opp_stats, 'FTA') +
         opp_col(opp_stats, 'TOV'))
    )
    four_factors['def_oreb_pct'] = (
        opp_col(opp_stats, 'OREB') /
        (opp_col(opp_stats, 'OREB') + off_stats['DREB'])
    )
    four_factors['def_ft_rate'] = (
        opp_col(opp_stats, 'FTA') / opp_col(opp_stats, 'FGA')
    )
return four_factors
# Retrieve Four Factors
ff = get_four_factors(season='2024-25')
print("2024-25 NBA Four Factors (Offensive):")
print(ff[['team', 'win_pct', 'off_efg', 'off_tov_pct',
'off_oreb_pct', 'off_ft_rate']].sort_values(
'win_pct', ascending=False
).to_string(index=False, float_format='%.3f'))
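Before fitting any regression, it helps to see how far the weights from the table above get us on their own. The sketch below collapses the eight factors into a single offense-minus-defense composite using the ff DataFrame built above; the function name and sign conventions are mine, and the 40/25/20/15 weights are only the approximate values quoted earlier, not Oliver's exact coefficients.
def four_factor_score(row, weights=(0.40, 0.25, 0.20, 0.15)):
    """Weighted composite of Four Factors differentials (higher = better team)."""
    w_efg, w_tov, w_oreb, w_ft = weights
    return (
        w_efg * (row['off_efg'] - row['def_efg'])
        - w_tov * (row['off_tov_pct'] - row['def_tov_pct'])  # low own TOV%, high forced TOV% is good
        + w_oreb * (row['off_oreb_pct'] - row['def_oreb_pct'])
        + w_ft * (row['off_ft_rate'] - row['def_ft_rate'])
    )
ff['ff_score'] = ff.apply(four_factor_score, axis=1)
print(ff[['team', 'win_pct', 'ff_score']]
      .sort_values('ff_score', ascending=False)
      .head(10)
      .to_string(index=False, float_format='%.3f'))
Teams that rank highly on this crude composite should also sit near the top of the win-percentage column, which is the intuition the regression in the next section formalizes.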
Predicting Wins from the Four Factors
The Four Factors are strongly predictive of team win percentage. We can build a regression model that predicts wins from the eight factors (four offensive, four defensive):
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
def four_factors_win_model(seasons_list):
"""
Build a model predicting win percentage from the Four Factors.
Uses multiple seasons of data for training.
"""
all_data = []
for season in seasons_list:
time.sleep(0.6)
ff = get_four_factors(season=season)
ff['season'] = season
all_data.append(ff)
data = pd.concat(all_data, ignore_index=True)
    # Feature columns: four offensive and four defensive factors
    feature_cols = [
        'off_efg', 'off_tov_pct', 'off_oreb_pct', 'off_ft_rate',
        'def_efg', 'def_tov_pct', 'def_oreb_pct', 'def_ft_rate'
    ]
X = data[feature_cols].values
y = data['win_pct'].values
model = LinearRegression()
model.fit(X, y)
predictions = model.predict(X)
r2 = r2_score(y, predictions)
print(f"Four Factors Win Model (n={len(data)} team-seasons)")
print(f"R-squared: {r2:.4f}")
print(f"\nFeature Weights:")
for feat, coef in sorted(zip(feature_cols, model.coef_),
key=lambda x: abs(x[1]), reverse=True):
direction = "(higher = more wins)" if coef > 0 else "(lower = more wins)"
print(f" {feat:15s}: {coef:+.4f} {direction}")
print(f" {'Intercept':15s}: {model.intercept_:+.4f}")
return model, feature_cols
# Build the model using 5 seasons
win_model, features = four_factors_win_model(
['2020-21', '2021-22', '2022-23', '2023-24', '2024-25']
)
The R-squared of this model is typically in the range of 0.90 to 0.94, meaning the Four Factors explain over 90% of the variance in team win percentage. This is a remarkably strong relationship and demonstrates that basketball success can be decomposed into a small number of measurable components.
From Win Percentage to Point Differential
For betting purposes, we often need to convert between win percentage and expected point differential. The NBA exhibits a reliable relationship between the two, expressed through the Pythagorean win expectation:
$$\text{Win\%} \approx \frac{\text{PF}^{13.91}}{\text{PF}^{13.91} + \text{PA}^{13.91}}$$
where PF is points for and PA is points against. The exponent of 13.91 (sometimes approximated as 14) was estimated by Daryl Morey and later refined by others. For practical modeling, a simpler linear approximation is often sufficient:
$$\text{Win\%} \approx 0.5 + 0.033 \times \text{MOV}$$
where MOV is average margin of victory. This means each additional point of average margin corresponds to approximately 0.033 in win percentage, or about 2.7 wins over an 82-game season.
Inverting:
$$\text{MOV} \approx \frac{\text{Win\%} - 0.5}{0.033}$$
A team with a .600 win percentage is expected to have a margin of about +3.0 points per game. A team with a .700 win percentage is expected to have a margin of about +6.1 points per game.
def pythagorean_wins(points_for, points_against, games=82, exponent=13.91):
"""Calculate Pythagorean win expectation for an NBA team."""
win_pct = points_for**exponent / (points_for**exponent + points_against**exponent)
return win_pct * games
def margin_to_wins(margin_per_game, games=82):
"""Convert average margin of victory to expected wins (linear approx)."""
win_pct = 0.5 + 0.033 * margin_per_game
return max(0, min(games, win_pct * games))
# Example: converting team ratings to expected wins
print("Margin of Victory to Expected Wins:")
for mov in [-8, -5, -3, -1, 0, 1, 3, 5, 8, 12]:
wins = margin_to_wins(mov)
print(f" MOV {mov:+3d}: {wins:.1f} wins ({wins/82:.3f} win%)")
Pace and Possession-Based Thinking
A critical concept in basketball analytics that does not have a direct NFL analog is pace: the number of possessions a team uses per 48 minutes. Pace matters because:
- Raw statistics are pace-dependent. A team that plays at 105 possessions per game will score more points, grab more rebounds, and commit more turnovers than a team playing at 95 possessions, even if their per-possession efficiency is identical.
- Pace affects game totals. Two fast-paced teams meeting will produce more possessions and thus more total points. Two slow teams meeting will produce fewer.
- Per-possession metrics are more meaningful. Offensive rating (points per 100 possessions) is a better measure of offensive quality than points per game.
The pace formula:
$$\text{Pace} = \frac{48 \times (\text{Poss}_{\text{team}} + \text{Poss}_{\text{opp}})}{2 \times \text{Minutes Played}}$$
where possessions are estimated as:
$$\text{Poss} \approx \text{FGA} + 0.44 \times \text{FTA} - \text{OREB} + \text{TOV}$$
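These two formulas apply directly to box-score totals. A minimal sketch follows; the function names are mine and the example totals are illustrative numbers, not real game data.
def estimate_possessions(fga, fta, oreb, tov):
    """Estimate possessions used: FGA + 0.44*FTA - OREB + TOV."""
    return fga + 0.44 * fta - oreb + tov
def estimate_pace(team_box, opp_box, game_minutes=48):
    """
    Pace = possessions per 48 minutes, averaged over both teams.
    game_minutes is the game length (48 for regulation, 53 with one overtime).
    """
    poss_team = estimate_possessions(**team_box)
    poss_opp = estimate_possessions(**opp_box)
    return 48 * (poss_team + poss_opp) / (2 * game_minutes)
# Plausible single-game box totals (illustrative only)
team_box = {'fga': 88, 'fta': 22, 'oreb': 10, 'tov': 13}
opp_box = {'fga': 90, 'fta': 18, 'oreb': 11, 'tov': 14}
print(f"Estimated pace: {estimate_pace(team_box, opp_box):.1f} possessions per 48 minutes")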
def get_team_pace_and_ratings(season='2024-25'):
"""Get pace, offensive rating, and defensive rating for all teams."""
time.sleep(0.6)
advanced = leaguedashteamstats.LeagueDashTeamStats(
season=season,
season_type_all_star='Regular Season',
measure_type_detailed_defense='Advanced'
).get_data_frames()[0]
ratings = advanced[['TEAM_NAME', 'W', 'L', 'OFF_RATING', 'DEF_RATING',
'NET_RATING', 'PACE']].copy()
ratings.columns = ['team', 'wins', 'losses', 'off_rtg', 'def_rtg',
'net_rtg', 'pace']
ratings = ratings.sort_values('net_rtg', ascending=False)
print(f"2024-25 NBA Team Ratings:")
print(ratings.to_string(index=False, float_format='%.1f'))
return ratings
team_ratings = get_team_pace_and_ratings()
Predicting Game Totals from Pace
The expected number of possessions in a game is a function of both teams' pace preferences. When a fast team meets a slow team, the game pace falls somewhere between their typical rates, weighted toward the slower team (you can slow the game down more easily than you can speed it up):
$$\text{Expected Pace}_{\text{game}} = \frac{\text{Pace}_A \times \text{Pace}_B}{\text{League Avg Pace}}$$
The expected total scales each team's expected efficiency by the game's pace, where each offense is blended with the opposing defense:
$$\text{Expected Total} \approx \frac{\text{Expected Pace}_{\text{game}}}{100} \times \left(\frac{\text{ORtg}_A + \text{DRtg}_B}{2} + \frac{\text{ORtg}_B + \text{DRtg}_A}{2}\right)$$
The simple averages inside the parentheses are a convenient shortcut; a more rigorous version anchors each matchup to the league average rating (for example, team A's expected efficiency becomes $\text{ORtg}_A + \text{DRtg}_B - \text{League Avg Rtg}$), which avoids the shrinkage toward the mean that straight averaging builds in.
def predict_game_total(home_off_rtg, home_def_rtg, home_pace,
away_off_rtg, away_def_rtg, away_pace,
league_avg_off_rtg=112.0, league_avg_pace=100.0):
"""
Predict the total points in an NBA game.
Uses the matchup between each team's offense and the opponent's defense,
adjusted for pace.
"""
    # Expected game pace: both teams' pace expressed relative to league average
    expected_pace = (home_pace * away_pace) / league_avg_pace
# Expected efficiency for each team
# Home offense vs away defense
home_expected_rtg = (home_off_rtg + away_def_rtg) / 2
# Away offense vs home defense
away_expected_rtg = (away_off_rtg + home_def_rtg) / 2
# Expected points (scale by pace)
home_points = home_expected_rtg * expected_pace / 100
away_points = away_expected_rtg * expected_pace / 100
total = home_points + away_points
return {
'expected_pace': expected_pace,
'home_expected_points': home_points,
'away_expected_points': away_points,
'predicted_total': total,
'predicted_spread': home_points - away_points
}
# Example: High-powered matchup
result = predict_game_total(
home_off_rtg=118.5, home_def_rtg=110.2, home_pace=101.5,
away_off_rtg=115.0, away_def_rtg=108.5, away_pace=99.8
)
print("Game Prediction Example:")
for k, v in result.items():
print(f" {k}: {v:.1f}")
16.2 Player Impact Metrics
The Challenge of Measuring Individual Contribution
Basketball's continuous, fluid structure makes measuring individual contribution far more challenging than in sports with discrete, position-specific plays. A player's impact includes not just the plays they directly make (shots, assists, rebounds) but also their effect on spacing, defensive attention, screen-setting, help defense, and team chemistry---factors that often do not appear in the box score.
Over the past two decades, basketball analytics has developed increasingly sophisticated metrics to solve this problem. Understanding these metrics is essential for NBA modeling because player availability is the primary source of team strength variation throughout the season. Unlike the NFL, where rosters are relatively stable week-to-week, NBA teams see constant lineup variation from injuries, rest days, load management, and rotation changes.
Box Plus/Minus (BPM)
BPM, developed by Daniel Myers, estimates a player's contribution per 100 possessions relative to league average using only box score statistics. The model regresses adjusted plus/minus data on box score stats to create a formula that can be applied to any player with box score data.
The BPM formula is complex, involving interactions between multiple statistics, but the key inputs are:
$$\text{BPM} = a_1 \cdot \text{TS\%} + a_2 \cdot \text{AST\%} + a_3 \cdot \text{REB\%} + a_4 \cdot \text{STL\%} + a_5 \cdot \text{BLK\%} + a_6 \cdot \text{TOV\%} + a_7 \cdot \text{USG\%} + \ldots + \text{Position Adjustment}$$
BPM is expressed as points per 100 possessions relative to average (0.0). A BPM of +5.0 means the player's team is expected to outscore opponents by 5 more points per 100 possessions than average when that player is on the court. Elite players typically have BPM values of +6 to +10. Role players cluster around 0, and replacement-level players are around -2.0.
Strengths: Publicly available, transparent formula, applicable to historical data. Weaknesses: Cannot capture off-ball defense, spacing effects, or other "hidden" contributions.
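A quick way to make these numbers concrete is to scale a player's BPM by the share of the game he is on the floor. The helper below is a back-of-the-envelope sketch, not part of the BPM specification, and the example values are illustrative.
def bpm_team_lift(bpm, minutes, game_minutes=48):
    """Approximate team net-rating lift (per 100 possessions) from one player's BPM."""
    return bpm * minutes / game_minutes
# A +6.0 BPM player logging 34 minutes lifts his team by roughly +4.3 per 100 possessions
print(f"Team lift: {bpm_team_lift(6.0, 34):+.1f} points per 100 possessions")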
Real Plus/Minus (RPM) and RAPTOR
RPM (Real Plus/Minus), developed by Jeremias Engelmann and Steve Ilardi for ESPN, uses a regularized adjusted plus/minus framework. It starts with raw plus/minus data (how much a team outscored opponents when a player was on the court), then applies ridge regression with box-score-based priors to stabilize the estimates.
The mathematical framework:
$$\text{margin}_{s} = \sum_{i \in \text{home}_{s}} \beta_i - \sum_{j \in \text{away}_{s}} \beta_j + \text{HCA} + \epsilon_s$$
where $s$ indexes each "stint" (continuous stretch with the same 10 players on the floor), $\beta_i$ is player $i$'s impact, and HCA is home court advantage. The ridge regression prior:
$$\beta_i \sim N(\text{BPM Prior}_i, \sigma^2)$$
This Bayesian-like approach uses box score expectations as a starting point and lets the on-court data pull the estimate away from the prior when there is sufficient evidence.
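The stint-level regression can be prototyped with off-the-shelf ridge regression: each stint becomes a row with +1 indicators for the home players on the floor, -1 for the away players, and the stint's margin per 100 possessions as the target. The sketch below runs on synthetic data and shrinks coefficients toward zero rather than toward a BPM prior, so it is a simplified RAPM, not the production RPM or RAPTOR pipeline.
import numpy as np
from sklearn.linear_model import Ridge
def fit_rapm(stints, player_ids, alpha=2000.0):
    """
    Regularized adjusted plus/minus from stint data.
    stints: list of (home_players, away_players, margin_per_100) tuples.
    Returns ({player_id: impact per 100 possessions}, estimated home-court advantage).
    """
    idx = {p: i for i, p in enumerate(player_ids)}
    X = np.zeros((len(stints), len(player_ids)))
    y = np.zeros(len(stints))
    for row, (home, away, margin) in enumerate(stints):
        for p in home:
            X[row, idx[p]] = 1.0
        for p in away:
            X[row, idx[p]] = -1.0
        y[row] = margin
    model = Ridge(alpha=alpha, fit_intercept=True)  # intercept absorbs home-court advantage
    model.fit(X, y)
    return dict(zip(player_ids, model.coef_)), model.intercept_
# Tiny synthetic example: two stints between the same ten players
player_ids = [f'P{i}' for i in range(1, 11)]
stints = [
    (player_ids[:5], player_ids[5:], +6.0),  # home five outscored away five by 6 per 100
    (player_ids[:5], player_ids[5:], -2.0),
]
impacts, hca = fit_rapm(stints, player_ids)
print(f"Estimated HCA: {hca:+.2f} per 100 possessions")
In a real application, each stint would also be weighted by its possession count so that long stints carry more evidence than short ones.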
RAPTOR, developed by FiveThirtyEight, follows a similar approach but uses a hybrid of box score metrics and player tracking data to build priors (the metric is no longer updated, though its methodology remains instructive). RAPTOR separates offensive and defensive contributions:
$$\text{RAPTOR}_{\text{total}} = \text{RAPTOR}_{\text{offense}} + \text{RAPTOR}_{\text{defense}}$$
Estimated Plus-Minus (EPM)
EPM, developed by Taylor Snarr and published by Dunks & Threes, represents the current state of the art in publicly available player impact metrics. It uses a similar regularized adjusted plus/minus framework as RPM but incorporates more sophisticated priors built from tracking data and play-by-play event data.
EPM is expressed in the same units as BPM and RPM: points per 100 possessions relative to average. As of 2024-25, the top EPM players typically include the expected MVP candidates, providing face validity for the metric.
Accessing Player Metrics in Python
from nba_api.stats.endpoints import leaguedashplayerstats, playerestimatedmetrics
from nba_api.stats.endpoints import playerdashboardbygeneralsplits
def get_player_advanced_stats(season='2024-25', min_minutes=500):
"""
Get advanced player stats including BPM-related metrics from the NBA API.
"""
time.sleep(0.6)
stats = leaguedashplayerstats.LeagueDashPlayerStats(
season=season,
season_type_all_star='Regular Season',
per_mode_detailed='PerGame'
).get_data_frames()[0]
# Filter by minutes
stats['TOTAL_MIN'] = stats['MIN'] * stats['GP']
stats = stats[stats['TOTAL_MIN'] >= min_minutes].copy()
# Calculate some derived metrics
stats['TS_PCT'] = stats['PTS'] / (2 * (stats['FGA'] + 0.44 * stats['FTA']))
stats['EFG_PCT'] = (stats['FGM'] + 0.5 * stats['FG3M']) / stats['FGA']
# Get estimated metrics (includes offensive/defensive ratings)
time.sleep(0.6)
estimated = playerestimatedmetrics.PlayerEstimatedMetrics(
season=season,
season_type='Regular Season'
).get_data_frames()[0]
# Merge
merged = stats.merge(
estimated[['PLAYER_ID', 'E_OFF_RATING', 'E_DEF_RATING', 'E_NET_RATING',
'E_PACE', 'E_USG_PCT']],
on='PLAYER_ID',
how='left'
)
merged = merged.sort_values('E_NET_RATING', ascending=False)
return merged
players = get_player_advanced_stats(season='2024-25')
print("Top 20 Players by Estimated Net Rating (2024-25):")
display_cols = ['PLAYER_NAME', 'TEAM_ABBREVIATION', 'GP', 'MIN', 'PTS',
'TS_PCT', 'E_OFF_RATING', 'E_DEF_RATING', 'E_NET_RATING']
print(players[display_cols].head(20).to_string(index=False, float_format='%.1f'))
From Player Metrics to Team Projections
The key application of player impact metrics for betting is projecting how a team will perform on a given night, accounting for the specific players available. The basic framework:
$$\text{Team Rating}_{\text{tonight}} = \sum_{i=1}^{N} \text{Player Impact}_i \times \frac{\text{Expected Minutes}_i}{48} + \text{Baseline}$$
where the sum is over the $N$ players expected to play, and the baseline accounts for the average replacement-level contributions.
def project_team_strength(player_impacts, expected_minutes, baseline_rtg=100.0):
"""
Project team strength for a specific game based on
available players and their expected minutes.
Parameters:
-----------
player_impacts : dict
{player_name: impact_per_100_possessions}
expected_minutes : dict
{player_name: expected_minutes_tonight}
baseline_rtg : float
League average rating (typically ~100 for net rating)
Returns:
--------
Projected net rating for the team tonight
"""
total_impact = 0
total_minutes = 0
for player, minutes in expected_minutes.items():
impact = player_impacts.get(player, -2.0) # Replacement level default
total_impact += impact * (minutes / 48)
total_minutes += minutes
# Normalize: 240 total player-minutes per game (5 * 48)
if total_minutes > 0:
projected_net_rtg = total_impact * (240 / total_minutes)
else:
projected_net_rtg = 0
return projected_net_rtg
# Example: Projecting a team with and without their star
full_strength = {
'Star Player': 8.5,
'Second Star': 4.2,
'Starter 3': 1.5,
'Starter 4': 0.8,
'Starter 5': -0.5,
'Bench 1': -1.0,
'Bench 2': -1.5,
'Bench 3': -2.5,
}
full_minutes = {
'Star Player': 36,
'Second Star': 34,
'Starter 3': 30,
'Starter 4': 28,
'Starter 5': 26,
'Bench 1': 22,
'Bench 2': 18,
'Bench 3': 14,
}
# Without star player (minutes redistributed)
no_star_minutes = {
'Second Star': 36,
'Starter 3': 33,
'Starter 4': 32,
'Starter 5': 30,
'Bench 1': 26,
'Bench 2': 24,
'Bench 3': 20,
'Bench 4': 12, # Deep bench guy, not in regular rotation
}
no_star_impacts = {**full_strength, 'Bench 4': -3.5}
rating_full = project_team_strength(full_strength, full_minutes)
rating_no_star = project_team_strength(no_star_impacts, no_star_minutes)
print(f"Projected Net Rating (full strength): {rating_full:+.1f}")
print(f"Projected Net Rating (no star): {rating_no_star:+.1f}")
print(f"Difference: {rating_full - rating_no_star:.1f} points per 100 possessions")
print(f"Estimated spread impact: {(rating_full - rating_no_star) * 1.0:.1f} points")
# Note: 1 point of net rating ≈ 1 point of game margin for a 100-pace game
Handling Mid-Season Player Trades
One of the NBA's unique modeling challenges is the mid-season trade deadline. When a significant player changes teams, both the acquiring and losing teams change in strength. A robust model must update player availability in near-real-time:
def update_team_roster(team, date, roster_changes):
"""
Update team projection based on roster changes (trades, injuries, returns).
Parameters:
-----------
roster_changes : list of dicts
Each dict: {'player': str, 'action': 'added'|'removed',
'impact': float, 'expected_minutes': float}
"""
total_impact_change = 0
for change in roster_changes:
if change['action'] == 'added':
impact = change['impact'] * (change['expected_minutes'] / 48)
total_impact_change += impact
print(f" + {change['player']}: {change['impact']:+.1f} impact, "
f"{change['expected_minutes']} min -> {impact:+.2f} contribution")
elif change['action'] == 'removed':
# Losing a player: subtract their contribution, add replacement level
impact = -change['impact'] * (change['expected_minutes'] / 48)
replacement = -2.0 * (change['expected_minutes'] / 48)
total_impact_change += impact + replacement
print(f" - {change['player']}: lost {change['impact']:+.1f} impact, "
f"replaced at -2.0 -> {impact + replacement:+.2f} net change")
print(f" Net team strength change: {total_impact_change:+.2f} per game")
return total_impact_change
# Example: Mid-season trade
print("Trade Impact Analysis:")
print("\nTeam A acquires:")
team_a_change = update_team_roster('Team A', '2025-02-06', [
{'player': 'Star Wing', 'action': 'added', 'impact': 3.5, 'expected_minutes': 32},
{'player': 'Role Player', 'action': 'removed', 'impact': -0.5, 'expected_minutes': 22},
])
print("\nTeam B acquires:")
team_b_change = update_team_roster('Team B', '2025-02-06', [
{'player': 'Star Wing', 'action': 'removed', 'impact': 3.5, 'expected_minutes': 32},
{'player': 'Role Player', 'action': 'added', 'impact': -0.5, 'expected_minutes': 22},
{'player': 'Draft Pick', 'action': 'added', 'impact': -3.0, 'expected_minutes': 15},
])
16.3 Rest, Travel, and Schedule Effects
The Back-to-Back Effect
The NBA's compressed schedule creates one of the most well-documented edges in sports betting: the back-to-back effect. Teams playing the second game of a back-to-back (B2B) historically perform worse than expected. The magnitude of this effect is:
- Second game of B2B vs. rested opponent: approximately -2.5 to -4.0 points of impact
- Second game of B2B vs. another B2B team: approximately neutral (both teams fatigued)
- Third game in four nights: approximately -1.5 to -2.5 points
The effect is driven by physical fatigue, reduced practice time, and load management (star players resting). Modern NBA teams have become increasingly aggressive about resting players on B2Bs, which partially mitigates the fatigue effect but introduces a different source of uncertainty for bettors: will the star play?
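To use these numbers in practice, translate each side's rest situation into a point adjustment on top of a power-rating spread. The sketch below is a rough prior rather than a fitted model: the penalty values are approximate midpoints of the ranges quoted above, and the function name is mine.
def rest_spread_adjustment(home_situation, away_situation):
    """
    Point adjustment from the home team's perspective (positive favors home).
    Situations: 'rested', 'b2b' (second night of a back-to-back),
    '3in4' (third game in four nights).
    """
    penalty = {'rested': 0.0, 'b2b': -3.0, '3in4': -2.0}  # rough midpoints of the cited ranges
    return penalty[home_situation] - penalty[away_situation]
# Home team rested, away team on the second night of a back-to-back
print(f"Adjustment: {rest_spread_adjustment('rested', 'b2b'):+.1f} points toward the home side")
# Both teams on a back-to-back: the penalties cancel, consistent with 'approximately neutral'
print(f"Both on B2B: {rest_spread_adjustment('b2b', 'b2b'):+.1f} points")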
def analyze_b2b_effect(seasons=['2022-23', '2023-24', '2024-25']):
"""
Analyze the performance impact of back-to-back games.
Uses NBA API schedule data to identify B2B situations
and compare performance.
"""
from nba_api.stats.endpoints import leaguegamelog
import datetime
all_games = []
for season in seasons:
time.sleep(0.6)
gamelog = leaguegamelog.LeagueGameLog(
season=season,
season_type_all_star='Regular Season'
).get_data_frames()[0]
gamelog['season'] = season
all_games.append(gamelog)
games = pd.concat(all_games, ignore_index=True)
games['GAME_DATE'] = pd.to_datetime(games['GAME_DATE'])
games['MARGIN'] = games['PLUS_MINUS']
# Identify back-to-back games
games = games.sort_values(['TEAM_ID', 'GAME_DATE'])
games['PREV_GAME_DATE'] = games.groupby('TEAM_ID')['GAME_DATE'].shift(1)
games['DAYS_REST'] = (games['GAME_DATE'] - games['PREV_GAME_DATE']).dt.days
games['IS_B2B'] = (games['DAYS_REST'] == 1).astype(int)
# Performance by rest days
print("NBA Performance by Days of Rest:")
print(f"{'Rest Days':>10} | {'Games':>6} | {'Avg Margin':>11} | {'Win%':>6}")
print("-" * 45)
for rest in [1, 2, 3, 4, 5]:
subset = games[games['DAYS_REST'] == rest]
if len(subset) > 50:
avg_margin = subset['MARGIN'].mean()
win_pct = (subset['WL'] == 'W').mean()
print(f"{rest:>10} | {len(subset):>6} | {avg_margin:>+10.2f} | {win_pct:>5.1%}")
# B2B vs non-B2B detailed breakdown
b2b = games[games['IS_B2B'] == 1]
rested = games[games['DAYS_REST'] >= 2]
print(f"\nBack-to-Back Performance:")
print(f" B2B games: {len(b2b):>5} games, avg margin: {b2b['MARGIN'].mean():+.2f}")
print(f" Rested games: {len(rested):>5} games, avg margin: {rested['MARGIN'].mean():+.2f}")
print(f" B2B penalty: {b2b['MARGIN'].mean() - rested['MARGIN'].mean():.2f} points")
return games
rest_data = analyze_b2b_effect()
Travel Distance Impact
Not all back-to-backs are created equal. A team playing at home on consecutive nights faces less fatigue than a team that played in Los Angeles last night and flew overnight to Boston. Travel distance, direction (east-to-west is different from west-to-east due to circadian rhythms), and time zone changes all contribute to fatigue.
# NBA arena coordinates (latitude, longitude)
ARENA_COORDS = {
'ATL': (33.757, -84.396), 'BOS': (42.366, -71.062),
'BKN': (40.683, -73.975), 'CHA': (35.225, -80.839),
'CHI': (41.881, -87.674), 'CLE': (41.496, -81.688),
'DAL': (32.790, -96.810), 'DEN': (39.749, -105.008),
'DET': (42.341, -83.055), 'GSW': (37.768, -122.388),
'HOU': (29.751, -95.362), 'IND': (39.764, -86.156),
'LAC': (34.043, -118.267), 'LAL': (34.043, -118.267),
'MEM': (35.138, -90.051), 'MIA': (25.781, -80.187),
'MIL': (43.045, -87.917), 'MIN': (44.980, -93.276),
'NOP': (29.949, -90.082), 'NYK': (40.751, -73.994),
'OKC': (35.463, -97.515), 'ORL': (28.539, -81.384),
'PHI': (39.901, -75.172), 'PHX': (33.446, -112.071),
'POR': (45.532, -122.667), 'SAC': (38.580, -121.500),
'SAS': (29.427, -98.438), 'TOR': (43.643, -79.379),
'UTA': (40.768, -111.901), 'WAS': (38.898, -77.021),
}
def haversine_distance(coord1, coord2):
"""Calculate distance between two coordinates in miles."""
from math import radians, cos, sin, asin, sqrt
lat1, lon1 = radians(coord1[0]), radians(coord1[1])
lat2, lon2 = radians(coord2[0]), radians(coord2[1])
dlat = lat2 - lat1
dlon = lon2 - lon1
a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
c = 2 * asin(sqrt(a))
r = 3956 # Earth radius in miles
return c * r
def calculate_travel_impact(team_from, team_to, is_b2b=False):
    """
    Estimate the performance impact of travel.
    Returns (estimated point penalty, distance in miles);
    negative penalty = worse performance.
    """
    if team_from not in ARENA_COORDS or team_to not in ARENA_COORDS:
        return 0.0, 0.0
    distance = haversine_distance(ARENA_COORDS[team_from], ARENA_COORDS[team_to])
    # Rough time-zone estimate: one zone per ~15 degrees of longitude
    tz_change = abs(ARENA_COORDS[team_from][1] - ARENA_COORDS[team_to][1]) / 15
# Base travel impact
if distance < 500:
travel_penalty = -0.2
elif distance < 1000:
travel_penalty = -0.5
elif distance < 2000:
travel_penalty = -0.8
else:
travel_penalty = -1.2
# Time zone penalty
if tz_change >= 3:
travel_penalty -= 0.5
elif tz_change >= 2:
travel_penalty -= 0.3
# B2B amplifier
if is_b2b:
travel_penalty *= 1.5
return travel_penalty, distance
# Example travel impact calculations
print("Travel Impact Examples:")
print(f"{'Route':30s} | {'Distance':>8s} | {'Impact':>7s} | {'B2B Impact':>10s}")
print("-" * 65)
routes = [
('LAL', 'LAC', 'LA to LA (same arena)'),
('NYK', 'BKN', 'NY to Brooklyn'),
('BOS', 'PHI', 'Boston to Philly'),
('MIA', 'MIN', 'Miami to Minnesota'),
('LAL', 'BOS', 'LA to Boston'),
('POR', 'MIA', 'Portland to Miami'),
]
for team_from, team_to, label in routes:
impact, dist = calculate_travel_impact(team_from, team_to, is_b2b=False)
impact_b2b, _ = calculate_travel_impact(team_from, team_to, is_b2b=True)
print(f"{label:30s} | {dist:>7.0f}mi | {impact:>+6.2f} | {impact_b2b:>+9.2f}")
Altitude Effects
The Denver Nuggets play at 5,280 feet above sea level, creating a measurable home-court advantage above and beyond the typical NBA HCA. Visiting teams, particularly those unaccustomed to altitude, experience reduced aerobic capacity that affects their performance, especially in the second half when fatigue accumulates.
Historical analysis shows:
- Nuggets' home margin vs. expectation: approximately +1.5 to +2.5 additional points beyond normal HCA
- Utah Jazz (4,327 feet): approximately +0.5 to +1.0 additional points
- Effect concentrated in the second half: The altitude effect doubles in the second half compared to the first
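A simple way to fold these estimates into a spread model is a flat home-court bonus for the high-altitude arenas. The values below are rough midpoints of the ranges quoted above, and the helper is a sketch rather than a fitted adjustment.
def altitude_hca_bonus(home_team):
    """
    Extra home-court advantage (points) attributable to altitude,
    using rough midpoints of the ranges cited above; other teams get 0.
    For live betting, the second-half concentration noted above suggests
    applying most of this bonus after halftime.
    """
    return {'DEN': 2.0, 'UTA': 0.75}.get(home_team, 0.0)
print(f"DEN altitude bonus: +{altitude_hca_bonus('DEN'):.1f} points")
print(f"MIA altitude bonus: +{altitude_hca_bonus('MIA'):.1f} points")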
def analyze_altitude_effect(games_df):
"""
Analyze the home court advantage for high-altitude teams.
"""
# Altitude of NBA arenas (feet above sea level)
altitudes = {
'DEN': 5280, 'UTA': 4327, 'PHX': 1086, 'OKC': 1201,
'SAC': 30, 'GSW': 5, 'LAL': 340, 'LAC': 340,
'POR': 50, 'MIN': 815, 'MIL': 617, 'CHI': 597,
'IND': 715, 'CLE': 653, 'DET': 600, 'TOR': 249,
'BOS': 20, 'NYK': 33, 'BKN': 30, 'PHI': 39,
'WAS': 25, 'ATL': 1050, 'MIA': 6, 'ORL': 82,
'CHA': 751, 'NOP': -2, 'MEM': 337, 'SAS': 650,
'DAL': 430, 'HOU': 80,
}
games_home = games_df[games_df['MATCHUP'].str.contains('vs.')].copy()
# Add altitude information
games_home['altitude'] = games_home['TEAM_ABBREVIATION'].map(altitudes)
games_home['high_altitude'] = games_home['altitude'] > 3000
high_alt = games_home[games_home['high_altitude']]
low_alt = games_home[~games_home['high_altitude']]
print("Altitude Effect Analysis:")
print(f" High altitude home games (DEN, UTA): {len(high_alt)} games")
print(f" Average margin: {high_alt['MARGIN'].mean():+.2f}")
print(f" Low altitude home games: {len(low_alt)} games")
print(f" Average margin: {low_alt['MARGIN'].mean():+.2f}")
print(f" Altitude premium: {high_alt['MARGIN'].mean() - low_alt['MARGIN'].mean():.2f} pts")
return altitudes
# Analyze altitude with our rest data
altitude_analysis = analyze_altitude_effect(rest_data)
Schedule Density and Fatigue Modeling
Beyond simple back-to-backs, overall schedule density matters. A team playing its fifth game in seven days faces cumulative fatigue even if none of the games were strict back-to-backs. We can build a comprehensive fatigue index:
def calculate_fatigue_index(team_schedule, game_date, lookback_days=7):
"""
Calculate a fatigue index based on recent schedule density.
    Components used here:
    - Number of games in the lookback window
    - Days since the last game (including a back-to-back flag)
    Travel distance and cumulative star-player minutes are natural
    extensions but are not implemented in this sketch.
    Returns a fatigue score (higher = more fatigued).
"""
recent = team_schedule[
(team_schedule['GAME_DATE'] < game_date) &
(team_schedule['GAME_DATE'] >= game_date - pd.Timedelta(days=lookback_days))
]
# Component 1: Games played
games_played = len(recent)
# Component 2: Days since last game
if len(recent) > 0:
last_game = recent['GAME_DATE'].max()
days_since_last = (game_date - last_game).days
else:
days_since_last = 3
# Component 3: B2B indicator
is_b2b = 1 if days_since_last == 1 else 0
# Fatigue index formula
fatigue = (
1.0 * games_played +
2.0 * is_b2b +
max(0, (4 - days_since_last)) * 0.5
)
return {
'fatigue_index': fatigue,
'games_in_window': games_played,
'days_rest': days_since_last,
'is_b2b': is_b2b
}
def fatigue_to_spread_adjustment(home_fatigue, away_fatigue):
"""
Convert fatigue indices to a spread adjustment.
Based on regression of historical game margins on fatigue differential.
Each unit of fatigue differential ≈ 0.5 points of spread impact.
"""
fatigue_diff = away_fatigue['fatigue_index'] - home_fatigue['fatigue_index']
# Positive = away team more fatigued = home team benefit
adjustment = fatigue_diff * 0.5
return adjustment
# Example
home_fatigue = {
'fatigue_index': 3.0, # Normal schedule
'games_in_window': 3,
'days_rest': 2,
'is_b2b': 0
}
away_fatigue = {
'fatigue_index': 6.5, # Heavy schedule + B2B
'games_in_window': 4,
'days_rest': 1,
'is_b2b': 1
}
adj = fatigue_to_spread_adjustment(home_fatigue, away_fatigue)
print(f"Fatigue adjustment: {adj:+.1f} points (favoring home team)")
Has the Market Priced in Rest Effects?
A critical question is whether the betting market has already incorporated rest and travel effects. If so, there is no edge to exploit. Historical analysis suggests:
- The market has partially but not fully priced in B2B effects. Before 2018, B2B teams were consistently undervalued, leading to a profitable "fade the B2B" strategy. As this pattern became more widely known, the market improved. However, the increasing prevalence of load management has introduced new uncertainty that the market still processes imperfectly.
- Travel distance effects remain partially underpriced. The market adjusts for obvious situations (cross-country B2B flights) but may underweight more subtle fatigue from schedule density over a longer window.
- Late injury/rest announcements create windows of opportunity. When a star player's decision to rest on a B2B is announced close to tipoff, the line may not fully adjust before the game.
def backtest_b2b_strategy(games_df, min_rest_diff=1):
"""
Backtest a strategy of betting against B2B teams
when facing a rested opponent.
Note: This requires historical closing line data.
We use game results as a proxy here.
"""
games_df = games_df.copy()
games_df = games_df.sort_values(['TEAM_ID', 'GAME_DATE'])
games_df['DAYS_REST'] = (
games_df['GAME_DATE'] -
games_df.groupby('TEAM_ID')['GAME_DATE'].shift(1)
).dt.days
# Identify matchups where one team is on B2B and the other is rested
home_games = games_df[games_df['MATCHUP'].str.contains('vs.')].copy()
away_games = games_df[games_df['MATCHUP'].str.contains('@')].copy()
# Merge to get both teams' rest for each game
merged = home_games.merge(
away_games[['GAME_ID', 'TEAM_ABBREVIATION', 'DAYS_REST']],
on='GAME_ID',
suffixes=('_home', '_away')
)
# Strategy: Bet on the more rested team
merged['rest_diff'] = merged['DAYS_REST_home'] - merged['DAYS_REST_away']
# Home team rested, away on B2B
home_rested = merged[merged['rest_diff'] >= min_rest_diff]
home_rested_win = (home_rested['MARGIN'] > 0).mean()
# Away team rested, home on B2B
away_rested = merged[merged['rest_diff'] <= -min_rest_diff]
away_rested_win = (away_rested['MARGIN'] < 0).mean()
print(f"B2B Strategy Backtest (min rest diff: {min_rest_diff} days)")
print(f"\nHome rested vs away B2B: {len(home_rested)} games")
print(f" Home win rate: {home_rested_win:.1%}")
print(f" Home avg margin: {home_rested['MARGIN'].mean():+.1f}")
print(f"\nAway rested vs home B2B: {len(away_rested)} games")
print(f" Away win rate: {away_rested_win:.1%}")
print(f" Home avg margin: {away_rested['MARGIN'].mean():+.1f}")
return merged
b2b_results = backtest_b2b_strategy(rest_data, min_rest_diff=1)
16.4 Lineup-Based Modeling
Why Lineups Matter in the NBA
The NBA is fundamentally a lineup game. Unlike football, where 22 different players are on the field at any time and formations are diverse, basketball features exactly 10 players on the court. The chemistry, spacing, defensive coverage, and skill complementarity of specific five-man units can differ dramatically from what aggregate team statistics suggest.
A team's full-game statistics blend the performance of its starting lineup (typically 30-34 minutes) with various bench combinations. If the starters are +12 per 100 possessions but the bench lineup is -8 per 100 possessions, the team-level number masks significant variation. For betting, the key is projecting which lineups will play the most minutes tonight.
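The arithmetic behind that point is worth seeing once. With an assumed 32/16 split between starter and bench minutes (the split is illustrative), the team-level number sits well away from either unit:
starter_net, bench_net = 12.0, -8.0   # net rating per 100 possessions, from the example above
starter_min, bench_min = 32, 16       # assumed minutes split over a 48-minute game
blended = (starter_net * starter_min + bench_net * bench_min) / (starter_min + bench_min)
print(f"Blended team net rating: {blended:+.1f} per 100 possessions")
# +12 starters and a -8 bench blend to roughly +5.3 -- neither unit's true level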
Accessing Lineup Data
from nba_api.stats.endpoints import leaguedashlineups
def get_lineup_data(season='2024-25', min_minutes=50):
"""
Get five-man lineup data for all NBA teams.
Parameters:
-----------
min_minutes : int
Minimum minutes played together to include a lineup
"""
time.sleep(0.6)
lineups = leaguedashlineups.LeagueDashLineups(
season=season,
season_type_all_star='Regular Season',
measure_type_detailed_defense='Advanced',
group_quantity=5 # Five-man lineups
).get_data_frames()[0]
# Filter by minutes
lineups = lineups[lineups['MIN'] >= min_minutes].copy()
# Sort by net rating
lineups = lineups.sort_values('NET_RATING', ascending=False)
print(f"Five-man lineups with {min_minutes}+ minutes: {len(lineups)}")
print(f"\nTop 10 lineups by net rating:")
display_cols = ['GROUP_NAME', 'TEAM_ABBREVIATION', 'MIN', 'GP',
'OFF_RATING', 'DEF_RATING', 'NET_RATING']
print(lineups[display_cols].head(10).to_string(index=False, float_format='%.1f'))
print(f"\nBottom 10 lineups by net rating:")
print(lineups[display_cols].tail(10).to_string(index=False, float_format='%.1f'))
return lineups
lineup_data = get_lineup_data()
Net Rating by Lineup and Minutes-Weighted Projections
The core of lineup-based modeling is weighting each lineup's net rating by the minutes it is expected to play:
$$\text{Projected Net Rating} = \sum_{l=1}^{L} \frac{\text{Minutes}_l}{\text{Total Minutes}} \times \text{NetRtg}_l$$
where $l$ indexes each lineup combination and $\text{Total Minutes} = 48$ per game (or 240 total player-minutes across five positions).
def lineup_weighted_projection(team, lineup_data, injury_list=None):
"""
Project a team's net rating using minutes-weighted lineup data.
When players are injured, lineups containing those players are
excluded and minutes are redistributed.
Parameters:
-----------
team : str
Team abbreviation
lineup_data : DataFrame
Five-man lineup data from NBA API
injury_list : list of str
Names of injured players to exclude
"""
team_lineups = lineup_data[
lineup_data['TEAM_ABBREVIATION'] == team
].copy()
if injury_list:
# Remove lineups containing injured players
for player in injury_list:
team_lineups = team_lineups[
~team_lineups['GROUP_NAME'].str.contains(player, case=False, na=False)
]
if len(team_lineups) == 0:
return None
# Weight by historical minutes
total_min = team_lineups['MIN'].sum()
team_lineups['weight'] = team_lineups['MIN'] / total_min
projected_off_rtg = (team_lineups['OFF_RATING'] * team_lineups['weight']).sum()
projected_def_rtg = (team_lineups['DEF_RATING'] * team_lineups['weight']).sum()
projected_net_rtg = (team_lineups['NET_RATING'] * team_lineups['weight']).sum()
return {
'team': team,
'lineups_used': len(team_lineups),
'total_minutes': total_min,
'projected_off_rtg': projected_off_rtg,
'projected_def_rtg': projected_def_rtg,
'projected_net_rtg': projected_net_rtg,
'injuries': injury_list or []
}
# Example: Full strength vs missing a starter
full_projection = lineup_weighted_projection('BOS', lineup_data)
injury_projection = lineup_weighted_projection(
'BOS', lineup_data, injury_list=['Jaylen Brown']
)
if full_projection and injury_projection:
print("Boston Celtics Lineup Projections:")
print(f"\nFull strength:")
print(f" Lineups used: {full_projection['lineups_used']}")
print(f" Projected Net Rating: {full_projection['projected_net_rtg']:+.1f}")
print(f"\nWithout Jaylen Brown:")
print(f" Lineups used: {injury_projection['lineups_used']}")
print(f" Projected Net Rating: {injury_projection['projected_net_rtg']:+.1f}")
diff = full_projection['projected_net_rtg'] - injury_projection['projected_net_rtg']
print(f"\n Impact: {diff:.1f} points of net rating")
print(f" Approximate spread impact: {diff:.1f} points")
Injury Replacement Modeling
When a starter is injured, the team does not simply lose that player's contribution. The bench players who replace them also change the dynamics of every lineup they participate in. A proper injury model must account for:
- Direct replacement effect: The difference between the injured player's impact and their replacement
- Cascade effect: How the replacement affects the performance of other players in the lineup
- Minutes redistribution: How the coach adjusts rotation patterns
def model_injury_replacement(team, injured_player, lineup_data, player_stats):
"""
Model the full impact of losing a player to injury.
Uses lineup data to estimate how the team performs with
and without the injured player.
"""
team_lineups = lineup_data[
lineup_data['TEAM_ABBREVIATION'] == team
].copy()
# Lineups with the injured player
with_player = team_lineups[
team_lineups['GROUP_NAME'].str.contains(injured_player, case=False, na=False)
]
# Lineups without the injured player
without_player = team_lineups[
~team_lineups['GROUP_NAME'].str.contains(injured_player, case=False, na=False)
]
if len(with_player) == 0:
return {'error': f'{injured_player} not found in lineup data'}
# Historical performance with vs without
with_net = (with_player['NET_RATING'] * with_player['MIN']).sum() / with_player['MIN'].sum()
without_net = (without_player['NET_RATING'] * without_player['MIN']).sum() / without_player['MIN'].sum() if len(without_player) > 0 else 0
player_minutes_pct = with_player['MIN'].sum() / team_lineups['MIN'].sum()
# Estimated impact
# The injured player's minutes will now be played by the 'without' lineups
full_team_net = (
team_lineups['NET_RATING'] * team_lineups['MIN']
).sum() / team_lineups['MIN'].sum()
# New projection: all minutes played by 'without' lineups
if len(without_player) > 0:
new_net = without_net # All minutes now with 'without' lineups
else:
# No data without this player; estimate with replacement-level penalty
new_net = full_team_net - 5.0 # Rough estimate
impact = full_team_net - new_net
result = {
'team': team,
'injured_player': injured_player,
'minutes_with_player': f"{player_minutes_pct:.1%} of team minutes",
'net_rating_with': with_net,
'net_rating_without': without_net,
'full_team_rating': full_team_net,
'projected_rating_without': new_net,
'estimated_impact': impact,
'lineups_with': len(with_player),
'lineups_without': len(without_player)
}
return result
# Example
impact = model_injury_replacement('BOS', 'Jayson Tatum', lineup_data, players)
print("Injury Impact Analysis:")
for k, v in impact.items():
if isinstance(v, float):
print(f" {k:30s}: {v:+.1f}")
else:
print(f" {k:30s}: {v}")
Two-Man Combination Analysis
Beyond five-man lineups, two-man combinations provide larger sample sizes and useful insights about player synergy:
def get_two_man_combinations(season='2024-25', team=None, min_minutes=200):
"""
Get two-man combination data showing how pairs of players
perform together and apart.
"""
time.sleep(0.6)
combos = leaguedashlineups.LeagueDashLineups(
season=season,
season_type_all_star='Regular Season',
measure_type_detailed_defense='Advanced',
group_quantity=2
).get_data_frames()[0]
combos = combos[combos['MIN'] >= min_minutes].copy()
if team:
combos = combos[combos['TEAM_ABBREVIATION'] == team]
combos = combos.sort_values('NET_RATING', ascending=False)
print(f"Two-man combinations ({min_minutes}+ min):")
display_cols = ['GROUP_NAME', 'TEAM_ABBREVIATION', 'MIN',
'OFF_RATING', 'DEF_RATING', 'NET_RATING']
print(combos[display_cols].head(15).to_string(index=False, float_format='%.1f'))
return combos
two_man = get_two_man_combinations(season='2024-25')
16.5 NBA Betting Market Patterns
Total Movement Patterns
NBA totals exhibit distinctive betting patterns. Unlike NFL totals, which are relatively stable from open to close, NBA totals can move significantly based on injury news, rest announcements, and sharp action. Key patterns include:
- Totals tend to move toward the over. Public money heavily favors overs in the NBA, as casual bettors prefer to root for points. This creates a persistent bias where opening totals are set slightly lower than "true" to attract two-sided action.
- Large total moves often signal sharp money on the under. When a total drops by 3 or more points from the opener, it usually indicates either a significant injury (star player out reduces scoring) or sharp action on the under.
- Season-long total trends. Early in the season, totals tend to lag the actual pace of play as oddsmakers use last season's data. This can create early-season opportunities.
def analyze_total_movement(seasons=['2022-23', '2023-24']):
"""
Analyze NBA total movement patterns and their predictive value.
Note: Opening lines are not available from the NBA API.
This analysis uses closing totals and results.
"""
from nba_api.stats.endpoints import leaguegamelog
all_games = []
for season in seasons:
time.sleep(0.6)
gamelog = leaguegamelog.LeagueGameLog(
season=season,
season_type_all_star='Regular Season'
).get_data_frames()[0]
gamelog['season'] = season
all_games.append(gamelog)
games = pd.concat(all_games, ignore_index=True)
# Get unique games (each game appears twice, once per team)
home_games = games[games['MATCHUP'].str.contains('vs.')].copy()
away_games = games[games['MATCHUP'].str.contains('@')].copy()
game_totals = home_games.merge(
away_games[['GAME_ID', 'PTS']],
on='GAME_ID',
suffixes=('_home', '_away')
)
game_totals['actual_total'] = game_totals['PTS_home'] + game_totals['PTS_away']
game_totals['GAME_DATE'] = pd.to_datetime(game_totals['GAME_DATE'])
print("NBA Scoring Analysis:")
print(f" Average total points: {game_totals['actual_total'].mean():.1f}")
print(f" Median total points: {game_totals['actual_total'].median():.1f}")
print(f" Std deviation: {game_totals['actual_total'].std():.1f}")
# Distribution of totals
total_bins = [(180, 200), (200, 210), (210, 220), (220, 230),
(230, 240), (240, 250), (250, 260), (260, 280)]
print(f"\nTotal Points Distribution:")
for low, high in total_bins:
count = ((game_totals['actual_total'] >= low) &
(game_totals['actual_total'] < high)).sum()
pct = count / len(game_totals) * 100
bar = '#' * int(pct)
print(f" {low}-{high}: {count:>4} ({pct:>5.1f}%) {bar}")
# Monthly scoring trends
game_totals['month'] = game_totals['GAME_DATE'].dt.month
monthly = game_totals.groupby('month')['actual_total'].agg(['mean', 'count'])
print(f"\nMonthly Average Total Points:")
for month, row in monthly.iterrows():
print(f" Month {month:>2}: {row['mean']:.1f} ({row['count']} games)")
return game_totals
total_analysis = analyze_total_movement()
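Testing the "large drops signal sharp unders" pattern described above requires opening and closing totals, which the NBA API does not carry. The sketch below assumes a hypothetical odds_df with GAME_ID, open_total, and close_total columns from an external odds feed and joins it to the actual totals computed above; the column names and the 3-point threshold are assumptions.
def evaluate_total_drops(odds_df, game_totals, drop_threshold=3.0):
    """
    Among games where the total dropped by at least drop_threshold points
    from open to close, measure how often the under beat the CLOSING total.
    odds_df: hypothetical frame with ['GAME_ID', 'open_total', 'close_total'].
    """
    merged = game_totals.merge(odds_df, on='GAME_ID', how='inner')
    merged['move'] = merged['close_total'] - merged['open_total']
    drops = merged[merged['move'] <= -drop_threshold]
    if len(drops) == 0:
        print("No qualifying games found")
        return None
    under_rate = (drops['actual_total'] < drops['close_total']).mean()
    print(f"Drops of {drop_threshold}+ points: {len(drops)} games, "
          f"under vs. close: {under_rate:.1%}")
    return under_rate
# Usage once an odds feed is available:
# evaluate_total_drops(odds_df, total_analysis)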
Live Betting in the NBA
Live (in-game) betting has transformed NBA wagering. The NBA's scoring structure---frequent possessions, runs and counter-runs, timeouts and momentum shifts---creates a dynamic in-game line that fluctuates far more than in the NFL. Key principles for NBA live betting:
- Regression to the mean during runs. When a team goes on a 15-2 run, the live line dramatically shifts. But scoring runs regress: a team that shoots 80% from three during a run will not sustain that rate. The live bettor who recognizes this can bet against the run.
- Foul trouble. When a team's star player picks up early fouls, the live line adjusts. But the adjustment is often excessive---the star may avoid further fouls and return to dominate the second half. Conversely, the market may not adjust enough when a key player actually fouls out.
- Garbage time. NBA games frequently feature extended "garbage time" where the trailing team makes cosmetic scoring runs against the leader's reserves. This inflates late-game totals and can affect live total bets.
def model_live_regression(current_lead, minutes_remaining,
home_pregame_spread=-3.0,
historical_variance_per_min=3.2):
"""
Estimate the probability of each team winning given the
current game state.
Based on a simplified model of scoring variance.
Parameters:
-----------
current_lead : float
Home team's current lead (positive = home winning)
minutes_remaining : float
Minutes remaining in the game
home_pregame_spread : float
Pre-game spread (negative = home favored)
    historical_variance_per_min : float
        Variance of the scoring margin per minute of game time;
        the default of 3.2 implies a full-game margin SD of
        roughly sqrt(3.2 * 48) ~ 12.4 points
"""
# Expected margin at end of game
# The pregame spread implies a scoring rate differential
expected_rate = -home_pregame_spread / 48 # Points per minute expected advantage
# Expected final margin
expected_remaining_margin = expected_rate * minutes_remaining
expected_final_lead = current_lead + expected_remaining_margin
    # Uncertainty: margin variance accumulates roughly linearly with time remaining
    sigma = np.sqrt(historical_variance_per_min * minutes_remaining)
# Probability home wins (lead > 0 at end)
from scipy import stats
if sigma > 0:
prob_home_wins = stats.norm.cdf(expected_final_lead / sigma)
else:
prob_home_wins = 1.0 if expected_final_lead > 0 else 0.0
    # Fair (no-vig) moneylines; the away price mirrors the home price
    if prob_home_wins >= 1.0:
        fair_ml_home, fair_ml_away = -99900, 99900
    elif prob_home_wins <= 0.0:
        fair_ml_home, fair_ml_away = 99900, -99900
    elif prob_home_wins > 0.5:
        fair_ml_home = -100 * prob_home_wins / (1 - prob_home_wins)
        fair_ml_away = -fair_ml_home
    else:
        fair_ml_home = 100 * (1 - prob_home_wins) / prob_home_wins
        fair_ml_away = -fair_ml_home
return {
'current_lead': current_lead,
'minutes_remaining': minutes_remaining,
'expected_final_lead': expected_final_lead,
'uncertainty_sigma': sigma,
'prob_home_wins': prob_home_wins,
'fair_ml_home': fair_ml_home,
'fair_ml_away': fair_ml_away,
}
# Example: Various live game states
print("Live Game Win Probability Model:")
print(f"{'State':35s} | {'P(Home)':>8} | {'Home ML':>8} | {'Away ML':>8}")
print("-" * 70)
scenarios = [
(0, 48.0, 'Tip-off, Home -3'),
(8, 36.0, 'Home up 8, 3Q start'),
(8, 12.0, 'Home up 8, 4Q start'),
(8, 5.0, 'Home up 8, 5 min left'),
(-5, 24.0, 'Home down 5, halftime'),
(-12, 24.0, 'Home down 12, halftime'),
(15, 6.0, 'Home up 15, 6 min left'),
(-3, 2.0, 'Home down 3, 2 min left'),
]
for lead, minutes, label in scenarios:
result = model_live_regression(lead, minutes, home_pregame_spread=-3.0)
print(f"{label:35s} | {result['prob_home_wins']:>7.1%} | "
f"{result['fair_ml_home']:>+7.0f} | {result['fair_ml_away']:>+7.0f}")
Player Prop Correlations
The player props market---bets on individual player statistical performance---has exploded in the NBA. Points, rebounds, assists, three-pointers made, and combination props (PRA = points + rebounds + assists) are all widely available. Key principles for modeling player props:
- Usage rate drives variance. When teammates are injured, a player's usage rate increases, boosting their counting stats. The props market adjusts for this, but not always efficiently.
- Pace affects props. A player facing a fast-paced team will have more possessions and thus more opportunities to accumulate stats. A game with 110 possessions per team provides roughly 10% more opportunities than a game with 100.
- Defensive matchup matters. A guard facing a team that ranks 28th in perimeter defense will have a better night than one facing the league's best perimeter defense, all else equal.
- Correlation between props. Points and assists are positively correlated for most players. Points and rebounds are less correlated. Understanding these correlations is crucial for same-game parlays.
def model_player_prop(player_name, stat, opponent,
player_season_avg, player_recent_avg,
player_usage_rate, opponent_defense_rank,
pace_adjustment=1.0, injury_usage_boost=0.0):
"""
Model a player's expected statistical output for a specific game.
Parameters:
-----------
player_season_avg : float
Season average for the stat
player_recent_avg : float
Average over last 5-10 games
player_usage_rate : float
Player's usage rate (0-1)
opponent_defense_rank : int
Opponent's rank in defending this stat (1=best, 30=worst)
pace_adjustment : float
Expected pace relative to league average (1.0 = avg)
injury_usage_boost : float
Expected increase in usage due to teammate injuries (0-0.1)
"""
# Weighted average of season and recent performance
base_projection = 0.6 * player_season_avg + 0.4 * player_recent_avg
# Opponent adjustment: defense rank 1-30, centered at 15.5
# Each rank step ≈ 1-2% impact depending on the stat
defense_factor = {
'points': 0.015,
'rebounds': 0.008,
'assists': 0.012,
'threes': 0.018,
'PRA': 0.012,
}
opp_adj = (opponent_defense_rank - 15.5) * defense_factor.get(stat, 0.01)
base_projection *= (1 + opp_adj)
# Pace adjustment
base_projection *= pace_adjustment
# Usage boost from injuries
if injury_usage_boost > 0:
usage_multiplier = (player_usage_rate + injury_usage_boost) / player_usage_rate
base_projection *= min(usage_multiplier, 1.25) # Cap at 25% boost
# Standard deviation (typically 20-30% of the mean for NBA stats)
std_pct = {
'points': 0.30,
'rebounds': 0.35,
'assists': 0.35,
'threes': 0.50,
'PRA': 0.22,
}
std = base_projection * std_pct.get(stat, 0.30)
return {
'player': player_name,
'stat': stat,
'projection': base_projection,
'std': std,
'range_68': (base_projection - std, base_projection + std),
'range_95': (base_projection - 2*std, base_projection + 2*std),
}
# Example projection
proj = model_player_prop(
player_name='Jayson Tatum',
stat='points',
opponent='MIL',
player_season_avg=27.5,
player_recent_avg=30.2,
player_usage_rate=0.31,
opponent_defense_rank=18,
pace_adjustment=1.02,
injury_usage_boost=0.02
)
print("Player Prop Projection:")
for k, v in proj.items():
if isinstance(v, tuple):
print(f" {k:15s}: ({v[0]:.1f}, {v[1]:.1f})")
elif isinstance(v, float):
print(f" {k:15s}: {v:.1f}")
else:
print(f" {k:15s}: {v}")
# Evaluate the prop line
from scipy import stats as scipy_stats
prop_line = 27.5
prob_over = 1 - scipy_stats.norm.cdf(
prop_line, loc=proj['projection'], scale=proj['std']
)
print(f"\nProp line: {prop_line}")
print(f"Probability OVER: {prob_over:.1%}")
print(f"Fair odds: {-100 * prob_over / (1 - prob_over):+.0f} / "
f"{100 * (1 - prob_over) / prob_over:+.0f}")
Tanking Teams and End-of-Season Dynamics
The NBA's draft lottery system creates perverse incentives for teams eliminated from playoff contention. "Tanking"---deliberately losing games to improve draft position---is officially prohibited but widely practiced through rest, lineup choices, and effort level. This creates a unique late-season dynamic:
- Teams eliminated from playoff contention may rest veteran players, give developmental players heavy minutes, and generally perform below their stated ability.
- Teams fighting for playoff seeding are maximally motivated and often outperform their season-average metrics.
- The market sometimes fails to fully adjust for tanking incentives, particularly early in the tanking window (February/March) when the intent is less obvious.
def analyze_tanking_patterns(seasons=['2022-23', '2023-24']):
    """
    Analyze performance patterns of teams below playoff contention
    in the second half of the season.
    """
    import time
    import pandas as pd
    from nba_api.stats.endpoints import leaguestandings, leaguegamelog

    for season in seasons:
        time.sleep(0.6)
        standings = leaguestandings.LeagueStandings(
            season=season,
            season_type='Regular Season'
        ).get_data_frames()[0]

        time.sleep(0.6)
        gamelog = leaguegamelog.LeagueGameLog(
            season=season,
            season_type_all_star='Regular Season'
        ).get_data_frames()[0]
        gamelog['GAME_DATE'] = pd.to_datetime(gamelog['GAME_DATE'])

        # Split the season near the All-Star break (game 50 used as a simple proxy)
        team_games = gamelog.sort_values(['TEAM_ID', 'GAME_DATE']).reset_index(drop=True)
        team_games['game_num'] = team_games.groupby('TEAM_ID').cumcount() + 1
        first_half = team_games[team_games['game_num'] <= 50]
        second_half = team_games[team_games['game_num'] > 50]

        # Identify bottom teams by first-half win percentage
        first_half_records = first_half.groupby('TEAM_ABBREVIATION').agg(
            wins=('WL', lambda x: (x == 'W').sum()),
            games=('WL', 'count')
        )
        first_half_records['win_pct'] = first_half_records['wins'] / first_half_records['games']

        # Bottom 6 and top 6 teams by first-half record
        bottom_teams = first_half_records.nsmallest(6, 'win_pct').index.tolist()
        top_teams = first_half_records.nlargest(6, 'win_pct').index.tolist()

        # Second-half performance
        bottom_second = second_half[second_half['TEAM_ABBREVIATION'].isin(bottom_teams)]
        top_second = second_half[second_half['TEAM_ABBREVIATION'].isin(top_teams)]
        bottom_win_pct = (bottom_second['WL'] == 'W').mean()
        top_win_pct = (top_second['WL'] == 'W').mean()

        # Compare to first half
        bottom_first_pct = first_half_records.loc[bottom_teams, 'win_pct'].mean()
        top_first_pct = first_half_records.loc[top_teams, 'win_pct'].mean()

        print(f"\n{season} Season Split Analysis:")
        print(f" Bottom 6 teams: {bottom_teams}")
        print(f" First half win%: {bottom_first_pct:.1%}")
        print(f" Second half win%: {bottom_win_pct:.1%}")
        print(f" Change: {bottom_win_pct - bottom_first_pct:+.1%}")
        print(f" Top 6 teams: {top_teams}")
        print(f" First half win%: {top_first_pct:.1%}")
        print(f" Second half win%: {top_win_pct:.1%}")
        print(f" Change: {top_win_pct - top_first_pct:+.1%}")

analyze_tanking_patterns()
Putting It All Together: A Complete NBA Game Analysis
def complete_nba_analysis(home_team, away_team, season,
                          team_ratings, lineup_data,
                          home_injuries=None, away_injuries=None,
                          home_b2b=False, away_b2b=False,
                          market_spread=None, market_total=None):
    """
    Complete NBA game analysis combining all Chapter 16 tools.
    Spreads are reported in betting convention: negative = home favored.
    """
    print(f"{'='*65}")
    print(f"NBA GAME ANALYSIS: {away_team} @ {home_team}")
    print(f"Season: {season}")
    if market_spread:
        print(f"Market: {home_team} {market_spread:+.1f}, O/U {market_total}")
    print(f"{'='*65}")

    # 1. Base team ratings
    home_rtg = team_ratings[team_ratings['team'].str.contains(home_team, case=False)]
    away_rtg = team_ratings[team_ratings['team'].str.contains(away_team, case=False)]
    if len(home_rtg) > 0 and len(away_rtg) > 0:
        home_rtg = home_rtg.iloc[0]
        away_rtg = away_rtg.iloc[0]
        print(f"\n--- TEAM RATINGS ---")
        print(f" {home_team}: ORtg {home_rtg['off_rtg']:.1f}, "
              f"DRtg {home_rtg['def_rtg']:.1f}, "
              f"Net {home_rtg['net_rtg']:+.1f}, "
              f"Pace {home_rtg['pace']:.1f}")
        print(f" {away_team}: ORtg {away_rtg['off_rtg']:.1f}, "
              f"DRtg {away_rtg['def_rtg']:.1f}, "
              f"Net {away_rtg['net_rtg']:+.1f}, "
              f"Pace {away_rtg['pace']:.1f}")

        # 2. Spread prediction from ratings
        hca = 3.0  # NBA home court advantage (~3 points)
        net_diff = home_rtg['net_rtg'] - away_rtg['net_rtg']
        base_margin = net_diff + hca      # Expected home margin of victory
        base_spread = -base_margin        # Betting convention: negative = home favored
        print(f"\n--- BASE PREDICTION ---")
        print(f" Net rating differential: {net_diff:+.1f}")
        print(f" Home court advantage: +{hca:.1f}")
        print(f" Base predicted spread: {home_team} {base_spread:+.1f}")

    # 3. Rest adjustments (in margin terms: positive = benefits home team)
    rest_adj = 0
    if home_b2b and not away_b2b:
        rest_adj = -2.5  # Home team penalized
    elif away_b2b and not home_b2b:
        rest_adj = 2.5   # Away team penalized (benefits home)
    elif home_b2b and away_b2b:
        rest_adj = 0     # Both fatigued
    if rest_adj != 0:
        print(f"\n--- REST ADJUSTMENT ---")
        if home_b2b:
            print(f" {home_team} on back-to-back")
        if away_b2b:
            print(f" {away_team} on back-to-back")
        print(f" Rest adjustment: {rest_adj:+.1f} points")

    # 4. Injury adjustments using lineup data (also in margin terms)
    injury_adj = 0
    if home_injuries or away_injuries:
        print(f"\n--- INJURY ADJUSTMENTS ---")
    if home_injuries:
        home_proj = lineup_weighted_projection(home_team, lineup_data, home_injuries)
        home_full = lineup_weighted_projection(home_team, lineup_data)
        if home_proj and home_full:
            home_injury_impact = home_full['projected_net_rtg'] - home_proj['projected_net_rtg']
            injury_adj -= home_injury_impact
            print(f" {home_team} missing: {', '.join(home_injuries)}")
            print(f" Impact: {-home_injury_impact:+.1f} points")
    if away_injuries:
        away_proj = lineup_weighted_projection(away_team, lineup_data, away_injuries)
        away_full = lineup_weighted_projection(away_team, lineup_data)
        if away_proj and away_full:
            away_injury_impact = away_full['projected_net_rtg'] - away_proj['projected_net_rtg']
            injury_adj += away_injury_impact
            print(f" {away_team} missing: {', '.join(away_injuries)}")
            print(f" Impact: {away_injury_impact:+.1f} points (benefits {home_team})")

    # 5. Final prediction (betting convention: negative = home favored)
    final_spread = (-(base_margin + rest_adj + injury_adj)
                    if len(home_rtg) > 0 and len(away_rtg) > 0 else None)
    if final_spread is not None:
        print(f"\n--- FINAL PREDICTION ---")
        print(f" Model spread: {home_team} {final_spread:+.1f}")
        if market_spread:
            edge = final_spread - market_spread
            print(f" Market spread: {home_team} {market_spread:+.1f}")
            print(f" Edge: {edge:+.1f} points")
            if abs(edge) >= 2.0:
                # Negative edge: model likes the home side more than the market does
                side = home_team if edge < 0 else away_team
                print(f" ** POTENTIAL EDGE on {side} **")

    # 6. Total prediction
    if len(home_rtg) > 0 and len(away_rtg) > 0:
        total_pred = predict_game_total(
            home_rtg['off_rtg'], home_rtg['def_rtg'], home_rtg['pace'],
            away_rtg['off_rtg'], away_rtg['def_rtg'], away_rtg['pace']
        )
        print(f"\n Predicted total: {total_pred['predicted_total']:.1f}")
        if market_total:
            total_edge = total_pred['predicted_total'] - market_total
            print(f" Market total: {market_total}")
            print(f" Edge: {total_edge:+.1f}")

    print(f"\n{'='*65}")

# Example analysis
complete_nba_analysis(
    home_team='BOS', away_team='MIL',
    season='2024-25',
    team_ratings=team_ratings,
    lineup_data=lineup_data,
    away_injuries=['Giannis Antetokounmpo'],
    away_b2b=True,
    market_spread=-7.5,
    market_total=223.5
)
16.6 Chapter Summary
This chapter provided a comprehensive framework for modeling the NBA from a betting perspective, covering the sport's unique data, metrics, schedule effects, and market dynamics.
Key Takeaways
The Four Factors:
- Dean Oliver's Four Factors (eFG%, TOV%, OREB%, FT Rate) explain over 90% of the variance in NBA team win percentage.
- Shooting efficiency (eFG%) is by far the most important factor, followed by turnovers, offensive rebounding, and free throw rate.
- Pace is a critical adjustment factor: per-possession metrics are more meaningful than per-game stats, and pace directly drives game totals.
- The Pythagorean win expectation (with exponent ~13.91) provides a reliable mapping between point differential and win percentage.
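For reference, the Pythagorean mapping cited above can be written in a few lines. This is a minimal sketch: the per-game scoring inputs are made-up illustrative numbers, and the function simply applies the exponent from the takeaway.

def pythagorean_wins(points_for, points_against, games=82, exponent=13.91):
    """Expected wins from scoring averages via the Pythagorean formula."""
    win_pct = points_for**exponent / (points_for**exponent + points_against**exponent)
    return win_pct * games

# Example: a team scoring 117.0 and allowing 111.5 points per game (illustrative)
print(f"Expected wins: {pythagorean_wins(117.0, 111.5):.1f}")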
Player Impact:
- BPM, RPM, RAPTOR, and EPM each attempt to measure individual player impact per 100 possessions. EPM (and similar regularized adjusted plus/minus metrics) represents the current state of the art.
- Player availability is the primary source of nightly team strength variation. A robust NBA model must update projections based on who is playing tonight.
- Minutes-weighted lineup projections capture the full impact of player absence, including cascade effects on bench usage and lineup chemistry.
Rest and Travel:
- The back-to-back effect is one of the best-documented patterns in NBA betting, worth approximately 2.5 to 4.0 points depending on the opponent's rest situation and the travel involved.
- The market has improved at pricing B2B effects but still does not fully account for cumulative schedule fatigue and late rest decisions.
- Altitude (Denver, Utah) provides a measurable additional home-court advantage that compounds with visitor fatigue.
Market Patterns:
- NBA totals attract heavy public money on the over. Understanding the distribution of actual totals helps identify mispriced lines.
- Live betting regression to the mean is a powerful concept: scoring runs are temporary, and the live market often overreacts to them (see the sketch below).
- Player prop modeling requires attention to usage rate, pace, defensive matchup, and teammate availability. Correlations between props are important for same-game parlay analysis.
- Tanking dynamics in the second half of the season create a distinctive pattern in which eliminated teams underperform their established level.
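A minimal sketch of the regression-to-the-mean idea: blend the margin already on the scoreboard with the pregame expectation scaled to the time remaining, rather than extrapolating the current run. The linear blending rule and the numbers below are illustrative assumptions, not a calibrated live model.

def live_margin_projection(current_margin, minutes_remaining,
                           pregame_expected_margin, total_minutes=48.0):
    """
    Project the final margin by keeping the margin already banked and
    assuming the rest of the game plays to the pregame expectation,
    scaled to the fraction of the game remaining.
    """
    remaining_fraction = minutes_remaining / total_minutes
    return current_margin + pregame_expected_margin * remaining_fraction

# Illustrative: home team leads by 14 after one quarter but was
# only a 3-point pregame favorite.
print(live_margin_projection(current_margin=14, minutes_remaining=36,
                             pregame_expected_margin=3.0))
# Naively extrapolating the first-quarter run would imply a blowout;
# the regression view projects roughly 14 + 3 * 0.75 ≈ 16.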
Comparing NFL and NBA Modeling
| Dimension | NFL (Chapter 15) | NBA (Chapter 16) |
|---|---|---|
| Sample size | 17 games/team | 82 games/team |
| Data richness | Play-by-play + tracking | Box score + tracking + lineup |
| Key metric | EPA/play | Net Rating per 100 poss |
| Position dominance | QB overwhelms all | More distributed |
| Schedule effects | Weekly, minimal fatigue | B2B, travel, altitude |
| Market efficiency | Very high on spreads | High, but rest edges exist |
| Key numbers | 3, 7 (margin clustering) | None (continuous scoring) |
| Total variance | High (weather, game flow) | Moderate (pace-driven) |
| Live betting | Limited by game structure | Rich opportunities |
| Player props | Growing market | Massive market |
Looking Ahead
Chapter 17 will apply our modeling framework to Major League Baseball, where the large sample of 162 games, the dominance of starting pitching, and the sport's historical embrace of analytics create yet another distinct modeling environment. The concepts of possession-based efficiency and player impact that we developed here will transfer, but the specific metrics and market dynamics will differ substantially.
Practice Exercises
- Four Factors Regression: Download team-level Four Factors data for the past 5 NBA seasons. Build a regression model predicting win percentage from the eight factors (four offensive, four defensive). What is the R-squared? Which factor has the largest coefficient? How does the model perform out of sample?
- B2B Profit Simulation: Using NBA schedule data, identify all back-to-back situations for the 2023-24 season. Assuming a flat -110 line and a strategy of always betting against the B2B team (when the opponent is rested), simulate the season-long profit/loss. How sensitive is the result to the juice assumption?
- Player Impact Analysis: Choose an NBA team and use the NBA API to pull their five-man lineup data. Calculate the minutes-weighted net rating for the full team. Then simulate removing each starter and recalculating. Which player has the largest impact? Does this align with that player's BPM/EPM?
- Pace and Totals: Build a totals prediction model using pace and offensive/defensive ratings. Evaluate it against market totals for the 2024-25 season. Does incorporating pace improve predictions compared to a model using only ratings?
- Player Prop Model: Select a star player and build a Bayesian model for their points scored. Use their season average as the prior, update with their last 5 games (recency weighting), and adjust for the opponent's defensive ranking. How does your model's projection compare to the sportsbook's prop line? Track the results over 10 games to evaluate accuracy.
- Tanking Detection: Analyze the performance of teams eliminated from playoff contention in the final 20 games of the 2023-24 season. Compare their ATS (against the spread) record to their ATS record earlier in the season. Is there a statistically significant difference?
Further Reading
- Oliver, Dean. Basketball on Paper: Rules and Tools for Performance Analysis. Potomac Books, 2004.
- Kubatko, Justin, et al. "A Starting Point for Analyzing Basketball Statistics." Journal of Quantitative Analysis in Sports, 2007.
- Engelmann, Jeremias. "A New Player Evaluation Technique for Players of the National Basketball Association (NBA)." MIT Sloan Sports Analytics Conference, 2011.
- Silver, Nate. "How Our RAPTOR Metric Works." FiveThirtyEight, 2019.
- Snarr, Taylor. "Introducing EPM: Estimated Plus-Minus." Dunks & Threes, 2022.
- Pelton, Kevin. "NBA Statistical Primer." ESPN, updated annually.
- Goldman, Michael and Richard Puerzer. "NBA Betting Market Efficiency." Journal of Prediction Markets, 2016.
- Cooper, Harris et al. "The Impact of Rest on NBA Player Performance." MIT Sloan Sports Analytics Conference, 2020.
- Berri, David J., and Martin B. Schmidt. Stumbling on Wins: Two Economists Expose the Pitfalls on the Road to Victory in Professional Sports. FT Press, 2010.
- Hollinger, John. Pro Basketball Forecast. Potomac Books, 2005.