Case Study: Logistic Regression for NBA Moneyline Predictions


Executive Summary

Predicting the winner of an NBA game is a binary classification problem: the home team wins or it does not. Logistic regression is the natural statistical tool for this task, mapping a set of pre-game features to a win probability between 0 and 1. This case study builds a complete logistic regression pipeline for NBA moneyline prediction, from feature engineering through model evaluation and practical betting deployment. We emphasize calibration --- the alignment between predicted probabilities and observed win frequencies --- because in betting, a probability estimate is only useful if it can be trusted at face value when compared against the sportsbook's implied odds.


Background

The NBA Moneyline Market

A moneyline bet is the simplest wager in sports: pick the winner. The sportsbook expresses each team's price through odds that imply a win probability. For example, a home team at -180 carries an implied probability of approximately 64.3% (before vig removal). The bettor's objective is to identify games where their estimated probability meaningfully exceeds the market's implied probability.
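The odds-to-probability conversion can be sketched in a few lines (a minimal helper for illustration, not part of any particular library):

```python
def implied_prob(american_odds: int) -> float:
    """Convert American odds to the implied win probability (vig included)."""
    if american_odds < 0:
        # Favorite: risk |odds| to win 100
        return -american_odds / (-american_odds + 100)
    # Underdog: risk 100 to win `odds`
    return 100 / (american_odds + 100)

print(round(implied_prob(-180), 3))  # 0.643, matching the example above
```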

The NBA plays an 82-game regular season with 30 teams, producing 1,230 games per season. Home teams win approximately 58--60% of the time in recent seasons, a strong baseline that any model must beat to have predictive value. The sheer volume of games --- compared to the NFL's 272 --- makes the NBA an attractive sport for model-based betting, as the larger sample accelerates the detection of genuine edges.

Why Logistic Regression?

Logistic regression is the preferred tool for this problem for several reasons. First, it directly models the probability of a binary outcome without the awkwardness of linear regression producing probabilities outside [0, 1]. Second, it is interpretable: each coefficient tells you how a feature affects the log-odds of winning. Third, it tends to produce well-calibrated probabilities out of the box, which is critical for betting applications. Fourth, its simplicity makes it resistant to overfitting on the relatively small per-season samples available in sports.


Data and Feature Engineering

Dataset Construction

We construct a dataset covering five NBA regular seasons (2019-20 through 2023-24), totaling roughly 6,150 games (slightly fewer in practice, since the 2019-20 season was cut short by the pandemic and 2020-21 used a 72-game schedule). For each game, features are computed using only information available before tip-off.

Feature Definitions

Team Strength Features:

  • home_net_rating: Home team's net rating (offensive rating minus defensive rating), computed as a rolling average over the last 15 games. Net rating is the single most predictive team-level statistic in basketball analytics.
  • away_net_rating: Away team's net rating, computed identically.
  • net_rating_diff: The difference home_net_rating - away_net_rating. This is the primary predictor.

Home Court Advantage:

  • home_court: Always 1 in our modeling framework (since we always predict from the home team's perspective). The coefficient on this variable captures the average home-court advantage.

Rest and Schedule Features:

  • home_rest_days: Days since the home team's last game. Values of 0 indicate a back-to-back.
  • away_rest_days: Days since the away team's last game.
  • home_b2b: Binary indicator --- 1 if the home team is on the second game of a back-to-back.
  • away_b2b: Binary indicator for the away team.
  • rest_advantage: home_rest_days - away_rest_days.

Travel and Fatigue:

  • away_travel_miles: Approximate distance the away team traveled for this game, computed from team city coordinates. Long-distance travel (cross-country games) correlates with reduced performance.

Recent Form:

  • home_last10_winpct: Home team's win percentage over their last 10 games.
  • away_last10_winpct: Away team's win percentage over their last 10 games.

Strength of Schedule Adjustment:

  • home_sos: Average net rating of the home team's opponents over their last 15 games. A 10-5 record against strong opponents means more than a 10-5 record against weak ones.
  • away_sos: Same for the away team.

Temporal Integrity

All rolling averages use only games played before the current game date. Early-season estimates (first 10 games) are blended with the previous season's end-of-year statistics using a shrinkage factor of $w_t = \min(g_t / 20, 1)$, where $g_t$ is the number of games played in the current season.
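The blending rule above can be sketched directly (the function name and inputs are illustrative):

```python
def blended_net_rating(current_avg, prior_season_avg, games_played, full_weight_at=20):
    """Blend current-season and prior-season statistics early in the year.

    Implements w_t = min(g_t / 20, 1): the weight on current-season data
    grows linearly until 20 games, after which the prior season is ignored.
    """
    w = min(games_played / full_weight_at, 1.0)
    return w * current_avg + (1 - w) * prior_season_avg

print(blended_net_rating(5.0, -2.0, 10))  # halfway blend: 0.5*5.0 + 0.5*(-2.0) = 1.5
```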


Model Building

Preprocessing

Features are standardized using StandardScaler fitted only on the training data. Standardization is important for logistic regression because it ensures the regularization penalty treats all features equally, regardless of their original scale.

Train-Test Split

We use a strict temporal split:

  • Training: 2019-20 through 2022-23 (approximately 4,920 games)
  • Testing: 2023-24 (1,230 games)

This maintains temporal integrity and simulates real-world deployment.

Model Fitting

We fit a logistic regression using scikit-learn with L2 regularization (Ridge penalty). The regularization parameter $C$ is selected via 5-fold time-series cross-validation on the training set, searching over $C \in \{0.01, 0.1, 0.5, 1.0, 5.0, 10.0\}$. The optimal value is $C = 1.0$.
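A sketch of this tuning procedure in scikit-learn, using random placeholder data in place of the real feature matrix; wrapping the scaler and classifier in a pipeline ensures the scaler is refit on each training fold:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))                          # placeholder features, in date order
y = (X[:, 0] + rng.normal(size=500) > 0).astype(int)   # placeholder labels

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(penalty="l2", max_iter=1000)),
])
grid = GridSearchCV(
    pipe,
    param_grid={"clf__C": [0.01, 0.1, 0.5, 1.0, 5.0, 10.0]},
    cv=TimeSeriesSplit(n_splits=5),   # folds respect temporal ordering
    scoring="neg_log_loss",
)
grid.fit(X, y)
print(grid.best_params_)
```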

Coefficient Interpretation

The fitted model yields the following coefficients (from synthetic data):

| Feature                     | Coefficient               | Odds Ratio |
|-----------------------------|---------------------------|------------|
| Intercept                   | 0.32                      | ---        |
| net_rating_diff             | 0.38                      | 1.46       |
| home_court                  | (absorbed into intercept) | ---        |
| rest_advantage              | 0.09                      | 1.09       |
| home_b2b                    | -0.21                     | 0.81       |
| away_b2b                    | 0.18                      | 1.20       |
| away_travel_miles (scaled)  | 0.06                      | 1.06       |
| home_last10_winpct          | 0.15                      | 1.16       |
| away_last10_winpct          | -0.12                     | 0.89       |
| home_sos                    | 0.04                      | 1.04       |

Key interpretations:

The intercept of 0.32 reflects the baseline home-court advantage. In log-odds terms, a neutral game (all features at their scaled mean of zero) gives the home team log-odds of 0.32, corresponding to a probability of $\sigma(0.32) = 0.579$, or about 58% --- closely matching the observed NBA home win rate.

The coefficient of 0.38 on net_rating_diff means that a one-standard-deviation increase in the net rating difference (home team better) increases the log-odds of a home win by 0.38. The odds ratio of 1.46 means the odds of a home win multiply by 1.46 for each standard-deviation increase in net rating advantage.

A home back-to-back (home_b2b = 1) reduces the log-odds by 0.21. For an otherwise neutral game (all other features at zero), the home log-odds drop from 0.32 to 0.11, and $\sigma(0.11) = 0.527$: the back-to-back cuts the home win probability from about 58% (with home-court) to approximately 53%.
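These conversions are just the logistic function applied to the fitted log-odds:

```python
import math

def sigmoid(z: float) -> float:
    """Map log-odds to a probability."""
    return 1 / (1 + math.exp(-z))

print(round(sigmoid(0.32), 3))         # 0.579: baseline home win probability
print(round(sigmoid(0.32 - 0.21), 3))  # 0.527: home team on a back-to-back
```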


Model Evaluation

Discrimination Metrics

On the 2023-24 test set:

  • Accuracy: 65.2% (vs. 58.5% home-win baseline)
  • AUC-ROC: 0.71
  • Log-loss: 0.62 (vs. 0.68 for a naive model predicting the home team always at 58.5%)

The model significantly outperforms the naive baseline across all metrics. An AUC of 0.71 indicates good discriminative ability: given a randomly chosen home win and a randomly chosen home loss, the model assigns the higher predicted probability to the win about 71% of the time.

Calibration Assessment

Calibration is the most important evaluation for a betting model. We group the model's predicted probabilities into 0.10-wide bins and compare against observed win frequencies:

| Predicted Prob Bin | Mean Predicted | Observed Win% | Count |
|--------------------|----------------|---------------|-------|
| 0.30 -- 0.40       | 0.36           | 0.34          | 89    |
| 0.40 -- 0.50       | 0.45           | 0.43          | 156   |
| 0.50 -- 0.60       | 0.55           | 0.56          | 298   |
| 0.60 -- 0.70       | 0.65           | 0.64          | 312   |
| 0.70 -- 0.80       | 0.74           | 0.72          | 243   |
| 0.80 -- 0.90       | 0.84           | 0.86          | 102   |
| 0.90 -- 1.00       | 0.93           | 0.90          | 30    |

The calibration is excellent. Predicted probabilities closely match observed frequencies across all bins. The calibration plot (predicted vs. observed) hugs the 45-degree line, with a Brier score of 0.212. This means the model's probability outputs can be trusted for direct comparison against sportsbook-implied probabilities.
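The binned calibration check and the Brier score are straightforward to reproduce with NumPy; this sketch runs on synthetic, perfectly calibrated predictions rather than the case study's actual test set:

```python
import numpy as np

def calibration_table(p_pred, y_true, edges=None):
    """Group predictions into 0.10-wide bins; compare mean prediction to win rate."""
    edges = np.arange(0.3, 1.01, 0.1) if edges is None else edges
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (p_pred >= lo) & (p_pred < hi)
        if mask.any():
            rows.append((lo, hi, p_pred[mask].mean(), y_true[mask].mean(), int(mask.sum())))
    return rows

def brier_score(p_pred, y_true):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return float(np.mean((p_pred - y_true) ** 2))

rng = np.random.default_rng(1)
p = rng.uniform(0.3, 0.95, size=2000)            # synthetic predicted probabilities
y = (rng.uniform(size=2000) < p).astype(float)   # outcomes drawn to match them exactly
for lo, hi, mean_pred, observed, n in calibration_table(p, y):
    print(f"{lo:.2f}-{hi:.2f}  pred={mean_pred:.2f}  obs={observed:.2f}  n={n}")
print("Brier:", round(brier_score(p, y), 3))
```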

Brier Score Decomposition

The Brier score of 0.212 can be decomposed (via the Murphy decomposition, Brier = reliability - resolution + uncertainty; indeed 0.002 - 0.038 + 0.248 = 0.212) into:

  • Reliability: 0.002 (excellent calibration --- near zero)
  • Resolution: 0.038 (moderate ability to separate outcomes)
  • Uncertainty: 0.248 (inherent unpredictability of NBA games)

The model's primary strength is its reliability (calibration), which is exactly what matters for betting.
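The decomposition itself can be sketched over probability bins (the identity Brier = reliability - resolution + uncertainty holds up to a small within-bin variance term when forecasts vary inside a bin):

```python
import numpy as np

def brier_decomposition(p_pred, y_true, n_bins=10):
    """Murphy decomposition over equal-width probability bins."""
    y_bar = y_true.mean()
    uncertainty = y_bar * (1 - y_bar)
    bins = np.clip((p_pred * n_bins).astype(int), 0, n_bins - 1)
    n = len(y_true)
    reliability = resolution = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            p_k, o_k = p_pred[mask].mean(), y_true[mask].mean()
            reliability += mask.sum() * (p_k - o_k) ** 2   # miscalibration within bin
            resolution += mask.sum() * (o_k - y_bar) ** 2  # spread of bin outcomes
    return reliability / n, resolution / n, uncertainty

rng = np.random.default_rng(2)
p = rng.uniform(0.3, 0.95, size=5000)            # synthetic calibrated forecasts
y = (rng.uniform(size=5000) < p).astype(float)
rel, res, unc = brier_decomposition(p, y)
print(round(rel, 4), round(res, 4), round(unc, 4))
```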


Betting Application

Edge Computation

For each game in the test set, we compute the edge as:

$$\text{Edge} = p_{\text{model}} - p_{\text{market}}$$

where $p_{\text{market}}$ is the sportsbook's no-vig implied probability (computed by normalizing both sides' implied probabilities to sum to 1.0).
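A minimal helper for the no-vig normalization and the edge (function and variable names are illustrative):

```python
def no_vig_probs(home_odds: int, away_odds: int):
    """Strip the vig by normalizing both sides' implied probabilities to sum to 1."""
    def implied(o):
        return -o / (-o + 100) if o < 0 else 100 / (o + 100)
    p_home, p_away = implied(home_odds), implied(away_odds)
    total = p_home + p_away   # > 1.0; the excess is the sportsbook's margin
    return p_home / total, p_away / total

p_market_home, _ = no_vig_probs(-180, +155)
edge = 0.68 - p_market_home   # hypothetical model probability of 0.68
print(round(p_market_home, 3), round(edge, 3))  # 0.621 0.059
```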

Betting Strategy

We implement a simple strategy: bet on the side (home or away) where the model's probability exceeds the market's no-vig probability by at least a threshold $\tau$. We test several thresholds:

| Threshold ($\tau$) | Bets Placed | Win Rate | Avg Odds | ROI    | Units Profit |
|--------------------|-------------|----------|----------|--------|--------------|
| 0.02 (2 pp)        | 412         | 54.1%    | -112     | +2.1%  | +8.7         |
| 0.03 (3 pp)        | 298         | 55.0%    | -108     | +3.8%  | +11.3        |
| 0.05 (5 pp)        | 164         | 56.7%    | -104     | +5.9%  | +9.7         |
| 0.07 (7 pp)        | 82          | 58.5%    | -101     | +8.2%  | +6.7         |
| 0.10 (10 pp)       | 34          | 61.8%    | +102     | +13.1% | +4.5         |

The pattern is clear: higher thresholds produce higher win rates and ROI but fewer bets. This is the classic precision-recall tradeoff in betting: stricter filters improve accuracy at the cost of opportunity volume.
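The threshold rule itself is a few lines; this sketch takes (model, market) home-win probability pairs and decides which side, if either, to bet:

```python
def pick_bets(games, tau=0.05):
    """Return 'home', 'away', or None per game, betting whichever side's
    model probability beats the no-vig market probability by at least tau."""
    bets = []
    for p_model, p_market in games:
        if p_model - p_market >= tau:
            bets.append("home")
        elif (1 - p_model) - (1 - p_market) >= tau:
            bets.append("away")
        else:
            bets.append(None)   # edge too small either way
    return bets

print(pick_bets([(0.68, 0.62), (0.55, 0.63), (0.50, 0.52)], tau=0.05))
# ['home', 'away', None]
```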

Optimal Threshold Selection

The optimal threshold depends on the bettor's goals. For maximizing total units profit, the 3-percentage-point threshold performs best (11.3 units). For maximizing ROI (which matters for Kelly Criterion sizing), the 10-percentage-point threshold wins. A practical recommendation is to use $\tau = 0.05$, which balances meaningful edge, adequate sample size, and strong ROI.

Bankroll Simulation

Using Kelly Criterion bet sizing with $\tau = 0.05$ on the test set:

  • Starting bankroll: $10,000
  • Ending bankroll: $12,340
  • Maximum drawdown: $1,180 (11.8%)
  • Bets per month: approximately 20
  • Annualized return: 23.4%

The Kelly fractions are small (typically 1--3% of bankroll) because the edges are modest. This is realistic: sports betting edges are thin, and the path to profitability is through volume and discipline, not large individual wagers.
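A sketch of the sizing rule; the case study does not state its Kelly multiplier, so the quarter-Kelly value below is an assumption chosen to match the stated 1--3% stakes:

```python
def kelly_fraction(p_win: float, american_odds: int, multiplier: float = 1.0) -> float:
    """Kelly stake as a fraction of bankroll; multiplier < 1 gives fractional Kelly."""
    # Net payout per unit staked: -104 pays 100/104 per unit, +120 pays 1.2 per unit
    b = (100 / -american_odds) if american_odds < 0 else (american_odds / 100)
    f = (p_win * (b + 1) - 1) / b    # classic Kelly formula: f* = (p(b+1) - 1) / b
    return max(f, 0.0) * multiplier  # never stake on a negative edge

print(round(kelly_fraction(0.567, -104, multiplier=0.25), 3))  # 0.029 -> about 2.9% of bankroll
```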


Model Limitations and Caveats

What the Model Misses

1. Injuries and lineup changes. The model uses team-level statistics that reflect the team's recent performance with its actual lineup. However, game-day injury reports (announced 30--90 minutes before tip-off) can significantly alter expected performance. A team missing its star player may perform 3--5 net rating points worse than its recent average. Incorporating injury-adjusted projections would be a significant enhancement.

2. Motivation and schedule effects. Late-season games where playoff seeding is locked can see star players resting. Back-to-back games preceding a marquee matchup may see reduced effort. The model captures rest patterns but not motivational factors.

3. Referee assignments. Research has shown that referee crews vary in their foul-calling tendencies, affecting pace and free-throw rates. This is a second-order effect but could matter in totals-adjacent analysis.

4. Market efficiency. The sportsbook's line is itself a highly informed estimate. Our model agrees with the market 70%+ of the time. The edges we identify are small (3--7 percentage points), and it is possible that some of these apparent edges reflect model noise rather than genuine inefficiency.

Sample Size Concerns

With 164 bets at the 5-percentage-point threshold, a 56.7% win rate against a break-even rate of approximately 51.0% (at -104 average odds, break-even = 104/204) yields:

$$z = \frac{0.567 - 0.510}{\sqrt{0.510 \times 0.490 / 164}} = \frac{0.057}{0.039} = 1.46$$

This corresponds to a one-tailed p-value of about 0.072 --- suggestive, but not statistically significant at conventional levels. Confirming an edge of this size with 95% confidence (and reasonable power) would require roughly 450--500 qualifying bets, or about three seasons of deployment. This is the fundamental tension in sports betting: edges are real but small, and proving them requires patience that most bettors lack.
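The significance check can be reproduced in code, deriving the break-even rate directly from the average odds:

```python
import math

def edge_significance(win_rate: float, n_bets: int, american_odds: int):
    """One-tailed z-test of an observed win rate against the break-even
    rate implied by the average American odds."""
    p0 = -american_odds / (-american_odds + 100) if american_odds < 0 else 100 / (american_odds + 100)
    z = (win_rate - p0) / math.sqrt(p0 * (1 - p0) / n_bets)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))   # one-tailed normal tail probability
    return p0, z, p_value

p0, z, p_value = edge_significance(0.567, 164, -104)
print(round(p0, 3), round(z, 2), round(p_value, 3))
```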


Lessons Learned

1. Calibration trumps accuracy. A model that is 65% accurate but perfectly calibrated is more useful for betting than a model that is 70% accurate but systematically overestimates its confidence. Betting decisions hinge on comparing probabilities, not on being right or wrong game by game.

2. Net rating difference is king. This single feature explains more variance than all situational features combined. The model works because it starts with the right base --- a strong team-strength metric --- and adds marginal improvements from rest, travel, and form.

3. Logistic regression's simplicity is a feature, not a bug. With ~1,200 games per season, complex models (deep neural networks, large gradient-boosting ensembles) tend to overfit. Logistic regression's inductive bias toward simplicity matches the data regime.

4. Edge thresholds matter enormously. Betting on every game where the model sees any edge is a recipe for marginal (if any) profitability. Requiring a 5+ percentage point edge dramatically improves outcomes but requires patience.

5. Live monitoring is essential. Model performance should be tracked in real time. If the model's actual win rate falls more than 2 standard deviations below its predicted win rate over a 100-bet window, something has changed (distributional shift, market adaptation, data quality issue) and the model should be reviewed.
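The monitoring rule in point 5 can be sketched as a rolling check (names and window handling are illustrative):

```python
import math

def monitor_flag(predicted_probs, outcomes, window=100, z_limit=2.0):
    """Flag when realized wins over the last `window` bets fall more than
    `z_limit` standard deviations below the model's expected win count."""
    p = predicted_probs[-window:]
    y = outcomes[-window:]
    expected = sum(p)
    variance = sum(q * (1 - q) for q in p)   # sum of independent Bernoulli variances
    z = (sum(y) - expected) / math.sqrt(variance)
    return z < -z_limit, z

# 45 wins where the model expected 60 (sd ~ 4.9) should trigger a review
print(monitor_flag([0.6] * 100, [1] * 45 + [0] * 55))
```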


Your Turn: Extension Projects

  1. Add player-level features. Incorporate individual player on/off net rating data to capture the impact of specific lineup configurations. How much does this improve over team-level features alone?

  2. Build a second model for point spreads. Adapt the logistic regression to predict against-the-spread outcomes instead of moneyline outcomes. Which market shows more exploitable inefficiency?

  3. Implement Platt scaling. Even though the model is already well-calibrated, apply Platt scaling post-hoc and measure whether it further improves the Brier score on a held-out calibration set.

  4. Compare to a market-only model. Build a baseline that uses only the closing line's implied probability (no model features) and compare its Brier score to the logistic regression. How much value does the model add beyond what the market already knows?

  5. Test across multiple seasons. Perform full walk-forward validation over all five seasons, re-fitting the model each year. Does the model's edge persist across different seasons, or is it concentrated in specific years?


Discussion Questions

  1. The model's best threshold (10 pp edge) produces only 34 bets per season. Is this enough to sustain a professional betting operation? What practical constraints would this create?

  2. If the sportsbook detects that you are consistently beating the closing line, they may limit your account. How does this reality affect the practical value of even a well-performing model?

  3. The model uses a 15-game rolling window for net rating. What are the tradeoffs of using a shorter (5-game) or longer (30-game) window?

  4. How would you modify this model for NBA playoff games, where series dynamics, increased effort, and coaching adjustments differ markedly from regular-season patterns?

  5. Should the edge threshold be constant across the season, or should it vary (e.g., require larger edges early in the season when team-strength estimates are noisier)?