Chapter 25: Case Study 1 - Building an Elo Rating System for NBA Prediction
Introduction
Elo ratings, originally developed for chess, have become one of the most popular and effective approaches for rating teams and predicting game outcomes across sports. This case study walks through the complete development of an NBA Elo rating system, from initial implementation through calibration and validation.
Part 1: Elo Rating Fundamentals
The Core Algorithm
The Elo system is based on a simple principle: when two opponents compete, the outcome updates both of their ratings based on the difference between the expected and actual results.
Key Components:
- Expected Score Calculation:
E_A = 1 / (1 + 10^((R_B - R_A) / 400))
Where E_A is Team A's expected win probability and R_A, R_B are the ratings.
- Rating Update:
R_A_new = R_A + K × (S_A - E_A)
Where S_A is the actual result (1 for win, 0 for loss), and K is the learning rate.
- Home Court Adjustment: Add a fixed amount (typically 100 points) to the home team's rating for prediction purposes.
Implementation
```python
class NBAEloSystem:
    def __init__(self, k_factor=20, home_advantage=100, initial_rating=1500):
        self.k_factor = k_factor
        self.home_advantage = home_advantage
        self.initial_rating = initial_rating
        self.ratings = {}

    def get_rating(self, team):
        return self.ratings.get(team, self.initial_rating)

    def expected_score(self, rating_a, rating_b, home_advantage=0):
        """Calculate expected win probability for team A."""
        return 1 / (1 + 10 ** ((rating_b - rating_a - home_advantage) / 400))

    def update_ratings(self, home_team, away_team, home_score, away_score):
        """Update ratings after a game."""
        # Get current ratings
        home_rating = self.get_rating(home_team)
        away_rating = self.get_rating(away_team)
        # Expected score for the home team, including home-court advantage
        home_expected = self.expected_score(home_rating, away_rating, self.home_advantage)
        # Actual result (NBA games cannot end in a tie)
        home_actual = 1 if home_score > away_score else 0
        # Elo is zero-sum: the away team loses exactly what the home team gains
        delta = self.k_factor * (home_actual - home_expected)
        self.ratings[home_team] = home_rating + delta
        self.ratings[away_team] = away_rating - delta
```
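A quick sanity check of the two formulas above, using standalone functions that mirror the class methods (the ratings here are illustrative):

```python
def expected_score(rating_a, rating_b, home_advantage=0):
    """Expected win probability for team A."""
    return 1 / (1 + 10 ** ((rating_b - rating_a - home_advantage) / 400))

# A home favorite (1550) hosts a league-average team (1500) with a 100-point edge
p_home = expected_score(1550, 1500, home_advantage=100)  # ~0.70

# Home team wins: both ratings move by K * (actual - expected), in opposite directions
K = 20
home_new = 1550 + K * (1 - p_home)
away_new = 1500 - K * (1 - p_home)
```

Note that the two new ratings still sum to 3050: the update redistributes rating points but never creates them.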
Part 2: Calibrating the Parameters
K-Factor Optimization
The K-factor determines how quickly ratings change. Too high, and ratings are noisy; too low, and they adapt slowly to real changes.
Testing Process:
1. Split data into training (seasons 1-5) and validation (season 6)
2. Test K values from 10 to 40
3. Measure prediction accuracy on the validation set
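The sweep can be sketched as follows. The schedule here is a placeholder, and for brevity this sketch scores the same games it trains on; the actual procedure trains on seasons 1-5 and scores season 6:

```python
def expected(ra, rb, hca=100):
    """Home win probability from the two ratings plus home-court advantage."""
    return 1 / (1 + 10 ** ((rb - ra - hca) / 400))

def evaluate_k(games, k):
    """Replay games in order and score predictions for a given K-factor."""
    ratings, correct, brier = {}, 0, 0.0
    for home, away, home_won in games:
        ra, rb = ratings.get(home, 1500), ratings.get(away, 1500)
        p = expected(ra, rb)
        correct += (p >= 0.5) == home_won   # accuracy: did the favorite win?
        brier += (p - home_won) ** 2        # Brier score component
        delta = k * (home_won - p)
        ratings[home], ratings[away] = ra + delta, rb - delta
    return correct / len(games), brier / len(games)

# Placeholder schedule; substitute real game results here
games = [("BOS", "DET", True), ("DET", "BOS", False), ("BOS", "MIL", True)] * 50
best_k = max(range(10, 41, 5), key=lambda k: evaluate_k(games, k)[0])
```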
Results:
| K-Factor | Accuracy | Brier Score | Log Loss |
|---|---|---|---|
| 10 | 64.2% | 0.228 | 0.581 |
| 15 | 65.1% | 0.223 | 0.572 |
| 20 | 65.8% | 0.219 | 0.564 |
| 25 | 65.5% | 0.221 | 0.568 |
| 30 | 64.9% | 0.224 | 0.575 |
Optimal K-Factor: 20 (balances responsiveness and stability)
Home Court Advantage
Testing different home court adjustments:
| HCA (Elo points) | Equivalent Spread | Accuracy |
|---|---|---|
| 50 | 1.8 pts | 64.5% |
| 75 | 2.7 pts | 65.2% |
| 100 | 3.6 pts | 65.8% |
| 125 | 4.5 pts | 65.4% |
Optimal HCA: 100 Elo points (approximately 3.6-point spread advantage)
Season Carryover
Between seasons, ratings should regress toward the mean to account for roster turnover:
Formula:
R_new_season = R_old_season × carryover + mean_rating × (1 - carryover)
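A minimal sketch of the carryover formula applied to a ratings dictionary between seasons:

```python
def apply_carryover(ratings, carryover=0.75, mean_rating=1500):
    """Regress every team's rating toward the league mean between seasons."""
    return {team: r * carryover + mean_rating * (1 - carryover)
            for team, r in ratings.items()}

# A strong team and a weak team both move one quarter of the way back to 1500
new_ratings = apply_carryover({"BOS": 1680, "DET": 1340})
# → {"BOS": 1635.0, "DET": 1380.0}
```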
Testing carryover rates:
| Carryover | Year 2 Accuracy |
|---|---|
| 100% | 63.5% |
| 75% | 65.2% |
| 50% | 64.1% |
Optimal Carryover: 75% (moderate regression to mean)
Part 3: Enhancements
Margin of Victory Adjustment
Basic Elo only considers wins/losses. We can incorporate margin of victory:
MOV Multiplier:
mov_mult = log(abs(margin) + 1) × (2.2 / ((rating_diff × 0.001) + 2.2))
This multiplier:
- Increases for larger margins
- Decreases for expected blowouts (to prevent rating inflation)
Impact:
- Accuracy: 65.8% → 66.4%
- Brier Score: 0.219 → 0.214
- Spread RMSE: 11.8 → 11.2 points
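A sketch of the multiplier, assuming rating_diff is the winner's pre-game Elo edge over the loser (including home advantage), so that heavy favorites get dampened updates:

```python
import math

def mov_multiplier(margin, winner_elo_diff):
    """Margin-of-victory multiplier: grows with margin (log-scaled),
    shrinks when the pre-game favorite wins as expected."""
    return math.log(abs(margin) + 1) * (2.2 / (winner_elo_diff * 0.001 + 2.2))

# The multiplier then scales the K-factor in the usual update:
# new_rating = rating + k * mov_multiplier(margin, diff) * (actual - expected)
```

A 20-point win between even teams moves ratings about four times as much as a 1-point win, while the same 20-point win by a 200-point favorite is dampened relative to one by a 200-point underdog.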
Recency Weighting
Recent games matter more than early-season games. We can apply a recency adjustment:
```python
def recency_weight(games_ago, decay=0.995):
    return decay ** games_ago
```
This slightly improves late-season predictions.
Part 4: Validation
Accuracy Over Time
Tracking season-long accuracy:
| Month | Accuracy | Notes |
|---|---|---|
| October | 58.2% | Ratings still converging |
| November | 62.5% | Improving |
| December | 64.8% | Near stable |
| January | 65.5% | Good accuracy |
| February | 66.1% | Peak |
| March | 65.8% | Stable |
| April | 65.2% | Slight decline (rest, tanking) |
Comparison to Benchmarks
| Model | Accuracy | Brier Score |
|---|---|---|
| Baseline (home team) | 58.0% | 0.243 |
| Win % based | 61.5% | 0.234 |
| Simple Elo | 65.8% | 0.219 |
| Enhanced Elo | 66.4% | 0.214 |
| Vegas closing line | 67.2% | 0.208 |
The enhanced Elo system approaches but doesn't exceed market accuracy.
Against the Spread
Testing the model against point spreads (converting Elo difference to spread):
Conversion:
spread = (elo_diff + hca) / 28 # ~28 Elo points per point of spread
Results (2000 games):
- ATS Accuracy: 51.8%
- Not statistically significant (p = 0.21)
- Conclusion: No edge against the market
Calibration Check
| Predicted Win % | Actual Win % | Games |
|---|---|---|
| 50-55% | 53.2% | 450 |
| 55-60% | 57.8% | 380 |
| 60-65% | 62.1% | 320 |
| 65-70% | 67.5% | 280 |
| 70-75% | 71.8% | 200 |
| 75%+ | 78.2% | 150 |
The model is well-calibrated across probability ranges.
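A calibration table like the one above can be produced by bucketing predicted probabilities and comparing each bucket to its observed win frequency; a minimal sketch:

```python
from collections import defaultdict

def calibration_table(predictions, outcomes, width=0.05):
    """Bucket predicted win probabilities and compare to observed frequency.

    Returns {bucket_lower_bound: (actual_win_rate, n_games)}.
    """
    bins = defaultdict(lambda: [0, 0])  # bucket -> [wins, games]
    for p, won in zip(predictions, outcomes):
        bucket = int(p / width) * width
        bins[bucket][0] += won
        bins[bucket][1] += 1
    return {round(b, 2): (wins / n, n) for b, (wins, n) in sorted(bins.items())}
```

A well-calibrated model shows each bucket's actual win rate landing inside the bucket's probability range, as in the table above.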
Part 5: Practical Applications
Pre-Game Predictions
For a game between Team A (Elo: 1620) vs Team B (Elo: 1480) with Team A at home:
Elo difference: 1620 - 1480 + 100 = 240
Expected win prob (A): 1 / (1 + 10^(-240/400)) ≈ 79.9%
Predicted spread: 240 / 28 = 8.6 points
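The same numbers, computed directly:

```python
elo_a, elo_b, hca = 1620, 1480, 100
diff = elo_a - elo_b + hca                 # 240 Elo points
win_prob = 1 / (1 + 10 ** (-diff / 400))   # ~0.799
spread = diff / 28                         # ~8.6 points
```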
Power Rankings
Current Elo ratings translate to power rankings:
| Rank | Team | Elo Rating | vs Average |
|---|---|---|---|
| 1 | Boston | 1680 | +180 |
| 2 | Denver | 1655 | +155 |
| 3 | Milwaukee | 1640 | +140 |
| ... | ... | ... | ... |
| 15 | League Avg | 1500 | 0 |
| ... | ... | ... | ... |
| 30 | Detroit | 1340 | -160 |
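A table like this can be generated directly from the ratings dictionary; a minimal sketch (the sample ratings are illustrative):

```python
def power_rankings(ratings, mean_rating=1500):
    """Sort teams by Elo and report each team's edge over league average."""
    ranked = sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)
    return [(rank, team, elo, elo - mean_rating)
            for rank, (team, elo) in enumerate(ranked, start=1)]

rows = power_rankings({"Boston": 1680, "Denver": 1655, "Detroit": 1340})
# → [(1, "Boston", 1680, 180), (2, "Denver", 1655, 155), (3, "Detroit", 1340, -160)]
```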
Playoff Predictions
Using Elo for playoff series simulation:
- Calculate single-game win probability
- Simulate 7-game series (thousands of times)
- Report series win probability and expected games
Example:
- Team A Elo: 1650, Team B Elo: 1550
- Home court to Team A (2-2-1-1-1 format)
- Game-by-game win probs for A (with the 100-point home adjustment): 76%, 76%, 50%, 50%, 76%, 50%, 76%
- Series win prob for A: ~81%
- Expected games: 5.6
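The three steps above can be sketched as a Monte Carlo simulation. The per-game probabilities here assume a 100-point Elo gap and the chapter's 100-point home adjustment in a 2-2-1-1-1 format:

```python
import random

def simulate_series(p_by_game, n_sims=10_000, seed=0):
    """Monte Carlo a best-of-7: p_by_game[i] is the favorite's win prob in game i."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    series_wins, total_games = 0, 0
    for _ in range(n_sims):
        a = b = g = 0
        while a < 4 and b < 4:
            if rng.random() < p_by_game[g]:
                a += 1
            else:
                b += 1
            g += 1
        series_wins += a == 4
        total_games += g
    return series_wins / n_sims, total_games / n_sims

# Home games 1, 2, 5, 7 for the favorite (~76%); road games at 50%
probs = [0.76, 0.76, 0.50, 0.50, 0.76, 0.50, 0.76]
series_prob, expected_games = simulate_series(probs)
```

With these inputs the simulation lands near an 81% series win probability and about 5.6 expected games; exact enumeration over all series outcomes gives the same answer without sampling noise.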
Part 6: Lessons Learned
What Works Well
- Simplicity: Basic Elo captures most predictable variance
- Adaptability: Ratings update automatically with each game
- Interpretability: Easy to explain and understand
- Calibration: Produces well-calibrated probabilities
Limitations
- No roster information: Injuries not incorporated
- No matchup specifics: Style matchups ignored
- Slow to adapt: Takes 15-20 games to converge
- Market benchmark: Doesn't beat betting markets
Recommendations
- Use Elo as a baseline model
- Combine with other approaches (efficiency models, injury adjustments)
- Don't expect to beat the market consistently
- Update parameters annually
Exercises
Exercise 1
Implement the basic Elo system and run it on one NBA season. Compare your ratings to final standings.
Exercise 2
Test different K-factors on your implementation. Create a graph showing accuracy vs. K-factor.
Exercise 3
Add margin-of-victory adjustment to your system. Measure the improvement in prediction accuracy.
Exercise 4
Use your Elo system to simulate the NBA playoffs. Compare to actual results.
Conclusion
An Elo rating system provides a robust, interpretable foundation for NBA game prediction. While it doesn't beat betting markets, it achieves approximately 66% accuracy with well-calibrated probabilities. The system serves as an excellent baseline for more sophisticated models and provides intuitive power rankings that capture team strength throughout the season.