Chapter 27 Key Takeaways: Advanced Regression and Classification
Core Concepts
- XGBoost is the workhorse of modern sports prediction. Gradient-boosted trees build an ensemble sequentially, where each tree corrects the errors of the previous ensemble. XGBoost's innovations --- second-order gradients, built-in regularization ($\gamma T + \frac{1}{2}\lambda\sum_j w_j^2$), and column subsampling --- make it the most powerful algorithm for tabular sports data.
- Hyperparameter tuning is essential but must respect temporal ordering. Use time-series cross-validation (training on earlier games, testing on later ones) rather than random k-fold CV. The most critical hyperparameters are `max_depth` (tree complexity), `learning_rate` (contribution per tree), and `reg_lambda` (L2 regularization); see the tuning sketch after this list.
- Random forests complement XGBoost in ensembles. Random forests reduce variance through bagging and feature subsampling, decorrelating trees. They are more robust to hyperparameter choices and harder to overfit, making them excellent team members in a stacking ensemble alongside XGBoost and logistic regression.
- Probability calibration is not optional for betting. A predicted 70% that actually corresponds to 62% leads to phantom value bets and lost money. Platt scaling (2-parameter sigmoid) works for small calibration sets; isotonic regression (nonparametric) works for larger ones. Always calibrate on held-out data; see the calibration sketch after this list.
- SHAP values are the gold standard for model interpretability. Grounded in Shapley values from game theory, SHAP decomposition explains exactly why the model makes each prediction. TreeSHAP provides exact, polynomial-time computation for tree-based models, making interpretability practical at scale.
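A minimal sketch of the tuning workflow above, assuming the `xgboost` and scikit-learn packages; `X` and `y` are random placeholders standing in for a chronologically sorted feature matrix and game outcomes:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from xgboost import XGBClassifier

# Placeholder data; rows are assumed to be sorted by game date, so every
# time-series split trains on earlier games and validates on later ones.
rng = np.random.default_rng(0)
X = rng.random((600, 10))
y = rng.integers(0, 2, 600)

param_grid = {
    "max_depth": [3, 4, 6],        # tree complexity
    "learning_rate": [0.03, 0.1],  # contribution per tree
    "reg_lambda": [1.0, 5.0],      # L2 regularization strength
}

model = XGBClassifier(n_estimators=300, subsample=0.8, colsample_bytree=0.8)

# TimeSeriesSplit replaces random k-fold CV to respect temporal ordering.
search = GridSearchCV(model, param_grid, cv=TimeSeriesSplit(n_splits=5),
                      scoring="neg_log_loss")
search.fit(X, y)
print(search.best_params_, -search.best_score_)  # best params and their log-loss
```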
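And a minimal calibration sketch, assuming scikit-learn's `CalibratedClassifierCV` applied to an already-fitted model; `method="sigmoid"` is Platt scaling, `method="isotonic"` the nonparametric alternative for larger calibration sets:

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import log_loss
from xgboost import XGBClassifier

# Placeholder chronological splits: calibration and test games come after
# the training games and are never seen during model fitting.
rng = np.random.default_rng(1)
X = rng.random((900, 10))
y = rng.integers(0, 2, 900)
X_train, y_train = X[:600], y[:600]
X_calib, y_calib = X[600:750], y[600:750]
X_test, y_test = X[750:], y[750:]

base = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.05)
base.fit(X_train, y_train)

# cv="prefit" calibrates the already-fitted model on the held-out calibration
# set (recent scikit-learn releases prefer wrapping the model in FrozenEstimator).
calibrated = CalibratedClassifierCV(base, method="sigmoid", cv="prefit")
calibrated.fit(X_calib, y_calib)

print("raw log-loss:       ", log_loss(y_test, base.predict_proba(X_test)[:, 1]))
print("calibrated log-loss:", log_loss(y_test, calibrated.predict_proba(X_test)[:, 1]))
```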
Practical Insights for Bettors
- Feature engineering determines the ceiling of any model. Rolling averages (see the feature sketch after this list), matchup-specific features (offense vs. defense), contextual features (rest, travel), and rating-system outputs (Elo, Massey, PageRank from Chapter 26) are the building blocks. Market-derived features (the spread itself) are powerful but reflect information the market already has.
- Stacking diverse models beats tuning a single model. An ensemble of logistic regression + random forest + XGBoost, combined via a logistic regression meta-learner trained on out-of-fold predictions, typically improves log-loss by 1-3% over the best individual model (a stacking sketch follows this list). Diversity matters more than individual model quality.
- Calibrate after any rebalancing. Class weighting and SMOTE improve recall for rare outcomes (upsets, blowouts) but distort probability estimates. Always recalibrate on the original distribution. For betting, well-calibrated probabilities matter more than classification accuracy.
- Monitor for model decay. Sports models degrade over time due to rule changes, player movement, and market adaptation. Track weekly log-loss and calibration curves. Plan to retrain when performance degrades beyond a threshold.
- Use SHAP to audit every significant bet. Before placing a large wager, examine the SHAP breakdown (see the audit sketch after this list). If the edge comes primarily from noise features or features you do not trust, skip the bet. This transforms model-based betting from blind signal-following into informed analysis.
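A minimal feature-engineering sketch for the rolling-average and rest-day ideas above; the DataFrame layout and column names (`team`, `date`, `points_scored`) are illustrative assumptions, not the chapter's schema:

```python
import pandas as pd

# Hypothetical long-format game log: one row per team per game.
games = pd.DataFrame({
    "team": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "date": pd.to_datetime(["2024-01-01", "2024-01-04", "2024-01-07", "2024-01-10"] * 2),
    "points_scored": [102, 95, 110, 99, 88, 121, 104, 97],
}).sort_values(["team", "date"])

# Rolling 3-game scoring average, shifted by one game so the feature only
# uses information available before the game being predicted (no leakage).
games["pts_rolling3"] = (
    games.groupby("team")["points_scored"]
         .transform(lambda s: s.shift(1).rolling(3, min_periods=1).mean())
)

# Days of rest since the team's previous game, a simple contextual feature.
games["rest_days"] = games.groupby("team")["date"].diff().dt.days

print(games)
```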
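A minimal stacking sketch, assuming scikit-learn's `StackingClassifier`, which trains the logistic-regression meta-learner on out-of-fold base-model probabilities; the data is again a random placeholder and the splitter keeps temporal order:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import TimeSeriesSplit
from xgboost import XGBClassifier

# Placeholder chronologically ordered data.
rng = np.random.default_rng(2)
X = rng.random((800, 12))
y = rng.integers(0, 2, 800)

stack = StackingClassifier(
    estimators=[
        ("logit", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=300, max_features="sqrt")),
        ("xgb", XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.05)),
    ],
    final_estimator=LogisticRegression(),      # meta-learner
    cv=TimeSeriesSplit(n_splits=5),            # out-of-fold predictions without leakage
    stack_method="predict_proba",              # feed probabilities, not labels, upward
)
stack.fit(X, y)
print(stack.predict_proba(X[-5:])[:, 1])       # ensemble win probabilities
```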
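And a minimal SHAP-audit sketch, assuming the `shap` package's `TreeExplainer`; the fitted model, data, and feature names are placeholders standing in for whatever sits behind the candidate bet:

```python
import numpy as np
import shap
from xgboost import XGBClassifier

# Placeholder model and features (names are illustrative, not the chapter's).
feature_names = ["elo_diff", "rest_diff", "rolling_ppg_diff", "travel_km"]
rng = np.random.default_rng(3)
X = rng.random((400, len(feature_names)))
y = rng.integers(0, 2, 400)
model = XGBClassifier(n_estimators=200, max_depth=4).fit(X, y)

# TreeSHAP: exact, polynomial-time attributions for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)         # one row of attributions per game

# Break down a single candidate bet: which features drive the predicted edge?
game_idx = 0
contributions = sorted(zip(feature_names, shap_values[game_idx]),
                       key=lambda kv: abs(kv[1]), reverse=True)
for name, value in contributions:
    print(f"{name:>18s}: {value:+.3f}")        # log-odds contribution of each feature
```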
Mathematical Foundations
- Gradient boosting minimizes loss via functional gradient descent. Each tree $f_t$ is fit to the negative gradient of the loss function with respect to the current predictions, and the ensemble updates as $\hat{y}^{(t)} = \hat{y}^{(t-1)} + \eta f_t(x)$. The learning rate $\eta$ controls the step size, preventing overshooting.
- Bagging reduces variance proportionally to tree decorrelation. The ensemble variance formula $\text{Var} = \rho\sigma^2 + \frac{1-\rho}{B}\sigma^2$ shows that as the number of trees $B$ grows, variance approaches $\rho\sigma^2$. Reducing correlation $\rho$ through feature subsampling is the key to further improvement.
- ECE measures calibration quality. Expected Calibration Error $= \sum_b \frac{n_b}{N}\,|p_b - \hat{p}_b|$ averages the absolute calibration error across bins, weighted by bin size (a short computation sketch follows this list). A well-calibrated sports model achieves ECE below 0.03 (3 percentage points).
- SHAP values satisfy additivity. The prediction equals the base value plus the sum of all SHAP values: $f(x) = \phi_0 + \sum_j \phi_j$. This ensures complete, consistent attribution of every prediction to its feature contributions.
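A minimal sketch of the ECE computation above; the ten equal-width bins are one common convention, assumed here rather than taken from the chapter:

```python
import numpy as np

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """ECE = sum_b (n_b / N) * |p_b - p_hat_b|, with p_b the observed win rate
    and p_hat_b the mean predicted probability in bin b."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(y_prob, edges[1:-1])   # equal-width bins 0 .. n_bins-1
    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_ids == b
        if not in_bin.any():
            continue
        observed = y_true[in_bin].mean()          # p_b: actual win rate in the bin
        predicted = y_prob[in_bin].mean()         # p_hat_b: mean predicted probability
        ece += in_bin.mean() * abs(observed - predicted)   # in_bin.mean() = n_b / N
    return ece

# Illustrative check on synthetic, perfectly calibrated predictions:
# the result should land comfortably under the 0.03 target.
rng = np.random.default_rng(4)
probs = rng.uniform(0.3, 0.7, 5000)
outcomes = rng.random(5000) < probs
print(f"ECE = {expected_calibration_error(outcomes, probs):.4f}")
```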
Common Pitfalls
- Data leakage through standard cross-validation. Using random k-fold CV on sequential game data trains on future games to predict past games. This produces artificially inflated metrics and models that fail in deployment.
- Overconfidence from uncalibrated models. XGBoost naturally produces overconfident predictions on sports data. Without calibration, the model sees large edges everywhere, leading to excessive betting and losses.
- Spurious feature importance. Built-in gain-based feature importance can overweight high-cardinality features that are used in many splits but do not genuinely predict outcomes. Always cross-reference with SHAP importance.
- Confusing classification accuracy with betting profitability. A model with 55% accuracy at -110 odds is profitable (roughly +5% expected ROI per bet); a model with 60% accuracy that only bets on -300 favorites is not. The odds arithmetic is sketched after this list. Focus on calibrated probability estimates and the expected value of each bet.
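A quick sketch of the odds arithmetic behind this pitfall, using the standard conversions from American odds to breakeven probability and per-bet expected ROI:

```python
def breakeven_prob(american_odds):
    """Win probability needed to break even at the given American odds."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)

def roi_per_bet(win_prob, american_odds):
    """Expected return per unit staked at the given win probability and odds."""
    decimal = 1 - 100 / american_odds if american_odds < 0 else 1 + american_odds / 100
    return win_prob * decimal - 1

print(f"{breakeven_prob(-110):.3f}")      # ~0.524 breakeven; 55% accuracy clears it
print(f"{roi_per_bet(0.55, -110):+.3f}")  # ~+0.05 expected return per unit staked
print(f"{breakeven_prob(-300):.3f}")      # 0.750 breakeven; 60% accuracy falls short
print(f"{roi_per_bet(0.60, -300):+.3f}")  # ~-0.20: accurate but unprofitable
```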
Connections to Other Chapters
- Chapter 9 (Regression Analysis): Logistic regression serves as both a baseline model and the preferred meta-learner in stacking ensembles.
- Chapter 10 (Bayesian Methods): Bayesian optimization can replace grid search for hyperparameter tuning when the search space is large.
- Chapter 26 (Ratings and Rankings): Elo, Massey, and PageRank ratings are among the most powerful features for XGBoost models. The ensemble rating methods from Chapter 26 are a special case of the stacking framework.
- Chapter 28 (Bankroll Management): Calibrated probabilities feed directly into Kelly Criterion bet sizing. Without calibration, Kelly sizing produces overbets that increase variance and risk of ruin.