
Chapter 23: Time Series Analysis for Betting

"The best prophet of the future is the past." --- Lord Byron

Chapter Overview

Sports data is inherently temporal. A team's performance unfolds game by game across a season. A player's shooting percentage evolves week by week. Betting lines move minute by minute as kickoff approaches. To ignore the sequential, time-dependent structure of this data is to discard one of the richest sources of predictive information available to the analytical bettor.

Time series analysis provides the mathematical framework for modeling data that arrives in ordered sequences. Unlike the cross-sectional methods we explored in earlier chapters --- where each observation is treated as independent --- time series methods explicitly model the dependence between observations at different points in time. This dependence is precisely what makes prediction possible: if today's performance is related to yesterday's performance in a quantifiable way, we can build models that leverage this relationship.

This chapter is the first in Part V: Advanced Quantitative Methods. It assumes familiarity with the statistical foundations developed in Chapter 6 and the regression modeling techniques from Chapter 9. We will build substantially on those concepts, extending them into the temporal domain.

We begin with the critical concept of stationarity --- a property that determines which models are appropriate for a given data series. We then develop ARIMA models, the workhorse of classical time series forecasting, and apply them to team performance prediction. From there, we investigate mean reversion, one of the most powerful phenomena in sports analytics, before turning to changepoint detection --- the problem of identifying when a team's underlying talent level has genuinely shifted. We close with an analysis of seasonal and calendar effects that create recurring patterns in sports outcomes and betting markets.

In this chapter, you will learn to:

  • Test any sports time series for stationarity and apply the correct transformations when the series is non-stationary
  • Build ARIMA models that forecast team performance metrics with quantified uncertainty
  • Measure mean reversion rates and estimate how quickly extreme performances regress toward the mean
  • Detect structural breaks in team performance data using both frequentist and Bayesian methods
  • Identify exploitable seasonal patterns in sports scheduling and betting markets

23.1 Stationarity and Trend Detection

Why Stationarity Matters

The concept of stationarity is the foundation upon which all classical time series analysis rests. A time series $\{y_t\}$ is said to be strictly stationary if the joint probability distribution of any collection of observations $(y_{t_1}, y_{t_2}, \ldots, y_{t_k})$ is invariant to shifts in time. That is, the statistical properties of the series do not depend on when you observe it.

In practice, we almost always work with a weaker condition called weak stationarity (or covariance stationarity), which requires only three properties:

  1. Constant mean: $E[y_t] = \mu$ for all $t$
  2. Constant variance: $\text{Var}(y_t) = \sigma^2$ for all $t$
  3. Autocovariance depends only on lag: $\text{Cov}(y_t, y_{t+h}) = \gamma(h)$ for all $t$ and lag $h$

Why does this matter for sports betting? Consider a team's point differential over a season. If the team improves steadily due to player development, the mean of the series is not constant --- there is a trend. If we naively fit a model that assumes stationarity, our forecasts will systematically underestimate the team's current performance level. Conversely, if we mistake random fluctuation for a genuine trend, we may overreact to noise.

Non-stationarity in sports data arises from several sources:

  • Trends: A team gradually improving or declining over a season
  • Structural breaks: A star player gets injured, fundamentally changing the team's performance level
  • Evolving variance: A team's consistency may change over time (e.g., more volatile performance early in the season when the rotation is being established)
  • Seasonality: Recurring patterns tied to the schedule (e.g., NBA teams playing worse on the second night of back-to-backs)

The Augmented Dickey-Fuller (ADF) Test

The most widely used test for stationarity is the Augmented Dickey-Fuller (ADF) test. The null hypothesis of the ADF test is that the series has a unit root --- that is, the series is non-stationary. Rejecting the null hypothesis provides evidence that the series is stationary.

The test is based on the regression:

$$\Delta y_t = \alpha + \beta t + \gamma y_{t-1} + \sum_{i=1}^{p} \delta_i \Delta y_{t-i} + \varepsilon_t$$

where $\Delta y_t = y_t - y_{t-1}$ is the first difference. The null hypothesis is $H_0: \gamma = 0$ (unit root present, non-stationary) versus $H_1: \gamma < 0$ (stationary). The test statistic does not follow a standard $t$-distribution; instead, we compare it to the Dickey-Fuller distribution, for which critical values are tabulated.

Key parameters when applying the ADF test:

  • Lag order ($p$): The number of lagged difference terms included to account for serial correlation. Too few lags lead to size distortions; too many reduce power. Information criteria (AIC, BIC) can guide selection.
  • Trend specification: The regression can include a constant only, a constant and linear trend, or neither. The choice depends on the suspected data-generating process.

The KPSS Test

The Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test complements the ADF test by reversing the hypotheses. The null hypothesis of the KPSS test is that the series is stationary, and the alternative is that it is non-stationary.

Using both tests together provides a more robust assessment:

| ADF Result | KPSS Result | Conclusion |
|---|---|---|
| Reject $H_0$ (stationary) | Fail to reject $H_0$ (stationary) | Strong evidence of stationarity |
| Fail to reject $H_0$ (non-stationary) | Reject $H_0$ (non-stationary) | Strong evidence of non-stationarity |
| Reject $H_0$ (stationary) | Reject $H_0$ (non-stationary) | Conflicting --- may be trend-stationary |
| Fail to reject $H_0$ (non-stationary) | Fail to reject $H_0$ (stationary) | Conflicting --- insufficient data or near-boundary |

Detrending and Differencing

When a series is non-stationary, we must transform it before modeling. The two primary approaches are:

Detrending: If the non-stationarity is due to a deterministic trend (i.e., the series is trend-stationary), we can fit a regression of $y_t$ on $t$ and work with the residuals:

$$y_t = \alpha + \beta t + \varepsilon_t \quad \Rightarrow \quad \tilde{y}_t = y_t - \hat{\alpha} - \hat{\beta} t$$

Differencing: If the non-stationarity is due to a stochastic trend (i.e., the series is difference-stationary or has a unit root), we take the first difference:

$$\Delta y_t = y_t - y_{t-1}$$

If the first-differenced series is still non-stationary, we difference again to obtain $\Delta^2 y_t = \Delta y_t - \Delta y_{t-1}$. In practice, sports performance data rarely requires more than one or two rounds of differencing.

The choice between detrending and differencing matters: applying the wrong transformation can introduce spurious dynamics or remove genuine signal.
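The warning above is easy to demonstrate with simulated data: first-differencing a trend-stationary series manufactures negative lag-1 autocorrelation that was never in the data-generating process:

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.arange(200)
# Trend-stationary series: deterministic trend plus white noise
y = 0.05 * t + rng.normal(0, 1, size=200)

# The correct treatment is detrending; differencing instead induces
# spurious MA(1) dynamics with lag-1 autocorrelation near -0.5
dy = np.diff(y)
lag1 = np.corrcoef(dy[:-1], dy[1:])[0, 1]
print(f"Lag-1 autocorrelation after differencing: {lag1:.2f}")
```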

Python Implementation

Let us put these concepts into practice using a concrete sports example. We will test whether a team's rolling net rating (point differential per 100 possessions in basketball) is stationary over the course of a season.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller, kpss
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf


def generate_team_performance_series(
    n_games: int = 82,
    true_talent: float = 3.0,
    noise_std: float = 12.0,
    trend_per_game: float = 0.0,
    changepoint_game: int | None = None,
    changepoint_shift: float = 0.0,
    seed: int = 42,
) -> pd.Series:
    """
    Generate a synthetic team performance time series.

    Simulates game-by-game point differentials for a team with a
    specified true talent level, optional linear trend, and optional
    structural break (changepoint).

    Args:
        n_games: Number of games in the season.
        true_talent: Baseline expected point differential.
        noise_std: Standard deviation of game-to-game noise.
        trend_per_game: Linear improvement per game (0 for no trend).
        changepoint_game: Game number at which a structural break occurs.
        changepoint_shift: Magnitude of the shift at the changepoint.
        seed: Random seed for reproducibility.

    Returns:
        A pandas Series indexed by game number containing point
        differentials.
    """
    rng = np.random.default_rng(seed)
    games = np.arange(1, n_games + 1)
    talent = true_talent + trend_per_game * games

    if changepoint_game is not None:
        talent[games >= changepoint_game] += changepoint_shift

    point_diffs = talent + rng.normal(0, noise_std, size=n_games)
    return pd.Series(point_diffs, index=games, name="point_differential")


def test_stationarity(
    series: pd.Series,
    significance_level: float = 0.05,
    verbose: bool = True,
) -> dict:
    """
    Perform both ADF and KPSS stationarity tests on a time series.

    Args:
        series: The time series to test.
        significance_level: Threshold for rejecting null hypotheses.
        verbose: Whether to print detailed results.

    Returns:
        Dictionary containing test statistics, p-values, and
        conclusions for both ADF and KPSS tests.
    """
    # ADF Test (H0: non-stationary / unit root)
    adf_result = adfuller(series.dropna(), autolag="AIC")
    adf_statistic = adf_result[0]
    adf_pvalue = adf_result[1]
    adf_lags = adf_result[2]
    adf_critical = adf_result[4]

    # KPSS Test (H0: stationary)
    kpss_result = kpss(series.dropna(), regression="c", nlags="auto")
    kpss_statistic = kpss_result[0]
    kpss_pvalue = kpss_result[1]
    kpss_critical = kpss_result[3]

    # Determine conclusions
    adf_stationary = adf_pvalue < significance_level
    kpss_stationary = kpss_pvalue > significance_level

    if verbose:
        print("=" * 60)
        print("STATIONARITY TEST RESULTS")
        print("=" * 60)
        print(f"\nAugmented Dickey-Fuller Test")
        print(f"  Test Statistic: {adf_statistic:.4f}")
        print(f"  P-value:        {adf_pvalue:.4f}")
        print(f"  Lags Used:      {adf_lags}")
        print(f"  Critical Values:")
        for key, val in adf_critical.items():
            print(f"    {key}: {val:.4f}")
        print(f"  Conclusion: {'STATIONARY' if adf_stationary else 'NON-STATIONARY'}")

        print(f"\nKPSS Test")
        print(f"  Test Statistic: {kpss_statistic:.4f}")
        print(f"  P-value:        {kpss_pvalue:.4f}")
        print(f"  Critical Values:")
        for key, val in kpss_critical.items():
            print(f"    {key}: {val:.4f}")
        print(f"  Conclusion: {'STATIONARY' if kpss_stationary else 'NON-STATIONARY'}")

        # Combined assessment
        if adf_stationary and kpss_stationary:
            print("\nCOMBINED: Strong evidence of STATIONARITY")
        elif not adf_stationary and not kpss_stationary:
            print("\nCOMBINED: Strong evidence of NON-STATIONARITY")
        else:
            print("\nCOMBINED: CONFLICTING results --- further investigation needed")

    return {
        "adf_statistic": adf_statistic,
        "adf_pvalue": adf_pvalue,
        "adf_stationary": adf_stationary,
        "kpss_statistic": kpss_statistic,
        "kpss_pvalue": kpss_pvalue,
        "kpss_stationary": kpss_stationary,
    }


# Example: Stationary series (constant true talent)
stationary_series = generate_team_performance_series(
    n_games=82, true_talent=3.0, trend_per_game=0.0, seed=42
)
print("Test 1: Stationary series (no trend)")
results_stationary = test_stationarity(stationary_series)

print("\n")

# Example: Non-stationary series (improving team)
trending_series = generate_team_performance_series(
    n_games=82, true_talent=-2.0, trend_per_game=0.12, seed=42
)
print("Test 2: Trending series (improving team)")
results_trending = test_stationarity(trending_series)


def make_stationary(
    series: pd.Series,
    method: str = "difference",
    max_diffs: int = 2,
) -> tuple[pd.Series, int]:
    """
    Transform a non-stationary series into a stationary one.

    Args:
        series: The input time series.
        method: Either 'difference' for differencing or 'detrend'
            for linear detrending.
        max_diffs: Maximum number of differences to apply.

    Returns:
        A tuple of (transformed series, number of differences applied)
        or (detrended series, 0) if detrending was used.
    """
    if method == "detrend":
        t = np.arange(len(series))
        coeffs = np.polyfit(t, series.values, deg=1)
        trend = np.polyval(coeffs, t)
        detrended = pd.Series(
            series.values - trend, index=series.index, name=series.name
        )
        return detrended, 0

    # Differencing approach
    diff_series = series.copy()
    n_diffs = 0
    for _ in range(max_diffs):
        adf_pvalue = adfuller(diff_series.dropna(), autolag="AIC")[1]
        if adf_pvalue < 0.05:
            break
        diff_series = diff_series.diff().dropna()
        n_diffs += 1

    return diff_series, n_diffs


# Apply differencing to the trending series
diff_series, n_diffs = make_stationary(trending_series, method="difference")
print(f"\nDifferences required: {n_diffs}")
print("After differencing:")
test_stationarity(diff_series)

Practical Tip: In sports analytics, most game-level performance series for a single season (82 NBA games, 162 MLB games) will be stationary or nearly so. The noise level is typically so high relative to any underlying trend that the ADF test has difficulty rejecting the null. This does not mean trends do not exist --- it means that with only a season's worth of data, the signal-to-noise ratio is too low for formal tests to detect them reliably. Combine statistical tests with domain knowledge: if a team traded its star player, the trend is real regardless of what the ADF test says.


23.2 ARIMA Models for Performance Prediction

The Components of ARIMA

The AutoRegressive Integrated Moving Average (ARIMA) model is the classical workhorse of time series forecasting. It combines three components:

AR (AutoRegressive) of order $p$: The current value depends linearly on its own past values:

$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \varepsilon_t$$

where $\phi_1, \ldots, \phi_p$ are the autoregressive coefficients and $\varepsilon_t$ is white noise. In sports terms, this captures the idea that a team's performance in game $t$ is partially predicted by its performance in recent games --- momentum, form, and underlying talent all contribute.

MA (Moving Average) of order $q$: The current value depends linearly on past forecast errors:

$$y_t = c + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q}$$

where $\theta_1, \ldots, \theta_q$ are the moving average coefficients. This captures short-lived shocks: a player returns from a minor injury and the team overperforms for a game or two before reverting, or a blown officiating call creates a one-game anomaly that ripples through the next game's motivation.

I (Integrated) of order $d$: The number of differences needed to make the series stationary. An ARIMA($p, d, q$) model applies $d$ differences to the series before fitting an ARMA($p, q$) model.

The full ARIMA($p, d, q$) model for the differenced series $w_t = \Delta^d y_t$ is:

$$w_t = c + \sum_{i=1}^{p} \phi_i w_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}$$

Model Identification Using ACF and PACF

The Autocorrelation Function (ACF) measures the correlation between $y_t$ and $y_{t-h}$ at lag $h$:

$$\rho(h) = \frac{\text{Cov}(y_t, y_{t-h})}{\text{Var}(y_t)}$$

The Partial Autocorrelation Function (PACF) measures the correlation between $y_t$ and $y_{t-h}$ after removing the linear effect of the intervening values $y_{t-1}, \ldots, y_{t-h+1}$.

The classical identification rules are:

| Model | ACF Pattern | PACF Pattern |
|---|---|---|
| AR($p$) | Tails off (exponential or oscillating decay) | Cuts off after lag $p$ |
| MA($q$) | Cuts off after lag $q$ | Tails off |
| ARMA($p, q$) | Tails off | Tails off |

In practice, the patterns in sports data are rarely as clean as textbook examples. Game-level noise is large, and sample sizes within a single season are modest. This makes automated model selection particularly valuable.

Fitting ARIMA Models in Python

We will use both statsmodels for manual fitting and pmdarima for automated model selection.

import numpy as np
import pandas as pd
import warnings
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import acf, pacf
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import pmdarima as pm
from sklearn.metrics import mean_squared_error, mean_absolute_error


def fit_arima_manual(
    series: pd.Series,
    order: tuple[int, int, int] = (1, 0, 0),
    verbose: bool = True,
) -> dict:
    """
    Fit an ARIMA model with a specified order.

    Args:
        series: The time series to model.
        order: Tuple (p, d, q) specifying the ARIMA order.
        verbose: Whether to print the model summary.

    Returns:
        Dictionary with fitted model, residuals, AIC, BIC, and
        in-sample predictions.
    """
    model = ARIMA(series, order=order)
    fitted = model.fit()

    if verbose:
        print(fitted.summary())

    return {
        "model": fitted,
        "residuals": fitted.resid,
        "aic": fitted.aic,
        "bic": fitted.bic,
        "predictions": fitted.fittedvalues,
    }


def fit_arima_auto(
    series: pd.Series,
    max_p: int = 5,
    max_q: int = 5,
    max_d: int = 2,
    seasonal: bool = False,
    verbose: bool = True,
) -> dict:
    """
    Automatically select and fit the best ARIMA model using pmdarima.

    Uses a stepwise search over (p, d, q) orders to minimize AIC.

    Args:
        series: The time series to model.
        max_p: Maximum AR order to consider.
        max_q: Maximum MA order to consider.
        max_d: Maximum differencing order.
        seasonal: Whether to include seasonal terms.
        verbose: Whether to print the model summary.

    Returns:
        Dictionary with fitted model, order, AIC, BIC, and
        in-sample predictions.
    """
    auto_model = pm.auto_arima(
        series,
        start_p=0,
        start_q=0,
        max_p=max_p,
        max_q=max_q,
        max_d=max_d,
        seasonal=seasonal,
        stepwise=True,
        suppress_warnings=True,
        error_action="ignore",
        trace=verbose,
        information_criterion="aic",
    )

    if verbose:
        print(auto_model.summary())

    return {
        "model": auto_model,
        "order": auto_model.order,
        "aic": auto_model.aic(),
        "bic": auto_model.bic(),
        "predictions": auto_model.predict_in_sample(),
    }


def rolling_arima_forecast(
    series: pd.Series,
    train_size: int = 40,
    order: tuple[int, int, int] | None = None,
    auto_select: bool = True,
    refit_every: int = 10,
) -> pd.DataFrame:
    """
    Perform rolling one-step-ahead ARIMA forecasts.

    Starting from a training window of `train_size` observations,
    this function generates one-step-ahead forecasts for each
    subsequent observation, optionally refitting the model at
    regular intervals.

    Args:
        series: The full time series.
        train_size: Number of initial observations for training.
        order: Fixed ARIMA order; ignored if auto_select is True.
        auto_select: Whether to use auto_arima for order selection.
        refit_every: Refit the model every N steps.

    Returns:
        DataFrame with columns 'actual', 'forecast', and 'error'.
    """
    results = []
    current_order = order

    for i in range(train_size, len(series)):
        train = series.iloc[:i]

        # Refit at specified intervals or on first iteration
        if (i - train_size) % refit_every == 0 or current_order is None:
            if auto_select:
                with warnings.catch_warnings():
                    warnings.simplefilter("ignore")
                    auto_model = pm.auto_arima(
                        train, max_p=3, max_q=3, max_d=1,
                        seasonal=False, stepwise=True,
                        suppress_warnings=True, error_action="ignore",
                    )
                    current_order = auto_model.order

        # Fit and forecast
        try:
            model = ARIMA(train, order=current_order)
            fitted = model.fit()
            forecast = fitted.forecast(steps=1).iloc[0]
        except Exception:
            forecast = train.mean()

        actual = series.iloc[i]
        results.append({
            "game": series.index[i],
            "actual": actual,
            "forecast": forecast,
            "error": actual - forecast,
        })

    return pd.DataFrame(results).set_index("game")


# Worked Example: Forecasting team point differential
np.random.seed(42)
n_games = 82
true_talent = 4.5  # Good team, about 4.5 points better per game
noise = np.random.normal(0, 11, n_games)
# Add slight AR(1) structure: momentum effect
performance = np.zeros(n_games)
performance[0] = true_talent + noise[0]
for t in range(1, n_games):
    performance[t] = (
        0.7 * true_talent
        + 0.3 * performance[t - 1]
        + noise[t] * 0.9
    )

team_series = pd.Series(performance, index=range(1, n_games + 1),
                         name="point_diff")

# Auto-fit ARIMA
print("AUTO-ARIMA MODEL SELECTION")
print("=" * 50)
auto_result = fit_arima_auto(team_series, verbose=True)
print(f"\nSelected order: {auto_result['order']}")

# Rolling forecast evaluation
print("\nROLLING FORECAST EVALUATION")
print("=" * 50)
forecasts = rolling_arima_forecast(team_series, train_size=30)
rmse = np.sqrt(mean_squared_error(forecasts["actual"], forecasts["forecast"]))
mae = mean_absolute_error(forecasts["actual"], forecasts["forecast"])
naive_rmse = np.sqrt(mean_squared_error(
    forecasts["actual"], [team_series.iloc[:i].mean()
                          for i in range(30, len(team_series))]
))
print(f"ARIMA RMSE:    {rmse:.3f}")
print(f"ARIMA MAE:     {mae:.3f}")
print(f"Naive RMSE:    {naive_rmse:.3f}")
print(f"Improvement:   {(1 - rmse / naive_rmse) * 100:.1f}%")

Interpreting ARIMA Results for Betting

The practical value of ARIMA modeling for betting lies not in the point forecast itself --- which is typically very noisy for individual games --- but in several derived insights:

  1. Trend detection: The fitted model reveals whether a team's performance is trending upward or downward, even when the raw data is too noisy to see this clearly. A positive drift term in an ARIMA(0,1,0) model indicates improvement.

  2. Forecast confidence intervals: ARIMA models provide prediction intervals that quantify forecast uncertainty. A 95% prediction interval for next game's point differential can easily span 40 points or more, reminding us how uncertain individual game outcomes are.

  3. Residual analysis: If the model residuals show structure (e.g., clustering of large residuals), this suggests the model is missing something --- perhaps a home/away effect, a rest-days effect, or a changepoint.

  4. Model comparison: By comparing ARIMA forecasts against the market's implied forecasts (the point spread), we can identify periods where the model and the market disagree most, which are precisely the situations where betting opportunities exist.

Common Pitfall: ARIMA models are linear and assume Gaussian errors. Sports outcomes often have heavier tails than the normal distribution (blowouts and upsets are more common than a Gaussian model predicts). Consider using robust estimation methods or supplementing ARIMA forecasts with explicit tail-risk modeling when making betting decisions.
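One quick check for this pitfall is a residual tail diagnostic. The sketch below uses a hypothetical heavy-tailed residual series (a normal mixture standing in for ordinary games plus occasional blowouts):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical residuals: 90% ordinary games, 10% blowout-scale shocks
resid = np.where(
    rng.random(2000) < 0.9,
    rng.normal(0, 10, size=2000),
    rng.normal(0, 25, size=2000),
)

excess_kurt = stats.kurtosis(resid)  # 0 for Gaussian data
jb = stats.jarque_bera(resid)        # tests normality jointly
print(f"Excess kurtosis:     {excess_kurt:.2f}")
print(f"Jarque-Bera p-value: {jb.pvalue:.4f}")
```

Large positive excess kurtosis and a tiny Jarque-Bera p-value both signal that Gaussian prediction intervals will understate tail risk.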


23.3 Mean Reversion in Sports Performance

Regression to the Mean

Mean reversion --- the tendency for extreme observations to be followed by less extreme ones --- is one of the most important statistical phenomena in sports. It was first identified by Sir Francis Galton in the 1880s, who noticed that unusually tall parents tended to have children who were tall but closer to the population average. The same principle applies pervasively in sports.

When a team wins 75% of its games through the first quarter of the season, it almost certainly will not sustain that pace. Part of the 75% win rate reflects genuine talent, but part reflects good fortune: close games that fell their way, opponents missing key players, favorable scheduling. As the season progresses, the luck component tends to average out, and the team's record converges toward its true talent level.

Mathematically, if a team's observed performance $y_t$ consists of a true talent component $\mu$ and a noise component $\varepsilon_t$:

$$y_t = \mu + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma^2_\varepsilon)$$

then an observation at time $t$ that deviates far from $\mu$ is likely driven largely by a large realization of $\varepsilon_t$, which is, by construction, uncorrelated with future noise terms. The expected value of $y_{t+1}$ conditional on $y_t$ is:

$$E[y_{t+1} | y_t] = \mu + r(y_t - \mu)$$

where $r$ is the regression coefficient (or mean reversion parameter), with $0 \leq r < 1$. If $r = 0$, performance fully reverts to the mean each period. If $r$ is close to 1, performance persists strongly.
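With an assumed $\mu$ and $r$, the conditional expectation above becomes a direct forecast. A toy calculation, with illustrative values:

```python
mu = 3.0    # assumed true talent (mean point differential)
r = 0.4     # assumed regression coefficient
y_t = 15.0  # an extreme observed performance

# E[y_{t+1} | y_t] = mu + r * (y_t - mu)
expected_next = mu + r * (y_t - mu)
print(expected_next)  # 7.8: most of the 12-point deviation is expected to vanish
```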

Estimating the Half-Life of Performance

The half-life of a performance deviation is the number of time periods required for half of the initial deviation from the mean to dissipate. For an AR(1) process $y_t = \mu + \phi(y_{t-1} - \mu) + \varepsilon_t$, the expected deviation after $k$ periods is:

$$E[y_{t+k} - \mu | y_t] = \phi^k (y_t - \mu)$$

Setting this equal to half the initial deviation:

$$\phi^{h} = 0.5 \quad \Rightarrow \quad h = \frac{\ln(0.5)}{\ln(\phi)} = \frac{-0.693}{\ln(\phi)}$$

For example, if $\phi = 0.85$, the half-life is:

$$h = \frac{-0.693}{\ln(0.85)} = \frac{-0.693}{-0.163} \approx 4.3 \text{ games}$$

This means that if a team is performing 10 points above its true talent level, after about 4.3 games, we would expect it to be only 5 points above.
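The half-life arithmetic is worth verifying directly:

```python
import numpy as np

phi = 0.85
half_life = np.log(0.5) / np.log(phi)
print(f"Half-life: {half_life:.1f} games")  # ~4.3 games
```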

The Pythagorean Expectation as a Mean Reversion Tool

Bill James's Pythagorean expectation provides an elegant illustration of mean reversion. The formula estimates a team's expected winning percentage based on points scored ($PS$) and points allowed ($PA$):

$$\text{Win\%}_{\text{expected}} = \frac{PS^\gamma}{PS^\gamma + PA^\gamma}$$

where $\gamma$ depends on the sport (approximately 1.83-2.0 for MLB, 2.37 for the NFL, and 13.91-16.5 for the NBA).

Teams that significantly outperform their Pythagorean expectation --- winning more games than their scoring margin would predict --- tend to regress toward it. The deviation between actual and Pythagorean winning percentage is often called the luck component, and it provides a direct estimate of how much mean reversion to expect.

Python Implementation

import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.tsa.ar_model import AutoReg


def estimate_mean_reversion(
    series: pd.Series,
    method: str = "ar1",
) -> dict:
    """
    Estimate mean reversion parameters from a time series.

    Args:
        series: The performance time series to analyze.
        method: Estimation method. Options are 'ar1' (fit AR(1) model)
            or 'ols' (regress y_t on y_{t-1}).

    Returns:
        Dictionary with regression coefficient, half-life, mean,
        and standard error.
    """
    if method == "ar1":
        model = AutoReg(series.dropna(), lags=1)
        fitted = model.fit()
        phi = fitted.params.iloc[1]  # AR(1) coefficient
        mu = fitted.params.iloc[0] / (1 - phi)  # Unconditional mean
    elif method == "ols":
        y = series.values[1:]
        x = series.values[:-1]
        slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
        phi = slope
        mu = intercept / (1 - phi) if phi != 1 else np.nan

    # Calculate half-life
    if 0 < phi < 1:
        half_life = -np.log(2) / np.log(phi)
    elif phi <= 0:
        half_life = 0  # Immediate or oscillating reversion
    else:
        half_life = np.inf  # No reversion (unit root)

    return {
        "regression_coefficient": phi,
        "half_life": half_life,
        "unconditional_mean": mu,
        "persistence": phi,
    }


def pythagorean_mean_reversion(
    points_scored: np.ndarray,
    points_allowed: np.ndarray,
    actual_wins: int,
    total_games: int,
    exponent: float = 13.91,
) -> dict:
    """
    Calculate Pythagorean expected wins and the luck component.

    Args:
        points_scored: Array of points scored in each game.
        points_allowed: Array of points allowed in each game.
        actual_wins: Actual number of wins.
        total_games: Total number of games played.
        exponent: Pythagorean exponent for the sport.
            MLB ~ 1.83-2.0, NFL ~ 2.37-2.68, NBA ~ 13.91-16.5.
            The default (13.91) is Morey's NBA exponent.

    Returns:
        Dictionary with expected wins, luck component, and predicted
        regression.
    """
    total_ps = points_scored.sum()
    total_pa = points_allowed.sum()

    pyth_pct = total_ps ** exponent / (
        total_ps ** exponent + total_pa ** exponent
    )
    expected_wins = pyth_pct * total_games

    luck_wins = actual_wins - expected_wins
    actual_pct = actual_wins / total_games

    return {
        "actual_wins": actual_wins,
        "actual_win_pct": actual_pct,
        "pythagorean_win_pct": pyth_pct,
        "expected_wins": expected_wins,
        "luck_component_wins": luck_wins,
        "predicted_regression_wins": -luck_wins,
        "total_points_scored": total_ps,
        "total_points_allowed": total_pa,
    }


def simulate_mean_reversion(
    true_talent: float = 0.550,
    noise_std: float = 0.15,
    n_periods: int = 162,
    n_simulations: int = 10000,
    extreme_threshold: float = 0.650,
    observation_point: int = 40,
    seed: int = 42,
) -> dict:
    """
    Monte Carlo simulation demonstrating mean reversion.

    Simulates many seasons and examines what happens to teams
    that are performing above an extreme threshold at a specified
    point in the season.

    Args:
        true_talent: True winning probability per game.
        noise_std: Standard deviation of game-level noise.
        n_periods: Total games in a season.
        n_simulations: Number of seasons to simulate.
        extreme_threshold: Win rate threshold to define 'extreme'.
        observation_point: Game number at which we observe performance.
        seed: Random seed.

    Returns:
        Dictionary with simulation results showing regression patterns.
    """
    rng = np.random.default_rng(seed)

    first_half_pcts = []
    second_half_pcts = []

    for _ in range(n_simulations):
        # Each game is Bernoulli with probability = true_talent
        outcomes = rng.binomial(1, true_talent, size=n_periods)
        first_half = outcomes[:observation_point].mean()
        second_half = outcomes[observation_point:].mean()
        first_half_pcts.append(first_half)
        second_half_pcts.append(second_half)

    first_half_pcts = np.array(first_half_pcts)
    second_half_pcts = np.array(second_half_pcts)

    # Filter for extreme first-half performers
    extreme_mask = first_half_pcts >= extreme_threshold
    n_extreme = extreme_mask.sum()

    if n_extreme > 0:
        mean_first_extreme = first_half_pcts[extreme_mask].mean()
        mean_second_extreme = second_half_pcts[extreme_mask].mean()
        regression_amount = mean_first_extreme - mean_second_extreme
    else:
        mean_first_extreme = np.nan
        mean_second_extreme = np.nan
        regression_amount = np.nan

    return {
        "n_simulations": n_simulations,
        "true_talent": true_talent,
        "extreme_threshold": extreme_threshold,
        "n_extreme_teams": n_extreme,
        "mean_first_half_extreme": mean_first_extreme,
        "mean_second_half_extreme": mean_second_extreme,
        "regression_amount": regression_amount,
        "pct_regression_to_mean": (
            regression_amount / (mean_first_extreme - true_talent) * 100
            if n_extreme > 0 else np.nan
        ),
    }


# Worked Example: NFL Team Performance Mean Reversion
print("MEAN REVERSION ANALYSIS")
print("=" * 60)

# Simulate an NFL-like season with AR(1) structure
rng = np.random.default_rng(42)
n_games = 17
true_talent = 2.5  # Above average team
phi_true = 0.15  # Weak persistence in NFL game-to-game

performance = np.zeros(n_games)
performance[0] = true_talent + rng.normal(0, 14)
for t in range(1, n_games):
    performance[t] = (
        true_talent * (1 - phi_true)
        + phi_true * performance[t - 1]
        + rng.normal(0, 14)
    )

nfl_series = pd.Series(performance, index=range(1, n_games + 1),
                        name="point_diff")

# Estimate mean reversion parameters
mr_result = estimate_mean_reversion(nfl_series, method="ols")
print(f"Regression coefficient (phi): {mr_result['regression_coefficient']:.3f}")
print(f"Half-life: {mr_result['half_life']:.1f} games")
print(f"Unconditional mean: {mr_result['unconditional_mean']:.1f}")

# Pythagorean example
print("\nPYTHAGOREAN MEAN REVERSION EXAMPLE")
print("-" * 40)
points_scored = rng.poisson(24, size=17).astype(float)
points_allowed = rng.poisson(21, size=17).astype(float)
actual_wins = 12

pyth_result = pythagorean_mean_reversion(
    points_scored, points_allowed, actual_wins, 17, exponent=2.57
)
print(f"Actual wins: {pyth_result['actual_wins']}")
print(f"Expected wins (Pythagorean): {pyth_result['expected_wins']:.1f}")
print(f"Luck component: {pyth_result['luck_component_wins']:+.1f} wins")
print(f"Predicted regression: {pyth_result['predicted_regression_wins']:+.1f} wins")

# Monte Carlo demonstration
print("\nMONTE CARLO MEAN REVERSION DEMONSTRATION")
print("-" * 40)
sim_result = simulate_mean_reversion(
    true_talent=0.550, extreme_threshold=0.700, observation_point=40,
    n_simulations=50000
)
print(f"Teams above .700 after 40 games: {sim_result['n_extreme_teams']}")
print(f"Their avg first-half win%:  {sim_result['mean_first_half_extreme']:.3f}")
print(f"Their avg second-half win%: {sim_result['mean_second_half_extreme']:.3f}")
print(f"Regression: {sim_result['pct_regression_to_mean']:.1f}% toward true talent")

Betting Implications of Mean Reversion

Mean reversion creates systematic betting opportunities:

  1. Fading hot teams: Teams on long winning streaks are often overvalued by the public because bettors extrapolate recent performance. If you can estimate the true talent level and the current luck component, you can identify when the line has overadjusted.

  2. Backing cold teams: Conversely, teams on losing streaks may offer value. If the losses are driven by bad luck (close losses, opponent free-throw shooting variance) rather than a genuine talent decline, the market may undervalue them.

  3. Season win totals: Preseason over/under win totals can be evaluated by estimating the Pythagorean expectation mid-season and comparing the projected full-season wins against the remaining line.

  4. Player props: Player performance mean-reverts as well. A batter hitting .350 through April will almost certainly not sustain it. A shooter making 45% from three in the first month will regress. The speed of regression depends on sample size and the reliability of the metric.

Key Insight: The rate of mean reversion varies dramatically by sport and by metric. NFL game outcomes have low persistence (high mean reversion) because 17-game samples are small and single-game variance is enormous. MLB batting averages stabilize over hundreds of plate appearances. NBA three-point shooting takes roughly 750 attempts to stabilize. Understanding these stabilization rates is essential for knowing when a performance deviation is signal versus noise.
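The stabilization-rate idea can be operationalized as a shrinkage ("padding") estimate: treat the stabilization point as a number of phantom league-average trials blended into the observed sample. A minimal sketch, where the league mean and stabilization constant are illustrative assumptions rather than measured values:

```python
def stabilized_estimate(observed_rate: float, n: int,
                        league_mean: float, stabilization_n: float) -> float:
    """Shrink an observed rate toward the league mean.

    The stabilization point acts as the number of phantom
    league-average trials added to the observed sample.
    """
    return (observed_rate * n + league_mean * stabilization_n) / (n + stabilization_n)


# A shooter at 45% from three after 100 attempts, assuming a 36%
# league mean and a 750-attempt stabilization point:
est = stabilized_estimate(0.45, 100, 0.36, 750)
# est = (0.45*100 + 0.36*750) / 850 ~= 0.371: most of the hot start
# is treated as noise
```

As n grows past the stabilization point, the estimate converges toward the observed rate, matching the intuition that large samples eventually dominate the prior.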


23.4 Changepoint Detection

When True Talent Changes

Mean reversion assumes that the underlying true talent level is constant and that deviations are driven by noise. But sometimes the talent level itself changes. A star quarterback tears his ACL. A team trades for an elite point guard at the deadline. A new head coach installs a fundamentally different system. In these cases, the old mean is no longer the appropriate target for reversion --- a new regime has begun.

Changepoint detection is the statistical problem of identifying when the data-generating process has shifted. Formally, given a time series $y_1, y_2, \ldots, y_T$, we seek to identify time points $\tau_1, \tau_2, \ldots, \tau_k$ at which the statistical properties of the series change.

The simplest formulation assumes a change in the mean:

$$y_t \sim N(\mu_j, \sigma^2), \quad \tau_{j-1} < t \leq \tau_j$$

where $\mu_j$ is the mean in regime $j$. More sophisticated models allow changes in variance, changes in the autoregressive structure, or changes in the entire distribution.
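Before turning to sequential methods, note that for a single change in mean this formulation can be solved directly offline: scan every candidate split and keep the one minimizing the within-segment sum of squares, which is the Gaussian maximum-likelihood criterion for this model. A minimal sketch on synthetic data:

```python
import numpy as np


def single_changepoint_scan(y: np.ndarray, min_seg: int = 5) -> int:
    """Return the index tau minimizing SSE(y[:tau]) + SSE(y[tau:]).

    Equivalent to the MLE for a single change in mean with common
    Gaussian variance; min_seg avoids degenerate tiny segments.
    """
    best_tau, best_cost = min_seg, np.inf
    for tau in range(min_seg, len(y) - min_seg):
        left, right = y[:tau], y[tau:]
        cost = (((left - left.mean()) ** 2).sum()
                + ((right - right.mean()) ** 2).sum())
        if cost < best_cost:
            best_tau, best_cost = tau, cost
    return best_tau


# Synthetic series with a true mean break at index 40
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(5, 2, 40), rng.normal(-2, 2, 40)])
tau_hat = single_changepoint_scan(y)  # lands at or very near index 40
```

The CUSUM and Bayesian methods below extend this idea to online detection, where the changepoint must be flagged as data arrives rather than in hindsight.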

CUSUM (Cumulative Sum) Method

The CUSUM method detects shifts by tracking the cumulative sum of deviations from a target value. For detecting an upward shift:

$$S_t = \max(0, S_{t-1} + y_t - \mu_0 - k)$$

where $\mu_0$ is the in-control mean and $k$ is an allowance parameter (typically half the shift you want to detect). A changepoint is signaled when $S_t$ exceeds a threshold $h$. For detecting downward shifts, the analogous statistic is:

$$T_t = \min(0, T_{t-1} + y_t - \mu_0 + k)$$

and a changepoint is signaled when $T_t < -h$.

Bayesian Changepoint Detection

The Bayesian approach to changepoint detection provides a probability distribution over possible changepoint locations, offering richer information than a simple yes/no signal.

The Bayesian Online Changepoint Detection (BOCPD) algorithm, introduced by Adams and MacKay (2007), maintains a probability distribution over the run length $r_t$ --- the number of time steps since the last changepoint. At each time step:

  1. Predict: Compute the predictive probability of the new observation given each possible run length.
  2. Update: Use Bayes' rule to update the posterior over run lengths.
  3. Detect: A changepoint is signaled when the probability of $r_t = 0$ (a new run starting) exceeds a threshold.

The algorithm's key advantage is that it operates online --- it can detect changepoints in real time as new data arrives, which is exactly what we need for in-season betting adjustments.

Python Implementation

import numpy as np
import pandas as pd
from scipy import stats


def cusum_detection(
    series: pd.Series,
    target_mean: float | None = None,
    allowance: float = 0.5,
    threshold: float = 5.0,
) -> dict:
    """
    Detect changepoints using the CUSUM method.

    Args:
        series: The time series to analyze.
        target_mean: Expected mean under the null hypothesis.
            If None, uses the series mean.
        allowance: The allowance parameter k (half the shift to detect).
        threshold: The decision threshold h.

    Returns:
        Dictionary with CUSUM statistics and detected changepoints.
    """
    if target_mean is None:
        target_mean = series.mean()

    values = series.values
    n = len(values)

    s_pos = np.zeros(n)  # Detect upward shift
    s_neg = np.zeros(n)  # Detect downward shift
    changepoints_up = []
    changepoints_down = []

    for t in range(1, n):
        s_pos[t] = max(0, s_pos[t - 1] + values[t] - target_mean - allowance)
        s_neg[t] = min(0, s_neg[t - 1] + values[t] - target_mean + allowance)

        if s_pos[t] > threshold:
            changepoints_up.append(series.index[t])
            s_pos[t] = 0  # Reset after detection

        if s_neg[t] < -threshold:
            changepoints_down.append(series.index[t])
            s_neg[t] = 0  # Reset after detection

    return {
        "cusum_positive": pd.Series(s_pos, index=series.index),
        "cusum_negative": pd.Series(s_neg, index=series.index),
        "changepoints_up": changepoints_up,
        "changepoints_down": changepoints_down,
        "target_mean": target_mean,
        "threshold": threshold,
    }


def bayesian_changepoint_detection(
    series: pd.Series,
    hazard_rate: float = 1 / 20,
    prior_mean: float = 0.0,
    prior_var: float = 100.0,
    observation_var: float = 1.0,
) -> dict:
    """
    Bayesian online changepoint detection (Adams & MacKay, 2007).

    Maintains a posterior over run lengths (time since last changepoint)
    and signals changepoints when the probability of a new run starting
    is high.

    Args:
        series: The time series to analyze.
        hazard_rate: Prior probability of a changepoint at each time step.
            1/20 means we expect a changepoint roughly every 20 observations.
        prior_mean: Prior mean for the normal observation model.
        prior_var: Prior variance for the mean parameter.
        observation_var: Known observation variance.

    Returns:
        Dictionary with run length probabilities and most likely
        changepoint locations.
    """
    values = series.values
    T = len(values)

    # Run length probabilities: R[t, r] = P(r_t = r | y_{1:t})
    R = np.zeros((T + 1, T + 1))
    R[0, 0] = 1.0  # Start with run length 0

    # Sufficient statistics for each run length
    # For normal model: track running mean and precision
    mu_params = np.zeros(T + 1)  # Posterior mean for each run length
    var_params = np.full(T + 1, prior_var)  # Posterior variance

    mu_params[0] = prior_mean
    var_params[0] = prior_var

    changepoint_probs = np.zeros(T)

    for t in range(T):
        # Predictive probability for each run length
        pred_mean = mu_params[: t + 1]
        pred_var = var_params[: t + 1] + observation_var
        pred_probs = stats.norm.pdf(values[t], pred_mean, np.sqrt(pred_var))

        # Growth probabilities (no changepoint)
        growth = R[t, : t + 1] * pred_probs * (1 - hazard_rate)

        # Changepoint probability (run length resets to 0)
        cp_prob = np.sum(R[t, : t + 1] * pred_probs * hazard_rate)

        # Update run length distribution
        R[t + 1, 1 : t + 2] = growth
        R[t + 1, 0] = cp_prob

        # Normalize
        total = R[t + 1, : t + 2].sum()
        if total > 0:
            R[t + 1, : t + 2] /= total

        changepoint_probs[t] = R[t + 1, 0]

        # Update sufficient statistics
        new_mu = np.zeros(t + 2)
        new_var = np.zeros(t + 2)
        new_mu[0] = prior_mean
        new_var[0] = prior_var

        for r in range(1, t + 2):
            # Bayesian update: combine prior with observation
            precision_prior = 1.0 / var_params[r - 1]
            precision_obs = 1.0 / observation_var
            new_precision = precision_prior + precision_obs
            new_var[r] = 1.0 / new_precision
            new_mu[r] = new_var[r] * (
                precision_prior * mu_params[r - 1]
                + precision_obs * values[t]
            )

        mu_params = np.zeros(T + 1)
        var_params = np.full(T + 1, prior_var)
        mu_params[: t + 2] = new_mu
        var_params[: t + 2] = new_var

    # Find most likely changepoints
    cp_series = pd.Series(changepoint_probs, index=series.index)

    return {
        "changepoint_probabilities": cp_series,
        "run_length_matrix": R,
        "most_likely_changepoints": cp_series[
            cp_series > 2 * hazard_rate
        ].sort_values(ascending=False),
    }


# Worked Example: Detecting an injury-related performance drop
print("CHANGEPOINT DETECTION: INJURY SCENARIO")
print("=" * 60)

# Simulate a team that loses its star player at game 45
team_perf = generate_team_performance_series(
    n_games=82,
    true_talent=5.0,
    noise_std=11.0,
    changepoint_game=45,
    changepoint_shift=-7.0,  # Star player worth ~7 points
    seed=123,
)

# CUSUM detection
print("\nCUSUM DETECTION")
print("-" * 40)
cusum_result = cusum_detection(
    team_perf, target_mean=5.0, allowance=2.0, threshold=15.0
)
print(f"Upward changepoints detected at games: {cusum_result['changepoints_up']}")
print(f"Downward changepoints detected at games: {cusum_result['changepoints_down']}")

# Bayesian detection
print("\nBAYESIAN CHANGEPOINT DETECTION")
print("-" * 40)
# Standardize the series for numerical stability
std_series = (team_perf - team_perf.mean()) / team_perf.std()
bayes_result = bayesian_changepoint_detection(
    std_series,
    hazard_rate=1 / 30,
    prior_mean=0.0,
    prior_var=1.0,
    observation_var=1.0,
)

top_changepoints = bayes_result["most_likely_changepoints"].head(5)
print("Top 5 most probable changepoint locations:")
for game, prob in top_changepoints.items():
    print(f"  Game {game}: probability = {prob:.4f}")

# Compare pre- and post-changepoint performance
if len(cusum_result["changepoints_down"]) > 0:
    detected_cp = cusum_result["changepoints_down"][0]
    pre_mean = team_perf[team_perf.index < detected_cp].mean()
    post_mean = team_perf[team_perf.index >= detected_cp].mean()
    print(f"\nPre-changepoint mean (games 1-{detected_cp - 1}): {pre_mean:.2f}")
    print(f"Post-changepoint mean (games {detected_cp}-82): {post_mean:.2f}")
    print(f"Estimated shift: {post_mean - pre_mean:.2f}")

Practical Application to Betting

Changepoint detection is directly actionable for betting. When you detect a structural break in a team's performance --- whether through formal statistical methods or domain knowledge (injury reports, trade announcements) --- you should:

  1. Re-estimate the team's true talent level using only post-changepoint data, while recognizing that the smaller sample increases uncertainty.

  2. Compare your updated estimate against the market. Betting markets often adjust slowly to changepoints, particularly for lower-profile changes such as a shift in defensive scheme or a backup catcher's surprising offensive contribution.

  3. Adjust your mean reversion target. If the previous analysis assumed reversion to a pre-changepoint mean, and a genuine structural break has occurred, the reversion target must be updated.

  4. Size bets conservatively during the transition period. Immediately after a changepoint, you have very little data to estimate the new regime's parameters. The Kelly criterion (Chapter 4) scales bet size with confidence, and your confidence should be low in these situations.
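Steps 1 and 4 can be combined in a small sketch: re-estimate talent from post-changepoint games only, shrinking toward the league mean so a thin sample yields a deliberately conservative estimate. The prior weight of 10 phantom games below is an illustrative assumption:

```python
import numpy as np


def post_changepoint_talent(post_cp_values: np.ndarray,
                            league_mean: float = 0.0,
                            prior_weight: float = 10.0) -> dict:
    """Estimate new-regime talent from post-changepoint data only.

    prior_weight acts as phantom league-average games, keeping the
    estimate conservative while the new-regime sample is small.
    """
    n = len(post_cp_values)
    sample_mean = float(post_cp_values.mean())
    estimate = (n * sample_mean + prior_weight * league_mean) / (n + prior_weight)
    return {"n_games": n, "sample_mean": sample_mean,
            "shrunk_estimate": estimate}


# Five point differentials after a star player's injury:
post = np.array([-4.0, -8.0, 1.0, -6.0, -3.0])
est = post_changepoint_talent(post)
# sample mean is -4.0, but the shrunken estimate is (5*-4 + 10*0)/15 = -1.33
```

As more post-changepoint games accumulate, the estimate moves toward the raw sample mean and bet sizing can scale up accordingly.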

Real-World Application: In the 2023-24 NBA season, the Cleveland Cavaliers' performance changed dramatically after the Donovan Mitchell trade rumors were resolved and the team chemistry stabilized. A Bayesian changepoint detector applied to their net rating would have identified the shift several games before the market fully adjusted, creating a window of opportunity for bettors who were monitoring these signals systematically.


23.5 Seasonal Patterns and Calendar Effects

The Nature of Seasonal Effects in Sports

Sports have inherent cyclical structure. An NBA season follows a predictable rhythm: an October/November ramp-up period, a grueling mid-season stretch, the All-Star break, a post-break adjustment, and a late-season push toward the playoffs. An NFL season has Thursday night games, Sunday afternoon and evening games, and Monday night games. MLB has interleague play periods, the trade deadline, and September call-ups.

These cyclical patterns create seasonal effects --- systematic deviations in performance or betting market behavior that recur at regular intervals. From a time series perspective, a seasonal effect is a predictable component of the series that repeats with a known period.

Day-of-Week Effects

One of the most robust calendar effects in sports betting is the day-of-week effect. Several mechanisms drive this:

  • Rest asymmetry: In the NBA, teams playing the second game of a back-to-back perform measurably worse, particularly on the road. The effect is typically 1-3 points on the spread.
  • Travel fatigue: Teams traveling across multiple time zones (especially westward flights for East Coast teams) show decreased performance.
  • Thursday Night Football: NFL teams playing on short rest (Thursday games) show different performance patterns than teams on standard rest.
  • Scheduling density: In compressed scheduling periods, fatigue accumulates and teams with deeper rosters gain an advantage.

Month-of-Season Effects

Performance tends to vary systematically across the season:

  • Early season: Higher variance as teams integrate new players and establish rotations. Early-season performance is a less reliable indicator of true talent.
  • Pre-All-Star break: Fatigue effects begin to accumulate, particularly for teams with heavy minutes concentration on a few players.
  • Post-All-Star break: Some teams show rejuvenation; others coast. Teams in playoff contention may show increased intensity.
  • Late season/playoffs: Increased motivation for contenders; "tanking" behavior for eliminated teams distorts incentive structures.

Python Implementation for Seasonal Decomposition

import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.tsa.seasonal import seasonal_decompose, STL


def create_nba_schedule_features(
    game_log: pd.DataFrame,
    date_col: str = "date",
    team_col: str = "team",
    rest_days_col: str = "rest_days",
) -> pd.DataFrame:
    """
    Create calendar and scheduling features from NBA game logs.

    Args:
        game_log: DataFrame with game-level data.
        date_col: Column name containing game dates.
        team_col: Column name for the team identifier.
        rest_days_col: Column name for days since last game.

    Returns:
        DataFrame with added scheduling features.
    """
    df = game_log.copy()
    df[date_col] = pd.to_datetime(df[date_col])

    # Day of week (0 = Monday, 6 = Sunday)
    df["day_of_week"] = df[date_col].dt.dayofweek
    df["day_name"] = df[date_col].dt.day_name()

    # Month
    df["month"] = df[date_col].dt.month
    df["month_name"] = df[date_col].dt.month_name()

    # Back-to-back detection
    df["is_back_to_back"] = (df[rest_days_col] == 0).astype(int)

    # Three-in-four-nights: third game within a trailing 4-day window
    def _three_in_four(dates: pd.Series) -> pd.Series:
        window = pd.Timedelta(days=3)
        counts = dates.apply(
            lambda d: ((dates >= d - window) & (dates <= d)).sum()
        )
        return (counts >= 3).astype(int)

    df["is_3_in_4"] = df.groupby(team_col)[date_col].transform(_three_in_four)

    # Week of season (approximate)
    season_start = df[date_col].min()
    df["week_of_season"] = ((df[date_col] - season_start).dt.days // 7) + 1

    # Pre/post All-Star break (approximate: mid-February of the season's
    # closing calendar year, since seasons span two calendar years)
    all_star_date = pd.Timestamp(f"{df[date_col].max().year}-02-15")
    df["post_allstar"] = (df[date_col] > all_star_date).astype(int)

    return df


def analyze_day_of_week_effect(
    game_data: pd.DataFrame,
    performance_col: str = "point_diff",
    day_col: str = "day_of_week",
    ats_col: str | None = None,
) -> dict:
    """
    Analyze day-of-week effects on team performance and ATS results.

    Args:
        game_data: DataFrame with game results and calendar features.
        performance_col: Column name for the performance metric.
        day_col: Column name for day of week (0=Mon, 6=Sun).
        ats_col: Optional column name for against-the-spread result.

    Returns:
        Dictionary with day-of-week statistics and ANOVA test results.
    """
    day_names = [
        "Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"
    ]

    # Group by day of week
    grouped = game_data.groupby(day_col)[performance_col]
    day_stats = grouped.agg(["mean", "std", "count"]).reset_index()
    day_stats["day_name"] = day_stats[day_col].map(
        lambda x: day_names[x] if x < 7 else "Unknown"
    )

    # ANOVA test: is performance significantly different across days?
    groups = [group[performance_col].values
              for _, group in game_data.groupby(day_col)]
    f_stat, p_value = stats.f_oneway(*[g for g in groups if len(g) > 1])

    result = {
        "day_statistics": day_stats,
        "anova_f_statistic": f_stat,
        "anova_p_value": p_value,
        "significant": p_value < 0.05,
    }

    # ATS analysis if available
    if ats_col is not None and ats_col in game_data.columns:
        ats_by_day = game_data.groupby(day_col)[ats_col].agg(
            ["mean", "count"]
        ).reset_index()
        ats_by_day.columns = [day_col, "ats_cover_rate", "n_games"]
        ats_by_day["day_name"] = ats_by_day[day_col].map(
            lambda x: day_names[x] if x < 7 else "Unknown"
        )
        result["ats_by_day"] = ats_by_day

    return result


def seasonal_decomposition_analysis(
    series: pd.Series,
    period: int = 7,
    method: str = "stl",
) -> dict:
    """
    Decompose a time series into trend, seasonal, and residual components.

    Args:
        series: The time series to decompose. Should have a regular
            frequency or be resampled.
        period: The seasonal period (e.g., 7 for weekly).
        method: Decomposition method: 'stl' (robust) or 'classical'.

    Returns:
        Dictionary with trend, seasonal, and residual components.
    """
    series_clean = series.dropna()

    if len(series_clean) < 2 * period:
        raise ValueError(
            f"Series length ({len(series_clean)}) must be at least "
            f"2x the period ({period})."
        )

    if method == "stl":
        decomposition = STL(series_clean, period=period, robust=True).fit()
    elif method == "classical":
        decomposition = seasonal_decompose(
            series_clean, period=period, model="additive"
        )
    else:
        raise ValueError(f"Unknown method: {method}")

    return {
        "trend": decomposition.trend,
        "seasonal": decomposition.seasonal,
        "residual": decomposition.resid,
        "original": series_clean,
        "seasonal_strength": 1 - (
            decomposition.resid.dropna().var()
            / (decomposition.seasonal.var() + decomposition.resid.dropna().var())
        ),
    }


def back_to_back_analysis(
    game_data: pd.DataFrame,
    performance_col: str = "point_diff",
    b2b_col: str = "is_back_to_back",
    spread_col: str | None = None,
) -> dict:
    """
    Analyze the impact of back-to-back games on performance and ATS.

    Args:
        game_data: DataFrame with game results.
        performance_col: Column for the performance metric.
        b2b_col: Column indicating back-to-back status (0 or 1).
        spread_col: Optional column with the point spread.

    Returns:
        Dictionary with back-to-back impact statistics.
    """
    b2b_games = game_data[game_data[b2b_col] == 1]
    rest_games = game_data[game_data[b2b_col] == 0]

    b2b_mean = b2b_games[performance_col].mean()
    rest_mean = rest_games[performance_col].mean()
    diff = b2b_mean - rest_mean

    # Two-sample t-test
    t_stat, p_value = stats.ttest_ind(
        b2b_games[performance_col].dropna(),
        rest_games[performance_col].dropna(),
    )

    # Effect size (Cohen's d)
    pooled_std = np.sqrt(
        (b2b_games[performance_col].var() * (len(b2b_games) - 1)
         + rest_games[performance_col].var() * (len(rest_games) - 1))
        / (len(b2b_games) + len(rest_games) - 2)
    )
    cohens_d = diff / pooled_std if pooled_std > 0 else 0

    result = {
        "b2b_mean_performance": b2b_mean,
        "rest_mean_performance": rest_mean,
        "difference": diff,
        "t_statistic": t_stat,
        "p_value": p_value,
        "cohens_d": cohens_d,
        "n_b2b_games": len(b2b_games),
        "n_rest_games": len(rest_games),
    }

    # ATS analysis: do spreads fully account for b2b effect?
    if spread_col is not None and spread_col in game_data.columns:
        b2b_ats = (b2b_games[performance_col] + b2b_games[spread_col]).mean()
        rest_ats = (rest_games[performance_col] + rest_games[spread_col]).mean()
        result["b2b_ats_margin"] = b2b_ats
        result["rest_ats_margin"] = rest_ats
        result["ats_difference"] = b2b_ats - rest_ats

    return result


# Worked Example: Simulated NBA schedule analysis
print("SEASONAL AND CALENDAR EFFECTS ANALYSIS")
print("=" * 60)

# Generate synthetic NBA-like game data
rng = np.random.default_rng(42)
n_games = 82

dates = pd.date_range("2024-10-22", periods=n_games * 2, freq="D")
# Select ~82 game dates (not every day)
game_dates = sorted(rng.choice(dates, size=n_games, replace=False))
game_dates = pd.to_datetime(game_dates)

# Simulate rest days: full days off between games, so consecutive-day
# games yield rest_days == 0 (a back-to-back)
rest_days = np.diff(game_dates).astype("timedelta64[D]").astype(int) - 1
rest_days = np.concatenate([[3], rest_days])  # First game after 3 days' rest

# Base performance with calendar effects
true_talent = 3.0
day_effects = {0: -1.5, 1: 0, 2: 0.5, 3: -2.0, 4: 0, 5: 1.0, 6: 0.5}
b2b_effect = -2.5

point_diffs = []
for i, (date, rest) in enumerate(zip(game_dates, rest_days)):
    base = true_talent
    base += day_effects.get(date.dayofweek, 0)
    if rest == 0:
        base += b2b_effect
    noise = rng.normal(0, 12)
    point_diffs.append(base + noise)

synthetic_games = pd.DataFrame({
    "date": game_dates,
    "team": "SYN",
    "point_diff": point_diffs,
    "rest_days": rest_days,
})

# Add features
synthetic_games = create_nba_schedule_features(synthetic_games)

# Analyze day-of-week effects
print("\nDAY-OF-WEEK ANALYSIS")
print("-" * 40)
dow_result = analyze_day_of_week_effect(synthetic_games)
print(dow_result["day_statistics"][["day_name", "mean", "std", "count"]].to_string())
print(f"\nANOVA F-statistic: {dow_result['anova_f_statistic']:.3f}")
print(f"ANOVA p-value:     {dow_result['anova_p_value']:.3f}")

# Analyze back-to-back effect
print("\nBACK-TO-BACK ANALYSIS")
print("-" * 40)
b2b_result = back_to_back_analysis(synthetic_games)
print(f"B2B mean point diff:  {b2b_result['b2b_mean_performance']:.2f}")
print(f"Rest mean point diff: {b2b_result['rest_mean_performance']:.2f}")
print(f"Difference:           {b2b_result['difference']:.2f}")
print(f"Cohen's d:            {b2b_result['cohens_d']:.3f}")
print(f"p-value:              {b2b_result['p_value']:.4f}")

Exploiting Calendar Effects in Betting

Calendar effects are among the most actionable findings in sports analytics for betting because they are:

  1. Recurring and predictable: Unlike a one-time injury, the NBA schedule is known in advance. You can identify every back-to-back and every long road trip before the season starts.

  2. Quantifiable: The back-to-back effect in the NBA has been measured across thousands of games. While the exact magnitude varies by study, the consensus is a 1-3 point negative effect on the road team.

  3. Potentially underpriced: While modern sportsbooks do adjust for schedule effects, the adjustment may not be complete, particularly for more subtle effects (three-in-four-nights, travel distance, altitude changes) or for interactions (a back-to-back after a road trip for a team with a thin rotation).

To exploit these effects:

  • Build a database of schedule features for every game.
  • Estimate the marginal impact of each factor using regression (controlling for team quality, opponent quality, and other factors).
  • Compare your schedule-adjusted forecast against the market line.
  • Bet when the discrepancy exceeds your threshold for expected value.

Common Pitfall: Calendar effects that were significant historically may have been priced into the market as sportsbooks have become more sophisticated. The back-to-back effect in the NBA, for example, was significantly underweighted by markets in the early 2010s but is now generally well-captured by major sportsbooks. Always test for profitability after accounting for the vig, not just for statistical significance.
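That profitability check reduces to comparing a cover rate against the breakeven rate implied by the vig. At standard -110 pricing the breakeven is 110/210 ≈ 52.4%, and a one-sided binomial test puts a p-value on any excess. A minimal sketch:

```python
from scipy import stats


def ats_profitability_test(covers: int, bets: int,
                           american_odds: int = -110) -> dict:
    """Test whether an ATS cover rate beats the vig-implied breakeven."""
    if american_odds < 0:
        breakeven = -american_odds / (-american_odds + 100)
    else:
        breakeven = 100 / (american_odds + 100)
    p_value = stats.binomtest(
        covers, bets, breakeven, alternative="greater"
    ).pvalue
    return {"cover_rate": covers / bets, "breakeven": breakeven,
            "p_value": p_value}


# A 55% cover rate over 400 bets at -110: profitable on its face,
# but the test asks whether the excess over ~52.4% could be luck
res = ats_profitability_test(covers=220, bets=400)
```

Note how little the test rewards a seemingly healthy 55% rate at this sample size: distinguishing a real edge from the breakeven rate typically requires far more bets than intuition suggests.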


23.6 Chapter Summary

This chapter introduced the core concepts and tools of time series analysis as applied to sports betting:

Stationarity (Section 23.1): We established that most time series methods require stationarity --- constant mean, constant variance, and autocovariance that depends only on lag. We learned to test for stationarity using the ADF test (null: non-stationary) and KPSS test (null: stationary), and to achieve stationarity through differencing or detrending.

ARIMA Modeling (Section 23.2): We developed the ARIMA framework, which combines autoregressive, differencing, and moving average components to model and forecast time-dependent sports data. We used ACF/PACF plots for model identification and pmdarima for automated model selection. Rolling forecast evaluation quantified real-world predictive performance.

Mean Reversion (Section 23.3): We formalized the concept of regression to the mean, estimated half-lives of performance deviations, and connected the Pythagorean expectation to mean reversion analysis. Monte Carlo simulations demonstrated how extreme early-season performers systematically regress.

Changepoint Detection (Section 23.4): We addressed the complementary problem of identifying when a team's true talent level has genuinely changed, using both the frequentist CUSUM method and Bayesian online changepoint detection. This is critical for distinguishing noise (which will revert) from structural breaks (which will not).

Seasonal Patterns (Section 23.5): We identified and quantified recurring calendar effects --- day-of-week, back-to-back, and month-of-season patterns --- and discussed how to incorporate these into a betting framework.

Key Takeaways

  1. Always test for stationarity before fitting time series models. Use both ADF and KPSS for robust conclusions.

  2. ARIMA models are valuable not for their point forecasts (which are noisy) but for their ability to detect trends, quantify uncertainty, and reveal residual structure.

  3. Mean reversion is the single most important concept for sports bettors: most extreme performances are partly driven by luck and will regress. The rate of reversion depends on the sport, the metric, and the sample size.

  4. Changepoint detection distinguishes genuine talent shifts from noise, preventing you from applying mean reversion when the mean itself has changed.

  5. Calendar effects are recurring, quantifiable, and potentially exploitable --- but always verify profitability net of the vig against current market efficiency.

What Comes Next

In Chapter 24, we turn to simulation and Monte Carlo methods. Where this chapter used time series models to make forecasts from historical data, Chapter 24 will use simulation to quantify uncertainty, evaluate betting strategies, and answer "what if" questions by generating thousands of synthetic scenarios. The two approaches are complementary: time series models provide the parameter estimates that simulation then propagates through complex systems.


Chapter 23 Exercises

  1. Download game-level point differential data for an NBA team of your choice. Test the series for stationarity using both ADF and KPSS tests. Does the conclusion change if you use the first 41 games versus the full 82?

  2. Fit an ARIMA model to MLB team run differential over a full season (162 games). Compare the auto-selected model order to your prediction based on ACF/PACF plots. Evaluate rolling one-step-ahead forecasts against a naive mean baseline.

  3. Using five seasons of NFL data, estimate the AR(1) coefficient for team point differential and compute the implied half-life. How many games does it take for half of an extreme performance deviation to dissipate? Compare this to the NBA and MLB.

  4. Simulate a team with a known changepoint (e.g., a 5-point drop in true talent at game 40). Apply both CUSUM and Bayesian changepoint detection. How many games after the true changepoint does each method detect it? How does increasing the noise level affect detection delay?

  5. Using historical NBA scheduling data, estimate the back-to-back effect on point differential. Does the market fully adjust for this effect? Test by examining ATS results for back-to-back teams over multiple seasons.