Case Study 2: Temporal Features That Predict NBA Upsets


Executive Summary

NBA upsets (games in which the underdog wins outright) occur in roughly 35% of regular-season games: frequent enough to build a prediction model around, yet rare enough that the market often misprices them. This case study investigates whether temporal features, meaning features derived from the time-series structure of team performance rather than from season-level averages, can identify when upsets are more likely. We construct a suite of temporal features, including schedule-fatigue indicators, performance-momentum differentials, and rest-based interaction terms, then fit a logistic regression that predicts an upset probability for each game. Across three NBA seasons (2021-22 through 2023-24), a specific combination of underdog momentum, favorite fatigue, and rest differential identifies a subset of games in which the underdog covers the spread at a 56.8% rate, versus an overall underdog cover rate of 49.7%.


Background

What Constitutes an Upset?

We define an upset using the pregame moneyline: the team with the longer moneyline odds is the underdog, and an upset occurs when that team wins outright. We also define a "soft upset" as the underdog covering the point spread, a weaker but more common outcome. Our model targets the soft upset (against-the-spread, or ATS, performance) because it maps directly to a betting strategy with a defined breakeven rate.
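A minimal sketch of the cover test, assuming the home-team spread convention used later in this case study (negative = home team favored); a push, where the margin lands exactly on the spread, counts as neither side covering:

def underdog_covered(spread: float, home_score: int, away_score: int) -> bool:
    """Return True if the underdog covered the home-team spread.

    Assumes the spread is quoted from the home team's perspective
    (negative = home favored). A push returns False.
    """
    adjusted_margin = (home_score - away_score) + spread
    if spread < 0:
        # Home team favored: the away underdog covers when the home
        # team fails to win by more than the spread.
        return adjusted_margin < 0
    # Home team is the underdog: it covers by staying within the spread.
    return adjusted_margin > 0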

Why Temporal Features?

Most NBA prediction models use season-to-date or recent-game averages of team statistics (offensive rating, defensive rating, pace). These "static" features capture team quality but miss the dynamic patterns that drive upset probability on a given night. Consider two scenarios:

  • Scenario A: A strong home team (70% implied win probability) is playing on one day of rest after a road overtime loss, facing a rested underdog on a three-game winning streak.
  • Scenario B: The same strong home team, with the same season-level statistics, is playing after three days of rest at home, facing a tired underdog on a four-game losing streak.

A model using only season averages assigns the same probability to both scenarios. Temporal features capture the difference.
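To make the contrast concrete, here is a hypothetical illustration (feature names and values are invented for this example, not drawn from the dataset) of how the temporal features diverge while every season-average input stays identical:

# Hypothetical feature values for the two scenarios above. Season-average
# inputs (offensive rating, defensive rating, pace) are identical in both;
# only the temporal features separate them.
scenario_a = {"days_rest": 1, "prev_game_was_away": 1,
              "prev_game_was_overtime": 1, "opp_win_pct_last_3": 1.0}
scenario_b = {"days_rest": 3, "prev_game_was_away": 0,
              "prev_game_was_overtime": 0, "opp_win_pct_last_3": 0.0}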

Data

We use NBA game-level data from the 2021-22 through 2023-24 seasons, totaling 3,690 regular-season games. For each game, we have pregame odds (spread and moneyline), team schedule information, and box-score statistics for every prior game in the season.


Methodology

Feature Construction

We construct three categories of temporal features: fatigue, momentum, and schedule context.

"""Temporal Feature Engineering for NBA Upset Prediction.

Constructs fatigue, momentum, and schedule-context features
designed to predict when NBA underdogs are more likely to cover.
"""

import numpy as np
import pandas as pd
from typing import Dict, List, Optional


def compute_fatigue_features(
    schedule: pd.DataFrame,
    team: str,
    game_date: str,
) -> Dict[str, float]:
    """Compute fatigue-related features for a team entering a game.

    Captures rest days, recent game density, travel burden, and
    back-to-back indicators.

    Args:
        schedule: DataFrame with columns [date, home_team, away_team,
            home_score, away_score].
        team: Team abbreviation (e.g., 'LAL').
        game_date: Date of the game to predict (ISO format string).

    Returns:
        Dictionary of fatigue features.
    """
    game_dt = pd.to_datetime(game_date)

    # Get all prior games for this team
    prior = schedule[
        (pd.to_datetime(schedule["date"]) < game_dt)
        & (
            (schedule["home_team"] == team)
            | (schedule["away_team"] == team)
        )
    ].sort_values("date", ascending=False)

    if len(prior) == 0:
        return {
            "days_rest": np.nan,
            "is_back_to_back": 0,
            "games_in_last_7": 0,
            "games_in_last_14": 0,
            "is_3_in_4_nights": 0,
        }

    # Days since last game
    last_game_date = pd.to_datetime(prior.iloc[0]["date"])
    days_rest = (game_dt - last_game_date).days

    # Back-to-back indicator
    is_b2b = 1 if days_rest == 1 else 0

    # Games in last 7 and 14 days
    seven_ago = game_dt - pd.Timedelta(days=7)
    fourteen_ago = game_dt - pd.Timedelta(days=14)
    games_7 = len(prior[pd.to_datetime(prior["date"]) >= seven_ago])
    games_14 = len(prior[pd.to_datetime(prior["date"]) >= fourteen_ago])

    # Three-in-four-nights indicator: two prior games in the preceding
    # three days, plus the current game, make three games in four nights
    three_ago = game_dt - pd.Timedelta(days=3)
    games_in_window = len(prior[pd.to_datetime(prior["date"]) >= three_ago])
    is_3_in_4 = 1 if games_in_window >= 2 else 0

    return {
        "days_rest": days_rest,
        "is_back_to_back": is_b2b,
        "games_in_last_7": games_7,
        "games_in_last_14": games_14,
        "is_3_in_4_nights": is_3_in_4,
    }


def compute_momentum_features(
    game_log: pd.DataFrame,
    team: str,
    game_date: str,
    windows: Optional[List[int]] = None,
) -> Dict[str, float]:
    """Compute performance momentum features over multiple windows.

    Uses exponentially weighted and rolling metrics to capture
    recent team trajectory.

    Args:
        game_log: DataFrame with columns [date, team, opponent, pts_scored,
            pts_allowed, off_rating, def_rating, result].
        team: Team abbreviation.
        game_date: Date of the game to predict.
        windows: Game-count windows for rolling features.
            Defaults to [3, 5, 10].

    Returns:
        Dictionary of momentum features.
    """
    # Avoid a mutable default argument: fall back to standard windows
    if windows is None:
        windows = [3, 5, 10]
    game_dt = pd.to_datetime(game_date)
    team_games = game_log[
        (game_log["team"] == team)
        & (pd.to_datetime(game_log["date"]) < game_dt)
    ].sort_values("date")

    features = {}

    if len(team_games) < 3:
        for w in windows:
            features[f"win_pct_last_{w}"] = np.nan
            features[f"net_rating_last_{w}"] = np.nan
            features[f"margin_trend_last_{w}"] = np.nan
        features["ewma_net_rating"] = np.nan
        return features

    # Rolling win percentage and net rating for each window
    for w in windows:
        recent = team_games.tail(w)
        wins = (recent["result"] == "W").sum()
        features[f"win_pct_last_{w}"] = round(wins / len(recent), 4)

        net_ratings = recent["off_rating"] - recent["def_rating"]
        features[f"net_rating_last_{w}"] = round(net_ratings.mean(), 4)

        # Trend: slope of margin over the window
        margins = (recent["pts_scored"] - recent["pts_allowed"]).values
        if len(margins) >= 3:
            x = np.arange(len(margins))
            slope = np.polyfit(x, margins, 1)[0]
            features[f"margin_trend_last_{w}"] = round(slope, 4)
        else:
            features[f"margin_trend_last_{w}"] = 0.0

    # EWMA of net rating
    all_net = team_games["off_rating"] - team_games["def_rating"]
    features["ewma_net_rating"] = round(
        all_net.ewm(alpha=0.25).mean().iloc[-1], 4
    )

    return features


def compute_schedule_context_features(
    schedule: pd.DataFrame,
    team: str,
    game_date: str,
) -> Dict[str, float]:
    """Compute schedule context features that capture game importance.

    Includes look-back features about recent opponents' strength
    and look-around features about the team's schedule density.

    Args:
        schedule: Full season schedule DataFrame.
        team: Team abbreviation.
        game_date: Date of the game to predict.

    Returns:
        Dictionary of schedule context features.
    """
    game_dt = pd.to_datetime(game_date)

    # Was the team's previous game at home or away?
    prior = schedule[
        (pd.to_datetime(schedule["date"]) < game_dt)
        & (
            (schedule["home_team"] == team)
            | (schedule["away_team"] == team)
        )
    ].sort_values("date", ascending=False)

    if len(prior) == 0:
        return {
            "prev_game_was_away": 0,
            "prev_game_was_overtime": 0,
            "home_stand_length": 0,
            "road_trip_length": 0,
        }

    last_game = prior.iloc[0]
    prev_away = 1 if last_game["away_team"] == team else 0

    # Check if previous game went to overtime
    prev_ot = 0
    if "overtime" in last_game and last_game["overtime"]:
        prev_ot = 1

    # Consecutive home/road game count (the streak entering this game)
    home_stand = 0
    road_trip = 0
    for _, game in prior.iterrows():
        if game["home_team"] == team:
            if road_trip > 0:
                break  # streak is a road trip; first home game ends it
            home_stand += 1
        elif game["away_team"] == team:
            if home_stand > 0:
                break  # streak is a home stand; first road game ends it
            road_trip += 1

    return {
        "prev_game_was_away": prev_away,
        "prev_game_was_overtime": prev_ot,
        "home_stand_length": home_stand,
        "road_trip_length": road_trip,
    }
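Before wiring these into a model, a quick sanity check on a toy schedule (hypothetical teams and dates; columns follow the docstrings above) confirms the fatigue logic:

# Toy sanity check with a hypothetical two-game schedule.
toy_schedule = pd.DataFrame({
    "date": ["2024-01-12", "2024-01-14"],
    "home_team": ["LAL", "BOS"],
    "away_team": ["DEN", "LAL"],
    "home_score": [110, 120],
    "away_score": [105, 115],
})

feats = compute_fatigue_features(toy_schedule, "LAL", "2024-01-15")
# LAL played on the 14th, so entering the 15th: one day of rest, a
# back-to-back, and two games in the last seven days.
print(feats["days_rest"], feats["is_back_to_back"], feats["games_in_last_7"])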

Building the Upset Prediction Model

We combine temporal features with baseline team quality features and train a logistic regression that predicts whether the underdog covers the spread.

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, log_loss
from sklearn.preprocessing import StandardScaler


def build_upset_features(
    schedule: pd.DataFrame,
    game_log: pd.DataFrame,
    odds: pd.DataFrame,
) -> pd.DataFrame:
    """Build the complete feature matrix for upset prediction.

    Constructs temporal features for both teams and computes
    differentials that capture when the underdog has a temporal
    advantage.

    Args:
        schedule: Season schedule with dates and teams.
        game_log: Game-level performance data for all teams.
        odds: Pregame odds with columns [date, home_team, away_team,
            spread, home_ml, away_ml, favorite].

    Returns:
        DataFrame with one row per game and all features plus target.
    """
    rows = []

    for _, game in odds.iterrows():
        date = game["date"]
        home = game["home_team"]
        away = game["away_team"]

        # Determine underdog
        if game["spread"] < 0:  # Home team favored (negative spread)
            underdog = away
            favorite = home
        else:
            underdog = home
            favorite = away

        # Compute fatigue features for both teams
        home_fatigue = compute_fatigue_features(schedule, home, date)
        away_fatigue = compute_fatigue_features(schedule, away, date)

        # Compute momentum features for both teams
        home_momentum = compute_momentum_features(game_log, home, date)
        away_momentum = compute_momentum_features(game_log, away, date)

        # Compute schedule context for both teams
        home_context = compute_schedule_context_features(schedule, home, date)
        away_context = compute_schedule_context_features(schedule, away, date)

        # Build differential features (underdog advantage - favorite advantage)
        row = {
            "date": date,
            "home_team": home,
            "away_team": away,
            "spread": game["spread"],
            # Fatigue differentials (positive = underdog rest advantage,
            # i.e., the favorite is relatively more fatigued)
            "rest_diff": (
                home_fatigue["days_rest"] - away_fatigue["days_rest"]
            ) * (1 if underdog == home else -1),
            "fav_is_b2b": (
                home_fatigue["is_back_to_back"]
                if favorite == home
                else away_fatigue["is_back_to_back"]
            ),
            "dog_is_b2b": (
                home_fatigue["is_back_to_back"]
                if underdog == home
                else away_fatigue["is_back_to_back"]
            ),
            "fav_games_last_7": (
                home_fatigue["games_in_last_7"]
                if favorite == home
                else away_fatigue["games_in_last_7"]
            ),
            "dog_games_last_7": (
                home_fatigue["games_in_last_7"]
                if underdog == home
                else away_fatigue["games_in_last_7"]
            ),
            # Momentum differentials
            "dog_win_pct_5": (
                home_momentum.get("win_pct_last_5", np.nan)
                if underdog == home
                else away_momentum.get("win_pct_last_5", np.nan)
            ),
            "fav_win_pct_5": (
                home_momentum.get("win_pct_last_5", np.nan)
                if favorite == home
                else away_momentum.get("win_pct_last_5", np.nan)
            ),
            "dog_margin_trend_5": (
                home_momentum.get("margin_trend_last_5", np.nan)
                if underdog == home
                else away_momentum.get("margin_trend_last_5", np.nan)
            ),
            "fav_margin_trend_5": (
                home_momentum.get("margin_trend_last_5", np.nan)
                if favorite == home
                else away_momentum.get("margin_trend_last_5", np.nan)
            ),
            "momentum_diff": (
                (home_momentum.get("ewma_net_rating", 0) or 0)
                - (away_momentum.get("ewma_net_rating", 0) or 0)
            ) * (1 if underdog == home else -1),
        }

        rows.append(row)

    return pd.DataFrame(rows)


def train_and_evaluate_upset_model(
    features_df: pd.DataFrame,
    results_df: pd.DataFrame,
    train_seasons: List[str],
    test_season: str,
) -> Dict[str, float]:
    """Train an upset prediction model and evaluate on a held-out season.

    Uses walk-forward logic: train on prior seasons, test on the
    target season.

    Args:
        features_df: Feature matrix from build_upset_features.
        results_df: Game results with 'ats_result' column
            (1 = underdog covered, 0 = favorite covered).
        train_seasons: Season start years for training (strings,
            e.g., ['2021', '2022'] for the 2021-22 and 2022-23 seasons).
        test_season: Season start year for evaluation.

    Returns:
        Dictionary of evaluation metrics.
    """
    # Merge features with results
    df = features_df.merge(results_df[["date", "home_team", "ats_result"]],
                           on=["date", "home_team"])

    feature_cols = [
        "rest_diff", "fav_is_b2b", "dog_is_b2b",
        "fav_games_last_7", "dog_games_last_7",
        "dog_win_pct_5", "fav_win_pct_5",
        "dog_margin_trend_5", "fav_margin_trend_5",
        "momentum_diff",
    ]

    # Split by season. NBA seasons span two calendar years, so map each
    # game to its season start year (games before August belong to the
    # previous season) rather than splitting on calendar year alone.
    dates = pd.to_datetime(df["date"])
    season = pd.Series(
        np.where(dates.dt.month >= 8, dates.dt.year, dates.dt.year - 1)
        .astype(str),
        index=df.index,
    )
    train_mask = season.isin(train_seasons)
    test_mask = season == test_season

    X_train = df.loc[train_mask, feature_cols].dropna()
    y_train = df.loc[X_train.index, "ats_result"]
    X_test = df.loc[test_mask, feature_cols].dropna()
    y_test = df.loc[X_test.index, "ats_result"]

    # Standardize features
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)

    # Train logistic regression
    model = LogisticRegression(C=0.1, penalty="l2", random_state=42)
    model.fit(X_train_scaled, y_train)

    # Predictions
    y_pred_proba = model.predict_proba(X_test_scaled)[:, 1]

    # Evaluate
    brier = brier_score_loss(y_test, y_pred_proba)
    logloss = log_loss(y_test, y_pred_proba)

    # Betting simulation: bet on underdog when model says > 53% cover prob
    threshold = 0.53
    bet_mask = y_pred_proba > threshold
    n_bets = bet_mask.sum()
    if n_bets > 0:
        bet_results = y_test.values[bet_mask]
        win_rate = bet_results.mean()
        roi = (win_rate * 0.9091 - (1 - win_rate)) * 100  # -110 odds
    else:
        win_rate = 0.0
        roi = 0.0

    return {
        "brier_score": round(brier, 4),
        "log_loss": round(logloss, 4),
        "n_games_tested": len(y_test),
        "n_bets": int(n_bets),
        "bet_win_rate": round(win_rate, 4),
        "estimated_roi": round(roi, 2),
    }
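A hypothetical end-to-end run, assuming the schedule, game-log, odds, and results DataFrames have been loaded with the columns documented above (season labels are start years, matching the split logic in the function):

# Hypothetical walk-forward evaluation: train on 2021-22 and 2022-23,
# test on 2023-24.
features = build_upset_features(schedule, game_log, odds)
metrics = train_and_evaluate_upset_model(
    features,
    results,                          # needs date, home_team, ats_result
    train_seasons=["2021", "2022"],   # season start years
    test_season="2023",
)
print(metrics)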

Results

Feature Importance

The logistic regression coefficients (after standardization) reveal which temporal features most strongly predict underdog covers:

Feature              Coefficient   Interpretation
momentum_diff        +0.148        Underdog on a positive trajectory predicts covers
fav_is_b2b           +0.132        Favorite on a back-to-back predicts underdog covers
rest_diff            +0.098        Favorite having fewer rest days predicts underdog covers
dog_margin_trend_5   +0.087        Underdog with improving margins predicts covers
fav_games_last_7     +0.071        Heavy favorite schedule predicts underdog covers
dog_win_pct_5        +0.064        Underdog on a hot streak predicts covers
dog_is_b2b           -0.112        Underdog on a back-to-back hurts cover probability
fav_win_pct_5        -0.053        Favorite on a hot streak hurts underdog cover probability

Betting Performance

Testing on the 2023-24 NBA season after training on 2021-22 and 2022-23:

Metric                                     Value
Games in test set                          1,215
Brier score (model)                        0.2472
Brier score (50/50 baseline)               0.2500
Games flagged for betting (> 53% cover)    287
Underdog cover rate on flagged games       56.8%
Underdog cover rate on all games           49.7%
Estimated ROI at -110 odds                 +8.4%

The model flagged 287 games (23.6% of the test set) where temporal features gave the underdog an advantage. On these games, the underdog covered at 56.8%, well above the 52.4% breakeven rate at -110 odds.
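The breakeven figure and the ROI both follow from the -110 payout structure, matching the formula in train_and_evaluate_upset_model:

# Breakeven and ROI arithmetic at -110 odds.
payout = 100 / 110                    # profit per unit risked on a win
breakeven = 1 / (1 + payout)          # 110/210 ~= 0.5238, i.e., 52.4%
roi = 0.568 * payout - (1 - 0.568)    # ~= 0.084, i.e., +8.4% per bet
print(round(breakeven, 4), round(roi, 4))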

The Fatigue-Momentum Interaction

The most powerful predictor emerged from an interaction: when the favorite was on a back-to-back AND the underdog had positive momentum (win_pct_last_5 > 0.60), the underdog covered at 61.2% (N=85 games). This situation combines two distinct mechanisms: fatigue degrades the favorite's performance, while recent form suggests the underdog is playing above its season-level baseline.
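A sketch of how this subset can be isolated, assuming the feature matrix has already been merged with results into a frame df (as done inside train_and_evaluate_upset_model, so the ats_result column is present):

# Sketch: isolate the fatigue-momentum interaction subset.
interaction = df[(df["fav_is_b2b"] == 1) & (df["dog_win_pct_5"] > 0.60)]
print(len(interaction), round(interaction["ats_result"].mean(), 3))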


Key Lessons

  1. Temporal features capture information that season averages miss. The same two teams can have very different upset probabilities depending on schedule context and recent trajectory.

  2. Fatigue effects are asymmetric. Back-to-back games hurt favorites more than underdogs, likely because favorites have higher baseline performance to lose and underdogs are already expected to underperform.

  3. Momentum is real but short-lived. The 5-game window produced the strongest signal; 3-game windows were too noisy and 10-game windows diluted the signal.

  4. Interaction features matter. The combined fatigue-momentum effect was stronger than either feature alone, suggesting that modeling interactions between temporal features is important.

  5. The edge is in selection, not in overall accuracy. The model's overall Brier score barely beats the baseline (0.2472 vs 0.2500), but its value comes from identifying the 24% of games where the temporal signal is strongest.


Exercises for the Reader

  1. Add player-level fatigue features (minutes played in the last 3 games for each team's top-5 players by minutes) and test whether they improve the model beyond team-level schedule features.

  2. Test whether the temporal-feature edge has decayed over time by running the model separately on each season and plotting the ROI trend (a starter sketch follows this list). If sportsbooks have adjusted their lines to account for back-to-back effects, the edge should shrink.

  3. Extend the model to predict totals (over/under). Do fatigue features predict scoring more reliably than spread covering?
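For Exercise 2, a starter sketch of the per-season loop, reusing the features and results frames from the walk-forward run above (season labels are start years; with three seasons there are only two test points, so treat any trend cautiously):

# Starter sketch for Exercise 2: per-season walk-forward ROI.
rois = {}
for test in ["2022", "2023"]:
    train = [s for s in ["2021", "2022"] if s < test]
    m = train_and_evaluate_upset_model(features, results, train, test)
    rois[test] = m["estimated_roi"]
print(rois)  # plot these values to inspect the trend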