Case Study 2: Weather-Adjusted MLB Totals Model

Overview

Totals betting in MLB offers some of the most exploitable edges in all of sports because of an informational asymmetry: weather conditions, which significantly affect run scoring, are not known with confidence until close to game time. While the market adjusts totals for temperature and wind, the adjustment is often incomplete, particularly in extreme conditions. This case study builds a weather-integrated totals model that combines park factors, environmental adjustments, and run distribution models to identify value on MLB overs and unders.

Our approach is systematic. We first establish neutral-site run projections for each team based on their offensive and pitching profiles. We then apply multi-year regression-adjusted park factors. Finally, we overlay game-specific environmental adjustments using temperature, wind speed and direction, and humidity. The adjusted total is compared to the posted line to identify edges.

The Physics of Ball Flight

Understanding why weather affects MLB totals requires a brief foray into physics. A baseball in flight is subject to three primary forces: gravity, drag (air resistance), and the Magnus force (spin-induced lift or curve). Of these, drag is the most sensitive to environmental conditions.

The drag force on a baseball is:

$$F_D = \frac{1}{2} \rho v^2 C_D A$$

where $\rho$ is air density, $v$ is ball velocity, $C_D$ is the drag coefficient, and $A$ is the cross-sectional area. Lower air density (from higher altitude, higher temperature, or higher humidity) reduces drag and allows the ball to travel farther.
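
To get a feel for the magnitudes, the sketch below estimates air density from temperature, humidity, and elevation, then plugs it into the drag formula above. It is illustrative only. The helper names, the standard-atmosphere pressure formula, the Tetens approximation for saturation vapor pressure, and the ball constants (drag coefficient 0.35, radius 0.0366 m) are assumptions and are not part of the model in this case study.

```python
import math

R_DRY = 287.058    # J/(kg*K), specific gas constant for dry air
R_VAPOR = 461.495  # J/(kg*K), specific gas constant for water vapor


def air_density(temp_f: float, humidity_pct: float, altitude_ft: float) -> float:
    """Approximate air density in kg/m^3 (hypothetical helper)."""
    temp_c = (temp_f - 32.0) * 5.0 / 9.0
    temp_k = temp_c + 273.15
    altitude_m = altitude_ft * 0.3048

    # Standard-atmosphere pressure at the given elevation (Pa).
    pressure = 101325.0 * (1.0 - 2.25577e-5 * altitude_m) ** 5.25588

    # Tetens approximation for saturation vapor pressure (Pa).
    p_sat = 610.78 * 10 ** (7.5 * temp_c / (temp_c + 237.3))
    p_vapor = (humidity_pct / 100.0) * p_sat
    p_dry = pressure - p_vapor

    # Moist air is a mix of dry air and (lighter) water vapor.
    return p_dry / (R_DRY * temp_k) + p_vapor / (R_VAPOR * temp_k)


def drag_force(rho: float, speed_mps: float,
               c_d: float = 0.35, radius_m: float = 0.0366) -> float:
    """Drag force in newtons: F_D = 0.5 * rho * v^2 * C_D * A."""
    area = math.pi * radius_m ** 2
    return 0.5 * rho * speed_mps ** 2 * c_d * area


if __name__ == "__main__":
    sea_level = air_density(72.0, 50.0, 0.0)
    mile_high = air_density(72.0, 50.0, 5280.0)
    print(f"Sea level: {sea_level:.3f} kg/m^3")
    print(f"5,280 ft:  {mile_high:.3f} kg/m^3")
    print(f"Drag at 45 m/s, sea level: {drag_force(sea_level, 45.0):.2f} N")
```

Under these assumptions, air at 5,280 feet comes out roughly 18% less dense than at sea level for the same temperature and humidity, which is the physical basis for the large park factor at altitude.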

Wind changes the ball's speed relative to the air. With a 15 mph tailwind, the ball's airspeed is 15 mph lower than its ground speed, and because drag scales with the square of airspeed, the ball carries noticeably farther. As a rule of thumb, the added carry is approximately:

$$\Delta d \approx 2.5 \, v_w \text{ feet}$$

where $v_w$ is the tailwind speed in mph, i.e., roughly 2.5 feet of extra carry per mph of tailwind. A 15 mph outward wind adds approximately 37.5 feet to a well-hit fly ball, more than enough to turn a warning-track fly out into a home run.
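
The rule of thumb is easy to tabulate. The short sketch below simply applies the 2.5 feet-per-mph coefficient from the text to a few tailwind speeds; the function name is hypothetical, and the linear rule is taken as given rather than derived.

```python
FEET_PER_MPH = 2.5  # rule-of-thumb coefficient quoted in the text


def carry_change_ft(tailwind_mph: float) -> float:
    """Approximate extra carry (feet) on a well-hit fly ball."""
    return FEET_PER_MPH * tailwind_mph


if __name__ == "__main__":
    for wind in (5, 10, 15, 20):
        print(f"{wind:>2} mph out: +{carry_change_ft(wind):.1f} ft")
```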

Building the Model

Component 1: Neutral Run Projections

Each team's neutral-site run expectation combines their offensive quality with the opposing pitcher's quality. For the totals model, we need the full-game run expectation, which includes both the starting pitcher's innings and the bullpen's expected contribution.

$$\lambda_{\text{total}} = \lambda_{\text{vs. starter}} + \lambda_{\text{vs. bullpen}}$$

The starter typically covers 5--6 innings (approximately 55--67% of a nine-inning game), with the bullpen covering the remainder. The starter projection uses the matchup-adjusted run rate; the bullpen projection uses the opposing team's bullpen FIP, adjusted for recent usage and fatigue.
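
One way the starter/bullpen split could be computed is sketched below. The text does not specify how the matchup adjustment is made, so the multiplicative scaling against a league-average run rate, the constant `LEAGUE_RUNS_PER_9 = 4.5`, and the names `PitchingStaff` and `split_run_expectation` are all assumptions made for illustration.

```python
from dataclasses import dataclass

LEAGUE_RUNS_PER_9 = 4.5  # assumed league-average run rate (illustrative)
GAME_INNINGS = 9.0


@dataclass
class PitchingStaff:
    """Opposing pitching inputs (hypothetical container)."""
    starter_innings: float
    starter_fip: float
    bullpen_fip: float
    bullpen_fatigue_factor: float = 1.0  # 1.0 = fresh bullpen


def split_run_expectation(
    offense_runs_per_game: float, staff: PitchingStaff
) -> tuple[float, float]:
    """Return (lambda_vs_starter, lambda_vs_bullpen) for one offense."""
    starter_share = staff.starter_innings / GAME_INNINGS
    bullpen_share = 1.0 - starter_share

    # Scale the offense's run rate by pitcher quality relative to league,
    # then weight by the share of innings each group is expected to cover.
    starter_quality = staff.starter_fip / LEAGUE_RUNS_PER_9
    bullpen_quality = (
        staff.bullpen_fip * staff.bullpen_fatigue_factor / LEAGUE_RUNS_PER_9
    )
    lam_starter = offense_runs_per_game * starter_quality * starter_share
    lam_bullpen = offense_runs_per_game * bullpen_quality * bullpen_share
    return lam_starter, lam_bullpen


if __name__ == "__main__":
    staff = PitchingStaff(starter_innings=6.0, starter_fip=3.50, bullpen_fip=3.70)
    lam_s, lam_b = split_run_expectation(4.4, staff)
    print(f"vs. starter: {lam_s:.2f}, vs. bullpen: {lam_b:.2f}, "
          f"total: {lam_s + lam_b:.2f}")
```

A useful property of this form is that a league-average staff leaves the offense's run rate unchanged, so the split only moves the projection when the pitching is better or worse than average.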

Component 2: Park Factor Database

We maintain a database of multi-year regression-adjusted park factors for all 30 MLB venues. These factors are regressed toward 1.0 using the formula:

$$\text{PF}_{\text{regressed}} = \frac{n \cdot \text{PF}_{\text{raw}} + k \cdot 1.0}{n + k}$$

where $n$ is the number of years of data and $k = 3$ is the regression constant. This prevents single-season anomalies from distorting the park factor.
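
The regression formula can be sketched directly. The function name below is hypothetical, and the raw park factor of 1.20 is an invented input chosen only to show how the estimate moves as more seasons accumulate.

```python
def regress_park_factor(raw_pf: float, n_years: float, k: float = 3.0) -> float:
    """Shrink a raw park factor toward 1.0 with regression constant k."""
    return (n_years * raw_pf + k * 1.0) / (n_years + k)


if __name__ == "__main__":
    for years in (1, 3, 5):
        regressed = regress_park_factor(1.20, years)
        print(f"raw 1.20 over {years} year(s) -> {regressed:.3f}")
```

With $k = 3$, a single season at 1.20 is credited as only 1.05, and three seasons at the same raw value reach 1.10: the raw figure gets half the weight once $n = k$.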

Component 3: Environmental Adjustments

The environmental model computes a multiplicative adjustment based on four weather variables, each with an empirically derived coefficient.
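
Written out from the coefficients used in the implementation that follows, the adjustment for an open-air park takes the form below. The symbols $\beta_T$, $\beta_W$, $\beta_H$, and $e$ are notation introduced here for readability; they are not names used by the model itself.

$$f_{\text{env}} = 1 + \beta_T (T - 72) + e \, \beta_W(\text{dir}) \, v_w + \beta_H (H - 50)$$

where $T$ is temperature in degrees Fahrenheit, $v_w$ is wind speed in mph, $H$ is relative humidity in percent, and $e$ is the park's wind exposure on a 0 to 1 scale. The coefficients are $\beta_T = 0.002$ and $\beta_H = 0.0003$, with $\beta_W = 0.008$ for wind blowing out, $-0.010$ for wind blowing in, $0.0024$ for a crosswind (30% of the outward coefficient), and $0$ for calm conditions. Enclosed parks are assigned $f_{\text{env}} = 1$. For an 86-degree day with an 18 mph wind blowing out, 45% humidity, and $e = 0.9$:

$$f_{\text{env}} = 1 + 0.002(14) + 0.9(0.008)(18) + 0.0003(-5) = 1 + 0.028 + 0.1296 - 0.0015 = 1.1561$$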

Implementation

"""
Case Study 2: Weather-Adjusted MLB Totals Model

Combines park factors, environmental conditions, and run distribution
models to identify value in MLB over/under betting markets.

Requirements:
    pip install numpy scipy
"""

from dataclasses import dataclass

import numpy as np
from scipy.stats import nbinom


@dataclass
class ParkProfile:
    """Static park characteristics for park factor computation.

    Attributes:
        name: Park name.
        team: Home team abbreviation.
        base_run_pf: Multi-year regression-adjusted run park factor.
        base_hr_pf: Multi-year regression-adjusted HR park factor.
        altitude_ft: Park elevation in feet.
        is_dome: Whether the park is enclosed.
        wind_exposure: Park's sensitivity to wind (0-1 scale).
    """
    name: str
    team: str
    base_run_pf: float
    base_hr_pf: float
    altitude_ft: float
    is_dome: bool
    wind_exposure: float = 0.5


# Park profiles for selected venues
PARKS: dict[str, ParkProfile] = {
    "COL": ParkProfile("Coors Field", "COL", 1.35, 1.40, 5280, False, 0.6),
    "CHC": ParkProfile("Wrigley Field", "CHC", 1.05, 1.10, 595, False, 0.9),
    "SFG": ParkProfile("Oracle Park", "SFG", 0.92, 0.80, 5, False, 0.8),
    "CIN": ParkProfile("Great American Ball Park", "CIN", 1.10, 1.20, 480, False, 0.7),
    "BOS": ParkProfile("Fenway Park", "BOS", 1.05, 0.95, 20, False, 0.5),
    "NYY": ParkProfile("Yankee Stadium", "NYY", 1.00, 1.15, 55, False, 0.4),
    "SDP": ParkProfile("Petco Park", "SDP", 0.95, 0.88, 15, False, 0.6),
    "SEA": ParkProfile("T-Mobile Park", "SEA", 0.93, 0.85, 20, True, 0.0),
    "HOU": ParkProfile("Minute Maid Park", "HOU", 1.00, 1.02, 45, True, 0.0),
    "MIA": ParkProfile("loanDepot Park", "MIA", 0.94, 0.85, 10, True, 0.0),
    "TEX": ParkProfile("Globe Life Field", "TEX", 1.02, 1.05, 500, True, 0.0),
    "PHI": ParkProfile("Citizens Bank Park", "PHI", 1.04, 1.12, 20, False, 0.6),
    "LAD": ParkProfile("Dodger Stadium", "LAD", 0.97, 0.92, 515, False, 0.3),
    "ATL": ParkProfile("Truist Park", "ATL", 1.00, 1.02, 1050, False, 0.5),
    "MIL": ParkProfile("American Family Field", "MIL", 1.02, 1.08, 600, True, 0.0),
}


@dataclass
class WeatherConditions:
    """Game-time weather conditions.

    Attributes:
        temp_f: Temperature in Fahrenheit.
        wind_mph: Wind speed in mph.
        wind_direction: One of 'out', 'in', 'cross', 'calm'.
        humidity_pct: Relative humidity percentage.
    """
    temp_f: float
    wind_mph: float
    wind_direction: str
    humidity_pct: float


@dataclass
class TeamTotalsInput:
    """Team-level inputs for totals modeling.

    Attributes:
        team: Team abbreviation.
        neutral_offensive_runs: Neutral-site runs per game (full game).
        starter_innings: Expected innings from the starting pitcher.
        starter_era_proxy: Starter's FIP (as ERA proxy for run rate).
        bullpen_fip: Team bullpen aggregate FIP.
        bullpen_fatigue_factor: Fatigue multiplier (1.0 = fresh).
    """
    team: str
    neutral_offensive_runs: float
    starter_innings: float
    starter_era_proxy: float
    bullpen_fip: float
    bullpen_fatigue_factor: float = 1.0


class WeatherAdjuster:
    """Computes environmental adjustment factors for run scoring.

    Uses empirically derived coefficients relating weather
    variables to run-scoring deviations.
    """

    BASELINE_TEMP: float = 72.0
    BASELINE_HUMIDITY: float = 50.0

    TEMP_COEFF: float = 0.002
    WIND_OUT_COEFF: float = 0.008
    WIND_IN_COEFF: float = -0.010
    CROSS_WIND_FRACTION: float = 0.30
    HUMIDITY_COEFF: float = 0.0003

    def compute_factor(
        self,
        weather: WeatherConditions,
        park: ParkProfile,
    ) -> float:
        """Compute weather adjustment factor.

        Args:
            weather: Game-time weather conditions.
            park: Park profile (needed for dome check and wind exposure).

        Returns:
            Multiplicative adjustment (1.0 = neutral).
        """
        if park.is_dome:
            return 1.0

        adj = 0.0

        # Temperature effect
        adj += (weather.temp_f - self.BASELINE_TEMP) * self.TEMP_COEFF

        # Wind effect scaled by park wind exposure
        wind_effect = 0.0
        if weather.wind_direction == "out":
            wind_effect = weather.wind_mph * self.WIND_OUT_COEFF
        elif weather.wind_direction == "in":
            wind_effect = weather.wind_mph * self.WIND_IN_COEFF
        elif weather.wind_direction == "cross":
            wind_effect = (
                weather.wind_mph * self.WIND_OUT_COEFF * self.CROSS_WIND_FRACTION
            )
        adj += wind_effect * park.wind_exposure

        # Humidity effect
        adj += (weather.humidity_pct - self.BASELINE_HUMIDITY) * self.HUMIDITY_COEFF

        return round(1.0 + adj, 4)


class TotalsModel:
    """Produces over/under probabilities from adjusted run projections.

    Uses the negative binomial distribution to model run scoring
    with overdispersion, then computes joint probabilities for
    the game total exceeding or falling short of the posted line.

    Attributes:
        nb_r: Dispersion parameter for negative binomial.
        max_runs: Upper truncation for calculations.
    """

    def __init__(self, nb_r: float = 6.0, max_runs: int = 20):
        self.nb_r = nb_r
        self.max_runs = max_runs

    def _pmf(self, lam: float) -> np.ndarray:
        """Compute negative binomial PMF for run scoring.

        Args:
            lam: Expected runs (mean).

        Returns:
            Array of probabilities P(X=0) through P(X=max_runs).
        """
        k = np.arange(self.max_runs + 1)
        p = self.nb_r / (self.nb_r + lam)
        return nbinom.pmf(k, self.nb_r, p)

    def over_under_probability(
        self,
        lam_home: float,
        lam_away: float,
        total_line: float,
    ) -> dict[str, float]:
        """Compute over/under probabilities for a game total.

        Args:
            lam_home: Adjusted expected runs for home team.
            lam_away: Adjusted expected runs for away team.
            total_line: The posted total (e.g., 8.5).

        Returns:
            Dictionary with over probability, under probability,
            projected total, and edge vs. the line.
        """
        pmf_home = self._pmf(lam_home)
        pmf_away = self._pmf(lam_away)
        joint = np.outer(pmf_home, pmf_away)

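        # Grid of combined totals for every (home, away) score pair.
        runs = np.arange(self.max_runs + 1)
        totals = runs[:, None] + runs[None, :]

        # A total landing exactly on a whole-number line is a push and
        # counts toward neither side.
        over_prob = float(joint[totals > total_line].sum())
        under_prob = float(joint[totals < total_line].sum())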

        projected = lam_home + lam_away

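        return {
            "projected_total": round(projected, 2),
            "posted_line": total_line,
            "over_prob": round(over_prob, 4),
            "under_prob": round(under_prob, 4),
            # Edges are measured against a 50% no-vig baseline; at typical
            # -110 pricing the break-even probability is about 52.4%.
            "edge_over": round(over_prob - 0.5, 4),
            "edge_under": round(under_prob - 0.5, 4),
        }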


class MLBTotalsPipeline:
    """End-to-end pipeline for weather-adjusted totals analysis.

    Combines team projections, park factors, environmental
    adjustments, and run distributions into a complete analysis.
    """

    def __init__(self, nb_r: float = 6.0):
        self.weather_adjuster = WeatherAdjuster()
        self.totals_model = TotalsModel(nb_r=nb_r)

    def analyze_game(
        self,
        home_input: TeamTotalsInput,
        away_input: TeamTotalsInput,
        park_team: str,
        weather: WeatherConditions,
        posted_total: float,
    ) -> dict:
        """Run a complete totals analysis for one game.

        Args:
            home_input: Home team totals input.
            away_input: Away team totals input.
            park_team: Home team abbreviation (identifies park).
            weather: Game-time weather conditions.
            posted_total: The sportsbook's posted total line.

        Returns:
            Comprehensive analysis dictionary.
        """
        park = PARKS.get(park_team)
        if park is None:
            park = ParkProfile("Unknown", park_team, 1.0, 1.0, 0, False, 0.5)

        # Compute environmental factor
        env_factor = self.weather_adjuster.compute_factor(weather, park)

        # Adjust each team's runs: neutral * park * environment
        home_adj = round(
            home_input.neutral_offensive_runs * park.base_run_pf * env_factor, 2
        )
        away_adj = round(
            away_input.neutral_offensive_runs * park.base_run_pf * env_factor, 2
        )

        # Compute over/under probabilities
        ou_result = self.totals_model.over_under_probability(
            home_adj, away_adj, posted_total
        )

        # Determine recommended action
        min_edge = 0.04  # 4% minimum edge to bet
        if ou_result["edge_over"] >= min_edge:
            recommendation = f"BET OVER {posted_total}"
        elif ou_result["edge_under"] >= min_edge:
            recommendation = f"BET UNDER {posted_total}"
        else:
            recommendation = "NO BET (insufficient edge)"

        return {
            "venue": park.name,
            "park_factor": park.base_run_pf,
            "env_factor": env_factor,
            "combined_factor": round(park.base_run_pf * env_factor, 4),
            "home_neutral_runs": home_input.neutral_offensive_runs,
            "away_neutral_runs": away_input.neutral_offensive_runs,
            "home_adjusted_runs": home_adj,
            "away_adjusted_runs": away_adj,
            **ou_result,
            "recommendation": recommendation,
        }


def run_daily_analysis() -> None:
    """Simulate a daily totals analysis for a slate of games.

    Demonstrates the full pipeline across multiple game scenarios
    with varying parks and weather conditions.
    """
    pipeline = MLBTotalsPipeline(nb_r=6.0)

    games = [
        {
            "name": "Game 1: Mets at Cubs (Wrigley, wind blowing out)",
            "home": TeamTotalsInput("CHC", 4.4, 5.5, 3.80, 4.00),
            "away": TeamTotalsInput("NYM", 4.2, 6.0, 3.50, 3.70),
            "park": "CHC",
            "weather": WeatherConditions(86.0, 18.0, "out", 45.0),
            "total": 8.5,
        },
        {
            "name": "Game 2: Dodgers at Giants (Oracle Park, cool fog)",
            "home": TeamTotalsInput("SFG", 3.9, 6.2, 3.30, 3.90),
            "away": TeamTotalsInput("LAD", 4.8, 5.8, 3.10, 3.50),
            "park": "SFG",
            "weather": WeatherConditions(55.0, 16.0, "in", 88.0),
            "total": 7.5,
        },
        {
            "name": "Game 3: Reds at Rockies (Coors Field, hot windy day)",
            "home": TeamTotalsInput("COL", 4.5, 5.0, 4.70, 4.50, 1.10),
            "away": TeamTotalsInput("CIN", 4.6, 5.5, 3.90, 3.80),
            "park": "COL",
            "weather": WeatherConditions(95.0, 12.0, "out", 20.0),
            "total": 12.5,
        },
        {
            "name": "Game 4: Mariners at Astros (dome, weather irrelevant)",
            "home": TeamTotalsInput("HOU", 4.6, 6.0, 3.20, 3.60),
            "away": TeamTotalsInput("SEA", 4.0, 5.8, 3.60, 4.10),
            "park": "HOU",
            "weather": WeatherConditions(98.0, 25.0, "out", 90.0),
            "total": 8.5,
        },
        {
            "name": "Game 5: Cardinals at Padres (mild evening)",
            "home": TeamTotalsInput("SDP", 4.1, 5.5, 3.40, 3.80),
            "away": TeamTotalsInput("STL", 4.0, 5.8, 3.70, 4.00),
            "park": "SDP",
            "weather": WeatherConditions(68.0, 5.0, "calm", 55.0),
            "total": 7.5,
        },
    ]

    print("=" * 70)
    print("MLB WEATHER-ADJUSTED TOTALS ANALYSIS")
    print("=" * 70)

    for game in games:
        print(f"\n--- {game['name']} ---")
        result = pipeline.analyze_game(
            home_input=game["home"],
            away_input=game["away"],
            park_team=game["park"],
            weather=game["weather"],
            posted_total=game["total"],
        )
        print(f"  Venue: {result['venue']}")
        print(f"  Park Factor: {result['park_factor']}")
        print(f"  Env Factor: {result['env_factor']}")
        print(f"  Combined Factor: {result['combined_factor']}")
        print(f"  Neutral Total: "
              f"{result['home_neutral_runs'] + result['away_neutral_runs']:.1f}")
        print(f"  Adjusted Total: {result['projected_total']}")
        print(f"  Posted Line: {result['posted_line']}")
        print(f"  Over Prob: {result['over_prob']:.1%}")
        print(f"  Under Prob: {result['under_prob']:.1%}")
        print(f"  >>> {result['recommendation']}")


if __name__ == "__main__":
    run_daily_analysis()
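
A quick sanity check on the negative binomial parameterization used in `_pmf` may be helpful. The sketch below confirms that passing `n = r` and `p = r / (r + lam)` to `scipy.stats.nbinom` yields a mean of `lam` and a variance of `lam + lam**2 / r`, and it measures how much probability the `max_runs = 20` truncation discards. The helper name `nb_params` and the example mean of 5.34 runs are illustrative choices, not part of the listing above.

```python
from scipy.stats import nbinom


def nb_params(lam: float, r: float) -> tuple[float, float]:
    """Map a mean `lam` and dispersion `r` to scipy's (n, p) arguments."""
    return r, r / (r + lam)


lam, r, max_runs = 5.34, 6.0, 20
n, p = nb_params(lam, r)

# First two moments of the fitted distribution.
mean, var = (float(m) for m in nbinom.stats(n, p, moments="mv"))

# Probability mass above max_runs, which the truncated PMF ignores.
tail_mass = float(nbinom.sf(max_runs, n, p))

print(f"mean={mean:.3f} (target {lam})")
print(f"variance={var:.3f} (lam + lam^2/r = {lam + lam**2 / r:.3f})")
print(f"P(runs > {max_runs}) = {tail_mass:.6f}")
```

Because the variance exceeds the mean by `lam**2 / r`, smaller values of `r` give fatter tails; as `r` grows large the distribution approaches a Poisson with the same mean.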

Analysis of Results

The model produces actionable recommendations for the simulated five-game slate:

Game 1 (Wrigley, wind out): The combined park-and-weather factor of approximately 1.21 pushes the projected total well above the posted 8.5. With an 86-degree day and 18 mph wind blowing out at one of the most wind-sensitive parks in baseball, the model projects a total near 10.4, creating a strong over signal.

Game 2 (Oracle Park, cool fog): The combination of a pitcher-friendly park (PF 0.92), cool temperature (55 degrees F), and 16 mph wind blowing in creates a combined suppression factor of about 0.78. Despite two quality offenses, the model projects a total near 6.8, making the under 7.5 attractive.

Game 3 (Coors, hot and windy): Coors Field's altitude-driven park factor (1.35) combined with 95-degree heat and outward wind creates a combined factor of approximately 1.48, for a projected total near 13.4. The posted total of 12.5 already accounts for Coors, but the extreme weather may push the true expectation higher. The model evaluates whether the market has fully adjusted.

Game 4 (dome): The environmental factor is 1.00 regardless of the extreme outdoor weather, because Minute Maid Park is enclosed. The model relies entirely on the team projections and base park factor.

Game 5 (mild evening): Near-baseline conditions produce an environmental factor close to 1.00. The model finds no significant edge, demonstrating appropriate restraint.
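
The factors quoted above can be reproduced without running the full pipeline. The sketch below re-applies the coefficients and park inputs from the implementation to the five-game slate; the compact `env_factor` function and the `SLATE` table are a restatement written for this check, and it skips the pipeline's intermediate rounding, so totals may differ from the pipeline's output in the second decimal place.

```python
BASELINE_TEMP, BASELINE_HUMIDITY = 72.0, 50.0
TEMP_COEFF, HUMIDITY_COEFF = 0.002, 0.0003
WIND_COEFF = {"out": 0.008, "in": -0.010, "cross": 0.008 * 0.30, "calm": 0.0}


def env_factor(temp_f, wind_mph, direction, humidity_pct, exposure, is_dome):
    """Weather multiplier using the same coefficients as WeatherAdjuster."""
    if is_dome:
        return 1.0
    adj = (temp_f - BASELINE_TEMP) * TEMP_COEFF
    adj += wind_mph * WIND_COEFF[direction] * exposure
    adj += (humidity_pct - BASELINE_HUMIDITY) * HUMIDITY_COEFF
    return 1.0 + adj


# (label, park factor, wind exposure, dome?, weather, neutral total)
SLATE = [
    ("Game 1", 1.05, 0.9, False, (86.0, 18.0, "out", 45.0), 4.4 + 4.2),
    ("Game 2", 0.92, 0.8, False, (55.0, 16.0, "in", 88.0), 3.9 + 4.8),
    ("Game 3", 1.35, 0.6, False, (95.0, 12.0, "out", 20.0), 4.5 + 4.6),
    ("Game 4", 1.00, 0.0, True, (98.0, 25.0, "out", 90.0), 4.6 + 4.0),
    ("Game 5", 0.95, 0.6, False, (68.0, 5.0, "calm", 55.0), 4.1 + 4.0),
]

results = {}
for label, pf, exposure, dome, weather, neutral_total in SLATE:
    combined = pf * env_factor(*weather, exposure, dome)
    results[label] = (combined, neutral_total * combined)
    print(f"{label}: combined factor {combined:.3f}, "
          f"projected total {neutral_total * combined:.2f}")
```

Under these inputs the combined factors come out near 1.21, 0.78, 1.48, 1.00, and 0.94 for Games 1 through 5, which is where the projected totals discussed above originate.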

Key Insights for Totals Betting

Weather is the single largest source of game-to-game totals variance at outdoor parks. The difference between a cold night with wind blowing in and a hot day with wind blowing out at the same park can be 2--3 runs on the projected total. This exceeds the typical starting pitcher matchup effect.

Wind-sensitive parks amplify the opportunity. Wrigley Field, Oracle Park, and Kauffman Stadium are particularly sensitive to wind direction. The model's `wind_exposure` parameter captures this variation.

Domes eliminate weather-based edges entirely. At Minute Maid Park, Globe Life Field, T-Mobile Park, and other enclosed venues, the totals model reduces to the team projections and the base park factor. Several of these parks have retractable roofs, so the `is_dome` flag should reflect the roof status for the game in question. Edges at domes must come from other sources (pitcher matchups, bullpen fatigue, etc.).

The market adjusts for Coors but sometimes over- or underadjusts for weather. The posted total at Coors already reflects the altitude effect. The question is whether the specific game-day weather pushes the true expectation above or below the posted number. Cool evenings at Coors are often undervalued as unders because the park's reputation anchors bettors toward overs.

Timing matters. Weather forecasts become significantly more accurate within 6 hours of game time. A totals model that incorporates late-afternoon weather updates has an informational edge over lines set the previous evening.