Chapter 17: Key Takeaways - Modeling MLB

  1. wOBA is the single most important offensive metric for betting models. Weighted On-Base Average assigns linear weights to each plate appearance outcome based on its actual run value, making it far more predictive of run scoring than batting average or OBP. It should be the primary offensive input to any game-level prediction model.

  2. FIP is more predictive of future pitcher performance than ERA. Fielding Independent Pitching isolates the outcomes a pitcher controls (strikeouts, walks, home runs) from defense and sequencing luck. When FIP diverges significantly from ERA, the market often misprices the pitcher based on the noisier ERA, creating exploitable opportunities.

  3. Starting pitcher quality is the dominant driver of game-to-game variance. No other variable in MLB carries as much predictive weight. The difference between an ace (FIP ~3.00) and a replacement-level starter (FIP ~5.50) shifts expected run totals by 2.0--2.5 runs and moneylines by 50--100 cents.

  4. Metric stabilization rates dictate early-season strategy. Strikeout rate stabilizes in weeks; BABIP takes multiple seasons. In April and May, trust fast-stabilizing metrics and Statcast contact quality data. Regress slow-stabilizing surface statistics heavily toward preseason projections.

  5. Platoon effects are robust, persistent, and quantifiable. Left-handed batters hit approximately 25 wOBA points worse against left-handed pitchers than against right-handed pitchers. This translates to roughly 0.3--0.4 runs per game at the lineup level and should be incorporated into every matchup projection.

  6. Park factors vary enormously and must be included in all projections. Coors Field inflates run scoring by 35% while Oracle Park suppresses it by 8%. Multi-year regression-adjusted factors are essential; single-season factors are too noisy to trust.

  7. Environmental conditions create the largest game-to-game totals variance at outdoor parks. Temperature, wind speed and direction, and humidity can shift expected scoring by 1--3 runs at wind-sensitive venues. Real-time weather data is a legitimate informational edge in totals betting.

  8. The negative binomial distribution fits MLB run scoring better than the Poisson. Run clustering (multi-run innings) causes overdispersion: the variance of runs scored exceeds the mean, whereas the Poisson model forces the two to be equal and therefore understates the variance. A dispersion parameter of $r \approx 5$--$8$ captures the excess variance and improves tail probability estimates.

  9. Run line and first-five-innings (F5) markets often offer better relative value than the moneyline. The non-linear relationship between win probability and run-line cover probability creates mispricings that the market does not fully arbitrage away. F5 lines isolate the starting pitcher matchup, reducing bullpen noise.

  10. Reverse line movement in MLB is particularly informative. Sharp bettors exploit late-breaking information (weather, lineups, bullpen availability) that becomes available close to game time. Lines that move against the public betting percentages signal informed money on the opposite side.

  11. Umpire effects on totals are measurable and persistent. The most extreme home plate umpires shift expected game scoring by 0.5--1.0 runs. This effect, while modest, can change over/under probabilities by 5--8 percentage points on a standard total.

  12. Closing line value (CLV) is the most reliable indicator of long-term profitability. A model that consistently generates positive CLV is extracting real value from the market, even during short-term losing streaks. Record the price of every bet, compare it with the closing line once the market closes, and use CLV as the primary evaluation metric.
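
To make takeaway 2 concrete, the sketch below computes FIP from the standard public formula and compares it with ERA. It is illustrative only: the additive constant (assumed here to be 3.10) is recalibrated each season so that league FIP matches league ERA, and the stat line and ERA are hypothetical.

```python
def fip(hr: int, bb: int, hbp: int, k: int, ip: float, constant: float = 3.10) -> float:
    """Fielding Independent Pitching from the standard public formula.

    `constant` is an assumed league constant; the real value varies by season.
    `ip` is innings pitched as a true decimal (180.333 for 180 1/3 innings).
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def era_minus_fip(era: float, fip_value: float) -> float:
    """Positive gap: results (ERA) look worse than the underlying skill estimate (FIP)."""
    return era - fip_value


# Hypothetical stat line, for illustration only.
pitcher_fip = fip(hr=20, bb=50, hbp=5, k=200, ip=180.0)
gap = era_minus_fip(era=4.20, fip_value=pitcher_fip)
print(f"FIP = {pitcher_fip:.2f}, ERA - FIP = {gap:+.2f}")
```

A large positive gap flags a pitcher whose ERA may overstate how poorly he has pitched; a large negative gap flags the opposite.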
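
The early-season regression described in takeaway 4 is often implemented as a sample-size-weighted blend of the observed rate and a prior. The sketch below assumes that approach; the function name and the stabilization constants (60 and 800) are hypothetical placeholders, not values from this chapter.

```python
def regress_to_prior(observed: float, sample_size: float,
                     prior: float, stabilization_n: float) -> float:
    """Blend an observed rate with a prior (e.g. a preseason projection).

    `stabilization_n` is the sample size at which the observed rate and the
    prior receive equal weight. Small values suit fast-stabilizing metrics;
    large values suit slow-stabilizing ones.
    """
    if sample_size < 0 or stabilization_n <= 0:
        raise ValueError("sample_size must be >= 0 and stabilization_n > 0")
    weight = sample_size / (sample_size + stabilization_n)
    return weight * observed + (1 - weight) * prior


# Hypothetical early-season numbers; the constants are assumptions.
k_rate = regress_to_prior(observed=0.30, sample_size=100, prior=0.22, stabilization_n=60)
babip = regress_to_prior(observed=0.360, sample_size=100, prior=0.300, stabilization_n=800)
print(f"regressed K% = {k_rate:.3f}, regressed BABIP = {babip:.3f}")
```

With the same sample size, the fast-stabilizing metric keeps most of its observed signal, while the slow-stabilizing one is pulled almost all the way back to the projection.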
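
The tail-probability point in takeaway 8 can be checked numerically. The sketch below compares a Poisson model with a negative binomial model of the same mean, using the mean-and-dispersion parameterization in which the variance is $\mu + \mu^2 / r$. The team mean of 4.5 runs and the choice $r = 6$ are assumptions made for illustration.

```python
import math


def poisson_pmf(k: int, mu: float) -> float:
    """P(X = k) for a Poisson distribution with mean mu."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def negbin_pmf(k: int, mu: float, r: float) -> float:
    """P(X = k) for a negative binomial with mean mu and dispersion r.

    Variance is mu + mu**2 / r, so smaller r means more overdispersion.
    """
    p = r / (r + mu)  # "success" probability in the standard parameterization
    log_coef = math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
    return math.exp(log_coef + r * math.log(p) + k * math.log(1 - p))


def tail_prob(pmf, threshold: int, **params) -> float:
    """P(X >= threshold), computed as one minus the lower cumulative sum."""
    return 1.0 - sum(pmf(k, **params) for k in range(threshold))


# Assumed inputs: 4.5 expected runs, dispersion r = 6.
poisson_tail = tail_prob(poisson_pmf, 8, mu=4.5)
negbin_tail = tail_prob(negbin_pmf, 8, mu=4.5, r=6.0)
print(f"P(runs >= 8): Poisson = {poisson_tail:.3f}, negative binomial = {negbin_tail:.3f}")
```

At equal means the negative binomial puts more probability on high-scoring outcomes, which is the tail behavior that matters for totals and alternate run lines.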
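
One simple way to track the CLV described in takeaway 12 is to compare the implied probability of the price taken with that of the closing price. The sketch below assumes American odds and does not remove the bookmaker's margin from either price; the function names and the sample bets are hypothetical.

```python
def american_to_implied_prob(odds: int) -> float:
    """Implied probability (vig included) from American odds."""
    if odds == 0:
        raise ValueError("American odds cannot be zero")
    if odds > 0:
        return 100.0 / (odds + 100.0)
    return -odds / (-odds + 100.0)


def clv_points(bet_odds: int, closing_odds: int) -> float:
    """CLV in percentage points of implied probability.

    Positive when the market closed at a shorter price than the one taken,
    meaning the bet was placed at a better number than the close.
    """
    return 100.0 * (american_to_implied_prob(closing_odds)
                    - american_to_implied_prob(bet_odds))


# Hypothetical bet log: (price taken, closing price) on the same side.
bets = [(120, 105), (-110, -125), (150, 160)]
clv_values = [clv_points(taken, close) for taken, close in bets]
average_clv = sum(clv_values) / len(clv_values)
print(f"average CLV = {average_clv:+.2f} percentage points")
```

Averaging this figure over a large number of bets gives a lower-variance read on model quality than win-loss results over the same sample.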