Chapter 17: Further Reading - Modeling MLB
Academic Papers and Research
-
Tango, Tom, Mitchel Lichtman, and Andrew Dolphin. "The Book: Playing the Percentages in Baseball." Potomac Books (2007). The definitive reference for sabermetric analysis applied to game strategy. Covers platoon splits, lineup optimization, bullpen management, and base-running decisions with rigorous statistical analysis. Essential reading for understanding the analytical foundations of MLB modeling.
-
Albert, Jim, and Jay Bennett. "Curve Ball: Baseball, Statistics, and the Role of Chance in the Game." Copernicus/Springer (2003). Excellent introduction to statistical thinking in baseball, covering topics from streaks and slumps to the distribution of runs scored. Provides the probabilistic framework for understanding why run-scoring models work the way they do.
-
Healey, Glenn. "The New Moneyball: How Ballpark Sensors Are Changing Science and Sports." Various publications (2017-2020). A collection of research on how Statcast data has transformed player evaluation. Focuses on exit velocity, launch angle, and the expected statistics revolution (xwOBA, xBA, xSLG) that underpins modern betting models.
-
Petti, Bill, and Jeff Zimmerman. "Research Notebooks on FanGraphs." FanGraphs Community Research (ongoing). A rich collection of applied sabermetric research covering stabilization rates, predictive metrics, park factor methodology, and seasonal patterns. Available freely at fangraphs.com/community.
-
Carleton, Russell. "The Shift: The Next Evolution in Baseball Thinking." Triumph Books (2018). Explores the statistical reasoning behind modern baseball strategy, including defensive shifts, opener strategies, and bullpen deployment. Relevant for understanding how the game's evolution affects betting market structure.
Books
-
Lewis, Michael. "Moneyball: The Art of Winning an Unfair Game." W.W. Norton (2003). The book that brought sabermetrics to public consciousness. While the specific market inefficiencies it describes have long since been arbitraged away, the analytical framework and the lesson about looking beyond conventional wisdom remain foundational.
-
James, Bill. "The New Bill James Historical Baseball Abstract." Free Press (2001). A foundational text of sabermetrics by its founding practitioner. Provides historical context for how baseball statistics evolved and why certain metrics predict better than others.
-
Baumer, Benjamin, and Andrew Zimbalist. "The Sabermetric Revolution: Assessing the Growth of Analytics in Baseball." University of Pennsylvania Press (2014). Analyzes how sabermetrics has been adopted across front offices and how the analytics revolution has changed competitive dynamics in MLB.
Data Sources
-
pybaseball (Python library). Open-source library providing access to FanGraphs leaderboards, Statcast data, and Baseball Reference statistics. Available at https://github.com/jldbc/pybaseball. The essential starting point for any Python-based MLB modeling project. Provides both individual and team-level data for current and historical seasons.
-
FanGraphs. Comprehensive advanced statistics including wOBA, FIP, xFIP, SIERA, wRC+, and park factors. Also hosts the community research section with applied analytics articles. Available at https://www.fangraphs.com/. Free tier provides most statistics needed for modeling; premium tier adds projection systems and deeper Statcast integration.
-
Baseball Savant (Statcast). MLB's official Statcast data portal. Provides pitch-by-pitch and batted ball data including exit velocity, launch angle, sprint speed, and expected statistics. Available at https://baseballsavant.mlb.com/. The primary source for Statcast-derived expected statistics (xwOBA, xBA, xSLG).
-
Baseball Reference. Comprehensive historical statistics, game logs, and the Stathead search tool (formerly the Play Index). Available at https://www.baseball-reference.com/. Particularly useful for historical park factor data and long-term trend analysis.
-
Retrosheet. Free play-by-play data for historical MLB games dating back decades. Available at https://www.retrosheet.org/. Invaluable for researchers building long-horizon models or studying historical market efficiency.
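As a rough illustration of the pybaseball entry above, the sketch below pulls Statcast rows for a date range and computes a hard-hit rate from exit velocity. The helper names (fetch_exit_velocities, hard_hit_rate) are hypothetical and introduced only for this example; the 95 mph cutoff is Statcast's hard-hit definition, and the sketch assumes the launch_speed and type columns that pybaseball's statcast() call returns. Treat it as a starting point, not a tested pipeline.

```python
import math
from typing import Iterable, List

HARD_HIT_MPH = 95.0  # Statcast's hard-hit threshold (exit velocity, mph)


def fetch_exit_velocities(start_dt: str, end_dt: str) -> List[float]:
    """Return exit velocities for balls put in play between two dates.

    Needs `pip install pybaseball` and network access. The import is kept
    local so the rest of this sketch runs without the library installed.
    """
    from pybaseball import statcast  # lazy import on purpose

    pitches = statcast(start_dt=start_dt, end_dt=end_dt)
    # Assumption: type == "X" marks pitches put in play in the Statcast export.
    in_play = pitches[pitches["type"] == "X"]
    # launch_speed can still be missing on some rows, so drop those.
    return in_play["launch_speed"].dropna().tolist()


def hard_hit_rate(exit_velocities: Iterable[float]) -> float:
    """Share of measured batted balls at or above the hard-hit threshold."""
    values = [v for v in exit_velocities if v is not None and not math.isnan(v)]
    if not values:
        raise ValueError("no batted balls with a measured exit velocity")
    return sum(v >= HARD_HIT_MPH for v in values) / len(values)
```

A call such as hard_hit_rate(fetch_exit_velocities("2023-04-01", "2023-04-02")) would give a league-wide figure for those dates; filtering the frame by batter or pitcher before taking the rate is the natural next step.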
Online Resources and Communities
-
FanGraphs Glossary. Comprehensive definitions and explanations of every advanced baseball metric. Available at https://library.fangraphs.com/. The first resource to consult when encountering an unfamiliar sabermetric term.
-
Tangotiger's Sabermetric Research. Tom Tango's blog and research archive covers stabilization rates, Marcel projections, odds ratio methodology, and other foundational topics. Available at http://www.tangotiger.com/. Tango (co-author of "The Book") is one of the most influential sabermetricians and now works for MLB.
-
The Athletic's MLB Analytics Coverage. Articles covering projection systems, Statcast insights, and team-specific analysis. Subscription required, but the applied content is consistently high quality. Contributors include Eno Sarris and others.
Betting Market Resources
-
Unabated. Line-shopping and odds-analysis platform tracking opening lines, line movement, and market consensus across sportsbooks. Essential for identifying the best available number and detecting reverse line movement.
-
Action Network. Provides public betting percentages, sharp money indicators, and historical betting data. Useful for detecting reverse line movement and understanding market dynamics. Some features require subscription.
-
Pinnacle Sports Betting Resources. Pinnacle's editorial content on MLB betting includes articles on market efficiency, closing line value methodology, and bankroll management. Pinnacle is a reduced-juice book, and its closing lines are widely considered among the sharpest in the market.
-
Weather Underground / Visual Crossing. Detailed historical and forecast weather data including hourly wind speed, direction, temperature, and humidity for specific locations. Available at https://www.wunderground.com/ and https://www.visualcrossing.com/. Essential for building the environmental adjustment component of a totals model. APIs available for automated data retrieval.
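The closing line value idea mentioned in the Pinnacle entry can be made concrete with a short sketch. It converts American odds to implied probabilities, strips the bookmaker margin from a two-way closing line by proportional normalization, and compares the result with the break-even probability of the price actually bet. The function names are hypothetical, proportional normalization is only one of several de-vig methods, and measuring CLV as a probability difference is one common convention rather than Pinnacle's own methodology.

```python
from typing import Tuple


def implied_prob(american_odds: int) -> float:
    """Break-even win probability implied by American odds (margin included)."""
    if american_odds == 0:
        raise ValueError("American odds cannot be zero")
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


def no_vig_probs(odds_a: int, odds_b: int) -> Tuple[float, float]:
    """Remove the margin from a two-way market by proportional normalization."""
    p_a, p_b = implied_prob(odds_a), implied_prob(odds_b)
    total = p_a + p_b  # exceeds 1.0 by the bookmaker's margin
    return p_a / total, p_b / total


def closing_line_value(bet_odds: int, close_side: int, close_other: int) -> float:
    """No-vig closing probability of the side bet minus the bet's break-even."""
    fair_close, _ = no_vig_probs(close_side, close_other)
    return fair_close - implied_prob(bet_odds)


if __name__ == "__main__":
    # Bet taken at +110; the market closes at -105 on both sides.
    print(round(closing_line_value(110, -105, -105), 4))
```

A positive value means the bet was placed at a better price than the de-vigged close; tracked over many bets, the average is a common check on whether a model is beating the market.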