Chapter 20: Further Reading

Foundational Papers

Match Outcome Prediction

  • Dixon, M. J., & Coles, S. G. (1997). Modelling Association Football Scores and Inefficiencies in the Football Betting Market. Journal of the Royal Statistical Society: Series C, 46(2), 265--280. The seminal paper that extends the independent Poisson model with a correction factor for low-scoring results and time-weighted parameter estimation. Most match prediction models are either built on or benchmarked against this work.

  • Maher, M. J. (1982). Modelling Association Football Scores. Statistica Neerlandica, 36(3), 109--118. The predecessor to Dixon-Coles that established the independent Poisson model with team-specific attack and defense parameters. Understanding Maher's model is essential for appreciating the Dixon-Coles contribution.

  • Karlis, D., & Ntzoufras, I. (2003). Analysis of Sports Data by Using Bivariate Poisson Models. The Statistician, 52(3), 381--393. Introduces the bivariate Poisson model as an alternative to the Dixon-Coles correction, allowing direct modeling of goal correlation via a covariance parameter.

  • Rue, H., & Salvesen, O. (2000). Prediction and Retrospective Analysis of Soccer Matches in a League. The Statistician, 49(3), 399--418. A Bayesian treatment of the Poisson match model with time-varying team strengths, using Markov chain Monte Carlo for inference.
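
As a rough illustration of how the models in the papers above relate, the following Python sketch builds a scoreline probability grid from two Poisson scoring rates and applies a Dixon-Coles style adjustment to the four low-scoring results. The function names, the example rates, and the value of rho are illustrative assumptions, not values taken from any of the papers. Setting rho to 0 recovers the independent Poisson model.

```python
import math


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k goals under a Poisson distribution."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def low_score_adjustment(home_goals: int, away_goals: int,
                         home_rate: float, away_rate: float,
                         rho: float) -> float:
    """Dixon-Coles style multiplier; differs from 1 only for 0-0, 0-1, 1-0, 1-1."""
    if home_goals == 0 and away_goals == 0:
        return 1.0 - home_rate * away_rate * rho
    if home_goals == 0 and away_goals == 1:
        return 1.0 + home_rate * rho
    if home_goals == 1 and away_goals == 0:
        return 1.0 + away_rate * rho
    if home_goals == 1 and away_goals == 1:
        return 1.0 - rho
    return 1.0


def score_grid(home_rate: float, away_rate: float,
               rho: float = 0.0, max_goals: int = 10) -> list[list[float]]:
    """grid[x][y] is the probability of a final score of x (home) to y (away)."""
    return [
        [
            low_score_adjustment(x, y, home_rate, away_rate, rho)
            * poisson_pmf(x, home_rate)
            * poisson_pmf(y, away_rate)
            for y in range(max_goals + 1)
        ]
        for x in range(max_goals + 1)
    ]


def outcome_probs(grid: list[list[float]]) -> tuple[float, float, float]:
    """Collapse the score grid into (home win, draw, away win) probabilities."""
    home = sum(p for x, row in enumerate(grid) for y, p in enumerate(row) if x > y)
    draw = sum(p for x, row in enumerate(grid) for y, p in enumerate(row) if x == y)
    away = sum(p for x, row in enumerate(grid) for y, p in enumerate(row) if x < y)
    return home, draw, away


if __name__ == "__main__":
    # Hypothetical scoring rates and dependence parameter, for illustration only.
    print(outcome_probs(score_grid(1.5, 1.1, rho=-0.1)))
```

The grid is truncated at max_goals, so the three outcome probabilities sum to slightly less than 1. In practice the rates would come from fitted attack, defense, and home-advantage parameters rather than being supplied by hand.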

Player Performance and Aging

  • Dendir, S. (2016). When Do Soccer Players Peak? A Note. Journal of Sports Analytics, 2(2), 89--105. A comprehensive study of aging curves across major European leagues using the delta method, with position-specific peak age estimates.

  • Wakelam, E., Maymin, P., & Khoshgoftaar, T. (2022). Player Performance Prediction in Football. Journal of Sports Analytics, 8(3), 207--232. A survey of machine learning methods for player performance forecasting, comparing random forests, neural networks, and ensemble approaches on large-scale event data.

  • Bonhomme, S., & Sauder, R. (2011). Ballers and Shot-Callers: The Effect of Experience on NBA Free-Throw Shooting. Working paper. While focused on basketball, the methodology for mixed-effects aging curves with random intercepts and slopes translates directly to soccer performance modeling.

Injury Prediction

  • Rossi, A., Pappalardo, L., Cintia, P., Iaia, F. M., Fernandez, J., & Medina, D. (2018). Effective Injury Forecasting in Soccer with GPS Training Data and Machine Learning. PLOS ONE, 13(7), e0201264. One of the first large-scale studies applying gradient boosting to GPS-derived training load data for injury prediction in professional soccer.

  • Gabbett, T. J. (2016). The Training-Injury Prevention Paradox: Should Athletes Be Training Smarter and Harder? British Journal of Sports Medicine, 50(5), 273--280. Introduces the concept of a sweet spot and a danger zone for the acute:chronic workload ratio (ACWR), arguing that both undertraining and overtraining increase injury risk.
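
The workload ratio discussed in Gabbett's paper can be sketched in a few lines of Python. This is a minimal version assuming one load value per day (for example session-RPE or GPS distance) and the commonly used 7-day acute and 28-day chronic rolling-average windows. The function name and the window defaults are assumptions for illustration; the paper itself should be consulted for how the sweet spot and danger zone are defined.

```python
from statistics import mean


def acute_chronic_ratio(daily_loads: list[float],
                        acute_days: int = 7,
                        chronic_days: int = 28) -> float:
    """Rolling-average ACWR for the most recent day in daily_loads.

    The acute load is the mean of the last `acute_days` values and the
    chronic load is the mean of the last `chronic_days` values.
    """
    if len(daily_loads) < chronic_days:
        raise ValueError("need at least chronic_days of load data")
    acute = mean(daily_loads[-acute_days:])
    chronic = mean(daily_loads[-chronic_days:])
    if chronic == 0:
        raise ValueError("chronic load is zero; ratio is undefined")
    return acute / chronic


if __name__ == "__main__":
    # Hypothetical loads: three steady weeks followed by a sharp spike.
    loads = [300.0] * 21 + [600.0] * 7
    print(acute_chronic_ratio(loads))
```

Note that in this rolling-average form the acute window is part of the chronic window, so the two quantities are mathematically coupled.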

Uncertainty and Calibration

  • Gneiting, T., & Raftery, A. E. (2007). Strictly Proper Scoring Rules, Prediction, and Estimation. Journal of the American Statistical Association, 102(477), 359--378. The definitive treatment of proper scoring rules, explaining why log-loss and Brier score are appropriate for evaluating probabilistic forecasts.

  • Niculescu-Mizil, A., & Caruana, R. (2005). Predicting Good Probabilities with Supervised Learning. ICML. Demonstrates that many machine learning models produce poorly calibrated probabilities and evaluates Platt scaling and isotonic regression as post-hoc calibration methods.
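
To make the two scoring rules named above concrete, here is a small Python sketch that scores three-way match forecasts (home win, draw, away win). The function names and the example forecast are illustrative assumptions. The Brier score is written in its multi-class form, summing squared errors over all three outcomes, and lower is better for both measures.

```python
import math


def brier_score(forecasts: list[list[float]], outcomes: list[int]) -> float:
    """Mean multi-class Brier score.

    forecasts[i] is a probability vector for match i and outcomes[i] is the
    index of the outcome that actually occurred.
    """
    total = 0.0
    for probs, outcome in zip(forecasts, outcomes):
        total += sum(
            (p - (1.0 if idx == outcome else 0.0)) ** 2
            for idx, p in enumerate(probs)
        )
    return total / len(forecasts)


def log_loss(forecasts: list[list[float]], outcomes: list[int],
             eps: float = 1e-15) -> float:
    """Mean negative log probability assigned to the realized outcome."""
    total = 0.0
    for probs, outcome in zip(forecasts, outcomes):
        # Clip at eps so a zero-probability forecast gives a large finite penalty.
        total -= math.log(max(probs[outcome], eps))
    return total / len(forecasts)


if __name__ == "__main__":
    # Hypothetical forecast: 50% home, 30% draw, 20% away; the home side won.
    forecasts = [[0.5, 0.3, 0.2]]
    outcomes = [0]
    print(brier_score(forecasts, outcomes), log_loss(forecasts, outcomes))
```

Clipping with eps is a practical convenience, not part of the definition: the unclipped log score is infinite whenever the realized outcome was given probability zero.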

Books

  • Sumpter, D. (2016). Soccermatics: Mathematical Adventures in the Beautiful Game. Bloomsbury. Accessible introduction covering Poisson models, betting markets, and the mathematics of match prediction.

  • Anderson, C., & Sally, D. (2013). The Numbers Game: Why Everything You Know About Football Is Wrong. Penguin. Covers the role of luck and regression to the mean in soccer, providing intuitive explanations of key predictive modeling concepts.

  • Kuper, S., & Szymanski, S. (2018). Soccernomics. 5th ed. Nation Books. Explores the economics of player transfers, including the predictability (or lack thereof) of transfer success.

  • Hyndman, R. J., & Athanasopoulos, G. (2021). Forecasting: Principles and Practice. 3rd ed. OTexts. The standard reference for time series forecasting methods, including exponential smoothing and ARIMA models used for player metric projection.

  • Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian Data Analysis. 3rd ed. CRC Press. The authoritative textbook on Bayesian methods. Essential for understanding posterior predictive distributions and hierarchical models applied to soccer.

Online Resources

Code Repositories

  • FiveThirtyEight Soccer SPI Model --- fivethirtyeight.com/methodology/how-our-club-soccer-predictions-work. Documentation of FiveThirtyEight's public Soccer Power Index model, which combines Poisson regression with Elo-style ratings. A useful benchmark for custom match prediction models.

  • footballmodelling.net --- A community resource for football prediction models, including implementations of Dixon-Coles and related methods.

  • Friends of Tracking Data Science --- github.com/Friends-of-Tracking-Data-Science. Open-source implementations of various soccer analytics models, including match prediction and player evaluation tools.

Video Lectures and Talks

  • David Sumpter, "Is Maths Killing Football?" --- TEDx talk. An accessible overview of how mathematical models are used in football, including match prediction and player evaluation.

  • Jan Vecer, "Mathematical Analysis of Soccer" --- Columbia University course. Covers Poisson regression, Elo ratings, and betting market efficiency.

Blog Posts

  • Danny Page, "Expected Goals and Match Prediction" --- A practical walkthrough of building an xG-based match prediction model in Python.

  • James Yorke (StatsBomb), "League Adjustment Factors" --- Discusses the challenges of comparing player performance across leagues and presents practical estimation approaches.

  • Ben Torvaney, "Dixon-Coles in Python" --- A step-by-step tutorial for implementing the Dixon-Coles model from scratch, including parameter estimation and prediction.

Academic Journals and Conferences

For cutting-edge research on predictive modeling in soccer, monitor the following journals and conference venues:

  • Journal of Quantitative Analysis in Sports
  • Journal of Sports Analytics
  • International Journal of Forecasting
  • Machine Learning (applications track)
  • MIT Sloan Sports Analytics Conference (annual)
  • KDD Sports Analytics Workshop (annual)

Suggested Reading Path

For readers new to predictive modeling in soccer:

  1. Start with Maher (1982) and Dixon & Coles (1997) for the foundational match prediction framework.
  2. Read Sumpter (2016) for intuitive explanations and broader context.
  3. Work through Hyndman & Athanasopoulos (2021), Chapters 7--9, for time series forecasting fundamentals.
  4. Study Gneiting & Raftery (2007) to understand proper scoring rules and how to evaluate probabilistic forecasts.
  5. Explore Rossi et al. (2018) for a practical injury prediction application.
  6. Return to this chapter's code examples and exercises to consolidate understanding.