Chapter 16: Further Reading

Foundational Papers

Dixon-Coles Model

Dixon, M. J., & Coles, S. G. (1997). "Modelling Association Football Scores and Inefficiencies in the Football Betting Market." Journal of the Royal Statistical Society: Series C (Applied Statistics), 46(2), 265-280.

The seminal paper that augments the independent Poisson model for home and away goals with a dependence correction for low-scoring results (0-0, 1-0, 0-1, 1-1). Essential reading for anyone building match outcome models. The paper builds on the attack-defense parameterization of Maher (1982) and introduces the time-decay weighting that remains the basis of many professional models. Pay particular attention to Section 3, which derives the correction factor $\tau$.
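As a reading aid, the low-score correction can be sketched in a few lines of Python. This is a minimal illustration, not the paper's estimation code: the function names and the example values of `lam` (home scoring rate), `mu` (away scoring rate), and `rho` (dependence parameter) are assumptions chosen for demonstration.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k goals under a Poisson distribution."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def tau(x, y, lam, mu, rho):
    """Dixon-Coles adjustment for the four low-scoring results.

    x, y are home and away goals; lam, mu are the home and away
    scoring rates; rho controls the strength of the dependence.
    """
    if x == 0 and y == 0:
        return 1.0 - lam * mu * rho
    if x == 0 and y == 1:
        return 1.0 + lam * rho
    if x == 1 and y == 0:
        return 1.0 + mu * rho
    if x == 1 and y == 1:
        return 1.0 - rho
    return 1.0  # all other scorelines are left unchanged


def score_probability(x, y, lam, mu, rho):
    """Adjusted probability of the scoreline x-y."""
    return tau(x, y, lam, mu, rho) * poisson_pmf(x, lam) * poisson_pmf(y, mu)


# Illustrative (assumed) parameter values, not estimates from the paper.
lam, mu, rho = 1.4, 1.1, -0.1
print(score_probability(0, 0, lam, mu, rho))
print(score_probability(1, 1, lam, mu, rho))
```

The four adjustments cancel, so the corrected probabilities still sum to one; only the relative weight of the low-scoring results changes.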

Elo Ratings for Football

Hvattum, L. M., & Arntzen, H. (2010). "Using ELO Ratings for Match Result Prediction in Association Football." International Journal of Forecasting, 26(3), 460-470.

A thorough evaluation of Elo ratings as a predictive tool for football. The paper tests various modifications (goal difference scaling, home advantage adjustments, mean reversion) and finds that a well-tuned Elo system performs competitively with more complex models. Useful for understanding the strengths and limitations of the Elo framework described in Section 16.6.
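To make the modifications concrete, the sketch below shows a single Elo update with a home advantage term and one common form of goal-difference scaling, in which the update factor grows with the margin of victory. The function names, the default `k0`, the `margin_exponent`, and the `home_advantage` value are illustrative assumptions, not tuned values from the paper.

```python
def expected_score(rating_home, rating_away, home_advantage=0.0):
    """Expected result for the home team on a 0-1 scale (logistic curve)."""
    diff = rating_home + home_advantage - rating_away
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def update_elo(rating_home, rating_away, goals_home, goals_away,
               k0=20.0, margin_exponent=1.0, home_advantage=0.0):
    """Return the updated (home, away) ratings after one match.

    The update factor is scaled by the absolute goal difference:
    k = k0 * (1 + margin) ** margin_exponent. Setting
    margin_exponent=0 recovers the basic Elo update.
    """
    if goals_home > goals_away:
        actual = 1.0
    elif goals_home == goals_away:
        actual = 0.5
    else:
        actual = 0.0
    margin = abs(goals_home - goals_away)
    k = k0 * (1.0 + margin) ** margin_exponent
    expected = expected_score(rating_home, rating_away, home_advantage)
    delta = k * (actual - expected)
    # The update is zero-sum: what the home team gains, the away team loses.
    return rating_home + delta, rating_away - delta


# Two equally rated teams, home side wins 1-0 (hypothetical ratings).
print(update_elo(1500.0, 1500.0, 1, 0))
```

Because the update is zero-sum, the average rating in a closed league stays constant, which is why promotion and relegation need separate handling.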

Bayesian Football Models

Baio, G., & Blangiardo, M. (2010). "Bayesian Hierarchical Model for the Prediction of Football Results." Journal of Applied Statistics, 37(2), 253-264.

Recasts the attack-defense Poisson framework as a fully Bayesian hierarchical model estimated by MCMC. Provides posterior distributions for team strength parameters rather than point estimates, enabling more principled uncertainty quantification. Recommended for readers interested in the Bayesian extension exercises (Exercise 16.30).


Team Style and Tactics

Playing Style Quantification

Fernandez-Navarro, J., Fradua, L., Zubillaga, A., Ford, P. R., & McRobert, A. P. (2016). "Attacking and Defensive Styles of Play in Soccer: Analysis of Spanish and English Elite Teams." Journal of Sports Sciences, 34(24), 2195-2204.

One of the first rigorous studies to quantify team playing styles using match statistics. Identifies key dimensions of attacking and defensive play and compares La Liga and Premier League teams. Provides empirical support for the style fingerprint framework in Section 16.1.

Game-State Analysis

Robberechts, P., Van Haaren, J., & Davis, J. (2019). "Who Will Win It? An In-game Win Probability Model for Football." Proceedings of the 6th Workshop on Machine Learning and Data Mining for Sports Analytics.

While focused on win probability, this paper contains valuable methodology for analyzing team behavior in different game states, directly relevant to the score-state analysis in Section 16.5.

Space Creation

Fernandez, J., & Bornn, L. (2018). "Wide Open Spaces: A Statistical Technique for Measuring Space Creation in Professional Soccer." MIT Sloan Sports Analytics Conference.

Introduces pitch control models and space creation metrics that can be used to measure team width and territorial dominance, two components of the style fingerprint. A technically demanding paper that rewards careful reading.


Squad Analysis

Squad Rotation and Fatigue

Carling, C., Gregson, W., McCall, A., Moreira, A., Wong, D. P., & Bradley, P. S. (2015). "Match Running Performance During Fixture Congestion in Elite Football: Research Issues and Future Directions." Sports Medicine, 45(5), 605-613.

A comprehensive review of the evidence on fixture congestion and player performance. Quantifies the impact of reduced recovery time on physical output metrics. Essential context for the congestion analysis in Section 16.5.3.

Team Cohesion

Grund, T. U. (2012). "Network Structure and Team Performance: The Case of English Premier League Soccer Teams." Social Networks, 34(4), 682-690.

Applies social network analysis to passing networks and examines how network structure relates to team performance. Finds that higher interaction intensity (passing rate) is associated with better team performance, whereas higher centralization (passing concentrated on a few players) is associated with worse performance. Directly relevant to the team chemistry framework in Section 16.4.


Prediction and Simulation

Prediction Model Evaluation

Constantinou, A. C., & Fenton, N. E. (2012). "Solving the Problem of Inadequate Scoring Rules for Assessing Probabilistic Football Forecast Models." Journal of Quantitative Analysis in Sports, 8(1).

A critical methodological paper on how to properly evaluate probabilistic predictions in football. Argues that commonly used scoring rules, including the Brier score, ignore the ordering of the outcomes (home win, draw, away win) and are therefore inadequate, and recommends the ranked probability score (RPS) instead. Recommended reading before implementing the calibration analysis in Section 16.7.4.
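The difference between the two scoring rules is easiest to see on a small example. The sketch below computes the multi-class Brier score and the ranked probability score for two hypothetical forecasts of a match that ended in a home win; the function names and forecast values are assumptions for illustration only.

```python
def brier_score(probs, outcome_index):
    """Sum of squared errors across outcome categories."""
    return sum((p - (1.0 if i == outcome_index else 0.0)) ** 2
               for i, p in enumerate(probs))


def ranked_probability_score(probs, outcome_index):
    """RPS for ordered outcomes, e.g. (home win, draw, away win).

    Compares cumulative forecast probabilities with the cumulative
    observed outcome, so probability placed far from the observed
    result is penalized more heavily.
    """
    r = len(probs)
    cum_forecast = 0.0
    cum_observed = 0.0
    total = 0.0
    for i in range(r - 1):
        cum_forecast += probs[i]
        cum_observed += 1.0 if i == outcome_index else 0.0
        total += (cum_forecast - cum_observed) ** 2
    return total / (r - 1)


# Hypothetical forecasts, ordered (home win, draw, away win).
forecast_a = (0.5, 0.4, 0.1)   # leftover probability mostly on the draw
forecast_b = (0.5, 0.1, 0.4)   # leftover probability mostly on the away win
home_win = 0

print(brier_score(forecast_a, home_win), brier_score(forecast_b, home_win))
print(ranked_probability_score(forecast_a, home_win),
      ranked_probability_score(forecast_b, home_win))
```

Both forecasts receive the same Brier score, but the RPS is lower (better) for `forecast_a`, which places its remaining probability on the draw, the outcome adjacent to the observed home win.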

FiveThirtyEight Methodology

Boice, J. (2018). "How Our Club Soccer Predictions Work." FiveThirtyEight.

A public description of one of the most widely followed football prediction models. While not peer-reviewed, this article provides practical insights into how professional prediction systems handle the challenges described in this chapter: team strength estimation, promotion/relegation, and mid-season updating.

Dynamic Prediction Models

Koopman, S. J., & Lit, R. (2015). "A Dynamic Bivariate Poisson Model for Analysing and Forecasting Match Results in the English Premier League." Journal of the Royal Statistical Society: Series A, 178(1), 167-186.

Extends the bivariate Poisson approach with time-varying parameters using state-space methods. The attack and defense strengths evolve as latent variables; because the resulting model is non-Gaussian, the authors estimate it with importance sampling rather than the standard Kalman filter. This is the methodological foundation for the dynamic model described in Section 16.7.2.


Books

The Numbers Game

Anderson, C., & Sally, D. (2013). The Numbers Game: Why Everything You Know About Football Is Wrong. New York: Penguin.

An accessible introduction to football analytics for a general audience. Chapters 5-7 cover team-level analysis, including the role of luck in league outcomes and the importance of defense. Good supplementary reading for students who want intuitive explanations before diving into the technical details.

Soccermatics

Sumpter, D. (2016). Soccermatics: Mathematical Adventures in the Beautiful Game. London: Bloomsbury.

Applies mathematical modeling to football, with chapters on Poisson models for goal scoring, network theory for passing, and game theory for penalty kicks. Chapter 3 provides an excellent derivation of the Poisson match model from first principles.

Football Hackers

Biermann, C. (2019). Football Hackers: The Science and Art of a Data Revolution. London: Blink Publishing.

A journalistic account of how data analytics has been adopted by professional football clubs. Contains case studies of clubs that successfully (and unsuccessfully) implemented the kind of team analysis described in this chapter. Particularly relevant for understanding the organizational challenges of deploying analytical models.

Expected Goals Philosophy

Tippett, J. (2019). The Expected Goals Philosophy. Self-published.

A comprehensive guide to xG models and their applications, including expected points tables. Chapter 10 covers season-long analysis using xG-based methods, complementing the technical framework in Section 16.2.


Online Resources

StatsBomb Open Data

StatsBomb (2018-present). Open event data repository. Available at: https://github.com/statsbomb/open-data

Free match event data from several competitions, including detailed event types suitable for computing style metrics, pressing statistics, and passing chemistry scores. Recommended for hands-on implementation of chapter exercises.

Understat

Understat (2014-present). xG statistics platform. Available at: https://understat.com

Provides match-level xG data for the top European leagues, suitable for constructing expected points tables. The site also shows xG timelines and shot maps useful for score-state analysis.
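One simple way to turn match-level xG into expected points is sketched below. It assumes each team's goals follow an independent Poisson distribution with mean equal to its match xG, which is a simplification (shot-by-shot simulation is a common alternative); the function names and the example xG values are hypothetical.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k goals under a Poisson distribution."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def expected_points(xg_for, xg_against, max_goals=15):
    """Expected league points for one team from a single match.

    Assumes independent Poisson goal counts with means xg_for and
    xg_against, truncated at max_goals per team.
    """
    p_win = 0.0
    p_draw = 0.0
    for goals_for in range(max_goals + 1):
        for goals_against in range(max_goals + 1):
            p = poisson_pmf(goals_for, xg_for) * poisson_pmf(goals_against, xg_against)
            if goals_for > goals_against:
                p_win += p
            elif goals_for == goals_against:
                p_draw += p
    return 3.0 * p_win + 1.0 * p_draw


# Hypothetical match: home side created 1.8 xG, away side 0.9 xG.
home_xpts = expected_points(1.8, 0.9)
away_xpts = expected_points(0.9, 1.8)
print(round(home_xpts, 2), round(away_xpts, 2))
```

Summing these per-match values over a season gives an expected points table that can be compared with the actual standings.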

FBref

Sports Reference (2018-present). Football Reference (FBref). Available at: https://fbref.com

Comprehensive football statistics database. Its advanced statistics for top leagues were originally powered by StatsBomb data and have been supplied by Opta since 2022. Provides squad-level statistics broken down by position, age, and minutes played, which are directly useful for squad depth and balance analysis.

American Soccer Analysis

American Soccer Analysis (2013-present). Available at: https://www.americansocceranalysis.com

While focused on MLS, this site's methodology articles on expected goals, expected points, and season simulation are among the clearest public descriptions of the techniques covered in this chapter.


Advanced Technical References

Copula Models for Correlated Outcomes

McHale, I. G., & Scarf, P. A. (2007). "Modelling Soccer Matches Using Bivariate Discrete Distributions with General Dependence Structure." Statistica Neerlandica, 61(4), 432-445.

Extends the bivariate Poisson model using copulas to capture more general forms of dependence between home and away goals. Relevant to the discussion of correlation in Section 16.7.4.

Bayesian State-Space Models

Owen, A. (2011). "Dynamic Bayesian Forecasting Models of Football Match Outcomes with Estimation of the Evolution Variance Parameter." IMA Journal of Management Mathematics, 22(2), 99-113.

A technical paper on state-space models for evolving team strength, estimated using Markov chain Monte Carlo methods. Recommended for readers pursuing the Kalman filter approach in Section 16.7.2.

Machine Learning for Team Analysis

Herold, M., Goes, F. R., Nopp, S., Bosselmann, P., & Memmert, D. (2022). "Machine Learning Approach to Identify Team Performance Indicators in Professional Football." International Journal of Sports Science & Coaching, 17(3), 531-544.

Applies gradient-boosted trees and SHAP values to identify which team performance metrics best predict match outcomes. Provides a data-driven alternative to the expert-selected style dimensions in Section 16.1.