Chapter 7: Further Reading
Foundational xG Resources
Seminal Blog Posts and Articles
-
"Premier League Projections and New Expected Goals" - Michael Caley (2014)
- One of the first comprehensive public xG models
- Establishes core methodology still used today
- https://cartilagefreecaptain.sbnation.com/
-
"Expected Goals and Support Vector Machines" - Martin Eastwood (2014)
- Explores machine learning approaches to xG
- Early application of SVMs to shot classification
- https://pena.lt/y/
-
"The xG Philosophy" - StatsBomb (2019)
- StatsBomb's explanation of their xG methodology
- Details their feature engineering approach
- https://statsbomb.com/articles/soccer/the-xg-philosophy/
-
"How StatsBomb Data Helps Measure Counter-Pressing" - StatsBomb (2018)
- Explains freeze frame data and defensive positioning
- Foundation for advanced xG models
- https://statsbomb.com/
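Freeze frames record where the other players stood at the moment of a shot. A feature often derived from them is the number of defenders inside the triangle formed by the shot location and the two goalposts. The sketch below assumes StatsBomb's 120 x 80 yard pitch coordinates with the posts at (120, 36) and (120, 44), and a freeze frame already reduced to a plain list of defender (x, y) positions. The function names are hypothetical.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]

# Assumed StatsBomb geometry: 120 x 80 yard pitch, goal on the x = 120 line.
LEFT_POST: Point = (120.0, 36.0)
RIGHT_POST: Point = (120.0, 44.0)


def _cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_triangle(p: Point, a: Point, b: Point, c: Point) -> bool:
    """True if p lies inside triangle abc or on one of its edges."""
    d1, d2, d3 = _cross(a, b, p), _cross(b, c, p), _cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    # p is inside when it is never on opposite sides of two edges.
    return not (has_neg and has_pos)


def defenders_in_shot_triangle(shot: Point, defenders: Iterable[Point]) -> int:
    """Count defenders inside the triangle shot -> left post -> right post."""
    return sum(in_triangle(d, shot, LEFT_POST, RIGHT_POST) for d in defenders)


if __name__ == "__main__":
    shot = (108.0, 40.0)  # the penalty spot in these coordinates
    defenders = [(119.0, 40.0), (114.0, 41.0), (110.0, 30.0)]
    print(defenders_in_shot_triangle(shot, defenders))  # -> 2
```

In a real pipeline the defender list would come from the freeze-frame entries flagged as opponents; the goalkeeper is often counted separately.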
Academic Papers
-
"An Examination of Expected Goals and Shot Efficiency in Soccer" - Rathke, A. (2017)
- Journal of Human Sport and Exercise
- Rigorous statistical validation of xG methodology
- DOI: 10.14198/jhse.2017.12.Proc2.05
-
"Expected Goals in Soccer: Explaining Match Results Using Predictive Analytics" - Brechot, M. & Flepp, R. (2020)
- Swiss Sports Analytics
- Connects xG to match outcome prediction
-
"Beyond Expected Goals" - William Spearman (2018)
- MIT Sloan Sports Analytics Conference
- Introduces the off-ball scoring opportunity (OBSO) model, which quantifies the scoring chances of players who are not on the ball
- Applies pitch control modeling to chance creation
- Available at: sloansportsconference.com
Technical Implementation Resources
Machine Learning for xG
-
"Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow" - Géron, A. (2022, 3rd Edition)
- O'Reilly Media
- Comprehensive ML reference; Chapters 3-7 essential for xG modeling
- ISBN: 978-1098125974
-
"The Elements of Statistical Learning" - Hastie, T., Tibshirani, R., & Friedman, J. (2009, 2nd Edition)
- Springer; free PDF available from the authors' Stanford site
- Theoretical foundation for gradient boosting and model evaluation
- https://web.stanford.edu/~hastie/ElemStatLearn/
-
"Pattern Recognition and Machine Learning"
- Bishop, C. M. (2006)
- Springer
- Rigorous treatment of logistic regression and probability calibration
- ISBN: 978-0387310732
Model Calibration
-
"Predicting Good Probabilities With Supervised Learning" - Niculescu-Mizil & Caruana (2005)
- ICML Conference Paper
- Compares Platt scaling and isotonic regression across model types
- Essential reading for probability model deployment
Soccer Analytics Books
Comprehensive Texts
-
"The Numbers Game: Why Everything You Know About Soccer Is Wrong"
- Anderson, C. & Sally, D. (2013)
- Penguin Books
- Accessible introduction to soccer analytics concepts
- ISBN: 978-0143124566
-
"Soccermatics: Mathematical Adventures in the Beautiful Game"
- Sumpter, D. (2016)
- Bloomsbury Sigma
- Mathematical modeling of soccer with xG coverage
- ISBN: 978-1472924124
-
"Football Hackers: The Science and Art of a Data Revolution"
- Biermann, C. (2019)
- Blink Publishing
- History and evolution of soccer analytics
- ISBN: 978-1788702058
-
"Expected Goals" - Tippett, J. (2023)
- Bloomsbury Sport
- The first book-length treatment dedicated to xG
- Covers history, methodology, and applications
- ISBN: 978-1399401845
Data Sources and APIs
Free Data
-
StatsBomb Open Data
- Comprehensive event data for selected competitions
- Includes xG values and detailed shot information
- https://github.com/statsbomb/open-data
- Python API: pip install statsbombpy
-
Understat
- xG data for top 5 European leagues
- Player and team level statistics
- https://understat.com/
-
FBref
- xG data supplied by StatsBomb until 2022 and by Opta (Stats Perform) since
- Historical statistics back to 2017-18
- https://fbref.com/
Commercial Data Providers
-
StatsBomb - https://statsbomb.com/
- Industry-leading event and tracking data
- Freeze frame data for advanced xG models
-
Opta (Stats Perform) - https://www.statsperform.com/
- Comprehensive event data coverage
- Multiple xG model versions
-
Wyscout - https://wyscout.com/
- Video-linked event data
- Good for player scouting applications
-
Second Spectrum / SkillCorner
- Tracking data providers
- Enable position-based xG enhancements
Online Courses and Tutorials
Video Courses
-
"Friends of Tracking" YouTube Channel
- Free lectures by academics and practitioners
- Covers xG, tracking data, and advanced metrics
- https://www.youtube.com/friendsoftracking
-
"Soccermatics" Online Course - David Sumpter
- Uppsala University offering
- Mathematical foundations with Python
- Available on YouTube
-
DataCamp Soccer Analytics Courses
- "Introduction to Soccer Analytics"
- Practical Python implementations
- https://www.datacamp.com/
Written Tutorials
-
FC Python - https://fcpython.com/
- Comprehensive soccer analytics tutorials
- xG model building walkthrough
- Visualization techniques
-
McKay Johns Analytics Blog
- Detailed xG implementation posts
- Comparison of modeling approaches
- https://github.com/mckayjohns
Advanced Topics
Expected Threat (xT) and Possession Value
-
"Actions Speak Louder than Goals: Valuing Player Actions in Soccer" - Decroos et al. (2019)
- KDD 2019 paper from KU Leuven
- VAEP (Valuing Actions by Estimating Probabilities)
- Academic foundation for action-based valuation
-
"Expected Threat" - Karun Singh (2019)
- Original xT blog post and implementation
- Extends xG concept to all ball actions
- https://karun.in/blog/expected-threat.html
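Singh's post defines the threat of a pitch zone recursively: the chance of scoring by shooting from the zone now, plus the chance of moving the ball on, weighted by the threat of the zone where the move ends. The sketch below solves that recursion by repeated substitution (value iteration) over an arbitrary list of zones. The function name is hypothetical, and the three-zone probabilities in the usage block are invented for illustration. In practice the shot, goal, move and transition probabilities are counted from event data on a grid of pitch zones.

```python
from typing import List, Sequence


def expected_threat(shoot: Sequence[float], goal: Sequence[float],
                    move: Sequence[float], transition: Sequence[Sequence[float]],
                    tol: float = 1e-10, max_iter: int = 10_000) -> List[float]:
    """Solve xT[z] = shoot[z]*goal[z] + move[z] * sum_w transition[z][w] * xT[w].

    shoot[z]       probability that the action taken in zone z is a shot
    goal[z]        probability that a shot from zone z is scored
    move[z]        probability that the action taken in zone z is a pass or carry
    transition[z]  probability that a move from z ends, successfully, in each zone
                   (rows may sum to less than 1; failed moves contribute 0)
    """
    n = len(shoot)
    xt = [0.0] * n  # start from zero threat everywhere
    for _ in range(max_iter):
        new = [shoot[z] * goal[z]
               + move[z] * sum(transition[z][w] * xt[w] for w in range(n))
               for z in range(n)]
        converged = max(abs(a - b) for a, b in zip(new, xt)) < tol
        xt = new
        if converged:
            break
    return xt


if __name__ == "__main__":
    # Invented numbers for three zones: own half, midfield, penalty box.
    shoot = [0.0, 0.05, 0.5]
    goal = [0.0, 0.03, 0.15]
    move = [1.0, 0.95, 0.5]
    transition = [[0.6, 0.3, 0.0],
                  [0.2, 0.5, 0.2],
                  [0.0, 0.3, 0.4]]
    print([round(v, 4) for v in expected_threat(shoot, goal, move, transition)])
    # -> [0.0418, 0.0557, 0.1042]
```

The value of a single pass or carry is then the threat of the end zone minus the threat of the start zone.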
Tracking Data Applications
-
"Physics-Based Modeling of Pass Probabilities in Soccer" - Spearman et al. (2017)
- MIT Sloan Sports Analytics Conference
- Introduces pitch control concepts
- Foundation for position-based xG
-
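"Decomposing the Immeasurable Sport: A Deep Learning Expected Possession Value Framework for Soccer" - Fernández, Bornn & Cervone (2019)
- MIT Sloan Sports Analytics Conference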
- Expected Possession Value framework
- Advanced tracking data methodology
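The time-to-intercept idea behind pitch control can be illustrated in a few lines. This sketch is a deliberate simplification, not the integrated model in Spearman et al. (2017). It assumes that each player keeps their current velocity for a fixed reaction time and then runs straight at a fixed top speed, and it compares only each team's fastest arrival time through a logistic curve. The parameter values (0.7 s reaction time, 5 m/s, sigma 0.45 s) and the function names are illustrative assumptions, and coordinates are taken to be in metres.

```python
import math
from typing import Iterable, Tuple

Vec = Tuple[float, float]  # (x, y) position in m, or (vx, vy) velocity in m/s

# Illustrative parameter values, assumed rather than fitted.
REACTION_TIME = 0.7  # s: player keeps current velocity before changing course
MAX_SPEED = 5.0      # m/s: constant running speed afterwards
SIGMA = 0.45         # s: standard deviation of the arrival-time uncertainty


def time_to_intercept(pos: Vec, vel: Vec, target: Vec) -> float:
    """Reaction time, then a straight run at MAX_SPEED to the target."""
    rx = pos[0] + vel[0] * REACTION_TIME
    ry = pos[1] + vel[1] * REACTION_TIME
    return REACTION_TIME + math.hypot(target[0] - rx, target[1] - ry) / MAX_SPEED


def attacking_control(target: Vec,
                      attackers: Iterable[Tuple[Vec, Vec]],
                      defenders: Iterable[Tuple[Vec, Vec]]) -> float:
    """Approximate probability that the attacking team controls `target`.

    Each player is a (position, velocity) pair. Only the fastest arrival per team
    is used; the time gap is mapped through a logistic curve with sd SIGMA.
    """
    t_att = min(time_to_intercept(p, v, target) for p, v in attackers)
    t_def = min(time_to_intercept(p, v, target) for p, v in defenders)
    scale = math.pi / (math.sqrt(3.0) * SIGMA)  # logistic scale giving sd = SIGMA
    return 1.0 / (1.0 + math.exp(-scale * (t_def - t_att)))


if __name__ == "__main__":
    attackers = [((30.0, 5.0), (4.0, 0.0)), ((20.0, -10.0), (0.0, 0.0))]
    defenders = [((40.0, 0.0), (-2.0, 0.0)), ((35.0, -15.0), (0.0, 0.0))]
    for point in [(35.0, 5.0), (40.0, -10.0)]:
        print(point, round(attacking_control(point, attackers, defenders), 3))
```

Evaluating `attacking_control` over a grid of target points gives a rough control surface. The full model instead accumulates each player's interception probability over time and accounts for ball travel time.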
Deep Learning for xG
-
"Deep Soccer Analytics" - Decroos & Davis (2020)
- Neural network approaches to action valuation
- Comparison with tree-based methods
Industry Perspectives
Blogs from Practitioners
-
StatsBomb Blog - https://statsbomb.com/articles/
- Regular posts on methodology and applications
- Case studies from professional work
-
Twenty3 Sport - https://www.twenty3.sport/
- Industry insights and visualization examples
- xG communication best practices
-
Eleven Sports Blog
- Analytics from Belgian sports tech company
- Research-oriented posts
Podcasts
-
"Tifo Football Podcast"
- Regular analytics segments
- Accessible explanations of complex concepts
-
"The Double Pivot"
- Data-focused soccer analysis
- xG discussions and debates
-
"Analytics FC Podcast"
- Industry-focused discussions
- Career advice for aspiring analysts
Research Conferences
Academic Venues
-
MIT Sloan Sports Analytics Conference
- Premier academic sports analytics venue
- xG papers regularly presented
- https://www.sloansportsconference.com/
-
ECML PKDD Workshop on Machine Learning and Data Mining for Sports Analytics
- European machine learning conference
- Soccer analytics research track
-
Journal of Sports Analytics
- Peer-reviewed academic journal
- Regular xG methodology papers
- IOS Press publication
Industry Conferences
-
OptaPro Forum (now Stats Perform Forum)
- Industry conference for soccer analytics
- Presentations from club analysts
-
StatsBomb Conference
- Annual event showcasing latest research
- Mix of academic and industry presentations
Coding Resources
Python Libraries
-
statsbombpy
- pip install statsbombpy
- Official StatsBomb Python API
- Access to open data with built-in xG
-
mplsoccer
- pip install mplsoccer
- Soccer visualization library
- Shot maps and pitch drawings
- https://mplsoccer.readthedocs.io/
-
socceraction
- pip install socceraction
- SPADL data format and VAEP implementation
- Academic research tools
- https://github.com/ML-KULeuven/socceraction
-
scikit-learn
- pip install scikit-learn
- Core ML library for xG models
- Logistic regression, gradient boosting, calibration
GitHub Repositories
-
Friends of Tracking Tutorials
- https://github.com/Friends-of-Tracking-Data-FoTD
- Code from video tutorials
- xG model implementations
-
StatsBomb Open Data Repository
- https://github.com/statsbomb/open-data
- Sample code for data access
- Specification documents
Recommended Learning Path
Beginner (Weeks 1-4)
- Read "The Numbers Game" for conceptual foundation
- Complete FC Python xG tutorial
- Explore StatsBomb open data with statsbombpy
- Build simple distance-based xG model
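For the beginner's distance-based model, the two standard geometric features are the distance to the centre of the goal and the angle that the goal mouth subtends at the shot location. The simplest model then groups shots into distance bands and uses each band's historical conversion rate as its xG. The sketch below assumes StatsBomb's 120 x 80 yard coordinates (goal centre at (120, 40), posts 8 yards apart). The function names and the five toy shots are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed StatsBomb coordinates: 120 x 80 yards, goal centre at (120, 40),
# goal mouth 8 yards wide (posts at y = 36 and y = 44).
GOAL_X, GOAL_Y, GOAL_WIDTH = 120.0, 40.0, 8.0


def shot_distance(x: float, y: float) -> float:
    """Euclidean distance (yards) from the shot location to the goal centre."""
    return math.hypot(GOAL_X - x, GOAL_Y - y)


def shot_angle(x: float, y: float) -> float:
    """Angle (radians) between the lines from the shot location to each post."""
    dx, dy = GOAL_X - x, y - GOAL_Y
    half = GOAL_WIDTH / 2.0
    # atan2(|cross|, dot) of the two post vectors, simplified algebraically.
    return math.atan2(GOAL_WIDTH * dx, dx * dx + dy * dy - half * half)


def fit_distance_bins(shots: List[Tuple[float, float, int]],
                      bin_size: float = 6.0) -> Dict[int, float]:
    """Empirical xG per distance band: goals / shots in each band."""
    shots_per_bin: Dict[int, int] = defaultdict(int)
    goals_per_bin: Dict[int, int] = defaultdict(int)
    for x, y, goal in shots:
        band = int(shot_distance(x, y) // bin_size)
        shots_per_bin[band] += 1
        goals_per_bin[band] += goal
    return {band: goals_per_bin[band] / shots_per_bin[band] for band in sorted(shots_per_bin)}


if __name__ == "__main__":
    print(round(shot_distance(108, 40), 2), round(math.degrees(shot_angle(108, 40)), 2))
    # -> 12.0 36.87
    toy_shots = [(114, 40, 1), (113, 40, 0), (108, 40, 1), (108, 40, 0), (105, 40, 0)]
    print(fit_distance_bins(toy_shots))  # band 1 = 6-12 yd, band 2 = 12-18 yd
```

A band-average model ignores angle entirely. That limitation is what the multi-feature logistic regression in the intermediate steps addresses.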
Intermediate (Weeks 5-12)
- Study Caley and Eastwood's original blog posts
- Implement multi-feature logistic regression
- Learn gradient boosting with scikit-learn
- Practice model evaluation and calibration
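The intermediate steps, a multi-feature logistic regression followed by evaluation, can be sketched without any ML library. This stdlib version standardises the features, fits the weights by batch gradient descent on log loss, and compares the Brier score and log loss against a constant base-rate baseline. The ten toy shots (distance in yards, goal-mouth angle in radians, goal flag) are invented for illustration, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def standardise(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Rescale each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Predicted goal probability for one standardised feature vector."""
    return _sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 3000) -> Tuple[List[float], float]:
    """Batch gradient descent on mean log loss; returns (weights, intercept)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def brier_score(probs: Sequence[float], y: Sequence[int]) -> float:
    return sum((p - t) ** 2 for p, t in zip(probs, y)) / len(y)


def log_loss(probs: Sequence[float], y: Sequence[int], eps: float = 1e-15) -> float:
    total = 0.0
    for p, t in zip(probs, y):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y)


if __name__ == "__main__":
    shots = [(6, 1.0, 1), (8, 0.8, 1), (10, 0.6, 0), (11, 0.65, 1), (12, 0.64, 0),
             (16, 0.45, 0), (18, 0.4, 1), (20, 0.35, 0), (25, 0.3, 0), (30, 0.25, 0)]
    X = standardise([(dist, ang) for dist, ang, _ in shots])
    y = [g for _, _, g in shots]
    w, b = fit_logistic(X, y)
    probs = [predict(w, b, x) for x in X]
    base = [sum(y) / len(y)] * len(y)  # constant base-rate baseline
    print("Brier:", round(brier_score(probs, y), 4), "vs", round(brier_score(base, y), 4))
    print("Log loss:", round(log_loss(probs, y), 4), "vs", round(log_loss(base, y), 4))
```

Real evaluation should use held-out matches rather than the training shots. With scikit-learn the same steps map to `LogisticRegression`, `brier_score_loss` and `log_loss`.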
Advanced (Months 3-6)
- Read Spearman's "Beyond Expected Goals"
- Explore tracking data applications
- Implement Expected Threat model
- Study academic papers on VAEP/action valuation
Expert (Ongoing)
- Follow current research at conferences
- Develop novel feature engineering approaches
- Contribute to open-source projects
- Publish findings or build production systems