Chapter 7: Further Reading

Foundational xG Resources

Seminal Blog Posts and Articles

  1. "Premier League Projections and New Expected Goals" - Michael Caley (2014)

    • One of the first comprehensive public xG models
    • Establishes core methodology still used today
    • https://cartilagefreecaptain.sbnation.com/

  2. "Expected Goals and Support Vector Machines" - Martin Eastwood (2014)

    • Explores machine learning approaches to xG
    • Early application of SVMs to shot classification
    • https://pena.lt/y/

  3. "The xG Philosophy" - StatsBomb (2019)

    • StatsBomb's explanation of their xG methodology
    • Details their feature engineering approach
    • https://statsbomb.com/articles/soccer/the-xg-philosophy/

  4. "How StatsBomb Data Helps Measure Counter-Pressing" - StatsBomb (2018)

    • Explains freeze frame data and defensive positioning
    • Foundation for advanced xG models
    • https://statsbomb.com/

Academic Papers

  1. "An Examination of Expected Goals and Shot Efficiency in Soccer" - Rathke, A. (2017)

    • Journal of Human Sport and Exercise
    • Rigorous statistical validation of xG methodology
    • DOI: 10.14198/jhse.2017.12.Proc2.05

  2. "Expected Goals in Soccer: Explaining Match Results Using Predictive Analytics" - Brechot, M. & Flepp, R. (2020)

    • Swiss Sports Analytics
    • Connects xG to match outcome prediction

  3. "Beyond Expected Goals" - William Spearman (2018)

    • MIT Sloan Sports Analytics Conference
    • Introduces off-ball scoring opportunities (OBSO), which value attacking positions before a shot is taken
    • Pioneering work on pitch control models
    • Available at: sloansportsconference.com


Technical Implementation Resources

Machine Learning for xG

  1. "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow"

    • Géron, A. (2022, 3rd Edition)
    • O'Reilly Media
    • Comprehensive ML reference; Chapters 3-7 essential for xG modeling
    • ISBN: 978-1098125974

  2. "The Elements of Statistical Learning"

    • Hastie, T., Tibshirani, R., & Friedman, J. (2009)
    • Stanford University (free online)
    • Theoretical foundation for gradient boosting and model evaluation
    • https://web.stanford.edu/~hastie/ElemStatLearn/

  3. "Pattern Recognition and Machine Learning"

    • Bishop, C. M. (2006)
    • Springer
    • Rigorous treatment of logistic regression and probability calibration
    • ISBN: 978-0387310732

Model Calibration

  1. "Calibration of Probabilities" - Niculescu-Mizil & Caruana (2005)

    • ICML Conference Paper
    • Definitive guide to Platt scaling and isotonic regression
    • Essential reading for probability model deployment
  2. "Predicting Good Probabilities With Supervised Learning"

    • Niculescu-Mizil & Caruana (2005)
    • Compares calibration methods across model types
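
Isotonic regression, one of the two calibration methods these papers discuss, can be illustrated with the pool-adjacent-violators algorithm. The sketch below is a minimal standard-library version written for this reading list, not code taken from either paper. The function names and the six toy shots are assumptions made for illustration, and the sketch assumes all raw scores are distinct. In practice a library implementation such as scikit-learn's would normally be preferred.

```python
def isotonic_fit(scores, outcomes):
    """Pool-adjacent-violators: fit a non-decreasing step function.

    Returns a list of (upper_score, calibrated_probability) steps.
    Assumes raw scores are distinct (illustrative simplification).
    """
    # Each block holds [sum of outcomes, number of shots, highest score].
    blocks = []
    for score, outcome in sorted(zip(scores, outcomes)):
        blocks.append([float(outcome), 1, score])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    return [(upper, total / count) for total, count, upper in blocks]


def isotonic_predict(steps, score):
    """Return the calibrated value of the first step covering the score."""
    for upper, value in steps:
        if score <= upper:
            return value
    return steps[-1][1]  # scores above the training range use the last step


# Toy data (hypothetical): raw model scores and whether each shot was a goal.
raw_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
was_goal = [0, 1, 0, 0, 1, 1]

steps = isotonic_fit(raw_scores, was_goal)
calibrated = isotonic_predict(steps, 0.25)
print(steps)
```

The shots scored 0.2, 0.3 and 0.4 are pooled into a single step because their outcomes (goal, miss, miss) break the required ordering; every score in that range then receives the pooled goal rate of one in three.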

Soccer Analytics Books

Comprehensive Texts

  1. "The Numbers Game: Why Everything You Know About Soccer Is Wrong"

    • Anderson, C. & Sally, D. (2013)
    • Penguin Books
    • Accessible introduction to soccer analytics concepts
    • ISBN: 978-0143124566
  2. "Soccermatics: Mathematical Adventures in the Beautiful Game"

    • Sumpter, D. (2016)
    • Bloomsbury Sigma
    • Mathematical modeling of soccer with xG coverage
    • ISBN: 978-1472924124
  3. "Football Hackers: The Science and Art of a Data Revolution"

    • Biermann, C. (2019)
    • Blink Publishing
    • History and evolution of soccer analytics
    • ISBN: 978-1788702058
  4. "Expected Goals" - Tippett, J. (2023)

    • Bloomsbury Sport
    • The first book-length treatment dedicated to xG
    • Covers history, methodology, and applications
    • ISBN: 978-1399401845

Data Sources and APIs

Free Data

  1. StatsBomb Open Data

    • Comprehensive event data for selected competitions
    • Includes xG values and detailed shot information
    • https://github.com/statsbomb/open-data
    • Python API: pip install statsbombpy
  2. Understat

    • xG data for top 5 European leagues
    • Player and team level statistics
    • https://understat.com/
  3. FBref

    • xG data supplied by Opta since 2022 (StatsBomb before that)
    • Historical statistics back to 2017-18
    • https://fbref.com/

Commercial Data Providers

  1. StatsBomb - https://statsbomb.com/

    • Industry-leading event and tracking data
    • Freeze frame data for advanced xG models
  2. Opta (Stats Perform) - https://www.statsperform.com/

    • Comprehensive event data coverage
    • Multiple xG model versions
  3. Wyscout - https://wyscout.com/

    • Video-linked event data
    • Good for player scouting applications
  4. Second Spectrum / SkillCorner

    • Tracking data providers
    • Enable position-based xG enhancements

Online Courses and Tutorials

Video Courses

  1. "Friends of Tracking" YouTube Channel

    • Free lectures by academics and practitioners
    • Covers xG, tracking data, and advanced metrics
    • https://www.youtube.com/friendsoftracking
  2. "Soccermatics" Online Course - David Sumpter

    • Uppsala University offering
    • Mathematical foundations with Python
    • Available on YouTube
  3. DataCamp Soccer Analytics Courses

    • "Introduction to Soccer Analytics"
    • Practical Python implementations
    • https://www.datacamp.com/

Written Tutorials

  1. FC Python - https://fcpython.com/

    • Comprehensive soccer analytics tutorials
    • xG model building walkthrough
    • Visualization techniques
  2. McKay Johns Analytics Blog

    • Detailed xG implementation posts
    • Comparison of modeling approaches
    • https://github.com/mckayjohns

Advanced Topics

Expected Threat (xT) and Possession Value

  1. "Actions Speak Louder than Goals" - Decroos et al. (2019)

    • KU Leuven Research
    • VAEP (Valuing Actions by Estimating Probabilities)
    • Academic foundation for action-based valuation
  2. "Expected Threat" - Karun Singh (2019)

    • Original xT blog post and implementation
    • Extends xG concept to all ball actions
    • https://karun.in/blog/expected-threat.html

Tracking Data Applications

  1. "Physics-Based Modeling of Pass Probabilities" - Spearman et al. (2017)

    • Introduces pitch control concepts
    • Foundation for position-based xG
  2. "Decomposing the Immeasurable Sport" - Fernández, Bornn & Cervone (2019)

    • Expected Possession Value framework
    • Advanced tracking data methodology

Deep Learning for xG

  1. "Deep Soccer Analytics" - Liu, Luo, Schulte & Kharrat (2020)

    • Neural network approaches to action valuation
    • Comparison with tree-based methods

Industry Perspectives

Blogs from Practitioners

  1. StatsBomb Blog - https://statsbomb.com/articles/

    • Regular posts on methodology and applications
    • Case studies from professional work
  2. Twenty3 Sport - https://www.twenty3.sport/

    • Industry insights and visualization examples
    • xG communication best practices
  3. Eleven Sports Blog

    • Analytics from Belgian sports tech company
    • Research-oriented posts

Podcasts

  1. "Tifo Football Podcast"

    • Regular analytics segments
    • Accessible explanations of complex concepts
  2. "The Double Pivot"

    • Data-focused soccer analysis
    • xG discussions and debates
  3. "Analytics FC Podcast"

    • Industry-focused discussions
    • Career advice for aspiring analysts

Research Conferences

Academic Venues

  1. MIT Sloan Sports Analytics Conference

    • Premier academic sports analytics venue
    • xG papers regularly presented
    • https://www.sloansportsconference.com/
  2. ECML PKDD Workshop on Machine Learning and Data Mining for Sports Analytics

    • European machine learning conference
    • Soccer analytics research track
  3. Journal of Sports Analytics

    • Peer-reviewed academic journal
    • Regular xG methodology papers
    • IOS Press publication

Industry Conferences

  1. OptaPro Forum (now Stats Perform Forum)

    • Industry conference for soccer analytics
    • Presentations from club analysts
  2. StatsBomb Conference

    • Annual event showcasing latest research
    • Mix of academic and industry presentations

Coding Resources

Python Libraries

  1. statsbombpy - pip install statsbombpy

    • Official StatsBomb Python API
    • Access to open data with built-in xG
  2. mplsoccer - pip install mplsoccer

    • Soccer visualization library
    • Shot maps and pitch drawings
    • https://mplsoccer.readthedocs.io/
  3. socceraction - pip install socceraction

    • SPADL data format and VAEP implementation
    • Academic research tools
    • https://github.com/ML-KULeuven/socceraction
  4. scikit-learn - pip install scikit-learn

    • Core ML library for xG models
    • Logistic regression, gradient boosting, calibration

GitHub Repositories

  1. Friends of Tracking Tutorials

    • https://github.com/Friends-of-Tracking-Data-FoTD
    • Code from video tutorials
    • xG model implementations
  2. StatsBomb Open Data Repository

    • https://github.com/statsbomb/open-data
    • Sample code for data access
    • Specification documents

Suggested Learning Path

Beginner (Weeks 1-4)

  1. Read "The Numbers Game" for conceptual foundation
  2. Complete FC Python xG tutorial
  3. Explore StatsBomb open data with statsbombpy
  4. Build simple distance-based xG model
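
As a rough idea of what the last step in this list might look like, the sketch below fits a one-feature logistic regression, P(goal) = sigmoid(b0 + b1 * distance), by gradient descent using only the standard library. Everything here is an illustrative assumption: the shots are synthetic rather than real data, the function names are hypothetical, and the learning rate and epoch count were chosen only so the toy example converges. With real data, scikit-learn's LogisticRegression would be the more usual choice.

```python
import math
import random


def sigmoid(z):
    """Logistic function mapping a real-valued logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_distance_xg(distances_m, goals, lr=0.5, epochs=2000):
    """Fit P(goal) = sigmoid(b0 + b1 * d) by batch gradient descent on log loss.

    d is the shot distance in units of 10 metres, which keeps both
    gradients on a similar scale.
    """
    xs = [d / 10.0 for d in distances_m]
    n = len(xs)
    b0 = b1 = 0.0
    for _ in range(epochs):
        grad0 = grad1 = 0.0
        for x, y in zip(xs, goals):
            error = sigmoid(b0 + b1 * x) - y  # derivative of log loss w.r.t. the logit
            grad0 += error
            grad1 += error * x
        b0 -= lr * grad0 / n
        b1 -= lr * grad1 / n
    return b0, b1


def predict_xg(b0, b1, distance_m):
    """Predicted scoring probability for a shot from distance_m metres."""
    return sigmoid(b0 + b1 * distance_m / 10.0)


# Synthetic shots (NOT real data): scoring chance decays with distance.
random.seed(7)
distances = [random.uniform(3.0, 35.0) for _ in range(500)]
goals = [1 if random.random() < sigmoid(1.0 - 1.5 * d / 10.0) else 0
         for d in distances]

b0, b1 = fit_distance_xg(distances, goals)
close_xg = predict_xg(b0, b1, 6.0)
far_xg = predict_xg(b0, b1, 25.0)
print(f"xG at 6 m: {close_xg:.2f}, xG at 25 m: {far_xg:.2f}")
```

A negative fitted slope b1 is the sanity check to look for: predicted xG should fall as shots move further from goal.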

Intermediate (Weeks 5-12)

  1. Study Caley and Eastwood's original blog posts
  2. Implement multi-feature logistic regression
  3. Learn gradient boosting with scikit-learn
  4. Practice model evaluation and calibration
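
For the evaluation step in this list, two common scoring rules for probability forecasts are the Brier score and log loss. The sketch below implements both with the standard library and compares a hypothetical model against a baseline that assigns every shot the overall conversion rate. The ten shots and their predicted probabilities are invented for illustration; scikit-learn provides equivalent metrics (brier_score_loss and log_loss) for real work.

```python
import math


def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


# Hypothetical shots: 1 = goal, 0 = no goal.
outcomes = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
# Hypothetical model predictions for the same shots.
model_xg = [0.6, 0.1, 0.05, 0.2, 0.4, 0.1, 0.05, 0.1, 0.3, 0.1]
# Baseline: every shot gets the observed conversion rate (2 goals in 10 shots).
base_rate = sum(outcomes) / len(outcomes)
baseline_xg = [base_rate] * len(outcomes)

model_brier = brier_score(model_xg, outcomes)
baseline_brier = brier_score(baseline_xg, outcomes)
model_ll = log_loss(model_xg, outcomes)
baseline_ll = log_loss(baseline_xg, outcomes)

print(f"Brier: model {model_brier:.4f} vs baseline {baseline_brier:.4f}")
print(f"Log loss: model {model_ll:.4f} vs baseline {baseline_ll:.4f}")
```

Lower is better for both metrics. A model that cannot beat the constant baseline on held-out shots is not adding information beyond the average conversion rate.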

Advanced (Months 3-6)

  1. Read Spearman's "Beyond Expected Goals"
  2. Explore tracking data applications
  3. Implement Expected Threat model
  4. Study academic papers on VAEP/action valuation
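
To give a feel for the Expected Threat step in this list, the sketch below iterates the xT recurrence on a toy pitch of four zones running from a team's own third to the penalty area. Each zone's value is the chance of shooting and scoring from there, plus the chance of moving the ball multiplied by the value of the zones it reaches. All probabilities are invented for illustration and the zone layout is a simplification; Karun Singh's original post uses a two-dimensional grid estimated from event data.

```python
def expected_threat(shoot_prob, goal_prob, transition, iterations=200):
    """Iterate xT(i) = s_i * g_i + (1 - s_i) * sum_j T[i][j] * xT(j).

    shoot_prob[i]  : probability that an action in zone i is a shot
    goal_prob[i]   : probability that a shot from zone i scores
    transition[i]  : probability that a move from zone i ends in each zone;
                     rows sum to less than 1, the remainder is lost possession
    """
    n = len(shoot_prob)
    xt = [0.0] * n
    for _ in range(iterations):
        # The comprehension reads the previous xt and builds the new one.
        xt = [
            shoot_prob[i] * goal_prob[i]
            + (1.0 - shoot_prob[i]) * sum(transition[i][j] * xt[j] for j in range(n))
            for i in range(n)
        ]
    return xt


# Hypothetical zones: 0 = own third, 1 = midfield, 2 = final third, 3 = penalty area.
SHOOT = [0.00, 0.01, 0.10, 0.40]
GOAL = [0.00, 0.02, 0.05, 0.15]
MOVE = [
    [0.50, 0.35, 0.05, 0.00],
    [0.20, 0.45, 0.20, 0.02],
    [0.05, 0.25, 0.35, 0.15],
    [0.00, 0.10, 0.30, 0.30],
]

xt_values = expected_threat(SHOOT, GOAL, MOVE)
for zone, value in enumerate(xt_values):
    print(f"zone {zone}: xT = {value:.4f}")
```

After a single iteration the values equal shot probability times goal probability, which is the shot-only view of threat; further iterations propagate value backwards to zones from which dangerous areas can be reached.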

Expert (Ongoing)

  1. Follow current research at conferences
  2. Develop novel feature engineering approaches
  3. Contribute to open-source projects
  4. Publish findings or build production systems