Chapter 26 Further Reading: Ratings and Ranking Systems

Foundational Works

  1. Elo, A. E. (1978). The Rating of Chessplayers, Past and Present. Arco Publishing. The definitive work by the inventor of the Elo system. Elo provides the full mathematical derivation of his rating formula, discusses its statistical properties, and presents extensive empirical validation using chess tournament data. Essential reading for understanding the theoretical foundations.

  2. Glickman, M. E. (1999). "Parameter Estimation in Large Dynamic Paired Comparison Experiments." Journal of the Royal Statistical Society: Series C, 48(3), 377-394. Mark Glickman's original paper introducing the Glicko system. Presents the mathematical framework for incorporating rating uncertainty into paired comparison models and demonstrates improved predictive accuracy compared to standard Elo.

  3. Glickman, M. E. (2001). "Dynamic Paired Comparison Models with Stochastic Variances." Journal of Applied Statistics, 28(6), 673-689. The paper behind Glicko-2, extending the original system with a volatility parameter. It develops the underlying statistical model; the step-by-step procedure that most implementations follow, including the Illinois-algorithm iteration for the volatility update, is given in Glickman's companion note "Example of the Glicko-2 System." The primary reference for anyone implementing Glicko-2.

  4. Massey, K. (1997). "Statistical Models Applied to the Rating of Sports Teams." Honors thesis, Bluefield College. Kenneth Massey's undergraduate thesis that introduced the Massey rating system. Remarkably clear exposition of the linear algebra framework, with applications to college football. Freely available online and highly recommended for its pedagogical quality.

  5. Brin, S. and Page, L. (1998). "The Anatomy of a Large-Scale Hypertextual Web Search Engine." Computer Networks and ISDN Systems, 30(1-7), 107-117. The original PageRank paper. While focused on web search, the mathematical framework (Markov chains, stationary distributions, the power method) transfers directly to sports ranking. Understanding the web context helps motivate the sports adaptation.
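
As orientation for the Elo entry above, the rating update in its commonly used logistic form can be sketched as follows. This is a minimal illustration, not Elo's own presentation: the 400-point scale and K = 20 are conventional assumptions, and the function names are hypothetical.

```python
def elo_expected(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score for player A against player B (logistic form).

    The 400-point scale is the conventional choice, assumed here.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def elo_update(rating_a: float, rating_b: float, score_a: float,
               k: float = 20.0) -> tuple[float, float]:
    """Return both ratings after one game.

    score_a is 1.0 for an A win, 0.5 for a draw, 0.0 for an A loss.
    K = 20 is an illustrative value, not a recommendation.
    """
    expected_a = elo_expected(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    # The update is zero-sum: what A gains, B loses.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two equally rated players; A wins and gains K/2 points.
    print(elo_update(1500.0, 1500.0, 1.0))
```

The sports-specific references below differ mainly in how they choose K, add home advantage, or scale the update by margin of victory; the core step is the one shown.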

Sports-Specific Applications

  1. Silver, N. (2014). "How Our NFL Predictions Work." FiveThirtyEight. Nate Silver's detailed explanation of FiveThirtyEight's NFL Elo system, including quarterback adjustments, home-field advantage calibration, and the rationale for specific parameter choices. An excellent case study in practical Elo implementation for sports betting research.

  2. Langville, A. N. and Meyer, C. D. (2012). Who's #1? The Science of Rating and Ranking. Princeton University Press. A comprehensive and accessible survey of rating methods applied to sports, including Elo, Massey, Colley, Keener, and PageRank-based approaches. Includes mathematical derivations, worked examples, and comparisons across methods. The single best textbook on the subject.

  3. Hvattum, L. M. and Arntzen, H. (2010). "Using ELO Ratings for Match Result Prediction in Association Football." International Journal of Forecasting, 26(3), 460-470. Empirical study of Elo applied to European soccer. Demonstrates that a well-tuned Elo system provides competitive predictions relative to more complex models. Includes careful analysis of K-factor optimization and calibration.

  4. Stefani, R. T. (2011). "The Methodology of Officially Recognized International Sports Rating Systems." Journal of Quantitative Analysis in Sports, 7(4). Surveys the rating systems used by official governing bodies across multiple sports (for example, FIFA in football, the ICC in cricket, and rugby union's world rankings). Useful for understanding how different sports have adapted rating system principles to their specific contexts.

Advanced Methods and Extensions

  1. Herbrich, R., Minka, T., and Graepel, T. (2007). "TrueSkill: A Bayesian Skill Rating System." Advances in Neural Information Processing Systems, 19. Microsoft's TrueSkill system for Xbox matchmaking. Extends Glicko-style uncertainty tracking with factor graph inference, enabling multiplayer and team-based rating in a principled Bayesian framework. Important for understanding the frontier of rating system research.

  2. Coulom, R. (2008). "Whole-History Rating: A Bayesian Rating System for Players of Time-Varying Strength." Computers and Games, Lecture Notes in Computer Science, vol. 5131. An alternative to incremental rating that estimates the complete rating trajectory for each player simultaneously. Uses Bayesian inference to handle time-varying strength, missing games, and uncertain early ratings more elegantly than Elo or Glicko.

  3. Cattelan, M., Varin, C., and Firth, D. (2013). "Dynamic Bradley-Terry Modelling of Sports Tournaments." Journal of the Royal Statistical Society: Series C, 62(1), 135-150. Connects sports rating to the statistical theory of paired comparisons. Shows how Elo can be understood as an approximation to maximum likelihood estimation in a Bradley-Terry model, providing a rigorous statistical foundation.

Ensemble and Combination Methods

  1. Manner, H. (2016). "Modeling and Forecasting the Outcomes of NBA Basketball Games." Journal of Quantitative Analysis in Sports, 12(1), 31-41. Demonstrates ensemble approaches combining multiple prediction methods for NBA forecasting. Includes discussion of weight optimization and calibration techniques relevant to combining rating systems.

  2. Constantinou, A. C. and Fenton, N. E. (2012). "Solving the Problem of Inadequate Scoring Rules for Assessing Probabilistic Football Forecast Models." Journal of Quantitative Analysis in Sports, 8(1). Discusses proper scoring rules (log-loss, Brier score, ranked probability score) for evaluating probabilistic predictions in football. Essential reading for anyone who needs to choose the right metric for comparing rating systems.
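
The three scoring rules named in the Constantinou and Fenton entry can be sketched for a single three-way football forecast as below. This is a minimal illustration under stated assumptions: outcomes are ordered (home win, draw, away win), the function names are hypothetical, and the numbers are made up for the example.

```python
import math
from typing import Sequence


def brier_score(probs: Sequence[float], outcome: int) -> float:
    """Multi-category Brier score: squared error against the one-hot outcome."""
    return sum((p - (1.0 if i == outcome else 0.0)) ** 2
               for i, p in enumerate(probs))


def log_loss(probs: Sequence[float], outcome: int, eps: float = 1e-15) -> float:
    """Negative log of the probability assigned to what actually happened."""
    return -math.log(max(probs[outcome], eps))


def ranked_probability_score(probs: Sequence[float], outcome: int) -> float:
    """RPS for ordered outcomes: mean squared gap between cumulative
    forecast and cumulative observation over the first r - 1 categories."""
    cum_forecast = 0.0
    cum_observed = 0.0
    total = 0.0
    for i in range(len(probs) - 1):
        cum_forecast += probs[i]
        cum_observed += 1.0 if i == outcome else 0.0
        total += (cum_forecast - cum_observed) ** 2
    return total / (len(probs) - 1)


if __name__ == "__main__":
    # Illustrative forecasts for a match that ended in a home win (index 0).
    near = [0.5, 0.3, 0.2]  # remaining mass leans toward the draw
    far = [0.5, 0.2, 0.3]   # remaining mass leans toward the away win
    for name, f in (("near", near), ("far", far)):
        print(name, brier_score(f, 0), log_loss(f, 0),
              ranked_probability_score(f, 0))
```

In this example the Brier score and log-loss cannot tell the two forecasts apart, while the ranked probability score penalizes the one that puts more weight on the outcome further from the result. That sensitivity to ordering is the property at issue when choosing among these metrics.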

Network Science and Graph-Based Rankings

  1. Park, J. and Newman, M. E. J. (2005). "A Network-Based Ranking System for US College Football." Journal of Statistical Mechanics: Theory and Experiment, P10014. Applies network science methods (including PageRank variants) to college football ranking. Demonstrates that network-based approaches naturally handle unbalanced schedules and provide intuitive strength-of-schedule adjustments.

  2. Govan, A. Y., Langville, A. N., and Meyer, C. D. (2009). "Offense-Defense Approach to Ranking Team Sports." Journal of Quantitative Analysis in Sports, 5(1). Extends Massey-style ratings with separate offensive and defensive components using an iterative algorithm. Shows how decomposing team strength into offense and defense improves both prediction accuracy and interpretability.
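
As a rough illustration of the graph-based idea, the sketch below runs a generic PageRank-style power iteration on a results graph in which each loss casts a "vote" from the loser to the winner. It is not the specific method of either paper above; the function name, the damping value of 0.85, and the toy results are assumptions made for the example.

```python
def pagerank_teams(games, damping=0.85, tol=1e-10, max_iter=1000):
    """Rank teams from a list of (winner, loser) pairs by power iteration.

    Each game adds a link loser -> winner, so rank flows toward teams
    that beat highly ranked opponents.
    """
    teams = sorted({team for game in games for team in game})
    n = len(teams)
    out_links = {team: [] for team in teams}
    for winner, loser in games:
        out_links[loser].append(winner)

    rank = {team: 1.0 / n for team in teams}
    for _ in range(max_iter):
        # Undefeated teams have no outgoing links; spread their mass uniformly.
        dangling = sum(rank[t] for t in teams if not out_links[t])
        new_rank = {t: (1.0 - damping) / n + damping * dangling / n
                    for t in teams}
        for loser, winners in out_links.items():
            if winners:
                share = damping * rank[loser] / len(winners)
                for w in winners:
                    new_rank[w] += share
        change = sum(abs(new_rank[t] - rank[t]) for t in teams)
        rank = new_rank
        if change < tol:
            break
    return rank


if __name__ == "__main__":
    # Toy round robin: A beat B and C, B beat C.
    results = [("A", "B"), ("A", "C"), ("B", "C")]
    for team, score in sorted(pagerank_teams(results).items(),
                              key=lambda item: -item[1]):
        print(team, round(score, 4))
```

Because a win over a strong team passes along more rank than a win over a weak one, strength of schedule is built into the result without a separate adjustment.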

Practical Implementation and Software

  1. Lasek, J., Szlavik, Z., and Bhulai, S. (2013). "The Predictive Power of Ranking Systems in Association Football." International Journal of Applied Pattern Recognition, 1(1), 27-46. Comprehensive empirical comparison of Elo, Glicko, and several other rating systems applied to national-team football, benchmarked against the official FIFA ranking. Provides practical guidance on parameter selection and system comparison methodology.

  2. Kovalchik, S. A. (2016). "Searching for the GOAT of Tennis Win Prediction." Journal of Quantitative Analysis in Sports, 12(3), 127-138. Comparison of rating systems (Elo, Glicko, point-based) for tennis prediction. Demonstrates that sport-specific adaptations (surface adjustments, fatigue modeling) can significantly improve rating system accuracy beyond generic implementations.