Chapter 26 Quiz: Ratings and Ranking Systems
Instructions: Answer all 25 questions. Each question is worth 4 points (100 total). Select the best answer for multiple-choice questions and provide the requested work for short-answer questions. Click the answer toggle to check your work.
Question 1. In the Elo rating system, the expected score formula is $E_A = 1/(1 + 10^{(R_B - R_A)/400})$. If Team A has a rating of 1600 and Team B has a rating of 1400, what is Team A's expected score?
(a) 0.640 (b) 0.760 (c) 0.849 (d) 0.909
Answer
**(b) 0.760** $E_A = 1/(1 + 10^{(1400 - 1600)/400}) = 1/(1 + 10^{-0.5}) = 1/(1 + 0.3162) = 1/1.3162 \approx 0.760$. A 200-point Elo advantage corresponds to approximately a 76% expected win rate.

Question 2. The K-factor in the Elo system controls:
(a) The initial rating assigned to new teams (b) The home-field advantage adjustment (c) How much a single game result changes the ratings (d) The scaling between Elo points and win probability
Answer
**(c) How much a single game result changes the ratings** The K-factor is the update factor that multiplies the difference between the actual and expected score. A higher K-factor means each game has a larger impact on ratings, making the system more responsive but less stable.

Question 3. A team with an Elo rating of 1500 plays at home with a home-field advantage of $H = 48$ against a visiting team rated 1500. What is the home team's expected win probability?
(a) 0.500 (b) 0.528 (c) 0.548 (d) 0.569
Answer
**(d) 0.569** $E_{\text{home}} = 1/(1 + 10^{(1500 - (1500 + 48))/400}) = 1/(1 + 10^{-48/400}) = 1/(1 + 10^{-0.12}) = 1/(1 + 0.7586) = 1/1.7586 \approx 0.569$

Question 4. Why does the Elo margin-of-victory multiplier include a logarithmic function $\ln(|MOV| + 1)$ rather than using the raw margin directly?
(a) The logarithm ensures margins are always positive (b) The logarithm prevents blowout victories from distorting ratings excessively (c) The logarithm converts margins to the probability scale (d) The logarithm is required for the zero-sum property to hold
Answer
**(b) The logarithm prevents blowout victories from distorting ratings excessively** A team that wins by 40 points instead of 20 points conveys some additional information, but not twice as much. The logarithmic scaling provides diminishing returns for larger margins, preventing a single blowout from dominating a team's rating trajectory.

Question 5. At the start of a new season, an Elo system applies regression toward the mean with $\alpha = 1/3$ and a league mean of 1500. A team ending the previous season at 1680 would start the new season at:
(a) 1500 (b) 1560 (c) 1620 (d) 1680
Answer
**(c) 1620** $R_{\text{new}} = 1680 \times (1 - 1/3) + 1500 \times (1/3) = 1680 \times \tfrac{2}{3} + 1500 \times \tfrac{1}{3} = 1120 + 500 = 1620$. The team retains two-thirds of its rating advantage over the mean.

Question 6. Which statement about the Glicko-2 system is FALSE?
(a) Each team carries a rating, a rating deviation, and a volatility parameter (b) The rating deviation increases when a team is inactive (c) Glicko-2 always produces narrower confidence intervals than Elo (d) The volatility parameter captures how erratically a team performs
Answer
**(c) Glicko-2 always produces narrower confidence intervals than Elo** This is false because Elo does not produce confidence intervals at all. Glicko-2 provides confidence intervals via the rating deviation, but these are not necessarily "narrower" than anything Elo produces --- Elo simply has no uncertainty quantification. Options (a), (b), and (d) are all correct descriptions of Glicko-2 features.

Question 7. In Glicko-2, the g-function $g(\phi) = 1/\sqrt{1 + 3\phi^2/\pi^2}$ approaches what value as the opponent's rating deviation $\phi$ becomes very large?
(a) 0 (b) 0.5 (c) 1 (d) $\pi$
Answer
**(a) 0** As $\phi \to \infty$, the denominator $\sqrt{1 + 3\phi^2/\pi^2} \to \infty$, so $g(\phi) \to 0$. This means games against opponents with very uncertain ratings carry almost no weight in the update, which is the intended behavior.

Question 8. The Glicko-2 system constant $\tau$ (tau) controls:
(a) The initial rating for new teams (b) The rate at which rating deviation increases during inactivity (c) How much the volatility parameter can change between rating periods (d) The convergence tolerance of the Illinois algorithm
Answer
**(c) How much the volatility parameter can change between rating periods** The $\tau$ parameter constrains volatility changes. Smaller values of $\tau$ (e.g., 0.3) make the volatility more stable, while larger values (e.g., 1.2) allow it to respond more quickly to changes in a team's consistency.

Question 9. In the Massey rating system, the Massey matrix $\mathbf{M} = \mathbf{X}^T\mathbf{X}$ has what structure?
(a) Diagonal entries equal the team's win total; off-diagonal entries equal zero (b) Diagonal entries equal games played; off-diagonal entries equal the negative number of games between those teams (c) All entries equal 1 (d) Diagonal entries equal point differential; off-diagonal entries equal head-to-head margin
Answer
**(b) Diagonal entries equal games played; off-diagonal entries equal the negative number of games between those teams** For team $i$, $M_{ii}$ equals the total number of games played by team $i$. For teams $i$ and $j$, $M_{ij}$ equals the negative of the number of games played between them. This structure arises naturally from the design matrix where each game has +1 for one team and -1 for the other.

Question 10. Why must the Massey matrix be modified (e.g., replacing the last row with all ones) before solving?
(a) To speed up computation (b) To incorporate home-field advantage (c) Because the matrix is singular (its rows sum to zero) and has no unique solution without a constraint (d) To ensure all ratings are positive
Answer
**(c) Because the matrix is singular (its rows sum to zero) and has no unique solution without a constraint** The Massey matrix is singular because only rating differences are determined by the game results. Replacing the last row with ones (and setting the corresponding right-hand side entry to zero) imposes the constraint that ratings sum to zero, which yields a unique solution.

Question 11. Massey ratings are in what units?
(a) Win probability (0 to 1) (b) Elo points (typically 1000-2000) (c) Point differential (expected margin of victory) (d) PageRank scores (summing to 1)
Answer
**(c) Point differential (expected margin of victory)** Massey ratings directly represent expected point differential. If Team A has a rating of +5 and Team B has a rating of -3, then Team A is expected to beat Team B by $5 - (-3) = 8$ points. This makes Massey ratings particularly useful for spread betting.

Question 12. In the PageRank algorithm applied to sports, edges in the directed graph point from:
(a) Winners to losers, weighted by margin of victory (b) Losers to winners, weighted by margin of victory (c) Every team to every other team, with equal weight (d) Home teams to away teams, regardless of outcome
Answer
**(b) Losers to winners, weighted by margin of victory** In the sports PageRank formulation, a loss creates a directed edge from the loser to the winner. This represents the loser "voting" for the winner's strength. The analogy is to web pages, where a link from page A to page B is treated as A endorsing B.

Question 13. The damping factor $d$ in PageRank (typically 0.85) serves what purpose in the sports ranking context?
(a) It sets the home-field advantage (b) It prevents undefeated or isolated teams from dominating the rankings by giving all teams a baseline level of competitiveness (c) It controls the convergence speed of the algorithm (d) It weights recent games more heavily than early-season games
Answer
**(b) It prevents undefeated or isolated teams from dominating the rankings by giving all teams a baseline level of competitiveness** With probability $d$, a team's strength derives from who it has beaten. With probability $1 - d$, all teams share a uniform baseline. This prevents pathological rankings when the competition graph is sparse or disconnected.

Question 14. PageRank convergence is computed using the power method. Which convergence criterion is most commonly used?
(a) When all rankings change by less than a fixed percentage (b) When the L1 norm of the difference between consecutive iteration vectors falls below a tolerance (c) After exactly 100 iterations (d) When the top-ranked team does not change between iterations
Answer
**(b) When the L1 norm of the difference between consecutive iteration vectors falls below a tolerance** The standard convergence criterion is $\|\mathbf{r}^{(k+1)} - \mathbf{r}^{(k)}\|_1 < \epsilon$, where $\epsilon$ is a small tolerance (e.g., $10^{-8}$). Convergence is guaranteed and typically occurs within 20-50 iterations.

Question 15. Which rating system provides the most explicit treatment of strength of schedule?
(a) Elo (b) Glicko-2 (c) Massey (d) PageRank
Answer
**(d) PageRank** PageRank's recursive definition of team strength --- a team is strong if it beats strong teams --- provides the most explicit modeling of strength of schedule. A win against a highly-ranked team contributes more to your ranking than a win against a weak team, and this effect propagates through the entire network.

Question 16. You are building a rating system for the NFL. Which combination of features would you prioritize?
(a) High K-factor, no home advantage, no margin of victory (b) Moderate K-factor (20-30), home advantage (~48 points), log-scaled margin of victory, 1/3 season regression (c) Low K-factor (5), large home advantage (100 points), raw margin of victory (d) Variable K-factor based on score margin, no season regression, PageRank damping of 0.50
Answer
**(b) Moderate K-factor (20-30), home advantage (~48 points), log-scaled margin of victory, 1/3 season regression** The NFL has a short 17-game season with high single-game variance, justifying a moderate K-factor (responsive enough to track form changes but stable enough to avoid overreacting to fluky results). The ~48-point home advantage matches the observed ~57% home win rate. Log-scaled MOV prevents blowouts from distorting ratings, and 1/3 regression accounts for significant offseason roster turnover.

Question 17. When combining multiple rating systems, why does a simple average often outperform individual systems?
(a) Averaging eliminates systematic bias from all systems (b) Errors from different systems tend to be uncorrelated, so averaging reduces overall variance (c) The average is always the optimal combination (d) Averaging ensures perfect calibration
Answer
**(b) Errors from different systems tend to be uncorrelated, so averaging reduces overall variance** When systems make different types of errors (because they encode different assumptions), averaging cancels out much of the noise while preserving the shared signal. This is the fundamental insight behind ensemble methods: diverse models with uncorrelated errors produce a better combined prediction than any individual model.

Question 18. Which calibration metric would you use to assess whether your ensemble's probability predictions are accurate?
(a) R-squared (b) Expected Calibration Error (ECE) (c) Pearson correlation coefficient (d) Mean absolute error
Answer
**(b) Expected Calibration Error (ECE)** ECE measures how closely predicted probabilities match observed frequencies. A well-calibrated model has low ECE, meaning that when it predicts a 70% win probability, the team actually wins about 70% of the time. R-squared and correlation measure association but not calibration quality.

Question 19. A Glicko-2 prediction shows Team A vs Team B: P(A wins) = 0.63, but both teams have high RDs (180 and 200). What should a bettor conclude?
(a) Team A is likely to win and the bet is reliable (b) The probability estimate is unreliable due to high uncertainty in both ratings, and the bettor should consider passing (c) The bettor should increase their wager because high RDs indicate value (d) The prediction is exactly as reliable as any other Glicko-2 prediction
Answer
**(b) The probability estimate is unreliable due to high uncertainty in both ratings, and the bettor should consider passing** High RDs mean the true ratings could be substantially different from the point estimates. The 0.63 probability is based on uncertain ratings, so the true probability could be anywhere from much lower to much higher. A disciplined bettor uses RD to gauge confidence and avoids wagering when uncertainty is high.

Question 20. To convert an Elo win probability to American moneyline odds when $p > 0.5$, the formula is:
(a) $\text{ML} = +100 \times p/(1-p)$ (b) $\text{ML} = -100 \times p/(1-p)$ (c) $\text{ML} = +100 \times (1-p)/p$ (d) $\text{ML} = -100/(p-1)$
Answer
**(b) $\text{ML} = -100 \times p/(1-p)$** For favorites ($p > 0.5$), American odds are negative. The formula $\text{ML} = -100p/(1-p)$ converts the probability to the amount you must wager to win \$100. For example, $p = 0.70$ gives $\text{ML} = -100 \times 0.70/0.30 \approx -233$.

Question 21. The Massey system estimates home-field advantage by:
(a) Using a fixed constant calibrated from historical data (b) Adding a column to the design matrix with +1 for home and -1 for away, then solving via least squares (c) Computing the average margin of victory for home teams (d) Subtracting away ratings from home ratings
Answer
**(b) Adding a column to the design matrix with +1 for home and -1 for away, then solving via least squares** The home-advantage column is treated as an additional unknown in the linear system. The least-squares solution simultaneously estimates team ratings and home advantage from the data, rather than imposing a fixed value.

Question 22. Which of the following is NOT a benefit of model stacking for combining rating systems?
(a) The meta-model can learn nonlinear interactions between base system predictions (b) Stacking always outperforms the best base model regardless of data size (c) The meta-model can learn which base system is most reliable in different contexts (d) Cross-validation prevents the meta-model from overfitting to base model training predictions
Answer
**(b) Stacking always outperforms the best base model regardless of data size** Stacking can overfit when the meta-model has insufficient data to learn the optimal combination. With very small datasets, a simple average may outperform stacking because the meta-learner does not have enough examples to generalize. Options (a), (c), and (d) are genuine benefits of the stacking approach.

Question 23. A team has played 50 games in an Elo system with K=20. What is the approximate maximum rating change this team could have experienced from a single game?
(a) 10 points (b) 20 points (c) 40 points (d) It depends on the margin of victory and opponent rating
Answer
**(d) It depends on the margin of victory and opponent rating** Without margin-of-victory adjustment, the maximum change is $K \times (1 - E)$, which approaches $K = 20$ when a heavy underdog wins (where $E \approx 0$). With MOV adjustment, the effective K can be multiplied significantly. The actual maximum depends on the specific game circumstances --- opponent strength, actual outcome, and margin.

Question 24. Weighted Massey ratings use temporal decay $w_k = \lambda^{T-t_k}$. If $\lambda = 0.97$ and a game was played 10 weeks ago ($T - t_k = 10$), its weight is approximately:
(a) 0.74 (b) 0.86 (c) 0.93 (d) 0.97
Answer
**(a) 0.74** $w = 0.97^{10} = 0.7374 \approx 0.74$. The game from 10 weeks ago receives about 74% of the weight of the most recent game. This temporal decay causes the model to emphasize recent performance over early-season results.

Question 25. You run an Elo system, a Massey system, and a PageRank system on the same season of data. The three systems produce substantially different rankings for a mid-tier team. What is the most productive interpretation?
(a) At least two of the three systems must contain bugs (b) The differences reflect the different assumptions each system encodes about what constitutes team strength, and the disagreement itself is informative (c) Only the system with the highest log-loss is correct (d) The team's true strength cannot be determined from game results
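As a final self-check, the arithmetic behind Questions 1, 3, 5, 20, and 24 can be reproduced with a short Python sketch. The helper names (`elo_expected`, `regress`, `moneyline`) are ad hoc for this check, not functions defined in the chapter:

```python
def elo_expected(r_a, r_b):
    """Q1/Q3: Elo expected score for the first team."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def regress(rating, mean=1500, alpha=1 / 3):
    """Q5: preseason regression toward the league mean."""
    return rating * (1 - alpha) + mean * alpha

def moneyline(p):
    """Q20: convert a win probability to American moneyline odds."""
    return -100 * p / (1 - p) if p > 0.5 else 100 * (1 - p) / p

print(round(elo_expected(1600, 1400), 3))       # Q1:  0.76
print(round(elo_expected(1500 + 48, 1500), 3))  # Q3:  0.569
print(round(regress(1680)))                     # Q5:  1620
print(round(moneyline(0.70)))                   # Q20: -233
print(round(0.97 ** 10, 2))                     # Q24: 0.74
```

If any of your hand calculations disagree with these outputs, recheck the exponent sign in the Elo formula and the direction of the regression weight first; those are the two most common slips.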