Chapter 11 Quiz: Regularized Adjusted Plus-Minus (RAPM)
Instructions
This quiz contains 25 multiple-choice questions covering the key concepts from Chapter 11, organized by topic area. Select the best answer for each question. For quantitative questions, work through the requested calculation before choosing.
Section 1: Foundational Concepts (Questions 1-7)
Question 1
What is the fundamental problem with using raw plus-minus to evaluate individual player value?
A) It doesn't account for playing time B) It conflates individual ability with teammate and opponent quality C) It only measures offensive contribution D) It requires too much data to calculate
Answer: B
Explanation: Raw plus-minus measures team performance during a player's minutes, not individual contribution. A mediocre player on a great team will have inflated plus-minus, while an excellent player on a poor team will have deflated plus-minus.
Question 2
How many distinct five-player lineup combinations can a team form from a 15-player roster?
A) 75 B) 150 C) 3,003 D) 15,000
Answer: C
Explanation: The number of combinations is C(15,5) = 15!/(5!*10!) = 3,003.
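The count can be checked with Python's standard library. `math.comb` computes the binomial coefficient C(n, k) directly, and the factorial form from the explanation gives the same value.

```python
import math

# Distinct five-player lineups from a 15-player roster: C(15, 5)
roster_size = 15
lineup_size = 5
n_lineups = math.comb(roster_size, lineup_size)

# Same value via the factorial formula 15! / (5! * 10!)
n_lineups_factorial = math.factorial(15) // (math.factorial(5) * math.factorial(10))

print(n_lineups)  # 3003
```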
Question 3
What is collinearity in the context of RAPM, and why is it problematic?
A) Players having similar statistics, making them hard to rank B) Players appearing in correlated patterns, making individual effects hard to separate C) Players on the same team having correlated performance D) Games being played on consecutive days
Answer: B
Explanation: Collinearity occurs when players consistently appear together (e.g., starters with starters). When two players always appear together, their individual effects cannot be uniquely estimated because any split of their combined contribution fits the data equally well.
Question 4
If the standard error of raw plus-minus is SE = (2.5/sqrt(n)) * 100, where n is the number of possessions played, what is the approximate standard error for a player with 900 possessions?
A) 5.0 points per 100 possessions B) 8.3 points per 100 possessions C) 11.1 points per 100 possessions D) 2.8 points per 100 possessions
Answer: B
Explanation: SE = (2.5/sqrt(900)) * 100 = (2.5/30) * 100 = 8.33 points per 100 possessions.
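A small sketch of the question's formula. The function name and the `scale` parameter (the 2.5 constant from the question) are illustrative choices, not a library API.

```python
import math

def raw_pm_standard_error(possessions: int, scale: float = 2.5) -> float:
    """SE of raw plus-minus in points per 100 possessions,
    using the quiz formula SE = (scale / sqrt(n)) * 100."""
    return scale / math.sqrt(possessions) * 100

se_900 = raw_pm_standard_error(900)
se_3600 = raw_pm_standard_error(3600)
print(round(se_900, 2))  # 8.33
```

Because SE scales with 1/sqrt(n), quadrupling possessions to 3,600 only halves the error, to about 4.2 points per 100 possessions.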
Question 5
In the RAPM design matrix, how are players coded using the {-1, 0, +1} convention?
A) +1 for all offensive players, -1 for all defensive players B) +1 for players on the home team, -1 for players on the away team C) +1 for players scoring, -1 for players allowing points D) +1 for starters, -1 for bench players
Answer: B
Explanation: The standard convention codes home team players as +1 and away team players as -1. When the response is the home team's scoring margin, this ensures a positive coefficient indicates a positive contribution to the player's own team's margin, whichever side he is on.
Question 6
How many non-zero entries are in each row of the RAPM design matrix?
A) 5 B) 8 C) 10 D) It varies by stint
Answer: C
Explanation: Each stint has exactly 10 players on the court (5 per team), so each row has exactly 10 non-zero entries (5 coded as +1, 5 coded as -1).
Question 7
What constraint exists on the sum of entries in any row of the RAPM design matrix?
A) Sum equals the point differential B) Sum equals 10 C) Sum equals 0 D) No constraint exists
Answer: C
Explanation: With 5 players coded as +1 and 5 players coded as -1, the sum of any row is always 5 + (-5) = 0.
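A minimal sketch of how one stint becomes one row of the design matrix under the {-1, 0, +1} convention, illustrating Questions 5 through 7. The `stint_row` helper, the `player_index` mapping, and the 12-player league are hypothetical, chosen only for illustration.

```python
import numpy as np

def stint_row(home_ids, away_ids, player_index):
    """Encode one stint: +1 for the five home players,
    -1 for the five away players, 0 for everyone else."""
    row = np.zeros(len(player_index))
    for pid in home_ids:
        row[player_index[pid]] = 1.0
    for pid in away_ids:
        row[player_index[pid]] = -1.0
    return row

# Hypothetical 12-player league with IDs "p0" ... "p11"
player_index = {f"p{i}": i for i in range(12)}
home = ["p0", "p1", "p2", "p3", "p4"]
away = ["p5", "p6", "p7", "p8", "p9"]

row = stint_row(home, away, player_index)
print(np.count_nonzero(row))  # 10 non-zero entries (Question 6)
print(row.sum())              # row sums to zero (Question 7)
```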
Section 2: Ordinary Least Squares Problems (Questions 8-11)
Question 8
What does it mean for a matrix to be "ill-conditioned"?
A) The matrix has negative eigenvalues B) Small changes in input produce large changes in output C) The matrix cannot be computed D) The matrix is too large to store
Answer: B
Explanation: An ill-conditioned matrix has a high condition number, meaning small perturbations to the data cause large changes in the solution. This leads to unstable and unreliable estimates.
Question 9
If X'X has eigenvalues [500, 200, 50, 0.5], what is the condition number?
A) 100 B) 500 C) 750.5 D) 1,000
Answer: D
Explanation: Condition number = largest eigenvalue / smallest eigenvalue = 500 / 0.5 = 1,000.
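For a symmetric positive-definite matrix such as X'X, the 2-norm condition number equals the ratio of its largest to smallest eigenvalue. The sketch below computes it from the question's eigenvalues and cross-checks with `numpy.linalg.cond` on a diagonal stand-in matrix (an assumption: any matrix with these eigenvalues gives the same ratio).

```python
import numpy as np

eigenvalues = np.array([500.0, 200.0, 50.0, 0.5])

# Ratio of largest to smallest eigenvalue
cond_from_eigs = eigenvalues.max() / eigenvalues.min()

# Cross-check on a diagonal matrix that has exactly these eigenvalues
XtX = np.diag(eigenvalues)
cond_numpy = np.linalg.cond(XtX)

print(cond_from_eigs)  # 1000.0
```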
Question 10
Why does ordinary least squares often produce extreme coefficient estimates (e.g., +50 or -50) for basketball lineup data?
A) Players really do have extreme impacts B) The data contains errors C) Near-perfect collinearity inflates variance enormously D) The model is missing important variables
Answer: C
Explanation: Collinearity causes very small eigenvalues in X'X. When inverted, these become very large eigenvalues in (X'X)^(-1), which inflates the variance of coefficient estimates, producing extreme and unstable values.
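A simulated illustration of variance inflation, assuming a simplified design: 0/1 on-court indicators instead of the ±1 home/away coding, no intercept, and made-up stint counts. Since Var(β̂) = σ²(X'X)^(-1), the diagonal of (X'X)^(-1) shows how much each coefficient's variance is inflated.

```python
import numpy as np

rng = np.random.default_rng(0)
n_stints = 500

# Player A and player C each appear in a random half of the stints.
a = rng.choice([0.0, 1.0], size=n_stints)
c = rng.choice([0.0, 1.0], size=n_stints)

# Player B shares the floor with A in every stint except the first 5.
b = a.copy()
b[:5] = 1.0 - b[:5]

X_near = np.column_stack([a, b])    # A and B nearly collinear
X_indep = np.column_stack([a, c])   # A and C roughly independent

# Diagonal variance multipliers from (X'X)^{-1}
var_near = np.diag(np.linalg.inv(X_near.T @ X_near))
var_indep = np.diag(np.linalg.inv(X_indep.T @ X_indep))
print(var_near / var_indep)  # ratios far above 1 for the collinear pair
```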
Question 11
If two players always appear together in the data, what happens when you try to estimate their individual OLS coefficients?
A) Both coefficients equal zero B) Both coefficients are positive C) No unique solution exists D) The model fails to converge
Answer: C
Explanation: Perfect collinearity means the players' columns in X are linearly dependent, making X'X singular (not invertible). Any combination of coefficients that sums to their combined effect fits equally well.
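A short sketch of the perfectly collinear case, using simulated 0/1 on-court indicators (an illustrative simplification). Duplicated columns make X'X rank-deficient, and two different splits of the same combined effect produce identical fitted values.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.choice([0.0, 1.0], size=100)
X = np.column_stack([a, a])   # players A and B are always on the floor together
XtX = X.T @ X

rank = np.linalg.matrix_rank(XtX)
print(rank)  # 1, so X'X is singular and (X'X)^{-1} does not exist

# Two different splits of a combined effect of 4.0
beta_1 = np.array([3.0, 1.0])
beta_2 = np.array([0.0, 4.0])
same_fit = np.allclose(X @ beta_1, X @ beta_2)
print(same_fit)  # True
```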
Section 3: Ridge Regression (Questions 12-18)
Question 12
What does the ridge penalty term λ||β||² accomplish?
A) Removes outlier observations B) Shrinks coefficients toward zero C) Increases model complexity D) Eliminates collinearity entirely
Answer: B
Explanation: The ridge penalty adds a cost for large coefficient values, which shrinks estimates toward zero. This reduces variance at the cost of introducing bias.
Question 13
The ridge regression solution is β = (X'X + λI)^(-1)X'y. What is the effect of adding λI to X'X?
A) Reduces all eigenvalues by λ B) Increases all eigenvalues by λ C) Only affects the largest eigenvalue D) Makes the matrix sparse
Answer: B
Explanation: Adding λI increases all eigenvalues by λ. This ensures even the smallest eigenvalues are at least λ, which guarantees invertibility and reduces the condition number.
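A numeric check of the eigenvalue shift, reusing the eigenvalues from Question 9 with an illustrative λ = 10 (an assumed value). Every eigenvalue rises by λ, and the condition number falls from 1,000 to about 48.6, which is the kind of improvement Question 18 describes.

```python
import numpy as np

eigenvalues = np.array([500.0, 200.0, 50.0, 0.5])
XtX = np.diag(eigenvalues)   # stand-in for X'X with these eigenvalues
lam = 10.0                   # illustrative ridge parameter

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order
shifted = np.linalg.eigvalsh(XtX + lam * np.eye(4))

cond_before = eigenvalues.max() / eigenvalues.min()
cond_after = shifted.max() / shifted.min()
print(cond_before, cond_after)  # 1000.0 and roughly 48.6
```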
Question 14
In the Bayesian interpretation of ridge regression, what prior distribution is assumed for coefficients?
A) Uniform distribution B) Normal distribution centered at zero C) Exponential distribution D) No prior is assumed
Answer: B
Explanation: Ridge regression corresponds to placing a Normal(0, τ²) prior on coefficients, where τ² is the prior variance and σ² is the noise variance, so that λ = σ²/τ². This reflects the belief that player impacts are typically near average (zero).
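A sketch verifying the Bayesian equivalence on simulated data. The values σ² = 4.0 and τ² = 0.5 are assumptions for illustration. Under y ~ N(Xβ, σ²I) and β ~ N(0, τ²I), the posterior mean matches the ridge estimate with λ = σ²/τ².

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 6
X = rng.choice([-1.0, 0.0, 1.0], size=(n, p))
y = rng.normal(size=n)

sigma2 = 4.0   # assumed noise variance
tau2 = 0.5     # assumed prior variance of player effects
lam = sigma2 / tau2

# Ridge estimate with lambda = sigma^2 / tau^2
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Posterior mean: (X'X / sigma^2 + I / tau^2)^{-1} X'y / sigma^2
precision = X.T @ X / sigma2 + np.eye(p) / tau2
beta_post = np.linalg.solve(precision, X.T @ y / sigma2)

print(np.allclose(beta_ridge, beta_post))  # True
```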
Question 15
As the ridge parameter λ increases from 0 to infinity, what happens to all coefficient estimates?
A) They increase toward infinity B) They become more variable C) They shrink toward zero D) They converge to the OLS solution
Answer: C
Explanation: As λ → ∞, the penalty term dominates, and minimizing λ||β||² requires β → 0. All coefficients shrink toward zero.
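A simulated ridge path, assuming random {-1, 0, +1} rows and made-up effect sizes. λ = 0 reproduces OLS. Individual coefficients can move non-monotonically along the path, but the overall norm ||β|| shrinks steadily toward zero as λ grows.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.choice([-1.0, 0.0, 1.0], size=(300, 8))
true_beta = rng.normal(scale=2.0, size=8)
y = X @ true_beta + rng.normal(scale=5.0, size=300)

def ridge(X, y, lam):
    """Closed-form ridge solution (X'X + lam*I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

lambdas = [0.0, 10.0, 100.0, 1_000.0, 100_000.0]
norms = [np.linalg.norm(ridge(X, y, lam)) for lam in lambdas]
for lam, nrm in zip(lambdas, norms):
    print(f"lambda={lam:>9.0f}  ||beta||={nrm:.4f}")
```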
Question 16
What is the primary tradeoff in choosing the ridge parameter λ?
A) Speed vs. accuracy B) Bias vs. variance C) Offense vs. defense D) Individual vs. team effects
Answer: B
Explanation: Larger λ increases bias (shrinking true effects toward zero) but decreases variance (more stable estimates). The optimal λ minimizes total mean squared error by balancing these.
Question 17
Cross-validation for selecting λ involves:
A) Testing different lineup combinations B) Training on subset of data and evaluating prediction error on held-out data C) Comparing RAPM to other metrics D) Adjusting for opponent strength
Answer: B
Explanation: Cross-validation divides data into folds, trains the model on some folds, evaluates prediction error on held-out folds, and selects the λ that minimizes average test error.
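A minimal k-fold cross-validation sketch on simulated data. The helper names, the candidate grid, and k = 5 are assumptions. A production RAPM fit would typically also weight stints by possessions in both fitting and scoring, which this unweighted version omits.

```python
import numpy as np

def ridge_fit(X, y, lam):
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def cv_select_lambda(X, y, lambdas, k=5, seed=0):
    """Return the lambda with the lowest mean held-out MSE over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    mean_errors = []
    for lam in lambdas:
        errors = []
        for i in range(k):
            test = folds[i]
            train = np.concatenate([folds[j] for j in range(k) if j != i])
            beta = ridge_fit(X[train], y[train], lam)
            errors.append(np.mean((y[test] - X[test] @ beta) ** 2))
        mean_errors.append(np.mean(errors))
    best = lambdas[int(np.argmin(mean_errors))]
    return best, mean_errors

# Simulated stints: 400 rows, 20 players, noisy margins
rng = np.random.default_rng(4)
X = rng.choice([-1.0, 0.0, 1.0], size=(400, 20))
y = X @ rng.normal(scale=1.0, size=20) + rng.normal(scale=8.0, size=400)

grid = [0.1, 1.0, 10.0, 100.0, 1000.0]
best_lam, errs = cv_select_lambda(X, y, grid)
print(best_lam)
```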
Question 18
If adding λI changes the condition number from 10,000 to 50, what does this imply?
A) The model is now worse B) The model is now more stable C) More data is needed D) The λ is too small
Answer: B
Explanation: A lower condition number means the matrix is better conditioned, producing more stable estimates. The reduction from 10,000 to 50 indicates substantially improved stability.
Section 4: Model Implementation (Questions 19-22)
Question 19
Why should stints be weighted by possessions in RAPM estimation?
A) Longer stints have more players B) Longer stints provide more information and should influence estimates more C) Weighting normalizes for team pace D) All stints should be weighted equally
Answer: B
Explanation: A stint with 20 possessions contains more information than a stint with 2 possessions. Weighting by possessions ensures longer, more informative stints have appropriate influence on estimates.
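A sketch of possession-weighted ridge, assuming the estimator minimizes Σ wᵢ(yᵢ − xᵢβ)² + λ||β||² with wᵢ equal to stint possessions, which gives β = (X'WX + λI)^(-1)X'Wy. The data are simulated. The check shows why weighting is sensible: an integer weight of w behaves exactly like repeating that stint w times.

```python
import numpy as np

def weighted_ridge(X, y, weights, lam):
    """Solve (X'WX + lam*I) beta = X'Wy with W = diag(weights)."""
    p = X.shape[1]
    Xw = X * weights[:, None]   # row i scaled by weights[i]
    return np.linalg.solve(X.T @ Xw + lam * np.eye(p), Xw.T @ y)

rng = np.random.default_rng(5)
X = rng.choice([-1.0, 0.0, 1.0], size=(50, 4))
y = rng.normal(scale=20.0, size=50)                  # simulated margin per 100 possessions
poss = rng.integers(1, 15, size=50).astype(float)    # simulated stint possessions

beta_w = weighted_ridge(X, y, poss, lam=5.0)

# Repeat each stint poss[i] times and fit with unit weights
reps = poss.astype(int)
X_rep = np.repeat(X, reps, axis=0)
y_rep = np.repeat(y, reps)
beta_rep = weighted_ridge(X_rep, y_rep, np.ones(len(y_rep)), lam=5.0)
print(np.allclose(beta_w, beta_rep))  # True
```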
Question 20
The formula for estimating possessions is approximately:
Poss ≈ FGA + 0.44*FTA + TOV - ORB
Why is the free throw coefficient 0.44 rather than 1.0?
A) Free throws are less valuable B) Most free throws come in pairs or triplets from a single possession C) Free throw percentage is typically 44% D) Historical convention
Answer: B
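Explanation: A player shooting two free throws uses only one possession, not two. The 0.44 coefficient is an empirical average that accounts for the mix of one-shot, two-shot, and three-shot trips to the line, including and-one free throws (which follow a field goal attempt already counted) and technical free throws, neither of which uses a new possession.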
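The possession estimate as a small function. The box-score numbers are hypothetical, chosen only to show the arithmetic.

```python
def estimate_possessions(fga: float, fta: float, tov: float, orb: float) -> float:
    """Approximate possessions: FGA + 0.44*FTA + TOV - ORB."""
    return fga + 0.44 * fta + tov - orb

# Hypothetical team box score
poss = estimate_possessions(fga=88, fta=25, tov=14, orb=10)
print(round(poss, 1))  # 103.0
```

Here 25 free throw attempts count as about 11 possessions rather than 25.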
Question 21
When fitting separate O-RAPM and D-RAPM models, what sign convention makes D-RAPM intuitive?
A) Positive D-RAPM means allowing more points (bad defense) B) Positive D-RAPM means allowing fewer points (good defense) C) D-RAPM should always be zero D) D-RAPM uses absolute values
Answer: B
Explanation: The raw defensive model predicts points allowed, where positive coefficients mean more points allowed (bad). Negating these values makes positive D-RAPM indicate good defense.
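A tiny illustration of the sign flip. The three coefficients are hypothetical outputs of a defensive model whose response is points allowed per 100 possessions.

```python
import numpy as np

# Hypothetical raw defensive coefficients: positive = more points allowed (bad)
raw_defense_coef = np.array([1.8, -0.5, -2.3])

# Negate so that positive D-RAPM means fewer points allowed (good defense)
d_rapm = -raw_defense_coef
best_defender = int(np.argmax(d_rapm))
print(d_rapm)
print(best_defender)  # 2, the player who allowed the fewest points
```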
Question 22
In a multi-year RAPM model using decay weights [1.0, 0.8, 0.6] for current year back to two years ago, what is the total weight for a stint from two years ago with 10 possessions?
A) 10 B) 6 C) 8 D) 0.6
Answer: B
Explanation: Total weight = possessions × year weight = 10 × 0.6 = 6.
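The weighting rule from the question as a small helper. The `decay` mapping and `stint_weight` name are illustrative.

```python
# Season decay weights: current year, one year ago, two years ago
decay = {0: 1.0, 1: 0.8, 2: 0.6}

def stint_weight(possessions: float, years_ago: int) -> float:
    """Regression weight = possessions x season decay weight."""
    return possessions * decay[years_ago]

print(stint_weight(10, 2))  # 6.0
```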
Section 5: Interpretation and Application (Questions 23-25)
Question 23
A player has RAPM = +3.5 (points per 100 possessions) and played 2,000 minutes. Assuming 200 minutes per 100 possessions (2.0 minutes per possession) and 2.7 points per win, approximately how many wins did this player add?
A) 6.5 wins B) 10.0 wins C) 13.0 wins D) 35.0 wins
Answer: C
Explanation: 2,000 minutes / 200 minutes per 100 possessions = 10 units of 100 possessions. Points added = 3.5 × 10 = 35 points. Wins = 35/2.7 ≈ 13.0 wins.
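The conversion as a function, using the constants the worked answer applies (2,000 minutes / 200 = 10 units of 100 possessions, and 2.7 points per win). The function and parameter names are illustrative.

```python
def wins_added(rapm: float, minutes: float,
               minutes_per_100_poss: float = 200.0,
               points_per_win: float = 2.7) -> float:
    """Convert a per-100-possession RAPM into wins with the quiz's constants."""
    hundreds_of_possessions = minutes / minutes_per_100_poss
    points_added = rapm * hundreds_of_possessions
    return points_added / points_per_win

print(round(wins_added(3.5, 2000), 1))  # 13.0
```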
Question 24
Player A has RAPM = +4.0 with standard error 2.0 (95% CI: +0.0 to +8.0). Player B has RAPM = +3.0 with standard error 1.0 (95% CI: +1.0 to +5.0). Which statement is most accurate?
A) Player A is definitely better than Player B B) Player B is definitely better than Player A C) The players cannot be statistically distinguished with confidence D) Both players are above average with certainty
Answer: C
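Explanation: Player A's confidence interval (+0.0 to +8.0) substantially overlaps Player B's (+1.0 to +5.0), and the estimated difference of 1.0 point per 100 possessions is small relative to its uncertainty. We cannot confidently conclude that one player is better than the other. Option D also fails because Player A's interval extends down to zero.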
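Overlapping intervals are a rough guide. A more direct check tests the difference itself. The intervals in the question are estimate ± 2 standard errors, so the standard errors are 2.0 and 1.0. The sketch below assumes the two estimates are independent, which is only an approximation: RAPM estimates share one design matrix and are generally correlated. Under that assumption the difference is 1.0 with a standard error of about 2.24 (z ≈ 0.45).

```python
import math

def diff_z(est_a, se_a, est_b, se_b):
    """Difference, its SE, and z-statistic, assuming independent estimates."""
    diff = est_a - est_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return diff, se_diff, diff / se_diff

diff, se_diff, z = diff_z(4.0, 2.0, 3.0, 1.0)
lo, hi = diff - 1.96 * se_diff, diff + 1.96 * se_diff
print(f"diff={diff:.1f}, SE={se_diff:.2f}, z={z:.2f}, 95% CI=({lo:.1f}, {hi:.1f})")
```

The interval for the difference comfortably contains zero, consistent with answer C.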
Question 25
What is a key limitation of RAPM that tracking data and box scores can help address?
A) RAPM cannot measure defensive impact B) RAPM cannot explain why a player is valuable C) RAPM requires too much data D) RAPM overvalues role players
Answer: B
Explanation: RAPM provides a total impact estimate but cannot decompose it into components (shooting, defense, playmaking, etc.). Box scores and tracking data provide this explanatory detail, which is why hybrid metrics combine both approaches.
Scoring Guide
- 23-25 correct: Excellent - Mastery of RAPM concepts
- 19-22 correct: Good - Strong understanding with minor gaps
- 15-18 correct: Satisfactory - Adequate understanding, review recommended
- 11-14 correct: Needs Improvement - Significant review required
- 0-10 correct: Unsatisfactory - Complete chapter review necessary