Chapter 9 Quiz: Scoring Rules and Proper Incentives

Test your understanding of scoring rules, properness, and their connection to prediction markets. Each question has one best answer. Try to answer each question before revealing the solution.


Question 1

What does a scoring rule take as input?

  • (A) A probability forecast and a confidence level
  • (B) A probability forecast and the actual outcome
  • (C) Two probability forecasts from different forecasters
  • (D) A probability forecast and the base rate
Answer **(B) A probability forecast and the actual outcome.** A scoring rule is a function $S(p, y)$ that takes the forecaster's stated probability $p$ and the realized outcome $y$ and produces a numerical score.

Question 2

What is the Brier score for a forecast of $p = 0.8$ when the event does NOT occur ($y = 0$)?

  • (A) 0.04
  • (B) 0.16
  • (C) 0.36
  • (D) 0.64
Answer **(D) 0.64.** $\text{BS} = (p - y)^2 = (0.8 - 0)^2 = 0.64$
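
As a quick check of the arithmetic, here is a minimal sketch of the binary Brier score. The function name `brier_score` is illustrative and not taken from the chapter.

```python
def brier_score(p: float, y: int) -> float:
    """Squared error between forecast probability p and binary outcome y (0 or 1)."""
    return (p - y) ** 2


# The forecast from this question: p = 0.8 and the event does not occur.
print(round(brier_score(0.8, 0), 2))  # 0.64
# The same forecast when the event does occur.
print(round(brier_score(0.8, 1), 2))  # 0.04
```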

Question 3

For the Brier score, lower values indicate:

  • (A) Worse forecasts
  • (B) Better forecasts
  • (C) More uncertain events
  • (D) Higher calibration error
Answer **(B) Better forecasts.** The Brier score ranges from 0 (perfect) to 1 (worst). Lower is better.

Question 4

What is the log score for a forecast of $p = 0.5$ when the event occurs ($y = 1$)?

  • (A) $-0.301$
  • (B) $-0.500$
  • (C) $-0.693$
  • (D) $-1.000$
Answer **(C) $-0.693$.** $\text{LS} = \ln(0.5) = -\ln(2) \approx -0.693$

Question 5

What happens to the log score as the forecast probability approaches 0 for an event that actually occurs?

  • (A) The score approaches 0
  • (B) The score approaches $-1$
  • (C) The score approaches $-\infty$ (negative infinity)
  • (D) The score approaches 1
Answer **(C) The score approaches $-\infty$.** $\text{LS} = \ln(p) \to -\infty$ as $p \to 0$. The log score imposes an infinite penalty for assigning near-zero probability to an event that actually occurs.
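
The behavior near zero can be seen numerically. The sketch below assumes the natural-log convention used in this quiz; the name `log_score` and the choice to return `-math.inf` at exactly zero probability are illustrative.

```python
import math


def log_score(p: float, y: int) -> float:
    """Natural log of the probability assigned to the realized binary outcome."""
    prob_of_outcome = p if y == 1 else 1.0 - p
    if prob_of_outcome == 0.0:
        return -math.inf  # zero probability was placed on what actually happened
    return math.log(prob_of_outcome)


# The score falls without bound as p shrinks toward 0 for an event that occurs.
for p in (0.5, 0.1, 0.01, 1e-6):
    print(p, round(log_score(p, 1), 3))
```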

Question 6

A scoring rule is called "proper" if:

  • (A) It always assigns positive scores to correct predictions
  • (B) The forecaster's expected score is maximized by reporting their true belief
  • (C) It is bounded between 0 and 1
  • (D) It penalizes overconfidence more than underconfidence
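Answer **(B) The forecaster's expected score is maximized by reporting their true belief.** A proper scoring rule satisfies $\mathbb{E}_q[S(q, y)] \geq \mathbb{E}_q[S(p, y)]$ for every report $p$, where $q$ is the forecaster's true belief and the expectation is taken over outcomes $y$ that occur with probability $q$. Honest reporting is therefore optimal.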
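
Properness can be checked numerically with a grid search. This is a sketch under stated assumptions: the Brier score is negated so that higher is better, and the helper names (`expected_score`, `neg_brier`, `best_report`) and the grid resolution are illustrative.

```python
def expected_score(score, p: float, q: float) -> float:
    """Expected score of reporting p when the event occurs with probability q."""
    return q * score(p, 1) + (1.0 - q) * score(p, 0)


def neg_brier(p: float, y: int) -> float:
    """Brier score negated so that higher is better."""
    return -((p - y) ** 2)


def best_report(score, q: float, steps: int = 100) -> float:
    """Report on an evenly spaced grid that maximizes the expected score."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda p: expected_score(score, p, q))


print(best_report(neg_brier, 0.6))  # 0.6
```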

Question 7

What is the difference between a "proper" and a "strictly proper" scoring rule?

  • (A) Strictly proper rules are bounded; proper rules are not
  • (B) Strictly proper rules have a unique optimum at the true belief; proper rules may have other optima
  • (C) Strictly proper rules work for multiple outcomes; proper rules only work for binary events
  • (D) There is no difference; the terms are synonymous
Answer **(B) Strictly proper rules have a unique optimum at the true belief; proper rules may have other optima.** A proper rule means truthful reporting is *at least* as good as any other report. A strictly proper rule means truthful reporting is *strictly better* than any other report -- there is no alternative report that achieves the same expected score.

Question 8

The linear scoring rule $S(p, y) = py + (1-p)(1-y)$ is:

  • (A) Strictly proper
  • (B) Proper but not strictly proper
  • (C) Improper
  • (D) Proper only for binary events
Answer **(C) Improper.** Under the linear rule, a forecaster whose true belief is $q > 0.5$ should report $p = 1$, and a forecaster with $q < 0.5$ should report $p = 0$. The optimal strategy is to always report extreme values, not the true belief.
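
The same kind of grid search shows why the linear rule fails. In this sketch (function names and grid are illustrative), the expected score is $(1 - q) + p(2q - 1)$, which is linear in $p$, so the maximum always sits at an endpoint.

```python
def linear_score(p: float, y: int) -> float:
    """Linear rule from the question: S(p, y) = p*y + (1 - p)*(1 - y)."""
    return p * y + (1 - p) * (1 - y)


def best_linear_report(q: float, steps: int = 100) -> float:
    """Grid report that maximizes the expected linear score under true belief q."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda p: q * linear_score(p, 1) + (1 - q) * linear_score(p, 0))


print(best_linear_report(0.6))  # 1.0
print(best_linear_report(0.3))  # 0.0
```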

Question 9

Which of the following is a correct statement about the Brier score decomposition?

  • (A) BS = Resolution - Calibration + Uncertainty
  • (B) BS = Calibration + Resolution + Uncertainty
  • (C) BS = Calibration - Resolution + Uncertainty
  • (D) BS = Calibration + Resolution - Uncertainty
Answer **(C) BS = Calibration - Resolution + Uncertainty.** A good forecaster has a low calibration term (forecast probabilities close to the observed frequencies) and high resolution (able to distinguish likely from unlikely events). Uncertainty depends only on the base rate and cannot be controlled by the forecaster.
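
The identity can be verified on a small data set. The sketch below assumes forecasts are grouped by identical forecast value, in which case the decomposition is exact. The function name and the eight sample forecasts are made up for illustration.

```python
from collections import defaultdict


def brier_decomposition(forecasts, outcomes):
    """Return (calibration, resolution, uncertainty), binning by distinct forecast value."""
    n = len(forecasts)
    base_rate = sum(outcomes) / n
    bins = defaultdict(list)
    for p, y in zip(forecasts, outcomes):
        bins[p].append(y)
    calibration = sum(len(ys) * (p - sum(ys) / len(ys)) ** 2 for p, ys in bins.items()) / n
    resolution = sum(len(ys) * (sum(ys) / len(ys) - base_rate) ** 2 for ys in bins.values()) / n
    uncertainty = base_rate * (1 - base_rate)
    return calibration, resolution, uncertainty


# Illustrative data: four forecasts of 0.2 and four forecasts of 0.8.
forecasts = [0.2, 0.2, 0.2, 0.2, 0.8, 0.8, 0.8, 0.8]
outcomes = [0, 0, 0, 1, 1, 1, 1, 0]

cal, res, unc = brier_decomposition(forecasts, outcomes)
mean_brier = sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / len(forecasts)
print(round(mean_brier, 4), round(cal - res + unc, 4))  # 0.19 0.19
```

A constant forecaster would put every forecast in one bin, so the resolution term would be exactly zero.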

Question 10

In the Brier decomposition, what does "resolution" measure?

  • (A) How close the forecaster's probabilities are to the observed frequencies
  • (B) How much the forecaster's probabilities vary across different situations
  • (C) How inherently difficult the forecasting problem is
  • (D) How quickly the forecaster updates their predictions
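Answer **(B) How much the forecaster's probabilities vary across different situations.** A forecaster who always predicts 50% has zero resolution. Formally, resolution is $\frac{1}{N}\sum_k n_k(\bar{y}_k - \bar{y})^2$, where $\bar{y}_k$ is the observed frequency in forecast bin $k$ and $\bar{y}$ is the overall base rate. Spreading forecasts out is necessary for high resolution, but it only helps when the different forecast levels actually correspond to different observed frequencies, which indicates the forecaster can distinguish likely events from unlikely ones.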

Question 11

Which scoring rule has a direct connection to KL divergence and information theory?

  • (A) Brier score
  • (B) Logarithmic score
  • (C) Spherical score
  • (D) Ranked Probability Score
Answer **(B) Logarithmic score.** The expected log score difference between truthful and non-truthful reporting equals the KL divergence, which is a fundamental quantity in information theory.
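
For a binary event this identity is easy to confirm numerically. In the sketch below, `q` is the true belief and `p` the report; the function names and the example values 0.7 and 0.5 are illustrative.

```python
import math


def expected_log_score(p: float, q: float) -> float:
    """Expected log score of reporting p when the event occurs with probability q."""
    return q * math.log(p) + (1 - q) * math.log(1 - p)


def kl_divergence(q: float, p: float) -> float:
    """KL divergence D(q || p) between two Bernoulli distributions, in nats."""
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))


q, p = 0.7, 0.5
gap = expected_log_score(q, q) - expected_log_score(p, q)
print(abs(gap - kl_divergence(q, p)) < 1e-12)  # True
```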

Question 12

The spherical scoring rule is best described as measuring the:

  • (A) Euclidean distance between the forecast and the outcome vector
  • (B) Cosine of the angle between the forecast vector and the outcome direction
  • (C) Entropy of the forecast distribution
  • (D) Absolute deviation of the forecast from the outcome
Answer **(B) Cosine of the angle between the forecast vector and the outcome direction.** The spherical score normalizes the probability vector to the unit sphere and measures its component in the direction of the realized outcome, which is equivalent to the cosine of the angle between them.
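
A small sketch makes the geometric reading concrete. It assumes the common form of the spherical score, $p_j / \lVert \mathbf{p} \rVert_2$ for realized outcome $j$; the function names and the three-outcome forecast are illustrative.

```python
import math


def spherical_score(probs, outcome_index: int) -> float:
    """Probability of the realized outcome divided by the Euclidean norm of the forecast."""
    norm = math.sqrt(sum(p * p for p in probs))
    return probs[outcome_index] / norm


def cosine_to_outcome(probs, outcome_index: int) -> float:
    """Cosine of the angle between the forecast vector and the one-hot outcome vector."""
    one_hot = [1.0 if i == outcome_index else 0.0 for i in range(len(probs))]
    dot = sum(p * e for p, e in zip(probs, one_hot))
    norm_p = math.sqrt(sum(p * p for p in probs))
    norm_e = math.sqrt(sum(e * e for e in one_hot))
    return dot / (norm_p * norm_e)


forecast = [0.6, 0.3, 0.1]
print(abs(spherical_score(forecast, 0) - cosine_to_outcome(forecast, 0)) < 1e-12)  # True
```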

Question 13

Robin Hanson's key insight connecting scoring rules to market makers is:

  • (A) Market prices should equal Brier scores
  • (B) Each trader's payoff can be the change in score from their probability update
  • (C) Only the logarithmic score can be used in markets
  • (D) Scoring rules replace the need for market makers
Answer **(B) Each trader's payoff can be the change in score from their probability update.** Hanson showed that if trader $k$ moves the market probability from $p_{k-1}$ to $p_k$, their payoff should be $S(p_k, y) - S(p_{k-1}, y)$. Because the scoring rule is proper, each trader is incentivized to move the price to their true belief.
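
The sequential construction can be sketched in a few lines. The example uses the log score and an invented price path; `msr_payoffs` is an illustrative name. Note that the payoffs telescope, so the sponsor's total payout depends only on the first and last prices.

```python
import math


def log_score(p: float, y: int) -> float:
    """Natural log of the probability assigned to the realized binary outcome."""
    return math.log(p if y == 1 else 1 - p)


def msr_payoffs(prices, y: int, score):
    """Payoff to each trader: S(p_k, y) - S(p_{k-1}, y).

    prices[0] is the initial market probability; each later entry is the
    probability after one trader's update.
    """
    return [score(prices[k], y) - score(prices[k - 1], y) for k in range(1, len(prices))]


prices = [0.5, 0.6, 0.8, 0.7]  # hypothetical price path
payoffs = msr_payoffs(prices, 1, log_score)
total = sum(payoffs)
print(abs(total - (log_score(0.7, 1) - log_score(0.5, 1))) < 1e-12)  # True
```

With the event occurring, the trader who moved the price down from 0.8 to 0.7 receives a negative payoff, while those who moved it up are paid.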

Question 14

The LMSR (Logarithmic Market Scoring Rule) is generated from which scoring rule?

  • (A) Brier score
  • (B) Spherical score
  • (C) Logarithmic score
  • (D) Ranked Probability Score
Answer **(C) Logarithmic score.** The LMSR is precisely the logarithmic scoring rule turned into a market maker using Hanson's sequential scoring construction. The LMSR's cost function $C = b\ln(\sum_i e^{q_i/b})$ arises from the log score. Here $q_i$ is the number of outstanding shares of outcome $i$ (not a belief) and $b$ is the liquidity parameter.
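
A minimal sketch of the cost function and the prices it implies follows. The function names, the two-outcome market, and the value `b = 100` are assumptions for illustration; the max-shift is a standard log-sum-exp trick for numerical stability and does not change the result.

```python
import math


def lmsr_cost(shares, b: float) -> float:
    """C(q) = b * ln(sum_i exp(q_i / b)), computed with a max-shift for stability."""
    m = max(shares)
    return m + b * math.log(sum(math.exp((q - m) / b) for q in shares))


def lmsr_prices(shares, b: float):
    """Instantaneous prices: exp(q_i / b) normalized to sum to 1."""
    m = max(shares)
    weights = [math.exp((q - m) / b) for q in shares]
    total = sum(weights)
    return [w / total for w in weights]


b = 100.0
before = [0.0, 0.0]
after = [10.0, 0.0]  # a trader buys 10 shares of outcome 0
trade_cost = lmsr_cost(after, b) - lmsr_cost(before, b)
print(lmsr_prices(before, b))  # [0.5, 0.5]
print(round(trade_cost, 2), [round(p, 3) for p in lmsr_prices(after, b)])
```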

Question 15

A forecaster who always predicts $p = 0.5$ for every event:

  • (A) Has perfect calibration and zero resolution
  • (B) Has zero calibration and perfect resolution
  • (C) Always achieves a Brier score of 0.25
  • (D) Both (A) and (C)
Answer **(D) Both (A) and (C).** A constant $p = 0.5$ forecaster has exactly zero resolution because the forecasts never vary, and their Brier score is always $(0.5 - y)^2 = 0.25$ regardless of the outcome. Calibration is perfect only when the base rate is 0.5; otherwise the calibration term equals $(0.5 - \bar{y})^2$, where $\bar{y}$ is the base rate, so (A) holds exactly only under that assumption.

Question 16

Which of the following is NOT a property of the Brier score?

  • (A) It is strictly proper
  • (B) It is bounded between 0 and 1
  • (C) It can be decomposed into calibration, resolution, and uncertainty
  • (D) It is a "local" scoring rule
Answer **(D) It is a "local" scoring rule.** The Brier score is not local: it depends on the entire probability vector, not just the probability assigned to the realized outcome. With three or more outcomes, the logarithmic score is the only strictly proper rule that is local (up to affine transformations): $\text{LS}(\mathbf{p}, j) = \ln(p_j)$ depends only on $p_j$.

Question 17

For binary events, the log score of a perfect forecast ($p = 1$ when $y = 1$) is:

  • (A) 1
  • (B) 0
  • (C) $-1$
  • (D) $+\infty$
Answer **(B) 0.** $\text{LS} = \ln(1) = 0$. The log score ranges from $-\infty$ to 0, with 0 being the best possible score.

Question 18

Why do platforms like Metaculus often clamp probabilities to a range like [0.01, 0.99]?

  • (A) To make the Brier score easier to compute
  • (B) To prevent infinite log scores from extreme forecasts
  • (C) To ensure all scoring rules give the same ranking
  • (D) To discourage participation from overconfident forecasters
Answer **(B) To prevent infinite log scores from extreme forecasts.** The log score goes to $-\infty$ when the forecast assigns probability 0 to the realized outcome. Clamping probabilities prevents a single catastrophically wrong forecast from resulting in an infinite penalty.
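
The effect of clamping can be sketched as follows. The bounds 0.01 and 0.99 come from the question; the function names are illustrative, and this is not a description of any platform's actual implementation.

```python
import math


def clamp(p: float, lo: float = 0.01, hi: float = 0.99) -> float:
    """Restrict a probability to the interval [lo, hi]."""
    return min(max(p, lo), hi)


def clamped_log_score(p: float, y: int, lo: float = 0.01, hi: float = 0.99) -> float:
    """Log score computed after clamping the forecast, so the penalty stays finite."""
    p = clamp(p, lo, hi)
    return math.log(p if y == 1 else 1 - p)


# A forecast of 0 for an event that occurs is scored as if it were 0.01.
print(round(clamped_log_score(0.0, 1), 3))  # -4.605
```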

Question 19

If you believe an event has a 60% chance of occurring, and you are being scored with a strictly proper scoring rule, what should you report?

  • (A) 50%, to be safe
  • (B) 60%, your true belief
  • (C) 65%, to account for model uncertainty
  • (D) It depends on which scoring rule is being used
Answer **(B) 60%, your true belief.** The defining property of a strictly proper scoring rule is that your expected score is uniquely maximized by reporting your true belief. This holds regardless of which strictly proper rule is used.

Question 20

The "calibration" component of the Brier decomposition measures:

  • (A) How close forecasted probabilities are to 0 or 1
  • (B) The difference between forecasted probabilities and observed frequencies
  • (C) How often the forecaster is correct
  • (D) The variance of the forecaster's predictions
Answer **(B) The difference between forecasted probabilities and observed frequencies.** Calibration (reliability) measures whether, among all times a forecaster says "70%," the event happens roughly 70% of the time. It is computed as $\frac{1}{N}\sum_k n_k(\bar{p}_k - \bar{y}_k)^2$.

Question 21

In a multi-question tournament, why might a risk-averse forecaster choose to hedge (report less extreme probabilities)?

  • (A) Because hedging improves their expected score
  • (B) Because hedging reduces the variance of their total score at the cost of a slightly lower expected score
  • (C) Because proper scoring rules do not work for multiple questions
  • (D) Because hedging is always the optimal strategy
Answer **(B) Because hedging reduces the variance of their total score at the cost of a slightly lower expected score.** With a strictly proper scoring rule, any report other than the true belief lowers the expected score, so hedging always costs something on average. However, it also reduces variance. A risk-averse forecaster who cares about their worst-case outcome (rather than their average outcome) might accept a small reduction in expected score for more predictable total performance.
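
The trade-off can be illustrated on a single question. The sketch assumes a negated Brier score (higher is better), a true belief of 0.8, and a hedged report of 0.7; all names and numbers are illustrative.

```python
def score_mean_and_variance(p: float, q: float):
    """Mean and variance of the negated Brier score when reporting p under true belief q."""
    score_if_yes = -((p - 1) ** 2)  # score when the event occurs
    score_if_no = -(p ** 2)         # score when it does not
    mean = q * score_if_yes + (1 - q) * score_if_no
    variance = q * (score_if_yes - mean) ** 2 + (1 - q) * (score_if_no - mean) ** 2
    return mean, variance


truthful = score_mean_and_variance(0.8, 0.8)
hedged = score_mean_and_variance(0.7, 0.8)
print([round(v, 4) for v in truthful])  # [-0.16, 0.0576]
print([round(v, 4) for v in hedged])    # [-0.17, 0.0256]
```

The hedged report gives up 0.01 of expected score and cuts the variance by more than half.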

Question 22

The Continuous Ranked Probability Score (CRPS) generalizes which scoring rule to continuous outcomes?

  • (A) Logarithmic score
  • (B) Spherical score
  • (C) Brier score (via the absolute error)
  • (D) Ranked Probability Score
Answer **(C) Brier score (via the absolute error).** The CRPS is the integral of the Brier score over all possible thresholds. It also reduces to the absolute error when the forecast is a point prediction, and to the Brier score for binary events.
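
Both reductions can be checked with a sample-based estimate. This sketch uses the standard identity $\text{CRPS}(F, y) = \mathbb{E}|X - y| - \tfrac{1}{2}\mathbb{E}|X - X'|$ applied to an equally weighted ensemble; the function name and the example ensembles are illustrative.

```python
def crps_ensemble(samples, y: float) -> float:
    """CRPS of an equally weighted ensemble forecast against outcome y."""
    n = len(samples)
    mean_abs_error = sum(abs(x - y) for x in samples) / n
    mean_pairwise = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return mean_abs_error - 0.5 * mean_pairwise


# A point forecast: the CRPS is the absolute error.
print(crps_ensemble([3.0], 5.0))  # 2.0

# A binary forecast with p = 0.8 written as an ensemble of eight 1s and two 0s;
# when the event does not occur the CRPS matches the Brier score of 0.64.
binary_ensemble = [1.0] * 8 + [0.0] * 2
print(round(crps_ensemble(binary_ensemble, 0.0), 4))  # 0.64
```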

Question 23

Which of the following correctly characterizes the sensitivity of the log score?

  • (A) Constant across all probability levels
  • (B) Highest near $p = 0.5$
  • (C) Highest near $p = 0$ and $p = 1$
  • (D) Lowest near $p = 0$ and $p = 1$
Answer **(C) Highest near $p = 0$ and $p = 1$.** The sensitivity of the log score, measured by the curvature of the expected score at the true belief $q$, is $\frac{1}{q(1-q)}$, which goes to infinity as $q$ approaches 0 or 1. This means the log score provides the strongest incentives for getting extreme probabilities right.

Question 24

Every strictly proper scoring rule can be written as $S(p, y) = G(p) + G'(p)(y-p)$ where $G$ is:

  • (A) Any differentiable function
  • (B) A strictly concave function
  • (C) A strictly convex function
  • (D) A linear function
Answer **(C) A strictly convex function.** The characterization theorem (due to Savage and Schervish) states that the space of strictly proper scoring rules corresponds exactly to the space of strictly convex functions $G$, where $G(p)$ is the expected score of a truthful forecaster with belief $p$. With scores oriented so that higher is better, the Brier score $-(p - y)^2$ corresponds to $G(p) = -p(1-p)$ and the log score corresponds to the negative entropy $G(p) = p\ln p + (1-p)\ln(1-p)$.
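
The representation can be tested by building scores directly from a convex $G$ and its derivative. This sketch assumes scores oriented so that higher is better, which gives $G(p) = p^2 - p$ for the negated Brier score and $G(p) = p\ln p + (1-p)\ln(1-p)$ for the log score. The helper name `score_from_G` is illustrative.

```python
import math


def score_from_G(G, dG):
    """Build S(p, y) = G(p) + G'(p) * (y - p) from a convex G and its derivative dG."""
    def S(p: float, y: int) -> float:
        return G(p) + dG(p) * (y - p)
    return S


# Negated Brier score: G(p) = p^2 - p, G'(p) = 2p - 1.
neg_brier = score_from_G(lambda p: p * p - p, lambda p: 2 * p - 1)

# Log score: G(p) is the negative entropy, G'(p) = ln(p) - ln(1 - p).
log_score = score_from_G(
    lambda p: p * math.log(p) + (1 - p) * math.log(1 - p),
    lambda p: math.log(p) - math.log(1 - p),
)

print(round(neg_brier(0.8, 0), 4))  # -0.64
print(round(log_score(0.5, 1), 3))  # -0.693
```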

Question 25

A company wants to incentivize accurate forecasting among its employees. Which of the following reward structures preserves the properness of the underlying scoring rule?

  • (A) The top 10% of forecasters receive a bonus
  • (B) Each forecaster's reward is a linear function of their score: $\text{Reward} = a + b \times \text{Score}$
  • (C) Forecasters are penalized only if their score falls below a threshold
  • (D) Forecasters' rewards depend on their relative ranking among peers
Answer **(B) Each forecaster's reward is a linear function of their score.** An affine transformation of a proper scoring rule preserves properness as long as $b$ has the sign that makes better scores earn larger rewards ($b > 0$ when higher scores are better). The forecaster's optimal strategy remains truthful reporting. Options (A), (C), and (D) introduce nonlinear transformations or relative comparisons that can create incentives for strategic behavior such as hedging or extremizing.
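
The contrast between a linear reward and a threshold reward can be sketched with a grid search. Everything here is an illustrative assumption: the negated Brier score, the constants `a = 100` and `b = 50`, the cutoff of `-0.1`, and the single-question setting.

```python
def neg_brier(p: float, y: int) -> float:
    """Brier score negated so that higher is better."""
    return -((p - y) ** 2)


def best_report(reward, q: float, steps: int = 100) -> float:
    """Grid report that maximizes the expected reward under true belief q."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda p: q * reward(p, 1) + (1 - q) * reward(p, 0))


def affine_reward(p: float, y: int, a: float = 100.0, b: float = 50.0) -> float:
    """Reward = a + b * Score with b > 0."""
    return a + b * neg_brier(p, y)


def threshold_reward(p: float, y: int, cutoff: float = -0.1) -> float:
    """Hypothetical scheme: full reward only if the score clears a cutoff."""
    return 1.0 if neg_brier(p, y) >= cutoff else 0.0


print(best_report(affine_reward, 0.6))           # 0.6
print(best_report(threshold_reward, 0.6) > 0.6)  # True
```

Under the threshold scheme the forecaster does best by overstating a 60% belief far enough to clear the cutoff whenever the event occurs, which is the kind of extremizing the answer describes.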