Chapter 22 Quiz
Question 1
In a linear regression model predicting election vote margin, the coefficient on the polling average is 0.85. What is the correct interpretation?
A) A 1-point increase in polling average causes a 0.85-point increase in vote margin
B) A 1-point increase in polling average is associated with a 0.85-point increase in the predicted vote margin, holding other predictors constant
C) 85% of the variation in vote margin is explained by the polling average
D) The polling average is 85% correlated with the vote margin
Answer: B. The coefficient represents the expected change in the dependent variable for a one-unit change in the predictor, holding all other predictors constant. It does not imply causation (ruling out A), does not represent variance explained (ruling out C), and is not a correlation coefficient (ruling out D).
Question 2
Why is linear regression inappropriate for modeling binary prediction market outcomes (win/loss)?
A) Linear regression is too computationally expensive for binary data
B) Linear regression cannot handle categorical predictor variables
C) Linear regression can produce predicted values outside [0, 1], violating the probability interpretation
D) Linear regression requires at least 1000 data points for binary outcomes
Answer: C. The fundamental problem with applying linear regression to binary outcomes is that the predicted values can be less than 0 or greater than 1, making them uninterpretable as probabilities. Logistic regression solves this by using the sigmoid function to constrain outputs to (0, 1).
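The boundedness problem is easy to demonstrate with a toy least-squares fit on binary outcomes (the numbers here are illustrative only):

```python
import numpy as np

# Binary outcomes (win = 1, loss = 0) against a single predictor.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# An ordinary least-squares line is unbounded...
slope, intercept = np.polyfit(x, y, 1)
linear_pred = slope * 10.0 + intercept  # extrapolate to x = 10

# ...while squashing the same linear predictor through the sigmoid keeps
# it in (0, 1). (Illustration of the bound only; a real logistic
# regression fits its own coefficients by maximum likelihood.)
logistic_pred = 1.0 / (1.0 + np.exp(-(slope * 10.0 + intercept)))

print(linear_pred)    # greater than 1: not a valid probability
print(logistic_pred)  # strictly inside (0, 1)
```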
Question 3
In logistic regression, if the coefficient for "incumbency" is 1.2, what is the odds ratio?
A) 1.2
B) 2.4
C) 3.32
D) 0.30
Answer: C. The odds ratio is $e^{\beta} = e^{1.2} \approx 3.32$. This means being an incumbent multiplies the odds of winning by approximately 3.32, compared to not being an incumbent.
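The conversion is a one-liner to check:

```python
import math

beta = 1.2                   # logistic coefficient on incumbency
odds_ratio = math.exp(beta)  # odds ratio = e^beta
print(round(odds_ratio, 2))  # 3.32
```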
Question 4
A logistic regression model for a prediction market produces $\hat{p} = 0.73$ for an event. The current market price is 0.65. Given a transaction cost of 0.02 per trade, what is the appropriate action?
A) Buy, because the edge (0.08) exceeds the transaction cost
B) Sell, because the market is overpriced
C) Hold, because the edge may not be statistically significant
D) Buy, but only if the Kelly criterion suggests a positive position
Answer: A. The model's probability (0.73) exceeds the market price (0.65) by 0.08, which exceeds the transaction cost of 0.02. This suggests the contract is underpriced and buying is appropriate. While C and D raise valid practical considerations, the question asks for the appropriate action given the stated framework.
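The decision rule in this question can be sketched as a small function (the name and signature are illustrative, not from the chapter):

```python
def trade_signal(model_prob, market_price, cost):
    """Return 'buy', 'sell', or 'hold' based on the edge vs. transaction cost."""
    edge = model_prob - market_price
    if edge > cost:
        return "buy"
    if edge < -cost:
        return "sell"
    return "hold"

print(trade_signal(0.73, 0.65, 0.02))  # edge 0.08 > cost 0.02, so 'buy'
```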
Question 5
What does the sigmoid function $\sigma(z) = \frac{1}{1 + e^{-z}}$ equal when $z = 0$?
A) 0
B) 0.25
C) 0.5
D) 1
Answer: C. When $z = 0$: $\sigma(0) = \frac{1}{1 + e^0} = \frac{1}{1 + 1} = 0.5$. This is a fundamental property of the sigmoid function: when the linear predictor equals zero, the predicted probability is exactly 50%.
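A quick check of this property, and of the (0, 1) bounds:

```python
import math

def sigmoid(z):
    """The logistic (sigmoid) function: 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0.0))  # 0.5
```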
Question 6
Which regularization method performs automatic feature selection by setting some coefficients exactly to zero?
A) Ridge (L2)
B) Lasso (L1)
C) Elastic Net
D) Both B and C
Answer: D. Lasso (L1 regularization) can shrink coefficients exactly to zero, effectively removing features from the model. Elastic Net combines L1 and L2 penalties, and its L1 component gives it the same feature selection capability. Ridge (L2) shrinks coefficients toward zero but never sets them exactly to zero.
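The mechanism can be seen in the soft-thresholding update that coordinate-descent lasso applies to each coefficient, contrasted with the closed-form ridge shrinkage for a single standardized feature (a simplified sketch, not a full solver):

```python
def soft_threshold(beta, lam):
    """L1 proximal step used by coordinate-descent lasso:
    shrinks a coefficient and sets small ones exactly to zero."""
    if beta > lam:
        return beta - lam
    if beta < -lam:
        return beta + lam
    return 0.0

def ridge_shrink(beta, lam):
    """Closed-form L2 shrinkage for one standardized feature:
    scales toward zero but never reaches it for beta != 0."""
    return beta / (1.0 + lam)

print(soft_threshold(0.3, 0.5))  # 0.0 -- the feature is dropped
print(ridge_shrink(0.3, 0.5))    # positive but smaller -- shrunk, not dropped
```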
Question 7
Before applying regularization to a logistic regression model, features should be:
A) Logarithmically transformed
B) Standardized (zero mean, unit variance)
C) Converted to binary indicators
D) Sorted by importance
Answer: B. Regularization penalizes the magnitude of coefficients. If features are on different scales (e.g., polling percentage 0-100 vs. GDP growth 0-5), the penalty's effect depends on those arbitrary units rather than on the features' actual importance. Standardizing puts all features on a common scale so the penalty term treats them equally.
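A minimal standardization sketch; note that in a walk-forward setting the training-set mean and standard deviation should be reused on later data rather than recomputed:

```python
import numpy as np

# Features on very different scales, e.g. polling % and GDP growth.
X = np.array([[52.0, 1.1],
              [48.0, 2.3],
              [50.0, 0.4],
              [55.0, 3.0]])

mu = X.mean(axis=0)     # per-feature mean from the training data
sigma = X.std(axis=0)   # per-feature standard deviation
X_std = (X - mu) / sigma

print(X_std.mean(axis=0))  # approximately 0 in each column
print(X_std.std(axis=0))   # 1 in each column
```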
Question 8
A time series is stationary if:
A) It always increases over time
B) Its mean, variance, and autocovariance structure do not change over time
C) It has no autocorrelation at any lag
D) It can be perfectly predicted from its past values
Answer: B. Weak (covariance) stationarity requires a constant mean, constant variance, and an autocovariance structure that depends only on the lag, not on the time index. A stationary series can still have autocorrelation (ruling out C) and is not perfectly predictable (ruling out D).
Question 9
The Augmented Dickey-Fuller (ADF) test has a null hypothesis that:
A) The series is stationary
B) The series has a unit root (is non-stationary)
C) The series has no autocorrelation
D) The series residuals are normally distributed
Answer: B. The ADF test's null hypothesis is that the series has a unit root (is non-stationary). A low p-value (typically < 0.05) leads to rejecting the null and concluding the series is stationary. Note the direction: here rejecting the null is the desired outcome when stationarity is required, unlike tests such as normality checks, where failing to reject is typically the hoped-for result.
Question 10
If the ACF of a stationary time series cuts off after lag 2 and the PACF decays gradually, the appropriate model is:
A) AR(2)
B) MA(2)
C) ARMA(2, 2)
D) ARIMA(0, 1, 2)
Answer: B. The ACF cutting off after lag $q$ while the PACF tails off is the signature of an MA($q$) process. Here $q = 2$, so MA(2) is the appropriate model. An AR process would show the opposite pattern (PACF cuts off, ACF tails off).
Question 11
First differencing of a prediction market price series $P_t$ produces:
A) $P_t - P_{t-2}$
B) $P_t / P_{t-1}$
C) $P_t - P_{t-1}$
D) $\log(P_t) - \log(P_{t-1})$
Answer: C. First differencing computes the change between consecutive values: $\Delta P_t = P_t - P_{t-1}$. This is the simplest form of differencing and is often sufficient to induce stationarity in trending series. Option D describes log-returns, another common transformation, but not "first differencing."
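Both transformations side by side on a short illustrative price series:

```python
import numpy as np

prices = np.array([0.50, 0.53, 0.51, 0.58, 0.60])

diffs = np.diff(prices)                # P_t - P_{t-1}, option C
log_returns = np.diff(np.log(prices))  # log(P_t) - log(P_{t-1}), option D

# Differencing consumes one observation: the result is one element shorter.
print(diffs)
print(log_returns)
```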
Question 12
In an ARIMA(1, 1, 0) model for prediction market prices, the model is:
A) An AR(1) model applied to the raw price levels
B) An AR(1) model applied to the first differences of prices
C) A MA(1) model applied to the first differences of prices
D) An integrated model that requires two rounds of differencing
Answer: B. In ARIMA($p$, $d$, $q$) notation, ARIMA(1, 1, 0) means $p = 1$ (AR order 1), $d = 1$ (one round of differencing), and $q = 0$ (no MA terms). So it is an AR(1) model applied to the first-differenced series.
Question 13
A GARCH(1,1) model is useful for prediction markets primarily because:
A) It predicts the direction of price movements
B) It captures time-varying volatility, useful for position sizing and risk management
C) It eliminates all autocorrelation in the price series
D) It produces the best long-term price forecasts
Answer: B. GARCH models capture volatility clustering: the tendency for volatile periods to follow volatile periods. For prediction market traders, knowing the current volatility regime is essential for position sizing and risk management. GARCH does not predict price direction (A), does not address autocorrelation in the mean (C), and is not specifically a long-term forecasting tool (D).
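A minimal sketch of the GARCH(1,1) variance recursion. The parameter values here are illustrative, not fitted; in practice omega, alpha, and beta are estimated by maximum likelihood (e.g., with a dedicated library):

```python
import numpy as np

def garch_variance(returns, omega=1e-5, alpha=0.1, beta=0.85):
    """GARCH(1,1) conditional-variance recursion:
    sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1]."""
    sigma2 = np.empty(len(returns))
    sigma2[0] = np.var(returns)  # initialize at the sample variance
    for t in range(1, len(returns)):
        sigma2[t] = omega + alpha * returns[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

rng = np.random.default_rng(0)
r = rng.normal(0.0, 0.02, size=500)  # synthetic daily price changes
sig2 = garch_variance(r)
vol = np.sqrt(sig2[-1])  # current conditional volatility, usable for sizing
print(vol)
```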
Question 14
Why does standard k-fold cross-validation fail for time series data?
A) It is too computationally expensive for time series
B) It randomly mixes past and future data, creating lookahead bias
C) It requires the data to be normally distributed
D) It cannot handle autocorrelated residuals
Answer: B. Standard k-fold randomly assigns observations to folds, so the training data can contain observations from after the test period. This lookahead bias lets the model use future information when predicting the past, producing unrealistically optimistic performance estimates. Walk-forward validation preserves temporal ordering.
Question 15
In walk-forward validation with an expanding window, how does the training set change at each step?
A) It stays the same size but shifts forward
B) It grows by adding new observations while keeping all previous ones
C) It randomly resamples from the available data
D) It shrinks to focus on the most recent data
Answer: B. An expanding window starts from the beginning of the data and grows with each step. At step $t$, the training set includes all observations from the beginning through time $t$. This contrasts with a rolling window (A), which maintains a fixed size by dropping old observations as new ones arrive.
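An expanding-window splitter can be sketched in a few lines (the function name is illustrative):

```python
def expanding_window_splits(n, initial_train, test_size):
    """Yield (train_indices, test_indices) pairs where the training window
    expands and every test fold lies strictly after its training data."""
    start = initial_train
    while start + test_size <= n:
        yield list(range(0, start)), list(range(start, start + test_size))
        start += test_size

for train, test in expanding_window_splits(n=10, initial_train=4, test_size=2):
    print(len(train), test)
# Training sets grow (4, then 6, then 8) while each test fold follows them.
```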
Question 16
A model achieves log-loss of 0.68 on training data and 0.85 on walk-forward validation data. This suggests:
A) The model is well-calibrated
B) The model is underfitting
C) The model is overfitting
D) The model's predictions are too conservative
Answer: C. A large gap between training performance and validation performance (0.68 vs. 0.85; lower is better for log-loss) is a classic sign of overfitting. The model has learned patterns in the training data that do not generalize to new data. Remedies include regularization, feature reduction, or more training data.
Question 17
The AIC (Akaike Information Criterion) for model selection:
A) Always selects the simplest model
B) Always selects the model with the best fit
C) Balances model fit against model complexity, penalizing extra parameters
D) Is only valid for linear regression models
Answer: C. AIC $= -2\ell + 2k$, where $\ell$ is the log-likelihood and $k$ is the number of parameters. It penalizes complexity (more parameters increase the second term) while rewarding fit (better fit decreases the first term). It selects neither the simplest nor the best-fitting model outright, but seeks a balance.
Question 18
Compared to AIC, BIC (Bayesian Information Criterion):
A) Applies a weaker penalty for additional parameters
B) Applies a stronger penalty for additional parameters when sample size exceeds 7
C) Does not account for sample size
D) Is always preferred over AIC for prediction tasks
Answer: B. BIC's penalty is $k \ln(n)$ versus AIC's $2k$. Since $\ln(n) > 2$ when $n > e^2 \approx 7.39$, BIC penalizes extra parameters more heavily for any reasonable sample size, which makes it favor simpler models. For prediction tasks, AIC is often preferred because it targets predictive accuracy rather than identification of the true model.
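The two criteria side by side, with illustrative numbers:

```python
import math

def aic(log_likelihood, k):
    """AIC = -2*loglik + 2k."""
    return -2.0 * log_likelihood + 2.0 * k

def bic(log_likelihood, k, n):
    """BIC = -2*loglik + k*ln(n)."""
    return -2.0 * log_likelihood + k * math.log(n)

# Same fit (loglik = -120) with k = 5 parameters on n = 100 observations:
# ln(100) ~ 4.6 > 2, so BIC's penalty per parameter exceeds AIC's.
print(aic(-120.0, 5))                  # 250.0
print(round(bic(-120.0, 5, 100), 1))   # 263.0
```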
Question 19
The Brier score for a set of probability predictions is:
A) The average absolute difference between predicted and actual outcomes
B) The average squared difference between predicted probabilities and actual outcomes (0 or 1)
C) The log-likelihood of the predictions
D) The area under the ROC curve
Answer: B. The Brier score is $\frac{1}{n}\sum_{i=1}^n (\hat{p}_i - y_i)^2$, the mean squared difference between predicted probabilities and actual binary outcomes. It ranges from 0 (perfect) to 1 (worst). Like log-loss, it is a proper scoring rule that rewards calibrated probability estimates.
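A direct implementation of the formula:

```python
import numpy as np

def brier_score(p_hat, y):
    """Mean squared difference between predicted probabilities
    and realized binary outcomes (0 or 1)."""
    p_hat = np.asarray(p_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean((p_hat - y) ** 2))

print(round(brier_score([0.9, 0.8, 0.3], [1, 1, 0]), 3))  # 0.047
```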
Question 20
A calibration plot shows that when the model predicts probability 0.7, events occur approximately 70% of the time. This means:
A) The model has high accuracy
B) The model is well-calibrated
C) The model has high discrimination power
D) The model is overfitting
Answer: B. A model is well-calibrated when its predicted probabilities match observed frequencies. If 70% of events predicted at probability 0.7 actually occur, the model's confidence is well-matched to reality. Calibration is distinct from accuracy (overall correctness) and from discrimination (the ability to separate positive and negative cases).
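Calibration can be checked by binning predictions and comparing each bin's mean predicted probability with its observed event frequency. A sketch on synthetic, well-calibrated data:

```python
import numpy as np

def calibration_table(p_hat, y, n_bins=10):
    """Group predictions into probability bins and compare each bin's
    mean predicted probability with its observed event frequency."""
    p_hat = np.asarray(p_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    bins = np.clip((p_hat * n_bins).astype(int), 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            rows.append((p_hat[mask].mean(), y[mask].mean(), int(mask.sum())))
    return rows

# Synthetic well-calibrated forecasts: each outcome occurs with probability p.
rng = np.random.default_rng(1)
p = rng.uniform(0.0, 1.0, size=5000)
outcomes = rng.uniform(0.0, 1.0, size=5000) < p

rows = calibration_table(p, outcomes)
for mean_pred, freq, count in rows:
    print(round(mean_pred, 2), round(freq, 2), count)  # mean_pred tracks freq
```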
Question 21
When converting a logistic regression probability to a trading signal, a "threshold" is used primarily to:
A) Determine the model's classification accuracy
B) Account for transaction costs and model uncertainty
C) Set the maximum position size
D) Filter out events with low probability
Answer: B. The threshold ensures that the model's perceived edge exceeds the cost of trading (bid-ask spread, fees) and accounts for model uncertainty. Without a threshold, the trader would execute many trades whose edges are smaller than the transaction costs, losing money on each one.
Question 22
The Kelly criterion for position sizing in prediction markets:
A) Always recommends betting the maximum amount
B) Sizes positions proportionally to the perceived edge relative to the odds
C) Requires the model to be perfectly calibrated
D) Only works for binary outcomes
Answer: B. The Kelly criterion sizes bets in proportion to the edge (model probability minus market-implied probability) relative to the odds. Larger perceived edges lead to larger positions. In practice, fractional Kelly (e.g., half-Kelly) is used to account for estimation error. The criterion generalizes beyond binary outcomes.
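For a binary contract paying 1 if the event occurs, the full-Kelly fraction simplifies to (p - price) / (1 - price). A sketch (function name illustrative):

```python
def kelly_fraction(p, price):
    """Full-Kelly stake fraction for a binary contract priced at `price`
    that pays 1 if the event occurs, given model probability p.
    Standard form: f* = (b*p - q) / b with net odds b = (1 - price) / price,
    which simplifies to (p - price) / (1 - price)."""
    b = (1.0 - price) / price  # net odds received per unit staked
    q = 1.0 - p
    return (b * p - q) / b

f = kelly_fraction(p=0.73, price=0.65)  # about 0.23 of bankroll at full Kelly
half_kelly = 0.5 * f  # common in practice, to buffer estimation error
print(f, half_kelly)
```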
Question 23
In the context of prediction markets, what does "lookahead bias" mean?
A) The model predicts too far into the future
B) The model uses information that would not have been available at the time of prediction
C) The model's predictions are biased toward recent events
D) The model over-weights future events relative to past events
Answer: B. Lookahead bias occurs when a model incorporates information from the future: data that would not have been available when the prediction was actually made. In walk-forward validation, this happens when the test period's data leaks into the training set through feature computation, standardization, or data splitting. It produces overly optimistic backtests.
Question 24
For a prediction market contract expiring in 10 days, which modeling approach is most appropriate for short-term price forecasting?
A) Linear regression on historical base rates alone
B) ARIMA on recent price changes combined with GARCH for volatility
C) Logistic regression using only economic fundamentals
D) A simple moving average crossover strategy
Answer: B. With 10 days to expiry, short-term price dynamics matter. ARIMA captures autocorrelation in recent price changes for mean forecasting, while GARCH captures the time-varying volatility that is especially relevant near expiry. Economic fundamentals (C) change too slowly to drive 10-day forecasts, and a simple moving average crossover (D) does not provide probabilistic forecasts.
Question 25
A regime-switching model for prediction market prices identifies two regimes. Regime 1 has low volatility and small price changes; Regime 2 has high volatility and larger price changes. The model estimates 85% probability that the market is currently in Regime 1. How should a trader use this information?
A) Ignore it and trade normally
B) Use smaller position sizes appropriate for the calm regime, but keep a risk buffer in case of a regime switch
C) Only trade during Regime 2 when volatility is high
D) Double position sizes because the market is calm
Answer: B. Knowing the current regime helps calibrate position sizing and risk management. In a low-volatility regime, position sizes suited to calm conditions are appropriate, but the trader should keep a risk buffer because regime switches can occur rapidly (the 15% probability of being in Regime 2 is non-trivial). Doubling positions because conditions are calm (D) is dangerous because a regime switch could cause large losses, and trading only in Regime 2 (C) misses the steady opportunities of the calm regime.