Chapter 42 Quiz: Research Frontiers

Instructions: Answer all 25 questions. This quiz is worth 100 points. You have 75 minutes. A calculator is permitted; no notes or internet access. For multiple choice, select the single best answer.


Section 1: Multiple Choice (10 questions, 3 points each = 30 points)

Question 1. Which of the following is an example of an "open problem" in sports betting research as defined in Chapter 42?

(A) How to convert American odds to decimal odds

(B) How to optimally size bets when the edge estimate itself is uncertain

(C) How to calculate a rolling average of team performance

(D) How to read a box score

Answer **(B) How to optimally size bets when the edge estimate itself is uncertain.** This is the "optimal dynamic bet sizing under uncertainty" problem discussed in Section 42.1. The Kelly Criterion provides the optimal answer when edge is known precisely, but in practice edge estimates carry significant uncertainty. The question of how to account for this meta-uncertainty remains largely unsolved. Options A, C, and D are solved problems covered in earlier chapters.
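To see why edge uncertainty matters for bet sizing, here is a minimal sketch (illustrative numbers, not from the chapter) of what happens to expected log-growth when the stake is sized from an overestimated edge:

```python
import numpy as np

def expected_log_growth(f, p, b):
    """Exact expected log-growth per bet at stake fraction f, true win
    probability p, and net odds b."""
    return p * np.log1p(f * b) + (1.0 - p) * np.log1p(-f)

# The model ESTIMATES p = 0.56, but the TRUE win probability is 0.54
# (hypothetical numbers at -110 odds, i.e. net odds b ~ 0.909)
b, p_est, p_true = 0.909, 0.56, 0.54
kelly_est = (p_est * b - (1.0 - p_est)) / b   # Kelly sized from the estimate

for frac in (1.0, 0.5, 0.25):
    g = expected_log_growth(frac * kelly_est, p_true, b)
    print(f"{frac:.2f} x estimated Kelly: {g:+.5f} log-growth per bet")
```

With these numbers, full Kelly at the inflated estimate produces negative growth while half Kelly stays positive, which is one reason fractional Kelly is a common crude hedge against edge uncertainty.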

Question 2. In a Directed Acyclic Graph (DAG), a collider is a variable that:

(A) Causes two or more other variables

(B) Is caused by two or more parent variables

(C) Blocks the causal pathway between treatment and outcome

(D) Is unrelated to all other variables

Answer **(B) Is caused by two or more parent variables.** A collider is a node where two or more arrows point inward (two or more parents). For example, "Highlight Reel Appearance" might be caused by both "Scoring Ability" and "Media Market Size." Conditioning on a collider can create a spurious association between its parents, which is why understanding colliders is critical for causal inference. This is sometimes called "collider bias" or "Berkson's paradox."
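Collider bias can be demonstrated in a few lines of Python; the variable names mirror the hypothetical example above:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Two independent causes
scoring = rng.normal(size=n)    # "Scoring Ability"
market = rng.normal(size=n)     # "Media Market Size"

# Collider: caused by both parents
highlight = scoring + market + rng.normal(scale=0.5, size=n)

# Unconditionally, the parents are uncorrelated
print(np.corrcoef(scoring, market)[0, 1])   # ~ 0

# Conditioning on the collider (e.g., looking only at players who made
# highlight reels) induces a spurious negative association
selected = highlight > 1.0
print(np.corrcoef(scoring[selected], market[selected])[0, 1])   # clearly negative
```

Among selected players, low scoring ability must be "explained" by a large media market and vice versa, which is exactly the Berkson's-paradox pattern described in the answer.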

Question 3. The exclusion restriction for an instrumental variable requires that:

(A) The instrument is correlated with the treatment variable

(B) The instrument affects the outcome only through its effect on the treatment

(C) The instrument is randomly assigned

(D) The instrument has a large F-statistic in the first stage

Answer **(B) The instrument affects the outcome only through its effect on the treatment.** The exclusion restriction states that the instrument has no direct effect on the outcome except through the treatment variable. For example, if altitude is used as an instrument for pace (treatment) on winning (outcome), the exclusion restriction requires that altitude does not affect winning through any channel other than pace. This is often the most difficult assumption to defend in practice.

Question 4. In the multi-armed bandit framework, Thompson Sampling selects arms by:

(A) Always choosing the arm with the highest observed win rate

(B) Choosing each arm with equal probability

(C) Sampling from each arm's posterior distribution and selecting the arm with the highest sample

(D) Choosing the arm with the most pulls

Answer **(C) Sampling from each arm's posterior distribution and selecting the arm with the highest sample.** Thompson Sampling maintains a posterior distribution (e.g., Beta distribution for Bernoulli rewards) for each arm's reward probability. At each round, it draws one sample from each arm's posterior and selects the arm whose sample is highest. This naturally balances exploration and exploitation: arms with high uncertainty will occasionally produce high samples (exploration), while arms with high estimated probabilities will frequently produce high samples (exploitation).
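The mechanism can be sketched in a few lines; arm count, win rates, and horizon below are illustrative:

```python
import numpy as np

def thompson_step(alpha, beta, rng):
    """One round of Thompson Sampling for Bernoulli arms: draw one sample
    from each arm's Beta posterior, play the arm with the highest draw."""
    samples = rng.beta(alpha, beta)
    return int(np.argmax(samples))

# Toy run: 3 arms with true win rates 0.40, 0.50, 0.60
rng = np.random.default_rng(42)
true_p = np.array([0.40, 0.50, 0.60])
alpha = np.ones(3)   # Beta(1, 1) priors
beta = np.ones(3)

for _ in range(2_000):
    arm = thompson_step(alpha, beta, rng)
    reward = rng.random() < true_p[arm]
    alpha[arm] += reward        # win updates alpha
    beta[arm] += 1 - reward     # loss updates beta

pulls = alpha + beta - 2
print(pulls)   # the best arm accumulates by far the most pulls
```

Early on, the wide priors make every arm competitive (exploration); as evidence accumulates, draws from the best arm's posterior dominate (exploitation).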

Question 5. A regression discontinuity design is most appropriate when:

(A) Treatment is randomly assigned to all participants

(B) Treatment is determined by whether a continuous running variable crosses a sharp threshold

(C) There are no confounding variables in the data

(D) The researcher can conduct a randomized controlled trial

Answer **(B) Treatment is determined by whether a continuous running variable crosses a sharp threshold.** RDD exploits situations where treatment assignment is determined by a continuous variable crossing a cutoff. In sports, examples include the playoff qualification cutoff (based on wins) or draft order cutoff (based on regular-season record). The key assumption is that units just above and just below the cutoff are comparable in all respects except treatment, making the cutoff a quasi-random assignment mechanism.

Question 6. Kyle's lambda measures:

(A) The probability of informed trading in a market

(B) The permanent price impact of order flow

(C) The bid-ask spread in a betting exchange

(D) The correlation between opening and closing lines

Answer **(B) The permanent price impact of order flow.** Kyle's lambda ($\lambda$) from his seminal 1985 model measures how much prices move in response to net order flow. A higher lambda indicates greater price impact per unit of net flow, which typically reflects higher information asymmetry in the market. In betting markets, lambda captures how much the line moves in response to net signed betting volume.

Question 7. Which natural experiment is described in Chapter 42 as useful for studying the causal effect of home-court advantage?

(A) The introduction of the three-point line in basketball

(B) The 2020 NBA Bubble games played without fans

(C) The expansion of the NFL playoff field from 12 to 14 teams

(D) The introduction of VAR in soccer

Answer **(B) The 2020 NBA Bubble games played without fans.** The COVID-era bubble environments removed home-court factors (crowd, travel, familiar surroundings) while keeping the game itself unchanged. Comparing performance in bubble games versus normal games provides a natural experiment on the causal effect of crowd presence and other home-court factors. This is one of the cleanest natural experiments in sports because the "treatment" (no fans) was determined by an external pandemic, not by any characteristic of the teams.

Question 8. In the Markov Decision Process formulation of sports betting, the "state" at any point in time includes:

(A) Only the current bankroll

(B) Only the model's predictions for today's games

(C) The current bankroll, account status, active bets, model predictions, and position in the season

(D) Only the outcomes of the last 10 bets

Answer **(C) The current bankroll, account status, active bets, model predictions, and position in the season.** The state in the betting MDP must capture all information relevant to future decisions. This includes the bankroll (constraining future bet sizes), sportsbook account status (affecting access), active bets (affecting exposure), model predictions for upcoming games (affecting opportunities), and position in the season (affecting future opportunity flow). A richer state representation enables better policies but requires more data to learn.
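The state description above can be written down as a small data structure; all field names here are illustrative, not from the chapter's code:

```python
from dataclasses import dataclass, field

@dataclass
class BettingState:
    """One possible encoding of the betting MDP state described above."""
    bankroll: float                                      # constrains bet sizes
    account_limits: dict = field(default_factory=dict)   # book -> max stake
    active_bets: list = field(default_factory=list)      # open exposure
    predictions: dict = field(default_factory=dict)      # game_id -> P(win)
    games_remaining: int = 0                             # season position

state = BettingState(
    bankroll=10_000.0,
    account_limits={"book_a": 2_000.0},
    predictions={"game_123": 0.57},
    games_remaining=140,
)
print(state.bankroll, state.games_remaining)
```

A policy maps this state to an action (which bets to place and at what stakes); richer fields enable better policies at the cost of more data to learn from.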

Question 9. The decomposition of vigorish into adverse selection, inventory cost, and operating cost is useful because:

(A) It tells you the exact amount of each component for any given bet

(B) It helps identify which markets may be more beatable based on the composition of the vig

(C) It allows bettors to avoid paying vig entirely

(D) It proves that all markets with high vig are inefficient

Answer **(B) It helps identify which markets may be more beatable based on the composition of the vig.** If most of the vig in a market compensates for adverse selection (risk of trading against informed bettors), the market is informationally rich and hard to beat. If most of the vig covers operating costs, the underlying pricing may be less accurate, creating opportunities. This decomposition provides a structural understanding of market difficulty that complements empirical performance metrics.

Question 10. Conformal prediction provides:

(A) A single point estimate with maximum accuracy

(B) Prediction intervals with guaranteed coverage properties

(C) Causal effect estimates free from confounding

(D) Optimal bet sizing recommendations

Answer **(B) Prediction intervals with guaranteed coverage properties.** Conformal prediction is a framework for constructing prediction intervals (or sets) that have a mathematically guaranteed coverage rate. For example, a 90% conformal prediction interval for a game's home win probability will contain the true outcome at least 90% of the time, regardless of the underlying model. This honest uncertainty quantification is directly useful for bet sizing --- wider intervals should lead to smaller bets.
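A split-conformal interval can be sketched in a few lines; the synthetic margin data below is illustrative, not from the book:

```python
import numpy as np

def conformal_interval(cal_residuals, y_hat, alpha=0.1):
    """Split-conformal interval around a point forecast y_hat.
    cal_residuals are |y - y_hat| on a held-out calibration set; the
    returned interval has >= (1 - alpha) marginal coverage."""
    n = len(cal_residuals)
    # Finite-sample adjusted quantile rank: ceil((n + 1)(1 - alpha))
    k = int(np.ceil((n + 1) * (1 - alpha)))
    q = np.sort(cal_residuals)[min(k, n) - 1]
    return y_hat - q, y_hat + q

# Toy check on synthetic point-margin data
rng = np.random.default_rng(1)
y = rng.normal(loc=3.0, scale=10.0, size=2_000)   # actual game margins
y_hat = 3.0                                        # a constant forecast
lo, hi = conformal_interval(np.abs(y[:1000] - y_hat), y_hat, alpha=0.1)
coverage = np.mean((y[1000:] >= lo) & (y[1000:] <= hi))
print(round(coverage, 3))   # close to the 90% target
```

The coverage guarantee holds for any underlying forecaster, which is what makes the interval width an honest input to bet sizing.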

Section 2: True/False with Justification (5 questions, 4 points each = 20 points)


Question 11. "A strong instrument (first-stage F-statistic > 10) guarantees that the IV estimate is unbiased."

Answer **False.** A strong instrument (F > 10) reduces the finite-sample bias of the IV estimator but does not guarantee unbiasedness. Validity of the IV estimate also requires: (a) the exclusion restriction holds (the instrument affects the outcome only through the treatment), and (b) the independence assumption holds (the instrument is uncorrelated with unobserved confounders). A strong instrument that violates the exclusion restriction will produce a biased estimate with misleadingly small standard errors.

Question 12. "Thompson Sampling will always converge to the optimal arm eventually."

Answer **True** (in the limit). Thompson Sampling is provably optimal in the sense that it achieves the Lai-Robbins lower bound on regret asymptotically. Because it always maintains some probability of sampling every arm (the posterior never collapses entirely), it will eventually try all arms enough times to identify the optimal one. However, the convergence rate depends on how distinguishable the arms are: if two arms have very similar true win rates, convergence takes longer.

Question 13. "In a regression discontinuity design, the running variable (e.g., win total) must be randomly assigned."

Answer **False.** The running variable in an RDD is typically not randomly assigned --- it is a deterministic or quasi-deterministic function of outcomes. What matters is that units cannot precisely manipulate their position relative to the cutoff. If teams could precisely choose to be just above or below the playoff cutoff, the design would be invalid. But in practice, game outcomes have enough randomness that precise manipulation is impossible, which is what makes the RDD valid. The key assumption is local continuity of potential outcomes at the cutoff, not random assignment of the running variable.

Question 14. "Markets with higher Kyle's lambda are better for informed bettors because their bets move the line more."

Answer **False.** Higher Kyle's lambda means greater price impact per unit of order flow, which is actually worse for informed bettors. A high lambda means that even small bets by informed traders move the price significantly against them, reducing the profit they can extract from their information. Informed bettors prefer markets with low lambda (high liquidity, many uninformed participants), where they can place larger bets with minimal price impact. This is why professional bettors value high-volume, low-margin markets.

Question 15. "Reinforcement learning for sports betting is limited by the fact that real-world betting seasons cannot be simulated faster than real time."

Answer **False.** While real-world data arrives at the pace of actual seasons, RL training is not limited to real time: simulation environments can generate thousands of synthetic seasons rapidly, allowing RL agents to train on far more experience than any individual could accumulate in a lifetime. The real challenge is the "sim-to-real gap" --- the difference between the simulated environment and real betting markets. If the simulation is unrealistic, the trained policy may not transfer. The binding constraint is therefore simulation fidelity, not computational speed.

Section 3: Short Answer (5 questions, 6 points each = 30 points)


Question 16. Explain the concept of "adverse selection" in betting markets. How does a sportsbook's response to adverse selection differ between sharp and recreational bettors?

Answer **Adverse selection** occurs when one party to a transaction has information that the other party lacks. In betting markets, informed (sharp) bettors have better probability estimates than the sportsbook, creating adverse selection risk for the book --- when a sharp bettor places a bet, the sportsbook is more likely to be on the wrong side. Sportsbooks respond differently to adverse selection from sharp vs. recreational bettors:

**For sharp bettors**, the sportsbook (a) moves the line in response to their bets, treating them as information signals, (b) limits bet sizes for known sharp accounts, reducing exposure, and (c) may eventually close sharp accounts entirely.

**For recreational bettors**, the sportsbook (a) does not move lines in response to their bets (they are treated as noise), (b) accepts large bets because recreational bettors are net losers, and (c) actively courts them through promotions and user experience.

Some sophisticated sportsbooks (e.g., Pinnacle) actually welcome sharp bettors early in the week to extract pricing information, then use that information to set more accurate lines for the broader market. This is analogous to a market maker paying for information through adverse selection losses.

Question 17. What is the "sim-to-real gap" in reinforcement learning for sports betting, and why does it make RL deployment challenging?

Answer The sim-to-real gap refers to the difference between a training simulation and the real-world environment. An RL agent trained in simulation may learn policies that exploit features of the simulation that do not exist in reality, leading to poor real-world performance. In sports betting, the sim-to-real gap manifests in several ways:

1. **Odds availability:** Simulated odds may not reflect the actual odds available at the moment of placement. Real odds change rapidly, and the odds assumed in simulation may not be achievable.
2. **Account limitations:** Simulations rarely model the fact that winning bettors get limited or banned. A policy that assumes unlimited access will not work in practice.
3. **Market impact:** Simulations typically assume no price impact from the agent's bets. In reality, large bets can move lines, especially at smaller sportsbooks.
4. **Edge dynamics:** Simulations often use stationary edge distributions, but real edges change over time as markets adapt.
5. **Execution latency:** Real bet placement involves delays between decision and execution, during which odds may change.

These differences mean that a policy achieving excellent returns in simulation may fail in real deployment. Bridging the gap requires increasingly realistic simulations, which in turn requires detailed real-world data about market behavior.

Question 18. Describe a specific natural experiment in sports that could be used to estimate a causal effect. Specify the treatment, the source of quasi-random variation, the outcome, and one key assumption that must hold.

Answer **Natural experiment:** The effect of a mid-season coaching change on team performance.

**Treatment:** Receiving a new head coach mid-season.

**Source of quasi-random variation:** While coaching changes are not truly random, they are often triggered by losing streaks that include significant randomness. Two teams with identical true quality might have very different win records through 30 games due to random variation in close-game outcomes. One team fires its coach (treatment), the other does not (control).

**Outcome:** Team performance over the next 20 games, measured by point differential or win rate.

**Key assumption:** Regression to the mean must be addressed. Teams fire coaches during losing streaks, which are often partly due to bad luck, so subsequent improvement may reflect regression to the mean rather than a coaching effect. The study must compare post-change performance to an appropriate counterfactual --- not to the pre-change losing streak, but to what would have been expected given the team's true quality. One approach: compare the improvement of teams that fired their coach to the improvement of teams in similar losing streaks that did not. A difference-in-differences estimator can separate the coaching effect from regression to the mean.

Question 19. A bettor is considering whether to add a new sport (tennis) to their portfolio. Using the multi-armed bandit framework, describe how they should approach this decision. What is the "arm" and what is the "reward"?

Answer In the bandit framework for sport selection:

**Arm:** Each sport/market combination is an arm. The new arm is "tennis match winner bets." Existing arms might include "NFL sides," "NBA totals," "MLB moneylines," etc.

**Reward:** The per-bet ROI, or a binary win/loss signal, for each bet placed in that sport.

**Approach using Thompson Sampling:**

1. Initialize the tennis arm with a weak prior (e.g., Beta(1, 1) for binary rewards), reflecting no prior information about profitability.
2. Allocate a small fraction of daily betting capital to the tennis arm. Thompson Sampling will initially explore tennis with moderate frequency because the wide prior produces high-variance samples.
3. As bets settle, update the posterior for the tennis arm. If tennis produces positive results, the posterior shifts upward and Thompson Sampling allocates more capital; if results are poor, the posterior shifts downward and allocation decreases.
4. Over time, the algorithm converges: if tennis is genuinely profitable, it receives substantial allocation; if not, allocation naturally shrinks toward zero.

**Key advantage:** The bettor does not need to make a binary "add or don't add" decision upfront. The bandit framework provides a principled way to explore the new sport with limited risk while gathering evidence --- superior to both "jump in fully" (too risky) and "never try" (misses the opportunity).

Question 20. Explain how graph neural networks (GNNs) could improve sports prediction compared to traditional feature-based models. Give a specific example of a relational structure in sports that a GNN could capture.

Answer Traditional feature-based models represent each game as a flat vector of features (team stats, rest days, etc.), discarding the relational structure of the data. Graph neural networks operate on graph-structured data, where nodes represent entities and edges represent relationships, enabling them to capture complex interactions.

**Specific example: NBA player interaction networks.** In the NBA, team performance depends not just on individual player quality but on how players interact. A "player interaction graph" would have:

- **Nodes:** Individual players, with features like scoring rate, usage rate, and defensive rating.
- **Edges:** Co-appearance relationships. An edge between two players is weighted by the minutes they play together and the on-court performance metrics of their pairing.

A GNN on this graph could learn that:

- Player A and Player B have a synergistic effect (their combined performance exceeds the sum of their individual contributions).
- A lineup change (Player C replacing Player D) disrupts established player interactions, affecting team performance beyond what the individual player swap suggests.
- A team's strength depends on the specific combination of players available, not just the aggregate talent level.

Traditional models would need hand-crafted interaction features to capture these effects, and they would still miss complex higher-order interactions; a GNN learns the relevant interaction patterns from data. For betting applications, this means a GNN could better predict the impact of lineup changes, rest-related absences, and mid-season trades --- situations where the relational structure changes and traditional models based on team-level aggregates are slow to adapt.

Section 4: Code Analysis (5 questions, 4 points each = 20 points)


Question 21. In the InstrumentalVariableEstimator class, the first-stage F-statistic is computed as:

f_stat = (
    (ss_res_reduced - ss_res_full) / k_instruments
) / (ss_res_full / (n - k_full))

What does each term represent, and why is F > 10 used as a rule of thumb for instrument strength?

Answer
- `ss_res_reduced`: Sum of squared residuals from a model without the instrument (just the intercept and controls) --- the unexplained variation when the instrument is excluded.
- `ss_res_full`: Sum of squared residuals from the first-stage regression including the instrument --- the unexplained variation when the instrument is included.
- `k_instruments`: Number of instruments (degrees of freedom used by the instruments).
- `n - k_full`: Residual degrees of freedom in the full first-stage model.

The numerator measures how much the instrument reduces unexplained variation (per degree of freedom); the denominator normalizes by the remaining variation (per degree of freedom). A large F-statistic means the instrument explains a substantial share of the variation in the treatment.

The F > 10 rule of thumb comes from Staiger and Stock (1997), who showed that with weak instruments (F < 10) the 2SLS estimator has severe finite-sample bias --- potentially worse than OLS bias --- and confidence intervals have incorrect coverage. F > 10 roughly ensures that the bias of 2SLS relative to OLS is below 10%.
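The formula can be checked end to end on synthetic data; the first-stage setup below (an instrument with coefficient 0.5) is illustrative:

```python
import numpy as np

def first_stage_f(y, X_full, X_reduced):
    """First-stage F-statistic via the residual-sum-of-squares formula
    quoted in the question. X_full includes the instrument(s); X_reduced
    omits them but keeps the intercept and controls."""
    def ssr(X):
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
        return resid @ resid

    n, k_full = X_full.shape
    k_instruments = k_full - X_reduced.shape[1]
    ss_res_full = ssr(X_full)
    ss_res_reduced = ssr(X_reduced)
    return ((ss_res_reduced - ss_res_full) / k_instruments) / (
        ss_res_full / (n - k_full)
    )

# Synthetic first stage with a reasonably strong instrument
rng = np.random.default_rng(7)
n = 500
z = rng.normal(size=n)                      # instrument (e.g., altitude)
treatment = 0.5 * z + rng.normal(size=n)    # first-stage relationship
ones = np.ones((n, 1))
f = first_stage_f(treatment, np.column_stack([ones, z]), ones)
print(round(f, 1))   # comfortably above the F > 10 threshold
```

Shrinking the 0.5 coefficient toward zero drives the F-statistic down toward the weak-instrument regime.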

Question 22. The ThompsonSamplingBandit uses np.random.beta(self.alpha[i], self.beta[i]) to select arms. After 50 pulls of an arm with 30 wins, what are the alpha and beta parameters? What is the posterior mean?

Answer Starting from the uniform prior (alpha = 1, beta = 1), the 30 wins update alpha and the 20 losses update beta:

- **Alpha = 1 + 30 = 31**
- **Beta = 1 + 20 = 21**

**Posterior mean** = alpha / (alpha + beta) = 31 / 52 ≈ **0.596** (approximately 59.6%).

The posterior variance = (31 x 21) / (52^2 x 53) = 651 / 143,312 ≈ 0.00454, giving a posterior standard deviation of approximately 0.067. A 95% credible interval is approximately [0.46, 0.73], reflecting substantial remaining uncertainty about the true win rate. Thompson Sampling's posterior naturally captures this uncertainty and uses it to balance exploration and exploitation.
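The arithmetic above can be verified directly:

```python
import numpy as np

# Beta(1, 1) prior updated with 30 wins and 20 losses
alpha, beta = 1 + 30, 1 + 20

post_mean = alpha / (alpha + beta)
post_var = (alpha * beta) / ((alpha + beta) ** 2 * (alpha + beta + 1))
print(round(post_mean, 3), round(post_var ** 0.5, 3))   # 0.596 0.067

# Draws from this posterior, as Thompson Sampling would make them
rng = np.random.default_rng(0)
samples = rng.beta(alpha, beta, size=100_000)
print(round(np.mean(samples), 3))   # close to the analytic mean
```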

Question 23. The BettingEnvironment.step() method includes:

stake = self.bankroll * np.clip(
    action['stake_fraction'], 0, 0.1
)

Why is the stake fraction clipped to a maximum of 0.1? How does this constraint interact with the Kelly Criterion?

Answer The clip to 0.1 (10% of bankroll) is a **safety constraint** that prevents the RL agent from making catastrophically large bets during training. Without it, an untrained agent exploring random actions could bet 50% or 100% of the bankroll on a single game, potentially going bankrupt immediately. Episodes would end prematurely, making training extremely difficult.

**Interaction with Kelly:** The full Kelly fraction for typical sports betting edges (2-8%) at typical odds (-110 to +150) is usually between 2% and 8% of bankroll. The 10% cap is therefore generous enough to allow near-full-Kelly bets for most realistic edges, while preventing the agent from overbetting in pathological cases. In practice, this mirrors the real-world approach of pairing Kelly sizing with a maximum-bet constraint. Even with a large estimated edge, no single bet should risk more than 10% of the bankroll because (a) edge estimates are uncertain, (b) a sequence of unlucky bets at high stakes could cause ruin, and (c) the geometric growth rate drops sharply when betting beyond full Kelly.
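A minimal sketch of how the cap interacts with Kelly sizing (the helper name and example numbers are illustrative, not from the chapter's code):

```python
import numpy as np

def clipped_kelly_stake(bankroll, p_win, decimal_odds, cap=0.1):
    """Kelly stake with the 10% safety cap from the step() snippet.
    p_win is the estimated win probability; decimal_odds is the full
    payout per unit staked (e.g., 1.909 for -110)."""
    b = decimal_odds - 1.0                    # net odds
    kelly = (p_win * b - (1.0 - p_win)) / b   # f* = (bp - q) / b
    return bankroll * np.clip(kelly, 0.0, cap)

# A typical edge at -110: full Kelly lands well under the 10% cap
print(round(clipped_kelly_stake(10_000, 0.55, 1.909), 2))

# A pathological "sure thing" estimate is clipped to 10% of bankroll
print(clipped_kelly_stake(10_000, 0.99, 1.909))   # 1000.0
```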

Question 24. The SimplePolicyGradient.update() method uses:

if returns.std() > 0:
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

This is called "return normalization." Explain why this variance reduction technique is important for policy gradient methods and what would happen without it.

Answer Return normalization serves two purposes:

1. **Centering (subtracting the mean):** This acts as a baseline. Without centering, all actions in episodes with positive total return receive positive reinforcement, even bad actions. By subtracting the mean, actions that produced above-average returns are reinforced and actions that produced below-average returns are discouraged, dramatically reducing the variance of the gradient estimate.
2. **Scaling (dividing by the standard deviation):** This normalizes the gradient magnitude across episodes. Without scaling, episodes with large absolute returns would dominate the gradient update, causing unstable training. Scaling ensures the learning rate has a consistent effect regardless of the reward scale.

**Without normalization:**

- The policy gradient would have very high variance, requiring many more episodes to converge.
- The learning rate would need to be very small to prevent instability in high-reward episodes, making learning painfully slow in low-reward episodes.
- In sports betting environments, where individual bet rewards are small and noisy, the gradient signal would be almost entirely noise, making learning nearly impossible.

The `1e-8` term guards against division by a near-zero standard deviation; the `if returns.std() > 0` check already skips the degenerate case where all returns in an episode are identical (std = 0).
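The effect is easy to see on a toy episode:

```python
import numpy as np

def normalize_returns(returns):
    """Return normalization as in SimplePolicyGradient.update()."""
    returns = np.asarray(returns, dtype=float)
    if returns.std() > 0:
        returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    return returns

# Mixed-quality episode: after normalization, above-average steps carry
# positive advantages and below-average steps carry negative ones,
# regardless of the raw reward scale.
raw = [100.0, 120.0, 80.0, 110.0]
norm = normalize_returns(raw)
print(norm)                                          # signs show which steps beat the average
print(round(norm.mean(), 6), round(norm.std(), 6))   # ~0 and ~1
```

Note that even though every raw return is positive, the normalized values are a mix of positive and negative, which is exactly the baseline effect described in point 1.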

Question 25. The chapter discusses the "price discovery process" in betting markets:

$$P_{t+1} = P_t + \lambda \cdot \text{OrderFlow}_t + \varepsilon_t$$

A sportsbook opens an NFL game at Home -3 (-110/-110). In the first hour, it receives \$50,000 in bets on the home team and \$20,000 on the away team. If lambda = 0.00001, what is the expected new line? Express your answer as the new spread.

Answer Net order flow = \$50,000 - \$20,000 = +\$30,000 (positive indicates net flow toward the home team). Price change = lambda x OrderFlow = 0.00001 x 30,000 = 0.30 The price in this context is the spread. The opening spread is -3 (home favored by 3). The positive order flow on the home team indicates the market believes the home team is even stronger, so the line should move further in the home team's direction. **New spread: Home -3.3** (or approximately -3.5 when rounded to the standard half-point increment). The interpretation: the \$30,000 in net home-team flow caused the line to move 0.3 points toward the home team. Whether this flow contains genuine information or is just recreational money depends on the proportion of informed bettors in the flow --- which is what the PIN model and other microstructure tools help estimate. Note: A lambda of 0.00001 is illustrative. Real price impact parameters depend on the specific market, time of week, and total volume. NFL games with millions in handle would have much smaller lambda than niche markets with thin volume.