Chapter 31 Exercises: The Complete ML Betting Pipeline

Part A: Conceptual (Exercises 1--8)

Exercise 1. Describe the six major components of a production ML betting pipeline (data ingestion, feature store, model training, model serving, bet execution, and monitoring). For each component, explain its purpose, its inputs and outputs, and what happens to the rest of the pipeline if that component fails. Draw a dependency diagram showing which components must complete before others can run.

Exercise 2. Compare monolithic and microservice architectures for a sports betting pipeline. A solo bettor processing 50 bets per day across two sports is choosing between the two approaches. List three advantages of starting with a monolith and two scenarios where migrating to microservices would become necessary. Explain how a well-structured monolith can be refactored into services later.

Exercise 3. Explain the concept of feature versioning and why it is critical in a production pipeline. Provide a concrete scenario where a mismatch between training-time and serving-time feature definitions could silently corrupt predictions. Describe the metadata that should accompany every stored feature value and how a feature store enforces consistency.

Exercise 4. Define model calibration and explain why it matters more for betting systems than for typical classification tasks. A model outputs a predicted home-win probability of 0.65 for a game where the market-implied probability is 0.60. Explain how miscalibration would affect the Kelly criterion bet size calculation. Describe two methods for recalibrating a trained model (Platt scaling and isotonic regression) and their tradeoffs.

Exercise 5. Describe three different scheduling strategies for a betting pipeline: pure cron-based batch processing, event-driven (triggered by new data arrival), and hybrid (batch plus real-time components). For each, list two sports or bet types where it is the most appropriate architecture and explain why. Discuss how latency requirements differ between pre-game and live betting contexts.

Exercise 6. Explain the concept of A/B testing in the context of model deployment for a betting pipeline. A team has a new model (model B) that shows a 2% lower (better) Brier score in backtesting than the production model (model A). Describe how you would design a live A/B test: the traffic splitting strategy, the minimum sample size needed to detect a meaningful difference, the metrics you would track, and the decision criteria for promoting model B.

Exercise 7. Explain model drift and data drift in the context of a sports betting system. Provide two concrete examples of each: one caused by a change in the sport itself (e.g., a rule change) and one caused by a change in the data pipeline (e.g., a provider changing their API schema). Describe how you would build automated drift detection using the Population Stability Index (PSI) and Kolmogorov-Smirnov tests.
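As a starting point for the drift-detection part of this exercise, the PSI can be sketched as below. The decile binning scheme and the 0.25 alert threshold are common rules of thumb, not fixed standards, and the epsilon clipping is one of several ways to handle empty bins:

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample and a current sample."""
    # Bin edges come from the baseline (training-time) distribution.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf   # catch values outside the baseline range
    e_frac = np.histogram(expected, edges)[0] / len(expected)
    a_frac = np.histogram(actual, edges)[0] / len(actual)
    # Small epsilon avoids log(0) for empty bins.
    e_frac = np.clip(e_frac, 1e-6, None)
    a_frac = np.clip(a_frac, 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 5000)
shifted = rng.normal(0.8, 1.0, 5000)       # simulated drift in the feature mean
print(psi(baseline, baseline[:2500]))      # near 0: stable
print(psi(baseline, shifted))              # well above the 0.25 rule-of-thumb threshold
```

The KS test from `scipy.stats.ks_2samp` complements PSI: KS is sensitive to any distributional difference, while PSI's binned form gives an interpretable per-bin breakdown of where the shift occurred.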

Exercise 8. Discuss the ethical and legal considerations of automating bet execution. Address the following: (a) the distinction between automated execution and bot-based exploitation of sportsbook promotions, (b) the responsibility of the system operator when a bug causes unintended large bets, (c) the importance of daily loss limits and kill switches, and (d) the legal status of automated betting in at least two jurisdictions.

Part B: Calculations (Exercises 9--15)

Exercise 9. A model predicts a home-win probability of 0.58. The sportsbook offers the home team at American odds of +110. (a) Convert the odds to implied probability. (b) Compute the expected value per dollar wagered. (c) Using the Kelly criterion with a bankroll of $10,000, compute the optimal bet size. (d) Compute the quarter-Kelly bet size and explain why fractional Kelly is preferred in practice.
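The odds conversions and Kelly sizing needed for parts (a)-(d) can be sketched as follows; the helper names are illustrative:

```python
def american_to_decimal(odds: int) -> float:
    """Convert American odds to decimal odds."""
    return 1 + odds / 100 if odds > 0 else 1 + 100 / abs(odds)

def implied_probability(odds: int) -> float:
    """Break-even probability implied by American odds (no vig removal)."""
    return 1 / american_to_decimal(odds)

def kelly_fraction(p: float, odds: int, fraction: float = 1.0) -> float:
    """Kelly stake as a fraction of bankroll; fraction < 1 gives fractional Kelly."""
    b = american_to_decimal(odds) - 1   # net profit per unit staked
    q = 1 - p
    return max(0.0, fraction * (b * p - q) / b)

p, odds, bankroll = 0.58, 110, 10_000
print(round(implied_probability(odds), 4))                 # part (a)
print(round(bankroll * kelly_fraction(p, odds), 2))        # part (c): full Kelly stake
print(round(bankroll * kelly_fraction(p, odds, 0.25), 2))  # part (d): quarter Kelly
```

Part (b) follows directly: expected value per dollar is `p * b - (1 - p)` with `b` as defined above.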

Exercise 10. A pipeline ingests odds from three sportsbooks. For a given game, the available home-team lines are -3.5 at book A (-110), -3 at book B (-115), and -4 at book C (-105). Your model predicts the home team wins by 5.2 points with a standard deviation of 10.8 points. (a) For each line, compute the probability that the home team covers. (b) Identify the line offering the best expected value. (c) If you can only bet at one book, which do you choose and what is the edge?
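Part (a) can be computed under a normal model of the final margin, as sketched below. Note that this treats the margin as continuous, so it ignores the possibility of a push at the whole-number line of -3:

```python
from math import erf, sqrt

def cover_probability(pred_margin: float, line: float, sd: float) -> float:
    """P(home margin > line) under a normal margin model.
    `line` is the points the home team must win by, e.g. 3.5 for home -3.5."""
    z = (pred_margin - line) / sd
    return 0.5 * (1 + erf(z / sqrt(2)))   # standard normal CDF via erf

mu, sd = 5.2, 10.8
for book, line in [("A", 3.5), ("B", 3.0), ("C", 4.0)]:
    print(book, round(cover_probability(mu, line, sd), 4))
```

Parts (b) and (c) then combine each cover probability with the corresponding price (-110, -115, -105) to compare expected values.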

Exercise 11. A model was trained on 2,400 games with 5-fold time-series cross-validation. The fold-level log-loss values were [0.6723, 0.6698, 0.6641, 0.6615, 0.6589]. (a) Compute the mean and standard deviation of cross-validation log-loss. (b) Is there evidence of improving model performance over time? What might explain this pattern? (c) If the baseline (always predicting 50%) has a log-loss of 0.6931, compute the percentage improvement of the model.

Exercise 12. A risk management system sets the following constraints: maximum single bet $200, maximum daily exposure $1,000, maximum correlated exposure to a single team $400. The system has already placed: Bet 1 ($150 on Team A moneyline), Bet 2 ($100 on Team A spread), Bet 3 ($200 on Team B total). A new signal recommends a $250 bet on a Team A player prop. (a) Does this bet violate the single-bet limit? (b) Does it violate the daily exposure limit? (c) Does it violate the correlated exposure limit? (d) What is the maximum allowable size for this bet?
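The limit check in part (d) reduces to taking the minimum of the remaining headroom under each constraint. A minimal sketch, with the bet records as simple dicts:

```python
def max_allowable_stake(proposed, team, open_bets,
                        max_bet=200, max_daily=1000, max_team=400):
    """Largest stake that keeps all three risk limits satisfied (0 if none)."""
    daily_used = sum(b["stake"] for b in open_bets)
    team_used = sum(b["stake"] for b in open_bets if b["team"] == team)
    headroom = min(max_bet, max_daily - daily_used, max_team - team_used)
    return max(0, min(proposed, headroom))

open_bets = [
    {"team": "A", "stake": 150},  # moneyline
    {"team": "A", "stake": 100},  # spread
    {"team": "B", "stake": 200},  # total
]
print(max_allowable_stake(250, "A", open_bets))  # 150: the team limit binds
```

Parts (a)-(c) correspond to checking each of the three headroom terms individually.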

Exercise 13. A monitoring system tracks the model's predicted probabilities against actual outcomes using calibration buckets. In the 0.55--0.65 bucket, the model made 120 predictions with a mean predicted probability of 0.60 and an actual win rate of 0.52. (a) Compute the calibration error for this bucket. (b) If the model is used to size bets using Kelly criterion, quantify the impact of this miscalibration on expected ROI per bet in this bucket. (c) Suggest a correction factor that could be applied at serving time.

Exercise 14. An Elo rating system uses K=20 and a home advantage of 100 rating points. Team A (rating 1600) hosts Team B (rating 1520). (a) Compute the expected score for Team A. (b) If Team A wins, compute the new ratings for both teams. (c) If Team B wins the upset, compute the new ratings. (d) How many consecutive wins against 1500-rated opponents would it take for a new team (starting at 1500) to reach a rating of 1700?
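The expected-score and update rules from the exercise can be sketched directly, using the stated K=20 and 100-point home advantage as defaults:

```python
def elo_expected(r_home: float, r_away: float, home_adv: float = 100) -> float:
    """Expected score for the home team given both ratings."""
    return 1 / (1 + 10 ** ((r_away - (r_home + home_adv)) / 400))

def elo_update(r_home, r_away, score_home, k=20, home_adv=100):
    """Updated (home, away) ratings; score_home is 1 for a home win, 0 for a loss."""
    e_home = elo_expected(r_home, r_away, home_adv)
    delta = k * (score_home - e_home)   # points exchanged are zero-sum
    return r_home + delta, r_away - delta

e = elo_expected(1600, 1520)          # effective gap of 180 points
print(round(e, 4))                     # part (a)
print(elo_update(1600, 1520, 1))       # part (b): home win
print(elo_update(1600, 1520, 0))       # part (c): road upset
```

For part (d), note that the points gained per win shrink as the team's rating climbs above its opponents', so the answer requires iterating the update rather than dividing 200 by a fixed gain.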

Exercise 15. A pipeline retrains its model every 7 days. The current model was trained on games from dates 2024-01-01 to 2024-12-31 (1,200 games). A new batch of 50 games from 2025-01-01 to 2025-01-07 arrives. (a) If using an expanding window, what is the new training set size? (b) If using a sliding window of 365 days, how many old games are dropped? (c) Compute the percentage of the training set that is "new" data in each approach. (d) Discuss which approach is better when the sport's dynamics change slowly versus rapidly.

Part C: Programming (Exercises 16--20)

Exercise 16. Implement a Python class DataIngestionPipeline that: (a) accepts a list of data source URLs, (b) fetches data from each source with configurable retry logic and exponential backoff, (c) validates the response schema against an expected format, (d) logs all successes and failures with timestamps, and (e) stores raw responses in a SQLite database with deduplication based on a composite key. Include type hints, Google-style docstrings, and a main() function that demonstrates the pipeline on synthetic data.
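The retry-with-exponential-backoff behavior in part (b) is the core primitive of this exercise; one possible shape, with a synthetic flaky fetcher standing in for a real HTTP call:

```python
import random
import time

def fetch_with_backoff(fetch, max_retries=4, base_delay=0.5, jitter=0.1):
    """Call `fetch()` with exponential backoff; re-raise after the final attempt."""
    for attempt in range(max_retries):
        try:
            return fetch()
        except Exception:
            if attempt == max_retries - 1:
                raise
            # Delay doubles each attempt; jitter avoids synchronized retries.
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, jitter))

calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "payload"

print(fetch_with_backoff(flaky, base_delay=0.01))  # succeeds on the third attempt
```

In the full exercise this wraps each source URL, with the validation, logging, and SQLite deduplication layered around it.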

Exercise 17. Write a Python class FeatureStoreWithValidation that extends the feature store concept from the chapter. The class should: (a) store features with version metadata, (b) validate that feature values fall within expected ranges (configurable per feature), (c) raise alerts when more than 10% of values in a batch are out of range, (d) support point-in-time retrieval that guarantees no future data leakage, and (e) provide a get_training_matrix method that returns a pandas DataFrame ready for model training. Include unit tests.

Exercise 18. Implement a ModelEvaluationReport class that takes a trained model and a test set and produces a comprehensive evaluation including: (a) Brier score, log-loss, AUC, and accuracy, (b) a calibration curve with 10 bins, (c) a profit simulation assuming flat betting at various confidence thresholds, (d) feature importance ranking using permutation importance, and (e) a comparison against a baseline model that always predicts the home-team base rate. Output results as a formatted string report.

Exercise 19. Build a BetExecutionEngine class that manages the full bet lifecycle. The engine should: (a) accept a list of recommended bets with edge estimates and confidence levels, (b) apply Kelly criterion position sizing with a configurable fractional Kelly parameter, (c) enforce risk limits (maximum bet size, daily loss limit, correlated exposure limits), (d) simulate execution with configurable slippage and rejection rates, and (e) maintain a complete audit trail of all decisions. Include a demonstration that processes 20 synthetic bet recommendations.

Exercise 20. Write an AlertingSystem class that monitors pipeline health. It should track: (a) data freshness (alert if no new data within N minutes), (b) model prediction latency (alert if P95 exceeds threshold), (c) prediction distribution shift (alert if the mean predicted probability shifts by more than 0.05 from the trailing 7-day average), (d) daily P&L (alert if cumulative loss exceeds a configurable threshold), and (e) feature store staleness (alert if features have not been updated within the expected window). The system should support multiple notification channels (console, email stub, webhook stub).

Part D: Analysis (Exercises 21--25)

Exercise 21. A production pipeline has been running for 6 months. The monthly ROI figures are: +4.2%, +1.8%, -0.5%, -2.1%, +3.4%, +0.9%. (a) Compute the cumulative ROI. (b) Compute the Sharpe ratio assuming a risk-free rate of 0 and monthly returns. (c) Compute the maximum drawdown. (d) Is the performance statistically distinguishable from random at the 95% confidence level? Describe the hypothesis test you would use and compute the test statistic.
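Parts (a)-(c) can be computed in a few lines; this sketch compounds the monthly returns for cumulative ROI and tracks the running peak for drawdown (the sample standard deviation is used for the Sharpe ratio):

```python
import statistics

monthly = [0.042, 0.018, -0.005, -0.021, 0.034, 0.009]

cum, peak, max_dd = 1.0, 1.0, 0.0
for r in monthly:
    cum *= 1 + r                              # compound the return
    peak = max(peak, cum)                     # running high-water mark
    max_dd = max(max_dd, (peak - cum) / peak) # deepest peak-to-trough fall

sharpe = statistics.mean(monthly) / statistics.stdev(monthly)  # risk-free rate = 0

print(round((cum - 1) * 100, 2), "% cumulative ROI")
print(round(sharpe, 3), "monthly Sharpe ratio")
print(round(max_dd * 100, 2), "% maximum drawdown")
```

For part (d), a one-sample t-test of the mean monthly return against zero uses the same mean and standard deviation, with the test statistic equal to the monthly Sharpe times the square root of the sample size.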

Exercise 22. Analyze the following pipeline failure scenario: The odds data source experienced an outage from 2:00 PM to 4:00 PM on a Saturday during the NFL season. The pipeline continued to generate predictions using stale odds data. (a) What is the impact on bet sizing if the model uses market-implied probability as a feature? (b) How would a monitoring system detect this failure? (c) Design a circuit-breaker pattern that halts bet execution when data freshness falls below a threshold. (d) Propose a recovery procedure for bets placed during the outage.
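One minimal shape for the circuit breaker in part (c), with injected timestamps so the logic is testable without waiting on the clock:

```python
import time

class FreshnessCircuitBreaker:
    """Halts bet execution when the newest odds snapshot is too old."""

    def __init__(self, max_age_seconds: float):
        self.max_age = max_age_seconds
        self.last_update = None

    def record_update(self, timestamp=None):
        """Record the arrival time of a fresh odds snapshot."""
        self.last_update = time.time() if timestamp is None else timestamp

    def allow_execution(self, now=None) -> bool:
        """Permit bets only if data is fresher than the configured limit."""
        now = time.time() if now is None else now
        return self.last_update is not None and now - self.last_update <= self.max_age

cb = FreshnessCircuitBreaker(max_age_seconds=300)
cb.record_update(timestamp=1000.0)
print(cb.allow_execution(now=1200.0))  # True: data is 200 s old
print(cb.allow_execution(now=1400.0))  # False: 400 s exceeds the 300 s limit
```

A production version would also latch open after tripping and require an explicit reset, so that a single fresh tick during an unstable outage does not immediately resume execution.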

Exercise 23. A modeler notices that the model's Brier score has degraded from 0.228 to 0.241 over the past month. Provide a systematic diagnostic framework: (a) list five possible causes ranked by likelihood, (b) for each cause, describe the specific test or visualization you would use to confirm or rule it out, (c) describe how you would distinguish between model drift (the model's relationship with outcomes has changed) and data drift (the input distribution has changed), and (d) recommend a remediation plan for the two most likely causes.

Exercise 24. Compare two model serving architectures for a live betting scenario: (a) a batch prediction system that pre-computes predictions for all games once per hour, and (b) a real-time API that computes predictions on demand with sub-second latency. For each architecture, analyze: computational cost, staleness of predictions, complexity of implementation, ability to incorporate late-breaking information (e.g., injury announcements 30 minutes before game time), and failure modes. Recommend which architecture to use and under what conditions.

Exercise 25. Design a complete integration test suite for a betting pipeline. The test should cover: (a) end-to-end data flow from ingestion through prediction, (b) feature store temporal correctness (no future data leakage), (c) model serving consistency (same inputs produce same outputs), (d) risk management constraint enforcement, and (e) monitoring alert triggers. For each test, write a brief description of the test case, the setup required, the assertion, and the expected behavior. Provide at least two test cases per component.

Part E: Research (Exercises 26--30)

Exercise 26. Research the concept of shadow deployment (also called shadow mode or dark launching) for ML models. Explain how it differs from A/B testing, describe a concrete implementation for a betting pipeline where a new model runs in shadow mode alongside the production model, and discuss how you would compare shadow-mode predictions against production predictions. Cite at least two industry sources that discuss shadow deployment practices.

Exercise 27. Investigate feature stores used in production ML systems (e.g., Feast, Tecton, Hopsworks). Compare at least three feature store solutions on the following dimensions: support for point-in-time correctness, online (low-latency) and offline (batch) serving, integration with common ML frameworks, and cost. Recommend which would be most appropriate for a solo bettor, a small team, and a large operation.

Exercise 28. Research MLOps practices for model monitoring and retraining triggers. Summarize the concept of "continuous training" as described by Google's MLOps maturity model. Describe three concrete triggers for automated model retraining in a betting context: (a) performance-based (Brier score degradation), (b) data-based (feature distribution shift), and (c) calendar-based (new season). For each trigger, discuss the tradeoffs between retraining too frequently and too infrequently.

Exercise 29. Examine the legal and regulatory landscape for automated sports betting systems in three jurisdictions: the United States (focusing on one regulated state), the United Kingdom, and Australia. For each jurisdiction, summarize: (a) whether automated betting is legal, (b) any licensing requirements, (c) restrictions on data usage (e.g., official league data mandates), and (d) responsible gambling requirements that would apply to an automated system. Cite primary regulatory sources.

Exercise 30. Research the concept of bet execution quality and market microstructure in sports betting. Compare the execution challenges in sports betting to those in financial markets (slippage, fill rates, market impact). Investigate whether there is evidence that sportsbooks adjust lines in response to sharp bettors' activity (a phenomenon analogous to adverse selection in financial markets). Summarize at least three academic or industry sources that address execution quality in sports betting contexts.