Chapter 28 Exercises: Feature Engineering for Sports Betting

Part A: Foundational Concepts (Exercises 1-6)

Exercise 1. Define feature engineering in the context of sports betting models. Explain why feature engineering is often more impactful than algorithm selection, and provide three concrete examples of raw sports data that require transformation before they can be useful as model inputs.

Exercise 2. Explain the difference between a lag feature and a rolling-window feature. For an NFL team's offensive EPA/play, construct both a 1-game lag feature and a 4-game rolling mean feature. Using hypothetical week-by-week values of [0.05, 0.12, -0.03, 0.08, 0.15, -0.01], compute each feature for Weeks 2 through 6, noting the weeks for which the 4-game window is not yet full. Discuss which captures momentum better and which is more stable.
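
A leakage-safe setup for both features, sketched in pandas with the hypothetical values above (the `shift(1)` ensures each week's feature sees only prior games):

```python
import pandas as pd

# Hypothetical weekly offensive EPA/play values (Weeks 1-6)
epa = pd.Series([0.05, 0.12, -0.03, 0.08, 0.15, -0.01],
                index=range(1, 7), name="off_epa_play")

# 1-game lag: the previous week's value
lag_1 = epa.shift(1)

# 4-game rolling mean of *prior* games only (shift before rolling,
# so the current week's result never enters its own feature)
roll_4 = epa.shift(1).rolling(window=4).mean()

features = pd.DataFrame({"lag_1": lag_1, "roll_4_mean": roll_4})
print(features.loc[2:6])
```

The rolling mean is NaN until Week 5, the first week with four completed prior games; the lag feature is available from Week 2 onward.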

Exercise 3. Describe the concept of temporal leakage in sports prediction. A modeler includes "season-average points scored" as a feature when predicting the outcome of Week 8 games. Explain precisely why this causes leakage if the average includes Weeks 8 through 17 data, and describe how to fix it. Provide a second, subtler example of temporal leakage that might escape casual inspection.
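
The standard fix, sketched on a hypothetical points-scored series: replace the full-season average with an expanding mean over completed games only, shifted so each week's own result is excluded:

```python
import pandas as pd

# Hypothetical points scored in Weeks 1-8
pts = pd.Series([24, 17, 31, 20, 27, 13, 35, 21],
                index=range(1, 9), name="points_scored")

# Leaky: the full-season average assigned to every week,
# including weeks that had not yet been played
leaky = pd.Series(pts.mean(), index=pts.index)

# Leakage-safe: average of games completed *before* each week
safe = pts.shift(1).expanding().mean()
print(safe)
```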

Exercise 4. Define information gain and its gain-ratio variant, and explain how they can be used to rank candidate features for a sports betting model. A feature "home team's rushing yards per game" has an information gain of 0.023 bits with respect to a binary "covers the spread" target, while "home team's turnover differential" has a gain of 0.041 bits. Interpret these values and explain what additional considerations (beyond information gain) you should evaluate before selecting features.

Exercise 5. Explain the bias-variance tradeoff as it relates to feature set size in sports betting models. A model with 8 features achieves an in-sample Brier score of 0.230 and an out-of-sample Brier score of 0.248. A model with 40 features achieves an in-sample Brier score of 0.195 and an out-of-sample Brier score of 0.261. Diagnose the problem with the 40-feature model and recommend a course of action.

Exercise 6. List five categories of features commonly used in NFL game prediction models (e.g., efficiency metrics, situational factors). For each category, provide two specific feature examples, state whether the feature is continuous or categorical, and estimate how many games of data are needed before the feature stabilizes (provide a rough number and your reasoning).

Part B: Data Transformation and Encoding (Exercises 7-12)

Exercise 7. Write Python pseudocode to implement a function that creates exponentially weighted moving average (EWMA) features for any team-level metric. The function should accept a decay parameter (alpha), a minimum number of observations before producing output, and a column name. Apply it to hypothetical "points scored" data for a team over 10 games with alpha = 0.3 and compare the result to a simple rolling mean with window = 4.
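
One possible shape for such a function (a sketch only; the `min_obs` mask and the `shift(1)` keep the feature point-in-time):

```python
import pandas as pd

def ewma_feature(df: pd.DataFrame, col: str, alpha: float = 0.3,
                 min_obs: int = 3) -> pd.Series:
    """EWMA of *prior* games for `col`; NaN until min_obs games are seen."""
    prior = df[col].shift(1)  # exclude the current game
    ewma = prior.ewm(alpha=alpha, min_periods=min_obs).mean()
    return ewma.rename(f"{col}_ewma_{alpha}")

# Hypothetical points-scored data over 10 games
games = pd.DataFrame({"points": [21, 28, 14, 31, 24, 17, 35, 20, 27, 23]})
out = ewma_feature(games, "points", alpha=0.3, min_obs=3)

# Comparison series: simple 4-game rolling mean of prior games
roll = games["points"].shift(1).rolling(window=4).mean()
```

The EWMA reacts faster to recent games (weights decay geometrically) while the flat-weighted rolling mean is smoother but laggier.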

Exercise 8. Describe three methods for encoding the categorical variable "stadium" (32 NFL stadiums) as model features. For each method (one-hot encoding, target encoding, and entity embedding), explain the mechanics, state the number of features produced, and discuss the advantages and risks (e.g., overfitting, high dimensionality, leakage). Which method would you recommend for a dataset containing five seasons of NFL games, and why?

Exercise 9. A dataset contains the feature "days since last game" for each team. Values range from 4 (Thursday-to-Monday) to 14 (coming off a bye week). Describe two transformations that might make this feature more useful: one that captures a nonlinear rest advantage and one that captures a relative rest advantage (difference between the two teams). Implement both in pseudocode.
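
An illustrative pair of transformations (the log transform and the 10-day cap are assumptions about where rest benefits saturate, not established facts):

```python
import numpy as np
import pandas as pd

def rest_features(df: pd.DataFrame) -> pd.DataFrame:
    """df has hypothetical columns home_rest_days and away_rest_days."""
    out = df.copy()
    # Nonlinear: diminishing returns to extra rest beyond ~10 days
    out["home_rest_log"] = np.log1p(out["home_rest_days"].clip(upper=10))
    # Relative: the home team's rest edge over the away team
    out["rest_edge"] = out["home_rest_days"] - out["away_rest_days"]
    return out

g = pd.DataFrame({"home_rest_days": [7, 10, 4],
                  "away_rest_days": [7, 6, 10]})
print(rest_features(g)[["home_rest_log", "rest_edge"]])
```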

Exercise 10. Explain the purpose of standardization (z-score normalization) and min-max scaling for features in sports prediction models. A feature "total yards per game" has a mean of 345 and a standard deviation of 42 across the league. A team has a value of 410. Compute the z-score and the min-max scaled value (assuming a league range of 260 to 430). Discuss which scaling method is more appropriate for tree-based models versus linear models, and why.
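
The arithmetic for the two scaled values can be checked directly:

```python
mean, std = 345.0, 42.0   # league mean and standard deviation
lo, hi = 260.0, 430.0     # assumed league min and max
x = 410.0                 # the team's yards per game

z = (x - mean) / std      # standard deviations above the league mean
mm = (x - lo) / (hi - lo) # position within the league range, on [0, 1]
print(round(z, 3), round(mm, 3))  # 1.548 0.882
```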

Exercise 11. Design a feature that captures "schedule difficulty adjusted performance" for an NBA team. Specifically, create a feature that adjusts a team's offensive rating by the quality of defenses they have faced so far in the season. Write out the mathematical formula, explain how you would handle the circularity problem (team A's rating depends on team B's rating, which depends on team A's rating), and describe a practical iterative solution.

Exercise 12. A sports betting dataset has 15% missing values in the "starting quarterback" injury status column. Describe four approaches to handling this missingness: (a) deletion, (b) mode imputation, (c) creating a "missing" indicator feature, and (d) model-based imputation. For each, explain the implementation, the assumption about the missingness mechanism (MCAR, MAR, or MNAR), and the risk to model performance. Which approach do you recommend and why?

Part C: Feature Construction and Selection (Exercises 13-18)

Exercise 13. Write a Python function that takes a DataFrame of NFL play-by-play data (with columns game_id, posteam, epa, down, qtr, score_differential) and returns a DataFrame of game-level features including: (a) EPA/play on early downs (1st and 2nd), (b) EPA/play in the first half, (c) EPA/play when leading by 7 or fewer points, and (d) pass rate over expected. Include proper docstrings and type hints.

Exercise 14. Implement a recursive feature elimination (RFE) pipeline in scikit-learn that starts with 30 candidate features for an NFL spread prediction model, uses a Random Forest as the base estimator, and selects the optimal feature subset via cross-validation. Write complete Python code including the import statements, the RFE setup with 5-fold cross-validation, and a visualization of the cross-validation score versus the number of features retained.

Exercise 15. Construct five interaction features for an NBA totals (over/under) model. For each interaction, explain the basketball logic behind why the interaction matters (e.g., "pace x offensive efficiency captures expected possessions multiplied by points per possession"). Write Python code that creates these interactions from a DataFrame of team-game-level statistics.

Exercise 16. Explain the concept of target encoding and describe the "leave-one-out" variant that mitigates overfitting. Implement a target encoder in Python that encodes the categorical feature "referee" for an NFL spread-covering model. The encoder should compute the historical cover rate for each referee, apply additive smoothing with a global prior, and use leave-one-out within each cross-validation fold. Include test code that demonstrates the encoder on synthetic data.
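
A minimal core for such an encoder (the cross-validation fold handling asked for in the exercise is omitted; `prior_weight` is an assumed smoothing hyperparameter):

```python
import pandas as pd

def loo_target_encode(cat: pd.Series, y: pd.Series,
                      prior_weight: float = 20.0) -> pd.Series:
    """Leave-one-out target encoding with additive smoothing toward
    the global mean; each row's own outcome is excluded from its
    category's average."""
    global_mean = y.mean()
    grp = y.groupby(cat)
    loo_sum = grp.transform("sum") - y        # drop this row's outcome
    loo_cnt = grp.transform("count") - 1
    return (loo_sum + prior_weight * global_mean) / (loo_cnt + prior_weight)

# Synthetic demo: three referees, binary cover outcomes
ref = pd.Series(["A", "A", "A", "B", "B", "C"])
cov = pd.Series([1, 1, 0, 0, 0, 1])
enc = loo_target_encode(ref, cov, prior_weight=2.0)
```

Note that the singleton category "C" falls back almost entirely on the global prior, which is exactly the behavior the smoothing is there to provide.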

Exercise 17. A modeler has created 120 candidate features for an NBA moneyline prediction model. Describe a three-stage feature selection pipeline: (a) Stage 1 uses variance thresholding to remove near-constant features, (b) Stage 2 uses mutual information to reduce to the top 50 features, and (c) Stage 3 uses L1-penalized logistic regression (Lasso) for final selection. Write the complete scikit-learn pipeline code, explain the hyperparameters you would tune, and describe how you would validate that the selected features generalize to unseen data.

Exercise 18. Design a "momentum" feature suite for MLB betting that captures a team's recent trajectory. Include: (a) a weighted win-loss record over the last 10 games (more recent games weighted higher), (b) a run differential trend (slope of run differential over the last 15 games), (c) a bullpen fatigue index (innings pitched by relievers in the last 3 days), and (d) a home/away split differential. Write Python code that computes all four features from a game log DataFrame.
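
Sketches for two of the four features, (a) and (b), under assumed inputs (a most-recent-last win indicator series and a run-differential series; the half-life of 5 games is an arbitrary choice):

```python
import numpy as np
import pandas as pd

def weighted_record(wins: pd.Series, halflife: float = 5.0) -> float:
    """Exponentially weighted win rate, most recent game weighted highest."""
    w = 0.5 ** (np.arange(len(wins))[::-1] / halflife)  # newest -> weight 1
    return float(np.average(wins, weights=w))

def run_diff_slope(run_diff: pd.Series) -> float:
    """OLS slope of run differential across the trailing games."""
    x = np.arange(len(run_diff))
    return float(np.polyfit(x, run_diff, 1)[0])

wins = pd.Series([0, 1, 1, 0, 1, 1, 1, 0, 1, 1])          # last 10 games
rd = pd.Series([-2, 1, 3, -1, 2, 4, 5, -2, 3, 6,
                1, 2, 4, 5, 3])                           # last 15 games
print(weighted_record(wins), run_diff_slope(rd))
```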

Part D: Temporal Features and Leakage Prevention (Exercises 19-24)

Exercise 19. Build a "point-in-time" feature generation framework. Write a Python class called PointInTimeFeatureStore that, given a date, returns only the features that would have been available as of that date. The class should maintain a dictionary of feature DataFrames keyed by date and provide a method get_features(team, date) that filters appropriately. Demonstrate with a test case showing that features from future games are never included.
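
A minimal sketch of the class interface (a strict `<` comparison on the as-of date is what excludes same-day and future rows; a production version would index rather than rebuild the DataFrame per query):

```python
import pandas as pd

class PointInTimeFeatureStore:
    """Feature rows keyed by (team, as_of date); queries return only
    rows dated strictly before the requested date."""

    def __init__(self) -> None:
        self._rows: list[dict] = []

    def add(self, team: str, as_of: str, **features) -> None:
        self._rows.append({"team": team,
                           "as_of": pd.Timestamp(as_of), **features})

    def get_features(self, team: str, date: str) -> pd.DataFrame:
        df = pd.DataFrame(self._rows)
        mask = (df["team"] == team) & (df["as_of"] < pd.Timestamp(date))
        return df.loc[mask].sort_values("as_of")

store = PointInTimeFeatureStore()
store.add("KC", "2024-09-08", epa=0.10)
store.add("KC", "2024-09-15", epa=0.14)
store.add("KC", "2024-09-22", epa=0.08)  # same-day: must be excluded below
feats = store.get_features("KC", "2024-09-22")
```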

Exercise 20. Explain the concept of "look-ahead bias" in sports model backtests. A researcher reports a model that achieves 56% accuracy against NFL spreads. Describe five specific ways look-ahead bias could have inflated this result. For each, explain the mechanism and propose a concrete mitigation strategy.

Exercise 21. Construct a set of features that capture "regime changes" during an NFL season, such as a starting quarterback injury, a coaching change, or a trade deadline acquisition. Describe the data sources you would need, the feature representation (binary indicators, time-since-event, or estimated point impact), and how you would handle the fact that regime changes are rare events. Write pseudocode for a function that detects quarterback changes from play-by-play data.

Exercise 22. Implement a feature that captures the "market signal" for an NFL game without introducing leakage. Specifically, create a feature based on the opening line (available Sunday evening for the following week's games) that captures the market's initial assessment of team strength. Explain why using the closing line as a feature would constitute leakage for a model whose predictions need to be available before the closing line is set. Write Python code that merges opening line data with your feature DataFrame using proper temporal joins.

Exercise 23. Design an experiment to measure the marginal predictive value of adding weather features to an NFL totals model. Describe the experimental setup (control model, treatment model, evaluation metric, sample size), the specific weather features you would include (temperature, wind speed, precipitation, humidity, indoor/outdoor indicator), and the statistical test you would use to determine if the improvement is significant. What minimum sample size do you need to detect a 0.5% improvement in Brier score with 80% power?

Exercise 24. A sports betting model uses "opponent-adjusted statistics" as features. Explain the bootstrapping problem: to compute opponent-adjusted stats for Week 5, you need opponent quality ratings, which are themselves based on games played through Week 4, which depend on the quality of those opponents' opponents, and so on. Implement a convergent iterative algorithm in Python that resolves this circularity for a simplified set of 6 teams over 5 weeks. Demonstrate convergence by plotting the rating change across iterations.
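
The iterative core can be sketched on a synthetic round-robin of 6 teams (margins are made up; each team's rating is its average margin adjusted by current opponent ratings, repeated to a fixed point; the convergence plot asked for in the exercise is omitted):

```python
import numpy as np

# Synthetic schedule: results[i] lists (opponent, margin) for team i
results = {
    0: [(1, 7), (2, 3), (3, -4), (4, 10), (5, 2)],
    1: [(0, -7), (2, 6), (3, 1), (4, -3), (5, 8)],
    2: [(0, -3), (1, -6), (3, 5), (4, 2), (5, -1)],
    3: [(0, 4), (1, -1), (2, -5), (4, 7), (5, 3)],
    4: [(0, -10), (1, 3), (2, -2), (3, -7), (5, 4)],
    5: [(0, -2), (1, -8), (2, 1), (3, -3), (4, -4)],
}

ratings = {t: 0.0 for t in results}
for it in range(50):
    # Re-estimate each rating as opponent-adjusted average margin
    new = {t: float(np.mean([margin + ratings[opp]
                             for opp, margin in games]))
           for t, games in results.items()}
    delta = max(abs(new[t] - ratings[t]) for t in results)
    ratings = new
    if delta < 1e-6:   # fixed point reached
        break
print(it, ratings)
```

Because opponent-quality errors shrink with each pass, the per-iteration change `delta` decays geometrically and the loop terminates well before the 50-iteration cap on this example.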

Part E: Advanced Applications (Exercises 25-30)

Exercise 25. Build a complete feature engineering pipeline for an NBA spread prediction model. The pipeline should: (a) ingest raw game-level data, (b) compute rolling team-level efficiency metrics (offensive rating, defensive rating, pace, effective FG%), (c) create matchup-specific features (e.g., difference in pace, ratio of three-point attempt rates), (d) add situational features (back-to-back, travel distance, rest days), and (e) output a single modeling-ready DataFrame. Write the full Python code using pandas, include type hints and docstrings, and ensure no temporal leakage.

Exercise 26. Implement a feature importance analysis for an NFL betting model using three methods: (a) permutation importance, (b) SHAP values, and (c) drop-column importance. Write the code for all three methods, run them on a trained XGBoost model with at least 15 features, and compare the rankings. Discuss why the three methods might disagree and which you would trust most for feature selection decisions.

Exercise 27. Design a feature that captures "public perception bias" using line movement data. The hypothesis is that when the line moves toward the popular side (e.g., the public favorite), the unpopular side gains value. Create a feature that quantifies this effect using opening line, closing line, and public betting percentage data. Write Python code that computes the feature, and outline an experiment to test whether this feature has predictive power for ATS outcomes.

Exercise 28. Construct a "team identity" feature set that captures playing style rather than quality. For NBA teams, create features that measure (a) pace preference (possessions per 48 minutes relative to league average), (b) three-point orientation (share of field goal attempts from three), (c) paint reliance (points in the paint as a fraction of total points), and (d) defensive aggressiveness (steal rate + block rate). Explain how playing-style features can improve predictions even after accounting for team quality metrics. Write the code and demonstrate on synthetic data.
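
A sketch of the four style features on synthetic team-season data (column names are assumed; `pace_pref` is centered on the league average so quality-neutral style differences stand out):

```python
import numpy as np
import pandas as pd

def style_features(df: pd.DataFrame) -> pd.DataFrame:
    """df: one row per team with pace, fg3a, fga, paint_pts, pts,
    stl_rate, and blk_rate columns (illustrative names)."""
    out = pd.DataFrame(index=df.index)
    out["pace_pref"] = df["pace"] - df["pace"].mean()   # vs league average
    out["three_orient"] = df["fg3a"] / df["fga"]        # share of FGA from 3
    out["paint_reliance"] = df["paint_pts"] / df["pts"]
    out["def_aggression"] = df["stl_rate"] + df["blk_rate"]
    return out

# Synthetic 30-team league
rng = np.random.default_rng(0)
teams = pd.DataFrame({
    "pace": rng.normal(99, 2, 30), "fg3a": rng.normal(35, 4, 30),
    "fga": rng.normal(88, 3, 30), "paint_pts": rng.normal(48, 5, 30),
    "pts": rng.normal(113, 5, 30), "stl_rate": rng.normal(0.08, 0.01, 30),
    "blk_rate": rng.normal(0.05, 0.01, 30),
})
style = style_features(teams)
```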

Exercise 29. Implement a complete feature store for a multi-sport betting operation that handles NFL, NBA, and MLB. The feature store should support: (a) sport-specific feature definitions, (b) automatic computation on a schedule, (c) point-in-time queries with no leakage, (d) feature versioning, and (e) metadata tracking (feature name, description, author, creation date). Write the Python class definitions and demonstrate with at least two features per sport.

Exercise 30. Design and implement an automated feature discovery system that generates candidate features from a base set of team statistics. The system should: (a) generate all pairwise ratios and differences from a list of base metrics, (b) create rolling windows at 3, 5, and 10 game horizons for each, (c) apply mutual information scoring to rank the generated features against a binary target, (d) select the top K features, and (e) test for multicollinearity using the variance inflation factor (VIF) and remove redundant features. Write the complete Python code and run it on a synthetic dataset of 500 games with 10 base metrics.