Chapter 20 Exercises: Modeling College Sports
Part A: Foundational Concepts (Exercises 1-6)
Exercise 1. Explain why the sparse connectivity of college football's schedule creates a fundamentally different modeling challenge compared to the NFL. If there are 133 FBS teams and each plays 12 games, calculate the fraction of all possible pairings that are actually observed in a single season. Compare this to the NFL.
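As a starting point for checking the arithmetic in Exercise 1, the sketch below counts observed versus possible pairings. It assumes every one of a team's 12 games is against a distinct FBS opponent (real schedules include FCS games and occasional rematches). The NFL inputs (32 teams, 17 games, 14 distinct opponents because each team plays its three division rivals twice) are supplied here as assumptions rather than taken from the exercise, and the function name is hypothetical.

```python
from math import comb


def observed_pairing_fraction(n_teams: int, distinct_opponents: int) -> float:
    """Fraction of all possible team pairings that actually meet in one season."""
    # Each observed pairing is counted once per team, so divide by two.
    observed = n_teams * distinct_opponents / 2
    possible = comb(n_teams, 2)
    return observed / possible


# Assumption: 12 distinct FBS opponents per FBS team.
fbs_fraction = observed_pairing_fraction(133, 12)
# Assumption: 17 NFL games against 14 distinct opponents.
nfl_fraction = observed_pairing_fraction(32, 14)

print(f"FBS: {fbs_fraction:.1%} of pairings observed")
print(f"NFL: {nfl_fraction:.1%} of pairings observed")
```

Under these assumptions the fraction simplifies to distinct_opponents / (n_teams - 1), which makes the gap between the two leagues easy to see by hand.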
Exercise 2. Define the concept of a margin-based power rating. Write out the mathematical formulation for predicting the expected margin of a game between team i (home) and team j (away), including the home-field advantage parameter. Explain why the average power rating is typically normalized to zero.
Exercise 3. A team has a power rating of +8.5 and plays an opponent with a rating of -3.2. The home-field advantage is 3.0 points. Compute the predicted margin and convert it to a win probability using a logistic function with sigma = 14.0. What would the win probability be at a neutral site?
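The calculation in Exercise 3 can be checked with a sketch like the one below. It assumes the rated team is the home side and uses one common logistic convention, p = 1 / (1 + exp(-margin / sigma)). The chapter's own formula may use a different base or scaling, so treat the function names and the exact form as assumptions.

```python
import math


def predicted_margin(home_rating, away_rating, hfa=3.0, neutral=False):
    """Expected margin from the home team's perspective."""
    return home_rating - away_rating + (0.0 if neutral else hfa)


def win_probability(margin, sigma=14.0):
    """Assumed logistic link: 1 / (1 + exp(-margin / sigma))."""
    return 1.0 / (1.0 + math.exp(-margin / sigma))


home_margin = predicted_margin(8.5, -3.2)
neutral_margin = predicted_margin(8.5, -3.2, neutral=True)
home_prob = win_probability(home_margin)
neutral_prob = win_probability(neutral_margin)

print(f"Home:    margin {home_margin:.1f}, win probability {home_prob:.3f}")
print(f"Neutral: margin {neutral_margin:.1f}, win probability {neutral_prob:.3f}")
```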
Exercise 4. Explain the concept of margin capping in college power ratings. Why is it necessary to cap margins at 24-28 points? Calculate the distortion that would occur if an uncapped 63-0 blowout against an FCS team were included at full value for a team with a true rating of +15.
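One rough way to size the distortion in Exercise 4 is sketched below. It rests on strong simplifying assumptions that are not from the chapter: the rating is a plain average of per-game margins over a 12-game season, and the FCS opponent is treated as an average (rating 0) team at a neutral site. A full least-squares system would spread the effect differently, so read the result as an order-of-magnitude check only. The function name and the 28-point cap value used here are illustrative.

```python
def implied_rating_shift(margin, true_rating, n_games=12, cap=None):
    """Shift in a simple season-average rating caused by one game's margin."""
    # Clamp the margin to [-cap, cap] when a cap is supplied.
    used = margin if cap is None else max(-cap, min(cap, margin))
    # The game "should" have produced the true rating on average.
    return (used - true_rating) / n_games


uncapped_shift = implied_rating_shift(63, 15.0)
capped_shift = implied_rating_shift(63, 15.0, cap=28)

print(f"Uncapped: rating inflated by about {uncapped_shift:.2f} points")
print(f"Capped at 28: rating inflated by about {capped_shift:.2f} points")
```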
Exercise 5. Describe the cold-start problem in college football ratings. Entering Week 1, a model has zero current-season data points for any team. By Week 4, it has only 3-4 games per team. Explain how regression to the conference mean addresses this problem and describe the Bayesian interpretation of this approach.
Exercise 6. The blue-chip ratio is defined as the fraction of a team's roster composed of 4-star and 5-star recruits. If a team has 85 scholarship players, of whom 12 are 5-star and 28 are 4-star, calculate their blue-chip ratio. Based on the historical threshold of approximately 50% for championship contention, is this team a plausible championship contender?
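The ratio in Exercise 6 follows directly from the definition given in the exercise. A minimal sketch, with a hypothetical function name and the 50% threshold taken from the exercise text:

```python
def blue_chip_ratio(five_star: int, four_star: int, roster_size: int) -> float:
    """Share of the roster made up of 4-star and 5-star recruits."""
    return (five_star + four_star) / roster_size


ratio = blue_chip_ratio(12, 28, 85)
meets_threshold = ratio >= 0.50  # approximate historical threshold from the exercise

print(f"Blue-chip ratio: {ratio:.1%}, meets 50% threshold: {meets_threshold}")
```

The threshold is described as approximate, so a ratio just under 50% calls for judgment rather than a hard yes or no.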
Part B: Data and Feature Engineering (Exercises 7-12)
Exercise 7. Design a feature set for a college football power rating model that combines on-field performance with preseason information. Include at least six features and for each, specify whether it is available preseason, whether it becomes more reliable as the season progresses, and approximately how many games are needed before the feature stabilizes.
Exercise 8. Write pseudocode for a rolling power rating update that incorporates both prior information and new game results. Start with preseason priors based on returning production and recruiting talent. After each week, update the ratings using the new game margin, weighted by the current blend of prior and data. Specify the blending schedule across a 12-week season.
Exercise 9. The 247Sports Composite assigns each recruit a score on a 0-1 scale. Explain how you would compute a lag-weighted recruiting composite for a team entering the 2024 season, using recruiting classes from 2020-2024. Apply the standard lag weights (0.10, 0.25, 0.30, 0.25, 0.10), in class-year order, to the following hypothetical team-level composite scores (expressed on a 0-100 scale): 2020=82, 2021=87, 2022=91, 2023=85, 2024=89.
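The weighted sum in Exercise 9 can be verified with a short sketch. It assumes the weights are applied in the order listed, so 2020 receives 0.10 and 2022 receives 0.30, and it treats the scores as team-level values on the scale given in the exercise. The variable names are hypothetical.

```python
# Class year -> (lag weight, hypothetical team composite score)
classes = {
    2020: (0.10, 82),
    2021: (0.25, 87),
    2022: (0.30, 91),
    2023: (0.25, 85),
    2024: (0.10, 89),
}

total_weight = sum(weight for weight, _ in classes.values())
lag_weighted = sum(weight * score for weight, score in classes.values())

# Normalizing guards against weights that do not sum exactly to one.
lag_weighted_composite = lag_weighted / total_weight

print(f"Lag-weighted composite: {lag_weighted_composite:.2f}")
```

The weights peak on the class that is three years into the program, which reflects the idea that those players are most likely to be contributing starters.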
Exercise 10. Coaching changes trigger transfer portal activity. Design a metric that captures the net talent change during a coaching transition. Specify what data you would need, how you would weight incoming vs. outgoing transfers, and how you would account for the quality uncertainty of portal transfers (who have a higher bust rate than high school recruits).
Exercise 11. Non-conference games are the Rosetta Stone of college sports modeling because they connect otherwise isolated conference clusters. However, they are also the noisiest data. List five reasons why non-conference games are noisy and for each, propose a method to mitigate the noise in your model.
Exercise 12. Construct a strength-of-schedule metric for college football that accounts for the conference insularity problem. Your metric should: (a) use opponent power ratings rather than opponent win-loss records, (b) adjust for home/away/neutral venue, (c) down-weight FCS games, and (d) provide a schedule-adjusted performance rating. Write the formula explicitly.
Part C: Model Building (Exercises 13-18)
Exercise 13. Implement a least-squares power rating system for a 50-team league using ridge regression. The ridge penalty serves as the conference regression prior. Write the objective function and explain how the regularization parameter controls the regression-to-mean strength. What happens when lambda is very large versus very small?
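A possible starting point for Exercise 13 is sketched below using NumPy's normal-equations solve. Everything about the data is synthetic and assumed: a random 12-round schedule, true ratings drawn with a 10-point spread, a 3-point home edge, and 14 points of game noise. The penalty here shrinks all ratings toward zero (a single-league prior) and leaves the home-field term unpenalized. Shrinking toward separate conference means would need an extra step.

```python
import numpy as np


def fit_ridge_ratings(home_idx, away_idx, margins, n_teams, lam):
    """Minimize sum (margin - (r_home - r_away + hfa))^2 + lam * sum r^2."""
    n_games = len(margins)
    X = np.zeros((n_games, n_teams + 1))
    rows = np.arange(n_games)
    X[rows, home_idx] = 1.0   # home team enters with +1
    X[rows, away_idx] = -1.0  # away team enters with -1
    X[:, -1] = 1.0            # last column estimates home-field advantage
    penalty = lam * np.eye(n_teams + 1)
    penalty[-1, -1] = 0.0     # do not shrink the home-field term
    beta = np.linalg.solve(X.T @ X + penalty, X.T @ margins)
    return beta[:-1], beta[-1]


# Synthetic 50-team league (all values below are illustrative assumptions).
rng = np.random.default_rng(0)
n_teams = 50
true_ratings = rng.normal(0.0, 10.0, n_teams)
home_parts, away_parts = [], []
for _ in range(12):  # 12 rounds, every team plays once per round
    order = rng.permutation(n_teams)
    home_parts.append(order[:25])
    away_parts.append(order[25:])
home = np.concatenate(home_parts)
away = np.concatenate(away_parts)
margins = true_ratings[home] - true_ratings[away] + 3.0 + rng.normal(0.0, 14.0, len(home))

loose_ratings, loose_hfa = fit_ridge_ratings(home, away, margins, n_teams, lam=1.0)
tight_ratings, tight_hfa = fit_ridge_ratings(home, away, margins, n_teams, lam=1e6)

print("Rating spread, lam=1.0:", round(float(loose_ratings.std()), 2))
print("Rating spread, lam=1e6:", round(float(tight_ratings.std()), 6))
```

Comparing the two fits shows the behavior the exercise asks about: a very large lambda collapses every rating toward the prior mean, while a small lambda lets the ratings follow the observed margins, noise included.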
Exercise 14. Build a preseason prediction model for college football that uses only recruiting data and returning production. Specify the features, describe how you would train the model (what is the target variable?), and estimate its predictive accuracy in terms of RMSE against the closing line for Week 1 games.
Exercise 15. A new coach arrives at a program with a power rating of +5.0. The coach implements a scheme change, was hired from outside the program (not promoted from within), and has an estimated coaching ability of 1.0 (above average but not elite). Using the coaching change model from the chapter, compute the Year-1, Year-2, and Year-3 adjusted ratings.
Exercise 16. Design a week-by-week model updating procedure for college football that blends four information sources: (a) preseason recruiting-based priors, (b) previous season's performance (regressed), (c) current-season game results, and (d) market-derived information (lines and totals). Specify the relative weights of each source in Weeks 1, 4, 8, and 12.
Exercise 17. Compare the effectiveness of three different prior specifications for college football ratings at the start of a season: (a) uniform prior (all teams start at zero), (b) conference-mean prior (all teams start at their conference's average), and (c) recruiting-weighted prior (teams start at a function of their talent composite). Describe how you would evaluate which prior produces the best early-season predictions.
Exercise 18. The transfer portal has fundamentally changed college football. Build a model component that estimates the net talent impact of a team's portal activity in the offseason. Use the following data for Team X: 3 outgoing transfers (avg 0.87 composite), 5 incoming transfers (avg 0.91 composite), where the team's average roster composite is 0.88. Estimate the net rating impact.
Part D: Market Analysis (Exercises 19-24)
Exercise 19. College football betting markets are less efficient than NFL markets. List five specific sources of inefficiency in college football and for each, describe the mechanism that creates the inefficiency and the typical magnitude of the edge.
Exercise 20. Analyze the concept of "public bias" in college football betting. Specifically, test the hypothesis that nationally televised teams (top-25 ranked, playing on ESPN/ABC) are overbet by the public, causing the line to move against them. Design a backtest that would distinguish this effect from team quality.
Exercise 21. Early-season lines in college football are set with less information than late-season lines, creating both opportunity and risk. A team opens at -14 in Week 1 based entirely on preseason expectations. After a shaky 24-17 win in Week 1, the line for Week 2 against a similar opponent opens at -10.5. Is the 3.5-point adjustment justified? What factors should you consider?
Exercise 22. Bowl games present unique betting angles. Describe three specific factors that affect bowl game outcomes differently from regular-season games. For each factor, explain the direction of the bias and estimate its point impact. Then design a bowl-game-specific adjustment to your regular-season model.
Exercise 23. Overnight lines in college football (released the day after the previous week's games) sometimes differ significantly from the closing line a week later. Consider a hypothetical overnight line of Team A -7 that moves to -4.5 by game time. What explanations could account for this movement? Which explanations represent genuine information and which represent noise?
Exercise 24. Compare the efficiency of the college football point-spread market to the college football totals market. Using the arguments from Chapter 16 about information aggregation, explain which market you would expect to be less efficient and why. Design a simple model-based approach to exploit the less efficient market.
Part E: Advanced Applications (Exercises 25-30)
Exercise 25. Build a Bayesian hierarchical model for college football where team ratings are nested within conferences. The conference-level prior determines the mean and variance of team ratings within each conference. Write out the hierarchical structure, specify appropriate prior distributions, and explain how this naturally handles the cross-conference comparison problem.
Exercise 26. The twelve-team College Football Playoff introduces new modeling considerations. Design a simulation framework that takes regular-season power ratings and simulates the entire playoff bracket, including home-field advantage for higher seeds in first-round games, neutral-site semifinals, and the championship game. Run 10,000 simulations to estimate each team's championship probability.
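One way to begin Exercise 26 is the Monte Carlo sketch below. It assumes a fixed bracket with no reseeding, byes for seeds 1-4, higher-seed home field only in the first round, and neutral sites from the quarterfinals onward. Game outcomes are drawn as rating difference plus normal noise, and the default home edge (3.0), noise level (14.0), and example ratings are illustrative assumptions rather than values from the chapter.

```python
import numpy as np


def simulate_playoff(ratings, hfa=3.0, sigma=14.0, n_sims=10_000, seed=0):
    """Return championship probabilities; ratings[0] belongs to seed 1."""
    rng = np.random.default_rng(seed)
    ratings = np.asarray(ratings, dtype=float)
    titles = np.zeros(12, dtype=int)

    def play(a, b, home_edge=0.0):
        # Team a wins when the simulated margin is positive.
        margin = ratings[a] - ratings[b] + home_edge + rng.normal(0.0, sigma)
        return a if margin > 0 else b

    for _ in range(n_sims):
        # First round on the higher seed's field: 5v12, 6v11, 7v10, 8v9.
        w5 = play(4, 11, hfa)
        w6 = play(5, 10, hfa)
        w7 = play(6, 9, hfa)
        w8 = play(7, 8, hfa)
        # Quarterfinals (assumed neutral): byes meet first-round winners.
        q1 = play(0, w8)
        q2 = play(1, w7)
        q3 = play(2, w6)
        q4 = play(3, w5)
        # Neutral-site semifinals and championship game.
        s1 = play(q1, q4)
        s2 = play(q2, q3)
        titles[play(s1, s2)] += 1
    return titles / n_sims


# Hypothetical ratings for seeds 1 through 12.
example_ratings = np.linspace(24.0, 8.0, 12)
title_probs = simulate_playoff(example_ratings)
for seed_number, prob in enumerate(title_probs, start=1):
    print(f"Seed {seed_number:2d}: {prob:.3f}")
```

With 10,000 simulations the standard error on a 20% probability is roughly 0.4 percentage points, which is worth keeping in mind when comparing teams with similar estimates.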
Exercise 27. Conference realignment (e.g., Texas and Oklahoma joining the SEC) creates a structural break in conference strength data. Design a method for adjusting your conference priors when teams change conferences. How would you initialize the rating for a team that just moved from the Big 12 to the SEC? What happens to the Big 12's conference prior when its two strongest teams leave?
Exercise 28. Build a regression model that predicts the impact of a coaching change in college football using the following features: (a) outgoing coach's tenure length, (b) incoming coach's previous winning percentage, (c) whether a scheme change occurs, (d) net transfer portal talent change, and (e) the team's recruiting rank in the year of the change. Use synthetic data that mirrors historical patterns and report the model's R-squared and the most important feature.
Exercise 29. Design and backtest a college football betting system that operates from Week 1 through the bowl season. The system should: (a) start with recruiting-based preseason priors, (b) update power ratings weekly, (c) incorporate coaching change adjustments, (d) identify games where the model-market discrepancy exceeds a threshold, (e) apply Kelly-criterion bet sizing, and (f) report season-long ROI with confidence intervals.
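For step (e) of Exercise 29, the standard Kelly formula f = (b*p - q) / b can be coded as in the sketch below, where b is the net payout per unit staked, p is the model's win probability, and q = 1 - p. The default price of -110 and the optional fractional-Kelly multiplier are assumptions added for illustration, and the function name is hypothetical.

```python
def kelly_fraction(win_prob, american_odds=-110, kelly_multiplier=1.0):
    """Fraction of bankroll to stake; zero when the bet has no positive edge."""
    # Convert American odds to net payout per unit staked.
    if american_odds < 0:
        b = 100.0 / -american_odds
    else:
        b = american_odds / 100.0
    edge = b * win_prob - (1.0 - win_prob)
    return max(0.0, kelly_multiplier * edge / b)


full_kelly = kelly_fraction(0.55)
half_kelly = kelly_fraction(0.55, kelly_multiplier=0.5)
no_bet = kelly_fraction(0.50)

print(f"Full Kelly at 55% and -110: {full_kelly:.3f} of bankroll")
print(f"Half Kelly at 55% and -110: {half_kelly:.4f} of bankroll")
print(f"Stake at 50% and -110: {no_bet:.3f}")
```

Because the win probability comes from a model with its own estimation error, many bettors stake only a fraction of full Kelly, which is the reason for the multiplier argument.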
Exercise 30. The college basketball market differs from college football in several important ways: more games per season (30+ versus 12), more teams (363 D-I versus 133 FBS), and more random outcomes (lower scoring, less separation between teams). Adapt the college football power rating framework from this chapter to college basketball. Identify three specific parameters or assumptions that must change and explain why.