Chapter 20: Key Takeaways - Modeling College Sports

  1. The sparse connectivity of college football is the defining modeling challenge. With 133 FBS teams playing 12-13 games each (mostly within their conference), most team pairs never meet directly. Cross-conference comparisons must be inferred through a handful of non-conference results, making conference strength estimation both critical and inherently uncertain. This is fundamentally different from the NFL, where a 32-team league and a dense schedule link nearly any two teams directly or through common opponents.

  2. Margin-based power ratings with conference regression form the foundation. Least-squares power ratings that predict the margin of victory provide a natural framework for college sports. The addition of Bayesian regression toward conference means addresses the cold-start problem at the beginning of each season, ensuring that early-season ratings are sensible even with minimal data. The regression strength should decrease as the season progresses and data accumulates.

  3. Margin capping is essential to prevent blowout distortion. College football produces far more lopsided results than professional sports, and uncapped margins allow a single 56-0 win to distort a team's rating. Capping margins at 24-28 points retains the information value of dominant victories while preventing garbage-time scoring and FCS blowouts from contaminating the ratings.

  4. Recruiting data is the most powerful preseason predictor. The 247Sports Composite and the blue-chip ratio are among the strongest available predictors of college football success. A lag-weighted recruiting composite that emphasizes classes signed 2-3 years ago (whose recruits are now upperclassmen) provides a talent signal that has no parallel in professional sports. A blue-chip ratio above 50% marks a team as a potential championship contender, though it does not guarantee contention; teams below this threshold face a hard ceiling regardless of coaching quality.

  5. Coaching changes produce large effects that are predictable on average but highly variable from case to case. The average Year-1 decline of 1.5-2.5 power rating points is substantial, but the variance around this average is enormous. Scheme changes, the source of the hire (internal promotion vs. external), and the outgoing coach's tenure all moderate the effect. In the transfer portal era, the coaching change model must also account for the net talent flow during the coaching transition.

  6. The transfer portal has fundamentally altered roster dynamics. Pre-2018 college football models assumed gradual roster evolution through recruiting and development. The transfer portal enables immediate, significant roster changes, making offseason talent tracking essential. Net portal talent flow -- the quality of incoming transfers minus outgoing transfers -- is now a critical predictive feature, particularly for teams undergoing coaching changes.

  7. Early-season markets offer the largest edges but also the highest variance. The combination of information scarcity, stale preseason narratives, and non-conference scheduling creates wider model-market discrepancies in Weeks 1-4 than at any other point in the season. A modeler with superior preseason priors (from recruiting data, portal tracking, and coaching analysis) can exploit these discrepancies, but the small sample means individual bet outcomes are highly variable.

  8. Public bias creates systematic inefficiency in college football markets. Nationally televised, historically prominent programs (Alabama, Ohio State, USC) attract disproportionate public betting action, inflating their lines by 0.5-1.5 points. This creates persistent, small value on their opponents, particularly for lesser-known teams in mid-major conferences that the public underestimates.

  9. Conference strength estimation is circular and contentious. Measuring conference strength requires knowing team strength, which requires knowing conference strength. The model resolves this circularity through simultaneous estimation, but the result is only as reliable as the cross-conference data that connects the conference clusters. A single upset in a prominent non-conference game can ripple through an entire conference's ratings, creating both risk and opportunity.

  10. Bowl games require model adjustments for motivation, preparation time, and roster attrition. The standard regular-season model overweights some factors (recent form, home-field) and underweights others (preparation time favoring underdogs, opt-outs by NFL-bound players) when applied to bowl games. A bowl-specific adjustment layer improves prediction accuracy and can identify systematic biases in the market.

  11. Week-by-week model updating with evolving prior weights is essential. A static model that weights all full-season data equally cannot match a dynamic model that starts with strong priors and gradually shifts weight to current-season results. The optimal blend moves from roughly 70/30 prior/data in Week 1 to 20/80 by Week 10. This Bayesian updating framework leans on whichever source is more reliable at each stage: the prior early in the season, the accumulated data late.

  12. The college football market is significantly less efficient than the NFL market, creating genuine opportunities for quantitative bettors. The combination of a large team pool, sparse data, annual roster turnover, passionate but irrational public betting, and coaching change uncertainty means that systematic model-based approaches can achieve edges of 2-5% against the closing line -- an advantage that is largely unavailable in the more efficient professional markets.
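
Two of the takeaways above carry concrete numbers that are easy to sketch in code: the margin cap from takeaway 3 and the prior/data blend from takeaway 11. The following is a minimal illustration, not the chapter's implementation. The function names, the default cap of 28 points (the top of the 24-28 range), and the linear interpolation between the Week 1 and Week 10 weights are all assumptions made for this example; the chapter states only the endpoints of the schedule.

```python
def cap_margin(margin: float, cap: float = 28.0) -> float:
    """Clip a signed margin of victory to the range [-cap, +cap].

    The 28-point default is an assumption taken from the top of the
    24-28 point range mentioned in takeaway 3.
    """
    return max(-cap, min(cap, margin))


def prior_weight(week: int, start: float = 0.70, end: float = 0.20,
                 end_week: int = 10) -> float:
    """Weight placed on the preseason prior in a given week.

    Moves from `start` in Week 1 to `end` at `end_week` and stays there.
    The linear path between the two endpoints is an assumption; only
    the endpoints (70/30 and 20/80) come from takeaway 11.
    """
    if week <= 1:
        return start
    if week >= end_week:
        return end
    frac = (week - 1) / (end_week - 1)
    return start + frac * (end - start)


def blended_rating(prior_rating: float, season_rating: float, week: int) -> float:
    """Weighted average of the preseason prior and the in-season rating."""
    w = prior_weight(week)
    return w * prior_rating + (1.0 - w) * season_rating


if __name__ == "__main__":
    # A 56-0 blowout is recorded as a capped margin.
    print(cap_margin(56))

    # A team rated +10 in the preseason whose in-season results imply +4.
    for week in (1, 5, 10, 13):
        print(week, round(blended_rating(10.0, 4.0, week), 2))
```

With these assumed defaults, the hypothetical team rated +10 in the preseason and +4 on current-season results drifts from about 8.2 in Week 1 to about 5.2 from Week 10 onward. A real model would tune the cap and the decay schedule against historical results rather than fix them by hand.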