College Football Playoff Projections

Beginner | 10 min read | Nov 27, 2025
# College Football Playoff Projections

## Overview

Predicting College Football Playoff (CFP) selection involves analyzing team performance, strength of schedule, conference championships, and head-to-head results. The expanded 12-team format (in place since the 2024 season) adds automatic bids, at-large bids, and byes, so a model has more outcomes to account for than under the old four-team format.

## CFP Selection Criteria

### Primary Factors

1. **Win-Loss Record**: Overall and conference record
2. **Strength of Schedule**: Quality of opponents faced
3. **Conference Championships**: Automatic bids for the five highest-ranked conference champions
4. **Head-to-Head Results**: Direct matchup outcomes
5. **Common Opponents**: Performance against shared foes
6. **Computer Rankings**: SP+, FPI, and similar metrics (useful as model inputs, even though they are not part of the committee's published protocol)

### 12-Team Playoff Format (2024+)

- **5 Conference Champions**: Automatic bids for the five highest-ranked champions
- **7 At-Large Bids**: Best remaining teams
- **Top 4 Seeds**: Receive first-round byes
- **Seeds 5-12**: Play first-round games

## R Implementation with cfbfastR

```r
library(cfbfastR)
library(dplyr)
library(tidyr)    # pivot_longer()
library(ggplot2)

# cfbfastR reads the CollegeFootballData API key from CFBD_API_KEY,
# e.g. Sys.setenv(CFBD_API_KEY = "your-key")

# Build win-loss records from completed games
team_records <- cfbd_game_info(year = 2023) %>%
  filter(!is.na(home_points), !is.na(away_points)) %>%
  mutate(
    winner = if_else(home_points > away_points, home_team, away_team),
    loser  = if_else(home_points < away_points, home_team, away_team)
  ) %>%
  pivot_longer(cols = c(winner, loser), names_to = "result", values_to = "team") %>%
  group_by(team) %>%
  summarise(
    games = n(),
    wins = sum(result == "winner"),
    losses = sum(result == "loser"),
    win_pct = wins / games
  )

# Get SP+ ratings (predictive power rating)
sp_ratings <- cfbd_ratings_sp(year = 2023)

# Get FPI ratings
fpi_ratings <- cfbd_ratings_fpi(year = 2023)

# Merge all data.
# The FPI column names used here (sos, resumeRanks.strengthOfRecord) may differ
# between cfbfastR versions; check names(fpi_ratings) and adjust if needed.
playoff_data <- team_records %>%
  left_join(sp_ratings %>% select(team, rating), by = "team") %>%
  rename(sp_rating = rating) %>%
  left_join(
    fpi_ratings %>% select(team, fpi, sos, resumeRanks.strengthOfRecord),
    by = "team"
  ) %>%
  # Opponents without SP+/FPI ratings (e.g. FCS teams) cannot be scored
  filter(!is.na(sp_rating), !is.na(fpi))

# Calculate playoff probability features
playoff_features <- playoff_data %>%
  mutate(
    # Rough quality-wins proxy: wins scaled by the team's own SP+ rating
    # (a true quality-wins count needs opponent rankings)
    quality_wins = wins * (sp_rating / 100),
    # Win percentage adjusted by SOS
    adj_win_pct = win_pct * (1 + sos / 100),
    # Combined rating score
    composite_score = (sp_rating + fpi) / 2,
    # Conference championship bonus (simplified: 12+ wins as a stand-in)
    conf_champ_bonus = if_else(wins >= 12, 5, 0)
  )

# Simple playoff probability model: a hand-tuned logistic transform,
# not a regression fitted to historical selections (2014-2023, 4-team format).
# For demonstration: flag teams with 11+ wins and an SP+ rating of at least 15
playoff_features <- playoff_features %>%
  mutate(
    playoff_threshold = (wins >= 11 & sp_rating >= 15),
    playoff_prob = plogis((sp_rating + fpi * 0.5 + wins * 2 - 30) / 10)
  )

# Top playoff contenders
playoff_contenders <- playoff_features %>%
  arrange(desc(playoff_prob)) %>%
  select(team, wins, losses, sp_rating, fpi, sos, playoff_prob) %>%
  head(20)

cat("Top 20 Playoff Probability Rankings:\n")
print(playoff_contenders)

# Visualize playoff probabilities
top_25 <- playoff_features %>%
  arrange(desc(playoff_prob)) %>%
  head(25)

ggplot(top_25, aes(x = reorder(team, playoff_prob), y = playoff_prob * 100)) +
  geom_col(aes(fill = wins >= 12), show.legend = TRUE) +
  # Named vectors keep colors and labels correct even if one group is empty
  scale_fill_manual(
    values = c("FALSE" = "steelblue", "TRUE" = "darkgreen"),
    labels = c("FALSE" = "< 12 wins", "TRUE" = "12+ wins")
  ) +
  coord_flip() +
  labs(
    title = "CFB Playoff Probabilities - Top 25 Teams",
    x = "Team",
    y = "Playoff Probability (%)",
    fill = "Win Total"
  ) +
  theme_minimal()

# Simulate playoff scenarios using Monte Carlo
simulate_playoff_scenarios <- function(teams_df, n_sims = 10000) {
  selections <- setNames(numeric(nrow(teams_df)), teams_df$team)

  for (sim in seq_len(n_sims)) {
    # Draw a noisy score around each team's composite rating, plus the bonus
    sim_score <- rnorm(nrow(teams_df), mean = teams_df$composite_score, sd = 5) +
      teams_df$conf_champ_bonus

    # Take the 12 highest simulated scores as the playoff field
    selected <- head(teams_df$team[order(sim_score, decreasing = TRUE)], 12)

    # Count selections
    selections[selected] <- selections[selected] + 1
  }

  # Share of simulations in which each team made the field, in percent
  selections / n_sims * 100
}

# Run simulation (seeded for reproducibility)
set.seed(42)
simulation_results <- simulate_playoff_scenarios(playoff_features)

simulation_df <- data.frame(
  team = names(simulation_results),
  playoff_pct = as.numeric(simulation_results)
) %>%
  arrange(desc(playoff_pct)) %>%
  head(20)

cat("\nPlayoff Probability (10,000 Simulations):\n")
print(simulation_df)

# Plot simulation results
ggplot(simulation_df, aes(x = reorder(team, playoff_pct), y = playoff_pct)) +
  geom_col(fill = "darkblue", alpha = 0.7) +
  geom_hline(yintercept = 50, linetype = "dashed", color = "red") +
  coord_flip() +
  labs(
    title = "CFB Playoff Selection Probability (Monte Carlo Simulation)",
    x = "Team",
    y = "Playoff Selection Probability (%)",
    subtitle = "Based on 10,000 simulations"
  ) +
  theme_minimal()
```

## Python Implementation

```python
import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import requests

BASE_URL = "https://api.collegefootballdata.com"
# The CFB Data API expects a bearer token. Reading it from an environment
# variable named CFBD_API_KEY is an assumption made for this example.
HEADERS = {"Authorization": f"Bearer {os.environ.get('CFBD_API_KEY', '')}"}


def _get(endpoint, year):
    """Request one season from a CFB Data API endpoint and return the JSON."""
    response = requests.get(f"{BASE_URL}/{endpoint}", params={'year': year},
                            headers=HEADERS, timeout=30)
    response.raise_for_status()
    return response.json()


def get_team_records(year):
    """Fetch team records from CFB Data API"""
    return pd.DataFrame(_get("records", year))


def get_sp_ratings(year):
    """Fetch SP+ ratings"""
    return pd.DataFrame(_get("ratings/sp", year))


def get_fpi_ratings(year):
    """Fetch FPI ratings, flattening nested fields into dotted column names"""
    return pd.json_normalize(_get("ratings/fpi", year))


# Load data for 2023 season
records = get_team_records(2023)
sp_ratings = get_sp_ratings(2023)
fpi_ratings = get_fpi_ratings(2023)

# Prepare team records ('total' holds a dict with wins and losses)
team_records = records[['team', 'total']].copy()
team_records['wins'] = team_records['total'].apply(lambda x: x['wins'])
team_records['losses'] = team_records['total'].apply(lambda x: x['losses'])
team_records['win_pct'] = team_records['wins'] / (team_records['wins'] + team_records['losses'])
team_records = team_records[['team', 'wins', 'losses', 'win_pct']]

# Strength of schedule may arrive flat or nested under resumeRanks;
# accept either layout. If the field is a rank (1 = toughest schedule),
# invert it before treating larger values as better.
fpi_ratings = fpi_ratings.rename(columns={
    'strengthOfSchedule': 'sos',
    'resumeRanks.strengthOfSchedule': 'sos',
})

# Merge datasets
playoff_df = team_records.merge(
    sp_ratings[['team', 'rating']],
    on='team',
    how='left'
).rename(columns={'rating': 'sp_rating'})

playoff_df = playoff_df.merge(
    fpi_ratings[['team', 'fpi', 'sos']],
    on='team',
    how='left'
)

# Drop teams with missing data
playoff_df = playoff_df.dropna()

# Calculate composite features
playoff_df['composite_rating'] = (
    playoff_df['sp_rating'] + playoff_df['fpi']
) / 2

playoff_df['quality_metric'] = (
    playoff_df['win_pct'] * playoff_df['composite_rating']
)


# Simple playoff probability model:
# a hand-tuned logistic function of wins and ratings (not a fitted model)
def calculate_playoff_prob(row):
    """Calculate playoff probability as a percentage"""
    # Weighted combination of factors
    score = (
        row['wins'] * 3 +
        row['sp_rating'] * 0.5 +
        row['fpi'] * 0.3 +
        row['sos'] * 0.2
    )
    # Logistic transformation
    prob = 1 / (1 + np.exp(-(score - 40) / 5))
    return prob * 100


playoff_df['playoff_prob'] = playoff_df.apply(calculate_playoff_prob, axis=1)

# Sort by playoff probability
playoff_df = playoff_df.sort_values('playoff_prob', ascending=False)

print("Top 20 CFB Playoff Contenders - 2023:")
print(playoff_df[['team', 'wins', 'losses', 'sp_rating', 'fpi', 'playoff_prob']].head(20))


# Monte Carlo simulation for playoff selection
def simulate_playoffs(df, n_simulations=10000, n_teams=12, seed=42):
    """
    Simulate playoff selection using Monte Carlo method
    """
    rng = np.random.default_rng(seed)
    teams = df['team'].to_numpy()
    ratings = df['composite_rating'].to_numpy()
    counts = np.zeros(len(df), dtype=int)

    for _ in range(n_simulations):
        # Add random noise to composite rating
        sim_score = ratings + rng.normal(0, 3, size=len(ratings))
        # Select the n_teams highest simulated scores
        selected = np.argsort(-sim_score)[:n_teams]
        # Count selections
        counts[selected] += 1

    # Convert to percentages
    results_df = pd.DataFrame({
        'team': teams,
        'selection_pct': counts / n_simulations * 100
    }).sort_values('selection_pct', ascending=False)

    return results_df


# Run simulation
simulation = simulate_playoffs(playoff_df)

print("\nPlayoff Selection Probability (10,000 Simulations):")
print(simulation.head(20))

# Visualizations
fig, axes = plt.subplots(2, 2, figsize=(16, 12))

# 1. Top 25 playoff probabilities
top_25 = playoff_df.head(25)
axes[0, 0].barh(range(len(top_25)), top_25['playoff_prob'],
                color='darkgreen', alpha=0.7)
axes[0, 0].set_yticks(range(len(top_25)))
axes[0, 0].set_yticklabels(top_25['team'], fontsize=8)
axes[0, 0].set_xlabel('Playoff Probability (%)')
axes[0, 0].set_title('Top 25 Playoff Probabilities (Model-Based)')
axes[0, 0].invert_yaxis()

# 2. Monte Carlo simulation results
top_20_sim = simulation.head(20)
axes[0, 1].barh(range(len(top_20_sim)), top_20_sim['selection_pct'],
                color='steelblue', alpha=0.7)
axes[0, 1].set_yticks(range(len(top_20_sim)))
axes[0, 1].set_yticklabels(top_20_sim['team'], fontsize=8)
axes[0, 1].set_xlabel('Selection Probability (%)')
axes[0, 1].set_title('Playoff Selection Probability (10K Simulations)')
axes[0, 1].axvline(50, color='red', linestyle='--', alpha=0.5)
axes[0, 1].invert_yaxis()

# 3. Wins vs SP+ rating scatter
axes[1, 0].scatter(playoff_df['wins'], playoff_df['sp_rating'],
                   s=playoff_df['playoff_prob'] * 3, alpha=0.6,
                   c=playoff_df['playoff_prob'], cmap='RdYlGn')
axes[1, 0].set_xlabel('Wins')
axes[1, 0].set_ylabel('SP+ Rating')
axes[1, 0].set_title('Wins vs SP+ Rating (size = playoff probability)')
axes[1, 0].grid(alpha=0.3)

# 4. Resume strength comparison
top_contenders = playoff_df.head(15)
x = np.arange(len(top_contenders))
width = 0.35

axes[1, 1].barh(x - width / 2, top_contenders['sp_rating'], width,
                label='SP+ Rating', color='blue', alpha=0.7)
axes[1, 1].barh(x + width / 2, top_contenders['fpi'], width,
                label='FPI Rating', color='orange', alpha=0.7)
axes[1, 1].set_yticks(x)
axes[1, 1].set_yticklabels(top_contenders['team'], fontsize=8)
axes[1, 1].set_xlabel('Rating')
axes[1, 1].set_title('Top 15 Teams - SP+ vs FPI Comparison')
axes[1, 1].legend()
axes[1, 1].invert_yaxis()

plt.tight_layout()
plt.show()
```

## Key Modeling Considerations

### Feature Engineering

1. **Win Quality**: Weight wins by opponent strength
2. **Loss Penalty**: Timing and margin of losses matter
3. **Conference Strength**: Adjust for schedule difficulty
4. **Momentum**: Recent performance trends
5. **Head-to-Head**: Direct comparison tiebreaker

### Model Validation

- Historical accuracy: Test on past playoff selections
- Cross-validation: Train on some seasons and evaluate on held-out seasons
- Calibration: Ensure predicted probabilities match observed selection rates
- Committee bias: Account for subjective factors

### Limitations

- Committee subjectivity (eye test, narratives)
- Conference championship game outcomes are unknown until they are played
- Injury impacts not captured in season stats
- Strength of schedule calculations vary between rating systems

## Practical Applications

1. **Weekly Rankings**: Update probabilities after each game
2. **Scenario Analysis**: "What if" game outcome modeling
3. **Betting Markets**: Compare model to Vegas odds
4. **Resume Building**: Identify must-win games for teams

## Resources

- [ESPN FPI](https://www.espn.com/college-football/fpi)
- [SP+ Ratings](https://www.espn.com/college-football/story/_/id/38819392/college-football-sp+-rankings-week-9)
- [CFP Selection Committee Protocol](https://collegefootballplayoff.com/sports/2016/10/20/selection-committee-protocol.aspx)
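
## Appendix: Sketching the 12-Team Field Selection

The Monte Carlo examples in this article simply take the 12 highest simulated scores, which ignores the automatic bids described under the 12-team format. The sketch below shows one way the "5 conference champions + 7 at-large" rule could be layered on top of any ranking. The `RankedTeam` type, the `select_field` function, and the "Team N" names are hypothetical and exist only for illustration; seeding and byes are deliberately left out.

```python
from typing import List, NamedTuple, Tuple


class RankedTeam(NamedTuple):
    """One entry in a best-first ranking (hypothetical helper type)."""
    name: str
    is_conf_champ: bool


def select_field(
    ranking: List[RankedTeam],
    n_auto_bids: int = 5,
    field_size: int = 12,
) -> Tuple[List[str], List[str]]:
    """Split a best-first ranking into automatic and at-large bids."""
    # Automatic bids: the highest-ranked conference champions
    auto_bids = [t.name for t in ranking if t.is_conf_champ][:n_auto_bids]
    auto_set = set(auto_bids)
    # At-large bids: best remaining teams, whether or not they won a conference
    n_at_large = field_size - len(auto_bids)
    at_large = [t.name for t in ranking if t.name not in auto_set][:n_at_large]
    return auto_bids, at_large


# Hypothetical 16-team ranking with champions at ranks 1, 2, 5, 9 and 14
champ_ranks = {1, 2, 5, 9, 14}
ranking = [RankedTeam(f"Team {i}", i in champ_ranks) for i in range(1, 17)]

auto_bids, at_large = select_field(ranking)
print("Automatic bids:", auto_bids)
print("At-large bids:", at_large)
```

In this made-up ranking, the champion ranked 14th takes a spot that a pure top-12 cut would have given to the 12th-ranked team. Inside a simulation, the same function could be applied to each simulated ranking in place of a plain top-12 cut.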
