Nov 27, 2025
# College Football Playoff Projections
## Overview
Predicting College Football Playoff (CFP) selection involves analyzing team performance, strength of schedule, conference championships, and head-to-head results. With the expanded 12-team format (starting 2024), modeling playoff probabilities has become more complex and critical.
## CFP Selection Criteria
### Primary Factors
1. **Win-Loss Record**: Overall and conference record
2. **Strength of Schedule**: Quality of opponents faced
3. **Conference Championships**: Automatic bids for the five highest-ranked conference champions
4. **Head-to-Head Results**: Direct matchup outcomes
5. **Common Opponents**: Performance against shared foes
6. **Computer Rankings**: SP+, FPI, and other metrics
### 12-Team Playoff Format (2024+)
- **5 Conference Champions**: Automatic bids for the five highest-ranked champions
- **7 At-Large Bids**: Best remaining teams
- **Top 4 Seeds**: Receive first-round byes
- **Seeds 5-12**: Play first-round games
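The field-selection rule above (five automatic bids plus seven at-large bids) can be sketched in a few lines of Python. This is a minimal illustration, not the committee's procedure: the `Team` class, the `select_field` function, and the placeholder team names and rankings are all hypothetical, and seeding is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Team:
    name: str
    rank: int          # committee ranking, 1 = best (hypothetical input)
    conf_champ: bool   # True if the team won its conference


def select_field(teams, n_auto=5, n_total=12):
    """Pick the n_auto best-ranked champions, then fill the field with at-large teams."""
    by_rank = sorted(teams, key=lambda t: t.rank)
    auto = [t for t in by_rank if t.conf_champ][:n_auto]
    auto_names = {t.name for t in auto}
    at_large = [t for t in by_rank if t.name not in auto_names][: n_total - len(auto)]
    return sorted(auto + at_large, key=lambda t: t.rank)


# Placeholder data: teams ranked 1-14, champions at ranks 1, 3, 6, 9,
# plus a fifth champion ranked 20th.
teams = [Team(f"Team {r}", r, r in {1, 3, 6, 9}) for r in range(1, 15)]
teams.append(Team("Team 20", 20, True))

field = select_field(teams)
print([t.rank for t in field])
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 20]
```

In this made-up example the 20th-ranked champion takes the last spot and the 12th-ranked team is left out, which is why a model that simply takes the top 12 by rating can misjudge teams near the cut line.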
## R Implementation with cfbfastR
```r
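library(cfbfastR)
library(dplyr)
library(tidyr)    # provides pivot_longer()
library(ggplot2)

# Load game results and build a win-loss record for each team.
# cfbfastR needs a CFBD API key in the CFBD_API_KEY environment variable.
team_records <- cfbd_game_info(year = 2023) %>%
  # Keep completed games only; unplayed games have missing scores
  filter(!is.na(home_points), !is.na(away_points)) %>%
  mutate(
    winner = if_else(home_points > away_points, home_team, away_team),
    loser  = if_else(home_points < away_points, home_team, away_team)
  ) %>%
  # One row per team per game, labelled "winner" or "loser"
  pivot_longer(cols = c(winner, loser),
               names_to = "result", values_to = "team") %>%
  group_by(team) %>%
  summarise(
    games   = n(),
    wins    = sum(result == "winner"),
    losses  = sum(result == "loser"),
    win_pct = wins / games
  )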
# Get SP+ ratings (predictive power rating)
sp_ratings <- cfbd_ratings_sp(year = 2023)
# Get FPI ratings
fpi_ratings <- cfbd_ratings_fpi(year = 2023)
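# Merge all data
# Note: the FPI column names may differ depending on the cfbfastR version.
# Check names(fpi_ratings) and adjust select() if `sos` or
# `resumeRanks.strengthOfRecord` is not present.
playoff_data <- team_records %>%
  left_join(sp_ratings %>% select(team, rating), by = "team") %>%
  rename(sp_rating = rating) %>%
  left_join(fpi_ratings %>% select(team, fpi, sos, resumeRanks.strengthOfRecord),
            by = "team")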
# Calculate playoff probability features
playoff_features <- playoff_data %>%
  mutate(
    # Rating-weighted wins: a rough proxy for quality wins
    # (it does not count actual wins over top-25 opponents)
    quality_wins = wins * (sp_rating / 100),
    # Win percentage adjusted by strength of schedule
    adj_win_pct = win_pct * (1 + sos / 100),
    # Combined rating: average of SP+ and FPI
    composite_score = (sp_rating + fpi) / 2,
    # Conference championship bonus (simplified: 12+ wins stands in for a title)
    conf_champ_bonus = if_else(wins >= 12, 5, 0)
  )

# Simple playoff probability heuristic: a logistic transform of a
# hand-weighted score. The weights are illustrative, not fitted to
# historical selections (2014-2023 used the 4-team format).
# For demonstration: flag teams with 11+ wins and an SP+ rating of at least 15
playoff_features <- playoff_features %>%
  mutate(
    playoff_threshold = (wins >= 11 & sp_rating >= 15),
    playoff_prob = plogis((sp_rating + fpi * 0.5 + wins * 2 - 30) / 10)
  )
# Top playoff contenders
playoff_contenders <- playoff_features %>%
  arrange(desc(playoff_prob)) %>%
  select(team, wins, losses, sp_rating, fpi, sos, playoff_prob) %>%
  head(20)

cat("Top 20 Playoff Probability Rankings:\n")
print(playoff_contenders)

# Visualize playoff probabilities
top_25 <- playoff_features %>%
  arrange(desc(playoff_prob)) %>%
  head(25) %>%
  # Label the win tier explicitly so colors stay correct even if only one tier appears
  mutate(win_tier = if_else(wins >= 12, "12+ wins", "< 12 wins"))

ggplot(top_25, aes(x = reorder(team, playoff_prob), y = playoff_prob * 100)) +
  geom_col(aes(fill = win_tier), show.legend = TRUE) +
  scale_fill_manual(values = c("< 12 wins" = "steelblue",
                               "12+ wins" = "darkgreen")) +
  coord_flip() +
  labs(
    title = "CFB Playoff Probabilities - Top 25 Teams",
    x = "Team",
    y = "Playoff Probability (%)",
    fill = "Win Total"
  ) +
  theme_minimal()
# Simulate playoff scenarios using Monte Carlo
simulate_playoff_scenarios <- function(teams_df, n_sims = 10000) {
  # Drop teams without a composite score (e.g. opponents lacking SP+/FPI ratings)
  teams_df <- teams_df %>% filter(!is.na(composite_score))

  playoff_selections <- matrix(0, nrow = nrow(teams_df), ncol = 1)
  rownames(playoff_selections) <- teams_df$team

  for (sim in seq_len(n_sims)) {
    # Perturb each composite score with Gaussian noise (sd = 5), add the
    # championship bonus, and take the 12 highest scores as this run's field
    selected <- teams_df %>%
      mutate(
        random_factor = rnorm(n(), mean = composite_score, sd = 5),
        adjusted_score = random_factor + conf_champ_bonus
      ) %>%
      arrange(desc(adjusted_score)) %>%
      head(12)  # 12-team playoff

    # Count selections
    for (team in selected$team) {
      playoff_selections[team, 1] <- playoff_selections[team, 1] + 1
    }
  }

  # Share of simulations in which each team made the field
  playoff_pct <- (playoff_selections / n_sims) * 100
  return(playoff_pct)
}
# Run simulation
simulation_results <- simulate_playoff_scenarios(playoff_features)

simulation_df <- data.frame(
  team = rownames(simulation_results),
  playoff_pct = simulation_results[, 1]
) %>%
  arrange(desc(playoff_pct)) %>%
  head(20)

# cat() interprets "\n"; print() would show the escape sequence literally
cat("\nPlayoff Probability (10,000 Simulations):\n")
print(simulation_df)

# Plot simulation results
ggplot(simulation_df, aes(x = reorder(team, playoff_pct), y = playoff_pct)) +
  geom_col(fill = "darkblue", alpha = 0.7) +
  geom_hline(yintercept = 50, linetype = "dashed", color = "red") +
  coord_flip() +
  labs(
    title = "CFB Playoff Selection Probability (Monte Carlo Simulation)",
    x = "Team",
    y = "Playoff Selection Probability (%)",
    subtitle = "Based on 10,000 simulations"
  ) +
  theme_minimal()
```
## Python Implementation
```python
import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import requests

API_BASE = "https://api.collegefootballdata.com"


def _fetch(endpoint, year):
    """GET a CFB Data API endpoint and return the JSON payload as a DataFrame."""
    # The API expects a bearer token. Reading it from the CFBD_API_KEY
    # environment variable is a convention assumed here.
    headers = {"Authorization": f"Bearer {os.environ['CFBD_API_KEY']}"}
    response = requests.get(f"{API_BASE}/{endpoint}", params={"year": year},
                            headers=headers, timeout=30)
    response.raise_for_status()  # fail loudly instead of parsing an error body
    return pd.DataFrame(response.json())


def get_team_records(year):
    """Fetch team records from CFB Data API"""
    return _fetch("records", year)


def get_sp_ratings(year):
    """Fetch SP+ ratings"""
    return _fetch("ratings/sp", year)


def get_fpi_ratings(year):
    """Fetch FPI ratings"""
    return _fetch("ratings/fpi", year)

# Load data for 2023 season
records = get_team_records(2023)
sp_ratings = get_sp_ratings(2023)
fpi_ratings = get_fpi_ratings(2023)
# Prepare team records
team_records = records[['team', 'total']].copy()
team_records['wins'] = team_records['total'].apply(lambda x: x['wins'])
team_records['losses'] = team_records['total'].apply(lambda x: x['losses'])
team_records['win_pct'] = team_records['wins'] / (team_records['wins'] + team_records['losses'])
team_records = team_records[['team', 'wins', 'losses', 'win_pct']]
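# Merge datasets
playoff_df = team_records.merge(
    sp_ratings[['team', 'rating']], on='team', how='left'
).rename(columns={'rating': 'sp_rating'})

# The FPI payload may nest strength of schedule under `resumeRanks`
# (an assumption about the response shape); lift it to a top-level column if so.
if 'strengthOfSchedule' not in fpi_ratings.columns:
    fpi_ratings['strengthOfSchedule'] = fpi_ratings['resumeRanks'].apply(
        lambda r: r.get('strengthOfSchedule') if isinstance(r, dict) else np.nan
    )

playoff_df = playoff_df.merge(
    fpi_ratings[['team', 'fpi', 'strengthOfSchedule']],
    on='team', how='left'
).rename(columns={'strengthOfSchedule': 'sos'})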
# Drop teams with missing data
playoff_df = playoff_df.dropna()
# Calculate composite features
playoff_df['composite_rating'] = (
    playoff_df['sp_rating'] + playoff_df['fpi']
) / 2
playoff_df['quality_metric'] = (
    playoff_df['win_pct'] * playoff_df['composite_rating']
)
# Simple playoff probability model
# Logistic function of a hand-weighted score (weights are illustrative, not fitted)
def calculate_playoff_prob(row):
    """Calculate playoff probability (in percent) for one team."""
    # Weighted combination of factors.
    # If `sos` is a rank (1 = hardest schedule), a positive weight rewards
    # easier schedules; invert the rank or flip the sign in that case.
    score = (
        row['wins'] * 3 +
        row['sp_rating'] * 0.5 +
        row['fpi'] * 0.3 +
        row['sos'] * 0.2
    )
    # Logistic transformation centered at a score of 40
    prob = 1 / (1 + np.exp(-(score - 40) / 5))
    return prob * 100


playoff_df['playoff_prob'] = playoff_df.apply(calculate_playoff_prob, axis=1)

# Sort by playoff probability
playoff_df = playoff_df.sort_values('playoff_prob', ascending=False)

print("Top 20 CFB Playoff Contenders - 2023:")
print(playoff_df[['team', 'wins', 'losses', 'sp_rating', 'fpi',
                  'playoff_prob']].head(20))
# Monte Carlo simulation for playoff selection
def simulate_playoffs(df, n_simulations=10000, n_teams=12):
    """
    Simulate playoff selection using Monte Carlo method.

    Each run adds Gaussian noise (sd = 3) to every team's composite rating
    and selects the n_teams highest scores. Returns the percentage of runs
    in which each team was selected.
    """
    results = {team: 0 for team in df['team']}

    for _ in range(n_simulations):
        # Add random noise to composite rating
        df_sim = df.copy()
        df_sim['sim_score'] = (
            df_sim['composite_rating'] +
            np.random.normal(0, 3, size=len(df))
        )

        # Select the top n_teams teams
        selected = df_sim.nlargest(n_teams, 'sim_score')['team'].tolist()

        # Count selections
        for team in selected:
            results[team] += 1

    # Convert to percentages
    results_df = pd.DataFrame({
        'team': list(results.keys()),
        'selection_pct': [v / n_simulations * 100 for v in results.values()]
    }).sort_values('selection_pct', ascending=False)

    return results_df

# Run simulation
simulation = simulate_playoffs(playoff_df)
print("\nPlayoff Selection Probability (10,000 Simulations):")
print(simulation.head(20))
# Visualizations
fig, axes = plt.subplots(2, 2, figsize=(16, 12))

# 1. Top 25 playoff probabilities
top_25 = playoff_df.head(25)
axes[0, 0].barh(range(len(top_25)), top_25['playoff_prob'],
                color='darkgreen', alpha=0.7)
axes[0, 0].set_yticks(range(len(top_25)))
axes[0, 0].set_yticklabels(top_25['team'], fontsize=8)
axes[0, 0].set_xlabel('Playoff Probability (%)')
axes[0, 0].set_title('Top 25 Playoff Probabilities (Model-Based)')
axes[0, 0].invert_yaxis()

# 2. Monte Carlo simulation results
top_20_sim = simulation.head(20)
axes[0, 1].barh(range(len(top_20_sim)), top_20_sim['selection_pct'],
                color='steelblue', alpha=0.7)
axes[0, 1].set_yticks(range(len(top_20_sim)))
axes[0, 1].set_yticklabels(top_20_sim['team'], fontsize=8)
axes[0, 1].set_xlabel('Selection Probability (%)')
axes[0, 1].set_title('Playoff Selection Probability (10K Simulations)')
axes[0, 1].axvline(50, color='red', linestyle='--', alpha=0.5)
axes[0, 1].invert_yaxis()

# 3. Wins vs SP+ rating scatter
axes[1, 0].scatter(playoff_df['wins'], playoff_df['sp_rating'],
                   s=playoff_df['playoff_prob'] * 3, alpha=0.6,
                   c=playoff_df['playoff_prob'], cmap='RdYlGn')
axes[1, 0].set_xlabel('Wins')
axes[1, 0].set_ylabel('SP+ Rating')
axes[1, 0].set_title('Wins vs SP+ Rating (size = playoff probability)')
axes[1, 0].grid(alpha=0.3)

# 4. SP+ vs FPI comparison for the top 15 contenders
top_contenders = playoff_df.head(15)
x = np.arange(len(top_contenders))
width = 0.35
axes[1, 1].barh(x - width / 2, top_contenders['sp_rating'],
                width, label='SP+ Rating', color='blue', alpha=0.7)
axes[1, 1].barh(x + width / 2, top_contenders['fpi'],
                width, label='FPI Rating', color='orange', alpha=0.7)
axes[1, 1].set_yticks(x)
axes[1, 1].set_yticklabels(top_contenders['team'], fontsize=8)
axes[1, 1].set_xlabel('Rating')
axes[1, 1].set_title('Top 15 Teams - SP+ vs FPI Comparison')
axes[1, 1].legend()
axes[1, 1].invert_yaxis()

plt.tight_layout()
plt.show()
```
## Key Modeling Considerations
### Feature Engineering
1. **Win Quality**: Weight wins by opponent strength
2. **Loss Penalty**: Timing and margin of losses matter
3. **Conference Strength**: Adjust for schedule difficulty
4. **Momentum**: Recent performance trends
5. **Head-to-Head**: Direct comparison tiebreaker
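As one possible way to encode the first two ideas (win quality and loss penalty), the sketch below credits each win by the opponent's rating and penalizes each loss more heavily when the opponent was weak. The function names, the logistic weighting, the `scale` value, and the sample schedules are illustrative assumptions, not an established metric.

```python
import math


def opponent_weight(opp_rating, scale=10.0):
    """Map an opponent rating (0 = average team) to a weight in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-opp_rating / scale))


def resume_score(results, scale=10.0):
    """Score a schedule given (won, opponent_rating) pairs.

    A win earns the opponent's weight, so beating strong teams counts more.
    A loss costs (1 - weight), so losing to weak teams hurts more.
    """
    score = 0.0
    for won, opp_rating in results:
        w = opponent_weight(opp_rating, scale)
        score += w if won else -(1.0 - w)
    return score


# Hypothetical schedules: both teams finish 2-1
strong_schedule = [(True, 20.0), (True, 0.0), (False, 20.0)]
weak_schedule = [(True, -10.0), (True, -10.0), (False, -10.0)]

print(round(resume_score(strong_schedule), 3))
print(round(resume_score(weak_schedule), 3))
```

Two teams with the same record end up with different scores here, which is the behavior a plain win percentage cannot capture. Margin and timing of losses could be added as extra terms in the same loop.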
### Model Validation
- Historical accuracy: Test on past playoff selections
- Cross-validation: Train on multiple seasons
- Calibration: Ensure probabilities match selection rates
- Committee bias: Account for subjective factors
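The calibration check can be made concrete with a Brier score and a simple binned comparison of predicted probabilities against observed selection rates. The sketch below uses only the standard library; the function names and the six sample predictions are made up for illustration and are not real selection data.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def calibration_table(probs, outcomes, n_bins=5):
    """Bin predictions by probability and compare with observed rates.

    Returns rows of (bin_low, bin_high, mean_predicted, observed_rate, count)
    for every non-empty bin.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))

    rows = []
    for i, items in enumerate(bins):
        if not items:
            continue
        mean_pred = sum(p for p, _ in items) / len(items)
        observed = sum(y for _, y in items) / len(items)
        rows.append((i / n_bins, (i + 1) / n_bins, mean_pred, observed, len(items)))
    return rows


# Illustrative values only: predicted playoff probabilities and whether
# each team was actually selected (1) or not (0)
probs = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
outcomes = [1, 1, 0, 0, 0, 0]

print(f"Brier score: {brier_score(probs, outcomes):.4f}")
for low, high, mean_pred, observed, count in calibration_table(probs, outcomes):
    print(f"[{low:.1f}, {high:.1f}): predicted {mean_pred:.2f}, observed {observed:.2f}, n={count}")
```

A well-calibrated model shows predicted and observed values close to each other in every bin. With only a handful of playoff teams per season, bins will be sparse, so pooling several seasons is likely necessary before reading much into the table.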
### Limitations
- Committee subjectivity (eye test, narratives)
- Conference championship game outcomes
- Injury impacts not captured in season stats
- Strength of schedule calculations vary
## Practical Applications
1. **Weekly Rankings**: Update probabilities after each game
2. **Scenario Analysis**: "What if" game outcome modeling
3. **Betting Markets**: Compare model to Vegas odds
4. **Resume Building**: Identify must-win games for teams
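For the betting-market comparison, American odds first need to be converted to implied probabilities and stripped of the bookmaker's margin. The sketch below shows that arithmetic for a two-way "make the playoff: yes/no" market. The odds and the model probability are hypothetical numbers, and the proportional normalization is one common way to remove the margin, not the only one.

```python
def implied_prob(american_odds):
    """Convert American odds to the raw implied probability (margin included)."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


def no_vig_probs(odds_yes, odds_no):
    """Normalize a two-way market so the two probabilities sum to 1."""
    p_yes, p_no = implied_prob(odds_yes), implied_prob(odds_no)
    total = p_yes + p_no
    return p_yes / total, p_no / total


# Hypothetical market: "yes" at -150, "no" at +120
market_yes, market_no = no_vig_probs(-150, 120)

# Hypothetical model output for the same team
model_prob = 0.65
edge = model_prob - market_yes

print(f"Market (no-vig) probability: {market_yes:.3f}")
print(f"Model minus market: {edge:+.3f}")
```

A large gap between model and market is more often a sign that the model is missing information (injuries, committee tendencies) than a sign of a mispriced line, so the difference is best treated as a diagnostic.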
## Resources
- [ESPN FPI](https://www.espn.com/college-football/fpi)
- [SP+ Ratings](https://www.espn.com/college-football/story/_/id/38819392/college-football-sp+-rankings-week-9)
- [CFP Selection Committee Protocol](https://collegefootballplayoff.com/sports/2016/10/20/selection-committee-protocol.aspx)