Case Study 1: Building an Analytics Platform for a Power Five Program
Executive Summary
This case study documents the implementation of a comprehensive analytics platform for a Power Five college football program. Over 18 months, the program transformed from spreadsheet-based analysis to a production-grade system serving coaches, recruiters, and executives. The platform now processes over 100,000 plays per season, delivers real-time game insights, and has been credited with improving fourth-down decision-making by 15%.
Background
The Program
The program (anonymized as "State University") is a Power Five conference member with:

- Annual football budget: $45M
- Analytics staff: 3 full-time members
- Technology budget for analytics: $250K annually
- Existing tools: Hudl video, basic spreadsheets, third-party recruiting database
The Challenge
Before the project, State University's analytics capabilities were fragmented:
- Data Silos: Play-by-play data in spreadsheets, video in Hudl, recruiting in separate system
- Manual Processes: Analysts spent 60% of time on data collection, not analysis
- Slow Insights: Weekly reports took 2-3 days to produce
- Limited Game-Day Support: No real-time decision support for coaches
- No Historical Integration: Difficult to analyze trends across seasons
Goals
The athletic department set these objectives:
| Goal | Metric | Target |
|---|---|---|
| Reduce manual work | Analyst time on data prep | < 20% |
| Faster insights | Weekly report turnaround | Same day |
| Game-day support | Real-time dashboard uptime | 99.9% |
| Comprehensive data | Years of historical data | 10 seasons |
| User adoption | Staff actively using system | 80%+ |
Phase 1: Discovery and Planning (Months 1-3)
Stakeholder Interviews
The team conducted 15 interviews across all stakeholder groups:
Head Coach Interview Highlights:
"I need to know if we should go for it on fourth down within 10 seconds. Right now, I'm going with my gut because there's no time to calculate."
Offensive Coordinator:
"I want to know what our opponents do on third-and-medium in the red zone. Getting that information currently takes a full day."
Recruiting Director:
"We're tracking 500+ prospects across three classes. I can't quickly answer who our highest-rated uncommitted prospects are by position."
Requirements Synthesis
From interviews, the team identified 47 requirements, prioritized into tiers:
Must Have (Launch):
- Automated play-by-play ingestion
- EPA calculation for all plays
- Fourth-down decision support
- Weekly game reports
- Basic recruiting dashboard

Should Have (Phase 2):
- Real-time win probability
- Opponent tendency analysis
- Player evaluation metrics
- Mobile dashboard access

Nice to Have (Phase 3):
- Video integration
- Predictive recruiting rankings
- Custom model development tools
Architecture Decision
After evaluating options, the team selected:
```
┌─────────────────────────────────────────────────────────────────┐
│ SELECTED ARCHITECTURE                                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│ Technology Stack:                                               │
│ ├── Database: PostgreSQL 15                                     │
│ ├── Cache: Redis 7                                              │
│ ├── API: Python FastAPI                                         │
│ ├── Frontend: React + D3.js                                     │
│ ├── Deployment: Docker + Kubernetes (cloud)                     │
│ └── Monitoring: Prometheus + Grafana                            │
│                                                                 │
│ Why This Stack:                                                 │
│ • PostgreSQL: Reliable, good analytics performance              │
│ • FastAPI: Modern async Python, auto-documentation              │
│ • React: Industry standard, component ecosystem                 │
│ • Kubernetes: Auto-scaling for game days                        │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
Decision Rationale:

- Python ecosystem aligned with the staff's existing analytics skills
- Cloud deployment allowed auto-scaling without hardware investment
- Open-source stack minimized licensing costs
- React had strong component libraries for sports visualizations
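To make the stack choice concrete, here is a minimal FastAPI sketch of the API layer. The endpoint paths and payload shape are illustrative assumptions, not the production routes; the point is the async handlers and the auto-generated OpenAPI docs cited in the rationale.

```python
# Minimal FastAPI sketch of the API layer. Paths and fields are
# illustrative; the production routes are not documented in this study.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="CFB Analytics API")  # interactive docs served at /docs


class Situation(BaseModel):
    down: int
    distance: int
    yard_line: int
    seconds_remaining: int


@app.get("/health")
async def health():
    # Consumed by the Kubernetes liveness probe shown in Phase 4
    return {"status": "ok"}


@app.post("/fourth-down/analyze")
async def analyze_fourth_down(situation: Situation):
    # In production this would delegate to the FourthDownBot shown later
    return {"recommendation": "go", "latency_ms": 0.0}
```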
Phase 2: Core Development (Months 4-9)
Data Pipeline Implementation
The first major deliverable was the automated data pipeline:
```python
# Simplified version of the production pipeline
import logging

logger = logging.getLogger(__name__)


class StateUniversityPipeline:
    """Main data processing pipeline."""

    def __init__(self):
        # CFBDataIngester, PlayProcessor, PostgresStore, and QualityMonitor
        # are internal components; `config` holds API credentials and the
        # database connection string.
        self.ingester = CFBDataIngester(api_key=config.cfb_api_key)
        self.processor = PlayProcessor()
        self.store = PostgresStore(config.database_url)
        self.quality_monitor = QualityMonitor()

    async def run_daily_ingestion(self):
        """Run daily data ingestion and processing."""
        logger.info("Starting daily ingestion")

        # Step 1: Fetch new games
        games = await self.ingester.fetch_recent_games(days=7)
        logger.info(f"Found {len(games)} games to process")

        # Step 2: For each game, fetch and process plays
        for game in games:
            try:
                plays = await self.ingester.fetch_plays(game['id'])

                # Step 3: Validate
                valid_plays, errors = self.quality_monitor.validate_batch(plays)
                if errors:
                    logger.warning(f"Game {game['id']}: {len(errors)} validation errors")

                # Step 4: Calculate EPA
                processed = self.processor.calculate_epa_batch(valid_plays)

                # Step 5: Store
                self.store.upsert_plays(processed)
            except Exception as e:
                logger.error(f"Failed to process game {game['id']}: {e}")
                self.quality_monitor.log_error(game['id'], str(e))

        logger.info("Daily ingestion complete")
```
Challenges Encountered:

1. API Rate Limits: The data provider limited requests to 100/minute. Solution: Implemented request queuing with exponential backoff (see the sketch after this list).

2. Data Quality Issues: Approximately 3% of plays had missing or invalid data. Solution: Built a quality monitoring system that logged issues and used sensible defaults.

3. Historical Data Volume: Loading 10 years of data took 3 days initially. Solution: Parallelized processing and used the database COPY command for bulk inserts.
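A minimal sketch of the rate-limit handling from item 1, assuming an aiohttp client and a 429 status code as the throttle signal; the production request queue itself is not shown in the case study.

```python
import asyncio

import aiohttp


async def fetch_with_backoff(session: aiohttp.ClientSession, url: str,
                             max_retries: int = 5) -> dict:
    """Retry a throttled request with exponential backoff.

    Simplified sketch: the production pipeline also queued outgoing
    requests to stay under the provider's 100 requests/minute ceiling.
    """
    delay = 1.0
    for attempt in range(max_retries):
        async with session.get(url) as resp:
            if resp.status == 429:  # rate-limited: wait, then retry
                await asyncio.sleep(delay)
                delay *= 2  # 1s, 2s, 4s, 8s, ...
                continue
            resp.raise_for_status()
            return await resp.json()
    raise RuntimeError(f"Gave up after {max_retries} retries: {url}")
```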
EPA Model Development
The team built a custom EPA model using 5 years of historical data:
```python
# Model training process (simplified)
import logging

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

logger = logging.getLogger(__name__)


class EPAModelTrainer:
    """Train expected points model."""

    def prepare_training_data(self, plays_df):
        """Prepare features and labels for training."""
        features = []
        labels = []
        for _, play in plays_df.iterrows():
            # Feature: game situation
            feature = [
                play['down'],
                play['distance'],
                play['yard_line'],
                play['seconds_remaining'],
                1 if play['home_possession'] else 0,
            ]
            features.append(feature)
            # Label: points scored on this drive
            labels.append(play['drive_points'])
        return np.array(features), np.array(labels)

    def train(self, plays_df):
        """Train the model."""
        X, y = self.prepare_training_data(plays_df)
        # Use gradient boosting to capture non-linear relationships
        self.model = GradientBoostingRegressor(
            n_estimators=100,
            max_depth=5,
            learning_rate=0.1,
        )
        self.model.fit(X, y)
        # Validate with 5-fold cross-validation
        cv_scores = cross_val_score(self.model, X, y, cv=5)
        logger.info(f"CV R²: {cv_scores.mean():.3f} (+/- {cv_scores.std() * 2:.3f})")
        return self.model
```
Model Performance:

- Mean Absolute Error: 0.94 points
- R² Score: 0.72
- Correlated well with published EPA values (r = 0.96)
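The case study does not spell out how per-play EPA falls out of this expected points model, but the standard construction is the difference between the predicted expected points before and after the play, with actual points substituted when the play itself scores. A sketch under that assumption:

```python
import numpy as np


def epa_for_play(model, before: list, after: list, points_scored: float = 0.0) -> float:
    """EPA = EP(state after the play) - EP(state before the play).

    `before` and `after` are feature vectors in the same order the trainer
    uses: [down, distance, yard_line, seconds_remaining, home_possession].
    If the play scored, the actual points replace the post-play EP term.
    This mirrors the standard EPA definition; the production formula
    may differ in its handling of turnovers and end-of-half states.
    """
    ep_before = model.predict(np.array([before]))[0]
    ep_after = points_scored if points_scored else model.predict(np.array([after]))[0]
    return ep_after - ep_before
```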
Fourth-Down Decision Engine
The most impactful feature was the fourth-down bot:
```python
import time
from typing import Dict


class FourthDownBot:
    """Real-time fourth-down decision support."""

    def __init__(self, ep_model, wp_model):
        self.ep_model = ep_model
        self.wp_model = wp_model
        # Load historical conversion rates
        self.conversion_rates = self._load_conversion_rates()
        self.fg_rates = self._load_fg_rates()

    def analyze(self, situation: Dict) -> Dict:
        """Analyze fourth-down decision in real-time."""
        start = time.time()

        # Analyze all options; a field goal is only evaluated inside
        # plausible range (assuming yard_line is measured from the
        # offense's own goal line)
        go_ev = self._analyze_go_for_it(situation)
        fg_ev = self._analyze_field_goal(situation) if situation['yard_line'] >= 55 else None
        punt_ev = self._analyze_punt(situation)

        # Determine recommendation: highest expected value wins
        options = {'go': go_ev, 'punt': punt_ev}
        if fg_ev is not None:  # a truthiness check would drop a legitimate 0.0 EV
            options['fg'] = fg_ev
        recommendation = max(options, key=options.get)

        latency = (time.time() - start) * 1000
        return {
            'recommendation': recommendation,
            'expected_values': options,
            # Gap between the best and second-best option
            'margin': options[recommendation] - sorted(options.values())[-2],
            'latency_ms': latency,
        }
```
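A hypothetical invocation, assuming already-trained `ep_model` and `wp_model` objects; the situation keys mirror the EPA trainer's feature set.

```python
# Hypothetical usage: 4th-and-2 at the opponent's 35 (yard_line 65),
# five minutes remaining. Model objects are assumed to be trained.
bot = FourthDownBot(ep_model, wp_model)
result = bot.analyze({
    'down': 4,
    'distance': 2,
    'yard_line': 65,
    'seconds_remaining': 300,
    'home_possession': True,
})
print(result['recommendation'], result['expected_values'])
```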
Validation Results:
The team validated the fourth-down bot against 3 years of historical decisions:
| Decision Type | Bot Agreed | Bot Disagreed | Bot Correct When Disagreed |
|---|---|---|---|
| Go for it | 78% | 22% | 67% |
| Field goal | 85% | 15% | 71% |
| Punt | 82% | 18% | 64% |
The bot's recommendations, when different from actual decisions, would have resulted in approximately 0.3 additional wins per season based on WPA analysis.
Phase 3: Dashboard Development (Months 10-14)
Coaching Dashboard
The coaching dashboard was designed through iterative prototyping with the coaching staff:
Version 1 Feedback:
"Too many numbers. I need to see 'go for it' or 'kick' in big letters."
Version 2 Feedback:
"Better, but I can't see it well on my tablet in bright sunlight."
Version 3 (Final):

- High-contrast colors for outdoor visibility
- Large recommendation text
- Swipe gestures for quick navigation
- Offline mode for connectivity issues
```jsx
// React component for fourth-down widget (simplified)
const FourthDownWidget = ({ situation, recommendation }) => {
  // Color-code the card by recommendation type
  const bgColor = recommendation.decision === 'go'
    ? 'bg-green-600'
    : recommendation.decision === 'fg'
      ? 'bg-yellow-500'
      : 'bg-blue-600';

  return (
    <div className={`${bgColor} p-6 rounded-xl text-white`}>
      <div className="text-4xl font-bold mb-2">
        {recommendation.decision.toUpperCase()}
      </div>
      <div className="text-xl">
        {recommendation.decision === 'go' ? 'GO FOR IT' :
         recommendation.decision === 'fg' ? 'FIELD GOAL' : 'PUNT'}
      </div>
      <div className="mt-4 text-sm opacity-80">
        Win Prob: {(recommendation.expected_wp * 100).toFixed(1)}%
        <br />
        Margin: +{(recommendation.margin * 100).toFixed(1)}%
      </div>
    </div>
  );
};
```
Recruiting Dashboard
The recruiting dashboard consolidated data from multiple sources:
Features:

1. Prospect Search: Filter by position, rating, location, status
2. Board View: Drag-and-drop priority management
3. Comparison Tool: Side-by-side prospect analysis
4. Activity Feed: Recent visits, offers, commitments
Integration Challenge:
The existing recruiting database used a different ID system. Solution:
```python
class ProspectMatcher:
    """Match prospects across different data sources."""

    def match(self, external_prospect):
        """Find matching internal prospect."""
        # Try exact name match first
        matches = self.db.query(
            "SELECT * FROM prospects WHERE name = %s",
            (external_prospect['name'],)
        )
        if len(matches) == 1:
            return matches[0]

        # Fuzzy match on name + high school
        if not matches:
            matches = self.fuzzy_search(external_prospect)

        if matches:
            # Score candidates by similarity; accept only confident matches
            best = max(matches, key=lambda m: self.similarity_score(m, external_prospect))
            if self.similarity_score(best, external_prospect) > 0.85:
                return best

        return None  # New prospect
```
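`similarity_score` is not shown in the case study; one plausible implementation uses the standard library's `difflib`, weighting name similarity above high-school similarity. The field names and weights here are illustrative.

```python
from difflib import SequenceMatcher


def similarity_score(internal: dict, external: dict) -> float:
    """Weighted name + high-school similarity in [0, 1].

    Illustrative sketch only; the production scorer is not documented.
    """
    def ratio(a: str, b: str) -> float:
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    name_sim = ratio(internal['name'], external['name'])
    school_sim = ratio(internal.get('high_school', ''), external.get('high_school', ''))
    return 0.7 * name_sim + 0.3 * school_sim  # name dominates the score
```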
Phase 4: Deployment and Adoption (Months 15-18)
Production Deployment
The system was deployed to a cloud Kubernetes cluster:
```yaml
# Production deployment configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cfb-analytics-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: cfb-analytics
  template:
    metadata:
      labels:
        app: cfb-analytics  # must match the selector above
    spec:
      containers:
        - name: api
          image: state-analytics/api:v1.0.0
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            periodSeconds: 10
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cfb-analytics-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cfb-analytics-api
  minReplicas: 3
  maxReplicas: 15
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
Game Day Auto-Scaling:
The system automatically scaled based on load:
| Period | Replicas | Avg Response Time |
|---|---|---|
| Non-game day | 3 | 45ms |
| Game day (pregame) | 5 | 52ms |
| Game day (during game) | 8-12 | 68ms |
| Peak (critical play) | 15 | 95ms |
Training Program
Adoption required significant training:
Week 1: Analytics Staff
- Full system training (8 hours)
- API documentation walkthrough
- Custom query development

Week 2: Coaching Staff
- Dashboard overview (2 hours)
- Fourth-down bot training (1 hour)
- Tablet usage on the sideline

Week 3: Recruiting Staff
- Recruiting dashboard training (2 hours)
- Search and filtering
- Board management

Week 4: Executive Staff
- Report interpretation (1 hour)
- Dashboard navigation
Adoption Metrics
After 3 months:
| Metric | Target | Actual |
|---|---|---|
| Weekly active users | 80% | 87% |
| Dashboard sessions/game | 50+ | 73 |
| Fourth-down bot queries/game | 10+ | 18 |
| Recruiting searches/week | 100+ | 245 |
Results and Impact
Quantitative Results
Efficiency Gains:

- Analyst time on data prep: reduced from 60% to 15%
- Weekly report production: reduced from 2-3 days to same-day
- Opponent scouting report: reduced from 8 hours to 1 hour

Decision Making:

- Fourth-down decisions aligned with analytics: increased from 45% to 72%
- Estimated win probability gained: +0.4 wins/season from fourth-down decisions

System Performance:

- Uptime during games: 99.95% (exceeded the 99.9% target)
- Average API response: 62ms (well under the 500ms target)
- Data freshness: real-time during games, <1 hour for historical data
Qualitative Feedback
Head Coach:
"The fourth-down bot has changed how I think about critical decisions. I'm not guessing anymore."
Offensive Coordinator:
"Having opponent tendencies at my fingertips during game prep has been a game-changer. I found a coverage tendency last week that led to two big plays."
Recruiting Coordinator:
"I used to spend hours compiling prospect lists. Now I can answer any recruiting question in minutes."
Lessons Learned

1. Start with High-Impact Features: The fourth-down bot created immediate value and drove adoption.

2. Design for End Users: Multiple design iterations based on coach feedback were essential.

3. Plan for Game Day: Auto-scaling and reliability testing prevented game-day failures.

4. Invest in Training: Adoption required significant hands-on training, not just documentation.

5. Build Trust Gradually: Coaches needed to see the bot be right several times before trusting it in games.

6. Monitor Everything: Detailed monitoring caught issues before they impacted users.
Technical Appendix
System Metrics After 1 Year
| Metric | Value |
|---|---|
| Total plays in database | 850,000 |
| Total prospects tracked | 12,000 |
| API requests per day (average) | 15,000 |
| API requests per game day | 150,000 |
| Database size | 45 GB |
| Monthly cloud costs | $2,100 |
Key Performance Indicators
| KPI | Baseline | After 1 Year |
|---|---|---|
| Fourth-down decision accuracy | No baseline | 67% (vs. optimal) |
| Report turnaround | 2-3 days | Same day |
| Recruiting efficiency | 4 hrs/prospect | 1 hr/prospect |
| System uptime | N/A | 99.8% |
Conclusion
State University's analytics platform transformation demonstrates that a well-planned, user-focused implementation can deliver measurable competitive advantages. The keys to success were:
- Deep stakeholder engagement
- Iterative development with continuous feedback
- Robust technical architecture
- Comprehensive training and change management
- Continuous monitoring and improvement
The platform continues to evolve, with plans for video integration, enhanced predictive modeling, and expanded mobile capabilities in the coming seasons.