Case Study 1: Building an Analytics Platform for a Power Five Program

Executive Summary

This case study documents the implementation of a comprehensive analytics platform for a Power Five college football program. Over 18 months, the program transformed from spreadsheet-based analysis to a production-grade system serving coaches, recruiters, and executives. The platform now processes over 100,000 plays per season, delivers real-time game insights, and has been credited with improving fourth-down decision-making by 15%.


Background

The Program

The program (anonymized as "State University") is a Power Five conference member with:

  - Annual football budget: $45M
  - Analytics staff: 3 full-time members
  - Technology budget for analytics: $250K annually
  - Existing tools: Hudl video, basic spreadsheets, third-party recruiting database

The Challenge

Before the project, State University's analytics capabilities were fragmented:

  1. Data Silos: Play-by-play data in spreadsheets, video in Hudl, recruiting in separate system
  2. Manual Processes: Analysts spent 60% of time on data collection, not analysis
  3. Slow Insights: Weekly reports took 2-3 days to produce
  4. Limited Game-Day Support: No real-time decision support for coaches
  5. No Historical Integration: Difficult to analyze trends across seasons

Goals

The athletic department set these objectives:

| Goal | Metric | Target |
|------|--------|--------|
| Reduce manual work | Analyst time on data prep | < 20% |
| Faster insights | Weekly report turnaround | Same day |
| Game-day support | Real-time dashboard uptime | 99.9% |
| Comprehensive data | Years of historical data | 10 seasons |
| User adoption | Staff actively using system | 80%+ |

Phase 1: Discovery and Planning (Months 1-3)

Stakeholder Interviews

The team conducted 15 interviews across all stakeholder groups:

Head Coach Interview Highlights:

"I need to know if we should go for it on fourth down within 10 seconds. Right now, I'm going with my gut because there's no time to calculate."

Offensive Coordinator:

"I want to know what our opponents do on third-and-medium in the red zone. Getting that information currently takes a full day."

Recruiting Director:

"We're tracking 500+ prospects across three classes. I can't quickly answer who our highest-rated uncommitted prospects are by position."

Requirements Synthesis

From interviews, the team identified 47 requirements, prioritized into tiers:

Must Have (Launch):

  - Automated play-by-play ingestion
  - EPA calculation for all plays
  - Fourth-down decision support
  - Weekly game reports
  - Basic recruiting dashboard

Should Have (Phase 2):

  - Real-time win probability
  - Opponent tendency analysis
  - Player evaluation metrics
  - Mobile dashboard access

Nice to Have (Phase 3):

  - Video integration
  - Predictive recruiting rankings
  - Custom model development tools

Architecture Decision

After evaluating options, the team selected:

┌─────────────────────────────────────────────────────────────────┐
│                     SELECTED ARCHITECTURE                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Technology Stack:                                              │
│  ├── Database: PostgreSQL 15                                    │
│  ├── Cache: Redis 7                                             │
│  ├── API: Python FastAPI                                        │
│  ├── Frontend: React + D3.js                                    │
│  ├── Deployment: Docker + Kubernetes (cloud)                    │
│  └── Monitoring: Prometheus + Grafana                           │
│                                                                 │
│  Why This Stack:                                                │
│  • PostgreSQL: Reliable, good analytics performance             │
│  • FastAPI: Modern async Python, auto-documentation             │
│  • React: Industry standard, component ecosystem                │
│  • Kubernetes: Auto-scaling for game days                       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Decision Rationale:

  - Python ecosystem aligned with existing analytics skills
  - Cloud deployment allowed auto-scaling without hardware investment
  - Open-source stack minimized licensing costs
  - React had strong component libraries for sports visualizations


Phase 2: Core Development (Months 4-9)

Data Pipeline Implementation

The first major deliverable was the automated data pipeline:

# Simplified version of the production pipeline. CFBDataIngester,
# PlayProcessor, PostgresStore, and QualityMonitor are internal helper
# classes; `config` holds application settings.
import logging

logger = logging.getLogger(__name__)


class StateUniversityPipeline:
    """Main data processing pipeline."""

    def __init__(self):
        self.ingester = CFBDataIngester(api_key=config.cfb_api_key)
        self.processor = PlayProcessor()
        self.store = PostgresStore(config.database_url)
        self.quality_monitor = QualityMonitor()

    async def run_daily_ingestion(self):
        """Run daily data ingestion and processing."""
        logger.info("Starting daily ingestion")

        # Step 1: Fetch new games
        games = await self.ingester.fetch_recent_games(days=7)
        logger.info(f"Found {len(games)} games to process")

        # Step 2: For each game, fetch and process plays
        for game in games:
            try:
                plays = await self.ingester.fetch_plays(game['id'])

                # Step 3: Validate
                valid_plays, errors = self.quality_monitor.validate_batch(plays)
                if errors:
                    logger.warning(f"Game {game['id']}: {len(errors)} validation errors")

                # Step 4: Calculate EPA
                processed = self.processor.calculate_epa_batch(valid_plays)

                # Step 5: Store
                self.store.upsert_plays(processed)

            except Exception as e:
                logger.error(f"Failed to process game {game['id']}: {e}")
                self.quality_monitor.log_error(game['id'], str(e))

        logger.info("Daily ingestion complete")

Challenges Encountered:

  1. API Rate Limits: The data provider limited requests to 100/minute. Solution: Implemented request queuing with exponential backoff.

  2. Data Quality Issues: Approximately 3% of plays had missing or invalid data. Solution: Built a quality monitoring system that logged issues and used sensible defaults.

  3. Historical Data Volume: Loading 10 years of data took 3 days initially. Solution: Parallelized processing and used database COPY for bulk inserts.
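The rate-limit workaround in challenge 1 can be sketched as a small retry helper. This is a minimal illustration, not the production queuing code; `fetch_with_backoff` and `RateLimitError` are hypothetical names.

```python
import asyncio
import random


class RateLimitError(Exception):
    """Raised by the client when the provider returns HTTP 429."""


async def fetch_with_backoff(fetch, *args, max_retries=5, base_delay=1.0):
    """Retry a rate-limited async call with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            return await fetch(*args)
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            # Sleep 1s, 2s, 4s, ... plus jitter to spread queued requests
            await asyncio.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.5))
```

The jitter term keeps many queued requests from retrying in lockstep against the same per-minute quota.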

EPA Model Development

The team built a custom EPA model using 5 years of historical data:

# Model training process (simplified)
import logging

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

logger = logging.getLogger(__name__)


class EPAModelTrainer:
    """Train expected points model."""

    def prepare_training_data(self, plays_df):
        """Prepare features and labels for training."""
        features = []
        labels = []

        for _, play in plays_df.iterrows():
            # Feature: game situation
            feature = [
                play['down'],
                play['distance'],
                play['yard_line'],
                play['seconds_remaining'],
                1 if play['home_possession'] else 0
            ]
            features.append(feature)

            # Label: points scored on this drive
            labels.append(play['drive_points'])

        return np.array(features), np.array(labels)

    def train(self, plays_df):
        """Train the model."""
        X, y = self.prepare_training_data(plays_df)

        # Use gradient boosting for non-linear relationships
        self.model = GradientBoostingRegressor(
            n_estimators=100,
            max_depth=5,
            learning_rate=0.1
        )
        self.model.fit(X, y)

        # Validate
        cv_scores = cross_val_score(self.model, X, y, cv=5)
        logger.info(f"CV R²: {cv_scores.mean():.3f} (+/- {cv_scores.std()*2:.3f})")

        return self.model

Model Performance:

  - Mean Absolute Error: 0.94 points
  - R² Score: 0.72
  - Correlated well with published EPA values (r = 0.96)
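EPA itself is just the change in expected points from one snap to the next. A minimal sketch, with a toy linear EP function standing in for the trained gradient-boosting model (the production model scores the full situation vector, and scoring plays are handled via drive outcomes):

```python
def toy_expected_points(yard_line):
    """Toy EP stand-in, roughly linear in field position (0 = own goal line).
    Illustrative only; the real model is the trained regressor above."""
    return -1.0 + 0.07 * yard_line


def epa(yard_line_before, yard_line_after):
    """EPA = expected points after the play minus expected points before."""
    return toy_expected_points(yard_line_after) - toy_expected_points(yard_line_before)
```

Under this toy model, a 10-yard gain from the 25 to the 35 is worth about +0.7 expected points.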

Fourth-Down Decision Engine

The most impactful feature was the fourth-down bot:

import time
from typing import Dict


class FourthDownBot:
    """Real-time fourth-down decision support."""

    def __init__(self, ep_model, wp_model):
        self.ep_model = ep_model
        self.wp_model = wp_model

        # Load historical conversion rates
        self.conversion_rates = self._load_conversion_rates()
        self.fg_rates = self._load_fg_rates()

    def analyze(self, situation: Dict) -> Dict:
        """Analyze fourth-down decision in real-time."""
        start = time.time()

        # Analyze all options (yard_line is measured from the offense's own
        # goal line, so >= 55 restricts field goals to plausible range)
        go_ev = self._analyze_go_for_it(situation)
        fg_ev = self._analyze_field_goal(situation) if situation['yard_line'] >= 55 else None
        punt_ev = self._analyze_punt(situation)

        # Determine recommendation
        options = {'go': go_ev, 'punt': punt_ev}
        if fg_ev is not None:
            options['fg'] = fg_ev

        recommendation = max(options, key=options.get)

        latency = (time.time() - start) * 1000

        return {
            'recommendation': recommendation,
            'expected_values': options,
            'margin': options[recommendation] - sorted(options.values())[-2],
            'latency_ms': latency
        }

Validation Results:

The team validated the fourth-down bot against 3 years of historical decisions:

| Decision Type | Bot Agreed | Bot Disagreed | Bot Correct When Disagreed |
|---------------|------------|---------------|----------------------------|
| Go for it | 78% | 22% | 67% |
| Field goal | 85% | 15% | 71% |
| Punt | 82% | 18% | 64% |

The bot's recommendations, when different from actual decisions, would have resulted in approximately 0.3 additional wins per season based on WPA analysis.
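The expected-value comparison behind `_analyze_go_for_it` reduces to a weighted average of the success and failure states. A sketch with illustrative numbers (the production EP values come from the trained model and the conversion rates from the historical tables):

```python
def go_for_it_ev(p_convert, ep_success, ep_failure):
    """Expected points of going for it: probability-weighted average of
    converting (keep the ball, new first down) vs. turnover on downs."""
    return p_convert * ep_success + (1 - p_convert) * ep_failure


# Example: 4th-and-2 at midfield with ~55% historical conversion odds
ev = go_for_it_ev(0.55, 2.4, -1.8)  # roughly +0.5 expected points
```

The bot makes the same computation for the field goal and punt branches, then recommends whichever option maximizes expected value (or win probability late in games).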


Phase 3: Dashboard Development (Months 10-14)

Coaching Dashboard

The coaching dashboard was designed through iterative prototyping with the coaching staff:

Version 1 Feedback:

"Too many numbers. I need to see 'go for it' or 'kick' in big letters."

Version 2 Feedback:

"Better, but I can't see it well on my tablet in bright sunlight."

Version 3 (Final):

  - High contrast colors for outdoor visibility
  - Large recommendation text
  - Swipe gestures for quick navigation
  - Offline mode for connectivity issues

// React component for fourth-down widget (simplified)
const FourthDownWidget = ({ situation, recommendation }) => {
  const bgColor = recommendation.decision === 'go'
    ? 'bg-green-600'
    : recommendation.decision === 'fg'
    ? 'bg-yellow-500'
    : 'bg-blue-600';

  return (
    <div className={`${bgColor} p-6 rounded-xl text-white`}>
      <div className="text-4xl font-bold mb-2">
        {recommendation.decision.toUpperCase()}
      </div>
      <div className="text-xl">
        {recommendation.decision === 'go' ? 'GO FOR IT' :
         recommendation.decision === 'fg' ? 'FIELD GOAL' : 'PUNT'}
      </div>
      <div className="mt-4 text-sm opacity-80">
        Win Prob: {(recommendation.expected_wp * 100).toFixed(1)}%
        <br />
        Margin: +{(recommendation.margin * 100).toFixed(1)}%
      </div>
    </div>
  );
};

Recruiting Dashboard

The recruiting dashboard consolidated data from multiple sources:

Features:

  1. Prospect Search: Filter by position, rating, location, status
  2. Board View: Drag-and-drop priority management
  3. Comparison Tool: Side-by-side prospect analysis
  4. Activity Feed: Recent visits, offers, commitments

Integration Challenge:

The existing recruiting database used a different ID system. Solution:

class ProspectMatcher:
    """Match prospects across different data sources."""

    def match(self, external_prospect):
        """Find matching internal prospect."""
        # Try exact name match first
        matches = self.db.query(
            "SELECT * FROM prospects WHERE name = %s",
            (external_prospect['name'],)
        )

        if len(matches) == 1:
            return matches[0]

        # Fuzzy match on name + high school
        if not matches:
            matches = self.fuzzy_search(external_prospect)

        if matches:
            # Score matches by similarity
            best = max(matches, key=lambda m: self.similarity_score(m, external_prospect))
            if self.similarity_score(best, external_prospect) > 0.85:
                return best

        return None  # New prospect
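The `similarity_score` used above can be built from a plain string-similarity measure. This sketch uses Python's `difflib` on name plus high school; the weights and field names are illustrative, and the production matcher may differ.

```python
from difflib import SequenceMatcher


def similarity_score(internal, external):
    """Blend name and high-school similarity into a 0..1 score.
    The 0.7/0.3 weighting is an illustrative choice."""
    name_sim = SequenceMatcher(
        None, internal['name'].lower(), external['name'].lower()).ratio()
    hs_sim = SequenceMatcher(
        None, internal.get('high_school', '').lower(),
        external.get('high_school', '').lower()).ratio()
    return 0.7 * name_sim + 0.3 * hs_sim
```

A score above the 0.85 threshold then accepts the candidate match, which tolerates small spelling variants ("Jon" vs. "John") without merging genuinely different prospects.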

Phase 4: Deployment and Adoption (Months 15-18)

Production Deployment

The system was deployed to a cloud Kubernetes cluster:

# Production deployment configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cfb-analytics-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: cfb-analytics
  template:
    metadata:
      labels:
        app: cfb-analytics
    spec:
      containers:
      - name: api
        image: state-analytics/api:v1.0.0
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "1Gi"
            cpu: "1000m"
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          periodSeconds: 10
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cfb-analytics-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cfb-analytics-api
  minReplicas: 3
  maxReplicas: 15
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

Game Day Auto-Scaling:

The system automatically scaled based on load:

| Period | Replicas | Avg Response Time |
|--------|----------|-------------------|
| Non-game day | 3 | 45ms |
| Game day (pregame) | 5 | 52ms |
| Game day (during game) | 8-12 | 68ms |
| Peak (critical play) | 15 | 95ms |
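The replica counts above follow the standard Kubernetes HPA rule: desired replicas = ceil(current replicas × current utilization / target utilization), clamped to the deployment's min/max bounds. A quick sketch of that arithmetic:

```python
import math


def hpa_desired_replicas(current, cpu_util_pct, target_pct=70, lo=3, hi=15):
    """Kubernetes HPA scaling rule, clamped to the configured bounds.
    Utilization is measured as a percentage of the pod CPU request."""
    desired = math.ceil(current * (cpu_util_pct / target_pct))
    return max(lo, min(hi, desired))


# At 3 replicas averaging 140% of the CPU request, the HPA doubles to 6
```

This is why traffic spikes on critical plays push the cluster toward its 15-replica ceiling while quiet weekdays settle back to the 3-replica floor.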

Training Program

Adoption required significant training:

Week 1: Analytics Staff

  - Full system training (8 hours)
  - API documentation walkthrough
  - Custom query development

Week 2: Coaching Staff

  - Dashboard overview (2 hours)
  - Fourth-down bot training (1 hour)
  - Tablet usage on sideline

Week 3: Recruiting Staff

  - Recruiting dashboard training (2 hours)
  - Search and filtering
  - Board management

Week 4: Executive Staff

  - Report interpretation (1 hour)
  - Dashboard navigation

Adoption Metrics

After 3 months:

| Metric | Target | Actual |
|--------|--------|--------|
| Weekly active users | 80% | 87% |
| Dashboard sessions/game | 50+ | 73 |
| Fourth-down bot queries/game | 10+ | 18 |
| Recruiting searches/week | 100+ | 245 |

Results and Impact

Quantitative Results

Efficiency Gains:

  - Analyst time on data prep: reduced from 60% to 15%
  - Weekly report production: reduced from 2-3 days to same day
  - Opponent scouting report: reduced from 8 hours to 1 hour

Decision Making:

  - Fourth-down decisions aligned with analytics: increased from 45% to 72%
  - Estimated win probability gained: +0.4 wins/season from fourth-down decisions

System Performance:

  - Uptime during games: 99.95% (exceeded target)
  - Average API response: 62ms (well under 500ms target)
  - Data freshness: real-time during games, < 1 hour for historical

Qualitative Feedback

Head Coach:

"The fourth-down bot has changed how I think about critical decisions. I'm not guessing anymore."

Offensive Coordinator:

"Having opponent tendencies at my fingertips during game prep has been a game-changer. I found a coverage tendency last week that led to two big plays."

Recruiting Coordinator:

"I used to spend hours compiling prospect lists. Now I can answer any recruiting question in minutes."

Lessons Learned

  1. Start with High-Impact Features: The fourth-down bot created immediate value and drove adoption

  2. Design for End Users: Multiple design iterations based on coach feedback were essential

  3. Plan for Game Day: Auto-scaling and reliability testing prevented game-day failures

  4. Invest in Training: Adoption required significant hands-on training, not just documentation

  5. Build Trust Gradually: Coaches needed to see the bot be right several times before trusting it in games

  6. Monitor Everything: Detailed monitoring caught issues before they impacted users


Technical Appendix

System Metrics After 1 Year

| Metric | Value |
|--------|-------|
| Total plays in database | 850,000 |
| Total prospects tracked | 12,000 |
| API requests per day (average) | 15,000 |
| API requests per game day | 150,000 |
| Database size | 45 GB |
| Monthly cloud costs | $2,100 |

Key Performance Indicators

| KPI | Baseline | After 1 Year |
|-----|----------|--------------|
| Fourth-down decision accuracy | No baseline | 67% (vs. optimal) |
| Report turnaround | 2-3 days | Same day |
| Recruiting efficiency | 4 hrs/prospect | 1 hr/prospect |
| System uptime | N/A | 99.8% |

Conclusion

State University's analytics platform transformation demonstrates that a well-planned, user-focused implementation can deliver measurable competitive advantages. The keys to success were:

  1. Deep stakeholder engagement
  2. Iterative development with continuous feedback
  3. Robust technical architecture
  4. Comprehensive training and change management
  5. Continuous monitoring and improvement

The platform continues to evolve, with plans for video integration, enhanced predictive modeling, and expanded mobile capabilities in the coming seasons.