In This Chapter
- Learning Objectives
- Introduction
- 27.1 System Requirements and Stakeholder Analysis
- 27.2 System Architecture
- 27.3 Data Pipeline Implementation
- 27.4 Dashboard Implementation
- 27.5 Deployment and Operations
- 27.6 Testing and Quality Assurance
- 27.7 Documentation and Training
- Troubleshooting
- Support
- 27.8 Case Study: Full Platform Implementation
- Summary
- Key Takeaways
- Exercises
Chapter 27: Building a Complete Analytics System
Learning Objectives
By the end of this chapter, you will be able to:

- Design and architect a production-ready college football analytics platform
- Integrate multiple data sources into a unified analytics pipeline
- Build automated workflows for data collection, processing, and analysis
- Create comprehensive dashboards that serve multiple stakeholders
- Implement quality assurance and testing for analytics systems
- Deploy and maintain a scalable analytics infrastructure
- Document and communicate technical systems to non-technical audiences
Introduction
Throughout this textbook, we have explored individual components of college football analytics: data collection, statistical analysis, visualization, machine learning, and real-time systems. This capstone chapter synthesizes all these elements into a cohesive, production-ready analytics platform.
Building a complete analytics system is fundamentally different from implementing individual analyses. It requires systems thinking—understanding how components interact, where failures can occur, and how to maintain reliability over time. A successful analytics platform must serve diverse stakeholders, from coaches needing quick game insights to executives making multi-year strategic decisions.
This chapter presents a comprehensive case study: designing and building an analytics platform for a Division I college football program. We will walk through every phase of development, from requirements gathering to production deployment, providing templates and code that can be adapted to your specific context.
The goal is not just to build a system that works, but to build a system that continues working—reliably, maintainably, and scalably—as the program's needs evolve and grow.
27.1 System Requirements and Stakeholder Analysis
27.1.1 Understanding Your Users
Before any code is written, a successful analytics system begins with understanding who will use it and what they need. Different stakeholders have vastly different requirements:
Coaching Staff
- Primary need: Actionable insights for game preparation and in-game decisions
- Time constraints: Decisions often needed in seconds during games, hours for game planning
- Technical sophistication: Variable; prefer visual interfaces over raw data
- Access patterns: Heavy use during season, especially game weeks

Player Personnel / Recruiting
- Primary need: Prospect evaluation and comparison
- Time constraints: Recruiting cycles span months, but individual evaluations are needed quickly
- Technical sophistication: Moderate; comfortable with databases and reports
- Access patterns: Year-round, with peaks during evaluation periods

Athletic Administration
- Primary need: Performance tracking and resource-allocation justification
- Time constraints: Quarterly and annual reporting cycles
- Technical sophistication: Low; need executive summaries
- Access patterns: Periodic, often driven by reporting requirements

Analytics Staff
- Primary need: Flexible tools for ad-hoc analysis and model development
- Time constraints: Variable, based on project requirements
- Technical sophistication: High; comfortable with code and complex interfaces
- Access patterns: Continuous throughout the year
"""
Stakeholder Requirements Documentation
This module defines the requirements framework for the analytics platform.
"""
from dataclasses import dataclass, field
from typing import List, Dict, Optional
from enum import Enum
class UserRole(Enum):
"""User roles in the analytics system."""
HEAD_COACH = "head_coach"
OFFENSIVE_COORDINATOR = "offensive_coordinator"
DEFENSIVE_COORDINATOR = "defensive_coordinator"
POSITION_COACH = "position_coach"
RECRUITING_COORDINATOR = "recruiting_coordinator"
PLAYER_PERSONNEL = "player_personnel"
ATHLETIC_DIRECTOR = "athletic_director"
ANALYTICS_STAFF = "analytics_staff"
VIDEO_COORDINATOR = "video_coordinator"
class AccessLevel(Enum):
"""Data access levels."""
PUBLIC = "public" # Publicly available statistics
INTERNAL = "internal" # Team-internal analysis
CONFIDENTIAL = "confidential" # Recruiting, personnel decisions
RESTRICTED = "restricted" # Sensitive personnel information
@dataclass
class StakeholderRequirement:
"""Documents a single stakeholder requirement."""
id: str
role: UserRole
description: str
priority: str # "critical", "high", "medium", "low"
access_level: AccessLevel
response_time: str # e.g., "real-time", "< 1 minute", "< 1 hour"
frequency: str # How often they need this
acceptance_criteria: List[str] = field(default_factory=list)
# Define core requirements
COACHING_REQUIREMENTS = [
StakeholderRequirement(
id="COACH-001",
role=UserRole.HEAD_COACH,
description="Win probability dashboard during live games",
priority="critical",
access_level=AccessLevel.INTERNAL,
response_time="real-time",
frequency="Every game",
acceptance_criteria=[
"Updates within 3 seconds of play completion",
"Shows win probability for both teams",
"Displays fourth-down recommendations when applicable",
"Works on tablet devices on sideline"
]
),
StakeholderRequirement(
id="COACH-002",
role=UserRole.OFFENSIVE_COORDINATOR,
description="Opponent defensive tendency analysis",
priority="critical",
access_level=AccessLevel.INTERNAL,
response_time="< 1 hour",
frequency="Weekly during season",
acceptance_criteria=[
"Coverage distribution by down/distance",
"Blitz rates by game situation",
"Personnel grouping tendencies",
"Exportable to PowerPoint format"
]
),
StakeholderRequirement(
id="COACH-003",
role=UserRole.DEFENSIVE_COORDINATOR,
description="Opponent offensive play calling patterns",
priority="critical",
access_level=AccessLevel.INTERNAL,
response_time="< 1 hour",
frequency="Weekly during season",
acceptance_criteria=[
"Run/pass splits by situation",
"Formation tendencies",
"Red zone play calling patterns",
"Third down conversion analysis"
]
),
]
RECRUITING_REQUIREMENTS = [
StakeholderRequirement(
id="RECRUIT-001",
role=UserRole.RECRUITING_COORDINATOR,
description="Prospect evaluation database with scoring",
priority="critical",
access_level=AccessLevel.CONFIDENTIAL,
response_time="< 5 seconds",
frequency="Daily during evaluation periods",
acceptance_criteria=[
"Searchable by position, rating, location",
"Composite scores from multiple services",
"Custom evaluation fields",
"Comparison tool for multiple prospects"
]
),
StakeholderRequirement(
id="RECRUIT-002",
role=UserRole.PLAYER_PERSONNEL,
description="Transfer portal monitoring and alerts",
priority="high",
access_level=AccessLevel.CONFIDENTIAL,
response_time="< 1 hour of portal entry",
frequency="Continuous during portal windows",
acceptance_criteria=[
"Automated detection of new portal entries",
"Match scoring against program needs",
"Alerts to relevant coaches",
"Performance history integration"
]
),
]
ANALYTICS_REQUIREMENTS = [
StakeholderRequirement(
id="ANALYTICS-001",
role=UserRole.ANALYTICS_STAFF,
description="Flexible query interface for ad-hoc analysis",
priority="high",
access_level=AccessLevel.INTERNAL,
response_time="< 30 seconds",
frequency="Daily",
acceptance_criteria=[
"SQL-like query capability",
"Access to all historical data",
"Export to CSV/JSON formats",
"Visualization generation"
]
),
StakeholderRequirement(
id="ANALYTICS-002",
role=UserRole.ANALYTICS_STAFF,
description="Model development and deployment pipeline",
priority="high",
access_level=AccessLevel.INTERNAL,
response_time="< 1 hour for model updates",
frequency="Weekly",
acceptance_criteria=[
"Version control for models",
"A/B testing capability",
"Performance monitoring",
"Rollback capability"
]
),
]
def generate_requirements_document(
requirements: List[StakeholderRequirement]
) -> str:
"""Generate formatted requirements document."""
doc = "# Analytics Platform Requirements\n\n"
# Group by role
by_role: Dict[UserRole, List[StakeholderRequirement]] = {}
for req in requirements:
if req.role not in by_role:
by_role[req.role] = []
by_role[req.role].append(req)
for role, reqs in by_role.items():
doc += f"## {role.value.replace('_', ' ').title()}\n\n"
for req in reqs:
doc += f"### {req.id}: {req.description}\n"
doc += f"- **Priority:** {req.priority}\n"
doc += f"- **Access Level:** {req.access_level.value}\n"
doc += f"- **Response Time:** {req.response_time}\n"
doc += f"- **Frequency:** {req.frequency}\n"
doc += "- **Acceptance Criteria:**\n"
for criterion in req.acceptance_criteria:
doc += f" - {criterion}\n"
doc += "\n"
return doc
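Because the requirements live in code, the formal document can be regenerated whenever they change. A minimal usage sketch (the output filename is illustrative):

```python
# Collect every requirement group defined above and render one document.
all_requirements = (
    COACHING_REQUIREMENTS + RECRUITING_REQUIREMENTS + ANALYTICS_REQUIREMENTS
)

# Write the Markdown document for stakeholder review (path is illustrative).
with open("requirements.md", "w") as f:
    f.write(generate_requirements_document(all_requirements))
```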
27.1.2 Functional Requirements
Based on stakeholder analysis, we can define the system's functional requirements:
Data Collection Requirements
1. Automated play-by-play data ingestion from multiple sources
2. Real-time tracking data integration (where available)
3. Recruiting service data aggregation
4. Video tagging and synchronization
5. Manual data entry for proprietary evaluations

Analysis Requirements
1. Expected Points Added (EPA) calculation for all plays
2. Win probability modeling with real-time updates
3. Player efficiency metrics across all positions
4. Opponent tendency analysis and scouting reports
5. Recruiting prospect scoring and comparison
6. Historical trend analysis and season projections

Visualization Requirements
1. Interactive dashboards for each user role
2. Automated report generation (weekly, seasonal)
3. Real-time game monitoring displays
4. Customizable chart export for presentations
5. Mobile-friendly interfaces for field use

Integration Requirements
1. Video platform integration (Hudl, XOS)
2. Recruiting database connections
3. Conference data sharing (where applicable)
4. Export capabilities for external tools
27.1.3 Non-Functional Requirements
Performance
- Dashboard load time: < 3 seconds
- Real-time updates: < 5 seconds latency
- Query response: < 30 seconds for complex analyses
- Concurrent users: support 50+ simultaneous users during games

Reliability
- Uptime: 99.9% during games, 99% overall
- Data backup: daily automated backups with point-in-time recovery
- Disaster recovery: < 4 hour recovery time objective

Security
- Role-based access control
- Encryption at rest and in transit
- Audit logging for sensitive data access
- Compliance with university IT policies

Scalability
- Support 10 years of historical data
- Handle 1,000+ plays per week during season
- Accommodate additional data sources without architecture changes
27.2 System Architecture
27.2.1 High-Level Architecture
Our analytics platform follows a modern data platform architecture with distinct layers for ingestion, storage, processing, and presentation:
┌──────────────────────────────────────────────────────────────────────┐
│                          PRESENTATION LAYER                          │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  │
│  │  Coaching   │  │ Recruiting  │  │  Executive  │  │  Analytics  │  │
│  │  Dashboard  │  │  Dashboard  │  │   Reports   │  │  Workbench  │  │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘  │
└──────────────────────────────────────────────────────────────────────┘
                                   │
                                   ▼
┌──────────────────────────────────────────────────────────────────────┐
│                               API LAYER                              │
│                           REST API Gateway                           │
│       /games  /plays  /players  /recruiting  /models  /reports      │
└──────────────────────────────────────────────────────────────────────┘
                                   │
                                   ▼
┌──────────────────────────────────────────────────────────────────────┐
│                           PROCESSING LAYER                           │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  │
│  │  Real-Time  │  │    Batch    │  │     ML      │  │   Report    │  │
│  │  Analytics  │  │ Processing  │  │   Models    │  │  Generator  │  │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘  │
└──────────────────────────────────────────────────────────────────────┘
                                   │
                                   ▼
┌──────────────────────────────────────────────────────────────────────┐
│                             STORAGE LAYER                            │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  │
│  │    Redis    │  │ PostgreSQL  │  │  Data Lake  │  │    Model    │  │
│  │   (Cache)   │  │  (Primary)  │  │  (Archive)  │  │  Registry   │  │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘  │
└──────────────────────────────────────────────────────────────────────┘
                                   ▲
                                   │
┌──────────────────────────────────────────────────────────────────────┐
│                            INGESTION LAYER                           │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  │
│  │  Play-by-   │  │  Tracking   │  │ Recruiting  │  │   Manual    │  │
│  │  Play API   │  │    Data     │  │  Services   │  │    Entry    │  │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘  │
└──────────────────────────────────────────────────────────────────────┘
27.2.2 Component Design
"""
System Architecture Components
This module defines the core architectural components of the analytics platform.
"""
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Any, Callable
from datetime import datetime
import logging
logger = logging.getLogger(__name__)
# =============================================================================
# LAYER DEFINITIONS
# =============================================================================
class DataSource(ABC):
"""Abstract base class for data sources."""
@abstractmethod
def connect(self) -> bool:
"""Establish connection to data source."""
pass
@abstractmethod
def fetch(self, query: Dict) -> List[Dict]:
"""Fetch data based on query parameters."""
pass
@abstractmethod
def get_schema(self) -> Dict:
"""Return the data schema."""
pass
class DataProcessor(ABC):
"""Abstract base class for data processors."""
@abstractmethod
def process(self, data: List[Dict]) -> List[Dict]:
"""Process raw data into analytical format."""
pass
@abstractmethod
def validate(self, data: List[Dict]) -> List[str]:
"""Validate data, returning list of errors."""
pass
class DataStore(ABC):
"""Abstract base class for data storage."""
@abstractmethod
def save(self, collection: str, data: List[Dict]) -> int:
"""Save data to store, return count saved."""
pass
@abstractmethod
def query(self, collection: str, filters: Dict) -> List[Dict]:
"""Query data from store."""
pass
# =============================================================================
# CONFIGURATION
# =============================================================================
@dataclass
class SystemConfig:
"""Central configuration for the analytics platform."""
# Environment
environment: str = "development" # development, staging, production
# Database settings
database_url: str = "postgresql://localhost:5432/cfb_analytics"
redis_url: str = "redis://localhost:6379"
# API settings
api_host: str = "0.0.0.0"
api_port: int = 8000
api_workers: int = 4
# Data sources
play_by_play_api_url: str = "https://api.collegefootballdata.com"
play_by_play_api_key: Optional[str] = None
# Processing settings
batch_size: int = 1000
real_time_enabled: bool = True
# Feature flags
features: Dict[str, bool] = field(default_factory=lambda: {
"live_win_probability": True,
"fourth_down_bot": True,
"recruiting_alerts": True,
"automated_reports": True
})
# Logging
log_level: str = "INFO"
def is_feature_enabled(self, feature: str) -> bool:
"""Check if a feature is enabled."""
return self.features.get(feature, False)
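# Example (illustrative): build the config from environment variables so the
# same code runs unchanged in development and production. The variable names
# below are assumptions, not a fixed convention.
import os

def config_from_env() -> SystemConfig:
    """Construct a SystemConfig from process environment variables."""
    return SystemConfig(
        environment=os.environ.get("CFB_ENV", "development"),
        database_url=os.environ.get(
            "CFB_DATABASE_URL", "postgresql://localhost:5432/cfb_analytics"
        ),
        play_by_play_api_key=os.environ.get("CFB_API_KEY"),
    )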
# =============================================================================
# SERVICE REGISTRY
# =============================================================================
class ServiceRegistry:
"""
Central registry for system services.
Implements dependency injection pattern for loose coupling.
"""
_instance = None
_services: Dict[str, Any] = {}
def __new__(cls):
if cls._instance is None:
cls._instance = super().__new__(cls)
cls._services = {}
return cls._instance
@classmethod
def register(cls, name: str, service: Any):
"""Register a service."""
cls._services[name] = service
logger.info(f"Registered service: {name}")
@classmethod
def get(cls, name: str) -> Any:
"""Get a registered service."""
if name not in cls._services:
raise KeyError(f"Service not registered: {name}")
return cls._services[name]
@classmethod
def has(cls, name: str) -> bool:
"""Check if a service is registered."""
return name in cls._services
# =============================================================================
# EVENT BUS
# =============================================================================
class EventBus:
"""
Simple event bus for inter-component communication.
Enables loose coupling between system components.
"""
def __init__(self):
        self._handlers: Dict[str, List[Callable]] = {}

    def subscribe(self, event_type: str, handler: Callable):
"""Subscribe to an event type."""
if event_type not in self._handlers:
self._handlers[event_type] = []
self._handlers[event_type].append(handler)
def publish(self, event_type: str, data: Any):
"""Publish an event to all subscribers."""
handlers = self._handlers.get(event_type, [])
for handler in handlers:
try:
handler(data)
except Exception as e:
logger.error(f"Event handler error: {e}")
    def unsubscribe(self, event_type: str, handler: Callable):
"""Unsubscribe from an event type."""
if event_type in self._handlers:
self._handlers[event_type].remove(handler)
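# Example usage (illustrative): downstream components react to newly ingested
# plays without the ingestion layer knowing who is listening.
def demo_event_bus() -> None:
    bus = EventBus()
    bus.subscribe(
        "play.ingested",
        lambda play: logger.info(f"Recomputing metrics for play {play['id']}")
    )
    bus.publish("play.ingested", {"id": "demo-play-1"})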
# =============================================================================
# DOMAIN ENTITIES
# =============================================================================
@dataclass
class Game:
"""Represents a college football game."""
id: str
season: int
week: int
home_team: str
away_team: str
home_score: Optional[int] = None
away_score: Optional[int] = None
venue: Optional[str] = None
game_date: Optional[datetime] = None
conference_game: bool = False
completed: bool = False
@dataclass
class Play:
"""Represents a single play."""
id: str
game_id: str
drive_id: str
play_number: int
# Situation
quarter: int
clock: str
down: int
distance: int
yard_line: int
offense: str
defense: str
# Result
play_type: str
yards_gained: int
touchdown: bool = False
turnover: bool = False
# Advanced metrics (calculated)
epa: Optional[float] = None
success: Optional[bool] = None
wpa: Optional[float] = None
@dataclass
class Player:
"""Represents a player."""
id: str
name: str
position: str
team: str
jersey_number: Optional[int] = None
height: Optional[int] = None # inches
weight: Optional[int] = None # pounds
class_year: Optional[str] = None
hometown: Optional[str] = None
@dataclass
class Prospect:
"""Represents a recruiting prospect."""
id: str
name: str
position: str
high_school: str
city: str
state: str
class_year: int
# Ratings
composite_rating: Optional[float] = None
stars: Optional[int] = None
national_rank: Optional[int] = None
position_rank: Optional[int] = None
state_rank: Optional[int] = None
# Status
committed: bool = False
committed_to: Optional[str] = None
signed: bool = False
# Internal evaluation
internal_grade: Optional[float] = None
evaluation_notes: Optional[str] = None
# =============================================================================
# REPOSITORY PATTERN
# =============================================================================
class GameRepository:
"""Repository for game data access."""
def __init__(self, data_store: DataStore):
self.store = data_store
def get_by_id(self, game_id: str) -> Optional[Game]:
"""Get a game by ID."""
results = self.store.query("games", {"id": game_id})
if results:
return Game(**results[0])
return None
def get_by_team_season(
self,
team: str,
season: int
) -> List[Game]:
"""Get all games for a team in a season."""
results = self.store.query("games", {
"$or": [
{"home_team": team, "season": season},
{"away_team": team, "season": season}
]
})
return [Game(**r) for r in results]
def save(self, game: Game) -> bool:
"""Save a game."""
from dataclasses import asdict
count = self.store.save("games", [asdict(game)])
return count > 0
class PlayRepository:
"""Repository for play data access."""
def __init__(self, data_store: DataStore):
self.store = data_store
def get_by_game(self, game_id: str) -> List[Play]:
"""Get all plays for a game."""
results = self.store.query("plays", {"game_id": game_id})
return [Play(**r) for r in results]
def get_by_team_season(
self,
team: str,
season: int,
offense_only: bool = False
) -> List[Play]:
"""Get all plays for a team in a season."""
filters = {"season": season}
if offense_only:
filters["offense"] = team
else:
filters["$or"] = [
{"offense": team},
{"defense": team}
]
results = self.store.query("plays", filters)
return [Play(**r) for r in results]
def save_batch(self, plays: List[Play]) -> int:
"""Save multiple plays."""
from dataclasses import asdict
data = [asdict(p) for p in plays]
return self.store.save("plays", data)
27.2.3 Database Schema
"""
Database Schema Definition
This module defines the PostgreSQL schema for the analytics platform.
"""
# SQL schema definition
DATABASE_SCHEMA = """
-- =============================================================================
-- CORE TABLES
-- =============================================================================
-- Teams table
CREATE TABLE IF NOT EXISTS teams (
id VARCHAR(50) PRIMARY KEY,
name VARCHAR(100) NOT NULL,
abbreviation VARCHAR(10),
conference VARCHAR(50),
division VARCHAR(20),
logo_url VARCHAR(255),
primary_color VARCHAR(7),
secondary_color VARCHAR(7),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_teams_conference ON teams(conference);
-- Seasons table
CREATE TABLE IF NOT EXISTS seasons (
year INTEGER PRIMARY KEY,
start_date DATE,
end_date DATE,
playoff_teams INTEGER DEFAULT 4,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Games table
CREATE TABLE IF NOT EXISTS games (
id VARCHAR(50) PRIMARY KEY,
season INTEGER REFERENCES seasons(year),
week INTEGER NOT NULL,
game_date DATE,
game_time TIME,
home_team_id VARCHAR(50) REFERENCES teams(id),
away_team_id VARCHAR(50) REFERENCES teams(id),
home_score INTEGER,
away_score INTEGER,
venue VARCHAR(200),
attendance INTEGER,
conference_game BOOLEAN DEFAULT FALSE,
neutral_site BOOLEAN DEFAULT FALSE,
completed BOOLEAN DEFAULT FALSE,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_games_season ON games(season);
CREATE INDEX idx_games_week ON games(season, week);
CREATE INDEX idx_games_home_team ON games(home_team_id);
CREATE INDEX idx_games_away_team ON games(away_team_id);
-- Drives table
CREATE TABLE IF NOT EXISTS drives (
id VARCHAR(50) PRIMARY KEY,
game_id VARCHAR(50) REFERENCES games(id),
drive_number INTEGER NOT NULL,
offense_team_id VARCHAR(50) REFERENCES teams(id),
start_quarter INTEGER,
start_time VARCHAR(10),
start_yard_line INTEGER,
end_quarter INTEGER,
end_time VARCHAR(10),
end_yard_line INTEGER,
plays INTEGER,
yards INTEGER,
result VARCHAR(50), -- touchdown, field_goal, punt, turnover, etc.
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_drives_game ON drives(game_id);
CREATE INDEX idx_drives_offense ON drives(offense_team_id);
-- Plays table
CREATE TABLE IF NOT EXISTS plays (
id VARCHAR(50) PRIMARY KEY,
game_id VARCHAR(50) REFERENCES games(id),
drive_id VARCHAR(50) REFERENCES drives(id),
play_number INTEGER NOT NULL,
-- Situation
quarter INTEGER NOT NULL,
clock VARCHAR(10),
down INTEGER,
distance INTEGER,
yard_line INTEGER,
offense_team_id VARCHAR(50) REFERENCES teams(id),
defense_team_id VARCHAR(50) REFERENCES teams(id),
offense_score INTEGER,
defense_score INTEGER,
-- Play details
play_type VARCHAR(50),
play_text TEXT,
yards_gained INTEGER,
first_down BOOLEAN DEFAULT FALSE,
touchdown BOOLEAN DEFAULT FALSE,
turnover BOOLEAN DEFAULT FALSE,
penalty BOOLEAN DEFAULT FALSE,
-- Advanced metrics
epa DECIMAL(10, 4),
success BOOLEAN,
wpa DECIMAL(10, 6),
pre_snap_win_prob DECIMAL(10, 6),
post_snap_win_prob DECIMAL(10, 6),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_plays_game ON plays(game_id);
CREATE INDEX idx_plays_drive ON plays(drive_id);
CREATE INDEX idx_plays_offense ON plays(offense_team_id);
CREATE INDEX idx_plays_type ON plays(play_type);
CREATE INDEX idx_plays_situation ON plays(down, distance);
-- =============================================================================
-- PLAYER TABLES
-- =============================================================================
-- Players table
CREATE TABLE IF NOT EXISTS players (
id VARCHAR(50) PRIMARY KEY,
name VARCHAR(100) NOT NULL,
first_name VARCHAR(50),
last_name VARCHAR(50),
position VARCHAR(10),
position_group VARCHAR(20),
team_id VARCHAR(50) REFERENCES teams(id),
jersey_number INTEGER,
height_inches INTEGER,
weight_lbs INTEGER,
class_year VARCHAR(10),
hometown VARCHAR(100),
home_state VARCHAR(2),
high_school VARCHAR(100),
active BOOLEAN DEFAULT TRUE,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_players_team ON players(team_id);
CREATE INDEX idx_players_position ON players(position);
-- Player game statistics
CREATE TABLE IF NOT EXISTS player_game_stats (
id SERIAL PRIMARY KEY,
player_id VARCHAR(50) REFERENCES players(id),
game_id VARCHAR(50) REFERENCES games(id),
-- Passing
pass_attempts INTEGER DEFAULT 0,
pass_completions INTEGER DEFAULT 0,
pass_yards INTEGER DEFAULT 0,
pass_touchdowns INTEGER DEFAULT 0,
interceptions INTEGER DEFAULT 0,
-- Rushing
rush_attempts INTEGER DEFAULT 0,
rush_yards INTEGER DEFAULT 0,
rush_touchdowns INTEGER DEFAULT 0,
-- Receiving
receptions INTEGER DEFAULT 0,
receiving_yards INTEGER DEFAULT 0,
receiving_touchdowns INTEGER DEFAULT 0,
targets INTEGER DEFAULT 0,
-- Defense
tackles INTEGER DEFAULT 0,
tackles_for_loss INTEGER DEFAULT 0,
sacks DECIMAL(3, 1) DEFAULT 0,
interceptions_def INTEGER DEFAULT 0,
pass_breakups INTEGER DEFAULT 0,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UNIQUE(player_id, game_id)
);
CREATE INDEX idx_player_stats_player ON player_game_stats(player_id);
CREATE INDEX idx_player_stats_game ON player_game_stats(game_id);
-- =============================================================================
-- RECRUITING TABLES
-- =============================================================================
-- Prospects table
CREATE TABLE IF NOT EXISTS prospects (
id VARCHAR(50) PRIMARY KEY,
name VARCHAR(100) NOT NULL,
first_name VARCHAR(50),
last_name VARCHAR(50),
position VARCHAR(10),
high_school VARCHAR(100),
city VARCHAR(100),
state VARCHAR(2),
class_year INTEGER NOT NULL,
-- Physical attributes
height_inches INTEGER,
weight_lbs INTEGER,
-- Ratings (from services)
composite_rating DECIMAL(6, 4),
stars INTEGER,
national_rank INTEGER,
position_rank INTEGER,
state_rank INTEGER,
-- Individual service ratings
rating_247 DECIMAL(6, 4),
rating_rivals DECIMAL(6, 4),
rating_espn DECIMAL(6, 4),
rating_on3 DECIMAL(6, 4),
-- Status
committed BOOLEAN DEFAULT FALSE,
committed_to VARCHAR(50),
commitment_date DATE,
signed BOOLEAN DEFAULT FALSE,
signing_date DATE,
enrolled BOOLEAN DEFAULT FALSE,
-- Internal evaluation
internal_grade DECIMAL(5, 2),
priority_level VARCHAR(20), -- top_target, target, monitor
evaluation_notes TEXT,
last_contact_date DATE,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_prospects_class ON prospects(class_year);
CREATE INDEX idx_prospects_position ON prospects(position);
CREATE INDEX idx_prospects_state ON prospects(state);
CREATE INDEX idx_prospects_stars ON prospects(stars);
CREATE INDEX idx_prospects_committed ON prospects(committed_to);
-- Prospect evaluations (internal)
CREATE TABLE IF NOT EXISTS prospect_evaluations (
id SERIAL PRIMARY KEY,
prospect_id VARCHAR(50) REFERENCES prospects(id),
evaluator VARCHAR(100),
evaluation_date DATE,
-- Grades (1-10 scale)
athleticism DECIMAL(3, 1),
technique DECIMAL(3, 1),
football_iq DECIMAL(3, 1),
competitiveness DECIMAL(3, 1),
character DECIMAL(3, 1),
overall_grade DECIMAL(3, 1),
notes TEXT,
film_notes TEXT,
comparison_player VARCHAR(100),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_evaluations_prospect ON prospect_evaluations(prospect_id);
-- =============================================================================
-- ANALYTICS TABLES
-- =============================================================================
-- Pre-computed team statistics by season
CREATE TABLE IF NOT EXISTS team_season_stats (
id SERIAL PRIMARY KEY,
team_id VARCHAR(50) REFERENCES teams(id),
season INTEGER REFERENCES seasons(year),
-- Record
wins INTEGER DEFAULT 0,
losses INTEGER DEFAULT 0,
conference_wins INTEGER DEFAULT 0,
conference_losses INTEGER DEFAULT 0,
-- Offense
total_plays INTEGER DEFAULT 0,
total_yards INTEGER DEFAULT 0,
pass_yards INTEGER DEFAULT 0,
rush_yards INTEGER DEFAULT 0,
points_scored INTEGER DEFAULT 0,
offensive_epa DECIMAL(10, 4),
success_rate DECIMAL(5, 4),
-- Defense
points_allowed INTEGER DEFAULT 0,
yards_allowed INTEGER DEFAULT 0,
defensive_epa DECIMAL(10, 4),
-- Special Teams
field_goal_pct DECIMAL(5, 4),
punt_avg DECIMAL(5, 2),
-- Rankings
sp_plus_rating DECIMAL(6, 2),
fpi_rating DECIMAL(6, 2),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UNIQUE(team_id, season)
);
-- Model predictions storage
CREATE TABLE IF NOT EXISTS model_predictions (
id SERIAL PRIMARY KEY,
model_name VARCHAR(100) NOT NULL,
model_version VARCHAR(20),
prediction_type VARCHAR(50), -- game_outcome, win_probability, player_grade
entity_id VARCHAR(50), -- game_id, player_id, etc.
prediction_value DECIMAL(10, 6),
confidence DECIMAL(5, 4),
features_used JSONB,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_predictions_model ON model_predictions(model_name);
CREATE INDEX idx_predictions_entity ON model_predictions(entity_id);
CREATE INDEX idx_predictions_type ON model_predictions(prediction_type);
-- =============================================================================
-- AUDIT AND LOGGING
-- =============================================================================
-- Data quality log
CREATE TABLE IF NOT EXISTS data_quality_log (
id SERIAL PRIMARY KEY,
source VARCHAR(50),
entity_type VARCHAR(50),
entity_id VARCHAR(50),
issue_type VARCHAR(50),
issue_description TEXT,
severity VARCHAR(20), -- error, warning, info
resolved BOOLEAN DEFAULT FALSE,
resolution_notes TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
resolved_at TIMESTAMP
);
CREATE INDEX idx_quality_log_source ON data_quality_log(source);
CREATE INDEX idx_quality_log_severity ON data_quality_log(severity);
CREATE INDEX idx_quality_log_resolved ON data_quality_log(resolved);
-- User activity log
CREATE TABLE IF NOT EXISTS user_activity_log (
id SERIAL PRIMARY KEY,
user_id VARCHAR(50),
user_role VARCHAR(50),
action VARCHAR(50),
resource_type VARCHAR(50),
resource_id VARCHAR(50),
details JSONB,
ip_address VARCHAR(45),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_activity_user ON user_activity_log(user_id);
CREATE INDEX idx_activity_action ON user_activity_log(action);
CREATE INDEX idx_activity_created ON user_activity_log(created_at);
""";
27.3 Data Pipeline Implementation
27.3.1 Ingestion Layer
The ingestion layer is responsible for collecting data from multiple sources and preparing it for processing:
"""
Data Ingestion Pipeline
This module implements the data collection layer for the analytics platform.
"""
import asyncio
import aiohttp
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Any
import logging
import json
logger = logging.getLogger(__name__)
# =============================================================================
# BASE CLASSES
# =============================================================================
@dataclass
class IngestionResult:
"""Result of a data ingestion operation."""
source: str
records_fetched: int
records_processed: int
records_failed: int
start_time: datetime
end_time: datetime
errors: List[str]
@property
def success_rate(self) -> float:
if self.records_fetched == 0:
return 1.0
return self.records_processed / self.records_fetched
@property
def duration_seconds(self) -> float:
return (self.end_time - self.start_time).total_seconds()
class DataIngester(ABC):
"""Abstract base class for data ingesters."""
def __init__(self, config: Dict):
self.config = config
self.name = self.__class__.__name__
@abstractmethod
async def ingest(self, **kwargs) -> IngestionResult:
"""Perform data ingestion."""
pass
@abstractmethod
def validate_record(self, record: Dict) -> List[str]:
"""Validate a single record, returning list of errors."""
pass
# =============================================================================
# PLAY-BY-PLAY INGESTER
# =============================================================================
class PlayByPlayIngester(DataIngester):
"""
Ingests play-by-play data from College Football Data API.
Handles rate limiting, pagination, and error recovery.
"""
BASE_URL = "https://api.collegefootballdata.com"
def __init__(self, config: Dict):
super().__init__(config)
self.api_key = config.get("api_key")
self.rate_limit_delay = config.get("rate_limit_delay", 0.5)
async def ingest(
self,
season: int,
week: Optional[int] = None,
team: Optional[str] = None
) -> IngestionResult:
"""
Ingest play-by-play data.
Args:
season: Season year
week: Optional specific week
team: Optional team filter
Returns:
IngestionResult with statistics
"""
start_time = datetime.now()
records_fetched = 0
records_processed = 0
records_failed = 0
errors = []
async with aiohttp.ClientSession() as session:
# Fetch games first
games = await self._fetch_games(session, season, week, team)
logger.info(f"Found {len(games)} games to process")
for game in games:
try:
# Fetch plays for each game
plays = await self._fetch_plays(session, game['id'])
records_fetched += len(plays)
# Validate and process each play
for play in plays:
validation_errors = self.validate_record(play)
if validation_errors:
records_failed += 1
errors.extend(validation_errors)
else:
# Transform and store
processed = self._transform_play(play, game)
# In production: save to database
records_processed += 1
# Respect rate limits
await asyncio.sleep(self.rate_limit_delay)
except Exception as e:
logger.error(f"Error processing game {game['id']}: {e}")
errors.append(f"Game {game['id']}: {str(e)}")
return IngestionResult(
source="play_by_play",
records_fetched=records_fetched,
records_processed=records_processed,
records_failed=records_failed,
start_time=start_time,
end_time=datetime.now(),
errors=errors[:100] # Limit error list size
)
async def _fetch_games(
self,
session: aiohttp.ClientSession,
season: int,
week: Optional[int],
team: Optional[str]
) -> List[Dict]:
"""Fetch games from API."""
url = f"{self.BASE_URL}/games"
params = {"year": season}
if week:
params["week"] = week
if team:
params["team"] = team
headers = {"Authorization": f"Bearer {self.api_key}"}
async with session.get(url, params=params, headers=headers) as resp:
if resp.status == 200:
return await resp.json()
else:
logger.error(f"Failed to fetch games: {resp.status}")
return []
async def _fetch_plays(
self,
session: aiohttp.ClientSession,
game_id: int
) -> List[Dict]:
"""Fetch plays for a specific game."""
url = f"{self.BASE_URL}/plays"
params = {"gameId": game_id}
headers = {"Authorization": f"Bearer {self.api_key}"}
async with session.get(url, params=params, headers=headers) as resp:
if resp.status == 200:
return await resp.json()
else:
logger.error(f"Failed to fetch plays for game {game_id}")
return []
def validate_record(self, record: Dict) -> List[str]:
"""Validate a play record."""
errors = []
required_fields = ['id', 'offense', 'defense', 'down', 'distance']
for field in required_fields:
if field not in record or record[field] is None:
errors.append(f"Missing required field: {field}")
# Range validations
if record.get('down') and not 1 <= record['down'] <= 4:
errors.append(f"Invalid down: {record['down']}")
if record.get('yardsGained') and abs(record['yardsGained']) > 99:
errors.append(f"Suspicious yards gained: {record['yardsGained']}")
return errors
def _transform_play(self, play: Dict, game: Dict) -> Dict:
"""Transform API play format to internal format."""
return {
'id': str(play.get('id')),
'game_id': str(game.get('id')),
'drive_id': str(play.get('drive_id', '')),
'play_number': play.get('play_number', 0),
'quarter': play.get('period', 1),
'clock': play.get('clock', {}).get('displayValue', ''),
'down': play.get('down'),
'distance': play.get('distance'),
            # API reports yards to the end zone; convert to yards from own goal
            'yard_line': 100 - play.get('yardsToEndzone', 50),
'offense': play.get('offense'),
'defense': play.get('defense'),
'play_type': play.get('playType'),
'yards_gained': play.get('yardsGained', 0),
'play_text': play.get('playText', ''),
'touchdown': 'touchdown' in play.get('playText', '').lower(),
}
# =============================================================================
# RECRUITING DATA INGESTER
# =============================================================================
class RecruitingIngester(DataIngester):
"""
Ingests recruiting data from multiple services.
Aggregates ratings from 247Sports, Rivals, ESPN, and On3.
"""
    BASE_URL = "https://api.collegefootballdata.com"

    def __init__(self, config: Dict):
        super().__init__(config)
        self.api_key = config.get("api_key")
async def ingest(
self,
class_year: int,
position: Optional[str] = None
) -> IngestionResult:
"""
Ingest recruiting data for a class.
Args:
class_year: Recruiting class year
position: Optional position filter
Returns:
IngestionResult with statistics
"""
start_time = datetime.now()
records_fetched = 0
records_processed = 0
records_failed = 0
errors = []
async with aiohttp.ClientSession() as session:
# Fetch prospects
url = f"{PlayByPlayIngester.BASE_URL}/recruiting/players"
params = {"year": class_year}
if position:
params["position"] = position
headers = {"Authorization": f"Bearer {self.api_key}"}
async with session.get(url, params=params, headers=headers) as resp:
if resp.status == 200:
prospects = await resp.json()
records_fetched = len(prospects)
for prospect in prospects:
validation_errors = self.validate_record(prospect)
if validation_errors:
records_failed += 1
errors.extend(validation_errors)
else:
processed = self._transform_prospect(prospect)
records_processed += 1
else:
errors.append(f"API error: {resp.status}")
return IngestionResult(
source="recruiting",
records_fetched=records_fetched,
records_processed=records_processed,
records_failed=records_failed,
start_time=start_time,
end_time=datetime.now(),
errors=errors
)
def validate_record(self, record: Dict) -> List[str]:
"""Validate a prospect record."""
errors = []
if not record.get('name'):
errors.append("Missing prospect name")
if not record.get('position'):
errors.append("Missing position")
rating = record.get('rating')
if rating and not 0.0 <= rating <= 1.0:
errors.append(f"Invalid rating: {rating}")
return errors
def _transform_prospect(self, prospect: Dict) -> Dict:
"""Transform API prospect format to internal format."""
return {
'id': str(prospect.get('id')),
'name': prospect.get('name'),
'position': prospect.get('position'),
'high_school': prospect.get('school', {}).get('name'),
'city': prospect.get('city'),
'state': prospect.get('stateProvince'),
'class_year': prospect.get('year'),
'composite_rating': prospect.get('rating'),
'stars': prospect.get('stars'),
'national_rank': prospect.get('ranking'),
'committed_to': prospect.get('committedTo'),
}
# =============================================================================
# INGESTION ORCHESTRATOR
# =============================================================================
class IngestionOrchestrator:
"""
Orchestrates data ingestion across all sources.
Manages scheduling, dependencies, and monitoring.
"""
def __init__(self, config: Dict):
self.config = config
self.ingesters: Dict[str, DataIngester] = {}
self.results: List[IngestionResult] = []
def register_ingester(self, name: str, ingester: DataIngester):
"""Register a data ingester."""
self.ingesters[name] = ingester
logger.info(f"Registered ingester: {name}")
async def run_full_ingestion(
self,
season: int,
include_recruiting: bool = True
) -> Dict[str, IngestionResult]:
"""
Run full data ingestion for a season.
Args:
season: Season year
include_recruiting: Whether to include recruiting data
Returns:
Dictionary of results by source
"""
results = {}
# Play-by-play data (required)
if 'play_by_play' in self.ingesters:
logger.info(f"Starting play-by-play ingestion for {season}")
result = await self.ingesters['play_by_play'].ingest(season=season)
results['play_by_play'] = result
self.results.append(result)
logger.info(
f"Play-by-play complete: {result.records_processed} records, "
f"{result.success_rate:.1%} success rate"
)
# Recruiting data (optional)
if include_recruiting and 'recruiting' in self.ingesters:
for year in [season, season + 1]:
logger.info(f"Starting recruiting ingestion for class of {year}")
result = await self.ingesters['recruiting'].ingest(
class_year=year
)
results[f'recruiting_{year}'] = result
self.results.append(result)
return results
async def run_incremental_ingestion(
self,
season: int,
week: int
) -> Dict[str, IngestionResult]:
"""
Run incremental ingestion for a specific week.
Args:
season: Season year
week: Week number
Returns:
Dictionary of results by source
"""
results = {}
if 'play_by_play' in self.ingesters:
logger.info(f"Starting incremental ingestion for {season} week {week}")
result = await self.ingesters['play_by_play'].ingest(
season=season,
week=week
)
results['play_by_play'] = result
self.results.append(result)
return results
def get_summary(self) -> Dict:
"""Get summary of all ingestion runs."""
if not self.results:
return {"status": "no_runs"}
total_fetched = sum(r.records_fetched for r in self.results)
total_processed = sum(r.records_processed for r in self.results)
total_failed = sum(r.records_failed for r in self.results)
return {
"total_runs": len(self.results),
"total_fetched": total_fetched,
"total_processed": total_processed,
"total_failed": total_failed,
"overall_success_rate": (
total_processed / total_fetched if total_fetched > 0 else 1.0
),
"sources": list(set(r.source for r in self.results))
}
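Because the ingesters are async, the orchestrator is driven from an asyncio entry point. A minimal driver sketch (the API key and season are placeholders):

```python
import asyncio

async def main() -> None:
    # Placeholder key; in production this comes from configuration.
    config = {"api_key": "YOUR_API_KEY", "rate_limit_delay": 0.5}

    orchestrator = IngestionOrchestrator(config)
    orchestrator.register_ingester("play_by_play", PlayByPlayIngester(config))
    orchestrator.register_ingester("recruiting", RecruitingIngester(config))

    # Season backfill, then the aggregate success-rate summary.
    await orchestrator.run_full_ingestion(season=2024)
    print(orchestrator.get_summary())

if __name__ == "__main__":
    asyncio.run(main())
```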
27.3.2 Processing Layer
The processing layer transforms raw data into analytical metrics:
"""
Data Processing Pipeline
This module implements the analytics computation layer.
"""
import numpy as np
import pandas as pd
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple
from scipy import stats
import logging
logger = logging.getLogger(__name__)
# =============================================================================
# EPA CALCULATOR
# =============================================================================
class EPACalculator:
"""
Calculates Expected Points Added (EPA) for plays.
EPA measures the value of a play relative to expected points
based on down, distance, and field position.
"""
def __init__(self):
# Expected points by field position (simplified)
# In production, this would be a more sophisticated model
self._build_ep_lookup()
def _build_ep_lookup(self):
"""Build expected points lookup table."""
# Simplified EP values by yard line (from own goal)
# Based on historical scoring outcomes
self.ep_by_position = {}
for yard_line in range(1, 100):
# Logistic curve approximation
ep = 7 * (1 / (1 + np.exp(-0.1 * (yard_line - 50)))) - 0.5
self.ep_by_position[yard_line] = round(ep, 2)
def calculate_ep(
self,
down: int,
distance: int,
yard_line: int
) -> float:
"""
Calculate expected points for a situation.
Args:
down: Current down (1-4)
distance: Yards to first down
yard_line: Yards from own goal (1-99)
Returns:
Expected points value
"""
base_ep = self.ep_by_position.get(yard_line, 0)
# Adjust for down and distance
down_adjustments = {1: 0.0, 2: -0.3, 3: -0.7, 4: -1.5}
down_adj = down_adjustments.get(down, 0)
# Distance penalty
distance_adj = -0.05 * max(0, distance - 5)
return base_ep + down_adj + distance_adj
def calculate_epa(
self,
before_state: Dict,
after_state: Dict
) -> float:
"""
Calculate EPA for a play.
Args:
before_state: Game state before play
after_state: Game state after play
Returns:
Expected Points Added
"""
ep_before = self.calculate_ep(
before_state['down'],
before_state['distance'],
before_state['yard_line']
)
# Handle scoring plays
if after_state.get('touchdown'):
ep_after = 7.0
elif after_state.get('field_goal'):
ep_after = 3.0
elif after_state.get('safety'):
ep_after = -2.0
elif after_state.get('turnover'):
# Opponent gets ball - negate EP from their perspective
opp_yard_line = 100 - after_state['yard_line']
ep_after = -self.calculate_ep(1, 10, opp_yard_line)
else:
ep_after = self.calculate_ep(
after_state['down'],
after_state['distance'],
after_state['yard_line']
)
return ep_after - ep_before
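# Worked example (illustrative): a 5-yard gain on 1st-and-10 from the
# offense's own 25 leaves 2nd-and-5 at the 30. Under this simplified EP
# model the play yields a small positive EPA, because the field-position
# gain outweighs the penalty for moving from 1st down to 2nd down.
def demo_epa_calculation() -> float:
    calc = EPACalculator()
    return calc.calculate_epa(
        before_state={'down': 1, 'distance': 10, 'yard_line': 25},
        after_state={'down': 2, 'distance': 5, 'yard_line': 30,
                     'touchdown': False, 'turnover': False}
    )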
class PlayProcessor:
"""
Processes plays to compute advanced metrics.
Calculates EPA, success rate, and other derived statistics.
"""
def __init__(self):
self.epa_calc = EPACalculator()
def process_game_plays(self, plays: List[Dict]) -> pd.DataFrame:
"""
Process all plays in a game.
Args:
plays: List of play dictionaries
Returns:
DataFrame with processed plays and metrics
"""
processed = []
for i, play in enumerate(plays):
# Get next play for after-state (or end of drive)
if i + 1 < len(plays) and plays[i+1]['drive_id'] == play['drive_id']:
next_play = plays[i + 1]
after_state = {
'down': next_play['down'],
'distance': next_play['distance'],
'yard_line': next_play['yard_line'],
'touchdown': False,
'turnover': False
}
else:
# End of drive - determine outcome
after_state = self._determine_drive_end(play)
before_state = {
'down': play['down'],
'distance': play['distance'],
'yard_line': play['yard_line']
}
# Calculate EPA
epa = self.epa_calc.calculate_epa(before_state, after_state)
# Determine success
success = self._is_successful(play)
processed.append({
**play,
'epa': epa,
'success': success,
'before_ep': self.epa_calc.calculate_ep(
play['down'], play['distance'], play['yard_line']
)
})
return pd.DataFrame(processed)
def _determine_drive_end(self, last_play: Dict) -> Dict:
"""Determine the end state of a drive."""
play_text = last_play.get('play_text', '').lower()
if 'touchdown' in play_text:
return {'touchdown': True, 'turnover': False}
elif 'interception' in play_text or 'fumble' in play_text:
return {
'turnover': True,
'touchdown': False,
'yard_line': 100 - last_play['yard_line'],
'down': 1,
'distance': 10
}
else:
# Punt or turnover on downs
return {
'turnover': True,
'touchdown': False,
'yard_line': 100 - last_play['yard_line'] - 40, # Estimate punt
'down': 1,
'distance': 10
}
def _is_successful(self, play: Dict) -> bool:
"""
Determine if a play was successful.
Success definition:
- 1st down: Gain 40%+ of needed yards
- 2nd down: Gain 60%+ of needed yards
- 3rd/4th down: Get first down or score
"""
yards = play.get('yards_gained', 0)
down = play.get('down', 1)
distance = play.get('distance', 10)
if down == 1:
return yards >= 0.4 * distance
elif down == 2:
return yards >= 0.6 * distance
else: # 3rd or 4th
return yards >= distance
# =============================================================================
# TEAM ANALYTICS
# =============================================================================
class TeamAnalytics:
"""
Computes team-level analytics.
Aggregates play-level metrics to team statistics.
"""
def __init__(self):
self.play_processor = PlayProcessor()
def compute_team_stats(
self,
plays_df: pd.DataFrame,
team: str,
offense_only: bool = False
) -> Dict:
"""
Compute comprehensive team statistics.
Args:
plays_df: DataFrame of processed plays
team: Team name
offense_only: If True, only compute offensive stats
Returns:
Dictionary of team statistics
"""
        # The team's offensive snaps are always needed
        off_plays = plays_df[plays_df['offense'] == team]

        stats = {}

        # Offensive stats
        if len(off_plays) > 0:
            stats['offense'] = self._compute_unit_stats(off_plays, 'offense')

        # Defensive stats (skipped when offense_only is requested)
        if not offense_only:
            def_plays = plays_df[plays_df['defense'] == team]
            if len(def_plays) > 0:
                stats['defense'] = self._compute_unit_stats(def_plays, 'defense')
return stats
def _compute_unit_stats(
self,
plays_df: pd.DataFrame,
unit: str
) -> Dict:
"""Compute stats for offense or defense."""
is_offense = unit == 'offense'
stats = {
'plays': len(plays_df),
'total_epa': plays_df['epa'].sum() * (1 if is_offense else -1),
'epa_per_play': plays_df['epa'].mean() * (1 if is_offense else -1),
'success_rate': plays_df['success'].mean() if is_offense else 1 - plays_df['success'].mean(),
}
# Pass/rush splits
pass_plays = plays_df[plays_df['play_type'].str.contains('pass', case=False, na=False)]
rush_plays = plays_df[plays_df['play_type'].str.contains('rush|run', case=False, na=False)]
if len(pass_plays) > 0:
stats['pass_epa_per_play'] = pass_plays['epa'].mean() * (1 if is_offense else -1)
stats['pass_success_rate'] = pass_plays['success'].mean()
stats['pass_play_pct'] = len(pass_plays) / len(plays_df)
if len(rush_plays) > 0:
stats['rush_epa_per_play'] = rush_plays['epa'].mean() * (1 if is_offense else -1)
stats['rush_success_rate'] = rush_plays['success'].mean()
stats['rush_play_pct'] = len(rush_plays) / len(plays_df)
# Situational stats
stats['red_zone'] = self._situational_stats(
plays_df[plays_df['yard_line'] >= 80], is_offense
)
stats['third_down'] = self._situational_stats(
plays_df[plays_df['down'] == 3], is_offense
)
return stats
def _situational_stats(
self,
plays_df: pd.DataFrame,
is_offense: bool
) -> Dict:
"""Compute situational statistics."""
if len(plays_df) == 0:
return {'plays': 0}
return {
'plays': len(plays_df),
'epa_per_play': plays_df['epa'].mean() * (1 if is_offense else -1),
'success_rate': plays_df['success'].mean()
}
# =============================================================================
# OPPONENT TENDENCY ANALYSIS
# =============================================================================
class OpponentAnalyzer:
"""
Analyzes opponent tendencies for game preparation.
Generates scouting reports with play-calling patterns.
"""
def analyze_opponent(
self,
plays_df: pd.DataFrame,
opponent: str,
num_games: int = 5
) -> Dict:
"""
Generate opponent tendency report.
Args:
plays_df: DataFrame of opponent's plays
opponent: Opponent team name
num_games: Number of recent games to analyze
Returns:
Comprehensive tendency report
"""
# Filter to opponent's offensive plays
off_plays = plays_df[plays_df['offense'] == opponent].copy()
report = {
'team': opponent,
'games_analyzed': off_plays['game_id'].nunique(),
'total_plays': len(off_plays),
'tendencies': {}
}
# Overall tendencies
report['tendencies']['overall'] = self._analyze_tendencies(off_plays)
# By down
for down in [1, 2, 3]:
down_plays = off_plays[off_plays['down'] == down]
report['tendencies'][f'down_{down}'] = self._analyze_tendencies(down_plays)
# By field position
report['tendencies']['red_zone'] = self._analyze_tendencies(
off_plays[off_plays['yard_line'] >= 80]
)
report['tendencies']['own_territory'] = self._analyze_tendencies(
off_plays[off_plays['yard_line'] <= 50]
)
        # By score differential (only when score data is present; indexing
        # with a missing column would raise otherwise)
        if 'score_diff' in off_plays.columns:
            report['tendencies']['ahead'] = self._analyze_tendencies(
                off_plays[off_plays['score_diff'] > 7]
            )
            report['tendencies']['behind'] = self._analyze_tendencies(
                off_plays[off_plays['score_diff'] < -7]
            )
return report
def _analyze_tendencies(self, plays_df: pd.DataFrame) -> Dict:
"""Analyze tendencies for a subset of plays."""
if len(plays_df) < 10:
return {'sample_size': len(plays_df), 'insufficient_data': True}
# Classify plays
pass_mask = plays_df['play_type'].str.contains('pass', case=False, na=False)
rush_mask = plays_df['play_type'].str.contains('rush|run', case=False, na=False)
return {
'sample_size': len(plays_df),
'pass_rate': pass_mask.mean(),
'rush_rate': rush_mask.mean(),
'pass_epa': plays_df.loc[pass_mask, 'epa'].mean() if pass_mask.any() else None,
'rush_epa': plays_df.loc[rush_mask, 'epa'].mean() if rush_mask.any() else None,
'avg_yards_to_go': plays_df['distance'].mean(),
'success_rate': plays_df['success'].mean() if 'success' in plays_df else None
}
def generate_scouting_report(
self,
analysis: Dict,
format: str = 'text'
) -> str:
"""
Generate formatted scouting report.
Args:
analysis: Result from analyze_opponent
format: Output format ('text' or 'html')
Returns:
Formatted report string
"""
lines = [
f"# Opponent Scouting Report: {analysis['team']}",
f"Games Analyzed: {analysis['games_analyzed']}",
f"Total Plays: {analysis['total_plays']}",
"",
"## Overall Tendencies",
]
        overall = analysis['tendencies']['overall']
        if not overall.get('insufficient_data'):
            lines.append(f"- Pass Rate: {overall['pass_rate']:.1%}")
            lines.append(f"- Rush Rate: {overall['rush_rate']:.1%}")
            # Compare against None explicitly so a legitimate 0.00 EPA prints
            if overall.get('pass_epa') is not None:
                lines.append(f"- Pass EPA/Play: {overall['pass_epa']:.2f}")
            if overall.get('rush_epa') is not None:
                lines.append(f"- Rush EPA/Play: {overall['rush_epa']:.2f}")
lines.extend(["", "## By Down"])
        for down in [1, 2, 3]:
            tendencies = analysis['tendencies'].get(f'down_{down}', {})
            if not tendencies.get('insufficient_data'):
                ordinal = {1: '1st', 2: '2nd', 3: '3rd'}[down]
                lines.append(
                    f"- {ordinal} Down: "
                    f"{tendencies.get('pass_rate', 0):.0%} pass, "
                    f"{tendencies.get('rush_rate', 0):.0%} rush"
                )
lines.extend(["", "## Situational"])
rz = analysis['tendencies'].get('red_zone', {})
if not rz.get('insufficient_data'):
lines.append(f"- Red Zone: {rz.get('pass_rate', 0):.0%} pass")
return "\n".join(lines)
27.4 Dashboard Implementation
27.4.1 Dashboard Architecture
"""
Dashboard Service Implementation
This module provides the API and data services for dashboards.
"""
from dataclasses import dataclass, asdict
from datetime import datetime
from typing import Dict, List, Optional, Any
import json
import logging
logger = logging.getLogger(__name__)
# =============================================================================
# DASHBOARD DATA SERVICE
# =============================================================================
class DashboardService:
"""
Provides data for all dashboard views.
Handles data aggregation, caching, and real-time updates.
"""
def __init__(self, config: Dict):
self.config = config
self.cache = {} # In production: Redis
def get_coaching_dashboard(
self,
team: str,
game_id: Optional[str] = None
) -> Dict:
"""
Get data for coaching dashboard.
Args:
team: Team name
game_id: Optional specific game
Returns:
Dashboard data structure
"""
cache_key = f"coaching:{team}:{game_id}"
if cache_key in self.cache:
return self.cache[cache_key]
data = {
'team': team,
'generated_at': datetime.now().isoformat(),
'sections': {}
}
# Win probability section
data['sections']['win_probability'] = {
'current': 0.65, # Placeholder
'history': [],
'leverage_index': 1.2
}
# Fourth down section
data['sections']['fourth_down'] = {
'recommendation': 'go_for_it',
'go_for_it_ev': 0.72,
'field_goal_ev': 0.68,
'punt_ev': 0.55
}
# Drive summary
data['sections']['drives'] = {
'current_drive': {
'plays': 5,
'yards': 32,
'time_elapsed': '2:45'
},
'game_summary': {
'total_drives': 8,
'scoring_drives': 3,
'turnovers': 1
}
}
# Opponent tendencies (quick reference)
data['sections']['opponent_tendencies'] = {
'defensive_front': 'Nickel (62%)',
'blitz_rate': '28%',
'coverage_split': {'Cover 3': 45, 'Cover 2': 32, 'Man': 23}
}
self.cache[cache_key] = data
return data
def get_recruiting_dashboard(
self,
team: str,
class_year: int
) -> Dict:
"""
Get data for recruiting dashboard.
Args:
team: Team name
class_year: Recruiting class year
Returns:
Dashboard data structure
"""
data = {
'team': team,
'class_year': class_year,
'generated_at': datetime.now().isoformat(),
'sections': {}
}
# Class overview
data['sections']['class_overview'] = {
'committed': 15,
'targets': 8,
'class_rank': 12,
'average_rating': 0.8945
}
# Position breakdown
data['sections']['position_needs'] = {
'QB': {'committed': 1, 'target': 1, 'status': 'filled'},
'RB': {'committed': 1, 'target': 2, 'status': 'need'},
'WR': {'committed': 3, 'target': 4, 'status': 'need'},
'OL': {'committed': 4, 'target': 5, 'status': 'need'},
'DL': {'committed': 3, 'target': 4, 'status': 'need'},
'LB': {'committed': 2, 'target': 3, 'status': 'need'},
'DB': {'committed': 4, 'target': 4, 'status': 'filled'},
}
# Recent activity
data['sections']['recent_activity'] = []
# Board comparison
data['sections']['class_comparison'] = {
'conference_rank': 3,
'national_rank': 12,
'points': 245.6
}
return data
def get_executive_dashboard(
self,
team: str,
season: int
) -> Dict:
"""
Get data for executive dashboard.
Args:
team: Team name
season: Season year
Returns:
Dashboard data structure
"""
data = {
'team': team,
'season': season,
'generated_at': datetime.now().isoformat(),
'sections': {}
}
# Season performance
data['sections']['performance'] = {
'record': '8-2',
'conference_record': '5-2',
'ranking': 12,
'projected_wins': 9.5
}
# Trends
data['sections']['trends'] = {
'win_pct_3yr': [0.667, 0.750, 0.800],
'recruiting_rank_3yr': [25, 18, 12],
'revenue_trend': 'increasing'
}
# Key metrics
data['sections']['key_metrics'] = {
'offensive_rank': 15,
'defensive_rank': 22,
'special_teams_rank': 8,
'strength_of_schedule': 12
}
return data
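The dict-based cache above never evicts, so a long-running service would keep serving stale win probabilities. Below is a minimal sketch of a TTL-aware wrapper that could stand in for Redis during development; the `TTLCache` name and 30-second default are illustrative, not part of the platform code.
```python
import time
from typing import Any, Dict, Optional, Tuple

class TTLCache:
    """Minimal time-based cache; in production, Redis key expiry plays this role."""

    def __init__(self, ttl_seconds: float = 30.0):
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._store[key]  # expired entry: evict and report a miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (time.monotonic(), value)
```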
# =============================================================================
# API ENDPOINTS
# =============================================================================
# In production, this would use FastAPI or Flask
# Here we define the endpoint structure
API_ENDPOINTS = {
'/api/v1/dashboard/coaching': {
'method': 'GET',
'params': ['team', 'game_id'],
'handler': 'get_coaching_dashboard',
'auth': 'required',
'roles': ['coach', 'analytics']
},
'/api/v1/dashboard/recruiting': {
'method': 'GET',
'params': ['team', 'class_year'],
'handler': 'get_recruiting_dashboard',
'auth': 'required',
'roles': ['recruiting', 'analytics']
},
'/api/v1/dashboard/executive': {
'method': 'GET',
'params': ['team', 'season'],
'handler': 'get_executive_dashboard',
'auth': 'required',
'roles': ['executive', 'analytics']
},
'/api/v1/plays': {
'method': 'GET',
'params': ['game_id', 'team', 'season', 'week'],
'handler': 'get_plays',
'auth': 'required',
'roles': ['coach', 'analytics']
},
'/api/v1/players': {
'method': 'GET',
'params': ['team', 'position', 'season'],
'handler': 'get_players',
'auth': 'required',
'roles': ['coach', 'recruiting', 'analytics']
},
'/api/v1/prospects': {
'method': 'GET',
'params': ['class_year', 'position', 'state', 'committed'],
'handler': 'get_prospects',
'auth': 'required',
'roles': ['recruiting', 'analytics']
},
'/api/v1/models/win-probability': {
'method': 'POST',
'body': ['game_state'],
'handler': 'calculate_win_probability',
'auth': 'required',
'roles': ['coach', 'analytics']
},
'/api/v1/models/fourth-down': {
'method': 'POST',
'body': ['game_state'],
'handler': 'analyze_fourth_down',
'auth': 'required',
'roles': ['coach', 'analytics']
},
'/api/v1/reports/scouting': {
'method': 'GET',
'params': ['opponent', 'format'],
'handler': 'generate_scouting_report',
'auth': 'required',
'roles': ['coach', 'analytics']
},
}
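As the comment above notes, production would serve these routes through a web framework. A minimal sketch of wiring one endpoint with FastAPI (one of the two frameworks named above); authentication and role enforcement are elided and would normally live in a request dependency:
```python
# Sketch only: one route from API_ENDPOINTS served via FastAPI.
from typing import Optional
from fastapi import FastAPI

app = FastAPI(title="CFB Analytics API")
service = DashboardService(config={})

@app.get("/api/v1/dashboard/coaching")
def coaching_dashboard(team: str, game_id: Optional[str] = None) -> dict:
    # Bearer-token auth and the coach/analytics role check from
    # API_ENDPOINTS are omitted here for brevity.
    return service.get_coaching_dashboard(team, game_id)
```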
27.4.2 Report Generation
"""
Automated Report Generation
This module generates scheduled reports for stakeholders.
"""
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional
import logging
logger = logging.getLogger(__name__)
@dataclass
class ReportConfig:
"""Configuration for a report type."""
name: str
template: str
schedule: str # cron expression
recipients: List[str]
format: str # 'pdf', 'html', 'pptx'
data_sources: List[str]
class ReportGenerator:
"""
Generates automated reports.
Supports multiple formats and scheduling.
"""
TEMPLATES = {
'weekly_summary': """
# Weekly Football Analytics Summary
## {team} - Week {week}, {season}
### Record: {record}
### Offensive Performance
- EPA per Play: {off_epa:.3f}
- Success Rate: {off_success:.1%}
- Explosive Play Rate: {explosive_rate:.1%}
### Defensive Performance
- EPA per Play Allowed: {def_epa:.3f}
- Success Rate Allowed: {def_success:.1%}
- Havoc Rate: {havoc_rate:.1%}
### Key Players
{key_players}
### Looking Ahead
Next opponent: {next_opponent}
{opponent_preview}
""",
'recruiting_update': """
# Recruiting Update
## Class of {class_year}
### Class Summary
- Commits: {commits}
- Class Rank: #{class_rank}
- Average Rating: {avg_rating:.4f}
### Recent Activity
{recent_activity}
### Priority Targets
{priority_targets}
### Position Needs
{position_needs}
""",
'game_recap': """
# Game Recap
## {home_team} vs {away_team}
### {date}
### Final Score: {home_score} - {away_score}
### Win Probability Chart
{wp_chart}
### Key Plays
{key_plays}
### Statistical Summary
{stat_summary}
"""
}
def __init__(self, config: Dict):
self.config = config
def generate_weekly_summary(
self,
team: str,
season: int,
week: int,
data: Dict
) -> str:
"""Generate weekly summary report."""
template = self.TEMPLATES['weekly_summary']
# Format key players section
key_players = self._format_key_players(data.get('key_players', []))
return template.format(
team=team,
season=season,
week=week,
record=data.get('record', 'N/A'),
off_epa=data.get('offensive_epa', 0),
off_success=data.get('offensive_success_rate', 0),
explosive_rate=data.get('explosive_rate', 0),
def_epa=data.get('defensive_epa', 0),
def_success=data.get('defensive_success_rate', 0),
havoc_rate=data.get('havoc_rate', 0),
key_players=key_players,
next_opponent=data.get('next_opponent', 'TBD'),
opponent_preview=data.get('opponent_preview', '')
)
def generate_recruiting_update(
self,
team: str,
class_year: int,
data: Dict
) -> str:
"""Generate recruiting update report."""
template = self.TEMPLATES['recruiting_update']
recent_activity = self._format_recent_activity(
data.get('recent_activity', [])
)
priority_targets = self._format_targets(
data.get('priority_targets', [])
)
position_needs = self._format_position_needs(
data.get('position_needs', {})
)
return template.format(
class_year=class_year,
commits=data.get('commits', 0),
class_rank=data.get('class_rank', 'N/A'),
avg_rating=data.get('avg_rating', 0),
recent_activity=recent_activity,
priority_targets=priority_targets,
position_needs=position_needs
)
def _format_key_players(self, players: List[Dict]) -> str:
"""Format key players section."""
if not players:
return "No key player data available."
lines = []
for player in players[:5]:
lines.append(
f"- **{player['name']}** ({player['position']}): "
f"{player.get('stat_line', 'N/A')}"
)
return "\n".join(lines)
def _format_recent_activity(self, activity: List[Dict]) -> str:
"""Format recent recruiting activity."""
if not activity:
return "No recent activity."
lines = []
for item in activity[:10]:
lines.append(
f"- {item['date']}: {item['player']} - {item['action']}"
)
return "\n".join(lines)
def _format_targets(self, targets: List[Dict]) -> str:
"""Format priority targets."""
if not targets:
return "No priority targets listed."
lines = []
for target in targets:
lines.append(
f"- **{target['name']}** ({target['position']}, "
f"{target['stars']}★): {target['status']}"
)
return "\n".join(lines)
def _format_position_needs(self, needs: Dict) -> str:
"""Format position needs."""
if not needs:
return "Position needs not specified."
lines = []
for position, info in needs.items():
status = "✓" if info.get('filled') else "○"
lines.append(
f"- {status} {position}: {info.get('committed', 0)}/"
f"{info.get('target', 0)}"
)
return "\n".join(lines)
27.5 Deployment and Operations
27.5.1 Infrastructure as Code
"""
Infrastructure Configuration
This module defines deployment configurations for the analytics platform.
"""
# Docker Compose configuration for local development
DOCKER_COMPOSE_DEV = """
version: '3.8'
services:
# PostgreSQL database
postgres:
image: postgres:15
environment:
POSTGRES_DB: cfb_analytics
POSTGRES_USER: analytics
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
volumes:
- postgres_data:/var/lib/postgresql/data
- ./schema.sql:/docker-entrypoint-initdb.d/schema.sql
ports:
- "5432:5432"
healthcheck:
test: ["CMD-SHELL", "pg_isready -U analytics -d cfb_analytics"]
interval: 10s
timeout: 5s
retries: 5
# Redis cache
redis:
image: redis:7-alpine
ports:
- "6379:6379"
volumes:
- redis_data:/data
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 10s
timeout: 5s
retries: 5
# API server
api:
build:
context: .
dockerfile: Dockerfile.api
environment:
DATABASE_URL: postgresql://analytics:${POSTGRES_PASSWORD}@postgres:5432/cfb_analytics
REDIS_URL: redis://redis:6379
API_KEY: ${CFB_API_KEY}
ENVIRONMENT: development
ports:
- "8000:8000"
depends_on:
postgres:
condition: service_healthy
redis:
condition: service_healthy
volumes:
- ./src:/app/src
command: uvicorn src.api.main:app --host 0.0.0.0 --port 8000 --reload
# Dashboard frontend
dashboard:
build:
context: ./dashboard
dockerfile: Dockerfile
ports:
- "3000:3000"
environment:
REACT_APP_API_URL: http://localhost:8000
depends_on:
- api
volumes:
- ./dashboard/src:/app/src
# Scheduler for automated jobs
scheduler:
build:
context: .
dockerfile: Dockerfile.scheduler
environment:
DATABASE_URL: postgresql://analytics:${POSTGRES_PASSWORD}@postgres:5432/cfb_analytics
REDIS_URL: redis://redis:6379
API_KEY: ${CFB_API_KEY}
depends_on:
postgres:
condition: service_healthy
redis:
condition: service_healthy
volumes:
postgres_data:
redis_data:
"""
# Kubernetes deployment for production
KUBERNETES_DEPLOYMENT = """
apiVersion: apps/v1
kind: Deployment
metadata:
name: cfb-analytics-api
labels:
app: cfb-analytics
component: api
spec:
replicas: 3
selector:
matchLabels:
app: cfb-analytics
component: api
template:
metadata:
labels:
app: cfb-analytics
component: api
spec:
containers:
- name: api
image: cfb-analytics-api:latest
ports:
- containerPort: 8000
env:
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: cfb-analytics-secrets
key: database-url
- name: REDIS_URL
valueFrom:
secretKeyRef:
name: cfb-analytics-secrets
key: redis-url
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "500m"
livenessProbe:
httpGet:
path: /health
port: 8000
initialDelaySeconds: 10
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 8000
initialDelaySeconds: 5
periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
name: cfb-analytics-api
spec:
selector:
app: cfb-analytics
component: api
ports:
- port: 80
targetPort: 8000
type: LoadBalancer
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: cfb-analytics-api-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: cfb-analytics-api
minReplicas: 2
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
"""
27.5.2 Monitoring and Alerting
"""
Monitoring and Alerting Configuration
This module defines monitoring setup for the analytics platform.
"""
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional
from enum import Enum
import logging
logger = logging.getLogger(__name__)
class AlertSeverity(Enum):
"""Alert severity levels."""
INFO = "info"
WARNING = "warning"
ERROR = "error"
CRITICAL = "critical"
@dataclass
class MetricDefinition:
"""Definition of a metric to track."""
name: str
description: str
unit: str
alert_threshold: Optional[float] = None
alert_severity: AlertSeverity = AlertSeverity.WARNING
# Define key metrics
SYSTEM_METRICS = [
MetricDefinition(
name="api_latency_p99",
description="99th percentile API response time",
unit="milliseconds",
alert_threshold=500,
alert_severity=AlertSeverity.WARNING
),
MetricDefinition(
name="api_error_rate",
description="Percentage of API requests returning errors",
unit="percentage",
alert_threshold=1.0,
alert_severity=AlertSeverity.ERROR
),
MetricDefinition(
name="database_connections",
description="Number of active database connections",
unit="count",
alert_threshold=80,
alert_severity=AlertSeverity.WARNING
),
MetricDefinition(
name="cache_hit_rate",
description="Percentage of cache hits",
unit="percentage",
alert_threshold=50, # Alert if below 50%
alert_severity=AlertSeverity.WARNING
),
MetricDefinition(
name="data_freshness",
description="Minutes since last data update",
unit="minutes",
alert_threshold=60,
alert_severity=AlertSeverity.ERROR
),
MetricDefinition(
name="model_prediction_latency",
description="Time to generate model predictions",
unit="milliseconds",
alert_threshold=100,
alert_severity=AlertSeverity.WARNING
),
]
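These definitions describe what to track; emitting the metrics requires instrumentation in the API process itself. A sketch using the prometheus_client package (an assumed dependency); the histogram name matches the alert rules below, and the bucket edges are chosen to bracket the 500 ms latency threshold defined above:
```python
from prometheus_client import Counter, Histogram, start_http_server

REQUEST_LATENCY = Histogram(
    'http_request_duration_seconds',
    'API request latency in seconds',
    buckets=(0.05, 0.1, 0.25, 0.5, 1.0, 2.5),
)
REQUEST_ERRORS = Counter(
    'http_request_errors_total',
    'Count of API requests that returned an error',
)

def start_metrics_server(port: int = 8001) -> None:
    """Expose a /metrics endpoint for Prometheus to scrape."""
    start_http_server(port)
```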
# Prometheus configuration
PROMETHEUS_CONFIG = """
global:
scrape_interval: 15s
evaluation_interval: 15s
alerting:
alertmanagers:
- static_configs:
- targets:
- alertmanager:9093
rule_files:
- "alerts.yml"
scrape_configs:
- job_name: 'cfb-analytics-api'
static_configs:
- targets: ['api:8000']
metrics_path: /metrics
- job_name: 'cfb-analytics-scheduler'
static_configs:
- targets: ['scheduler:8001']
- job_name: 'postgres'
static_configs:
- targets: ['postgres-exporter:9187']
- job_name: 'redis'
static_configs:
- targets: ['redis-exporter:9121']
"""
# Alert rules
ALERT_RULES = """
groups:
- name: cfb-analytics
rules:
- alert: HighAPILatency
expr: histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m])) > 0.5
for: 5m
labels:
severity: warning
annotations:
summary: High API latency detected
description: 99th percentile latency is {{ $value }}s
- alert: HighErrorRate
expr: rate(http_requests_total{status=~"5.."}[5m]) / rate(http_requests_total[5m]) > 0.01
for: 5m
labels:
severity: error
annotations:
summary: High error rate detected
description: Error rate is {{ $value | humanizePercentage }}
- alert: DatabaseConnectionsHigh
expr: pg_stat_activity_count > 80
for: 5m
labels:
severity: warning
annotations:
summary: Database connections approaching limit
description: {{ $value }} active connections
- alert: DataStale
expr: (time() - data_last_update_timestamp) / 60 > 60
for: 10m
labels:
severity: error
annotations:
summary: Data has not been updated recently
description: Last update was {{ $value }} minutes ago
- alert: GameDayAPIDown
expr: up{job="cfb-analytics-api"} == 0
for: 1m
labels:
severity: critical
annotations:
summary: API is down during game day
description: The analytics API is not responding
"""
class HealthChecker:
"""
System health checker.
Performs health checks on all system components.
"""
def __init__(self):
self.checks = []
def add_check(self, name: str, check_func, critical: bool = False):
"""Add a health check."""
self.checks.append({
'name': name,
'func': check_func,
'critical': critical
})
def run_all_checks(self) -> Dict:
"""Run all health checks."""
results = {
'status': 'healthy',
'timestamp': datetime.now().isoformat(),
'checks': {}
}
for check in self.checks:
try:
result = check['func']()
results['checks'][check['name']] = {
'status': 'pass' if result else 'fail',
'critical': check['critical']
}
if not result and check['critical']:
results['status'] = 'unhealthy'
except Exception as e:
results['checks'][check['name']] = {
'status': 'error',
'error': str(e),
'critical': check['critical']
}
if check['critical']:
results['status'] = 'unhealthy'
return results
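Wiring the checker up might look like this; the probe functions are stand-ins for real PostgreSQL and Redis connectivity tests:
```python
def check_database() -> bool:
    # Stand-in for a real probe, e.g. `SELECT 1` over the connection pool
    return True

def check_cache() -> bool:
    # Stand-in for a real probe, e.g. a Redis PING
    return True

checker = HealthChecker()
checker.add_check('database', check_database, critical=True)
checker.add_check('cache', check_cache, critical=False)
print(checker.run_all_checks()['status'])  # 'healthy' or 'unhealthy'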
27.6 Testing and Quality Assurance
27.6.1 Testing Strategy
"""
Testing Framework
This module defines the testing strategy for the analytics platform.
"""
import pytest
from unittest.mock import Mock, patch
from typing import Dict, List
import pandas as pd
import numpy as np
# =============================================================================
# UNIT TESTS
# =============================================================================
class TestEPACalculator:
"""Unit tests for EPA calculation."""
def setup_method(self):
"""Set up test fixtures."""
self.calculator = EPACalculator()
def test_expected_points_own_goal_line(self):
"""EP should be negative near own goal line."""
ep = self.calculator.calculate_ep(down=1, distance=10, yard_line=1)
assert ep < 0
def test_expected_points_opponent_goal_line(self):
"""EP should be high near opponent goal line."""
ep = self.calculator.calculate_ep(down=1, distance=10, yard_line=99)
assert ep > 5
def test_expected_points_midfield(self):
"""EP should be moderate at midfield."""
ep = self.calculator.calculate_ep(down=1, distance=10, yard_line=50)
assert -1 < ep < 3
def test_epa_positive_for_good_play(self):
"""EPA should be positive for a play that improves situation."""
before = {'down': 1, 'distance': 10, 'yard_line': 50}
after = {'down': 1, 'distance': 10, 'yard_line': 65,
'touchdown': False, 'turnover': False}
epa = self.calculator.calculate_epa(before, after)
assert epa > 0
def test_epa_negative_for_turnover(self):
"""EPA should be negative for turnovers."""
before = {'down': 1, 'distance': 10, 'yard_line': 50}
after = {'down': 1, 'distance': 10, 'yard_line': 50,
'touchdown': False, 'turnover': True}
epa = self.calculator.calculate_epa(before, after)
assert epa < 0
def test_epa_touchdown_is_high(self):
"""EPA for touchdown should reflect full value gained."""
before = {'down': 1, 'distance': 10, 'yard_line': 95}
after = {'touchdown': True, 'turnover': False}
epa = self.calculator.calculate_epa(before, after)
assert epa > 0
class TestPlayProcessor:
"""Unit tests for play processing."""
def setup_method(self):
"""Set up test fixtures."""
self.processor = PlayProcessor()
def test_success_first_down_40_percent(self):
"""First down with 40%+ gain is success."""
play = {'down': 1, 'distance': 10, 'yards_gained': 4}
assert self.processor._is_successful(play) is True
def test_failure_first_down_less_than_40_percent(self):
"""First down with <40% gain is failure."""
play = {'down': 1, 'distance': 10, 'yards_gained': 3}
assert self.processor._is_successful(play) is False
def test_success_third_down_conversion(self):
"""Third down conversion is success."""
play = {'down': 3, 'distance': 5, 'yards_gained': 5}
assert self.processor._is_successful(play) is True
class TestDataValidator:
"""Unit tests for data validation."""
def setup_method(self):
"""Set up test fixtures."""
self.validator = DataValidator()
def test_valid_play_passes(self):
"""Valid play should pass validation."""
play = PlayEvent(
event_id="test_1",
game_id="game_1",
event_type=EventType.PLAY_END,
quarter=1,
down=1,
distance=10,
yard_line=25
)
is_valid, errors = self.validator.validate(play)
assert is_valid is True
assert len(errors) == 0
def test_invalid_down_fails(self):
"""Invalid down value should fail validation."""
play = PlayEvent(
event_id="test_1",
game_id="game_1",
event_type=EventType.PLAY_END,
quarter=1,
down=5, # Invalid
distance=10,
yard_line=25
)
is_valid, errors = self.validator.validate(play)
assert is_valid is False
# =============================================================================
# INTEGRATION TESTS
# =============================================================================
class TestDataPipeline:
"""Integration tests for the data pipeline."""
@pytest.fixture
def mock_api_response(self):
"""Mock API response data."""
return [
{
'id': 1,
'offense': 'Ohio State',
'defense': 'Michigan',
'down': 1,
'distance': 10,
'yardsToEndzone': 75,
'yardsGained': 5,
'playType': 'rush',
'playText': 'Rush for 5 yards'
},
{
'id': 2,
'offense': 'Ohio State',
'defense': 'Michigan',
'down': 2,
'distance': 5,
'yardsToEndzone': 70,
'yardsGained': 15,
'playType': 'pass',
'playText': 'Pass complete for 15 yards'
}
]
@pytest.mark.asyncio
async def test_ingestion_to_processing(self, mock_api_response):
"""Test full pipeline from ingestion to processing."""
with patch.object(PlayByPlayIngester, '_fetch_plays',
return_value=mock_api_response):
ingester = PlayByPlayIngester({'api_key': 'test'})
processor = PlayProcessor()
# Would run full pipeline here
# This is a placeholder for the actual test
# =============================================================================
# PERFORMANCE TESTS
# =============================================================================
class TestPerformance:
"""Performance tests for critical paths."""
def test_epa_calculation_speed(self):
"""EPA calculation should be fast."""
import time
calculator = EPACalculator()
start = time.time()
for _ in range(10000):
calculator.calculate_ep(down=2, distance=7, yard_line=45)
duration = time.time() - start
# Should process 10k calculations in under 1 second
assert duration < 1.0
def test_play_processing_speed(self):
"""Play processing should handle large volumes."""
processor = PlayProcessor()
# Generate synthetic plays
plays = []
for i in range(1000):
plays.append({
'id': str(i),
'game_id': 'test',
'drive_id': str(i // 10),
'play_number': i,
'quarter': (i // 150) + 1,
'clock': '10:00',
'down': (i % 4) + 1,
'distance': 10,
'yard_line': 50,
'offense': 'Team A',
'defense': 'Team B',
'play_type': 'pass' if i % 2 == 0 else 'rush',
'yards_gained': np.random.randint(-5, 20),
'play_text': 'Test play'
})
import time
start = time.time()
result = processor.process_game_plays(plays)
duration = time.time() - start
# Should process 1000 plays in under 5 seconds
assert duration < 5.0
assert len(result) == 1000
# =============================================================================
# DATA QUALITY TESTS
# =============================================================================
class TestDataQuality:
"""Tests for data quality validation."""
def test_epa_sum_near_zero(self):
"""EPA should roughly sum to zero across a game (zero-sum)."""
# In a game, EPA gained by offense ≈ EPA lost by defense
# This is a statistical property to validate
pass
def test_success_rate_bounds(self):
"""Success rate should be between 0 and 1."""
# Validate computed success rates
pass
def test_win_probability_sum_to_one(self):
"""Win probabilities should sum to 1."""
# Validate WP model outputs
pass
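The stubs above mark statistical properties worth enforcing. As one sketch of how to back the success-rate bound, the per-play flag it aggregates can be checked with property-based testing; this assumes the hypothesis library, which is not among the platform's stated dependencies:
```python
from hypothesis import given, strategies as st

@given(
    down=st.integers(min_value=1, max_value=4),
    distance=st.integers(min_value=1, max_value=30),
    yards=st.integers(min_value=-20, max_value=99),
)
def test_is_successful_always_boolean(down, distance, yards):
    """The per-play flag behind success rate must be a strict bool,
    which keeps any aggregated rate inside [0, 1]."""
    processor = PlayProcessor()
    result = processor._is_successful(
        {'down': down, 'distance': distance, 'yards_gained': yards}
    )
    assert isinstance(result, bool)
```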
27.7 Documentation and Training
27.7.1 System Documentation
Comprehensive documentation is essential for system maintainability:
"""
Documentation Generator
Generates system documentation from code and configuration.
"""
SYSTEM_DOCUMENTATION = """
# College Football Analytics Platform
## System Overview
The College Football Analytics Platform provides comprehensive data analysis,
visualization, and decision support for college football programs.
### Architecture
The system follows a modern microservices architecture:
1. **Data Ingestion Layer**: Collects data from multiple sources
2. **Processing Layer**: Transforms and enriches data with analytics
3. **Storage Layer**: PostgreSQL for persistence, Redis for caching
4. **API Layer**: RESTful API for all data access
5. **Presentation Layer**: Role-specific dashboards
### Data Flow
    External APIs → Ingestion → Validation → Processing → Storage → API → Dashboards
                                     ↓            ↓           ↓
                                Quality Log  Error Queue  ML Models
## Getting Started
### Prerequisites
- Python 3.10+
- Docker and Docker Compose
- PostgreSQL 15+
- Redis 7+
### Installation
1. Clone the repository:
```bash
git clone https://github.com/org/cfb-analytics.git
cd cfb-analytics
```
2. Create environment file:
```bash
cp .env.example .env
# Edit .env with your API keys and configuration
```
3. Start services:
```bash
docker-compose up -d
```
4. Initialize database:
```bash
python scripts/init_db.py
```
5. Run initial data load:
```bash
python scripts/load_historical.py --season 2024
```
### Configuration
Key configuration options in `.env`:
| Variable | Description | Default |
|----------|-------------|---------|
| DATABASE_URL | PostgreSQL connection string | required |
| REDIS_URL | Redis connection string | redis://localhost:6379 |
| CFB_API_KEY | API key for data source | required |
| ENVIRONMENT | Environment name | development |
## API Reference
### Authentication
All API endpoints require authentication via Bearer token:
```bash
curl -H "Authorization: Bearer <token>" https://api.example.com/v1/plays
```
### Endpoints
#### GET /v1/plays
Retrieve play-by-play data.
**Parameters:**
- game_id (string): Specific game ID
- team (string): Filter by team
- season (integer): Filter by season
- week (integer): Filter by week
**Response:**
```json
{
  "plays": [
    {
      "id": "play_123",
      "game_id": "game_456",
      "down": 1,
      "distance": 10,
      "yard_line": 25,
      "play_type": "pass",
      "yards_gained": 12,
      "epa": 0.45
    }
  ],
  "total": 150,
  "page": 1
}
```
## Troubleshooting
### Common Issues
**Issue**: API returning 500 errors
**Solution**: Check database connectivity and logs:
```bash
docker-compose logs api
```
**Issue**: Data not updating
**Solution**: Verify the scheduler is running:
```bash
docker-compose ps scheduler
```
## Support
For issues, contact: analytics-support@university.edu
"""
def generate_api_documentation(endpoints: Dict) -> str:
    """Generate API documentation from endpoint definitions."""
    doc = "# API Documentation\n\n"
for path, config in endpoints.items():
doc += f"## {config['method']} {path}\n\n"
doc += f"**Authentication**: {config.get('auth', 'none')}\n"
doc += f"**Allowed Roles**: {', '.join(config.get('roles', []))}\n\n"
if config.get('params'):
doc += "**Parameters:**\n"
for param in config['params']:
doc += f"- `{param}`\n"
doc += "\n"
doc += "---\n\n"
return doc
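For example, the endpoint reference can be rendered straight from the API_ENDPOINTS table defined in Section 27.4.1 (the output path below is illustrative):
```python
from pathlib import Path

api_docs = generate_api_documentation(API_ENDPOINTS)
Path("docs/api.md").write_text(api_docs)
```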
27.8 Case Study: Full Platform Implementation
27.8.1 Project Timeline
A realistic implementation timeline for a Division I program:
Phase 1: Foundation (Weeks 1-4)
- Requirements gathering and stakeholder interviews
- Architecture design and technology selection
- Development environment setup
- Database schema design and implementation
Phase 2: Core Development (Weeks 5-12)
- Data ingestion pipeline implementation
- EPA and core metrics calculation
- Basic API development
- Initial dashboard prototypes
Phase 3: Advanced Features (Weeks 13-20)
- Win probability model development
- Fourth-down decision engine
- Opponent analysis automation
- Report generation system
Phase 4: Integration and Testing (Weeks 21-26)
- End-to-end testing
- Performance optimization
- Security audit
- User acceptance testing
Phase 5: Deployment and Training (Weeks 27-30)
- Production deployment
- Staff training sessions
- Documentation finalization
- Feedback collection and iteration
27.8.2 Lessons Learned
From real-world implementations, common lessons include:
- Start Simple: Begin with core metrics (EPA, success rate) before adding complexity
- Prioritize Reliability: Coaches need to trust the data during games
- Design for Mobile: Sideline access requires tablet-friendly interfaces
- Automate Everything: Manual processes fail during busy game weeks
- Plan for Scale: Game day traffic is 10-100x normal load
- Document Extensively: Staff turnover requires comprehensive documentation
- Build Feedback Loops: Regular coach input improves relevance
- Security First: Competitive data must be protected
Summary
Building a complete analytics system requires integrating multiple disciplines:
- Systems Design: Architecture, scalability, reliability
- Data Engineering: Pipelines, quality, integration
- Analytics: Metrics, models, insights
- Product Development: Dashboards, reports, UX
- Operations: Deployment, monitoring, support
The key to success is understanding that the system exists to serve its users. Technical excellence means nothing if coaches can't access insights when they need them or if recruiters can't efficiently evaluate prospects.
A well-designed analytics platform becomes a competitive advantage, enabling better decisions across all aspects of a football program. The investment in building robust infrastructure pays dividends in the quality and speed of insights delivered.
Key Takeaways
- Start with users: Understand stakeholder needs before writing code
- Design for reliability: Game day uptime is critical
- Automate data pipelines: Manual processes don't scale
- Build incrementally: Deploy core features first, then enhance
- Monitor everything: Proactive alerting prevents failures
- Document thoroughly: Enable others to maintain and extend
- Test rigorously: Unit, integration, and performance tests
- Plan for growth: Architecture should accommodate new data sources
Exercises
- Requirements Document: Create a complete requirements document for an analytics platform at your institution, interviewing at least three potential users.
- Architecture Design: Design a system architecture diagram for a college football analytics platform, including all major components and data flows.
- Data Pipeline: Implement a complete data ingestion pipeline that collects play-by-play data from an API and calculates EPA for all plays.
- Dashboard Prototype: Build a prototype coaching dashboard showing win probability, fourth-down recommendations, and opponent tendencies.
- Deployment Plan: Create a deployment plan including infrastructure requirements, monitoring setup, and disaster recovery procedures.