Chapter 27: Building a Complete Analytics System

Learning Objectives

By the end of this chapter, you will be able to:

- Design and architect a production-ready college football analytics platform
- Integrate multiple data sources into a unified analytics pipeline
- Build automated workflows for data collection, processing, and analysis
- Create comprehensive dashboards that serve multiple stakeholders
- Implement quality assurance and testing for analytics systems
- Deploy and maintain a scalable analytics infrastructure
- Document and communicate technical systems to non-technical audiences

Introduction

Throughout this textbook, we have explored individual components of college football analytics: data collection, statistical analysis, visualization, machine learning, and real-time systems. This capstone chapter synthesizes all these elements into a cohesive, production-ready analytics platform.

Building a complete analytics system is fundamentally different from implementing individual analyses. It requires systems thinking—understanding how components interact, where failures can occur, and how to maintain reliability over time. A successful analytics platform must serve diverse stakeholders, from coaches needing quick game insights to executives making multi-year strategic decisions.

This chapter presents a comprehensive case study: designing and building an analytics platform for a Division I college football program. We will walk through every phase of development, from requirements gathering to production deployment, providing templates and code that can be adapted to your specific context.

The goal is not just to build a system that works, but to build a system that continues working—reliably, maintainably, and scalably—as the program's needs evolve and grow.


27.1 System Requirements and Stakeholder Analysis

27.1.1 Understanding Your Users

Successful analytics systems begin, before any code is written, with understanding who will use them and what they need. Different stakeholders have vastly different requirements:

Coaching Staff
- Primary need: Actionable insights for game preparation and in-game decisions
- Time constraints: Decisions often needed in seconds during games, hours for game planning
- Technical sophistication: Variable; prefer visual interfaces over raw data
- Access patterns: Heavy use during season, especially game weeks

Player Personnel / Recruiting
- Primary need: Prospect evaluation and comparison
- Time constraints: Recruiting cycles span months, but individual evaluations needed quickly
- Technical sophistication: Moderate; comfortable with databases and reports
- Access patterns: Year-round, with peaks during evaluation periods

Athletic Administration
- Primary need: Performance tracking, resource allocation justification
- Time constraints: Quarterly and annual reporting cycles
- Technical sophistication: Low; need executive summaries
- Access patterns: Periodic, often driven by reporting requirements

Analytics Staff
- Primary need: Flexible tools for ad-hoc analysis, model development
- Time constraints: Variable based on project requirements
- Technical sophistication: High; comfortable with code and complex interfaces
- Access patterns: Continuous throughout the year

"""
Stakeholder Requirements Documentation

This module defines the requirements framework for the analytics platform.
"""

from dataclasses import dataclass, field
from typing import List, Dict, Optional
from enum import Enum


class UserRole(Enum):
    """User roles in the analytics system."""
    HEAD_COACH = "head_coach"
    OFFENSIVE_COORDINATOR = "offensive_coordinator"
    DEFENSIVE_COORDINATOR = "defensive_coordinator"
    POSITION_COACH = "position_coach"
    RECRUITING_COORDINATOR = "recruiting_coordinator"
    PLAYER_PERSONNEL = "player_personnel"
    ATHLETIC_DIRECTOR = "athletic_director"
    ANALYTICS_STAFF = "analytics_staff"
    VIDEO_COORDINATOR = "video_coordinator"


class AccessLevel(Enum):
    """Data access levels."""
    PUBLIC = "public"           # Publicly available statistics
    INTERNAL = "internal"       # Team-internal analysis
    CONFIDENTIAL = "confidential"  # Recruiting, personnel decisions
    RESTRICTED = "restricted"   # Sensitive personnel information


@dataclass
class StakeholderRequirement:
    """Documents a single stakeholder requirement."""
    id: str
    role: UserRole
    description: str
    priority: str  # "critical", "high", "medium", "low"
    access_level: AccessLevel
    response_time: str  # e.g., "real-time", "< 1 minute", "< 1 hour"
    frequency: str  # How often they need this
    acceptance_criteria: List[str] = field(default_factory=list)


# Define core requirements
COACHING_REQUIREMENTS = [
    StakeholderRequirement(
        id="COACH-001",
        role=UserRole.HEAD_COACH,
        description="Win probability dashboard during live games",
        priority="critical",
        access_level=AccessLevel.INTERNAL,
        response_time="real-time",
        frequency="Every game",
        acceptance_criteria=[
            "Updates within 3 seconds of play completion",
            "Shows win probability for both teams",
            "Displays fourth-down recommendations when applicable",
            "Works on tablet devices on sideline"
        ]
    ),
    StakeholderRequirement(
        id="COACH-002",
        role=UserRole.OFFENSIVE_COORDINATOR,
        description="Opponent defensive tendency analysis",
        priority="critical",
        access_level=AccessLevel.INTERNAL,
        response_time="< 1 hour",
        frequency="Weekly during season",
        acceptance_criteria=[
            "Coverage distribution by down/distance",
            "Blitz rates by game situation",
            "Personnel grouping tendencies",
            "Exportable to PowerPoint format"
        ]
    ),
    StakeholderRequirement(
        id="COACH-003",
        role=UserRole.DEFENSIVE_COORDINATOR,
        description="Opponent offensive play calling patterns",
        priority="critical",
        access_level=AccessLevel.INTERNAL,
        response_time="< 1 hour",
        frequency="Weekly during season",
        acceptance_criteria=[
            "Run/pass splits by situation",
            "Formation tendencies",
            "Red zone play calling patterns",
            "Third down conversion analysis"
        ]
    ),
]

RECRUITING_REQUIREMENTS = [
    StakeholderRequirement(
        id="RECRUIT-001",
        role=UserRole.RECRUITING_COORDINATOR,
        description="Prospect evaluation database with scoring",
        priority="critical",
        access_level=AccessLevel.CONFIDENTIAL,
        response_time="< 5 seconds",
        frequency="Daily during evaluation periods",
        acceptance_criteria=[
            "Searchable by position, rating, location",
            "Composite scores from multiple services",
            "Custom evaluation fields",
            "Comparison tool for multiple prospects"
        ]
    ),
    StakeholderRequirement(
        id="RECRUIT-002",
        role=UserRole.PLAYER_PERSONNEL,
        description="Transfer portal monitoring and alerts",
        priority="high",
        access_level=AccessLevel.CONFIDENTIAL,
        response_time="< 1 hour of portal entry",
        frequency="Continuous during portal windows",
        acceptance_criteria=[
            "Automated detection of new portal entries",
            "Match scoring against program needs",
            "Alerts to relevant coaches",
            "Performance history integration"
        ]
    ),
]

ANALYTICS_REQUIREMENTS = [
    StakeholderRequirement(
        id="ANALYTICS-001",
        role=UserRole.ANALYTICS_STAFF,
        description="Flexible query interface for ad-hoc analysis",
        priority="high",
        access_level=AccessLevel.INTERNAL,
        response_time="< 30 seconds",
        frequency="Daily",
        acceptance_criteria=[
            "SQL-like query capability",
            "Access to all historical data",
            "Export to CSV/JSON formats",
            "Visualization generation"
        ]
    ),
    StakeholderRequirement(
        id="ANALYTICS-002",
        role=UserRole.ANALYTICS_STAFF,
        description="Model development and deployment pipeline",
        priority="high",
        access_level=AccessLevel.INTERNAL,
        response_time="< 1 hour for model updates",
        frequency="Weekly",
        acceptance_criteria=[
            "Version control for models",
            "A/B testing capability",
            "Performance monitoring",
            "Rollback capability"
        ]
    ),
]


def generate_requirements_document(
    requirements: List[StakeholderRequirement]
) -> str:
    """Generate formatted requirements document."""
    doc = "# Analytics Platform Requirements\n\n"

    # Group by role
    by_role: Dict[UserRole, List[StakeholderRequirement]] = {}
    for req in requirements:
        if req.role not in by_role:
            by_role[req.role] = []
        by_role[req.role].append(req)

    for role, reqs in by_role.items():
        doc += f"## {role.value.replace('_', ' ').title()}\n\n"
        for req in reqs:
            doc += f"### {req.id}: {req.description}\n"
            doc += f"- **Priority:** {req.priority}\n"
            doc += f"- **Access Level:** {req.access_level.value}\n"
            doc += f"- **Response Time:** {req.response_time}\n"
            doc += f"- **Frequency:** {req.frequency}\n"
            doc += "- **Acceptance Criteria:**\n"
            for criterion in req.acceptance_criteria:
                doc += f"  - {criterion}\n"
            doc += "\n"

    return doc

27.1.2 Functional Requirements

Based on stakeholder analysis, we can define the system's functional requirements:

Data Collection Requirements
1. Automated play-by-play data ingestion from multiple sources
2. Real-time tracking data integration (where available)
3. Recruiting service data aggregation
4. Video tagging and synchronization
5. Manual data entry for proprietary evaluations

Analysis Requirements
1. Expected Points Added (EPA) calculation for all plays
2. Win probability modeling with real-time updates
3. Player efficiency metrics across all positions
4. Opponent tendency analysis and scouting reports
5. Recruiting prospect scoring and comparison
6. Historical trend analysis and season projections

Visualization Requirements
1. Interactive dashboards for each user role
2. Automated report generation (weekly, seasonal)
3. Real-time game monitoring displays
4. Customizable chart export for presentations
5. Mobile-friendly interfaces for field use

Integration Requirements
1. Video platform integration (Hudl, XOS)
2. Recruiting database connections
3. Conference data sharing (where applicable)
4. Export capabilities for external tools

27.1.3 Non-Functional Requirements

Performance
- Dashboard load time: < 3 seconds
- Real-time updates: < 5 seconds latency
- Query response: < 30 seconds for complex analyses
- Concurrent users: Support 50+ simultaneous users during games

Reliability
- Uptime: 99.9% during games, 99% overall
- Data backup: Daily automated backups with point-in-time recovery
- Disaster recovery: < 4 hour recovery time objective

Security
- Role-based access control
- Encryption at rest and in transit
- Audit logging for sensitive data access
- Compliance with university IT policies

Scalability
- Support 10 years of historical data
- Handle 1000+ plays per week during season
- Accommodate additional data sources without architecture changes
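
These targets are most useful when they are encoded where monitoring code can read them. The sketch below captures the performance targets as constants that a health check could compare against observed metrics; the dictionary keys and helper function are our own convention, not part of any monitoring library:

# Performance targets from section 27.1.3 as machine-readable constants
PERFORMANCE_TARGETS = {
    "dashboard_load_seconds": 3.0,
    "real_time_latency_seconds": 5.0,
    "query_response_seconds": 30.0,
    "concurrent_users": 50,
}


def meets_latency_target(metric_name: str, observed_seconds: float) -> bool:
    """Return True if an observed latency meets its documented target."""
    target = PERFORMANCE_TARGETS.get(metric_name)
    if target is None:
        raise KeyError(f"No target defined for metric: {metric_name}")
    return observed_seconds <= target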


27.2 System Architecture

27.2.1 High-Level Architecture

Our analytics platform follows a modern data platform architecture with distinct layers for ingestion, storage, processing, and presentation:

┌─────────────────────────────────────────────────────────────────────────────┐
│                           PRESENTATION LAYER                                │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐       │
│  │  Coaching   │  │  Recruiting │  │  Executive  │  │  Analytics  │       │
│  │  Dashboard  │  │  Dashboard  │  │  Reports    │  │  Workbench  │       │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘       │
└─────────┼────────────────┼────────────────┼────────────────┼────────────────┘
          │                │                │                │
          ▼                ▼                ▼                ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                              API LAYER                                       │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                         REST API Gateway                              │   │
│  │   /games  /plays  /players  /recruiting  /models  /reports           │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────────────┘
          │                │                │                │
          ▼                ▼                ▼                ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                           PROCESSING LAYER                                   │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐       │
│  │  Real-Time  │  │   Batch     │  │    ML       │  │   Report    │       │
│  │  Analytics  │  │  Processing │  │   Models    │  │  Generator  │       │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘       │
└─────────────────────────────────────────────────────────────────────────────┘
          │                │                │                │
          ▼                ▼                ▼                ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                            STORAGE LAYER                                     │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐       │
│  │   Redis     │  │ PostgreSQL  │  │ Data Lake   │  │   Model     │       │
│  │   (Cache)   │  │  (Primary)  │  │  (Archive)  │  │   Registry  │       │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘       │
└─────────────────────────────────────────────────────────────────────────────┘
          ▲                ▲                ▲                ▲
          │                │                │                │
┌─────────────────────────────────────────────────────────────────────────────┐
│                           INGESTION LAYER                                    │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐       │
│  │  Play-by-   │  │  Tracking   │  │ Recruiting  │  │   Manual    │       │
│  │   Play API  │  │    Data     │  │   Services  │  │   Entry     │       │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘       │
└─────────────────────────────────────────────────────────────────────────────┘
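
The API layer exposes these resources through a REST gateway. As a sketch of what one endpoint might look like, the snippet below assumes FastAPI as the web framework and a GameRepository (defined later in this section) registered in the service registry; the architecture does not mandate either choice:

from fastapi import FastAPI, HTTPException

app = FastAPI(title="CFB Analytics API")


@app.get("/games/{game_id}")
def get_game(game_id: str):
    """Look up a single game through the repository layer."""
    # "game_repository" is registered in ServiceRegistry at startup
    repo = ServiceRegistry.get("game_repository")
    game = repo.get_by_id(game_id)
    if game is None:
        raise HTTPException(status_code=404, detail="Game not found")
    return game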

27.2.2 Component Design

"""
System Architecture Components

This module defines the core architectural components of the analytics platform.
"""

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Any, Callable
from datetime import datetime
import logging

logger = logging.getLogger(__name__)


# =============================================================================
# LAYER DEFINITIONS
# =============================================================================

class DataSource(ABC):
    """Abstract base class for data sources."""

    @abstractmethod
    def connect(self) -> bool:
        """Establish connection to data source."""
        pass

    @abstractmethod
    def fetch(self, query: Dict) -> List[Dict]:
        """Fetch data based on query parameters."""
        pass

    @abstractmethod
    def get_schema(self) -> Dict:
        """Return the data schema."""
        pass


class DataProcessor(ABC):
    """Abstract base class for data processors."""

    @abstractmethod
    def process(self, data: List[Dict]) -> List[Dict]:
        """Process raw data into analytical format."""
        pass

    @abstractmethod
    def validate(self, data: List[Dict]) -> List[str]:
        """Validate data, returning list of errors."""
        pass


class DataStore(ABC):
    """Abstract base class for data storage."""

    @abstractmethod
    def save(self, collection: str, data: List[Dict]) -> int:
        """Save data to store, return count saved."""
        pass

    @abstractmethod
    def query(self, collection: str, filters: Dict) -> List[Dict]:
        """Query data from store."""
        pass


# =============================================================================
# CONFIGURATION
# =============================================================================

@dataclass
class SystemConfig:
    """Central configuration for the analytics platform."""

    # Environment
    environment: str = "development"  # development, staging, production

    # Database settings
    database_url: str = "postgresql://localhost:5432/cfb_analytics"
    redis_url: str = "redis://localhost:6379"

    # API settings
    api_host: str = "0.0.0.0"
    api_port: int = 8000
    api_workers: int = 4

    # Data sources
    play_by_play_api_url: str = "https://api.collegefootballdata.com"
    play_by_play_api_key: Optional[str] = None

    # Processing settings
    batch_size: int = 1000
    real_time_enabled: bool = True

    # Feature flags
    features: Dict[str, bool] = field(default_factory=lambda: {
        "live_win_probability": True,
        "fourth_down_bot": True,
        "recruiting_alerts": True,
        "automated_reports": True
    })

    # Logging
    log_level: str = "INFO"

    def is_feature_enabled(self, feature: str) -> bool:
        """Check if a feature is enabled."""
        return self.features.get(feature, False)


# =============================================================================
# SERVICE REGISTRY
# =============================================================================

class ServiceRegistry:
    """
    Central registry for system services.

    Implements dependency injection pattern for loose coupling.
    """

    _instance = None
    _services: Dict[str, Any] = {}

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._services = {}
        return cls._instance

    @classmethod
    def register(cls, name: str, service: Any):
        """Register a service."""
        cls._services[name] = service
        logger.info(f"Registered service: {name}")

    @classmethod
    def get(cls, name: str) -> Any:
        """Get a registered service."""
        if name not in cls._services:
            raise KeyError(f"Service not registered: {name}")
        return cls._services[name]

    @classmethod
    def has(cls, name: str) -> bool:
        """Check if a service is registered."""
        return name in cls._services


# =============================================================================
# EVENT BUS
# =============================================================================

class EventBus:
    """
    Simple event bus for inter-component communication.

    Enables loose coupling between system components.
    """

    def __init__(self):
        self._handlers: Dict[str, List[Callable]] = {}

    def subscribe(self, event_type: str, handler: Callable):
        """Subscribe to an event type."""
        if event_type not in self._handlers:
            self._handlers[event_type] = []
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, data: Any):
        """Publish an event to all subscribers."""
        handlers = self._handlers.get(event_type, [])
        for handler in handlers:
            try:
                handler(data)
            except Exception as e:
                logger.error(f"Event handler error: {e}")

    def unsubscribe(self, event_type: str, handler: Callable):
        """Unsubscribe from an event type."""
        if event_type in self._handlers:
            self._handlers[event_type].remove(handler)


# =============================================================================
# DOMAIN ENTITIES
# =============================================================================

@dataclass
class Game:
    """Represents a college football game."""
    id: str
    season: int
    week: int
    home_team: str
    away_team: str
    home_score: Optional[int] = None
    away_score: Optional[int] = None
    venue: Optional[str] = None
    game_date: Optional[datetime] = None
    conference_game: bool = False
    completed: bool = False


@dataclass
class Play:
    """Represents a single play."""
    id: str
    game_id: str
    drive_id: str
    play_number: int

    # Situation
    quarter: int
    clock: str
    down: int
    distance: int
    yard_line: int
    offense: str
    defense: str

    # Result
    play_type: str
    yards_gained: int
    touchdown: bool = False
    turnover: bool = False

    # Advanced metrics (calculated)
    epa: Optional[float] = None
    success: Optional[bool] = None
    wpa: Optional[float] = None


@dataclass
class Player:
    """Represents a player."""
    id: str
    name: str
    position: str
    team: str
    jersey_number: Optional[int] = None
    height: Optional[int] = None  # inches
    weight: Optional[int] = None  # pounds
    class_year: Optional[str] = None
    hometown: Optional[str] = None


@dataclass
class Prospect:
    """Represents a recruiting prospect."""
    id: str
    name: str
    position: str
    high_school: str
    city: str
    state: str
    class_year: int

    # Ratings
    composite_rating: Optional[float] = None
    stars: Optional[int] = None
    national_rank: Optional[int] = None
    position_rank: Optional[int] = None
    state_rank: Optional[int] = None

    # Status
    committed: bool = False
    committed_to: Optional[str] = None
    signed: bool = False

    # Internal evaluation
    internal_grade: Optional[float] = None
    evaluation_notes: Optional[str] = None


# =============================================================================
# REPOSITORY PATTERN
# =============================================================================

class GameRepository:
    """Repository for game data access."""

    def __init__(self, data_store: DataStore):
        self.store = data_store

    def get_by_id(self, game_id: str) -> Optional[Game]:
        """Get a game by ID."""
        results = self.store.query("games", {"id": game_id})
        if results:
            return Game(**results[0])
        return None

    def get_by_team_season(
        self,
        team: str,
        season: int
    ) -> List[Game]:
        """Get all games for a team in a season."""
        results = self.store.query("games", {
            "$or": [
                {"home_team": team, "season": season},
                {"away_team": team, "season": season}
            ]
        })
        return [Game(**r) for r in results]

    def save(self, game: Game) -> bool:
        """Save a game."""
        from dataclasses import asdict
        count = self.store.save("games", [asdict(game)])
        return count > 0


class PlayRepository:
    """Repository for play data access."""

    def __init__(self, data_store: DataStore):
        self.store = data_store

    def get_by_game(self, game_id: str) -> List[Play]:
        """Get all plays for a game."""
        results = self.store.query("plays", {"game_id": game_id})
        return [Play(**r) for r in results]

    def get_by_team_season(
        self,
        team: str,
        season: int,
        offense_only: bool = False
    ) -> List[Play]:
        """Get all plays for a team in a season."""
        filters = {"season": season}
        if offense_only:
            filters["offense"] = team
        else:
            filters["$or"] = [
                {"offense": team},
                {"defense": team}
            ]
        results = self.store.query("plays", filters)
        return [Play(**r) for r in results]

    def save_batch(self, plays: List[Play]) -> int:
        """Save multiple plays."""
        from dataclasses import asdict
        data = [asdict(p) for p in plays]
        return self.store.save("plays", data)

27.2.3 Database Schema

"""
Database Schema Definition

This module defines the PostgreSQL schema for the analytics platform.
"""

# SQL schema definition
DATABASE_SCHEMA = """
-- =============================================================================
-- CORE TABLES
-- =============================================================================

-- Teams table
CREATE TABLE IF NOT EXISTS teams (
    id VARCHAR(50) PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    abbreviation VARCHAR(10),
    conference VARCHAR(50),
    division VARCHAR(20),
    logo_url VARCHAR(255),
    primary_color VARCHAR(7),
    secondary_color VARCHAR(7),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX IF NOT EXISTS idx_teams_conference ON teams(conference);

-- Seasons table
CREATE TABLE IF NOT EXISTS seasons (
    year INTEGER PRIMARY KEY,
    start_date DATE,
    end_date DATE,
    playoff_teams INTEGER DEFAULT 4,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Games table
CREATE TABLE IF NOT EXISTS games (
    id VARCHAR(50) PRIMARY KEY,
    season INTEGER REFERENCES seasons(year),
    week INTEGER NOT NULL,
    game_date DATE,
    game_time TIME,
    home_team_id VARCHAR(50) REFERENCES teams(id),
    away_team_id VARCHAR(50) REFERENCES teams(id),
    home_score INTEGER,
    away_score INTEGER,
    venue VARCHAR(200),
    attendance INTEGER,
    conference_game BOOLEAN DEFAULT FALSE,
    neutral_site BOOLEAN DEFAULT FALSE,
    completed BOOLEAN DEFAULT FALSE,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX IF NOT EXISTS idx_games_season ON games(season);
CREATE INDEX IF NOT EXISTS idx_games_week ON games(season, week);
CREATE INDEX IF NOT EXISTS idx_games_home_team ON games(home_team_id);
CREATE INDEX IF NOT EXISTS idx_games_away_team ON games(away_team_id);

-- Drives table
CREATE TABLE IF NOT EXISTS drives (
    id VARCHAR(50) PRIMARY KEY,
    game_id VARCHAR(50) REFERENCES games(id),
    drive_number INTEGER NOT NULL,
    offense_team_id VARCHAR(50) REFERENCES teams(id),
    start_quarter INTEGER,
    start_time VARCHAR(10),
    start_yard_line INTEGER,
    end_quarter INTEGER,
    end_time VARCHAR(10),
    end_yard_line INTEGER,
    plays INTEGER,
    yards INTEGER,
    result VARCHAR(50),  -- touchdown, field_goal, punt, turnover, etc.
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX IF NOT EXISTS idx_drives_game ON drives(game_id);
CREATE INDEX IF NOT EXISTS idx_drives_offense ON drives(offense_team_id);

-- Plays table
CREATE TABLE IF NOT EXISTS plays (
    id VARCHAR(50) PRIMARY KEY,
    game_id VARCHAR(50) REFERENCES games(id),
    drive_id VARCHAR(50) REFERENCES drives(id),
    play_number INTEGER NOT NULL,

    -- Situation
    quarter INTEGER NOT NULL,
    clock VARCHAR(10),
    down INTEGER,
    distance INTEGER,
    yard_line INTEGER,
    offense_team_id VARCHAR(50) REFERENCES teams(id),
    defense_team_id VARCHAR(50) REFERENCES teams(id),
    offense_score INTEGER,
    defense_score INTEGER,

    -- Play details
    play_type VARCHAR(50),
    play_text TEXT,
    yards_gained INTEGER,
    first_down BOOLEAN DEFAULT FALSE,
    touchdown BOOLEAN DEFAULT FALSE,
    turnover BOOLEAN DEFAULT FALSE,
    penalty BOOLEAN DEFAULT FALSE,

    -- Advanced metrics
    epa DECIMAL(10, 4),
    success BOOLEAN,
    wpa DECIMAL(10, 6),
    pre_snap_win_prob DECIMAL(10, 6),
    post_snap_win_prob DECIMAL(10, 6),

    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX IF NOT EXISTS idx_plays_game ON plays(game_id);
CREATE INDEX IF NOT EXISTS idx_plays_drive ON plays(drive_id);
CREATE INDEX IF NOT EXISTS idx_plays_offense ON plays(offense_team_id);
CREATE INDEX IF NOT EXISTS idx_plays_type ON plays(play_type);
CREATE INDEX IF NOT EXISTS idx_plays_situation ON plays(down, distance);

-- =============================================================================
-- PLAYER TABLES
-- =============================================================================

-- Players table
CREATE TABLE IF NOT EXISTS players (
    id VARCHAR(50) PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    position VARCHAR(10),
    position_group VARCHAR(20),
    team_id VARCHAR(50) REFERENCES teams(id),
    jersey_number INTEGER,
    height_inches INTEGER,
    weight_lbs INTEGER,
    class_year VARCHAR(10),
    hometown VARCHAR(100),
    home_state VARCHAR(2),
    high_school VARCHAR(100),
    active BOOLEAN DEFAULT TRUE,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX IF NOT EXISTS idx_players_team ON players(team_id);
CREATE INDEX IF NOT EXISTS idx_players_position ON players(position);

-- Player game statistics
CREATE TABLE IF NOT EXISTS player_game_stats (
    id SERIAL PRIMARY KEY,
    player_id VARCHAR(50) REFERENCES players(id),
    game_id VARCHAR(50) REFERENCES games(id),

    -- Passing
    pass_attempts INTEGER DEFAULT 0,
    pass_completions INTEGER DEFAULT 0,
    pass_yards INTEGER DEFAULT 0,
    pass_touchdowns INTEGER DEFAULT 0,
    interceptions INTEGER DEFAULT 0,

    -- Rushing
    rush_attempts INTEGER DEFAULT 0,
    rush_yards INTEGER DEFAULT 0,
    rush_touchdowns INTEGER DEFAULT 0,

    -- Receiving
    receptions INTEGER DEFAULT 0,
    receiving_yards INTEGER DEFAULT 0,
    receiving_touchdowns INTEGER DEFAULT 0,
    targets INTEGER DEFAULT 0,

    -- Defense
    tackles INTEGER DEFAULT 0,
    tackles_for_loss INTEGER DEFAULT 0,
    sacks DECIMAL(3, 1) DEFAULT 0,
    interceptions_def INTEGER DEFAULT 0,
    pass_breakups INTEGER DEFAULT 0,

    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    UNIQUE(player_id, game_id)
);

CREATE INDEX IF NOT EXISTS idx_player_stats_player ON player_game_stats(player_id);
CREATE INDEX IF NOT EXISTS idx_player_stats_game ON player_game_stats(game_id);

-- =============================================================================
-- RECRUITING TABLES
-- =============================================================================

-- Prospects table
CREATE TABLE IF NOT EXISTS prospects (
    id VARCHAR(50) PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    position VARCHAR(10),
    high_school VARCHAR(100),
    city VARCHAR(100),
    state VARCHAR(2),
    class_year INTEGER NOT NULL,

    -- Physical attributes
    height_inches INTEGER,
    weight_lbs INTEGER,

    -- Ratings (from services)
    composite_rating DECIMAL(6, 4),
    stars INTEGER,
    national_rank INTEGER,
    position_rank INTEGER,
    state_rank INTEGER,

    -- Individual service ratings
    rating_247 DECIMAL(6, 4),
    rating_rivals DECIMAL(6, 4),
    rating_espn DECIMAL(6, 4),
    rating_on3 DECIMAL(6, 4),

    -- Status
    committed BOOLEAN DEFAULT FALSE,
    committed_to VARCHAR(50),
    commitment_date DATE,
    signed BOOLEAN DEFAULT FALSE,
    signing_date DATE,
    enrolled BOOLEAN DEFAULT FALSE,

    -- Internal evaluation
    internal_grade DECIMAL(5, 2),
    priority_level VARCHAR(20),  -- top_target, target, monitor
    evaluation_notes TEXT,
    last_contact_date DATE,

    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX IF NOT EXISTS idx_prospects_class ON prospects(class_year);
CREATE INDEX IF NOT EXISTS idx_prospects_position ON prospects(position);
CREATE INDEX IF NOT EXISTS idx_prospects_state ON prospects(state);
CREATE INDEX IF NOT EXISTS idx_prospects_stars ON prospects(stars);
CREATE INDEX IF NOT EXISTS idx_prospects_committed ON prospects(committed_to);

-- Prospect evaluations (internal)
CREATE TABLE IF NOT EXISTS prospect_evaluations (
    id SERIAL PRIMARY KEY,
    prospect_id VARCHAR(50) REFERENCES prospects(id),
    evaluator VARCHAR(100),
    evaluation_date DATE,

    -- Grades (1-10 scale)
    athleticism DECIMAL(3, 1),
    technique DECIMAL(3, 1),
    football_iq DECIMAL(3, 1),
    competitiveness DECIMAL(3, 1),
    character DECIMAL(3, 1),
    overall_grade DECIMAL(3, 1),

    notes TEXT,
    film_notes TEXT,
    comparison_player VARCHAR(100),

    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX IF NOT EXISTS idx_evaluations_prospect ON prospect_evaluations(prospect_id);

-- =============================================================================
-- ANALYTICS TABLES
-- =============================================================================

-- Pre-computed team statistics by season
CREATE TABLE IF NOT EXISTS team_season_stats (
    id SERIAL PRIMARY KEY,
    team_id VARCHAR(50) REFERENCES teams(id),
    season INTEGER REFERENCES seasons(year),

    -- Record
    wins INTEGER DEFAULT 0,
    losses INTEGER DEFAULT 0,
    conference_wins INTEGER DEFAULT 0,
    conference_losses INTEGER DEFAULT 0,

    -- Offense
    total_plays INTEGER DEFAULT 0,
    total_yards INTEGER DEFAULT 0,
    pass_yards INTEGER DEFAULT 0,
    rush_yards INTEGER DEFAULT 0,
    points_scored INTEGER DEFAULT 0,
    offensive_epa DECIMAL(10, 4),
    success_rate DECIMAL(5, 4),

    -- Defense
    points_allowed INTEGER DEFAULT 0,
    yards_allowed INTEGER DEFAULT 0,
    defensive_epa DECIMAL(10, 4),

    -- Special Teams
    field_goal_pct DECIMAL(5, 4),
    punt_avg DECIMAL(5, 2),

    -- Rankings
    sp_plus_rating DECIMAL(6, 2),
    fpi_rating DECIMAL(6, 2),

    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    UNIQUE(team_id, season)
);

-- Model predictions storage
CREATE TABLE IF NOT EXISTS model_predictions (
    id SERIAL PRIMARY KEY,
    model_name VARCHAR(100) NOT NULL,
    model_version VARCHAR(20),
    prediction_type VARCHAR(50),  -- game_outcome, win_probability, player_grade
    entity_id VARCHAR(50),  -- game_id, player_id, etc.
    prediction_value DECIMAL(10, 6),
    confidence DECIMAL(5, 4),
    features_used JSONB,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX IF NOT EXISTS idx_predictions_model ON model_predictions(model_name);
CREATE INDEX IF NOT EXISTS idx_predictions_entity ON model_predictions(entity_id);
CREATE INDEX IF NOT EXISTS idx_predictions_type ON model_predictions(prediction_type);

-- =============================================================================
-- AUDIT AND LOGGING
-- =============================================================================

-- Data quality log
CREATE TABLE IF NOT EXISTS data_quality_log (
    id SERIAL PRIMARY KEY,
    source VARCHAR(50),
    entity_type VARCHAR(50),
    entity_id VARCHAR(50),
    issue_type VARCHAR(50),
    issue_description TEXT,
    severity VARCHAR(20),  -- error, warning, info
    resolved BOOLEAN DEFAULT FALSE,
    resolution_notes TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    resolved_at TIMESTAMP
);

CREATE INDEX IF NOT EXISTS idx_quality_log_source ON data_quality_log(source);
CREATE INDEX IF NOT EXISTS idx_quality_log_severity ON data_quality_log(severity);
CREATE INDEX IF NOT EXISTS idx_quality_log_resolved ON data_quality_log(resolved);

-- User activity log
CREATE TABLE IF NOT EXISTS user_activity_log (
    id SERIAL PRIMARY KEY,
    user_id VARCHAR(50),
    user_role VARCHAR(50),
    action VARCHAR(50),
    resource_type VARCHAR(50),
    resource_id VARCHAR(50),
    details JSONB,
    ip_address VARCHAR(45),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX IF NOT EXISTS idx_activity_user ON user_activity_log(user_id);
CREATE INDEX IF NOT EXISTS idx_activity_action ON user_activity_log(action);
CREATE INDEX IF NOT EXISTS idx_activity_created ON user_activity_log(created_at);
""";

27.3 Data Pipeline Implementation

27.3.1 Ingestion Layer

The ingestion layer is responsible for collecting data from multiple sources and preparing it for processing:

"""
Data Ingestion Pipeline

This module implements the data collection layer for the analytics platform.
"""

import asyncio
import aiohttp
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Any
import logging
import json

logger = logging.getLogger(__name__)


# =============================================================================
# BASE CLASSES
# =============================================================================

@dataclass
class IngestionResult:
    """Result of a data ingestion operation."""
    source: str
    records_fetched: int
    records_processed: int
    records_failed: int
    start_time: datetime
    end_time: datetime
    errors: List[str]

    @property
    def success_rate(self) -> float:
        if self.records_fetched == 0:
            return 1.0
        return self.records_processed / self.records_fetched

    @property
    def duration_seconds(self) -> float:
        return (self.end_time - self.start_time).total_seconds()


class DataIngester(ABC):
    """Abstract base class for data ingesters."""

    def __init__(self, config: Dict):
        self.config = config
        self.name = self.__class__.__name__

    @abstractmethod
    async def ingest(self, **kwargs) -> IngestionResult:
        """Perform data ingestion."""
        pass

    @abstractmethod
    def validate_record(self, record: Dict) -> List[str]:
        """Validate a single record, returning list of errors."""
        pass


# =============================================================================
# PLAY-BY-PLAY INGESTER
# =============================================================================

class PlayByPlayIngester(DataIngester):
    """
    Ingests play-by-play data from College Football Data API.

    Handles rate limiting, pagination, and error recovery.
    """

    BASE_URL = "https://api.collegefootballdata.com"

    def __init__(self, config: Dict):
        super().__init__(config)
        self.api_key = config.get("api_key")
        self.rate_limit_delay = config.get("rate_limit_delay", 0.5)

    async def ingest(
        self,
        season: int,
        week: Optional[int] = None,
        team: Optional[str] = None
    ) -> IngestionResult:
        """
        Ingest play-by-play data.

        Args:
            season: Season year
            week: Optional specific week
            team: Optional team filter

        Returns:
            IngestionResult with statistics
        """
        start_time = datetime.now()
        records_fetched = 0
        records_processed = 0
        records_failed = 0
        errors = []

        async with aiohttp.ClientSession() as session:
            # Fetch games first
            games = await self._fetch_games(session, season, week, team)
            logger.info(f"Found {len(games)} games to process")

            for game in games:
                try:
                    # Fetch plays for each game
                    plays = await self._fetch_plays(session, game['id'])
                    records_fetched += len(plays)

                    # Validate and process each play
                    for play in plays:
                        validation_errors = self.validate_record(play)
                        if validation_errors:
                            records_failed += 1
                            errors.extend(validation_errors)
                        else:
                            # Transform and store
                            processed = self._transform_play(play, game)
                            # In production: save to database
                            records_processed += 1

                    # Respect rate limits
                    await asyncio.sleep(self.rate_limit_delay)

                except Exception as e:
                    logger.error(f"Error processing game {game['id']}: {e}")
                    errors.append(f"Game {game['id']}: {str(e)}")

        return IngestionResult(
            source="play_by_play",
            records_fetched=records_fetched,
            records_processed=records_processed,
            records_failed=records_failed,
            start_time=start_time,
            end_time=datetime.now(),
            errors=errors[:100]  # Limit error list size
        )

    async def _fetch_games(
        self,
        session: aiohttp.ClientSession,
        season: int,
        week: Optional[int],
        team: Optional[str]
    ) -> List[Dict]:
        """Fetch games from API."""
        url = f"{self.BASE_URL}/games"
        params = {"year": season}
        if week:
            params["week"] = week
        if team:
            params["team"] = team

        headers = {"Authorization": f"Bearer {self.api_key}"}

        async with session.get(url, params=params, headers=headers) as resp:
            if resp.status == 200:
                return await resp.json()
            else:
                logger.error(f"Failed to fetch games: {resp.status}")
                return []

    async def _fetch_plays(
        self,
        session: aiohttp.ClientSession,
        game_id: int
    ) -> List[Dict]:
        """Fetch plays for a specific game."""
        url = f"{self.BASE_URL}/plays"
        params = {"gameId": game_id}
        headers = {"Authorization": f"Bearer {self.api_key}"}

        async with session.get(url, params=params, headers=headers) as resp:
            if resp.status == 200:
                return await resp.json()
            else:
                logger.error(f"Failed to fetch plays for game {game_id}")
                return []

    def validate_record(self, record: Dict) -> List[str]:
        """Validate a play record."""
        errors = []

        required_fields = ['id', 'offense', 'defense', 'down', 'distance']
        for field in required_fields:
            if field not in record or record[field] is None:
                errors.append(f"Missing required field: {field}")

        # Range validations
        if record.get('down') and not 1 <= record['down'] <= 4:
            errors.append(f"Invalid down: {record['down']}")

        if record.get('yardsGained') and abs(record['yardsGained']) > 99:
            errors.append(f"Suspicious yards gained: {record['yardsGained']}")

        return errors

    def _transform_play(self, play: Dict, game: Dict) -> Dict:
        """Transform API play format to internal format."""
        return {
            'id': str(play.get('id')),
            'game_id': str(game.get('id')),
            'drive_id': str(play.get('drive_id', '')),
            'play_number': play.get('play_number', 0),
            'quarter': play.get('period', 1),
            'clock': play.get('clock', {}).get('displayValue', ''),
            'down': play.get('down'),
            'distance': play.get('distance'),
            'yard_line': 100 - play.get('yardsToEndzone', 50),  # convert to yards from own goal
            'offense': play.get('offense'),
            'defense': play.get('defense'),
            'play_type': play.get('playType'),
            'yards_gained': play.get('yardsGained', 0),
            'play_text': play.get('playText', ''),
            'touchdown': 'touchdown' in play.get('playText', '').lower(),
        }


# =============================================================================
# RECRUITING DATA INGESTER
# =============================================================================

class RecruitingIngester(DataIngester):
    """
    Ingests recruiting data from multiple services.

    Aggregates ratings from 247Sports, Rivals, ESPN, and On3.
    """

    def __init__(self, config: Dict):
        super().__init__(config)
        self.api_key = config.get("api_key")

    async def ingest(
        self,
        class_year: int,
        position: Optional[str] = None
    ) -> IngestionResult:
        """
        Ingest recruiting data for a class.

        Args:
            class_year: Recruiting class year
            position: Optional position filter

        Returns:
            IngestionResult with statistics
        """
        start_time = datetime.now()
        records_fetched = 0
        records_processed = 0
        records_failed = 0
        errors = []

        async with aiohttp.ClientSession() as session:
            # Fetch prospects
            url = f"{PlayByPlayIngester.BASE_URL}/recruiting/players"
            params = {"year": class_year}
            if position:
                params["position"] = position

            headers = {"Authorization": f"Bearer {self.api_key}"}

            async with session.get(url, params=params, headers=headers) as resp:
                if resp.status == 200:
                    prospects = await resp.json()
                    records_fetched = len(prospects)

                    for prospect in prospects:
                        validation_errors = self.validate_record(prospect)
                        if validation_errors:
                            records_failed += 1
                            errors.extend(validation_errors)
                        else:
                            processed = self._transform_prospect(prospect)
                            records_processed += 1
                else:
                    errors.append(f"API error: {resp.status}")

        return IngestionResult(
            source="recruiting",
            records_fetched=records_fetched,
            records_processed=records_processed,
            records_failed=records_failed,
            start_time=start_time,
            end_time=datetime.now(),
            errors=errors
        )

    def validate_record(self, record: Dict) -> List[str]:
        """Validate a prospect record."""
        errors = []

        if not record.get('name'):
            errors.append("Missing prospect name")

        if not record.get('position'):
            errors.append("Missing position")

        rating = record.get('rating')
        if rating and not 0.0 <= rating <= 1.0:
            errors.append(f"Invalid rating: {rating}")

        return errors

    def _transform_prospect(self, prospect: Dict) -> Dict:
        """Transform API prospect format to internal format."""
        return {
            'id': str(prospect.get('id')),
            'name': prospect.get('name'),
            'position': prospect.get('position'),
            'high_school': prospect.get('school', {}).get('name'),
            'city': prospect.get('city'),
            'state': prospect.get('stateProvince'),
            'class_year': prospect.get('year'),
            'composite_rating': prospect.get('rating'),
            'stars': prospect.get('stars'),
            'national_rank': prospect.get('ranking'),
            'committed_to': prospect.get('committedTo'),
        }


# =============================================================================
# INGESTION ORCHESTRATOR
# =============================================================================

class IngestionOrchestrator:
    """
    Orchestrates data ingestion across all sources.

    Manages scheduling, dependencies, and monitoring.
    """

    def __init__(self, config: Dict):
        self.config = config
        self.ingesters: Dict[str, DataIngester] = {}
        self.results: List[IngestionResult] = []

    def register_ingester(self, name: str, ingester: DataIngester):
        """Register a data ingester."""
        self.ingesters[name] = ingester
        logger.info(f"Registered ingester: {name}")

    async def run_full_ingestion(
        self,
        season: int,
        include_recruiting: bool = True
    ) -> Dict[str, IngestionResult]:
        """
        Run full data ingestion for a season.

        Args:
            season: Season year
            include_recruiting: Whether to include recruiting data

        Returns:
            Dictionary of results by source
        """
        results = {}

        # Play-by-play data (required)
        if 'play_by_play' in self.ingesters:
            logger.info(f"Starting play-by-play ingestion for {season}")
            result = await self.ingesters['play_by_play'].ingest(season=season)
            results['play_by_play'] = result
            self.results.append(result)
            logger.info(
                f"Play-by-play complete: {result.records_processed} records, "
                f"{result.success_rate:.1%} success rate"
            )

        # Recruiting data (optional)
        if include_recruiting and 'recruiting' in self.ingesters:
            for year in [season, season + 1]:
                logger.info(f"Starting recruiting ingestion for class of {year}")
                result = await self.ingesters['recruiting'].ingest(
                    class_year=year
                )
                results[f'recruiting_{year}'] = result
                self.results.append(result)

        return results

    async def run_incremental_ingestion(
        self,
        season: int,
        week: int
    ) -> Dict[str, IngestionResult]:
        """
        Run incremental ingestion for a specific week.

        Args:
            season: Season year
            week: Week number

        Returns:
            Dictionary of results by source
        """
        results = {}

        if 'play_by_play' in self.ingesters:
            logger.info(f"Starting incremental ingestion for {season} week {week}")
            result = await self.ingesters['play_by_play'].ingest(
                season=season,
                week=week
            )
            results['play_by_play'] = result
            self.results.append(result)

        return results

    def get_summary(self) -> Dict:
        """Get summary of all ingestion runs."""
        if not self.results:
            return {"status": "no_runs"}

        total_fetched = sum(r.records_fetched for r in self.results)
        total_processed = sum(r.records_processed for r in self.results)
        total_failed = sum(r.records_failed for r in self.results)

        return {
            "total_runs": len(self.results),
            "total_fetched": total_fetched,
            "total_processed": total_processed,
            "total_failed": total_failed,
            "overall_success_rate": (
                total_processed / total_fetched if total_fetched > 0 else 1.0
            ),
            "sources": list(set(r.source for r in self.results))
        }
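
With the ingesters registered, the orchestrator runs from a single async entry point. A minimal sketch (the API key is a placeholder to be loaded from configuration):

import asyncio


async def main():
    config = {"api_key": "YOUR_API_KEY", "rate_limit_delay": 0.5}

    orchestrator = IngestionOrchestrator(config)
    orchestrator.register_ingester("play_by_play", PlayByPlayIngester(config))
    orchestrator.register_ingester("recruiting", RecruitingIngester(config))

    results = await orchestrator.run_full_ingestion(season=2024)
    for source, result in results.items():
        print(f"{source}: {result.records_processed} records "
              f"in {result.duration_seconds:.1f}s")

    print(orchestrator.get_summary())


asyncio.run(main())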

27.3.2 Processing Layer

The processing layer transforms raw data into analytical metrics:

"""
Data Processing Pipeline

This module implements the analytics computation layer.
"""

import numpy as np
import pandas as pd
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple
from scipy import stats
import logging

logger = logging.getLogger(__name__)


# =============================================================================
# EPA CALCULATOR
# =============================================================================

class EPACalculator:
    """
    Calculates Expected Points Added (EPA) for plays.

    EPA measures the value of a play relative to expected points
    based on down, distance, and field position.
    """

    def __init__(self):
        # Expected points by field position (simplified)
        # In production, this would be a more sophisticated model
        self._build_ep_lookup()

    def _build_ep_lookup(self):
        """Build expected points lookup table."""
        # Simplified EP values by yard line (from own goal)
        # Based on historical scoring outcomes
        self.ep_by_position = {}
        for yard_line in range(1, 100):
            # Logistic curve approximation
            ep = 7 * (1 / (1 + np.exp(-0.1 * (yard_line - 50)))) - 0.5
            self.ep_by_position[yard_line] = round(ep, 2)

    def calculate_ep(
        self,
        down: int,
        distance: int,
        yard_line: int
    ) -> float:
        """
        Calculate expected points for a situation.

        Args:
            down: Current down (1-4)
            distance: Yards to first down
            yard_line: Yards from own goal (1-99)

        Returns:
            Expected points value
        """
        base_ep = self.ep_by_position.get(yard_line, 0)

        # Adjust for down and distance
        down_adjustments = {1: 0.0, 2: -0.3, 3: -0.7, 4: -1.5}
        down_adj = down_adjustments.get(down, 0)

        # Distance penalty
        distance_adj = -0.05 * max(0, distance - 5)

        return base_ep + down_adj + distance_adj

    def calculate_epa(
        self,
        before_state: Dict,
        after_state: Dict
    ) -> float:
        """
        Calculate EPA for a play.

        Args:
            before_state: Game state before play
            after_state: Game state after play

        Returns:
            Expected Points Added
        """
        ep_before = self.calculate_ep(
            before_state['down'],
            before_state['distance'],
            before_state['yard_line']
        )

        # Handle scoring plays
        if after_state.get('touchdown'):
            ep_after = 7.0
        elif after_state.get('field_goal'):
            ep_after = 3.0
        elif after_state.get('safety'):
            ep_after = -2.0
        elif after_state.get('turnover'):
            # Opponent gets ball - negate EP from their perspective
            opp_yard_line = 100 - after_state['yard_line']
            ep_after = -self.calculate_ep(1, 10, opp_yard_line)
        else:
            ep_after = self.calculate_ep(
                after_state['down'],
                after_state['distance'],
                after_state['yard_line']
            )

        return ep_after - ep_before
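
# Illustrative check of the calculator (values reflect the simplified EP
# model above): a 1st-and-10 play from a team's own 30-yard line that
# gains 7 yards, setting up 2nd-and-3 at the 37, yields roughly +0.6 EPA:
#
#   calc = EPACalculator()
#   calc.calculate_epa(
#       {'down': 1, 'distance': 10, 'yard_line': 30},
#       {'down': 2, 'distance': 3, 'yard_line': 37,
#        'touchdown': False, 'turnover': False},
#   )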


class PlayProcessor:
    """
    Processes plays to compute advanced metrics.

    Calculates EPA, success rate, and other derived statistics.
    """

    def __init__(self):
        self.epa_calc = EPACalculator()

    def process_game_plays(self, plays: List[Dict]) -> pd.DataFrame:
        """
        Process all plays in a game.

        Args:
            plays: List of play dictionaries

        Returns:
            DataFrame with processed plays and metrics
        """
        processed = []

        for i, play in enumerate(plays):
            # Get next play for after-state (or end of drive)
            if i + 1 < len(plays) and plays[i+1]['drive_id'] == play['drive_id']:
                next_play = plays[i + 1]
                after_state = {
                    'down': next_play['down'],
                    'distance': next_play['distance'],
                    'yard_line': next_play['yard_line'],
                    'touchdown': False,
                    'turnover': False
                }
            else:
                # End of drive - determine outcome
                after_state = self._determine_drive_end(play)

            before_state = {
                'down': play['down'],
                'distance': play['distance'],
                'yard_line': play['yard_line']
            }

            # Calculate EPA
            epa = self.epa_calc.calculate_epa(before_state, after_state)

            # Determine success
            success = self._is_successful(play)

            processed.append({
                **play,
                'epa': epa,
                'success': success,
                'before_ep': self.epa_calc.calculate_ep(
                    play['down'], play['distance'], play['yard_line']
                )
            })

        return pd.DataFrame(processed)

    def _determine_drive_end(self, last_play: Dict) -> Dict:
        """Determine the end state of a drive."""
        play_text = last_play.get('play_text', '').lower()

        if 'touchdown' in play_text:
            return {'touchdown': True, 'turnover': False}
        elif 'interception' in play_text or 'fumble' in play_text:
            return {
                'turnover': True,
                'touchdown': False,
                'yard_line': 100 - last_play['yard_line'],
                'down': 1,
                'distance': 10
            }
        else:
            # Punt or turnover on downs; assume a ~40-yard net punt and
            # clamp so the estimated field position stays in bounds
            return {
                'turnover': True,
                'touchdown': False,
                'yard_line': max(1, 100 - last_play['yard_line'] - 40),
                'down': 1,
                'distance': 10
            }

    def _is_successful(self, play: Dict) -> bool:
        """
        Determine if a play was successful.

        Success definition:
        - 1st down: Gain 40%+ of needed yards
        - 2nd down: Gain 60%+ of needed yards
        - 3rd/4th down: Get first down or score
        """
        yards = play.get('yards_gained', 0)
        down = play.get('down', 1)
        distance = play.get('distance', 10)

        if down == 1:
            return yards >= 0.4 * distance
        elif down == 2:
            return yards >= 0.6 * distance
        else:  # 3rd or 4th
            return yards >= distance
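
The thresholds can be sanity-checked directly; the plays below are synthetic:

processor = PlayProcessor()
print(processor._is_successful({'down': 1, 'distance': 10, 'yards_gained': 4}))  # True: 40% of 10
print(processor._is_successful({'down': 2, 'distance': 10, 'yards_gained': 5}))  # False: needs 60%
print(processor._is_successful({'down': 3, 'distance': 5, 'yards_gained': 5}))   # True: converted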


# =============================================================================
# TEAM ANALYTICS
# =============================================================================

class TeamAnalytics:
    """
    Computes team-level analytics.

    Aggregates play-level metrics to team statistics.
    """

    def __init__(self):
        self.play_processor = PlayProcessor()

    def compute_team_stats(
        self,
        plays_df: pd.DataFrame,
        team: str,
        offense_only: bool = False
    ) -> Dict:
        """
        Compute comprehensive team statistics.

        Args:
            plays_df: DataFrame of processed plays
            team: Team name
            offense_only: If True, only compute offensive stats

        Returns:
            Dictionary of team statistics
        """
        # The team's offensive plays are needed in every case
        off_plays = plays_df[plays_df['offense'] == team]

        stats = {}

        # Offensive stats
        if len(off_plays) > 0:
            stats['offense'] = self._compute_unit_stats(off_plays, 'offense')

        # Defensive stats (skipped when offense_only is set)
        if not offense_only:
            def_plays = plays_df[plays_df['defense'] == team]
            if len(def_plays) > 0:
                stats['defense'] = self._compute_unit_stats(def_plays, 'defense')

        return stats

    def _compute_unit_stats(
        self,
        plays_df: pd.DataFrame,
        unit: str
    ) -> Dict:
        """Compute stats for offense or defense."""
        is_offense = unit == 'offense'

        stats = {
            'plays': len(plays_df),
            'total_epa': plays_df['epa'].sum() * (1 if is_offense else -1),
            'epa_per_play': plays_df['epa'].mean() * (1 if is_offense else -1),
            'success_rate': plays_df['success'].mean() if is_offense else 1 - plays_df['success'].mean(),
        }

        # Pass/rush splits
        pass_plays = plays_df[plays_df['play_type'].str.contains('pass', case=False, na=False)]
        rush_plays = plays_df[plays_df['play_type'].str.contains('rush|run', case=False, na=False)]

        if len(pass_plays) > 0:
            stats['pass_epa_per_play'] = pass_plays['epa'].mean() * (1 if is_offense else -1)
            # Flip success rate for defense, consistent with the overall success_rate above
            stats['pass_success_rate'] = pass_plays['success'].mean() if is_offense else 1 - pass_plays['success'].mean()
            stats['pass_play_pct'] = len(pass_plays) / len(plays_df)

        if len(rush_plays) > 0:
            stats['rush_epa_per_play'] = rush_plays['epa'].mean() * (1 if is_offense else -1)
            stats['rush_success_rate'] = rush_plays['success'].mean() if is_offense else 1 - rush_plays['success'].mean()
            stats['rush_play_pct'] = len(rush_plays) / len(plays_df)

        # Situational stats
        stats['red_zone'] = self._situational_stats(
            plays_df[plays_df['yard_line'] >= 80], is_offense
        )
        stats['third_down'] = self._situational_stats(
            plays_df[plays_df['down'] == 3], is_offense
        )

        return stats

    def _situational_stats(
        self,
        plays_df: pd.DataFrame,
        is_offense: bool
    ) -> Dict:
        """Compute situational statistics."""
        if len(plays_df) == 0:
            return {'plays': 0}

        return {
            'plays': len(plays_df),
            'epa_per_play': plays_df['epa'].mean() * (1 if is_offense else -1),
            # Flip success rate for defense, consistent with _compute_unit_stats
            'success_rate': plays_df['success'].mean() if is_offense else 1 - plays_df['success'].mean()
        }
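
Typical usage, assuming plays_df was produced by PlayProcessor.process_game_plays (so it carries the epa and success columns); the team name is illustrative:

analytics = TeamAnalytics()
stats = analytics.compute_team_stats(plays_df, team='Ohio State')
print(f"Offense EPA/play: {stats['offense']['epa_per_play']:.3f}")
print(f"Third-down success rate: {stats['offense']['third_down']['success_rate']:.1%}")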


# =============================================================================
# OPPONENT TENDENCY ANALYSIS
# =============================================================================

class OpponentAnalyzer:
    """
    Analyzes opponent tendencies for game preparation.

    Generates scouting reports with play-calling patterns.
    """

    def analyze_opponent(
        self,
        plays_df: pd.DataFrame,
        opponent: str,
        num_games: int = 5
    ) -> Dict:
        """
        Generate opponent tendency report.

        Args:
            plays_df: DataFrame of opponent's plays
            opponent: Opponent team name
            num_games: Number of recent games to analyze (plays_df should
                already be filtered to this window)

        Returns:
            Comprehensive tendency report
        """
        # Filter to opponent's offensive plays
        off_plays = plays_df[plays_df['offense'] == opponent].copy()

        report = {
            'team': opponent,
            'games_analyzed': off_plays['game_id'].nunique(),
            'total_plays': len(off_plays),
            'tendencies': {}
        }

        # Overall tendencies
        report['tendencies']['overall'] = self._analyze_tendencies(off_plays)

        # By down
        for down in [1, 2, 3]:
            down_plays = off_plays[off_plays['down'] == down]
            report['tendencies'][f'down_{down}'] = self._analyze_tendencies(down_plays)

        # By field position
        report['tendencies']['red_zone'] = self._analyze_tendencies(
            off_plays[off_plays['yard_line'] >= 80]
        )
        report['tendencies']['own_territory'] = self._analyze_tendencies(
            off_plays[off_plays['yard_line'] <= 50]
        )

        # By score differential (guarded: score_diff may not be present in all feeds)
        if 'score_diff' in off_plays.columns:
            report['tendencies']['ahead'] = self._analyze_tendencies(
                off_plays[off_plays['score_diff'] > 7]
            )
            report['tendencies']['behind'] = self._analyze_tendencies(
                off_plays[off_plays['score_diff'] < -7]
            )

        return report

    def _analyze_tendencies(self, plays_df: pd.DataFrame) -> Dict:
        """Analyze tendencies for a subset of plays."""
        if len(plays_df) < 10:
            return {'sample_size': len(plays_df), 'insufficient_data': True}

        # Classify plays
        pass_mask = plays_df['play_type'].str.contains('pass', case=False, na=False)
        rush_mask = plays_df['play_type'].str.contains('rush|run', case=False, na=False)

        return {
            'sample_size': len(plays_df),
            'pass_rate': pass_mask.mean(),
            'rush_rate': rush_mask.mean(),
            'pass_epa': plays_df.loc[pass_mask, 'epa'].mean() if pass_mask.any() else None,
            'rush_epa': plays_df.loc[rush_mask, 'epa'].mean() if rush_mask.any() else None,
            'avg_yards_to_go': plays_df['distance'].mean(),
            'success_rate': plays_df['success'].mean() if 'success' in plays_df else None
        }

    def generate_scouting_report(
        self,
        analysis: Dict,
        format: str = 'text'
    ) -> str:
        """
        Generate formatted scouting report.

        Args:
            analysis: Result from analyze_opponent
            format: Output format ('text' or 'html'; only 'text' is implemented here)

        Returns:
            Formatted report string
        """
        lines = [
            f"# Opponent Scouting Report: {analysis['team']}",
            f"Games Analyzed: {analysis['games_analyzed']}",
            f"Total Plays: {analysis['total_plays']}",
            "",
            "## Overall Tendencies",
        ]

        overall = analysis['tendencies']['overall']
        if not overall.get('insufficient_data'):
            lines.extend([
                f"- Pass Rate: {overall['pass_rate']:.1%}",
                f"- Rush Rate: {overall['rush_rate']:.1%}",
                f"- Pass EPA/Play: {overall['pass_epa']:.2f}" if overall['pass_epa'] else "",
                f"- Rush EPA/Play: {overall['rush_epa']:.2f}" if overall['rush_epa'] else "",
            ])

        lines.extend(["", "## By Down"])
        for down in [1, 2, 3]:
            tendencies = analysis['tendencies'].get(f'down_{down}', {})
            if not tendencies.get('insufficient_data'):
                lines.append(
                    f"- {['1st', '2nd', '3rd'][down - 1]} Down: "
                    f"{tendencies.get('pass_rate', 0):.0%} pass, "
                    f"{tendencies.get('rush_rate', 0):.0%} rush"
                )

        lines.extend(["", "## Situational"])
        rz = analysis['tendencies'].get('red_zone', {})
        if not rz.get('insufficient_data'):
            lines.append(f"- Red Zone: {rz.get('pass_rate', 0):.0%} pass")

        return "\n".join(lines)

27.4 Dashboard Implementation

27.4.1 Dashboard Architecture

"""
Dashboard Service Implementation

This module provides the API and data services for dashboards.
"""

from dataclasses import dataclass, asdict
from datetime import datetime
from typing import Dict, List, Optional, Any
import json
import logging

logger = logging.getLogger(__name__)


# =============================================================================
# DASHBOARD DATA SERVICE
# =============================================================================

class DashboardService:
    """
    Provides data for all dashboard views.

    Handles data aggregation, caching, and real-time updates.
    """

    def __init__(self, config: Dict):
        self.config = config
        self.cache = {}  # In production: Redis

    def get_coaching_dashboard(
        self,
        team: str,
        game_id: Optional[str] = None
    ) -> Dict:
        """
        Get data for coaching dashboard.

        Args:
            team: Team name
            game_id: Optional specific game

        Returns:
            Dashboard data structure
        """
        cache_key = f"coaching:{team}:{game_id}"
        if cache_key in self.cache:
            return self.cache[cache_key]

        data = {
            'team': team,
            'generated_at': datetime.now().isoformat(),
            'sections': {}
        }

        # Win probability section
        data['sections']['win_probability'] = {
            'current': 0.65,  # Placeholder
            'history': [],
            'leverage_index': 1.2
        }

        # Fourth down section
        data['sections']['fourth_down'] = {
            'recommendation': 'go_for_it',
            'go_for_it_ev': 0.72,
            'field_goal_ev': 0.68,
            'punt_ev': 0.55
        }

        # Drive summary
        data['sections']['drives'] = {
            'current_drive': {
                'plays': 5,
                'yards': 32,
                'time_elapsed': '2:45'
            },
            'game_summary': {
                'total_drives': 8,
                'scoring_drives': 3,
                'turnovers': 1
            }
        }

        # Opponent tendencies (quick reference)
        data['sections']['opponent_tendencies'] = {
            'defensive_front': 'Nickel (62%)',
            'blitz_rate': '28%',
            'coverage_split': {'Cover 3': 45, 'Cover 2': 32, 'Man': 23}
        }

        self.cache[cache_key] = data
        return data

    def get_recruiting_dashboard(
        self,
        team: str,
        class_year: int
    ) -> Dict:
        """
        Get data for recruiting dashboard.

        Args:
            team: Team name
            class_year: Recruiting class year

        Returns:
            Dashboard data structure
        """
        data = {
            'team': team,
            'class_year': class_year,
            'generated_at': datetime.now().isoformat(),
            'sections': {}
        }

        # Class overview
        data['sections']['class_overview'] = {
            'committed': 15,
            'targets': 8,
            'class_rank': 12,
            'average_rating': 0.8945
        }

        # Position breakdown
        data['sections']['position_needs'] = {
            'QB': {'committed': 1, 'target': 1, 'status': 'filled'},
            'RB': {'committed': 1, 'target': 2, 'status': 'need'},
            'WR': {'committed': 3, 'target': 4, 'status': 'need'},
            'OL': {'committed': 4, 'target': 5, 'status': 'need'},
            'DL': {'committed': 3, 'target': 4, 'status': 'need'},
            'LB': {'committed': 2, 'target': 3, 'status': 'need'},
            'DB': {'committed': 4, 'target': 4, 'status': 'filled'},
        }

        # Recent activity
        data['sections']['recent_activity'] = []

        # Board comparison
        data['sections']['class_comparison'] = {
            'conference_rank': 3,
            'national_rank': 12,
            'points': 245.6
        }

        return data

    def get_executive_dashboard(
        self,
        team: str,
        season: int
    ) -> Dict:
        """
        Get data for executive dashboard.

        Args:
            team: Team name
            season: Season year

        Returns:
            Dashboard data structure
        """
        data = {
            'team': team,
            'season': season,
            'generated_at': datetime.now().isoformat(),
            'sections': {}
        }

        # Season performance
        data['sections']['performance'] = {
            'record': '8-2',
            'conference_record': '5-2',
            'ranking': 12,
            'projected_wins': 9.5
        }

        # Trends
        data['sections']['trends'] = {
            'win_pct_3yr': [0.667, 0.750, 0.800],
            'recruiting_rank_3yr': [25, 18, 12],
            'revenue_trend': 'increasing'
        }

        # Key metrics
        data['sections']['key_metrics'] = {
            'offensive_rank': 15,
            'defensive_rank': 22,
            'special_teams_rank': 8,
            'strength_of_schedule': 12
        }

        return data
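
Because results are cached by key, repeated calls for the same view are cheap; a quick illustration (the team and game IDs are arbitrary):

service = DashboardService(config={})
first = service.get_coaching_dashboard('Ohio State', game_id='game_1')
again = service.get_coaching_dashboard('Ohio State', game_id='game_1')
assert first is again  # Second call is served from the in-memory cache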


# =============================================================================
# API ENDPOINTS
# =============================================================================

# In production, this would use FastAPI or Flask
# Here we define the endpoint structure

API_ENDPOINTS = {
    '/api/v1/dashboard/coaching': {
        'method': 'GET',
        'params': ['team', 'game_id'],
        'handler': 'get_coaching_dashboard',
        'auth': 'required',
        'roles': ['coach', 'analytics']
    },
    '/api/v1/dashboard/recruiting': {
        'method': 'GET',
        'params': ['team', 'class_year'],
        'handler': 'get_recruiting_dashboard',
        'auth': 'required',
        'roles': ['recruiting', 'analytics']
    },
    '/api/v1/dashboard/executive': {
        'method': 'GET',
        'params': ['team', 'season'],
        'handler': 'get_executive_dashboard',
        'auth': 'required',
        'roles': ['executive', 'analytics']
    },
    '/api/v1/plays': {
        'method': 'GET',
        'params': ['game_id', 'team', 'season', 'week'],
        'handler': 'get_plays',
        'auth': 'required',
        'roles': ['coach', 'analytics']
    },
    '/api/v1/players': {
        'method': 'GET',
        'params': ['team', 'position', 'season'],
        'handler': 'get_players',
        'auth': 'required',
        'roles': ['coach', 'recruiting', 'analytics']
    },
    '/api/v1/prospects': {
        'method': 'GET',
        'params': ['class_year', 'position', 'state', 'committed'],
        'handler': 'get_prospects',
        'auth': 'required',
        'roles': ['recruiting', 'analytics']
    },
    '/api/v1/models/win-probability': {
        'method': 'POST',
        'body': ['game_state'],
        'handler': 'calculate_win_probability',
        'auth': 'required',
        'roles': ['coach', 'analytics']
    },
    '/api/v1/models/fourth-down': {
        'method': 'POST',
        'body': ['game_state'],
        'handler': 'analyze_fourth_down',
        'auth': 'required',
        'roles': ['coach', 'analytics']
    },
    '/api/v1/reports/scouting': {
        'method': 'GET',
        'params': ['opponent', 'format'],
        'handler': 'generate_scouting_report',
        'auth': 'required',
        'roles': ['coach', 'analytics']
    },
}
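
As a sketch of how one of these definitions could be wired up with FastAPI (the route body is illustrative, and the auth and role checks declared in API_ENDPOINTS are omitted):

from fastapi import FastAPI

app = FastAPI()
service = DashboardService(config={})

@app.get("/api/v1/dashboard/coaching")
def coaching_dashboard(team: str, game_id: Optional[str] = None):
    # Authentication and role enforcement would wrap this handler in production
    return service.get_coaching_dashboard(team, game_id)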

27.4.2 Report Generation

"""
Automated Report Generation

This module generates scheduled reports for stakeholders.
"""

from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional
import logging

logger = logging.getLogger(__name__)


@dataclass
class ReportConfig:
    """Configuration for a report type."""
    name: str
    template: str
    schedule: str  # cron expression
    recipients: List[str]
    format: str  # 'pdf', 'html', 'pptx'
    data_sources: List[str]
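
For example, a weekly summary report might be configured like this (the schedule, recipients, and data sources are placeholders):

weekly_report = ReportConfig(
    name="weekly_summary",
    template="weekly_summary",
    schedule="0 8 * * MON",  # Every Monday at 8:00 AM
    recipients=["head-coach@university.edu"],
    format="pdf",
    data_sources=["team_stats", "player_stats"]
)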


class ReportGenerator:
    """
    Generates automated reports.

    Supports multiple formats and scheduling.
    """

    TEMPLATES = {
        'weekly_summary': """
# Weekly Football Analytics Summary
## {team} - Week {week}, {season}

### Record: {record}

### Offensive Performance
- EPA per Play: {off_epa:.3f}
- Success Rate: {off_success:.1%}
- Explosive Play Rate: {explosive_rate:.1%}

### Defensive Performance
- EPA per Play Allowed: {def_epa:.3f}
- Success Rate Allowed: {def_success:.1%}
- Havoc Rate: {havoc_rate:.1%}

### Key Players
{key_players}

### Looking Ahead
Next opponent: {next_opponent}
{opponent_preview}
        """,

        'recruiting_update': """
# Recruiting Update
## Class of {class_year}

### Class Summary
- Commits: {commits}
- Class Rank: #{class_rank}
- Average Rating: {avg_rating:.4f}

### Recent Activity
{recent_activity}

### Priority Targets
{priority_targets}

### Position Needs
{position_needs}
        """,

        'game_recap': """
# Game Recap
## {home_team} vs {away_team}
### {date}

### Final Score: {home_score} - {away_score}

### Win Probability Chart
{wp_chart}

### Key Plays
{key_plays}

### Statistical Summary
{stat_summary}
        """
    }

    def __init__(self, config: Dict):
        self.config = config

    def generate_weekly_summary(
        self,
        team: str,
        season: int,
        week: int,
        data: Dict
    ) -> str:
        """Generate weekly summary report."""
        template = self.TEMPLATES['weekly_summary']

        # Format key players section
        key_players = self._format_key_players(data.get('key_players', []))

        return template.format(
            team=team,
            season=season,
            week=week,
            record=data.get('record', 'N/A'),
            off_epa=data.get('offensive_epa', 0),
            off_success=data.get('offensive_success_rate', 0),
            explosive_rate=data.get('explosive_rate', 0),
            def_epa=data.get('defensive_epa', 0),
            def_success=data.get('defensive_success_rate', 0),
            havoc_rate=data.get('havoc_rate', 0),
            key_players=key_players,
            next_opponent=data.get('next_opponent', 'TBD'),
            opponent_preview=data.get('opponent_preview', '')
        )

    def generate_recruiting_update(
        self,
        team: str,
        class_year: int,
        data: Dict
    ) -> str:
        """Generate recruiting update report."""
        template = self.TEMPLATES['recruiting_update']

        recent_activity = self._format_recent_activity(
            data.get('recent_activity', [])
        )
        priority_targets = self._format_targets(
            data.get('priority_targets', [])
        )
        position_needs = self._format_position_needs(
            data.get('position_needs', {})
        )

        return template.format(
            class_year=class_year,
            commits=data.get('commits', 0),
            class_rank=data.get('class_rank', 'N/A'),
            avg_rating=data.get('avg_rating', 0),
            recent_activity=recent_activity,
            priority_targets=priority_targets,
            position_needs=position_needs
        )

    def _format_key_players(self, players: List[Dict]) -> str:
        """Format key players section."""
        if not players:
            return "No key player data available."

        lines = []
        for player in players[:5]:
            lines.append(
                f"- **{player['name']}** ({player['position']}): "
                f"{player.get('stat_line', 'N/A')}"
            )
        return "\n".join(lines)

    def _format_recent_activity(self, activity: List[Dict]) -> str:
        """Format recent recruiting activity."""
        if not activity:
            return "No recent activity."

        lines = []
        for item in activity[:10]:
            lines.append(
                f"- {item['date']}: {item['player']} - {item['action']}"
            )
        return "\n".join(lines)

    def _format_targets(self, targets: List[Dict]) -> str:
        """Format priority targets."""
        if not targets:
            return "No priority targets listed."

        lines = []
        for target in targets:
            lines.append(
                f"- **{target['name']}** ({target['position']}, "
                f"{target['stars']}★): {target['status']}"
            )
        return "\n".join(lines)

    def _format_position_needs(self, needs: Dict) -> str:
        """Format position needs."""
        if not needs:
            return "Position needs not specified."

        lines = []
        for position, info in needs.items():
            status = "✓" if info.get('filled') else "○"
            lines.append(
                f"- {status} {position}: {info.get('committed', 0)}/"
                f"{info.get('target', 0)}"
            )
        return "\n".join(lines)

27.5 Deployment and Operations

27.5.1 Infrastructure as Code

"""
Infrastructure Configuration

This module defines deployment configurations for the analytics platform.
"""

# Docker Compose configuration for local development
DOCKER_COMPOSE_DEV = """
version: '3.8'

services:
  # PostgreSQL database
  postgres:
    image: postgres:15
    environment:
      POSTGRES_DB: cfb_analytics
      POSTGRES_USER: analytics
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./schema.sql:/docker-entrypoint-initdb.d/schema.sql
    ports:
      - "5432:5432"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U analytics -d cfb_analytics"]
      interval: 10s
      timeout: 5s
      retries: 5

  # Redis cache
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5

  # API server
  api:
    build:
      context: .
      dockerfile: Dockerfile.api
    environment:
      DATABASE_URL: postgresql://analytics:${POSTGRES_PASSWORD}@postgres:5432/cfb_analytics
      REDIS_URL: redis://redis:6379
      API_KEY: ${CFB_API_KEY}
      ENVIRONMENT: development
    ports:
      - "8000:8000"
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    volumes:
      - ./src:/app/src
    command: uvicorn src.api.main:app --host 0.0.0.0 --port 8000 --reload

  # Dashboard frontend
  dashboard:
    build:
      context: ./dashboard
      dockerfile: Dockerfile
    ports:
      - "3000:3000"
    environment:
      REACT_APP_API_URL: http://localhost:8000
    depends_on:
      - api
    volumes:
      - ./dashboard/src:/app/src

  # Scheduler for automated jobs
  scheduler:
    build:
      context: .
      dockerfile: Dockerfile.scheduler
    environment:
      DATABASE_URL: postgresql://analytics:${POSTGRES_PASSWORD}@postgres:5432/cfb_analytics
      REDIS_URL: redis://redis:6379
      API_KEY: ${CFB_API_KEY}
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy

volumes:
  postgres_data:
  redis_data:
"""

# Kubernetes deployment for production
KUBERNETES_DEPLOYMENT = """
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cfb-analytics-api
  labels:
    app: cfb-analytics
    component: api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: cfb-analytics
      component: api
  template:
    metadata:
      labels:
        app: cfb-analytics
        component: api
    spec:
      containers:
      - name: api
        image: cfb-analytics-api:latest
        ports:
        - containerPort: 8000
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: cfb-analytics-secrets
              key: database-url
        - name: REDIS_URL
          valueFrom:
            secretKeyRef:
              name: cfb-analytics-secrets
              key: redis-url
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 10
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8000
          initialDelaySeconds: 5
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: cfb-analytics-api
spec:
  selector:
    app: cfb-analytics
    component: api
  ports:
  - port: 80
    targetPort: 8000
  type: LoadBalancer
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cfb-analytics-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cfb-analytics-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
"""

27.5.2 Monitoring and Alerting

"""
Monitoring and Alerting Configuration

This module defines monitoring setup for the analytics platform.
"""

from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional
from enum import Enum
import logging

logger = logging.getLogger(__name__)


class AlertSeverity(Enum):
    """Alert severity levels."""
    INFO = "info"
    WARNING = "warning"
    ERROR = "error"
    CRITICAL = "critical"


@dataclass
class MetricDefinition:
    """Definition of a metric to track."""
    name: str
    description: str
    unit: str
    alert_threshold: Optional[float] = None
    alert_severity: AlertSeverity = AlertSeverity.WARNING


# Define key metrics
SYSTEM_METRICS = [
    MetricDefinition(
        name="api_latency_p99",
        description="99th percentile API response time",
        unit="milliseconds",
        alert_threshold=500,
        alert_severity=AlertSeverity.WARNING
    ),
    MetricDefinition(
        name="api_error_rate",
        description="Percentage of API requests returning errors",
        unit="percentage",
        alert_threshold=1.0,
        alert_severity=AlertSeverity.ERROR
    ),
    MetricDefinition(
        name="database_connections",
        description="Number of active database connections",
        unit="count",
        alert_threshold=80,
        alert_severity=AlertSeverity.WARNING
    ),
    MetricDefinition(
        name="cache_hit_rate",
        description="Percentage of cache hits",
        unit="percentage",
        alert_threshold=50,  # Alert if below 50%
        alert_severity=AlertSeverity.WARNING
    ),
    MetricDefinition(
        name="data_freshness",
        description="Minutes since last data update",
        unit="minutes",
        alert_threshold=60,
        alert_severity=AlertSeverity.ERROR
    ),
    MetricDefinition(
        name="model_prediction_latency",
        description="Time to generate model predictions",
        unit="milliseconds",
        alert_threshold=100,
        alert_severity=AlertSeverity.WARNING
    ),
]
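
A minimal helper for evaluating a reading against these definitions; note that cache_hit_rate alerts when the value drops below its threshold, so the comparison direction is metric-specific:

def check_metric(metric: MetricDefinition, value: float) -> Optional[str]:
    """Return an alert message if the reading breaches the metric's threshold."""
    if metric.alert_threshold is None:
        return None
    # cache_hit_rate is a "lower is worse" metric; the others alert above threshold
    breached = (value < metric.alert_threshold
                if metric.name == "cache_hit_rate"
                else value > metric.alert_threshold)
    if breached:
        return f"[{metric.alert_severity.value.upper()}] {metric.name} = {value} {metric.unit}"
    return None

msg = check_metric(SYSTEM_METRICS[0], value=720)  # Illustrative: 720 ms p99 latency
if msg:
    logger.warning(msg)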

# Prometheus configuration
PROMETHEUS_CONFIG = """
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - alertmanager:9093

rule_files:
  - "alerts.yml"

scrape_configs:
  - job_name: 'cfb-analytics-api'
    static_configs:
      - targets: ['api:8000']
    metrics_path: /metrics

  - job_name: 'cfb-analytics-scheduler'
    static_configs:
      - targets: ['scheduler:8001']

  - job_name: 'postgres'
    static_configs:
      - targets: ['postgres-exporter:9187']

  - job_name: 'redis'
    static_configs:
      - targets: ['redis-exporter:9121']
"""

# Alert rules
ALERT_RULES = """
groups:
  - name: cfb-analytics
    rules:
      - alert: HighAPILatency
        expr: histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m])) > 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: High API latency detected
          description: 99th percentile latency is {{ $value }}s

      - alert: HighErrorRate
        expr: rate(http_requests_total{status=~"5.."}[5m]) / rate(http_requests_total[5m]) > 0.01
        for: 5m
        labels:
          severity: error
        annotations:
          summary: High error rate detected
          description: Error rate is {{ $value | humanizePercentage }}

      - alert: DatabaseConnectionsHigh
        expr: pg_stat_activity_count > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Database connections approaching limit
          description: "{{ $value }} active connections"

      - alert: DataStale
        expr: (time() - data_last_update_timestamp) / 60 > 60
        for: 10m
        labels:
          severity: error
        annotations:
          summary: Data has not been updated recently
          description: Last update was {{ $value }} minutes ago

      - alert: GameDayAPIDown
        expr: up{job="cfb-analytics-api"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: API is down during game day
          description: The analytics API is not responding
"""


class HealthChecker:
    """
    System health checker.

    Performs health checks on all system components.
    """

    def __init__(self):
        self.checks = []

    def add_check(self, name: str, check_func, critical: bool = False):
        """Add a health check."""
        self.checks.append({
            'name': name,
            'func': check_func,
            'critical': critical
        })

    def run_all_checks(self) -> Dict:
        """Run all health checks."""
        results = {
            'status': 'healthy',
            'timestamp': datetime.now().isoformat(),
            'checks': {}
        }

        for check in self.checks:
            try:
                result = check['func']()
                results['checks'][check['name']] = {
                    'status': 'pass' if result else 'fail',
                    'critical': check['critical']
                }
                if not result and check['critical']:
                    results['status'] = 'unhealthy'
            except Exception as e:
                results['checks'][check['name']] = {
                    'status': 'error',
                    'error': str(e),
                    'critical': check['critical']
                }
                if check['critical']:
                    results['status'] = 'unhealthy'

        return results
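
Wiring up checks is straightforward; the lambdas below stand in for real connectivity probes:

checker = HealthChecker()
checker.add_check('database', lambda: True, critical=True)  # e.g., SELECT 1 against PostgreSQL
checker.add_check('cache', lambda: True)                    # e.g., Redis PING
print(checker.run_all_checks()['status'])  # 'healthy'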

27.6 Testing and Quality Assurance

27.6.1 Testing Strategy

"""
Testing Framework

This module defines the testing strategy for the analytics platform.
"""

import pytest
from unittest.mock import Mock, patch
from typing import Dict, List
import pandas as pd
import numpy as np

# EPACalculator, PlayProcessor, DataValidator, PlayEvent, EventType, and
# PlayByPlayIngester are imported from the pipeline modules built earlier
# in this chapter.


# =============================================================================
# UNIT TESTS
# =============================================================================

class TestEPACalculator:
    """Unit tests for EPA calculation."""

    def setup_method(self):
        """Set up test fixtures."""
        self.calculator = EPACalculator()

    def test_expected_points_own_goal_line(self):
        """EP should be negative near own goal line."""
        ep = self.calculator.calculate_ep(down=1, distance=10, yard_line=1)
        assert ep < 0

    def test_expected_points_opponent_goal_line(self):
        """EP should be high near opponent goal line."""
        ep = self.calculator.calculate_ep(down=1, distance=10, yard_line=99)
        assert ep > 5

    def test_expected_points_midfield(self):
        """EP should be moderate at midfield."""
        ep = self.calculator.calculate_ep(down=1, distance=10, yard_line=50)
        assert -1 < ep < 3

    def test_epa_positive_for_good_play(self):
        """EPA should be positive for a play that improves situation."""
        before = {'down': 1, 'distance': 10, 'yard_line': 50}
        after = {'down': 1, 'distance': 10, 'yard_line': 65,
                 'touchdown': False, 'turnover': False}
        epa = self.calculator.calculate_epa(before, after)
        assert epa > 0

    def test_epa_negative_for_turnover(self):
        """EPA should be negative for turnovers."""
        before = {'down': 1, 'distance': 10, 'yard_line': 50}
        after = {'down': 1, 'distance': 10, 'yard_line': 50,
                 'touchdown': False, 'turnover': True}
        epa = self.calculator.calculate_epa(before, after)
        assert epa < 0

    def test_epa_touchdown_is_high(self):
        """EPA for touchdown should reflect full value gained."""
        before = {'down': 1, 'distance': 10, 'yard_line': 95}
        after = {'touchdown': True, 'turnover': False}
        epa = self.calculator.calculate_epa(before, after)
        assert epa > 0


class TestPlayProcessor:
    """Unit tests for play processing."""

    def setup_method(self):
        """Set up test fixtures."""
        self.processor = PlayProcessor()

    def test_success_first_down_40_percent(self):
        """First down with 40%+ gain is success."""
        play = {'down': 1, 'distance': 10, 'yards_gained': 4}
        assert self.processor._is_successful(play) is True

    def test_failure_first_down_less_than_40_percent(self):
        """First down with <40% gain is failure."""
        play = {'down': 1, 'distance': 10, 'yards_gained': 3}
        assert self.processor._is_successful(play) is False

    def test_success_third_down_conversion(self):
        """Third down conversion is success."""
        play = {'down': 3, 'distance': 5, 'yards_gained': 5}
        assert self.processor._is_successful(play) is True


class TestDataValidator:
    """Unit tests for data validation."""

    def setup_method(self):
        """Set up test fixtures."""
        self.validator = DataValidator()

    def test_valid_play_passes(self):
        """Valid play should pass validation."""
        play = PlayEvent(
            event_id="test_1",
            game_id="game_1",
            event_type=EventType.PLAY_END,
            quarter=1,
            down=1,
            distance=10,
            yard_line=25
        )
        is_valid, errors = self.validator.validate(play)
        assert is_valid is True
        assert len(errors) == 0

    def test_invalid_down_fails(self):
        """Invalid down value should fail validation."""
        play = PlayEvent(
            event_id="test_1",
            game_id="game_1",
            event_type=EventType.PLAY_END,
            quarter=1,
            down=5,  # Invalid
            distance=10,
            yard_line=25
        )
        is_valid, errors = self.validator.validate(play)
        assert is_valid is False


# =============================================================================
# INTEGRATION TESTS
# =============================================================================

class TestDataPipeline:
    """Integration tests for the data pipeline."""

    @pytest.fixture
    def mock_api_response(self):
        """Mock API response data."""
        return [
            {
                'id': 1,
                'offense': 'Ohio State',
                'defense': 'Michigan',
                'down': 1,
                'distance': 10,
                'yardsToEndzone': 75,
                'yardsGained': 5,
                'playType': 'rush',
                'playText': 'Rush for 5 yards'
            },
            {
                'id': 2,
                'offense': 'Ohio State',
                'defense': 'Michigan',
                'down': 2,
                'distance': 5,
                'yardsToEndzone': 70,
                'yardsGained': 15,
                'playType': 'pass',
                'playText': 'Pass complete for 15 yards'
            }
        ]

    @pytest.mark.asyncio
    async def test_ingestion_to_processing(self, mock_api_response):
        """Test full pipeline from ingestion to processing."""
        with patch.object(PlayByPlayIngester, '_fetch_plays',
                         return_value=mock_api_response):
            ingester = PlayByPlayIngester({'api_key': 'test'})
            processor = PlayProcessor()

            # Would run full pipeline here
            # This is a placeholder for the actual test


# =============================================================================
# PERFORMANCE TESTS
# =============================================================================

class TestPerformance:
    """Performance tests for critical paths."""

    def test_epa_calculation_speed(self):
        """EPA calculation should be fast."""
        import time
        calculator = EPACalculator()

        start = time.time()
        for _ in range(10000):
            calculator.calculate_ep(down=2, distance=7, yard_line=45)
        duration = time.time() - start

        # Should process 10k calculations in under 1 second
        assert duration < 1.0

    def test_play_processing_speed(self):
        """Play processing should handle large volumes."""
        processor = PlayProcessor()

        # Generate synthetic plays
        plays = []
        for i in range(1000):
            plays.append({
                'id': str(i),
                'game_id': 'test',
                'drive_id': str(i // 10),
                'play_number': i,
                'quarter': (i // 150) + 1,
                'clock': '10:00',
                'down': (i % 4) + 1,
                'distance': 10,
                'yard_line': 50,
                'offense': 'Team A',
                'defense': 'Team B',
                'play_type': 'pass' if i % 2 == 0 else 'rush',
                'yards_gained': np.random.randint(-5, 20),
                'play_text': 'Test play'
            })

        import time
        start = time.time()
        result = processor.process_game_plays(plays)
        duration = time.time() - start

        # Should process 1000 plays in under 5 seconds
        assert duration < 5.0
        assert len(result) == 1000


# =============================================================================
# DATA QUALITY TESTS
# =============================================================================

class TestDataQuality:
    """Tests for data quality validation."""

    def test_epa_sum_near_zero(self):
        """EPA should roughly sum to zero across a game (zero-sum)."""
        # In a game, EPA gained by offense ≈ EPA lost by defense
        # This is a statistical property to validate
        pass

    def test_success_rate_bounds(self):
        """Success rate should be between 0 and 1."""
        # Validate computed success rates
        pass

    def test_win_probability_sum_to_one(self):
        """Win probabilities should sum to 1."""
        # Validate WP model outputs
        pass
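
As one way these stubs might be fleshed out, here is a hedged version of the success-rate bounds check using purely synthetic plays:

def test_success_rate_bounds_synthetic():
    """Success rates computed over synthetic plays must lie in [0, 1]."""
    processor = PlayProcessor()
    plays = [
        {'down': d, 'distance': 10, 'yards_gained': y}
        for d, y in [(1, 4), (2, 8), (3, 2), (1, -3)]
    ]
    rate = np.mean([processor._is_successful(p) for p in plays])
    assert 0.0 <= rate <= 1.0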

27.7 Documentation and Training

27.7.1 System Documentation

Comprehensive documentation is essential for system maintainability:

"""
Documentation Generator

Generates system documentation from code and configuration.
"""

SYSTEM_DOCUMENTATION = """
# College Football Analytics Platform

## System Overview

The College Football Analytics Platform provides comprehensive data analysis,
visualization, and decision support for college football programs.

### Architecture

The system follows a modern microservices architecture:

1. **Data Ingestion Layer**: Collects data from multiple sources
2. **Processing Layer**: Transforms and enriches data with analytics
3. **Storage Layer**: PostgreSQL for persistence, Redis for caching
4. **API Layer**: RESTful API for all data access
5. **Presentation Layer**: Role-specific dashboards

### Data Flow

```
External APIs → Ingestion → Validation → Processing → Storage → API → Dashboards
                                 ↓            ↓           ↓
                           Quality Log   Error Queue  ML Models
```


## Getting Started

### Prerequisites

- Python 3.10+
- Docker and Docker Compose
- PostgreSQL 15+
- Redis 7+

### Installation

1. Clone the repository:
   ```bash
   git clone https://github.com/org/cfb-analytics.git
   cd cfb-analytics
   ```

2. Create environment file:
   ```bash
   cp .env.example .env
   # Edit .env with your API keys and configuration
   ```

3. Start services:
   ```bash
   docker-compose up -d
   ```

4. Initialize database:
   ```bash
   python scripts/init_db.py
   ```

5. Run initial data load:
   ```bash
   python scripts/load_historical.py --season 2024
   ```

### Configuration

Key configuration options in `.env`:

| Variable | Description | Default |
|----------|-------------|---------|
| DATABASE_URL | PostgreSQL connection string | required |
| REDIS_URL | Redis connection string | redis://localhost:6379 |
| CFB_API_KEY | API key for data source | required |
| ENVIRONMENT | Environment name | development |

## API Reference

### Authentication

All API endpoints require authentication via Bearer token:

```bash
curl -H "Authorization: Bearer <token>" https://api.example.com/v1/plays
```

### Endpoints

#### GET /v1/plays

Retrieve play-by-play data.

Parameters:
- game_id (string): Specific game ID
- team (string): Filter by team
- season (integer): Filter by season
- week (integer): Filter by week

Response:

```json
{
  "plays": [
    {
      "id": "play_123",
      "game_id": "game_456",
      "down": 1,
      "distance": 10,
      "yard_line": 25,
      "play_type": "pass",
      "yards_gained": 12,
      "epa": 0.45
    }
  ],
  "total": 150,
  "page": 1
}
```

## Troubleshooting

### Common Issues

**Issue**: API returning 500 errors

**Solution**: Check database connectivity and logs:

```bash
docker-compose logs api
```

**Issue**: Data not updating

**Solution**: Verify scheduler is running:

```bash
docker-compose ps scheduler
```

## Support

For issues, contact: analytics-support@university.edu
"""

def generate_api_documentation(endpoints: Dict) -> str:
    """Generate API documentation from endpoint definitions."""
    doc = "# API Documentation\n\n"

    for path, config in endpoints.items():
        doc += f"## {config['method']} {path}\n\n"
        doc += f"**Authentication**: {config.get('auth', 'none')}\n"
        doc += f"**Allowed Roles**: {', '.join(config.get('roles', []))}\n\n"

        if config.get('params'):
            doc += "**Parameters:**\n"
            for param in config['params']:
                doc += f"- `{param}`\n"
            doc += "\n"

        doc += "---\n\n"

    return doc
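
For example, rendering documentation for the endpoint table defined in section 27.4.1:

print(generate_api_documentation(API_ENDPOINTS))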


27.8 Case Study: Full Platform Implementation

27.8.1 Project Timeline

A realistic implementation timeline for a Division I program:

Phase 1: Foundation (Weeks 1-4)
  - Requirements gathering and stakeholder interviews
  - Architecture design and technology selection
  - Development environment setup
  - Database schema design and implementation

Phase 2: Core Development (Weeks 5-12)
  - Data ingestion pipeline implementation
  - EPA and core metrics calculation
  - Basic API development
  - Initial dashboard prototypes

Phase 3: Advanced Features (Weeks 13-20)
  - Win probability model development
  - Fourth-down decision engine
  - Opponent analysis automation
  - Report generation system

Phase 4: Integration and Testing (Weeks 21-26)
  - End-to-end testing
  - Performance optimization
  - Security audit
  - User acceptance testing

Phase 5: Deployment and Training (Weeks 27-30)
  - Production deployment
  - Staff training sessions
  - Documentation finalization
  - Feedback collection and iteration

27.8.2 Lessons Learned

From real-world implementations, common lessons include:

  1. Start Simple: Begin with core metrics (EPA, success rate) before adding complexity
  2. Prioritize Reliability: Coaches need to trust the data during games
  3. Design for Mobile: Sideline access requires tablet-friendly interfaces
  4. Automate Everything: Manual processes fail during busy game weeks
  5. Plan for Scale: Game day traffic is 10-100x normal load
  6. Document Extensively: Staff turnover requires comprehensive documentation
  7. Build Feedback Loops: Regular coach input improves relevance
  8. Security First: Competitive data must be protected

Summary

Building a complete analytics system requires integrating multiple disciplines:

  1. Systems Design: Architecture, scalability, reliability
  2. Data Engineering: Pipelines, quality, integration
  3. Analytics: Metrics, models, insights
  4. Product Development: Dashboards, reports, UX
  5. Operations: Deployment, monitoring, support

The key to success is understanding that the system exists to serve its users. Technical excellence means nothing if coaches can't access insights when they need them or if recruiters can't efficiently evaluate prospects.

A well-designed analytics platform becomes a competitive advantage, enabling better decisions across all aspects of a football program. The investment in building robust infrastructure pays dividends in the quality and speed of insights delivered.


Key Takeaways

  1. Start with users: Understand stakeholder needs before writing code
  2. Design for reliability: Game day uptime is critical
  3. Automate data pipelines: Manual processes don't scale
  4. Build incrementally: Deploy core features first, then enhance
  5. Monitor everything: Proactive alerting prevents failures
  6. Document thoroughly: Enable others to maintain and extend
  7. Test rigorously: Unit, integration, and performance tests
  8. Plan for growth: Architecture should accommodate new data sources

Exercises

  1. Requirements Document: Create a complete requirements document for an analytics platform at your institution, interviewing at least three potential users.

  2. Architecture Design: Design a system architecture diagram for a college football analytics platform, including all major components and data flows.

  3. Data Pipeline: Implement a complete data ingestion pipeline that collects play-by-play data from an API and calculates EPA for all plays.

  4. Dashboard Prototype: Build a prototype coaching dashboard showing win probability, fourth-down recommendations, and opponent tendencies.

  5. Deployment Plan: Create a deployment plan including infrastructure requirements, monitoring setup, and disaster recovery procedures.