Chapter 27 Exercises: Building a Complete Analytics System
Overview
These exercises guide you through building components of a production-ready college football analytics platform. You'll apply concepts from throughout the textbook to create integrated, real-world systems.
Level 1: Foundation Exercises
Exercise 1.1: Stakeholder Requirements Gathering
Objective: Practice requirements gathering through stakeholder interviews.
Task: Create a complete requirements document for a football analytics platform.
- Identify at least five stakeholder roles (coaching staff, recruiting, etc.)
- For each stakeholder, document:
  - Primary use cases (3-5 per stakeholder)
  - Data needs
  - Response time requirements
  - Access frequency
  - Technical sophistication level
- Create a prioritized requirements list with:
  - Requirement ID
  - Description
  - Priority (critical, high, medium, low)
  - Stakeholder(s)
  - Acceptance criteria

Deliverable: A 5-10 page requirements document in Markdown format.

Evaluation Criteria:

- Comprehensive stakeholder coverage
- Clear, specific requirements
- Realistic priorities
- Measurable acceptance criteria
Exercise 1.2: Database Schema Design
Objective: Design a normalized database schema for football analytics.
Task: Create a complete SQL schema including:
- Core Tables:
  - Teams (with conference, division)
  - Games (with scheduling details)
  - Plays (with all situational data)
  - Players (with biographical info)
- Analytics Tables:
  - Pre-computed EPA values
  - Win probability snapshots
  - Team season statistics
  - Player game statistics
- Recruiting Tables:
  - Prospects
  - Evaluations
  - Contacts/interactions
- Supporting Tables:
  - User accounts and roles
  - Audit logs
  - Data quality logs

Requirements:

- Proper foreign key relationships
- Appropriate indexes for common queries
- Constraints for data integrity
- Comments explaining design decisions
```python
# Starter code for schema generation
def generate_schema():
    """Generate the complete database schema."""
    tables = []

    # Teams table
    teams = """
    CREATE TABLE teams (
        id VARCHAR(50) PRIMARY KEY,
        name VARCHAR(100) NOT NULL,
        -- Add remaining columns
    );
    """
    tables.append(teams)

    # Continue with remaining tables...
    return "\n\n".join(tables)
```
Deliverable: Complete SQL schema file with at least 15 tables.
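To illustrate the level of detail expected for each table (the column names, types, and index here are illustrative choices, not a prescribed schema), a games table with foreign keys, a constraint, and a supporting index might look like:

```python
# Illustrative DDL for a games table. All names and types are assumptions
# for this sketch, not requirements of the exercise.
GAMES_DDL = """
CREATE TABLE games (
    id VARCHAR(50) PRIMARY KEY,
    season INT NOT NULL,
    week INT NOT NULL,
    home_team_id VARCHAR(50) NOT NULL REFERENCES teams(id),
    away_team_id VARCHAR(50) NOT NULL REFERENCES teams(id),
    kickoff_time TIMESTAMP,
    home_points INT,
    away_points INT,
    CHECK (home_team_id <> away_team_id)  -- a team cannot play itself
);
-- Supports the common "all games for a team in a season" query
CREATE INDEX idx_games_season_home ON games (season, home_team_id);
"""
```

Every table in your schema should carry this kind of referential integrity and at least one comment justifying each non-obvious index.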
Exercise 1.3: Configuration Management
Objective: Create a robust configuration system for multiple environments.
Task: Implement a configuration management system that:
- Supports multiple environments (development, staging, production)
- Uses environment variables for sensitive data
- Provides defaults for non-sensitive settings
- Validates configuration on startup
- Documents all configuration options
```python
# Starter code
from dataclasses import dataclass, field
from typing import Optional, Dict
import os


@dataclass
class DatabaseConfig:
    """Database configuration."""
    host: str = "localhost"
    port: int = 5432
    name: str = "cfb_analytics"
    user: str = "analytics"
    password: str = field(default="", repr=False)

    @classmethod
    def from_env(cls) -> 'DatabaseConfig':
        """Load from environment variables."""
        # Implement this method
        pass


@dataclass
class SystemConfig:
    """Complete system configuration."""
    environment: str
    database: DatabaseConfig
    # Add more configuration sections

    def validate(self) -> list:
        """Validate configuration, return list of errors."""
        errors = []
        # Implement validation
        return errors
```
Deliverable: Complete configuration module with validation and documentation.
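One possible shape for `from_env` (the `DB_*` variable names are an assumption for this sketch, not a convention the chapter mandates; the dataclass mirrors the starter code so the example is self-contained):

```python
import os
from dataclasses import dataclass, field


@dataclass
class DatabaseConfig:
    """Database configuration (mirrors the starter dataclass)."""
    host: str = "localhost"
    port: int = 5432
    name: str = "cfb_analytics"
    user: str = "analytics"
    password: str = field(default="", repr=False)

    @classmethod
    def from_env(cls) -> "DatabaseConfig":
        # Fall back to the dataclass defaults when a variable is unset.
        # The DB_* names are illustrative; document whichever names you choose.
        return cls(
            host=os.environ.get("DB_HOST", "localhost"),
            port=int(os.environ.get("DB_PORT", "5432")),
            name=os.environ.get("DB_NAME", "cfb_analytics"),
            user=os.environ.get("DB_USER", "analytics"),
            password=os.environ.get("DB_PASSWORD", ""),
        )
```

Keeping `password` out of `repr` prevents it from leaking into logs when the config object is printed.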
Level 2: Data Pipeline Exercises
Exercise 2.1: Multi-Source Data Ingestion
Objective: Build a pipeline that ingests data from multiple sources.
Task: Create an ingestion system that:
- Collects play-by-play data from College Football Data API
- Fetches recruiting data
- Handles rate limiting and retries
- Validates all incoming data
- Logs quality issues
- Provides ingestion statistics
```python
# Starter code
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Optional
from datetime import datetime


@dataclass
class IngestionStats:
    """Statistics from an ingestion run."""
    source: str
    records_fetched: int
    records_valid: int
    records_failed: int
    start_time: datetime
    end_time: datetime
    errors: List[str]


class DataIngester(ABC):
    """Base class for data ingesters."""

    @abstractmethod
    async def ingest(self, **kwargs) -> IngestionStats:
        """Perform ingestion."""
        pass

    @abstractmethod
    def validate_record(self, record: Dict) -> List[str]:
        """Validate a record, return list of errors."""
        pass


class PlayByPlayIngester(DataIngester):
    """Ingest play-by-play data."""

    async def ingest(self, season: int, week: Optional[int] = None) -> IngestionStats:
        # Implement full ingestion logic
        pass

    def validate_record(self, record: Dict) -> List[str]:
        # Implement validation
        pass


# Create orchestrator to run all ingesters
class IngestionOrchestrator:
    """Coordinate multiple ingesters."""

    async def run_full_ingestion(self, season: int) -> Dict[str, IngestionStats]:
        # Implement orchestration
        pass
```
Deliverable: Complete ingestion pipeline with tests showing successful data collection.
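The "handles rate limiting and retries" requirement is the part most often done wrong. One standard approach is exponential backoff with jitter, sketched here as a free-standing helper (the attempt count and delay constants are illustrative, not tuned for the College Football Data API):

```python
import asyncio
import random


async def fetch_with_retries(fetch, max_attempts=4, base_delay=0.5):
    """Retry a zero-argument coroutine function with exponential backoff.

    Delays grow as base_delay * 2^(attempt-1) plus a small random jitter,
    so concurrent ingesters do not all retry at the same instant.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return await fetch()
        except Exception:
            if attempt == max_attempts:
                raise  # exhausted retries; let the orchestrator log it
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1)
            await asyncio.sleep(delay)
```

Inside an ingester you would wrap each API call in this helper and count failures into `IngestionStats.records_failed`.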
Exercise 2.2: EPA Calculation Pipeline
Objective: Implement a complete EPA calculation system.
Task: Build an EPA calculator that:
- Calculates expected points for any game situation
- Computes EPA for individual plays
- Aggregates EPA by team, player, and play type
- Handles edge cases (touchdowns, turnovers, end of half)
- Validates results against known benchmarks
```python
# Implement a complete EPA system
from typing import Dict, List, Optional


class EPAModel:
    """Expected Points model."""

    def __init__(self):
        # Load or create expected points lookup
        self.ep_table = self._build_ep_table()

    def _build_ep_table(self) -> Dict:
        """Build expected points lookup table."""
        # Use historical data to build EP values
        # by down, distance, and field position
        pass

    def calculate_ep(
        self,
        down: int,
        distance: int,
        yard_line: int,
        seconds_remaining: Optional[int] = None
    ) -> float:
        """Calculate expected points for a situation."""
        pass

    def calculate_epa(
        self,
        before_state: Dict,
        after_state: Dict
    ) -> float:
        """Calculate EPA for a play."""
        pass


class EPAAggregator:
    """Aggregate EPA statistics."""

    def aggregate_by_team(
        self,
        plays: List[Dict],
        team: str
    ) -> Dict:
        """Calculate team-level EPA statistics."""
        pass

    def aggregate_by_player(
        self,
        plays: List[Dict],
        player_id: str
    ) -> Dict:
        """Calculate player-level EPA statistics."""
        pass
```
Deliverable: EPA system with validation showing results within expected ranges.
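The core identity to validate against is EPA = (points scored on the play + EP of the resulting state) - EP of the starting state. A toy worked example (the EP values below are made up for illustration and are NOT calibrated estimates):

```python
# Toy expected-points table keyed by (down, yards_to_goal); values invented.
TOY_EP = {
    (1, 75): 1.2,   # 1st down at own 25
    (1, 60): 1.9,   # 1st down at own 40
}


def toy_epa(before, after, points_scored=0):
    """EPA = points scored + EP(after) - EP(before).

    A scoring play has no "after" possession state here, so we pass
    after=None and treat its EP as 0 for simplicity; a full model uses
    the expected points of the ensuing kickoff.
    """
    return points_scored + TOY_EP.get(after, 0.0) - TOY_EP[before]


# A 15-yard gain from the own 25 to the own 40 on 1st down:
# EPA = 1.9 - 1.2 = 0.7 expected points added.
```

Your validation step should confirm aggregate season EPA per play falls in the small range typical of published college football EP models, and that scoring plays and turnovers produce the expected large positive and negative swings.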
Exercise 2.3: Data Quality Monitoring
Objective: Create a comprehensive data quality monitoring system.
Task: Build a quality monitoring system that:
- Defines quality rules for each data type
- Runs automated quality checks
- Calculates quality scores
- Generates quality reports
- Alerts on quality degradation
```python
# Implement quality monitoring
from enum import Enum
from dataclasses import dataclass
from typing import Callable, Dict, List


class QualityLevel(Enum):
    PASS = "pass"
    WARNING = "warning"
    FAIL = "fail"


@dataclass
class QualityRule:
    """Definition of a quality rule."""
    name: str
    description: str
    check_func: Callable
    severity: str  # "critical", "warning", "info"


@dataclass
class QualityResult:
    """Result of a quality check."""
    rule_name: str
    level: QualityLevel
    message: str
    details: Dict


class DataQualityMonitor:
    """Monitor data quality."""

    def __init__(self):
        self.rules: List[QualityRule] = []
        self._register_default_rules()

    def _register_default_rules(self):
        """Register standard quality rules."""
        # Implement rules for:
        # - Required fields present
        # - Values in valid ranges
        # - Referential integrity
        # - Temporal consistency
        # - Statistical anomalies
        pass

    def check_play(self, play: Dict) -> List[QualityResult]:
        """Check quality of a single play."""
        pass

    def check_game(self, game_plays: List[Dict]) -> Dict:
        """Check quality of all plays in a game."""
        pass

    def generate_quality_report(self, results: List[QualityResult]) -> str:
        """Generate a quality report."""
        pass
```
Deliverable: Quality monitoring system with report generation and alerting.
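A representative `check_func` for the "values in valid ranges" category might look like the sketch below (the field name `down` matches the play schema used throughout the chapter; the validate-and-collect return style matches `validate_record`):

```python
def check_down_in_range(play):
    """Quality rule: `down` must be 1-4.

    An out-of-range value usually indicates an upstream parsing error,
    not a real football situation. Returns a list of error strings so
    multiple problems can be reported at once.
    """
    errors = []
    down = play.get("down")
    if down is None:
        errors.append("missing required field: down")
    elif down not in (1, 2, 3, 4):
        errors.append(f"down out of range: {down}")
    return errors
```

Each such function would be wrapped in a `QualityRule` with an appropriate severity and registered in `_register_default_rules`.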
Level 3: Analytics and Modeling Exercises
Exercise 3.1: Win Probability Model Integration
Objective: Integrate a win probability model into the analytics platform.
Task: Create a complete win probability service that:
- Loads a trained model
- Provides real-time predictions via API
- Calculates WPA for plays
- Generates win probability charts
- Caches predictions for performance
```python
# Implement win probability service
import numpy as np
from sklearn.linear_model import LogisticRegression
from typing import Dict, List, Optional


class WinProbabilityService:
    """Win probability prediction service."""

    def __init__(self, model_path: Optional[str] = None):
        self.model = self._load_or_create_model(model_path)
        self.cache = {}

    def _load_or_create_model(self, path: Optional[str]):
        """Load model from file or create default."""
        pass

    def predict(self, game_state: Dict) -> float:
        """Predict win probability for home team."""
        # Extract features
        features = self._extract_features(game_state)

        # Check cache
        cache_key = self._make_cache_key(features)
        if cache_key in self.cache:
            return self.cache[cache_key]

        # Make prediction
        prob = self.model.predict_proba([features])[0][1]
        self.cache[cache_key] = prob
        return prob

    def calculate_wpa(
        self,
        before_state: Dict,
        after_state: Dict
    ) -> float:
        """Calculate Win Probability Added."""
        pass

    def _extract_features(self, state: Dict) -> np.ndarray:
        """Extract model features from game state."""
        pass

    def generate_wp_chart_data(
        self,
        game_plays: List[Dict]
    ) -> List[Dict]:
        """Generate data for win probability chart."""
        pass
```
Deliverable: Complete win probability service with API endpoint and visualization.
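For the caching requirement, note that raw feature vectors contain continuous values (time remaining, score differential transforms) that rarely repeat exactly. One way to make `_make_cache_key` useful is to round features before keying (the precision here is a tunable assumption: coarser rounding raises the hit rate but blurs distinct game states together):

```python
def make_cache_key(features, decimals=3):
    """Build a hashable cache key from a numeric feature vector.

    Rounding lets near-identical game states share one cache entry.
    """
    return tuple(round(float(f), decimals) for f in features)
```

Tuples are hashable, so the result can be used directly as a `dict` key in `self.cache`.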
Exercise 3.2: Fourth-Down Decision Engine
Objective: Build a complete fourth-down decision recommendation system.
Task: Create a decision engine that:
- Analyzes all fourth-down options (go, kick, punt)
- Calculates expected win probability for each option
- Provides clear recommendations
- Explains the reasoning
- Integrates with the dashboard
```python
# Implement fourth-down decision engine
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DecisionOption:
    """One option in a fourth-down decision."""
    name: str  # "go_for_it", "field_goal", "punt"
    success_probability: float
    wp_if_success: float
    wp_if_failure: float
    expected_wp: float


@dataclass
class FourthDownDecision:
    """Complete fourth-down analysis."""
    situation: Dict
    options: List[DecisionOption]
    recommendation: str
    confidence: float
    explanation: str


class FourthDownEngine:
    """Fourth-down decision analysis."""

    def __init__(self, wp_service: WinProbabilityService):
        self.wp_service = wp_service
        self.conversion_rates = self._load_conversion_rates()
        self.fg_rates = self._load_fg_rates()

    def analyze(self, game_state: Dict) -> FourthDownDecision:
        """Analyze fourth-down decision."""
        # Analyze each option
        go_option = self._analyze_go_for_it(game_state)
        fg_option = self._analyze_field_goal(game_state)
        punt_option = self._analyze_punt(game_state)

        # Determine recommendation
        options = [go_option, punt_option]
        if fg_option:
            options.append(fg_option)
        best = max(options, key=lambda x: x.expected_wp)

        return FourthDownDecision(
            situation=game_state,
            options=options,
            recommendation=best.name,
            confidence=self._calculate_confidence(options),
            explanation=self._generate_explanation(best, options)
        )

    def _analyze_go_for_it(self, state: Dict) -> DecisionOption:
        """Analyze going for it."""
        pass

    def _analyze_field_goal(self, state: Dict) -> Optional[DecisionOption]:
        """Analyze field goal attempt."""
        pass

    def _analyze_punt(self, state: Dict) -> DecisionOption:
        """Analyze punting."""
        pass

    def _generate_explanation(
        self,
        recommendation: DecisionOption,
        all_options: List[DecisionOption]
    ) -> str:
        """Generate human-readable explanation."""
        pass
```
Deliverable: Fourth-down engine with dashboard widget showing recommendations.
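Each `_analyze_*` method reduces to the same expectation, which is worth writing down explicitly (the probabilities in the example are illustrative numbers, not real rates):

```python
def expected_wp(success_prob, wp_if_success, wp_if_failure):
    """Expected win probability of one fourth-down option:

        E[WP] = p * WP_success + (1 - p) * WP_failure
    """
    return success_prob * wp_if_success + (1 - success_prob) * wp_if_failure


# Example (made-up numbers): going for it on 4th-and-2 with a 55%
# conversion rate, WP 0.62 if converted and 0.41 if stopped:
#   0.55 * 0.62 + 0.45 * 0.41 = 0.5255
```

The engine's recommendation is simply the option with the highest `expected_wp`; the explanation string should report the gap between the best and second-best options, since a 0.2-point gap and a 5-point gap warrant very different confidence.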
Exercise 3.3: Opponent Scouting Report Generator
Objective: Automate opponent scouting report generation.
Task: Build a system that:
- Analyzes opponent play-calling tendencies
- Identifies key personnel groupings
- Highlights situational patterns
- Generates formatted reports
- Exports to multiple formats (Markdown and HTML to start; PDF and PowerPoint as extensions)
```python
# Implement scouting report generator
import pandas as pd
from typing import Dict, List


class ScoutingReportGenerator:
    """Generate opponent scouting reports."""

    def __init__(self, play_repository):
        self.play_repo = play_repository

    def generate_report(
        self,
        opponent: str,
        num_games: int = 5,
        format: str = "markdown"
    ) -> str:
        """Generate complete scouting report."""
        # Fetch opponent plays
        plays = self._fetch_opponent_plays(opponent, num_games)

        # Analyze tendencies
        analysis = {
            'overview': self._analyze_overview(plays),
            'by_down': self._analyze_by_down(plays),
            'red_zone': self._analyze_red_zone(plays),
            'third_down': self._analyze_third_down(plays),
            'personnel': self._analyze_personnel(plays),
            'key_players': self._identify_key_players(plays)
        }

        # Generate formatted report
        if format == "markdown":
            return self._format_markdown(opponent, analysis)
        elif format == "html":
            return self._format_html(opponent, analysis)
        else:
            raise ValueError(f"Unknown format: {format}")

    def _analyze_by_down(self, plays: pd.DataFrame) -> Dict:
        """Analyze tendencies by down."""
        results = {}
        for down in [1, 2, 3, 4]:
            down_plays = plays[plays['down'] == down]
            results[down] = {
                'sample_size': len(down_plays),
                'pass_rate': self._calculate_pass_rate(down_plays),
                'avg_distance': down_plays['distance'].mean(),
                'success_rate': down_plays['success'].mean()
            }
        return results

    def _format_markdown(self, opponent: str, analysis: Dict) -> str:
        """Format report as markdown."""
        pass
Deliverable: Scouting report generator with sample output for a real opponent.
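The tendency calculations themselves are simple once the plays are filtered. A dependency-free sketch of `_calculate_pass_rate` applied per down (the `down` and `play_type` field names are assumptions about your play schema):

```python
def pass_rate_by_down(plays):
    """Pass rate per down from a list of play dicts.

    Downs with no plays in the sample are omitted rather than reported
    as zero, so small samples do not masquerade as tendencies.
    """
    rates = {}
    for down in (1, 2, 3, 4):
        down_plays = [p for p in plays if p.get("down") == down]
        if down_plays:
            passes = sum(1 for p in down_plays if p["play_type"] == "pass")
            rates[down] = passes / len(down_plays)
    return rates
```

Always report the sample size next to each rate in the generated report; a "90% pass rate" on ten snaps means little to a coaching staff.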
Level 4: Full System Integration
Exercise 4.1: Complete Dashboard Implementation
Objective: Build a full-featured coaching dashboard.
Task: Create a dashboard that includes:
- Real-time win probability display
- Fourth-down recommendation widget
- Drive summary visualization
- Opponent tendency reference
- Player performance tracking
Requirements:

- Responsive design for tablet use
- Updates without page refresh
- Role-based access control
- Export capabilities
```python
# Dashboard API service
from typing import Dict


class DashboardAPI:
    """API for dashboard data."""

    def __init__(self, config: Dict):
        self.wp_service = WinProbabilityService()
        self.fourth_down_engine = FourthDownEngine(self.wp_service)
        self.cache = {}

    async def get_game_dashboard(self, game_id: str) -> Dict:
        """Get complete dashboard data for a game."""
        return {
            'game_info': await self._get_game_info(game_id),
            'win_probability': await self._get_win_probability(game_id),
            'current_drive': await self._get_current_drive(game_id),
            'fourth_down': await self._get_fourth_down_analysis(game_id),
            'opponent_tendencies': await self._get_opponent_tendencies(game_id),
            'player_stats': await self._get_player_stats(game_id)
        }

    async def subscribe_to_updates(self, game_id: str, callback):
        """Subscribe to real-time updates."""
        pass
```
Deliverable: Working dashboard with all specified components.
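Note that the starter `get_game_dashboard` awaits its six sections one after another; since the sections are independent, fetching them concurrently with `asyncio.gather` cuts latency to that of the slowest section. A general sketch (the `fetchers` mapping shape is an assumption for this example):

```python
import asyncio


async def gather_dashboard_sections(fetchers):
    """Fetch independent dashboard sections concurrently.

    `fetchers` maps a section name to a zero-argument coroutine function;
    results come back as a dict keyed by section name.
    """
    names = list(fetchers)
    results = await asyncio.gather(*(fetchers[n]() for n in names))
    return dict(zip(names, results))
```

The same pattern applies inside `get_game_dashboard`: build the coroutines first, then await them together.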
Exercise 4.2: Automated Report Pipeline
Objective: Create an automated report generation and distribution system.
Task: Build a system that:
- Generates weekly summary reports
- Creates recruiting updates
- Produces game recap reports
- Distributes to appropriate stakeholders
- Archives all generated reports
```python
# Implement automated reporting
import time
from datetime import datetime, timedelta
from typing import Dict

import schedule


class ReportScheduler:
    """Schedule and execute automated reports."""

    def __init__(self, config: Dict):
        self.config = config
        self.report_generators = {}
        self.distribution_service = DistributionService()

    def schedule_weekly_reports(self):
        """Set up weekly report schedule."""
        # Sunday evening: Game recap
        schedule.every().sunday.at("20:00").do(
            self._generate_game_recaps
        )
        # Monday morning: Weekly summary
        schedule.every().monday.at("06:00").do(
            self._generate_weekly_summary
        )
        # Tuesday: Opponent scouting report
        schedule.every().tuesday.at("08:00").do(
            self._generate_scouting_report
        )

    def _generate_weekly_summary(self):
        """Generate and distribute weekly summary."""
        report = self.report_generators['weekly'].generate(
            team=self.config['team'],
            season=self.config['season'],
            week=self._get_current_week()
        )
        self.distribution_service.distribute(
            report=report,
            recipients=self.config['summary_recipients'],
            format='pdf'
        )

    def run(self):
        """Run the scheduler."""
        while True:
            schedule.run_pending()
            time.sleep(60)
```
Deliverable: Working report scheduler with sample generated reports.
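`_get_current_week` is left to you; a naive sketch divides days elapsed since the season opener by seven (real schedules have week-0 games and bye weeks, so treat this as a placeholder for a lookup against your games table):

```python
from datetime import date


def week_of_season(today, season_start):
    """Rough current-week helper: week 1 begins on `season_start`.

    Clamped to a minimum of 1 so preseason dates do not return week 0.
    """
    return max(1, (today - season_start).days // 7 + 1)
```

A schedule-table lookup is the production answer; this helper is only a fallback when the schedule has not been ingested yet.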
Exercise 4.3: Production Deployment
Objective: Deploy the analytics platform to a production environment.
Task: Create deployment configuration including:
- Docker containers for all services
- Docker Compose for local development
- Kubernetes manifests for production
- CI/CD pipeline configuration
- Monitoring and alerting setup
```yaml
# docker-compose.yml (starter)
version: '3.8'

services:
  postgres:
    image: postgres:15
    # Complete configuration

  redis:
    image: redis:7-alpine
    # Complete configuration

  api:
    build: ./api
    # Complete configuration

  dashboard:
    build: ./dashboard
    # Complete configuration

  scheduler:
    build: ./scheduler
    # Complete configuration
```
Deliverable: Complete deployment configuration with documentation.
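As a sketch of what "complete configuration" might mean for one service (the volume name and environment-variable interpolation here are illustrative; never hard-code the password in the compose file itself):

```yaml
services:
  postgres:
    image: postgres:15
    environment:
      POSTGRES_DB: cfb_analytics
      POSTGRES_USER: analytics
      POSTGRES_PASSWORD: ${DB_PASSWORD}   # supplied via .env or the shell
    volumes:
      - pgdata:/var/lib/postgresql/data   # persist data across restarts
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U analytics"]
      interval: 10s
      retries: 5

volumes:
  pgdata:
```

The healthcheck lets dependent services use `depends_on` with `condition: service_healthy` so the API does not start before the database accepts connections.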
Project: Full Platform Implementation
Comprehensive Project
Objective: Build a complete, production-ready analytics platform.
Scope: Implement all components covered in this chapter:
1. Data Layer
   - Multi-source ingestion pipeline
   - Data validation and quality monitoring
   - PostgreSQL storage with proper schema
   - Redis caching layer
2. Analytics Layer
   - EPA calculation engine
   - Win probability model
   - Fourth-down decision system
   - Opponent analysis tools
3. Presentation Layer
   - RESTful API
   - Coaching dashboard
   - Recruiting dashboard
   - Executive reports
4. Operations
   - Docker deployment
   - Monitoring and alerting
   - Automated testing
   - Documentation
Timeline: 8-12 weeks
Deliverables:

1. Complete source code repository
2. Deployment configuration
3. User documentation
4. Technical documentation
5. Demo video showing all features

Evaluation Criteria:

- Code quality and organization
- Feature completeness
- System reliability
- Performance under load
- Documentation quality
- User experience
Submission Guidelines
For all exercises:
- Code: Submit clean, well-documented Python code
- Tests: Include unit tests with >80% coverage
- Documentation: Provide README with setup instructions
- Demo: Include screenshots or video of working features
- Reflection: Write a brief reflection on challenges and learnings
Resources
- College Football Data API documentation
- PostgreSQL documentation
- Redis documentation
- FastAPI documentation
- React documentation (for dashboard)
- Docker documentation