Chapter 27 Key Takeaways: Building a Complete Analytics System

Quick Reference Summary

The Five Layers of a Football Analytics Platform

┌─────────────────────────────────────────────────────────────────┐
│                        PLATFORM ARCHITECTURE                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Layer 1: DATA INGESTION                                        │
│  ├── External APIs (play-by-play, recruiting)                   │
│  ├── Internal sources (video tags, evaluations)                 │
│  ├── Validation and quality monitoring                          │
│  └── Rate limiting and error handling                           │
│                                                                 │
│  Layer 2: STORAGE                                               │
│  ├── PostgreSQL for primary data                                │
│  ├── Redis for caching hot data                                 │
│  ├── Data lake for historical archive                           │
│  └── Model registry for ML models                               │
│                                                                 │
│  Layer 3: PROCESSING                                            │
│  ├── EPA calculation engine                                     │
│  ├── Win probability models                                     │
│  ├── Opponent analysis tools                                    │
│  └── Report generation                                          │
│                                                                 │
│  Layer 4: API                                                   │
│  ├── RESTful endpoints                                          │
│  ├── Authentication/authorization                               │
│  ├── Rate limiting                                              │
│  └── Documentation                                              │
│                                                                 │
│  Layer 5: PRESENTATION                                          │
│  ├── Coaching dashboard                                         │
│  ├── Recruiting dashboard                                       │
│  ├── Executive reports                                          │
│  └── Analytics workbench                                        │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Stakeholder Requirements Summary

| Stakeholder | Primary Need | Response Time | Key Features |
|-------------|--------------|---------------|--------------|
| Head Coach | Game-day decisions | Real-time | Fourth-down bot, WP display |
| Coordinators | Game preparation | < 1 hour | Opponent tendencies |
| Recruiting | Prospect management | < 5 seconds | Search, compare, track |
| Executives | Performance tracking | Same day | Summary reports |
| Analytics Staff | Flexible analysis | < 30 seconds | Query tools, exports |

Technology Stack Decision Guide

DECISION FRAMEWORK: WHAT STACK TO CHOOSE

Budget Level: HIGH (>$200K/year)
├── Database: PostgreSQL + Snowflake/BigQuery
├── Cache: Redis Cluster
├── API: FastAPI or Node.js
├── Frontend: React + D3.js
├── Deployment: Kubernetes on cloud
└── BI: Tableau or Power BI

Budget Level: MEDIUM ($50K-200K/year)
├── Database: PostgreSQL
├── Cache: Redis
├── API: FastAPI
├── Frontend: Plotly Dash or React
├── Deployment: Docker on cloud VM
└── BI: Open source (Metabase, Superset)

Budget Level: LOW (<$50K/year)
├── Database: PostgreSQL or SQLite
├── Cache: Application-level
├── API: Flask/FastAPI
├── Frontend: Plotly Dash
├── Deployment: Single server
└── BI: Custom dashboards

Requirements Template

# Requirement: [ID]-[Number]

**Title**: [Short descriptive title]

**Stakeholder**: [Role]

**Priority**: [Critical | High | Medium | Low]

**Description**:
[What the user needs to accomplish]

**Acceptance Criteria**:
- [ ] Criterion 1
- [ ] Criterion 2
- [ ] Criterion 3

**Response Time**: [e.g., < 500ms, same day]

**Access Level**: [Public | Internal | Confidential | Restricted]

Data Pipeline Checklist

  • [ ] Data source connections configured
  • [ ] API rate limiting implemented
  • [ ] Validation rules defined
  • [ ] Error handling and logging
  • [ ] Quality monitoring active
  • [ ] Incremental updates supported
  • [ ] Historical backfill capability
  • [ ] Data lineage documented
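The validation and quality-monitoring items above can be sketched as rule functions applied at ingestion. This is a minimal illustration: the rule names and play fields (down, distance, yard_line) are placeholders, not this chapter's actual schema.

```python
def validate_play(play):
    """Apply basic range rules to one play record.

    Returns the list of rule names that failed (empty list = valid record).
    Field names and bounds are illustrative.
    """
    failures = []
    if not 1 <= play.get("down", 0) <= 4:
        failures.append("down_in_range")
    if not 0 <= play.get("distance", -1) <= 99:
        failures.append("distance_in_range")
    if not 1 <= play.get("yard_line", 0) <= 99:
        failures.append("yard_line_in_range")
    return failures

def quality_score(plays):
    """Fraction of records passing every rule (the data quality score)."""
    if not plays:
        return 0.0
    valid = sum(1 for p in plays if not validate_play(p))
    return valid / len(plays)
```

Running each record through rules like these at ingestion, and tracking the quality score over time, covers both the "validation rules defined" and "quality monitoring active" items.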

Database Schema Essentials

Core Tables:
- teams: Team master data
- games: Game schedule and results
- drives: Drive summaries
- plays: Play-by-play detail
- players: Player master data
- player_game_stats: Per-game statistics

Analytics Tables:
- play_epa: Pre-computed EPA values
- team_season_stats: Aggregated team stats
- model_predictions: Model outputs

Recruiting Tables:
- prospects: Prospect master data
- evaluations: Internal evaluations
- contacts: Interaction tracking

EPA Quick Reference

Expected Points by Field Position (approximate):

| Yard Line | EP (1st & 10) |
|-----------|---------------|
| Own 10 | -0.5 |
| Own 25 | 0.5 |
| Own 50 | 2.0 |
| Opp 25 | 4.0 |
| Opp 10 | 5.5 |

EPA Calculation:

EPA = EP_after_play - EP_before_play

Success Rate Thresholds:
- 1st down: 40% of needed yards
- 2nd down: 60% of needed yards
- 3rd/4th down: 100% (convert or score)
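Both calculations can be sketched in a few lines. This assumes the approximate EP table above, with linear interpolation between its points; yard lines are expressed as distance from your own goal line (so Opp 25 = 75), and all values are illustrative.

```python
# Approximate 1st & 10 EP values from the table above,
# keyed by yards from own goal line (Opp 25 -> 75, Opp 10 -> 90).
EP_TABLE = [(10, -0.5), (25, 0.5), (50, 2.0), (75, 4.0), (90, 5.5)]

def expected_points(yard_line):
    """Linearly interpolate EP for a 1st & 10 at the given yard line."""
    if yard_line <= EP_TABLE[0][0]:
        return EP_TABLE[0][1]
    for (x0, y0), (x1, y1) in zip(EP_TABLE, EP_TABLE[1:]):
        if yard_line <= x1:
            return y0 + (y1 - y0) * (yard_line - x0) / (x1 - x0)
    return EP_TABLE[-1][1]

def epa(ep_before, ep_after):
    """EPA = EP after the play minus EP before the play."""
    return ep_after - ep_before

def is_success(down, distance, yards_gained):
    """Apply the 40% / 60% / 100% success-rate thresholds by down."""
    needed = {1: 0.4, 2: 0.6}.get(down, 1.0) * distance
    return yards_gained >= needed
```

A 25-yard gain from your own 25 to midfield would score roughly epa(0.5, 2.0) = 1.5 expected points added, and a 4-yard gain on 1st & 10 just clears the 40% success threshold.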

Fourth-Down Decision Framework

FOURTH DOWN DECISION PROCESS

Step 1: Calculate CONVERSION probability
        Based on: Distance, field position, offense quality

Step 2: Calculate STATE after each outcome
        - If convert: 1st down at new position
        - If fail: Opponent ball at current position
        - If FG make: Kickoff
        - If FG miss: Opponent ball at kick spot
        - If punt: Opponent ball at expected landing

Step 3: Calculate WIN PROBABILITY for each state
        Using: WP model with game situation

Step 4: Calculate EXPECTED VALUE
        EV_go = P(convert) × WP(convert) + P(fail) × WP(fail)
        EV_fg = P(make) × WP(make) + P(miss) × WP(miss)
        EV_punt = WP(opponent after punt)

Step 5: RECOMMEND option with highest EV
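The five steps collapse into one expected-value comparison. In this sketch, every probability and win-probability input is a placeholder for a model output (steps 1-3); the function only performs steps 4 and 5.

```python
def fourth_down_recommendation(p_convert, wp_convert, wp_fail,
                               p_fg_make, wp_fg_make, wp_fg_miss,
                               wp_punt):
    """Return (best_option, expected_values) per the EV framework above.

    All inputs are win probabilities / conversion probabilities supplied
    by upstream models; nothing here estimates them.
    """
    ev = {
        "go":   p_convert * wp_convert + (1 - p_convert) * wp_fail,
        "fg":   p_fg_make * wp_fg_make + (1 - p_fg_make) * wp_fg_miss,
        "punt": wp_punt,
    }
    best = max(ev, key=ev.get)  # Step 5: highest expected win probability
    return best, ev
```

For example, with a 55% conversion chance worth 0.60 WP (0.30 WP on a failure), an 80% field goal worth 0.50 WP (0.25 on a miss), and 0.40 WP after a punt, going for it edges out the kick.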

Dashboard Design Principles

For Coaching Staff:
- Large text, high contrast (outdoor visibility)
- Single-tap access to key info
- Offline capability
- No login required during games

For Recruiting:
- Fast search and filtering
- Side-by-side comparisons
- Mobile-friendly
- Real-time updates

For Executives:
- Summary views first
- Drill-down available
- Print-friendly
- Historical trends

Deployment Checklist

PRE-DEPLOYMENT:
- [ ] All tests passing
- [ ] Load testing completed
- [ ] Security review done
- [ ] Documentation updated
- [ ] Rollback plan ready

DEPLOYMENT:
- [ ] Database migrations applied
- [ ] Environment variables configured
- [ ] Health checks responding
- [ ] Monitoring alerts configured
- [ ] SSL certificates valid

POST-DEPLOYMENT:
- [ ] Smoke tests passed
- [ ] User acceptance verified
- [ ] Performance baseline captured
- [ ] Runbook updated

Monitoring Essentials

Critical Metrics:

| Metric | Warning | Critical |
|--------|---------|----------|
| API Latency (p99) | > 500ms | > 2s |
| Error Rate | > 1% | > 5% |
| CPU Usage | > 70% | > 90% |
| Memory Usage | > 80% | > 95% |
| Data Freshness | > 30 min | > 60 min |
| Queue Depth | > 1000 | > 5000 |
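These thresholds are simple enough to encode directly and check on every scrape. A sketch, with metric keys chosen here for illustration:

```python
# (warning, critical) thresholds mirroring the table above; units are
# implied by the key names, which are illustrative.
THRESHOLDS = {
    "api_latency_p99_ms": (500, 2000),
    "error_rate_pct": (1, 5),
    "cpu_pct": (70, 90),
    "memory_pct": (80, 95),
    "data_freshness_min": (30, 60),
    "queue_depth": (1000, 5000),
}

def alert_level(metric, value):
    """Classify a metric reading as 'ok', 'warning', or 'critical'."""
    warn, crit = THRESHOLDS[metric]
    if value > crit:
        return "critical"
    if value > warn:
        return "warning"
    return "ok"
```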

Common Pitfalls to Avoid

| Pitfall | Problem | Solution |
|---------|---------|----------|
| Building before understanding | Wrong features | Interview stakeholders first |
| Over-engineering | Wasted effort | Start simple, iterate |
| No validation | Bad data propagates | Validate at ingestion |
| Single point of failure | System goes down | Redundancy at critical points |
| Poor documentation | Knowledge loss | Document as you build |
| No monitoring | Blind to issues | Monitor from day one |
| Ignoring security | Data breach | Implement RBAC, encrypt data |

Quick Formulas

Win Probability Added:

WPA = WP_after - WP_before

Data Quality Score:

Quality = Valid_records / Total_records

System Availability:

Availability = Uptime / (Uptime + Downtime)

Latency Percentiles:

p99 = Value where 99% of requests are faster
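These four formulas translate directly into code. A sketch, using the nearest-rank method for the percentile (other interpolation methods give slightly different values):

```python
import math

def wpa(wp_before, wp_after):
    """Win Probability Added: WP after the play minus WP before."""
    return wp_after - wp_before

def availability(uptime, downtime):
    """Fraction of total time the system was up (same units for both)."""
    return uptime / (uptime + downtime)

def p99(latencies):
    """Value below which 99% of requests fall (nearest-rank method)."""
    ordered = sorted(latencies)
    rank = max(0, math.ceil(0.99 * len(ordered)) - 1)
    return ordered[rank]
```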

Budget Planning Template

| Category | Small Program | Medium Program | Large Program |
|----------|---------------|----------------|---------------|
| Personnel | $65K | $200K | $500K+ |
| Infrastructure | $5K | $25K | $100K |
| Software | $5K | $30K | $75K |
| Training | $3K | $10K | $25K |
| **Total** | **~$80K** | **~$265K** | **~$700K+** |

Key Success Factors

  1. User Focus: Build what users need, not what's technically interesting
  2. Reliability First: A system that's down is worse than no system
  3. Start Small: Prove value before scaling
  4. Automate Early: Manual processes don't survive game weeks
  5. Document Everything: Enable others to maintain and extend
  6. Iterate Continuously: Gather feedback and improve

Essential Code Patterns

Data Ingestion Pattern

import asyncio

async def ingest_with_retry(source, max_retries=3):
    """Fetch, validate, and store, retrying transient failures with backoff."""
    for attempt in range(max_retries):
        try:
            data = await source.fetch()
            validated = validate(data)  # reject or flag bad records at ingestion
            await store(validated)
            return len(validated)
        except RetryableError:
            await asyncio.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s
    raise IngestionFailed(f"gave up after {max_retries} attempts")

Caching Pattern

def get_with_cache(key, compute_fn, ttl=300):
    cached = cache.get(key)
    if cached is not None:  # compare to None so falsy values (0, []) still count as hits
        return cached
    value = compute_fn()
    cache.set(key, value, ttl)  # expire after ttl seconds
    return value

Repository Pattern

class PlayRepository:
    def __init__(self, db):
        self.db = db

    def get_by_game(self, game_id):
        # filter_by takes keyword criteria; filter() would need Play.game_id == game_id
        return self.db.query(Play).filter_by(game_id=game_id).all()

    def save(self, play):
        self.db.add(play)
        self.db.commit()

Quick Reference Commands

Database:

-- Check data freshness
SELECT MAX(updated_at) FROM plays;

-- Game EPA summary
SELECT game_id, SUM(epa) as total_epa
FROM plays GROUP BY game_id;

Docker:

# Start all services
docker-compose up -d

# View logs
docker-compose logs -f api

# Restart specific service
docker-compose restart api

Kubernetes:

# Check pod status
kubectl get pods

# View logs
kubectl logs -f deployment/api

# Scale deployment
kubectl scale deployment/api --replicas=5