Chapter 27 Key Takeaways: Building a Complete Analytics System
Quick Reference Summary
The Five Layers of a Football Analytics Platform
```
┌─────────────────────────────────────────────────────────────────┐
│                     PLATFORM ARCHITECTURE                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Layer 1: DATA INGESTION                                        │
│  ├── External APIs (play-by-play, recruiting)                   │
│  ├── Internal sources (video tags, evaluations)                 │
│  ├── Validation and quality monitoring                          │
│  └── Rate limiting and error handling                           │
│                                                                 │
│  Layer 2: STORAGE                                               │
│  ├── PostgreSQL for primary data                                │
│  ├── Redis for caching hot data                                 │
│  ├── Data lake for historical archive                           │
│  └── Model registry for ML models                               │
│                                                                 │
│  Layer 3: PROCESSING                                            │
│  ├── EPA calculation engine                                     │
│  ├── Win probability models                                     │
│  ├── Opponent analysis tools                                    │
│  └── Report generation                                          │
│                                                                 │
│  Layer 4: API                                                   │
│  ├── RESTful endpoints                                          │
│  ├── Authentication/authorization                               │
│  ├── Rate limiting                                              │
│  └── Documentation                                              │
│                                                                 │
│  Layer 5: PRESENTATION                                          │
│  ├── Coaching dashboard                                         │
│  ├── Recruiting dashboard                                       │
│  ├── Executive reports                                          │
│  └── Analytics workbench                                        │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
Stakeholder Requirements Summary
| Stakeholder | Primary Need | Response Time | Key Features |
|---|---|---|---|
| Head Coach | Game-day decisions | Real-time | Fourth-down bot, WP display |
| Coordinators | Game preparation | < 1 hour | Opponent tendencies |
| Recruiting | Prospect management | < 5 seconds | Search, compare, track |
| Executives | Performance tracking | Same day | Summary reports |
| Analytics Staff | Flexible analysis | < 30 seconds | Query tools, exports |
Technology Stack Decision Guide
```
DECISION FRAMEWORK: WHAT STACK TO CHOOSE

Budget Level: HIGH (>$200K/year)
├── Database: PostgreSQL + Snowflake/BigQuery
├── Cache: Redis Cluster
├── API: FastAPI or Node.js
├── Frontend: React + D3.js
├── Deployment: Kubernetes on cloud
└── BI: Tableau or Power BI

Budget Level: MEDIUM ($50K-$200K/year)
├── Database: PostgreSQL
├── Cache: Redis
├── API: FastAPI
├── Frontend: Plotly Dash or React
├── Deployment: Docker on cloud VM
└── BI: Open source (Metabase, Superset)

Budget Level: LOW (<$50K/year)
├── Database: PostgreSQL or SQLite
├── Cache: Application-level
├── API: Flask/FastAPI
├── Frontend: Plotly Dash
├── Deployment: Single server
└── BI: Custom dashboards
```
Requirements Template
```markdown
# Requirement: [ID]-[Number]

**Title**: [Short descriptive title]
**Stakeholder**: [Role]
**Priority**: [Critical | High | Medium | Low]

**Description**:
[What the user needs to accomplish]

**Acceptance Criteria**:
- [ ] Criterion 1
- [ ] Criterion 2
- [ ] Criterion 3

**Response Time**: [e.g., < 500ms, same day]
**Access Level**: [Public | Internal | Confidential | Restricted]
```
Data Pipeline Checklist
- [ ] Data source connections configured
- [ ] API rate limiting implemented
- [ ] Validation rules defined
- [ ] Error handling and logging
- [ ] Quality monitoring active
- [ ] Incremental updates supported
- [ ] Historical backfill capability
- [ ] Data lineage documented
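The "validation rules defined" item above can be sketched as a small rule function. This is a minimal illustration, assuming a play record arrives as a plain dict; the field names and ranges are illustrative, not a fixed schema:

```python
def validate_play(record):
    """Return a list of validation errors for one play record (empty list = valid)."""
    # Required fields must be present and non-null before anything else runs.
    required = ("game_id", "play_id", "down", "distance", "yard_line")
    missing = [f for f in required if record.get(f) is None]
    if missing:
        return [f"missing field: {f}" for f in missing]

    errors = []
    # Range checks catch obviously bad values before they propagate downstream.
    if not 1 <= record["down"] <= 4:
        errors.append(f"down out of range: {record['down']}")
    if not 0 <= record["yard_line"] <= 100:
        errors.append(f"yard_line out of range: {record['yard_line']}")
    return errors
```

Records that fail validation should be quarantined and logged, not silently dropped, so the quality-monitoring layer can see them.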
Database Schema Essentials
Core Tables:
- teams - Team master data
- games - Game schedule and results
- drives - Drive summaries
- plays - Play-by-play detail
- players - Player master data
- player_game_stats - Per-game statistics
Analytics Tables:
- play_epa - Pre-computed EPA values
- team_season_stats - Aggregated team stats
- model_predictions - Model outputs
Recruiting Tables:
- prospects - Prospect master data
- evaluations - Internal evaluations
- contacts - Interaction tracking
EPA Quick Reference
Expected Points by Field Position (approximate):

| Yard Line | EP (1st & 10) |
|-----------|---------------|
| Own 10    | -0.5          |
| Own 25    | 0.5           |
| Own 50    | 2.0           |
| Opp 25    | 4.0           |
| Opp 10    | 5.5           |
EPA Calculation:
EPA = EP_after_play - EP_before_play
Success Rate Thresholds:
- 1st down: gain ≥ 40% of needed yards
- 2nd down: gain ≥ 60% of needed yards
- 3rd/4th down: gain 100% of needed yards (convert or score)
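The EPA formula and success-rate thresholds above translate directly into code. A minimal sketch; the EP lookup uses the approximate quick-reference values, not a fitted model:

```python
# Approximate EP values (1st & 10), keyed by yards from own goal line.
EP_1ST_AND_10 = {10: -0.5, 25: 0.5, 50: 2.0, 75: 4.0, 90: 5.5}

def epa(ep_before, ep_after):
    """EPA = EP_after_play - EP_before_play."""
    return ep_after - ep_before

def is_success(down, distance, yards_gained):
    """Apply the success-rate thresholds: 40% (1st), 60% (2nd), 100% (3rd/4th)."""
    fraction_needed = {1: 0.4, 2: 0.6, 3: 1.0, 4: 1.0}[down]
    return yards_gained >= fraction_needed * distance
```

For example, a play that moves a drive from a 1st & 10 at the own 25 to a 1st & 10 at midfield is worth roughly `epa(0.5, 2.0)`, i.e. +1.5 expected points.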
Fourth-Down Decision Framework
```
FOURTH DOWN DECISION PROCESS

Step 1: Calculate CONVERSION probability
        Based on: distance, field position, offense quality

Step 2: Calculate STATE after each outcome
        - If convert: 1st down at new position
        - If fail: opponent ball at current position
        - If FG make: kickoff
        - If FG miss: opponent ball at kick spot
        - If punt: opponent ball at expected landing spot

Step 3: Calculate WIN PROBABILITY for each state
        Using: WP model with game situation

Step 4: Calculate EXPECTED VALUE
        EV_go   = P(convert) × WP(convert) + P(fail) × WP(fail)
        EV_fg   = P(make) × WP(make) + P(miss) × WP(miss)
        EV_punt = WP(opponent after punt)

Step 5: RECOMMEND the option with highest EV
```
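The five steps reduce to comparing three expected values. A minimal sketch with hand-supplied probabilities; in a real system these would come from the conversion and win-probability models in Steps 1–3:

```python
def fourth_down_recommendation(p_convert, wp_convert, wp_fail,
                               p_fg_make, wp_fg_make, wp_fg_miss,
                               wp_punt):
    """Return (best_option, expected_values) using the EV formulas in Step 4."""
    ev = {
        "go":   p_convert * wp_convert + (1 - p_convert) * wp_fail,
        "fg":   p_fg_make * wp_fg_make + (1 - p_fg_make) * wp_fg_miss,
        "punt": wp_punt,
    }
    best = max(ev, key=ev.get)
    return best, ev
```

Note that every input is a win probability for *your* team in the resulting state, so the three options are directly comparable.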
Dashboard Design Principles
For Coaching Staff:
- Large text, high contrast (outdoor visibility)
- Single-tap access to key info
- Offline capability
- No login required during games

For Recruiting:
- Fast search and filtering
- Side-by-side comparisons
- Mobile-friendly
- Real-time updates

For Executives:
- Summary views first
- Drill-down available
- Print-friendly
- Historical trends
Deployment Checklist
PRE-DEPLOYMENT:
- [ ] All tests passing
- [ ] Load testing completed
- [ ] Security review done
- [ ] Documentation updated
- [ ] Rollback plan ready
DEPLOYMENT:
- [ ] Database migrations applied
- [ ] Environment variables configured
- [ ] Health checks responding
- [ ] Monitoring alerts configured
- [ ] SSL certificates valid
POST-DEPLOYMENT:
- [ ] Smoke tests passed
- [ ] User acceptance verified
- [ ] Performance baseline captured
- [ ] Runbook updated
Monitoring Essentials
Critical Metrics:

| Metric            | Warning  | Critical |
|-------------------|----------|----------|
| API Latency (p99) | > 500ms  | > 2s     |
| Error Rate        | > 1%     | > 5%     |
| CPU Usage         | > 70%    | > 90%    |
| Memory Usage      | > 80%    | > 95%    |
| Data Freshness    | > 30 min | > 60 min |
| Queue Depth       | > 1000   | > 5000   |
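Evaluating a reading against these thresholds is a two-level comparison. A minimal sketch using the warning/critical pairs from the table (metric keys are illustrative names, not a standard):

```python
# (warning, critical) thresholds from the quick-reference table; higher = worse.
THRESHOLDS = {
    "api_latency_p99_ms": (500, 2000),
    "error_rate_pct": (1, 5),
    "cpu_pct": (70, 90),
    "memory_pct": (80, 95),
    "data_freshness_min": (30, 60),
    "queue_depth": (1000, 5000),
}

def alert_level(metric, value):
    """Classify a metric reading as 'ok', 'warning', or 'critical'."""
    warning, critical = THRESHOLDS[metric]
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    return "ok"
```

Checking critical before warning matters: a reading above both thresholds must page, not merely warn.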
Common Pitfalls to Avoid
| Pitfall | Problem | Solution |
|---|---|---|
| Building before understanding | Wrong features | Interview stakeholders first |
| Over-engineering | Wasted effort | Start simple, iterate |
| No validation | Bad data propagates | Validate at ingestion |
| Single point of failure | System goes down | Redundancy at critical points |
| Poor documentation | Knowledge loss | Document as you build |
| No monitoring | Blind to issues | Monitor from day one |
| Ignoring security | Data breach | Implement RBAC, encrypt data |
Quick Formulas
Win Probability Added:
WPA = WP_after - WP_before
Data Quality Score:
Quality = Valid_records / Total_records
System Availability:
Availability = Uptime / (Uptime + Downtime)
Latency Percentiles:
p99 = Value where 99% of requests are faster
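These formulas can be checked numerically. A sketch using the nearest-rank percentile method; production systems typically compute p99 over sliding time windows rather than a full history:

```python
import math

def availability(uptime, downtime):
    """Availability = Uptime / (Uptime + Downtime), in the same time units."""
    return uptime / (uptime + downtime)

def percentile(values, pct):
    """Value at or below which pct percent of samples fall (nearest-rank method)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]
```

So for 99 hours up and 1 hour down, availability is 0.99; the p99 of a latency sample is the value 99% of requests beat.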
Budget Planning Template
| Category | Small Program | Medium Program | Large Program |
|---|---|---|---|
| Personnel | $65K | $200K | $500K+ |
| Infrastructure | $5K | $25K | $100K |
| Software | $5K | $30K | $75K |
| Training | $3K | $10K | $25K |
| **Total** | **~$80K** | **~$265K** | **~$700K+** |
Key Success Factors
- User Focus: Build what users need, not what's technically interesting
- Reliability First: A system that's down is worse than no system
- Start Small: Prove value before scaling
- Automate Early: Manual processes don't survive game weeks
- Document Everything: Enable others to maintain and extend
- Iterate Continuously: Gather feedback and improve
Essential Code Patterns
Data Ingestion Pattern
```python
import asyncio

# source, validate, store, RetryableError, and IngestionFailed are
# application-defined; this pattern shows only the retry structure.
async def ingest_with_retry(source, max_retries=3):
    """Fetch, validate, and store data, retrying transient failures with backoff."""
    for attempt in range(max_retries):
        try:
            data = await source.fetch()
            validated = validate(data)
            await store(validated)
            return len(validated)
        except RetryableError:
            # Exponential backoff: 1s, 2s, 4s, ...
            await asyncio.sleep(2 ** attempt)
    # Only raise after all retries are exhausted.
    raise IngestionFailed()
```
Caching Pattern
```python
def get_with_cache(key, compute_fn, ttl=300):
    """Return the cached value if present; otherwise compute, cache, and return it."""
    cached = cache.get(key)
    if cached is not None:  # explicit None check so falsy values (0, []) still hit the cache
        return cached
    value = compute_fn()
    cache.set(key, value, ttl)
    return value
```
Repository Pattern
```python
class PlayRepository:
    """Encapsulates all database access for Play records."""

    def __init__(self, db):
        self.db = db

    def get_by_game(self, game_id):
        # filter_by takes keyword arguments; filter() would need Play.game_id == game_id
        return self.db.query(Play).filter_by(game_id=game_id).all()

    def save(self, play):
        self.db.add(play)
        self.db.commit()
```
Quick Reference Commands
Database:
```sql
-- Check data freshness
SELECT MAX(updated_at) FROM plays;

-- Game EPA summary
SELECT game_id, SUM(epa) AS total_epa
FROM plays GROUP BY game_id;
```
Docker:
```bash
# Start all services
docker-compose up -d

# View logs
docker-compose logs -f api

# Restart a specific service
docker-compose restart api
```
Kubernetes:
```bash
# Check pod status
kubectl get pods

# View logs
kubectl logs -f deployment/api

# Scale deployment
kubectl scale deployment/api --replicas=5
```