Chapter 27 Key Takeaways: Building a Complete Analytics System
Quick Reference Summary
The Five Layers of a Football Analytics Platform
```
┌─────────────────────────────────────────────────────────────────┐
│                     PLATFORM ARCHITECTURE                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Layer 1: DATA INGESTION                                        │
│  ├── External APIs (play-by-play, recruiting)                   │
│  ├── Internal sources (video tags, evaluations)                 │
│  ├── Validation and quality monitoring                          │
│  └── Rate limiting and error handling                           │
│                                                                 │
│  Layer 2: STORAGE                                               │
│  ├── PostgreSQL for primary data                                │
│  ├── Redis for caching hot data                                 │
│  ├── Data lake for historical archive                           │
│  └── Model registry for ML models                               │
│                                                                 │
│  Layer 3: PROCESSING                                            │
│  ├── EPA calculation engine                                     │
│  ├── Win probability models                                     │
│  ├── Opponent analysis tools                                    │
│  └── Report generation                                          │
│                                                                 │
│  Layer 4: API                                                   │
│  ├── RESTful endpoints                                          │
│  ├── Authentication/authorization                               │
│  ├── Rate limiting                                              │
│  └── Documentation                                              │
│                                                                 │
│  Layer 5: PRESENTATION                                          │
│  ├── Coaching dashboard                                         │
│  ├── Recruiting dashboard                                       │
│  ├── Executive reports                                          │
│  └── Analytics workbench                                        │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
Stakeholder Requirements Summary
| Stakeholder | Primary Need | Response Time | Key Features |
|---|---|---|---|
| Head Coach | Game-day decisions | Real-time | Fourth-down bot, WP display |
| Coordinators | Game preparation | < 1 hour | Opponent tendencies |
| Recruiting | Prospect management | < 5 seconds | Search, compare, track |
| Executives | Performance tracking | Same day | Summary reports |
| Analytics Staff | Flexible analysis | < 30 seconds | Query tools, exports |
Technology Stack Decision Guide
```
DECISION FRAMEWORK: WHAT STACK TO CHOOSE

Budget Level: HIGH (>$200K/year)
├── Database: PostgreSQL + Snowflake/BigQuery
├── Cache: Redis Cluster
├── API: FastAPI or Node.js
├── Frontend: React + D3.js
├── Deployment: Kubernetes on cloud
└── BI: Tableau or Power BI

Budget Level: MEDIUM ($50K-$200K/year)
├── Database: PostgreSQL
├── Cache: Redis
├── API: FastAPI
├── Frontend: Plotly Dash or React
├── Deployment: Docker on cloud VM
└── BI: Open source (Metabase, Superset)

Budget Level: LOW (<$50K/year)
├── Database: PostgreSQL or SQLite
├── Cache: Application-level
├── API: Flask/FastAPI
├── Frontend: Plotly Dash
├── Deployment: Single server
└── BI: Custom dashboards
```
Requirements Template
```markdown
# Requirement: [ID]-[Number]

**Title**: [Short descriptive title]
**Stakeholder**: [Role]
**Priority**: [Critical | High | Medium | Low]

**Description**:
[What the user needs to accomplish]

**Acceptance Criteria**:
- [ ] Criterion 1
- [ ] Criterion 2
- [ ] Criterion 3

**Response Time**: [e.g., < 500ms, same day]
**Access Level**: [Public | Internal | Confidential | Restricted]
```
Data Pipeline Checklist
- [ ] Data source connections configured
- [ ] API rate limiting implemented
- [ ] Validation rules defined
- [ ] Error handling and logging
- [ ] Quality monitoring active
- [ ] Incremental updates supported
- [ ] Historical backfill capability
- [ ] Data lineage documented
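The "validation rules defined" item above can be sketched as a small rule function. This is a minimal illustration, assuming a play record arrives as a plain dict; the field names and ranges are illustrative, not a fixed schema:

```python
def validate_play(record):
    """Return a list of validation errors for one play record (empty list = valid)."""
    # Required fields must be present and non-null before anything else runs.
    required = ("game_id", "play_id", "down", "distance", "yard_line")
    missing = [f for f in required if record.get(f) is None]
    if missing:
        return [f"missing field: {f}" for f in missing]

    errors = []
    # Range checks catch obviously bad values before they propagate downstream.
    if not 1 <= record["down"] <= 4:
        errors.append(f"down out of range: {record['down']}")
    if not 0 <= record["yard_line"] <= 100:
        errors.append(f"yard_line out of range: {record['yard_line']}")
    return errors
```

Records that fail validation should be quarantined and logged, not silently dropped, so the quality-monitoring layer can see them.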
Database Schema Essentials
Core Tables:
- teams - Team master data
- games - Game schedule and results
- drives - Drive summaries
- plays - Play-by-play detail
- players - Player master data
- player_game_stats - Per-game statistics
Analytics Tables:
- play_epa - Pre-computed EPA values
- team_season_stats - Aggregated team stats
- model_predictions - Model outputs
Recruiting Tables:
- prospects - Prospect master data
- evaluations - Internal evaluations
- contacts - Interaction tracking
EPA Quick Reference
Expected Points by Field Position (approximate):

| Yard Line | EP (1st & 10) |
|-----------|---------------|
| Own 10    | -0.5          |
| Own 25    | 0.5           |
| Own 50    | 2.0           |
| Opp 25    | 4.0           |
| Opp 10    | 5.5           |
EPA Calculation:
EPA = EP_after_play - EP_before_play
Success Rate Thresholds:
- 1st down: gain ≥ 40% of needed yards
- 2nd down: gain ≥ 60% of needed yards
- 3rd/4th down: gain 100% of needed yards (convert or score)
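The EPA formula and success-rate thresholds above translate directly into code. A minimal sketch; the EP lookup uses the approximate quick-reference values, not a fitted model:

```python
# Approximate EP values (1st & 10), keyed by yards from own goal line.
EP_1ST_AND_10 = {10: -0.5, 25: 0.5, 50: 2.0, 75: 4.0, 90: 5.5}

def epa(ep_before, ep_after):
    """EPA = EP_after_play - EP_before_play."""
    return ep_after - ep_before

def is_success(down, distance, yards_gained):
    """Apply the success-rate thresholds: 40% (1st), 60% (2nd), 100% (3rd/4th)."""
    fraction_needed = {1: 0.4, 2: 0.6, 3: 1.0, 4: 1.0}[down]
    return yards_gained >= fraction_needed * distance
```

For example, a play that moves a drive from a 1st & 10 at the own 25 to a 1st & 10 at midfield is worth roughly `epa(0.5, 2.0)`, i.e. +1.5 expected points.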
Fourth-Down Decision Framework
```
FOURTH DOWN DECISION PROCESS

Step 1: Calculate CONVERSION probability
        Based on: distance, field position, offense quality

Step 2: Calculate STATE after each outcome
        - If convert: 1st down at new position
        - If fail: opponent ball at current position
        - If FG make: kickoff
        - If FG miss: opponent ball at kick spot
        - If punt: opponent ball at expected landing spot

Step 3: Calculate WIN PROBABILITY for each state
        Using: WP model with game situation

Step 4: Calculate EXPECTED VALUE
        EV_go   = P(convert) × WP(convert) + P(fail) × WP(fail)
        EV_fg   = P(make) × WP(make) + P(miss) × WP(miss)
        EV_punt = WP(opponent after punt)

Step 5: RECOMMEND the option with highest EV
```
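The five steps reduce to comparing three expected values. A minimal sketch with hand-supplied probabilities; in a real system these would come from the conversion and win-probability models in Steps 1–3:

```python
def fourth_down_recommendation(p_convert, wp_convert, wp_fail,
                               p_fg_make, wp_fg_make, wp_fg_miss,
                               wp_punt):
    """Return (best_option, expected_values) using the EV formulas in Step 4."""
    ev = {
        "go":   p_convert * wp_convert + (1 - p_convert) * wp_fail,
        "fg":   p_fg_make * wp_fg_make + (1 - p_fg_make) * wp_fg_miss,
        "punt": wp_punt,
    }
    best = max(ev, key=ev.get)
    return best, ev
```

Note that every input is a win probability for *your* team in the resulting state, so the three options are directly comparable.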
Dashboard Design Principles
For Coaching Staff:
- Large text, high contrast (outdoor visibility)
- Single-tap access to key info
- Offline capability
- No login required during games

For Recruiting:
- Fast search and filtering
- Side-by-side comparisons
- Mobile-friendly
- Real-time updates

For Executives:
- Summary views first
- Drill-down available
- Print-friendly
- Historical trends
Deployment Checklist
PRE-DEPLOYMENT:
- [ ] All tests passing
- [ ] Load testing completed
- [ ] Security review done
- [ ] Documentation updated
- [ ] Rollback plan ready
DEPLOYMENT:
- [ ] Database migrations applied
- [ ] Environment variables configured
- [ ] Health checks responding
- [ ] Monitoring alerts configured
- [ ] SSL certificates valid
POST-DEPLOYMENT:
- [ ] Smoke tests passed
- [ ] User acceptance verified
- [ ] Performance baseline captured
- [ ] Runbook updated
Monitoring Essentials
Critical Metrics:

| Metric            | Warning  | Critical |
|-------------------|----------|----------|
| API Latency (p99) | > 500ms  | > 2s     |
| Error Rate        | > 1%     | > 5%     |
| CPU Usage         | > 70%    | > 90%    |
| Memory Usage      | > 80%    | > 95%    |
| Data Freshness    | > 30 min | > 60 min |
| Queue Depth       | > 1000   | > 5000   |
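Evaluating a reading against these thresholds is a two-level comparison. A minimal sketch using the warning/critical pairs from the table (metric keys are illustrative names, not a standard):

```python
# (warning, critical) thresholds from the quick-reference table; higher = worse.
THRESHOLDS = {
    "api_latency_p99_ms": (500, 2000),
    "error_rate_pct": (1, 5),
    "cpu_pct": (70, 90),
    "memory_pct": (80, 95),
    "data_freshness_min": (30, 60),
    "queue_depth": (1000, 5000),
}

def alert_level(metric, value):
    """Classify a metric reading as 'ok', 'warning', or 'critical'."""
    warning, critical = THRESHOLDS[metric]
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    return "ok"
```

Checking critical before warning matters: a reading above both thresholds must page, not merely warn.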
Common Pitfalls to Avoid
| Pitfall | Problem | Solution |
|---|---|---|
| Building before understanding | Wrong features | Interview stakeholders first |
| Over-engineering | Wasted effort | Start simple, iterate |
| No validation | Bad data propagates | Validate at ingestion |
| Single point of failure | System goes down | Redundancy at critical points |
| Poor documentation | Knowledge loss | Document as you build |
| No monitoring | Blind to issues | Monitor from day one |
| Ignoring security | Data breach | Implement RBAC, encrypt data |
Quick Formulas
Win Probability Added:
WPA = WP_after - WP_before
Data Quality Score:
Quality = Valid_records / Total_records
System Availability:
Availability = Uptime / (Uptime + Downtime)
Latency Percentiles:
p99 = Value where 99% of requests are faster
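These formulas can be checked numerically. A sketch using the nearest-rank percentile method; production systems typically compute p99 over sliding time windows rather than a full history:

```python
import math

def availability(uptime, downtime):
    """Availability = Uptime / (Uptime + Downtime), in the same time units."""
    return uptime / (uptime + downtime)

def percentile(values, pct):
    """Value at or below which pct percent of samples fall (nearest-rank method)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]
```

So for 99 hours up and 1 hour down, availability is 0.99; the p99 of a latency sample is the value 99% of requests beat.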
Budget Planning Template
| Category | Small Program | Medium Program | Large Program |
|---|---|---|---|
| Personnel | $65K | $200K | $500K+ |
| Infrastructure | $5K | $25K | $100K |
| Software | $5K | $30K | $75K |
| Training | $3K | $10K | $25K |
| **Total** | **~$80K** | **~$265K** | **~$700K+** |
Key Success Factors
- User Focus: Build what users need, not what's technically interesting
- Reliability First: A system that's down is worse than no system
- Start Small: Prove value before scaling
- Automate Early: Manual processes don't survive game weeks
- Document Everything: Enable others to maintain and extend
- Iterate Continuously: Gather feedback and improve
Essential Code Patterns
Data Ingestion Pattern
```python
import asyncio

# source, validate, store, RetryableError, and IngestionFailed are
# application-defined; this pattern shows only the retry structure.
async def ingest_with_retry(source, max_retries=3):
    """Fetch, validate, and store data, retrying transient failures with backoff."""
    for attempt in range(max_retries):
        try:
            data = await source.fetch()
            validated = validate(data)
            await store(validated)
            return len(validated)
        except RetryableError:
            # Exponential backoff: 1s, 2s, 4s, ...
            await asyncio.sleep(2 ** attempt)
    # Only raise after all retries are exhausted.
    raise IngestionFailed()
```
Caching Pattern
```python
def get_with_cache(key, compute_fn, ttl=300):
    """Return the cached value if present; otherwise compute, cache, and return it."""
    cached = cache.get(key)
    if cached is not None:  # explicit None check so falsy values (0, []) still hit the cache
        return cached
    value = compute_fn()
    cache.set(key, value, ttl)
    return value
```
Repository Pattern
```python
class PlayRepository:
    """Encapsulates all database access for Play records."""

    def __init__(self, db):
        self.db = db

    def get_by_game(self, game_id):
        # filter_by takes keyword arguments; filter() would need Play.game_id == game_id
        return self.db.query(Play).filter_by(game_id=game_id).all()

    def save(self, play):
        self.db.add(play)
        self.db.commit()
```
Quick Reference Commands
Database:
```sql
-- Check data freshness
SELECT MAX(updated_at) FROM plays;

-- Game EPA summary
SELECT game_id, SUM(epa) AS total_epa
FROM plays GROUP BY game_id;
```
Docker:
```bash
# Start all services
docker-compose up -d

# View logs
docker-compose logs -f api

# Restart a specific service
docker-compose restart api
```
Kubernetes:
```bash
# Check pod status
kubectl get pods

# View logs
kubectl logs -f deployment/api

# Scale deployment
kubectl scale deployment/api --replicas=5
```