Case Study 2: Group of Five Program Building Analytics Capability on a Budget
Executive Summary
This case study examines how a Group of Five college football program built a competitive analytics capability with limited resources. Working with a $50,000 first-year technology budget and a single dedicated analyst, the program created an effective system by prioritizing ruthlessly, leveraging open-source tools, and focusing on the highest-impact analyses. After two seasons, the program's analytics maturity matched that of programs spending four times as much.
Background
The Program
"Mountain State University" (anonymized) is a Group of Five conference member with: - Annual football budget: $12M - Previous analytics staff: 0 (GA handling basic stats) - Technology budget for analytics: $50,000 (first year), $35,000 (ongoing) - Existing tools: Hudl video, conference stat feeds, Excel
The Constraint
Unlike Power Five programs with dedicated analytics departments, Mountain State had to build capability with:

- 1 full-time analyst (new hire)
- Limited budget for software licenses
- Shared IT infrastructure
- No cloud computing allocation
The Opportunity
The athletic director recognized that analytics could be a competitive equalizer:
"We can't outspend our opponents on facilities or coaching salaries, but we can be smarter. Analytics is one area where a small investment can yield big returns."
Phase 1: Strategic Assessment (Month 1)
Resource Audit
The new analyst began by cataloging available resources:
Existing Data Sources (Free):

- Conference statistical feeds
- Public play-by-play data (College Football Data API)
- Hudl video tags (already licensed)
- Recruiting service data (basic tier)
Available Infrastructure:

- University-provided laptop
- Shared server with 50GB storage
- University Microsoft 365 (for collaboration)
- University VPN for remote access
Budget Allocation:

| Category | Year 1 | Ongoing |
|----------|--------|---------|
| Personnel (1 analyst) | $65,000 | $68,000 |
| Software/Tools | $15,000 | $10,000 |
| Cloud Services | $8,000 | $8,000 |
| Training/Conferences | $5,000 | $5,000 |
| Hardware | $7,000 | $2,000 |
| **Total** | **$100,000** | **$93,000** |
Priority Framework
Given constraints, the analyst developed a prioritization framework:
IMPACT vs. EFFORT MATRIX

| | Low Effort | High Effort |
|---|---|---|
| High Impact | ★ DO FIRST: EPA dashboards, basic tendency reports, recruiting tracking | ◐ PLAN FOR: real-time win probability, custom models, video sync |
| Low Impact | ○ AUTOMATE: data ingestion, report templates | ✗ SKIP: custom visualizations, advanced ML, mobile app |
Year 1 Goals:

1. Automated data collection (foundation)
2. EPA-based game analysis (high impact)
3. Fourth-down decision support (competitive edge)
4. Basic recruiting organization (efficiency)
Phase 2: Building the Foundation (Months 2-4)
Open-Source Stack Selection
The analyst chose a fully open-source stack to minimize costs:
| Component | Solution | Cost |
|---|---|---|
| Database | PostgreSQL | Free |
| Programming | Python | Free |
| Visualization | Plotly/Dash | Free |
| Version Control | GitHub | Free |
| Scheduling | Apache Airflow | Free |
| Cloud Computing | AWS Free Tier + $500/mo | $6,000/year |
Total stack cost: $6,000/year, all of it the cloud line above (vs. $50,000+ for commercial alternatives).
Data Pipeline (Simple but Effective)
"""
Mountain State Analytics Pipeline
Simple, effective data collection using minimal resources.
"""
import os
import requests
import pandas as pd
from datetime import datetime, timedelta
import sqlite3 # Started with SQLite before PostgreSQL
import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
class SimplePipeline:
"""Minimal viable data pipeline."""
def __init__(self):
self.api_key = os.environ.get('CFB_API_KEY')
self.db_path = 'cfb_analytics.db'
def fetch_games(self, season: int) -> pd.DataFrame:
"""Fetch games from API."""
url = 'https://api.collegefootballdata.com/games'
params = {'year': season, 'team': 'Mountain State'}
headers = {'Authorization': f'Bearer {self.api_key}'}
response = requests.get(url, params=params, headers=headers)
response.raise_for_status()
return pd.DataFrame(response.json())
def fetch_plays(self, game_id: int) -> pd.DataFrame:
"""Fetch plays for a game."""
url = 'https://api.collegefootballdata.com/plays'
params = {'gameId': game_id}
headers = {'Authorization': f'Bearer {self.api_key}'}
response = requests.get(url, params=params, headers=headers)
response.raise_for_status()
return pd.DataFrame(response.json())
def calculate_simple_epa(self, plays: pd.DataFrame) -> pd.DataFrame:
"""Calculate EPA using simple lookup table."""
# Simplified EP values (could be refined)
def get_ep(row):
yard_line = row['yardsToEndzone'] if pd.notna(row['yardsToEndzone']) else 50
down = row['down'] if pd.notna(row['down']) else 1
# Simple logistic approximation
base_ep = 7 / (1 + 2.718 ** (-0.1 * (100 - yard_line - 50)))
down_adj = {1: 0, 2: -0.3, 3: -0.7, 4: -1.2}.get(down, 0)
return base_ep + down_adj
plays['ep_before'] = plays.apply(get_ep, axis=1)
# Calculate EPA
plays['epa'] = plays['ep_before'].shift(-1) - plays['ep_before']
# Handle drive endings
plays.loc[plays['play_type'].str.contains('Touchdown', na=False), 'epa'] = \
7 - plays['ep_before']
return plays
def save_to_db(self, df: pd.DataFrame, table: str):
"""Save dataframe to SQLite."""
conn = sqlite3.connect(self.db_path)
df.to_sql(table, conn, if_exists='append', index=False)
conn.close()
logger.info(f"Saved {len(df)} rows to {table}")
def daily_update():
"""Run daily update job."""
pipeline = SimplePipeline()
# Get recent games
games = pipeline.fetch_games(2024)
recent_games = games[games['completed'] == True].tail(5)
for _, game in recent_games.iterrows():
logger.info(f"Processing game {game['id']}")
plays = pipeline.fetch_plays(game['id'])
plays_with_epa = pipeline.calculate_simple_epa(plays)
pipeline.save_to_db(plays_with_epa, 'plays')
if __name__ == '__main__':
daily_update()
Key Design Decisions:

1. SQLite First: Started with SQLite for simplicity; migrated to PostgreSQL in month 6.
2. Simple EPA: Used a basic EPA model initially, refined later.
3. Manual Triggers: Ran updates manually before automating with Airflow.
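The month-6 move to PostgreSQL required little new code because pandas abstracts the database layer. Below is a minimal sketch of the one-time copy, assuming SQLAlchemy with the psycopg2 driver; the RDS endpoint, credentials, and table list are placeholders, not documented details.

import sqlite3
import pandas as pd
from sqlalchemy import create_engine

# Placeholder RDS endpoint and credentials
pg_engine = create_engine('postgresql+psycopg2://analytics:PASSWORD@rds-endpoint:5432/cfb')

sqlite_conn = sqlite3.connect('cfb_analytics.db')
for table in ['plays']:  # add other tables as needed
    df = pd.read_sql(f'SELECT * FROM {table}', sqlite_conn)
    # chunksize keeps memory bounded on the small shared server
    df.to_sql(table, pg_engine, if_exists='append', index=False, chunksize=1000)
sqlite_conn.close()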
Dashboard (Plotly Dash)
Rather than expensive dashboard software, the analyst built dashboards with Plotly Dash:
"""
Simple EPA Dashboard using Plotly Dash
Free alternative to commercial BI tools.
"""
import dash
from dash import html, dcc
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd
app = dash.Dash(__name__)
def create_epa_chart(team_data: pd.DataFrame):
"""Create EPA per play chart."""
fig = go.Figure()
# Offense EPA
fig.add_trace(go.Bar(
name='Offense EPA/Play',
x=team_data['opponent'],
y=team_data['off_epa_per_play'],
marker_color='green'
))
# Defense EPA
fig.add_trace(go.Bar(
name='Defense EPA/Play Allowed',
x=team_data['opponent'],
y=team_data['def_epa_per_play'],
marker_color='red'
))
fig.update_layout(
title='EPA Performance by Game',
barmode='group',
xaxis_title='Opponent',
yaxis_title='EPA per Play'
)
return fig
app.layout = html.Div([
html.H1("Mountain State Football Analytics"),
dcc.Tabs([
dcc.Tab(label='Season Overview', children=[
dcc.Graph(id='epa-chart'),
dcc.Graph(id='success-rate-chart')
]),
dcc.Tab(label='Fourth Down', children=[
html.Div(id='fourth-down-widget')
]),
dcc.Tab(label='Opponent Scout', children=[
dcc.Dropdown(id='opponent-select'),
html.Div(id='opponent-tendencies')
])
])
])
if __name__ == '__main__':
app.run_server(debug=True, host='0.0.0.0', port=8050)
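For self-hosting, Dash exposes its underlying Flask instance, which any WSGI server can serve; the gunicorn command and module name below are assumptions, since the case study does not document the exact deployment setup.

# Expose the Flask server so a WSGI host can serve the app,
# e.g. `gunicorn dashboard:server` (module name is an assumption)
server = app.server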
Cost Comparison:

| Solution | Annual Cost |
|----------|-------------|
| Tableau (5 users) | $8,400 |
| Power BI Pro (5 users) | $6,000 |
| Plotly Dash (self-hosted) | $0 |
Phase 3: High-Impact Features (Months 5-8)
Fourth-Down Decision Card
Instead of a complex real-time system, the analyst created a simple decision reference card:
"""
Fourth Down Decision Card Generator
Creates a printed reference card for sideline use.
"""
def generate_decision_card():
"""Generate fourth-down decision matrix."""
card = {}
for yard_line in range(20, 100, 5):
for distance in range(1, 11):
# Simplified decision logic
if distance <= 2:
decision = 'GO' if yard_line >= 30 else 'CONTEXT'
elif distance <= 4:
decision = 'GO' if yard_line >= 50 else 'KICK/PUNT'
else:
fg_range = yard_line >= 60
if fg_range:
decision = 'FG'
else:
decision = 'PUNT' if yard_line < 40 else 'CONTEXT'
card[(yard_line, distance)] = decision
return card
def print_card():
"""Print formatted decision card."""
card = generate_decision_card()
print("=" * 60)
print("FOURTH DOWN DECISION CARD - Mountain State")
print("=" * 60)
print()
print("Yard Line | 1yd | 2yd | 3yd | 4yd | 5yd+ |")
print("-" * 50)
for yl in range(95, 15, -5):
row = f" {yl:2d} |"
for dist in [1, 2, 3, 4, 5]:
decision = card.get((yl, dist), '?')
symbol = {
'GO': ' GO ',
'FG': ' FG ',
'PUNT': 'PUNT',
'KICK/PUNT': 'K/P ',
'CONTEXT': ' ?? '
}.get(decision, ' ?? ')
row += f" {symbol} |"
print(row)
print()
print("Legend: GO=Go for it, FG=Field Goal, PUNT=Punt")
print(" K/P=Kick or Punt (check score), ??=Situational")
Usage: The card was laminated and attached to the play-calling sheet, providing instant guidance without needing technology during games.
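The card's GO/FG/PUNT thresholds are heuristics. A fuller treatment compares the expected points of each option; the sketch below shows that comparison, with the conversion and field-goal make rates as illustrative assumptions rather than fitted values (the card's actual derivation is not documented).

def go_value(yards_to_go: int, ep_if_convert: float, opp_ep_if_fail: float) -> float:
    """Expected points of going for it, under an assumed conversion curve."""
    p_convert = max(0.15, 0.75 - 0.07 * yards_to_go)  # assumption, not fitted
    return p_convert * ep_if_convert - (1 - p_convert) * opp_ep_if_fail

def fg_value(kick_distance: int) -> float:
    """Expected points of a field-goal try; ignores field position lost on a miss."""
    p_make = max(0.05, 1.1 - 0.011 * kick_distance)  # assumption, not fitted
    return 3 * p_make

# Example: 4th-and-2 at the opponent 35 (a 52-yard attempt)
print(go_value(2, ep_if_convert=4.0, opp_ep_if_fail=1.5))  # about 1.9
print(fg_value(52))                                        # about 1.6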
Opponent Tendency Reports
Weekly reports became a key deliverable:
"""
Automated Opponent Tendency Report
Generates PDF report for game preparation.
"""
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas
from reportlab.lib import colors
import pandas as pd
class OpponentReport:
"""Generate opponent scouting report."""
def __init__(self, opponent: str, plays_df: pd.DataFrame):
self.opponent = opponent
self.plays = plays_df[plays_df['offense'] == opponent]
def analyze_tendencies(self):
"""Analyze play-calling tendencies."""
total = len(self.plays)
pass_plays = self.plays[self.plays['play_type'].str.contains('Pass', na=False)]
rush_plays = self.plays[self.plays['play_type'].str.contains('Rush', na=False)]
return {
'total_plays': total,
'pass_rate': len(pass_plays) / total if total > 0 else 0,
'rush_rate': len(rush_plays) / total if total > 0 else 0,
'by_down': self._by_down_analysis(),
'by_field_position': self._by_field_position()
}
def _by_down_analysis(self):
"""Analyze by down."""
results = {}
for down in [1, 2, 3]:
down_plays = self.plays[self.plays['down'] == down]
if len(down_plays) > 0:
pass_pct = len(down_plays[
down_plays['play_type'].str.contains('Pass', na=False)
]) / len(down_plays)
results[down] = {'pass_rate': pass_pct, 'sample': len(down_plays)}
return results
def _by_field_position(self):
"""Analyze by field position zone."""
zones = {
'own_territory': self.plays[self.plays['yardsToEndzone'] > 50],
'midfield': self.plays[
(self.plays['yardsToEndzone'] <= 50) &
(self.plays['yardsToEndzone'] > 20)
],
'red_zone': self.plays[self.plays['yardsToEndzone'] <= 20]
}
results = {}
for zone_name, zone_plays in zones.items():
if len(zone_plays) > 0:
pass_pct = len(zone_plays[
zone_plays['play_type'].str.contains('Pass', na=False)
]) / len(zone_plays)
results[zone_name] = {'pass_rate': pass_pct, 'sample': len(zone_plays)}
return results
def generate_pdf(self, output_path: str):
"""Generate PDF report."""
analysis = self.analyze_tendencies()
c = canvas.Canvas(output_path, pagesize=letter)
width, height = letter
# Title
c.setFont("Helvetica-Bold", 24)
c.drawString(50, height - 50, f"Opponent Scouting: {self.opponent}")
# Overview
c.setFont("Helvetica", 12)
y = height - 100
c.drawString(50, y, f"Total Plays Analyzed: {analysis['total_plays']}")
y -= 20
c.drawString(50, y, f"Overall Pass Rate: {analysis['pass_rate']:.1%}")
y -= 20
c.drawString(50, y, f"Overall Rush Rate: {analysis['rush_rate']:.1%}")
# By Down
y -= 40
c.setFont("Helvetica-Bold", 14)
c.drawString(50, y, "By Down:")
c.setFont("Helvetica", 12)
for down, data in analysis['by_down'].items():
y -= 20
c.drawString(70, y, f"Down {down}: {data['pass_rate']:.1%} pass ({data['sample']} plays)")
# By Field Position
y -= 40
c.setFont("Helvetica-Bold", 14)
c.drawString(50, y, "By Field Position:")
c.setFont("Helvetica", 12)
for zone, data in analysis['by_field_position'].items():
y -= 20
zone_name = zone.replace('_', ' ').title()
c.drawString(70, y, f"{zone_name}: {data['pass_rate']:.1%} pass ({data['sample']} plays)")
c.save()
Turnaround Time: Reports were generated every Sunday night for Monday staff meetings, reducing prep time from 6 hours to 30 minutes.
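A Sunday-night run then comes down to a few lines. In the sketch below, the opponent name and output path are placeholders, and the query assumes the `plays` table written by the pipeline above.

import sqlite3
import pandas as pd

conn = sqlite3.connect('cfb_analytics.db')
plays = pd.read_sql('SELECT * FROM plays', conn)
conn.close()

report = OpponentReport('Next Opponent', plays)  # placeholder opponent
report.generate_pdf('scout_report.pdf')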
Phase 4: Scaling Up (Months 9-18)
Infrastructure Upgrade
After proving value, the analyst received budget approval to upgrade infrastructure:
Before (Months 1-8):

- SQLite database on laptop
- Manual script execution
- Dash running locally

After (Months 9+):

- PostgreSQL on AWS RDS
- Airflow for automated scheduling
- Dash deployed on AWS EC2
# Airflow DAG for automated updates
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

default_args = {
    'owner': 'analytics',
    'depends_on_past': False,
    'email_on_failure': True,
    'email': ['analyst@mountainstate.edu'],
    'retries': 3,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG(
    'daily_analytics_update',
    default_args=default_args,
    description='Daily data ingestion and processing',
    schedule_interval='0 6 * * *',  # 6 AM daily
    start_date=datetime(2024, 1, 1),
    catchup=False
)


def run_ingestion():
    # daily_update is the module-level job from the pipeline script
    from pipeline import daily_update
    daily_update()


def run_epa_calculation():
    from analytics import EPACalculator
    calc = EPACalculator()
    calc.process_recent_games()


def generate_reports():
    from reports import ReportGenerator
    gen = ReportGenerator()
    gen.generate_weekly_reports()


ingest_task = PythonOperator(
    task_id='ingest_data',
    python_callable=run_ingestion,
    dag=dag
)

epa_task = PythonOperator(
    task_id='calculate_epa',
    python_callable=run_epa_calculation,
    dag=dag
)

report_task = PythonOperator(
    task_id='generate_reports',
    python_callable=generate_reports,
    dag=dag
)

ingest_task >> epa_task >> report_task
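Before enabling the schedule, the chain can be exercised end to end with `airflow dags test daily_analytics_update 2024-01-01`, which runs all three tasks in order locally without a running scheduler.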
Adding Recruiting Support
With the foundation solid, the analyst added recruiting capabilities:
"""
Simple Recruiting Tracker
Consolidated view of recruiting targets.
"""
import pandas as pd
from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build
class RecruitingTracker:
"""Track and organize recruiting prospects."""
def __init__(self, sheet_id: str):
# Use Google Sheets as a simple, collaborative database
self.sheet_id = sheet_id
self.service = self._init_sheets_service()
def _init_sheets_service(self):
"""Initialize Google Sheets API."""
creds = Credentials.from_authorized_user_file('credentials.json')
return build('sheets', 'v4', credentials=creds)
def get_prospects(self) -> pd.DataFrame:
"""Get all prospects from tracking sheet."""
result = self.service.spreadsheets().values().get(
spreadsheetId=self.sheet_id,
range='Prospects!A:Z'
).execute()
values = result.get('values', [])
if not values:
return pd.DataFrame()
return pd.DataFrame(values[1:], columns=values[0])
def filter_by_position(self, position: str) -> pd.DataFrame:
"""Get prospects by position."""
prospects = self.get_prospects()
return prospects[prospects['Position'] == position]
def get_priority_targets(self) -> pd.DataFrame:
"""Get high-priority uncommitted prospects."""
prospects = self.get_prospects()
return prospects[
(prospects['Priority'] == 'High') &
(prospects['Status'] != 'Committed')
]
def generate_board_summary(self) -> str:
"""Generate recruiting board summary."""
prospects = self.get_prospects()
summary = "RECRUITING BOARD SUMMARY\n"
summary += "=" * 40 + "\n\n"
# By status
for status in ['Committed', 'Target', 'Monitoring']:
status_df = prospects[prospects['Status'] == status]
summary += f"{status}: {len(status_df)}\n"
# By position need
summary += "\nBy Position:\n"
for pos in ['QB', 'RB', 'WR', 'OL', 'DL', 'LB', 'DB']:
pos_df = prospects[prospects['Position'] == pos]
committed = len(pos_df[pos_df['Status'] == 'Committed'])
targets = len(pos_df[pos_df['Status'] == 'Target'])
summary += f" {pos}: {committed} committed, {targets} targets\n"
return summary
Why Google Sheets?

- Free with the university account
- Coaches could add notes directly
- Built-in collaboration features
- API access for automation
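Producing the Monday board summary then takes two lines; the sheet ID below is a placeholder.

tracker = RecruitingTracker(sheet_id='YOUR_SHEET_ID')  # placeholder ID
print(tracker.generate_board_summary())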
Results
Year 1 Outcomes
| Metric | Before | After |
|---|---|---|
| Data collection time | 15 hrs/week | 2 hrs/week |
| Opponent report time | 6 hours | 30 minutes |
| Fourth-down decisions documented | None | All |
| Recruiting prospects tracked | ~100 | 450 |
Year 2 Outcomes
| Metric | Year 1 | Year 2 |
|---|---|---|
| Dashboard users | 3 | 12 |
| Automated reports | 2 | 8 |
| Data freshness | Daily | Hourly (game days) |
| Budget efficiency | $100/insight | $35/insight |
Competitive Impact
While direct win attribution is difficult, coaching staff reported:
- Better Preparation: "We're not surprised by anything anymore" (defensive coordinator)
- Faster Adjustments: Halftime adjustments backed by data
- Recruiting Efficiency: More organized pursuit of fewer, better-fit prospects
- Conference Recognition: Asked to present approach at conference meetings
Key Lessons for Budget-Constrained Programs
1. Prioritize Ruthlessly
Focus on 2-3 high-impact features rather than building a comprehensive system.
2. Leverage Open Source
| Commercial Tool | Open Source Alternative | Savings |
|---|---|---|
| Tableau | Plotly Dash | $8,400/year |
| Alteryx | Python + Pandas | $25,000/year |
| Snowflake | PostgreSQL | $12,000/year |
3. Start Manual, Then Automate
Prove value with manual processes before investing in automation. This validates the workflow before engineering effort is committed.
4. Use University Resources
- Microsoft 365 for collaboration
- Google Workspace (via university)
- Student workers for data entry
- IT department for hosting
5. Build Relationships
The analyst spent significant time with coaches to understand needs:

- Weekly check-ins with coordinators
- Monthly reviews with the head coach
- Attended practice to understand workflow
6. Document Everything
With a one-person team, documentation was essential for:

- Knowledge continuity
- Training future staff
- Explaining methods to coaches
Conclusion
Mountain State's analytics journey demonstrates that competitive analytics capabilities don't require large budgets. Through strategic prioritization, open-source tools, and focus on high-impact features, a single analyst created meaningful competitive advantage.
The key insight: Start where you are, use what you have, do what you can.
For programs considering analytics investments, the Mountain State approach offers a template for proving value before scaling up.