Case Study 2: Group of Five Program Building Analytics Capability on a Budget

Executive Summary

This case study examines how a Group of Five college football program built a competitive analytics capability with limited resources. Working with a $50,000 first-year technology budget and a single dedicated analyst, the program created an effective system by prioritizing ruthlessly, leveraging open-source tools, and focusing on the highest-impact analyses. After two seasons, the program's analytics maturity matched that of programs spending four times as much.


Background

The Program

"Mountain State University" (anonymized) is a Group of Five conference member with: - Annual football budget: $12M - Previous analytics staff: 0 (GA handling basic stats) - Technology budget for analytics: $50,000 (first year), $35,000 (ongoing) - Existing tools: Hudl video, conference stat feeds, Excel

The Constraint

Unlike Power Five programs with dedicated analytics departments, Mountain State had to build capability with:

- 1 full-time analyst (new hire)
- Limited budget for software licenses
- Shared IT infrastructure
- No cloud computing allocation

The Opportunity

The athletic director recognized that analytics could be a competitive equalizer:

"We can't outspend our opponents on facilities or coaching salaries, but we can be smarter. Analytics is one area where a small investment can yield big returns."


Phase 1: Strategic Assessment (Month 1)

Resource Audit

The new analyst began by cataloging available resources:

Existing Data Sources (Free):

- Conference statistical feeds
- Public play-by-play data (College Football Data API)
- Hudl video tags (already licensed)
- Recruiting service data (basic tier)

Available Infrastructure:

- University-provided laptop
- Shared server with 50GB storage
- University Microsoft 365 (for collaboration)
- University VPN for remote access

Budget Allocation:

| Category | Year 1 | Ongoing |
|----------|--------|---------|
| Personnel (1 analyst) | $65,000 | $68,000 |
| Software/Tools | $15,000 | $10,000 |
| Cloud Services | $8,000 | $8,000 |
| Training/Conferences | $5,000 | $5,000 |
| Hardware | $7,000 | $2,000 |
| **Total** | **$100,000** | **$93,000** |

Priority Framework

Given constraints, the analyst developed a prioritization framework:

IMPACT vs. EFFORT MATRIX

                    Low Effort          High Effort
               ┌───────────────────┬─────────────────┐
High Impact    │ ★ DO FIRST        │ ◐ PLAN FOR      │
               │ - EPA dashboards  │ - Real-time WP  │
               │ - Basic tendency  │ - Custom models │
               │ - Recruiting      │ - Video sync    │
               │   tracking        │                 │
               ├───────────────────┼─────────────────┤
Low Impact     │ ○ AUTOMATE        │ ✗ SKIP          │
               │ - Data ingestion  │ - Custom viz    │
               │ - Report templates│ - Advanced ML   │
               │                   │ - Mobile app    │
               └───────────────────┴─────────────────┘

Year 1 Goals:

1. Automated data collection (foundation)
2. EPA-based game analysis (high impact)
3. Fourth-down decision support (competitive edge)
4. Basic recruiting organization (efficiency)


Phase 2: Building the Foundation (Months 2-4)

Open-Source Stack Selection

The analyst chose a primarily open-source stack, with only a small cloud bill, to minimize costs:

| Component | Solution | Cost |
|-----------|----------|------|
| Database | PostgreSQL | Free |
| Programming | Python | Free |
| Visualization | Plotly/Dash | Free |
| Version Control | GitHub | Free |
| Scheduling | Apache Airflow | Free |
| Cloud Computing | AWS Free Tier + $500/mo | $6,000/year |

Total tooling cost: $6,000/year, all of it cloud hosting (vs. $50,000+ for commercial alternatives); the software itself was free.

Data Pipeline (Simple but Effective)

"""
Mountain State Analytics Pipeline

Simple, effective data collection using minimal resources.
"""

import os
import math
import requests
import pandas as pd
import sqlite3  # Started with SQLite before PostgreSQL
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class SimplePipeline:
    """Minimal viable data pipeline."""

    def __init__(self):
        self.api_key = os.environ.get('CFB_API_KEY')
        self.db_path = 'cfb_analytics.db'

    def fetch_games(self, season: int) -> pd.DataFrame:
        """Fetch games from API."""
        url = 'https://api.collegefootballdata.com/games'
        params = {'year': season, 'team': 'Mountain State'}
        headers = {'Authorization': f'Bearer {self.api_key}'}

        response = requests.get(url, params=params, headers=headers, timeout=30)
        response.raise_for_status()

        return pd.DataFrame(response.json())

    def fetch_plays(self, game_id: int) -> pd.DataFrame:
        """Fetch plays for a game."""
        url = 'https://api.collegefootballdata.com/plays'
        params = {'gameId': game_id}
        headers = {'Authorization': f'Bearer {self.api_key}'}

        response = requests.get(url, params=params, headers=headers, timeout=30)
        response.raise_for_status()

        return pd.DataFrame(response.json())

    def calculate_simple_epa(self, plays: pd.DataFrame) -> pd.DataFrame:
        """Calculate EPA using simple lookup table."""
        # Simplified EP values (could be refined)
        def get_ep(row):
            yard_line = row['yardsToEndzone'] if pd.notna(row['yardsToEndzone']) else 50
            down = row['down'] if pd.notna(row['down']) else 1

            # Simple logistic approximation: EP climbs toward 7 as
            # yards-to-endzone shrinks
            base_ep = 7 / (1 + math.exp(-0.1 * (50 - yard_line)))
            down_adj = {1: 0, 2: -0.3, 3: -0.7, 4: -1.2}.get(down, 0)

            return base_ep + down_adj

        plays['ep_before'] = plays.apply(get_ep, axis=1)

        # Calculate EPA as the change in EP to the next play
        # (simple version: ignores possession changes and end-of-half breaks)
        plays['epa'] = plays['ep_before'].shift(-1) - plays['ep_before']

        # Handle drive endings
        plays.loc[plays['play_type'].str.contains('Touchdown', na=False), 'epa'] = \
            7 - plays['ep_before']

        return plays

    def save_to_db(self, df: pd.DataFrame, table: str):
        """Save dataframe to SQLite."""
        conn = sqlite3.connect(self.db_path)
        df.to_sql(table, conn, if_exists='append', index=False)
        conn.close()
        logger.info(f"Saved {len(df)} rows to {table}")


def daily_update():
    """Run daily update job."""
    pipeline = SimplePipeline()

    # Get recent games
    games = pipeline.fetch_games(2024)
    recent_games = games[games['completed']].tail(5)

    for _, game in recent_games.iterrows():
        logger.info(f"Processing game {game['id']}")
        plays = pipeline.fetch_plays(game['id'])
        plays_with_epa = pipeline.calculate_simple_epa(plays)
        pipeline.save_to_db(plays_with_epa, 'plays')


if __name__ == '__main__':
    daily_update()

Key Design Decisions:

1. SQLite First: Started with SQLite for simplicity, then migrated to PostgreSQL in month 6 (a migration sketch follows below)
2. Simple EPA: Used a basic EPA model initially, refined it later
3. Manual Triggers: Ran updates manually before automating with Airflow
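A minimal sketch of that month-6 migration, assuming SQLAlchemy with a psycopg2 driver; the connection string is a placeholder, not the program's actual configuration:

"""
SQLite -> PostgreSQL migration (sketch; connection details are placeholders).
"""

import sqlite3

import pandas as pd
from sqlalchemy import create_engine

SQLITE_PATH = 'cfb_analytics.db'
POSTGRES_URL = 'postgresql+psycopg2://analytics:secret@localhost:5432/cfb'  # placeholder


def migrate_table(table: str, chunksize: int = 5000) -> None:
    """Copy one table from SQLite to PostgreSQL in chunks."""
    sqlite_conn = sqlite3.connect(SQLITE_PATH)
    engine = create_engine(POSTGRES_URL)

    # Stream the table in chunks so the laptop never holds it all in memory
    for chunk in pd.read_sql(f'SELECT * FROM {table}', sqlite_conn, chunksize=chunksize):
        chunk.to_sql(table, engine, if_exists='append', index=False)

    sqlite_conn.close()


if __name__ == '__main__':
    migrate_table('plays')  # the table populated by the pipeline above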

Dashboard (Plotly Dash)

Rather than expensive dashboard software, the analyst built dashboards with Plotly Dash:

"""
Simple EPA Dashboard using Plotly Dash

Free alternative to commercial BI tools.
"""

import dash
from dash import html, dcc
import plotly.graph_objects as go
import pandas as pd

app = dash.Dash(__name__)

def create_epa_chart(team_data: pd.DataFrame):
    """Create EPA per play chart."""
    fig = go.Figure()

    # Offense EPA
    fig.add_trace(go.Bar(
        name='Offense EPA/Play',
        x=team_data['opponent'],
        y=team_data['off_epa_per_play'],
        marker_color='green'
    ))

    # Defense EPA
    fig.add_trace(go.Bar(
        name='Defense EPA/Play Allowed',
        x=team_data['opponent'],
        y=team_data['def_epa_per_play'],
        marker_color='red'
    ))

    fig.update_layout(
        title='EPA Performance by Game',
        barmode='group',
        xaxis_title='Opponent',
        yaxis_title='EPA per Play'
    )

    return fig


app.layout = html.Div([
    html.H1("Mountain State Football Analytics"),

    dcc.Tabs([
        dcc.Tab(label='Season Overview', children=[
            dcc.Graph(id='epa-chart'),
            dcc.Graph(id='success-rate-chart')
        ]),
        dcc.Tab(label='Fourth Down', children=[
            html.Div(id='fourth-down-widget')
        ]),
        dcc.Tab(label='Opponent Scout', children=[
            dcc.Dropdown(id='opponent-select'),
            html.Div(id='opponent-tendencies')
        ])
    ])
])

if __name__ == '__main__':
    app.run_server(debug=True, host='0.0.0.0', port=8050)
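The layout above wires up graph ids but omits the callbacks that populate them. A minimal sketch of one callback, assuming a hypothetical load_team_data() helper that returns one row per game with opponent, off_epa_per_play, and def_epa_per_play columns:

from dash import Input, Output

@app.callback(
    Output('epa-chart', 'figure'),
    Input('opponent-select', 'value')  # any input; also fires on page load
)
def update_epa_chart(_selected):
    team_data = load_team_data()  # hypothetical helper: queries the plays table
    return create_epa_chart(team_data)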

Cost Comparison:

| Solution | Annual Cost |
|----------|-------------|
| Tableau (5 users) | $8,400 |
| Power BI Pro (5 users) | $6,000 |
| Plotly Dash (self-hosted) | $0 |


Phase 3: High-Impact Features (Months 5-8)

Fourth-Down Decision Card

Instead of a complex real-time system, the analyst created a simple decision reference card:

"""
Fourth Down Decision Card Generator

Creates a printed reference card for sideline use.
"""

def generate_decision_card():
    """Generate fourth-down decision matrix."""
    card = {}

    for yard_line in range(20, 100, 5):
        for distance in range(1, 11):
            # Simplified decision logic
            if distance <= 2:
                decision = 'GO' if yard_line >= 30 else 'CONTEXT'
            elif distance <= 4:
                decision = 'GO' if yard_line >= 50 else 'KICK/PUNT'
            else:
                fg_range = yard_line >= 60
                if fg_range:
                    decision = 'FG'
                else:
                    decision = 'PUNT' if yard_line < 40 else 'CONTEXT'

            card[(yard_line, distance)] = decision

    return card


def print_card():
    """Print formatted decision card."""
    card = generate_decision_card()

    print("=" * 60)
    print("FOURTH DOWN DECISION CARD - Mountain State")
    print("=" * 60)
    print()
    print("Yard Line | 1yd | 2yd | 3yd | 4yd | 5yd+ |")
    print("-" * 50)

    for yl in range(95, 15, -5):
        row = f"   {yl:2d}     |"
        for dist in [1, 2, 3, 4, 5]:
            decision = card.get((yl, dist), '?')
            symbol = {
                'GO': ' GO ',
                'FG': ' FG ',
                'PUNT': 'PUNT',
                'KICK/PUNT': 'K/P ',
                'CONTEXT': ' ?? '
            }.get(decision, ' ?? ')
            row += f" {symbol} |"
        print(row)

    print()
    print("Legend: GO=Go for it, FG=Field Goal, PUNT=Punt")
    print("        K/P=Kick or Punt (check score), ??=Situational")

Usage: The card was laminated and attached to the play-calling sheet, providing instant guidance without needing technology during games.
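A quick sanity check of the card logic (the card treats yard line as distance from the offense's own goal line, matching its 20-95 range):

card = generate_decision_card()

# 4th-and-2 from the offense's own 55: short yardage beyond the own 30 -> GO
assert card[(55, 2)] == 'GO'

# 4th-and-8 at midfield: outside field-goal range (< 60), beyond the 40 -> situational
assert card[(50, 8)] == 'CONTEXT'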

Opponent Tendency Reports

Weekly reports became a key deliverable:

"""
Automated Opponent Tendency Report

Generates PDF report for game preparation.
"""

from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas
import pandas as pd


class OpponentReport:
    """Generate opponent scouting report."""

    def __init__(self, opponent: str, plays_df: pd.DataFrame):
        self.opponent = opponent
        self.plays = plays_df[plays_df['offense'] == opponent]

    def analyze_tendencies(self):
        """Analyze play-calling tendencies."""
        total = len(self.plays)
        pass_plays = self.plays[self.plays['play_type'].str.contains('Pass', na=False)]
        rush_plays = self.plays[self.plays['play_type'].str.contains('Rush', na=False)]

        return {
            'total_plays': total,
            'pass_rate': len(pass_plays) / total if total > 0 else 0,
            'rush_rate': len(rush_plays) / total if total > 0 else 0,
            'by_down': self._by_down_analysis(),
            'by_field_position': self._by_field_position()
        }

    def _by_down_analysis(self):
        """Analyze by down."""
        results = {}
        for down in [1, 2, 3]:
            down_plays = self.plays[self.plays['down'] == down]
            if len(down_plays) > 0:
                pass_pct = len(down_plays[
                    down_plays['play_type'].str.contains('Pass', na=False)
                ]) / len(down_plays)
                results[down] = {'pass_rate': pass_pct, 'sample': len(down_plays)}
        return results

    def _by_field_position(self):
        """Analyze by field position zone."""
        zones = {
            'own_territory': self.plays[self.plays['yardsToEndzone'] > 50],
            'midfield': self.plays[
                (self.plays['yardsToEndzone'] <= 50) &
                (self.plays['yardsToEndzone'] > 20)
            ],
            'red_zone': self.plays[self.plays['yardsToEndzone'] <= 20]
        }

        results = {}
        for zone_name, zone_plays in zones.items():
            if len(zone_plays) > 0:
                pass_pct = len(zone_plays[
                    zone_plays['play_type'].str.contains('Pass', na=False)
                ]) / len(zone_plays)
                results[zone_name] = {'pass_rate': pass_pct, 'sample': len(zone_plays)}

        return results

    def generate_pdf(self, output_path: str):
        """Generate PDF report."""
        analysis = self.analyze_tendencies()

        c = canvas.Canvas(output_path, pagesize=letter)
        width, height = letter

        # Title
        c.setFont("Helvetica-Bold", 24)
        c.drawString(50, height - 50, f"Opponent Scouting: {self.opponent}")

        # Overview
        c.setFont("Helvetica", 12)
        y = height - 100
        c.drawString(50, y, f"Total Plays Analyzed: {analysis['total_plays']}")
        y -= 20
        c.drawString(50, y, f"Overall Pass Rate: {analysis['pass_rate']:.1%}")
        y -= 20
        c.drawString(50, y, f"Overall Rush Rate: {analysis['rush_rate']:.1%}")

        # By Down
        y -= 40
        c.setFont("Helvetica-Bold", 14)
        c.drawString(50, y, "By Down:")
        c.setFont("Helvetica", 12)
        for down, data in analysis['by_down'].items():
            y -= 20
            c.drawString(70, y, f"Down {down}: {data['pass_rate']:.1%} pass ({data['sample']} plays)")

        # By Field Position
        y -= 40
        c.setFont("Helvetica-Bold", 14)
        c.drawString(50, y, "By Field Position:")
        c.setFont("Helvetica", 12)
        for zone, data in analysis['by_field_position'].items():
            y -= 20
            zone_name = zone.replace('_', ' ').title()
            c.drawString(70, y, f"{zone_name}: {data['pass_rate']:.1%} pass ({data['sample']} plays)")

        c.save()

Turnaround Time: Reports were generated every Sunday night for Monday staff meetings, reducing prep time from 6 hours to 30 minutes.
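Generating a report was then a short script run after each week's data load; a sketch, with an illustrative opponent name:

import sqlite3
import pandas as pd

# Pull the season's plays from the pipeline database
conn = sqlite3.connect('cfb_analytics.db')
plays_df = pd.read_sql('SELECT * FROM plays', conn)
conn.close()

report = OpponentReport('Desert Tech', plays_df)  # illustrative opponent
report.generate_pdf('desert_tech_scouting.pdf')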


Phase 4: Scaling Up (Months 9-18)

Infrastructure Upgrade

After proving value, the analyst received budget approval to upgrade infrastructure:

Before (Months 1-8):

- SQLite database on laptop
- Manual script execution
- Dash running locally

After (Months 9+):

- PostgreSQL on AWS RDS
- Airflow for automated scheduling
- Dash deployed on AWS EC2

# Airflow DAG for automated updates
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime, timedelta

default_args = {
    'owner': 'analytics',
    'depends_on_past': False,
    'email_on_failure': True,
    'email': ['analyst@mountainstate.edu'],
    'retries': 3,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG(
    'daily_analytics_update',
    default_args=default_args,
    description='Daily data ingestion and processing',
    schedule_interval='0 6 * * *',  # 6 AM daily
    start_date=datetime(2024, 1, 1),
    catchup=False
)

def run_ingestion():
    # daily_update is a module-level function in pipeline.py (see Phase 2),
    # not a SimplePipeline method
    from pipeline import daily_update
    daily_update()

def run_epa_calculation():
    from analytics import EPACalculator
    calc = EPACalculator()
    calc.process_recent_games()

def generate_reports():
    from reports import ReportGenerator
    gen = ReportGenerator()
    gen.generate_weekly_reports()

ingest_task = PythonOperator(
    task_id='ingest_data',
    python_callable=run_ingestion,
    dag=dag
)

epa_task = PythonOperator(
    task_id='calculate_epa',
    python_callable=run_epa_calculation,
    dag=dag
)

report_task = PythonOperator(
    task_id='generate_reports',
    python_callable=generate_reports,
    dag=dag
)

ingest_task >> epa_task >> report_task

Adding Recruiting Support

With the foundation solid, the analyst added recruiting capabilities:

"""
Simple Recruiting Tracker

Consolidated view of recruiting targets.
"""

import pandas as pd
from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build


class RecruitingTracker:
    """Track and organize recruiting prospects."""

    def __init__(self, sheet_id: str):
        # Use Google Sheets as a simple, collaborative database
        self.sheet_id = sheet_id
        self.service = self._init_sheets_service()

    def _init_sheets_service(self):
        """Initialize Google Sheets API."""
        creds = Credentials.from_authorized_user_file('credentials.json')
        return build('sheets', 'v4', credentials=creds)

    def get_prospects(self) -> pd.DataFrame:
        """Get all prospects from tracking sheet."""
        result = self.service.spreadsheets().values().get(
            spreadsheetId=self.sheet_id,
            range='Prospects!A:Z'
        ).execute()

        values = result.get('values', [])
        if not values:
            return pd.DataFrame()

        return pd.DataFrame(values[1:], columns=values[0])

    def filter_by_position(self, position: str) -> pd.DataFrame:
        """Get prospects by position."""
        prospects = self.get_prospects()
        return prospects[prospects['Position'] == position]

    def get_priority_targets(self) -> pd.DataFrame:
        """Get high-priority uncommitted prospects."""
        prospects = self.get_prospects()
        return prospects[
            (prospects['Priority'] == 'High') &
            (prospects['Status'] != 'Committed')
        ]

    def generate_board_summary(self) -> str:
        """Generate recruiting board summary."""
        prospects = self.get_prospects()

        summary = "RECRUITING BOARD SUMMARY\n"
        summary += "=" * 40 + "\n\n"

        # By status
        for status in ['Committed', 'Target', 'Monitoring']:
            status_df = prospects[prospects['Status'] == status]
            summary += f"{status}: {len(status_df)}\n"

        # By position need
        summary += "\nBy Position:\n"
        for pos in ['QB', 'RB', 'WR', 'OL', 'DL', 'LB', 'DB']:
            pos_df = prospects[prospects['Position'] == pos]
            committed = len(pos_df[pos_df['Status'] == 'Committed'])
            targets = len(pos_df[pos_df['Status'] == 'Target'])
            summary += f"  {pos}: {committed} committed, {targets} targets\n"

        return summary

Why Google Sheets?

- Free with university account
- Coaches could add notes directly
- Built-in collaboration features
- API access for automation
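A typical weekly pull, with a placeholder spreadsheet ID:

tracker = RecruitingTracker(sheet_id='1AbC...')  # placeholder sheet ID
print(tracker.generate_board_summary())

# High-priority uncommitted prospects for the staff meeting
print(tracker.get_priority_targets())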


Results

Year 1 Outcomes

| Metric | Before | After |
|--------|--------|-------|
| Data collection time | 15 hrs/week | 2 hrs/week |
| Opponent report time | 6 hours | 30 minutes |
| Fourth-down decisions documented | None | All |
| Recruiting prospects tracked | ~100 | 450 |

Year 2 Outcomes

| Metric | Year 1 | Year 2 |
|--------|--------|--------|
| Dashboard users | 3 | 12 |
| Automated reports | 2 | 8 |
| Data freshness | Daily | Hourly (game days) |
| Budget efficiency | $100/insight | $35/insight |
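One way the hourly game-day refresh could be expressed with the Airflow setup above (a sketch assuming Saturday game days):

# Companion DAG: hourly refresh on Saturdays (cron day-of-week 6)
gameday_dag = DAG(
    'gameday_analytics_update',
    default_args=default_args,
    description='Hourly refresh on game days',
    schedule_interval='0 * * * 6',
    start_date=datetime(2024, 1, 1),
    catchup=False
)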

Competitive Impact

While direct win attribution is difficult, coaching staff reported:

  1. Better Preparation: "We're not surprised by anything anymore" - DC
  2. Faster Adjustments: Halftime adjustments backed by data
  3. Recruiting Efficiency: More organized pursuit of fewer, better-fit prospects
  4. Conference Recognition: Asked to present approach at conference meetings

Key Lessons for Budget-Constrained Programs

1. Prioritize Ruthlessly

Focus on 2-3 high-impact features rather than building a comprehensive system.

2. Leverage Open Source

| Commercial Tool | Open-Source Alternative | Savings |
|-----------------|-------------------------|---------|
| Tableau | Plotly Dash | $8,400/year |
| Alteryx | Python + pandas | $25,000/year |
| Snowflake | PostgreSQL | $12,000/year |

3. Start Manual, Then Automate

Prove value with manual processes before investing in automation; this validates the workflow before committing engineering effort.

4. Use University Resources

  • Microsoft 365 for collaboration
  • Google Workspace (via university)
  • Student workers for data entry
  • IT department for hosting

5. Build Relationships

The analyst spent significant time with coaches to understand their needs:

- Weekly check-ins with coordinators
- Monthly reviews with the head coach
- Practice attendance to understand coaching workflow

6. Document Everything

With a one-person team, documentation was essential for:

- Knowledge continuity
- Training future staff
- Explaining methods to coaches


Conclusion

Mountain State's analytics journey demonstrates that a competitive analytics capability doesn't require a large budget. Through strategic prioritization, open-source tools, and a focus on high-impact features, a single analyst created a meaningful competitive advantage.

The key insight: Start where you are, use what you have, do what you can.

For programs considering analytics investments, the Mountain State approach offers a template for proving value before scaling up.