Case Study 2: Defensive Formation Recognition and Coverage Classification

Overview

This case study develops an automated system for classifying defensive formations and coverage schemes using pre-snap tracking data, enabling real-time tendencies analysis.

Business Context

An NFL offensive coordinator needs to: - Identify defensive alignments in real-time during games - Understand coverage tendencies by down, distance, and formation - Predict post-snap coverage from pre-snap alignment - Identify defensive tells that reveal coverage intent

Data Description

# Pre-snap tracking snapshot
presnap_schema = {
    'game_id': 'game identifier',
    'play_id': 'play identifier',
    'frame_id': 'frame at snap minus 1',
    'player_id': 'defender identifier',
    'position': 'defensive position',
    'x': 'x coordinate',
    'y': 'y coordinate',
    'orientation': 'body facing direction'
}

# Post-snap coverage labels (ground truth)
coverage_labels = {
    'play_id': 'play identifier',
    'coverage_type': 'Cover 1/2/3/4/6, Man, etc.',
    'blitz': 'boolean',
    'disguise': 'boolean (if pre-snap look differed)'
}

# Dataset summary
dataset = {
    'plays': 12450,
    'games': 256,
    'coverage_types': ['Cover 1', 'Cover 2', 'Cover 3', 'Cover 4',
                       'Cover 6', 'Man', '2-Man', 'Cover 0']
}

Implementation

Step 1: Defensive Formation Feature Extraction

import pandas as pd
import numpy as np
from typing import Dict, List, Tuple
from sklearn.cluster import KMeans
from scipy.spatial import ConvexHull

class DefensiveFeatureExtractor:
    """Extract features from defensive pre-snap alignments."""

    def __init__(self):
        self.field_width = 53.3
        self.field_center = self.field_width / 2

    def extract_features(self,
                         defense_df: pd.DataFrame,
                         los_x: float) -> Dict:
        """Extract comprehensive defensive features."""
        features = {}

        # Count-based features
        features.update(self._count_features(defense_df, los_x))

        # Spatial distribution features
        features.update(self._spatial_features(defense_df, los_x))

        # Position-specific features
        features.update(self._position_features(defense_df, los_x))

        # Alignment features
        features.update(self._alignment_features(defense_df))

        return features

    def _count_features(self, df: pd.DataFrame, los_x: float) -> Dict:
        """Count-based defensive features."""
        box_defenders = df[
            (df['x'] >= los_x - 5) &
            (df['x'] <= los_x + 3) &
            (df['y'] >= 17) &
            (df['y'] <= 36)
        ]

        deep_defenders = df[df['x'] >= los_x + 10]
        edge_defenders = df[
            (df['y'] < 17) | (df['y'] > 36)
        ]

        return {
            'box_count': len(box_defenders),
            'deep_count': len(deep_defenders),
            'edge_count': len(edge_defenders),
            'front_count': len(df[df['x'] <= los_x + 1]),
            'second_level_count': len(df[
                (df['x'] > los_x + 1) & (df['x'] < los_x + 8)
            ])
        }

    def _spatial_features(self, df: pd.DataFrame, los_x: float) -> Dict:
        """Spatial distribution features."""
        # Average depths
        avg_depth = (df['x'] - los_x).mean()
        depth_std = (df['x'] - los_x).std()

        # Lateral spread
        lateral_spread = df['y'].std()
        lateral_range = df['y'].max() - df['y'].min()

        # Convex hull area (if enough points)
        if len(df) >= 3:
            points = df[['x', 'y']].values
            try:
                hull = ConvexHull(points)
                hull_area = hull.volume  # 2D area
            except:
                hull_area = 0
        else:
            hull_area = 0

        # Center of mass
        com_x = df['x'].mean()
        com_y = df['y'].mean()

        return {
            'avg_depth': avg_depth,
            'depth_std': depth_std,
            'lateral_spread': lateral_spread,
            'lateral_range': lateral_range,
            'hull_area': hull_area,
            'com_x': com_x - los_x,
            'com_y': com_y - self.field_center,
            'balance': abs(com_y - self.field_center)  # 0 = balanced
        }

    def _position_features(self, df: pd.DataFrame, los_x: float) -> Dict:
        """Position-specific features."""
        features = {}

        # Safety positions
        safeties = df[df['position'].isin(['FS', 'SS', 'S'])]
        if len(safeties) >= 2:
            safety_depths = (safeties['x'] - los_x).sort_values()
            features['safety_1_depth'] = safety_depths.iloc[0] if len(safety_depths) > 0 else 0
            features['safety_2_depth'] = safety_depths.iloc[1] if len(safety_depths) > 1 else 0

            safety_widths = safeties['y'].sort_values()
            features['safety_split'] = safety_widths.max() - safety_widths.min()
        else:
            features['safety_1_depth'] = 0
            features['safety_2_depth'] = 0
            features['safety_split'] = 0

        # Cornerback positions
        corners = df[df['position'] == 'CB']
        if len(corners) > 0:
            features['cb_avg_depth'] = (corners['x'] - los_x).mean()
            features['cb_on_los'] = sum(corners['x'] <= los_x + 2)
        else:
            features['cb_avg_depth'] = 0
            features['cb_on_los'] = 0

        # Linebacker positions
        linebackers = df[df['position'].isin(['MLB', 'ILB', 'OLB', 'LB'])]
        if len(linebackers) > 0:
            features['lb_avg_depth'] = (linebackers['x'] - los_x).mean()
            features['lb_spread'] = linebackers['y'].std()
        else:
            features['lb_avg_depth'] = 0
            features['lb_spread'] = 0

        return features

    def _alignment_features(self, df: pd.DataFrame) -> Dict:
        """Alignment-based features (orientation)."""
        orientations = df['orientation'].values

        # Convert to radians for analysis
        rad_orientations = np.radians(orientations)

        # Circular mean
        mean_sin = np.mean(np.sin(rad_orientations))
        mean_cos = np.mean(np.cos(rad_orientations))
        mean_orientation = np.degrees(np.arctan2(mean_sin, mean_cos)) % 360

        # Variance (circular)
        orientation_var = 1 - np.sqrt(mean_sin**2 + mean_cos**2)

        # How many facing quarterback vs receivers
        facing_qb = sum((orientations > 160) | (orientations < 200))  # ~180 degrees

        return {
            'mean_orientation': mean_orientation,
            'orientation_variance': orientation_var,
            'facing_qb_count': facing_qb
        }


class FormationClassifier:
    """Classify defensive formations."""

    FORMATIONS = {
        '4-3': {'dl': 4, 'lb': 3},
        '3-4': {'dl': 3, 'lb': 4},
        'Nickel': {'dl': 4, 'lb': 2, 'db': 5},
        'Dime': {'dl': 4, 'lb': 1, 'db': 6},
        'Goal Line': {'dl': 5, 'lb': 3}
    }

    def classify(self, features: Dict) -> Tuple[str, float]:
        """Classify defensive formation."""
        front_count = features.get('front_count', 4)
        second_level = features.get('second_level_count', 4)
        deep_count = features.get('deep_count', 3)

        total_db = features.get('cb_on_los', 0) + deep_count

        # Classification logic
        if front_count >= 5:
            return 'Goal Line', 0.8
        elif total_db >= 6:
            return 'Dime', 0.9
        elif total_db == 5:
            return 'Nickel', 0.85
        elif front_count == 4 and second_level >= 3:
            return '4-3', 0.75
        elif front_count == 3 and second_level >= 4:
            return '3-4', 0.75
        else:
            return 'Unknown', 0.5

Step 2: Coverage Classification Model

from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
import joblib

class CoverageClassifier:
    """Classify coverage schemes from pre-snap data."""

    COVERAGE_TYPES = [
        'Cover 0', 'Cover 1', 'Cover 2', 'Cover 3',
        'Cover 4', 'Cover 6', 'Man', '2-Man'
    ]

    def __init__(self):
        self.feature_extractor = DefensiveFeatureExtractor()
        self.model = RandomForestClassifier(
            n_estimators=200,
            max_depth=10,
            min_samples_split=5,
            random_state=42
        )
        self.label_encoder = LabelEncoder()
        self.feature_names = None

    def prepare_data(self,
                     tracking_data: Dict[str, pd.DataFrame],
                     play_info: pd.DataFrame) -> Tuple[np.ndarray, np.ndarray]:
        """Prepare training data."""
        X_list = []
        y_list = []

        for _, play in play_info.iterrows():
            game_id = play['game_id']
            play_id = play['play_id']

            if game_id not in tracking_data:
                continue

            # Get pre-snap defensive positions
            game_df = tracking_data[game_id]
            play_df = game_df[
                (game_df['play_id'] == play_id) &
                (game_df['team'] == 'defense') &
                (game_df['frame_id'] == play['snap_frame'] - 1)
            ]

            if len(play_df) < 8:  # Need reasonable defender count
                continue

            # Extract features
            features = self.feature_extractor.extract_features(
                play_df, play['los_x']
            )

            X_list.append(list(features.values()))
            y_list.append(play['coverage_type'])

            if self.feature_names is None:
                self.feature_names = list(features.keys())

        X = np.array(X_list)
        y = self.label_encoder.fit_transform(y_list)

        return X, y

    def train(self, X: np.ndarray, y: np.ndarray):
        """Train the coverage classifier."""
        self.model.fit(X, y)

        # Cross-validation score
        scores = cross_val_score(self.model, X, y, cv=5)
        print(f"CV Accuracy: {scores.mean():.3f} (+/- {scores.std()*2:.3f})")

    def predict(self, features: Dict) -> Dict:
        """Predict coverage type."""
        X = np.array([list(features.values())])
        proba = self.model.predict_proba(X)[0]

        predictions = {}
        for i, coverage in enumerate(self.label_encoder.classes_):
            predictions[coverage] = proba[i]

        best_pred = self.label_encoder.classes_[np.argmax(proba)]
        confidence = np.max(proba)

        return {
            'predicted_coverage': best_pred,
            'confidence': confidence,
            'probabilities': predictions
        }

    def get_feature_importance(self) -> pd.DataFrame:
        """Get feature importance rankings."""
        importance = pd.DataFrame({
            'feature': self.feature_names,
            'importance': self.model.feature_importances_
        }).sort_values('importance', ascending=False)

        return importance

    def save_model(self, path: str):
        """Save trained model."""
        joblib.dump({
            'model': self.model,
            'label_encoder': self.label_encoder,
            'feature_names': self.feature_names
        }, path)

    def load_model(self, path: str):
        """Load trained model."""
        data = joblib.load(path)
        self.model = data['model']
        self.label_encoder = data['label_encoder']
        self.feature_names = data['feature_names']


class CoverageDisguiseDetector:
    """Detect coverage disguises and tells."""

    def __init__(self):
        self.tells_database = {}

    def analyze_tells(self,
                      presnap_df: pd.DataFrame,
                      postsnap_df: pd.DataFrame,
                      actual_coverage: str) -> Dict:
        """Analyze movement tells between pre-snap and post-snap."""
        safeties_pre = presnap_df[presnap_df['position'].isin(['FS', 'SS', 'S'])]
        safeties_post = postsnap_df[postsnap_df['position'].isin(['FS', 'SS', 'S'])]

        tells = {
            'coverage': actual_coverage,
            'safety_rotation': self._detect_safety_rotation(
                safeties_pre, safeties_post
            ),
            'late_movement': self._detect_late_movement(
                presnap_df, postsnap_df
            ),
            'orientation_tell': self._detect_orientation_tell(presnap_df)
        }

        return tells

    def _detect_safety_rotation(self,
                                 safeties_pre: pd.DataFrame,
                                 safeties_post: pd.DataFrame) -> Dict:
        """Detect safety rotation patterns."""
        if len(safeties_pre) < 2:
            return {'rotation': 'none', 'magnitude': 0}

        pre_depths = safeties_pre['x'].values
        post_depths = safeties_post['x'].values if len(safeties_post) >= 2 else pre_depths

        depth_change = np.abs(post_depths - pre_depths).mean()

        if depth_change > 5:
            return {'rotation': 'major', 'magnitude': depth_change}
        elif depth_change > 2:
            return {'rotation': 'minor', 'magnitude': depth_change}
        else:
            return {'rotation': 'none', 'magnitude': depth_change}

    def _detect_late_movement(self,
                               pre_df: pd.DataFrame,
                               post_df: pd.DataFrame) -> List[str]:
        """Detect defenders who made late movements."""
        movers = []

        for player_id in pre_df['player_id'].unique():
            pre_pos = pre_df[pre_df['player_id'] == player_id]
            post_pos = post_df[post_df['player_id'] == player_id]

            if len(pre_pos) == 0 or len(post_pos) == 0:
                continue

            movement = np.sqrt(
                (post_pos['x'].iloc[0] - pre_pos['x'].iloc[0])**2 +
                (post_pos['y'].iloc[0] - pre_pos['y'].iloc[0])**2
            )

            if movement > 3:  # Significant movement threshold
                movers.append(player_id)

        return movers

    def _detect_orientation_tell(self, df: pd.DataFrame) -> Dict:
        """Detect orientation-based tells."""
        corners = df[df['position'] == 'CB']

        if len(corners) == 0:
            return {'tell': 'none'}

        # Check if corners are looking at receivers (man) or zone areas
        avg_orientation = corners['orientation'].mean()

        # In man, corners typically face their receiver
        # In zone, they face more toward the QB/backfield
        if 150 < avg_orientation < 210:  # Facing roughly toward LOS
            return {'tell': 'zone_likely', 'orientation': avg_orientation}
        else:
            return {'tell': 'man_likely', 'orientation': avg_orientation}

    def build_tells_database(self,
                              analysis_results: List[Dict]) -> Dict:
        """Build database of tells by coverage type."""
        tells_by_coverage = {}

        for result in analysis_results:
            coverage = result['coverage']
            if coverage not in tells_by_coverage:
                tells_by_coverage[coverage] = []

            tells_by_coverage[coverage].append({
                'safety_rotation': result['safety_rotation']['rotation'],
                'late_movers': len(result['late_movement']),
                'orientation_tell': result['orientation_tell']['tell']
            })

        # Aggregate patterns
        self.tells_database = {
            coverage: self._summarize_tells(tells)
            for coverage, tells in tells_by_coverage.items()
        }

        return self.tells_database

    def _summarize_tells(self, tells: List[Dict]) -> Dict:
        """Summarize tell patterns."""
        n = len(tells)

        rotation_counts = {}
        for t in tells:
            rot = t['safety_rotation']
            rotation_counts[rot] = rotation_counts.get(rot, 0) + 1

        orientation_counts = {}
        for t in tells:
            ori = t['orientation_tell']
            orientation_counts[ori] = orientation_counts.get(ori, 0) + 1

        return {
            'sample_size': n,
            'rotation_distribution': {k: v/n for k, v in rotation_counts.items()},
            'orientation_distribution': {k: v/n for k, v in orientation_counts.items()},
            'avg_late_movers': np.mean([t['late_movers'] for t in tells])
        }

Step 3: Tendencies Analysis

class DefensiveTendenciesAnalyzer:
    """Analyze defensive tendencies by situation."""

    def __init__(self, coverage_classifier: CoverageClassifier):
        self.classifier = coverage_classifier

    def analyze_by_situation(self,
                              plays_df: pd.DataFrame) -> pd.DataFrame:
        """Analyze coverage tendencies by game situation."""
        situations = [
            ('1st Down', plays_df['down'] == 1),
            ('2nd & Short', (plays_df['down'] == 2) & (plays_df['distance'] <= 3)),
            ('2nd & Long', (plays_df['down'] == 2) & (plays_df['distance'] > 6)),
            ('3rd & Short', (plays_df['down'] == 3) & (plays_df['distance'] <= 3)),
            ('3rd & Medium', (plays_df['down'] == 3) & (plays_df['distance'].between(4, 6))),
            ('3rd & Long', (plays_df['down'] == 3) & (plays_df['distance'] > 6)),
            ('Red Zone', plays_df['field_position'] >= 80),
            ('2-Minute', plays_df['time_remaining'] <= 120)
        ]

        results = []
        for situation_name, mask in situations:
            situation_plays = plays_df[mask]

            if len(situation_plays) < 10:
                continue

            coverage_dist = situation_plays['coverage_type'].value_counts(normalize=True)
            blitz_rate = situation_plays['blitz'].mean()

            results.append({
                'situation': situation_name,
                'plays': len(situation_plays),
                'blitz_rate': blitz_rate,
                **{f'{cov}_pct': coverage_dist.get(cov, 0)
                   for cov in CoverageClassifier.COVERAGE_TYPES}
            })

        return pd.DataFrame(results)

    def analyze_vs_formation(self,
                              plays_df: pd.DataFrame) -> pd.DataFrame:
        """Analyze coverage tendencies against offensive formations."""
        formations = plays_df['offense_formation'].unique()

        results = []
        for formation in formations:
            formation_plays = plays_df[plays_df['offense_formation'] == formation]

            if len(formation_plays) < 20:
                continue

            coverage_dist = formation_plays['coverage_type'].value_counts(normalize=True)
            avg_box = formation_plays['box_count'].mean()

            results.append({
                'formation': formation,
                'plays': len(formation_plays),
                'avg_box_count': avg_box,
                'primary_coverage': coverage_dist.idxmax(),
                **{f'{cov}_pct': coverage_dist.get(cov, 0)
                   for cov in CoverageClassifier.COVERAGE_TYPES}
            })

        return pd.DataFrame(results)

    def generate_tendency_report(self,
                                  plays_df: pd.DataFrame,
                                  team: str) -> str:
        """Generate human-readable tendency report."""
        team_plays = plays_df[plays_df['defense_team'] == team]

        situation_tendencies = self.analyze_by_situation(team_plays)
        formation_tendencies = self.analyze_vs_formation(team_plays)

        report = f"""
DEFENSIVE TENDENCY REPORT: {team}
{'='*50}

SITUATIONAL TENDENCIES:
"""
        for _, row in situation_tendencies.iterrows():
            report += f"\n{row['situation']} ({row['plays']} plays):\n"
            report += f"  - Blitz Rate: {row['blitz_rate']*100:.1f}%\n"
            top_coverages = sorted(
                [(cov, row.get(f'{cov}_pct', 0))
                 for cov in CoverageClassifier.COVERAGE_TYPES],
                key=lambda x: x[1], reverse=True
            )[:3]
            for cov, pct in top_coverages:
                if pct > 0.05:
                    report += f"  - {cov}: {pct*100:.1f}%\n"

        report += "\nFORMATION TENDENCIES:\n"
        for _, row in formation_tendencies.iterrows():
            report += f"\nvs {row['formation']} ({row['plays']} plays):\n"
            report += f"  - Primary Coverage: {row['primary_coverage']}\n"
            report += f"  - Average Box: {row['avg_box_count']:.1f}\n"

        return report

Results

Coverage Classification Accuracy

COVERAGE CLASSIFICATION RESULTS
===============================

Overall Accuracy: 72.4%

Per-Coverage Performance:
Coverage  | Precision | Recall | F1-Score | Support
----------|-----------|--------|----------|--------
Cover 0   | 0.81      | 0.75   | 0.78     | 234
Cover 1   | 0.76      | 0.79   | 0.77     | 1,892
Cover 2   | 0.69      | 0.72   | 0.70     | 2,456
Cover 3   | 0.74      | 0.71   | 0.72     | 3,124
Cover 4   | 0.68      | 0.65   | 0.66     | 1,678
Cover 6   | 0.62      | 0.58   | 0.60     | 891
Man       | 0.78      | 0.82   | 0.80     | 1,876
2-Man     | 0.65      | 0.61   | 0.63     | 299

Confusion Matrix Insights:
- Cover 2 often confused with Cover 4 (similar safety depth)
- Cover 6 hardest to classify (combines elements)
- Man coverage most distinguishable from zones

Feature Importance

TOP 10 PREDICTIVE FEATURES
==========================

Rank | Feature           | Importance
-----|-------------------|------------
1    | safety_1_depth    | 0.142
2    | safety_split      | 0.128
3    | cb_avg_depth      | 0.095
4    | box_count         | 0.084
5    | deep_count        | 0.078
6    | lateral_spread    | 0.067
7    | lb_spread         | 0.059
8    | cb_on_los         | 0.055
9    | facing_qb_count   | 0.048
10   | avg_depth         | 0.042

Key Insight: Safety positioning (depth and split)
accounts for 27% of classification power.

Tendency Analysis Example

DEFENSIVE TENDENCY REPORT: Kansas City Chiefs
==============================================

SITUATIONAL TENDENCIES:

3rd & Long (234 plays):
  - Blitz Rate: 28.2%
  - Cover 3: 35.4%
  - Cover 1: 24.8%
  - Cover 2: 18.7%

Red Zone (189 plays):
  - Blitz Rate: 42.1%
  - Man: 38.6%
  - Cover 1: 31.2%
  - Cover 0: 14.8%

FORMATION TENDENCIES:

vs Empty (156 plays):
  - Primary Coverage: Cover 3
  - Average Box: 5.2
  - Blitz Rate: 18.5%

vs 11 Personnel 3x1 (423 plays):
  - Primary Coverage: Cover 1
  - Average Box: 6.8
  - Blitz Rate: 35.2%

KEY TELLS IDENTIFIED:
- Safety rotation pre-snap: 73% predicts zone
- CB press alignment: 81% predicts man
- LB depth > 6 yards: 68% predicts zone

Lessons Learned

Safety Position is Key: Safety alignment provides the strongest signal for coverage classification.
Context Matters: Classification accuracy improves 8% when including situational context.
Disguises Work: Coverage disguise reduces pre-snap classification accuracy by 15%.
Orientation Tells: Defender body orientation correlates with coverage assignment.
Formation Tendencies: Teams show strong coverage tendencies against specific formations.

Recommendations

Real-Time Application: Deploy model for sideline coverage prediction
Tell Identification: Train QBs on identified coverage tells
Formation Design: Create formations that exploit coverage tendencies
Weekly Updates: Retrain model weekly with new opponent data
Disguise Counter: Practice recognizing disguised coverages