Chapter 23: Network Analysis in Football

Introduction

Football is fundamentally a game of connections—between players, between schemes, and between plays. Network analysis provides powerful tools for visualizing and quantifying these relationships, revealing insights that traditional statistics miss. This chapter explores how graph theory and network science can illuminate the hidden structure of football, from passing networks to coaching trees and recruiting pipelines.

Learning Objectives

By the end of this chapter, you will be able to:

  1. Construct and analyze passing networks for offensive evaluation
  2. Calculate network centrality metrics for player importance
  3. Build and visualize coaching trees and their influence
  4. Analyze recruiting networks and pipeline relationships
  5. Apply community detection to identify play-calling patterns
  6. Use network metrics for strategic insights

23.1 Fundamentals of Network Analysis

23.1.1 Graph Theory Basics

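Networks consist of nodes (vertices) connected by edges (links). In football, nodes can be players, coaches, teams, or schools, and edges capture the relationships between them, such as passes, mentorships, or recruiting ties. The classes below provide a common base for the networks in this chapter: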

import networkx as nx
import pandas as pd
import numpy as np
from typing import Dict, List, Tuple, Optional
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class FootballNode:
    """Represents a node in a football network."""
    node_id: str
    node_type: str  # 'player', 'team', 'coach', 'school'
    attributes: Dict


@dataclass
class FootballEdge:
    """Represents an edge in a football network."""
    source: str
    target: str
    weight: float
    edge_type: str
    attributes: Dict


class FootballNetwork:
    """Base class for football network analysis."""

    def __init__(self, directed: bool = True):
        self.graph = nx.DiGraph() if directed else nx.Graph()

    def add_node(self, node: FootballNode):
        """Add a node to the network."""
        self.graph.add_node(
            node.node_id,
            node_type=node.node_type,
            **node.attributes
        )

    def add_edge(self, edge: FootballEdge):
        """Add an edge to the network."""
        self.graph.add_edge(
            edge.source,
            edge.target,
            weight=edge.weight,
            edge_type=edge.edge_type,
            **edge.attributes
        )

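    def get_basic_stats(self) -> Dict[str, object]:
        """Get basic network statistics."""
        n_nodes = self.graph.number_of_nodes()
        if n_nodes == 0:
            # networkx connectivity checks raise on an empty graph
            is_connected = False
        elif self.graph.is_directed():
            # Directed networks: connected if linked when edge direction is ignored
            is_connected = nx.is_weakly_connected(self.graph)
        else:
            is_connected = nx.is_connected(self.graph)

        return {
            'nodes': n_nodes,
            'edges': self.graph.number_of_edges(),
            'density': nx.density(self.graph),
            'is_connected': is_connected
        }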
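To make `density` and `is_connected` concrete, here is a toy directed network built directly with networkx (the player labels are hypothetical). Density divides the observed edges by the n(n-1) ordered pairs a directed graph could contain, and weak connectivity (ignoring edge direction) is the natural test for passing networks, where passes flow one way from passer to receiver.

```python
import networkx as nx

# Hypothetical toy passing network: one QB targeting three receivers,
# plus one RB-to-WR pass (e.g., a trick play)
G = nx.DiGraph()
G.add_edges_from([("QB", "WR1"), ("QB", "WR2"), ("QB", "TE"), ("RB", "WR1")])

n, m = G.number_of_nodes(), G.number_of_edges()

# Directed density: observed edges / possible ordered pairs n(n - 1)
manual_density = m / (n * (n - 1))
print(f"nodes={n}, edges={m}, density={manual_density:.2f}")
print("networkx density:", nx.density(G))

# Passes flow one way, so no node can reach every other node along edge
# directions: the graph is weakly but not strongly connected
print("weakly connected:", nx.is_weakly_connected(G))
print("strongly connected:", nx.is_strongly_connected(G))
```

This is why `get_basic_stats` uses weak connectivity for directed graphs: a strict (strong) check would label almost every passing network disconnected.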

23.1.2 Types of Football Networks

FOOTBALL NETWORK TYPES
======================

1. PASSING NETWORKS
   Nodes: Players (QB, receivers, RBs)
   Edges: Passes thrown/caught
   Weight: Completion count, yards, or EPA

2. BLOCKING NETWORKS
   Nodes: Offensive linemen, blockers
   Edges: Blocking assignments
   Weight: Block success rate

3. COACHING NETWORKS
   Nodes: Coaches
   Edges: Mentor-protégé relationships
   Weight: Years together, role similarity

4. RECRUITING NETWORKS
   Nodes: High schools, colleges, players
   Edges: Recruiting relationships
   Weight: Number of recruits, star ratings

5. PLAY SIMILARITY NETWORKS
   Nodes: Plays/formations
   Edges: Strategic similarity
   Weight: Conceptual overlap

6. CONFERENCE/RIVALRY NETWORKS
   Nodes: Teams
   Edges: Games played, rivalries
   Weight: Historical matchups
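The choice of edge weight matters: the same passes can rank connections differently depending on whether the weight is attempts, yards, or EPA. A small pandas sketch with hypothetical target data (the column names mirror those used later in the chapter but are assumptions here):

```python
import pandas as pd

# Hypothetical targets from one QB: each row is one pass attempt
passes = pd.DataFrame({
    "receiver": ["WR1", "WR1", "WR1", "WR1", "TE", "TE"],
    "yards_gained": [5, 0, 6, 4, 25, 30],
    "epa": [0.2, -0.6, 0.3, 0.1, 1.8, 2.1],
})

# Aggregate to one QB -> receiver edge per receiver, with three candidate weights
edges = passes.groupby("receiver").agg(
    targets=("yards_gained", "size"),  # attempts (row count)
    yards=("yards_gained", "sum"),
    epa=("epa", "sum"),
)
print(edges)

# The "strongest" connection depends on which weight is chosen
strongest = {weight: edges[weight].idxmax() for weight in ["targets", "yards", "epa"]}
print(strongest)
```

Counts reward volume, while yards and EPA reward efficiency and big plays, so it is worth stating which weight a network uses before comparing teams.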

23.2 Passing Networks

23.2.1 Building Passing Networks

class PassingNetwork(FootballNetwork):
    """Network analysis of passing connections."""

    def __init__(self):
        super().__init__(directed=True)
        self.qb_nodes = set()
        self.receiver_nodes = set()

    def build_from_plays(self, plays: pd.DataFrame):
        """Build passing network from play-by-play data."""
        # Filter to pass plays
        passes = plays[plays['play_type'] == 'pass'].copy()

        # Add QB nodes
        for qb in passes['passer'].unique():
            if pd.notna(qb):
                self.add_node(FootballNode(
                    node_id=qb,
                    node_type='qb',
                    attributes={'position': 'QB'}
                ))
                self.qb_nodes.add(qb)

        # Add receiver nodes
        for receiver in passes['receiver'].unique():
            if pd.notna(receiver):
                self.add_node(FootballNode(
                    node_id=receiver,
                    node_type='receiver',
                    attributes={}
                ))
                self.receiver_nodes.add(receiver)

        # Add edges (passes)
        pass_counts = passes.groupby(['passer', 'receiver']).agg({
            'play_id': 'count',
            'yards_gained': 'sum',
            'epa': 'sum',
            'complete': 'sum'
        }).reset_index()

        for _, row in pass_counts.iterrows():
            if pd.notna(row['passer']) and pd.notna(row['receiver']):
                self.add_edge(FootballEdge(
                    source=row['passer'],
                    target=row['receiver'],
                    weight=row['play_id'],
                    edge_type='pass',
                    attributes={
                        'attempts': row['play_id'],
                        'completions': row['complete'],
                        'yards': row['yards_gained'],
                        'epa': row['epa'],
                        'comp_pct': row['complete'] / row['play_id']
                    }
                ))

    def calculate_target_share(self) -> pd.DataFrame:
        """Calculate target share for each receiver."""
        results = []

        for qb in self.qb_nodes:
            # Get all edges from this QB
            qb_edges = [
                (u, v, d) for u, v, d in self.graph.out_edges(qb, data=True)
            ]
            total_targets = sum(d['weight'] for _, _, d in qb_edges)

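            for _, receiver, data in qb_edges:
                # add_edge() unpacks edge attributes into the edge data dict,
                # so they are read directly rather than from data['attributes']
                results.append({
                    'qb': qb,
                    'receiver': receiver,
                    'targets': data['weight'],
                    'target_share': data['weight'] / total_targets if total_targets > 0 else 0,
                    'completions': data['completions'],
                    'yards': data['yards'],
                    'epa': data['epa']
                })

        columns = ['qb', 'receiver', 'targets', 'target_share', 'completions', 'yards', 'epa']
        if not results:
            # Keep the expected columns so callers can still sort and filter
            return pd.DataFrame(columns=columns)
        return pd.DataFrame(results, columns=columns).sort_values('target_share', ascending=False)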

    def get_receiver_centrality(self) -> pd.DataFrame:
        """Calculate centrality metrics for receivers."""
        # In-degree centrality (distinct passers targeting each receiver, normalized by n - 1)
        in_degree = nx.in_degree_centrality(self.graph)

        # Weighted in-degree
        weighted_in = {
            node: sum(d['weight'] for _, _, d in self.graph.in_edges(node, data=True))
            for node in self.receiver_nodes
        }
        max_weighted = max(weighted_in.values()) if weighted_in else 1
        weighted_in_norm = {k: v/max_weighted for k, v in weighted_in.items()}

        # PageRank
        pagerank = nx.pagerank(self.graph, weight='weight')

        results = []
        for receiver in self.receiver_nodes:
            results.append({
                'receiver': receiver,
                'in_degree_centrality': in_degree.get(receiver, 0),
                'weighted_centrality': weighted_in_norm.get(receiver, 0),
                'pagerank': pagerank.get(receiver, 0)
            })

        return pd.DataFrame(results).sort_values('pagerank', ascending=False)


class TeamPassingNetworkAnalyzer:
    """Analyze passing networks across teams."""

    def __init__(self):
        self.team_networks: Dict[str, PassingNetwork] = {}

    def build_team_networks(self, plays: pd.DataFrame):
        """Build passing networks for each team."""
        for team in plays['offense_team'].unique():
            team_plays = plays[plays['offense_team'] == team]
            network = PassingNetwork()
            network.build_from_plays(team_plays)
            self.team_networks[team] = network

    @staticmethod
    def _team_target_shares(network: PassingNetwork) -> np.ndarray:
        """Share of all team targets going to each receiver, pooled across QBs.

        Per-QB shares each sum to 1, so squaring or log-weighting them across
        several QBs would overstate concentration; pool targets by receiver first.
        """
        targets = defaultdict(float)
        for _, receiver, data in network.graph.edges(data=True):
            targets[receiver] += data['weight']
        total = sum(targets.values())
        if total == 0:
            return np.array([])
        return np.array(list(targets.values())) / total

    def compare_team_structures(self) -> pd.DataFrame:
        """Compare structural properties across teams."""
        results = []

        for team, network in self.team_networks.items():
            stats = network.get_basic_stats()
            shares = self._team_target_shares(network)

            # Herfindahl-Hirschman index: sum of squared shares (higher = more concentrated)
            hhi = float(np.sum(shares ** 2)) if len(shares) > 0 else 1.0

            results.append({
                'team': team,
                'receivers': len(network.receiver_nodes),
                'connections': stats['edges'],
                'density': stats['density'],
                'target_hhi': hhi,
                'target_entropy': self._calculate_entropy(shares)
            })

        return pd.DataFrame(results).sort_values('target_entropy', ascending=False)

    @staticmethod
    def _calculate_entropy(shares: np.ndarray) -> float:
        """Shannon entropy (bits) of the target distribution (higher = more spread)."""
        probs = shares[shares > 0]
        if len(probs) == 0:
            return 0.0
        return float(-np.sum(probs * np.log2(probs)))
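Why combine several centrality measures? In a simple QB-to-receiver star, unweighted in-degree centrality cannot tell receivers apart, because each receiver is targeted by exactly one passer; weighted in-degree (the basis of the `weighted_centrality` column) does separate them. A minimal networkx check with hypothetical target counts:

```python
import networkx as nx

# Hypothetical single-QB network; edge weight = number of targets
G = nx.DiGraph()
G.add_weighted_edges_from([("QB", "WR1", 10), ("QB", "WR2", 5), ("QB", "TE", 5)])
receivers = ["WR1", "WR2", "TE"]

# Unweighted: distinct passers / (n - 1), identical for every receiver here
in_deg = nx.in_degree_centrality(G)

# Weighted: total targets received, then scaled to the top receiver
weighted_in = dict(G.in_degree(weight="weight"))
top = max(weighted_in[r] for r in receivers)
weighted_norm = {r: weighted_in[r] / top for r in receivers}

for r in receivers:
    print(f"{r}: in-degree={in_deg[r]:.3f}, weighted={weighted_norm[r]:.2f}")
```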

23.2.2 Network Visualization

import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

class PassingNetworkVisualizer:
    """Visualize passing networks."""

    def __init__(self, network: PassingNetwork):
        self.network = network

    def draw_network(
        self,
        figsize: Tuple[int, int] = (14, 10),
        min_edge_weight: int = 3,
        node_size_factor: float = 100,
        title: str = "Passing Network"
    ) -> plt.Figure:
        """Draw the passing network."""
        fig, ax = plt.subplots(figsize=figsize)

        # Filter edges by minimum weight
        edges_to_draw = [
            (u, v) for u, v, d in self.network.graph.edges(data=True)
            if d['weight'] >= min_edge_weight
        ]

        # Create subgraph
        subgraph = self.network.graph.edge_subgraph(edges_to_draw)

        # Position nodes
        pos = self._get_positions(subgraph)

        # Node sizes based on involvement
        node_sizes = []
        for node in subgraph.nodes():
            in_weight = sum(
                d['weight'] for _, _, d in subgraph.in_edges(node, data=True)
            )
            out_weight = sum(
                d['weight'] for _, _, d in subgraph.out_edges(node, data=True)
            )
            node_sizes.append((in_weight + out_weight) * node_size_factor)

        # Node colors
        node_colors = [
            '#e74c3c' if node in self.network.qb_nodes else '#3498db'
            for node in subgraph.nodes()
        ]

        # Edge widths
        edge_widths = [
            subgraph[u][v]['weight'] / 5
            for u, v in edges_to_draw
        ]

        # Draw
        nx.draw_networkx_nodes(
            subgraph, pos, ax=ax,
            node_size=node_sizes,
            node_color=node_colors,
            alpha=0.8
        )

        nx.draw_networkx_edges(
            subgraph, pos, ax=ax,
            edgelist=edges_to_draw,
            width=edge_widths,
            alpha=0.5,
            edge_color='gray',
            arrows=True,
            arrowsize=15,
            connectionstyle='arc3,rad=0.1'
        )

        nx.draw_networkx_labels(
            subgraph, pos, ax=ax,
            font_size=9,
            font_weight='bold'
        )

        # Legend
        qb_patch = mpatches.Patch(color='#e74c3c', label='QB')
        rec_patch = mpatches.Patch(color='#3498db', label='Receiver')
        ax.legend(handles=[qb_patch, rec_patch], loc='upper left')

        ax.set_title(title, fontsize=14, fontweight='bold')
        ax.axis('off')

        return fig

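    def _get_positions(self, graph: nx.Graph) -> Dict:
        """Calculate node positions, pinning QBs at the center of the layout."""
        qbs = [node for node in graph.nodes() if node in self.network.qb_nodes]
        if not qbs:
            return nx.spring_layout(graph, k=2, iterations=50, seed=42)

        # Fix QBs before running the layout (spread horizontally if there are
        # several) so receivers arrange around them, instead of moving the QB
        # afterwards and distorting the edges the layout was computed for
        xs = np.linspace(-0.5, 0.5, len(qbs)) if len(qbs) > 1 else [0.0]
        initial = {qb: np.array([x, 0.0]) for qb, x in zip(qbs, xs)}
        return nx.spring_layout(
            graph, k=2, iterations=50, pos=initial, fixed=qbs, seed=42
        )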

    def draw_target_distribution(
        self,
        top_n: int = 10
    ) -> plt.Figure:
        """Draw target distribution chart."""
        shares = self.network.calculate_target_share()

        if len(shares) == 0:
            fig, ax = plt.subplots(figsize=(10, 6))
            ax.text(0.5, 0.5, 'No passing data available',
                    ha='center', va='center')
            return fig

        top_receivers = shares.nlargest(top_n, 'target_share')

        fig, ax = plt.subplots(figsize=(12, 6))

        bars = ax.barh(
            top_receivers['receiver'],
            top_receivers['target_share'] * 100,
            color='#3498db',
            alpha=0.8
        )

        ax.set_xlabel('Target Share (%)', fontsize=12)
        ax.set_ylabel('Receiver', fontsize=12)
        ax.set_title('Target Distribution', fontsize=14, fontweight='bold')

        # Add value labels
        for bar, share in zip(bars, top_receivers['target_share']):
            ax.text(
                bar.get_width() + 0.5,
                bar.get_y() + bar.get_height()/2,
                f'{share*100:.1f}%',
                va='center'
            )

        ax.invert_yaxis()
        plt.tight_layout()

        return fig
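Moving a node after `spring_layout` has run leaves the rest of the layout computed around its old position. A sketch of the alternative, pinning a node during layout with the `pos` and `fixed` arguments of `nx.spring_layout`, rendered off-screen with matplotlib's Agg backend (the toy graph and its labels are hypothetical):

```python
import io

import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np

# Hypothetical target counts from one QB
G = nx.DiGraph()
G.add_weighted_edges_from(
    [("QB", "WR1", 12), ("QB", "WR2", 8), ("QB", "TE", 6), ("QB", "RB", 4)]
)

# Pin the QB at the origin; the other nodes start at random positions and move
pos = nx.spring_layout(G, pos={"QB": np.array([0.0, 0.0])}, fixed=["QB"], seed=7)

fig, ax = plt.subplots(figsize=(6, 4))
widths = [G[u][v]["weight"] / 4 for u, v in G.edges()]  # thicker = more targets
nx.draw_networkx(G, pos, ax=ax, width=widths, node_color="#3498db", arrows=True)
ax.axis("off")

# Save to an in-memory PNG instead of showing a window
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
print("QB position:", pos["QB"])
```

Fixed nodes are not moved by the force-directed iterations, so the QB stays exactly where it was placed while the receivers settle around it.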

23.3 Centrality Metrics in Football

23.3.1 Understanding Centrality

class CentralityAnalyzer:
    """Analyze node centrality in football networks."""

    CENTRALITY_INTERPRETATIONS = {
        'degree': 'Number of direct connections',
        'betweenness': 'Importance as a connector between others',
        'closeness': 'Inverse of average distance to other nodes (higher = closer)',
        'eigenvector': 'Connected to other important nodes',
        'pagerank': 'Importance from incoming links, weighted by the importance of the linking nodes'
    }

    def __init__(self, graph: nx.Graph):
        self.graph = graph

    def calculate_all_centralities(self) -> pd.DataFrame:
        """Calculate all centrality metrics for nodes."""
        # Degree centrality
        if self.graph.is_directed():
            in_degree = nx.in_degree_centrality(self.graph)
            out_degree = nx.out_degree_centrality(self.graph)
        else:
            in_degree = nx.degree_centrality(self.graph)
            out_degree = in_degree

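        # Betweenness: networkx treats the weight attribute as a distance, so
        # convert connection strength (e.g., target counts) to length = 1 / weight
        # (assumes positive weights)
        distance_graph = self.graph.copy()
        for _, _, data in distance_graph.edges(data=True):
            data['distance'] = 1.0 / data.get('weight', 1.0)
        betweenness = nx.betweenness_centrality(distance_graph, weight='distance')

        # Closeness (unweighted hop counts)
        closeness = nx.closeness_centrality(self.graph)

        # Eigenvector (may fail to converge, e.g., on sparse directed graphs)
        try:
            eigenvector = nx.eigenvector_centrality(
                self.graph, max_iter=1000, weight='weight'
            )
        except (nx.NetworkXException, ZeroDivisionError):
            eigenvector = {node: 0 for node in self.graph.nodes()}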

        # PageRank
        pagerank = nx.pagerank(self.graph, weight='weight')

        # Combine results
        results = []
        for node in self.graph.nodes():
            results.append({
                'node': node,
                'in_degree': in_degree.get(node, 0),
                'out_degree': out_degree.get(node, 0),
                'betweenness': betweenness.get(node, 0),
                'closeness': closeness.get(node, 0),
                'eigenvector': eigenvector.get(node, 0),
                'pagerank': pagerank.get(node, 0)
            })

        return pd.DataFrame(results)

    def identify_key_players(
        self,
        centrality_df: pd.DataFrame,
        metric: str = 'pagerank',
        top_n: int = 10
    ) -> pd.DataFrame:
        """Identify most central players by metric."""
        return centrality_df.nlargest(top_n, metric)[['node', metric]]

    def analyze_network_structure(self) -> Dict[str, float]:
        """Analyze overall network structure."""
        return {
            'nodes': self.graph.number_of_nodes(),
            'edges': self.graph.number_of_edges(),
            'density': nx.density(self.graph),
            'avg_clustering': nx.average_clustering(self.graph),
            # Transitivity counts undirected triangles, so ignore edge direction
            'transitivity': nx.transitivity(self.graph.to_undirected()),
            'avg_path_length': self._safe_avg_path_length()
        }

    def _safe_avg_path_length(self) -> float:
        """Average shortest-path length on the largest (strongly) connected component.

        Path lengths are undefined between unreachable nodes, so the calculation
        is restricted to the largest component; returns inf if it has fewer
        than two nodes.
        """
        if self.graph.number_of_nodes() == 0:
            return float('inf')

        if self.graph.is_directed():
            components = nx.strongly_connected_components(self.graph)
        else:
            components = nx.connected_components(self.graph)

        largest = max(components, key=len)
        if len(largest) < 2:
            return float('inf')
        return nx.average_shortest_path_length(self.graph.subgraph(largest))
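networkx's `betweenness_centrality` interprets the `weight` attribute as a distance, so passing a strength such as target counts makes the strongest links look longest. A three-node sketch of the difference with hypothetical weights; for a directed graph with n = 3, normalized betweenness divides by (n-1)(n-2) = 2:

```python
import networkx as nx

# Hypothetical connection strengths (e.g., target counts): a strong A->B->C
# chain plus a weak direct A->C link
G = nx.DiGraph()
G.add_weighted_edges_from([("A", "B", 10), ("B", "C", 10), ("A", "C", 1)])

# Raw weights treated as distances: the weak direct edge (length 1) beats
# the strong two-hop route (length 20), so B lies on no shortest path
as_distance = nx.betweenness_centrality(G, weight="weight")

# Convert strength to length = 1 / weight: A->B->C (0.2) now beats A->C (1.0)
for _, _, data in G.edges(data=True):
    data["distance"] = 1.0 / data["weight"]
as_strength = nx.betweenness_centrality(G, weight="distance")

print("B betweenness, weight as distance:", as_distance["B"])
print("B betweenness, 1/weight as distance:", as_strength["B"])
```

With the inverted weights, B is the only intermediary on the single A-to-C shortest path, giving a raw score of 1 and a normalized score of 1/2.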

23.3.2 Applying Centrality to Player Roles

class PlayerRoleAnalyzer:
    """Analyze player roles using network centrality."""

    def __init__(self, passing_network: PassingNetwork):
        self.network = passing_network
        self.centrality_analyzer = CentralityAnalyzer(passing_network.graph)

    def classify_receiver_roles(self) -> pd.DataFrame:
        """Classify receivers based on network position."""
        centralities = self.centrality_analyzer.calculate_all_centralities()

        # Filter to receivers
        receivers = centralities[
            centralities['node'].isin(self.network.receiver_nodes)
        ].copy()

        if len(receivers) == 0:
            return pd.DataFrame()

        # Normalize metrics
        for col in ['in_degree', 'betweenness', 'pagerank']:
            if col in receivers.columns:
                max_val = receivers[col].max()
                if max_val > 0:
                    receivers[f'{col}_norm'] = receivers[col] / max_val
                else:
                    receivers[f'{col}_norm'] = 0

        # Classify roles
        receivers['role'] = receivers.apply(self._classify_role, axis=1)

        return receivers

    def _classify_role(self, row: pd.Series) -> str:
        """Classify receiver role based on metrics."""
        # High volume = primary target
        if row.get('in_degree_norm', 0) > 0.7:
            return 'Primary Target'
        # High betweenness = connector (used in various situations)
        elif row.get('betweenness_norm', 0) > 0.5:
            return 'Versatile Option'
        # High pagerank but lower volume = efficient
        elif row.get('pagerank_norm', 0) > 0.5:
            return 'High-Value Target'
        else:
            return 'Role Player'

    def identify_qb_tendencies(self) -> Dict[str, Dict[str, object]]:
        """Analyze QB tendencies from network structure."""
        results = {}

        for qb in self.network.qb_nodes:
            # Get outgoing edges
            out_edges = list(self.network.graph.out_edges(qb, data=True))

            if len(out_edges) == 0:
                continue

            # Calculate metrics
            total_targets = sum(d['weight'] for _, _, d in out_edges)
            unique_targets = len(out_edges)

            # Target concentration
            shares = [d['weight'] / total_targets for _, _, d in out_edges]
            hhi = sum(s ** 2 for s in shares)

            # Favorite target
            favorite = max(out_edges, key=lambda x: x[2]['weight'])

            results[qb] = {
                'total_targets': total_targets,
                'unique_receivers': unique_targets,
                'concentration_hhi': hhi,
                'favorite_target': favorite[1],
                'favorite_share': favorite[2]['weight'] / total_targets,
                'tendency': 'Spread' if hhi < 0.2 else 'Concentrated'
            }

        return results
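The `hhi < 0.2` cutoff in `identify_qb_tendencies` has a convenient reading: 1/HHI is the "effective number" of equally targeted receivers, so an HHI below 0.2 means a QB spreads the ball as if among more than five receivers. A worked sketch with hypothetical target counts, also computing target entropy in bits:

```python
import numpy as np

def target_shares(targets: dict) -> np.ndarray:
    """Convert per-receiver target counts into shares that sum to 1."""
    counts = np.array(list(targets.values()), dtype=float)
    return counts / counts.sum()

def hhi(shares: np.ndarray) -> float:
    """Herfindahl-Hirschman index: sum of squared shares."""
    return float(np.sum(shares ** 2))

def entropy_bits(shares: np.ndarray) -> float:
    """Shannon entropy in bits, ignoring zero shares."""
    p = shares[shares > 0]
    return float(-np.sum(p * np.log2(p)))

# Hypothetical target counts for two QBs
qbs = {
    "Concentrated QB": {"WR1": 50, "WR2": 25, "TE": 25},
    "Balanced QB": {"WR1": 20, "WR2": 20, "WR3": 20, "TE": 20, "RB1": 20, "RB2": 20},
}

summary = {}
for name, targets in qbs.items():
    s = target_shares(targets)
    h = hhi(s)
    summary[name] = {
        "hhi": h,
        "effective_receivers": 1 / h,  # equally targeted receivers implied by h
        "entropy_bits": entropy_bits(s),
        "tendency": "Spread" if h < 0.2 else "Concentrated",
    }
    print(name, summary[name])
```

For shares of 0.5, 0.25, and 0.25, HHI = 0.25 + 0.0625 + 0.0625 = 0.375 (about 2.7 effective receivers) and entropy = 1.5 bits; six equal shares give HHI = 1/6 and entropy = log2(6), roughly 2.58 bits.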

23.4 Coaching Trees and Influence Networks

23.4.1 Building Coaching Networks

class CoachingNetwork(FootballNetwork):
    """Network of coaching relationships."""

    def __init__(self):
        super().__init__(directed=True)

    def build_from_history(self, coaching_history: pd.DataFrame):
        """
        Build network from coaching history data.

        Expected columns:
        - coach_name
        - year
        - team
        - position
        - head_coach (who was HC when they were there)
        """
        # Add coach nodes with the first year each coach appears in the data
        first_years = coaching_history.groupby('coach_name')['year'].min()
        for coach, first_year in first_years.items():
            self.add_node(FootballNode(
                node_id=coach,
                node_type='coach',
                attributes={'first_year': first_year}
            ))

        # Build mentor-mentee relationships
        for _, row in coaching_history.iterrows():
            mentor = row['head_coach']
            mentee = row['coach_name']

            if pd.notna(mentor) and mentor != mentee:
                # Check if edge exists
                if self.graph.has_edge(mentor, mentee):
                    # Increment weight (years together)
                    self.graph[mentor][mentee]['weight'] += 1
                else:
                    self.add_edge(FootballEdge(
                        source=mentor,
                        target=mentee,
                        weight=1,
                        edge_type='mentor',
                        attributes={
                            'team': row['team'],
                            'first_year': row['year']
                        }
                    ))

    def get_coaching_tree(self, head_coach: str) -> Dict[str, List[str]]:
        """Get up to three generations of a head coach's coaching tree.

        Each coach is listed once, at the earliest generation in which they
        appear, so coaches who served under several tree members are not
        duplicated and mentoring cycles cannot loop back.
        """
        tree = {'root': head_coach, 'direct': [], 'second_gen': [], 'third_gen': []}
        seen = {head_coach}
        frontier = [head_coach]

        for generation in ['direct', 'second_gen', 'third_gen']:
            next_frontier = []
            for coach in frontier:
                for mentee in self.graph.successors(coach):
                    if mentee not in seen:
                        seen.add(mentee)
                        next_frontier.append(mentee)
            tree[generation] = next_frontier
            frontier = next_frontier

        return tree

    def calculate_coaching_influence(self) -> pd.DataFrame:
        """Calculate influence metrics for coaches."""
        results = []

        for coach in self.graph.nodes():
            # Direct mentees (assistants who served under this coach)
            mentees = list(self.graph.successors(coach))
            mentee_count = len(mentees)

            # Total descendants (full tree)
            descendants = nx.descendants(self.graph, coach)
            descendant_count = len(descendants)

            # Years of mentoring
            total_years = sum(
                d['weight'] for _, _, d in self.graph.out_edges(coach, data=True)
            )

            results.append({
                'coach': coach,
                'direct_mentees': mentee_count,
                'total_descendants': descendant_count,
                'mentoring_years': total_years,
                'avg_years_per_mentee': total_years / mentee_count if mentee_count > 0 else 0
            })

        return pd.DataFrame(results).sort_values('total_descendants', ascending=False)


class CoachingTreeVisualizer:
    """Visualize coaching trees."""

    def __init__(self, network: CoachingNetwork):
        self.network = network

    def draw_tree(
        self,
        root_coach: str,
        max_depth: int = 3,
        figsize: Tuple[int, int] = (16, 12)
    ) -> plt.Figure:
        """Draw coaching tree from root coach."""
        # Get subgraph for tree
        tree_nodes = {root_coach}
        current_level = {root_coach}

        for _ in range(max_depth):
            next_level = set()
            for coach in current_level:
                successors = set(self.network.graph.successors(coach))
                next_level.update(successors)
            tree_nodes.update(next_level)
            current_level = next_level

        subgraph = self.network.graph.subgraph(tree_nodes)

        # Create figure
        fig, ax = plt.subplots(figsize=figsize)

        # Use hierarchical layout
        pos = self._hierarchical_layout(subgraph, root_coach)

        # Node sizes based on descendants
        node_sizes = []
        for node in subgraph.nodes():
            descendants = len(nx.descendants(self.network.graph, node))
            node_sizes.append(100 + descendants * 50)

        # Draw
        nx.draw_networkx_nodes(
            subgraph, pos, ax=ax,
            node_size=node_sizes,
            node_color='#2ecc71',
            alpha=0.8
        )

        nx.draw_networkx_edges(
            subgraph, pos, ax=ax,
            edge_color='gray',
            alpha=0.5,
            arrows=True,
            arrowsize=15
        )

        nx.draw_networkx_labels(
            subgraph, pos, ax=ax,
            font_size=8
        )

        ax.set_title(f"Coaching Tree: {root_coach}", fontsize=14, fontweight='bold')
        ax.axis('off')

        return fig

    def _hierarchical_layout(
        self,
        graph: nx.Graph,
        root: str
    ) -> Dict[str, np.ndarray]:
        """Create hierarchical layout for tree visualization."""
        # BFS to get levels
        levels = {root: 0}
        queue = [root]

        while queue:
            current = queue.pop(0)
            current_level = levels[current]

            for successor in graph.successors(current):
                if successor not in levels:
                    levels[successor] = current_level + 1
                    queue.append(successor)

        # Group nodes by level
        level_nodes = defaultdict(list)
        for node, level in levels.items():
            level_nodes[level].append(node)

        # Calculate positions
        pos = {}
        max_level = max(levels.values()) if levels else 0

        for level, nodes in level_nodes.items():
            y = 1 - level / (max_level + 1)
            for i, node in enumerate(nodes):
                x = (i + 1) / (len(nodes) + 1)
                pos[node] = np.array([x, y])

        return pos
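Expanding successors level by level can list the same coach twice when that coach served under two members of the tree. A breadth-first alternative groups each coach by the shortest mentorship distance from the root, using `nx.single_source_shortest_path_length` with a `cutoff` (the coach labels are hypothetical):

```python
from collections import defaultdict

import networkx as nx

# Hypothetical mentor -> mentee edges; Coach D served under both B and C
G = nx.DiGraph([
    ("Coach A", "Coach B"), ("Coach A", "Coach C"),
    ("Coach B", "Coach D"), ("Coach C", "Coach D"),
    ("Coach D", "Coach E"),
])

# Naive expansion: successors of successors, duplicates included
naive_second_gen = [m for c in G.successors("Coach A") for m in G.successors(c)]

# BFS distances from the root, limited to three generations
distances = nx.single_source_shortest_path_length(G, "Coach A", cutoff=3)
generations = defaultdict(list)
for coach, depth in distances.items():
    if depth > 0:  # depth 0 is the root itself
        generations[depth].append(coach)

print("naive second generation:", naive_second_gen)
print("BFS generations:", dict(generations))
```

The same distances also give the levels that `_hierarchical_layout` computes with its own queue.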

23.5 Recruiting Networks

23.5.1 Pipeline Analysis

class RecruitingNetwork(FootballNetwork):
    """Network of recruiting relationships."""

    def __init__(self):
        super().__init__(directed=True)

    def build_from_recruiting_data(self, recruits: pd.DataFrame):
        """
        Build network from recruiting data.

        Expected columns:
        - player_name
        - high_school
        - city, state
        - college
        - star_rating
        - position
        """
        # Add high school nodes
        for _, row in recruits.drop_duplicates('high_school').iterrows():
            self.add_node(FootballNode(
                node_id=row['high_school'],
                node_type='high_school',
                attributes={
                    'city': row.get('city', ''),
                    'state': row.get('state', '')
                }
            ))

        # Add college nodes
        for college in recruits['college'].unique():
            self.add_node(FootballNode(
                node_id=college,
                node_type='college',
                attributes={}
            ))

        # Add recruiting edges
        pipeline = recruits.groupby(['high_school', 'college']).agg({
            'player_name': 'count',
            'star_rating': 'mean'
        }).reset_index()

        for _, row in pipeline.iterrows():
            self.add_edge(FootballEdge(
                source=row['high_school'],
                target=row['college'],
                weight=row['player_name'],
                edge_type='recruited',
                attributes={
                    'avg_stars': row['star_rating']
                }
            ))

    def identify_pipeline_schools(
        self,
        college: str,
        min_recruits: int = 3
    ) -> pd.DataFrame:
        """Identify pipeline high schools for a college."""
        # Get incoming edges to college
        in_edges = [
            (u, v, d) for u, v, d in self.graph.in_edges(college, data=True)
            if d['weight'] >= min_recruits
        ]

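        results = []
        for hs, _, data in in_edges:
            results.append({
                'high_school': hs,
                'recruits': data['weight'],
                # Edge attributes are stored directly on the edge data dict
                'avg_stars': data.get('avg_stars', 0),
                'state': self.graph.nodes[hs].get('state', '')
            })

        if not results:
            return pd.DataFrame(columns=['high_school', 'recruits', 'avg_stars', 'state'])
        return pd.DataFrame(results).sort_values('recruits', ascending=False)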

    def analyze_geographic_reach(self, college: str) -> Dict[str, object]:
        """Analyze geographic recruiting reach."""
        in_edges = list(self.graph.in_edges(college, data=True))

        if len(in_edges) == 0:
            return {}

        # Get state distribution
        states = defaultdict(int)
        for hs, _, data in in_edges:
            state = self.graph.nodes[hs].get('state', 'Unknown')
            states[state] += data['weight']

        total = sum(states.values())

        return {
            'total_states': len(states),
            'total_recruits': total,
            'state_distribution': dict(states),
            'top_state': max(states, key=states.get),
            'concentration': max(states.values()) / total if total > 0 else 0
        }


class RecruitingNetworkAnalyzer:
    """Analyze recruiting network patterns."""

    def __init__(self, network: RecruitingNetwork):
        self.network = network

    def find_competing_schools(
        self,
        college: str,
        min_overlap: int = 5
    ) -> pd.DataFrame:
        """Find schools that recruit from same high schools."""
        # Get high schools that send to this college
        my_hs = set(
            u for u, v, _ in self.network.graph.in_edges(college, data=True)
        )

        # Count the recruits other colleges landed from those same high schools
        competitors = defaultdict(int)

        for hs in my_hs:
            for _, target, data in self.network.graph.out_edges(hs, data=True):
                if target != college:
                    competitors[target] += data['weight']

        results = [
            {'competitor': c, 'shared_recruits': count}
            for c, count in competitors.items()
            if count >= min_overlap
        ]

        if not results:
            return pd.DataFrame(columns=['competitor', 'shared_recruits'])
        return pd.DataFrame(results).sort_values('shared_recruits', ascending=False)

    def calculate_recruiting_centrality(self) -> pd.DataFrame:
        """Calculate centrality metrics for colleges."""
        # Only consider college nodes
        college_nodes = [
            n for n, d in self.network.graph.nodes(data=True)
            if d.get('node_type') == 'college'
        ]

        results = []
        for college in college_nodes:
            # Number of feeder schools
            feeders = self.network.graph.in_degree(college)

            # Total recruits
            total_recruits = sum(
                d['weight'] for _, _, d in
                self.network.graph.in_edges(college, data=True)
            )

            # Average star rating
            stars = [
                d['attributes'].get('avg_stars', 0) * d['weight']
                for _, _, d in self.network.graph.in_edges(college, data=True)
            ]
            avg_stars = sum(stars) / total_recruits if total_recruits > 0 else 0

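            # Diversity (number of distinct feeder-school states); feeders
            # with no recorded state are skipped rather than counted as ''
            states = {
                self.network.graph.nodes[u].get('state')
                for u, _, _ in self.network.graph.in_edges(college, data=True)
            } - {None, ''}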

            results.append({
                'college': college,
                'feeder_schools': feeders,
                'total_recruits': total_recruits,
                'avg_star_rating': avg_stars,
                'state_diversity': len(states)
            })

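        # Explicit columns keep sort_values working if no college nodes exist
        columns = ['college', 'feeder_schools', 'total_recruits',
                   'avg_star_rating', 'state_diversity']
        return pd.DataFrame(results, columns=columns).sort_values(
            'total_recruits', ascending=False
        )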
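To make the overlap count in `find_competing_schools` concrete, the sketch below restates the same logic as a standalone function and runs it on a tiny hand-built pipeline. It assumes the edge convention used above (high school -> college, with a `weight` attribute counting recruits); the function name `competing_colleges`, the school names, and the counts are all hypothetical.

```python
from collections import defaultdict

import networkx as nx


def competing_colleges(graph: nx.DiGraph, college: str, min_overlap: int = 1) -> dict:
    """Recruits other colleges signed from `college`'s feeder high schools.

    Hypothetical standalone helper. Assumes edges run high school -> college
    with a 'weight' attribute holding the number of recruits.
    """
    feeders = set(graph.predecessors(college))
    overlap = defaultdict(int)
    for hs in feeders:
        for _, other, data in graph.out_edges(hs, data=True):
            if other != college:
                overlap[other] += data.get('weight', 1)
    return {c: n for c, n in overlap.items() if n >= min_overlap}


# Toy pipeline with made-up recruit counts
G = nx.DiGraph()
G.add_edge('HS_A', 'State U', weight=4)
G.add_edge('HS_A', 'Tech', weight=3)
G.add_edge('HS_B', 'State U', weight=2)
G.add_edge('HS_B', 'Tech', weight=1)
G.add_edge('HS_B', 'Coastal', weight=2)
G.add_edge('HS_C', 'Coastal', weight=5)  # HS_C does not feed State U

print(competing_colleges(G, 'State U'))
```

Coastal signed five players from HS_C, but none of them count toward its overlap with State U, because HS_C is not a State U feeder. Only recruiting inside State U's own pipeline registers as competition, which is why Tech (4) ranks ahead of Coastal (2) here.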

23.6 Community Detection

23.6.1 Finding Communities in Football Networks

from community import community_louvain  # python-louvain package

class CommunityAnalyzer:
    """Detect and analyze communities in football networks."""

    def __init__(self, graph: nx.Graph):
        self.graph = graph
        self.communities = None

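    def detect_communities_louvain(
        self, random_state: Optional[int] = None
    ) -> Dict[str, int]:
        """Detect communities using the Louvain algorithm.

        Louvain needs an undirected graph. For directed graphs, reciprocal
        edges (A->B and B->A) are merged and their weights summed, instead
        of letting to_undirected() keep an arbitrary one of the two weights.
        """
        if self.graph.is_directed():
            undirected = nx.Graph()
            undirected.add_nodes_from(self.graph.nodes(data=True))
            for u, v, data in self.graph.edges(data=True):
                w = data.get('weight', 1)
                if undirected.has_edge(u, v):
                    undirected[u][v]['weight'] += w
                else:
                    undirected.add_edge(u, v, weight=w)
        else:
            undirected = self.graph

        # Louvain visits nodes in random order; fix random_state to make
        # the partition reproducible across runs
        self.communities = community_louvain.best_partition(
            undirected, random_state=random_state
        )
        return self.communities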

    def get_community_summary(self) -> pd.DataFrame:
        """Summarize detected communities."""
        if self.communities is None:
            self.detect_communities_louvain()

        # Group nodes by community
        community_nodes = defaultdict(list)
        for node, comm in self.communities.items():
            community_nodes[comm].append(node)

        results = []
        for comm_id, nodes in community_nodes.items():
            subgraph = self.graph.subgraph(nodes)
            results.append({
                'community': comm_id,
                'size': len(nodes),
                'internal_edges': subgraph.number_of_edges(),
                'density': nx.density(subgraph),
                'sample_members': nodes[:5]
            })

        return pd.DataFrame(results).sort_values('size', ascending=False)

    def analyze_inter_community_connections(self) -> pd.DataFrame:
        """Analyze connections between communities."""
        if self.communities is None:
            self.detect_communities_louvain()

        # Count edges between communities
        inter_edges = defaultdict(int)

        for u, v in self.graph.edges():
            comm_u = self.communities[u]
            comm_v = self.communities[v]

            if comm_u != comm_v:
                key = tuple(sorted([comm_u, comm_v]))
                inter_edges[key] += 1

        results = [
            {'community_1': k[0], 'community_2': k[1], 'edges': count}
            for k, count in inter_edges.items()
        ]

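        # Explicit columns keep sort_values working when every edge is
        # internal to a community (e.g. only one community was found)
        return pd.DataFrame(
            results, columns=['community_1', 'community_2', 'edges']
        ).sort_values('edges', ascending=False)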


class PlayCallingCommunityAnalyzer:
    """Analyze play-calling patterns using community detection."""

    def __init__(self, plays: pd.DataFrame):
        self.plays = plays
        self.play_network = None

    def build_play_sequence_network(self, window: int = 2):
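        """Link play types that occur within `window` plays of each other on a drive.

        Edge weights count how often each pair co-occurs; a play type
        followed by itself (e.g. run, run) becomes a self-loop.
        """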
        self.play_network = nx.Graph()

        # Group by drive
        for _, drive in self.plays.groupby(['game_id', 'drive_id']):
            drive = drive.sort_values('play_id')
            play_types = drive['play_type'].tolist()

            # Create edges between plays within window
            for i, play1 in enumerate(play_types):
                for j in range(i+1, min(i+window+1, len(play_types))):
                    play2 = play_types[j]

                    if self.play_network.has_edge(play1, play2):
                        self.play_network[play1][play2]['weight'] += 1
                    else:
                        self.play_network.add_edge(play1, play2, weight=1)

    def find_play_clusters(self) -> Dict[str, List[str]]:
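        """Group play types into clusters that tend to be called together."""
        # Build the co-occurrence network with the default window if needed
        if self.play_network is None:
            self.build_play_sequence_network()
        analyzer = CommunityAnalyzer(self.play_network)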
        communities = analyzer.detect_communities_louvain()

        # Group plays by community
        clusters = defaultdict(list)
        for play, comm in communities.items():
            clusters[f'Cluster_{comm}'].append(play)

        return dict(clusters)
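The analyzers above rely on the python-louvain package. As an alternative, networkx 2.8 and later ships its own Louvain implementation, `nx.community.louvain_communities`, along with `nx.community.modularity` for scoring a partition. The sketch below is a toy sanity check rather than a football dataset: two five-node cliques joined by one edge (built with `nx.barbell_graph`) stand in for two groups of plays that are usually called together. Louvain should recover the two cliques. With m = 21 edges, each clique holding 10 internal edges and a degree sum of 21, the modularity of that split is 2 × (10/21 − (21/42)²) ≈ 0.452.

```python
import networkx as nx

# Toy network: two 5-node cliques (nodes 0-4 and 5-9) joined by the
# single bridging edge 4-5
G = nx.barbell_graph(5, 0)

# Louvain from networkx >= 2.8; the algorithm visits nodes in random
# order, so a fixed seed makes the partition reproducible
communities = nx.community.louvain_communities(G, weight="weight", seed=42)

# Modularity compares the share of edges inside communities with the
# share expected if edges were rewired at random with the same degrees.
# Edges without a "weight" attribute count as weight 1.
q = nx.community.modularity(G, communities, weight="weight")

print(sorted(sorted(c) for c in communities))
print(f"modularity = {q:.3f}")
```

In networkx, passing a `resolution` above 1 to `louvain_communities` favors smaller communities, which can help when play-type clusters come out too coarse.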

Summary

Network analysis provides powerful tools for understanding the complex relationships in football:

  1. Passing Networks reveal target distributions, receiver roles, and QB tendencies that traditional stats miss.

  2. Centrality Metrics quantify player importance beyond raw statistics, identifying who connects the offense.

  3. Coaching Trees trace the flow of ideas and strategies across generations of coaches.

  4. Recruiting Networks uncover pipeline relationships and geographic strategies.

  5. Community Detection finds hidden patterns in play-calling and team relationships.

The key insight is that football is a system of relationships, and network analysis helps us see and measure those relationships directly.

Key Takeaways

  • Networks represent entities (nodes) and their relationships (edges)
  • Passing networks reveal offensive structure beyond traditional stats
  • Centrality metrics identify influential players and coaches
  • Community detection finds natural groupings in complex systems
  • Visualization is essential for communicating network insights