Chapter 23: Network Analysis in Football
Introduction
Football is fundamentally a game of connections—between players, between schemes, and between plays. Network analysis provides powerful tools for visualizing and quantifying these relationships, revealing insights that traditional statistics miss. This chapter explores how graph theory and network science can illuminate the hidden structure of football, from passing networks to coaching trees and recruiting pipelines.
Learning Objectives
By the end of this chapter, you will be able to:
- Construct and analyze passing networks for offensive evaluation
- Calculate network centrality metrics for player importance
- Build and visualize coaching trees and their influence
- Analyze recruiting networks and pipeline relationships
- Apply community detection to identify play-calling patterns
- Use network metrics for strategic insights
23.1 Fundamentals of Network Analysis
23.1.1 Graph Theory Basics
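Networks consist of nodes (vertices) connected by edges (links). In football contexts, nodes are typically players, teams, coaches, or schools. Edges are the passes, games, mentorships, or recruiting ties between them. The base classes below wrap a NetworkX graph with simple node and edge records: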
```python
import networkx as nx
import pandas as pd
import numpy as np
from typing import Any, Dict, List, Tuple, Optional
from dataclasses import dataclass
from collections import defaultdict


@dataclass
class FootballNode:
    """Represents a node in a football network."""
    node_id: str
    node_type: str  # 'player', 'team', 'coach', 'school'
    attributes: Dict[str, Any]


@dataclass
class FootballEdge:
    """Represents an edge in a football network."""
    source: str
    target: str
    weight: float
    edge_type: str
    attributes: Dict[str, Any]


class FootballNetwork:
    """Base class for football network analysis."""

    def __init__(self, directed: bool = True):
        self.graph = nx.DiGraph() if directed else nx.Graph()

    def add_node(self, node: FootballNode):
        """Add a node; its attributes become top-level node data."""
        self.graph.add_node(
            node.node_id,
            node_type=node.node_type,
            **node.attributes
        )

    def add_edge(self, edge: FootballEdge):
        """Add an edge; its attributes become top-level edge data.

        Read them back as graph[u][v]['yards'], not via an 'attributes' key.
        """
        self.graph.add_edge(
            edge.source,
            edge.target,
            weight=edge.weight,
            edge_type=edge.edge_type,
            **edge.attributes
        )

    def get_basic_stats(self) -> Dict[str, Any]:
        """Get basic network statistics."""
        if self.graph.number_of_nodes() == 0:
            # NetworkX connectivity checks raise on an empty graph
            is_connected = False
        elif self.graph.is_directed():
            is_connected = nx.is_weakly_connected(self.graph)
        else:
            is_connected = nx.is_connected(self.graph)
        return {
            'nodes': self.graph.number_of_nodes(),
            'edges': self.graph.number_of_edges(),
            'density': nx.density(self.graph),
            'is_connected': is_connected
        }
```
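This sketch makes `nx.density` concrete. In a directed graph with n nodes and m edges, density is m / (n(n - 1)), the fraction of possible ordered pairs that are actually linked. An undirected graph divides by n(n - 1)/2 instead. The code computes density by hand with the standard library only, on a hypothetical four-player passing graph; the player names are placeholders.

```python
def density(num_nodes: int, num_edges: int, directed: bool = True) -> float:
    """Fraction of possible edges that are present (same formula as nx.density)."""
    if num_nodes <= 1:
        return 0.0
    possible = num_nodes * (num_nodes - 1)  # ordered pairs, no self-loops
    if not directed:
        possible //= 2  # unordered pairs
    return num_edges / possible


# Hypothetical passing graph: one QB throwing to three receivers
edges = [("QB", "WR1"), ("QB", "WR2"), ("QB", "TE")]
nodes = {player for edge in edges for player in edge}

print(density(len(nodes), len(edges)))                  # 0.25  (3 / (4 * 3))
print(density(len(nodes), len(edges), directed=False))  # 0.5   (3 / 6)
```

Consider one QB throwing to k receivers. The directed density is always k / ((k + 1)k) = 1/(k + 1), however the targets are distributed. Density alone therefore says little about how a passing offense spreads the ball.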
23.1.2 Types of Football Networks
```text
FOOTBALL NETWORK TYPES
======================

1. PASSING NETWORKS
   Nodes:  Players (QB, receivers, RBs)
   Edges:  Passes thrown/caught
   Weight: Completion count, yards, or EPA

2. BLOCKING NETWORKS
   Nodes:  Offensive linemen, blockers
   Edges:  Blocking assignments
   Weight: Block success rate

3. COACHING NETWORKS
   Nodes:  Coaches
   Edges:  Mentor-protégé relationships
   Weight: Years together, role similarity

4. RECRUITING NETWORKS
   Nodes:  High schools, colleges, players
   Edges:  Recruiting relationships
   Weight: Number of recruits, star ratings

5. PLAY SIMILARITY NETWORKS
   Nodes:  Plays/formations
   Edges:  Strategic similarity
   Weight: Conceptual overlap

6. CONFERENCE/RIVALRY NETWORKS
   Nodes:  Teams
   Edges:  Games played, rivalries
   Weight: Historical matchups
```
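The weight choice listed for passing networks matters. The same passes can rank receivers differently depending on whether an edge counts completions, yards, or EPA. The sketch below aggregates a handful of hypothetical pass records under each choice. The names and numbers are invented for illustration, and only the standard library is used.

```python
from collections import defaultdict

# Hypothetical pass records: (passer, receiver, complete, yards, epa)
passes = [
    ("QB", "WR1", True, 25, 1.5),
    ("QB", "WR1", True, 20, 0.4),
    ("QB", "WR1", False, 0, -0.7),
    ("QB", "TE", True, 8, 1.9),
    ("QB", "RB", True, 5, -0.2),
    ("QB", "RB", True, 4, -0.3),
    ("QB", "RB", True, 3, -0.4),
]


def edge_weights(records, weight: str) -> dict:
    """Sum one kind of weight per (passer, receiver) edge."""
    totals = defaultdict(float)
    for passer, receiver, complete, yards, epa in records:
        if weight == "completions":
            value = 1.0 if complete else 0.0
        elif weight == "yards":
            value = float(yards)
        elif weight == "epa":
            value = epa
        else:
            raise ValueError(f"unknown weight: {weight}")
        totals[(passer, receiver)] += value
    return dict(totals)


for w in ("completions", "yards", "epa"):
    weights = edge_weights(passes, w)
    heaviest = max(weights, key=weights.get)
    print(w, "->", heaviest[1])
# completions -> RB, yards -> WR1, epa -> TE
```

The same seven throws make a different receiver the heaviest edge under each weight. The weighting should therefore match the question being asked: volume, yardage, or value added.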
23.2 Passing Networks
23.2.1 Building Passing Networks
```python
class PassingNetwork(FootballNetwork):
    """Network analysis of passing connections (edges point QB -> receiver)."""

    def __init__(self):
        super().__init__(directed=True)
        self.qb_nodes = set()
        self.receiver_nodes = set()

    def build_from_plays(self, plays: pd.DataFrame):
        """
        Build passing network from play-by-play data.

        Expected columns: play_type, passer, receiver, play_id,
        yards_gained, epa, complete (1/0 or boolean).
        """
        # Filter to pass plays
        passes = plays[plays['play_type'] == 'pass'].copy()

        # Add QB nodes
        for qb in passes['passer'].dropna().unique():
            self.add_node(FootballNode(
                node_id=qb,
                node_type='qb',
                attributes={'position': 'QB'}
            ))
            self.qb_nodes.add(qb)

        # Add receiver nodes
        for receiver in passes['receiver'].dropna().unique():
            self.add_node(FootballNode(
                node_id=receiver,
                node_type='receiver',
                attributes={}
            ))
            self.receiver_nodes.add(receiver)

        # Aggregate per (passer, receiver); groupby drops rows with NaN keys
        pass_counts = passes.groupby(['passer', 'receiver']).agg(
            attempts=('play_id', 'count'),
            completions=('complete', 'sum'),
            yards=('yards_gained', 'sum'),
            epa=('epa', 'sum')
        ).reset_index()

        # Add edges: weight = number of targets
        for _, row in pass_counts.iterrows():
            self.add_edge(FootballEdge(
                source=row['passer'],
                target=row['receiver'],
                weight=row['attempts'],
                edge_type='pass',
                attributes={
                    'attempts': row['attempts'],
                    'completions': row['completions'],
                    'yards': row['yards'],
                    'epa': row['epa'],
                    'comp_pct': row['completions'] / row['attempts']
                }
            ))

    def calculate_target_share(self) -> pd.DataFrame:
        """Calculate each receiver's share of a given QB's targets."""
        columns = ['qb', 'receiver', 'targets', 'target_share',
                   'completions', 'yards', 'epa']
        results = []
        for qb in self.qb_nodes:
            qb_edges = list(self.graph.out_edges(qb, data=True))
            total_targets = sum(d['weight'] for _, _, d in qb_edges)
            for _, receiver, data in qb_edges:
                results.append({
                    'qb': qb,
                    'receiver': receiver,
                    'targets': data['weight'],
                    'target_share': data['weight'] / total_targets if total_targets > 0 else 0,
                    # Edge attributes live directly in the edge data dict
                    'completions': data['completions'],
                    'yards': data['yards'],
                    'epa': data['epa']
                })
        return pd.DataFrame(results, columns=columns).sort_values(
            'target_share', ascending=False
        )

    def get_receiver_centrality(self) -> pd.DataFrame:
        """Calculate centrality metrics for receivers."""
        columns = ['receiver', 'in_degree_centrality',
                   'weighted_centrality', 'pagerank']
        if self.graph.number_of_nodes() == 0:
            return pd.DataFrame(columns=columns)

        # In-degree centrality (distinct passers who targeted the receiver)
        in_degree = nx.in_degree_centrality(self.graph)

        # Weighted in-degree (total targets), normalized by the maximum
        weighted_in = {
            node: self.graph.in_degree(node, weight='weight')
            for node in self.receiver_nodes
        }
        max_weighted = max(weighted_in.values(), default=0) or 1
        weighted_in_norm = {k: v / max_weighted for k, v in weighted_in.items()}

        # PageRank, using target counts as edge weights
        pagerank = nx.pagerank(self.graph, weight='weight')

        results = [
            {
                'receiver': receiver,
                'in_degree_centrality': in_degree.get(receiver, 0),
                'weighted_centrality': weighted_in_norm.get(receiver, 0),
                'pagerank': pagerank.get(receiver, 0)
            }
            for receiver in self.receiver_nodes
        ]
        return pd.DataFrame(results, columns=columns).sort_values(
            'pagerank', ascending=False
        )


class TeamPassingNetworkAnalyzer:
    """Analyze passing networks across teams."""

    def __init__(self):
        self.team_networks: Dict[str, PassingNetwork] = {}

    def build_team_networks(self, plays: pd.DataFrame):
        """Build one passing network per offense."""
        for team, team_plays in plays.groupby('offense_team'):
            network = PassingNetwork()
            network.build_from_plays(team_plays)
            self.team_networks[team] = network

    @staticmethod
    def _team_target_shares(network: PassingNetwork) -> np.ndarray:
        """Each receiver's share of all team targets, summed across QBs.

        (Per-QB shares from calculate_target_share sum to 1 for each QB,
        so squaring and summing them would overstate concentration on
        teams that used more than one passer.)
        """
        targets = np.array([
            network.graph.in_degree(r, weight='weight')
            for r in network.receiver_nodes
        ], dtype=float)
        total = targets.sum()
        return targets / total if total > 0 else np.array([])

    def compare_team_structures(self) -> pd.DataFrame:
        """Compare structural properties across teams."""
        results = []
        for team, network in self.team_networks.items():
            stats = network.get_basic_stats()
            shares = self._team_target_shares(network)
            # Herfindahl-Hirschman index: sum of squared target shares
            hhi = float((shares ** 2).sum()) if len(shares) > 0 else 1.0
            results.append({
                'team': team,
                'receivers': len(network.receiver_nodes),
                'connections': stats['edges'],
                'density': stats['density'],
                'target_hhi': hhi,  # higher = more concentrated
                'target_entropy': self._calculate_entropy(shares)  # higher = more spread
            })
        return pd.DataFrame(results).sort_values('target_entropy', ascending=False)

    @staticmethod
    def _calculate_entropy(shares: np.ndarray) -> float:
        """Shannon entropy (bits) of the team target distribution."""
        probs = shares[shares > 0]
        if len(probs) == 0:
            return 0.0
        return float(-np.sum(probs * np.log2(probs)))
```
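The two concentration measures in `compare_team_structures`, HHI and entropy, are easy to check by hand. With target shares s_i, HHI = sum of s_i^2 and entropy H = -sum of s_i * log2(s_i). Take a hypothetical team whose 100 targets split 50/30/20 across three receivers. Its HHI is 0.25 + 0.09 + 0.04 = 0.38, and its entropy is about 1.485 bits. The standard-library sketch below reproduces both numbers.

```python
import math


def target_shares(targets: dict) -> dict:
    """Convert raw target counts into shares of the team total."""
    total = sum(targets.values())
    return {name: count / total for name, count in targets.items()} if total else {}


def hhi(shares: dict) -> float:
    """Herfindahl-Hirschman index: 1/k for k equal receivers, 1.0 for one."""
    return sum(s ** 2 for s in shares.values())


def entropy_bits(shares: dict) -> float:
    """Shannon entropy in bits: log2(k) for k equal receivers, 0 for one."""
    return -sum(s * math.log2(s) for s in shares.values() if s > 0)


# Hypothetical team: 100 targets split 50/30/20
shares = target_shares({"WR1": 50, "WR2": 30, "TE": 20})
print(round(hhi(shares), 3))           # 0.38
print(round(entropy_bits(shares), 3))  # 1.485
```

The measures move in opposite directions. A higher HHI means a more concentrated target tree, and a higher entropy means a more spread one.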
23.2.2 Network Visualization
```python
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches


class PassingNetworkVisualizer:
    """Visualize passing networks."""

    def __init__(self, network: PassingNetwork):
        self.network = network

    def draw_network(
        self,
        figsize: Tuple[int, int] = (14, 10),
        min_edge_weight: int = 3,
        node_size_factor: float = 100,
        title: str = "Passing Network"
    ) -> plt.Figure:
        """Draw the passing network, hiding connections with few targets."""
        fig, ax = plt.subplots(figsize=figsize)

        # Keep only connections with at least min_edge_weight targets
        edges_to_draw = [
            (u, v) for u, v, d in self.network.graph.edges(data=True)
            if d['weight'] >= min_edge_weight
        ]
        subgraph = self.network.graph.edge_subgraph(edges_to_draw)
        pos = self._get_positions(subgraph)

        # Node size: targets thrown plus targets received (weighted degree)
        node_sizes = [
            subgraph.degree(node, weight='weight') * node_size_factor
            for node in subgraph.nodes()
        ]
        # QBs in red, receivers in blue
        node_colors = [
            '#e74c3c' if node in self.network.qb_nodes else '#3498db'
            for node in subgraph.nodes()
        ]
        # Edge width scales with target count
        edge_widths = [subgraph[u][v]['weight'] / 5 for u, v in edges_to_draw]

        nx.draw_networkx_nodes(
            subgraph, pos, ax=ax,
            node_size=node_sizes,
            node_color=node_colors,
            alpha=0.8
        )
        nx.draw_networkx_edges(
            subgraph, pos, ax=ax,
            edgelist=edges_to_draw,
            width=edge_widths,
            alpha=0.5,
            edge_color='gray',
            arrows=True,
            arrowsize=15,
            connectionstyle='arc3,rad=0.1'
        )
        nx.draw_networkx_labels(
            subgraph, pos, ax=ax,
            font_size=9,
            font_weight='bold'
        )

        # Legend
        qb_patch = mpatches.Patch(color='#e74c3c', label='QB')
        rec_patch = mpatches.Patch(color='#3498db', label='Receiver')
        ax.legend(handles=[qb_patch, rec_patch], loc='upper left')
        ax.set_title(title, fontsize=14, fontweight='bold')
        ax.axis('off')
        return fig

    def _get_positions(self, graph: nx.Graph) -> Dict:
        """Spring layout with the QB(s) pinned at the centre."""
        qbs = [qb for qb in self.network.qb_nodes if qb in graph]
        if not qbs:
            return nx.spring_layout(graph, k=2, iterations=50, seed=42)
        # Pin QBs at the origin (spread slightly along x if there are
        # several) and let the receivers settle around them
        initial = {
            qb: np.array([0.3 * (i - (len(qbs) - 1) / 2), 0.0])
            for i, qb in enumerate(qbs)
        }
        return nx.spring_layout(
            graph, k=2, iterations=50, pos=initial, fixed=qbs, seed=42
        )

    def draw_target_distribution(self, top_n: int = 10) -> plt.Figure:
        """Draw a horizontal bar chart of the top receivers' target shares."""
        shares = self.network.calculate_target_share()
        if len(shares) == 0:
            fig, ax = plt.subplots(figsize=(10, 6))
            ax.text(0.5, 0.5, 'No passing data available',
                    ha='center', va='center')
            ax.axis('off')
            return fig

        top_receivers = shares.nlargest(top_n, 'target_share')
        fig, ax = plt.subplots(figsize=(12, 6))
        bars = ax.barh(
            top_receivers['receiver'],
            top_receivers['target_share'] * 100,
            color='#3498db',
            alpha=0.8
        )
        ax.set_xlabel('Target Share (%)', fontsize=12)
        ax.set_ylabel('Receiver', fontsize=12)
        ax.set_title('Target Distribution', fontsize=14, fontweight='bold')

        # Add value labels at the end of each bar
        for bar, share in zip(bars, top_receivers['target_share']):
            ax.text(
                bar.get_width() + 0.5,
                bar.get_y() + bar.get_height() / 2,
                f'{share * 100:.1f}%',
                va='center'
            )
        ax.invert_yaxis()  # largest share at the top
        plt.tight_layout()
        return fig
```
23.3 Centrality Metrics in Football
23.3.1 Understanding Centrality
```python
class CentralityAnalyzer:
    """Analyze node centrality in football networks."""

    CENTRALITY_INTERPRETATIONS = {
        'degree': 'Number of direct connections',
        'betweenness': 'Importance as a connector between others',
        'closeness': 'Inverse of the average distance to other nodes (higher = closer)',
        'eigenvector': 'Connected to other important nodes',
        'pagerank': 'Importance based on who connects to you'
    }

    def __init__(self, graph: nx.Graph):
        self.graph = graph

    def _distance_graph(self) -> nx.Graph:
        """Copy of the graph with distance = 1 / weight on every edge.

        Path-based metrics treat an edge attribute as a length, so heavily
        used connections must become short distances (assumes positive weights).
        """
        g = self.graph.copy()
        for _, _, d in g.edges(data=True):
            d['distance'] = 1.0 / d.get('weight', 1)
        return g

    def calculate_all_centralities(self) -> pd.DataFrame:
        """Calculate all centrality metrics for nodes."""
        # Degree centrality
        if self.graph.is_directed():
            in_degree = nx.in_degree_centrality(self.graph)
            out_degree = nx.out_degree_centrality(self.graph)
        else:
            in_degree = nx.degree_centrality(self.graph)
            out_degree = in_degree

        # Betweenness and closeness on inverse-weight distances
        dist_graph = self._distance_graph()
        betweenness = nx.betweenness_centrality(dist_graph, weight='distance')
        closeness = nx.closeness_centrality(dist_graph, distance='distance')

        # Eigenvector centrality may fail to converge (common on directed graphs)
        try:
            eigenvector = nx.eigenvector_centrality(
                self.graph, max_iter=1000, weight='weight'
            )
        except nx.NetworkXException:
            eigenvector = {node: 0.0 for node in self.graph.nodes()}

        # PageRank (weights act as connection strength)
        pagerank = nx.pagerank(self.graph, weight='weight')

        results = [
            {
                'node': node,
                'in_degree': in_degree.get(node, 0),
                'out_degree': out_degree.get(node, 0),
                'betweenness': betweenness.get(node, 0),
                'closeness': closeness.get(node, 0),
                'eigenvector': eigenvector.get(node, 0),
                'pagerank': pagerank.get(node, 0)
            }
            for node in self.graph.nodes()
        ]
        return pd.DataFrame(results)

    def identify_key_players(
        self,
        centrality_df: pd.DataFrame,
        metric: str = 'pagerank',
        top_n: int = 10
    ) -> pd.DataFrame:
        """Identify most central players by metric."""
        return centrality_df.nlargest(top_n, metric)[['node', metric]]

    def analyze_network_structure(self) -> Dict[str, float]:
        """Analyze overall network structure."""
        n = self.graph.number_of_nodes()
        return {
            'nodes': n,
            'edges': self.graph.number_of_edges(),
            'density': nx.density(self.graph),
            'avg_clustering': nx.average_clustering(self.graph) if n > 0 else 0.0,
            # Count triangles on the undirected version of the graph
            'transitivity': nx.transitivity(self.graph.to_undirected()),
            'avg_path_length': self._safe_avg_path_length()
        }

    def _safe_avg_path_length(self) -> float:
        """Average shortest-path length in hops, or inf if undefined."""
        try:
            if self.graph.is_directed():
                if nx.is_weakly_connected(self.graph):
                    # Directed path lengths are only all finite inside a
                    # strongly connected component, so use the largest one
                    largest_scc = max(
                        nx.strongly_connected_components(self.graph),
                        key=len
                    )
                    subgraph = self.graph.subgraph(largest_scc)
                    return nx.average_shortest_path_length(subgraph)
            elif nx.is_connected(self.graph):
                return nx.average_shortest_path_length(self.graph)
        except nx.NetworkXException:
            pass  # e.g. the graph is empty
        return float('inf')
```
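PageRank is the metric this chapter leans on most. A node is important if important nodes point to it. The sketch below is a minimal weighted power-iteration version in plain Python. It assumes a damping factor of 0.85, the default in `nx.pagerank`. Nodes with no outgoing edges, here the receivers, spread their score uniformly over all nodes, which matches the NetworkX default for dangling nodes. It shows the mechanics and is not meant to replace `nx.pagerank`; the target counts are hypothetical.

```python
def pagerank(edges: dict, alpha: float = 0.85, tol: float = 1e-10,
             max_iter: int = 200) -> dict:
    """Weighted PageRank by power iteration.

    edges maps (source, target) -> positive weight.
    """
    nodes = sorted({n for edge in edges for n in edge})
    n = len(nodes)
    out_weight = {u: 0.0 for u in nodes}
    for (u, _), w in edges.items():
        out_weight[u] += w
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Score held by dangling nodes (no out-edges) is spread evenly
        dangling = sum(rank[u] for u in nodes if out_weight[u] == 0)
        new = {u: (1 - alpha) / n + alpha * dangling / n for u in nodes}
        # Every other node splits its score in proportion to edge weight
        for (u, v), w in edges.items():
            new[v] += alpha * rank[u] * w / out_weight[u]
        converged = sum(abs(new[u] - rank[u]) for u in nodes) < tol
        rank = new
        if converged:
            break
    return rank


# Hypothetical target counts for one QB
targets = {("QB", "WR1"): 40, ("QB", "WR2"): 25, ("QB", "TE"): 15}
for player, score in sorted(pagerank(targets).items(), key=lambda x: -x[1]):
    print(f"{player}: {score:.3f}")
```

In a one-QB network the QB ends up with the lowest score, because nobody passes to the passer. PageRank in a passing network is therefore mainly informative for receivers.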
23.3.2 Applying Centrality to Player Roles
```python
class PlayerRoleAnalyzer:
    """Analyze player roles using network centrality."""

    def __init__(self, passing_network: PassingNetwork):
        self.network = passing_network
        self.centrality_analyzer = CentralityAnalyzer(passing_network.graph)

    def classify_receiver_roles(self) -> pd.DataFrame:
        """Classify receivers based on network position."""
        centralities = self.centrality_analyzer.calculate_all_centralities()

        # Filter to receivers
        receivers = centralities[
            centralities['node'].isin(self.network.receiver_nodes)
        ].copy()
        if len(receivers) == 0:
            return pd.DataFrame()

        # Volume = total targets (weighted in-degree). Unweighted in-degree
        # only counts distinct passers, so with a single QB it is identical
        # for every receiver and cannot separate roles.
        receivers['targets'] = receivers['node'].map(
            lambda n: self.network.graph.in_degree(n, weight='weight')
        )

        # Normalize each metric by its maximum across receivers
        for col in ['targets', 'betweenness', 'pagerank']:
            max_val = receivers[col].max()
            receivers[f'{col}_norm'] = receivers[col] / max_val if max_val > 0 else 0.0

        receivers['role'] = receivers.apply(self._classify_role, axis=1)
        return receivers

    def _classify_role(self, row: pd.Series) -> str:
        """Classify receiver role based on normalized metrics."""
        # High volume = primary target
        if row.get('targets_norm', 0) > 0.7:
            return 'Primary Target'
        # High betweenness = connector. In a pure QB -> receiver network
        # receivers are sinks with betweenness 0, so this branch only fires
        # when the graph also contains edges leaving receivers.
        elif row.get('betweenness_norm', 0) > 0.5:
            return 'Versatile Option'
        # High PageRank but lower volume = high-value target
        elif row.get('pagerank_norm', 0) > 0.5:
            return 'High-Value Target'
        else:
            return 'Role Player'

    def identify_qb_tendencies(self) -> Dict[str, Any]:
        """Analyze QB tendencies from network structure."""
        results = {}
        for qb in self.network.qb_nodes:
            out_edges = list(self.network.graph.out_edges(qb, data=True))
            if len(out_edges) == 0:
                continue

            total_targets = sum(d['weight'] for _, _, d in out_edges)
            unique_targets = len(out_edges)

            # Target concentration (HHI of this QB's target shares)
            shares = [d['weight'] / total_targets for _, _, d in out_edges]
            hhi = sum(s ** 2 for s in shares)

            # Favorite target = receiver with the most targets
            _, favorite, favorite_data = max(out_edges, key=lambda e: e[2]['weight'])

            results[qb] = {
                'total_targets': total_targets,
                'unique_receivers': unique_targets,
                'concentration_hhi': hhi,
                'favorite_target': favorite,
                'favorite_share': favorite_data['weight'] / total_targets,
                'tendency': 'Spread' if hhi < 0.2 else 'Concentrated'
            }
        return results
```
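The 0.2 cutoff in `identify_qb_tendencies` has a simple reading. k receivers with equal shares give an HHI of exactly 1/k, so 1/HHI acts as an "effective number of receivers". HHI < 0.2 is then the same as saying the QB spreads the ball like more than five equally used targets. The sketch below reproduces the classification from raw target counts. The counts are hypothetical.

```python
def qb_tendency(targets: dict, threshold: float = 0.2) -> tuple:
    """Return (HHI, effective number of receivers, label) from target counts."""
    total = sum(targets.values())
    if total == 0:
        raise ValueError("QB has no targets")
    hhi = sum((count / total) ** 2 for count in targets.values())
    label = "Spread" if hhi < threshold else "Concentrated"
    return hhi, 1 / hhi, label


# Six receivers with equal targets: HHI = 1/6 (about 0.167) -> Spread
print(qb_tendency({f"R{i}": 20 for i in range(6)}))
# One dominant target: HHI = 0.7**2 + 3 * 0.1**2 = 0.52 -> Concentrated
print(qb_tendency({"WR1": 70, "WR2": 10, "TE": 10, "RB": 10}))
```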
23.4 Coaching Trees and Influence Networks
23.4.1 Building Coaching Networks
```python
from collections import deque


class CoachingNetwork(FootballNetwork):
    """Network of coaching relationships (edges point head coach -> assistant)."""

    def __init__(self):
        super().__init__(directed=True)

    def build_from_history(self, coaching_history: pd.DataFrame):
        """
        Build network from coaching history data.

        Expected columns (one row per coach per season):
        - coach_name
        - year
        - team
        - position
        - head_coach (who was HC when they were there)
        """
        # Add coach nodes, recording each coach's first season
        first_years = coaching_history.groupby('coach_name')['year'].min()
        for coach, first_year in first_years.items():
            self.add_node(FootballNode(
                node_id=coach,
                node_type='coach',
                attributes={'first_year': first_year}
            ))

        # Build mentor -> mentee relationships in chronological order
        for _, row in coaching_history.sort_values('year').iterrows():
            mentor = row['head_coach']
            mentee = row['coach_name']
            if pd.notna(mentor) and mentor != mentee:
                if self.graph.has_edge(mentor, mentee):
                    # Each additional season together adds 1 to the weight
                    self.graph[mentor][mentee]['weight'] += 1
                else:
                    # A head coach missing from coach_name is created here
                    # implicitly, without a node_type attribute
                    self.add_edge(FootballEdge(
                        source=mentor,
                        target=mentee,
                        weight=1,
                        edge_type='mentor',
                        attributes={
                            'team': row['team'],
                            'first_year': row['year']
                        }
                    ))

    def get_coaching_tree(self, head_coach: str) -> Dict[str, Any]:
        """Get up to three generations of a head coach's tree."""
        tree = {'root': head_coach, 'direct': [], 'second_gen': [], 'third_gen': []}
        if head_coach not in self.graph:
            return tree

        seen = {head_coach}
        current = [head_coach]
        for key in ('direct', 'second_gen', 'third_gen'):
            next_gen = []
            for coach in current:
                for mentee in self.graph.successors(coach):
                    if mentee not in seen:  # avoid duplicates and cycles
                        seen.add(mentee)
                        next_gen.append(mentee)
            tree[key] = next_gen
            current = next_gen
        return tree

    def calculate_coaching_influence(self) -> pd.DataFrame:
        """Calculate influence metrics for coaches."""
        results = []
        for coach in self.graph.nodes():
            # Direct mentees: everyone who worked under this coach
            mentee_count = self.graph.out_degree(coach)
            # Total descendants (the full tree below this coach)
            descendant_count = len(nx.descendants(self.graph, coach))
            # Seasons of mentoring, summed over all mentees
            total_years = self.graph.out_degree(coach, weight='weight')
            results.append({
                'coach': coach,
                'direct_mentees': mentee_count,
                'total_descendants': descendant_count,
                'mentoring_years': total_years,
                'avg_years_per_mentee': total_years / mentee_count if mentee_count > 0 else 0
            })
        return pd.DataFrame(results).sort_values('total_descendants', ascending=False)


class CoachingTreeVisualizer:
    """Visualize coaching trees."""

    def __init__(self, network: CoachingNetwork):
        self.network = network

    def draw_tree(
        self,
        root_coach: str,
        max_depth: int = 3,
        figsize: Tuple[int, int] = (16, 12)
    ) -> plt.Figure:
        """Draw the coaching tree below root_coach, up to max_depth generations."""
        if root_coach not in self.network.graph:
            raise ValueError(f"Unknown coach: {root_coach}")

        # Collect tree nodes generation by generation
        tree_nodes = {root_coach}
        current_level = {root_coach}
        for _ in range(max_depth):
            next_level = set()
            for coach in current_level:
                next_level.update(self.network.graph.successors(coach))
            next_level -= tree_nodes  # skip coaches already placed
            tree_nodes.update(next_level)
            current_level = next_level
        subgraph = self.network.graph.subgraph(tree_nodes)

        fig, ax = plt.subplots(figsize=figsize)
        pos = self._hierarchical_layout(subgraph, root_coach)

        # Node size grows with the coach's total number of descendants
        node_sizes = [
            100 + len(nx.descendants(self.network.graph, node)) * 50
            for node in subgraph.nodes()
        ]

        nx.draw_networkx_nodes(
            subgraph, pos, ax=ax,
            node_size=node_sizes,
            node_color='#2ecc71',
            alpha=0.8
        )
        nx.draw_networkx_edges(
            subgraph, pos, ax=ax,
            edge_color='gray',
            alpha=0.5,
            arrows=True,
            arrowsize=15
        )
        nx.draw_networkx_labels(subgraph, pos, ax=ax, font_size=8)

        ax.set_title(f"Coaching Tree: {root_coach}", fontsize=14, fontweight='bold')
        ax.axis('off')
        return fig

    def _hierarchical_layout(
        self,
        graph: nx.DiGraph,
        root: str
    ) -> Dict[str, np.ndarray]:
        """Place nodes in rows by BFS depth from the root (root at the top)."""
        # BFS to get each node's level
        levels = {root: 0}
        queue = deque([root])
        while queue:
            current = queue.popleft()
            for successor in graph.successors(current):
                if successor not in levels:
                    levels[successor] = levels[current] + 1
                    queue.append(successor)

        # Group nodes by level
        level_nodes = defaultdict(list)
        for node, level in levels.items():
            level_nodes[level].append(node)

        # Spread each level evenly across the width; deeper levels sit lower
        pos = {}
        max_level = max(levels.values())
        for level, nodes in level_nodes.items():
            y = 1 - level / (max_level + 1)
            for i, node in enumerate(nodes):
                x = (i + 1) / (len(nodes) + 1)
                pos[node] = np.array([x, y])
        return pos
```
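Both `get_coaching_tree` and `_hierarchical_layout` expand the tree level by level. Each generation consists of the successors of the previous generation that have not been reached yet. The sketch below shows that breadth-first traversal on a plain adjacency dict. It assumes a hypothetical mentor -> mentee mapping whose coach names are placeholders. The `depth` bookkeeping places a coach who worked under two members of the same tree at the first generation where they appear, so that coach is counted once.

```python
from collections import deque


def generations(mentees: dict, root: str, max_depth: int = 3) -> dict:
    """Map generation number -> coaches first reached at that depth."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        coach = queue.popleft()
        if depth[coach] == max_depth:
            continue  # do not expand beyond the requested depth
        for mentee in mentees.get(coach, []):
            if mentee not in depth:
                depth[mentee] = depth[coach] + 1
                queue.append(mentee)
    levels = {}
    for coach, d in depth.items():
        levels.setdefault(d, []).append(coach)
    return levels


# Hypothetical tree: Coach E worked under both Coach B and Coach C
tree = {
    "Coach A": ["Coach B", "Coach C"],
    "Coach B": ["Coach D", "Coach E"],
    "Coach C": ["Coach E"],
    "Coach D": ["Coach F"],
}
print(generations(tree, "Coach A"))
# {0: ['Coach A'], 1: ['Coach B', 'Coach C'], 2: ['Coach D', 'Coach E'], 3: ['Coach F']}
```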
23.5 Recruiting Networks
23.5.1 Pipeline Analysis
```python
class RecruitingNetwork(FootballNetwork):
    """Network of recruiting relationships (edges point high school -> college)."""

    def __init__(self):
        super().__init__(directed=True)

    def build_from_recruiting_data(self, recruits: pd.DataFrame):
        """
        Build network from recruiting data.

        Expected columns:
        - player_name
        - high_school
        - city, state
        - college
        - star_rating
        - position
        """
        # Add high school nodes (location taken from the first row per school)
        for _, row in recruits.drop_duplicates('high_school').iterrows():
            self.add_node(FootballNode(
                node_id=row['high_school'],
                node_type='high_school',
                attributes={
                    'city': row.get('city', ''),
                    'state': row.get('state', '')
                }
            ))

        # Add college nodes
        for college in recruits['college'].unique():
            self.add_node(FootballNode(
                node_id=college,
                node_type='college',
                attributes={}
            ))

        # Add recruiting edges: weight = number of recruits in the pipeline
        pipeline = recruits.groupby(['high_school', 'college']).agg(
            recruits=('player_name', 'count'),
            avg_stars=('star_rating', 'mean')
        ).reset_index()
        for _, row in pipeline.iterrows():
            self.add_edge(FootballEdge(
                source=row['high_school'],
                target=row['college'],
                weight=row['recruits'],
                edge_type='recruited',
                attributes={'avg_stars': row['avg_stars']}
            ))

    def identify_pipeline_schools(
        self,
        college: str,
        min_recruits: int = 3
    ) -> pd.DataFrame:
        """Identify pipeline high schools for a college."""
        columns = ['high_school', 'recruits', 'avg_stars', 'state']
        results = [
            {
                'high_school': hs,
                'recruits': data['weight'],
                # Edge attributes live directly in the edge data dict
                'avg_stars': data.get('avg_stars', 0),
                'state': self.graph.nodes[hs].get('state', '')
            }
            for hs, _, data in self.graph.in_edges(college, data=True)
            if data['weight'] >= min_recruits
        ]
        return pd.DataFrame(results, columns=columns).sort_values(
            'recruits', ascending=False
        )

    def analyze_geographic_reach(self, college: str) -> Dict[str, Any]:
        """Analyze geographic recruiting reach."""
        in_edges = list(self.graph.in_edges(college, data=True))
        if len(in_edges) == 0:
            return {}

        # Recruits per feeder state
        states = defaultdict(int)
        for hs, _, data in in_edges:
            state = self.graph.nodes[hs].get('state', 'Unknown')
            states[state] += data['weight']

        total = sum(states.values())
        return {
            'total_states': len(states),
            'total_recruits': total,
            'state_distribution': dict(states),
            'top_state': max(states, key=states.get),
            'concentration': max(states.values()) / total if total > 0 else 0
        }


class RecruitingNetworkAnalyzer:
    """Analyze recruiting network patterns."""

    def __init__(self, network: RecruitingNetwork):
        self.network = network

    def find_competing_schools(
        self,
        college: str,
        min_overlap: int = 5
    ) -> pd.DataFrame:
        """Find colleges that recruit from the same high schools."""
        graph = self.network.graph
        # High schools that send players to this college
        my_hs = set(graph.predecessors(college)) if college in graph else set()

        # Recruits those same high schools sent to other colleges
        competitors = defaultdict(int)
        for hs in my_hs:
            for _, target, data in graph.out_edges(hs, data=True):
                if target != college:
                    competitors[target] += data['weight']

        results = [
            {'competitor': c, 'shared_recruits': count}
            for c, count in competitors.items()
            if count >= min_overlap
        ]
        return pd.DataFrame(
            results, columns=['competitor', 'shared_recruits']
        ).sort_values('shared_recruits', ascending=False)

    def calculate_recruiting_centrality(self) -> pd.DataFrame:
        """Calculate centrality metrics for colleges."""
        graph = self.network.graph
        college_nodes = [
            n for n, d in graph.nodes(data=True)
            if d.get('node_type') == 'college'
        ]

        results = []
        for college in college_nodes:
            in_edges = list(graph.in_edges(college, data=True))
            # Number of feeder schools and total recruits
            feeders = len(in_edges)
            total_recruits = sum(d['weight'] for _, _, d in in_edges)
            # Recruit-weighted average star rating
            weighted_stars = sum(
                d.get('avg_stars', 0) * d['weight'] for _, _, d in in_edges
            )
            avg_stars = weighted_stars / total_recruits if total_recruits > 0 else 0
            # Geographic diversity (distinct feeder states)
            states = {graph.nodes[u].get('state', '') for u, _, _ in in_edges}
            results.append({
                'college': college,
                'feeder_schools': feeders,
                'total_recruits': total_recruits,
                'avg_star_rating': avg_stars,
                'state_diversity': len(states)
            })
        return pd.DataFrame(results).sort_values('total_recruits', ascending=False)
```
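`find_competing_schools` is a two-hop walk on the bipartite high-school/college graph. It first collects a college's feeder schools, then sums what those same schools send to every other college. The sketch below runs that walk in plain Python on hypothetical pipeline counts; the school names are placeholders.

```python
from collections import defaultdict

# Hypothetical (high school, college) -> number of recruits
pipeline = {
    ("North HS", "State U"): 6,
    ("North HS", "Tech"): 2,
    ("South HS", "State U"): 3,
    ("South HS", "Tech"): 4,
    ("South HS", "Coastal"): 1,
    ("East HS", "Coastal"): 5,
}


def competitors(pipeline: dict, college: str, min_overlap: int = 1) -> dict:
    """Recruits sent to rival colleges by this college's feeder schools."""
    feeders = {hs for (hs, c) in pipeline if c == college}
    shared = defaultdict(int)
    for (hs, other), count in pipeline.items():
        if hs in feeders and other != college:
            shared[other] += count
    ranked = sorted(shared.items(), key=lambda item: -item[1])
    return {c: n for c, n in ranked if n >= min_overlap}


print(competitors(pipeline, "State U"))  # {'Tech': 6, 'Coastal': 1}
```

The count measures how strongly a rival draws on the same feeder schools. It does not count head-to-head battles for individual players, since each recruit signs with only one college.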
23.6 Community Detection
23.6.1 Finding Communities in Football Networks
```python
from community import community_louvain  # provided by the python-louvain package


class CommunityAnalyzer:
    """Detect and analyze communities in football networks."""

    def __init__(self, graph: nx.Graph):
        self.graph = graph
        self.communities: Optional[Dict[str, int]] = None

    def _to_weighted_undirected(self) -> nx.Graph:
        """Undirected view of the graph with reciprocal edge weights summed.

        DiGraph.to_undirected() does not sum the weights of u->v and v->u.
        """
        if not self.graph.is_directed():
            return self.graph
        undirected = nx.Graph()
        undirected.add_nodes_from(self.graph.nodes(data=True))
        for u, v, d in self.graph.edges(data=True):
            w = d.get('weight', 1)
            if undirected.has_edge(u, v):
                undirected[u][v]['weight'] += w
            else:
                undirected.add_edge(u, v, weight=w)
        return undirected

    def detect_communities_louvain(self, random_state: Optional[int] = 42) -> Dict[str, int]:
        """Detect communities with Louvain (requires an undirected graph).

        A fixed random_state makes the partition reproducible.
        """
        undirected = self._to_weighted_undirected()
        self.communities = community_louvain.best_partition(
            undirected, weight='weight', random_state=random_state
        )
        return self.communities

    def get_community_summary(self) -> pd.DataFrame:
        """Summarize detected communities."""
        if self.communities is None:
            self.detect_communities_louvain()

        # Group nodes by community
        community_nodes = defaultdict(list)
        for node, comm in self.communities.items():
            community_nodes[comm].append(node)

        results = []
        for comm_id, nodes in community_nodes.items():
            subgraph = self.graph.subgraph(nodes)
            results.append({
                'community': comm_id,
                'size': len(nodes),
                'internal_edges': subgraph.number_of_edges(),
                'density': nx.density(subgraph),
                'sample_members': nodes[:5]
            })
        return pd.DataFrame(results).sort_values('size', ascending=False)

    def analyze_inter_community_connections(self) -> pd.DataFrame:
        """Count edges that cross between each pair of communities."""
        if self.communities is None:
            self.detect_communities_louvain()

        inter_edges = defaultdict(int)
        for u, v in self.graph.edges():
            comm_u = self.communities[u]
            comm_v = self.communities[v]
            if comm_u != comm_v:
                inter_edges[tuple(sorted((comm_u, comm_v)))] += 1

        results = [
            {'community_1': k[0], 'community_2': k[1], 'edges': count}
            for k, count in inter_edges.items()
        ]
        return pd.DataFrame(
            results, columns=['community_1', 'community_2', 'edges']
        ).sort_values('edges', ascending=False)


class PlayCallingCommunityAnalyzer:
    """Analyze play-calling patterns using community detection."""

    def __init__(self, plays: pd.DataFrame):
        self.plays = plays
        self.play_network: Optional[nx.Graph] = None

    def build_play_sequence_network(self, window: int = 2):
        """Link each play type to those called within `window` snaps later in the same drive."""
        self.play_network = nx.Graph()
        for _, drive in self.plays.groupby(['game_id', 'drive_id']):
            play_types = drive.sort_values('play_id')['play_type'].tolist()
            for i, play1 in enumerate(play_types):
                for play2 in play_types[i + 1:i + 1 + window]:
                    # Repeated play types (e.g. pass then pass) become self-loops
                    if self.play_network.has_edge(play1, play2):
                        self.play_network[play1][play2]['weight'] += 1
                    else:
                        self.play_network.add_edge(play1, play2, weight=1)

    def find_play_clusters(self) -> Dict[str, List[str]]:
        """Find clusters of related plays."""
        if self.play_network is None:
            self.build_play_sequence_network()
        analyzer = CommunityAnalyzer(self.play_network)
        communities = analyzer.detect_communities_louvain()

        # Group plays by community
        clusters = defaultdict(list)
        for play, comm in communities.items():
            clusters[f'Cluster_{comm}'].append(play)
        return dict(clusters)
```
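Louvain works by greedily maximizing modularity. Let m be the total number of edges, L_c the number of edges inside community c, and d_c the summed degree of its nodes. Then Q = sum over communities c of (L_c / m - (d_c / (2m))^2). In words, Q compares the edges that fall inside communities with what a random graph with the same degrees would give. The unweighted sketch below scores a given partition with that formula. The two-triangle graph is a standard toy example, not football data.

```python
from collections import defaultdict


def modularity(edges: list, partition: dict) -> float:
    """Newman modularity of a partition of an unweighted, undirected graph."""
    m = len(edges)
    internal = defaultdict(int)  # L_c: edges with both ends in community c
    degree = defaultdict(int)    # d_c: total degree of nodes in community c
    for u, v in edges:
        degree[partition[u]] += 1
        degree[partition[v]] += 1
        if partition[u] == partition[v]:
            internal[partition[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by one bridge edge (c, d)
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
lumped = {node: 0 for node in split}
print(round(modularity(edges, split), 4))  # 0.3571 (= 5/14)
print(modularity(edges, lumped))           # 0.0
```

Louvain would keep the two-triangle split because it scores higher than lumping everything together. `best_partition` searches for such partitions automatically. The weighted version replaces edge counts with summed edge weights and node degrees with weighted degrees.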
Summary
Network analysis provides powerful tools for understanding the complex relationships in football:
- Passing Networks reveal target distributions, receiver roles, and QB tendencies that traditional stats miss.
- Centrality Metrics quantify player importance beyond raw statistics, identifying who connects the offense.
- Coaching Trees trace the flow of ideas and strategies across generations of coaches.
- Recruiting Networks uncover pipeline relationships and geographic strategies.
- Community Detection finds hidden patterns in play-calling and team relationships.
The key insight is that football is a system of relationships, and network analysis helps us see and measure those relationships directly.
Key Takeaways
- Networks represent entities (nodes) and their relationships (edges)
- Passing networks reveal offensive structure beyond traditional stats
- Centrality metrics identify influential players and coaches
- Community detection finds natural groupings in complex systems
- Visualization is essential for communicating network insights