Chapter 10 Exercises: Passing Networks and Analysis
Overview
These exercises progressively build your skills in constructing, analyzing, and interpreting passing networks. Starting with theoretical foundations, you will advance through practical implementation to sophisticated tactical applications.
Data Sources: StatsBomb Open Data (World Cup 2018, La Liga, Women's World Cup)
Libraries Required: pandas, numpy, networkx, matplotlib, statsbombpy, mplsoccer
Part 1: Foundations and Graph Theory (Exercises 1-6)
Exercise 1: Graph Theory Concepts
Objective: Reinforce understanding of fundamental graph theory.
Questions:
a) Define the following terms in the context of passing networks:
   - Node
   - Edge
   - Directed graph
   - Weighted graph
   - Adjacency matrix
b) For a team of 11 players, what is the maximum number of unique directed edges (passing combinations) possible?
c) If a passing network has 45 unique passing combinations (edges) out of the maximum possible, what is the network density?
d) Explain why passing networks are typically directed rather than undirected. When might an undirected representation be appropriate?
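As a reference point for parts (b) and (c), the usual density definition for a directed graph without self-loops can be written as below. Here $n$ is the number of players and $m$ the number of observed directed edges; the 5-player numbers are a made-up illustration, not the answer to the exercise.

```latex
E_{\max} = n(n-1), \qquad D = \frac{m}{E_{\max}} = \frac{m}{n(n-1)}
% Illustration with a hypothetical 5-a-side team (n = 5) and m = 12 observed edges:
% E_{\max} = 5 \cdot 4 = 20, \qquad D = 12 / 20 = 0.6
```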
Exercise 2: Adjacency Matrix Construction
Objective: Build adjacency matrices manually to understand the data structure.
Task: Given the following simplified passing data for a 5-player mini-team:
| Passer | Receiver | Passes |
|---|---|---|
| A | B | 12 |
| A | C | 8 |
| B | A | 10 |
| B | C | 15 |
| B | D | 7 |
| C | B | 6 |
| C | E | 11 |
| D | C | 4 |
| D | E | 9 |
| E | A | 3 |
| E | D | 5 |
a) Construct the 5×5 weighted adjacency matrix $A$ where $A_{ij}$ represents passes from player $i$ to player $j$.
b) Calculate the row sums (out-degree) and column sums (in-degree) for each player.
c) Identify which player has:
   - Highest out-degree (most passes made)
   - Highest in-degree (most passes received)
   - Most balanced passing (closest in-degree to out-degree)
d) Calculate the reciprocity of the network: what proportion of passing connections are two-way?
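Before working through the table by hand, it may help to see the same bookkeeping in code. The following is a minimal sketch on a hypothetical 3-player data set (not the exercise data), using only the standard library. It assumes the edge-based reading of reciprocity, that is, the share of directed edges whose reverse edge also exists; a pair-based definition would give a different number.

```python
# Hypothetical pass counts as (passer, receiver, passes); not the exercise data.
passes = [("X", "Y", 4), ("Y", "X", 2), ("Y", "Z", 5), ("Z", "X", 1)]

players = sorted({p for passer, receiver, _ in passes for p in (passer, receiver)})
index = {name: i for i, name in enumerate(players)}

# matrix[i][j] holds the number of passes from player i to player j.
n = len(players)
matrix = [[0] * n for _ in range(n)]
for passer, receiver, count in passes:
    matrix[index[passer]][index[receiver]] += count

# Row sums give weighted out-degree, column sums give weighted in-degree.
out_degree = {p: sum(matrix[index[p]]) for p in players}
in_degree = {p: sum(row[index[p]] for row in matrix) for p in players}

# Edge-based reciprocity: share of directed edges that have a reverse edge.
edges = [(i, j) for i in range(n) for j in range(n) if matrix[i][j] > 0]
reciprocity = sum(1 for i, j in edges if matrix[j][i] > 0) / len(edges)
```

In this toy data only the X-Y connection runs in both directions, so two of the four directed edges are reciprocated.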
Exercise 3: Network Density Analysis
Objective: Understand and calculate network density across different match contexts.
Task: Using StatsBomb World Cup 2018 data:
a) Load events from three matches with different scorelines:
   - A close match (e.g., 1-0 or 1-1)
   - A dominant win (e.g., 3-0 or higher)
   - A high-scoring draw (e.g., 2-2 or 3-3)
b) For each team in each match, calculate:
   - Total passes
   - Unique passing combinations
   - Network density
c) Analyze the relationship between match outcome and network density. Do winning teams have higher or lower density?
d) Does the margin of victory correlate with density differences between teams?
Exercise 4: Degree Distribution
Objective: Analyze the distribution of passing connections across players.
Task:
a) Load events from 5 different World Cup 2018 matches.
b) For each team (10 total), calculate the weighted degree (total passes in + out) for each player.
c) Create histograms showing the degree distribution for:
   - A possession-dominant team (e.g., Spain, Germany)
   - A counter-attacking team (e.g., Russia, South Korea)
d) Calculate the Gini coefficient of the degree distribution for each team. A higher Gini coefficient indicates a more unequal distribution of passing involvement.
e) Interpret: Do some playing styles produce more equal passing distributions?
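For part (d), one common way to compute the Gini coefficient is through the mean absolute difference between all pairs of values. The sketch below is a plain-Python illustration of that formula; the function name `gini` and the sample inputs are assumptions made for the example.

```python
def gini(values):
    """Gini coefficient via the mean absolute difference.

    Returns 0.0 for a perfectly equal distribution; values approach 1
    as the total concentrates in a single entry.
    """
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    abs_diff_sum = sum(abs(a - b) for a in values for b in values)
    # Equivalent to abs_diff_sum / (2 * n**2 * mean), since mean = total / n.
    return abs_diff_sum / (2 * n * total)


# Hypothetical weighted degrees for two four-player groups.
equal_team = gini([5, 5, 5, 5])
star_team = gini([0, 0, 0, 10])
```

The double loop is quadratic in the number of players, which is negligible for squads of 11 to 14.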
Exercise 5: Basic Network Metrics
Objective: Calculate fundamental network metrics from scratch.
Task: Implement Python functions to calculate the following metrics without using NetworkX built-in functions:
def calculate_density(adjacency_matrix):
    """
    Calculate network density from adjacency matrix.
    Returns: float between 0 and 1
    """
    # Your implementation

def calculate_avg_degree(adjacency_matrix):
    """
    Calculate average (weighted) degree.
    Returns: float
    """
    # Your implementation

def calculate_reciprocity(adjacency_matrix):
    """
    Calculate reciprocity: proportion of edges with reverse edges.
    Returns: float between 0 and 1
    """
    # Your implementation
Test your functions against NetworkX implementations to verify correctness.
Exercise 6: Directed vs Undirected Analysis
Objective: Compare directed and undirected network representations.
Task:
a) Load a match and build both directed and undirected versions of the passing network for one team.
b) For the undirected version, combine pass weights in both directions (A→B and B→A become a single edge with combined weight).
c) Calculate and compare:
   - Number of edges
   - Network density
   - Degree distribution
d) When would an analyst prefer each representation? Provide specific use cases.
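The merging step in part (b) can be sketched as follows. This is a minimal standard-library illustration on hypothetical counts; the dictionary layout keyed by `(passer, receiver)` is an assumption for the example, and in practice a NetworkX graph would likely hold the same information.

```python
from collections import defaultdict

# Hypothetical directed pass counts keyed by (passer, receiver).
directed = {("A", "B"): 12, ("B", "A"): 10, ("A", "C"): 8}

# Sort each pair so that A->B and B->A map to the same undirected key.
undirected = defaultdict(int)
for (passer, receiver), count in directed.items():
    pair = tuple(sorted((passer, receiver)))
    undirected[pair] += count
undirected = dict(undirected)

# The maximum edge count differs: n(n-1) directed versus n(n-1)/2 undirected.
n_players = len({p for pair in directed for p in pair})
directed_density = len(directed) / (n_players * (n_players - 1))
undirected_density = len(undirected) / (n_players * (n_players - 1) / 2)
```

Note that the two densities are not directly comparable, because merging halves the denominator while only removing edges that were reciprocated.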
Part 2: Centrality Measures (Exercises 7-12)
Exercise 7: Degree Centrality
Objective: Calculate and interpret degree centrality measures.
Task:
a) Load a World Cup 2018 match and calculate for each player:
   - Raw in-degree (number of distinct teammates the player received passes from)
   - Raw out-degree (number of distinct teammates the player passed to)
   - Weighted in-degree (total passes received)
   - Weighted out-degree (total passes made)
b) Normalize centrality scores to the range [0, 1].
c) Create a scatter plot with out-degree on x-axis and in-degree on y-axis. Label notable players.
d) Identify players who are:
   - Primarily distributors (high out, lower in)
   - Primarily targets (high in, lower out)
   - Balanced hubs (high both)
   - Peripheral (low both)
Exercise 8: Betweenness Centrality
Objective: Calculate and interpret betweenness centrality.
Task:
a) Implement a simplified betweenness calculation for a passing network:
def calculate_betweenness(G):
    """
    Calculate betweenness centrality for all nodes.
    Betweenness(v) = sum over all pairs (s, t) of other nodes of
    (shortest s-t paths through v) / (all shortest s-t paths).
    """
    # Your implementation
b) Compare your implementation with NetworkX's betweenness_centrality.
c) For a World Cup match, identify the top 3 players by betweenness for each team.
d) Analyze: Which positions typically have the highest betweenness? Why?
e) Find a match where the player with highest betweenness is NOT the player with highest degree. Explain this discrepancy.
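For reference, the standard shortest-path definition of betweenness can be written as below, where $\sigma_{st}$ is the number of shortest paths from $s$ to $t$ and $\sigma_{st}(v)$ is the number of those paths that pass through $v$. Pairs with no path between them are assumed to be skipped; normalization conventions vary between libraries.

```latex
C_B(v) = \sum_{\substack{s \neq v,\; t \neq v \\ s \neq t}} \frac{\sigma_{st}(v)}{\sigma_{st}}
```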
Exercise 9: PageRank Analysis
Objective: Apply PageRank to passing networks.
Task:
a) Calculate PageRank for a World Cup team's passing network using NetworkX.
b) Experiment with different damping factors (0.5, 0.7, 0.85, 0.95). How does this affect rankings?
c) Compare PageRank rankings with degree centrality rankings. Which players move up or down?
d) Explain intuitively why PageRank might rank some players differently than raw degree centrality.
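To build intuition for the damping factor in part (b), the sketch below runs a plain power iteration on a hypothetical four-player network. It is an illustration only, not NetworkX's implementation: the function name, the fixed iteration count, and the uniform handling of players with no outgoing passes are all assumptions made for the example.

```python
def pagerank(weights, damping=0.85, iterations=100):
    """Power-iteration PageRank on a dict {(source, target): weight}."""
    nodes = sorted({n for edge in weights for n in edge})
    out_weight = {n: 0.0 for n in nodes}
    for (src, _), w in weights.items():
        out_weight[src] += w

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by players with no outgoing passes is spread uniformly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0)
        base = (1 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {n: base for n in nodes}
        # Each passer shares its rank in proportion to its pass counts.
        for (src, dst), w in weights.items():
            new_rank[dst] += damping * rank[src] * w / out_weight[src]
        rank = new_rank
    return rank


# Hypothetical pass counts, not real match data.
passes = {("GK", "CB"): 10, ("CB", "CM"): 15, ("CM", "CB"): 5,
          ("CM", "ST"): 8, ("ST", "CM"): 4}
ranks_by_damping = {d: pagerank(passes, damping=d) for d in (0.5, 0.85)}
```

A lower damping factor pulls every score toward the uniform value 1/n, so differences between players shrink; a higher one lets the passing structure dominate. The goalkeeper in this toy network receives no passes and therefore keeps only the baseline share.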
Exercise 10: Closeness Centrality
Objective: Calculate and interpret closeness centrality.
Task:
a) For a selected team, calculate closeness centrality:
   - Using unweighted edges
   - Using weighted edges (convert weights to distances: distance = 1/weight)
b) Identify the player with highest closeness centrality. What position do they play?
c) Calculate the average shortest path from each player to all others. Which player can reach all teammates most efficiently?
d) Compare closeness centrality between a team that plays short passes and one that plays long balls. What patterns emerge?
Exercise 11: Centrality Comparison
Objective: Compare different centrality measures for the same network.
Task:
a) For one team in a World Cup match, calculate:
   - Degree centrality
   - Betweenness centrality
   - Closeness centrality
   - PageRank
b) Create a correlation matrix showing how different centrality measures relate to each other.
c) Create a parallel coordinates plot showing each player's score across all four measures.
d) Identify players who rank highly on some measures but not others. Explain what this reveals about their role.
Exercise 12: Centrality Over Time
Objective: Analyze how centrality changes during a match.
Task:
a) Divide a match into 6 periods (0-15, 15-30, 30-45, 45-60, 60-75, 75-90 minutes).
b) Calculate degree centrality for each player in each period.
c) Create a line plot showing centrality evolution for the top 5 players.
d) Identify:
   - Players whose centrality increases over time
   - Players whose centrality decreases
   - Matches where a substitution caused major centrality shifts
e) Correlate centrality changes with match events (goals, red cards, substitutions).
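The binning in part (a) can be sketched as below. This is a minimal illustration on hypothetical events; the helper name `period_index` and the choice to fold stoppage time into the final bin are assumptions. StatsBomb events also carry a `period` field, which you would probably want to consult so that first-half stoppage time does not land in the 45-60 bin.

```python
def period_index(minute, period_length=15, n_periods=6):
    """Map a match minute to a period index in 0..n_periods-1.

    Assumption: minutes beyond the last boundary (stoppage time)
    are folded into the final period.
    """
    return min(int(minute) // period_length, n_periods - 1)


# Hypothetical pass events as (minute, passer, receiver).
events = [(3, "A", "B"), (14, "B", "A"), (15, "A", "C"),
          (47, "C", "A"), (93, "B", "C")]

# Group passes by period so a separate network can be built for each bin.
passes_by_period = {i: [] for i in range(6)}
for minute, passer, receiver in events:
    passes_by_period[period_index(minute)].append((passer, receiver))
```

Each list in `passes_by_period` can then be turned into its own graph and fed to the centrality calculation from part (b).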
Part 3: Network-Level Metrics (Exercises 13-18)
Exercise 13: Clustering Coefficient
Objective: Calculate and interpret clustering in passing networks.
Task:
a) Implement local clustering coefficient calculation:
def local_clustering(G, node):
    """
    Calculate local clustering coefficient for a node.
    Clustering(v) = (triangles through v) / (possible triangles through v)
    """
    # Your implementation
b) Calculate global (average) clustering coefficient for teams in 5 World Cup matches.
c) Compare clustering between:
   - Possession-oriented teams
   - Direct-play teams
d) High clustering indicates triangular passing. Which formations and styles produce the highest clustering?
Exercise 14: Centralization Index
Objective: Measure how concentrated passing is around key players.
Task:
a) Implement network centralization calculation:
def network_centralization(centralities):
    """
    Calculate how centralized the network is.
    Centralization = sum(max_centrality - each_centrality) / max_possible_sum
    """
    # Your implementation
b) Calculate centralization for all teams in 10 World Cup matches.
c) Rank teams by centralization. Which teams are most/least centralized?
d) Analyze: Do highly centralized teams tend to win or lose? Create a scatter plot of centralization vs. goals scored.
Exercise 15: Network Entropy
Objective: Measure passing unpredictability through entropy.
Task:
a) Calculate the Shannon entropy of passing distributions:
def passing_entropy(G):
    """
    Calculate entropy of the passing distribution.
    H = -sum(p_ij * log(p_ij)) for all edges
    """
    # Your implementation
b) Compare entropy between first half and second half of matches. Does unpredictability change?
c) Calculate entropy for winning vs. losing teams. Are winners more or less predictable?
d) Find matches where one team has significantly higher entropy than the other. What might cause this?
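The docstring in part (a) leaves $p_{ij}$ undefined. One plausible reading, assumed here, is that $p_{ij}$ is the share of all the team's passes that go from player $i$ to player $j$, with $w_{ij}$ the pass count on edge $(i, j)$ and $E$ the set of edges. An alternative is to compute a separate distribution per passer. Under the first reading:

```latex
p_{ij} = \frac{w_{ij}}{\sum_{(k,l) \in E} w_{kl}}, \qquad
H = -\sum_{(i,j) \in E} p_{ij} \log p_{ij}
% Illustration: two edges with weights 3 and 1 give p = (0.75, 0.25) and
% H = -(0.75 \ln 0.75 + 0.25 \ln 0.25) \approx 0.562 \text{ nats}
```

With $|E|$ edges the maximum is $\log |E|$, reached when every combination is used equally often, so dividing by $\log |E|$ gives a value between 0 and 1 that is easier to compare across teams.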
Exercise 16: Passing Flow Analysis
Objective: Analyze directional flow through the pitch.
Task:
a) Divide the pitch into three zones along its length (defensive, middle, and attacking thirds) and calculate:
   - Passes within each zone
   - Passes forward (zone n to zone n+1)
   - Passes backward (zone n to zone n-1)
b) Create a Sankey diagram showing passing flow between zones.
c) Calculate the "progression ratio" = forward passes / backward passes for each team.
d) Compare progression ratios between:
   - Winners vs. losers
   - Home vs. away teams
   - Teams playing with lead vs. teams chasing
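The zone bookkeeping for parts (a) and (c) can be sketched as follows. The example assumes StatsBomb-style coordinates with x running from 0 to 120 toward the attacking goal; the function names and sample passes are hypothetical. Unlike the strict "zone n to zone n+1" wording, this sketch also counts a pass that skips the middle zone as forward or backward.

```python
PITCH_LENGTH = 120.0  # assumption: StatsBomb-style pitch, x in [0, 120]
N_ZONES = 3


def zone(x):
    """Return 0 (defensive), 1 (middle) or 2 (attacking) for an x coordinate."""
    return min(int(x // (PITCH_LENGTH / N_ZONES)), N_ZONES - 1)


def classify(start_x, end_x):
    """Label a pass by comparing the zones of its start and end points."""
    z0, z1 = zone(start_x), zone(end_x)
    if z1 > z0:
        return "forward"
    if z1 < z0:
        return "backward"
    return "within"


# Hypothetical (start_x, end_x) pairs, not real match data.
sample = [(10, 30), (35, 50), (70, 90), (85, 60), (100, 110), (45, 20)]
counts = {"forward": 0, "backward": 0, "within": 0}
for start_x, end_x in sample:
    counts[classify(start_x, end_x)] += 1

# Guard against division by zero when a team plays no backward passes.
progression_ratio = (counts["forward"] / counts["backward"]
                     if counts["backward"] else float("inf"))
```

The `(source zone, target zone)` pairs produced by `zone` are also the link list a Sankey diagram needs for part (b).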
Exercise 17: Network Comparison
Objective: Develop methods to compare two passing networks.
Task:
a) Design a distance metric between two passing networks. Consider:
   - Difference in density
   - Difference in centralization
   - Jensen-Shannon divergence of degree distributions
b) Calculate pairwise distances between 8 team networks from the World Cup.
c) Use hierarchical clustering to group similar teams. Visualize as a dendrogram.
d) Do clusters correspond to playing style, region, or tournament performance?
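The Jensen-Shannon divergence mentioned in part (a) can be sketched in a few lines of standard-library Python. The example assumes both inputs are already normalized distributions of equal length (for instance, histograms of player degree over the same bins) and uses base-2 logarithms so the result lies between 0 and 1; the function name and inputs are illustrative.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base-2 logs) between two discrete distributions.

    Assumes p and q have the same length and each sums to 1.
    """
    def kl(a, b):
        # Terms with a zero probability in `a` contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    # The mixture m is positive wherever p or q is, so kl never divides by zero.
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical degree histograms for two teams over the same three bins.
team_a = [0.2, 0.5, 0.3]
team_b = [0.6, 0.3, 0.1]
distance = js_divergence(team_a, team_b)
```

Because the divergence is symmetric and bounded, it combines more easily with the density and centralization differences than a raw Kullback-Leibler divergence would.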
Exercise 18: Motif Analysis
Objective: Identify common passing patterns (motifs).
Task:
a) Define and count the following 3- and 4-node motifs:
   - Chain: A→B→C
   - Triangle: A→B→C→A
   - Star-out: A→B, A→C, A→D
   - Star-in: B→A, C→A, D→A
b) For each team, calculate the proportion of each motif type.
c) Compare motif profiles between teams with different playing styles.
d) Do certain motifs correlate with scoring probability?
Part 4: Visualization (Exercises 19-24)
Exercise 19: Basic Network Plot
Objective: Create clear, informative passing network visualizations.
Task:
a) Create a passing network visualization with:
   - Nodes at average player positions
   - Node size proportional to total passes
   - Edge width proportional to pass frequency
   - Minimum pass threshold (hide edges with fewer than 3 passes)
b) Add player name labels that don't overlap.
c) Use a colormap to color nodes by centrality.
d) Create versions for both teams in a match, side by side.
Exercise 20: Passing Matrix Heatmap
Objective: Create heatmap visualizations of passing patterns.
Task:
a) Create a heatmap of the passing matrix with:
   - Players ordered by position (GK, defenders, midfielders, forwards)
   - Clear color scale
   - Annotations showing pass counts
b) Normalize the matrix by row (each passer's distribution) and create a second heatmap.
c) Create a difference matrix showing which combinations are above/below expected frequency.
d) Create an animated heatmap showing how passing patterns change over 15-minute periods.
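The matrix preparation behind parts (b) and (c) can be sketched as below, leaving the plotting aside. The exercise does not say what "expected frequency" means; this sketch assumes an independence model in which the expected count is row total times column total divided by the grand total. That model is crude, since it assigns a nonzero expectation to the diagonal even though players cannot pass to themselves. Function names and the sample matrix are hypothetical.

```python
def row_normalize(matrix):
    """Divide each row by its sum so it becomes that passer's distribution."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized


def difference_from_expected(matrix):
    """Observed minus expected counts under an independence assumption.

    Assumes the matrix contains at least one pass.
    """
    total = sum(sum(row) for row in matrix)
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    return [[matrix[i][j] - row_sums[i] * col_sums[j] / total
             for j in range(len(col_sums))]
            for i in range(len(row_sums))]


# Hypothetical 3-player pass counts; rows are passers, columns are receivers.
counts = [[0, 4, 0], [2, 0, 5], [1, 0, 0]]
shares = row_normalize(counts)
diff = difference_from_expected(counts)
```

Positive entries in `diff` mark combinations used more often than the independence model predicts, which is where a diverging colormap centered on zero is useful.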
Exercise 21: Chord Diagram
Objective: Create circular chord diagrams for passing visualization.
Task:
a) Implement a chord diagram where:
   - Players are arranged in a circle
   - Chords connect players who pass to each other
   - Chord thickness represents pass frequency
b) Color chords by:
   - Source player's position
   - Whether pass is forward or backward
   - Passer or receiver
c) Add interactivity: hovering shows pass count.
Exercise 22: Temporal Visualization
Objective: Visualize network evolution over time.
Task:
a) Create a series of network plots for 6 time periods in a match.
b) Use consistent node positions across all periods to show evolution.
c) Highlight edges that:
   - Appear (new connection formed)
   - Disappear (connection stops)
   - Strengthen or weaken
d) Create an animated GIF showing network evolution.
Exercise 23: Combined Dashboard
Objective: Create a comprehensive network analysis dashboard.
Task: Create a multi-panel figure including:
a) Passing network for each team (2 panels)
b) Centrality comparison bar chart
c) Passing matrix heatmap
d) Key metrics table (density, centralization, entropy)
e) Top 5 passing combinations list
Design for both print (static) and presentation (interactive) use cases.
Exercise 24: Style Comparison Visualization
Objective: Visualize differences between team playing styles.
Task:
a) Select 4 teams with different playing styles from the World Cup.
b) Calculate their network metrics (density, centralization, clustering, entropy).
c) Create a radar chart comparing all 4 teams across these dimensions.
d) Add a small network plot for each team as an inset.
e) Write a short interpretation of each team's playing style based on the visualizations.
Part 5: Advanced Analysis (Exercises 25-30)
Exercise 25: Community Detection
Objective: Identify player clusters within teams.
Task:
a) Apply the Louvain algorithm to detect communities in a team's passing network.
b) Color nodes by community membership in a network visualization.
c) Analyze: Do detected communities correspond to:
   - Position groups (defense, midfield, attack)?
   - Pitch sides (left, central, right)?
   - Something else?
d) Compare community structure across multiple teams. Do some formations produce clearer communities?
Exercise 26: Player Role Classification
Objective: Use network position to classify player roles.
Task:
a) For each player, calculate:
   - Degree centrality
   - Betweenness centrality
   - In-degree / out-degree ratio
   - Clustering coefficient
b) Use K-means clustering on these features to identify role groups.
c) Label clusters based on their characteristics (e.g., "Hub", "Distributor", "Target", "Connector").
d) Compare network-derived roles with official positions. Where do they agree/disagree?
Exercise 27: xT-Weighted Networks
Objective: Integrate xT values into passing network analysis.
Task:
a) Weight edges by the xT gained from each passing combination (average xT added per pass).
b) Calculate "xT-betweenness": which players are on paths that generate the most xT?
c) Compare xT-weighted centrality rankings with volume-weighted rankings.
d) Identify which passing combinations create the most threat.
Exercise 28: Substitution Impact
Objective: Measure how substitutions change network structure.
Task:
a) Identify matches with significant substitutions (made before the 70th minute).
b) Build separate networks for pre- and post-substitution periods.
c) Calculate:
   - Change in overall density
   - Change in centralization
   - Which existing players' centrality changed most
d) Classify substitutions as "successful" or "unsuccessful" based on network improvement.
Exercise 29: Opposition Effects
Objective: Analyze how opponents affect network structure.
Task:
a) Select a team that played multiple World Cup matches.
b) Build their passing network for each match.
c) Calculate metrics across matches:
   - Does their network structure change based on opponent?
   - Which opponents forced the biggest changes?
d) Correlate network changes with opponent ranking/style.
Exercise 30: Comprehensive Team Report
Objective: Create a complete passing network analysis report for a team's tournament.
Task:
For a World Cup team across all their matches:
a) Calculate and present:
   - Average network metrics
   - Match-to-match variation
   - Top passing combinations overall
   - Most central players overall
b) Create visualizations:
   - Aggregate network (all matches combined)
   - Network evolution across tournament
   - Player importance rankings
c) Write a 500-word tactical analysis based on the network findings.
d) Recommend improvements based on network weaknesses.
Submission Guidelines
Required Submissions
- Jupyter Notebook containing:
   - All code implementations
   - Visualizations with captions
   - Markdown explanations of findings
- Written Report (2-3 pages):
   - Summary of key findings
   - Interpretation of metrics in tactical context
   - Recommendations for analysts using these methods
Evaluation Criteria
- Technical Correctness (40%): Accurate implementations and calculations
- Visualization Quality (25%): Clear, informative, publication-ready graphics
- Interpretation (25%): Meaningful soccer insights from network analysis
- Code Quality (10%): Clean, documented, reusable code
Extension Challenges
For advanced students:
- Dynamic Networks: Model passing networks as time-evolving graphs
- Machine Learning: Predict match outcomes from network features
- Custom Metrics: Design a novel centrality measure for soccer
- Tool Development: Build an interactive network analysis dashboard
Exercise solutions are provided in the appendix for selected problems.