Exercises: Network and Graph Visualization

DataField.Dev

These exercises assume `pip install networkx matplotlib plotly pyvis` (exercise C.3 additionally uses seaborn). Imports: `import networkx as nx`, `import matplotlib.pyplot as plt`, `import plotly.graph_objects as go`, `from pyvis.network import Network`.

Part A: Conceptual (6 problems)

A.1 ★☆☆ | Recall

Name the four main flavors of graph and give an example of each.

Guidance

**Undirected**: edges have no direction (friendships, co-authorships). **Directed**: edges have a source and target (Twitter follows, citations). **Weighted**: edges have numeric strengths (road networks with distances, call networks with frequencies). **Bipartite**: two node sets with edges only between them (users and items in recommendations, authors and papers).
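
The four flavors map onto NetworkX roughly as follows. This is a minimal sketch; the node names and the 12.5 distance are made up for illustration.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Undirected: an edge has no direction, so (Ana, Ben) and (Ben, Ana) are the same edge.
friends = nx.Graph()
friends.add_edge("Ana", "Ben")

# Directed: each edge runs from a source to a target.
follows = nx.DiGraph()
follows.add_edge("Ana", "Ben")  # Ana follows Ben, not the reverse

# Weighted: an ordinary graph whose edges carry a numeric attribute.
roads = nx.Graph()
roads.add_edge("Town X", "Town Y", weight=12.5)  # hypothetical distance

# Bipartite: two node sets, with edges only between the sets.
ratings = nx.Graph()
ratings.add_nodes_from(["user1", "user2"], bipartite=0)
ratings.add_nodes_from(["item1", "item2"], bipartite=1)
ratings.add_edges_from([("user1", "item1"), ("user2", "item1"), ("user2", "item2")])

print(friends.has_edge("Ben", "Ana"))       # undirected edges work both ways
print(follows.has_edge("Ben", "Ana"))       # directed edges do not
print(roads["Town X"]["Town Y"]["weight"])  # the stored edge weight
print(bipartite.is_bipartite(ratings))      # checks the two-set structure
```

Note that "weighted" is a property of the edge data rather than a separate class, so a directed or bipartite graph can be weighted as well.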

A.2 ★☆☆ | Recall

What is a layout algorithm and why is it necessary for network visualization?

Guidance

A layout algorithm computes 2D positions for the nodes of a graph. It is necessary because graphs do not have intrinsic spatial coordinates — nodes are abstract entities connected by edges, and the visualization must construct a layout before it can draw anything. Different algorithms (spring, kamada_kawai, circular, shell, spectral, bipartite) produce different visual results, and the choice depends on the graph's structure and the question being asked.
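
Concretely, a NetworkX layout function returns a dictionary that maps each node to an (x, y) pair, and the drawing functions consume that dictionary. A small sketch using the built-in karate club graph:

```python
import networkx as nx

G = nx.karate_club_graph()

# Each layout function returns a dict: node -> array([x, y]).
pos_spring = nx.spring_layout(G, seed=42)  # force-directed; the seed fixes the result
pos_circle = nx.circular_layout(G)         # nodes evenly spaced on a circle

print(pos_spring[0])  # coordinates the spring layout assigned to node 0
print(pos_circle[0])  # the same node, placed by the circular layout
```

Passing either dictionary as the `pos` argument of `nx.draw` is what turns the abstract graph into a picture.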

A.3 ★★☆ | Understand

Describe the hairball problem and three strategies for responding to it.

Guidance

The hairball problem: above a few hundred nodes, network diagrams become dense tangles that convey little. Common strategies (any three of these four answer the question): (1) **filter**: show only high-degree or high-centrality nodes and high-weight edges; (2) **aggregate**: merge nodes into super-nodes by community or type; (3) **switch representation**: use an adjacency matrix, arc diagram, or chord diagram instead of a force-directed layout; (4) **go interactive**: let the reader zoom, pan, and filter in the browser.
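
The filter strategy can be sketched in a few lines. The cutoffs below (`k = 15`, `min_weight = 5`) are arbitrary illustrative values, not recommendations, and the Les Misérables graph is used only because it ships with NetworkX and carries edge weights.

```python
import networkx as nx

G = nx.les_miserables_graph()

# Keep the k highest-degree nodes and the edges among them.
k = 15  # hypothetical cutoff; tune it to the question being asked
top_nodes = sorted(G.nodes(), key=lambda n: G.degree[n], reverse=True)[:k]
core = G.subgraph(top_nodes).copy()

# Then drop weak ties: keep only edges at or above a weight threshold.
min_weight = 5  # hypothetical threshold
weak = [(u, v) for u, v, w in core.edges(data="weight") if w < min_weight]
core.remove_edges_from(weak)

print(G.number_of_nodes(), G.number_of_edges())        # full graph
print(core.number_of_nodes(), core.number_of_edges())  # filtered graph
```

The `.copy()` matters: `G.subgraph` returns a read-only view, so edges can only be removed from a copy.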

A.4 ★★☆ | Understand

Explain the difference between degree centrality and betweenness centrality. Give a scenario where each is more informative.

Guidance

**Degree centrality** counts direct connections: a high-degree node has many neighbors. It is informative for identifying "popular" or highly connected nodes (Instagram influencers, road intersections with many streets). **Betweenness centrality** measures how often a node lies on shortest paths between other pairs of nodes. It is informative for identifying bottlenecks or brokers, that is, nodes whose removal would lengthen or cut the routes between parts of the graph (gatekeepers in a social network, critical switches in a power grid). A node can have high degree but low betweenness (lots of friends, all of whom know each other) or low degree but high betweenness (a single bridge between two communities).
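
The "low degree but high betweenness" case is easy to reproduce. The sketch below uses a barbell graph (two 5-node cliques joined through one bridge node) as a toy stand-in for two communities.

```python
import networkx as nx

# Two complete graphs of 5 nodes (0-4 and 6-10) joined through node 5.
G = nx.barbell_graph(5, 1)

deg = nx.degree_centrality(G)
bet = nx.betweenness_centrality(G)

bridge = 5
print("bridge degree:", G.degree[bridge])
print("bridge degree centrality:", round(deg[bridge], 3))
print("bridge betweenness:", round(bet[bridge], 3))
print("highest betweenness node:", max(bet, key=bet.get))
print("lowest degree node:", min(deg, key=deg.get))
```

The bridge has the fewest neighbors in the graph yet sits on every shortest path between the two cliques, so it ranks last on degree and first on betweenness.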

A.5 ★★★ | Analyze

The chapter says "networks are not always the right answer." Give a specific scenario where you would choose an adjacency matrix over a network diagram.

Guidance

Scenario: you have 500 proteins and their pairwise interaction strengths. A force-directed network diagram would produce a hairball. An adjacency matrix with proteins on both axes and cell colors for interaction strength would show the full structure without tangling. Ordering the rows and columns by cluster assignment would reveal block-diagonal structure where groups of proteins interact with each other more than with outsiders. The matrix scales to thousands of nodes; the network diagram does not.
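
The block-diagonal claim can be checked numerically. This sketch substitutes a small synthetic graph with planted groups for the protein data (the group count, sizes, and probabilities are arbitrary) and compares edge density inside and between groups in the ordered matrix.

```python
import networkx as nx
import numpy as np

# Synthetic stand-in for clustered interaction data: 3 groups of 10 nodes,
# dense inside each group (p_in) and sparse between groups (p_out).
G = nx.planted_partition_graph(3, 10, p_in=0.8, p_out=0.05, seed=1)

# Order rows and columns group by group (nodes 0-9, 10-19, 20-29 here).
order = sorted(G.nodes())
adj = nx.to_numpy_array(G, nodelist=order)

# Label each row with its group, then compare density inside vs between groups.
group = np.repeat(np.arange(3), 10)
same_group = group[:, None] == group[None, :]
off_diagonal = ~np.eye(len(order), dtype=bool)

within = adj[same_group & off_diagonal].mean()
between = adj[~same_group].mean()
print(f"within-group density:  {within:.2f}")
print(f"between-group density: {between:.2f}")
```

Drawn as a heatmap, the high within-group density appears as dark square blocks along the diagonal.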

A.6 ★★★ | Evaluate

A colleague sends you a network diagram with 300 nodes and 2000 edges. It looks like a solid mass of lines. What do you suggest?

Guidance

Several options: (1) Filter to the top 100 nodes by degree or centrality and redraw. (2) Detect communities and color nodes by community; large communities will be visible as colored blocks even in a tangle. (3) Compute an aggregated "community graph" where each super-node is a community. (4) Switch to an adjacency matrix sorted by community membership. (5) Build an interactive Plotly or pyvis version so the reader can hover, zoom, and filter. Ask the colleague what question they are trying to answer — the right fix depends on the question, not just on the visual density.
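
Option (3), the aggregated community graph, might look like the following. It is a sketch on the karate club graph rather than the colleague's data, and the `size` and `weight` attribute names are choices made here for illustration.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

G = nx.karate_club_graph()
communities = list(greedy_modularity_communities(G))

# Map every node to the index of its community.
membership = {node: i for i, comm in enumerate(communities) for node in comm}

# One super-node per community; edge weights count how many original
# edges run between the two communities.
C = nx.Graph()
for i, comm in enumerate(communities):
    C.add_node(i, size=len(comm))
for u, v in G.edges():
    cu, cv = membership[u], membership[v]
    if cu == cv:
        continue  # intra-community edges are absorbed into the super-node
    if C.has_edge(cu, cv):
        C[cu][cv]["weight"] += 1
    else:
        C.add_edge(cu, cv, weight=1)

print(list(C.nodes(data="size")))
print(list(C.edges(data="weight")))
```

Drawing `C` with node size taken from `size` and edge width from `weight` gives a diagram with a handful of nodes instead of hundreds.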

Part B: Applied (10 problems)

B.1 ★☆☆ | Apply

Create a simple undirected graph with 5 nodes and 6 edges using NetworkX, and print the number of nodes and edges.

Guidance

import networkx as nx
G = nx.Graph()
G.add_edges_from([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E"), ("A", "E")])
print(G.number_of_nodes())  # 5
print(G.number_of_edges())  # 6

B.2 ★☆☆ | Apply

Load the built-in Zachary's karate club graph and draw it with default settings.

Guidance

import networkx as nx
import matplotlib.pyplot as plt

G = nx.karate_club_graph()
nx.draw(G, with_labels=True, node_color="lightblue")
plt.show()

B.3 ★★☆ | Apply

Draw the karate club graph with node sizes proportional to degree centrality and node colors from a detected community partition.

Guidance

from networkx.algorithms.community import greedy_modularity_communities

G = nx.karate_club_graph()
pos = nx.spring_layout(G, seed=42)
communities = list(greedy_modularity_communities(G))

color_map = {}
for i, comm in enumerate(communities):
    for node in comm:
        color_map[node] = i

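# Raw degree is proportional to degree centrality (degree / (n - 1)),
# so scaling it gives the same relative node sizes.
node_sizes = [G.degree(n) * 50 for n in G.nodes()]
node_colors = [color_map[n] for n in G.nodes()]

nx.draw(G, pos, node_size=node_sizes, node_color=node_colors, cmap="Set2",
        with_labels=True, edge_color="gray", alpha=0.7)
plt.show()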

B.4 ★★☆ | Apply

Build a directed graph of 5 nodes with several directed edges and visualize it with arrows.

Guidance

D = nx.DiGraph()
D.add_edges_from([("A", "B"), ("B", "C"), ("C", "A"), ("A", "D"), ("D", "E")])
pos = nx.spring_layout(D, seed=42)
nx.draw(D, pos, with_labels=True, node_color="lightcoral",
        arrows=True, arrowsize=20, edge_color="gray")

NetworkX's `draw` shows arrows by default for a `DiGraph`, so `arrows=True` is optional here; `arrowsize=20` enlarges the arrowheads (the default is 10).

B.5 ★★☆ | Apply

Compute a spring layout, circular layout, and shell layout for the karate club graph and compare them visually.

Guidance

G = nx.karate_club_graph()
fig, axes = plt.subplots(1, 3, figsize=(18, 6))

# Compute each layout once; only spring_layout is random, so only it needs a seed.
layouts = {
    "Spring": nx.spring_layout(G, seed=42),
    "Circular": nx.circular_layout(G),
    "Shell": nx.shell_layout(G),
}

for ax, (name, pos) in zip(axes, layouts.items()):
    nx.draw(G, pos, node_size=100, ax=ax, edge_color="gray", alpha=0.5)
    ax.set_title(name)

plt.show()

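Spring pulls connected nodes together, so the two factions show up as clusters. Circular spaces all nodes evenly on one ring, which makes edge crossings easy to see but hides community structure. Shell, called without an `nlist` argument, places every node in a single shell, so here it looks essentially the same as the circular layout; it only produces concentric rings when `nlist` groups the nodes into shells.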
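
Concentric shells require an explicit `nlist`. The sketch below assumes one possible grouping, the six highest-degree nodes on the inner ring; the cutoff of six is arbitrary.

```python
import networkx as nx
import numpy as np

G = nx.karate_club_graph()

# Hypothetical split: the 6 highest-degree nodes form the inner shell.
by_degree = sorted(G.nodes(), key=lambda n: G.degree[n], reverse=True)
inner, outer = by_degree[:6], by_degree[6:]

pos = nx.shell_layout(G, nlist=[inner, outer])

# Distance from the origin confirms the two concentric rings.
inner_radius = max(np.linalg.norm(pos[n]) for n in inner)
outer_radius = min(np.linalg.norm(pos[n]) for n in outer)
print(inner_radius, outer_radius)
```

Passing this `pos` to `nx.draw` in place of the default shell layout puts the hubs in the center of the figure.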

B.6 ★★☆ | Apply

Compute betweenness centrality for the karate club graph and draw it with node size proportional to betweenness.

Guidance

G = nx.karate_club_graph()
pos = nx.spring_layout(G, seed=42)
bet = nx.betweenness_centrality(G)
sizes = [bet[n] * 3000 + 100 for n in G.nodes()]

nx.draw(G, pos, node_size=sizes, with_labels=True, edge_color="gray")
plt.title("Karate Club — size ∝ betweenness centrality")

The result highlights the two hub nodes, 0 and 33 (the instructor and the club administrator in Zachary's study), which lie on many of the shortest paths between the two factions.
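
To see which nodes the sizes are emphasizing, the scores can also be ranked and printed. A short sketch; the `club` node attribute is the faction label stored with NetworkX's copy of the dataset.

```python
import networkx as nx

G = nx.karate_club_graph()
bet = nx.betweenness_centrality(G)

# Rank nodes from highest to lowest betweenness and show the top five.
ranked = sorted(bet, key=bet.get, reverse=True)
for node in ranked[:5]:
    print(node, round(bet[node], 3), G.nodes[node]["club"])
```

A printed ranking is a useful companion to the figure because node area differences are hard to judge by eye.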

B.7 ★★★ | Apply

Build an interactive Plotly network of the karate club by converting NetworkX positions to Plotly scatter traces.

Guidance

import plotly.graph_objects as go

G = nx.karate_club_graph()
pos = nx.spring_layout(G, seed=42)

edge_x, edge_y = [], []
for u, v in G.edges():
    edge_x.extend([pos[u][0], pos[v][0], None])
    edge_y.extend([pos[u][1], pos[v][1], None])

edge_trace = go.Scatter(x=edge_x, y=edge_y, mode="lines", line=dict(width=1, color="gray"))
node_trace = go.Scatter(
    x=[pos[n][0] for n in G.nodes()],
    y=[pos[n][1] for n in G.nodes()],
    mode="markers+text",
    text=[str(n) for n in G.nodes()],
    marker=dict(size=[G.degree(n) * 3 + 5 for n in G.nodes()], color="lightblue"),
)

fig = go.Figure([edge_trace, node_trace])
fig.update_layout(showlegend=False, xaxis_visible=False, yaxis_visible=False,
                  title="Karate Club — Interactive")
fig.show()

B.8 ★★☆ | Apply

Use pyvis to create a quick interactive network from the karate club graph.

Guidance

from pyvis.network import Network

G = nx.karate_club_graph()
net = Network(notebook=True, cdn_resources="in_line")
net.from_nx(G)
net.show("karate.html")

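Run in a Jupyter notebook, this writes `karate.html` and displays a draggable, zoomable network with a physics-based layout. In a plain script, creating `Network()` without `notebook=True` and calling `net.write_html("karate.html")` is a common alternative, though behavior differs somewhat between pyvis versions.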

B.9 ★★★ | Apply

Load the Les Misérables co-appearance graph and draw it with community colors. How many communities does the detection algorithm find?

Guidance

from networkx.algorithms.community import greedy_modularity_communities

G = nx.les_miserables_graph()
communities = list(greedy_modularity_communities(G))
print(f"Found {len(communities)} communities")

pos = nx.spring_layout(G, seed=42)
color_map = {}
for i, comm in enumerate(communities):
    for node in comm:
        color_map[node] = i

nx.draw(G, pos, node_color=[color_map[n] for n in G.nodes()], cmap="tab10",
        node_size=100, edge_color="gray", alpha=0.5, with_labels=False)

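The printed count depends on the method and its parameters (for example the `weight` and `resolution` arguments of `greedy_modularity_communities`), so a quoted range such as 6–8 is only a rough guide; rely on the value your run prints. The detected groups correspond broadly to different plot threads and character groupings.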

B.10 ★★★ | Create

Build a Sankey diagram in Plotly showing a 3-stage flow: Sources (A, B, C) → Intermediaries (D, E) → Destinations (F, G).

Guidance

fig = go.Figure(go.Sankey(
    node=dict(label=["A", "B", "C", "D", "E", "F", "G"]),
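    link=dict(
        # Indices refer to positions in `label`: A=0, B=1, ..., G=6.
        source=[0, 1, 2, 3, 4, 3, 4],
        target=[3, 3, 4, 5, 5, 6, 6],
        # Flows balance: D receives 5 + 3 and sends 6 + 2; E receives 6 and sends 4 + 2.
        value=[5, 3, 6, 6, 4, 2, 2],
    ),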
))
fig.update_layout(title="3-Stage Flow")
fig.show()
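
As far as I know, Plotly draws whatever link values it is given without checking that an intermediary passes on as much as it receives, so a quick balance check before plotting can catch data-entry slips. The helper below, `unbalanced_nodes`, is a hypothetical function written for this sketch, not part of Plotly.

```python
from collections import defaultdict

def unbalanced_nodes(source, target, value):
    """Return {node: (inflow, outflow)} for nodes whose inflow != outflow.

    Pure sources and sinks (no incoming or no outgoing links) are skipped.
    """
    inflow, outflow = defaultdict(float), defaultdict(float)
    for s, t, v in zip(source, target, value):
        outflow[s] += v
        inflow[t] += v
    return {
        n: (inflow[n], outflow[n])
        for n in sorted(set(inflow) & set(outflow))
        if inflow[n] != outflow[n]
    }

# Hypothetical link lists in the index-based format go.Sankey expects
# (0-2 are sources, 3-4 intermediaries, 5-6 destinations).
source = [0, 1, 2, 3, 4, 3, 4]
target = [3, 3, 4, 5, 5, 6, 6]

balanced = unbalanced_nodes(source, target, [5, 3, 6, 6, 4, 2, 2])
lopsided = unbalanced_nodes(source, target, [5, 3, 6, 8, 4, 2, 2])
print(balanced)  # no intermediary is out of balance
print(lopsided)  # node 3 sends more than it receives
```

An unbalanced node is not always an error (flows can be created or lost at a stage), but it is worth confirming that the imbalance is intended.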

Part C: Synthesis (4 problems)

C.1 ★★★ | Analyze

Take the climate correlation network from Section 24.11. Compare the network view with the Chapter 19 clustermap view. Which representation better shows the structure for 7 variables? What about for 50?

Guidance

For 7 variables, both work. The clustermap shows all correlations (including weak ones); the network filters to strong correlations and is less cluttered. Depending on the question, either is fine. For 50 variables, the clustermap scales well (50×50 is still readable); the network hairballs unless filtered aggressively. The clustermap is the more scalable representation for dense correlation data. The network becomes preferable when the graph is sparse — few strong correlations among many variables — because the sparse structure is easier to see as a network than as a mostly-empty matrix.
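
The difference between the two views comes down to thresholding. The sketch below uses synthetic data rather than the climate dataset: seven made-up variables, three of them deliberately correlated, with an arbitrary cutoff of 0.6.

```python
import networkx as nx
import numpy as np

rng = np.random.default_rng(0)
names = [f"var{i}" for i in range(7)]  # hypothetical variable names

# Synthetic data: var1 and var2 are built to track var0; the rest are noise.
data = rng.normal(size=(200, 7))
data[:, 1] = data[:, 0] + 0.3 * rng.normal(size=200)
data[:, 2] = -data[:, 0] + 0.3 * rng.normal(size=200)

corr = np.corrcoef(data, rowvar=False)  # 7x7 matrix: what a clustermap shows

# Network view: keep only pairs whose |r| clears a threshold.
threshold = 0.6  # hypothetical cutoff
G = nx.Graph()
G.add_nodes_from(names)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        if abs(corr[i, j]) >= threshold:
            G.add_edge(names[i], names[j], r=round(float(corr[i, j]), 2))

print(f"{corr.size} matrix cells vs {G.number_of_edges()} network edges")
```

The matrix keeps every pair, weak or strong, while the network keeps only the few pairs above the cutoff, which is why the network stays readable only as long as the thresholded graph is sparse.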

C.2 ★★★ | Evaluate

A colleague wants to visualize a Twitter follower network with 100,000 users. What do you tell them?

Guidance

Don't draw the full graph. 100,000 users is far beyond the hairball threshold. Options: (1) Filter to users with high follower counts or who are connected to a specific seed user; (2) detect communities and visualize the aggregated community graph (maybe 10–20 super-nodes); (3) compute summary statistics (degree distribution, clustering coefficient, top-k central users) and display them as charts instead of a network diagram; (4) use sigma.js (JavaScript) for a large-scale interactive rendering if a browser-based visualization is essential. A static Python network visualization of 100,000 nodes is not a realistic goal.
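
Option (3) can be sketched as follows. A synthetic preferential-attachment graph stands in for the follower network here, and 5,000 nodes are used only to keep the example fast; real data would be loaded from an edge list.

```python
import networkx as nx
from networkx.algorithms.approximation import average_clustering

# Synthetic stand-in for a large social graph.
G = nx.barabasi_albert_graph(5000, 3, seed=42)

# Degree distribution: hist[d] is the number of nodes with degree d.
hist = nx.degree_histogram(G)

# Top-k hubs by degree, as (node, degree) pairs.
top5 = sorted(G.degree(), key=lambda pair: pair[1], reverse=True)[:5]

# Average clustering, estimated by sampling so the cost stays bounded.
clustering = average_clustering(G, trials=1000, seed=42)

print("max degree:", len(hist) - 1)
print("top 5 hubs:", top5)
print("approx. average clustering:", round(clustering, 3))
```

A log-log plot of `hist`, a bar chart of the top hubs, and a single clustering number usually say more about a graph of this size than any node-link drawing.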

C.3 ★★★ | Create

Build an adjacency matrix visualization of the karate club graph, ordered by community membership. How does it compare to the network diagram?

Guidance

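import networkx as nx
import matplotlib.pyplot as plt
import seaborn as sns
from networkx.algorithms.community import greedy_modularity_communities

G = nx.karate_club_graph()
communities = list(greedy_modularity_communities(G))
# Concatenate the communities so members of one group sit in adjacent rows.
order = [node for comm in communities for node in sorted(comm)]
# weight=None yields a 0/1 matrix even if the graph stores edge weights.
adj = nx.to_numpy_array(G, nodelist=order, weight=None)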

fig, ax = plt.subplots(figsize=(10, 10))
sns.heatmap(adj, xticklabels=order, yticklabels=order, cmap="Blues",
            square=True, cbar=False, ax=ax)
ax.set_title("Karate Club — Adjacency Matrix (ordered by community)")

The reordered matrix shows block-diagonal structure — dense blocks of 1s on the diagonal represent intra-community edges; sparse off-diagonal blocks are inter-community edges. This is the same community structure you see in the colored network diagram, but in a more scalable form.

C.4 ★★★ | Evaluate

The chapter says NetworkX is primarily an analysis library with bolted-on visualization. What would a "visualization-first" network library look like, and why hasn't one emerged in Python?

Guidance

A visualization-first library would prioritize interactive layouts, rich styling, built-in support for community colors and centrality-based sizing, smooth transitions for temporal networks, and polished export to publication-quality images. pyvis approaches this but wraps vis.js and is less flexible; Plotly has no built-in network type. D3.js is the de facto visualization-first library, but it is JavaScript. The reason Python lacks one: networks are a specialized domain, and the Python scientific community has prioritized analysis (hence NetworkX, igraph) over visualization. For visualization-first work, most serious practitioners use Gephi (desktop) or D3 (web) rather than Python.

These exercises cover NetworkX, layout algorithms, centrality, community detection, and the interactive libraries for networks. With Chapter 24 complete, Part V (Interactive Visualization) is finished. Chapter 25 begins Part VI (Specialized Domains) with time series visualization.