Key Takeaways: Network and Graph Visualization
-
Graphs are the right data model for relational data. When what matters is the connections between entities rather than the entities themselves, graphs (nodes + edges) are the natural representation. Social networks, citations, food webs, supply chains, and correlation networks all fit this model.
-
NetworkX is the Python standard for graph data and algorithms. It provides the Graph and DiGraph classes, implementations of classic algorithms (shortest paths, centrality, community detection, random graph generators), and basic plotting. It is primarily an analysis library; use Plotly or pyvis for polished visualization.
-
Layout algorithms construct the visual positions. Graphs do not have intrinsic coordinates. Force-directed layout (spring_layout, kamada_kawai_layout) is the default and works for most small graphs. Use circular, shell, spectral, or bipartite layouts when the graph's structure matches their assumptions. Set a seed for reproducible layouts.
-
Node size and color are the most common visual encodings. Size typically encodes a centrality measure (degree, betweenness, eigenvector). Color typically encodes community membership. Be explicit in your caption about which metric you are using — "size by degree centrality" is better than "size by importance."
-
The hairball problem is the central challenge. Above a few hundred nodes, force-directed network diagrams become dense tangles that convey nothing. The responses are filtering, aggregation, switching to matrices/arc/chord diagrams, or going interactive with zoom and hover.
-
Plotly has no built-in network type. You build networks by constructing edge traces (Scatter with line segments) and node traces (Scatter with markers) from NetworkX positions manually. The code is verbose but produces fully interactive results. pyvis provides a quicker alternative via vis.js.
-
Adjacency matrices scale where networks do not. For graphs beyond a few hundred nodes, adjacency matrices (rendered as seaborn heatmaps) are often better than network diagrams. They reveal block structure when sorted by community, and they scale to thousands of nodes without tangling.
-
Community detection reveals latent structure. Algorithms such as Louvain and other modularity-maximization methods partition the graph into communities. Colors on the visualization let the reader see the partition at a glance. Different algorithms produce different partitions, so report the algorithm, the seed, and any parameters for reproducibility.
-
Sankey diagrams are a special case for flow data. When the network has a clear stage structure (sources → intermediaries → destinations) with meaningful flow magnitudes, Sankey diagrams communicate the flows more directly than general network diagrams. Plotly has go.Sankey for this.
-
Networks are not always the right answer. The chapter's threshold concept: the natural instinct to draw a network for every relational dataset is often wrong. Above a few dozen nodes, matrices or aggregated views communicate better. At the scale of the Facebook social graph (hundreds of millions of nodes), networks must be replaced entirely with statistical aggregations. Knowing when to draw a network and when not to is part of becoming fluent with relational data.
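Several of the takeaways above (a seeded layout, size by centrality, color by community, manually built edge coordinates, and a community-sorted adjacency matrix) can be sketched together in a few lines. This is a minimal illustration, assuming NetworkX and NumPy are installed. The karate club graph, the seed value 42, the marker-size scaling, and the use of greedy modularity maximization instead of Louvain are illustrative choices, not the chapter's own.

```python
import networkx as nx
import numpy as np
from networkx.algorithms.community import greedy_modularity_communities

# Zachary's karate club: a small built-in social network (34 nodes, 78 edges).
G = nx.karate_club_graph()

# Layout: graphs have no intrinsic coordinates, so fix the seed
# to get the same positions on every run (seed value is arbitrary).
pos = nx.spring_layout(G, seed=42)

# Size encoding: degree centrality mapped to marker sizes.
# The scale factor and offset are arbitrary, chosen for legibility.
centrality = nx.degree_centrality(G)
node_sizes = [300 * centrality[n] + 5 for n in G.nodes()]

# Color encoding: community membership from greedy modularity maximization
# (a deterministic stand-in here; Louvain would additionally need a seed).
communities = greedy_modularity_communities(G)
community_of = {n: i for i, members in enumerate(communities) for n in members}
node_colors = [community_of[n] for n in G.nodes()]

# Edge coordinates for a single line trace: each edge contributes
# (start, end, None) so that consecutive segments are not joined.
edge_x, edge_y = [], []
for u, v in G.edges():
    edge_x += [pos[u][0], pos[v][0], None]
    edge_y += [pos[u][1], pos[v][1], None]

# Adjacency matrix with rows and columns sorted by community,
# which is what exposes block structure in a heatmap.
order = sorted(G.nodes(), key=lambda n: community_of[n])
A = nx.to_numpy_array(G, nodelist=order, weight=None)

print(f"{G.number_of_nodes()} nodes, {G.number_of_edges()} edges, "
      f"{len(communities)} communities, matrix shape {A.shape}")
```

The lists edge_x and edge_y are in the form a Plotly go.Scatter trace with mode="lines" accepts, and A can be handed to a seaborn heatmap. A caption for the resulting figure would state "size by degree centrality, color by greedy-modularity community" along with the layout seed.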
Chapter 24 concludes Part V (Interactive Visualization). You now have five complementary interactive libraries in your toolkit: Plotly Express, Plotly Graph Objects, Altair, geospatial (via geopandas/Folium/Plotly), and network (via NetworkX/pyvis). Chapter 25 begins Part VI (Specialized Domains) with time series visualization — the specific techniques and tools for data indexed by time.