Quiz: Network and Graph Visualization

DataField.Dev

Answer all 20 questions. Answers and explanations follow each question.

Part I: Multiple Choice (10 questions)

Q1. Which Python library is the de facto standard for graph data structures and algorithms?

A) Plotly B) NetworkX C) matplotlib D) pandas

Answer

**B.** NetworkX is the standard Python library for graph data structures and analysis. Plotly and matplotlib provide visualization; NetworkX provides the data structures and algorithms.

Q2. Which layout algorithm uses a force-directed model where nodes repel each other and edges act as springs?

A) Circular B) Spring (Fruchterman-Reingold) C) Shell D) Random

Answer

**B.** `nx.spring_layout` implements Fruchterman-Reingold, a widely used force-directed algorithm. Nodes repel each other (to spread out), and edges pull connected nodes together (to maintain structure).

Q3. What is the "hairball problem" in network visualization?

A) A performance issue in NetworkX for large graphs B) A rendering bug in matplotlib C) Large networks producing dense visual tangles that convey no information D) An incorrect centrality calculation

Answer

**C.** A hairball is what you get when a network of more than a few hundred nodes is drawn with a force-directed layout: edges overlap into a dense tangle from which no structure can be read. This is why the chapter's threshold concept is that networks are not always the right answer; sometimes a matrix, arc diagram, or aggregated view is better.

Q4. Which centrality measure counts how often a node lies on shortest paths between other nodes?

A) Degree centrality B) Closeness centrality C) Betweenness centrality D) Eigenvector centrality

Answer

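**C.** Betweenness centrality measures how often a node lies on the shortest paths between other pairs of nodes, acting as a "bottleneck". High-betweenness nodes are brokers or gatekeepers. By contrast, degree centrality counts direct connections, closeness centrality is the inverse of a node's average shortest-path distance to all other nodes, and eigenvector centrality scores a node by the importance of its neighbors (recursive influence).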
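To see how the measures differ on a concrete graph, here is a minimal NetworkX sketch using a toy "barbell" graph (two triangles joined through a single bridge node); the graph is our choice, for illustration only. The bridge node has fewer links than its neighbor but the highest betweenness, which is the broker pattern described above.

```python
import networkx as nx

# Two triangles {0, 1, 2} and {4, 5, 6} joined through a single bridge node 3.
G = nx.barbell_graph(3, 1)

degree = nx.degree_centrality(G)            # share of other nodes a node touches directly
closeness = nx.closeness_centrality(G)      # inverse of the mean shortest-path distance
betweenness = nx.betweenness_centrality(G)  # share of shortest paths passing through a node

for n in (2, 3):
    print(f"node {n}: degree={degree[n]:.3f} "
          f"closeness={closeness[n]:.3f} betweenness={betweenness[n]:.3f}")
# node 2: degree=0.500 closeness=0.545 betweenness=0.533
# node 3: degree=0.333 closeness=0.600 betweenness=0.600
```

Node 3 sits on all nine shortest paths between the two triangles, so it tops betweenness despite having the lower degree.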

Q5. What does Plotly Express provide for network visualization?

A) `px.network` B) `px.graph` C) `px.scatter` with manual node/edge conversion (no built-in network type) D) `px.choropleth` with graph mode

Answer

**C.** Plotly Express has no built-in network function. You build networks in Plotly by manually constructing Scatter traces for edges and nodes from NetworkX positions. This is verbose but gives full interactive control.

Q6. Which library wraps the vis.js JavaScript library for quick interactive networks in Python?

A) pyvis B) graphviz C) networkx-viz D) d3py

Answer

**A.** pyvis wraps vis.js and lets you create interactive networks with physics-based layouts in a few lines. It is often the quickest route to an interactive network in Python.

Q7. Which visualization is most appropriate for a graph with 2000 nodes?

A) Force-directed network diagram B) Adjacency matrix (heatmap) C) Chord diagram D) Sankey diagram

Answer

**B.** Adjacency matrices scale to thousands of nodes without tangling. A 2000-node force-directed diagram would be a hairball. Chord and Sankey diagrams are for specific flow/aggregate structures, not general large graphs.

Q8. In a bipartite graph:

A) Edges are directed B) Nodes come in two disjoint sets with edges only between them C) Every node has exactly two neighbors D) Every edge has a weight

Answer

**B.** A bipartite graph has two distinct node sets, and edges only connect nodes from different sets. Examples: users ↔ items, authors ↔ papers, diseases ↔ genes.
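A minimal sketch of the authors ↔ papers example using NetworkX's bipartite helpers; the author names and paper IDs are hypothetical. Tagging each node with a `bipartite` attribute (0 or 1) is the convention NetworkX's bipartite module uses, and projecting onto one node set turns shared papers into co-author links.

```python
import networkx as nx
from networkx.algorithms import bipartite

authors = ["Ada", "Ben", "Cy"]  # hypothetical author names
papers = ["P1", "P2"]           # hypothetical paper IDs

B = nx.Graph()
B.add_nodes_from(authors, bipartite=0)  # first node set
B.add_nodes_from(papers, bipartite=1)   # second node set
# Edges only ever link an author to a paper, never author-author or paper-paper.
B.add_edges_from([("Ada", "P1"), ("Ben", "P1"), ("Ben", "P2"), ("Cy", "P2")])

print(nx.is_bipartite(B))  # True

# Project onto the author side: two authors are linked if they share a paper.
coauthors = bipartite.projected_graph(B, authors)
print(sorted(tuple(sorted(e)) for e in coauthors.edges()))
# [('Ada', 'Ben'), ('Ben', 'Cy')]
```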

Q9. Which chart type is best for visualizing flows between categories (e.g., energy sources → uses → waste)?

A) Network diagram B) Adjacency matrix C) Sankey diagram D) Chord diagram

Answer

**C.** Sankey diagrams are designed for flow data with clear stage structure. The width of each ribbon encodes the magnitude of flow. Plotly has `go.Sankey` for this directly.

Q10. The chapter's threshold concept is:

A) All networks should be drawn as force-directed diagrams B) Networks are not always the right answer C) Centrality is the most important metric D) Graphs must be undirected to visualize

Answer

**B.** The chapter argues that drawing a network is not always the best way to present relational data. Matrices, arc diagrams, chord diagrams, and aggregated views may communicate better depending on the graph size and the question being asked.

Part II: Short Answer (10 questions)

Q11. Write NetworkX code to create a graph from a pandas DataFrame with source, target, and weight columns.

Answer

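```python
import networkx as nx
import pandas as pd

df = pd.DataFrame({"source": ["A", "A", "B"], "target": ["B", "C", "C"], "weight": [3, 1, 2]})  # toy edge list
G = nx.from_pandas_edgelist(df, source="source", target="target", edge_attr=["weight"])
```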

Add `create_using=nx.DiGraph` for a directed graph.

Q12. Describe the difference between degree centrality and eigenvector centrality.

Answer

Degree centrality counts direct neighbors; a node has high degree centrality if it is connected to many other nodes. Eigenvector centrality is recursive: a node is important if it is connected to other important nodes. A node with few connections, all to highly connected nodes, can have high eigenvector centrality and low degree centrality, or vice versa. PageRank is a variant of eigenvector centrality.
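The contrast can be checked numerically. In the sketch below, on a small graph built by hand for illustration, node 5 has only two links, both into a dense five-node core, while node 6 has four links, three of them to dead-end leaves. Node 6 should win on degree centrality and node 5 on eigenvector centrality, because node 5's few neighbors are themselves well connected.

```python
import networkx as nx

G = nx.complete_graph(5)                            # nodes 0-4: a tightly knit core
G.add_edges_from([(5, 0), (5, 1)])                  # node 5: 2 links, both into the core
G.add_edges_from([(6, 4), (6, 7), (6, 8), (6, 9)])  # node 6: 4 links, 3 of them to leaves

degree = nx.degree_centrality(G)
eigen = nx.eigenvector_centrality(G)  # power iteration; converges easily on a small connected graph

for n in (5, 6):
    print(f"node {n}: degree={degree[n]:.3f} eigenvector={eigen[n]:.3f}")
```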

Q13. When should you use spring_layout, and when should you try kamada_kawai_layout instead?

Answer

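`spring_layout` is the default and works well for most small-to-medium graphs: it is fast and produces organic-looking results. `kamada_kawai_layout` is a different force-directed (energy-based) algorithm that tries to make on-screen distances match shortest-path distances in the graph. It often produces cleaner layouts for graphs with clear structure, but it is slower, because it starts from all-pairs shortest-path distances, and it requires SciPy. Try `kamada_kawai_layout` when `spring_layout` produces a cluttered layout on a graph you expect to have clean community structure.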
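A minimal side-by-side sketch, assuming NetworkX with NumPy and SciPy installed. The karate-club graph, the seed, and the edge-length "spread" score are our illustrative choices; the score is a rough heuristic, not a standard layout-quality metric.

```python
import networkx as nx
import numpy as np

G = nx.karate_club_graph()  # 34-node example graph that ships with NetworkX

# Fruchterman-Reingold (force-directed); a fixed seed makes the result repeatable.
pos_spring = nx.spring_layout(G, seed=7)
pos_again = nx.spring_layout(G, seed=7)
print(all(np.allclose(pos_spring[n], pos_again[n]) for n in G))  # True

# Kamada-Kawai: tries to match on-screen distances to shortest-path (hop) distances.
# weight=None ignores edge weights; this call needs SciPy.
pos_kk = nx.kamada_kawai_layout(G, weight=None)

def edge_length_spread(pos):
    """Heuristic (our choice): coefficient of variation of drawn edge lengths."""
    lengths = np.array([np.linalg.norm(pos[u] - pos[v]) for u, v in G.edges()])
    return lengths.std() / lengths.mean()

print(f"spring: {edge_length_spread(pos_spring):.2f}  "
      f"kamada_kawai: {edge_length_spread(pos_kk):.2f}")
# Either dict can be drawn with nx.draw(G, pos=...) to compare the two visually.
```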

Q14. Explain why community detection results can vary between runs of the same algorithm.

Answer

Many community detection algorithms (Louvain, label propagation) are stochastic — they involve random initial conditions, random tie-breaking, or random sampling. Different runs can produce different partitions, especially for graphs where the community structure is ambiguous. Set a seed and report the algorithm parameters (modularity resolution, number of iterations) for reproducibility.
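A minimal reproducibility sketch using NetworkX's Louvain implementation (`nx.community.louvain_communities`, available in NetworkX 2.8 and later); the karate-club graph and seed values are illustrative. Fixing `seed` makes a run repeatable, and printing the seed and `resolution` next to the result is one simple way to report them.

```python
import networkx as nx

G = nx.karate_club_graph()

def louvain(seed, resolution=1.0):
    """Run Louvain; return the partition in a canonical, comparable form plus its modularity."""
    comms = nx.community.louvain_communities(G, resolution=resolution, seed=seed)
    return sorted(sorted(c) for c in comms), nx.community.modularity(G, comms)

# Same seed twice: the partition is reproducible.
p1, q1 = louvain(seed=1)
p2, q2 = louvain(seed=1)
print(p1 == p2)  # True

# Different seeds may or may not give different partitions; record what you used.
for seed in range(5):
    parts, q = louvain(seed=seed)
    print(f"seed={seed} resolution=1.0 communities={len(parts)} modularity={q:.3f}")
```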

Q15. What are three things you can do to make a dense network visualization more legible?

Answer

Any three of the following: (1) **Filter**: show only high-centrality nodes or high-weight edges. (2) **Aggregate**: collapse communities into super-nodes. (3) **Reduce visual noise**: use transparent edges (`alpha=0.3`), remove labels from low-importance nodes, and use a smaller edge width. (4) **Switch representation**: use an adjacency matrix or arc diagram instead. (5) **Go interactive**: let the user zoom and filter rather than showing everything statically.
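The filtering and aggregation options can be sketched in NetworkX before anything is drawn. This is a minimal sketch: the edge weights are random placeholders we assign for the demo, and the cut-offs (top 10 nodes, weight 7 or more) are arbitrary illustrative choices.

```python
import random
import networkx as nx

G = nx.karate_club_graph()
rng = random.Random(0)
for u, v in G.edges():
    G[u][v]["weight"] = rng.randint(1, 10)  # hypothetical weights, for the demo only

# Filter nodes: keep the 10 highest-betweenness nodes and the edges among them.
bc = nx.betweenness_centrality(G)
top = sorted(bc, key=bc.get, reverse=True)[:10]
core = G.subgraph(top).copy()

# Filter edges: keep only strong ties (weight >= 7).
strong = G.edge_subgraph((u, v) for u, v, w in G.edges(data="weight") if w >= 7).copy()

# Aggregate: collapse each detected community into a single super-node.
comms = nx.community.louvain_communities(G, seed=42)
summary = nx.quotient_graph(G, comms, relabel=True)

print(core.number_of_nodes(), strong.number_of_edges(), summary.number_of_nodes())
```

Each of `core`, `strong`, and `summary` is an ordinary graph, so it can go straight into whichever drawing code you already use.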

Q16. When is a matrix a better representation than a force-directed network diagram?

Answer

When the graph has many nodes (more than a few hundred), when edge density is high, when the question is about structure (communities, clusters) rather than specific connections, and when you want a reproducible representation that does not depend on layout algorithm choices. Matrices scale to thousands of nodes and reveal block-diagonal structure when sorted appropriately.
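A minimal sketch of a community-sorted adjacency matrix, assuming NetworkX and NumPy; using Louvain to choose the row/column order is our illustrative choice (any sensible clustering or ordering works). With rows and columns grouped by community, dense blocks line up on the diagonal.

```python
import networkx as nx
import numpy as np

G = nx.karate_club_graph()

# Order rows/columns community by community (largest first).
comms = nx.community.louvain_communities(G, seed=42)
order = [n for c in sorted(comms, key=len, reverse=True) for n in sorted(c)]
A = nx.to_numpy_array(G, nodelist=order, weight=None)  # 1.0 where an edge exists, else 0.0

# How much of the graph sits inside the diagonal blocks?
block = {n: i for i, c in enumerate(comms) for n in c}
within = sum(1 for u, v in G.edges() if block[u] == block[v])
print(A.shape, f"{within}/{G.number_of_edges()} edges inside diagonal blocks")
```

To draw it, pass `A` to `matplotlib.pyplot.imshow` or `plotly.express.imshow`. There is no layout step, so the same node order always gives the same picture.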

Q17. Describe the workflow for building a Plotly network visualization.

Answer

(1) Use NetworkX for the graph data structure. (2) Compute positions with a layout algorithm (`pos = nx.spring_layout(G)`). (3) Build an edge trace: a `go.Scatter` with all edge endpoints concatenated, separated by `None` to produce discontinuous segments. (4) Build a node trace: a `go.Scatter` with positions from `pos`, with size/color encoding node attributes. (5) Combine into a `go.Figure` and customize the layout (hide axes, set title, add hover template).

Q18. What is the difference between nx.Graph and nx.DiGraph?

Answer

`nx.Graph` is an undirected graph: an edge from A to B is the same as an edge from B to A. `nx.DiGraph` is a directed graph: the edge (A, B) is distinct from (B, A). DiGraphs have `predecessors()` and `successors()` methods that distinguish incoming from outgoing edges (on a DiGraph, `neighbors()` is the same as `successors()`); plain Graphs only have `neighbors()`.
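A quick check of the difference with a single edge from A to B (node names are arbitrary):

```python
import networkx as nx

U = nx.Graph([("A", "B")])    # undirected: one edge, usable in both directions
D = nx.DiGraph([("A", "B")])  # directed: the edge points from A to B only

print(U.has_edge("B", "A"))       # True
print(D.has_edge("B", "A"))       # False
print(list(D.successors("A")))    # ['B']  (outgoing)
print(list(D.predecessors("B")))  # ['A']  (incoming)
print(list(D.neighbors("B")))     # []     (same as successors on a DiGraph)
```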

Q19. Explain when a Sankey diagram is appropriate.

Answer

Sankey diagrams are for flow data with clear stage structure: energy flows, budget allocations, customer journeys, material flows through a supply chain. The data should consist of sources and destinations, often with intermediate stages, plus a numeric magnitude for each flow. Sankey diagrams show the relative magnitudes of flows through ribbon widths and the branching structure through the layout. They are a poor fit for general undirected networks or abstract relational data.

Q20. The chapter mentions Gephi and D3.js as alternatives to Python network libraries. Under what circumstances would you recommend each?

Answer

**Gephi**: for exploratory network analysis and one-off visualization where interactive filtering and polished layouts matter. It is a free desktop application. Best when you want to explore an unfamiliar graph before deciding how to visualize it in a paper or report. **D3.js**: for production-quality web visualization where you need full control over design and interaction. It is expensive in development time but offers far more customization than the Python options. Best when the visualization is high-profile (news article, flagship dashboard) and you have JavaScript expertise. For everyday Python work, NetworkX + Plotly or pyvis is usually sufficient without reaching for either.
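Moving a graph from Python into either tool usually starts with a file export. A minimal sketch, assuming NetworkX's `write_gexf` (GEXF is the XML graph format developed by the Gephi project) and `node_link_data` (a JSON-ready structure with a node list and an edge list, similar in shape to what D3 force-layout examples typically load). The temporary directory and file names are placeholders.

```python
import json
import os
import tempfile

import networkx as nx

G = nx.karate_club_graph()
out_dir = tempfile.mkdtemp()  # placeholder location; use a real project path in practice

# GEXF file that Gephi can open.
gexf_path = os.path.join(out_dir, "karate.gexf")
nx.write_gexf(G, gexf_path)

# Node-link JSON for a D3 page. The edge-list key name depends on the NetworkX version,
# so check it against what your D3 code expects.
data = nx.node_link_data(G)
json_path = os.path.join(out_dir, "karate.json")
with open(json_path, "w") as f:
    json.dump(data, f)
```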

Scoring Rubric

| Score | Level | Meaning |
|---|---|---|
| 18–20 | Mastery | You understand graph data, layouts, centrality, and when to use networks vs. alternatives. |
| 14–17 | Proficient | You know the main APIs; review community detection and centrality sections. |
| 10–13 | Developing | You grasp the basics; re-read Sections 24.4-24.9 and work all Part B exercises. |
| < 10 | Review | Re-read the full chapter and complete all Part A and Part B exercises. Part VI begins next. |

With this quiz, Part V (Interactive Visualization) is complete. Chapter 25 begins Part VI with time series visualization.