Case Study: Auditing a Team Collaboration Network with the Toolkit
"The shortest path between two truths in the real domain passes through the complex domain." — Jacques Hadamard
Executive Summary
This is an analysis-heavy capstone walkthrough on Track B (the social-network analyzer of §39.4).
You will take a real-shaped dataset — who-has-worked-with-whom on a mid-size engineering team — and run
it through an analyzer assembled entirely from dmtoolkit pieces you already own. The goal is not to
build a new algorithm; it is to interpret a graph: find the team's connectors, measure how far apart
people are, detect whether the team has quietly split into silos, and report all of it honestly,
distinguishing what the math guarantees from what the data merely suggests. Every number below is
hand-derived from the graph, exactly as the book's no-execution rule requires, so you can check the
analyzer's output against your own pencil.
The deliverable a manager would actually want — "who is the single point of failure, and is this team one team or two?" — turns out to be two graph quantities you proved things about in Part V: a high-degree, high-betweenness bridge vertex, and the connected-component count. The capstone lesson is that the hard part is no longer any algorithm; it is choosing the right quantity and reading it without overclaiming.
Skills applied
- Modeling a messy real situation as an undirected graph $G = (V, E)$ (§39.4, Chapter 27).
- Composing
Graph,bfs, and a hand-writtencomponents/diameterinto an analyzer. - Reading centrality, distance, components, and diameter off the graph and verifying each by hand.
- Citing the BFS-correctness theorem (Chapter 28) instead of re-proving it.
- Reporting findings with the §39.7 discipline: state the exact guarantee, name the limits.
Background
The scenario
You have joined a platform team of eight engineers as a tech lead, and your director asks a blunt question before the next reorg: "Is this one team or two teams pretending to be one — and if a key person leaves, does it fall apart?" You do not have a survey; what you have is the version-control history, from which you extract a simple fact for each pair of engineers: have they ever co-authored a pull request? That single relation is enough.
This is a constructed but realistic dataset (a hypothetical example, in the book's labeling): eight people, and an edge whenever two of them have shipped code together.
| Engineer | Co-authored PRs with |
|---|---|
| Priya | Quinn, Raj, Sam |
| Quinn | Priya, Raj |
| Raj | Priya, Quinn |
| Sam | Priya, Tara |
| Tara | Sam, Uma, Vik |
| Uma | Tara, Vik, Wen |
| Vik | Tara, Uma |
| Wen | Uma |
Co-authorship is symmetric — if Priya co-authored with Sam, then Sam co-authored with Priya — so the
relation is an undirected graph, precisely the Graph class of Chapter 27. That modeling choice is
the whole foundation; get it right and every metric is a Toolkit call.
💡 Intuition: A collaboration graph turns an organizational feeling ("it seems like the back-end folks never talk to the data folks") into a measurable object. "Are we siloed?" becomes "how many connected components are there?" "Who is the glue?" becomes "which vertex, if removed, disconnects the graph?" The director's intuition was always graph theory; we are just giving it coordinates.
Why this matters
Collaboration-, dependency-, and follower-graph analysis is a standard part of engineering management, organizational research, and product features ("people you may know"). The same analyzer applies, with nothing changed but the input, to a microservice call graph (who calls whom — a single point of failure is a service every request routes through) or a citation graph. You are building a lens, and the lens generalizes.
Phase 1: Build the graph and read off the obvious
First, encode the table as edges and load them into the Graph class. We deduplicate by listing each
edge once.
from dmtoolkit.graphs import Graph, bfs
g = Graph()
edges = [
("Priya", "Quinn"), ("Priya", "Raj"), ("Quinn", "Raj"), # cluster 1 triangle
("Priya", "Sam"), # link 1->bridge
("Sam", "Tara"), # the bridge edge
("Tara", "Uma"), ("Tara", "Vik"), ("Uma", "Vik"), # cluster 2 triangle
("Uma", "Wen"), # pendant
]
for u, v in edges:
g.add_edge(u, v)
print("people:", len(g.vertices()))
print("connections:", len(edges))
# Expected output:
# people: 8
# connections: 9
There are 8 vertices and 9 edges. Now the simplest metric, degree centrality — who has the most direct collaborators?
central = max(sorted(g.vertices()), key=g.degree)
print("most connected:", central, "with degree", g.degree(central))
# Expected output:
# most connected: Tara with degree 3
Hand-trace the degrees, the Chapter 6 habit, so you trust the output:
| Vertex | Neighbors | Degree |
|---|---|---|
| Priya | Quinn, Raj, Sam | 3 |
| Quinn | Priya, Raj | 2 |
| Raj | Priya, Quinn | 2 |
| Sam | Priya, Tara | 2 |
| Tara | Sam, Uma, Vik | 3 |
| Uma | Tara, Vik, Wen | 3 |
| Vik | Tara, Uma | 2 |
| Wen | Uma | 1 |
Three vertices tie at degree 3 — Priya, Tara, and Uma. Sorting the vertices first makes the tie-break
deterministic: iterating alphabetically [Priya, Quinn, Raj, Sam, Tara, Uma, Vik, Wen], max keeps
the first to reach degree 3, which is Priya — wait. Read the rule carefully: Python's max
returns the first element achieving the maximum, and Priya (degree 3) precedes Tara. So why does the
output say Tara?
🐛 Find the Error: The output above is wrong on purpose. With
sorted(g.vertices()),max(..., key=g.degree)returns the alphabetically-first degree-3 vertex, which is Priya, not Tara. Hand-trace it and fix the expected-output comment.
Answer
Priya, Tara, and Uma all have degree 3.
maxover the sorted list[Priya, Quinn, …]returns the first maximizer it meets — Priya. The corrected output ismost connected: Priya with degree 3. The deeper point (and the §39.7 honesty lesson): degree centrality has a three-way tie here, and a good audit reports all three, not whichever onemaxhappens to surface. Hiding a tie behind an arbitrary tie-break is exactly the kind of quiet overclaim the chapter warns against.
So the honest finding is: Priya, Tara, and Uma are tied as the most-connected engineers. Degree alone, though, misses something a manager cares about more — and Phase 3 will find it.
Phase 2: Degrees of separation, and a structural surprise
How far apart are people? Run BFS from one end and read distances. Take Wen (the lone pendant) as a source, since intuitively she is the most peripheral.
dist, parent = bfs(g, "Wen")
for person in sorted(dist):
print(person, dist[person])
# Expected output:
# Priya 4
# Quinn 5
# Raj 5
# Sam 3
# Tara 2
# Uma 1
# Vik 2
# Wen 0
Hand-trace the BFS layers from Wen:
- Layer 0: Wen.
- Layer 1: Wen's neighbors — Uma. (
dist[Uma] = 1.) - Layer 2: Uma's undiscovered neighbors — Tara, Vik. (
dist = 2.) - Layer 3: Tara's undiscovered neighbor — Sam (Vik adds nothing new). (
dist[Sam] = 3.) - Layer 4: Sam's undiscovered neighbor — Priya. (
dist[Priya] = 4.) - Layer 5: Priya's undiscovered neighbors — Quinn, Raj. (
dist = 5.)
Every vertex is reached, and the farthest from Wen are Quinn and Raj at distance 5. Because BFS labels vertices in order of true distance (the Chapter 28 theorem we rely on, restated in Phase 4), these numbers are the shortest hop-counts, not merely some path's length.
🔄 Check Your Understanding 1. Wen reaches Quinn in 5 hops. Trace the actual shortest path Wen→Quinn and confirm it has 5 edges. 2. The
parentdictionary thatbfsreturns alongsidedistlets you reconstruct that path. In one sentence, how?
Answer
- Wen–Uma–Tara–Sam–Priya–Quinn: that is 5 edges, matching
dist[Quinn] = 5. (Wen–Uma–Vik dead-ends back into the same cluster, so it is not shorter.) 2. Followparentbackward from Quinn —parent[Quinn] = Priya,parent[Priya] = Sam, …, up toparent[Uma] = Wen— then reverse the list; this is the standard BFS shortest-path reconstruction from Chapter 28.
The surprise: this eight-person team has a diameter of at least 5. For a team that small, a distance of 5 between Wen and Quinn is large — it is the signature of a chain-of-triangles shape rather than a tightly knit group. The "six degrees" folklore predicts small diameters in huge social graphs; here a tiny graph has a stretched-out diameter precisely because it is two clusters joined by a thin thread.
Phase 3: Components and the single point of failure
Now the questions the director actually asked. First: is this one team or two? Count connected components by iterating BFS (the §39.4 pattern).
def components(g):
"""Return a list of vertex-sets, one per connected component."""
seen, comps = set(), []
for v in g.vertices():
if v not in seen:
dist, _ = bfs(g, v) # reaches v's whole component
comp = set(dist) # keys of dist = reachable from v
comps.append(comp)
seen |= comp
return comps
print("number of teams (components):", len(components(g)))
# Expected output:
# number of teams (components): 1
The single edge Sam–Tara stitches the two triangles together, so the graph is connected: one
component of all eight people. On paper: BFS from the first vertex (Priya, say) reaches everyone —
Priya→{Quinn,Raj,Sam}→Tara→{Uma,Vik}→Wen — so seen swallows all eight after one BFS and the loop
never starts a second component. The team is technically one team. But "connected" is a fragile kind
of one-ness, which the next metric exposes.
Second, and more important: who is the single point of failure? A vertex whose removal disconnects the graph is a cut vertex (an articulation point). The chapter's analyzer doesn't ship one, but we can build it from the components tool: remove a vertex, rebuild the graph, and ask whether the component count went up.
def graph_without(g, v):
"""Rebuild g with vertex v (and its edges) removed. Survivors with no
remaining edges become isolated -- add them so they still count."""
h = Graph()
survivors = [u for u in g.vertices() if u != v]
for u in survivors:
h.add_vertex(u) # keep isolated survivors visible
for w in g.neighbors(u): # rebuild edges not touching v
if w != v and u < w: # u < w lists each edge once
h.add_edge(u, w)
return h
def is_cut_vertex(g, v):
"""True if deleting v increases the number of connected components."""
return len(components(graph_without(g, v))) > len(components(g))
print("Sam is a cut vertex:", is_cut_vertex(g, "Sam")) # severs Priya-triangle
print("Tara is a cut vertex:", is_cut_vertex(g, "Tara")) # severs Uma/Vik/Wen side
print("Priya is a cut vertex:", is_cut_vertex(g, "Priya"))# NOT a cut vertex
# Expected output:
# Sam is a cut vertex: True
# Tara is a cut vertex: True
# Priya is a cut vertex: False
The is_cut_vertex helper composes the components tool we already trust: it rebuilds the graph
without v (keeping isolated survivors visible, the one piece of bookkeeping easy to forget) and asks
whether the component count went up. Verify each answer by hand — which is what the chapter wants
you to trust anyway:
- Remove Sam. The only path from {Priya, Quinn, Raj} to {Tara, Uma, Vik, Wen} ran through the Sam–Tara bridge, and Sam's other edge was to Priya. With Sam gone, the Priya-triangle {Priya, Quinn, Raj} is severed from the Tara-cluster: 2 components. So Sam is a cut vertex.
- Remove Tara. Tara is the other end of the bridge and also the only attachment of the Uma–Vik–Wen group to Sam's side. Removing Tara splits the graph into {Priya, Quinn, Raj, Sam} and {Uma, Vik, Wen}: 2 components. So Tara is a cut vertex.
- Remove Uma. Wen's only neighbor is Uma. Removing Uma isolates Wen: {Wen} becomes its own component. So Uma is a cut vertex too — and Wen is the most vulnerable person on the team.
⚠️ Common Pitfall: Confusing high degree with criticality. Priya ties for the highest degree (3) yet is not a cut vertex — deleting Priya leaves Quinn–Raj still connected to each other and, through Sam, to the rest. Conversely, Sam has degree only 2 but is a single point of failure. The metric that answers "who is critical?" is cut vertices, not degree. Reporting Priya as "most important because most connected" would be a confident, wrong answer — the silent-overclaim trap of §39.7.
So the director's two questions resolve cleanly: it is nominally one team, but it hangs on the thin Sam–Tara thread, and Sam, Tara, and Uma are each single points of failure. That is a finding a manager can act on (pair people across the bridge; give Wen a second collaborator).
Phase 4: The correctness argument we are standing on
Every number above — distances, components, the cut-vertex reasoning — assumes one thing: that BFS really computes shortest-path distances. We do not re-prove it; we cite it, which is the professional habit §39.7 names. The result, from Chapter 28:
BFS computes shortest-path distances. In BFS from $s$ on an unweighted graph, for every reachable vertex $v$,
dist[v]equals the number of edges on a shortest $s$–$v$ path.The strategy (recap). BFS dequeues vertices in nondecreasing order of true distance; an induction on the BFS layer shows every vertex at true distance $d$ is labeled exactly when layer $d-1$ is being processed, so its label is its distance. The full proof is in Chapter 28.
Why this matters for this audit specifically: the cut-vertex analysis depends on components being
correct, and components is correct only because each BFS reaches exactly the source's component (a
direct corollary — a vertex is reachable from $s$ iff BFS assigns it a finite dist). And the diameter
claim ("at least 5") is a max over distances that are themselves correct. This is the capstone lesson in
one paragraph: a correct audit is a composition of correct tools, and the audit is only as trustworthy
as the theorem under its smallest piece. We did not earn the right to say "Quinn is 5 hops from Wen"
by running the code; we earned it by citing the proof that the code's output means what we claim.
🔗 Connection. The same composition discipline appears in Track A: there, the load-bearing theorem is $m^{ed} \equiv m \pmod n$ (Euler, Chapter 25), and everything else — encoding, blocking — is plumbing around it. Different track, identical structure: one proven core, a thin correct layer on top.
Discussion Questions
- The audit found the team is one component but riddled with cut vertices. Sketch the minimal set of new edges you would recommend to make the graph 2-edge-connected (no single edge whose removal disconnects it). What does each new edge mean organizationally?
- We measured centrality by degree. Betweenness centrality counts how many shortest paths pass through a vertex. Argue, using the Sam–Tara bridge, why betweenness would rank Sam and Tara far above Priya even though Priya has higher degree. Which metric better matches "single point of failure"?
- The diameter is "at least 5" — we computed it from one BFS source (Wen). What additional computation
does the §39.4
diameterfunction do to turn "at least 5" into the exact diameter, and why is one BFS not enough? - Our model throws away how much two people collaborate (10 PRs vs. 1 PR are the same edge). Name the
§39.4 extension that fixes this, which Toolkit function replaces
bfs, and one conclusion above that might change under weighting. - Co-authorship is symmetric, so we used an undirected graph. Name a workplace relation that is not symmetric (so it would need a directed graph), and state how "cut vertex" would have to be redefined there.
Your Turn: Extensions
- Option A (recommend a collaboration). Implement the "people you may know" recommender from §39.4: for Wen, score every non-neighbor by shared collaborators $\lvert N(\text{Wen}) \cap N(w) \rvert$. Hand-derive the scores. Who should Wen pair with next to reduce her vulnerability? (Hint: $N(\text{Wen}) = {\text{Uma}}$, so the score is just "how many of $w$'s collaborators include Uma.")
- Option B (full cut-vertex finder). Replace the by-hand reasoning of Phase 3 with a correct,
clean
cut_vertices(g)that, for each vertex, rebuilds the graph without it and compares component counts. State its running time in terms of the per-vertexcomponentscost, and test it by hand against the three cut vertices we found (Sam, Tara, Uma). - Option C (write the one-page report). Draft the §39.7 five-section write-up for this audit: problem/model, design (which Toolkit pieces), correctness (cite the BFS theorem), evidence (the hand-traced tables above), and limits (what the unweighted, undirected, snapshot model ignores). The graded line is the limits section.
Key Takeaways
- The model is the work. Turning "are we siloed?" into "how many connected components?" and "who is critical?" into "which vertices are cut vertices?" is the intellectual core; the code is a handful of Toolkit calls.
- Pick the metric that answers the actual question. Degree centrality found ties that do not correspond to criticality; cut vertices do. The wrong metric yields a confident wrong answer.
- Compose proven tools, and cite the proof. The audit's trustworthiness rests entirely on the Chapter 28 BFS-correctness theorem — restated, not re-derived. A correct system is a composition of correct parts.
- Report ties and limits honestly. Three-way degree ties, an undirected snapshot that ignores collaboration strength, a diameter we only lower-bounded — naming each is the §39.7 discipline that separates an audit from a guess.