Case Study: Auditing a Team Collaboration Network with the Toolkit

DataField.Dev

Case Study: Auditing a Team Collaboration Network with the Toolkit

"The shortest path between two truths in the real domain passes through the complex domain." — Jacques Hadamard

Executive Summary

This is an analysis-heavy capstone walkthrough on Track B (the social-network analyzer of §39.4). You will take a real-shaped dataset — who-has-worked-with-whom on a mid-size engineering team — and run it through an analyzer assembled entirely from dmtoolkit pieces you already own. The goal is not to build a new algorithm; it is to interpret a graph: find the team's connectors, measure how far apart people are, detect whether the team has quietly split into silos, and report all of it honestly, distinguishing what the math guarantees from what the data merely suggests. Every number below is hand-derived from the graph, exactly as the book's no-execution rule requires, so you can check the analyzer's output against your own pencil.

The deliverable a manager would actually want — "who is the single point of failure, and is this team one team or two?" — turns out to be two graph quantities you proved things about in Part V: a high-degree, high-betweenness bridge vertex, and the connected-component count. The capstone lesson is that the hard part is no longer any algorithm; it is choosing the right quantity and reading it without overclaiming.

Skills applied

Modeling a messy real situation as an undirected graph $G = (V, E)$ (§39.4, Chapter 27).
Composing Graph, bfs, and a hand-written components/diameter into an analyzer.
Reading centrality, distance, components, and diameter off the graph and verifying each by hand.
Citing the BFS-correctness theorem (Chapter 28) instead of re-proving it.
Reporting findings with the §39.7 discipline: state the exact guarantee, name the limits.

Background

The scenario

You have joined a platform team of eight engineers as a tech lead, and your director asks a blunt question before the next reorg: "Is this one team or two teams pretending to be one — and if a key person leaves, does it fall apart?" You do not have a survey; what you have is the version-control history, from which you extract a simple fact for each pair of engineers: have they ever co-authored a pull request? That single relation is enough.

This is a constructed but realistic dataset (a hypothetical example, in the book's labeling): eight people, and an edge whenever two of them have shipped code together.

Engineer	Co-authored PRs with
Priya	Quinn, Raj, Sam
Quinn	Priya, Raj
Raj	Priya, Quinn
Sam	Priya, Tara
Tara	Sam, Uma, Vik
Uma	Tara, Vik, Wen
Vik	Tara, Uma
Wen	Uma

Co-authorship is symmetric — if Priya co-authored with Sam, then Sam co-authored with Priya — so the relation is an undirected graph, precisely the Graph class of Chapter 27. That modeling choice is the whole foundation; get it right and every metric is a Toolkit call.

💡 Intuition: A collaboration graph turns an organizational feeling ("it seems like the back-end folks never talk to the data folks") into a measurable object. "Are we siloed?" becomes "how many connected components are there?" "Who is the glue?" becomes "which vertex, if removed, disconnects the graph?" The director's intuition was always graph theory; we are just giving it coordinates.

Why this matters

Collaboration-, dependency-, and follower-graph analysis is a standard part of engineering management, organizational research, and product features ("people you may know"). The same analyzer applies, with nothing changed but the input, to a microservice call graph (who calls whom — a single point of failure is a service every request routes through) or a citation graph. You are building a lens, and the lens generalizes.

Phase 1: Build the graph and read off the obvious

First, encode the table as edges and load them into the Graph class. We deduplicate by listing each edge once.

from dmtoolkit.graphs import Graph, bfs

g = Graph()
edges = [
    ("Priya", "Quinn"), ("Priya", "Raj"), ("Quinn", "Raj"),   # cluster 1 triangle
    ("Priya", "Sam"),                                          # link 1->bridge
    ("Sam", "Tara"),                                           # the bridge edge
    ("Tara", "Uma"), ("Tara", "Vik"), ("Uma", "Vik"),         # cluster 2 triangle
    ("Uma", "Wen"),                                            # pendant
]
for u, v in edges:
    g.add_edge(u, v)

print("people:", len(g.vertices()))
print("connections:", len(edges))
# Expected output:
# people: 8
# connections: 9

There are 8 vertices and 9 edges. Now the simplest metric, degree centrality — who has the most direct collaborators?

central = max(sorted(g.vertices()), key=g.degree)
print("most connected:", central, "with degree", g.degree(central))
# Expected output:
# most connected: Tara with degree 3

Hand-trace the degrees, the Chapter 6 habit, so you trust the output:

Vertex	Neighbors	Degree
Priya	Quinn, Raj, Sam	3
Quinn	Priya, Raj	2
Raj	Priya, Quinn	2
Sam	Priya, Tara	2
Tara	Sam, Uma, Vik	3
Uma	Tara, Vik, Wen	3
Vik	Tara, Uma	2
Wen	Uma	1

Three vertices tie at degree 3 — Priya, Tara, and Uma. Sorting the vertices first makes the tie-break deterministic: iterating alphabetically [Priya, Quinn, Raj, Sam, Tara, Uma, Vik, Wen], max keeps the first to reach degree 3, which is Priya — wait. Read the rule carefully: Python's max returns the first element achieving the maximum, and Priya (degree 3) precedes Tara. So why does the output say Tara?

🐛 Find the Error: The output above is wrong on purpose. With sorted(g.vertices()), max(..., key=g.degree) returns the alphabetically-first degree-3 vertex, which is Priya, not Tara. Hand-trace it and fix the expected-output comment.

Answer

Priya, Tara, and Uma all have degree 3. max over the sorted list [Priya, Quinn, …] returns the first maximizer it meets — Priya. The corrected output is most connected: Priya with degree 3. The deeper point (and the §39.7 honesty lesson): degree centrality has a three-way tie here, and a good audit reports all three, not whichever one max happens to surface. Hiding a tie behind an arbitrary tie-break is exactly the kind of quiet overclaim the chapter warns against.

So the honest finding is: Priya, Tara, and Uma are tied as the most-connected engineers. Degree alone, though, misses something a manager cares about more — and Phase 3 will find it.

Phase 2: Degrees of separation, and a structural surprise

How far apart are people? Run BFS from one end and read distances. Take Wen (the lone pendant) as a source, since intuitively she is the most peripheral.

dist, parent = bfs(g, "Wen")
for person in sorted(dist):
    print(person, dist[person])
# Expected output:
# Priya 4
# Quinn 5
# Raj 5
# Sam 3
# Tara 2
# Uma 1
# Vik 2
# Wen 0

Hand-trace the BFS layers from Wen:

Layer 0: Wen.
Layer 1: Wen's neighbors — Uma. (dist[Uma] = 1.)
Layer 2: Uma's undiscovered neighbors — Tara, Vik. (dist = 2.)
Layer 3: Tara's undiscovered neighbor — Sam (Vik adds nothing new). (dist[Sam] = 3.)
Layer 4: Sam's undiscovered neighbor — Priya. (dist[Priya] = 4.)
Layer 5: Priya's undiscovered neighbors — Quinn, Raj. (dist = 5.)

Every vertex is reached, and the farthest from Wen are Quinn and Raj at distance 5. Because BFS labels vertices in order of true distance (the Chapter 28 theorem we rely on, restated in Phase 4), these numbers are the shortest hop-counts, not merely some path's length.

🔄 Check Your Understanding 1. Wen reaches Quinn in 5 hops. Trace the actual shortest path Wen→Quinn and confirm it has 5 edges. 2. The parent dictionary that bfs returns alongside dist lets you reconstruct that path. In one sentence, how?

Answer

Wen–Uma–Tara–Sam–Priya–Quinn: that is 5 edges, matching dist[Quinn] = 5. (Wen–Uma–Vik dead-ends back into the same cluster, so it is not shorter.) 2. Follow parent backward from Quinn — parent[Quinn] = Priya, parent[Priya] = Sam, …, up to parent[Uma] = Wen — then reverse the list; this is the standard BFS shortest-path reconstruction from Chapter 28.

The surprise: this eight-person team has a diameter of at least 5. For a team that small, a distance of 5 between Wen and Quinn is large — it is the signature of a chain-of-triangles shape rather than a tightly knit group. The "six degrees" folklore predicts small diameters in huge social graphs; here a tiny graph has a stretched-out diameter precisely because it is two clusters joined by a thin thread.

Phase 3: Components and the single point of failure

Now the questions the director actually asked. First: is this one team or two? Count connected components by iterating BFS (the §39.4 pattern).

def components(g):
    """Return a list of vertex-sets, one per connected component."""
    seen, comps = set(), []
    for v in g.vertices():
        if v not in seen:
            dist, _ = bfs(g, v)          # reaches v's whole component
            comp = set(dist)             # keys of dist = reachable from v
            comps.append(comp)
            seen |= comp
    return comps

print("number of teams (components):", len(components(g)))
# Expected output:
# number of teams (components): 1

The single edge Sam–Tara stitches the two triangles together, so the graph is connected: one component of all eight people. On paper: BFS from the first vertex (Priya, say) reaches everyone — Priya→{Quinn,Raj,Sam}→Tara→{Uma,Vik}→Wen — so seen swallows all eight after one BFS and the loop never starts a second component. The team is technically one team. But "connected" is a fragile kind of one-ness, which the next metric exposes.

Second, and more important: who is the single point of failure? A vertex whose removal disconnects the graph is a cut vertex (an articulation point). The chapter's analyzer doesn't ship one, but we can build it from the components tool: remove a vertex, rebuild the graph, and ask whether the component count went up.

def graph_without(g, v):
    """Rebuild g with vertex v (and its edges) removed. Survivors with no
    remaining edges become isolated -- add them so they still count."""
    h = Graph()
    survivors = [u for u in g.vertices() if u != v]
    for u in survivors:
        h.add_vertex(u)                       # keep isolated survivors visible
        for w in g.neighbors(u):              # rebuild edges not touching v
            if w != v and u < w:              # u < w lists each edge once
                h.add_edge(u, w)
    return h

def is_cut_vertex(g, v):
    """True if deleting v increases the number of connected components."""
    return len(components(graph_without(g, v))) > len(components(g))

print("Sam is a cut vertex:", is_cut_vertex(g, "Sam"))    # severs Priya-triangle
print("Tara is a cut vertex:", is_cut_vertex(g, "Tara"))  # severs Uma/Vik/Wen side
print("Priya is a cut vertex:", is_cut_vertex(g, "Priya"))# NOT a cut vertex
# Expected output:
# Sam is a cut vertex: True
# Tara is a cut vertex: True
# Priya is a cut vertex: False

The is_cut_vertex helper composes the components tool we already trust: it rebuilds the graph without v (keeping isolated survivors visible, the one piece of bookkeeping easy to forget) and asks whether the component count went up. Verify each answer by hand — which is what the chapter wants you to trust anyway:

Remove Sam. The only path from {Priya, Quinn, Raj} to {Tara, Uma, Vik, Wen} ran through the Sam–Tara bridge, and Sam's other edge was to Priya. With Sam gone, the Priya-triangle {Priya, Quinn, Raj} is severed from the Tara-cluster: 2 components. So Sam is a cut vertex.
Remove Tara. Tara is the other end of the bridge and also the only attachment of the Uma–Vik–Wen group to Sam's side. Removing Tara splits the graph into {Priya, Quinn, Raj, Sam} and {Uma, Vik, Wen}: 2 components. So Tara is a cut vertex.
Remove Uma. Wen's only neighbor is Uma. Removing Uma isolates Wen: {Wen} becomes its own component. So Uma is a cut vertex too — and Wen is the most vulnerable person on the team.

⚠️ Common Pitfall: Confusing high degree with criticality. Priya ties for the highest degree (3) yet is not a cut vertex — deleting Priya leaves Quinn–Raj still connected to each other and, through Sam, to the rest. Conversely, Sam has degree only 2 but is a single point of failure. The metric that answers "who is critical?" is cut vertices, not degree. Reporting Priya as "most important because most connected" would be a confident, wrong answer — the silent-overclaim trap of §39.7.

So the director's two questions resolve cleanly: it is nominally one team, but it hangs on the thin Sam–Tara thread, and Sam, Tara, and Uma are each single points of failure. That is a finding a manager can act on (pair people across the bridge; give Wen a second collaborator).

Phase 4: The correctness argument we are standing on

Every number above — distances, components, the cut-vertex reasoning — assumes one thing: that BFS really computes shortest-path distances. We do not re-prove it; we cite it, which is the professional habit §39.7 names. The result, from Chapter 28:

BFS computes shortest-path distances. In BFS from $s$ on an unweighted graph, for every reachable vertex $v$, dist[v] equals the number of edges on a shortest $s$–$v$ path.

The strategy (recap). BFS dequeues vertices in nondecreasing order of true distance; an induction on the BFS layer shows every vertex at true distance $d$ is labeled exactly when layer $d-1$ is being processed, so its label is its distance. The full proof is in Chapter 28.

Why this matters for this audit specifically: the cut-vertex analysis depends on components being correct, and components is correct only because each BFS reaches exactly the source's component (a direct corollary — a vertex is reachable from $s$ iff BFS assigns it a finite dist). And the diameter claim ("at least 5") is a max over distances that are themselves correct. This is the capstone lesson in one paragraph: a correct audit is a composition of correct tools, and the audit is only as trustworthy as the theorem under its smallest piece. We did not earn the right to say "Quinn is 5 hops from Wen" by running the code; we earned it by citing the proof that the code's output means what we claim.

🔗 Connection. The same composition discipline appears in Track A: there, the load-bearing theorem is $m^{ed} \equiv m \pmod n$ (Euler, Chapter 25), and everything else — encoding, blocking — is plumbing around it. Different track, identical structure: one proven core, a thin correct layer on top.

Discussion Questions

The audit found the team is one component but riddled with cut vertices. Sketch the minimal set of new edges you would recommend to make the graph 2-edge-connected (no single edge whose removal disconnects it). What does each new edge mean organizationally?
We measured centrality by degree. Betweenness centrality counts how many shortest paths pass through a vertex. Argue, using the Sam–Tara bridge, why betweenness would rank Sam and Tara far above Priya even though Priya has higher degree. Which metric better matches "single point of failure"?
The diameter is "at least 5" — we computed it from one BFS source (Wen). What additional computation does the §39.4 diameter function do to turn "at least 5" into the exact diameter, and why is one BFS not enough?
Our model throws away how much two people collaborate (10 PRs vs. 1 PR are the same edge). Name the §39.4 extension that fixes this, which Toolkit function replaces bfs, and one conclusion above that might change under weighting.
Co-authorship is symmetric, so we used an undirected graph. Name a workplace relation that is not symmetric (so it would need a directed graph), and state how "cut vertex" would have to be redefined there.

Your Turn: Extensions

Option A (recommend a collaboration). Implement the "people you may know" recommender from §39.4: for Wen, score every non-neighbor by shared collaborators $\lvert N(\text{Wen}) \cap N(w) \rvert$. Hand-derive the scores. Who should Wen pair with next to reduce her vulnerability? (Hint: $N(\text{Wen}) = {\text{Uma}}$, so the score is just "how many of $w$'s collaborators include Uma.")
Option B (full cut-vertex finder). Replace the by-hand reasoning of Phase 3 with a correct, clean cut_vertices(g) that, for each vertex, rebuilds the graph without it and compares component counts. State its running time in terms of the per-vertex components cost, and test it by hand against the three cut vertices we found (Sam, Tara, Uma).
Option C (write the one-page report). Draft the §39.7 five-section write-up for this audit: problem/model, design (which Toolkit pieces), correctness (cite the BFS theorem), evidence (the hand-traced tables above), and limits (what the unweighted, undirected, snapshot model ignores). The graded line is the limits section.

Key Takeaways

The model is the work. Turning "are we siloed?" into "how many connected components?" and "who is critical?" into "which vertices are cut vertices?" is the intellectual core; the code is a handful of Toolkit calls.
Pick the metric that answers the actual question. Degree centrality found ties that do not correspond to criticality; cut vertices do. The wrong metric yields a confident wrong answer.
Compose proven tools, and cite the proof. The audit's trustworthiness rests entirely on the Chapter 28 BFS-correctness theorem — restated, not re-derived. A correct system is a composition of correct parts.
Report ties and limits honestly. Three-way degree ties, an undirected snapshot that ignores collaboration strength, a diameter we only lower-bounded — naming each is the §39.7 discipline that separates an audit from a guess.