Case Study 2 — The Social Graph That Crushed the Joins

DataField.Dev

The honest counterpoint: sometimes the relational model genuinely is the wrong tool, and a NoSQL database is right. A social-network feature asked for deep, multi-hop relationship traversal — exactly what relational joins do badly and graph databases do well.

Background

A professional-networking app needed a "people you may know" feature: for a given user, find connections-of-connections (and connections-of-connections-of-connections) who aren't already connected, ranked by how many mutual connections they share. The data lived in PostgreSQL: a users table and a connections(user_a, user_b) junction (a graph, modeled relationally — Chapter 17).

For one hop ("my direct connections"), a simple join was instant. For two hops ("connections of my connections"), a self-join worked but got slow. For three or more hops — which the recommendation quality needed — the SQL became a tangle of repeated self-joins, and performance fell off a cliff: each hop multiplied the rows explored, and a popular user with thousands of connections produced an explosion the database couldn't handle in interactive time. The "people you may know" panel timed out.

What went wrong: relational joins vs. deep traversal

Relational databases handle moderate relationships beautifully — a join, a recursive CTE for a hierarchy (Chapter 11). But deep, branching, many-hop traversal of a large graph is their weak spot. Each hop is another join; N hops over a densely-connected graph means exploring an exponentially growing set of paths, re-joining the connections table again and again. The relational engine isn't optimized for "walk the graph N levels deep following edges" — it's optimized for set operations on tables. (A recursive CTE can traverse, but on a large, dense social graph the work still explodes.)

This is a genuine case where the data's essential nature — a large, densely-connected graph traversed deeply — doesn't fit the relational model's strengths. No index or rewrite changes that; the operation itself is graph traversal, and the tool wasn't built for it.

The fix: a graph database

The team moved the connections graph (not their whole database) into a graph database (Neo4j), where nodes are users and edges are connections, and traversal is the native, optimized operation:

   // Neo4j Cypher: people 2-3 hops away, not already connected, by mutual count
   MATCH (me:User {id: $id})-[:CONNECTED*2..3]-(candidate:User)
   WHERE NOT (me)-[:CONNECTED]-(candidate) AND me <> candidate
   RETURN candidate, count(*) AS mutuals
   ORDER BY mutuals DESC LIMIT 20

A graph database stores relationships as direct pointers between nodes, so traversing "connected to, 2–3 hops" is fast and natural — it follows edges instead of re-joining a table. The "people you may know" query that timed out in SQL returned in milliseconds. They kept PostgreSQL as the system of record for users, profiles, and everything relational, and synced the connection edges into the graph database for the traversal feature — using each tool for what it's best at.

The analysis

Deep graph traversal is a genuine relational weakness. Joins handle moderate relationships and recursive CTEs handle hierarchies, but many-hop traversal of a large, dense graph explodes in a relational engine. This is one of the clearest cases where NoSQL (specifically, a graph database) is the right tool.
The data's essential nature drives the choice (Chapter 37). Connections are a graph, and the feature traverses it deeply — so a database built for graph traversal fits. The earlier chapters' "relational is right for most problems" holds; this is one of the genuine exceptions, recognized on the merits.
Don't move everything — move the part that needs it. They didn't abandon PostgreSQL; they kept it as the system of record and added a graph database only for the traversal feature, syncing the relevant edges. Targeted polyglot persistence: the right tool for the specific workload, not a wholesale switch.
Recognize the limit honestly. A team dogmatically committed to "relational for everything" would have kept fighting timeouts. The skill (this whole part) is judgment: relational by default, and the honesty to reach elsewhere when the workload genuinely demands it.
The cost is real but justified. Adding a graph database means syncing data and operating a second system (the polyglot cost). Here the feature was core and impossible in SQL at the needed depth, so the cost was worth it — a concrete, demonstrated reason, exactly what justifies going beyond PostgreSQL.

Discussion questions

Why do relational joins handle 1–2 hops fine but fail at deep multi-hop traversal?
How does a graph database make deep traversal fast where SQL doesn't?
Why did the team keep PostgreSQL as the system of record instead of moving everything?
Contrast this with Case Study 1 (Redis): both add a NoSQL system — what makes each justified?
⭐ What ongoing cost did adding the graph database introduce, and why was it worth it here but not for, say, the orders data?