39 min read

> "The shortest distance between two points is under construction."

Prerequisites

  • 28

Learning Objectives

  • Model a real situation as a weighted graph and state the single-source and all-pairs shortest-path problems precisely.
  • Implement Dijkstra's algorithm with a priority queue and reconstruct the shortest path, not just its length.
  • Explain why Dijkstra is correct as a greedy algorithm justified by a loop invariant, and identify exactly where non-negativity is used.
  • Apply the relaxation operation, and use Bellman-Ford to handle negative edge weights and detect negative cycles.
  • Choose the right shortest-path algorithm for a given graph, and connect the choice to GPS navigation and internet routing.

Chapter 29: Weighted Graphs and Shortest Paths

"The shortest distance between two points is under construction." — attributed to Noelie Altito

Overview

Open a maps app and ask for directions. In under a second it searches a graph with tens of millions of intersections and returns not a route but the best one — fastest, or shortest, or cheapest in tolls. Click "send" on a message and a packet crosses a graph of routers, at each hop choosing the next link by a cost that the network has already computed. Ask a social network "how are you connected to this person?" and behind the friendly answer is a search for a path of minimum length through a graph of billions of people. All three are the same problem wearing different clothes: find the cheapest path through a graph whose edges carry costs.

In Chapter 28 you learned breadth-first search, which finds the path with the fewest edges. That is exactly right when every edge costs the same — one hop is one hop. But the real world is not so even. A highway segment and a gravel road are both single edges, yet one takes two minutes and the other twenty. A direct flight and a two-leg connection are both "paths," but their prices differ. The moment edges carry unequal weights, "fewest edges" and "cheapest" diverge, and BFS gives the wrong answer. This chapter is about getting the right one.

The reward for this chapter is twofold. First, you will be able to implement and defend the two shortest-path algorithms every programmer is expected to know — Dijkstra's algorithm and Bellman-Ford. Second, you will see a proof technique that recurs through the rest of Part V: showing a greedy algorithm correct by stating and maintaining a loop invariant. "It returned the right answer on my test graph" is not the same as "it returns the shortest path on every graph," and the gap between those two statements is exactly where this chapter lives.

In this chapter you will learn to:

  • Model weighted situations as graphs and state the shortest-path problems precisely.
  • Perform the single operation — relaxation — that underlies every shortest-path algorithm.
  • Implement Dijkstra's algorithm with a priority queue, and recover the actual path.
  • Prove Dijkstra correct, and pinpoint why it needs non-negative weights.
  • Use Bellman-Ford to survive negative edges and to detect negative cycles.
  • Compute all-pairs shortest paths with Floyd-Warshall, and choose the right tool for a given graph.
  • Connect all of this to GPS navigation and internet packet routing.

Learning Paths

🏃 Fast Track: If you have seen Dijkstra before, skim 29.1, study the relaxation operation in 29.1, then go to 29.2 for the priority-queue implementation and 29.4 for Bellman-Ford. Do the ⭐⭐/⭐⭐⭐ exercises. You can treat 29.5 (Floyd-Warshall) as reference.

📖 Standard Path: Read in order. Hand-trace the worked Dijkstra run in 29.2 on paper before reading the code, and work every 🔄 Check Your Understanding. Read the correctness proof in 29.3 slowly — it is the conceptual heart of the chapter.

🔬 Deep Dive: After the chapter, prove that Bellman-Ford is correct (induction on the number of edges in a shortest path) and work out why Dijkstra's greedy choice fails on negative edges by building the smallest counterexample yourself. Then read about the A* heuristic search (further reading) as a goal-directed refinement of Dijkstra.


29.1 Weighted graphs and the shortest-path problem

In Chapter 27 you met graphs as sets of vertices and edges, and you saw "weighted" listed as one of the graph types. In Chapter 28 you searched them with BFS and DFS, treating every edge as equal. Now we make the weights carry real meaning and ask a quantitative question about them.

Consider a small road map between five towns, with the driving time (in minutes) on each one-way road:

        4
   A ------> B
   |        /| \
 1 |    2 /  |  \ 7
   |    /  1 |   \
   v  /      v    v
   C ------> D --> E
        5       3

The edges and their weights:

Edge Weight Edge Weight
A → B 4 C → B 2
A → C 1 C → D 5
B → D 1 D → E 3
B → E 7

How fast can you drive from A to E? The single direct-looking instinct is A → B → E, costing $4 + 7 = 11$. But A → C → B → D → E costs $1 + 2 + 1 + 3 = 7$ — a route with more edges yet less total time. The cheapest path is not the one with the fewest hops. That is the whole reason BFS is not enough.

💡 Intuition: Think of edge weights as the price of using an edge. BFS minimizes the number of edges; shortest-path algorithms minimize the sum of their prices. When all prices are equal, the two coincide and BFS already solves the problem (it is, in effect, the special case "every weight is 1").

The definitions

A weighted graph is a graph $G = (V, E)$ together with a weight function $w \colon E \to \mathbb{R}$ assigning a real number $w(e)$ to each edge $e$. We write $w(u, v)$ for the weight of the edge from $u$ to $v$. Weights may model distance, time, cost, capacity, reliability — anything additive along a path. The graph may be directed or undirected.

🔗 Connection. Chapter 27 introduced "weighted" as one kind of graph and gave you the vertex/edge vocabulary (degree, path, cycle, connected). Here we pin it down formally with the weight function $w$, because every quantity in this chapter is computed from $w$. Everything you learned about paths and cycles still applies — we are only attaching numbers to the edges.

Recall from Chapter 27 that a path from $u$ to $v$ is a sequence of vertices $u = v_0, v_1, \dots, v_k = v$ where each consecutive pair $(v_{i-1}, v_i)$ is an edge. We can now measure a path.

The weight (or length) of a path $p = v_0, v_1, \dots, v_k$ is the sum of its edge weights, $$w(p) = \sum_{i=1}^{k} w(v_{i-1}, v_i).$$ A shortest path from $u$ to $v$ is a path of minimum weight among all paths from $u$ to $v$. Its weight is the shortest-path distance $\delta(u, v)$. If no path exists, $\delta(u, v) = \infty$; by convention $\delta(u, u) = 0$ via the empty path.

Two warnings hide in that definition. First, "shortest" means minimum total weight, never minimum number of edges. Second, a shortest path need not be unique — there can be several paths tying for the minimum. We compute a shortest path and the shortest distance.

There are two flavors of the problem, and they call for different algorithms:

  • Single-source shortest paths (SSSP): given a source $s$, find $\delta(s, v)$ for every vertex $v$. (Dijkstra and Bellman-Ford solve this.)
  • All-pairs shortest paths (APSP): find $\delta(u, v)$ for every ordered pair of vertices. (Floyd-Warshall solves this; §29.5.)

Why solve for all destinations when you only wanted A to E? Because the algorithms naturally produce the whole shortest-path tree for the cost of computing any single one — and a maps server answering millions of queries from the same source can reuse it.

Representing a weighted graph in code

We extend the adjacency-list idea from Chapter 28: instead of listing each vertex's neighbors, we list each neighbor together with the weight of the edge to it.

# adj[u] is a list of (neighbor, weight) pairs for edges u -> neighbor.
adj = {
    "A": [("B", 4), ("C", 1)],
    "B": [("D", 1), ("E", 7)],
    "C": [("B", 2), ("D", 5)],
    "D": [("E", 3)],
    "E": [],
}

def edge_weight(adj, u, v):
    """Return the weight of edge u -> v, or None if it is absent."""
    for nbr, w in adj[u]:
        if nbr == v:
            return w
    return None

print(edge_weight(adj, "A", "C"))   # 1
print(edge_weight(adj, "C", "B"))   # 2
print(edge_weight(adj, "A", "E"))   # None -- no direct edge
# Expected output:
# 1
# 2
# None

This is the same graph drawn above, and we will use it for every worked example in the chapter, so the distances stay comparable as the algorithms change.

Relaxation: the one operation behind every shortest-path algorithm

Every algorithm in this chapter maintains, for each vertex $v$, a tentative distance dist[v]: the weight of the best path to $v$ found so far. It starts at $0$ for the source and $\infty$ for everyone else, and only ever decreases as better paths are discovered. The single move that improves it has a name.

Relaxation of an edge $(u, v)$ with weight $w$ is the test-and-update: $$\text{if } \text{dist}[u] + w < \text{dist}[v]: \quad \text{dist}[v] \leftarrow \text{dist}[u] + w.$$ In words: "if reaching $v$ through $u$ is cheaper than the best route to $v$ we already know, adopt it." When we update, we also record $u$ as $v$'s predecessor, so we can rebuild the path later.

The name is borrowed from physics — like a stretched spring relaxing toward a shorter resting length, the estimate dist[v] relaxes downward toward its true value $\delta(s, v)$. Crucially, relaxation is safe: it never sets dist[v] below the true distance, because dist[u] is always the weight of some real path to $u$, so dist[u] + w is the weight of some real path to $v$ — and the true shortest distance is a lower bound on every real path. The algorithms differ only in the order in which they relax edges, and that order is everything.

INF = float("inf")

def relax(dist, prev, u, v, w):
    """If u -> v improves dist[v], update it. Return True if it changed."""
    if dist[u] + w < dist[v]:
        dist[v] = dist[u] + w
        prev[v] = u
        return True
    return False

dist = {"A": 0, "B": INF, "C": INF}
prev = {"A": None, "B": None, "C": None}
print(relax(dist, prev, "A", "B", 4))   # INF -> 4, True
print(relax(dist, prev, "A", "C", 1))   # INF -> 1, True
print(relax(dist, prev, "C", "B", 2))   # 1+2=3 beats 4, True
print(relax(dist, prev, "A", "B", 4))   # 0+4=4 does not beat 3, False
print(dist["B"], prev["B"])             # 3 C
# Expected output:
# True
# True
# True
# False
# 3 C

Notice the last two lines: once dist[B] is $3$ (via C), re-relaxing the worse edge A → B changes nothing. The estimate moved down and stayed down.

🔄 Check Your Understanding 1. In the road map, what is $\delta(A, D)$, and which path achieves it? 2. Why can relaxation only ever decrease a tentative distance, never increase it? 3. If every edge in a graph had weight $1$, what familiar algorithm would correctly compute all shortest-path distances from a source?

Answers

  1. $\delta(A, D) = 4$, via A → C → B → D ($1 + 2 + 1$). The direct-looking A → C → D costs $1 + 5 = 6$, and A → B → D costs $4 + 1 = 5$; both are beaten. 2. Relaxation only assigns dist[v] = dist[u] + w inside the test dist[u] + w < dist[v], so it writes a strictly smaller value or nothing at all.
  2. Breadth-first search (Chapter 28): with unit weights, the path with the fewest edges is the minimum-weight path, and BFS finds it in $O(V + E)$.

29.2 Dijkstra's algorithm

Here is the question that leads to the algorithm. We process vertices one at a time, and once we commit to a vertex — declare its dist final — we never revisit it. Which vertex is always safe to commit to next?

The answer is the greedy heart of the algorithm: the unvisited vertex with the smallest tentative distance. Why is that one safe? Intuitively, if every edge weight is non-negative, then any path to a still-unvisited vertex must pass through some unvisited vertex, and reaching that one already costs at least the current minimum — so no cheaper route to the closest unvisited vertex can exist. We will turn that intuition into a proof in §29.3. For now, accept the rule and watch it run.

Dijkstra's algorithm (single-source shortest paths, non-negative weights). Maintain a set of finalized vertices and a tentative distance dist[v] for each vertex. 1. Set dist[s] = 0 and dist[v] = $\infty$ for all $v \neq s$. No vertex is finalized. 2. Repeat until every reachable vertex is finalized: pick the unfinalized vertex $u$ with the smallest dist[u], finalize it, and relax every edge out of $u$. 3. When done, dist[v] $= \delta(s, v)$ for every $v$, and the recorded predecessors form a shortest-path tree.

This is a greedy algorithm: at each step it makes the locally cheapest commitment (finalize the closest frontier vertex) and never reconsiders it. Most greedy algorithms are wrong — local optima rarely glue into a global one — which is exactly why §29.3 must prove this one is right.

🔗 Connection. "Greedy" will return in Chapter 32, where Kruskal's and Prim's algorithms build a minimum spanning tree by repeatedly grabbing the cheapest safe edge. The proof shape there (the cut property) is a cousin of the invariant argument we use here. Dijkstra is your first rigorous greedy correctness proof; treat it as the template.

A hand trace

Let us run Dijkstra from A on our road map. We track dist, the finalized set, and which vertex we finalize each round. Initialize dist = {A:0, B:∞, C:∞, D:∞, E:∞}.

Round Finalize (min unfinalized) Edges relaxed dist after
1 A (0) A→B: $\infty\to 4$; A→C: $\infty\to 1$ A:0, B:4, C:1, D:∞, E:∞
2 C (1) C→B: $4\to 3$; C→D: $\infty\to 6$ A:0, B:3, C:1, D:6, E:∞
3 B (3) B→D: $6\to 4$; B→E: $\infty\to 10$ A:0, B:3, C:1, D:4, E:10
4 D (4) D→E: $10\to 7$ A:0, B:3, C:1, D:4, E:7
5 E (7) (no edges out of E) A:0, B:3, C:1, D:4, E:7

Read the table slowly — it contains two lessons. In round 2 we finalize C before B, even though B was discovered first (in round 1, at cost 4): we always finalize the globally closest frontier vertex, not the one we saw first. And finalizing C immediately improves B from 4 to 3 — the route A → C → B beat the direct A → B. By the time B is finalized in round 3, its distance is already correct, and finalizing it then improves D. The greedy choice keeps paying off.

The final shortest distances from A are $\delta = \{A:0, C:1, B:3, D:4, E:7\}$, matching the path we spotted by eye in §29.1.

⚠️ Common Pitfall: Do not finalize a vertex the moment you discover it (when its dist first drops below $\infty$). B was discovered in round 1 at distance 4, but its true distance is 3. A vertex is only safe to finalize when it is the minimum over all unfinalized vertices. Finalizing on discovery is one of the most common ways a hand-rolled Dijkstra goes wrong.

Implementing it efficiently: the priority queue

Step 2 repeatedly asks for "the unfinalized vertex with the smallest dist." Scanning all vertices each round costs $O(V)$ per round and $O(V^2)$ overall — fine for dense graphs, wasteful for sparse ones. The right data structure for "repeatedly extract the minimum" is a priority queue.

A priority queue is an abstract data type storing elements each with a priority (key), and supporting at least: insert(item, key) and extract_min() (remove and return the item of smallest key). Unlike an ordinary FIFO queue (Chapter 28's BFS), which returns items in arrival order, a priority queue always returns the currently cheapest item, regardless of when it arrived.

🔗 Connection. A priority queue is almost always implemented with a binary heap, the tree-shaped structure we study in Chapter 31. A heap supports both insert and extract_min in $O(\log n)$ time. Python's standard library exposes one as the heapq module, which keeps a list in heap order; heapq.heappush and heapq.heappop are exactly insert and extract_min. For now, treat the priority queue as a black box with those two fast operations.

One implementation subtlety. A plain binary heap has no efficient "decrease the key of an element already inside" operation. The clean workaround — the one used in practice — is lazy deletion: when relaxation improves dist[v], just push a new (dist[v], v) entry rather than editing the old one. The heap now holds stale, superseded entries, but that is harmless: when we pop a vertex, we check whether the popped distance still matches dist[v]; if it is larger, this entry is stale and we skip it. Each vertex is genuinely processed only the first (smallest) time it is popped.

import heapq
INF = float("inf")

def dijkstra(adj, source):
    dist = {v: INF for v in adj}
    dist[source] = 0
    pq = [(0, source)]                 # (best-known distance, vertex)
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:                # a stale, superseded entry
            continue
        for v, w in adj[u]:            # relax every edge out of u
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist

adj = {"A": [("B", 4), ("C", 1)], "B": [("D", 1), ("E", 7)],
       "C": [("B", 2), ("D", 5)], "D": [("E", 3)], "E": []}
for v in sorted(adj):
    print(v, dijkstra(adj, "A")[v])
# Expected output:
# A 0
# B 3
# C 1
# D 4
# E 7

The tuple (distance, vertex) is the key trick: Python compares tuples lexicographically, so the heap orders by distance first, and heappop returns the smallest-distance entry — exactly the greedy choice the algorithm demands. The if d > dist[u]: continue line is the lazy-deletion guard; remove it and the algorithm still returns correct distances but wastes time reprocessing stale entries.

Complexity

Each edge is relaxed at most once per time its tail is popped-and-processed, and each vertex is processed once, so we perform $O(E)$ relaxations, each possibly pushing one heap entry. The heap holds at most $O(E)$ entries, and each push/pop costs $O(\log E) = O(\log V)$ (since $E \le V^2$, $\log E \le 2 \log V$). The total is

$$O\big((V + E)\log V\big),$$

usually written $O(E \log V)$ for a connected graph. Compare this to the $O(V + E)$ of plain BFS: the extra $\log V$ factor is the price of respecting edge weights.

🔗 Connection. The $O(E \log V)$ bound uses the Big-O machinery from Chapter 14 and the adjacency-list traversal cost $O(V + E)$ from Chapter 28. The $\log V$ comes entirely from the heap operations — swap in a fancier priority queue (a Fibonacci heap) and the bound improves to $O(E + V \log V)$ in theory, though the simple binary heap wins in practice.

🔄 Check Your Understanding 1. In the hand trace, why was B not finalized in round 1 even though its dist became finite (4)? 2. What does the line if d > dist[u]: continue accomplish, and what breaks if you delete it? 3. Run Dijkstra from C on the road map (by hand). What is $\delta(C, E)$?

Answers

  1. After round 1 the minimum unfinalized distance was C's (1), not B's (4); Dijkstra always finalizes the global minimum. Finalizing B then would have locked in 4, but its true distance is 3. 2. It skips stale heap entries — superseded (distance, vertex) pairs left behind by lazy deletion. Without it, correctness is unaffected (a stale pop just re-relaxes already-final edges, all of which fail the < test) but the algorithm wastes work; the guard makes each vertex's outgoing edges process once.
  2. From C: dist[C]=0; relax C→B (2), C→D (5). Finalize B(2): relax B→D ($5\to 3$), B→E ($\to 9$). Finalize D(3): relax D→E ($9\to 6$). Finalize E(6). So $\delta(C, E) = 6$, via C → B → D → E.

Recovering the path, not just the distance

A distance alone rarely satisfies a user — they want the route. We already have what we need: every relaxation recorded a predecessor. To rebuild the shortest path to a target, follow predecessors backward from the target to the source and reverse the list.

import heapq
INF = float("inf")

def dijkstra_paths(adj, source):
    dist = {v: INF for v in adj}
    prev = {v: None for v in adj}
    dist[source] = 0
    pq = [(0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v], prev[v] = d + w, u      # remember how we reached v
                heapq.heappush(pq, (dist[v], v))
    return dist, prev

def path_to(prev, target):
    route = []
    while target is not None:
        route.append(target)
        target = prev[target]
    return route[::-1]                            # reverse: source -> target

adj = {"A": [("B", 4), ("C", 1)], "B": [("D", 1), ("E", 7)],
       "C": [("B", 2), ("D", 5)], "D": [("E", 3)], "E": []}
dist, prev = dijkstra_paths(adj, "A")
print(path_to(prev, "E"), "cost", dist["E"])
# Expected output:
# ['A', 'C', 'B', 'D', 'E'] cost 7

The predecessor pointers form a tree rooted at the source — the shortest-path tree — and every shortest path from $s$ is a path down this tree. One tree, computed once, answers "what is the best route to anywhere from $s$" for every destination at once.

💡 Intuition (the social-network lens). Picture the vertices as people and an edge's weight as how "costly" a connection is — perhaps inversely related to how often two people interact, so a close friendship is a light edge and a distant acquaintance a heavy one. In Chapter 28, BFS answered "how many handshakes separate two people?" Here, weighted shortest paths answer a richer question: "what is the strongest chain of connections between them?" The path A → C → B → D → E is the introduction sequence with the least total friction — the route a real recommendation would travel.


29.3 Why Dijkstra is correct

We have a greedy algorithm that seems to work. But greed is usually a trap, and "it worked on my example" guarantees nothing. We must prove that Dijkstra returns the true shortest distance to every vertex. This is the conceptual summit of the chapter, and it is worth the climb: the proof technique — state an invariant, maintain it inductively — is how essentially every greedy graph algorithm is justified.

The strategy first. We will prove one loop invariant: every time we finalize a vertex $u$, its tentative distance is already the true shortest distance, i.e. $\text{dist}[u] = \delta(s, u)$. If that holds at every finalization, then when the loop ends every reachable vertex has been finalized with its correct distance, and we are done. To prove the invariant we argue by contradiction: suppose some vertex is finalized with a wrong (too large) distance, look at the first such vertex, and follow a genuine shortest path to it until we hit the boundary between finalized and unfinalized vertices. Non-negativity of the weights forces a number at that boundary to be small enough to contradict our greedy choice. Watch for the one step that uses non-negativity — it is the whole reason the algorithm needs it.

🔗 Connection. The phrase "loop invariant" is from Chapter 6: a statement true before the loop and preserved by every iteration, proved by induction on iterations. There we used one to verify an iterative sum; here the same idea certifies a non-trivial algorithm. The induction is on the number of finalized vertices.

Theorem (Dijkstra correctness). Let $G = (V, E)$ be a directed graph with weight function $w \colon E \to \mathbb{R}_{\ge 0}$ (all weights non-negative), and let $s \in V$. When Dijkstra's algorithm finalizes a vertex $u$, $\text{dist}[u] = \delta(s, u)$.

Proof. We argue by contradiction. Suppose, toward a contradiction, that the invariant fails at some point, and let $u$ be the first vertex finalized for which $\text{dist}[u] \neq \delta(s, u)$.

First, $u \neq s$: the source is finalized first with $\text{dist}[s] = 0 = \delta(s, s)$. So at the moment $u$ is finalized, at least one vertex ($s$) is already finalized, and $u$ is reachable from $s$ (otherwise $\text{dist}[u] = \infty = \delta(s, u)$, contradicting $\text{dist}[u] \neq \delta(s, u)$). Because relaxation never makes dist too small, $\text{dist}[u] \ge \delta(s, u)$ always; combined with $\text{dist}[u] \neq \delta(s, u)$, we have the strict inequality $$\text{dist}[u] > \delta(s, u). \tag{$\ast$}$$

Now consider a genuine shortest path $p$ from $s$ to $u$, of weight $\delta(s, u)$. At the instant $u$ is finalized, $s$ is finalized but $u$ is not, so as we walk along $p$ from $s$ toward $u$ there must be a first vertex $y$ that is not finalized. Let $x$ be the vertex just before $y$ on $p$ (possibly $x = s$); then $x$ is finalized, and $(x, y)$ is an edge of $p$.

Two facts about $y$. First, since $u$ is the first vertex finalized with a wrong distance and $x$ was finalized before $u$, the invariant held for $x$: $\text{dist}[x] = \delta(s, x)$. When $x$ was finalized, the algorithm relaxed all its outgoing edges, including $(x, y)$, so afterward $$\text{dist}[y] \le \delta(s, x) + w(x, y).$$ Second, $y$ lies on a shortest path to $u$, so the prefix of $p$ ending at $y$ is itself a shortest path to $y$; hence $\delta(s, x) + w(x, y) = \delta(s, y)$. Combining, $$\text{dist}[y] \le \delta(s, y). \tag{1}$$ And again because relaxation is safe, $\text{dist}[y] \ge \delta(s, y)$, so in fact $\text{dist}[y] = \delta(s, y)$.

Here is the decisive step — the only place non-negativity is used. The vertices on $p$ from $y$ onward to $u$ contribute non-negative weight, so the part of the path after $y$ does not reduce the total: $$\delta(s, y) \le \delta(s, u). \tag{2}$$ (If a later edge could be negative, traveling further might lower the cost, and this inequality would fail — which is precisely how negative edges break Dijkstra; see §29.4.)

Finally, at this moment the algorithm is finalizing $u$, meaning $u$ is the unfinalized vertex of minimum tentative distance. Since $y$ is also unfinalized, $u$ was chosen over $y$, so $$\text{dist}[u] \le \text{dist}[y]. \tag{3}$$

Now chain (3), (1), and (2): $$\text{dist}[u] \;\overset{(3)}{\le}\; \text{dist}[y] \;\overset{(1)}{=}\; \delta(s, y) \;\overset{(2)}{\le}\; \delta(s, u).$$ This says $\text{dist}[u] \le \delta(s, u)$, flatly contradicting $(\ast)$, which said $\text{dist}[u] > \delta(s, u)$. The contradiction means no such first wrong vertex $u$ can exist, so the invariant holds at every finalization. $\blacksquare$

Take a breath and notice what just happened. The proof never traced the algorithm step by step; it assumed a failure and showed the failure cannot occur, leaning on (a) the safety of relaxation, (b) the greedy extract-min choice, and (c) — exactly once — the non-negativity of weights. Pull out any one of those three and the argument collapses.

🚪 Threshold Concept: greedy can be provably optimal. Most of the time, grabbing the locally best option does not yield the globally best result — greedy algorithms are notorious for being seductive and wrong. Dijkstra is the moment you see that a greedy strategy can be certified optimal by a clean invariant, and that the certificate names exactly the conditions it depends on (here, non-negative weights). Once you internalize "find the invariant the greedy choice preserves," you have the master key to Chapter 32's spanning-tree algorithms and to greedy correctness proofs throughout algorithm design. The lesson is not "greedy works"; it is "when greedy works, an invariant proves it, and the proof tells you when it stops working."

🔄 Check Your Understanding 1. Which single assumption in the theorem fails if a graph has a negative edge weight, and at which numbered step of the proof is it used? 2. The proof picks "the first vertex finalized with a wrong distance." Why is choosing the first one essential to the argument? 3. Where in the proof do we use the fact that relaxation never sets dist below the true distance?

Answers

  1. Non-negativity ($w \ge 0$); it is used at inequality (2), $\delta(s, y) \le \delta(s, u)$, which says extending a shortest path cannot lower its cost. With a negative edge later on, a longer path can be cheaper, and (2) can fail. 2. Choosing the first wrong vertex guarantees every vertex finalized earlier — in particular $x$ — has a correct distance, which is what lets us assert $\text{dist}[x] = \delta(s, x)$ and that relaxing $(x, y)$ gave $y$ a good estimate. 3. Twice: to get $\text{dist}[u] \ge \delta(s, u)$ (yielding the strict inequality $(\ast)$) and $\text{dist}[y] \ge \delta(s, y)$ (forcing $\text{dist}[y] = \delta(s, y)$ together with (1)).

29.4 Bellman-Ford and negative edges

Dijkstra's correctness proof leaned on non-negative weights at exactly one step — and that is not a technicality. There are real problems with negative "weights": a sequence of currency trades where some conversions gain value, a network of financial transactions, a game where some moves earn points. The moment an edge can be negative, Dijkstra can return a wrong answer.

Why Dijkstra fails on negative edges

Build the smallest counterexample. Three vertices $s, a, b$ with edges $$s \to a:\ 1, \qquad s \to b:\ 2, \qquad b \to a:\ -2.$$

The true shortest path to $a$ is $s \to b \to a$, costing $2 + (-2) = 0$, which beats the direct $s \to a$ at cost $1$. So $\delta(s, a) = 0$. Now run Dijkstra. It finalizes $s$ (0), relaxes to get $\text{dist}[a] = 1$ and $\text{dist}[b] = 2$, then finalizes the closest unfinalized vertex — $a$, at distance $1$ — and locks it in. But $\delta(s, a) = 0$, not $1$. Dijkstra committed to $a$ before it ever explored the negative edge $b \to a$ that would have improved it. The greedy choice, so reliable a moment ago, is now simply wrong.

⚠️ Common Pitfall: "Just add a big constant to every weight to make them all non-negative, then run Dijkstra." This does not work. Adding a constant $c$ to every edge penalizes paths in proportion to how many edges they use, so it favors paths with fewer hops and can change which path is shortest. A 5-edge path gains $5c$ while a 2-edge path gains only $2c$; the cheaper original path may lose. Re-weighting to remove negatives correctly is possible (Johnson's algorithm does it with Bellman-Ford as a subroutine) but it is far subtler than "add a constant."

The Bellman-Ford algorithm

If we cannot trust the greedy choice, we fall back on brute persistence: relax every edge, over and over, until nothing improves. This is Bellman-Ford.

Bellman-Ford algorithm (single-source shortest paths, weights may be negative). Initialize $\text{dist}[s] = 0$ and $\text{dist}[v] = \infty$ otherwise. Then relax every edge in the graph, and repeat that whole pass $|V| - 1$ times. After the passes, run one more pass: if any edge can still be relaxed, the graph has a negative cycle reachable from $s$, and shortest paths are undefined; otherwise dist holds the true shortest distances.

Why exactly $|V| - 1$ passes? Because a shortest path in a graph with no negative cycle is simple — it never repeats a vertex (repeating one could only add a non-negative or removable detour) — so it has at most $|V| - 1$ edges. The engine is a clean inductive fact:

Path-relaxation property. After $k$ passes of relaxing all edges, $\text{dist}[v] = \delta(s, v)$ for every vertex $v$ whose shortest path from $s$ uses at most $k$ edges.

The strategy first (sketch of why this holds). Induct on $k$. After $0$ passes, only $s$ has its final distance ($0$, a $0$-edge path) — the base case. For the step, take any $v$ whose shortest path uses $k$ edges; its predecessor $u$ on that path has a shortest path of $\le k-1$ edges, so by the inductive hypothesis $\text{dist}[u] = \delta(s, u)$ after $k-1$ passes. Pass $k$ relaxes the final edge $(u, v)$, setting $\text{dist}[v] \le \delta(s, u) + w(u, v) = \delta(s, v)$. Since relaxation is safe, equality holds. Because every simple path has at most $|V| - 1$ edges, $|V| - 1$ passes suffice for all vertices. $\blacksquare$

INF = float("inf")

def bellman_ford(vertices, edges, source):
    dist = {v: INF for v in vertices}
    dist[source] = 0
    for _ in range(len(vertices) - 1):          # |V| - 1 relaxation passes
        for u, v, w in edges:
            if dist[u] != INF and dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    for u, v, w in edges:                        # one extra pass = detector
        if dist[u] != INF and dist[u] + w < dist[v]:
            return None                          # a negative cycle is reachable
    return dist

V1 = ["S", "A", "B", "C", "D"]
E1 = [("S","A",4), ("S","B",3), ("A","C",-2), ("C","D",2), ("B","D",6)]
print([(v, bellman_ford(V1, E1, "S")[v]) for v in sorted(["A","B","C","D","S"])])

V2 = ["X", "Y"]
E2 = [("X","Y",1), ("Y","X",-2)]                # cycle weight 1 + (-2) = -1 < 0
print(bellman_ford(V2, E2, "X"))
# Expected output:
# [('A', 4), ('B', 3), ('C', 2), ('D', 4), ('S', 0)]
# None

In the first graph, the negative edge A → C ($-2$) makes C reachable at cost $2$ ($4 - 2$) and D at cost $4$ (via C, beating the direct B → D at $9$). Bellman-Ford handles it without complaint. In the second graph, the cycle X → Y → X has weight $1 + (-2) = -1 < 0$: each lap lowers the cost, so there is no shortest path, and the detector pass returns None.

Negative cycles: where "shortest" stops meaning anything

A negative cycle is a cycle whose edge weights sum to a negative number. If one is reachable from $s$ and can reach $v$, then $\delta(s, v) = -\infty$: go around the cycle once more and your path gets cheaper, forever. "Shortest path" is no longer well defined. This is not a bug to be papered over — it is information. In a currency-exchange graph, a reachable negative cycle is an arbitrage opportunity: a sequence of trades that returns you to your starting currency with more money than you began with. Bellman-Ford's detector pass is how you find it.

💡 Intuition: Bellman-Ford trades Dijkstra's cleverness for stubbornness. Dijkstra is a careful planner that commits once and moves on — fast, but only valid when the world has no "too good to be true" shortcuts (negative edges). Bellman-Ford assumes nothing and re-checks every edge $|V|-1$ times. It is slower, but it survives negative edges and, uniquely, detects the negative cycles that make the question meaningless.

Complexity and the trade-off

Each pass relaxes all $|E|$ edges, and there are $|V| - 1$ passes plus one detector pass, so Bellman-Ford runs in $O(VE)$ time. That is markedly slower than Dijkstra's $O(E \log V)$ — on a graph with a million vertices and edges, the difference is enormous. The trade-off is the recurring engineering theme of this chapter:

Dijkstra Bellman-Ford
Negative edges not allowed allowed
Detects negative cycles no yes
Time $O(E \log V)$ $O(VE)$
Strategy greedy (extract-min) repeated full relaxation

Use Dijkstra when you can (non-negative weights — the common case for distances, times, and costs); fall back to Bellman-Ford only when negatives are genuinely possible.

🔄 Check Your Understanding 1. On the three-vertex graph $s\to a:1$, $s\to b:2$, $b\to a:-2$, what distance does Dijkstra report for $a$, and what is the true $\delta(s, a)$? 2. Why does Bellman-Ford perform exactly $|V| - 1$ relaxation passes (and not, say, $|V|$ or $|E|$)? 3. After the $|V|-1$ passes, an edge $(u, v)$ can still be relaxed. What does that tell you, and what should the algorithm return?

Answers

  1. Dijkstra reports $1$ (it finalizes $a$ before exploring the negative edge $b \to a$); the true distance is $\delta(s, a) = 0$ via $s \to b \to a$. 2. A shortest path in a graph with no negative cycle is simple and thus has at most $|V| - 1$ edges; by the path-relaxation property, $|V| - 1$ passes correctly fix every vertex's distance. More passes would be wasted work; the count is tied to the longest possible simple path. 3. It means a negative cycle is reachable from the source, so some distances are $-\infty$ and shortest paths are undefined; the algorithm should report the negative cycle (here, return None).

29.5 All-pairs shortest paths: Floyd-Warshall

Sometimes you need the distance between every pair of vertices — a full distance table for a small road network, the diameter of a social graph, a pre-computed routing matrix. You could run Dijkstra once from each of the $|V|$ vertices ($O(VE \log V)$ overall), and for sparse graphs that is the right move. But there is a strikingly short alternative that also tolerates negative edges: Floyd-Warshall.

The idea: intermediate vertices, one at a time

Number the vertices $0, 1, \dots, n-1$. Floyd-Warshall builds up the answer by gradually enlarging the set of vertices a path is allowed to pass through. Let $D_k[i][j]$ be the weight of the shortest path from $i$ to $j$ that uses only vertices $\{0, 1, \dots, k-1\}$ as intermediate stops. The recurrence is the entire algorithm:

$$D_{k+1}[i][j] = \min\big(\,D_k[i][j],\ \ D_k[i][k] + D_k[k][j]\,\big).$$

In words: when we newly allow vertex $k$ as a waypoint, the shortest $i \to j$ path either ignores $k$ (the old value $D_k[i][j]$) or routes through it exactly once (best way to $k$, then best way from $k$), and we keep the smaller. We start with $D_0$ = the raw edge weights (no intermediate vertices allowed: $D_0[i][j] = w(i, j)$, with $0$ on the diagonal and $\infty$ where there is no edge) and finish, after allowing all $n$ vertices, with the true all-pairs distances.

💡 Intuition: Adding a permitted waypoint can only help or leave things unchanged — it can never make a path longer, because you are never forced to use the new vertex. So each round the distance table only improves, monotonically marching toward the truth.

INF = float("inf")

def floyd_warshall(weight):
    n = len(weight)
    D = [row[:] for row in weight]                 # copy the weight matrix
    for k in range(n):                             # allow vertex k as a waypoint
        for i in range(n):
            for j in range(n):
                if D[i][k] + D[k][j] < D[i][j]:    # route i -> k -> j shorter?
                    D[i][j] = D[i][k] + D[k][j]
    return D

# Vertices 0,1,2. Edges: 0->1 (4), 0->2 (5), 1->2 (-2). No negative cycle.
W = [[0, 4, 5], [INF, 0, -2], [INF, INF, 0]]
for row in floyd_warshall(W):
    print([("INF" if x == INF else x) for x in row])
# Expected output:
# [0, 4, 2]
# ['INF', 0, -2]
# ['INF', 'INF', 0]

The single improvement is $D[0][2]$: dropping from $5$ (direct $0 \to 2$) to $2$, by routing through vertex $1$ ($0 \to 1 \to 2 = 4 + (-2)$). Three nested loops, three lines of logic, and a complete distance table — Floyd-Warshall's brevity is famous.

Complexity and when to use it

The three nested loops give $\Theta(V^3)$ time and $\Theta(V^2)$ space, independent of the number of edges. That makes it ideal for dense graphs and small-to-medium $|V|$ (a few thousand vertices), where its cache-friendly triple loop often beats running Dijkstra $|V|$ times. It naturally handles negative edges, and if any diagonal entry $D[i][i]$ ends up negative, a negative cycle passes through $i$. For large sparse graphs, prefer $|V| \times$ Dijkstra.

⚠️ Common Pitfall: The loop order is not negotiable — k (the allowed waypoint) must be the outermost loop. If you put i or j outside k, you may use a $D[i][k]$ value that hasn't yet accounted for the right intermediate vertices, and the answers come out wrong. The mantra: waypoint first, then source, then destination.

🔄 Check Your Understanding 1. What does $D_k[i][j]$ represent, in words, partway through Floyd-Warshall? 2. For a graph with $10{,}000$ vertices and only $30{,}000$ edges, which is likely faster for all-pairs distances: Floyd-Warshall, or Dijkstra from every vertex? Why? 3. After the algorithm finishes, you find $D[2][2] = -3$. What does that mean?

Answers

  1. The weight of the shortest path from $i$ to $j$ that is allowed to pass through only the vertices $\{0, \dots, k-1\}$ as intermediate stops. 2. Dijkstra-from-every-vertex: the graph is sparse ($E \approx 3V$), so that approach costs about $V \cdot E \log V \approx 10^4 \cdot 3{\times}10^4 \cdot \log(10^4)$, far less than Floyd-Warshall's $V^3 = 10^{12}$. 3. A negative-weight cycle passes through vertex $2$ (a vertex's shortest distance to itself dropped below $0$), so some shortest paths are undefined.

29.6 Routing in the real world

The algorithms in this chapter are not classroom toys; they are the working core of two systems you use constantly. Seeing how each is adapted in practice is the best test of whether you truly understand them.

GPS navigation

When a maps app routes you across a continent, the graph is the road network: vertices are intersections, edges are road segments, and weights are estimated travel times (often updated live from traffic data). Travel times are non-negative, so Dijkstra applies — but a literal Dijkstra would explore outward in all directions like an expanding circle, settling millions of irrelevant intersections before reaching the destination. Production routers add two refinements:

  • Goal direction (A*). Instead of always expanding the closest frontier vertex, expand the one that minimizes dist[v] plus an optimistic estimate of the remaining distance to the goal (for maps, the straight-line distance, which never overestimates road distance). This heuristic steers the search toward the destination and can cut the explored region dramatically. A* is exactly Dijkstra when the estimate is always $0$ — Dijkstra is the special, goal-blind case.
  • Preprocessing. Real systems pre-compute structures (contraction hierarchies, landmark distances) so that continental queries finish in microseconds. The shortest-path problem is the same; the engineering is in answering it billions of times per day.

🔗 Connection. A* generalizes Dijkstra by adding a heuristic estimate to the priority key. Because the heuristic never overestimates ("admissible"), the same invariant-style argument from §29.3 still certifies that A* returns a true shortest path. The further-reading section points you to the original A* paper.

Internet packet routing

The internet is a graph of routers; a "shortest path" decides each packet's next hop. Two classic protocol families map directly onto this chapter's two algorithms:

  • Link-state routing (OSPF) has each router learn the entire network map and then run Dijkstra locally to compute shortest paths to every destination. Each router sees the whole graph and solves SSSP from itself.
  • Distance-vector routing (RIP) is Bellman-Ford in disguise, run in a distributed way: each router knows only the distances reported by its direct neighbors and relaxes accordingly, iterating until distances stabilize — exactly Bellman-Ford's "relax every edge, repeatedly." Its vulnerability to the "count-to-infinity" problem when a link fails is, at heart, the difficulty of detecting a bad cycle in a distributed Bellman-Ford.

💡 Intuition: The Dijkstra-vs-Bellman-Ford choice you make as a programmer mirrors a real architectural split in how the internet computes routes. Link-state protocols pay to know the whole map and then think greedily (Dijkstra); distance-vector protocols know only their neighbors and think by stubborn iteration (Bellman-Ford). The same trade-off — global knowledge and speed versus local knowledge and robustness — recurs whenever a system must compute paths.

🔄 Check Your Understanding 1. GPS edge weights are travel times. Which algorithm is appropriate, and why is the choice safe? 2. In what precise sense is the A* search algorithm a generalization of Dijkstra? 3. Which routing-protocol family corresponds to Bellman-Ford, and what feature of Bellman-Ford causes its real-world weakness?

Answers

  1. Dijkstra — travel times are non-negative, the exact condition its correctness proof (§29.3) requires. 2. A* uses priority key dist[v] + h(v), where $h$ is an optimistic (never-overestimating) estimate of the remaining distance to the goal; with $h \equiv 0$ it is exactly Dijkstra, so Dijkstra is the goal-blind special case. 3. Distance-vector routing (e.g., RIP) is distributed Bellman-Ford; its slow convergence / count-to-infinity problem reflects Bellman-Ford's iterative relaxation needing many rounds (and the difficulty of detecting bad cycles) when a link goes down.

Project Checkpoint: adding dijkstra to the Toolkit

Time to grow dmtoolkit/graphs.py — the module you began in Chapter 27 with the Graph class and extended in Chapter 28 with bfs and dfs. This chapter adds the single most-asked-about graph function in technical interviews: dijkstra(g, s). Keeping the signature stable (see the canonical table in the style guide) lets later chapters — Chapter 32's mst_kruskal and Chapter 34's max_flow — compose with it.

Add this to graphs.py (the Graph class from Chapter 27 provides vertices() and neighbors(u), where neighbors(u) yields (v, weight) pairs):

import heapq

def dijkstra(g, s):
    """Single-source shortest paths from s over NON-NEGATIVE weights.
    Returns (dist, prev): dist[v] is the shortest distance s->v (inf if
    unreachable); prev[v] is v's predecessor on a shortest path. Raises
    ValueError on a negative edge weight."""
    INF = float("inf")
    dist = {v: INF for v in g.vertices()}
    prev = {v: None for v in g.vertices()}
    dist[s] = 0
    pq = [(0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        for v, w in g.neighbors(u):
            if w < 0:
                raise ValueError("Dijkstra needs non-negative weights; use Bellman-Ford.")
            if d + w < dist[v]:
                dist[v], prev[v] = d + w, u
                heapq.heappush(pq, (dist[v], v))
    return dist, prev

Running it on this chapter's road map (built as a directed Graph and started from A) and reconstructing the path to E:

# ... build directed Graph g with the 7 road edges, then:
dist, prev = dijkstra(g, "A")
print({v: dist[v] for v in sorted(dist)})
# Expected output:
# {'A': 0, 'B': 3, 'C': 1, 'D': 4, 'E': 7}

Two deliberate design choices. First, returning both dist and prev means callers can recover actual routes (with the shortest_path helper in code/project-checkpoint.py), not just lengths — because users want directions, not numbers. Second, the explicit ValueError on a negative weight turns a silent wrong answer into a loud, correct refusal: the function will not pretend to solve a problem its proof does not cover. That is theme #2 of this book in one line of code — a correctness guarantee is worth more than a plausible-looking output. The full source, plus bellman_ford for the negative-edge case, is in this chapter's code/ folder and will be assembled into the complete package in Appendix I.

Toolkit so far: logic.py (Ch. 1–3), sets.py (Ch. 8), relations.py (Ch. 12–13), combinatorics.py (Ch. 15–17), recurrences.py (Ch. 18–19), probability.py (Ch. 20–21), numbertheory.py and crypto.py (Ch. 22–25), coding.py (Ch. 26), and a growing graphs.py (Graph, bfs, dfs, and now dijkstra). Chapter 32 adds mst_kruskal; Chapter 34 adds max_flow.


Summary

Shortest-path algorithms find the minimum-weight path through a weighted graph — the computational core of GPS, network routing, and "how are we connected?" queries.

Core definitions

Term Definition
Weighted graph $G = (V, E)$ with a weight function $w \colon E \to \mathbb{R}$ on the edges
Path weight $w(p) = \sum_{i=1}^{k} w(v_{i-1}, v_i)$, the sum of a path's edge weights
Shortest path / distance A minimum-weight path; its weight is $\delta(u, v)$ ($\infty$ if none)
Relaxation if dist[u]+w < dist[v]: dist[v] = dist[u]+w — the universal update step
Priority queue ADT returning the minimum-key element; the engine of Dijkstra (heap, Ch. 31)
Negative cycle A cycle of negative total weight; makes shortest paths undefined ($-\infty$)

The three algorithms

Algorithm Solves Negative edges? Time Strategy
Dijkstra single-source no $O(E \log V)$ greedy: finalize closest unfinalized vertex, relax its edges
Bellman-Ford single-source yes (detects neg. cycles) $O(VE)$ relax all edges $|V|-1$ times, then one detector pass
Floyd-Warshall all-pairs yes (no neg. cycles) $\Theta(V^3)$ DP over allowed intermediate vertices, waypoint loop outermost

Key results to carry forward

  • Relaxation is safe: it never sets dist[v] below the true distance $\delta(s, v)$, so any final value is achieved by a real path. Every algorithm here is "relax edges in some order."
  • Dijkstra correctness: the loop invariant is "a vertex's distance is final and correct the moment it is finalized." Proved by contradiction; the only step needing non-negativity is $\delta(s, y) \le \delta(s, u)$ (extending a path can't lower its cost).
  • Why Dijkstra needs $w \ge 0$: a negative edge can make a longer path cheaper, so committing to the closest frontier vertex is no longer safe.
  • Bellman-Ford's $|V|-1$ passes: a shortest path with no negative cycle is simple ($\le |V|-1$ edges); after $k$ passes, all distances reachable within $k$ edges are correct (path-relaxation property). A relaxable edge after $|V|-1$ passes signals a negative cycle.
  • Choosing an algorithm: non-negative weights, one source → Dijkstra. Negative edges possible → Bellman-Ford. Every pair, dense graph → Floyd-Warshall; every pair, sparse graph → Dijkstra from each vertex.
  • Path vs. distance: record predecessors during relaxation; follow them backward and reverse to rebuild the route. The predecessors form the shortest-path tree.

Toolkit: graphs.py gains dijkstra(g, s) -> (dist, prev) (plus bellman_ford in code/).


Spaced Review

Retrieval strengthens memory. Answer from memory before checking.

  1. (Ch. 28) BFS finds the path with the fewest edges. On a weighted graph, give a one-sentence reason BFS can return a path that is not shortest, and name the chapter's algorithm that fixes it.
  2. (Ch. 28) Dijkstra uses a priority queue while BFS uses an ordinary FIFO queue. What does each queue return next, and why does that difference make Dijkstra respect edge weights?
  3. (Ch. 27) State the handshaking lemma for an undirected graph. In a weighted graph, does attaching weights to the edges change the degree of any vertex?
  4. (Ch. 27) Our road map is a directed weighted graph. Give one real situation where a shortest-path graph must be directed (the cost from $u$ to $v$ differs from $v$ to $u$).

Answers

  1. BFS counts edges, not weights, so a path with more edges but smaller total weight (like A → C → B → D → E at cost 7 vs. A → B → E at cost 11) is invisible to it; Dijkstra (or Bellman-Ford) minimizes total weight instead. 2. A FIFO queue returns the earliest-inserted item (so BFS explores in waves of equal hop count); a priority queue returns the smallest-key item, so Dijkstra always expands the globally closest unfinalized vertex by total weight — which is what lets it account for unequal edge costs. 3. The handshaking lemma: $\sum_{v \in V} \deg(v) = 2|E|$ (the sum of degrees is twice the number of edges). Weights do not change degree — degree counts incident edges regardless of their weights. 4. Many valid answers: one-way streets or differing uphill/downhill travel times in road routing; asymmetric network link latencies or bandwidth; airfares that differ by direction; a "follows" relationship on a social network (A follows B without B following A).

What's Next

You can now find the cheapest path between two vertices. The next natural question flips the goal: not "what is the cheapest route from here to there?" but "what is the cheapest set of edges that connects everything?" That is the minimum spanning tree problem — wiring up a network, a power grid, or a chip at minimum total cost. Its algorithms (Kruskal's and Prim's) are greedy, just like Dijkstra, and their correctness rests on the same kind of invariant argument you mastered in §29.3 — sharpened into the cut property. Prim's algorithm even reuses your priority queue almost verbatim. Before that, Chapter 31 formalizes the trees those algorithms build, and gives you the heap that has been powering your priority queue all along. The greedy thinking you proved correct here is about to pay off again.