Case Study: Why the Map App Picked the Slower-Looking Road

"The shortest distance between two points is under construction." — attributed to Noelie Altito

Executive Summary

A delivery-routing team has a bug report that is not a bug: drivers complain that the app keeps choosing a route with more turns and more road segments than an obvious-looking shortcut, and they want to know whether the routing engine is broken. In this analysis-heavy case study you will not build a new system; you will diagnose an existing one. You will model the road network as a weighted directed graph, run three algorithms by hand and in code — breadth-first search (fewest road segments), Dijkstra's algorithm (least travel time), and A* (least travel time, goal-directed) — and explain exactly why they disagree, which one is correct for the stated objective, and what the disagreement reveals about the difference between "fewest hops" and "cheapest." The payoff is the single most important judgment in shortest-path engineering: the answer you get depends on what you put in the weights.

Skills applied

  • Modeling a real situation (a road map with travel times) as a weighted directed graph (§29.1).
  • Running BFS, Dijkstra, and A* on the same graph and reading the difference (§29.2, §29.6).
  • Tracing Dijkstra's finalization order and the relaxations at each step (§29.2).
  • Recovering the actual route from predecessor pointers, not just its cost (§29.2).
  • Connecting the BFS-vs-Dijkstra gap and the Dijkstra-vs-A* refinement to a production routing engine (§29.6).

Background

The scenario

"Northgate Couriers" runs a small fleet inside a city district. Their routing engine takes a road network and a driver's current intersection and returns directions to the next drop-off. The engine exposes a "mode" flag that the previous developer left undocumented: mode="segments" and mode="time". A new dispatcher set every route to mode="segments" ("fewest roads must be fastest, right?") and the complaints started: drivers were sent down a chain of short residential streets when a single arterial road would have been quicker, and sent down a single long road when a two-segment detour would have been quicker. The two complaints look contradictory. Your job is to explain both.

The district is small enough to model exactly. Intersections are vertices; one-way and two-way road segments are directed edges; each edge's weight is the estimated travel time in minutes at the current time of day. Here is the network the engine is working with, drawn with travel times on each directed edge:

            5                 6
      S --------> A --------> T
      |  \        ^           ^
   2  |    \ 9    | 1         | 4
      v      \    |           |
      C ------> B ------------+
      |   3        \    2
   8  |             \ 7
      v              v
      D ------------> E
            1

The edges and their travel-time weights (in minutes):

Edge Time Edge Time
S → A 5 C → B 3
S → C 2 B → A 1
S → B 9 B → T 2
C → D 8 B → E 7
D → E 1 A → T 6
E → T 4

The source is the depot S; the destination is the drop-off T.

💡 Intuition: Before computing anything, eyeball it. The "obvious shortcut" a driver sees is S → A → T: only two segments. But two segments is a count, not a duration. Whether it is actually fastest depends entirely on the minutes attached to those segments — which is the whole point of this chapter, and the whole point of the bug report.

Why this matters

Every production navigation system makes exactly this modeling choice, and getting it wrong is a class of bug that no unit test catches, because the code runs perfectly — it just optimizes the wrong quantity. "Fewest road segments" and "least travel time" are different objective functions over the same graph, and they generally have different optima. Diagnosing Northgate's complaints is a rehearsal for the real engineering question: what do your edge weights mean, and is that what your users actually want minimized?

Phase 1: Model the network and confirm the data

First, encode the graph the way the chapter does — an adjacency list of (neighbor, time) pairs — and write a helper to total the time along a named route, so we can check any path by hand.

INF = float("inf")

adj = {
    "S": [("A", 5), ("C", 2), ("B", 9)],
    "A": [("T", 6)],
    "B": [("A", 1), ("T", 2), ("E", 7)],
    "C": [("B", 3), ("D", 8)],
    "D": [("E", 1)],
    "E": [("T", 4)],
    "T": [],
}

def route_time(adj, route):
    """Total travel time along a vertex list, or None if a hop is not an edge."""
    total = 0
    for u, v in zip(route, route[1:]):
        step = next((w for nbr, w in adj[u] if nbr == v), None)
        if step is None:
            return None
        total += step
    return total

print(route_time(adj, ["S", "A", "T"]))            # the "obvious" 2-segment route
print(route_time(adj, ["S", "C", "B", "T"]))       # a 3-segment route
print(route_time(adj, ["S", "C", "B", "A", "T"]))  # a 4-segment route
# Expected output:
# 11
# 7
# 13

The hand-derived numbers settle the first complaint immediately. The "obvious" two-segment route S → A → T costs $5 + 6 = 11$ minutes. The three-segment route S → C → B → T costs $2 + 3 + 2 = 7$ minutes — fewer minutes despite more segments. And the four-segment S → C → B → A → T costs $2 + 3 + 1 + 6 = 13$, worse than both. So "more segments" is sometimes faster (7 beats 11) and sometimes slower (13 loses) — exactly the pair of complaints, and exactly what we expect once weights are unequal.

🔄 Check Your Understanding Without running anything, what does route_time(adj, ["S", "T"]) return, and why?

Answer

It returns None: there is no direct edge S → T in the adjacency list (the only ways out of S are to A, C, and B), so the helper hits a missing hop and returns None.

Phase 2: Run BFS — what mode="segments" actually computes

mode="segments" minimizes the number of edges, which is precisely breadth-first search from Chapter 28 (every edge counts as one hop). BFS explores in waves of equal hop-count. Let us trace the wave structure from S, recording the hop-distance (number of segments) at which each vertex is first reached:

Wave (hops) Vertices first reached Reached from
0 S
1 A, C, B S (S→A, S→C, S→B)
2 T, D, E A→T, C→D, B→E (B→A, C→B already seen)

So BFS reaches T in 2 hops, via S → A → T (A entered the frontier in wave 1, and A → T finishes T in wave 2). BFS would also accept S → B → T as a 2-hop route, depending on neighbor order — both have two segments. The fewest-segments answer is "2 segments," and the first such route BFS settles is typically S → A → T.

from collections import deque

def bfs_hops(adj, source, target):
    """Fewest-edges distance and one such path, ignoring weights (Chapter 28)."""
    prev, seen = {source: None}, {source}
    q = deque([source])
    while q:
        u = q.popleft()
        for v, _w in adj[u]:           # weights ignored: every edge is one hop
            if v not in seen:
                seen.add(v); prev[v] = u; q.append(v)
    path, cur = [], target
    while cur is not None:
        path.append(cur); cur = prev.get(cur)
    return (len(path) - 1, path[::-1]) if target in prev else (INF, [])

print(bfs_hops(adj, "S", "T"))
# Expected output:
# (2, ['S', 'A', 'T'])

There is the bug, in one line of output. BFS proudly returns ['S', 'A', 'T'] — the two-segment route — and that route takes 11 minutes, when a 7-minute route exists. mode="segments" is not broken; it is faithfully minimizing the wrong thing. The dispatcher asked for fewest roads and got exactly that.

⚠️ Common Pitfall: "Fewer segments must be faster" is the modeling error at the heart of the report. It is true only when all segments take the same time — the special case where BFS already solves the shortest-path problem (§29.1). The moment travel times differ, you must minimize time, not hops.

Phase 3: Run Dijkstra — what mode="time" should compute

Now minimize travel time with Dijkstra (§29.2). Travel times are non-negative, so Dijkstra applies and its correctness proof (§29.3) holds. We trace it by hand from S, finalizing the closest unfinalized vertex each round and relaxing its outgoing edges. Initialize dist = {S:0, rest:∞}.

Round Finalize (min unfinalized) Edges relaxed dist after
1 S (0) S→A: $\infty\to 5$; S→C: $\infty\to 2$; S→B: $\infty\to 9$ S:0, A:5, C:2, B:9, D:∞, E:∞, T:∞
2 C (2) C→B: $9\to 5$; C→D: $\infty\to 10$ S:0, A:5, C:2, B:5, D:10, E:∞, T:∞
3 A (5) A→T: $\infty\to 11$ S:0, A:5, C:2, B:5, D:10, E:∞, T:11
4 B (5) B→A: $5\not<5$ no; B→T: $11\to 7$; B→E: $\infty\to 12$ S:0, A:5, C:2, B:5, D:10, E:12, T:7
5 T (7) (no edges out of T) S:0, A:5, C:2, B:5, D:10, E:12, T:7

Read rounds 2 and 4 carefully — they contain the whole lesson. In round 2, finalizing C immediately improves B from $9$ down to $5$ (the route S → C → B at $2 + 3$ beats the direct S → B at $9$). In round 4, finalizing B improves T from $11$ down to $7$ (the route S → C → B → T beats S → A → T). Dijkstra finalizes T at 7 minutes, and the algorithm stops as soon as T is the minimum — it never needs to settle D or E, whose distances ($10$ and $12$) are larger. The remaining rounds for D and E are shown for completeness but are irrelevant to the S→T query.

⚠️ Common Pitfall: Notice round 3 finalizes A at distance 5 before B (also 5) — a tie, broken arbitrarily. It is tempting to think A's finalization "locks in" T at 11. It does not: T is only relaxed to 11 in round 3, not finalized. T is finalized in round 5 at its true value 7, after B's relaxation improves it. A vertex's distance is only final when the vertex itself is finalized — never when one of its predecessors is.

Confirm with code, and recover the actual route from predecessors:

import heapq

def dijkstra_paths(adj, source):
    dist = {v: INF for v in adj}
    prev = {v: None for v in adj}
    dist[source] = 0
    pq = [(0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v], prev[v] = d + w, u
                heapq.heappush(pq, (dist[v], v))
    return dist, prev

def path_to(prev, target):
    route, cur = [], target
    while cur is not None:
        route.append(cur); cur = prev[cur]
    return route[::-1]

dist, prev = dijkstra_paths(adj, "S")
print(dist["T"], path_to(prev, "T"))
# Expected output:
# 7 ['S', 'C', 'B', 'T']

The correct answer for the drivers' real objective is S → C → B → T, 7 minutes — three segments, beating the two-segment route by four minutes. The fix for Northgate is not a code change at all: set mode="time". The engine was right; the configuration was wrong.

💡 Intuition: Dijkstra's predecessor pointers form the shortest-time tree rooted at S. The same tree answers "fastest route from the depot to every intersection" at once — exactly what a dispatcher sending many drivers from one depot wants to precompute.

Phase 4: A* — the same answer, less searching

Dijkstra found the right route, but watch how much it explored: from S it relaxed edges into A, C, B, D, E, and T — essentially the whole district — even though the destination T was reachable early. On a city-scale graph that "expand outward in all directions" behavior is exactly the inefficiency §29.6 flags. A* fixes it by steering the search toward T using an optimistic estimate $h(v)$ of the remaining time from $v$ to T (a lower bound — for real maps, straight-line time, which never overestimates road time). A* expands the vertex minimizing dist[v] + h(v) instead of just dist[v].

Suppose the dispatcher has these admissible (never-overestimating) straight-line time estimates to T:

$v$ S A C B D E T
$h(v)$ 6 6 5 2 4 4 0

Each $h(v)$ is $\le$ the true remaining time (e.g. $h(A) = 6 = \delta(A, T)$; $h(C) = 5 \le \delta(C, T) = 5$; $h(B) = 2 = \delta(B, T)$), so the heuristic is admissible. Now trace A*, ordering the frontier by $f(v) = \text{dist}[v] + h(v)$:

Step Expand (min $f$) Frontier updates ($f = \text{dist}+h$)
1 S, $f = 0+6 = 6$ A: $5+6=11$; C: $2+5=7$; B: $9+2=11$
2 C, $f = 7$ relax C→B: dist[B] $9\to5$, so B: $5+2=7$; D: $10+4=14$
3 B, $f = 7$ relax B→T: dist[T] $\infty\to7$, T: $7+0=7$; (B→A: $5+1=6\not<5$)
4 T, $f = 7$ goal reached; dist[T] $= 7$

A* settles T in four expansions and never expands D or E at all — their $f$-values ($14$ and, had it been reached, $12 + 4 = 16$) keep them off the priority queue's front. It returns the identical route S → C → B → T at 7 minutes, but it ignored the dead-end southern branch entirely. That pruning is the difference between a continental query taking seconds and taking microseconds.

import heapq

def a_star(adj, source, target, h):
    dist = {v: INF for v in adj}
    prev = {v: None for v in adj}
    dist[source] = 0
    pq = [(h[source], source)]          # priority is f = dist + h
    while pq:
        f, u = heapq.heappop(pq)
        if u == target:
            break                        # goal popped: its dist is final
        if f - h[u] > dist[u]:
            continue                     # stale entry (lazy deletion)
        for v, w in adj[u]:
            if dist[u] + w < dist[v]:
                dist[v], prev[v] = dist[u] + w, u
                heapq.heappush(pq, (dist[v] + h[v], v))
    route, cur = [], target
    while cur is not None:
        route.append(cur); cur = prev[cur]
    return dist[target], route[::-1]

h = {"S": 6, "A": 6, "C": 5, "B": 2, "D": 4, "E": 4, "T": 0}
print(a_star(adj, "S", "T", h))
# Expected output:
# 7 ['S', 'C', 'B', 'T']

🔗 Connection. A* is exactly Dijkstra when $h \equiv 0$ (§29.6): with no estimate, $f(v) = \text{dist}[v]$ and the algorithm expands outward blindly. Because the estimate here never overestimates ("admissible"), the same invariant-style argument from §29.3 still certifies that A* returns a true shortest path — admissibility is what rescues the correctness argument when we add the heuristic to the key.

Discussion Questions

  1. The two complaints, reconciled. The dispatcher reported that the app both over-uses short streets and over-uses long roads. Using the three routes you priced in Phase 1 (11, 7, 13 minutes), explain in two sentences how a single objective ("fewest segments") produces both complaints.
  2. What's in the weight? Northgate's edges are current travel times. Name two other quantities a delivery company might reasonably put in the weights instead (e.g., distance, fuel, tolls), and for each say whether Dijkstra still applies (i.e., whether the weights stay non-negative).
  3. Ties. In Phase 3, A and B were both finalized at distance 5, broken arbitrarily. Could a different tie-break change the distance Dijkstra reports for T? Could it change the route returned? Justify.
  4. When A* buys nothing. Suppose the dispatcher sets $h(v) = 0$ for all $v$ (no estimate). What does A* reduce to, and on this graph would it explore more or fewer vertices than the run in Phase 4? Tie your answer to §29.6's claim that "Dijkstra is the goal-blind special case."
  5. Inadmissible heuristics. Suppose someone "improves" the heuristic by inflating it — setting $h(B) = 9$ (an overestimate of the true $\delta(B, T) = 2$). Argue informally why A* might now return a wrong (too-long) route, connecting to which inequality in the §29.3 correctness argument breaks.

Your Turn: Extensions

  • Option A (rush hour). The arterial S → A jams: its weight jumps from 5 to 20, while everything else holds. Re-run Dijkstra by hand from S. Does the fastest S→T route change? Does the fewest-segments route change? Report both and explain the asymmetry.
  • Option B (a second depot). Add a depot S2 with edges S2 → C (1) and S2 → A (4). Compute the shortest-time tree from S2 by hand, and decide which depot should serve T. (This is the multi-source decision a dispatcher makes constantly.)
  • Option C (verify the heuristic). Write is_admissible(adj, h, target) that checks $h(v) \le \delta(v, T)$ for every $v$ by computing true distances to T (run Dijkstra on the reverse graph from T). Explain what your function should return for the Phase 4 heuristic, and why an inadmissible heuristic is dangerous.
  • Option D (one-to-all). The dispatcher wants the fastest route from S to every intersection, not just T. Argue why you should run plain Dijkstra (not A*) for this, and read the full shortest-time tree off the prev array from Phase 3.

Key Takeaways

  1. The objective lives in the weights. BFS minimizes segment count; Dijkstra minimizes the sum of weights. Northgate's bug was a modeling choice (mode="segments"), not a coding error — the engine correctly optimized the wrong quantity.
  2. More edges can mean less cost. The 7-minute, three-segment route beat the 11-minute, two-segment route. "Fewest hops" equals "cheapest" only when all weights are equal (§29.1).
  3. A distance is final only when its vertex is finalized. T was relaxed to 11 in round 3 but finalized at 7 in round 5; never confuse "discovered" or "relaxed" with "finalized" (§29.2).
  4. A* is goal-directed Dijkstra. With an admissible heuristic it returns the same shortest path while exploring far fewer vertices; with $h \equiv 0$ it is Dijkstra. The heuristic must never overestimate, or correctness fails (§29.6, §29.3).
  5. Recover the route, not just the number. Predecessor pointers turned a distance (7) into directions (S → C → B → T) — which is what a driver actually needs.