Case Study: Building a Capacitated Assignment Engine for Course Enrollment
"A program is correct when you can explain why it works — to a skeptic."
Executive Summary
In this build-focused case study you will construct, piece by piece, a small assignment engine that
places students into course sections subject to seat limits and student preferences — and then prove
your engine optimal on every input by recovering a matching minimum cut. You will start from the chapter's
max_flow_value, generalize the bipartite reduction to allow capacities greater than 1 (a section has
many seats, a student wants at most one section), add the ability to read out the actual assignment
(not just its size), and finish with a self-checking assignment_report that returns both the placement
and a min-cut certificate. By the end you will have a reusable tool and, more importantly, the habit of
building a solver whose output carries its own proof of optimality.
This study is build-heavy: you assemble working machinery in stages, hand-verifying each stage's output before composing the next. (Case Study 1 was the complement — handed a system, you analyzed a failure.)
Skills applied
- Generalizing the bipartite-matching reduction to capacitated assignment (§34.5, §34.6).
- Building a max-flow solver that exposes residual capacities so the assignment is recoverable (§34.3).
- Recovering the assignment from a flow and certifying it with a min cut (§34.4).
- Incremental construction with hand-derived verification at each step (the book's
# Expected output:discipline).
Background
The scenario
A department runs a small seminar program. This term there are three sections with limited seats:
| Section | Seats (capacity) |
|---|---|
SecA |
2 |
SecB |
1 |
SecC |
2 |
Six students have each listed the sections they would accept (they want one section):
| Student | Acceptable sections |
|---|---|
s1 |
SecA, SecB |
s2 |
SecA |
s3 |
SecA, SecC |
s4 |
SecB, SecC |
s5 |
SecC |
s6 |
SecB |
We want to place as many students as possible, respecting both seat limits and preferences. This is capacitated bipartite matching: it is not plain matching (a section holds several students), but it is still a max-flow problem — give each section's sink edge a capacity equal to its seat count (§34.6).
💡 Intuition: Plain matching used capacity-1 sink edges so each right vertex was taken at most once. Bump a sink edge to capacity $k$ and that right vertex can absorb up to $k$ units of flow — exactly "this section has $k$ seats." The students keep capacity-1 source edges (one section each). One small change to the reduction unlocks a whole class of allocation problems.
Why this matters
Enrollment, shift staffing, ad-slot allocation, and load balancing are all "agents with single demands meeting resources with multi-unit supply." A single, well-built capacitated-matching engine solves all of them. And because we build it on max flow, every answer it returns can be certified optimal — we will make the engine emit the min cut alongside the placement, so a skeptic (or an auditor) can verify "no more students could have been placed" without re-running anything.
Phase 1: Start from a trusted max-flow core
We reuse the chapter's max_flow_value verbatim — it is our trusted primitive. (Trusting a tested,
proven core and building on it is the §34 Project Checkpoint philosophy.) We will not modify it yet;
first we get a correct value, then we extend.
from collections import deque
def max_flow_value(cap, s, t):
res = {}
for (u, v), c in cap.items():
res.setdefault(u, {}); res.setdefault(v, {})
res[u][v] = res[u].get(v, 0) + c
res[v].setdefault(u, 0)
total = 0
while True:
parent, q = {s: None}, deque([s])
while q:
u = q.popleft()
for v, c in res[u].items():
if c > 0 and v not in parent:
parent[v] = u; q.append(v)
if t not in parent:
return total
v, bottleneck = t, float("inf")
while parent[v] is not None:
bottleneck = min(bottleneck, res[parent[v]][v]); v = parent[v]
v = t
while parent[v] is not None:
res[parent[v]][v] -= bottleneck
res[v][parent[v]] += bottleneck
v = parent[v]
total += bottleneck
Phase 2: Build the capacitated reduction
Now generalize max_bipartite_matching (§34.5) so the sink edges carry seat counts instead of 1. The
only change from the unit-capacity version is step 4: section -> t gets capacity = seats.
def build_enrollment_network(students, prefs, seats):
"""students: list of student ids. prefs: list of (student, section) edges.
seats: dict section -> seat capacity. Returns a capacity dict for max flow."""
cap = {}
for st in students: # source -> each student, capacity 1
cap[("s", ("S", st))] = 1
for st, sec in prefs: # student -> acceptable section, capacity 1
cap[(("S", st), ("C", sec))] = 1
for sec, k in seats.items(): # section -> sink, capacity = seats
cap[(("C", sec), "t")] = k
return cap
students = ["s1","s2","s3","s4","s5","s6"]
prefs = [("s1","SecA"),("s1","SecB"),("s2","SecA"),("s3","SecA"),("s3","SecC"),
("s4","SecB"),("s4","SecC"),("s5","SecC"),("s6","SecB")]
seats = {"SecA":2, "SecB":1, "SecC":2}
cap = build_enrollment_network(students, prefs, seats)
print(max_flow_value(cap, "s", "t"))
# Expected output:
# 5
Let us hand-derive that 5 before trusting the code, because building means verifying every step.
The strategy first. The total seats are $2+1+2 = 5$, so at most 5 students can be placed (the sink-side ceiling). There are 6 students, so at least one will be left out no matter what. The question is whether we can hit the ceiling of 5. We trace augmenting paths.
- Place
s2→SecA(s2's only choice).SecAhas 1 of 2 seats used. - Place
s1→SecA(uses SecA's 2nd seat).SecAnow full (2/2). - Place
s6→SecB(s6's only choice).SecBfull (1/1). - Place
s5→SecC.SecC1 of 2. - Place
s3→SecC(s3 also lists SecA, but it is full; SecC has a seat).SecCfull (2/2). s4wantsSecB(full) orSecC(full). Try a reroute: BFS from $s$ reachess4, thenSecBvia $s4\to SecB$, then $SecB\to t$ is full, back edge $SecB\to s6$... buts6's only section isSecB(visited). Froms4trySecC: full, back edge $SecC\to s3$ or $SecC\to s5$;s5's only section isSecC(visited);s3also listsSecA, back-toSecAfull, back edge tos1/s2;s2onlySecA;s1alsoSecB(full). Every chain dead-ends. No augmenting path.s4is unplaced.
Max flow $= 5$, matching the code. Five students placed; s4 left out.
🔄 Check Your Understanding Why is the sink-side total (5 seats) the binding ceiling here rather than the source-side total (6 students)?
Answer
Every placed student consumes exactly one seat, and there are only $2+1+2=5$ seats total, so the flow can never exceed 5 — fewer seats than students. The 6 source edges would allow 6, but the sink edges cap the total at 5. With more seats than students the situation would flip and the source side would bind.
Phase 3: Recover the actual assignment (not just the count)
A count is not a schedule. Students need to know which section they got. We build solve_assignment,
which runs the same Edmonds–Karp loop but, instead of discarding the residual graph, inspects it at the
end: a middle edge $\text{student}\to\text{section}$ carries one unit of flow exactly when its reverse
residual is 1 (the unit we pushed sits there as cancellable back-flow).
def solve_assignment(cap, s, t):
"""Return (value, placements) where placements is a list of (student, section)."""
res = {}
for (u, v), c in cap.items():
res.setdefault(u, {}); res.setdefault(v, {})
res[u][v] = res[u].get(v, 0) + c
res[v].setdefault(u, 0)
total = 0
while True:
parent, q = {s: None}, deque([s])
while q:
u = q.popleft()
for v, c in res[u].items():
if c > 0 and v not in parent:
parent[v] = u; q.append(v)
if t not in parent:
break
v, b = t, float("inf")
while parent[v] is not None:
b = min(b, res[parent[v]][v]); v = parent[v]
v = t
while parent[v] is not None:
res[parent[v]][v] -= b; res[v][parent[v]] += b; v = parent[v]
total += b
placements = [(u[1], v[1]) # middle edge used <=> reverse residual == 1
for (u, v) in cap
if isinstance(u, tuple) and u[0] == "S" and res[v][u] == 1]
return total, placements
value, placements = solve_assignment(cap, "s", "t")
print(value)
print(sorted(placements))
# Expected output:
# 5
# [('s1', 'SecA'), ('s2', 'SecA'), ('s3', 'SecC'), ('s4', 'SecB'), ('s5', 'SecC')]
Hand-deriving which student lands where requires following the BFS order, since which maximum assignment
Edmonds–Karp returns depends on its path choices. Tracing shortest augmenting paths in turn: s1→SecA,
then s2→SecA (filling SecA's two seats), then s3→SecC, then s4→SecB (filling SecB's one seat), then
s5→SecC (filling SecC's two seats) — value 5. After that, the only student with an unused source edge is
s6, whose single option SecB is full, and no residual reroute reaches t. So s6 is the unplaced
student here.
⚠️ Common Pitfall: Reading the assignment off forward residuals is the trap. After pushing flow, a used middle edge $x\to y$ has forward residual 0 — it looks "empty." The unit is recorded on the reverse residual $y\to x$, which becomes 1. Check the reverse edge, never the forward one. (This is the same back-edge bookkeeping from §34.3: pushed flow lives where it can be canceled.)
The placement $\{(s1,SecA),(s2,SecA),(s3,SecC),(s4,SecB),(s5,SecC)\}$ uses both SecA seats, both SecC
seats, and the single SecB seat — five students, every seat filled, s6 unplaced. It is a valid
capacitated assignment (no section over its seat count, no student in two sections). But "valid" is weaker
than "optimal." For optimal we need the certificate.
🔄 Check Your Understanding A different BFS tie-break might place
s6→SecBand leaves4unplaced instead. Would that change the value of the solution, or only which student is left out? What does this say about uniqueness of the maximum?
Answer
Only which student is left out — the value is 5 either way (the sink-side ceiling). A maximum matching need not be unique; many different size-5 assignments exist. Max-flow min-cut certifies the size is optimal, not that the assignment is the only one. Both
s4-out ands6-out solutions are optimal.
Build a validation harness (test before you trust)
A solver you cannot check is a solver you should not ship. Before adding the optimality certificate, build
the structural check: does the returned placement actually respect the constraints? This is the "test"
half of theme four — the structural harness catches bugs the optimality proof silently assumes away. We
verify two invariants: no student appears twice, and no section exceeds its seats. We also confirm the
placement's size equals the independently computed max_flow_value, which links structural validity to
optimality.
def verify_assignment(cap, placements, seats, s="s", t="t"):
"""Structural + optimality check on a returned placement."""
students = [p[0] for p in placements]
if len(students) != len(set(students)): # no student placed twice
return False
used = {} # seats consumed per section
for _, sec in placements:
used[sec] = used.get(sec, 0) + 1
if any(used[sec] > seats[sec] for sec in used): # respect seat limits
return False
return len(placements) == max_flow_value(cap, s, t) # size == max flow (optimal)
print(verify_assignment(cap, placements, seats))
# Expected output:
# True
Hand-derive it: students = ['s1','s2','s3','s4','s5'] — five distinct, so the duplicate check passes.
used = {'SecA': 2, 'SecC': 2, 'SecB': 1}, each within seats (SecA 2/2, SecB 1/1, SecC 2/2), so
the capacity check passes. Finally len(placements) = 5 equals max_flow_value(cap, "s", "t") = 5, so the
function returns True. The harness now stands guard: if a future refactor (say, the reverse-residual
read-out of Phase 3) introduced a bug that double-booked a student or over-filled a section, this check
would immediately return False.
⚠️ Common Pitfall: A validation harness that only checks structural validity (no double-booking, no over-capacity) is necessary but not sufficient — the empty placement is structurally valid! That is why
verify_assignmentalso compares the size tomax_flow_value. Structural validity says "this is a legal assignment"; the size comparison says "and it is the largest one." You need both.
Phase 4: Emit the optimality certificate (the min cut)
We make the engine prove its own answer. After the flow loop, one more BFS over the residual graph finds
$S$ = the vertices reachable from $s$; by §34.4 that is the source side of a minimum cut, and its capacity
equals the flow. We bundle this into assignment_report.
def assignment_report(cap, s, t):
"""Return (value, placements, S) — S is the source side of a minimum cut."""
value, placements = solve_assignment(cap, s, t) # reuse Phase 3
# rebuild final residual to find reachable set S (recompute cleanly)
res = {}
for (u, v), c in cap.items():
res.setdefault(u, {}); res.setdefault(v, {})
res[u][v] = res[u].get(v, 0) + c
res[v].setdefault(u, 0)
while True:
parent, q = {s: None}, deque([s])
while q:
u = q.popleft()
for v, c in res[u].items():
if c > 0 and v not in parent:
parent[v] = u; q.append(v)
if t not in parent:
break
v, b = t, float("inf")
while parent[v] is not None:
b = min(b, res[parent[v]][v]); v = parent[v]
v = t
while parent[v] is not None:
res[parent[v]][v] -= b; res[v][parent[v]] += b; v = parent[v]
S = set() # final BFS: reachable from s
q = deque([s]); S.add(s)
while q:
u = q.popleft()
for v, c in res[u].items():
if c > 0 and v not in S:
S.add(v); q.append(v)
return value, placements, S
value, placements, S = assignment_report(cap, "s", "t")
print(value)
print(("C","SecB") in S, ("C","SecC") in S)
# Expected output:
# 5
# True True
Let us hand-derive $S$ and read the certificate. At the max flow, every seat is filled, so all three
section-sink edges ($SecA\to t$, $SecB\to t$, $SecC\to t$) are saturated. Run the final BFS from $s$ in the
residual graph. s6 still has an unused source edge $s\to s6$, so BFS reaches s6, then SecB (its
eligible section, with an unused middle edge $s6\to SecB$). From SecB, the back edge to placed student
s4 is available (residual 1), reaching s4; from s4 the unused middle edge $s4\to SecC$ reaches SecC;
from SecC the back edges to s3 and s5 reach those students; from s3 the unused middle edge
$s3\to SecA$ reaches SecA; from SecA the back edges to s1, s2 reach them. Following all such
residual hops, every vertex except $t$ is reachable:
$$S = \{\,s,\ s1,\ s2,\ s3,\ s4,\ s5,\ s6,\ SecA,\ SecB,\ SecC\,\}, \qquad T = \{\,t\,\}.$$
So SecB and SecC are in $S$ (the code prints True True), even though their sink edges are
saturated — being saturated does not make a node unreachable; a residual route reached them through back
edges. The minimum cut is therefore the sink cut $(S, \{t\})$. Its crossing $S\to T$ edges are exactly
the three section-sink edges (every other edge has its head inside $S$, not at $t$):
$$c(S,T) = c(SecA,t) + c(SecB,t) + c(SecC,t) = 2 + 1 + 2 = 5,$$
which equals the max flow — certifying optimality. We can confirm mechanically with cut_capacity from
Exercise 34.23.
print(cut_capacity(cap, S))
# Expected output:
# 5
💡 Intuition: Where the min cut sits tells you the kind of shortage. Here the cut is the sink cut — all three sections, all of them full — which says the bottleneck is total seat supply, not any single over-subscribed section. Contrast Case Study 1, where the min cut cut through specific machine slots, diagnosing a localized contention. Same theorem, two different shapes of answer: a sink cut means "you are globally seat-limited; add seats anywhere on the cut"; an internal cut means "a specific subset is the bottleneck."
🐛 Find the Error (built into the build). Suppose an engineer, eager to report the certificate by hand, writes: "The cut must contain a middle edge for every unplaced-student route — e.g. $s6\to SecB$ — plus the saturated sink edges, so the cut capacity is more than 5." Where does this go wrong, and why must you trust the mechanically computed $S$?
Answer
The edge $s6\to SecB$ does not cross the cut: both
s6andSecBare in $S$, so it is an internal edge and contributes 0 (only $S\to T$ edges count, §34.4). Hand-assembling cut edges from intuition over-counts exactly this way. The disciplined method is to let the code compute $S$ (the final BFS) and then sum capacities only over edges with tail in $S$ and head outside $S$ — which gives the three sink edges, total 5. The theorem guarantees the residual-reachable cut equals the flow; when a hand-count disagrees, the hand-count is wrong, not the theorem.
This is the payoff of building the certificate into the engine: the moment a hand-count disagrees with the flow value, you know the hand-count is wrong, because max-flow min-cut guarantees the residual-reachable cut equals the flow. The engine's output (value 5 + placement + the set $S$) is self-checking.
Discussion Questions
- Phase 2 changed exactly one thing from plain matching — the sink-edge capacities. What would change if instead a student could enroll in up to two sections? Which capacities move, and is it still a clean max-flow problem?
- In Phase 3 we read the assignment off reverse residuals. Write the one-sentence invariant that justifies "middle edge used $\iff$ reverse residual is 1," and tie it to the residual definition in §34.3.
- The engine leaves one student unplaced (
s6in our trace), and the min cut turned out to be the sink cut (all sections full). Explain why, in this term, there is no Hall-style deficient subset of students to blame — the shortage is global. What single change (and where) would raise the placement to 6, and how does the sink-cut location tell you that adding a seat to any section helps? - Why does emitting the min cut make the engine auditable in a way that returning only the placement does not? What could a skeptic verify in seconds with the cut that they could not verify from the placement alone?
- The integrality property (§34.4) is doing silent work in this whole build. Point to the exact place the engine would break if flows could be fractional, and explain why integer capacities prevent it.
Your Turn: Extensions
- Option A (priorities). Raising a student's source edge capacity is the wrong way to model priority
(a student still wants only one seat). Instead, add a tie-break: run the engine, and if a specific
student must be placed, pre-assign them (delete their competitors' edges to the chosen section) and
re-solve. Build a wrapper
place_first(cap, student, section)and test which other student gets bumped. - Option B (waitlist sizing). Extend
assignment_reportto also return the list of unplaced students (students whose source edge still has residual at the max flow). Hand-derive that it returns['s6']for this term, then add 1 seat toSecB(set("C","SecB")→tto capacity 2) and re-derive the new max flow and waitlist. - Option C (add a seat, watch the cut move). The min cut here is the sink cut (capacity 5). Add one
seat to
SecC(sink edge capacity 3) and re-runassignment_report. Predict by hand whether the flow rises to 6, and whether the min cut is still the sink cut or has moved to an internal cut — then verify withcut_capacity. - Option D (generalize the reduction). Refactor
build_enrollment_networkinto a generalbuild_capacitated_matching(left, edges, right_caps, left_caps=None)that allows multi-unit capacities on both sides, and re-solve this term as a special case to confirm you still get 5.
Key Takeaways
- One change unlocks capacitated assignment. Setting a right-side sink edge to capacity $k$ models "k seats"; the rest of the §34.5 reduction is unchanged, and max flow solves it directly.
- Build the read-out, not just the count. The placement lives in the reverse residuals of the middle edges; recovering it (Phase 3) turns a number into an actual schedule.
- Make the solver certify itself. Emitting the min cut (the residual-reachable set $S$) lets anyone verify optimality without re-running — the answer carries its own proof (§34.4).
- Trust the mechanical cut over your intuition. When a hand-counted cut disagrees with the flow value, the hand-count is wrong: max-flow min-cut guarantees the residual-reachable cut equals the flow. Compute $S$ from the residual graph and verify its capacity with code.