Case Study: Building a Capacitated Assignment Engine for Course Enrollment

DataField.Dev

Case Study: Building a Capacitated Assignment Engine for Course Enrollment

"A program is correct when you can explain why it works — to a skeptic."

Executive Summary

In this build-focused case study you will construct, piece by piece, a small assignment engine that places students into course sections subject to seat limits and student preferences — and then prove your engine optimal on every input by recovering a matching minimum cut. You will start from the chapter's max_flow_value, generalize the bipartite reduction to allow capacities greater than 1 (a section has many seats, a student wants at most one section), add the ability to read out the actual assignment (not just its size), and finish with a self-checking assignment_report that returns both the placement and a min-cut certificate. By the end you will have a reusable tool and, more importantly, the habit of building a solver whose output carries its own proof of optimality.

This study is build-heavy: you assemble working machinery in stages, hand-verifying each stage's output before composing the next. (Case Study 1 was the complement — handed a system, you analyzed a failure.)

Skills applied

Generalizing the bipartite-matching reduction to capacitated assignment (§34.5, §34.6).
Building a max-flow solver that exposes residual capacities so the assignment is recoverable (§34.3).
Recovering the assignment from a flow and certifying it with a min cut (§34.4).
Incremental construction with hand-derived verification at each step (the book's # Expected output: discipline).

Background

The scenario

A department runs a small seminar program. This term there are three sections with limited seats:

Section	Seats (capacity)
`SecA`	2
`SecB`	1
`SecC`	2

Six students have each listed the sections they would accept (they want one section):

Student	Acceptable sections
`s1`	`SecA`, `SecB`
`s2`	`SecA`
`s3`	`SecA`, `SecC`
`s4`	`SecB`, `SecC`
`s5`	`SecC`
`s6`	`SecB`

We want to place as many students as possible, respecting both seat limits and preferences. This is capacitated bipartite matching: it is not plain matching (a section holds several students), but it is still a max-flow problem — give each section's sink edge a capacity equal to its seat count (§34.6).

💡 Intuition: Plain matching used capacity-1 sink edges so each right vertex was taken at most once. Bump a sink edge to capacity $k$ and that right vertex can absorb up to $k$ units of flow — exactly "this section has $k$ seats." The students keep capacity-1 source edges (one section each). One small change to the reduction unlocks a whole class of allocation problems.

Why this matters

Enrollment, shift staffing, ad-slot allocation, and load balancing are all "agents with single demands meeting resources with multi-unit supply." A single, well-built capacitated-matching engine solves all of them. And because we build it on max flow, every answer it returns can be certified optimal — we will make the engine emit the min cut alongside the placement, so a skeptic (or an auditor) can verify "no more students could have been placed" without re-running anything.

Phase 1: Start from a trusted max-flow core

We reuse the chapter's max_flow_value verbatim — it is our trusted primitive. (Trusting a tested, proven core and building on it is the §34 Project Checkpoint philosophy.) We will not modify it yet; first we get a correct value, then we extend.

from collections import deque

def max_flow_value(cap, s, t):
    res = {}
    for (u, v), c in cap.items():
        res.setdefault(u, {}); res.setdefault(v, {})
        res[u][v] = res[u].get(v, 0) + c
        res[v].setdefault(u, 0)
    total = 0
    while True:
        parent, q = {s: None}, deque([s])
        while q:
            u = q.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u; q.append(v)
        if t not in parent:
            return total
        v, bottleneck = t, float("inf")
        while parent[v] is not None:
            bottleneck = min(bottleneck, res[parent[v]][v]); v = parent[v]
        v = t
        while parent[v] is not None:
            res[parent[v]][v] -= bottleneck
            res[v][parent[v]] += bottleneck
            v = parent[v]
        total += bottleneck

Phase 2: Build the capacitated reduction

Now generalize max_bipartite_matching (§34.5) so the sink edges carry seat counts instead of 1. The only change from the unit-capacity version is step 4: section -> t gets capacity = seats.

def build_enrollment_network(students, prefs, seats):
    """students: list of student ids. prefs: list of (student, section) edges.
       seats: dict section -> seat capacity. Returns a capacity dict for max flow."""
    cap = {}
    for st in students:                          # source -> each student, capacity 1
        cap[("s", ("S", st))] = 1
    for st, sec in prefs:                         # student -> acceptable section, capacity 1
        cap[(("S", st), ("C", sec))] = 1
    for sec, k in seats.items():                  # section -> sink, capacity = seats
        cap[(("C", sec), "t")] = k
    return cap

students = ["s1","s2","s3","s4","s5","s6"]
prefs = [("s1","SecA"),("s1","SecB"),("s2","SecA"),("s3","SecA"),("s3","SecC"),
         ("s4","SecB"),("s4","SecC"),("s5","SecC"),("s6","SecB")]
seats = {"SecA":2, "SecB":1, "SecC":2}

cap = build_enrollment_network(students, prefs, seats)
print(max_flow_value(cap, "s", "t"))
# Expected output:
# 5

Let us hand-derive that 5 before trusting the code, because building means verifying every step.

The strategy first. The total seats are $2+1+2 = 5$, so at most 5 students can be placed (the sink-side ceiling). There are 6 students, so at least one will be left out no matter what. The question is whether we can hit the ceiling of 5. We trace augmenting paths.

Place s2→SecA (s2's only choice). SecA has 1 of 2 seats used.
Place s1→SecA (uses SecA's 2nd seat). SecA now full (2/2).
Place s6→SecB (s6's only choice). SecB full (1/1).
Place s5→SecC. SecC 1 of 2.
Place s3→SecC (s3 also lists SecA, but it is full; SecC has a seat). SecC full (2/2).
s4 wants SecB (full) or SecC (full). Try a reroute: BFS from $s$ reaches s4, then SecB via $s4\to SecB$, then $SecB\to t$ is full, back edge $SecB\to s6$... but s6's only section is SecB (visited). From s4 try SecC: full, back edge $SecC\to s3$ or $SecC\to s5$; s5's only section is SecC (visited); s3 also lists SecA, back-to SecA full, back edge to s1/s2; s2 only SecA; s1 also SecB (full). Every chain dead-ends. No augmenting path. s4 is unplaced.

Max flow $= 5$, matching the code. Five students placed; s4 left out.

🔄 Check Your Understanding Why is the sink-side total (5 seats) the binding ceiling here rather than the source-side total (6 students)?

Answer

Every placed student consumes exactly one seat, and there are only $2+1+2=5$ seats total, so the flow can never exceed 5 — fewer seats than students. The 6 source edges would allow 6, but the sink edges cap the total at 5. With more seats than students the situation would flip and the source side would bind.

Phase 3: Recover the actual assignment (not just the count)

A count is not a schedule. Students need to know which section they got. We build solve_assignment, which runs the same Edmonds–Karp loop but, instead of discarding the residual graph, inspects it at the end: a middle edge $\text{student}\to\text{section}$ carries one unit of flow exactly when its reverse residual is 1 (the unit we pushed sits there as cancellable back-flow).

def solve_assignment(cap, s, t):
    """Return (value, placements) where placements is a list of (student, section)."""
    res = {}
    for (u, v), c in cap.items():
        res.setdefault(u, {}); res.setdefault(v, {})
        res[u][v] = res[u].get(v, 0) + c
        res[v].setdefault(u, 0)
    total = 0
    while True:
        parent, q = {s: None}, deque([s])
        while q:
            u = q.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u; q.append(v)
        if t not in parent:
            break
        v, b = t, float("inf")
        while parent[v] is not None:
            b = min(b, res[parent[v]][v]); v = parent[v]
        v = t
        while parent[v] is not None:
            res[parent[v]][v] -= b; res[v][parent[v]] += b; v = parent[v]
        total += b
    placements = [(u[1], v[1])                    # middle edge used  <=>  reverse residual == 1
                  for (u, v) in cap
                  if isinstance(u, tuple) and u[0] == "S" and res[v][u] == 1]
    return total, placements

value, placements = solve_assignment(cap, "s", "t")
print(value)
print(sorted(placements))
# Expected output:
# 5
# [('s1', 'SecA'), ('s2', 'SecA'), ('s3', 'SecC'), ('s4', 'SecB'), ('s5', 'SecC')]

Hand-deriving which student lands where requires following the BFS order, since which maximum assignment Edmonds–Karp returns depends on its path choices. Tracing shortest augmenting paths in turn: s1→SecA, then s2→SecA (filling SecA's two seats), then s3→SecC, then s4→SecB (filling SecB's one seat), then s5→SecC (filling SecC's two seats) — value 5. After that, the only student with an unused source edge is s6, whose single option SecB is full, and no residual reroute reaches t. So s6 is the unplaced student here.

⚠️ Common Pitfall: Reading the assignment off forward residuals is the trap. After pushing flow, a used middle edge $x\to y$ has forward residual 0 — it looks "empty." The unit is recorded on the reverse residual $y\to x$, which becomes 1. Check the reverse edge, never the forward one. (This is the same back-edge bookkeeping from §34.3: pushed flow lives where it can be canceled.)

The placement $\{(s1,SecA),(s2,SecA),(s3,SecC),(s4,SecB),(s5,SecC)\}$ uses both SecA seats, both SecC seats, and the single SecB seat — five students, every seat filled, s6 unplaced. It is a valid capacitated assignment (no section over its seat count, no student in two sections). But "valid" is weaker than "optimal." For optimal we need the certificate.

🔄 Check Your Understanding A different BFS tie-break might place s6→SecB and leave s4 unplaced instead. Would that change the value of the solution, or only which student is left out? What does this say about uniqueness of the maximum?

Answer

Only which student is left out — the value is 5 either way (the sink-side ceiling). A maximum matching need not be unique; many different size-5 assignments exist. Max-flow min-cut certifies the size is optimal, not that the assignment is the only one. Both s4-out and s6-out solutions are optimal.

Build a validation harness (test before you trust)

A solver you cannot check is a solver you should not ship. Before adding the optimality certificate, build the structural check: does the returned placement actually respect the constraints? This is the "test" half of theme four — the structural harness catches bugs the optimality proof silently assumes away. We verify two invariants: no student appears twice, and no section exceeds its seats. We also confirm the placement's size equals the independently computed max_flow_value, which links structural validity to optimality.

def verify_assignment(cap, placements, seats, s="s", t="t"):
    """Structural + optimality check on a returned placement."""
    students = [p[0] for p in placements]
    if len(students) != len(set(students)):          # no student placed twice
        return False
    used = {}                                         # seats consumed per section
    for _, sec in placements:
        used[sec] = used.get(sec, 0) + 1
    if any(used[sec] > seats[sec] for sec in used):   # respect seat limits
        return False
    return len(placements) == max_flow_value(cap, s, t)   # size == max flow (optimal)

print(verify_assignment(cap, placements, seats))
# Expected output:
# True

Hand-derive it: students = ['s1','s2','s3','s4','s5'] — five distinct, so the duplicate check passes. used = {'SecA': 2, 'SecC': 2, 'SecB': 1}, each within seats (SecA 2/2, SecB 1/1, SecC 2/2), so the capacity check passes. Finally len(placements) = 5 equals max_flow_value(cap, "s", "t") = 5, so the function returns True. The harness now stands guard: if a future refactor (say, the reverse-residual read-out of Phase 3) introduced a bug that double-booked a student or over-filled a section, this check would immediately return False.

⚠️ Common Pitfall: A validation harness that only checks structural validity (no double-booking, no over-capacity) is necessary but not sufficient — the empty placement is structurally valid! That is why verify_assignment also compares the size to max_flow_value. Structural validity says "this is a legal assignment"; the size comparison says "and it is the largest one." You need both.

Phase 4: Emit the optimality certificate (the min cut)

We make the engine prove its own answer. After the flow loop, one more BFS over the residual graph finds $S$ = the vertices reachable from $s$; by §34.4 that is the source side of a minimum cut, and its capacity equals the flow. We bundle this into assignment_report.

def assignment_report(cap, s, t):
    """Return (value, placements, S) — S is the source side of a minimum cut."""
    value, placements = solve_assignment(cap, s, t)   # reuse Phase 3
    # rebuild final residual to find reachable set S (recompute cleanly)
    res = {}
    for (u, v), c in cap.items():
        res.setdefault(u, {}); res.setdefault(v, {})
        res[u][v] = res[u].get(v, 0) + c
        res[v].setdefault(u, 0)
    while True:
        parent, q = {s: None}, deque([s])
        while q:
            u = q.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u; q.append(v)
        if t not in parent:
            break
        v, b = t, float("inf")
        while parent[v] is not None:
            b = min(b, res[parent[v]][v]); v = parent[v]
        v = t
        while parent[v] is not None:
            res[parent[v]][v] -= b; res[v][parent[v]] += b; v = parent[v]
    S = set()                                          # final BFS: reachable from s
    q = deque([s]); S.add(s)
    while q:
        u = q.popleft()
        for v, c in res[u].items():
            if c > 0 and v not in S:
                S.add(v); q.append(v)
    return value, placements, S

value, placements, S = assignment_report(cap, "s", "t")
print(value)
print(("C","SecB") in S, ("C","SecC") in S)
# Expected output:
# 5
# True True

Let us hand-derive $S$ and read the certificate. At the max flow, every seat is filled, so all three section-sink edges ($SecA\to t$, $SecB\to t$, $SecC\to t$) are saturated. Run the final BFS from $s$ in the residual graph. s6 still has an unused source edge $s\to s6$, so BFS reaches s6, then SecB (its eligible section, with an unused middle edge $s6\to SecB$). From SecB, the back edge to placed student s4 is available (residual 1), reaching s4; from s4 the unused middle edge $s4\to SecC$ reaches SecC; from SecC the back edges to s3 and s5 reach those students; from s3 the unused middle edge $s3\to SecA$ reaches SecA; from SecA the back edges to s1, s2 reach them. Following all such residual hops, every vertex except $t$ is reachable:

$$S = \{\,s,\ s1,\ s2,\ s3,\ s4,\ s5,\ s6,\ SecA,\ SecB,\ SecC\,\}, \qquad T = \{\,t\,\}.$$

So SecB and SecC are in $S$ (the code prints True True), even though their sink edges are saturated — being saturated does not make a node unreachable; a residual route reached them through back edges. The minimum cut is therefore the sink cut $(S, \{t\})$. Its crossing $S\to T$ edges are exactly the three section-sink edges (every other edge has its head inside $S$, not at $t$):

$$c(S,T) = c(SecA,t) + c(SecB,t) + c(SecC,t) = 2 + 1 + 2 = 5,$$

which equals the max flow — certifying optimality. We can confirm mechanically with cut_capacity from Exercise 34.23.

print(cut_capacity(cap, S))
# Expected output:
# 5

💡 Intuition: Where the min cut sits tells you the kind of shortage. Here the cut is the sink cut — all three sections, all of them full — which says the bottleneck is total seat supply, not any single over-subscribed section. Contrast Case Study 1, where the min cut cut through specific machine slots, diagnosing a localized contention. Same theorem, two different shapes of answer: a sink cut means "you are globally seat-limited; add seats anywhere on the cut"; an internal cut means "a specific subset is the bottleneck."

🐛 Find the Error (built into the build). Suppose an engineer, eager to report the certificate by hand, writes: "The cut must contain a middle edge for every unplaced-student route — e.g. $s6\to SecB$ — plus the saturated sink edges, so the cut capacity is more than 5." Where does this go wrong, and why must you trust the mechanically computed $S$?

Answer

The edge $s6\to SecB$ does not cross the cut: both s6 and SecB are in $S$, so it is an internal edge and contributes 0 (only $S\to T$ edges count, §34.4). Hand-assembling cut edges from intuition over-counts exactly this way. The disciplined method is to let the code compute $S$ (the final BFS) and then sum capacities only over edges with tail in $S$ and head outside $S$ — which gives the three sink edges, total 5. The theorem guarantees the residual-reachable cut equals the flow; when a hand-count disagrees, the hand-count is wrong, not the theorem.

This is the payoff of building the certificate into the engine: the moment a hand-count disagrees with the flow value, you know the hand-count is wrong, because max-flow min-cut guarantees the residual-reachable cut equals the flow. The engine's output (value 5 + placement + the set $S$) is self-checking.

Discussion Questions

Phase 2 changed exactly one thing from plain matching — the sink-edge capacities. What would change if instead a student could enroll in up to two sections? Which capacities move, and is it still a clean max-flow problem?
In Phase 3 we read the assignment off reverse residuals. Write the one-sentence invariant that justifies "middle edge used $\iff$ reverse residual is 1," and tie it to the residual definition in §34.3.
The engine leaves one student unplaced (s6 in our trace), and the min cut turned out to be the sink cut (all sections full). Explain why, in this term, there is no Hall-style deficient subset of students to blame — the shortage is global. What single change (and where) would raise the placement to 6, and how does the sink-cut location tell you that adding a seat to any section helps?
Why does emitting the min cut make the engine auditable in a way that returning only the placement does not? What could a skeptic verify in seconds with the cut that they could not verify from the placement alone?
The integrality property (§34.4) is doing silent work in this whole build. Point to the exact place the engine would break if flows could be fractional, and explain why integer capacities prevent it.

Your Turn: Extensions

Option A (priorities). Raising a student's source edge capacity is the wrong way to model priority (a student still wants only one seat). Instead, add a tie-break: run the engine, and if a specific student must be placed, pre-assign them (delete their competitors' edges to the chosen section) and re-solve. Build a wrapper place_first(cap, student, section) and test which other student gets bumped.
Option B (waitlist sizing). Extend assignment_report to also return the list of unplaced students (students whose source edge still has residual at the max flow). Hand-derive that it returns ['s6'] for this term, then add 1 seat to SecB (set ("C","SecB")→t to capacity 2) and re-derive the new max flow and waitlist.
Option C (add a seat, watch the cut move). The min cut here is the sink cut (capacity 5). Add one seat to SecC (sink edge capacity 3) and re-run assignment_report. Predict by hand whether the flow rises to 6, and whether the min cut is still the sink cut or has moved to an internal cut — then verify with cut_capacity.
Option D (generalize the reduction). Refactor build_enrollment_network into a general build_capacitated_matching(left, edges, right_caps, left_caps=None) that allows multi-unit capacities on both sides, and re-solve this term as a special case to confirm you still get 5.

Key Takeaways

One change unlocks capacitated assignment. Setting a right-side sink edge to capacity $k$ models "k seats"; the rest of the §34.5 reduction is unchanged, and max flow solves it directly.
Build the read-out, not just the count. The placement lives in the reverse residuals of the middle edges; recovering it (Phase 3) turns a number into an actual schedule.
Make the solver certify itself. Emitting the min cut (the residual-reachable set $S$) lets anyone verify optimality without re-running — the answer carries its own proof (§34.4).
Trust the mechanical cut over your intuition. When a hand-counted cut disagrees with the flow value, the hand-count is wrong: max-flow min-cut guarantees the residual-reachable cut equals the flow. Compute $S$ from the residual graph and verify its capacity with code.