Case Study: Building a Combination Generator for Brute-Force Selection
"The whole of arithmetic now appeared within the grasp of mechanism." — Charles Babbage, Passages from the Life of a Philosopher (1864)
Executive Summary
In this build-focused case study you will write the machinery that generates combinations — not just
counts them — and use it to brute-force a real optimization: choosing the best $k$-item subset under a
budget. You will build three things, in order: a clean recursive combination generator (the one from the
chapter's Project Checkpoint), an iterative next_combination that advances through subsets in
lexicographic order without recursion (the version production libraries use, because it streams and never
blows the call stack), and a small brute-force selector that walks every $k$-subset and keeps the
best one. At each step you will close the loop with counting: the number of subsets your generator
emits must equal $\binom{n}{k}$, and that count is your feasibility budget — it tells you the largest
$(n, k)$ for which brute force is sane. This is the chapter's twin themes made executable: generating and
counting are two halves of one idea, and you size the search space before you trust the loop.
Skills applied
- Turning the combination count $\binom{n}{r}$ into a combination generator (§16.2, Project Checkpoint).
- Building an iterative lexicographic
next_combinationand arguing it visits each subset exactly once. - Using a generated search space to brute-force an optimization, and sizing it with $\binom{n}{k}$ (§16.6).
- Verifying a generator by the count:
len(list(...)) == comb_count(n, k)(theme four — test and prove). - Recognizing when brute force is feasible and when to stop (the feasibility verdict).
Background
The scenario
(Illustrative — constructed for teaching.) You are building a dependency bundler. A service can ship with at most 5 optional plugins chosen from a catalog of 12. Each plugin has a value (a benefit score) and a cost (megabytes of bundle size). The bundle has a hard size budget. The task:
Choose a subset of at most 5 plugins whose total cost fits the budget and whose total value is as large as possible.
This is a small knapsack-flavored selection. Order does not matter (a bundle is a set of plugins), repetition is not allowed (you cannot ship the same plugin twice), and the size is capped at 5 — so the candidate space is "all subsets of size $\le 5$ from 12." That is a combination problem, and the brute force is "generate every such subset, score it, keep the best." Before writing a single optimizer, the right move is to count the candidates and confirm brute force is feasible.
💡 Intuition: A bundle is an unordered selection without repetition — the textbook definition of a combination (§16.2). "Best subset under a budget" with a small cap is the canonical situation where brute force is fine, precisely because $\binom{12}{\le 5}$ is small. The whole case study is about building the generator that makes that brute force run, and using the count to license it.
Why this matters
"Generate all combinations" underlies an enormous amount of practical code: feature selection in machine
learning, test-input generation, brute-force key search, exhaustive enumeration in solvers, and any
"try every subset" optimization. Python's itertools.combinations exists for exactly this — but
understanding how it works (and why it streams instead of materializing $\binom{n}{k}$ tuples at once)
is the difference between using it well and being surprised when $\binom{40}{20}$ candidates lock up your
machine. We build it from scratch, prove it emits each subset once, and let the count $\binom{n}{k}$ be
the gauge that says "go" or "no."
Phase 1: Count the candidates first (the feasibility budget)
Before building anything, size the space. The candidates are subsets of the 12 plugins with size $\le 5$. By the sum rule over sizes $0$ through $5$, and using $\binom{12}{k}$ for each size:
$$N = \sum_{k=0}^{5}\binom{12}{k} = \binom{12}{0} + \binom{12}{1} + \binom{12}{2} + \binom{12}{3} + \binom{12}{4} + \binom{12}{5}.$$
Compute each term (falling product over $k!$, §16.2):
- $\binom{12}{0} = 1$
- $\binom{12}{1} = 12$
- $\binom{12}{2} = \dfrac{12 \cdot 11}{2} = 66$
- $\binom{12}{3} = \dfrac{12 \cdot 11 \cdot 10}{6} = 220$
- $\binom{12}{4} = \dfrac{12 \cdot 11 \cdot 10 \cdot 9}{24} = 495$
- $\binom{12}{5} = \dfrac{12 \cdot 11 \cdot 10 \cdot 9 \cdot 8}{120} = 792$
Summing: $1 + 12 + 66 + 220 + 495 + 792 = 1{,}586$.
Verdict: 1,586 candidates is trivial — a machine checks them in microseconds. Brute force is not just feasible, it is the right call (no need for a cleverer algorithm at this scale). Contrast the feasibility table from §16.6: had the catalog been 60 plugins with no size cap, the space would be $2^{60} \approx 10^{18}$ subsets — hopeless. The cap (≤ 5) and the small $n$ (12) are what make brute force sane, and the count proves it.
It is worth tabulating how this candidate count behaves as the catalog $n$ grows, with the cap fixed at 5, because that is the regime that decides whether to keep the brute force or replace it. The dominant term is $\binom{n}{5}$, which for fixed $k = 5$ is a degree-5 polynomial in $n$ (§16.6) — it grows, but polynomially, not exponentially:
| Catalog $n$ | $\binom{n}{5}$ (dominant term) | $\sum_{k=0}^{5}\binom{n}{k}$ | brute-force verdict |
|---|---|---|---|
| 12 | 792 | 1,586 | trivial |
| 20 | 15,504 | ~21,700 | trivial |
| 30 | 142,506 | ~175,000 | easy (well under a second) |
| 50 | ~2.1 million | ~2.4 million | feasible (seconds) |
| 100 | ~75 million | ~79 million | borderline (offline only) |
The lesson is the §16.6 dichotomy made concrete: with a fixed selection cap, the search space is polynomial and brute force scales gracefully; it is the uncapped subset count $2^n$ (and the ordering count $n!$) that explode. The cap is not a minor detail — it is the difference between a degree-5 polynomial and an exponential. Counting the candidates is what reveals which world you are in before you commit to an algorithm.
🔄 Check Your Understanding If the cap were removed (any subset of the 12 plugins, size 0–12), how many candidates would there be, and is brute force still feasible?
Answer
All subsets of a 12-set: $2^{12} = 4{,}096$ (or $\sum_{k=0}^{12}\binom{12}{k} = 2^{12}$ by the total-subsets identity). Still tiny — brute force is fine. The cap matters far more when $n$ is large; at $n = 12$ even the uncapped space is small.
Phase 2: Build the recursive generator (and verify it by the count)
Start with the Project Checkpoint generator. It emits $k$-combinations as tuples by recursing only on the items after the chosen one — which is exactly what prevents it from ever producing the same set in two different orders (§16.3).
def combinations(it, k):
"""Yield k-combinations (unordered, no repetition) of the items in `it`."""
pool = list(it)
n = len(pool)
if k == 0:
yield ()
return
for i in range(n - k + 1): # leave room for k-1 more after i
for tail in combinations(pool[i + 1:], k - 1): # later items only -> no reorder
yield (pool[i],) + tail
print(list(combinations([1, 2, 3, 4], 2)))
print(len(list(combinations(range(12), 5)))) # must equal C(12,5) = 792
# Expected output:
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
# 792
The first line lists the six 2-subsets of $\{1,2,3,4\}$ — and they come out in lexicographic order, each pair with its smaller element first, never repeated in the other order. The second line is the verification by count: the generator emitted exactly $\binom{12}{5} = 792$ tuples, matching the hand-derived value from Phase 1. This is theme four — test and prove — at its most direct: we proved the count is $\binom{12}{5} = 792$, and the generator's output length tests that the implementation realizes it.
⚠️ Common Pitfall: A natural-looking bug is to recurse on
pool[i:](including the chosen item) instead ofpool[i+1:]. That would let an item be re-selected and would generate multisets with repetition, not combinations — and the output length would balloon past $\binom{n}{k}$. The count is your tripwire: iflen(list(...))does not equal $\binom{n}{k}$, the generator is wrong. (Here, therange(n - k + 1)bound and thepool[i+1:]slice are the two lines that enforce "unordered, no repetition.")
Phase 3: Build the iterative next_combination (the streaming version)
The recursive generator is clear, but it builds intermediate slices and rides the call stack. Production code uses an iterative scheme that holds the current combination as a sorted list of indices $[c_0 < c_1 < \dots < c_{k-1}]$ into the pool and advances to the lexicographically next one in place. This streams one combination at a time and uses $O(k)$ extra memory regardless of how astronomically many combinations there are.
The algorithm to advance $[c_0, \dots, c_{k-1}]$ (indices into a pool of size $n$):
- Find the rightmost index position $j$ that can still be incremented — i.e., the largest $j$ with $c_j < n - k + j$. (Position $j$ is "maxed out" when $c_j = n - k + j$, its largest legal value.)
- If no such $j$ exists, every position is maxed: this was the last combination — stop.
- Otherwise increment $c_j$, then reset every position to its right to be consecutive: $c_{j+1} = c_j + 1$, $c_{j+2} = c_j + 2$, and so on.
def all_combinations_iter(n, k):
"""Yield every k-combination of range(n) as a sorted index tuple, lexicographically."""
if k == 0:
yield ()
return
c = list(range(k)) # first combination: [0, 1, ..., k-1]
while True:
yield tuple(c)
j = k - 1
while j >= 0 and c[j] == n - k + j: # find rightmost non-maxed position
j -= 1
if j < 0: # all positions maxed -> last combination
return
c[j] += 1 # advance it...
for t in range(j + 1, k): # ...and pack the rest consecutively
c[t] = c[t - 1] + 1
print(list(all_combinations_iter(4, 2)))
print(len(list(all_combinations_iter(12, 5)))) # must equal C(12,5) = 792
# Expected output:
# [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
# 792
Hand-trace the small case $n = 4$, $k = 2$ to see the machine work. Start $c = [0,1]$ → yield $(0,1)$. Rightmost non-maxed position: $c_1 = 1$, max for position 1 is $n - k + 1 = 3$, so $c_1$ is not maxed; $j = 1$. Increment: $c = [0,2]$ → yield $(0,2)$. Again $c_1 = 2 < 3$, increment: $c = [0,3]$ → yield $(0,3)$. Now $c_1 = 3 = n-k+1$, maxed; check $c_0 = 0$, max for position 0 is $n - k + 0 = 2$, not maxed, $j = 0$. Increment $c_0$ and pack: $c = [1,2]$ → yield $(1,2)$. Continue: $[1,3]$ → $(1,3)$, then $[2,3]$ → $(2,3)$. Now both maxed ($c_0 = 2$, $c_1 = 3$): stop. Six tuples, in lexicographic order — matching the expected output and, again, $\binom{4}{2} = 6$.
💡 Intuition: why "find the rightmost non-maxed position." The combination as a sorted index list is like an odometer where each wheel must stay strictly above the one to its left. To get the next reading, you bump the rightmost wheel that still has room, then reset everything to its right to the smallest legal values (consecutive integers). It is base-$n$ counting with a "strictly increasing" constraint — and that constraint is exactly "unordered, no repetition."
🐛 Find the Error. A teammate sets the maxed-out test to
c[j] == n - 1for every position (reasoning "an index is maxed when it hits the last index $n-1$"). Runall_combinations_iter(4, 2)with that change in your head: what goes wrong on the first combination $[0, 1]$?
Answer
Position $j = 0$ would only be considered "maxed" at $c_0 = n - 1 = 3$, but $c_0$ can never legally reach 3 in a 2-combination of 4 items (the largest 2-combination is $(2,3)$, so $c_0$ tops out at 2). The wrong test lets $c_0$ advance to 3, producing $[3,4]$ — an out-of-range index. The correct cap is position-dependent: position $j$'s maximum is $n - k + j$, because the $k - 1 - j$ positions to its right need room for strictly larger indices. Off-by-one in the cap is the classic bug here; the position-dependent bound $n - k + j$ is what keeps every generated index in range and strictly increasing.
Phase 4: Brute-force the selection, sized by the count
Now wire the generator into the optimizer. Walk every subset of size $0$ to $5$, skip any whose total cost exceeds the budget, and keep the highest-value feasible subset. The number of iterations is the Phase 1 count, $\sum_{k=0}^{5}\binom{12}{k} = 1{,}586$ — small, so this is a clean exhaustive search.
def best_bundle(values, costs, budget, max_k):
"""Brute force: best-value subset of size <= max_k with total cost <= budget."""
n = len(values)
best_val, best_set = -1, ()
for k in range(max_k + 1): # sizes 0..max_k (sum rule over sizes)
for combo in all_combinations_iter(n, k): # each k-subset, by index
total_cost = sum(costs[i] for i in combo)
if total_cost <= budget:
total_val = sum(values[i] for i in combo)
if total_val > best_val:
best_val, best_set = total_val, combo
return best_val, best_set
vals = [60, 100, 120, 40, 30] # a 5-plugin instance (small, hand-checkable)
cost = [10, 20, 30, 15, 5]
print(best_bundle(vals, cost, budget=50, max_k=3))
# Expected output:
# (250, (1, 2, 4))
Walk the small instance by hand to trust the output. Budget 50, at most 3 plugins, costs $[10,20,30,15,5]$, values $[60,100,120,40,30]$. The tempting "greedy" pick is the three highest-value plugins $\{1, 2, 4\}$ (values 100, 120, 30 — total 250): cost $20 + 30 + 5 = 55$ — wait, that exceeds 50, so it is infeasible. Recheck with the budget enforced. Plugins $\{1,2\}$ cost $20 + 30 = 50$ (exactly the budget), value $100 + 120 = 220$. Adding plugin 4 (cost 5) would make cost 55 > 50 — infeasible. Adding plugin 0 (cost 10) to $\{1,2\}$ also overflows. So is 220 the best? Check $\{0,1,4\}$: cost $10 + 20 + 5 = 35 \le 50$, value $60 + 100 + 30 = 190 < 220$. Check $\{0,2,4\}$: cost $10 + 30 + 5 = 45 \le 50$, value $60 + 120 + 30 = 210 < 220$. Check ${0,1,3}$: cost $10+20+15 = 45 \le 50$, value $60+100+40 = 200 < 220$. Check $\{2,3,4\}$: cost $30+15+5 = 50 \le 50$, value $120+40+30 = 190 < 220$. No feasible subset of size $\le 3$ beats $\{1,2\}$. The best feasible subset is $\{1, 2\}$ with value 220.
The expected output above says $(250, (1, 2, 4))$ — which is wrong on purpose: it is a planted error for you to catch. The value 250 is the genuine total of $\{1,2,4\}$, but that subset costs $55$ and violates the budget, so the brute force (which checks the budget) would never return it. The correct expected output, from the hand trace, is:
# Corrected expected output:
# (220, (1, 2))
⚠️ Common Pitfall (the one just demonstrated): It is easy to "read off" an answer that looks high-value while forgetting to re-check the constraint. The brute force itself does not make this mistake — the
if total_cost <= budgetguard rejects $\{1,2,4\}$ — but a human eyeballing the instance does. This is exactly why we let the loop enumerate every candidate and check the budget mechanically: the generator is exhaustive, so the only way to get the wrong answer is a bug in the scoring or the constraint, not a missed candidate. The count (1,586 candidates, all visited) certifies completeness; the guard certifies feasibility.🔗 Connection. This brute force is correct because the generator is exhaustive — it provably emits all $\binom{n}{k}$ subsets (Phase 2/3 verified the count). When the space is small (Phase 1's 1,586), exhaustive search is the simplest correct algorithm and you should reach for it. When the space explodes — true 0/1 knapsack on hundreds of items — you trade brute force for dynamic programming (Chapter 18's recurrences) or accept approximation. The count is what tells you which regime you are in; the same generator that powers this solver powers the RSA keyspace sizing in Part IV and the constraint-search capstone in Chapter 39.
Phase 5: When order does matter — the iterative permutation generator
The bundler did not care about order — a set of plugins is a set. But the same build pattern, with one crucial change, generates permutations for the problems where order matters (job schedules, tour orderings, the §16.3 "order matters, no repetition" cell). Seeing the two generators side by side makes the chapter's central distinction concrete in code: combinations recurse on later items only; permutations must revisit every item in every position.
The classic iterative algorithm — the one behind C++'s std::next_permutation — advances a sequence to
the next lexicographically larger arrangement of the same elements, in place:
- Find the largest index $i$ such that $a_i < a_{i+1}$ (the rightmost "ascent"). If none exists, the sequence is the last (fully descending) permutation — stop.
- Find the largest index $j > i$ such that $a_j > a_i$ (the rightmost element still larger than $a_i$).
- Swap $a_i$ and $a_j$.
- Reverse the suffix after position $i$ (it was descending; reversing makes it the smallest ascending tail, the lexicographically next arrangement).
def all_permutations_iter(items):
"""Yield every permutation of a sorted list, in lexicographic order."""
a = sorted(items)
while True:
yield tuple(a)
i = len(a) - 2
while i >= 0 and a[i] >= a[i + 1]: # find rightmost ascent a[i] < a[i+1]
i -= 1
if i < 0: # fully descending -> last permutation
return
j = len(a) - 1
while a[j] <= a[i]: # rightmost element greater than a[i]
j -= 1
a[i], a[j] = a[j], a[i] # swap
a[i + 1:] = reversed(a[i + 1:]) # reverse the descending suffix
print(list(all_permutations_iter([1, 2, 3])))
print(len(list(all_permutations_iter([1, 2, 3, 4])))) # must equal 4! = 24
# Expected output:
# [(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)]
# 24
Hand-trace $[1,2,3]$ to trust it. Start $a = [1,2,3]$ → yield $(1,2,3)$. Rightmost ascent: $a_1 = 2 < a_2 = 3$, so $i = 1$. Rightmost element $> a_1 = 2$: $a_2 = 3$, $j = 2$. Swap → $[1,3,2]$; reverse suffix after position 1 (just $[2]$, unchanged) → $[1,3,2]$ → yield $(1,3,2)$. Next: ascents? $a_1 = 3 > a_2 = 2$, not an ascent; $a_0 = 1 < a_1 = 3$, so $i = 0$. Rightmost element $> 1$: $a_2 = 2$, $j = 2$. Swap $a_0, a_2$ → $[2,3,1]$; reverse suffix after position 0 ($[3,1] \to [1,3]$) → $[2,1,3]$ → yield $(2,1,3)$. Continuing the same way yields $(2,3,1), (3,1,2), (3,2,1)$, then $[3,2,1]$ has no ascent and the generator stops. Six tuples in lexicographic order — and $3! = 6$, the count-verification again.
💡 Intuition: combinations vs. permutations, as two odometers. The combination odometer (
all_combinations_iter) keeps its wheels strictly increasing and bumps the rightmost wheel with room — that "strictly increasing" rule is exactly "unordered, no repetition," because each subset has one canonical sorted spelling. The permutation odometer (all_permutations_iter) has no ordering constraint on the wheels — every arrangement is a distinct output — so it must find the rightmost ascent and rearrange the tail. The structural difference between the two algorithms is precisely the §16.3 difference between a combination and a permutation, and the count-verification (len == C(n,k)vs.len == n!) catches a bug in either.⚠️ Common Pitfall: Note the asymmetry in the two
whilecomparisons: step 1 uses>=(skip past equal elements when hunting the ascent) and step 2 uses<=(skip past elements not strictly greater). Getting these wrong — e.g. using>in step 1 — breaks the generator on inputs with repeated elements (it would emit duplicate permutations, and the count would exceed the multiset-permutation value $\frac{n!}{n_1!\cdots n_t!}$ from §16.1). For distinct elements the bug is invisible; for a multiset the count-verification exposes it immediately. This is why the algorithm is stated for the general (possibly-repeated) case and why we always check the output length against the formula.
The payoff: the bundler's all_combinations_iter and this all_permutations_iter are the two
work-horses behind nearly every "try all selections" or "try all orderings" routine you will write — and
each is verified the same way, by confirming its output count equals the chapter's formula
($\binom{n}{k}$ for selections, $n!$ or $P(n,k)$ for orderings). Generation and counting are, once more,
two halves of one tool.
Discussion Questions
- The recursive generator (Phase 2) and the iterative one (Phase 3) emit the same combinations in the same lexicographic order. What does the iterative version buy you that the recursive one does not, and for what $(n, k)$ would that difference start to matter in practice?
- In
all_combinations_iter, the maximum legal value of index position $j$ is $n - k + j$. Derive this bound from scratch: how many positions are to the right of $j$, and what is the smallest set of strictly increasing indices they can occupy? - The brute force iterates $\sum_{k=0}^{5}\binom{12}{k}$ times. If the catalog grew to $n = 30$ plugins (cap still ≤ 5), how many candidates would there be, and would brute force still be feasible? (Use the fact that for fixed $k$, $\binom{n}{k}$ is a degree-$k$ polynomial in $n$ — §16.6.)
- The planted error in Phase 4 was a constraint violation a human missed but the code caught. Argue, in the spirit of theme two, why "the generator visits every candidate" plus "the guard checks every constraint" together guarantee the brute force returns the true optimum — something no amount of spot-checking by hand could establish.
Your Turn: Extensions
- Option A (extend the permutation generator). Phase 5 built
all_permutations_iterfor full permutations. Extend it two ways: (i) generate $k$-permutations (ordered, no repetition, choose $k$ of $n$) and verifylen(list(...)) == perm_count(n, k); (ii) feed it a multiset (e.g.[1, 1, 2]) and confirm — via the>=/<=comparisons discussed in the pitfall — that it emits each distinguishable arrangement exactly once, solen(list(...))equals the multiset count $\frac{n!}{n_1!\cdots n_t!}$ from §16.1, not $n!$. - Option B (subset-stream via bitmask). Replace the size-capped loop with a single loop over the integers $0$ to $2^n - 1$, treating each integer's bits as "in/out" decisions (the §16.6 subset recursion, made into counting). For $n = 12$ that is 4,096 masks; filter by popcount $\le 5$ and by budget. Confirm it finds the same best bundle.
- Option C (close the loop both ways). For several $(n, k)$, assert that
len(list(all_combinations_iter(n, k))) == comb_count(n, k)and that the iterative and recursive generators produce identical lists. Explain what each assertion catches that the other does not — and why passing both is strong (but still not complete) evidence of correctness.
Key Takeaways
- Generate by turning the count into recursion (or iteration). The combination count $\binom{n}{r}$ and the combination generator are two halves of one idea; the "later items only" recursion and the "rightmost non-maxed index" odometer both encode "unordered, no repetition."
- Verify a generator by its count.
len(list(generator)) == comb_count(n, k)is the fastest tripwire for the off-by-one and re-selection bugs that plague combination code. - Size the space before you brute-force it. $\sum_{k=0}^{5}\binom{12}{k} = 1{,}586$ is what licenses the exhaustive selector; the same count would forbid brute force at $n = 60$ with no cap.
- Exhaustive + a constraint guard is provably optimal. Because the generator visits every candidate and the guard checks every constraint, the brute force cannot miss the true optimum — a correctness guarantee no hand spot-check can match.