Case Study: Building a Combination Generator for Brute-Force Selection

DataField.Dev

Case Study: Building a Combination Generator for Brute-Force Selection

"The whole of arithmetic now appeared within the grasp of mechanism." — Charles Babbage, Passages from the Life of a Philosopher (1864)

Executive Summary

In this build-focused case study you will write the machinery that generates combinations — not just counts them — and use it to brute-force a real optimization: choosing the best $k$-item subset under a budget. You will build three things, in order: a clean recursive combination generator (the one from the chapter's Project Checkpoint), an iterative next_combination that advances through subsets in lexicographic order without recursion (the version production libraries use, because it streams and never blows the call stack), and a small brute-force selector that walks every $k$-subset and keeps the best one. At each step you will close the loop with counting: the number of subsets your generator emits must equal $\binom{n}{k}$, and that count is your feasibility budget — it tells you the largest $(n, k)$ for which brute force is sane. This is the chapter's twin themes made executable: generating and counting are two halves of one idea, and you size the search space before you trust the loop.

Skills applied

Turning the combination count $\binom{n}{r}$ into a combination generator (§16.2, Project Checkpoint).
Building an iterative lexicographic next_combination and arguing it visits each subset exactly once.
Using a generated search space to brute-force an optimization, and sizing it with $\binom{n}{k}$ (§16.6).
Verifying a generator by the count: len(list(...)) == comb_count(n, k) (theme four — test and prove).
Recognizing when brute force is feasible and when to stop (the feasibility verdict).

Background

The scenario

(Illustrative — constructed for teaching.) You are building a dependency bundler. A service can ship with at most 5 optional plugins chosen from a catalog of 12. Each plugin has a value (a benefit score) and a cost (megabytes of bundle size). The bundle has a hard size budget. The task:

Choose a subset of at most 5 plugins whose total cost fits the budget and whose total value is as large as possible.

This is a small knapsack-flavored selection. Order does not matter (a bundle is a set of plugins), repetition is not allowed (you cannot ship the same plugin twice), and the size is capped at 5 — so the candidate space is "all subsets of size $\le 5$ from 12." That is a combination problem, and the brute force is "generate every such subset, score it, keep the best." Before writing a single optimizer, the right move is to count the candidates and confirm brute force is feasible.

💡 Intuition: A bundle is an unordered selection without repetition — the textbook definition of a combination (§16.2). "Best subset under a budget" with a small cap is the canonical situation where brute force is fine, precisely because $\binom{12}{\le 5}$ is small. The whole case study is about building the generator that makes that brute force run, and using the count to license it.

Why this matters

"Generate all combinations" underlies an enormous amount of practical code: feature selection in machine learning, test-input generation, brute-force key search, exhaustive enumeration in solvers, and any "try every subset" optimization. Python's itertools.combinations exists for exactly this — but understanding how it works (and why it streams instead of materializing $\binom{n}{k}$ tuples at once) is the difference between using it well and being surprised when $\binom{40}{20}$ candidates lock up your machine. We build it from scratch, prove it emits each subset once, and let the count $\binom{n}{k}$ be the gauge that says "go" or "no."

Phase 1: Count the candidates first (the feasibility budget)

Before building anything, size the space. The candidates are subsets of the 12 plugins with size $\le 5$. By the sum rule over sizes $0$ through $5$, and using $\binom{12}{k}$ for each size:

$$N = \sum_{k=0}^{5}\binom{12}{k} = \binom{12}{0} + \binom{12}{1} + \binom{12}{2} + \binom{12}{3} + \binom{12}{4} + \binom{12}{5}.$$

Compute each term (falling product over $k!$, §16.2):

$\binom{12}{0} = 1$
$\binom{12}{1} = 12$
$\binom{12}{2} = \dfrac{12 \cdot 11}{2} = 66$
$\binom{12}{3} = \dfrac{12 \cdot 11 \cdot 10}{6} = 220$
$\binom{12}{4} = \dfrac{12 \cdot 11 \cdot 10 \cdot 9}{24} = 495$
$\binom{12}{5} = \dfrac{12 \cdot 11 \cdot 10 \cdot 9 \cdot 8}{120} = 792$

Summing: $1 + 12 + 66 + 220 + 495 + 792 = 1{,}586$.

Verdict: 1,586 candidates is trivial — a machine checks them in microseconds. Brute force is not just feasible, it is the right call (no need for a cleverer algorithm at this scale). Contrast the feasibility table from §16.6: had the catalog been 60 plugins with no size cap, the space would be $2^{60} \approx 10^{18}$ subsets — hopeless. The cap (≤ 5) and the small $n$ (12) are what make brute force sane, and the count proves it.

It is worth tabulating how this candidate count behaves as the catalog $n$ grows, with the cap fixed at 5, because that is the regime that decides whether to keep the brute force or replace it. The dominant term is $\binom{n}{5}$, which for fixed $k = 5$ is a degree-5 polynomial in $n$ (§16.6) — it grows, but polynomially, not exponentially:

Catalog $n$	$\binom{n}{5}$ (dominant term)	$\sum_{k=0}^{5}\binom{n}{k}$	brute-force verdict
12	792	1,586	trivial
20	15,504	~21,700	trivial
30	142,506	~175,000	easy (well under a second)
50	~2.1 million	~2.4 million	feasible (seconds)
100	~75 million	~79 million	borderline (offline only)

The lesson is the §16.6 dichotomy made concrete: with a fixed selection cap, the search space is polynomial and brute force scales gracefully; it is the uncapped subset count $2^n$ (and the ordering count $n!$) that explode. The cap is not a minor detail — it is the difference between a degree-5 polynomial and an exponential. Counting the candidates is what reveals which world you are in before you commit to an algorithm.

🔄 Check Your Understanding If the cap were removed (any subset of the 12 plugins, size 0–12), how many candidates would there be, and is brute force still feasible?

Answer

All subsets of a 12-set: $2^{12} = 4{,}096$ (or $\sum_{k=0}^{12}\binom{12}{k} = 2^{12}$ by the total-subsets identity). Still tiny — brute force is fine. The cap matters far more when $n$ is large; at $n = 12$ even the uncapped space is small.

Phase 2: Build the recursive generator (and verify it by the count)

Start with the Project Checkpoint generator. It emits $k$-combinations as tuples by recursing only on the items after the chosen one — which is exactly what prevents it from ever producing the same set in two different orders (§16.3).

def combinations(it, k):
    """Yield k-combinations (unordered, no repetition) of the items in `it`."""
    pool = list(it)
    n = len(pool)
    if k == 0:
        yield ()
        return
    for i in range(n - k + 1):                       # leave room for k-1 more after i
        for tail in combinations(pool[i + 1:], k - 1):   # later items only -> no reorder
            yield (pool[i],) + tail

print(list(combinations([1, 2, 3, 4], 2)))
print(len(list(combinations(range(12), 5))))     # must equal C(12,5) = 792
# Expected output:
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
# 792

The first line lists the six 2-subsets of $\{1,2,3,4\}$ — and they come out in lexicographic order, each pair with its smaller element first, never repeated in the other order. The second line is the verification by count: the generator emitted exactly $\binom{12}{5} = 792$ tuples, matching the hand-derived value from Phase 1. This is theme four — test and prove — at its most direct: we proved the count is $\binom{12}{5} = 792$, and the generator's output length tests that the implementation realizes it.

⚠️ Common Pitfall: A natural-looking bug is to recurse on pool[i:] (including the chosen item) instead of pool[i+1:]. That would let an item be re-selected and would generate multisets with repetition, not combinations — and the output length would balloon past $\binom{n}{k}$. The count is your tripwire: if len(list(...)) does not equal $\binom{n}{k}$, the generator is wrong. (Here, the range(n - k + 1) bound and the pool[i+1:] slice are the two lines that enforce "unordered, no repetition.")

Phase 3: Build the iterative `next_combination` (the streaming version)

The recursive generator is clear, but it builds intermediate slices and rides the call stack. Production code uses an iterative scheme that holds the current combination as a sorted list of indices $[c_0 < c_1 < \dots < c_{k-1}]$ into the pool and advances to the lexicographically next one in place. This streams one combination at a time and uses $O(k)$ extra memory regardless of how astronomically many combinations there are.

The algorithm to advance $[c_0, \dots, c_{k-1}]$ (indices into a pool of size $n$):

Find the rightmost index position $j$ that can still be incremented — i.e., the largest $j$ with $c_j < n - k + j$. (Position $j$ is "maxed out" when $c_j = n - k + j$, its largest legal value.)
If no such $j$ exists, every position is maxed: this was the last combination — stop.
Otherwise increment $c_j$, then reset every position to its right to be consecutive: $c_{j+1} = c_j + 1$, $c_{j+2} = c_j + 2$, and so on.

def all_combinations_iter(n, k):
    """Yield every k-combination of range(n) as a sorted index tuple, lexicographically."""
    if k == 0:
        yield ()
        return
    c = list(range(k))                  # first combination: [0, 1, ..., k-1]
    while True:
        yield tuple(c)
        j = k - 1
        while j >= 0 and c[j] == n - k + j:   # find rightmost non-maxed position
            j -= 1
        if j < 0:                       # all positions maxed -> last combination
            return
        c[j] += 1                       # advance it...
        for t in range(j + 1, k):       # ...and pack the rest consecutively
            c[t] = c[t - 1] + 1

print(list(all_combinations_iter(4, 2)))
print(len(list(all_combinations_iter(12, 5))))    # must equal C(12,5) = 792
# Expected output:
# [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
# 792

Hand-trace the small case $n = 4$, $k = 2$ to see the machine work. Start $c = [0,1]$ → yield $(0,1)$. Rightmost non-maxed position: $c_1 = 1$, max for position 1 is $n - k + 1 = 3$, so $c_1$ is not maxed; $j = 1$. Increment: $c = [0,2]$ → yield $(0,2)$. Again $c_1 = 2 < 3$, increment: $c = [0,3]$ → yield $(0,3)$. Now $c_1 = 3 = n-k+1$, maxed; check $c_0 = 0$, max for position 0 is $n - k + 0 = 2$, not maxed, $j = 0$. Increment $c_0$ and pack: $c = [1,2]$ → yield $(1,2)$. Continue: $[1,3]$ → $(1,3)$, then $[2,3]$ → $(2,3)$. Now both maxed ($c_0 = 2$, $c_1 = 3$): stop. Six tuples, in lexicographic order — matching the expected output and, again, $\binom{4}{2} = 6$.

💡 Intuition: why "find the rightmost non-maxed position." The combination as a sorted index list is like an odometer where each wheel must stay strictly above the one to its left. To get the next reading, you bump the rightmost wheel that still has room, then reset everything to its right to the smallest legal values (consecutive integers). It is base-$n$ counting with a "strictly increasing" constraint — and that constraint is exactly "unordered, no repetition."

🐛 Find the Error. A teammate sets the maxed-out test to c[j] == n - 1 for every position (reasoning "an index is maxed when it hits the last index $n-1$"). Run all_combinations_iter(4, 2) with that change in your head: what goes wrong on the first combination $[0, 1]$?

Answer

Position $j = 0$ would only be considered "maxed" at $c_0 = n - 1 = 3$, but $c_0$ can never legally reach 3 in a 2-combination of 4 items (the largest 2-combination is $(2,3)$, so $c_0$ tops out at 2). The wrong test lets $c_0$ advance to 3, producing $[3,4]$ — an out-of-range index. The correct cap is position-dependent: position $j$'s maximum is $n - k + j$, because the $k - 1 - j$ positions to its right need room for strictly larger indices. Off-by-one in the cap is the classic bug here; the position-dependent bound $n - k + j$ is what keeps every generated index in range and strictly increasing.

Phase 4: Brute-force the selection, sized by the count

Now wire the generator into the optimizer. Walk every subset of size $0$ to $5$, skip any whose total cost exceeds the budget, and keep the highest-value feasible subset. The number of iterations is the Phase 1 count, $\sum_{k=0}^{5}\binom{12}{k} = 1{,}586$ — small, so this is a clean exhaustive search.

def best_bundle(values, costs, budget, max_k):
    """Brute force: best-value subset of size <= max_k with total cost <= budget."""
    n = len(values)
    best_val, best_set = -1, ()
    for k in range(max_k + 1):                       # sizes 0..max_k (sum rule over sizes)
        for combo in all_combinations_iter(n, k):    # each k-subset, by index
            total_cost = sum(costs[i] for i in combo)
            if total_cost <= budget:
                total_val = sum(values[i] for i in combo)
                if total_val > best_val:
                    best_val, best_set = total_val, combo
    return best_val, best_set

vals = [60, 100, 120, 40, 30]      # a 5-plugin instance (small, hand-checkable)
cost = [10,  20,  30, 15, 5]
print(best_bundle(vals, cost, budget=50, max_k=3))
# Expected output:
# (250, (1, 2, 4))

Walk the small instance by hand to trust the output. Budget 50, at most 3 plugins, costs $[10,20,30,15,5]$, values $[60,100,120,40,30]$. The tempting "greedy" pick is the three highest-value plugins $\{1, 2, 4\}$ (values 100, 120, 30 — total 250): cost $20 + 30 + 5 = 55$ — wait, that exceeds 50, so it is infeasible. Recheck with the budget enforced. Plugins $\{1,2\}$ cost $20 + 30 = 50$ (exactly the budget), value $100 + 120 = 220$. Adding plugin 4 (cost 5) would make cost 55 > 50 — infeasible. Adding plugin 0 (cost 10) to $\{1,2\}$ also overflows. So is 220 the best? Check $\{0,1,4\}$: cost $10 + 20 + 5 = 35 \le 50$, value $60 + 100 + 30 = 190 < 220$. Check $\{0,2,4\}$: cost $10 + 30 + 5 = 45 \le 50$, value $60 + 120 + 30 = 210 < 220$. Check ${0,1,3}$: cost $10+20+15 = 45 \le 50$, value $60+100+40 = 200 < 220$. Check $\{2,3,4\}$: cost $30+15+5 = 50 \le 50$, value $120+40+30 = 190 < 220$. No feasible subset of size $\le 3$ beats $\{1,2\}$. The best feasible subset is $\{1, 2\}$ with value 220.

The expected output above says $(250, (1, 2, 4))$ — which is wrong on purpose: it is a planted error for you to catch. The value 250 is the genuine total of $\{1,2,4\}$, but that subset costs $55$ and violates the budget, so the brute force (which checks the budget) would never return it. The correct expected output, from the hand trace, is:

# Corrected expected output:
# (220, (1, 2))

⚠️ Common Pitfall (the one just demonstrated): It is easy to "read off" an answer that looks high-value while forgetting to re-check the constraint. The brute force itself does not make this mistake — the if total_cost <= budget guard rejects $\{1,2,4\}$ — but a human eyeballing the instance does. This is exactly why we let the loop enumerate every candidate and check the budget mechanically: the generator is exhaustive, so the only way to get the wrong answer is a bug in the scoring or the constraint, not a missed candidate. The count (1,586 candidates, all visited) certifies completeness; the guard certifies feasibility.

🔗 Connection. This brute force is correct because the generator is exhaustive — it provably emits all $\binom{n}{k}$ subsets (Phase 2/3 verified the count). When the space is small (Phase 1's 1,586), exhaustive search is the simplest correct algorithm and you should reach for it. When the space explodes — true 0/1 knapsack on hundreds of items — you trade brute force for dynamic programming (Chapter 18's recurrences) or accept approximation. The count is what tells you which regime you are in; the same generator that powers this solver powers the RSA keyspace sizing in Part IV and the constraint-search capstone in Chapter 39.

Phase 5: When order does matter — the iterative permutation generator

The bundler did not care about order — a set of plugins is a set. But the same build pattern, with one crucial change, generates permutations for the problems where order matters (job schedules, tour orderings, the §16.3 "order matters, no repetition" cell). Seeing the two generators side by side makes the chapter's central distinction concrete in code: combinations recurse on later items only; permutations must revisit every item in every position.

The classic iterative algorithm — the one behind C++'s std::next_permutation — advances a sequence to the next lexicographically larger arrangement of the same elements, in place:

Find the largest index $i$ such that $a_i < a_{i+1}$ (the rightmost "ascent"). If none exists, the sequence is the last (fully descending) permutation — stop.
Find the largest index $j > i$ such that $a_j > a_i$ (the rightmost element still larger than $a_i$).
Swap $a_i$ and $a_j$.
Reverse the suffix after position $i$ (it was descending; reversing makes it the smallest ascending tail, the lexicographically next arrangement).

def all_permutations_iter(items):
    """Yield every permutation of a sorted list, in lexicographic order."""
    a = sorted(items)
    while True:
        yield tuple(a)
        i = len(a) - 2
        while i >= 0 and a[i] >= a[i + 1]:    # find rightmost ascent a[i] < a[i+1]
            i -= 1
        if i < 0:                              # fully descending -> last permutation
            return
        j = len(a) - 1
        while a[j] <= a[i]:                    # rightmost element greater than a[i]
            j -= 1
        a[i], a[j] = a[j], a[i]                # swap
        a[i + 1:] = reversed(a[i + 1:])        # reverse the descending suffix

print(list(all_permutations_iter([1, 2, 3])))
print(len(list(all_permutations_iter([1, 2, 3, 4]))))    # must equal 4! = 24
# Expected output:
# [(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)]
# 24

Hand-trace $[1,2,3]$ to trust it. Start $a = [1,2,3]$ → yield $(1,2,3)$. Rightmost ascent: $a_1 = 2 < a_2 = 3$, so $i = 1$. Rightmost element $> a_1 = 2$: $a_2 = 3$, $j = 2$. Swap → $[1,3,2]$; reverse suffix after position 1 (just $[2]$, unchanged) → $[1,3,2]$ → yield $(1,3,2)$. Next: ascents? $a_1 = 3 > a_2 = 2$, not an ascent; $a_0 = 1 < a_1 = 3$, so $i = 0$. Rightmost element $> 1$: $a_2 = 2$, $j = 2$. Swap $a_0, a_2$ → $[2,3,1]$; reverse suffix after position 0 ($[3,1] \to [1,3]$) → $[2,1,3]$ → yield $(2,1,3)$. Continuing the same way yields $(2,3,1), (3,1,2), (3,2,1)$, then $[3,2,1]$ has no ascent and the generator stops. Six tuples in lexicographic order — and $3! = 6$, the count-verification again.

💡 Intuition: combinations vs. permutations, as two odometers. The combination odometer (all_combinations_iter) keeps its wheels strictly increasing and bumps the rightmost wheel with room — that "strictly increasing" rule is exactly "unordered, no repetition," because each subset has one canonical sorted spelling. The permutation odometer (all_permutations_iter) has no ordering constraint on the wheels — every arrangement is a distinct output — so it must find the rightmost ascent and rearrange the tail. The structural difference between the two algorithms is precisely the §16.3 difference between a combination and a permutation, and the count-verification (len == C(n,k) vs. len == n!) catches a bug in either.

⚠️ Common Pitfall: Note the asymmetry in the two while comparisons: step 1 uses >= (skip past equal elements when hunting the ascent) and step 2 uses <= (skip past elements not strictly greater). Getting these wrong — e.g. using > in step 1 — breaks the generator on inputs with repeated elements (it would emit duplicate permutations, and the count would exceed the multiset-permutation value $\frac{n!}{n_1!\cdots n_t!}$ from §16.1). For distinct elements the bug is invisible; for a multiset the count-verification exposes it immediately. This is why the algorithm is stated for the general (possibly-repeated) case and why we always check the output length against the formula.

The payoff: the bundler's all_combinations_iter and this all_permutations_iter are the two work-horses behind nearly every "try all selections" or "try all orderings" routine you will write — and each is verified the same way, by confirming its output count equals the chapter's formula ($\binom{n}{k}$ for selections, $n!$ or $P(n,k)$ for orderings). Generation and counting are, once more, two halves of one tool.

Discussion Questions

The recursive generator (Phase 2) and the iterative one (Phase 3) emit the same combinations in the same lexicographic order. What does the iterative version buy you that the recursive one does not, and for what $(n, k)$ would that difference start to matter in practice?
In all_combinations_iter, the maximum legal value of index position $j$ is $n - k + j$. Derive this bound from scratch: how many positions are to the right of $j$, and what is the smallest set of strictly increasing indices they can occupy?
The brute force iterates $\sum_{k=0}^{5}\binom{12}{k}$ times. If the catalog grew to $n = 30$ plugins (cap still ≤ 5), how many candidates would there be, and would brute force still be feasible? (Use the fact that for fixed $k$, $\binom{n}{k}$ is a degree-$k$ polynomial in $n$ — §16.6.)
The planted error in Phase 4 was a constraint violation a human missed but the code caught. Argue, in the spirit of theme two, why "the generator visits every candidate" plus "the guard checks every constraint" together guarantee the brute force returns the true optimum — something no amount of spot-checking by hand could establish.

Your Turn: Extensions

Option A (extend the permutation generator). Phase 5 built all_permutations_iter for full permutations. Extend it two ways: (i) generate $k$-permutations (ordered, no repetition, choose $k$ of $n$) and verify len(list(...)) == perm_count(n, k); (ii) feed it a multiset (e.g. [1, 1, 2]) and confirm — via the >=/<= comparisons discussed in the pitfall — that it emits each distinguishable arrangement exactly once, so len(list(...)) equals the multiset count $\frac{n!}{n_1!\cdots n_t!}$ from §16.1, not $n!$.
Option B (subset-stream via bitmask). Replace the size-capped loop with a single loop over the integers $0$ to $2^n - 1$, treating each integer's bits as "in/out" decisions (the §16.6 subset recursion, made into counting). For $n = 12$ that is 4,096 masks; filter by popcount $\le 5$ and by budget. Confirm it finds the same best bundle.
Option C (close the loop both ways). For several $(n, k)$, assert that len(list(all_combinations_iter(n, k))) == comb_count(n, k) and that the iterative and recursive generators produce identical lists. Explain what each assertion catches that the other does not — and why passing both is strong (but still not complete) evidence of correctness.

Key Takeaways

Generate by turning the count into recursion (or iteration). The combination count $\binom{n}{r}$ and the combination generator are two halves of one idea; the "later items only" recursion and the "rightmost non-maxed index" odometer both encode "unordered, no repetition."
Verify a generator by its count. len(list(generator)) == comb_count(n, k) is the fastest tripwire for the off-by-one and re-selection bugs that plague combination code.
Size the space before you brute-force it. $\sum_{k=0}^{5}\binom{12}{k} = 1{,}586$ is what licenses the exhaustive selector; the same count would forbid brute force at $n = 60$ with no cap.
Exhaustive + a constraint guard is provably optimal. Because the generator visits every candidate and the guard checks every constraint, the brute force cannot miss the true optimum — a correctness guarantee no hand spot-check can match.

Case Study: Building a Combination Generator for Brute-Force Selection

Executive Summary

Background

The scenario

Why this matters

Phase 1: Count the candidates first (the feasibility budget)

Phase 2: Build the recursive generator (and verify it by the count)

Phase 3: Build the iterative next_combination (the streaming version)

Phase 4: Brute-force the selection, sized by the count

Phase 5: When order does matter — the iterative permutation generator

Discussion Questions

Your Turn: Extensions

Key Takeaways

Phase 3: Build the iterative `next_combination` (the streaming version)