Chapter 16: Permutations and Combinations

DataField.Dev

38 min read

> "The whole of arithmetic now appeared within the grasp of mechanism."

Prerequisites

15

Learning Objectives

Count ordered arrangements with permutations P(n, k), and explain when order matters.
Count unordered selections with combinations and the binomial coefficient, and derive the formula from permutations.
Choose correctly between a permutation model and a combination model for a given counting problem.
Expand powers with the binomial theorem and generate coefficients with Pascal's triangle and its recurrence.
Write a combinatorial (double-counting) proof of an identity, and explain why it is a proof.
Estimate the size of a brute-force search space (subsets, arrangements, k-selections) and read the result as a feasibility verdict.

In This Chapter

Overview
Learning Paths
16.1 Permutations: when order matters
16.2 Combinations and the binomial coefficient
16.3 Permutations vs. combinations: choosing the model
16.4 The binomial theorem and Pascal's triangle
16.5 Combinatorial proofs: counting one thing two ways
16.6 Counting in algorithms: sizing the search space
Project Checkpoint: permutation and combination generators
Summary
Spaced Review
What's Next

Exercises Quiz Case Study 01 Case Study 02 Key Takeaways Further Reading

Chapter 16: Permutations and Combinations

"The whole of arithmetic now appeared within the grasp of mechanism." — Charles Babbage, Passages from the Life of a Philosopher (1864)

Overview

Here is a question you will face, in some form, for the rest of your programming life: how big is the thing I am about to loop over? Before you write a brute-force search, you need to know whether it will finish before the heat death of the universe. Before you size a cache, a password policy, or a test matrix, you need to count the possibilities. Chapter 15 gave you the two engines — the product rule and the sum rule — for counting anything you can describe as a sequence of independent choices. This chapter specializes those engines into the two patterns that come up constantly: arrangements where order matters (permutations) and selections where order does not (combinations).

The difference sounds trivial and is anything but. "Pick 3 of these 10 servers to be the primaries" and "pick 3 of these 10 servers and assign them to slots A, B, C" are different questions with different answers, and confusing them is one of the most common counting errors there is — in homework, in interviews, and in production capacity estimates. By the end of this chapter you will be able to look at a counting problem and immediately classify it: ordered or unordered, repetition allowed or not. That classification is most of the battle.

The payoff is concrete. The number of subsets of an $n$-element set, the number of ways to seat people or schedule jobs, the number of poker hands, the size of the search space your backtracking algorithm must explore, the coefficients in $(1+x)^n$ that show up in probability and in the analysis of algorithms — all of these are permutations and combinations wearing different costumes. Learn the two patterns and a single elegant object, the binomial coefficient $\binom{n}{k}$, and a remarkable amount of counting collapses into a few formulas you can derive rather than memorize.

In this chapter, you will learn to:

Count ordered arrangements with the permutation formula $P(n,k) = \frac{n!}{(n-k)!}$, including arrangements with repeated elements.
Count unordered selections with the combination formula $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$, and derive it from permutations by dividing out the orderings.
Decide, for any counting problem, whether order matters and whether repetition is allowed — and pick the right model.
Expand $(x+y)^n$ with the binomial theorem and build the coefficients with Pascal's triangle.
Prove identities by combinatorial proof — counting one set two ways — and see why that is every bit as rigorous as algebra.
Estimate the size of brute-force search spaces and turn that estimate into a "feasible / infeasible" judgment about an algorithm.

Learning Paths

🏃 Fast Track: If permutations and combinations are review, skim 16.1–16.2 for the notation we use, then go to 16.4 (binomial theorem), 16.5 (combinatorial proofs — likely the newest material for you), and 16.6 (search-space sizing). Do the ⭐⭐ and ⭐⭐⭐ exercises.

📖 Standard Path: Read in order. The single most important section is 16.3 — choosing the model. Work every 🔄 Check Your Understanding before moving on, and try each "decide the model" question before reading the resolution.

🔬 Deep Dive: After the chapter, prove the hockey-stick identity and Vandermonde's identity by combinatorial proof, then verify both in code with the Toolkit. Exercise set, Part E.

16.1 Permutations: when order matters

Start with a concrete problem. You are building a job scheduler. Five distinct background jobs — call them $a, b, c, d, e$ — must run one after another on a single worker, and the order affects total runtime because of caching. In how many distinct orders can the five jobs run?

Reach for the product rule from Chapter 15. The first job to run can be any of the 5. Whatever you picked first, the second job can be any of the remaining 4. Then 3, then 2, then 1. By the product rule the number of orderings is

$$5 \times 4 \times 3 \times 2 \times 1 = 120.$$

That product — every positive integer from $n$ down to $1$ — is the factorial $n!$, which you met in Chapter 11. So there are $5! = 120$ ways to order 5 distinct jobs. An ordering of all the objects in a set is the first thing we name.

💡 Intuition: A permutation is a lineup. You have a set of distinct things and a row of slots, one slot per thing, and you are asking how many different lineups exist. The product rule answers it directly: $n$ choices for the first slot, $n-1$ for the next (one is used up), and so on.

Now change the problem slightly: you have the same 5 jobs but only 3 time slots before a deadline, and you must choose which 3 jobs run and in what order. The first slot: 5 choices. The second: 4. The third: 3. The product rule gives $5 \times 4 \times 3 = 60$. You stopped after 3 factors instead of running all the way down to 1. This is an $r$-permutation: an ordered arrangement of $r$ things chosen from $n$.

Definition (permutation). A permutation of a set of $n$ distinct objects is an ordered arrangement of those objects. An $r$-permutation is an ordered arrangement of $r$ of the $n$ objects ($0 \le r \le n$). The number of $r$-permutations of an $n$-set is written $P(n, r)$.

To get a closed form, look at the pattern $5 \times 4 \times 3$. It is $5!$ with the tail $2 \times 1$ removed — that is, $\frac{5!}{2!} = \frac{120}{2} = 60$. The "$2$" is $5 - 3$. In general, an $r$-permutation uses the top $r$ factors of $n!$:

$$P(n, r) = \underbrace{n \cdot (n-1) \cdots (n-r+1)}_{r \text{ factors}} = \frac{n!}{(n-r)!}.$$

A few sanity checks that are worth doing once so the formula stops being arbitrary:

$P(n, n) = \frac{n!}{0!} = n!$ — arranging all $n$ objects, which is what "permutation" meant in the first place. (Recall $0! = 1$; the empty product is 1, exactly so this formula works out.)
$P(n, 1) = \frac{n!}{(n-1)!} = n$ — choosing and placing one object: $n$ ways. Obvious, and the formula agrees.
$P(n, 0) = \frac{n!}{n!} = 1$ — there is exactly one way to arrange nothing (the empty arrangement). This convention will matter when we get to combinations.

Let's compute one with code, building the falling product directly so the formula stays visible:

def perm_count(n, r):
    """Number of r-permutations of n distinct objects: P(n, r) = n!/(n-r)!."""
    result = 1
    for i in range(n, n - r, -1):   # multiply n * (n-1) * ... * (n-r+1)
        result *= i
    return result

print(perm_count(5, 3))   # 5 * 4 * 3
print(perm_count(5, 5))   # 5!
# Expected output:
# 60
# 120

Notice we compute $P(5,3)$ as $5 \times 4 \times 3$ rather than $\frac{5!}{2!}$. The two are equal, but the falling product is faster (3 multiplications, not building $5!$ and $2!$ and dividing) and avoids huge intermediate values. That is theme three of this book — abstraction at the right level — even in something this small: the math says $\frac{n!}{(n-r)!}$, but the good implementation never forms the factorials.

⚠️ Common Pitfall: Do not reach for math.factorial(n) // math.factorial(n - r) as your mental model of how to compute $P(n,r)$, even though it is correct. For $n = 10^6$ and $r = 2$, that builds a number with millions of digits just to multiply $1{,}000{,}000 \times 999{,}999$. The falling product computes the same answer with two multiplications. The formula and the algorithm are not the same thing.

Permutations with repeated objects

What if the objects are not all distinct? Consider the number of distinguishable strings you can make by rearranging the letters of LEVEL. There are 5 letters, so a first guess is $5! = 120$. But that overcounts, because the two Ls are interchangeable and so are the two Es — swapping the two Ls produces the same string, yet $5!$ counted it twice.

The fix is the division rule from Chapter 15. Pretend for a moment the letters are distinct by tagging them: $L_1, E_1, V, E_2, L_2$. Now all $5! = 120$ arrangements are genuinely different. But each real string (like LEVEL) corresponds to several tagged arrangements: we can permute the two Ls in $2!$ ways and the two Es in $2!$ ways without changing the visible string. So every visible string was counted $2! \times 2! = 4$ times. Divide it out:

$$\frac{5!}{2!\,2!} = \frac{120}{4} = 30.$$

This generalizes to the permutations of a multiset. If you have $n$ objects of which $n_1$ are of one kind, $n_2$ of a second kind, …, $n_t$ of a $t$-th kind (so $n_1 + n_2 + \cdots + n_t = n$), the number of distinguishable arrangements is

$$\frac{n!}{n_1!\,n_2!\cdots n_t!}.$$

💡 Intuition: Start by pretending everything is distinct ($n!$ arrangements), then divide out the rearrangements you cannot see — the $n_i!$ internal shuffles within each group of identical objects. Overcount on purpose, then correct by division. This "overcount then divide" move is the single most useful trick in counting, and it is exactly what turns permutations into combinations in the next section.

🔗 Connection. The classic worked example is the number of distinct rearrangements of MISSISSIPPI: 11 letters with M:1, I:4, S:4, P:2, giving $\frac{11!}{1!\,4!\,4!\,2!} = 34{,}650$. If you ever shuffle a deck, deal cards, or generate anagrams in code, this is the count of "visibly different outcomes."

Circular arrangements: a second use of "divide out the symmetry"

The "overcount, then divide" idea has another classic application that is worth seeing once, because it reinforces the move and shows up in problems about rings, cycles, and round-robin structures. Suppose $n$ distinct processes are arranged around a circular ring buffer — or $n$ people are seated at a round table — and we consider two arrangements the same if one is a rotation of the other (there is no distinguished "first" seat). How many genuinely different circular arrangements are there?

A linear arrangement of $n$ distinct objects has $n!$ orderings. But on a circle, each circular arrangement can be "cut open" at any of its $n$ positions to produce $n$ different linear sequences that are all the same circle. So $n!$ overcounts circular arrangements by a factor of $n$ (the rotations). Divide it out:

$$\text{circular arrangements of } n \text{ distinct objects} = \frac{n!}{n} = (n-1)!.$$

For $n = 5$ that is $4! = 24$ distinct round-table seatings. The same logic — count the linear version, divide by the size of the symmetry group you are ignoring — is exactly what turned $5!$ into $30$ for LEVEL (dividing by the $2!\,2!$ internal shuffles) and is about to turn $P(n,r)$ into $\binom{n}{r}$ (dividing by the $r!$ orderings). One move, three guises.

🔄 Check Your Understanding 1. How many ways can 7 distinct processes be arranged in a priority queue? 2. Compute $P(8, 3)$ as a falling product, then as $\frac{8!}{5!}$, and confirm they match. 3. How many distinguishable strings come from rearranging the letters of PEPPER? 4. How many distinct ways can 6 people be seated around a round table (rotations considered the same)?

Answers

$7! = 5040$. 2. Falling product: $8 \times 7 \times 6 = 336$; and $\frac{8!}{5!} = \frac{40320}{120} = 336$. They match. 3. PEPPER has 6 letters: P:3, E:2, R:1, so $\frac{6!}{3!\,2!\,1!} = \frac{720}{12} = 60$. 4. $(6-1)! = 5! = 120$ — fix one person to break the rotational symmetry, then arrange the other 5 linearly.

16.2 Combinations and the binomial coefficient

Back to the server problem from the overview, now stated carefully. You have 10 servers and want to choose 3 of them to be primaries. The three primaries are interchangeable — there is no "slot A, slot B, slot C," just a set of three. How many ways are there to choose the 3?

If order mattered, the answer would be $P(10, 3) = 10 \times 9 \times 8 = 720$. But order does not matter: choosing $\{s_2, s_5, s_9\}$ is the same choice as $\{s_9, s_2, s_5\}$. We have overcounted — and by exactly the trick from the last section, we know the overcount factor. Each unordered set of 3 servers was counted once for every ordering of those 3, and there are $3! = 6$ orderings of 3 things. So divide:

$$\frac{P(10,3)}{3!} = \frac{720}{6} = 120.$$

There are 120 ways to choose 3 servers from 10 when order is irrelevant. That number has a name and a notation.

Definition (combination). A combination of $r$ objects chosen from a set of $n$ distinct objects is an unordered selection — equivalently, an $r$-element subset. The number of such selections is the binomial coefficient, written $\binom{n}{r}$ and read "$n$ choose $r$."

The derivation we just did is the formula. An $r$-permutation is "choose an $r$-subset, then order it," and the product rule says (number of $r$-permutations) = (number of $r$-subsets) $\times$ (ways to order $r$ things), i.e. $P(n, r) = \binom{n}{r} \cdot r!$. Solve for $\binom{n}{r}$:

$$\binom{n}{r} = \frac{P(n, r)}{r!} = \frac{n!}{r!\,(n-r)!}.$$

🚪 Threshold Concept: order is a choice you make, and you can always divide it out. The single idea that unlocks all of combinatorics is this: count the ordered version with the product rule, then divide by the number of orderings you do not want to distinguish. Permutations are the ordered count; combinations are the ordered count divided by $r!$; multiset permutations are $n!$ divided by the internal shuffles. Once you internalize "overcount, then divide by the symmetry," you stop memorizing formulas and start deriving them. Every harder counting problem in this book — and in your career — is a variation on this move.

Let's compute a few and notice their structure:

$\binom{5}{2} = \frac{5!}{2!\,3!} = \frac{120}{2 \cdot 6} = 10$.
$\binom{5}{3} = \frac{5!}{3!\,2!} = \frac{120}{6 \cdot 2} = 10$. The same number. Choosing 3 to keep is the same as choosing 2 to discard.
$\binom{n}{0} = \frac{n!}{0!\,n!} = 1$ — there is exactly one way to choose nothing (the empty set).
$\binom{n}{n} = 1$ — exactly one way to choose everything.
$\binom{n}{1} = n$ — $n$ ways to choose a single element.

The observation that $\binom{5}{2} = \binom{5}{3}$ is no accident. It is the first identity worth proving, and we will prove it twice — once with algebra now, and again, more beautifully, in 16.5.

Symmetry identity. For all integers $0 \le r \le n$, $\displaystyle \binom{n}{r} = \binom{n}{n-r}.$

Strategy (algebraic). Just substitute $n - r$ for $r$ in the formula and watch the two factors in the denominator swap places.

Proof. By definition, $$\binom{n}{n-r} = \frac{n!}{(n-r)!\,\bigl(n-(n-r)\bigr)!} = \frac{n!}{(n-r)!\,r!} = \frac{n!}{r!\,(n-r)!} = \binom{n}{r}.$$ The middle step uses only that multiplication is commutative in the denominator. $\blacksquare$

Now in code. As with permutations, the naive route through three factorials is correct but wasteful; we compute $\binom{n}{r}$ as a falling product divided as we go, and we exploit the symmetry to keep $r$ small:

def comb_count(n, r):
    """Number of r-combinations of n objects: C(n, r) = n!/(r!(n-r)!)."""
    if r < 0 or r > n:
        return 0
    r = min(r, n - r)          # symmetry: C(n,r) == C(n,n-r); keep r small
    result = 1
    for i in range(r):         # multiply (n-r+1..n), divide by (1..r) interleaved
        result = result * (n - i) // (i + 1)
    return result

print(comb_count(10, 3))
print(comb_count(52, 5))      # number of 5-card poker hands
# Expected output:
# 120
# 2598960

The interleaving result * (n - i) // (i + 1) is worth a second look. After step $i$, result holds $\binom{n}{i+1}$, which is always an integer, so the integer division // never loses anything. (That the running value is always a whole number is itself a small theorem — a product of $j$ consecutive integers is divisible by $j!$ — and it is exactly why this loop is safe.) The famous count $\binom{52}{5} = 2{,}598{,}960$ — the number of 5-card poker hands from a standard deck — falls right out.

Combining the counting rules: selection with constraints

Combinations rarely appear alone. The real power comes from composing $\binom{n}{r}$ with the product, sum, and subtraction rules of Chapter 15. Here is the kind of problem that composition solves.

Worked example. A code-review rotation must select a panel of 5 reviewers from a pool of 7 senior and 6 junior engineers (13 in all). How many panels are possible if the panel must contain at least one senior and at least one junior?

There are two clean strategies, and computing it both ways is a good consistency check.

Strategy 1 — subtraction rule. Count all panels, then subtract the forbidden ones (all-senior or all-junior). Total panels: $\binom{13}{5} = 1287$. All-senior panels (5 from the 7 seniors, 0 juniors): $\binom{7}{5} = 21$. All-junior panels: $\binom{6}{5} = 6$. The two forbidden cases are disjoint, so

$$\binom{13}{5} - \binom{7}{5} - \binom{6}{5} = 1287 - 21 - 6 = 1260.$$

Strategy 2 — sum rule over cases. A valid panel has $s$ seniors and $5 - s$ juniors with $1 \le s \le 4$ (so that neither group is empty). For each $s$, choose the seniors and juniors independently (product rule): $\binom{7}{s}\binom{6}{5-s}$. Sum over the cases:

$$\sum_{s=1}^{4}\binom{7}{s}\binom{6}{5-s} = \binom{7}{1}\binom{6}{4} + \binom{7}{2}\binom{6}{3} + \binom{7}{3}\binom{6}{2} + \binom{7}{4}\binom{6}{1}.$$

Evaluating the four terms: $7 \cdot 15 = 105$, then $21 \cdot 20 = 420$, then $35 \cdot 15 = 525$, then $35 \cdot 6 = 210$. Their sum is $105 + 420 + 525 + 210 = 1260$ — the same answer. The agreement is not luck; it is the subtraction rule and the sum rule counting the same set, and seeing two methods land on $1260$ is exactly the kind of self-check you should build into counting work.

💡 Intuition: "choose the seniors, then choose the juniors" is a product. When a selection splits into independent sub-selections from disjoint groups, multiply the counts (product rule). When a selection breaks into mutually exclusive cases, add the counts (sum rule). Constraints like "at least one" are usually easiest by subtraction — count everything, remove what violates the constraint.

⚠️ Common Pitfall: $\binom{n}{r}$ vs. $P(n,r)$. They differ by the factor $r!$: $P(n,r) = r! \binom{n}{r}$. If you find yourself unsure which to use, ask the one question that decides it: would reordering the chosen items give a different outcome I want to count separately? If yes, it's a permutation; if no, it's a combination. "Three primaries" — no, a set is a set — combination. "Gold, silver, bronze medalists" — yes, who got which medal matters — permutation.

🔄 Check Your Understanding 1. A standard lottery draws 6 numbers from 1–49, order irrelevant. How many possible draws? (Leave it as $\binom{49}{6}$ and as a single integer if you can.) 2. Why is $\binom{n}{r}$ always a whole number even though $\frac{n!}{r!(n-r)!}$ looks like it could be fractional? 3. Without computing, which is larger: $\binom{100}{2}$ or $P(100, 2)$? By what factor?

Answers

$\binom{49}{6} = 13{,}983{,}816$. (You can leave it symbolic; the integer is here for reference.)

Because it counts something — the number of $r$-subsets — and a count of objects is necessarily a non-negative integer. The formula's value must therefore be a whole number. 3. $P(100,2)$ is larger, by a factor of $2! = 2$, since $P(n,r) = r!\binom{n}{r}$.

16.3 Permutations vs. combinations: choosing the model

This is the section that matters most, because the formulas are useless if you apply the wrong one. The entire skill is answering two yes/no questions about a problem, in order:

Does order matter? — If reordering the chosen items produces a different outcome you want to count, order matters (use a permutation). If reordering gives the same outcome, order does not matter (use a combination).
Is repetition allowed? — Can the same item be chosen more than once?

Those two questions give four cases. Here is the decision table; commit it to memory, because nearly every elementary counting problem is one of these four.

	Order matters	Order does not matter
No repetition	$r$-permutation: $\dfrac{n!}{(n-r)!}$	$r$-combination: $\dbinom{n}{r}$
Repetition allowed	sequences: $n^r$	multiset selection: $\dbinom{n+r-1}{r}$ (Chapter 17)

A word on the four cells:

Order matters, no repetition — the $r$-permutation $P(n,r)$. Assigning distinct people to distinct ranked roles; the top $r$ finishers of a race; the first three bytes of a header drawn from distinct values.
Order matters, repetition allowed — plain sequences, counted $n^r$ by the product rule (Chapter 15). A length-$r$ password over an $n$-symbol alphabet; an $r$-digit PIN; the contents of an array of length $r$ with $n$ possible values per cell.
Order does not matter, no repetition — the $r$-combination $\binom{n}{r}$. Choosing an unordered committee, a hand of cards, a subset of features to enable.
Order does not matter, repetition allowed — choosing a multiset, e.g. how many ways to pick $r$ scoops from $n$ flavors when flavors can repeat. The count is $\binom{n+r-1}{r}$, but the technique behind it ("stars and bars") is the opening act of Chapter 17, so we only flag it here.

🧩 Productive Struggle. Classify each before reading on. (a) Number of 4-character passwords over a 26-letter alphabet. (b) Number of ways to seat 4 of 9 interns in 4 distinct numbered desks. (c) Number of ways to choose 4 of 9 interns to receive identical T-shirts. (d) Number of ways to pick 4 doughnuts from 9 available kinds, repeats allowed.

Decide order? and repetition? for each, then pick the cell.

Now the resolutions, with the reasoning made explicit:

(a) Order matters (AB $\neq$ BA as passwords) and repetition is allowed (AA is a valid password). Cell: $n^r = 26^4 = 456{,}976$.
(b) Order matters (desk 1 vs. desk 2 is a different assignment) and no repetition (one intern per desk). Cell: $P(9,4) = 9 \times 8 \times 7 \times 6 = 3024$.
(c) Order does not matter (the T-shirts are identical; it is just a set of 4 interns) and no repetition. Cell: $\binom{9}{4} = 126$.
(d) Order does not matter (a box of doughnuts is a multiset, not a sequence) and repetition is allowed. Cell: $\binom{9 + 4 - 1}{4} = \binom{12}{4} = 495$ — the Chapter 17 case.

Notice how (b) and (c) use the same $n$ and $r$ but differ by the factor $4! = 24$: $3024 / 24 = 126$. That factor is precisely the $r!$ orderings of the chosen 4 that the combination ignores and the permutation counts. Whenever you see two versions of a problem that differ only by "does the order of the chosen things matter," they will differ by exactly $r!$.

💡 Intuition: the two questions, as code. "Order matters?" is "would [a, b] and [b, a] be different results?" "Repetition allowed?" is "can the same element appear twice in my selection?" If you can answer those about your loops, you can pick the model. A nested loop that does for i: for j > i: generates combinations (it never revisits a pair in the other order); a nested loop with for i: for j != i: generates permutations.

🐛 Find the Error. A student counts the number of ways to deal a 5-card poker hand as $52 \times 51 \times 50 \times 49 \times 48 = 311{,}875{,}200$ and concludes there are about 312 million distinct hands. What did they count, and what is the actual number of distinct hands?

Answer

They computed $P(52, 5)$ — the number of ordered deals, i.e. sequences of 5 cards. But a poker hand is a set; the order you are dealt the cards does not change the hand. They overcounted by the $5! = 120$ orderings of each hand. The actual number of distinct hands is $\frac{311{,}875{,}200}{120} = 2{,}598{,}960 = \binom{52}{5}$. This is the single most common counting error: using a permutation where the problem wants a combination.

16.4 The binomial theorem and Pascal's triangle

The binomial coefficient $\binom{n}{r}$ earns its name "binomial" from algebra. Expand a power of a binomial $(x + y)$ and watch what the coefficients turn out to be:

$$(x+y)^2 = x^2 + 2xy + y^2, \qquad (x+y)^3 = x^3 + 3x^2y + 3xy^2 + y^3.$$

The coefficients are $1, 2, 1$ and $1, 3, 3, 1$ — which are exactly $\binom{2}{0}, \binom{2}{1}, \binom{2}{2}$ and $\binom{3}{0}, \binom{3}{1}, \binom{3}{2}, \binom{3}{3}$. That is not a coincidence, and the reason is pure counting.

💡 Intuition: why combinations appear in $(x+y)^n$. Write $(x+y)^n$ as the product of $n$ identical factors $(x+y)(x+y)\cdots(x+y)$. To expand, you pick either the $x$ or the $y$ from each factor and multiply your choices. A term $x^{n-k}y^{k}$ arises every time you pick $y$ from exactly $k$ of the $n$ factors — and the number of ways to choose which $k$ factors contribute a $y$ is $\binom{n}{k}$. So the coefficient of $x^{n-k}y^k$ is $\binom{n}{k}$. The binomial coefficient is "choose which factors give a $y$."

That intuition is already a proof, but let's state it cleanly.

The Binomial Theorem. For any $x, y$ and any integer $n \ge 0$, $$(x+y)^n = \sum_{k=0}^{n} \binom{n}{k} x^{n-k} y^{k}.$$

Strategy. We give the counting argument above in formal dress: expanding the product means choosing one term from each of the $n$ factors; collecting like terms groups those choices by how many $y$'s were taken; the count of each group is a binomial coefficient. (A second proof, by induction on $n$ using Pascal's rule below, is a guided exercise.)

Proof. Expanding $(x+y)^n = \prod_{i=1}^{n}(x+y)$ by the distributive law produces a sum of $2^n$ terms, one for each way of choosing $x$ or $y$ from each factor. A given choice contributes the product $x^{n-k}y^{k}$, where $k$ is the number of factors from which $y$ was chosen. The number of choices yielding a particular $k$ is the number of ways to select which $k$ of the $n$ factors contribute a $y$, namely $\binom{n}{k}$. Summing over $k = 0, 1, \dots, n$ and collecting like terms gives $\sum_{k=0}^{n}\binom{n}{k}x^{n-k}y^{k}$. $\blacksquare$

Setting $x = y = 1$ in the theorem gives an identity we will lean on repeatedly:

$$\sum_{k=0}^{n}\binom{n}{k} = (1+1)^n = 2^n.$$

The left side adds up the number of subsets of each size $0, 1, \dots, n$; the right side is the total number of subsets of an $n$-set. We will reprove this purely by counting in 16.5 — and that count is why $\binom{n}{k}$ is the heartbeat of the next section.

Setting $x = 1, y = -1$ (for $n \ge 1$) gives a second, slyer identity, the alternating sum:

$$\sum_{k=0}^{n}(-1)^k\binom{n}{k} = (1 - 1)^n = 0.$$

In words: in any non-empty set, the number of even-sized subsets equals the number of odd-sized subsets. Try verifying it on a 3-element set by hand — it is a small surprise that pays off in inclusion–exclusion later.

Pascal's triangle and Pascal's rule

There is a way to generate all the binomial coefficients without ever computing a factorial, and it exposes the structure beautifully. Stack the coefficients in rows, row $n$ holding $\binom{n}{0}, \binom{n}{1}, \dots, \binom{n}{n}$:

n=0:            1
n=1:           1   1
n=2:          1   2   1
n=3:         1   3   3   1
n=4:        1   4   6   4   1
n=5:       1   5  10  10   5   1

This is Pascal's triangle. Two things jump out. Each row begins and ends with 1 (that is $\binom{n}{0} = \binom{n}{n} = 1$). And every interior entry is the sum of the two entries above it — for instance $10 = 4 + 6$. That second observation is a theorem:

Pascal's Rule. For $1 \le k \le n-1$ (indeed for $1 \le k \le n$), $$\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}.$$

Strategy (combinatorial — this is the natural proof). Both sides count the number of $k$-subsets of an $n$-set. We will count the right side by splitting on a single element: fix one element, and ask whether a $k$-subset contains it or not. Those two cases are disjoint and cover everything, so by the sum rule their counts add up to the left side. (An algebraic proof by combining fractions also works and is a fine exercise.)

Proof. Let $S$ be a set with $n$ elements, and single out one particular element $a \in S$. Every $k$-subset of $S$ either contains $a$ or does not, and these two possibilities are mutually exclusive, so by the sum rule $$\binom{n}{k} = (\text{$k$-subsets containing } a) + (\text{$k$-subsets not containing } a).$$ A $k$-subset containing $a$ is determined by choosing the other $k-1$ elements from the $n-1$ elements of $S \setminus \{a\}$: there are $\binom{n-1}{k-1}$ of these. A $k$-subset not containing $a$ is just a $k$-subset of $S \setminus \{a\}$: there are $\binom{n-1}{k}$ of these. Adding, $\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}$. $\blacksquare$

Pascal's rule gives an algorithm for the whole triangle that uses only addition — no multiplication, no factorials, no division — which makes it the standard way to tabulate binomial coefficients, especially when you need many of them or are working modulo something:

def pascal_row(n):
    """Return row n of Pascal's triangle: [C(n,0), ..., C(n,n)]."""
    row = [1]
    for k in range(1, n + 1):
        row.append(row[-1] * (n - k + 1) // k)   # C(n,k) from C(n,k-1)
    return row

def pascal_triangle(rows):
    tri = [[1]]
    for n in range(1, rows):
        prev = tri[-1]
        tri.append([1] + [prev[i] + prev[i + 1] for i in range(len(prev) - 1)] + [1])
    return tri

print(pascal_row(5))
print([sum(r) for r in pascal_triangle(5)])   # row sums = powers of 2
# Expected output:
# [1, 5, 10, 10, 5, 1]
# [1, 2, 4, 8, 16]

The pascal_triangle builder uses only Pascal's rule (prev[i] + prev[i+1]), the pure-addition method. And the row sums come out $1, 2, 4, 8, 16$ — the powers of $2$, confirming $\sum_k \binom{n}{k} = 2^n$ on the nose.

🔄 Check Your Understanding 1. Use the binomial theorem to write the coefficient of $x^2 y^3$ in $(x+y)^5$. Check it against Pascal's triangle row 5. 2. What is $\binom{6}{2}$? Get it two ways: from the formula, and from Pascal's rule using row 5 above. 3. What does $\sum_{k=0}^{n} \binom{n}{k} = 2^n$ say about subsets, in one sentence?

Answers

The term is $x^{5-k}y^k$ with $k=3$, so the coefficient is $\binom{5}{3} = 10$ — and row 5 of the triangle is $1,5,10,10,5,1$, whose entry for $k=3$ is indeed $10$. 2. Formula: $\binom{6}{2}=\frac{6\cdot5}{2}=15$. Pascal: row 5 is $1,5,10,10,5,1$; the $k=2$ entry of row 6 is $\binom{5}{1}+\binom{5}{2}=5+10=15$. 3. The total number of subsets of an $n$-element set equals the sum, over every possible size $k$, of the number of subsets of that size.

16.5 Combinatorial proofs: counting one thing two ways

You have now seen two proof styles for the same kind of fact. The symmetry identity $\binom{n}{r} = \binom{n}{n-r}$ we proved by algebra — pushing factorials around. Pascal's rule we proved by counting — splitting the $k$-subsets into two cases. The second style has a name and is one of the most satisfying techniques in all of mathematics.

Definition (combinatorial proof). A combinatorial proof of an identity establishes that two expressions are equal by showing they count the same finite set — either by counting one set two different ways (a double-counting argument), or by exhibiting a bijection between two sets whose sizes are the two expressions. No algebra is required; the equality holds because both sides are the size of the same collection.

Why is that a proof and not just a story? Because counting is a function: a finite set has exactly one size. If expression $A$ correctly counts set $S$ and expression $B$ also correctly counts the same set $S$, then $A = B$ — there is nothing left to check. The rigor lives in the word "correctly": you must describe a concrete set and argue, carefully, that each side counts it without omission or duplication.

🚪 Threshold Concept: an identity is a coincidence of counts. Most people first meet identities as things to verify by algebra. Combinatorial proof flips this: an identity becomes true for a reason — both sides are literally counting the same objects. Once you see proofs this way, many identities that look like algebraic accidents become obvious, and — crucially — you can discover new identities by counting a set two ways and setting the answers equal. This is how working combinatorialists think.

Let's do three, escalating in cleverness.

Combinatorial proof 1: the total number of subsets

Claim. $\displaystyle \sum_{k=0}^{n} \binom{n}{k} = 2^n.$

Strategy. Count the set of all subsets of an $n$-element set $S$ in two ways. One way groups subsets by size; the other builds each subset by an element-by-element decision.

Proof. Let $S$ be a set with $n$ elements, and let $\mathcal{P}(S)$ be its collection of subsets.

Count one (by size). Every subset of $S$ has some size $k$ between $0$ and $n$, and the number of subsets of size exactly $k$ is $\binom{n}{k}$ by definition. Subsets of different sizes are distinct, so by the sum rule the total number of subsets is $\sum_{k=0}^{n}\binom{n}{k}$.

Count two (by membership decisions). Build a subset by deciding, for each of the $n$ elements independently, whether to include it: 2 choices per element. By the product rule there are $2^n$ subsets.

Both counts enumerate $\mathcal{P}(S)$ exactly once, so they are equal: $\sum_{k=0}^{n}\binom{n}{k} = 2^n$. $\blacksquare$

Compare this to the algebraic proof (set $x = y = 1$ in the binomial theorem). Both are valid; the combinatorial one explains the identity — it tells you the $2^n$ is "two choices per element," which the algebra hides.

Combinatorial proof 2: symmetry, again — but as a bijection

Claim. $\displaystyle \binom{n}{r} = \binom{n}{n-r}.$

Strategy. Exhibit a bijection (Chapter 9) between the $r$-subsets and the $(n-r)$-subsets of an $n$-set. If a one-to-one correspondence exists, the two collections have the same size.

Proof. Let $S$ have $n$ elements. Define a map from $r$-subsets of $S$ to $(n-r)$-subsets by sending each subset $A$ to its complement $S \setminus A$. If $|A| = r$ then $|S \setminus A| = n - r$, so the output is an $(n-r)$-subset. This map is its own inverse — complementing twice returns $A$ — so it is a bijection. Therefore the number of $r$-subsets equals the number of $(n-r)$-subsets: $\binom{n}{r} = \binom{n}{n-r}$. $\blacksquare$

💡 Intuition: "Choosing who is in the committee" and "choosing who is out" are the same act seen from two sides. The complement map is that change of perspective. A bijection proof is often just naming the obvious symmetry precisely.

Combinatorial proof 3: the committee–chair identity

Claim. For $1 \le k \le n$, $\displaystyle k\binom{n}{k} = n\binom{n-1}{k-1}.$

Strategy. Both sides count the same thing: the number of ways to choose a committee of $k$ people from $n$ and designate one of the committee members as chair. Count it "committee first, then chair," and also "chair first, then the rest of the committee." Equate.

Proof. Count the number of ways to form a $k$-member committee from $n$ people and appoint one committee member as chair.

Count one (committee, then chair). Choose the $k$ members: $\binom{n}{k}$ ways. Then choose the chair from among those $k$: $k$ ways. By the product rule: $k\binom{n}{k}$.

Count two (chair, then the rest). Choose the chair from all $n$ people: $n$ ways. Then fill out the committee by choosing the remaining $k-1$ members from the other $n-1$ people: $\binom{n-1}{k-1}$ ways. By the product rule: $n\binom{n-1}{k-1}$.

Both count the identical set of (committee, chair) outcomes, so $k\binom{n}{k} = n\binom{n-1}{k-1}$. $\blacksquare$

Verify on small numbers to build trust: with $n = 5, k = 2$, the left side is $2\binom{5}{2} = 2 \cdot 10 = 20$ and the right is $5\binom{4}{1} = 5 \cdot 4 = 20$. Equal, as promised. Theme four of the book in miniature: code can check this identity for thousands of $(n,k)$ pairs in a blink — and indeed the Project Checkpoint below does exactly that — but the combinatorial proof settles it for all of them at once, and tells you why.

⚠️ Common Pitfall: A combinatorial "proof" that only describes one side is not a proof. You must (1) name a single concrete set, and (2) argue that both expressions count it correctly — same set, two counts. Hand-waving "this looks like choosing a committee" without pinning down the exact set and both counts is where these arguments go wrong.

🔄 Check Your Understanding 1. State, in one sentence each, the "set counted two ways" in proofs 1 and 3 above. 2. Give the bijection used in proof 2, and explain why it is its own inverse. 3. What is the difference between a double-counting proof and a bijection proof?

Answers

Proof 1 counts all subsets of an $n$-set (by size, vs. by per-element decisions); proof 3 counts all (committee of $k$, designated chair) pairs (committee-then-chair, vs. chair-then-rest). 2. The bijection sends a subset $A$ to its complement $S\setminus A$; applying it twice gives $S\setminus(S\setminus A) = A$, so it is its own inverse, hence a bijection. 3. Double counting counts one set two different ways and equates the counts; a bijection sets up a one-to-one correspondence between two sets to show they have equal size. Both prove equality of sizes.

16.6 Counting in algorithms: sizing the search space

Now the chapter pays its way as computer science. Before you write a brute-force algorithm — one that tries every candidate — you should count the candidates. The count tells you whether brute force is a reasonable plan or a fantasy, and it tells you which combinatorial object your algorithm is really enumerating. Three sizes come up constantly:

If your algorithm enumerates…	Number of candidates	Grows like
all subsets of $n$ items (include/exclude each)	$2^n$	exponential
all orderings of $n$ items (permutations)	$n!$	faster than any exponential
all $k$-subsets of $n$ items	$\binom{n}{k}$	polynomial in $n$ for fixed $k$
all length-$r$ strings over $n$ symbols	$n^r$	exponential in $r$

These four numbers are the difference between a feasible loop and an impossible one. A worked example makes the stakes vivid.

Scenario (illustrative). You are asked to solve a small instance of the subset-sum problem: given $n$ integers, is there a subset that sums to a target? The obvious algorithm tries every subset and checks its sum. How large can $n$ be before brute force becomes hopeless on a machine doing, say, $10^9$ subset-checks per second?

The number of subsets is $2^n$. Tabulate:

$n$	$2^n$	time at $10^9$/s
20	$\approx 10^6$	about a millisecond
30	$\approx 10^9$	about a second
40	$\approx 10^{12}$	about 18 minutes
50	$\approx 10^{15}$	about 13 days
60	$\approx 10^{18}$	about 36 years

The lesson lands hard: every time $n$ grows by 10, the work multiplies by about a thousand. Brute-force subset enumeration is fine to $n \approx 30$, painful by 40, and hopeless past 50 — no matter how fast your computer is. This is why "just try all subsets" is a red flag in code review for anything but tiny $n$, and why we develop better tools (recurrences in Chapters 18–19, and the whole theory of intractability in Chapter 37) for the cases that matter.

Permutation enumeration is worse. Trying all orderings of $n$ items is $n!$ work, and $n!$ outgrows $2^n$: already $13! \approx 6.2 \times 10^9$ exceeds a second at $10^9$/s, and $20! \approx 2.4 \times 10^{18}$ is decades. The brute-force Traveling Salesman approach — try every tour — examines $(n-1)!/2$ tours and is the canonical example of an algorithm that is correct and useless past about $n = 12$. (We'll meet TSP properly in Chapter 30.)

Here is the connection between generating these objects and counting them, made concrete. The generators below produce the candidates one at a time; the counts above tell you how many times the inner block will run.

def subsets(items):
    """Yield every subset of items (as a list). There are 2**len(items) of them."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for sub in subsets(rest):       # each subset of rest...
        yield sub                   # ...without 'first'
        yield [first] + sub         # ...and with 'first'

all_subsets = list(subsets([1, 2, 3]))
print(len(all_subsets))             # should equal 2**3
print(sorted(all_subsets, key=len))
# Expected output:
# 8
# [[], [1], [2], [3], [1, 2], [1, 3], [2, 3], [1, 2, 3]]

The function realizes the combinatorial proof of $|\mathcal{P}(S)| = 2^n$ as code: for each element, it branches into "without it" and "with it" — exactly the two-choices-per-element argument from 16.5, turned into recursion. The number of leaves of that recursion is $2^n$, which is why len(all_subsets) is $8 = 2^3$. (The exact order in which subsets come out depends on the recursion; the count does not.)

🔗 Connection. This is the bridge to the rest of the book. The search-space sizes here are why Chapter 17 cares about the pigeonhole principle (some collisions are unavoidable once the space is too small), why Chapters 18–19 develop recurrences (to count and to compute without enumerating), and why Chapter 37 distinguishes problems where a clever algorithm beats brute force from problems where we suspect none does. Counting the candidates is the first thing a good algorithm designer does.

🔄 Check Your Understanding 1. An algorithm enumerates all $k$-element subsets of $n$ items with $k$ fixed at 3. As $n$ grows, does the work grow polynomially or exponentially in $n$? Why? 2. Roughly how many subsets does a 40-element set have, and is brute-force enumeration of them feasible on a fast machine? 3. You must enumerate all orderings of 15 items. Estimate the count and say whether that is feasible.

Answers

Polynomially: $\binom{n}{3} = \frac{n(n-1)(n-2)}{6} = \Theta(n^3)$, a cubic — fixed $k$ makes the $k$-subset count a degree-$k$ polynomial in $n$, not exponential. 2. About $2^{40} \approx 10^{12}$ subsets; at $10^9$/s that is roughly 18 minutes — borderline, usually too slow for an interactive tool but possible offline. 3. $15! \approx 1.3 \times 10^{12}$; at $10^9$/s that is around 20 minutes — infeasible for anything interactive, and it gets a factor worse with each additional item.

Project Checkpoint: permutation and combination generators

In Chapter 15 you began dmtoolkit/combinatorics.py with the basic counting helpers. This chapter contributes the two counters for ordered and unordered selection — perm_count and comb_count, which you have already seen above — and, more importantly, the two generators that actually produce the arrangements and selections: permutations(it, k) and combinations(it, k). Counting tells you how many; generating lets your code iterate over them — to brute-force a search, build test cases, or check a combinatorial identity by direct enumeration.

Add these to dmtoolkit/combinatorics.py (keeping the canonical signatures from the style bible so the pieces compose with the rest of the Toolkit):

def permutations(it, k=None):
    """Yield k-permutations (ordered, no repetition) of the items in it."""
    pool = list(it)
    n = len(pool)
    k = n if k is None else k
    if k == 0:
        yield ()
        return
    for i in range(n):
        rest = pool[:i] + pool[i + 1:]          # remove the chosen element
        for tail in permutations(rest, k - 1):  # order the remaining k-1
            yield (pool[i],) + tail

def combinations(it, k):
    """Yield k-combinations (unordered, no repetition) of the items in it."""
    pool = list(it)
    n = len(pool)
    if k == 0:
        yield ()
        return
    for i in range(n - k + 1):                  # leave room for k-1 more after i
        for tail in combinations(pool[i + 1:], k - 1):  # only later items -> no reorder
            yield (pool[i],) + tail

The two generators differ by exactly the idea of 16.3. permutations removes the chosen element and recurses on everything else, so it revisits the same elements in different orders — ordered selection. combinations recurses only on the elements after the chosen one (pool[i+1:]), so it never produces the same set in two orders — unordered selection. That one-line difference (pool[:i] + pool[i+1:] versus pool[i+1:]) is the whole distinction between a permutation and a combination, made executable.

You can now close the loop between counting and generating, which is the point: the length of the generated list must equal the count.

from dmtoolkit.combinatorics import perm_count, comb_count, permutations, combinations
print(len(list(permutations([1,2,3,4], 2))), perm_count(4, 2))   # generated vs. counted
print(len(list(combinations([1,2,3,4], 2))), comb_count(4, 2))
# Expected output:
# 12 12
# 6 6

Toward the capstone. With counting and generating in hand, combinatorics.py is complete enough to drive brute-force search and exhaustive testing throughout the rest of the book — sizing keyspaces for the RSA work in Part IV, enumerating small graph structures in Part V, and powering the constraint-search capstone (the Sudoku solver, Track C) in Chapter 39. From here, probability.py (Chapters 20–21) builds directly on these counts: a probability is, very often, a count of favorable outcomes divided by a count of all outcomes — and you now have both halves.

Toolkit so far: logic.py (Chapters 1–3), sets.py (Chapter 8), relations.py (Chapters 12–13), the start of combinatorics.py (Chapter 15) now extended with perm_count, comb_count, permutations, and combinations, plus the beginnings of recurrences.py (Chapter 6).

Summary

This chapter specialized the counting principles of Chapter 15 into two patterns and one central object. Reference-grade recap:

The four basic models (choose by asking order? and repetition?):

	Order matters	Order does not matter
No repetition	$P(n,r) = \dfrac{n!}{(n-r)!}$	$\dbinom{n}{r} = \dfrac{n!}{r!\,(n-r)!}$
Repetition allowed	$n^r$	$\dbinom{n+r-1}{r}$ (Chapter 17)

Key formulas and identities:

Name	Statement
Permutations (all $n$)	$P(n,n) = n!$
$r$-permutations	$P(n,r) = \dfrac{n!}{(n-r)!} = n(n-1)\cdots(n-r+1)$
Multiset permutations	$\dfrac{n!}{n_1!\,n_2!\cdots n_t!}$ (sizes sum to $n$)
Combinations	$\dbinom{n}{r} = \dfrac{P(n,r)}{r!} = \dfrac{n!}{r!\,(n-r)!}$
Relation	$P(n,r) = r!\,\dbinom{n}{r}$
Symmetry	$\dbinom{n}{r} = \dbinom{n}{n-r}$
Pascal's rule	$\dbinom{n}{k} = \dbinom{n-1}{k-1} + \dbinom{n-1}{k}$
Binomial theorem	$(x+y)^n = \sum_{k=0}^{n}\dbinom{n}{k}x^{n-k}y^{k}$
Total subsets	$\sum_{k=0}^{n}\dbinom{n}{k} = 2^n$
Alternating sum	$\sum_{k=0}^{n}(-1)^k\dbinom{n}{k} = 0$ for $n \ge 1$
Committee–chair	$k\dbinom{n}{k} = n\dbinom{n-1}{k-1}$

Search-space sizes (for feasibility judgments): subsets $2^n$; permutations $n!$; $k$-subsets $\binom{n}{k}$ ($\Theta(n^k)$ for fixed $k$); length-$r$ strings over $n$ symbols $n^r$.

Method notes:

Overcount, then divide by the symmetry is the master move: $n!$ ÷ internal shuffles = multiset permutations; $P(n,r)$ ÷ $r!$ orderings = combinations.
Compute, don't form factorials. Use the falling product for $P(n,r)$, and the interleaved multiply-then-integer-divide for $\binom{n}{r}$; use Pascal's rule (addition only) to tabulate.
Combinatorial proof: name one finite set, count it two ways (or biject two sets), conclude the counts are equal. It is a proof because a finite set has exactly one size.

Spaced Review

Retrieval keeps knowledge available. Answer from memory before checking.

(Ch. 15) The combination formula $\binom{n}{r} = P(n,r)/r!$ is an application of one of Chapter 15's counting principles. Which one — and what is being "divided out"?

(Ch. 15) Count the number of 4-character strings over the 26-letter lowercase alphabet using a Chapter 15 rule. Which rule, and what is the number?

(Ch. 11) Write $P(n,r)$ using factorial notation, and state the convention for $0!$ that makes $P(n,n) = n!$ come out right.

(Ch. 11) The sum $\sum_{k=0}^{n}\binom{n}{k} = 2^n$ is a summation in the notation of Chapter 11. Evaluate it for $n = 4$ by writing out all five terms.

Answers

The division rule: we overcount the unordered selections by treating them as ordered ($P(n,r)$), then divide out the $r!$ orderings of each chosen $r$-subset that we do not want to distinguish. 2. The product rule: $26 \times 26 \times 26 \times 26 = 26^4 = 456{,}976$ (26 independent choices for each of 4 positions, repetition allowed). 3. $P(n,r) = \frac{n!}{(n-r)!}$; the convention $0! = 1$ makes $P(n,n) = \frac{n!}{0!} = n!$. 4. $\binom{4}{0}+\binom{4}{1}+\binom{4}{2}+\binom{4}{3}+\binom{4}{4} = 1+4+6+4+1 = 16 = 2^4$.

What's Next

You can now count ordered arrangements, unordered selections, and the subsets and search spaces that algorithms must explore. But some counting problems resist the four basic models — they need a different kind of reasoning entirely. What is the smallest group in which two people must share a birthday month? Why are hash collisions mathematically guaranteed once you store enough keys? How many ways can you distribute identical items into distinct bins? Chapter 17 takes up these questions with the pigeonhole principle (a deceptively simple idea with surprising force), stars and bars (the technique behind that fourth cell of our decision table), and generating functions (counting with algebra). After that, Chapter 20 turns counting into probability — because once you can count the favorable outcomes and the total outcomes, you can measure uncertainty.