Chapter 17: Advanced Counting

DataField.Dev

40 min read

> "The Pigeonhole Principle... is so simple that one might think it could not possibly say anything of interest. But it does."

Prerequisites

16

Learning Objectives

Apply the pigeonhole principle to prove that a collision or repeat must exist, and identify the pigeons and the holes in a real problem.
Use the generalized pigeonhole principle to bound the fullest box, computing the guaranteed minimum with a ceiling.
Count the number of ways to distribute identical objects into distinct boxes using stars and bars, and translate a problem into the integer-equation form it solves.
Apply the general inclusion–exclusion principle to count the size of a union of any number of sets, and prove the formula by a counting argument.
Set up an ordinary generating function for a counting problem and extract a coefficient to read off an answer.
Count derangements and surjections by inclusion–exclusion, and explain the appearance of $1/e$ in the hat-check problem.

In This Chapter

Overview
Learning Paths
17.1 The pigeonhole principle (and hashing collisions)
17.2 The generalized pigeonhole principle
17.3 Stars and bars: counting distributions
17.4 General inclusion–exclusion
17.5 Generating functions: counting with algebra
17.6 Applications: derangements and surjection counts
Project Checkpoint: distribution and derangement counts for the Toolkit
Summary
Spaced Review
What's Next

Exercises Quiz Case Study 01 Case Study 02 Key Takeaways Further Reading

Chapter 17: Advanced Counting

"The Pigeonhole Principle... is so simple that one might think it could not possibly say anything of interest. But it does." — paraphrasing the folklore of combinatorics

Overview

Here is a fact you can prove in one sentence, with no computer and no measurement: in any city of a million people, at least two of them have exactly the same number of hairs on their head. You don't need to count anyone's hairs. A human head holds at most a few hundred thousand hairs; there are a million people; so two of them must collide on a hair count. That is the pigeonhole principle, and once you see it you will see it everywhere — most importantly, in the one place every working programmer eventually meets it: hash collisions.

When you store keys in a hash table, you map a huge space of possible keys (every string, say) into a small space of slots (an array of size $m$). You cannot avoid two different keys landing in the same slot, no matter how clever your hash function is, the instant you have more keys than slots. That is not a flaw in your code; it is a theorem. Chapter 26 will build the machinery of hash tables on top of exactly this observation. This chapter gives you the theorem — and three more counting tools that, together, let you answer questions that the product and sum rules of Chapter 15 and the permutations and combinations of Chapter 16 cannot touch on their own.

The tools in this chapter share a flavor: each one lets you count something that resists direct enumeration. You cannot list every way to split 100 identical servers across 7 data centers — there are tens of thousands of ways — but stars and bars gives you the count in one binomial coefficient. You cannot easily count the passwords that violate at least one of several rules without double-counting the ones that violate several — but inclusion–exclusion handles the overlaps exactly. And when a counting problem hides a pattern across all values of $n$ at once, a generating function packs the entire infinite sequence of answers into a single algebraic expression you can manipulate like a polynomial.

In this chapter, you will learn to:

State and apply the pigeonhole principle, and use it to prove that collisions and repeats are unavoidable.
Sharpen it into the generalized pigeonhole principle to bound how full the fullest box must be.
Count distributions of identical objects with stars and bars, and recognize the integer-equation problems it solves.
Apply inclusion–exclusion to a union of any number of sets, and prove the formula counts each element exactly once.
Build a generating function for a counting problem and read off an answer as a coefficient.
Count derangements and surjections, two classic problems that fall to inclusion–exclusion.

Learning Paths

🏃 Fast Track: If you have seen pigeonhole before, skim 17.1–17.2 for the CS framing (hashing), then concentrate on 17.3 (stars and bars) and 17.4 (general inclusion–exclusion) — the two workhorses you will reuse most. Read the statements of the theorems in 17.5–17.6 and do the ⭐⭐ and ⭐⭐⭐ exercises.

📖 Standard Path: Read in order. Each 🔄 Check Your Understanding is a retrieval gate — answer it before continuing. Set up the integer equations in 17.3 yourself before reading the solutions, and write out the inclusion–exclusion proof in 17.4 from the strategy alone before checking ours.

🔬 Deep Dive: Generating functions (17.5) reward patience. After the chapter, use one to solve (not just verify) the Fibonacci recurrence, and prove the derangement recurrence $D_n = (n-1)(D_{n-1}+D_{n-2})$ combinatorially. The exercises flag these.

17.1 The pigeonhole principle (and hashing collisions)

Start with the cleanest possible version of the idea. You have 13 people in a room. Must two of them share a birth month? Yes — there are only 12 months, and 13 is more than 12, so at least two people were born in the same month. You did not need to know which month, or who; the conclusion is forced by the mismatch between 13 people and 12 months.

That is the entire content of the pigeonhole principle, named for its most homely illustration: if you have more pigeons than pigeonholes and every pigeon roosts in some hole, then some hole holds at least two pigeons.

The Pigeonhole Principle. If $n$ objects are placed into $m$ boxes and $n > m$, then at least one box contains at least two objects.

Equivalently, and this is the form that makes proofs flow: if a function $f \colon A \to B$ has $\lvert A\rvert > \lvert B\rvert$ (with $A$ and $B$ finite), then $f$ is not injective — two distinct inputs must map to the same output. (Injective functions were defined in Chapter 9; "injective" means no two inputs share an output. The pigeonhole principle says you simply cannot fit a large domain injectively into a small codomain.)

💡 Intuition: "Pigeons" are the things you are placing; "holes" are the categories they fall into. The whole skill is choosing what to make the pigeons and what to make the holes so that "more pigeons than holes" says something you care about. In the birthday example, people are pigeons and months are holes.

The proof is a single clean stroke of contradiction.

The strategy first. We want to show some box is crowded. Directly tracking where every object goes is hopeless — there are too many arrangements. Instead we assume the opposite of the conclusion (no box has two or more objects, i.e. every box has at most one) and show it forces too few objects to exist. The mismatch is the contradiction. This "assume everything is spread out, count, and overflow" pattern drives every pigeonhole proof.

Proof. Suppose, for contradiction, that $n$ objects are placed into $m$ boxes with $n > m$, yet no box contains two or more objects. Then every box contains at most one object. Summing over the $m$ boxes, the total number of objects is at most $1 + 1 + \cdots + 1 = m$. But the total number of objects is exactly $n$, so $n \le m$. This contradicts our assumption that $n > m$. Hence some box must contain at least two objects. $\blacksquare$

Now the payoff a programmer cares about.

🔗 Connection: why hash collisions are unavoidable. A hash function $h$ maps keys to slots: $h \colon K \to \{0, 1, \dots, m-1\}$, where $K$ is the set of possible keys and $m$ is the table size. If there are more possible keys than slots — and there always are; the set of all strings is infinite, while $m$ is some fixed array length — then $h$ cannot be injective. Two distinct keys $k_1 \ne k_2$ must satisfy $h(k_1) = h(k_2)$. That shared slot is a collision. No hash function, however brilliant, escapes this; the best you can do is spread keys evenly and handle the inevitable collisions gracefully. Chapter 26 builds hash tables — chaining, open addressing, load factors — directly on this guarantee.

Here is the principle expressed as a check in code. We are not avoiding collisions; we are detecting the one the pigeonhole principle promises must exist.

def find_collision(keys, m):
    """If len(keys) > m, the pigeonhole principle guarantees two keys
    share a slot under h(k) = k % m. Return such a pair, or None."""
    seen = {}                      # slot -> first key that landed there
    for k in keys:
        slot = k % m               # a toy hash function
        if slot in seen:
            return (seen[slot], k) # collision: two keys, same slot
        seen[slot] = k
    return None

print(find_collision([7, 14, 23, 100], m=5))
# Expected output:
# None

Walk the trace by hand — this discipline is exactly what the book insists on. $7 \bmod 5 = 2$ (store), $14 \bmod 5 = 4$ (store), $23 \bmod 5 = 3$ (store), $100 \bmod 5 = 0$ (store): four keys, four distinct slots, so the function returns None. That is not a contradiction of the principle — with only 4 keys and 5 slots we have $n \le m$, and pigeonhole makes no promise here. To force a collision we need $n > m$:

print(find_collision([7, 14, 23, 100, 12], m=5))
# Expected output:
# (7, 12)

Now there are 5 keys and 5 slots — still $n = m$, not $n > m$! But a collision can happen before the table is overfull, and indeed $7 \to 2$ and $12 \to 2$ collide. The lesson is exactly the logic of the theorem: $n > m$ guarantees a collision; $n \le m$ merely permits one. The principle gives a sufficient condition, not a necessary one.

⚠️ Common Pitfall: The pigeonhole principle tells you a collision must exist when $n > m$; it does not tell you collisions are rare when $n \le m$, and it does not tell you where the collision is. It is a pure existence result. Students often over-read it ("so with $\le m$ keys I'm safe") — you are not; you merely lack the guarantee of a forced collision. The Birthday Paradox (Chapter 20 territory) shows collisions become likely far below $n = m$.

The pigeonhole principle earns its reputation when the holes are not obvious — when you have to invent them. Here is the prototype of that creativity, a result that looks like it needs number theory but is pure pigeonhole.

🧩 Productive Struggle. Choose any 6 integers. Must two of them leave the same remainder when divided by 5? More pointedly: must some two of them have a difference that is divisible by 5? Try to find the pigeons and the holes before reading on.

Worked example (a forced divisible difference). Claim: among any 6 integers, some two have a difference divisible by 5.

The strategy first. "Two numbers differ by a multiple of 5" is the same as "two numbers have the same remainder mod 5," since $a - b$ is divisible by 5 exactly when $a$ and $b$ are congruent mod 5. There are only 5 possible remainders $\{0,1,2,3,4\}$ — those are our holes. The 6 integers are the pigeons. Six pigeons into five holes forces a repeat.

Proof. Each integer leaves one of exactly 5 remainders when divided by 5: $0, 1, 2, 3,$ or $4$. Treat the 6 chosen integers as objects and these 5 remainders as boxes, sorting each integer into the box of its remainder. With $6 > 5$, the pigeonhole principle guarantees two integers, say $a$ and $b$, fall in the same box — they share a remainder $r$. Then $a = 5q_1 + r$ and $b = 5q_2 + r$ for integers $q_1, q_2$, so $a - b = 5(q_1 - q_2)$, which is divisible by 5. $\blacksquare$

Notice the move: the integers themselves are not the holes — their remainders are. Whenever a pigeonhole problem mentions divisibility, "remainder mod $m$" is almost always the right set of holes. The only fact we used is that division by $m$ produces exactly $m$ possible remainders $\{0, 1, \dots, m-1\}$ — the modular arithmetic of Part IV will make a great deal more of this, but here it is just a tidy supply of holes.

🔄 Check Your Understanding 1. You have 367 people. Must two share a birthday (day of the year)? Identify the pigeons and the holes, and say why. 2. A drawer holds 10 red and 10 blue socks, unsorted in the dark. How many socks must you pull to guarantee a matching pair? What are the pigeons and holes? 3. Restate the pigeonhole principle as a statement about a function $f\colon A \to B$ failing to be injective.

Answers

Yes. Pigeons = 367 people; holes = 366 possible birthdays (including Feb 29). Since $367 > 366$, two people collide. 2. Three socks. Pigeons = the socks you pull; holes = the 2 colors. With 3 socks and 2 colors, $3 > 2$, so two share a color — a pair. (Two socks is not enough: you could draw one of each color.) 3. If $A$ and $B$ are finite and $\lvert A\rvert > \lvert B\rvert$, then $f$ is not injective: some two distinct elements of $A$ have the same image in $B$.

17.2 The generalized pigeonhole principle

The basic principle says "at least two." But often you want more: how crowded must the fullest box be? If 100 people walk into a room, the basic principle (100 > 12 months) guarantees two share a birth month. Surely we can say more than two. We can.

Spread 100 people as evenly as possible over 12 months and you get 8 in most months — but $12 \times 8 = 96 < 100$, leaving 4 people who must pile onto already-occupied months. So some month holds at least 9. The general principle makes this precise with a ceiling.

The Generalized Pigeonhole Principle. If $n$ objects are placed into $m$ boxes, then at least one box contains at least $\left\lceil \dfrac{n}{m} \right\rceil$ objects.

The ceiling $\lceil x \rceil$ (Chapter 9) is the smallest integer $\ge x$. For 100 people and 12 months, $\lceil 100/12 \rceil = \lceil 8.33\dots \rceil = 9$. The basic principle is the special case $n > m$, where $\lceil n/m \rceil \ge 2$.

The strategy first. Same engine as before: assume the conclusion fails, count, and overflow. If every box held fewer than $\lceil n/m \rceil$ objects — that is, at most $\lceil n/m \rceil - 1$ — then the total is at most $m(\lceil n/m \rceil - 1)$. We show that total is strictly less than $n$, which is impossible since the boxes hold all $n$ objects. The one technical step is the inequality $\lceil n/m \rceil < n/m + 1$.

Proof. Suppose, for contradiction, that every one of the $m$ boxes contains at most $\lceil n/m \rceil - 1$ objects. Then the total number of objects is at most $$m\left(\left\lceil \frac{n}{m}\right\rceil - 1\right).$$ Using the ceiling bound $\lceil n/m \rceil < n/m + 1$ (the ceiling never reaches a full integer above its argument), we get $$m\left(\left\lceil \frac{n}{m}\right\rceil - 1\right) < m\left(\frac{n}{m} + 1 - 1\right) = m \cdot \frac{n}{m} = n.$$ So the boxes together hold strictly fewer than $n$ objects — but they hold all $n$. Contradiction. Therefore at least one box contains at least $\lceil n/m\rceil$ objects. $\blacksquare$

A clean worked example, and then a famous one.

Worked example (load balancing). A load balancer distributes 1,000 incoming requests across 7 servers. No matter how it distributes them, some server handles at least how many requests? Compute $\lceil 1000/7 \rceil = \lceil 142.857\dots\rceil = 143$. So some server is guaranteed at least 143 requests — a hard lower bound on worst-case load that no balancing strategy can beat. This is exactly the kind of bound you cite when arguing about tail latency: you cannot make every server's load below $\lceil n/m \rceil$.

Worked example (the classic geometry one). Show that among any 5 points placed inside a unit square (side 1), some two are at distance at most $\frac{\sqrt{2}}{2}$.

The strategy first. The points are pigeons; we must invent holes. Cut the unit square into 4 smaller squares of side $\tfrac12$. With 5 points and 4 small squares, $\lceil 5/4\rceil = 2$, so some small square holds two points. The farthest two points in a square of side $\tfrac12$ can be is its diagonal, $\frac{\sqrt 2}{2}$.

Proof. Partition the unit square into four congruent sub-squares, each of side $\tfrac12$, by cutting at the midlines. There are 5 points and 4 sub-squares, so by the generalized pigeonhole principle some sub-square contains at least $\lceil 5/4 \rceil = 2$ of the points. Two points lying in the same sub-square of side $\tfrac12$ are separated by at most the length of that square's diagonal, which is $\sqrt{(\tfrac12)^2 + (\tfrac12)^2} = \sqrt{\tfrac12} = \frac{\sqrt 2}{2}$. Hence some two of the five points are within $\frac{\sqrt2}{2}$ of each other. $\blacksquare$

💡 Intuition: The hard part of a pigeonhole argument is almost never the principle — it is constructing the holes. In the geometry problem, the creative act was carving the square into four boxes so that "two points in one box" implied the distance bound. Train yourself to ask: what partition turns the thing I want into a "two-in-one-box" statement?

Here is the generalized principle in code, computing the guaranteed minimum and confirming it on an actual distribution.

from math import ceil

def guaranteed_fullest(n, m):
    """Minimum number the fullest of m boxes must hold for n objects."""
    return ceil(n / m)

def fullest_actual(distribution):
    """The actual max over a given list of box counts."""
    return max(distribution)

print(guaranteed_fullest(100, 12))          # 100 people, 12 months
print(fullest_actual([9, 8, 8, 8, 9, 8, 8, 8, 9, 9, 8, 8]))
# Expected output:
# 9
# 9

The second list sums to $9\cdot4 + 8\cdot8 = 36 + 64 = 100$ and its maximum is 9 — a distribution that meets the guaranteed bound exactly, showing the bound is tight (it cannot be raised to 10 in general).

🔄 Check Your Understanding 1. Among 30 students, at least how many share a birth month? Compute the guaranteed minimum. 2. How many cards must you draw from a standard 52-card deck to guarantee at least 3 of the same suit? 3. Why is the bound $\lceil n/m \rceil$ "tight" — what does it mean that we exhibited a distribution achieving it?

Answers

$\lceil 30/12\rceil = \lceil 2.5\rceil = 3$. At least 3 students share a month. 2. Holes = 4 suits; we want some hole to reach 3, i.e. $\lceil n/4\rceil \ge 3$, which first holds at $n = 9$ ($\lceil 9/4\rceil = 3$). With 8 cards you could have exactly 2 of each suit, so 9 is required. 3. Tight means the guarantee cannot be improved: there exists an arrangement in which the fullest box holds exactly $\lceil n/m\rceil$ and no more. Our 100-people distribution had maximum exactly 9, so "at least 9" cannot be strengthened to "at least 10."

17.3 Stars and bars: counting distributions

Permutations and combinations (Chapter 16) count arrangements and selections of distinct objects. But a huge class of real problems involves identical objects landing in distinct containers. How many ways can you distribute 10 identical software licenses among 4 distinct teams (a team may get zero)? How many ways can 100 identical requests be assigned to 7 servers? Combinations alone don't answer these — the objects are interchangeable, so "which license" is meaningless; only "how many per team" matters.

Concretely, count the ways to put 5 identical balls into 3 distinct boxes. List a few by their counts $(b_1, b_2, b_3)$: $(5,0,0)$, $(0,5,0)$, $(2,2,1)$, $(3,1,1)$, $(0,2,3)$, and so on. Enumerating all of them by hand is error-prone. There is a beautiful bijection that converts the problem into a combination you already know how to count.

The trick. Represent a distribution as a row of symbols: draw the balls as stars ($\star$) and use bars ($\mid$) to separate the boxes. For 5 balls in 3 boxes you need $3 - 1 = 2$ bars (two dividers create three regions). For instance:

$\star\star\mid\star\star\mid\star$ means 2 balls, 2 balls, 1 ball — the distribution $(2,2,1)$.
$\star\star\star\star\star\mid\mid$ means $(5,0,0)$.
$\mid\mid\star\star\star\star\star$ means $(0,0,5)$.

Every distribution corresponds to exactly one such row, and every row corresponds to exactly one distribution. So counting distributions = counting arrangements of stars and bars. The row has $5 + 2 = 7$ symbol positions; a distribution is fixed once you choose which 2 of the 7 positions hold the bars (the rest are stars). That is $\binom{7}{2} = 21$.

Stars and bars. The number of ways to distribute $n$ identical objects into $k$ distinct boxes (boxes may be empty) is $$\binom{n + k - 1}{k - 1} = \binom{n+k-1}{n}.$$ Equivalently, it is the number of non-negative integer solutions $(x_1, \dots, x_k)$ to the equation $$x_1 + x_2 + \cdots + x_k = n.$$

The two binomial forms are equal because $\binom{a}{b} = \binom{a}{a-b}$ (Chapter 16): you may count by choosing the $k-1$ bar positions or the $n$ star positions among $n+k-1$ slots.

The strategy first. We prove the formula by exhibiting a bijection between the things we want to count (distributions) and things we already count (binary strings with a fixed number of bars). A bijection (Chapter 9) shows two finite sets have equal size. The whole argument is: describe the encoding, then check it is reversible.

Proof. A distribution of $n$ identical objects into $k$ distinct boxes is determined by the counts $(x_1, \dots, x_k)$ with each $x_i \ge 0$ and $\sum x_i = n$. Encode such a distribution as a string of $n$ stars and $k-1$ bars: write $x_1$ stars, a bar, $x_2$ stars, a bar, …, a bar, then $x_k$ stars. This string has exactly $n + (k-1)$ symbols. Conversely, any string of $n$ stars and $k-1$ bars decodes to a unique distribution: read off the run-lengths of stars between consecutive bars (with empty runs giving $x_i = 0$). Encoding and decoding are inverse to each other, so the map is a bijection between distributions and such strings. A string is determined by choosing which $k-1$ of the $n+k-1$ positions are bars, of which there are $\binom{n+k-1}{k-1}$. Therefore the number of distributions equals $\binom{n+k-1}{k-1}$. $\blacksquare$

🔗 Connection: combinations with repetition. The same formula answers a question phrased differently: how many ways can you choose $n$ items from $k$ types, with repetition allowed and order irrelevant (a multiset of size $n$)? Picking $x_i$ copies of type $i$ is exactly distributing $n$ choices among $k$ type-boxes, so the count is again $\binom{n+k-1}{n}$. "Distribute identical objects into distinct boxes" and "choose with repetition from distinct types" are the same combinatorial problem wearing different clothes. (Chapter 16 counted selections without repetition as $\binom{n}{k}$; this is its with-repetition sibling.)

Worked example 1 (donut shop). A shop sells 4 kinds of donut. You buy a dozen (12 donuts), and you don't care about order. How many different boxes are possible? Choosing 12 donuts from 4 types with repetition: $n = 12$ choices, $k = 4$ types, so $\binom{12 + 4 - 1}{12} = \binom{15}{12} = \binom{15}{3} = \frac{15 \cdot 14 \cdot 13}{6} = 455$.

Worked example 2 (with a lower bound). How many non-negative integer solutions does $x_1 + x_2 + x_3 + x_4 = 20$ have if we require $x_1 \ge 3$? Don't re-derive a formula — substitute. Let $y_1 = x_1 - 3 \ge 0$. Then $y_1 + x_2 + x_3 + x_4 = 17$, an unconstrained non-negative problem: $\binom{17 + 4 - 1}{4 - 1} = \binom{20}{3} = \frac{20 \cdot 19 \cdot 18}{6} = 1140$.

⚠️ Common Pitfall: positive vs. non-negative. Stars and bars as stated allows empty boxes ($x_i \ge 0$). If every box must be non-empty ($x_i \ge 1$), first place one object in each box (using $k$ objects), then distribute the remaining $n - k$ freely: the count becomes $\binom{(n-k) + k - 1}{k-1} = \binom{n-1}{k-1}$. Always check whether zeros are allowed before reaching for a formula. Mixing these up is the single most common stars-and-bars error.

Here is the count in code, built from a combination function exactly like the one in your Toolkit.

from math import comb

def stars_and_bars(n, k):
    """Ways to distribute n identical objects into k distinct boxes,
    equivalently #non-negative integer solutions to x_1+...+x_k = n."""
    return comb(n + k - 1, k - 1)

print(stars_and_bars(5, 3))    # 5 balls, 3 boxes
print(stars_and_bars(12, 4))   # a dozen donuts, 4 types
# Expected output:
# 21
# 455

🔄 Check Your Understanding 1. How many ways can 8 identical pieces of candy be given to 3 children (a child may get none)? Set up the equation and compute. 2. How many ways if each child must get at least one piece? 3. You roll 5 indistinguishable dice. How many distinct outcomes (multisets of face values) are there? (Hint: 6 types, choose 5 with repetition.)

Answers

$x_1 + x_2 + x_3 = 8$, non-negative: $\binom{8+3-1}{3-1} = \binom{10}{2} = 45$. 2. Give each child one first ($3$ used), distribute $5$ more: $\binom{5+3-1}{2} = \binom{7}{2} = 21$ (equivalently $\binom{8-1}{3-1} = \binom72$). 3. Choose 5 from 6 types with repetition: $\binom{5+6-1}{5} = \binom{10}{5} = 252$.

17.4 General inclusion–exclusion

Chapter 15 introduced inclusion–exclusion for two and three sets: $$\lvert A \cup B\rvert = \lvert A\rvert + \lvert B\rvert - \lvert A\cap B\rvert.$$ You add the parts, then subtract the overlap you double-counted. For three sets you add the three singles, subtract the three pairwise overlaps, then add back the triple overlap (which you over-subtracted). The pattern of signs — add singles, subtract pairs, add triples, subtract quadruples — continues for any number of sets. Making that precise is the general inclusion–exclusion principle.

Why do you need it? Counting things that satisfy at least one of several conditions, where the conditions overlap. How many integers from 1 to 1000 are divisible by 2, 3, or 5? How many passwords violate at least one of several composition rules? How many permutations leave at least one element fixed? Each is a union of overlapping sets, and naïvely adding the pieces over-counts every element that lands in more than one set.

The General Inclusion–Exclusion Principle. For finite sets $A_1, A_2, \dots, A_n$, $$\left\lvert \bigcup_{i=1}^{n} A_i \right\rvert = \sum_{i} \lvert A_i\rvert \; - \sum_{i
The strategy first. We do not prove this by induction (though one can). The cleanest argument is to track a single element $x$ and show the right-hand side counts it exactly once if $x$ is in the union, and zero times if it isn't. The total count on the right is the sum over all elements of "how many times this element is counted," so if every element of the union is counted once, the right side equals the size of the union. The crisp identity that makes the bookkeeping work is the alternating sum of binomial coefficients $\sum_{t=0}^{s}(-1)^t\binom{s}{t} = 0$, which is itself a snapshot of the binomial theorem (Chapter 16).

Proof. Both sides count elements of $\bigcup_i A_i$; we show each element contributes equally to each side. Take any element $x$.

Case 1: $x \notin \bigcup_i A_i$. Then $x$ is in none of the $A_i$, so it appears in none of the intersections; it contributes $0$ to every term on the right, and $0$ to the left. Both sides agree.

Case 2: $x \in \bigcup_i A_i$. The left side counts $x$ exactly once. For the right side, let $s \ge 1$ be the number of the sets $A_i$ that contain $x$. Then $x$ lies in a $t$-fold intersection $A_{i_1} \cap \cdots \cap A_{i_t}$ exactly when all $t$ chosen sets are among those $s$ — which happens for $\binom{s}{t}$ of the $t$-fold terms. So $x$'s total contribution to the right side is $$\binom{s}{1} - \binom{s}{2} + \binom{s}{3} - \cdots + (-1)^{s+1}\binom{s}{s} = \sum_{t=1}^{s} (-1)^{t+1}\binom{s}{t}.$$ By the binomial theorem, $\sum_{t=0}^{s}(-1)^{t}\binom{s}{t} = (1 - 1)^s = 0$. Pulling out the $t = 0$ term (which is $\binom{s}{0} = 1$) gives $\sum_{t=1}^{s}(-1)^{t}\binom{s}{t} = -1$, hence $\sum_{t=1}^{s}(-1)^{t+1}\binom{s}{t} = 1$. So $x$ contributes exactly $1$ to the right side — matching the left. Since every element contributes equally to both sides, the two sides are equal. $\blacksquare$

💡 Intuition: Picture the over-counting as a tug-of-war. Adding all the singles counts an element in two sets twice; subtracting the pairs pulls it back, but an element in three sets gets subtracted too hard; adding the triples corrects again. The alternating sum $1 - 1 + 1 - \cdots$ for any element settles to exactly $1$. Inclusion–exclusion is the precise accounting that makes every element pay for itself once and only once.

Worked example (divisibility). How many integers in $\{1, 2, \dots, 1000\}$ are divisible by 2, 3, or 5? Let $A$, $B$, $C$ be the sets divisible by 2, 3, 5 respectively. Count by inclusion–exclusion, using $\lvert A\rvert = \lfloor 1000/d\rfloor$ for divisor $d$ (the floor counts multiples of $d$ up to 1000, from Chapter 9):

$$\lvert A\rvert = \lfloor 1000/2\rfloor = 500,\quad \lvert B\rvert = \lfloor 1000/3\rfloor = 333,\quad \lvert C\rvert = \lfloor 1000/5\rfloor = 200.$$

Pairwise intersections are multiples of the products (since 2, 3, 5 are pairwise coprime, "divisible by 2 and 3" means "divisible by 6"):

$$\lvert A\cap B\rvert = \lfloor 1000/6\rfloor = 166,\quad \lvert A\cap C\rvert = \lfloor 1000/10\rfloor = 100,\quad \lvert B\cap C\rvert = \lfloor 1000/15\rfloor = 66.$$

Triple intersection: multiples of $2\cdot3\cdot5 = 30$, so $\lvert A\cap B\cap C\rvert = \lfloor 1000/30\rfloor = 33$. Assemble:

$$\lvert A\cup B\cup C\rvert = (500 + 333 + 200) - (166 + 100 + 66) + 33 = 1033 - 332 + 33 = 734.$$

So 734 of the first 1000 positive integers are divisible by at least one of 2, 3, 5 — and therefore $1000 - 734 = 266$ are divisible by none of them (coprime to 30). That complementary count is itself a useful pattern, and it powers the next two sections.

Here is the three-set computation in code.

def inclusion_exclusion_3(A, B, C):
    """|A ∪ B ∪ C| from the sizes of all intersections, via inclusion-exclusion.
    Inputs are Python sets; we use len() of actual intersections to stay honest."""
    return (len(A) + len(B) + len(C)
            - len(A & B) - len(A & C) - len(B & C)
            + len(A & B & C))

upto = set(range(1, 1001))
A = {x for x in upto if x % 2 == 0}
B = {x for x in upto if x % 3 == 0}
C = {x for x in upto if x % 5 == 0}
print(inclusion_exclusion_3(A, B, C))
# Expected output:
# 734

🔄 Check Your Understanding 1. In the proof, where exactly did the binomial theorem get used, and what fact about $(1-1)^s$ did it give? 2. For two sets, the general formula collapses to which familiar identity? 3. Of the first 1000 positive integers, how many are divisible by none of 2, 3, 5? (Use the worked example.)

Answers

To evaluate $\sum_{t=0}^{s}(-1)^t\binom{s}{t}$ as $(1-1)^s = 0$; subtracting the $t=0$ term then forced the alternating sum from $t=1$ to equal $1$. 2. $\lvert A\cup B\rvert = \lvert A\rvert + \lvert B\rvert - \lvert A\cap B\rvert$ (singles minus the one pair; no higher terms). 3. $1000 - 734 = 266$.

17.5 Generating functions: counting with algebra

Every tool so far gives a number. Generating functions give something stranger and more powerful: they pack an entire sequence of counts into a single algebraic object, turning combinatorial questions into algebra you can do by hand. The slogan, due to Herbert Wilf, is that a generating function is "a clothesline on which we hang up a sequence of numbers for display."

Start concretely. Suppose you want to choose some subset of three distinct items $\{a, b, c\}$ and ask: how many subsets have size $0$, size $1$, size $2$, size $3$? You already know the answers are $\binom{3}{0}, \binom{3}{1}, \binom{3}{2}, \binom{3}{3} = 1, 3, 3, 1$. Watch what happens if we model "include or exclude" each item as the factor $(1 + x)$, where $x$ tracks "this item was chosen": $$(1 + x)(1 + x)(1 + x) = (1+x)^3 = 1 + 3x + 3x^2 + x^3.$$ The coefficient of $x^k$ is the number of ways to choose $k$ items. The algebra did the counting. Each factor $(1 + x)$ offered a choice — take the $1$ (exclude, contributing $x^0$) or the $x$ (include, contributing $x^1$) — and multiplying out the polynomial enumerated every combination of choices, sorting them by total size into powers of $x$. This is the entire idea.

Generating function. The ordinary generating function (OGF) of a sequence $a_0, a_1, a_2, \dots$ is the formal power series $$A(x) = \sum_{n=0}^{\infty} a_n x^n = a_0 + a_1 x + a_2 x^2 + \cdots.$$ Here $x$ is a formal symbol — we do not plug in numbers or worry about convergence; $x^n$ is just a label marking the $n$-th term. To count with a generating function, build $A(x)$ so that $a_n$ is the answer to your problem for size $n$, then read off the coefficient of $x^n$, written $[x^n]A(x)$.

Two building-block series do most of the work, and both deserve to be memorized:

Series	Closed form	What it models
$1 + x + x^2 + x^3 + \cdots$	$\dfrac{1}{1-x}$	one box that can hold any number ($0,1,2,\dots$) of identical items
$(1+x)^n = \sum_{k=0}^{n}\binom{n}{k}x^k$	$(1+x)^n$	$n$ distinct items, each in or out

The first is the geometric series (Chapter 11) read as a formal identity: $(1-x)\sum_{n\ge0}x^n = 1$, so $\sum_{n\ge0}x^n = \frac{1}{1-x}$. The factor $\frac{1}{1-x}$ is the generating function for "an unlimited supply of one thing."

Worked example (making change, small). In how many ways can you make $n$ cents using pennies (1¢) and nickels (5¢), order irrelevant? Pennies contribute $1 + x + x^2 + \cdots = \frac{1}{1-x}$ (any number of pennies); nickels contribute $1 + x^5 + x^{10} + \cdots = \frac{1}{1-x^5}$ (any number of nickels, each worth 5). The product $$\frac{1}{1-x}\cdot\frac{1}{1-x^5} = (1 + x + x^2 + \cdots)(1 + x^5 + x^{10} + \cdots)$$ has, as its coefficient of $x^n$, the number of ways to choose a count of pennies and a count of nickels summing to $n$ cents. For $n = 7$: the ways are $7$ pennies ($x^7 \cdot x^0$) or $2$ pennies $+1$ nickel ($x^2\cdot x^5$). The coefficient of $x^7$ is $2$ — matching the two ways. The generating function encodes the answer for every $n$ at once.

Worked example (stars and bars, recovered). Section 17.3 distributed $n$ identical objects into $k$ boxes. Each box can hold any number of objects, so each box contributes $\frac{1}{1-x}$, and $k$ boxes contribute $$\left(\frac{1}{1-x}\right)^k = \frac{1}{(1-x)^k} = \sum_{n=0}^{\infty}\binom{n+k-1}{k-1}x^n.$$ The coefficient of $x^n$ is $\binom{n+k-1}{k-1}$ — exactly the stars-and-bars formula. The generating function gives a second proof: the algebra of $\frac{1}{(1-x)^k}$ already knows how to distribute identical objects. (The expansion of $\frac{1}{(1-x)^k}$ is the generalized binomial series; you can take the coefficient formula as given here and verify it for small $k$ in the exercises.)

💡 Intuition: A generating function turns "combine independent choices and total them up" into "multiply polynomials." Multiplication of power series is the operation of forming all combinations of one term from each factor and grouping by the total exponent — which is precisely what counting a distribution does. Constraints become which powers of $x$ a factor is allowed to contribute: "at most 3 of this item" is the factor $1 + x + x^2 + x^3$; "an even number" is $1 + x^2 + x^4 + \cdots$.

Let's let the computer multiply two polynomials and read a coefficient — using only list arithmetic, so you see the convolution that power-series multiplication performs.

def poly_mult(p, q):
    """Multiply two polynomials given as coefficient lists (index = power)."""
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b      # x^i * x^j contributes to x^(i+j)
    return result

# pennies: 1 + x + ... + x^7 ; nickels: 1 + x^5  (enough terms to read [x^7])
pennies = [1, 1, 1, 1, 1, 1, 1, 1]      # coefficients of x^0..x^7
nickels = [1, 0, 0, 0, 0, 1]            # coefficients of x^0..x^5
ways = poly_mult(pennies, nickels)
print(ways[7])                          # coefficient of x^7
# Expected output:
# 2

The coefficient ways[7] is 2 because the only ways to reach $x^7$ are $x^7\cdot x^0$ (7 pennies) and $x^2\cdot x^5$ (2 pennies and 1 nickel) — the two ways to make 7¢.

Worked example (a constrained selection). A vending machine restocker must load $n$ items into a slot, choosing from three products under rules: an even number of waters (including zero), at most two sodas, and any number of juices. How many loadings total exactly $n$ items? Translate each rule into a factor, letting $x$ track item count:

waters, even count: $1 + x^2 + x^4 + \cdots = \dfrac{1}{1 - x^2}$,
sodas, at most two: $1 + x + x^2$,
juices, any count: $\dfrac{1}{1-x}$.

The generating function is the product $$G(x) = \frac{1}{1-x^2}\cdot(1 + x + x^2)\cdot\frac{1}{1-x},$$ and the number of valid loadings of size $n$ is $[x^n]G(x)$. For $n = 3$, multiply just enough low-order terms: $\frac{1}{1-x^2} = 1 + x^2 + \cdots$ and $\frac{1}{1-x} = 1 + x + x^2 + x^3 + \cdots$. The contributions to $x^3$ come from picking exponents from the three factors that sum to 3. Enumerating $(\text{water}, \text{soda}, \text{juice})$ exponents with water even, soda $\le 2$: $(0,0,3), (0,1,2), (0,2,1), (2,0,1), (2,1,0)$ — five loadings, so $[x^3]G(x) = 5$. The discipline is always the same: each rule is a factor; legal sizes are the exponents a factor may contribute; multiply and read the coefficient.

⚠️ Common Pitfall: When you "read off a coefficient," make sure you have kept enough terms of each factor. To extract $[x^n]$, every factor only needs its terms up to $x^n$ — higher terms can never contribute to $x^n$ — but dropping a needed low term silently undercounts. In code (below), make each coefficient list long enough to reach the power you want, then index it.

🔗 Connection: generating functions solve recurrences. Generating functions are not only a counting device; they are the systematic method for solving recurrence relations. In Chapters 18–19 you will meet the Fibonacci recurrence $F_n = F_{n-1} + F_{n-2}$. Multiplying its generating function out and doing partial fractions yields the closed form with the golden ratio — no guessing required. The same algebra that counts distributions here will, two chapters from now, hand you exact formulas for sequences. (We flag this method now; Chapter 19 develops it in full.)

🔄 Check Your Understanding 1. What is the generating function for "an unlimited supply of one kind of object," and what is its coefficient of $x^n$? 2. You may take 0, 1, or 2 of item A and any number of item B. Write the product of generating functions whose $x^n$ coefficient counts the ways to total $n$. 3. In the change example, what does the coefficient of $x^{10}$ count, and what is it?

Answers

$\frac{1}{1-x} = \sum_{n\ge0}x^n$; the coefficient of $x^n$ is $1$ (one way to take exactly $n$ copies). 2. $(1 + x + x^2)\cdot\frac{1}{1-x}$. 3. The number of ways to make 10¢ with pennies and nickels: $10$p, or $5$p$+1$n, or $0$p$+2$n — three ways, so $[x^{10}] = 3$.

17.6 Applications: derangements and surjection counts

Two classic problems show inclusion–exclusion at full power, and both have direct CS echoes (secret-santa assignments, counting onto functions / surjective hashes).

Derangements: the hat-check problem

$n$ guests check $n$ hats. A careless attendant returns the hats at random. What is the chance that no one gets their own hat back? A permutation in which no element stays in its original position is a derangement.

Derangement. A derangement of $n$ objects is a permutation with no fixed point — no object maps to its own position. The number of derangements of $n$ objects is written $D_n$.

Counting derangements directly is hard; counting permutations with at least one fixed point is an inclusion–exclusion natural, and derangements are the complement.

The strategy first. Let $A_i$ be the set of permutations that fix position $i$ (guest $i$ gets their own hat). A permutation has at least one fixed point iff it lies in $A_1 \cup \cdots \cup A_n$. We compute that union by inclusion–exclusion — and the intersection sizes are easy: fixing a specific set of $t$ positions leaves the other $n-t$ to be permuted freely, in $(n-t)!$ ways. Then $D_n = n! - \lvert\bigcup A_i\rvert$.

Derivation. There are $n!$ permutations total. For a fixed set of $t$ positions, the permutations fixing all of them number $(n-t)!$, and there are $\binom{n}{t}$ such sets, so the $t$-fold term of inclusion–exclusion is $\binom{n}{t}(n-t)!= \frac{n!}{t!}$. Therefore $$\left\lvert \bigcup_{i=1}^{n} A_i\right\rvert = \sum_{t=1}^{n} (-1)^{t+1}\binom{n}{t}(n-t)! = \sum_{t=1}^{n}(-1)^{t+1}\frac{n!}{t!}.$$ The derangements are the permutations with no fixed point, i.e. the complement: $$D_n = n! - \left\lvert\bigcup_i A_i\right\rvert = n!\left(1 - \sum_{t=1}^{n}\frac{(-1)^{t+1}}{t!}\right) = n!\sum_{t=0}^{n}\frac{(-1)^t}{t!}.$$

So $D_n = n!\left(1 - \frac{1}{1!} + \frac{1}{2!} - \frac{1}{3!} + \cdots + \frac{(-1)^n}{n!}\right)$. Compute a few: $D_1 = 0$, $D_2 = 1$, $D_3 = 2$, $D_4 = 9$, $D_5 = 44$.

🚪 Threshold Concept: the appearance of $1/e$. The fraction of permutations that are derangements is $$\frac{D_n}{n!} = \sum_{t=0}^{n}\frac{(-1)^t}{t!} \;\longrightarrow\; \sum_{t=0}^{\infty}\frac{(-1)^t}{t!} = e^{-1} \approx 0.3679.$$ The right-hand sum is the Taylor series for $e^{-1}$. So as $n$ grows, the probability that a random shuffle leaves nothing in place converges to $1/e$ — about $37\%$ — and it does so almost instantly: it is already $0.375$ at $n = 4$. A purely combinatorial question (no calculus in the setup) produces the constant $e$ in its answer. Moments where a constant from one branch of mathematics appears, unbidden, in another are worth pausing on: they are evidence that the structures are deeper than any one application. You will see $e$ again when we analyze hashing and randomized algorithms.

There is also a tidy recurrence, $D_n = (n-1)(D_{n-1} + D_{n-2})$, with $D_0 = 1$, $D_1 = 0$ — provable combinatorially (an exercise in the Deep Dive path) and convenient for computation without factorials.

def derangements(n):
    """D_n via the recurrence D_n = (n-1)(D_{n-1}+D_{n-2}), D_0=1, D_1=0."""
    if n == 0:
        return 1
    if n == 1:
        return 0
    prev2, prev1 = 1, 0            # D_0, D_1
    for k in range(2, n + 1):
        prev2, prev1 = prev1, (k - 1) * (prev1 + prev2)
    return prev1

print([derangements(n) for n in range(7)])
# Expected output:
# [1, 0, 1, 2, 9, 44, 265]

Trace the loop for $n=4$: start $(D_0,D_1)=(1,0)$. $k=2$: $D_2 = 1\cdot(0+1)=1$, state $(0,1)$. $k=3$: $D_3 = 2\cdot(1+0)=2$, state $(1,2)$. $k=4$: $D_4 = 3\cdot(2+1)=9$, state $(2,9)$. Returns $9$. ✓

Counting surjections (onto functions)

How many functions from an $n$-element set onto an $m$-element set are surjective (every target hit at least once)? Surjections were defined in Chapter 9; counting them is, again, an inclusion–exclusion over the "missed" targets.

The strategy first. Total functions from an $n$-set to an $m$-set number $m^n$ (each of $n$ inputs independently picks one of $m$ outputs — the product rule of Chapter 15). A function fails to be onto if it misses at least one target. Let $A_i$ be the functions that miss target $i$. We want $m^n$ minus the size of $\bigcup A_i$. Missing a specific set of $j$ targets leaves $m - j$ allowed outputs, giving $(m-j)^n$ functions, and there are $\binom{m}{j}$ such sets.

Derivation. By inclusion–exclusion on the targets missed, $$\#\text{surjections}(n \to m) = \sum_{j=0}^{m} (-1)^j \binom{m}{j}(m - j)^n.$$ (The $j=0$ term is $m^n$, all functions; each later term corrects for over- or under-counted "misses.")

Check on small numbers. Surjections from a 3-set onto a 2-set: $\sum_{j=0}^{2}(-1)^j\binom{2}{j}(2-j)^3 = \binom20 2^3 - \binom21 1^3 + \binom22 0^3 = 8 - 2 + 0 = 6$. Direct check: there are $2^3 = 8$ functions from $\{a,b,c\}$ to $\{1,2\}$; exactly $2$ are constant (everything to 1, or everything to 2) and thus not onto, leaving $8 - 2 = 6$ surjections. The formula agrees.

from math import comb

def surjections(n, m):
    """Number of surjective (onto) functions from an n-set to an m-set,
    by inclusion-exclusion over the missed targets."""
    return sum((-1)**j * comb(m, j) * (m - j)**n for j in range(m + 1))

print(surjections(3, 2))    # 3-set onto 2-set
print(surjections(4, 3))    # 4-set onto 3-set
# Expected output:
# 6
# 36

For $n=4, m=3$: $\binom30 3^4 - \binom31 2^4 + \binom32 1^4 - \binom33 0^4 = 81 - 3\cdot16 + 3\cdot1 - 0 = 81 - 48 + 3 = 36$. ✓

🔗 Connection. Surjection counts are the bridge to Stirling numbers of the second kind and to the analysis of how full a hash table gets — "how many functions hit every slot" is the onto question in disguise. The same inclusion–exclusion shape recurs whenever you count "configurations that cover everything," from test-suite coverage to coupon-collector problems (Chapter 20).

🔄 Check Your Understanding 1. Why is counting derangements done as a complement of an inclusion–exclusion, rather than directly? 2. Compute the probability that a random permutation of 5 items is a derangement, as a fraction $D_5/5!$, and compare it to $1/e$. 3. How many surjections are there from a 3-element set onto a 3-element set, and why must the answer be $3!$?

Answers

"No fixed point" is hard to enforce directly, but "at least one fixed point" is a clean union $\bigcup A_i$ whose intersection sizes are easy ($(n-t)!$); derangements are everything else. 2. $D_5/5! = 44/120 = 0.3667$, extremely close to $1/e \approx 0.3679$. 3. A surjection from a set onto another set of the same size must be a bijection (no target can be missed without forcing a collision that wastes a target), and the bijections of a 3-set number $3! = 6$. The formula gives $3^3 - 3\cdot2^3 + 3\cdot1^3 - 0 = 27 - 24 + 3 = 6$. ✓

Project Checkpoint: distribution and derangement counts for the Toolkit

The combinatorics.py module began in Chapter 15 and gained the permutation/combination machinery in Chapter 16 (perm_count, comb_count, permutations, combinations). This chapter adds the advanced counters — the ones that answer "how many distributions?" and "how many derangements?" — so that by the capstone your Toolkit can size any of the combinatorial spaces this book has met. None of these functions executes anything heavy; each is a direct transcription of a formula you proved above.

Add to dmtoolkit/combinatorics.py:

from math import comb

def stars_and_bars(n, k):
    """Ways to distribute n identical objects into k distinct boxes (boxes
    may be empty); equivalently #non-negative integer solutions of
    x_1 + ... + x_k = n. Proved by the stars-and-bars bijection (Ch. 17)."""
    return comb(n + k - 1, k - 1)

def derangement_count(n):
    """Number of permutations of n objects with NO fixed point (Ch. 17),
    via the recurrence D_n = (n-1)(D_{n-1}+D_{n-2}), D_0=1, D_1=0."""
    if n == 0:
        return 1
    a, b = 1, 0                       # D_0, D_1
    for k in range(2, n + 1):
        a, b = b, (k - 1) * (a + b)
    return b if n >= 1 else a

# Distribute 12 identical donuts among 4 types; count derangements of 5 hats.
print(stars_and_bars(12, 4))
print(derangement_count(5))
# Expected output:
# 455
# 44

stars_and_bars(12, 4) = comb(15, 3) = 455, the donut count from 17.3; derangement_count(5) = 44, the hat-check count from 17.6. These compose cleanly with Chapter 16's comb_count (both reduce to binomial coefficients) and feed the capstone in two ways: sizing a brute-force search space before you commit to enumerating it (Chapter 16's theme, sharpened), and — via derangement_count and the surjection idea — reasoning about the probability questions that open Chapter 20. When your code can count a configuration space exactly, you can decide whether a problem is tractable to enumerate, estimable by simulation, or only reachable by proof — the recurring judgment this book is training.

Toolkit so far: logic.py (Ch. 1–3), recurrences.py (begun Ch. 6), sets.py (Ch. 8), and a now-substantial combinatorics.py (Ch. 15–17): counting rules, permutations, combinations, distributions, and derangements. Chapter 18 turns to recurrences.py in earnest.

Summary

This chapter added four counting tools that reach past direct enumeration. Reference table:

Tool	Counts	Formula / statement	Key move
Pigeonhole principle	existence of a collision	$n > m$ objects/boxes $\Rightarrow$ some box has $\ge 2$	choose pigeons & holes; argue by contradiction
Generalized pigeonhole	fullest box	$n$ objects, $m$ boxes $\Rightarrow$ some box has $\ge \lceil n/m\rceil$	construct holes so "2 in one box" means something
Stars and bars	distributions of identical objects	non-neg. solutions of $x_1+\cdots+x_k=n$ is $\binom{n+k-1}{k-1}$	bijection: stars = objects, bars = dividers
Inclusion–exclusion (general)	size of a union of $n$ sets	$\bigl\lvert\bigcup A_i\bigr\rvert=\sum\lvert A_i\rvert-\sum\lvert A_i\cap A_j\rvert+\cdots$	each element counted once via $\sum(-1)^t\binom st=0$
Generating functions	a whole sequence at once	$A(x)=\sum a_n x^n$; answer $=[x^n]A(x)$	model choices as polynomial factors; multiply

Key results and special cases:

Pigeonhole ⇒ hash collisions are unavoidable whenever the key space exceeds the slot count (a function from a larger finite set to a smaller one cannot be injective). Sets up Chapter 26.
Stars and bars, two flavors: boxes may be empty $\to \binom{n+k-1}{k-1}$; every box non-empty $\to \binom{n-1}{k-1}$ (place one per box first). Same formula counts combinations with repetition (multisets) — choosing $n$ from $k$ types is $\binom{n+k-1}{n}$.
Inclusion–exclusion sign rule: $t$-fold intersections enter with sign $(-1)^{t+1}$; "divisible by at least one of $2,3,5$" in $1..1000$ is $734$.
Building-block generating functions: $\dfrac{1}{1-x}=\sum x^n$ (unlimited supply of one thing); $(1+x)^n=\sum\binom nk x^k$ ($n$ distinct items, in/out); $\dfrac{1}{(1-x)^k}=\sum\binom{n+k-1}{k-1}x^n$ (recovers stars and bars).
Derangements: $D_n=n!\sum_{t=0}^n\frac{(-1)^t}{t!}$; recurrence $D_n=(n-1)(D_{n-1}+D_{n-2})$; $D_n/n!\to 1/e$. Values $D_1..D_5 = 0,1,2,9,44$.
Surjections ($n$-set onto $m$-set): $\sum_{j=0}^m(-1)^j\binom mj(m-j)^n$.
Toolkit additions: stars_and_bars(n,k), derangement_count(n) in combinatorics.py.

These four tools, with Chapter 16's permutations and combinations, are the counting half of Part III. The probability of Chapters 20–21 rests on them: a probability is, at bottom, a count of favorable outcomes over a count of all outcomes.

Spaced Review

Retrieval keeps knowledge available. Answer from memory before checking.

(Ch. 16) Give the formula for $\binom{n}{k}$ in terms of factorials, and use it to compute $\binom{15}{3}$ — the number we needed for the dozen-donuts problem.

(Ch. 16) State the binomial theorem. Which special case of it did the inclusion–exclusion proof rely on, and what value does it give?

(Ch. 15) State the product rule and the sum rule. In counting all functions from an $n$-set to an $m$-set (the start of the surjection count), which rule gives $m^n$, and why?

(Ch. 15) Inclusion–exclusion for two sets was introduced in Chapter 15. Write it, and explain in one sentence how this chapter's general version reduces to it.

Answers

$\binom{n}{k} = \dfrac{n!}{k!\,(n-k)!}$. So $\binom{15}{3} = \dfrac{15!}{3!\,12!} = \dfrac{15\cdot14\cdot13}{6} = 455$. 2. The binomial theorem: $(x+y)^n = \sum_{k=0}^{n}\binom{n}{k}x^k y^{n-k}$. The proof used $x = -1, y = 1$, giving $\sum_{t=0}^{s}(-1)^t\binom{s}{t} = (1-1)^s = 0$. 3. The product rule: each of the $n$ inputs independently chooses one of $m$ outputs, and independent sequential choices multiply, giving $m\cdot m\cdots m = m^n$. 4. $\lvert A\cup B\rvert = \lvert A\rvert + \lvert B\rvert - \lvert A\cap B\rvert$. The general formula's terms beyond the pairwise one all involve intersections of $\ge 3$ sets, which are empty/absent when there are only two sets, leaving exactly singles minus the single pair.

What's Next

You can now count arrangements, selections, distributions, unions with overlaps, and entire sequences-of-counts via generating functions — and you have met, in the derangement's $1/e$, the first hint that counting and the continuous constants of analysis are secretly connected. The natural next question is dynamic: many quantities are defined not by a closed formula but by how each value depends on earlier ones — the way $F_n$ depends on $F_{n-1}$ and $F_{n-2}$. Those are recurrence relations, and Chapter 18 shows how to model problems with them (the Tower of Hanoi, tilings, and our running Fibonacci thread) and begins the systematic theory of solving them — a theory whose most powerful method, you have just glimpsed, is the generating function.