Chapter 15: The Basics of Counting

DataField.Dev

38 min read

How many passwords of length 8 are there over the printable ASCII characters? How many distinct

Prerequisites

8

Learning Objectives

Apply the product rule to count sequences of independent choices, and justify why you multiply.
Apply the sum rule to count over disjoint cases, and recognize when cases are genuinely disjoint.
Use inclusion-exclusion to count the union of two or three overlapping sets without double-counting.
Apply the division rule to count when each object has been over-counted a fixed number of times.
Size an input space or keyspace for a real program, and convert that size into a security or feasibility argument.

In This Chapter

Overview
Learning Paths
15.1 Why count? Passwords, test cases, and states
15.2 The product rule
15.3 The sum rule
15.4 Inclusion-exclusion: counting overlaps correctly
15.5 The division rule
15.6 Counting the size of an input space
Project Checkpoint: starting combinatorics.py
Summary
Spaced Review
What's Next

Exercises Quiz Case Study 01 Case Study 02 Key Takeaways Further Reading

Chapter 15: The Basics of Counting

"The art of counting is the art of not listing." — a teaching maxim of combinatorics

Overview

How many passwords of length 8 are there over the printable ASCII characters? How many distinct states can a 32-bit register hold? How many test cases would you need to exhaustively check a function that takes three boolean flags and a byte? Every one of these is a counting question, and every one of them has an answer you can compute in seconds — if you never try to list the possibilities.

That "if" is the whole subject. The naive way to count is to enumerate: write out every option and tally them. For a 4-digit PIN that almost works (you could, in principle, list all 10,000). For an 8-character password it is hopeless — there are more than six quadrillion of them, and a program that tried to list them would outlast your career. Combinatorics is the branch of mathematics that counts without listing, by recognizing the structure of the choices and turning that structure into arithmetic. It is the foundation of algorithm analysis (how many operations does this loop run?), cryptography (how long would a brute-force attack take?), probability (how many outcomes are there?), and data structures (how many distinct hashes can this function produce?).

This chapter builds the four counting rules that everything else rests on. They sound almost too simple to need names — multiply when you make a sequence of choices; add when you split into cases; subtract the overlap; divide out the over-count — but applied carefully they let you size spaces with billions of elements on the back of an envelope. The discipline is entirely in modeling: deciding, for a given problem, which rule applies and why. Get the model right and the arithmetic is trivial. Get it wrong and you will confidently compute a number that is off by orders of magnitude. We will spend most of our energy on the modeling.

In this chapter, you will learn to:

Count a sequence of independent choices with the product rule, and explain why multiplication is the right operation.
Count across mutually exclusive cases with the sum rule, and check that the cases really are disjoint.
Count the size of a union of overlapping sets with inclusion-exclusion, for two sets and three.
Recognize over-counting and correct it with the division rule.
Size a real input space or password keyspace, and turn that count into a feasibility argument — the same reasoning that decides whether a brute-force attack is practical.

Learning Paths

🏃 Fast Track: If multiplication-and-addition counting already feels natural, skim 15.2 and 15.3, then concentrate on 15.4 (inclusion-exclusion) and 15.5 (the division rule), which are where careful counters and careless ones part ways. Do the ⭐⭐ and ⭐⭐⭐ exercises and the keyspace material in 15.6.

📖 Standard Path: Read in order. Work every 🔄 Check Your Understanding before continuing, and try each 🧩 Productive Struggle before reading the resolution. The four rules build on each other.

🔬 Deep Dive: After the chapter, prove the two-set and three-set inclusion-exclusion formulas from the definition of set cardinality, then look ahead to Chapter 17, where inclusion-exclusion is generalized to $n$ sets. Try to guess the general pattern before you get there.

15.1 Why count? Passwords, test cases, and states

Start with a question a security engineer actually has to answer: is this password policy strong enough? Suppose a system requires passwords that are exactly 8 characters long, each character drawn from the 26 lowercase letters. An attacker who has stolen the password database wants to try every possibility ("brute force"). How many must they try in the worst case?

You could imagine listing them: aaaaaaaa, aaaaaaab, aaaaaaac, …. You would never finish. But you do not need to list them to count them. There are 8 positions; each position is one of 26 choices, made independently of the others. The total is $$26 \times 26 \times 26 \times 26 \times 26 \times 26 \times 26 \times 26 = 26^8 = 208{,}827{,}064{,}576,$$ a bit over 200 billion. At a million guesses per second, the worst case is about 58 hours — crackable over a long weekend. That single number, computed without listing anything, is the security analysis. Change the policy to allow all 95 printable ASCII characters and the count jumps to $95^8 \approx 6.6 \times 10^{15}$, which at the same rate would take over 200 years. Counting is the difference between a policy you can defend and one you cannot.

The same skill appears everywhere a programmer looks:

Test coverage. A function takes three independent boolean flags. "Exhaustive" testing means covering every combination — and there are $2 \times 2 \times 2 = 8$ of them, few enough to test all. Add a parameter that is one byte (256 values) and exhaustive testing needs $8 \times 256 = 2048$ cases — still feasible. Add a 32-bit integer and exhaustive testing is off the table; you have just counted your way to the conclusion that you need a smarter testing strategy.
State spaces. A protocol's state machine, a game board, a configuration file: knowing how many states exist tells you whether you can explore them all (model checking) or must sample.
Data structures. How many distinct values can a hash function output? How many keys can a B-tree node hold? These are counting questions whose answers shape performance.

💡 Intuition: Counting is compression. The set of all 8-character lowercase passwords is huge, but its description is tiny: "8 independent choices, 26 each." Combinatorics is the art of reading a small description and extracting the large number it implies — never building the set itself.

Throughout this chapter we will lean on one idea you already own from Chapter 8: the cardinality of a finite set, written $\lvert S \rvert$, is the number of elements in it.

🔗 Connection. In Chapter 8 you defined sets, the Cartesian product $A \times B$ (the set of ordered pairs $(a, b)$ with $a \in A$ and $b \in B$), and finite cardinality $\lvert S \rvert$. This chapter is, at its heart, the study of how cardinality behaves under set operations: products, unions, and quotients. Every counting rule below is a fact about $\lvert \cdot \rvert$. Keep the set-operations picture in mind — it is the rigorous skeleton under the arithmetic.

Counting principles are collectively called counting principles for a reason: they are the axioms of enumeration, the small set of rules from which every formula in Chapters 16 and 17 is derived. Learn these five pages well and the rest of Part III is bookkeeping.

🔄 Check Your Understanding 1. Without listing, how many 4-digit PINs (digits 0-9, repeats allowed) are there? 2. A function takes two boolean flags and a value that is one of 5 enum cases. How many distinct input combinations are there? Is exhaustive testing feasible?

Answers

Four positions, 10 choices each: $10^4 = 10{,}000$. 2. $2 \times 2 \times 5 = 20$ combinations — easily exhaustive. (Both use the product rule of §15.2, which we are about to make precise.)

15.2 The product rule

We have been multiplying without justifying it. Let's fix that, because the product rule is the workhorse of all counting and the place where most modeling errors begin.

Here is the smallest interesting example. A diner offers a fixed-price meal: choose one of 3 appetizers and one of 4 main courses. How many different meals are possible? Draw it as a two-stage choice. For each of the 3 appetizers, there are 4 possible mains, giving a little fan of 4 meals. There are 3 such fans, so the total is $3 + 3 + 3 + 3$ — no, wait: 4 meals per appetizer, 3 appetizers, $4 \times 3 = 12$ meals. You can see all 12 in a grid with 3 rows (appetizers) and 4 columns (mains): every cell is a distinct meal, and a grid with 3 rows and 4 columns has $3 \times 4 = 12$ cells.

That grid is the whole idea. A two-stage choice is a Cartesian product, and the number of cells in a rectangle is the product of its side lengths.

The Product Rule. Suppose a procedure can be broken into a sequence of $k$ steps, where step $i$ can be performed in $n_i$ ways regardless of how the earlier steps were performed. Then the number of ways to perform the whole procedure is $$n_1 \times n_2 \times \cdots \times n_k.$$ Equivalently, in set language: $\lvert A_1 \times A_2 \times \cdots \times A_k \rvert = \lvert A_1 \rvert \cdot \lvert A_2 \rvert \cdots \lvert A_k \rvert.$

The phrase in italics is the part everyone forgets, and it is the entire content of the rule. The number of ways to do step $i$ must not depend on which choice you made at earlier steps — only on the fact that you have arrived at step $i$. (The set of options may change as long as its size does not.) When that holds, the choices are "independent" in the counting sense and you may multiply.

💡 Intuition: Think of it as a tree. The root branches $n_1$ ways; each of those nodes branches $n_2$ ways; and so on. The number of leaves — complete sequences of choices — is the product of the branching factors, provided every node at a given depth has the same number of children. That proviso is exactly the "regardless of earlier steps" condition.

Why is the rule true? It follows from the definition of the Cartesian product by induction on $k$, but the two-step case carries all the insight.

The strategy first. To count $\lvert A \times B \rvert$, group the pairs by their first coordinate. For each fixed $a \in A$, the pairs $(a, b)$ with that first coordinate are in one-to-one correspondence with the elements $b \in B$, so there are exactly $\lvert B \rvert$ of them. The pairs split into $\lvert A \rvert$ such groups, each of the same size $\lvert B \rvert$, and equal-sized groups are exactly what multiplication counts.

Proof (two-step product rule). Let $A$ and $B$ be finite sets. For each $a \in A$, define $G_a = \{(a, b) \mid b \in B\}$, the set of pairs in $A \times B$ whose first coordinate is $a$. Two observations. First, the map $b \mapsto (a, b)$ is a bijection from $B$ to $G_a$ (it is injective because pairs are equal only when both coordinates match, and surjective by the definition of $G_a$), so $\lvert G_a \rvert = \lvert B \rvert$. Second, every pair in $A \times B$ lies in exactly one $G_a$ (namely the one for its first coordinate), so the sets $\{G_a \mid a \in A\}$ partition $A \times B$ into $\lvert A \rvert$ disjoint blocks, each of size $\lvert B \rvert$. The size of a set partitioned into $\lvert A \rvert$ disjoint blocks of size $\lvert B \rvert$ each is $\underbrace{\lvert B \rvert + \cdots + \lvert B \rvert}_{\lvert A \rvert \text{ terms}} = \lvert A \rvert \cdot \lvert B \rvert$. Hence $\lvert A \times B \rvert = \lvert A \rvert \cdot \lvert B \rvert$. The general $k$-step rule follows by induction on $k$, treating a $k$-tuple as a pair (a $(k-1)$-tuple, a final element). $\blacksquare$

Worked example: passwords with and without repetition

Return to passwords, now with a twist that exposes the rule's fine print.

(a) Repetition allowed. Strings of length 8 over a 26-letter alphabet. Each of the 8 positions is an independent choice among all 26 letters — the choice at position 5 does not constrain position 6 — so by the product rule the count is $26^8 = 208{,}827{,}064{,}576$.

(b) No repeated letter. Now require all 8 letters to be distinct. The first position has 26 choices. The second cannot reuse the first letter, so it has 25 choices — and crucially, it has exactly 25 no matter which letter went first. The third has 24, and so on. The product rule still applies (the size of the option set is the same at each step, even though the set itself shifts), and the count is $$26 \times 25 \times 24 \times 23 \times 22 \times 21 \times 20 \times 19 = 62{,}990{,}928{,}000.$$ That descending product — $n$ choices, then $n-1$, then $n-2$, for $k$ steps — counts ordered selections without repetition. It will get a name and a notation, $P(n, k)$, in Chapter 16; for now notice it falls straight out of the product rule, because the number of choices at each step is fixed even though the available letters change.

⚠️ Common Pitfall: hidden dependence. The product rule fails the instant the number of options at a step depends on earlier choices. "Choose a 2-letter string whose two letters are in alphabetical order" is not $26 \times 26$ and not $26 \times 25$: after picking the first letter, the number of valid second letters depends on which first letter you picked (a leaves 25 options, z leaves 0). When the count-per-step varies with history, you must split into cases (§15.3), use a different technique, or — often — count something easier and divide (§15.5). Always ask: "Does the number of choices here depend on what I chose before?" If yes, do not multiply.

Here is the product rule in code, framed exactly as a programmer meets it — sizing the space of configurations.

def product_rule(*step_counts):
    """Number of outcomes when each step i has step_counts[i] independent options."""
    total = 1
    for n in step_counts:
        total *= n
    return total

# A config: log level (5 options), region (3), and a boolean "verbose" flag.
print(product_rule(5, 3, 2))
# Expected output:
# 30

🔄 Check Your Understanding 1. A vehicle license plate is 3 uppercase letters followed by 4 digits. How many plates are possible? 2. How many bytes are there (8 bits, each 0 or 1)? How many distinct values can a 32-bit register hold? 3. Why is "the number of 8-letter strings with all distinct letters" still a product even though the available letters shrink at each step?

Answers

$26^3 \times 10^4 = 17{,}576 \times 10{,}000 = 175{,}760{,}000$. 2. A byte: $2^8 = 256$ values. A 32-bit register: $2^{32} = 4{,}294{,}967{,}296$ values. 3. Because the product rule needs the number of choices at each step to be constant regardless of history (26, then 25, then 24, …), not the identity of the choices; here the count at each step is fixed, so we may multiply.

15.3 The sum rule

The product rule counts a sequence of choices ("do this, then that"). The sum rule counts a split into cases ("do this, or do that"). Switching "and" for "or" switches multiplication for addition — but only under a condition just as important as the product rule's, and just as easy to violate.

Concrete first. A team is choosing one representative, who must be either one of the 5 backend engineers or one of the 3 designers — and no one is both. How many choices? You are not making two choices in sequence; you are making one choice that falls into one of two separate pools. Lay the two pools side by side: 5 people in one, 3 in the other, none shared. The total number of people you could pick is simply $5 + 3 = 8$.

The Sum Rule. If a task can be done in one of several mutually exclusive ways — way 1 in $n_1$ ways, way 2 in $n_2$ ways, …, way $k$ in $n_k$ ways, where no outcome is counted under more than one way — then the number of ways to do the task is $$n_1 + n_2 + \cdots + n_k.$$ In set language: if $A_1, A_2, \dots, A_k$ are pairwise disjoint finite sets (every $A_i \cap A_j = \emptyset$ for $i \ne j$), then $\lvert A_1 \cup A_2 \cup \cdots \cup A_k \rvert = \lvert A_1 \rvert + \lvert A_2 \rvert + \cdots + \lvert A_k \rvert.$

The load-bearing word is mutually exclusive (equivalently, the sets are disjoint). If an outcome could be counted under two different cases, plain addition counts it twice, and your total is too big. Repairing that double-count is exactly what §15.4 is about; the sum rule itself is the clean case where the cases genuinely do not overlap.

💡 Intuition: Product rule = "and" = multiply (a grid). Sum rule = "or" = add (two separate piles). The grid combines choices along independent axes; the piles are alternatives you choose between. Whenever you write "and" in your description, look for a product; whenever you write "or," look for a sum — then check the proviso (independence for products, disjointness for sums).

Often the rules work together. Counting frequently goes: split into disjoint cases (sum), then count each case as a sequence of choices (product). That two-step pattern — sum of products — solves a large fraction of all counting problems.

Worked example: passwords of length 6 or 7 or 8

Many real password policies set a range of lengths. Count the lowercase passwords whose length is 6, 7, or 8.

A password cannot have two different lengths at once, so the three cases — "length 6," "length 7," "length 8" — are mutually exclusive. The sum rule says add the three counts; each count is a product: $$\underbrace{26^6}_{\text{length 6}} + \underbrace{26^7}_{\text{length 7}} + \underbrace{26^8}_{\text{length 8}} = 308{,}915{,}776 + 8{,}031{,}810{,}176 + 208{,}827{,}064{,}576 = 217{,}167{,}790{,}528.$$ Notice the shape: a sum (over the disjoint length-cases) of products (the choices within each length). That is the canonical combination of the two rules.

🧩 Productive Struggle. How many bit strings of length 8 start with 1 or end with 00? Try it with the sum rule as stated, then check your answer against the resolution below. (Hint: are the two cases mutually exclusive?)

Resolution

The two cases are not mutually exclusive — a string like 10000000 both starts with 1 and ends with 00, so the plain sum rule does not apply. "Start with 1": the first bit is fixed, the other 7 are free, $2^7 = 128$. "End with 00": the last two bits fixed, the other 6 free, $2^6 = 64$. "Both": first bit and last two bits fixed, the middle 5 free, $2^5 = 32$. Adding $128 + 64 = 192$ double-counts the 32 overlapping strings, so the correct answer is $128 + 64 - 32 = 160$. You have just rediscovered inclusion-exclusion — the subject of the next section — by noticing the sum rule's proviso fail.

def sum_rule(*case_counts):
    """Number of outcomes across mutually EXCLUSIVE cases. Caller must ensure disjointness."""
    return sum(case_counts)

# Lowercase passwords of length 6, 7, or 8 (lengths are mutually exclusive).
print(sum_rule(26**6, 26**7, 26**8))
# Expected output:
# 217167790528

🔄 Check Your Understanding 1. A menu lets you order one item that is either a coffee (8 kinds) or a tea (5 kinds). How many single-drink orders are possible? Which rule, and why? 2. In the bit-string struggle above, what specifically went wrong with the plain sum rule, and what quantity fixed it?

Answers

$8 + 5 = 13$, by the sum rule: the choice is one drink from two disjoint pools (no drink is both a coffee and a tea). 2. The two cases overlapped (strings that both start 1 and end 00), so addition counted those strings twice; subtracting the size of the overlap ($2^5 = 32$) corrected it.

15.4 Inclusion-exclusion: counting overlaps correctly

The sum rule demands disjoint cases. Reality rarely cooperates: the set of users who opened the app last week and the set who opened it this week overlap; the integers divisible by 2 and those divisible by 3 overlap (at the multiples of 6); strings that start one way and end another way overlap. When the cases overlap, adding their sizes double-counts everything in the overlap. Inclusion-exclusion is the bookkeeping that fixes the double-count.

Two sets

Picture two overlapping circles — a Venn diagram. To count the elements in either circle, add the two circle sizes; but the lens-shaped overlap got added twice (once from each circle), so subtract it once to put it back to a single count.

Inclusion-Exclusion (two sets). For any finite sets $A$ and $B$, $$\lvert A \cup B \rvert = \lvert A \rvert + \lvert B \rvert - \lvert A \cap B \rvert.$$

The strategy first. Split the union into three disjoint regions — "$A$ only," "$B$ only," and "both" — apply the sum rule to those genuinely disjoint pieces, then re-express the three region counts in terms of the quantities we can actually measure ($\lvert A \rvert$, $\lvert B \rvert$, $\lvert A \cap B \rvert$). The double-subtraction bookkeeping falls out of the algebra.

Proof. Partition $A \cup B$ into three pairwise-disjoint pieces: $A \setminus B$ (in $A$ only), $B \setminus A$ (in $B$ only), and $A \cap B$ (in both). These are disjoint and their union is $A \cup B$, so by the sum rule $$\lvert A \cup B \rvert = \lvert A \setminus B \rvert + \lvert B \setminus A \rvert + \lvert A \cap B \rvert.$$ Now $A$ itself partitions into $A \setminus B$ and $A \cap B$, so $\lvert A \setminus B \rvert = \lvert A \rvert - \lvert A \cap B \rvert$; symmetrically $\lvert B \setminus A \rvert = \lvert B \rvert - \lvert A \cap B \rvert$. Substituting, $$\lvert A \cup B \rvert = \big(\lvert A \rvert - \lvert A \cap B \rvert\big) + \big(\lvert B \rvert - \lvert A \cap B \rvert\big) + \lvert A \cap B \rvert = \lvert A \rvert + \lvert B \rvert - \lvert A \cap B \rvert.$$ $\blacksquare$

Notice that the sum rule is just the special case $\lvert A \cap B \rvert = 0$: when the sets are disjoint, the correction term vanishes and inclusion-exclusion collapses back to addition. The sum rule is not a separate fact so much as the no-overlap corner of this one.

Worked example: divisibility counting

How many integers from 1 to 100 are divisible by 2 or by 5?

Let $A$ be the multiples of 2 in that range and $B$ the multiples of 5. Counting multiples of $d$ up to $N$ is itself a tidy use of the floor function: there are $\lfloor N / d \rfloor$ of them.

$\lvert A \rvert = \lfloor 100/2 \rfloor = 50$ (multiples of 2).
$\lvert B \rvert = \lfloor 100/5 \rfloor = 20$ (multiples of 5).
$A \cap B$ is the numbers divisible by both 2 and 5, i.e. divisible by their least common multiple, $10$: $\lvert A \cap B \rvert = \lfloor 100/10 \rfloor = 10$.

By inclusion-exclusion, $\lvert A \cup B \rvert = 50 + 20 - 10 = 60$. So 60 of the first 100 integers are divisible by 2 or 5 — and, as a bonus, $100 - 60 = 40$ are divisible by neither, a "count the complement" move we will use again in §15.6.

💡 Intuition: "Divisible by both $a$ and $b$" means "divisible by $\operatorname{lcm}(a, b)$." When $a$ and $b$ share no common factor, that lcm is just $ab$ — which is why, for coprime divisors, the overlap count is the clean product $\lfloor N/(ab) \rfloor$. (We will make divisibility and lcm rigorous in Part IV; here we only need the count.)

Three sets

With three overlapping circles the bookkeeping has one more layer. Add the three singles; you have double-counted each pairwise overlap, so subtract all three pairwise intersections; but now the triple overlap — added 3 times, subtracted 3 times — has been removed entirely, so add it back once.

Inclusion-Exclusion (three sets). For any finite sets $A$, $B$, $C$, $$\lvert A \cup B \cup C \rvert = \lvert A \rvert + \lvert B \rvert + \lvert C \rvert > - \lvert A \cap B \rvert - \lvert A \cap C \rvert - \lvert B \cap C \rvert > + \lvert A \cap B \cap C \rvert.$$

The alternating pattern — add singles, subtract pairs, add the triple — is no accident; it continues for any number of sets (add singles, subtract pairs, add triples, subtract quadruples, …), and Chapter 17 states and proves the general $n$-set version. For now, two and three sets cover the vast majority of practical cases.

The strategy first. Rather than re-partition into seven Venn regions, derive the three-set formula by applying the two-set rule twice: treat $A \cup B$ as a single set, expand $\lvert (A \cup B) \cup C \rvert$, then distribute the intersection over the union. The algebra does the counting for us.

Proof. Apply the two-set rule with $(A \cup B)$ in the role of the first set and $C$ the second: $$\lvert A \cup B \cup C \rvert = \lvert A \cup B \rvert + \lvert C \rvert - \lvert (A \cup B) \cap C \rvert.$$ Expand $\lvert A \cup B \rvert = \lvert A \rvert + \lvert B \rvert - \lvert A \cap B \rvert$ by the two-set rule. For the last term, distribute (a set identity from Chapter 8): $(A \cup B) \cap C = (A \cap C) \cup (B \cap C)$, so by the two-set rule again, $$\lvert (A \cup B) \cap C \rvert = \lvert A \cap C \rvert + \lvert B \cap C \rvert - \lvert (A \cap C) \cap (B \cap C) \rvert = \lvert A \cap C \rvert + \lvert B \cap C \rvert - \lvert A \cap B \cap C \rvert,$$ using $(A \cap C) \cap (B \cap C) = A \cap B \cap C$. Substituting both expansions, $$\lvert A \cup B \cup C \rvert = \lvert A \rvert + \lvert B \rvert - \lvert A \cap B \rvert + \lvert C \rvert - \big(\lvert A \cap C \rvert + \lvert B \cap C \rvert - \lvert A \cap B \cap C \rvert\big),$$ which rearranges to the claimed formula. $\blacksquare$

Worked example: divisible by 2, 3, or 5

How many integers from 1 to 100 are divisible by 2, 3, or 5?

Let $A, B, C$ be the multiples of 2, 3, 5 respectively. Singles: $\lfloor 100/2 \rfloor = 50$, $\lfloor 100/3 \rfloor = 33$, $\lfloor 100/5 \rfloor = 20$. Pairs use the lcm of each divisor pair (all three pairs here are coprime, so lcm is the product): $\lfloor 100/6 \rfloor = 16$, $\lfloor 100/10 \rfloor = 10$, $\lfloor 100/15 \rfloor = 6$. Triple: divisible by $\operatorname{lcm}(2,3,5) = 30$, so $\lfloor 100/30 \rfloor = 3$. Therefore $$\lvert A \cup B \cup C \rvert = (50 + 33 + 20) - (16 + 10 + 6) + 3 = 103 - 32 + 3 = 74.$$ 74 integers up to 100 are divisible by at least one of 2, 3, 5 — so 26 are divisible by none of them.

🔗 Connection. "Count the integers up to $N$ divisible by none of a set of primes" is precisely the engine of the Sieve of Eratosthenes and of formulas for Euler's totient $\phi(n)$ — tools you will build in Part IV on the way to RSA. Inclusion-exclusion is how those counts are derived. The habit you are forming now (singles minus pairs plus triples) reappears as a cornerstone of number theory.

Here is inclusion-exclusion for two and three sets, computed directly on Python sets so you can see the formula agree with a brute-force len(a | b).

def incl_excl_2(a, b):
    """|A ∪ B| via inclusion-exclusion, for finite Python sets a, b."""
    return len(a) + len(b) - len(a & b)

def incl_excl_3(a, b, c):
    """|A ∪ B ∪ C| via inclusion-exclusion (add singles, sub pairs, add triple)."""
    return (len(a) + len(b) + len(c)
            - len(a & b) - len(a & c) - len(b & c)
            + len(a & b & c))

div2 = {n for n in range(1, 101) if n % 2 == 0}
div3 = {n for n in range(1, 101) if n % 3 == 0}
div5 = {n for n in range(1, 101) if n % 5 == 0}
print(incl_excl_2(div2, div5), incl_excl_3(div2, div3, div5))
# Expected output:
# 60 74

🔄 Check Your Understanding 1. In a class, 18 students take Python, 15 take Java, and 7 take both. How many take at least one of the two languages? 2. Why does inclusion-exclusion add back the triple intersection in the three-set formula? 3. What does the two-set formula become when $A$ and $B$ are disjoint, and what rule is that?

Answers

$18 + 15 - 7 = 26$. 2. Each element of $A \cap B \cap C$ is added 3 times (in the singles) and subtracted 3 times (in the pairs), netting to 0 — so it must be added once more to count it exactly once. 3. It becomes $\lvert A \cup B \rvert = \lvert A \rvert + \lvert B \rvert$ (the correction term $\lvert A \cap B \rvert = 0$), which is the sum rule.

15.5 The division rule

The last rule corrects the opposite error from inclusion-exclusion. Sometimes the natural way to count produces a number that is too big by a consistent factor, because it treats as distinct things that should be counted as one. When every object you care about has been counted the same number of times, you recover the true count by dividing.

A clean example: how many ways can 4 people sit around a circular table, where two seatings count as the same if one is a rotation of the other? (Around a round table, "everyone shifts one seat to the left" is not a new seating — the same people have the same neighbors.) If the seats were a straight row, the answer would be $4! = 4 \times 3 \times 2 \times 1 = 24$ orderings. But each circular arrangement has been counted once for each of its 4 rotations: the four row-orderings ABCD, BCDA, CDAB, DABC all describe the same circle. Every circular arrangement is over-counted by exactly the same factor, 4, so the number of genuinely distinct circular seatings is $24 / 4 = 6$.

The Division Rule. If a set can be counted in a way that lists every object exactly $d$ times — that is, the listing has size $N$ and the objects fall into groups of equal size $d$, each group being the copies of one object — then the number of distinct objects is $N / d$. Equivalently: if a $d$-to-one function maps an $N$-element set onto a set $T$ (every element of $T$ has exactly $d$ preimages), then $\lvert T \rvert = N / d$.

The condition that every object is over-counted by the same $d$ is essential, exactly as "mutually exclusive" was essential to the sum rule. If different objects were counted different numbers of times, you could not divide by a single $d$ — you would be back to a case analysis.

💡 Intuition: The division rule is the product rule run backward. The product rule says: if you have $T$ true objects and each spawns $d$ labeled copies, you see $\lvert T \rvert \times d = N$ copies. So if you can count the $N$ copies and you know the constant copy-count $d$, then $\lvert T \rvert = N / d$. You are un-multiplying a known, uniform over-count.

The strategy first. To prove it, package the over-counting as a function. Let $f$ map each of the $N$ listed items to the unique object it is a copy of. "Every object is listed exactly $d$ times" means every object in the target has exactly $d$ preimages. Then the $N$ items partition into target-many groups of size $d$, and the sum rule (equal groups again!) gives $N = \lvert T \rvert \cdot d$.

Proof. Let $f \colon S \to T$ be onto, with $\lvert S \rvert = N$, and suppose every $t \in T$ has exactly $d$ preimages, i.e. $\lvert f^{-1}(t) \rvert = d$ for all $t$. The preimage sets $\{f^{-1}(t) \mid t \in T\}$ are pairwise disjoint (an element of $S$ maps to exactly one $t$) and their union is all of $S$ (every element maps somewhere), so they partition $S$. By the sum rule, $$N = \lvert S \rvert = \sum_{t \in T} \lvert f^{-1}(t) \rvert = \sum_{t \in T} d = \lvert T \rvert \cdot d.$$ Dividing by $d$ gives $\lvert T \rvert = N / d$. $\blacksquare$

Worked example: circular arrangements and necklaces

(a) Seating $n$ people around a round table. As above, $n!$ row-orderings, each circular arrangement counted $n$ times (its $n$ rotations), so the number of distinct circular arrangements is $n! / n = (n-1)!$. For $n = 4$ that is $3! = 6$, matching our hand count.

(b) A subtle warning. Suppose instead you are counting bracelets: a circular arrangement that can also be flipped over (reflected) without becoming new. Now each arrangement is over-counted by rotations and reflections. For most sizes that is $2n$ copies each — but for small or symmetric arrangements a configuration can coincide with its own reflection, so it is not over-counted by the full $2n$. The instant the over-count factor stops being constant across all objects, the division rule in its simple form breaks, and you need heavier machinery (Burnside's lemma, beyond this book). The lesson generalizes:

⚠️ Common Pitfall: non-uniform over-counting. The division rule requires that every object be counted exactly $d$ times — the same $d$ for all. If some objects have extra symmetry and are counted fewer times, dividing by a single $d$ gives the wrong answer. Before you divide, verify the over-count is uniform. (This is why "arrangements around a table" is clean — every seating has exactly $n$ distinct rotations — but "arrangements you may also flip" is not.)

from math import factorial

def division_rule(total, overcount):
    """True count when 'total' lists each object exactly 'overcount' times (uniformly)."""
    return total // overcount   # exact: divisibility is guaranteed by the uniform over-count

# Distinct seatings of 5 people around a round table: 5! rows, each counted 5 times (rotations).
print(division_rule(factorial(5), 5))
# Expected output:
# 24

🔄 Check Your Understanding 1. Six people sit around a circular table. How many distinct seatings are there, counting rotations as the same? 2. You list all $3! = 6$ orderings of {red, green, blue} beads in a row. If the beads are on a fixed-position circular ring where only rotations are identified, how many distinct rings — and why is the division rule safe to apply here?

Answers

$6! / 6 = 5! = 120$. 2. $6 / 3 = 2$ distinct rings; the division rule is safe because with 3 distinct beads every ring has exactly 3 rotations and none coincides with another (no rotational symmetry among 3 distinct items), so the over-count is uniformly $d = 3$.

15.6 Counting the size of an input space

Now we put all four rules to work on the problem this chapter opened with — sizing a space — because that is the counting a programmer does most. Three running uses: test spaces (can I test exhaustively?), state spaces (can I explore them all?), and keyspaces (how hard is brute force?). The technique is always the same: model the space as choices, apply product/sum, subtract forbidden configurations with inclusion-exclusion, and read off a feasibility conclusion.

Sizing a keyspace (the RSA / crypto anchor)

A keyspace is the set of all possible keys (or passwords, or secrets) an attacker might have to try. Its size is the single most important number in a brute-force security argument, and it is a pure product-rule computation.

Take a concrete policy and tighten it step by step. Passwords are 10 characters over the 95 printable ASCII characters.

No constraints. $95^{10} \approx 5.99 \times 10^{19}$ keys. At $10^9$ guesses per second, brute force takes about $6 \times 10^{10}$ seconds — roughly 1,900 years. Comfortable.
Now require at least one digit. It is far easier to count the complement: the passwords with no digit. There are 85 non-digit characters (95 − 10), so $85^{10}$ all-non-digit passwords. By the subtraction principle (inclusion-exclusion against the universe — see below), the count with at least one digit is $$95^{10} - 85^{10} \approx 5.99 \times 10^{19} - 1.97 \times 10^{19} \approx 4.02 \times 10^{19}.$$ The policy shrinks the keyspace (you removed the all-letter passwords), a reminder that complexity rules trade some brute-force space for resistance to other attacks.

The "count the complement" move deserves to be stated as its own principle, because it is the most useful corollary of inclusion-exclusion in practice.

The Subtraction Principle (complement counting). If every object lies in a finite universe $U$, then the number of objects in $U$ with a property is $$\lvert U \rvert - (\text{number in } U \text{ without the property}).$$ This is inclusion-exclusion with a single set $A \subseteq U$: $\lvert U \setminus A \rvert = \lvert U \rvert - \lvert A \rvert$. Reach for it whenever "with at least one …" is hard but "with none …" is easy.

🔗 Connection. This keyspace reasoning is the bridge to Part IV. In Chapter 25 you will implement RSA, whose security rests on the infeasibility of searching an astronomically large space (factoring a number with hundreds of digits). The arithmetic of "how big is the space, and how long to search it?" — exactly what you are doing now — is what separates a 56-bit key (broken in public in the 1990s) from a 256-bit key (a space of $2^{256} \approx 10^{77}$, beyond any conceivable brute force). Counting is the first quantitative tool of cryptography.

Sizing a state space (testing and model checking)

Consider a small configuration object: a log level (one of 5), a region (one of 3), three independent boolean feature flags, and a retry count that is an integer from 0 to 9. How many configurations exist? Product rule throughout, because the fields are independent: $$5 \times 3 \times \underbrace{2 \times 2 \times 2}_{\text{three flags}} \times 10 = 5 \times 3 \times 8 \times 10 = 1200.$$ 1,200 configurations is eminently testable — you could exhaustively check every one in a unit-test suite. Now imagine the retry count were a full 32-bit integer instead of 0-9: the space balloons to $5 \times 3 \times 8 \times 2^{32} \approx 5.2 \times 10^{11}$, and exhaustive testing is impossible. The count is the decision: small space → test everything; large space → test boundaries and sample.

💡 Intuition: The "combinatorial explosion" that makes exhaustive testing and brute-force search infeasible is just the product rule with many factors. Each independent option you add multiplies the space. This is why adding one 32-bit field can take a problem from trivially testable to astronomically large in a single stroke — and why thinking in counts, not lists, is the only way to see it coming.

A combined example: passwords with structure

Count the 8-character passwords over lowercase letters and digits (36 symbols) that contain at least one letter and at least one digit. This needs all the machinery at once. Work in the universe $U$ of all $36^8$ strings and subtract the bad ones. A string is "bad" if it is missing letters or missing digits. Let $A$ = strings with no letter (all digits), $B$ = strings with no digit (all letters).

$\lvert A \rvert = 10^8$ (all 8 positions digits), $\lvert B \rvert = 26^8$ (all letters).
$A \cap B$ = strings with neither a letter nor a digit — impossible over a letters-and-digits alphabet, so $\lvert A \cap B \rvert = 0$.
Bad strings: $\lvert A \cup B \rvert = 10^8 + 26^8 - 0 = 100{,}000{,}000 + 208{,}827{,}064{,}576 = 208{,}927{,}064{,}576$.
Good strings: $\lvert U \rvert - \lvert A \cup B \rvert = 36^8 - 208{,}927{,}064{,}576$. Since $36^8 = 2{,}821{,}109{,}907{,}456$, the answer is $2{,}821{,}109{,}907{,}456 - 208{,}927{,}064{,}576 = 2{,}612{,}182{,}842{,}880.$

That single computation used the product rule (sizing $U$, $A$, $B$), inclusion-exclusion (the union of bad sets), and the subtraction principle (good = universe − bad). This is the texture of real counting: not one rule, but the right combination, chosen by reading the structure of the problem.

🔄 Check Your Understanding 1. How many 6-character lowercase-letter passwords contain at least one vowel (a, e, i, o, u)? (Hint: complement.) 2. A config has 4 independent boolean flags and one field that is one of 6 enum values. Is exhaustive testing feasible? How many cases?

Answers

Universe $26^6$; strings with no vowel use the 21 consonants, $21^6$; so at-least-one-vowel is $26^6 - 21^6 = 308{,}915{,}776 - 85{,}766{,}121 = 223{,}149{,}655$. 2. $2^4 \times 6 = 16 \times 6 = 96$ cases — trivially exhaustive.

Project Checkpoint: starting `combinatorics.py`

This chapter begins the Toolkit's counting module, combinatorics.py, which Chapters 16 and 17 will grow into permutation and combination generators. The four counting rules give us the foundational count functions; we add them now and seed the first canonical signature the later chapters depend on.

Theme one of this book — discrete math is the language of CS — is on full display: these few lines are the exact computation behind every keyspace estimate and test-coverage decision.

Create dmtoolkit/combinatorics.py and add the rule helpers plus the first canonical function, perm_count(n, k). The product rule §15.2 already derived $P(n, k) = n(n-1)\cdots(n-k+1)$ as "ordered choices without repetition"; Chapter 16 will name it a permutation count, but the arithmetic is pure Chapter 15.

def product_rule(*step_counts):
    """Product rule: independent steps with step_counts[i] options each."""
    total = 1
    for n in step_counts:
        total *= n
    return total

def sum_rule(*case_counts):
    """Sum rule: mutually EXCLUSIVE cases (caller guarantees disjointness)."""
    return sum(case_counts)

def inclusion_exclusion_2(size_a, size_b, size_a_and_b):
    """|A ∪ B| = |A| + |B| - |A ∩ B|."""
    return size_a + size_b - size_a_and_b

def perm_count(n, k):
    """Ordered choices of k from n WITHOUT repetition: n·(n-1)···(n-k+1).
    The product rule (Ch.15) derives this; Ch.16 names it P(n, k)."""
    if k < 0 or k > n:
        return 0
    result = 1
    for i in range(k):
        result *= (n - i)        # n, then n-1, ..., then n-k+1  (one product-rule step each)
    return result

# Keyspace: 8 lowercase letters, all distinct = perm_count(26, 8).
print(perm_count(26, 8))
# Expected output:
# 62990928000

We hand-traced perm_count(26, 8) in §15.2: $26 \times 25 \times 24 \times 23 \times 22 \times 21 \times 20 \times 19 = 62{,}990{,}928{,}000$. Notice `perm_count(n, n)` returns $n!$ (the full factorial), and perm_count(n, 0) returns 1 (the empty product — there is exactly one way to choose nothing), both of which match the conventions you will rely on in Chapter 16.

Toolkit so far: logic.py (Chapters 1-3), recurrences.py (begun in Chapter 6), and now combinatorics.py with product_rule, sum_rule, inclusion_exclusion_2, and the canonical perm_count(n, k). Chapter 16 adds comb_count(n, k) and the actual generators; by the capstone, these counting primitives underpin the probability and cryptography modules.

Summary

The four counting rules (plus the subtraction corollary) are the axioms of enumeration. Each is a fact about finite cardinality.

Rule	When it applies	Formula	The fatal proviso
Product rule	A sequence of choices ("and")	$n_1 \times n_2 \times \cdots \times n_k$	#options per step must not depend on earlier choices
Sum rule	A split into cases ("or")	$n_1 + n_2 + \cdots + n_k$	cases must be mutually exclusive (disjoint)
Inclusion-exclusion (2)	Union of two overlapping sets	$\lvert A \rvert + \lvert B \rvert - \lvert A \cap B \rvert$	(handles the overlap automatically)
Inclusion-exclusion (3)	Union of three overlapping sets	$\sum$ singles $-\sum$ pairs $+$ triple	(alternating signs)
Division rule	A listing over-counts each object equally	$N / d$	over-count $d$ must be the same for every object
Subtraction principle	"at least one …" is hard, "none …" is easy	$\lvert U \rvert - \lvert U \setminus \text{property} \rvert$	objects must live in a known universe $U$

Key relationships and habits to carry forward:

"and" → multiply, "or" → add. Read your problem statement; the conjunctions tell you the rule.
The sum rule is inclusion-exclusion with empty overlap; inclusion-exclusion is the sum rule repaired for overlapping cases.
Sum of products (split into disjoint cases, count each as a sequence) is the single most common combined pattern.
For "at least one," count the complement — the all-or-nothing case is almost always simpler.
A keyspace size is a product; converting it to a brute-force time is how you argue security.
A combinatorial explosion is just the product rule with many factors — one extra independent field can multiply a space past the point of exhaustive testing.
$P(n, k) = n(n-1)\cdots(n-k+1)$ (ordered, no repetition) falls directly out of the product rule; Chapter 16 names and extends it.

Spaced Review

Retrieval keeps knowledge available. Answer from memory before checking.

(Ch. 13) A build system models dependencies as a partial order and runs tasks in a valid order via topological sort. What property of a relation guarantees such an ordering exists, and what is the name for an ordering that respects all the "must come before" constraints?

(Ch. 13) In a partial order, two elements can be incomparable. Give a one-line example of two incomparable elements in the divisibility order on $\{2, 3, 4, 6\}$.

(Ch. 8) State the definition of the Cartesian product $A \times B$, and explain in one sentence why $\lvert A \times B \rvert = \lvert A \rvert \cdot \lvert B \rvert$ — connecting it to the product rule.

(Ch. 8) Using set operations, write the size of "elements in $A$ or $B$ but not both" (the symmetric difference) in terms of $\lvert A \rvert$, $\lvert B \rvert$, and $\lvert A \cap B \rvert$.

Answers

The relation must be a partial order on a finite set with no cycles (a DAG); an ordering that respects every constraint is a topological sort (also called a topological/linear extension), and it exists precisely because a finite partial order always has a minimal element to place next.

$2$ and $3$ are incomparable ($2 \nmid 3$ and $3 \nmid 2$); so are $4$ and $6$. 3. $A \times B = {(a, b) \mid a \in A,\ b \in B}$, the set of all ordered pairs; its size is $\lvert A \rvert \cdot \lvert B \rvert$ because forming a pair is a two-step independent choice (first coordinate, then second) — exactly the product rule. 4. $\lvert A \rvert + \lvert B \rvert - 2\lvert A \cap B \rvert$ (start from $\lvert A \cup B \rvert = \lvert A \rvert + \lvert B \rvert - \lvert A \cap B \rvert$ and remove the overlap a second time, since the "both" elements are excluded).

What's Next

The product rule already handed us $P(n, k) = n(n-1)\cdots(n-k+1)$ for counting ordered selections without repetition — that descending product of choices. But a huge class of problems does not care about order: a poker hand is the same five cards in any order, a committee is the same people in any arrangement, a subset is a subset. Counting unordered selections requires dividing out the orderings — a direct application of the division rule you just learned — and that quotient is the binomial coefficient $\binom{n}{k}$, the single most important number in combinatorics. Chapter 16 develops permutations and combinations in full, proves the binomial theorem, and shows how Pascal's triangle encodes it all — turning the four rules of this chapter into a complete toolkit for counting selections.

Prerequisites

Learning Objectives

In This Chapter

Chapter 15: The Basics of Counting

Overview

Learning Paths

15.1 Why count? Passwords, test cases, and states

15.2 The product rule

Worked example: passwords with and without repetition

15.3 The sum rule

Worked example: passwords of length 6 or 7 or 8

15.4 Inclusion-exclusion: counting overlaps correctly

Two sets

Worked example: divisibility counting

Three sets

Worked example: divisible by 2, 3, or 5

15.5 The division rule

Worked example: circular arrangements and necklaces

15.6 Counting the size of an input space

Sizing a keyspace (the RSA / crypto anchor)

Sizing a state space (testing and model checking)

A combined example: passwords with structure

Project Checkpoint: starting combinatorics.py

Summary

Spaced Review

What's Next

Project Checkpoint: starting `combinatorics.py`