For thirty-eight chapters you have been building two things at once. One is a body of knowledge — logic, proof, sets, counting, number theory, graphs, and the theory of computation. The other is a piece of software: dmtoolkit, the Python package you...
Prerequisites
- 6
- 8
- 12
- 22
- 23
- 24
- 25
- 27
- 28
- 29
- 33
Learning Objectives
- Inventory the dmtoolkit package and map each module to the discrete-math concept it implements.
- Choose a capstone track and justify the discrete-math model behind it (correctness, complexity, and scope).
- Assemble a complete, working project from Toolkit pieces: an RSA system, a social-network analyzer, a Sudoku solver, or an error-correcting code.
- State and defend a correctness argument for the core of your chosen project, using proof techniques from across the book.
- Communicate a technical result: structure a write-up, choose evidence (proof vs. simulation), and report honestly about limits.
In This Chapter
- Overview
- Learning Paths
- 39.1 The Toolkit so far: an inventory
- 39.2 Choosing a capstone
- 39.3 Track A: an RSA encryption system
- 39.4 Track B: a social-network analyzer
- 39.5 Track C: a Sudoku solver
- 39.6 Track D: an error-correcting code
- 39.7 Presenting your work
- Project Checkpoint: assembling and shipping dmtoolkit
- Summary
- Spaced Review
- What's Next
Chapter 39: Capstone — Applying Discrete Mathematics to a Real Computing Problem
"What I cannot create, I do not understand." — Richard Feynman (found on his blackboard, 1988)
Overview
For thirty-eight chapters you have been building two things at once. One is a body of knowledge — logic, proof, sets, counting, number theory, graphs, and the theory of computation. The other is a piece of software: dmtoolkit, the Python package you have grown one function at a time, starting with a truth-table generator in Chapter 1 and ending with graph coloring in Chapter 33. This chapter is where the two meet. You will take the Toolkit you built to understand mathematics and turn it into a tool that solves a real problem.
That is the whole point of the exercise, and it is the answer to a question a working programmer eventually asks about every theory course: what is this actually for? The honest answer is not "you'll need the proof of the handshaking lemma at your job." The answer is that discrete math is the layer underneath the systems you build — the encryption protecting a login, the graph search behind a "people you may know" feature, the constraint solver inside a scheduler, the error-correcting code that lets a scratched disc still play. You have learned the mathematics behind each of those. Now you assemble it.
A capstone is different from a homework problem. A homework problem has a known answer and tests one idea. A capstone project is an open-ended piece of work that integrates many ideas, has design decisions with no single right answer, and ends in something you can show another person. The skill it builds is integration — and integration is exactly the skill that separates someone who has passed a discrete-math course from someone who can use discrete math. This chapter gives you four tracks to choose from, each a complete small system, each drawing on a different region of the book.
In this chapter you will learn to:
- Take inventory of the
dmtoolkitpackage and recall which discrete-math concept each module rests on. - Choose among four capstone tracks and justify the choice against your goals.
- Build Track A, an RSA encryption system, on the number theory of Part IV.
- Build Track B, a social-network analyzer, on the graph theory of Part V.
- Build Track C, a Sudoku solver, on the logic and constraint reasoning of Part I.
- Build Track D, an error-correcting code, on the algebra and coding theory of Parts IV and VI.
- State a correctness argument for your project's core and present the finished work honestly.
This is not a chapter to read passively. Read 39.1 and 39.2 fully, then pick one track, read it carefully, and build it. The other tracks are there for reference and for a second pass.
Learning Paths
🏃 Fast Track: Read 39.1 (the inventory) and 39.2 (how to choose). Then jump to the single track that matches your strongest part of the book — most readers pick A (if Part IV clicked) or B (if Part V did). Build the core, skip the extensions, and write the one-page report in 39.7.
📖 Standard Path: Read 39.1–39.2, build one full track end to end including at least one extension, and write the full report. Work every
🔄 Check Your Understanding. Budget two to three sittings.🔬 Deep Dive: Build two tracks from different parts of the book (A and B is the classic pairing — number theory and graphs). For each, write the correctness argument as a formal proof, add the listed extensions, and benchmark the complexity claims against measured behavior on inputs of growing size. This is the closest the book comes to a portfolio piece.
39.1 The Toolkit so far: an inventory
Before building anything new, take stock of what you already have. The discipline here is the same one a software engineer practices before starting a feature: read the existing code and know your dependencies. Every function below already exists in your dmtoolkit package, and every one is a faithful implementation of a concept you proved something about.
The package is organized into modules, one per region of the book:
| Module | Built in | Key functions | The math underneath |
|---|---|---|---|
logic.py |
Ch. 1–3 | truth_table, is_tautology, equivalent |
Propositional logic, equivalence |
sets.py |
Ch. 8 | union, intersection, difference, power_set, cartesian |
Set algebra |
relations.py |
Ch. 12–13 | is_equivalence, closure_transitive, topo_sort |
Relations, partial orders |
combinatorics.py |
Ch. 15–17 | perm_count, comb_count, permutations, combinations |
Counting principles |
recurrences.py |
Ch. 6, 18–19 | check_identity, solve_linear, fib |
Induction, recurrences |
probability.py |
Ch. 20–21 | simulate, expected_value, bayes |
Discrete probability |
numbertheory.py |
Ch. 22–23 | gcd, ext_gcd, sieve, mod_inverse, mod_pow, crt |
Divisibility, modular arithmetic |
crypto.py |
Ch. 24–25 | rsa_keygen, rsa_encrypt, rsa_decrypt |
Public-key cryptography |
graphs.py |
Ch. 27–34 | Graph, bfs, dfs, dijkstra, mst_kruskal, greedy_coloring, max_flow |
Graph theory |
coding.py |
Ch. 26, 38 | hamming_distance, hamming_encode, hamming_decode |
Error-detecting/correcting codes |
That is a real library. Ten modules, roughly forty functions, every one written from scratch (not imported from sympy or networkx) so that you understand it to the bottom. The capstone does not ask you to write much new low-level code. It asks you to compose what you have into something larger — which is exactly what professional software development is most of the time.
💡 Intuition: Think of the Toolkit as a kitchen you have spent a semester stocking. You have knives (the algorithms), staples (gcd, mod_pow, bfs), and you know how each one behaves under pressure (you proved it correct). The capstone is the first time you cook a full meal instead of practicing knife cuts. The hard part is no longer any single technique; it is choosing a dish and sequencing the steps.
Two habits from earlier chapters carry into the capstone and are worth restating, because the capstone is where they pay off:
- Computation tests; proof guarantees (theme four of this book). Your Toolkit can run on a million inputs and find no bug — that is evidence, not certainty. Where a correctness claim matters (does RSA always decrypt? does the Sudoku solver always find a solution if one exists?), you will state a proof, not just a passing test.
- Stable signatures compose. The reason
dijkstracan sit on top of theGraphclass from Chapter 27 is that the signatures never changed. Your capstone code should respect the same discipline: small functions with clear contracts, assembled into a whole.
🔄 Check Your Understanding 1. Which Toolkit module would a digital-signature feature draw on, and which two functions in it are the operational core of RSA? 2. The Toolkit was written from scratch instead of calling
networkxandsympy. Give the pedagogical reason — and the practical reason a production system would do the opposite.
Answers
crypto.py, resting onnumbertheory.py. The operational core ismod_pow(which is encryption and decryption, $m^e \bmod n$ and $c^d \bmod n$) andmod_inverse(which computes the private exponent $d$ from the public exponent $e$). 2. Pedagogically, writing it yourself forces you to understand the algorithm to the bottom — you cannot hand-wave a function you had to debug. Practically, a production system should use audited, optimized, side-channel-resistant libraries; a hand-rolledmod_powis fine for learning but must never guard real secrets (a point Chapter 25 made sharply).
39.2 Choosing a capstone
Four tracks follow. They are deliberately different in kind, so that whatever part of the book resonated with you, there is a project that uses it. Read this section, then commit to one.
| Track | Build | Leans on | Flavor | Difficulty |
|---|---|---|---|---|
| A | RSA encryption system | Ch. 22–25 (number theory, crypto) | Number-crunching, security | ⭐⭐⭐ |
| B | Social-network analyzer | Ch. 27–33 (graphs) | Data analysis, modeling | ⭐⭐ |
| C | Sudoku solver | Ch. 1–3 (logic), search | Constraint satisfaction, backtracking | ⭐⭐ |
| D | Error-correcting code | Ch. 24, 26, 38 (algebra, coding) | Bit-level engineering | ⭐⭐⭐ |
How to choose:
- Pick the part of the book you understood best, not the one you found hardest. A capstone is for consolidating strength, not patching weakness. If Part IV's modular arithmetic finally made sense, do Track A. If you spent Part V drawing graphs on napkins, do Track B.
- Match the flavor to your taste. Track A and D are about getting bits and numbers exactly right (an off-by-one in the padding breaks decryption). Track B is about modeling messy real data (who is connected to whom). Track C is about search (try, fail, backtrack). All four are honest computer science; they exercise different muscles.
- Consider what you want to show. If you are building a portfolio, Track A ("I implemented RSA from number theory") and Track B ("I analyzed a real social graph") are the most legible to an interviewer.
⚠️ Common Pitfall: Trying to do all four "a little." The value of a capstone is depth — one finished, defended, documented system beats four half-built ones. Scope down ruthlessly: a working core you understand completely is the goal. The extensions are optional dessert.
Each track below follows the same shape: the model (what discrete-math objects represent the problem), the build (Toolkit pieces assembled, with code and hand-derived output), a correctness argument (why it works, stated as a proof or a clear justification), and extensions (where to take it further). Build the model and the core; the correctness argument is what makes it a mathematics capstone and not just a coding exercise.
🔄 Check Your Understanding 1. You found graph theory intuitive but modular arithmetic slippery. Which track does the chapter's advice steer you toward, and why? 2. Two of the four tracks are most about "getting bits exactly right." Which two, and what kind of bug is most likely in each?
Answers
- Track B (social-network analyzer), because the advice is to consolidate a strength, not patch a weakness — a capstone should build on what you understood best. 2. Tracks A (RSA) and D (error-correcting codes). In A, an off-by-one in message encoding/padding or an exponent computed mod the wrong number silently breaks decryption; in D, an incorrect parity-bit position means a flipped bit is "corrected" to the wrong place. Both fail silently — the output looks plausible but is wrong — which is exactly why a correctness argument matters.
39.3 Track A: an RSA encryption system
The payoff of the RSA thread. Since Chapter 15 (sizing a keyspace), through Chapters 22–24 (gcd, modular arithmetic, the group structure), and into Chapter 25 (where you implemented the RSA core), this book has been building toward a working public-key cryptosystem. Track A turns that core into a system: something that encrypts and decrypts actual text, and that can sign a message so a recipient knows it came from you.
🔗 Connection. This track is the direct continuation of Chapter 25's Project Checkpoint, which gave you
rsa_keygen,rsa_encrypt, andrsa_decryptincrypto.py. If you have not built that, build it first — Track A assumes it.
The model
RSA operates on numbers, but you want to send text. So the first modeling question is: how does a string become a number that RSA can encrypt? The answer reuses ideas from all over the book.
A message is a sequence of characters; each character is a number (its code point — recall the function "char → integer" is exactly a function in the sense of Chapter 9). The cleanest encoding for a learning implementation: convert the whole message to bytes, then interpret those bytes as one big integer in base 256. Encryption is then a single modular exponentiation, $c = m^e \bmod n$, and decryption is $m = c^d \bmod n$ — the operations you proved correct in Chapter 25.
There is one constraint, and it is a counting constraint (Part III thinking): the integer $m$ must be smaller than the modulus $n$, because arithmetic happens in $\mathbb{Z}_n$ — there are only $n$ residues, so any $m \ge n$ collides with $m \bmod n$ and cannot be recovered. So either the message must be short enough that $m < n$, or it must be split into blocks each smaller than $n$. A real system uses hybrid encryption (RSA encrypts a short symmetric key, which encrypts the bulk), but for the capstone, block-by-block is the honest, understandable choice.
💡 Intuition: RSA is a lock whose closing mechanism (the public key) is different from its opening mechanism (the private key). Anyone can snap the lock shut around a message; only the holder of the private key can open it. The number theory of Chapters 23–25 is what makes the two keys mathematically inverse without either revealing the other.
The build
The core already exists. Track A adds a thin layer: encode text → integer, block the message so each block is below $n$, and decode integer → text. Here is the encoding layer.
# Track A core: text <-> integer, then RSA per block.
from dmtoolkit.crypto import rsa_encrypt, rsa_decrypt
def text_to_int(s):
return int.from_bytes(s.encode("utf-8"), "big")
def int_to_text(m):
length = (m.bit_length() + 7) // 8 # bytes needed to hold m
return m.to_bytes(length, "big").decode("utf-8")
m = text_to_int("Hi") # 'H'=72, 'i'=105
print(m) # 72*256 + 105 = 18537
print(int_to_text(m))
# Expected output:
# 18537
# Hi
The output is hand-derived: "Hi" encodes to the two bytes 72, 105; read big-endian as one integer that is $72 \times 256 + 105 = 18432 + 105 = 18537$. Going back, $18537$ needs $\lceil 15/8 \rceil = 2$ bytes (its bit length is 15), which decode to "Hi". Now wire it to the RSA core with a real key:
from dmtoolkit.crypto import rsa_keygen_from_primes, rsa_encrypt, rsa_decrypt
pub, d = rsa_keygen_from_primes(61, 53, 17) # n = 3233, from Chapter 25
n = pub[0]
m = text_to_int("A") # 'A' = 65, and 65 < 3233 -> one block
c = rsa_encrypt(m, pub) # 65^17 mod 3233
back = rsa_decrypt(c, d, n) # c^2753 mod 3233
print(m, c, back, int_to_text(back))
# Expected output:
# 65 2790 65 A
The numbers are exactly Chapter 25's hand-verified worked key: with $n=3233$, $e=17$, $d=2753$, encrypting $m=65$ gives $c = 65^{17} \bmod 3233 = 2790$, and decrypting recovers $65$, which decodes to "A" (byte 0x41). (We chose a one-character message because the toy modulus $n = 3233$ only has room for messages with $m < 3233$ — i.e., a single byte. A real key with $n$ thousands of bits long holds hundreds of bytes per block.)
⚠️ Common Pitfall: Forgetting the $m < n$ constraint. With the toy modulus $3233$,
text_to_int("Hi") == 18537, which is larger than $3233$. Encrypting it computes $18537^{17} \bmod 3233$, and decryption returns $18537 \bmod 3233 = 2372 \neq 18537$ — silent corruption. The fix is blocking: split the byte string into chunks each guaranteed below $n$. This is the single most common bug in a first RSA implementation, and it never throws an exception — it just returns the wrong plaintext.
Correctness argument
Why does decryption invert encryption? You proved this in Chapter 25, and the capstone is the place to restate the result you are relying on — a professional habit, because a system is only as trustworthy as the theorem under it.
The strategy. We must show $(m^e)^d \equiv m \pmod n$ whenever $0 \le m < n$. The exponents multiply, so the claim is $m^{ed} \equiv m \pmod n$. The key fact is that $d$ was chosen so that $ed \equiv 1 \pmod{\phi(n)}$, i.e. $ed = 1 + k\phi(n)$ for some integer $k$. Then Euler's theorem (Chapter 23), $m^{\phi(n)} \equiv 1 \pmod n$ for $m$ coprime to $n$, collapses the extra factor. The coprime-to-$n$ gap is patched with the Chinese Remainder Theorem.
RSA correctness (restated). Let $n = pq$ for distinct primes $p, q$, let $\phi(n) = (p-1)(q-1)$, and let $e, d$ satisfy $ed \equiv 1 \pmod{\phi(n)}$. Then for every integer $m$ with $0 \le m < n$, $\ m^{ed} \equiv m \pmod n$.
Proof (sketch — full version in Chapter 25). Write $ed = 1 + k\phi(n)$. If $\gcd(m, n) = 1$, then by Euler's theorem $m^{\phi(n)} \equiv 1 \pmod n$, so $$m^{ed} = m^{1 + k\phi(n)} = m \cdot \left(m^{\phi(n)}\right)^{k} \equiv m \cdot 1^{k} = m \pmod n.$$ If $\gcd(m,n) \neq 1$ (so $m$ is a multiple of $p$ or of $q$), the same congruence holds modulo $p$ and modulo $q$ separately — by Fermat's little theorem on the prime that does not divide $m$, and trivially on the one that does — and the Chinese Remainder Theorem stitches the two congruences into $m^{ed} \equiv m \pmod{n}$. Either way the message is recovered. $\blacksquare$
That single theorem is the load-bearing wall of Track A. Everything else — encoding, blocking, signatures — is plumbing around it.
Extensions
- Blocking (do this one — it makes the system usable). Split the message bytes into fixed-size blocks each below $n$, encrypt each, and concatenate. The decoder reverses it.
- Digital signatures. Run RSA "backwards": sign by computing $s = H(m)^d \bmod n$ (decrypt-with-private), verify by checking $s^e \equiv H(m) \pmod n$ (encrypt-with-public), where $H$ is a hash from your
coding.py/Chapter 26 work. A valid signature proves the holder of $d$ produced it. - Key generation from random primes. Replace
rsa_keygen_from_primeswithrsa_keygen(bits), which draws random primes of a given bit length using a primality test. Chapter 25'sproject-checkpoint.pyshows the wrapper (the prime search is shown, not executed, per the no-run rule).
39.4 Track B: a social-network analyzer
The payoff of the social-graph thread. Since Chapter 27 you have modeled people as vertices and relationships as edges. Chapter 28 gave you BFS for "degrees of separation"; Chapter 29, shortest paths; Chapter 33, coloring for conflict-free grouping. Track B assembles these into an analyzer: feed it a friendship graph and it reports the network's structure — who is central, how far apart people are, how the network splits into groups.
🔗 Connection. This track stands directly on the
graphs.pymodule: theGraphclass (Chapter 27),bfs(Chapter 28), andgreedy_coloring(Chapter 33). It is the explicit Track B payoff named in the anchor schedule.
The model
The model is the most natural in the book: a social network is an undirected graph $G = (V, E)$ where $V$ is the set of people and an edge $\{u, v\}$ means "$u$ and $v$ are friends." Friendship is symmetric, so the graph is undirected — exactly the Graph class from Chapter 27.
From that one model, every metric you want is a graph-theory quantity you already know how to compute:
| Question about the network | Graph quantity | Toolkit tool |
|---|---|---|
| How connected is person $v$? | $\deg(v)$ | g.degree(v) |
| How far apart are two people? | shortest-path distance (unweighted) | bfs(g, s) then dist[t] |
| Who is "everyone's friend"? | maximum-degree vertex | g.degree over vertices() |
| How many separate friend-clusters? | connected components | repeated bfs |
| What is the network's "width"? | diameter (max distance) | bfs from every vertex |
| Can we split into conflict-free teams? | a proper coloring | greedy_coloring(g) |
💡 Intuition: "Six degrees of separation" is the claim that the diameter of the human friendship graph is small — that a BFS from anyone reaches everyone in a handful of hops. Your analyzer measures exactly that quantity on whatever graph you feed it. The folklore becomes a number your code prints.
The build
Build a small, concrete network and run the analyzer. We use the friendship graph below (a hypothetical example — six people in two loosely connected clusters):
from dmtoolkit.graphs import Graph, bfs
g = Graph()
edges = [("Ana","Ben"), ("Ana","Cam"), ("Ben","Cam"), # cluster 1
("Cam","Dev"), # bridge
("Dev","Eve"), ("Dev","Fay"), ("Eve","Fay")] # cluster 2
for u, v in edges:
g.add_edge(u, v)
# Degree centrality: who has the most friends? (sort first so ties break
# deterministically -- among degree-3 vertices, 'Cam' < 'Dev' wins)
central = max(sorted(g.vertices()), key=g.degree)
print("most central:", central, "with degree", g.degree(central))
# Degrees of separation from Ana
dist, _ = bfs(g, "Ana")
print("Ana -> Eve:", dist["Eve"], "hops")
# Expected output:
# most central: Cam with degree 3
# Ana -> Eve: 3 hops
Hand-trace it (the Chapter 6 habit). Degrees: Ana = {Ben, Cam} = 2; Ben = {Ana, Cam} = 2; Cam = {Ana, Ben, Dev} = 3; Dev = {Cam, Eve, Fay} = 3; Eve = {Dev, Fay} = 2; Fay = {Dev, Eve} = 2. The maximum degree is 3, achieved by both Cam and Dev — a genuine tie. Sorting the vertices first makes the tie-break deterministic: iterating [Ana, Ben, Cam, Dev, Eve, Fay], max keeps the first vertex to reach degree 3, which is Cam. (That two people tie for "most central" is itself a finding worth reporting — your write-up should name both, not hide the tie behind an arbitrary choice.) BFS from Ana: Ana at distance 0; Ben, Cam at 1; Dev at 2; Eve, Fay at 3. So Ana is 3 hops from Eve — across the bridge at Cam–Dev.
Now the two structural metrics — components and diameter — built by iterating BFS:
def components(g):
"""Return a list of vertex-sets, one per connected component."""
seen, comps = set(), []
for v in g.vertices():
if v not in seen:
dist, _ = bfs(g, v) # reaches v's whole component
comp = set(dist) # keys of dist = reachable from v
comps.append(comp)
seen |= comp
return comps
def diameter(g):
"""Longest shortest-path distance, over a single connected graph."""
best = 0
for v in g.vertices():
dist, _ = bfs(g, v)
best = max(best, max(dist.values())) # farthest point from v
return best
print("components:", len(components(g)))
print("diameter:", diameter(g))
# Expected output:
# components: 1
# diameter: 3
The graph is connected (the Cam–Dev bridge joins the two clusters), so there is 1 component containing all six people. The diameter is the largest of all pairwise shortest distances; the farthest-apart pairs are the cluster-1 ends to cluster-2 ends (e.g. Ana to Eve, Ben to Fay), all at distance 3. No pair is farther, so the diameter is 3.
🔄 Check Your Understanding 1. If you delete the single edge
("Cam","Dev")from the graph, what docomponents(g)anddiameter(g)return, and why doesdiameterneed a moment's thought? 2.componentscallsbfsonce per component, not once per vertex, even though the loop is over vertices. How does theseenset arrange that, and what does it do for the running time?
Answers
- Deleting the bridge splits the graph into two triangles {Ana, Ben, Cam} and {Dev, Eve, Fay}, so
components(g)returns 2.diameteras written takes the max distance over all vertices, but with a disconnected graph somedist.values()are over a single component only (BFS from Ana never reaches Dev, so Dev is simply absent fromdist) — within each triangle the diameter is 1, so it returns 1. The subtlety: "diameter" is only well-defined on a connected graph; on a disconnected one you must take the max within components (or call it infinite). 2. After BFS from the first unvisited vertex,seenabsorbs that vertex's whole component, so every other vertex in the same component is skipped by theif v not in seenguard. BFS therefore runs once per component; total work is $O(V + E)$ across all of them, not $O(V \cdot (V+E))$.
Correctness argument
The analyzer's claims rest on properties of BFS you proved in Chapter 28. The central one:
The strategy. We want to know that
dist[t]really is the fewest-edges distance from the source — not just some path length. BFS explores vertices in order of distance, layer by layer; the proof is an induction on the distance $d$, showing every vertex at true distance $d$ is discovered exactly when the queue is processing layer $d-1$.BFS computes shortest-path distances. In BFS from $s$ on an unweighted graph, for every reachable vertex $v$, the value
dist[v]equals the number of edges on a shortest $s$–$v$ path.
Proof (by induction on the BFS layer, restated from Chapter 28). Let $\delta(v)$ be the true shortest-path distance from $s$. Claim: vertices are dequeued in nondecreasing order of $\delta$, and dist[v] =$\delta(v)$ for all reachable $v$. Base case: $s$ is dequeued first with dist[s] = 0 =$\delta(s)$. Inductive step: suppose all vertices at distance $\le k$ have been correctly labeled and enqueued before any vertex at distance $k+1$. A vertex $w$ with $\delta(w) = k+1$ has a neighbor $u$ with $\delta(u) = k$ (take the second-to-last vertex on a shortest path). When $u$ is dequeued, $w$ — if not already discovered — is assigned dist[w] = dist[u] + 1 = k+1 and enqueued. No shorter label is possible, since any neighbor of $w$ has distance $\ge k$, so $w$ cannot be labeled before layer $k$ is processed. Hence dist[w] =$k+1 = \delta(w)$. By induction the labels are exactly the distances. $\blacksquare$
Because dist is correct, diameter (a max over correct distances) and components (the reachable set from each BFS) are correct too. This is the capstone lesson in miniature: a system is correct because the pieces it composes are correct. You did not re-prove BFS here; you cited the proof and built on it.
Extensions
- Recommend friends ("people you may know"). For person $v$, score every non-neighbor $w$ by the number of common friends $\lvert N(v) \cap N(w) \rvert$ (a set intersection — Chapter 8) and recommend the top scorers. This is the real algorithm behind the feature, in a dozen lines.
- Community detection by coloring. Run
greedy_coloring(Chapter 33) — vertices that cannot share a color are adjacent, so colors hint at independent sets; or detect tightly knit groups by looking for dense subgraphs. - Weighted closeness. Attach weights (interaction frequency) and swap
bfsfordijkstra(Chapter 29) to get weighted closeness centrality.
39.5 Track C: a Sudoku solver
Track C is the constraint-satisfaction track. It uses less of the Toolkit and more of the reasoning style of Part I (logic) and the search ideas behind backtracking. It is the best choice if you enjoyed the puzzle-solving feel of proofs and want to see logic become an algorithm.
The model
A Sudoku puzzle is a constraint satisfaction problem (CSP): a set of variables, each with a domain of possible values, and a set of constraints that restrict which combinations are allowed. For Sudoku:
- Variables: the 81 cells of the $9 \times 9$ grid.
- Domains: each empty cell can hold a digit $1$ through $9$; each given cell is fixed.
- Constraints: every row, every column, and every $3 \times 3$ box must contain the digits $1$–$9$ with no repeats — that is, the nine cells in each unit are all different.
That "all different" constraint is pure discrete math. "No two cells in this row are equal" is a conjunction of $\binom{9}{2} = 36$ pairwise inequalities — combinatorics (Chapter 16) describing the structure, and logic (Chapter 1) describing each clause. Solving the puzzle means finding an assignment of digits to variables that makes every constraint true: a satisfying assignment, the same notion as satisfiability from Chapter 1, specialized to this CSP.
💡 Intuition: Sudoku is a tiny, friendly cousin of SAT, the satisfiability problem from Chapter 1 (and the canonical NP-complete problem of Chapter 37). You are searching for an assignment that satisfies a pile of constraints. Sudoku is easy in practice because the constraints are tight — they prune the search tree aggressively — even though the general problem in its class is hard.
🔗 Connection. Generalized $n \times n$ Sudoku is NP-complete (Chapter 37's territory). The $9 \times 9$ board you solve here is small and over-constrained, so a simple backtracking search dispatches it in milliseconds — a concrete example of the gap between worst-case hardness and typical-case ease that Chapter 30 and Chapter 37 discuss.
The build
The algorithm is backtracking search: find an empty cell, try each legal digit, recurse, and undo the choice if the recursion fails. This is depth-first search (Chapter 28) over the tree of partial assignments. Represent the grid as a list of 9 rows of 9 integers, with 0 for blank.
def is_legal(grid, r, c, v):
"""May digit v go in cell (r, c)? Check row, column, and 3x3 box."""
if any(grid[r][j] == v for j in range(9)): return False # row
if any(grid[i][c] == v for i in range(9)): return False # column
br, bc = 3 * (r // 3), 3 * (c // 3) # box corner
return all(grid[br + i][bc + j] != v for i in range(3) for j in range(3))
def solve(grid):
"""Fill grid in place; return True iff a solution exists."""
for r in range(9):
for c in range(9):
if grid[r][c] == 0: # first empty cell
for v in range(1, 10): # try digits 1..9
if is_legal(grid, r, c, v):
grid[r][c] = v
if solve(grid): # recurse
return True
grid[r][c] = 0 # undo (backtrack)
return False # no digit worked here
return True # no empty cell: solved
A miniature trace shows the backtracking logic without a full board. Suppose the first empty cell is (0,0) and the only legal digit there is 7 (all others appear in its row, column, or box). Then solve sets grid[0][0] = 7 and recurses. If the recursion eventually returns False (some later cell has no legal digit), the line grid[r][c] = 0 erases the 7 and the for v loop continues — but since 7 was the only legal value, the loop ends and this call returns False, propagating the failure upward to undo an earlier choice. That undo-and-retry is the entire engine.
print(is_legal([[0]*9 for _ in range(9)], 0, 0, 5)) # empty board: 5 is fine
# Expected output:
# True
On a completely empty board, no row, column, or box contains anything, so every any(...) is False and the all(...) is True: digit 5 is legal at (0,0). (The full solver and a worked $9\times 9$ puzzle live in this chapter's code/ folder; we do not run it here, per the book's no-execution rule — but the logic above is hand-verifiable.)
Correctness argument
Two things must be argued: that the solver never returns a wrong board, and that it finds a solution whenever one exists.
The strategy. Soundness (no wrong answer) is immediate from the construction: a digit is only ever placed after
is_legalconfirms it violates no constraint, so any completed board satisfies every constraint. Completeness (finds a solution if one exists) is the interesting half — it follows because backtracking is an exhaustive search of a finite tree: it tries every legal value at every cell, so it cannot miss a satisfying assignment that exists.The backtracking solver is sound and complete. If
solve(grid)returnsTrue, the resulting grid is a valid Sudoku solution extending the input. If a valid solution extending the input exists,solvereturnsTrue(and produces one).
Proof. Soundness: induct on the number of filled cells. The base case is a full grid with no empty cell, which solve returns True for only after every cell was filled by an is_legal-checked placement; since each placement violated no row/column/box constraint at the time, and constraints are never weakened by later placements, the final grid satisfies all constraints. Completeness: the search tree of partial assignments is finite (at most 81 cells, each branching at most 9 ways), and solve performs a depth-first traversal that, at each empty cell, tries every legal digit before reporting failure. Suppose a valid completion $S$ exists. Consider the first empty cell; $S$ assigns it some legal digit $v$. The loop tries $v$ (perhaps after other digits that lead to dead ends and are backtracked), recurses on a subproblem that still admits $S$, and by induction on the number of empty cells the recursion succeeds. Hence solve returns True. The traversal is finite, so it always terminates. $\blacksquare$
⚠️ Common Pitfall: Forgetting the backtrack line
grid[r][c] = 0. Without it, a digit that led to a dead end is left in the grid when the loop tries the next digit, corrupting the constraints for all subsequent cells — the search explores a polluted board and either fails to find a real solution or "finds" an invalid one. The undo is what makes the search a tree traversal rather than a one-way street.
Extensions
- Constraint propagation (do this for a real speedup). Before guessing, repeatedly fill any cell that has only one legal digit (a "naked single"). This shrinks the search dramatically and is the technique human solvers use.
- Count solutions / check uniqueness. A well-formed Sudoku has exactly one solution. Modify
solveto keep searching after the first success and count solutions; a valid puzzle returns 1. - Generalize the model. The same
solveskeleton handles any CSP — graph coloring (Chapter 33), the $n$-queens problem, scheduling — by swapping the variables, domains, andis_legalconstraint check. The engine is constraint-independent.
39.6 Track D: an error-correcting code
Track D is the bit-level engineering track. It builds a system that survives a noisy channel: encode data with redundancy, corrupt a bit in transit, and correct the corruption at the far end. It is the payoff of the algebra in Chapter 24 and the coding theory of Chapters 26 and 38, and the best choice if you liked thinking in bits and finite fields.
🔗 Connection. This track builds on
coding.py—hamming_distance,hamming_encode,hamming_decode(Chapters 26 and 38) — and is the explicit Track D payoff named in the anchor schedule. It assumes the Hamming-code material of Chapter 38.
The model
The setting is the noisy channel: you send bits, and the channel may flip some of them. The model that makes correction possible is a code — a carefully chosen subset of bit strings (the codewords) spread far apart from each other, measured by Hamming distance (the number of positions in which two equal-length strings differ, from Chapter 26).
The governing quantity is the minimum distance $d$ of the code — the smallest Hamming distance between any two distinct codewords. The whole theory rests on one inequality you met in Chapter 38:
A code with minimum distance $d$ can detect up to $d - 1$ errors and correct up to $\left\lfloor \frac{d-1}{2} \right\rfloor$ errors.
For the Hamming(7,4) code — 4 data bits encoded as 7 bits using 3 parity bits — the minimum distance is $d = 3$, so it corrects $\lfloor 2/2 \rfloor = 1$ error per 7-bit block. One flipped bit anywhere in a block is silently and exactly repaired.
💡 Intuition: Spread the valid codewords far apart in "bit-string space," like towns separated by wide moats. A single bit-flip nudges a codeword a little way into a moat, but it stays closer to its own town than to any other — so the decoder, by snapping the received word to the nearest codeword, lands back home. Correction is nearest-neighbor decoding; minimum distance is how wide the moats are.
The build
The Hamming(7,4) encoder places 4 data bits in positions 3, 5, 6, 7 of a 7-bit codeword and computes 3 parity bits in positions 1, 2, 4, each covering a specific subset of positions. The decoder recomputes the parities; their pattern (the syndrome) is the binary index of the flipped bit — or zero if none flipped.
from dmtoolkit.coding import hamming_encode, hamming_decode, hamming_distance
data = [1, 0, 1, 1] # 4 data bits
code = hamming_encode(data) # -> 7-bit codeword
print("encoded:", code)
# Simulate the channel flipping exactly one bit (position index 4):
received = code[:]
received[4] ^= 1 # flip one bit
print("flipped distance:", hamming_distance(code, received))
decoded = hamming_decode(received) # corrects the single flip
print("decoded:", decoded)
# Expected output:
# encoded: [0, 1, 1, 0, 0, 1, 1]
# flipped distance: 1
# decoded: [1, 0, 1, 1]
The encoded value is hand-derived from the standard Hamming(7,4) layout (data bits in positions 3,5,6,7 = 1,0,1,1; parity bits $p_1,p_2,p_4$ chosen for even parity over their covered positions give the codeword 0 1 1 0 0 1 1 in positions 1..7). Flipping position index 4 changes exactly one bit, so the Hamming distance between original and received is 1. The decoder recomputes the parity checks, finds the syndrome pointing at the flipped position, flips it back, and extracts the original 4 data bits [1, 0, 1, 1]. The error is gone — that is error correction.
🔄 Check Your Understanding 1. Hamming(7,4) has minimum distance $d = 3$. How many errors can it detect but not correct, and what happens if two bits flip in one block? 2. The "code rate" is data bits over total bits. What is Hamming(7,4)'s rate, and what is the engineering trade-off it represents?
Answers
- With $d = 3$ it detects up to $d - 1 = 2$ errors but corrects only $\lfloor (d-1)/2 \rfloor = 1$. If two bits flip, the received word may land closer to a different codeword; the decoder will "correct" it to the wrong codeword — producing a confident, wrong answer. This is why real systems either keep the per-block error rate low (interleaving, short blocks) or use codes with larger $d$. 2. The rate is $4/7 \approx 0.57$: you pay 3 redundant bits for every 4 data bits. The trade-off is robustness vs. efficiency — more parity means more correcting power but lower throughput. Choosing the rate is choosing how much bandwidth to spend on reliability.
Correctness argument
Why does nearest-codeword decoding correct any single error? The argument is a clean consequence of the minimum-distance bound, and it is a satisfying place to use the triangle inequality.
The strategy. Hamming distance is a metric — in particular it satisfies the triangle inequality. If codewords are at least $d = 3$ apart and the channel flips one bit, the received word is distance 1 from the true codeword and therefore (by the triangle inequality) at least distance 2 from every other codeword. So the true codeword is the unique nearest one, and snapping to the nearest neighbor recovers it.
Single-error correction. A code with minimum distance $d \ge 3$ corrects any single-bit error by decoding to the nearest codeword.
Proof. Let $x$ be the transmitted codeword and $y$ the received word with exactly one bit flipped, so the Hamming distance $\operatorname{dist}(x, y) = 1$. Let $x'$ be any other codeword, $x' \ne x$. By definition of minimum distance, $\operatorname{dist}(x, x') \ge d \ge 3$. Hamming distance satisfies the triangle inequality (it is the number of differing positions, a genuine metric), so $$\operatorname{dist}(x, x') \le \operatorname{dist}(x, y) + \operatorname{dist}(y, x'),$$ hence $$\operatorname{dist}(y, x') \ge \operatorname{dist}(x, x') - \operatorname{dist}(x, y) \ge 3 - 1 = 2 > 1 = \operatorname{dist}(y, x).$$ So $y$ is strictly closer to $x$ than to any other codeword $x'$. Nearest-codeword decoding therefore returns $x$, correcting the error. $\blacksquare$
That proof is the entire justification for error correction, and it is pure discrete math: a metric, an inequality, and the pigeonhole-flavored insight that well-separated codewords leave no room for ambiguity.
Extensions
- Burst errors and interleaving. Hamming(7,4) corrects one error per block. Real channels produce bursts (consecutive flips). Interleaving — transmitting bits from many blocks in a rotated order — spreads a burst across blocks so each block sees at most one error.
- Detect double errors (SECDED). Add one overall parity bit to make Hamming(8,4): it then corrects single errors and detects double errors, the scheme used in ECC memory.
- Reed–Solomon (read about it). The codes in QR codes, CDs, and deep-space links work over the finite field $\mathrm{GF}(2^8)$ (Chapter 24) and correct whole symbols, not just bits. Implementing one is a substantial project; understanding why it needs a finite field is a satisfying capstone-of-the-capstone.
39.7 Presenting your work
A capstone is not finished when the code runs. It is finished when you can explain it to someone who was not in your head while you built it. Communication is a first-class engineering skill — an unexplained system is an untrustworthy one — and it is the last thing this book asks of you.
A good technical write-up for your capstone has a predictable shape. Use it:
- Problem and model. State the problem in one paragraph, then say what discrete-math objects you used to represent it ("I modeled the network as an undirected graph $G = (V, E)$ where…"). The model is the intellectual core; lead with it.
- Design and build. Describe the architecture — which Toolkit modules, which new code, how they compose. A small diagram or a list of functions with one-line contracts is worth a page of prose.
- Correctness. State the central correctness claim and its argument. This is what makes it a mathematics project. Cite the theorems you relied on (e.g., "decryption inverts encryption by Euler's theorem, Chapter 25").
- Evidence. Show it working: inputs, outputs, and — crucially — the limits you tested. Be honest about what you did not verify.
- Complexity and limits. State the running time of the core operation and the scope of the implementation ("corrects one error per block"; "toy modulus, not secure for real use"). Naming a limitation is a strength, not a weakness.
Two principles govern the evidence you present, both straight from the themes of this book:
💡 Intuition: proof and testing are different kinds of evidence — report both honestly. A passing test on a thousand inputs says "I found no bug." A proof says "there is no bug." Use tests to demonstrate and proofs to guarantee; never dress one up as the other. "All my tests passed" is not "it is correct" — that distinction is theme two of this book, and it is the mark of someone who understands what they built.
⚠️ Common Pitfall: Overclaiming. The fastest way to lose a reader's trust is to call a toy RSA "secure," or a backtracking solver "efficient on all inputs," or a single-error code "robust." State the exact guarantee your correctness argument gives you, and not one inch more. A precise, modest claim you can defend beats an impressive claim you cannot.
🔄 Check Your Understanding 1. Your write-up says "I tested decryption on 10,000 random messages and all decrypted correctly." Is that a proof of correctness? What single sentence would you add to make the correctness claim sound? 2. Which section of the write-up template carries the "mathematics" of the project, and why is citing earlier theorems a strength rather than a sign you did less work?
Answers
- No — 10,000 passing cases is evidence, not proof; there are infinitely many possible messages. The sentence to add is the theorem: "and decryption is guaranteed correct for every $m < n$ by the RSA correctness theorem (Euler's theorem, Chapter 25), which proves $m^{ed} \equiv m \pmod n$." That converts demonstration into guarantee. 2. The Correctness section. Citing earlier theorems is a strength because it shows you know why the system works and can locate the load-bearing result — composing proven pieces is exactly the engineering skill the capstone tests, and re-deriving Euler's theorem from scratch would be wasted effort, not extra credit.
Project Checkpoint: assembling and shipping dmtoolkit
This is the last Project Checkpoint, and its job is different from all the others. The previous thirty-odd checkpoints each added a function. This one assembles the package and gives it a front door — turning a folder of modules into a tool a person can actually use.
Two small pieces complete the Toolkit. First, a package initializer that re-exports the headline functions so a user can write from dmtoolkit import rsa_encrypt, bfs, hamming_decode without knowing which module each lives in:
# dmtoolkit/__init__.py -- the package's public front door
from dmtoolkit.numbertheory import gcd, mod_inverse, mod_pow, crt, sieve
from dmtoolkit.crypto import rsa_keygen_from_primes, rsa_encrypt, rsa_decrypt
from dmtoolkit.graphs import Graph, bfs, dfs, dijkstra, greedy_coloring
from dmtoolkit.coding import hamming_distance, hamming_encode, hamming_decode
__all__ = ["gcd", "mod_inverse", "mod_pow", "crt", "sieve",
"rsa_keygen_from_primes", "rsa_encrypt", "rsa_decrypt",
"Graph", "bfs", "dfs", "dijkstra", "greedy_coloring",
"hamming_distance", "hamming_encode", "hamming_decode"]
__version__ = "1.0"
Second, an integration test — a single function that exercises one capability from each major module and asserts the round-trip works, so that any future change which breaks composition is caught immediately:
def self_test():
"""Smoke-test that the modules still compose. Returns True if all pass."""
assert gcd(252, 198) == 18 # numbertheory
pub, d = rsa_keygen_from_primes(61, 53, 17)
assert rsa_decrypt(rsa_encrypt(66, pub), d, pub[0]) == 66 # crypto round-trip
g = Graph()
for u, v in [(0, 1), (1, 2)]:
g.add_edge(u, v)
assert bfs(g, 0)[0][2] == 2 # graphs: dist 0->2
assert hamming_decode(hamming_encode([1, 0, 1, 1])) == [1, 0, 1, 1] # coding
return True
print(self_test())
# Expected output:
# True
Every assertion is hand-derived from earlier checkpoints: $\gcd(252,198)=18$ (Chapter 22); the RSA round-trip recovers $66$ (Chapter 25); BFS distance from vertex 0 to vertex 2 along the path $0\!-\!1\!-\!2$ is 2 (Chapter 28); and Hamming encode-then-decode returns the original 4 bits when no error is introduced (Chapter 38). If all four hold, self_test returns True.
This is what "done" looks like in software: not just code that works once, but a package with a clean public interface and a test that proves the pieces still fit together. Toward the capstone: with __init__.py and self_test in place, your dmtoolkit is no longer a pile of exercises — it is a library, and it is the foundation under whichever track you built above. The full assembled package is reproduced in Appendix I.
Toolkit complete:
logic.py,sets.py,relations.py,combinatorics.py,recurrences.py,probability.py,numbertheory.py,crypto.py,graphs.py,coding.py— ten modules, one package, built from scratch across thirty-nine chapters. You understand every line, because you wrote every line.
Summary
The capstone is the integration exercise: take the dmtoolkit package and the body of theory behind it, and build one complete system you can defend and explain.
The four tracks at a glance:
| Track | System | Core Toolkit pieces | Load-bearing theorem |
|---|---|---|---|
| A | RSA encryption | crypto.py, numbertheory.py |
$m^{ed} \equiv m \pmod n$ (Euler, Ch. 25) |
| B | Social-network analyzer | Graph, bfs, greedy_coloring |
BFS computes shortest distances (Ch. 28) |
| C | Sudoku solver | backtracking (DFS), is_legal |
Backtracking is sound and complete |
| D | Error-correcting code | coding.py (Hamming 7,4) |
$d \ge 3 \Rightarrow$ corrects 1 error (triangle ineq.) |
The reusable shape of every track:
- Model — choose the discrete-math objects (graph, $\mathbb{Z}_n$, CSP, code) that represent the problem.
- Build — compose Toolkit pieces; write only the thin new glue.
- Correctness — state the central claim and prove it (or cite the proof you built earlier).
- Present — problem/model, design, correctness, evidence, limits.
Principles to carry out of the book:
- A correct system is a composition of correct parts — cite the theorems under your code.
- Tests demonstrate; proofs guarantee. Report both, and never confuse them.
- Choose the right model first; the model is the intellectual core, the code is plumbing.
- State the exact guarantee your argument gives — precise and modest beats impressive and indefensible.
Spaced Review
The capstone draws on the whole book, so this review does too. Answer from memory before checking.
- (Ch. 25 / 23) In Track A, decryption works because $ed \equiv 1 \pmod{\phi(n)}$. Which theorem turns that congruence into $m^{ed} \equiv m \pmod n$, and what role does the Chinese Remainder Theorem play when $\gcd(m, n) \ne 1$?
- (Ch. 28) Track B's analyzer uses BFS for "degrees of separation." Why does BFS — and not DFS — give the shortest number of hops, in one sentence?
- (Ch. 5) Track C searches for a satisfying assignment. If a Sudoku has no solution, the solver returns
Falseby exhausting the tree. Which Chapter 5 proof technique is "exhaust all cases and find none works" an instance of?- (Ch. 6) Each correctness argument in this chapter used induction or a cited inductive proof. State, in one line, what the inductive hypothesis was in the proof that BFS computes shortest distances.
- (Ch. 17 / 26) Track D's correction guarantee rests on codewords being spread far apart. Which counting principle from Chapter 17 explains why you cannot have too many codewords if you insist on a large minimum distance?
Answers
- Euler's theorem, $m^{\phi(n)} \equiv 1 \pmod n$ for $m$ coprime to $n$: writing $ed = 1 + k\phi(n)$ gives $m^{ed} = m \cdot (m^{\phi(n)})^k \equiv m$. When $\gcd(m, n) \ne 1$, Euler's theorem does not apply directly, so you prove the congruence modulo $p$ and modulo $q$ separately (Fermat's little theorem) and the CRT reassembles them into a single congruence mod $n$.
- BFS explores vertices in order of increasing distance (layer by layer), so the first time it reaches a vertex is along a path with the fewest edges; DFS plunges deep first and may reach a vertex by a long path before a short one.
- Proof by cases (exhaustive cases) — and more pointedly, it is how you establish a universal negative ("no assignment satisfies the constraints") by checking that every case fails, the same logic behind a disproof by exhausting possibilities.
- The inductive hypothesis was: all vertices at shortest-path distance $\le k$ have already been correctly labeled (
dist= true distance) and enqueued before any vertex at distance $k+1$ is dequeued.- The pigeonhole principle (Chapter 17): the space of $2^n$ bit strings is finite, and demanding that every pair of codewords differ in many positions packs "balls" of a fixed radius around each codeword that must not overlap — there is only so much room, so a large minimum distance forces few codewords (the trade-off between code rate and correcting power).
What's Next
You have built something. The final chapter, Chapter 40, steps back from building to looking ahead: where these tools take you next. The discrete math in this book is the foundation under whole fields — combinatorial optimization and linear programming, information and coding theory, the probabilistic method, even the category theory that organizes modern functional programming. Chapter 40 maps those directions, connects them to the rest of the computer-science curriculum, and offers a reading path forward. The Toolkit is finished; your use of these ideas is just beginning.