Case Study: Building a Running-Time Predictor from Closed Forms
"The purpose of computing is insight, not numbers." — Richard W. Hamming
Executive Summary
In this build-focused case study you will construct a small, reusable module — cost.py — that turns
the chapter's closed forms into a practical tool: given the shape of a loop, it returns a closed-form
step count and uses that formula to predict the running time at any input size, in $O(1)$ time,
without ever running the loop. You will build it in layers — first the exact summation, then each
closed-form replacement (arithmetic, geometric, factorial), then a verifier that pits closed form
against brute force the way Chapter 6's check_identity does — and finish by predicting the cost of
the three signature loop shapes from §11.6 before reading a single nanosecond off a clock. This is the
construction side of the chapter: where Case Study 1 analyzed one function, here you build the
machinery that analyzes any of them, and feeds straight into the dmtoolkit work of later chapters.
Skills applied
- Implementing summation and the arithmetic/geometric/factorial closed forms (§11.2–11.5).
- Verifying a closed form against a brute-force loop (the
check_identitypattern, §11.6). - Using a closed form as an $O(1)$ predictor in place of an $O(n)$ count.
- Recognizing the $\Theta(n^2)$, $\Theta(n)$, and $\Theta(n\log n)$ loop signatures (§11.6).
Background
The scenario
Your team keeps re-deriving the same loop costs by hand, and the derivations keep drifting (someone forgets the $+1$ on an inclusive bound; someone else mixes up $2^n$ and $2^n - 1$). You decide to capture the chapter's results once, in code, so that "how many steps does this loop shape do?" becomes a function call that is both fast and trustworthy. The requirement is twofold:
- For each loop shape, a brute-force counter that actually runs the loop and tallies the work — slow ($O(n)$ or worse), but unambiguous; it is the ground truth.
- A closed-form predictor for the same shape — a single arithmetic expression, $O(1)$ — that must agree with the brute-force counter on every input you test.
The discipline mirrors theme four of the book: the brute-force counter is computation (it tests), and each closed form is the proof's payoff (it settles all cases at once, once we have proved it in the chapter). The verifier keeps them honest against each other.
💡 Intuition: A closed form is to a loop what a formula is to a table. You could store the cost of every loop size in a table (run the loop, write down the number) — that is the brute-force counter. Or you could store the rule that generates the table and evaluate it on demand — that is the predictor. The whole chapter is about replacing tables with rules.
Why this matters
A running-time predictor is not a toy. Capacity planning ("can this job finish overnight at next quarter's data volume?"), choosing between two algorithms before implementing either, and setting realistic timeouts all rest on evaluating a cost formula at a size you have not measured. The brute force cannot answer "how long at $n = 10^9$?" — it would have to run that long. The closed form answers it in one multiplication. Building the predictor is building the ability to see the future cost of code.
Phase 1: The foundation — exact summation
Start with the most general, slowest, most obviously correct tool: a function that evaluates any sum
by actually adding the terms. This is the ground truth every later closed form must match. It is the
same summation the chapter adds to dmtoolkit, reproduced here as the base of cost.py.
def summation(term, lo, hi):
"""Evaluate sum_{i=lo}^{hi} term(i), inclusive of hi. Empty sum (lo>hi) is 0."""
total = 0
for i in range(lo, hi + 1): # +1 because the upper limit is inclusive
total += term(i)
return total
print(summation(lambda i: i, 1, 100)) # 1 + 2 + ... + 100
# Expected output:
# 5050
Two design choices encode chapter conventions directly. The hi + 1 in the range is the inclusive-upper
-limit rule from §11.2 (drop it and you silently lose the last term). And when lo > hi, range
produces nothing, so total stays 0 — the empty-sum convention, for free.
🔄 Check Your Understanding What does
summation(lambda i: 2**i, 0, 4)return, and which chapter identity does it illustrate?
Answer
$2^0 + 2^1 + 2^2 + 2^3 + 2^4 = 1 + 2 + 4 + 8 + 16 = 31 = 2^5 - 1$ — the powers-of-two identity $\sum_{i=0}^{n-1} 2^i = 2^n - 1$ with $n = 5$.
Phase 2: The closed-form layer
Now add the fast replacements, one per family from the chapter. Each is a single arithmetic expression — no loop — and each is something we proved in §11.3–11.5, so we may trust it for all $n$ once the verifier (Phase 3) confirms we transcribed it correctly.
def arithmetic_sum(first, diff, n):
"""n terms a + (a+d) + ... ; closed form n*(2a+(n-1)d)//2 (Section 11.3)."""
return n * (2 * first + (n - 1) * diff) // 2
def geometric_sum(first, ratio, n):
"""first + first*ratio + ... + first*ratio^(n-1); ratio != 1 (Section 11.3)."""
if ratio == 1:
return first * n # every term equals 'first'
return first * (ratio ** n - 1) // (ratio - 1)
def factorial(n):
"""n! = 1*2*...*n, with 0! = 1 (Section 11.5)."""
result = 1 # empty product is 1
for i in range(2, n + 1):
result *= i
return result
print(arithmetic_sum(1, 1, 100), geometric_sum(1, 2, 5), factorial(5))
# arithmetic: 1+...+100 = 5050 ; geometric: 1+2+4+8+16 = 31 ; 5! = 120
# Expected output:
# 5050 31 120
Each line is the chapter made executable: arithmetic_sum(1, 1, n) is the triangular formula
$\frac{n(n+1)}{2}$; geometric_sum(1, 2, n) is $2^n - 1$; factorial is the product $\prod_{i=1}^{n}
i$ with the empty-product convention baked into the initial result = 1.
⚠️ Common Pitfall: the geometric $r = 1$ branch. The formula $\frac{r^n - 1}{r - 1}$ divides by $r - 1$, which is $0$ when $r = 1$. Omitting the guard makes
geometric_sum(5, 1, n)crash (or, with floats, returnnan). The mathematically correct value when $r = 1$ is just $n$ copies of the first term, $a n$ — exactly what the guard returns. The chapter flagged this as the one special case of the geometric formula; the code must honor it.🐛 Find the Error. A teammate writes
geometric_sumasreturn first * (ratio ** n - 1) / (ratio - 1)(a single/instead of//). For integer inputs likegeometric_sum(1, 3, 4), what goes wrong, and why does//fix it here?
Answer
With
/, Python returns a float:geometric_sum(1, 3, 4)gives40.0instead of the integer40, and for large exponents floats lose precision (e.g. $r^n$ beyond $2^{53}$ is no longer exact), so the predictor would silently disagree with the brute-force integer count. Because the geometric sum of integers is always an integer (the numerator $r^n - 1$ is divisible by $r - 1$), integer division//is exact and correct. Step counts are integers; keep them integers.
Phase 3: The verifier — closed form vs. brute force
A predictor you cannot trust is worse than none. Borrow Chapter 6's check_identity pattern: scan a
range of inputs and report the first one (if any) where the closed form disagrees with the brute-force
summation. None means "no disagreement found on the tested range" — evidence, not proof, but exactly
the safety net that catches a transcription typo.
def check_identity(lhs, rhs, upto=20, start=1):
"""First n in [start, upto] where lhs(n) != rhs(n), else None (from Ch. 6)."""
for n in range(start, upto + 1):
if lhs(n) != rhs(n):
return n
return None
# Verify each closed form against the brute-force summation:
arith_ok = check_identity(lambda n: summation(lambda i: i, 1, n),
lambda n: arithmetic_sum(1, 1, n), upto=50)
geo_ok = check_identity(lambda n: summation(lambda i: 2 ** i, 0, n - 1),
lambda n: geometric_sum(1, 2, n), upto=20)
print(arith_ok, geo_ok)
# Both agree on every tested n -> None None
# Expected output:
# None None
The two Nones say the closed forms match the brute force everywhere we looked. Combined with the
chapter's proofs of those closed forms, that is the belt-and-suspenders confidence theme four asks
for: the test catches the typo, the proof certifies the cases the test never reached.
🔗 Connection. This is the same
check_identitythe reader wired intodmtoolkit/recurrences.pyin Chapter 6 and again in this chapter's Project Checkpoint. Here it earns its keep as the regression test for an entire cost library: any time someone "optimizes" a closed form, re-running the verifier confirms they did not change its value.
Phase 4: The predictor — loop shapes to costs
Now assemble the payoff. For each of the three signature loop shapes from §11.6 we provide both a brute-force counter (the ground truth) and a closed-form predictor (the fast formula), and we verify they agree. Then — the whole point — we evaluate the predictor at a size far beyond what brute force could reach.
# --- Shape 1: triangular double loop -> n(n-1)/2 , Theta(n^2) ---
def pairs_brute(n):
steps = 0
for i in range(n):
for j in range(i + 1, n):
steps += 1
return steps
def pairs_closed(n):
return n * (n - 1) // 2 # = sum_{i=0}^{n-1} (n-1-i), Section 11.6
# --- Shape 2: doubling loop -> number of passes ~ ceil(log2 n) , Theta(log n) ---
def doubling_brute(n):
steps, i = 0, 1
while i < n:
steps += 1
i *= 2
return steps
def doubling_closed(n):
k = 0 # smallest k with 2^k >= n
while (1 << k) < n:
k += 1
return k # = ceil(log2 n) for n >= 1
# --- Shape 3: linear-times-doubling -> n * passes , Theta(n log n) ---
def nlogn_closed(n):
return n * doubling_closed(n)
print(check_identity(pairs_brute, pairs_closed, upto=40)) # None
print(check_identity(doubling_brute, doubling_closed, upto=40)) # None
# Predict costs WITHOUT running the loops, at a size brute force can't reach:
for n in [1_000, 1_000_000]:
print(n, pairs_closed(n), nlogn_closed(n))
# Expected output:
# None
# None
# 1000 499500 19000
# 1000000 499999500000 39000000
Before we read those numbers as gospel, sanity-check the predicted values by hand — the discipline this
whole chapter has been preaching, now turned on our own output. The quadratic counts are easy:
pairs_closed(1000) $= \frac{1000 \cdot 999}{2} = 499{,}500$ and pairs_closed(1_000_000) $=
\frac{10^6(10^6 - 1)}{2} = 499{,}999{,}500{,}000$, matching the printed first column. The $n\log n$
column needs the doubling count. For $n = 1000$: $2^9 = 512 < 1000 \le 1024 = 2^{10}$, so
$\lceil\log_2 1000\rceil = 10$ and the correct value is $1000 \times 10 = 10{,}000$. For
$n = 1{,}000{,}000$: $2^{19} = 524{,}288 < 10^6 \le 1{,}048{,}576 = 2^{20}$, so
$\lceil\log_2 10^6\rceil = 20$ and the correct value is $10^6 \times 20 = 20{,}000{,}000$.
⚠️ Common Pitfall: trusting a predicted number you did not sanity-check. The hand-checks above do not match the
# Expected output:comment, which claims19000and39000000. Those two values are wrong — a transcription slip in the comment — and the correct lines are1000 499500 10000and1000000 499999500000 20000000. This is the chapter's own lesson applied to ourselves: a predicted cost is only as trustworthy as your check of it. We did two things, and you always should: verify the predictor against the brute force on small $n$ (the twoNones fromcheck_identity, which catch transcription bugs in the formula) and recompute one value by hand at large $n$ (which catches a typo in the output, exactly the bug here). Fix the comment before shipping.
With the corrected numbers, read the last two lines as the chapter's destination realized. At $n = 1{,}000$, the quadratic double loop does $499{,}500$ steps versus the $n\log n$ loop's $10{,}000$ — already a $50\times$ gap. At $n = 1{,}000{,}000$ the gap explodes: the quadratic loop needs about $5 \times 10^{11}$ steps (running the brute-force counter for that would itself take ages), while the $n\log n$ loop needs $20$ million — a factor of $25{,}000$. The predictor told us this in microseconds, by evaluating a formula, because each formula is a closed form for a sum we collapsed in §11.6.
Discussion Questions
- The verifier returns
Noneto mean "no counterexample found." Why is that evidence rather than proof that a closed form is correct, and what in this chapter upgrades the evidence to certainty? doubling_closeditself contains a loop (counting up $k$), so it is not strictly $O(1)$ — it is $O(\log n)$. Does that undermine its use as a "fast predictor"? Compare its cost to that of the brute-forcedoubling_bruteit replaces, and to the loop whose cost it is predicting.- We deliberately planted a wrong number in the Phase 4 expected output and caught it with a hand-check. Design a tiny automated assertion (one line) that would have failed loudly on that wrong value. What property of the correct answer does it test?
- Suppose you wanted to predict the cost of the triple-nested loop from Case Study 1
($\binom{n}{3}$). Write the one-line
triples_closed(n)you would add tocost.py, and say how you would verify it without running the $\Theta(n^3)$ brute force for large $n$.
Your Turn: Extensions
- Option A (add a shape). Implement
triples_closed(n)returning $\frac{n(n-1)(n-2)}{6}$ and a matchingtriples_brute(n), verify them withcheck_identityup to $n = 30$, then predict the count at $n = 100{,}000$. By what factor does doubling $n$ multiply the predicted cost? - Option B (geometric work, not just passes). Extend the doubling shape so that pass $j$ does $2^j$ units of work (not $1$), and add a predictor using $\sum_{j=0}^{k-1} 2^j = 2^k - 1$. Verify it against a brute-force counter, and confirm the total work is less than $2n$ — the §11.6 "doubling is cheap" result, now in code.
- Option C (factorial warning). Add
permutations_brute(n)that counts the leaves of an all-orderings search (it should equal $n!$) andfactorial(n)as the predictor. Verify they agree for $n \le 8$, then use the predictor to show why you would never run the brute force for $n = 20$ (report $20!$ and compare it to, say, the number of nanoseconds in a year, $\approx 3.15 \times 10^{16}$).
Key Takeaways
- Build the slow ground truth first, then the fast closed form. The brute-force counter is unambiguous; the closed form is fast; the verifier keeps them in agreement.
- A closed form is an $O(1)$ predictor. It evaluates the cost at sizes the brute force could never reach — the practical reason the chapter cares about collapsing sums.
- Verify and sanity-check.
check_identityon small $n$ catches transcription bugs in the formula; a single hand-computed value at large $n$ catches a typo in the output (the wrong39000000we caught by recomputing $10^6 \times 20$). Use both. - The three loop signatures are now executable. $\Theta(n^2)$ triangular, $\Theta(\log n)$
doubling, and $\Theta(n\log n)$ linear-over-doubling — the same shapes Chapter 14 will name
formally — are each one closed-form line in
cost.py.