Case Study: Building a Complexity Fingerprinter
"An algorithm must be seen to be believed." — Donald E. Knuth, The Art of Computer Programming, Vol. 1
Executive Summary
In this build-focused case study you will construct a small, reusable tool — a complexity
fingerprinter — that takes any algorithm, runs it at a ladder of growing input sizes, counts the
operations, and reports its likely growth class from the doubling ratios. Where Case Study 1
diagnosed one endpoint, here you build the instrument that does the diagnosing automatically, then
turn it loose on a battery of algorithms (constant, logarithmic, linear, quadratic, cubic) and watch
each one announce its $\Theta$-class through its ratio signature. You will assemble the tool in layers —
an op counter, a size ladder, a ratio reporter, and finally a classifier that maps a ratio to a label —
and at each layer you will hand-derive the expected output so you trust it without ever running it. The
result extends the chapter's Project Checkpoint (complexity.py) into a genuinely useful developer
tool, and it makes the §14.5 hierarchy tangible: you will see the numbers $1, 2, 4, 8$ fall out of
the code.
Skills applied
- Building an instrument incrementally, hand-deriving each layer's output (no code execution).
- Counting operations for several algorithms and predicting their doubling ratios (§14.4, §14.5).
- Mapping the empirical ratio $\approx 2^k$ back to the growth class $\Theta(n^k)$ (Project Checkpoint).
- Distinguishing what measurement can establish from what only a proof settles (theme four).
Background
The scenario
Your team keeps shipping "accidentally quadratic" endpoints (the failure mode of Case Study 1). You
decide to build a guardrail: a tiny library, fingerprint, that any engineer can point at a function to
get a quick read on its growth class before it reaches production. The design constraint, straight
from §14.2, is that it must be machine-independent — it must not use wall-clock timing, because
timing is noisy and changes with hardware. Instead it counts operations, exactly as the RAM model
prescribes, and reports the doubling-ratio fingerprint from §14.5: double $n$, and the work multiplies
by $2^k$ for a $\Theta(n^k)$ algorithm.
💡 Intuition: You can read an algorithm's growth class off a ratio without ever knowing the hidden constant $c_2$. If $T(n) = c_2 n^k$, then $\frac{T(2n)}{T(n)} = \frac{c_2 (2n)^k}{c_2 n^k} = 2^k$ — the $c_2$ cancels. That cancellation is why the fingerprinter works on any machine: it never needs to know how fast the machine is, only how the work scales.
Why this matters
A team that can fingerprint a function in thirty seconds catches the $\Theta(n^2)$ surprise of Case
Study 1 in code review instead of in an outage. And building the tool, rather than just using one, forces
you to internalize the §14.4–14.5 machinery: every layer you write is a piece of the chapter made
executable. The tool is also the Project Checkpoint's complexity.py grown up — you will reuse count_ops
and doubling_ratios and add a classifier on top.
Phase 1: The op counter — instrument one algorithm
The foundation is a run function: given an input, it returns the number of primitive operations the algorithm performs (you instrument the algorithm yourself, as §14.2 requires — the RAM model counts operations, so we count them explicitly). Start with the simplest case, a linear scan.
def linear_run(data):
"""Instrumented O(n) scan: one operation per element. Returns the op count."""
ops = 0
for _x in data:
ops += 1 # count visiting each element
return ops
print(linear_run([10, 20, 30, 40]))
# Hand trace: 4 elements, one op each.
# Expected output:
# 4
On a list of length $n$, linear_run returns exactly $n$. That is the operation count $T(n) = n$, the
ground truth we will fingerprint. The discipline here is the §14.2 one: a "run" function reports what
the machine actually does, not how many lines you typed.
🔄 Check Your Understanding Write the one-line predicted operation count $T(n)$ for an instrumented quadratic run (two nested
range(n)loops, one op in the body) and for an instrumented cubic run (three nested loops).
Answer
Quadratic: $T(n) = n^2$ (the body runs $n \cdot n$ times). Cubic: $T(n) = n^3$ (it runs $n \cdot n \cdot n$ times). These are the §14.4 nested-loop counts.
Phase 2: The size ladder and ratio reporter
Next, run a given algorithm at a ladder of sizes and report the ratio of successive op counts. This is
the chapter's count_ops from complexity.py; we reproduce it so the tool is self-contained, then build
on it.
def count_ops(make_input, run, sizes):
"""For each n in sizes: build an input of that size and record the op count.
Returns a list of (n, ops, ratio_to_previous)."""
results, prev = [], None
for n in sizes:
ops = run(make_input(n))
ratio = None if prev is None else ops / prev
results.append((n, ops, ratio))
prev = ops
return results
# Fingerprint the linear algorithm on a doubling ladder.
print(count_ops(lambda n: list(range(n)), linear_run, [10, 20, 40]))
# Hand trace: op counts are n itself -> 10, 20, 40.
# ratios: first is None; 20/10 = 2.0; 40/20 = 2.0.
# Expected output:
# [(10, 10, None), (20, 20, 2.0), (40, 40, 2.0)]
The make_input argument is what lets the tool fingerprint any algorithm: you supply a function that
builds a valid input of size $n$ (here, list(range(n))). For a linear algorithm the ratios sit at
exactly $2.0$ — doubling the input doubled the work, the §14.5 signature of $\Theta(n)$.
⚠️ Common Pitfall: The
make_inputmust build an input whose size is $n$ in the sense the algorithm cares about. For a sorting routine, size is the list length; for an integer routine, size is the bit length $\log_2 N$, not the value (§14.2). Feeding the wrong notion of size produces a fingerprint of the wrong function — a classic §14.2 trap.
Phase 3: From ratios to a verdict — the classifier
Raw ratios are informative, but the guardrail should print a label. Add a thin classifier that maps an observed ratio to its nearest growth class. The map is the §14.5 fingerprint table read backwards: a ratio near $1$ means logarithmic, near $2$ linear, near $4$ quadratic, near $8$ cubic.
def doubling_ratios(make_input, run, start=100, doublings=4):
"""Run count_ops on start, 2*start, 4*start, ...; return just the ratios."""
sizes = [start * (2 ** i) for i in range(doublings + 1)]
return [r for (_, _, r) in count_ops(make_input, run, sizes) if r is not None]
def classify(ratio):
"""Map an observed doubling ratio to its nearest growth-class label."""
table = [(1, "Theta(log n)"), (2, "Theta(n)"),
(4, "Theta(n^2)"), (8, "Theta(n^3)")]
best = min(table, key=lambda kv: abs(ratio - kv[0])) # nearest target
return best[1]
print(classify(3.97), classify(2.01), classify(1.05))
# Hand trace: 3.97 nearest to 4 -> n^2; 2.01 nearest to 2 -> n; 1.05 nearest to 1 -> log n.
# Expected output:
# Theta(n^2) Theta(n) Theta(log n)
classify simply finds which of the target ratios $\{1, 2, 4, 8\}$ the observed ratio is closest to and
returns the matching label. It is deliberately crude — a ratio of $3$ is genuinely ambiguous (between
linear and quadratic) and the tool will guess — but for the clean power-law algorithms it diagnoses, it
is exactly right. We will see its limits in Phase 5.
🔗 Connection. The mapping ratio $\mapsto 2^k \mapsto \Theta(n^k)$ is the §14.5 claim "doubling $n$ multiplies $\Theta(n^k)$ work by $2^k$" turned into a lookup. Building the lookup is how you prove to yourself you understand the claim.
Phase 4: Turn it loose — fingerprint the whole hierarchy
Now assemble the tool against a battery of instrumented algorithms and watch each announce its class. We write one run-function per growth class and fingerprint them all.
def constant_run(data): # Theta(1): work ignores input size
return 1
def quadratic_run(data): # Theta(n^2): full n x n
n, ops = len(data), 0
for _i in range(n):
for _j in range(n):
ops += 1
return ops
def cubic_run(data): # Theta(n^3): three nested loops
n, ops = len(data), 0
for _i in range(n):
for _j in range(n):
for _k in range(n):
ops += 1
return ops
mk = lambda n: list(range(n))
for name, run in [("linear", linear_run), ("quadratic", quadratic_run),
("cubic", cubic_run)]:
ratios = doubling_ratios(mk, run, start=10, doublings=2)
print(name, ratios, classify(ratios[-1]))
# Hand trace (start=10, sizes 10,20,40):
# linear: ops 10,20,40 -> ratios [2.0, 2.0] -> nearest 2 -> Theta(n)
# quadratic: ops 100,400,1600 -> ratios [4.0, 4.0] -> nearest 4 -> Theta(n^2)
# cubic: ops 1000,8000,64000 -> ratios [8.0, 8.0] -> nearest 8 -> Theta(n^3)
# Expected output:
# linear [2.0, 2.0] Theta(n)
# quadratic [4.0, 4.0] Theta(n^2)
# cubic [8.0, 8.0] Theta(n^3)
Let us hand-verify the cubic line, since it is the most error-prone. cubic_run on size $n$ returns
$n^3$. The sizes $10, 20, 40$ give op counts $1000, 8000, 64000$. The ratios are $8000/1000 = 8.0$ and
$64000/8000 = 8.0$ — pinned at $8 = 2^3$, exactly the §14.5 prediction for $\Theta(n^3)$, and classify
reports it. The tool works: each algorithm's growth class falls out of the doubling numbers, no timing
and no hidden constants required.
🚪 Threshold Concept. You have just built a machine that measures growth class — an abstract, machine-independent property — using nothing but counting and division. This is the §14.3 reframing made concrete: you stopped asking "how many milliseconds?" and built a tool that asks "how does cost scale?", and that question has a clean numerical answer ($1, 2, 4, 8, \dots$) that no hardware change can disturb. Seeing the hierarchy emerge from your own code is the moment asymptotics stops being notation and becomes a lens.
Phase 5: Where the instrument lies — and why you still need proof
A good engineer knows their tool's failure modes. The fingerprinter measures; it does not prove. Three honest limitations, each a chapter lesson:
-
It can be fooled by small inputs. At $n = 10$ a $\Theta(n \log n)$ algorithm and a $\Theta(n)$ algorithm produce nearly identical ratios; the $\log n$ factor only separates at large $n$ (§14.5: the hierarchy is about eventual growth). Run the ladder too low and
classifymislabels. -
It cannot distinguish $\Theta(n)$ from $\Theta(n \log n)$ cleanly at all. Doubling $n$ multiplies $n \log n$ work by $2 \cdot \frac{\log 2n}{\log n} = 2(1 + \frac{1}{\log n})$, which approaches $2$ from above. The ratio for linearithmic drifts toward $2.0$ but never quite the flat $2.0$ of linear — a difference the crude
classifyrounds away. -
A passing fingerprint is evidence, not a guarantee (theme four). The tool sampled a handful of sizes; the algorithm might change behavior at sizes you did not test, or on inputs your
make_inputnever builds. Only a proof from the §14.3 definition settles the growth for all $n$.
def linearithmic_run(data): # Theta(n log n): n passes of a halving inner loop
import math
n = len(data)
if n <= 1:
return n
inner = max(1, int(math.log2(n))) # ~ log2(n) inner steps per element
return n * inner
print(doubling_ratios(lambda n: list(range(n)), linearithmic_run, start=16, doublings=2))
# Hand trace (sizes 16,32,64): inner = log2 -> 4,5,6.
# ops: 16*4=64, 32*5=160, 64*6=384. ratios: 160/64=2.5, 384/160=2.4.
# Expected output:
# [2.5, 2.4]
The ratios $2.5, 2.4$ are between the linear $2.0$ and the quadratic $4.0$, and they are creeping
downward toward $2.0$ as $n$ grows — the signature of an extra $\log n$ factor. A human reading "ratios
a bit above 2, slowly falling" correctly infers $\Theta(n \log n)$; the crude classify would round
$2.4$ to "$\Theta(n)$" and be wrong. The measurement suggests; the §14.3 proof decides.
⚠️ Common Pitfall: Do not trust a single ratio. One ratio of $2.4$ is ambiguous; the trend across several doublings (is it flat? rising toward $4$? falling toward $2$?) is what carries the signal. This is the empirical analogue of the §14.5 rule that growth is about behavior as $n \to \infty$, not at one input.
Discussion Questions
- The fingerprinter cancels the hidden constant $c_2$ by taking a ratio. Show algebraically that for $T(n) = c_2 n^k$, the ratio $T(2n)/T(n)$ is exactly $2^k$ regardless of $c_2$. Why is this the property that makes the tool machine-independent (§14.2)?
classifysnaps to the nearest of $\{1, 2, 4, 8\}$. What target would you add to detect $\Theta(\sqrt{n})$, and what ratio should it look for? (Hint: doubling $n$ multiplies $\sqrt{n}$ by what factor?)- Phase 5 showed the tool cannot cleanly separate $\Theta(n)$ from $\Theta(n \log n)$. Propose a change to the experiment (not the algorithm) that would make the distinction visible. (Hint: §14.5 says the gap only opens up for large $n$.)
- Suppose the fingerprinter reports flat ratios of $4.0$ for a function the author claims is $\Theta(n \log n)$. Which of the two — the measurement or the claim — is more likely wrong, and how would you settle it definitively (theme four)?
- The tool counts operations you instrument by hand. What goes wrong if the algorithm contains a hidden
$\Theta(n)$ operation the author forgot to count — say a
list + [x]inside the loop (§14.2)? How would the fingerprint expose the oversight even though the source "looks" linear?
Your Turn: Extensions
- Option A (add $\Theta(\sqrt{n})$). Extend
classify's table with a $\sqrt{n}$ entry (target ratio $\sqrt{2} \approx 1.41$) and write an instrumentedsqrt_runwhose op count is $\lfloor\sqrt{n}\rfloor$. Hand-derive the ratios on sizes $100, 400, 1600$ (note these quadruple, so the size step is $\times 4$, and $\sqrt{\cdot}$ then doubles) and confirm the new label fires. - Option B (a confidence flag). Make
classifyreturn"ambiguous"when the observed ratio is more than, say, $0.4$ away from every target. Hand-trace it on the linearithmic ratios $2.5, 2.4$ from Phase 5 and show it now honestly refuses to guess. - Option C (fingerprint Case Study 1). Point the finished tool at the
mutuals_runinstrumentation from Case Study 1 (start=10, doublings=2). Predict the output, hand-derive it, and confirm the tool independently re-discovers the $\Theta(n^2)$ bug — then write one sentence explaining why you would still prove the bound from the §14.3 definition before reporting it to the team.
Key Takeaways
- Build the instrument and you internalize the theory. Each layer — op counter, size ladder, ratio reporter, classifier — is a piece of §14.4–14.5 turned executable; assembling them is how the doubling-ratio fingerprint becomes second nature.
- Ratios cancel constants, which is why they are machine-independent. $T(2n)/T(n) = 2^k$ for $\Theta(n^k)$ regardless of the hidden $c_2$ — the tool never needs to know how fast the machine is, only how the work scales (§14.2).
- The hierarchy is visible in the numbers $1, 2, 4, 8$. Logarithmic, linear, quadratic, and cubic algorithms announce themselves through their ratio signatures — the §14.5 hierarchy made tangible.
- Measurement suggests; proof decides. The fingerprinter is fooled by small inputs and cannot cleanly separate $n$ from $n \log n$; only a proof from the §14.3 definition settles the growth class for all $n$. Use both — that is theme four (§14.5, Project Checkpoint).