> — Donald E. Knuth, The Art of Computer Programming, Vol. 1
Prerequisites
- 9
- 11
Learning Objectives
- Define an algorithm precisely and distinguish an algorithm from a program that implements it.
- Count the work an algorithm does using the RAM model, ignoring machine-specific constants.
- State the formal definitions of Big-O, Big-Omega, and Big-Theta and prove membership from the definition.
- Analyze the running time of loops, nested loops, and common code patterns and express it asymptotically.
- Order the standard growth-rate hierarchy and explain why asymptotically larger terms dominate.
- Distinguish best-, worst-, and average-case analysis and prove a simple lower bound.
In This Chapter
- Overview
- Learning Paths
- 14.1 What is an algorithm?
- 14.2 Measuring work: the RAM model
- 14.3 Big-O, Big-Omega, and Big-Theta, formally
- 14.4 Analyzing loops and common patterns
- 14.5 The growth-rate hierarchy and why it dominates
- 14.6 Best, worst, and average case — and a first lower bound
- Project Checkpoint: an empirical complexity estimator
- Summary
- Spaced Review
- What's Next
Chapter 14: Algorithms and Complexity
"An algorithm must be seen to be believed." — Donald E. Knuth, The Art of Computer Programming, Vol. 1
Overview
You have almost certainly had this experience: a function works perfectly on your test data, you ship it, and then a customer feeds it ten million records and it grinds to a halt. The code was correct — it returned the right answer — but it was not fast enough, and on a large enough input the difference between "right" and "right but unusable" is the difference between a working product and an outage.
This chapter is about the second kind of correctness: not "does it return the right answer?" but "how does the cost of getting that answer grow as the input grows?" That question has a precise mathematical answer, and the language for it — Big-O notation — is so central to computer science that it appears in every algorithms course, every system-design interview, and every serious code review. When one engineer tells another "that's $O(n^2)$, can we get it to $O(n \log n)$?", they are speaking the language this chapter teaches.
Here is the key insight that makes the whole subject work. The exact running time of a program depends on a thousand irrelevant details: your CPU's clock speed, the compiler, whether the data was in cache, what else the machine was doing. None of that tells you whether the algorithm will scale. What matters is the shape of the growth — does doubling the input double the work, quadruple it, or square it? Asymptotic analysis throws away the irrelevant constants and keeps exactly the shape. It is abstraction (theme three of this book) applied to performance: working at the right level of detail to see what actually matters.
In this chapter, you will learn to:
- Say precisely what an algorithm is, and separate it from any one implementation.
- Count an algorithm's work in a machine-independent way using the RAM model.
- Read and write the three asymptotic notations — Big-O, Big-Omega, and Big-Theta — and prove a growth claim straight from the definition.
- Analyze loops and nested loops by turning them into sums (the payoff from Chapter 11).
- Rank the functions you will meet for the rest of your career, from constant to factorial, and explain why the ranking is the only thing that matters at scale.
- Tell best-, worst-, and average-case apart, and prove that no comparison-based search can beat a certain bound — your first lower-bound argument.
Learning Paths
🏃 Fast Track: If you already use Big-O fluently, skim 14.1–14.2, then read 14.3 carefully (most people use the notation without ever having proved a single membership — fix that here), and do 14.6 (best/worst/average and the first lower bound). Attempt the ⭐⭐ and ⭐⭐⭐ exercises.
📖 Standard Path: Read in order. Work every
🔄 Check Your Understandingbefore continuing, and in 14.3 try to write each Big-O proof yourself before reading ours — the definition only becomes yours once you have wielded the witnesses $c$ and $n_0$ with your own hands.🔬 Deep Dive: After the chapter, prove the asymptotic relationships in the exercise set (e.g. $\log(n!) = \Theta(n \log n)$, and that $\Theta$ is an equivalence relation on growth rates), and read ahead to how these recurrences get solved in Chapter 19.
14.1 What is an algorithm?
Start with something you do without thinking. Here is a recipe for finding the largest number in a list:
def maximum(values):
"""Return the largest element of a non-empty list."""
biggest = values[0]
for x in values:
if x > biggest:
biggest = x
return biggest
print(maximum([3, 7, 2, 9, 4]))
# Expected output:
# 9
You could describe this procedure in English ("keep a running champion; compare each element to it; update if you find something bigger"), in Python as above, in C, or as a flowchart. All four describe the same algorithm. The Python function is one implementation of it; the algorithm is the underlying idea, independent of language. This distinction matters: when we analyze complexity, we analyze the algorithm, so our conclusions hold no matter what language you write it in.
What separates an algorithm from a vague set of instructions? Four properties, which together make a procedure something a machine can carry out and we can reason about.
Definition (algorithm). An algorithm is a finite sequence of precise, unambiguous instructions for solving a problem or computing a function, such that:
- Definiteness — each step is exactly specified, with no ambiguity about what to do next.
- Finiteness — for every valid input, the procedure halts after finitely many steps.
- Effectiveness — each step is basic enough to be carried out exactly, in finite time.
- Input/output — it takes zero or more inputs from a specified set and produces one or more outputs that stand in a specified relation to the inputs.
💡 Intuition: Think of "definiteness" as no judgment calls — a step like "pick a good pivot" is not algorithmic until "good" is pinned down. "Finiteness" is what separates an algorithm from a server loop that is supposed to run forever: an algorithm must eventually stop and answer. We will see in Chapter 37 that there is a hard limit on what algorithms can do efficiently, and the very definition of "efficiently" is what this chapter builds.
Notice that maximum satisfies all four: each line is definite, the loop runs exactly len(values)
times so it is finite, every operation (comparison, assignment) is effective, and it maps a list to its
largest element. The word "algorithm" honors the 9th-century mathematician al-Khwārizmī, whose name,
Latinized, also gives us "algebra" from the title of his book.
🔗 Connection. An algorithm computes a function in the precise sense of Chapter 9: it maps each input in a domain to a unique output in a codomain. "Sorting" is the function that maps a list to its ordered permutation; merge sort and quicksort are two different algorithms computing that one function. Keeping the function (the what) separate from the algorithm (the how) is the same level-of-abstraction discipline you practiced when you separated a function from its rule.
🔄 Check Your Understanding 1. Is "shuffle the deck until it happens to be sorted" an algorithm for sorting? Which of the four properties is in danger? 2. Are
maximumwritten in Python and the same logic written in C "the same algorithm" or "different algorithms"? Why does the answer matter for complexity analysis?
Answers
- It endangers finiteness: there is no guarantee it ever halts (a sufficiently unlucky run could reshuffle forever). A bounded-attempts version would be finite but might fail to sort, breaking the input/output property instead. 2. They are the same algorithm with two implementations. This matters because complexity analysis is about the algorithm; if it depended on the language, every conclusion would have to be re-derived per language, which would make the whole theory useless.
14.2 Measuring work: the RAM model
We want to measure how much work an algorithm does. The naive idea — measure wall-clock time with a stopwatch — fails immediately, because the same code runs at different speeds on different machines, on different days, with a warm or cold cache. A measurement that changes when you upgrade your laptop tells you nothing about the algorithm.
The fix is to count operations instead of seconds, against an idealized machine simple enough to reason about but realistic enough to predict real behavior. That machine is the RAM model.
Definition (RAM model, informally). In the Random Access Memory model of computation, we assume: memory is an array of cells, each cell holds an integer, and each primitive operation — reading or writing one cell, an arithmetic operation ($+, -, \times, \div$), a comparison, a branch — costs exactly one unit of time, regardless of the values involved. We count the total number of primitive operations as a function of the input size $n$.
Two modeling decisions are baked in, and it is honest to name them. First, "input size" $n$ must be defined per problem: for a list it is the number of elements; for an integer it is usually the number of bits; for a graph it is the vertex and edge counts. Second, the unit-cost assumption is a simplification — multiplying two thousand-digit numbers is not really one step — but for the integer sizes in most programs it predicts real performance remarkably well. (When numbers grow without bound, as in cryptography, we switch to counting bit operations; you will see this in Part IV.)
Let us count operations for maximum on a list of length $n$.
biggest = values[0]— one read, one write: a constant amount of work, call it $c_1$.- The loop runs $n$ times. Each pass does a comparison
x > biggestand possibly an assignment, plus the loop bookkeeping — a constant amount of work per pass, call it $c_2$. return biggest— constant work $c_3$.
So the total is $T(n) = c_2 n + (c_1 + c_3)$. The exact constants $c_1, c_2, c_3$ depend on the machine and we do not know them — but we do not need them, and the next section explains why. What we can already see is the shape: $T(n)$ is a linear function of $n$. Double the list and you roughly double the work. That shape is the durable fact.
💡 Intuition: Constants like $c_2$ are exactly the machine-specific noise we want to discard. Reporting "$T(n) = c_2 n + (c_1 + c_3)$" buries the one fact that survives a hardware upgrade — that the growth is linear — under details that do not. Asymptotic notation, next, is the mathematics of keeping the shape and throwing away the noise.
⚠️ Common Pitfall: Not every "one line" of code is one operation.
new_list = old_list + [x]looks like one step but copies the whole list — that single line costs $O(n)$, not $O(1)$. Likewisex in some_listscans the list ($O(n)$) whilex in some_setis $O(1)$ on average. Counting operations means counting what the machine actually does, not how many lines you typed.🔄 Check Your Understanding 1. Why do we count operations rather than measure seconds? 2. In the RAM model, what is the cost of
total = total + values[i]inside a loop body, per pass? 3. For an algorithm that factors an integer $N$, would you take the input size to be $N$ or $\log_2 N$? Why does it matter?
Answers
- Seconds are machine-, compiler-, and cache-dependent; operation counts are a property of the algorithm and therefore portable. 2. Constant — one array read, one addition, one write: $O(1)$ per pass. 3. The size of an integer input is the number of bits, $\log_2 N$ (the integer is written in that many digits). It matters enormously: an algorithm that does $N$ steps is exponential in the input size $n = \log_2 N$, because $N = 2^n$. This is exactly why factoring is considered hard and why RSA (Chapter 25) is secure.
14.3 Big-O, Big-Omega, and Big-Theta, formally
We have a count like $T(n) = c_2 n + (c_1 + c_3)$ with unknown constants and want to say, precisely, "this grows linearly." Asymptotic notation is the tool. The word asymptotic means we care about behavior as $n \to \infty$ — for large inputs — and we deliberately ignore both constant factors and the behavior on small inputs. Three notations capture three different claims: an upper bound, a lower bound, and a tight bound.
Big-O: an asymptotic upper bound
Definition (Big-O). Let $f, g : \mathbb{N} \to \mathbb{R}_{\ge 0}$. We say $f(n) = O(g(n))$ (read "$f$ is big-O of $g$") if there exist constants $c > 0$ and $n_0 \in \mathbb{N}$ such that $$f(n) \le c \cdot g(n) \quad \text{for all } n \ge n_0.$$
In words: past some starting point $n_0$, the function $f$ is bounded above by a constant multiple of $g$. The pair $(c, n_0)$ are the witnesses — produce them and you have proved the claim. Big-O is a ceiling: it promises $f$ grows no faster than $g$.
💡 Intuition: Read $f(n) = O(g(n))$ as "$f$ grows at most like $g$, ignoring constant factors and small inputs." The "$=$" is a historical abuse of notation — it really means "is a member of the set of functions bounded by $g$." Some authors write $f(n) \in O(g(n))$, which is more honest; the "$=$" form is universal in practice, so we use it, but never read it as symmetric: $O(n) = f(n)$ is meaningless.
Let us prove our first membership from the definition. This is the move most students never make, and it is what turns Big-O from a vague feeling into mathematics.
Claim. $3n + 7 = O(n)$.
The strategy first. To prove a Big-O claim we must exhibit witnesses $c$ and $n_0$ and then verify the inequality for all $n \ge n_0$. The art is choosing $c$ a little larger than the leading coefficient so the lower-order terms get absorbed. Here the leading coefficient is 3; we will pick $c = 4$, leaving a comfortable margin of $1 \cdot n$ to swallow the constant $7$, which it does as soon as $n \ge 7$.
Proof. Choose $c = 4$ and $n_0 = 7$. For every $n \ge 7$ we have $7 \le n$, and therefore $$3n + 7 \le 3n + n = 4n = c \cdot n.$$ Thus $3n + 7 \le c\,n$ for all $n \ge n_0$, which is exactly the definition of $3n + 7 = O(n)$ with witnesses $c = 4$, $n_0 = 7$. $\blacksquare$
The witnesses are not unique. We could equally have taken $c = 10, n_0 = 1$ (since $3n + 7 \le 10n$ for all $n \ge 1$). Any valid pair proves the claim — you only need one.
⚠️ Common Pitfall: $O(g)$ is an upper bound, so technically true upper bounds can be loose: $3n + 7 = O(n^2)$ is a correct statement (linear growth is certainly no faster than quadratic), it is just not the tightest one. When someone says "the running time is $O(n^2)$" they usually mean it is tight, but the notation alone does not promise that — only $\Theta$ (below) does. Saying an $O(n)$ algorithm "is $O(n^2)$" is not wrong; it is uninformative.
Big-Omega: an asymptotic lower bound
The mirror image of Big-O bounds a function from below.
Definition (Big-Omega). $f(n) = \Omega(g(n))$ (read "$f$ is big-omega of $g$") if there exist constants $c > 0$ and $n_0 \in \mathbb{N}$ such that $$f(n) \ge c \cdot g(n) \quad \text{for all } n \ge n_0.$$
Big-Omega is a floor: $f$ grows at least as fast as $g$. Where Big-O is the natural language for "this algorithm is no slower than…", Big-Omega is the language for "this problem cannot be solved faster than…" — which is how we state lower bounds (section 14.6).
Claim. $3n + 7 = \Omega(n)$.
The strategy first. Now we need $f$ to exceed a multiple of $g$. Since $3n + 7$ is at least $3n$ for every $n \ge 0$ (the $+7$ only helps), the coefficient $c = 3$ works immediately.
Proof. Choose $c = 3$ and $n_0 = 0$. For every $n \ge 0$, $$3n + 7 \ge 3n = c \cdot n,$$ which is the definition of $3n + 7 = \Omega(n)$. $\blacksquare$
Big-Theta: a tight bound
When a function is both $O(g)$ and $\Omega(g)$ — squeezed between two constant multiples of $g$ — we have pinned down its growth exactly.
Definition (Big-Theta). $f(n) = \Theta(g(n))$ (read "$f$ is big-theta of $g$") if $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$. Equivalently, there exist constants $c_1, c_2 > 0$ and $n_0$ with $$c_1 \cdot g(n) \le f(n) \le c_2 \cdot g(n) \quad \text{for all } n \ge n_0.$$
Big-Theta is the tight bound, and it is what we usually want: it says $f$ grows exactly like $g$ up to constant factors. Combining the two proofs above:
Claim. $3n + 7 = \Theta(n)$.
Proof. By the first proof, $3n + 7 = O(n)$ (witnesses $c_2 = 4, n_0 = 7$). By the second, $3n + 7 = \Omega(n)$ (witnesses $c_1 = 3, n_0 = 0$). Taking the larger threshold $n_0 = 7$, both inequalities hold simultaneously: $3n \le 3n + 7 \le 4n$ for all $n \ge 7$. Hence $3n + 7 = \Theta(n)$. $\blacksquare$
🚪 Threshold Concept. Once you see growth as a function up to constant factors, you stop asking "how many milliseconds?" and start asking "how does cost scale?" — and that question has a machine-independent, future-proof answer. This reframing is why a single $\Theta$ class (say, $\Theta(n \log n)$ for comparison sorting) can describe an algorithm that will still be the right choice on hardware not yet invented. Asymptotics is the rare part of CS that does not go out of date.
Here is a small but important fact you will prove in the exercises, stated now because we use it constantly: in a polynomial, only the leading term matters.
Theorem (leading term dominates). If $f(n) = a_d n^d + a_{d-1} n^{d-1} + \dots + a_1 n + a_0$ with $a_d > 0$, then $f(n) = \Theta(n^d)$.
The strategy first. For the upper bound, bound every term by the leading power: each $|a_i| n^i \le |a_i| n^d$ once $n \ge 1$, so the whole sum is at most $(\sum_i |a_i|) n^d$ — that sum of absolute values is our constant $c_2$. For the lower bound, the leading term $a_d n^d$ eventually swamps the rest; past some $n_0$, $f(n) \ge \frac{a_d}{2} n^d$, giving $c_1 = a_d/2$.
Proof. Upper bound. For $n \ge 1$ and each $i \le d$, $n^i \le n^d$, so $$f(n) = \sum_{i=0}^{d} a_i n^i \le \sum_{i=0}^{d} |a_i| n^i \le \left(\sum_{i=0}^{d} |a_i|\right) n^d.$$ Set $c_2 = \sum_{i=0}^{d}|a_i|$; then $f(n) \le c_2 n^d$ for $n \ge 1$, so $f(n) = O(n^d)$.
Lower bound. Let $A = \sum_{i=0}^{d-1} |a_i|$ be the total weight of the lower terms. For any $n$, $$f(n) \ge a_d n^d - A n^{d-1} = n^{d-1}\big(a_d n - A\big).$$ Choose $n_0 = \max\{1, \lceil 2A / a_d \rceil\}$. Then for $n \ge n_0$ we have $a_d n \ge 2A$, so $a_d n - A \ge \tfrac{a_d}{2} n$, and therefore $$f(n) \ge n^{d-1} \cdot \tfrac{a_d}{2} n = \tfrac{a_d}{2}\, n^d.$$ Set $c_1 = a_d / 2$; then $f(n) \ge c_1 n^d$ for $n \ge n_0$, so $f(n) = \Omega(n^d)$.
Both bounds hold, so $f(n) = \Theta(n^d)$. $\blacksquare$
This theorem is why we write $\Theta(n^2)$ for $5n^2 + 3n + 100$ without a second thought, and why constants and lower-order terms simply vanish from complexity statements.
🔄 Check Your Understanding 1. Prove $n^2 + 5n = O(n^2)$ by giving explicit witnesses $c$ and $n_0$. 2. True or false: $n = O(n^2)$. And: $n^2 = O(n)$. Justify each. 3. If $f(n) = \Theta(g(n))$, must $g(n) = \Theta(f(n))$? What about with $O$?
Answers
- Take $c = 6, n_0 = 1$: for $n \ge 1$, $5n \le 5n^2$, so $n^2 + 5n \le 6n^2$. (Or $c=2, n_0=5$.)
- $n = O(n^2)$ is true (linear is bounded above by quadratic: $n \le n^2$ for $n \ge 1$). $n^2 = O(n)$ is false: no constant $c$ makes $n^2 \le cn$ for all large $n$, since that would force $n \le c$, which fails once $n > c$. 3. $\Theta$ is symmetric ($f = \Theta(g)$ iff $g = \Theta(f)$ — swap the roles of the two constants), so yes. $O$ is not symmetric: $n = O(n^2)$ but $n^2 \ne O(n)$.
14.4 Analyzing loops and common patterns
You rarely prove Big-O from the definition in daily work. Instead you read it off the code's structure, using a handful of rules that all reduce to the definition. The central technique is the one Chapter 11 set up: a loop's cost is a sum, and you evaluate the sum.
Sequential code: add, then keep the max
If a block runs work $A$ and then work $B$, the total is $A + B$. Asymptotically, the sum of two terms is dominated by the larger:
Sum rule. If $f_1 = O(g_1)$ and $f_2 = O(g_2)$, then $f_1 + f_2 = O(\max(g_1, g_2))$.
So a routine that does an $O(n)$ pass followed by an $O(n^2)$ pass is $O(n^2)$ overall — the cheaper phase disappears. This is why optimizing the linear part of a quadratic algorithm is wasted effort.
A single loop: multiply by the iteration count
A loop that runs $n$ times, doing constant work per pass, costs $\sum_{i=1}^{n} c = cn = \Theta(n)$.
That is maximum from section 14.1: a single pass, $\Theta(n)$.
Nested loops: a double sum
Nested loops multiply — but you must be careful, because the inner bound often depends on the outer index. Consider:
def count_pairs(values):
"""Count index pairs (i, j) with i < j and values[i] + values[j] == 0."""
n = len(values)
count = 0
for i in range(n):
for j in range(i + 1, n): # inner range shrinks as i grows
if values[i] + values[j] == 0:
count += 1
return count
print(count_pairs([1, -1, 2, -2, 0]))
# Expected output:
# 2
How many times does the if execute? For a fixed $i$, the inner loop runs from $i+1$ to $n-1$, which
is $n - 1 - i$ times. Summing over all $i$ turns the analysis into exactly the kind of series you
evaluated in Chapter 11:
$$\sum_{i=0}^{n-1} (n - 1 - i) = \sum_{j=0}^{n-1} j = \frac{(n-1)n}{2} = \frac{n^2 - n}{2}.$$
(We re-indexed with $j = n - 1 - i$, which runs $0, 1, \dots, n-1$.) By the leading-term theorem,
$\frac{n^2 - n}{2} = \Theta(n^2)$. So count_pairs is quadratic — even though the inner loop "only"
runs over the remaining elements, the total is still $\Theta(n^2)$, half of the full $n^2$ but with
the same shape.
⚠️ Common Pitfall: "Two nested loops means $O(n^2)$" is a guess, not a rule. If the inner loop runs a constant number of times, the pair is $O(n)$; if the inner bound depends on the outer index, you must do the sum (as above); and if the inner loop's size depends on a different variable $m$, the answer is $O(nm)$, not $O(n^2)$. Always identify what the inner loop's trip count actually is.
Halving loops: the logarithm appears
What about a loop that does not step by 1, but halves the range each time?
def binary_search(sorted_values, target):
"""Return an index of target in a sorted list, or -1 if absent."""
lo, hi = 0, len(sorted_values) - 1
while lo <= hi:
mid = (lo + hi) // 2
if sorted_values[mid] == target:
return mid
elif sorted_values[mid] < target:
lo = mid + 1
else:
hi = mid - 1
return -1
print(binary_search([2, 4, 6, 8, 10, 12], 10))
# Expected output:
# 4
Each iteration discards half the remaining range. Starting from $n$ elements, the sizes go $n, n/2, n/4, \dots, 1$, and the loop stops when the range hits size 1 (or empty). How many halvings does that take? We need the smallest $k$ with $n / 2^k \le 1$, i.e. $2^k \ge n$, i.e. $k \ge \log_2 n$. So the loop runs at most $\lceil \log_2 n \rceil + 1$ times, each pass doing constant work. Binary search is $O(\log n)$.
💡 Intuition: Whenever a quantity is repeatedly divided by a constant factor until it reaches a floor, the number of steps is logarithmic. "Repeated halving $\Rightarrow$ $\log$" is one of the most useful pattern-matches in algorithm analysis — it is why balanced trees, binary search, and divide-and-conquer all carry a $\log n$ factor. The base of the log does not matter asymptotically, because $\log_a n$ and $\log_b n$ differ only by the constant factor $\log_a b$ (proved in the exercises) — so we write $O(\log n)$ with no base.
🔗 Connection. The "divide the input in half and recurse" pattern produces a recurrence for the running time — for binary search, $T(n) = T(n/2) + O(1)$. We have analyzed it here by direct counting, but Chapter 19 gives the general machinery (the recursion-tree method and the Master Theorem) that solves $T(n) = aT(n/b) + f(n)$ in one stroke, which is how merge sort's $O(n \log n)$ falls out. This chapter is the prerequisite for that one.
A worked synthesis
Let us combine the rules. What is the running time of this?
def has_duplicate_slow(values): # n = len(values)
n = len(values)
for i in range(n): # outer: n iterations
for j in range(n): # inner: n iterations
if i != j and values[i] == values[j]:
return True
return False
The outer loop runs $n$ times; for each, the inner loop runs $n$ times doing $O(1)$ work, so the body executes up to $n \cdot n = n^2$ times: this is $O(n^2)$. (And it is $\Omega(n^2)$ in the worst case — when there is no duplicate, both loops run fully — so it is $\Theta(n^2)$ in the worst case.) Contrast with the idea of using a set: scan once, remembering what you have seen, and you get $O(n)$ on average. Same function, two algorithms, a quadratic gap in cost. That gap is the entire reason this chapter exists.
🔄 Check Your Understanding 1. A loop runs
for i in range(n)and the body doesprint(i)plus onefor j in range(10)inner loop. What is the asymptotic running time? 2. Give the running time of a loopwhile n > 1: n = n // 3in terms of the starting value $n$. 3. Two routines run one after another: the first is $\Theta(n \log n)$, the second is $\Theta(n^2)$. What is the total, and which sum-rule fact did you use?
Answers
- $\Theta(n)$. The inner loop runs a constant 10 times, so each outer pass is $O(1)$ (i.e., $O(10) = O(1)$), and $n$ passes give $\Theta(n)$ — not $\Theta(n^2)$. 2. $\Theta(\log n)$: $n$ is divided by 3 each pass, so it takes about $\log_3 n$ steps to reach 1, and $\log_3 n = \Theta(\log n)$.
- $\Theta(n^2)$, by the sum rule $f_1 + f_2 = \Theta(\max(g_1, g_2))$ — the quadratic term dominates the $n \log n$ term, which therefore drops out.
14.5 The growth-rate hierarchy and why it dominates
Every running time you will meet for years falls into a short list of growth classes. Knowing their order by heart is one of the highest-leverage facts in computer science, because it lets you compare two algorithms at a glance.
From slowest-growing (best) to fastest-growing (worst), the standard hierarchy is:
$$\Theta(1) \prec \Theta(\log n) \prec \Theta(\sqrt{n}) \prec \Theta(n) \prec \Theta(n \log n) \prec \Theta(n^2) \prec \Theta(n^3) \prec \Theta(2^n) \prec \Theta(n!)$$
where $f \prec g$ means $f(n) = O(g(n))$ but $g(n) \ne O(f(n))$ — $g$ genuinely outgrows $f$. Each class has a name and a characteristic feel:
| Class | Name | Feels like | Example algorithm |
|---|---|---|---|
| $\Theta(1)$ | constant | instant, input-independent | array index, hash lookup (avg) |
| $\Theta(\log n)$ | logarithmic | scales to billions effortlessly | binary search |
| $\Theta(n)$ | linear | proportional; the gold standard for "must read all input" | finding a max, summing a list |
| $\Theta(n \log n)$ | linearithmic | the best general comparison sort | merge sort, heapsort |
| $\Theta(n^2)$ | quadratic | painful past ~$10^4$ | bubble sort, all-pairs of a list |
| $\Theta(n^3)$ | cubic | painful past ~$10^3$ | naive matrix multiply |
| $\Theta(2^n)$ | exponential | hopeless past ~40 | all subsets, naive Fibonacci |
| $\Theta(n!)$ | factorial | hopeless past ~12 | all permutations, brute-force TSP |
The right column splits into two camps with very different fates as $n$ grows, and the distinction is so important it has its own vocabulary.
Definition (polynomial vs. exponential time). An algorithm runs in polynomial time if its running time is $O(n^k)$ for some constant $k$ (so $\Theta(1)$ through $\Theta(n^3)$ and beyond, as long as the exponent is fixed). It runs in exponential time if its running time is $\Omega(c^n)$ for some constant $c > 1$ (so $\Theta(2^n)$, $\Theta(n!)$ — note $n! \ge 2^{n-1}$, which you will prove — grow exponentially or faster).
This is the dividing line in the theory of computation. Polynomial-time algorithms are considered tractable ("feasible"); exponential-time algorithms are intractable for all but tiny inputs. The reason is brutal arithmetic. Suppose each operation takes one nanosecond:
| $n$ | $n^2$ | $n^3$ | $2^n$ | $n!$ |
|---|---|---|---|---|
| 10 | 100 ns | 1 µs | ~1 µs | ~3.6 ms |
| 20 | 400 ns | 8 µs | ~1 ms | ~77 years |
| 50 | 2.5 µs | 125 µs | ~13 days | beyond cosmic scales |
| 100 | 10 µs | 1 ms | ~$4 \times 10^{13}$ years | — |
Look at the $2^n$ column: at $n = 50$ an exponential algorithm already needs days, and at $n = 100$ it needs vastly longer than the age of the universe. No hardware improvement rescues this. If your computer gets a thousand times faster, an $O(2^n)$ algorithm can handle an input about ten larger (since $2^{10} \approx 1000$) — essentially no help. A polynomial algorithm, by contrast, gains a constant factor on the input size. This is why "is there a polynomial-time algorithm?" is the question that organizes the entire field, and the question Chapter 37 is built around.
💡 Intuition: The hierarchy is about growth, not value at small $n$. For tiny inputs an $O(2^n)$ algorithm can beat an $O(n)$ one (a hidden constant might make the linear one slower at $n = 3$). Asymptotics ignores that on purpose: it answers "who wins eventually", because "eventually" is where scaling problems live. An algorithm that is faster only on inputs you will never see is not faster in any way that matters.
Two facts make the hierarchy easy to use without re-deriving it:
- Polynomials beat logs; exponentials beat polynomials. For any $k$ and any $\varepsilon > 0$: $\log n = O(n^\varepsilon)$ and $n^k = O(2^n)$. (Logs grow slower than any positive power of $n$; every polynomial is eventually crushed by any exponential.)
- Within polynomials, the exponent decides; within exponentials, the base decides. $n^2 \prec n^3$, and $2^n \prec 3^n$.
⚠️ Common Pitfall: $2^n$ and $n^2$ look similar and are routinely confused — they could not be more different. $n^2$ is polynomial (tractable); $2^n$ is exponential (intractable). At $n = 30$, $n^2 = 900$ but $2^n \approx 10^9$ — a factor of a million. Read the variable's position: in the exponent means exponential.
🔄 Check Your Understanding 1. Order these from slowest- to fastest-growing: $n^2$, $\log n$, $2^n$, $n \log n$, $n$, $n!$. 2. Your boss says "the new server is 100× faster, so our $O(2^n)$ planner can handle much bigger inputs now." Gently correct this. 3. Is an $O(n^{100})$ algorithm polynomial or exponential? Is it "fast"? Are those the same question?
Answers
- $\log n \prec n \prec n \log n \prec n^2 \prec 2^n \prec n!$. 2. A 100× speedup lets an exponential algorithm handle an input only about $\log_2 100 \approx 6.6$ larger — i.e. roughly 6 more elements, essentially nothing. Exponential growth defeats constant-factor hardware gains; you need a better algorithm, not a faster machine. 3. It is polynomial (the exponent 100 is a constant), but it is certainly not "fast" in practice. "Polynomial" and "fast" are not the same question — polynomial is the theoretical tractability line; real-world speed also depends on the exponent and the constants. In practice, almost all naturally occurring polynomial algorithms have small exponents, which is what makes the line useful.
14.6 Best, worst, and average case — and a first lower bound
For some algorithms the cost depends not just on the input size but on the input's content. Linear search is the classic example: looking for a value in an unsorted list, you might get lucky and find it first, or unlucky and scan the whole list.
def linear_search(values, target):
"""Return the index of target, or -1 if absent."""
for i in range(len(values)):
if values[i] == target:
return i
return -1
print(linear_search([5, 3, 8, 1], 8))
# Expected output:
# 2
On a list of length $n$:
- Best case: the target is at index 0 — one comparison. $\Theta(1)$.
- Worst case: the target is last, or absent — all $n$ comparisons. $\Theta(n)$.
- Average case: assuming the target is present and equally likely to be at any position, the expected number of comparisons is $\frac{1}{n}\sum_{i=1}^{n} i = \frac{n+1}{2}$, which is $\Theta(n)$.
These are three different functions of $n$, and they answer three different questions, so naming which one you mean is not optional.
Definition (best, worst, average case). For inputs of size $n$, let $T(x)$ be the cost on a specific input $x$.
- The worst-case running time is $\max_{|x| = n} T(x)$ — the most work over all inputs of size $n$. It is a guarantee: the algorithm never does worse.
- The best-case running time is $\min_{|x| = n} T(x)$ — the least work over inputs of size $n$. (Usually the least useful measure.)
- The average-case running time is the expected value of $T(x)$ over a stated probability distribution on size-$n$ inputs. It depends entirely on which distribution you assume.
💡 Intuition: Worst case is the default in computer science because it is a promise you can keep. "Average case is fine" is cold comfort to the one user who hits the pathological input — and attackers deliberately construct worst-case inputs (this is exactly how hash-collision denial-of-service attacks work, forcing an average-$O(1)$ hash table into its $O(n)$ worst case). When unqualified, "the complexity of this algorithm" means worst case.
⚠️ Common Pitfall: Best/worst/average case and Big-O/Ω/Θ are independent axes, and conflating them is one of the most common errors. "Worst case" is a question (which input?); "$O$" is an answer shape (upper bound). You can sensibly say "the worst-case time is $O(n)$" or "the best-case time is $\Omega(1)$." Saying "$O$ means worst case" is a category error — you can put an upper bound, a lower bound, or a tight bound on any of the three cases.
A first lower bound
Everything so far has bounded specific algorithms. The deeper question is whether a better algorithm could exist — a bound on the problem itself. This is where Big-Omega earns its keep, and where the reasoning flips: instead of analyzing one algorithm, we argue about all possible algorithms at once.
Here is the cleanest example.
Claim. Any algorithm that finds the maximum of an unsorted list of $n$ distinct numbers, using only comparisons between elements, must make at least $n - 1$ comparisons in the worst case. That is, the problem is $\Omega(n)$.
The strategy first. We use an adversary-style counting argument. Think of each element as a contestant; the answer (the maximum) is the unique contestant who never loses. For an element to be ruled out as the maximum, it must lose at least one comparison — and a single comparison produces exactly one loser. So we count losers: we need everyone except the champion to have lost at least once, and each comparison creates at most one new loser.
Proof. Consider any comparison-based algorithm that correctly identifies the maximum, and run it on an input of $n$ distinct numbers. Call an element a loser once it has been on the smaller side of some comparison the algorithm performed. The algorithm can only declare an element the maximum if every other element has been shown to be smaller than something — for if some element $y \ne \text{answer}$ had never lost a comparison, the algorithm could not distinguish the actual input from one in which $y$ is the true maximum, and would be wrong on one of them. Hence at the end, all $n - 1$ non-maximum elements must be losers.
Now count. Each comparison compares two elements and (since the numbers are distinct) produces exactly one loser. Therefore each comparison can turn at most one element into a loser for the first time. To produce $n - 1$ distinct first-time losers requires at least $n - 1$ comparisons. Thus the algorithm makes at least $n - 1$ comparisons in the worst case, so finding the maximum is $\Omega(n)$. $\blacksquare$
Since maximum from section 14.1 uses exactly $n - 1$ comparisons, it is $O(n)$, and the lower bound
is $\Omega(n)$; the two meet, so the problem of finding a maximum is $\Theta(n)$ — and maximum is
optimal. You cannot do asymptotically better. That is a far stronger statement than "my algorithm
is fast": it is "no algorithm is faster."
🔗 Connection. This style of argument — proving a task requires a certain amount of work, no matter how clever you are — is the seed of complexity theory. In Chapter 37 the same instinct, scaled up, becomes the theory of P, NP, and NP-completeness, where the deepest open question in computer science (does P = NP?) is precisely a question about whether certain problems have lower bounds that rule out fast algorithms. Lower bounds are how we prove that a problem is genuinely hard, not just that we have not been clever enough yet.
🔄 Check Your Understanding 1. For binary search on a sorted list of $n$ elements, what are the best- and worst-case running times? 2. Why is worst-case analysis usually preferred to average-case in systems that must be reliable? 3. The max-finding lower bound counts "losers." Why is it essential that the numbers are distinct?
Answers
- Best case $\Theta(1)$ (the target is the first midpoint checked); worst case $\Theta(\log n)$ (the target is absent or found only after fully halving down to one element). 2. Worst case is a guarantee the system can never violate; average case can be defeated by an unlucky or adversarial input, which is unacceptable when reliability or security is at stake. 3. With ties, a single comparison
a < bcould be false because $a = b$, which does not cleanly produce one loser; distinctness guarantees every comparison has a definite loser, so "$\ge n-1$ comparisons" follows from "$\ge n-1$ losers." The bound still holds with ties, but the clean counting argument needs distinctness.
Project Checkpoint: an empirical complexity estimator
The recurring fourth theme of this book is that computation and proof are complementary: a proof settles the asymptotic growth for all $n$, while running the code on a few sizes gives you empirical evidence you can sanity-check the proof against. This chapter is the natural place to add a tool that does the empirical half — it counts operations as the input grows and reports the ratio of successive counts, which reveals the growth class without any timing at all.
The trick: if $T(n) = \Theta(n^k)$, then doubling $n$ multiplies the work by about $2^k$. So a routine that is linear shows a ratio near 2 when you double the input; quadratic shows a ratio near 4; cubic near 8. Watching the ratio settle is a cheap, machine-independent fingerprint of the growth class.
We will add this as a small standalone helper. Create dmtoolkit/complexity.py:
def count_ops(make_input, run, sizes):
"""For each n in sizes, build an input and count operations 'run' reports.
'run(data)' must return an integer operation count (instrument it yourself).
Returns a list of (n, ops, ratio_to_previous)."""
results = []
prev = None
for n in sizes:
ops = run(make_input(n))
ratio = None if prev is None else ops / prev
results.append((n, ops, ratio))
prev = ops
return results
def doubling_ratios(make_input, run, start=100, doublings=4):
"""Convenience: run count_ops on start, 2*start, 4*start, ... and
return just the ratios. A ratio near 2^k suggests Theta(n^k)."""
sizes = [start * (2 ** i) for i in range(doublings + 1)]
return [r for (_, _, r) in count_ops(make_input, run, sizes) if r is not None]
# A run() that counts the comparisons made by a quadratic pairwise scan:
def quadratic_run(data):
n = len(data); ops = 0
for i in range(n):
for j in range(n):
ops += 1 # count each inner step
return ops
print(doubling_ratios(lambda n: list(range(n)), quadratic_run, start=10, doublings=3))
# Expected output:
# [4.0, 4.0, 4.0]
Trace it by hand: quadratic_run on size $n$ returns exactly $n^2$. The sizes are
$10, 20, 40, 80$, giving op counts $100, 400, 1600, 6400$. The successive ratios are
$400/100 = 4.0$, $1600/400 = 4.0$, $6400/1600 = 4.0$. A ratio pinned at $4.0$ when you double the input
is the empirical signature of $\Theta(n^2)$ — exactly the growth class we proved for count_pairs in
section 14.4. Run it against a linear routine and the ratios approach $2.0$; against binary search they
approach $1$ (the work grows by only a constant each doubling, since $\log(2n) = \log n + 1$).
Toolkit so far:
logic.py(Chapters 1–3),recurrences.py(Chapter 6, growing in 18–19), and now a smallcomplexity.py. This estimator is your bridge between theory and measurement: when you prove a bound later in the book — the $O(n \log n)$ of merge sort (Chapter 19), the $O(V + E)$ of graph traversal (Chapter 28) — you can point this tool at your implementation and watch the ratios confirm it. When the measured shape and the proved shape disagree, you have found a bug in one of them, which is precisely the cross-check theme four promises.
Summary
Asymptotic analysis measures how an algorithm's cost grows with input size, discarding machine-specific constants and small-input behavior to keep only the durable shape of the growth.
The three notations (let $f, g : \mathbb{N} \to \mathbb{R}_{\ge 0}$):
| Notation | Meaning | Definition | Role |
|---|---|---|---|
| $f = O(g)$ | upper bound ("no faster than") | $\exists c, n_0:\ f(n) \le c\,g(n)$ for $n \ge n_0$ | bounding an algorithm above |
| $f = \Omega(g)$ | lower bound ("no slower than") | $\exists c, n_0:\ f(n) \ge c\,g(n)$ for $n \ge n_0$ | bounding a problem below |
| $f = \Theta(g)$ | tight bound ("exactly like") | $f = O(g)$ and $f = \Omega(g)$ | the precise growth class |
Key results and rules:
- To prove a Big-O/Ω claim, exhibit witnesses $c$ and $n_0$ and verify the inequality for all $n \ge n_0$. Witnesses are never unique; one valid pair suffices.
- Leading term dominates: a degree-$d$ polynomial is $\Theta(n^d)$; drop constants and lower-order terms.
- Sum rule: $f_1 + f_2 = \Theta(\max(g_1, g_2))$ — sequential phases are dominated by the costliest.
- Loop analysis = summation: a single $O(1)$-body loop of $n$ passes is $\Theta(n)$; nested loops give a (possibly index-dependent) double sum; repeated halving gives $\Theta(\log n)$ (base irrelevant).
The growth hierarchy (slowest- to fastest-growing): $$\Theta(1) \prec \Theta(\log n) \prec \Theta(n) \prec \Theta(n \log n) \prec \Theta(n^2) \prec \Theta(n^3) \prec \Theta(2^n) \prec \Theta(n!)$$
- Polynomial time ($O(n^k)$, fixed $k$) = tractable; exponential time ($\Omega(c^n)$, $c>1$) = intractable. The gap is not closed by faster hardware — only by a better algorithm.
- Logs grow slower than any positive power of $n$; every polynomial is eventually beaten by any exponential.
Cases vs. bounds (independent axes):
- Worst case = max over inputs of size $n$ (a guarantee — the default); best case = min; average case = expected value over a stated distribution.
- Any case can carry any bound: "worst-case time is $O(n)$" is well-formed; "$O$ = worst case" is a category error.
- A lower bound on a problem (e.g. finding a max needs $\ge n - 1$ comparisons, so it is $\Omega(n)$) proves no algorithm can do better; matching it with an algorithm proves the algorithm optimal.
Spaced Review
Retrieval keeps knowledge available. Answer from memory before checking.
- (Ch. 12) A relation $R$ on a set is reflexive, symmetric, and transitive when it is an equivalence relation. Argue that "$f$ and $g$ have the same growth rate," defined as $f = \Theta(g)$, is an equivalence relation on functions $\mathbb{N} \to \mathbb{R}_{>0}$. Which of the three properties corresponds to the symmetry of $\Theta$ we noted in 14.3?
- (Ch. 12) The relation $f \preceq g$ defined as "$f = O(g)$" is reflexive and transitive. Which property of an equivalence relation does it fail, and what kind of order-like structure does that make it closer to?
- (Ch. 9) An algorithm computes a function. Explain, using injective/surjective language, why two different algorithms (merge sort and quicksort) can compute the same function while having different running times.
- (Ch. 9) Binary search's step count is governed by $\lceil \log_2 n \rceil$. Which function from Chapter 9 is $\lceil \cdot \rceil$, and why must the count of iterations be an integer even though $\log_2 n$ generally is not?
Answers
- Reflexive: $f = \Theta(f)$ (take $c_1 = c_2 = 1$). Symmetric: if $f = \Theta(g)$ then $g = \Theta(f)$ (swap the constants — this is the symmetry noted in 14.3). Transitive: if $f = \Theta(g)$ and $g = \Theta(h)$, composing the constant bounds gives $f = \Theta(h)$. All three hold, so it is an equivalence relation; symmetry is the property highlighted in 14.3. 2. "$O$" fails symmetry ($n = O(n^2)$ but $n^2 \ne O(n)$); being reflexive and transitive (but not symmetric) makes it a preorder — an order-like structure (it behaves like $\le$ on growth rates, except ties are whole $\Theta$-classes rather than single functions). 3. Sorting is one function — it maps each input list to its unique sorted permutation (the same output for the same input). Merge sort and quicksort are two algorithms (two "rules") computing that single function; the function is fixed, the procedures differ, so they may have different costs. 4. $\lceil \cdot \rceil$ is the ceiling function (Chapter 9), mapping a real to the least integer $\ge$ it. The iteration count must be an integer because you cannot perform a fractional iteration; $\log_2 n$ tells you the ideal number of halvings, and ceiling rounds up to the actual whole-number count.
What's Next
You now have the language to describe how fast an algorithm is and to analyze loops by turning them into sums. But the most powerful algorithms are recursive — they call themselves on smaller inputs — and their running times obey recurrences like $T(n) = 2\,T(n/2) + n$ rather than simple sums. Solving such recurrences in closed form is what lets us prove that merge sort runs in $\Theta(n \log n)$ and compute Fibonacci in $O(\log n)$ time. Chapter 19 develops exactly this: the recursion-tree method and the Master Theorem, the general machinery for solving the divide-and-conquer recurrences that this chapter's binary-search analysis only hinted at. The asymptotic notation you mastered here is the language those solutions are written in.