Case Study: Auditing the Cost of a Nested-Loop Routine

DataField.Dev

Case Study: Auditing the Cost of a Nested-Loop Routine

"Premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%." — Donald E. Knuth

Executive Summary

A teammate's batch job has started timing out, and the suspect is a single function full of nested loops. In this analysis-focused case study you will not rewrite the function — you will derive its exact step count as a formula in the input size $n$, by turning each loop into a summation and collapsing those sums in closed form. The payoff is the difference between "it feels slow" and "it does exactly $\frac{n^3 - n}{6}$ comparisons; here is the arithmetic, and here is why doubling $n$ octuples the time." This is the core move of §11.6, applied to code you will actually be asked to reason about in a performance review. Along the way you will see how a count (Chapter 16's territory) and a summation arrive at the same closed form, giving you two independent checks on your answer.

Skills applied

Translating a nested loop into a (possibly nested) summation (§11.2, §11.6).
Shifting the index of a sum to turn an unfamiliar sum into a known one (§11.2).
Applying the closed forms for $\sum i$ and $\sum i^2$ (§11.3, §11.4).
Reading off the growth rate from the leading term, previewing Big-$\Theta$ (Chapter 14).
Cross-checking a summation against a combinatorial count.

Background

The scenario

The function below is part of a data-cleaning pipeline. It scans a list of $n$ readings and, for every ordered triple of positions $(i, j, k)$ with $i < j < k$, runs a cheap constant-time check (here, just counting). The team uses it to flag suspicious patterns among triples of readings.

def count_triples(readings):
    n = len(readings)
    checks = 0
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                checks += 1          # one constant-time check per triple
    return checks

print(count_triples([10, 20, 30, 40]))   # n = 4
# Expected output:
# 4

For $n = 4$ readings the function reports $4$ checks. That seems small — but the job runs on lists of tens of thousands of readings, and the wall-clock time has been climbing alarmingly as the data grows. Before anyone proposes a fix, the right first question is: exactly how many times does checks += 1 run, as a function of $n$? Only with that formula in hand can the team predict the cost at scale and decide whether a rewrite is even warranted.

💡 Intuition: Each execution of checks += 1 corresponds to one choice of three distinct positions $i < j < k$ out of $n$. So the answer "ought to be" the number of ways to choose $3$ things from $n$ — the combination $\binom{n}{3}$ we will study formally in Chapter 16. Hold that thought: we will derive the same number by summation, and the agreement of the two routes is our correctness check.

Why this matters

"It's slow" is a feeling; "$T(n) = \frac{n(n-1)(n-2)}{6}$, which is $\Theta(n^3)$" is an engineering fact you can act on. With the closed form you can answer the questions a performance review actually asks: How long at $n = 10{,}000$? What happens when the data doubles? Is the bottleneck this function or something else? A cubic running time means doubling $n$ multiplies the work by roughly $2^3 = 8$ — which is almost certainly the source of the climbing wall-clock time. Deriving that takes ten minutes of summation algebra and saves a week of guessing.

Phase 1: One loop at a time — the innermost sum

Work from the inside out, exactly as the loops nest. Fix $i$ and $j$ for the moment. The innermost loop is

for k in range(j + 1, n):
    checks += 1

It runs for $k = j+1, j+2, \dots, n-1$, which is $n - 1 - j$ iterations (count them: from $j+1$ up to $n-1$ inclusive is $(n-1) - (j+1) + 1 = n - 1 - j$ values). So the innermost work, as a function of $j$, is

$$\text{inner}(j) = \sum_{k=j+1}^{n-1} 1 = n - 1 - j.$$

🔄 Check Your Understanding For $n = 4$, how many times does the innermost loop run when $j = 1$? When $j = 2$? When $j = 3$?

Answer

$\text{inner}(1) = 4 - 1 - 1 = 2$; $\text{inner}(2) = 4 - 1 - 2 = 1$; $\text{inner}(3) = 4 - 1 - 3 = 0$. (When $j = 3$ the range `range(4, 4)` is empty — an empty sum, value $0$, exactly the convention from §11.2.)

Phase 2: The middle sum

Now fix only $i$ and sum the innermost work over the middle loop, which runs for $j = i+1, \dots, n-1$:

$$\text{middle}(i) = \sum_{j=i+1}^{n-1} \text{inner}(j) = \sum_{j=i+1}^{n-1} (n - 1 - j).$$

This is not a sum we recognize on sight, so we use the standard move: shift the index to turn it into a plain sum of consecutive integers (§11.2).

The strategy first. Substitute $m = n - 1 - j$. As $j$ runs from $i+1$ up to $n-1$, the new variable $m$ runs from $n - 1 - (i+1) = n - i - 2$ down to $n - 1 - (n-1) = 0$, hitting every integer in between. The summand $n - 1 - j$ becomes simply $m$. So the sum is $\sum_{m=0}^{n-i-2} m$ — a triangular sum in disguise.

Carrying it out with the arithmetic-series closed form $\sum_{m=0}^{N} m = \frac{N(N+1)}{2}$, here with $N = n - i - 2$:

$$\text{middle}(i) = \sum_{j=i+1}^{n-1} (n-1-j) = \sum_{m=0}^{\,n-i-2} m = \frac{(n-i-2)(n-i-1)}{2}.$$

🐛 Find the Error. A teammate skips the index shift and writes $\text{middle}(i) = \sum_{j=i+1}^{n-1}(n-1-j) = (n-1)\cdot(\text{number of terms}) - \sum_{j=i+1}^{n-1} j$, then claims the result is $(n-1)(n-1-i) - \frac{(n-1)n}{2}$. Is the method sound, and is the answer right?

Answer

The method is sound — splitting $n-1-j$ into a constant part and a $-j$ part by linearity is fine. The answer is wrong because the second sum was taken over the wrong range: $\sum_{j=i+1}^{n-1} j$ runs from $i+1$, not from $1$, so it is not $\frac{(n-1)n}{2}$. The correct value is $\sum_{j=i+1}^{n-1} j = \frac{(n-1)n}{2} - \frac{i(i+1)}{2}$ (subtract the missing head $1 + \cdots + i$). Substituting that gives $(n-1)(n-1-i) - \left[\frac{(n-1)n}{2} - \frac{i(i+1)}{2}\right]$, which — after algebra — equals $\frac{(n-i-2)(n-i-1)}{2}$, matching the index-shift answer. Both routes work; the index shift just has fewer places to slip.

Phase 3: The outer sum and the closed form

Finally, sum the middle work over the outer loop, $i = 0, 1, \dots, n-1$:

$$T(n) = \sum_{i=0}^{n-1} \text{middle}(i) = \sum_{i=0}^{n-1} \frac{(n-i-2)(n-i-1)}{2}.$$

Once more the summand is awkward in $i$, and once more we shift to clean it up. Let $p = n - 1 - i$; as $i$ runs $0 \to n-1$, $p$ runs $n-1 \to 0$. Then $n - i - 1 = p$ and $n - i - 2 = p - 1$, so the summand $\frac{(n-i-2)(n-i-1)}{2}$ becomes $\frac{(p-1)p}{2}$:

$$T(n) = \sum_{p=0}^{n-1} \frac{(p-1)p}{2} = \frac{1}{2}\sum_{p=0}^{n-1} (p^2 - p) = \frac{1}{2}\left(\sum_{p=0}^{n-1} p^2 - \sum_{p=0}^{n-1} p\right).$$

Now apply the two closed forms from the chapter's table, both summed to $n-1$:

$$\sum_{p=0}^{n-1} p = \frac{(n-1)n}{2}, \qquad \sum_{p=0}^{n-1} p^2 = \frac{(n-1)n(2n-1)}{6}.$$

Substitute. A tempting next move is to "pull out the common factor $\frac{(n-1)n}{2}$" — let us try it, and watch carefully (this step has a trap):

$$ T(n) = \frac{1}{2}\left(\frac{(n-1)n(2n-1)}{6} - \frac{(n-1)n}{2}\right) \overset{?}{=} \frac{(n-1)n}{2}\cdot\frac{1}{2}\left(\frac{2n-1}{6} - \frac{1}{2}\right). $$

Combine the inner fractions over denominator $6$: $\frac{2n-1}{6} - \frac{3}{6} = \frac{2n-4}{6} = \frac{n-2}{3}$. Therefore

$$T(n) \overset{?}{=} \frac{(n-1)n}{2}\cdot\frac{1}{2}\cdot\frac{n-2}{3} = \frac{n(n-1)(n-2)}{2 \cdot 2 \cdot 3} = \frac{n(n-1)(n-2)}{12}.$$

Stop and check this against the combinatorial prediction $\binom{n}{3} = \frac{n(n-1)(n-2)}{6}$ — and against the code, which printed $4$ for $n = 4$, whereas $\frac{4 \cdot 3 \cdot 2}{12} = 2$. The two disagree, so the line above is wrong: there is one too many factors of $2$ in the denominator. The culprit is the factoring step that produced line $T(n) = \frac{(n-1)n}{2}\cdot\frac{1}{2}(\dots)$: we pulled the leading $\frac{1}{2}$ out front and also left a $\frac{1}{2}$ inside the parentheses, counting it twice. Redo the simplification cleanly, without that risky factoring, straight from the substituted form:

$$ T(n) = \frac{1}{2}\left(\frac{(n-1)n(2n-1)}{6} - \frac{(n-1)n}{2}\right). $$

Put both terms over $6$: $\frac{(n-1)n}{2} = \frac{3(n-1)n}{6}$, so the bracket is $\frac{(n-1)n(2n-1) - 3(n-1)n}{6} = \frac{(n-1)n\big[(2n-1) - 3\big]}{6} = \frac{(n-1)n(2n-4)}{6} = \frac{2(n-1)n(n-2)}{6} = \frac{(n-1)n(n-2)}{3}.$ Then

$$T(n) = \frac{1}{2}\cdot\frac{(n-1)n(n-2)}{3} = \frac{n(n-1)(n-2)}{6}.$$

That matches the combinatorial prediction exactly. The single factor of $2$ from $2n - 4 = 2(n-2)$ cancels the leading $\frac{1}{2}$, and there is no stray doubling.

⚠️ Common Pitfall: double-counting a constant during factoring. When you "factor out" a constant from a difference, make sure you remove it from every term and do not also leave a copy behind. The aborted computation above pulled $\frac12$ out front and kept a $\frac12$ inside — a doubled constant that an order-of-growth check would hide (both $\frac{n^3}{12}$ and $\frac{n^3}{6}$ are $\Theta(n^3)$) but an exact-count check catches instantly. This is exactly why §11.6 insists on deriving the count exactly, then reading off the growth rate — not the other way around.

Phase 4: Verify, then read off the growth rate

We have two independent derivations of the same closed form, which is the strongest kind of confidence:

By summation (Phases 1–3): $T(n) = \dfrac{n(n-1)(n-2)}{6}$.
By counting (the intuition callout): the loop runs once per $3$-element subset $\{i,j,k\}$, and the number of such subsets is $\binom{n}{3} = \dfrac{n(n-1)(n-2)}{6}$ (Chapter 16).

Now spot-check against the code's own output. For $n = 4$: $\frac{4 \cdot 3 \cdot 2}{6} = \frac{24}{6} = 4$, and `count_triples` printed $4$. ✓ For $n = 5$: $\frac{5 \cdot 4 \cdot 3}{6} = 10$.

# Cross-check the closed form against the loop for several n (hand-traced):
# n=3: 3*2*1/6 = 1   (the single triple (0,1,2))
# n=4: 4*3*2/6 = 4
# n=5: 5*4*3/6 = 10
# n=6: 6*5*4/6 = 20
for n in [3, 4, 5, 6]:
    print(n, count_triples(list(range(n))))
# Expected output:
# 3 1
# 4 4
# 5 10
# 6 20

Finally, the engineering question. Expand the leading behavior: $T(n) = \frac{n^3 - 3n^2 + 2n}{6}$. The dominant term is $\frac{n^3}{6}$; everything else is lower order. So $T(n) = \Theta(n^3)$ — cubic. That single fact explains the team's symptom: when the input doubled from $n$ to $2n$, the work grew by a factor of $\frac{(2n)^3}{n^3} = 8$. A job that took a minute on one batch will take eight minutes on a batch twice the size, and over an hour on a batch four times the size. The closed form did not just describe the cost — it predicted the trend the team was seeing on the clock.

🔗 Connection. The step "discard lower-order terms and constants, keep $\frac{n^3}{6} \to \Theta(n^3)$" is exactly the abstraction **[Chapter 14](../chapter-14-algorithms-complexity/index.md)** formalizes with Big-$\Theta$. And the fact that a triple-nested "for each later element" loop is $\Theta(n^3)$ generalizes the §11.6 result that a double such loop is $\Theta(n^2)$: each extra nested level over "later elements" raises the exponent by one, because each adds one more summation over a triangular range. You now have the tool to predict the exponent of any such nest without running it.

Discussion Questions

The derivation worked from the innermost loop outward, shifting the index at each level to reach a known sum. Why is "innermost first" the natural order? What goes wrong if you try to write the triple sum and attack the outer index first?
We checked the summation against a combinatorial count ($\binom{n}{3}$). For the double-loop version (for i: for j in range(i+1, n)), what combination is the analogous check, and what is its closed form? Confirm it matches the $\frac{n(n-1)}{2}$ from §11.6.
Suppose the innermost check were not constant time but cost $\log_2 n$ each. How would $T(n)$ change, and what would the new growth rate be? (You may pull $\log_2 n$ out of the sums by linearity, since it does not depend on $i, j, k$.)
The aborted algebra in Phase 3 produced $\frac{n^3}{12}$ instead of $\frac{n^3}{6}$ — a factor-of-two error invisible to a Big-$\Theta$ check. Give one situation where the exact constant (not just the growth rate) actually matters to a decision, and one where it genuinely does not.

Your Turn: Extensions

Option A (one level deeper). Add a fourth nested loop for l in range(k+1, n). Predict the closed form before deriving it (use the combinatorial shortcut), then derive it by summation and confirm. What is the growth rate, and by what factor does doubling $n$ multiply the work?
Option B (a different shape). Analyze for i in range(n): for j in range(n): for k in range(n) (all three loops full, no $i < j < k$ restriction). Write it as a triple sum, evaluate it, and compare the exact count and the growth rate to the triangular version. Why is the constant so different even though both are $\Theta(n^3)$?
Option C (instrument and confirm). Add a counter to count_triples and hand-trace it for $n = 3, 4, 5$, tabulating the count beside your closed form. Then use the Toolkit's summation to express the innermost and middle sums symbolically and confirm the per-level formulas of Phases 1–2.

Key Takeaways

A nested loop is a nested sum; evaluate it from the inside out. Each loop becomes one summation, and you collapse them one level at a time.
Shift the index to reach a known sum. Awkward bounds like $\sum_{j=i+1}^{n-1}(n-1-j)$ become plain triangular or square sums under a substitution — the same renaming you do to a loop variable.
Derive the exact count, then read off the growth rate. The leading term gives the $\Theta(\cdot)$; the lower-order terms and constants are what Chapter 14 will license you to drop.
Cross-check a summation against a count. When a summation and a combinatorial argument ($\binom{n}{k}$) land on the same closed form, a stray factor of two has nowhere to hide.