
Learning Objectives

  • Explain Big-O notation formally and informally
  • Identify the common complexity classes and recognize them in code
  • Analyze the time complexity of loops, nested loops, and recursive functions
  • Distinguish between time complexity and space complexity
  • Explain best-case, worst-case, and average-case analysis
  • Describe amortized analysis at an introductory level
  • Profile Pascal programs to measure actual execution time
  • Apply algorithmic optimization to improve program performance

Chapter 26: Complexity Analysis: How Fast Is Your Program and Can You Make It Faster?

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%." — Donald Knuth


Throughout Part IV, we have made claims like "binary search is O(log n)" and "merge sort is O(n log n)" and "naive Fibonacci is O(2^n)." We have compared algorithms by their complexity classes, argued that O(n log n) is better than O(n^2), and chosen data structures based on the time complexity of their operations. But what does all this actually mean? How do you determine an algorithm's complexity from its code? And when theory meets practice, how do you measure and improve your program's actual performance?

This chapter provides rigorous answers. We define Big-O notation properly, catalog the common complexity classes, and develop systematic techniques for analyzing loops and recursion. We distinguish between time and space complexity, explore best/worst/average cases, and introduce amortized analysis. Then we move from theory to practice: measuring execution time in Pascal, counting operations, and applying the most important optimization principle in all of computer science — improve the algorithm before you optimize the code.

By the end of this chapter, you will be able to look at any piece of code and determine its complexity class, predict how it will scale, and make informed decisions about when optimization matters and when it does not.

Prerequisites: Recursion (Chapter 22) and sorting/searching (Chapter 23) provide the algorithms we will analyze. Chapters 24 and 25 provide additional examples but are not strictly required.


26.1 Why Analyze Performance?

Consider two programs that solve the same problem — finding whether a value exists in a sorted array of a million elements.

  • Program A uses linear search: it checks elements one by one. On average, it examines 500,000 elements.
  • Program B uses binary search: it halves the search space at each step. It examines at most 20 elements.

Both programs give the correct answer. Both are written in clean, readable Pascal. But Program B is 25,000 times faster. For a single query, this might not matter. For a program that performs a million queries (a database, a search engine, a financial system), it is the difference between a response time of seconds and a response time of hours.

Performance analysis lets you:

  1. Predict how your program scales as input grows.
  2. Choose the right algorithm for your problem size.
  3. Identify bottlenecks before writing optimized code.
  4. Communicate about algorithms using a shared vocabulary (Big-O notation).

💡 Intuition: Complexity Is About Growth, Not Speed A program that takes 0.001 seconds for n = 100 might seem fast. But if it is O(n^2), multiplying n by 1,000 multiplies the time by 1,000,000: about 17 minutes for n = 100,000, and months for n = 10,000,000. An O(n log n) algorithm with the same per-operation cost would finish n = 10,000,000 in well under a minute. Complexity analysis is not about measuring speed — it is about predicting how speed changes as problems grow.


26.2 Big-O Notation (Formal and Informal)

Informal Definition

Big-O notation describes the upper bound on the growth rate of a function. When we say an algorithm is O(n^2), we mean that as the input size n grows, the running time grows at most proportionally to n squared. Constants and lower-order terms are ignored:

  • 3n^2 + 7n + 15 is O(n^2)
  • 100n is O(n)
  • 5 is O(1)

Formal Definition

A function f(n) is O(g(n)) if there exist positive constants c and n0 such that:

f(n) <= c * g(n) for all n >= n0

In words: beyond some threshold n0, f(n) is always at most a constant multiple of g(n).

Worked Example: Proving 3n^2 + 7n + 15 Is O(n^2)

We need to find c and n0 such that 3n^2 + 7n + 15 <= c * n^2 for all n >= n0.

For n >= 1: 7n <= 7n^2 and 15 <= 15n^2. So: 3n^2 + 7n + 15 <= 3n^2 + 7n^2 + 15n^2 = 25n^2.

Choose c = 25 and n0 = 1. Then 3n^2 + 7n + 15 <= 25n^2 for all n >= 1. Therefore, 3n^2 + 7n + 15 is O(n^2).
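As a sanity check, the chosen constants can be tested numerically (an illustrative snippet, not part of the formal proof):

program CheckBound;
var
  N: Integer;
  F, G: Int64;
begin
  for N := 1 to 1000 do begin
    F := 3 * Int64(N) * N + 7 * N + 15;   { f(n) = 3n^2 + 7n + 15 }
    G := 25 * Int64(N) * N;               { c * g(n) with c = 25 }
    if F > G then
      WriteLn('Bound fails at n = ', N);  { Never printed for n >= 1 }
  end;
  WriteLn('3n^2 + 7n + 15 <= 25n^2 held for all tested n');
end.

The loop only samples finitely many n, of course — the algebraic proof above is what establishes the bound for all n >= 1.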

Worked Example: Proving 100n Is O(n)

We need: 100n <= c * n. Choose c = 100 and n0 = 1. Then 100n <= 100n for all n >= 1. Done.

Worked Example: Proving n Is Not O(1)

Suppose n <= c for all n >= n0. This is false: for n = c + 1, we have c + 1 > c. No matter what constant c we choose, a sufficiently large n will exceed it. Therefore, n is not O(1).

Why We Drop Constants

Consider two algorithms:

  • Algorithm A: exactly 2n comparisons.
  • Algorithm B: exactly 100n comparisons.

Both are O(n). Algorithm A is 50 times faster, but as n grows, both remain linear. The constant factor (2 vs. 100) matters for practical performance, but it does not change the fundamental scalability. An O(n) algorithm with a large constant will still eventually beat an O(n^2) algorithm with a tiny constant, because linear growth always eventually outpaces quadratic growth.

Specifically, compare the 100n linear algorithm against a quadratic one performing n^2 operations: 100n < n^2 whenever n > 100. So for inputs larger than 100 elements, the "slow" linear algorithm beats the "fast" quadratic one.
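A few lines of Pascal find that crossover point directly (illustrative only):

program Crossover;
var
  N: Integer;
begin
  { Smallest n where the 100n linear algorithm beats the n^2 quadratic one }
  N := 1;
  while 100 * N >= N * N do
    Inc(N);
  WriteLn('Linear wins from n = ', N);   { 100n < n^2 once n > 100 }
end.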

  • O (Big-O): Upper bound. f(n) grows at most this fast.
  • Omega (Big-Omega): Lower bound. f(n) grows at least this fast.
  • Theta (Big-Theta): Tight bound. f(n) grows exactly this fast (up to constants).

In practice, we almost always use Big-O because we care most about worst-case upper bounds. When someone says "this algorithm is O(n log n)," they usually mean it is also Theta(n log n), but Big-O is the conventional shorthand.


26.3 Common Complexity Classes

Here are the complexity classes you will encounter most often, ordered from fastest to slowest:

O(1) — Constant Time

The running time does not depend on the input size. Examples:

  • Array access by index: Arr[I]
  • Stack push/pop (on an array-based stack)
  • Hash table lookup (average case)

function GetFirst(const Arr: array of Integer): Integer;
begin
  GetFirst := Arr[0];  { Always one operation, regardless of array size }
end;

O(log n) — Logarithmic Time

The running time grows logarithmically. Typically arises when the problem is halved at each step. Examples:

  • Binary search
  • BST search (balanced tree)
  • Fast exponentiation

function BinarySearch(const Arr: array of Integer;
                      N, Target: Integer): Integer;
var
  Lo, Hi, Mid: Integer;
begin
  Lo := 0; Hi := N - 1;
  while Lo <= Hi do begin       { Loop runs log2(n) times }
    Mid := (Lo + Hi) div 2;
    if Arr[Mid] = Target then begin BinarySearch := Mid; Exit; end
    else if Target < Arr[Mid] then Hi := Mid - 1
    else Lo := Mid + 1;
  end;
  BinarySearch := -1;
end;

O(n) — Linear Time

The running time is proportional to the input size. Examples:

  • Linear search
  • Array traversal
  • Linked list traversal

function Sum(const Arr: array of Integer; N: Integer): Integer;
var
  Total, I: Integer;
begin
  Total := 0;
  for I := 0 to N - 1 do       { Loop runs n times }
    Total := Total + Arr[I];     { O(1) per iteration }
  Sum := Total;
end;

O(n log n) — Linearithmic Time

Typical of efficient sorting algorithms. Examples:

  • Merge sort
  • Quicksort (average case)
  • Heapsort
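For reference, a compact merge sort sketch (Chapter 23 develops the full version; this formulation allocates a temporary buffer per call for brevity):

procedure MergeSort(var Arr: array of Integer; Lo, Hi: Integer);
var
  Mid, I, J, K: Integer;
  Tmp: array of Integer;
begin
  if Lo >= Hi then Exit;
  Mid := (Lo + Hi) div 2;
  MergeSort(Arr, Lo, Mid);        { T(n/2) }
  MergeSort(Arr, Mid + 1, Hi);    { T(n/2) }
  SetLength(Tmp, Hi - Lo + 1);    { O(n) merge work follows }
  I := Lo; J := Mid + 1; K := 0;
  while (I <= Mid) and (J <= Hi) do begin
    if Arr[I] <= Arr[J] then begin Tmp[K] := Arr[I]; Inc(I); end
    else begin Tmp[K] := Arr[J]; Inc(J); end;
    Inc(K);
  end;
  while I <= Mid do begin Tmp[K] := Arr[I]; Inc(I); Inc(K); end;
  while J <= Hi do begin Tmp[K] := Arr[J]; Inc(J); Inc(K); end;
  for K := 0 to Hi - Lo do
    Arr[Lo + K] := Tmp[K];
end;
{ Recurrence: T(n) = 2T(n/2) + O(n), which solves to O(n log n) }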

O(n^2) — Quadratic Time

Running time grows with the square of the input. Typically arises from nested loops. Examples:

  • Selection sort
  • Insertion sort (worst case)
  • Checking all pairs

procedure AllPairs(const Arr: array of Integer; N: Integer);
var
  I, J: Integer;
begin
  for I := 0 to N - 1 do           { Outer: n iterations }
    for J := I + 1 to N - 1 do     { Inner: ~n/2 iterations on average }
      { Process pair (I, J) };       { Total: ~n^2/2 operations = O(n^2) }
end;

O(n^3) — Cubic Time

Arises from triply nested loops. Examples:

  • Naive matrix multiplication
  • Floyd-Warshall all-pairs shortest paths

{ Naive matrix multiplication: C = A * B }
for I := 0 to N - 1 do
  for J := 0 to N - 1 do begin
    C[I][J] := 0;
    for K := 0 to N - 1 do
      C[I][J] := C[I][J] + A[I][K] * B[K][J];
  end;
{ Total: n * n * n = O(n^3) }

O(2^n) — Exponential Time

Running time doubles with each additional input element. Examples:

  • Naive recursive Fibonacci
  • Brute-force subset enumeration
  • Traveling salesman (brute force)

function Fib(N: Integer): Int64;
begin
  if N <= 1 then Fib := N
  else Fib := Fib(N-1) + Fib(N-2);  { Two recursive calls per level }
end;

O(n!) — Factorial Time

Running time grows as the factorial of n. Examples:

  • Generating all permutations
  • Brute-force traveling salesman
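A recursive permutation generator makes the n! growth concrete. This sketch counts permutations instead of printing them, to keep the output small:

var
  PermCount: Int64;

procedure Permute(var A: array of Integer; K, N: Integer);
var
  I, T: Integer;
begin
  if K = N then begin
    Inc(PermCount);                        { One complete permutation }
    Exit;
  end;
  for I := K to N - 1 do begin
    T := A[K]; A[K] := A[I]; A[I] := T;    { Choose element I for slot K }
    Permute(A, K + 1, N);
    T := A[K]; A[K] := A[I]; A[I] := T;    { Undo the swap (backtrack) }
  end;
end;
{ Calling Permute(A, 0, Length(A)) sets PermCount to N! —
  already 3,628,800 for N = 10 }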

📊 Growth Rate Comparison

          n   O(1)  O(log n)        O(n)  O(n log n)    O(n^2)   O(n^3)   O(2^n)
         10      1         3          10          33       100    1,000    1,024
        100      1         7         100         664    10,000     10^6    10^30
      1,000      1        10       1,000       9,966      10^6     10^9   10^301
     10,000      1        13      10,000     132,877      10^8    10^12      ...
  1,000,000      1        20   1,000,000  19,931,569     10^12    10^18      ...

The jump from O(n log n) to O(n^2) is dramatic. The jump from O(n^2) to O(2^n) is catastrophic. At n = 100, an O(n^2) algorithm performs 10,000 operations (fast), while an O(2^n) algorithm performs about 10^30 operations (impossible — at a billion operations per second, that would take vastly longer than the age of the universe).


26.4 Analyzing Loops and Recursion

Single Loops

A loop that iterates n times, doing O(1) work per iteration, is O(n):

Sum := 0;
for I := 1 to N do     { n iterations }
  Sum := Sum + I;       { O(1) per iteration }
{ Total: O(n) }

Nested Loops

Nested loops multiply their iteration counts:

for I := 1 to N do          { n iterations }
  for J := 1 to N do        { n iterations each }
    Process(I, J);           { O(1) }
{ Total: O(n^2) }

But not all nested loops are O(n^2). If the inner loop's bound depends on the outer:

for I := 1 to N do          { n iterations }
  for J := 1 to I do        { I iterations (1, 2, ..., n) }
    Process(I, J);
{ Total: 1 + 2 + ... + n = n(n+1)/2 = O(n^2) }

Still O(n^2), but with only about half as many operations.

Worked Analysis: 10+ Code Snippets

Let us analyze a series of increasingly complex code patterns.

Snippet 1: Sequential loops

for I := 1 to N do A(I);      { O(n) }
for J := 1 to N do B(J);      { O(n) }
{ Total: O(n) + O(n) = O(n) — sequential loops add, not multiply }

Snippet 2: Nested loop with constant inner bound

for I := 1 to N do
  for J := 1 to 100 do
    Process(I, J);
{ Total: n * 100 = O(n) — the inner loop is constant }

Snippet 3: Loop with diminishing inner iterations

for I := 1 to N do
  for J := I to N do
    Process(I, J);
{ Iterations: (n) + (n-1) + (n-2) + ... + 1 = n(n+1)/2 = O(n^2) }

Snippet 4: Halving loop

I := N;
while I > 0 do begin
  Process(I);
  I := I div 2;            { Halving: log2(n) iterations }
end;
{ Total: O(log n) }

Snippet 5: Doubling loop

I := 1;
while I < N do begin
  Process(I);
  I := I * 2;              { Doubling: log2(n) iterations }
end;
{ Total: O(log n) }

Snippet 6: Nested loop with halving

for I := 1 to N do begin         { n iterations }
  J := N;
  while J > 0 do begin           { log2(n) iterations }
    Process(I, J);
    J := J div 2;
  end;
end;
{ Total: n * log(n) = O(n log n) }

Snippet 7: Two loops, one nested

for I := 1 to N do               { O(n) }
  for J := 1 to N do             { O(n^2) total for this block }
    A(I, J);

for K := 1 to N do               { O(n) }
  B(K);
{ Total: O(n^2) + O(n) = O(n^2) — dominated by the larger term }

Snippet 8: Independent halving in nested position

I := N;
while I > 0 do begin             { log(n) iterations }
  J := N;
  while J > 0 do begin           { log(n) iterations }
    Process(I, J);
    J := J div 2;
  end;
  I := I div 2;
end;
{ Total: log(n) * log(n) = O(log^2 n) }

Snippet 9: Loop where work grows linearly

for I := 1 to N do
  for J := 1 to I * I do
    Process(I, J);
{ Iterations: 1 + 4 + 9 + 16 + ... + n^2 = sum of squares ≈ n^3/3 = O(n^3) }

Snippet 10: Loop calling a function

for I := 1 to N do
  BinarySearch(Arr, N, I);      { O(log n) per call }
{ Total: n * O(log n) = O(n log n) }

Snippet 11: Nested loop with early exit

for I := 0 to N - 2 do begin
  for J := I + 1 to N - 1 do begin   { Start at I + 1: never compare an element to itself }
    if Arr[I] = Arr[J] then begin
      WriteLn('Duplicate found');
      Exit;
    end;
  end;
end;
{ Worst case (no duplicates): O(n^2). Best case (first two elements equal): O(1). }

Analyzing Recursion

For recursive algorithms, we use recurrence relations. The time complexity T(n) is expressed in terms of T on smaller inputs:

Merge sort: T(n) = 2T(n/2) + O(n). Two recursive calls on half-sized inputs, plus O(n) merge work. Solution: T(n) = O(n log n).

Binary search: T(n) = T(n/2) + O(1). One recursive call on a half-sized input, plus O(1) comparison. Solution: T(n) = O(log n).

Naive Fibonacci: T(n) = T(n-1) + T(n-2) + O(1). Two recursive calls on slightly smaller inputs. Solution: T(n) = O(2^n) (exponential).

Factorial: T(n) = T(n-1) + O(1). One recursive call, reducing n by 1 each time. Solution: T(n) = O(n).
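The exponential Fibonacci recurrence can be confirmed empirically by counting calls — an illustrative instrumented version of the Fib function shown earlier:

var
  Calls: Int64;

function FibCounted(N: Integer): Int64;
begin
  Inc(Calls);                             { Count every invocation }
  if N <= 1 then FibCounted := N
  else FibCounted := FibCounted(N - 1) + FibCounted(N - 2);
end;

{ The call count grows by roughly the golden ratio (~1.618x) per unit of N:
  N = 20 -> 21,891 calls; N = 30 -> 2,692,537 calls }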

The Recursion Tree Method

For merge sort's T(n) = 2T(n/2) + n, we can draw a recursion tree:

Level 0:                    n                    (1 node, total work: n)
Level 1:              n/2       n/2              (2 nodes, total work: n)
Level 2:          n/4   n/4   n/4   n/4          (4 nodes, total work: n)
...
Level k:    n/2^k repeated 2^k times              (2^k nodes, total: n)
...
Level log(n):  1 1 1 1 ... 1 (n nodes)           (n nodes, total work: n)

Each level does O(n) work. There are log(n) + 1 levels. Total: O(n log n).

The Master Theorem (Simplified)

For recurrences of the form T(n) = aT(n/b) + O(n^c):

  • If c < log_b(a): T(n) = O(n^(log_b(a)))
  • If c = log_b(a): T(n) = O(n^c * log n)
  • If c > log_b(a): T(n) = O(n^c)

Merge sort: a=2, b=2, c=1. log_2(2) = 1 = c, so T(n) = O(n^1 * log n) = O(n log n). Confirmed.

Binary search: a=1, b=2, c=0. log_2(1) = 0 = c, so T(n) = O(n^0 * log n) = O(log n). Confirmed.

Strassen's matrix multiply: a=7, b=2, c=2. log_2(7) ≈ 2.81 > 2 = c, so T(n) = O(n^2.81). This is faster than naive O(n^3) matrix multiplication.

Example that does not fit the Master Theorem: T(n) = T(n-1) + O(n). This is not of the form aT(n/b). We solve it by expansion: T(n) = n + (n-1) + (n-2) + ... + 1 = n(n+1)/2 = O(n^2). This is the complexity of selection sort.

Practice: Analyze These Functions

Try to determine the complexity of each function before reading the answers.

Function 1:

function Mystery1(N: Integer): Integer;
var
  I, J, Count: Integer;
begin
  Count := 0;
  I := 1;
  while I < N do begin
    for J := 1 to N do
      Inc(Count);
    I := I * 2;
  end;
  Mystery1 := Count;
end;

Analysis: The outer while loop runs log(n) times (I doubles). The inner for loop runs n times. Total: O(n log n).

Function 2:

function Mystery2(N: Integer): Integer;
var
  I, J, K, Count: Integer;
begin
  Count := 0;
  for I := 1 to N do
    for J := 1 to N do
      for K := 1 to N do
        if (I + J + K) mod 2 = 0 then
          Inc(Count);
  Mystery2 := Count;
end;

Analysis: Three nested loops, each running n times. The if-condition does O(1) work per iteration. Total: O(n^3). The condition does not change the complexity because it executes every iteration — it just sometimes skips the increment.

Function 3:

function Mystery3(N: Integer): Integer;
begin
  if N <= 1 then
    Mystery3 := 1
  else
    Mystery3 := Mystery3(N div 3) + Mystery3(N div 3) + Mystery3(N div 3);
end;

Analysis: Recurrence: T(n) = 3T(n/3) + O(1). Master Theorem: a=3, b=3, c=0. log_3(3) = 1 > 0 = c. So T(n) = O(n^1) = O(n). Three recursive calls, each on n/3 elements, with constant combination work, gives linear time.

Function 4:

procedure Mystery4(var Arr: array of Integer; N: Integer);
var
  I, J, Temp: Integer;
  Swapped: Boolean;
begin
  repeat
    Swapped := False;
    for J := 0 to N - 2 do
      if Arr[J] > Arr[J + 1] then begin
        Temp := Arr[J]; Arr[J] := Arr[J+1]; Arr[J+1] := Temp;
        Swapped := True;
      end;
  until not Swapped;
end;

Analysis: This is bubble sort. Best case: O(n) (already sorted, one pass with no swaps). Worst case: O(n^2) (reverse sorted, n passes of n comparisons). Average case: O(n^2).


26.5 Space Complexity

Space complexity measures how much memory an algorithm uses as a function of input size, beyond the input itself.

O(1) space: The algorithm uses a fixed number of variables:

function Sum(const Arr: array of Integer; N: Integer): Integer;
var
  Total, I: Integer;
begin
  Total := 0;
  for I := 0 to N - 1 do Total := Total + Arr[I];
  Sum := Total;  { Only 2 extra variables: Total, I. O(1) space. }
end;

O(n) space: The algorithm creates a data structure proportional to the input:

procedure MergeSort(...);  { Uses O(n) temporary array for merging }

O(log n) space: Recursive algorithms use stack space proportional to recursion depth:

procedure QuickSort(Arr, Lo, Hi);  { Recursion depth ~log n on average }

O(n) stack space: Deep recursion on linear structures:

procedure PrintList(Node);  { Recursion depth = list length n }

Space Complexity of Data Structures

Data Structure                          Space      Notes
Array of n elements                     O(n)       Contiguous memory
Linked list of n nodes                  O(n)       Each node has data + pointer
BST of n nodes                          O(n)       Each node has data + 2 pointers
Adjacency matrix (V vertices)           O(V^2)     Even for sparse graphs
Adjacency list (V vertices, E edges)    O(V + E)   Efficient for sparse graphs
DP table (n * W)                        O(n * W)   Knapsack table
Hash table (n entries)                  O(n)       Plus some overhead for buckets

Time-Space Tradeoffs

A common theme in algorithm design is trading space for time:

  • Memoization: uses O(n) extra space to reduce Fibonacci from O(2^n) to O(n) time.
  • Hash tables: use O(n) space to achieve O(1) average-case lookups.
  • DP tables: use O(n * W) space to solve knapsack in O(n * W) time.
  • Precomputed lookup tables: use O(n) space to answer queries in O(1).
  • Sorting + binary search: use O(n log n) preprocessing time to answer multiple queries in O(log n) each.

The art is choosing the right tradeoff for your constraints. On a system with ample memory, a hash table is ideal. On an embedded system with 64 KB of RAM, you might prefer a binary search on a sorted array.
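Memoization, the first of these tradeoffs, looks like this in Pascal (a sketch; the table is capped at N = 92, the largest Fibonacci number that fits in an Int64):

var
  Memo: array[0..92] of Int64;
  Known: array[0..92] of Boolean;   { Global arrays start zeroed/False }

function FibMemo(N: Integer): Int64;
begin
  if N <= 1 then begin FibMemo := N; Exit; end;
  if not Known[N] then begin
    Memo[N] := FibMemo(N - 1) + FibMemo(N - 2);  { Computed once per N }
    Known[N] := True;
  end;
  FibMemo := Memo[N];
end;
{ O(n) extra space buys O(n) time instead of O(2^n) }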


26.6 Best/Worst/Average Case

A single algorithm can have different performance on different inputs.

Insertion Sort Example

  • Best case: O(n) — input is already sorted. Inner loop never shifts elements.
  • Worst case: O(n^2) — input is reverse sorted. Every element shifts maximally.
  • Average case: O(n^2) — random permutations require about n^2/4 shifts on average.

Binary Search Example

  • Best case: O(1) — target is the middle element.
  • Worst case: O(log n) — target is not present or at an extreme.
  • Average case: O(log n) — about the same as worst case.

Quicksort Example

  • Best case: O(n log n) — pivot always splits evenly.
  • Worst case: O(n^2) — pivot is always min or max (sorted input with bad pivot).
  • Average case: O(n log n) — random pivots give roughly balanced splits.

Hash Table Example

  • Best case: O(1) — no collisions.
  • Worst case: O(n) — all keys hash to the same bucket.
  • Average case: O(1) — with a good hash function and reasonable load factor.

Which Case Matters?

  • Worst case is safest for guarantees: "this will never take longer than..."
  • Average case is most practical: "this will typically take about..."
  • Best case is rarely useful: it only tells you how fast the algorithm can be, not how fast it will be.

In this book, unless stated otherwise, complexity means worst case.

Choosing Your Analysis Level

Different situations call for different analysis levels:

  • Designing an algorithm? Analyze the worst case. You need to guarantee that your algorithm will not be unacceptably slow on any input.
  • Comparing two algorithms for practical use? Consider the average case. If one algorithm has a better average case but the same worst case, it will typically perform better in production.
  • Explaining why an algorithm seems faster than its worst case suggests? Look at amortized analysis (next section) or consider the distribution of real-world inputs.
  • Presenting performance to stakeholders? Report the worst case with a note about typical (average case) performance. "This will never take more than 2 seconds, and typically completes in 200 milliseconds."

26.7 Amortized Analysis

Some operations are expensive occasionally but cheap most of the time. Amortized analysis computes the average cost per operation over a sequence of operations.

Dynamic Array Example

A dynamic array (like Pascal's SetLength-based arrays or TList) starts with a fixed capacity. When full, it doubles its capacity — an O(n) operation (copying all elements to new memory). But doublings are rare, so their cost is spread across many cheap insertions. What is the amortized cost per insertion?

Let us trace a dynamic array growing from capacity 1:

Insert 1: Capacity 1 → 1 element.  No resize. Cost: 1.
Insert 2: Capacity 1 → FULL. Double to 2. Copy 1 element. Cost: 1 + 1 = 2.
Insert 3: Capacity 2 → FULL. Double to 4. Copy 2 elements. Cost: 1 + 2 = 3.
Insert 4: Capacity 4 → 3 elements. No resize. Cost: 1.
Insert 5: Capacity 4 → FULL. Double to 8. Copy 4 elements. Cost: 1 + 4 = 5.
Insert 6: Cost: 1.
Insert 7: Cost: 1.
Insert 8: Cost: 1.
Insert 9: Capacity 8 → FULL. Double to 16. Copy 8 elements. Cost: 1 + 8 = 9.

Total cost for 9 insertions: 1 + 2 + 3 + 1 + 5 + 1 + 1 + 1 + 9 = 24. Amortized cost per insertion: 24/9 ≈ 2.67.

In general, the total cost of n insertions is at most 3n: n for the insertions themselves, plus at most 1 + 2 + 4 + ... + n <= 2n of copying work summed over all doublings. Amortized per insertion: 3n/n = O(1).

So even though individual insertions can be O(n), the amortized cost is O(1) per insertion. This is why dynamic arrays are efficient in practice — the expensive doublings are rare enough that they average out to constant cost.
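The trace above can be reproduced with a short simulation (illustrative):

program AmortizedTrace;
var
  Capacity, Size, I: Integer;
  TotalCost: Int64;
begin
  Capacity := 1; Size := 0; TotalCost := 0;
  for I := 1 to 9 do begin
    if Size = Capacity then begin
      TotalCost := TotalCost + Size;     { Copy all existing elements }
      Capacity := Capacity * 2;
    end;
    Inc(Size);
    Inc(TotalCost);                      { The insertion itself }
  end;
  WriteLn('Total cost: ', TotalCost);            { 24, matching the trace }
  WriteLn('Amortized: ', TotalCost / 9 : 0 : 2); { ~2.67 per insertion }
end.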

Another Amortized Example: Incrementing a Binary Counter

Consider a binary counter represented as an array of bits. Incrementing it flips the rightmost 0 to 1 and all trailing 1s to 0. For example: 0111 → 1000 (flips 4 bits), 1000 → 1001 (flips 1 bit).

Some increments are expensive (flipping many bits), but most are cheap. Over n increments starting from 0:

  • Bit 0 flips every increment: n times.
  • Bit 1 flips every 2 increments: n/2 times.
  • Bit 2 flips every 4 increments: n/4 times.
  • Total flips: n + n/2 + n/4 + ... < 2n.

Amortized cost per increment: 2n/n = O(1). Even though a single increment can flip O(log n) bits, the average over n increments is constant.

The "Banker's Method" Intuition

Think of each O(1) insertion as depositing $3 into a bank account: $1 for the insertion itself, and $2 saved for future copying. When a doubling occurs, the saved money pays for copying all elements to the new array. Since each element saved $2 and will be copied at most once before the next doubling, the bank account never goes negative. This is a popular informal proof of amortized O(1) cost.


26.8 Profiling Pascal Programs (Timing, Counting Operations)

Theory is essential, but practice demands measurement. Here is how to profile Pascal programs.

Measuring Execution Time

uses
  SysUtils, DateUtils;

var
  StartTime: TDateTime;
  ElapsedMS: Int64;

begin
  StartTime := Now;

  { ... code to measure ... }

  ElapsedMS := MilliSecondsBetween(Now, StartTime);
  WriteLn('Elapsed: ', ElapsedMS, ' ms');
end;

For a monotonic timer that is unaffected by system clock adjustments, use GetTickCount64 (millisecond units; its actual resolution is platform-dependent):

uses
  SysUtils;

var
  Start, Finish: QWord;

begin
  Start := GetTickCount64;

  { ... code to measure ... }

  Finish := GetTickCount64;
  WriteLn('Elapsed: ', Finish - Start, ' ms');
end;

For sub-millisecond timing on Windows, use QueryPerformanceCounter:

{$IFDEF WINDOWS}
uses Windows;

var
  Freq, Start, Finish: Int64;
  ElapsedMicrosec: Double;
begin
  QueryPerformanceFrequency(Freq);
  QueryPerformanceCounter(Start);

  { ... code to measure ... }

  QueryPerformanceCounter(Finish);
  ElapsedMicrosec := (Finish - Start) / Freq * 1000000;
  WriteLn('Elapsed: ', ElapsedMicrosec:0:1, ' microseconds');
end;
{$ENDIF}

Counting Operations

For algorithmic analysis (rather than wall-clock timing), count the fundamental operations:

var
  Comparisons: Int64;

function SearchCounted(const Arr: array of Integer;
                       N, Target: Integer): Integer;
var
  I: Integer;
begin
  Comparisons := 0;
  for I := 0 to N - 1 do begin
    Inc(Comparisons);
    if Arr[I] = Target then begin
      SearchCounted := I;
      Exit;
    end;
  end;
  SearchCounted := -1;
end;

A Complete Counting Example: Comparing Sorts

var
  CompareCount, SwapCount: Int64;

procedure SelectionSortCounted(var Arr: array of Integer; N: Integer);
var
  I, J, MinIdx, Temp: Integer;
begin
  CompareCount := 0;
  SwapCount := 0;
  for I := 0 to N - 2 do begin
    MinIdx := I;
    for J := I + 1 to N - 1 do begin
      Inc(CompareCount);
      if Arr[J] < Arr[MinIdx] then
        MinIdx := J;
    end;
    if MinIdx <> I then begin
      Inc(SwapCount);
      Temp := Arr[I]; Arr[I] := Arr[MinIdx]; Arr[MinIdx] := Temp;
    end;
  end;
  WriteLn('Selection sort: ', CompareCount, ' comparisons, ',
          SwapCount, ' swaps');
end;

For an array of size 100: Selection sort always makes 100*99/2 = 4,950 comparisons, regardless of input order. Insertion sort makes 4,950 comparisons in the worst case but as few as 99 in the best case (sorted input).

Empirical Growth Analysis

To verify that an algorithm is O(n^2), measure it at several input sizes and check that doubling n quadruples the time:

procedure EmpiricalAnalysis;
const
  Sizes: array[1..5] of Integer = (1000, 2000, 4000, 8000, 16000);
var
  I, S: Integer;
  Arr: array of Integer;
  Start, Elapsed, PrevElapsed: QWord;
begin
  WriteLn('Size':8, 'Time(ms)':12, 'Ratio':8);
  PrevElapsed := 0;
  for S := 1 to 5 do begin
    SetLength(Arr, Sizes[S]);
    for I := 0 to Sizes[S] - 1 do Arr[I] := Random(Sizes[S]);

    Start := GetTickCount64;
    SelectionSort(Arr, Sizes[S]);
    Elapsed := GetTickCount64 - Start;

    if (S > 1) and (PrevElapsed > 0) then
      WriteLn(Sizes[S]:8, Elapsed:12, (Elapsed / PrevElapsed):8:2)
    else
      WriteLn(Sizes[S]:8, Elapsed:12, '    -':8);
    PrevElapsed := Elapsed;
  end;
end;

Expected output for O(n^2):

    Size    Time(ms)   Ratio
    1000           5       -
    2000          20    4.00
    4000          78    3.90
    8000         312    4.00
   16000        1250    4.01

If the ratio of times for consecutive sizes is approximately 4 when the input doubles, the algorithm is O(n^2). If approximately 2, it is O(n). If slightly more than 2 (around 2.1 to 2.3, because the log factor also grows a little), it is O(n log n).

📊 Expected Doubling Ratios

Complexity    Expected Ratio When n Doubles
O(n)          2.0
O(n log n)    ~2.1-2.3 (slightly more than 2)
O(n^2)        4.0
O(n^3)        8.0
O(2^n)        Squares the previous time

Best Practice: Profile Before Optimizing Never optimize without measuring first. The bottleneck is almost never where you think it is. Profile your program, identify the hotspot (the function or loop consuming the most time), and optimize that specific piece. Optimizing non-bottleneck code wastes your time and makes the code harder to read.


26.9 Common Complexity Mistakes

Before moving to optimization, let us address common errors students make in complexity analysis.

Mistake 1: Treating All Nested Loops as O(n^2)

Not all nested loops are quadratic. The key is what the inner loop's bounds depend on.

for I := 1 to N do
  for J := 1 to 10 do    { Constant inner loop }
    Process(I, J);
{ This is O(10n) = O(n), not O(n^2) }

for I := 1 to N do begin
  J := N;
  while J > 0 do begin
    J := J div 2;        { Logarithmic inner loop }
  end;
end;
{ This is O(n log n), not O(n^2) }

Mistake 2: Confusing O(1) with "Fast"

O(1) means the time does not grow with input size. It does not mean the operation is fast in absolute terms. A hash table lookup is O(1) but might take 50 nanoseconds (computing the hash, resolving collisions). An array access is also O(1) but takes 1 nanosecond. Both are constant time, but one is 50 times slower.

Mistake 3: Ignoring the Input Size That Matters

For the 0/1 knapsack DP solution with complexity O(n * W), is this "polynomial"? It depends. If W is bounded by a constant, yes. But if W can be exponentially large (e.g., W = 2^n), then O(n * W) = O(n * 2^n), which is exponential. This subtlety — that the "size" of W as an input is log(W) bits, not W itself — is what makes knapsack NP-hard despite having a DP solution. This is called a pseudo-polynomial algorithm.

Mistake 4: Forgetting Hidden Costs

String concatenation in a loop seems O(1) per iteration, but in Pascal (and most languages), each concatenation creates a new string. Concatenating n strings of length 1 produces strings of length 1, 2, 3, ..., n, and copying each costs proportional to its length. Total: 1 + 2 + ... + n = O(n^2).

{ This is O(n^2), not O(n)! }
S := '';
for I := 1 to N do
  S := S + Chr(I mod 26 + Ord('A'));  { Creates a new string each iteration }
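A linear-time alternative preallocates the string once and writes characters in place (a sketch using SetLength):

{ O(n) version: allocate once, fill in place }
SetLength(S, N);
for I := 1 to N do
  S[I] := Chr(I mod 26 + Ord('A'));   { Direct write: no reallocation, no copying }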

Mistake 5: Comparing Complexity Classes Incorrectly

O(n^2) is not "twice as slow" as O(n). It is quadratically slower. Doubling n makes an O(n) algorithm take 2x as long, but makes an O(n^2) algorithm take 4x as long. For n = 1,000,000: - O(n) = 1,000,000 operations - O(n^2) = 1,000,000,000,000 operations (a million times more)


26.10 Optimization Techniques (Algorithmic Improvement > Micro-Optimization)

The single most important lesson in optimization:

Improving the algorithm is always more effective than optimizing the code.

Switching from selection sort (O(n^2)) to merge sort (O(n log n)) on 100,000 elements gives a speedup of roughly 5,000x. No amount of micro-optimization — loop unrolling, register allocation, branch prediction hints — can come close to a 5,000x improvement on the same algorithm.

Optimization Hierarchy

  1. Choose a better algorithm. O(n log n) instead of O(n^2). O(log n) instead of O(n). Dynamic programming instead of brute force.

  2. Choose a better data structure. Hash table for O(1) lookup instead of array for O(n) lookup. Balanced BST for O(log n) dynamic operations instead of sorted array.

  3. Reduce unnecessary work. Avoid recomputation (memoization). Exit early when the answer is found. Skip elements that cannot contribute (pruning).

  4. Optimize data access patterns. Process data sequentially (cache-friendly). Avoid pointer chasing (arrays over linked lists when possible).

  5. Micro-optimize critical loops (only as a last resort). Replace multiplication with shifts. Inline small functions. Use compiler optimization flags (-O2, -O3 in Free Pascal).

Optimization Case Study: Duplicate Detection

Version 1: Brute force (O(n^2))

function HasDuplicateV1(const Arr: array of Integer; N: Integer): Boolean;
var
  I, J: Integer;
begin
  HasDuplicateV1 := False;
  for I := 0 to N - 2 do
    for J := I + 1 to N - 1 do
      if Arr[I] = Arr[J] then begin
        HasDuplicateV1 := True;
        Exit;
      end;
end;

Version 2: Sort first, then check adjacent elements (O(n log n))

function HasDuplicateV2(var Arr: array of Integer; N: Integer): Boolean;
var
  I: Integer;
begin
  QuickSort(Arr, 0, N - 1);    { O(n log n) }
  HasDuplicateV2 := False;
  for I := 0 to N - 2 do       { O(n) }
    if Arr[I] = Arr[I + 1] then begin
      HasDuplicateV2 := True;
      Exit;
    end;
end;

Version 3: Use a hash set (O(n) average, if we had one)

{ Conceptual — Pascal does not have a built-in hash set,
  but Free Pascal provides TFPHashList or you could implement one }
{ For each element, check if it is already in the set.
  If yes, duplicate found. If no, add it to the set.
  Each check and insert is O(1) average. Total: O(n). }
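One possible realization of Version 3 uses Free Pascal's TFPHashList from the contnrs unit. Its keys are short strings, so this sketch converts each integer with IntToStr — a purpose-built integer hash set would avoid that overhead, but the O(n) average shape is the same:

```pascal
uses SysUtils, Contnrs;

function HasDuplicateV3(const Arr: array of Integer; N: Integer): Boolean;
var
  Seen: TFPHashList;
  I: Integer;
begin
  HasDuplicateV3 := False;
  Seen := TFPHashList.Create;
  try
    for I := 0 to N - 1 do begin
      { FindIndexOf returns -1 if the key is not present — O(1) average }
      if Seen.FindIndexOf(IntToStr(Arr[I])) >= 0 then begin
        HasDuplicateV3 := True;
        Exit;
      end;
      Seen.Add(IntToStr(Arr[I]), nil);   { insert — O(1) average }
    end;
  finally
    Seen.Free;   { runs even when Exit fires inside the loop }
  end;
end;
```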

The progression from O(n^2) to O(n log n) to O(n) shows the optimization hierarchy at work. Version 2 is already thousands of times faster than Version 1 for large arrays. Version 3 provides another factor of improvement.

Pascal-Specific Optimization Tips

  • Use const parameters for large records and strings to avoid copying.
  • Use Integer instead of Int64 when 32-bit range suffices (faster on 32-bit systems).
  • Compile with optimization flags: fpc -O3 -CX -XX myprogram.pas
  • Use typed constants instead of computed values when possible.
  • Avoid string operations in tight loops — string concatenation creates new strings.
  • Prefer arrays over linked lists when random access is needed — arrays have better cache locality.
  • Use FillChar to initialize large arrays instead of looping.
  • Avoid repeated string comparisons — if you compare the same string many times, consider converting it to an integer index or hash value first.
  • Use Move to copy blocks of memory instead of element-by-element loops.
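A small demonstration of the FillChar and Move tips. Both operate on raw bytes, so they are safe here only because the arrays hold plain Integers — never use them on arrays of strings, dynamic arrays, or other managed types:

```pascal
program BulkOps;
var
  A, B: array[0..9999] of Integer;
begin
  { Zero-initialize 40,000 bytes in one call instead of a 10,000-step loop. }
  FillChar(A, SizeOf(A), 0);
  A[0] := 42;
  { Copy the whole block in one call instead of element by element. }
  Move(A, B, SizeOf(A));
  WriteLn(B[0]);  { prints 42 }
end.
```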

Why Cache Locality Matters

Modern CPUs access memory through a hierarchy of caches (L1, L2, L3). Accessing data that is already in cache is 10-100x faster than fetching from main memory. Arrays store elements contiguously in memory, so accessing sequential elements is cache-friendly — the first access loads a cache line (typically 64 bytes), and subsequent accesses hit the cache.

Linked lists scatter nodes across the heap. Each Node^.Next dereference may trigger a cache miss, making linked list traversal much slower than array traversal in practice, even though both are O(n) in theory.

This is why quicksort (which operates on contiguous array segments) is typically faster than merge sort (which copies to temporary arrays) in practice, despite both being O(n log n). It is also why hash tables with open addressing (which store entries in arrays) often outperform hash tables with separate chaining (which use linked lists) on modern hardware.

The lesson: complexity analysis tells you how an algorithm scales. But constant factors — especially cache behavior — determine how fast it actually runs on real hardware. Both perspectives matter.
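A quick experiment makes the point concrete. The sketch below sums one million integers twice: once from a contiguous array, once by walking a linked list. (One caveat: nodes allocated back-to-back by New may happen to land near each other, muting the effect — on a long-running program with a fragmented heap, the gap is typically much wider.)

```pascal
program CacheDemo;
uses SysUtils;
const
  N = 1000000;
type
  PNode = ^TNode;
  TNode = record
    Value: Integer;
    Next: PNode;
  end;
var
  A: array of Integer;
  Head, P: PNode;
  I: Integer;
  Sum: Int64;
  Start: QWord;
begin
  SetLength(A, N);
  Head := nil;
  for I := N - 1 downto 0 do begin
    A[I] := I;
    New(P);
    P^.Value := I;
    P^.Next := Head;
    Head := P;
  end;

  Start := GetTickCount64;
  Sum := 0;
  for I := 0 to N - 1 do
    Sum := Sum + A[I];              { sequential access — cache-friendly }
  WriteLn('Array sum:       ', GetTickCount64 - Start, ' ms (Sum=', Sum, ')');

  Start := GetTickCount64;
  Sum := 0;
  P := Head;
  while P <> nil do begin
    Sum := Sum + P^.Value;          { pointer chasing — potential cache misses }
    P := P^.Next;
  end;
  WriteLn('Linked-list sum: ', GetTickCount64 - Start, ' ms (Sum=', Sum, ')');
end.
```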

Complete Optimization Case Study: PennyWise Monthly Report

Rosa wants to generate a monthly report that shows total spending by category. Here is her initial implementation:

Version 1: Naive approach (O(n * k), where k is the number of categories)

procedure MonthlyReportV1(const Expenses: TExpenseArray; Count: Integer;
                          const Categories: array of string; NumCats: Integer);
var
  I, J: Integer;
  Total: Currency;
begin
  for I := 0 to NumCats - 1 do begin
    Total := 0;
    for J := 0 to Count - 1 do
      if Expenses[J].Category = Categories[I] then
        Total := Total + Expenses[J].Amount;
    WriteLn(Categories[I]:20, ': $', Total:0:2);
  end;
end;

For 10,000 expenses and 50 categories, this makes 500,000 string comparisons.

Version 2: Single pass with category lookup (O(n * k) but better constants)

procedure MonthlyReportV2(const Expenses: TExpenseArray; Count: Integer;
                          const Categories: array of string; NumCats: Integer);
var
  Totals: array of Currency;
  I, J: Integer;
begin
  SetLength(Totals, NumCats);
  for I := 0 to NumCats - 1 do Totals[I] := 0;

  for I := 0 to Count - 1 do
    for J := 0 to NumCats - 1 do
      if Expenses[I].Category = Categories[J] then begin
        Totals[J] := Totals[J] + Expenses[I].Amount;
        Break;  { Found the category — stop inner loop }
      end;

  for I := 0 to NumCats - 1 do
    WriteLn(Categories[I]:20, ': $', Totals[I]:0:2);
end;

The Break after finding the matching category gives a 2x speedup on average (since on average we scan half the categories before finding a match).

Version 3: Sort by category, then single pass (O(n log n + n))

procedure MonthlyReportV3(var Expenses: TExpenseArray; Count: Integer);
var
  I: Integer;
  CurrentCat: string[30];
  Total: Currency;
begin
  if Count = 0 then Exit;    { nothing to report — avoids Expenses[0] below }
  QuickSortExpenses(Expenses, 0, Count - 1, @CompareByCategory);

  CurrentCat := Expenses[0].Category;
  Total := 0;
  for I := 0 to Count - 1 do begin
    if Expenses[I].Category <> CurrentCat then begin
      WriteLn(CurrentCat:20, ': $', Total:0:2);
      CurrentCat := Expenses[I].Category;
      Total := 0;
    end;
    Total := Total + Expenses[I].Amount;
  end;
  WriteLn(CurrentCat:20, ': $', Total:0:2);
end;

After sorting by category (O(n log n)), we make a single pass through the data (O(n)). This is O(n log n) total, which beats O(n * k) once k exceeds log n. More importantly, it scales: adding more categories does not slow down the report.

Version 4: Use a BST category tree (O(n log k))

Using the BST from Chapter 24, each expense lookup is O(log k), and we make n lookups. Total: O(n log k). For 50 categories, log2(50) is about 6, so each expense requires only about 6 comparisons instead of the 25 a linear scan averages. This is the approach PennyWise actually uses.
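Chapter 24's exact BST interface is not reproduced here, so the helper names FindCategoryNode and PrintCategoryTotalsInOrder, and the per-node Total field, are assumptions — this is only a sketch of Version 4's shape:

```pascal
{ Sketch — assumes a category BST whose nodes carry a running Total,
  plus hypothetical helpers FindCategoryNode and PrintCategoryTotalsInOrder. }
procedure MonthlyReportV4(const Expenses: TExpenseArray; Count: Integer;
                          CategoryTree: PCategoryNode);
var
  I: Integer;
  Node: PCategoryNode;
begin
  for I := 0 to Count - 1 do begin
    Node := FindCategoryNode(CategoryTree, Expenses[I].Category); { O(log k) }
    if Node <> nil then
      Node^.Total := Node^.Total + Expenses[I].Amount;
  end;
  PrintCategoryTotalsInOrder(CategoryTree);  { in-order traversal: O(k) }
end;
```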

The progression V1 → V2 → V3 → V4 illustrates the optimization hierarchy perfectly: each step improves the algorithm or data structure, not the low-level code.

⚠️ Warning: Premature Optimization Knuth's famous warning bears repeating: premature optimization is the root of all evil. Write clear, correct code first. Measure its performance. If it is too slow, profile to find the bottleneck. Only then optimize — and start by improving the algorithm, not the code. A well-chosen algorithm in clean Pascal will outperform a micro-optimized version of a poor algorithm every time.


26.11 Project Checkpoint: PennyWise Performance Audit

Rosa's PennyWise now has thousands of expenses. Let us audit its performance and identify optimization opportunities.

Current Performance Profile

| Operation | Current Approach | Complexity | 1,000 expenses | 10,000 expenses |
|-----------|------------------|------------|----------------|-----------------|
| Find expense by amount | Linear search | O(n) | <1 ms | ~5 ms |
| Find expense by date | Linear search | O(n) | <1 ms | ~5 ms |
| Sort by amount | Quicksort | O(n log n) | <1 ms | ~10 ms |
| Category subtotal | Tree traversal | O(k) per category | <1 ms | <1 ms |
| Add expense | Array append | O(1) amortized | <1 ms | <1 ms |
| Display all | Array traversal | O(n) | ~2 ms | ~20 ms |

Optimization Opportunities

  1. Frequent searches by amount: If Rosa searches by amount often, pre-sort by amount and use binary search. Reduces from O(n) to O(log n) per search, at the cost of O(n log n) sort time.

  2. Category lookup: The BST category tree (Chapter 24) provides O(log k) lookup, where k is the number of categories. If categories are added alphabetically, the tree degenerates — consider balancing.

  3. Date range queries: For "show all expenses this week," binary search for the start and end dates in a date-sorted array gives O(log n + k) where k is the number of results, vs. O(n) with linear scan.
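Opportunity 3 can be sketched with a lower-bound binary search. The Date: TDateTime field and the assumption that the array is pre-sorted by date are both part of this sketch, not of the PennyWise code shown earlier:

```pascal
{ Sketch — assumes Expenses is sorted ascending by a Date: TDateTime field. }
function FirstOnOrAfter(const Expenses: TExpenseArray; Count: Integer;
                        D: TDateTime): Integer;
var
  Lo, Hi, Mid: Integer;
begin
  Lo := 0;
  Hi := Count;               { half-open search over [Lo, Hi) }
  while Lo < Hi do begin
    Mid := (Lo + Hi) div 2;
    if Expenses[Mid].Date < D then
      Lo := Mid + 1
    else
      Hi := Mid;
  end;
  FirstOnOrAfter := Lo;      { = Count if no expense is on or after D }
end;

{ "Show all expenses this week" in O(log n + k), k = number of results. }
procedure PrintDateRange(const Expenses: TExpenseArray; Count: Integer;
                         WeekStart, WeekEnd: TDateTime);
var
  I: Integer;
begin
  I := FirstOnOrAfter(Expenses, Count, WeekStart);   { O(log n) }
  while (I < Count) and (Expenses[I].Date <= WeekEnd) do begin
    WriteLn(Expenses[I].Category, ': $', Expenses[I].Amount:0:2);
    Inc(I);                                          { O(k) scan of results }
  end;
end;
```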

Profiling Code

procedure ProfileExpenseOperations(const Expenses: TExpenseArray;
                                   Count: Integer);
var
  Start: QWord;
  I, Idx: Integer;
  SortedExpenses: TExpenseArray;
begin
  WriteLn('--- PennyWise Performance Audit ---');
  WriteLn('  Expense count: ', Count);
  WriteLn;

  { Profile linear search }
  Start := GetTickCount64;
  for I := 1 to 1000 do
    Idx := LinearSearchByAmount(Expenses, Count, 42.50);
  WriteLn('  1000 linear searches: ',
          GetTickCount64 - Start, ' ms');

  { Profile sort + binary search }
  SetLength(SortedExpenses, Count);
  Move(Expenses[0], SortedExpenses[0], Count * SizeOf(TExpense));
  QuickSortExpenses(SortedExpenses, 0, Count - 1, @CompareByAmount);

  Start := GetTickCount64;
  for I := 1 to 1000 do
    Idx := BinarySearchByAmount(SortedExpenses, Count, 42.50);
  WriteLn('  1000 binary searches: ',
          GetTickCount64 - Start, ' ms');

  { Profile sort — re-copy first so we time sorting an unsorted array,
    and sort the copy because Expenses is a const parameter }
  Move(Expenses[0], SortedExpenses[0], Count * SizeOf(TExpense));
  Start := GetTickCount64;
  QuickSortExpenses(SortedExpenses, 0, Count - 1, @CompareByAmount);
  WriteLn('  Sort by amount: ',
          GetTickCount64 - Start, ' ms');
end;

Rosa's Decision Framework

Rosa (and you) should ask these questions before optimizing PennyWise:

  1. Is the current performance acceptable? If operations complete in under 100 ms, the user will not notice. Do not optimize what is already fast enough.

  2. What is the bottleneck? Profile to find the single slowest operation. Focus all optimization effort there.

  3. Can I improve the algorithm? Almost always yes. Switching from linear search to binary search, or from O(n^2) sort to O(n log n) sort, gives the biggest wins.

  4. Is the data structure right? If you search more than you insert, a sorted array with binary search may be better than an unsorted list. If you insert and delete frequently, a balanced BST may be better than an array.

  5. What is the expected data size? For 100 expenses, any algorithm is fast. For 100,000 expenses, algorithm choice is critical. Design for the expected scale.


26.12 Retrieval Practice: Complexity Quick Quiz

Test your understanding by determining the complexity of each code fragment. Try to answer before reading the solutions.

Q1: What is the time complexity?

for I := 1 to N do
  for J := 1 to N * N do
    Count := Count + 1;

A1: O(n * n^2) = O(n^3). The inner loop runs n^2 times for each of the n iterations of the outer loop.

Q2: What is the complexity of searching a balanced BST with n nodes? A2: O(log n). The height of a balanced BST is O(log n), and search visits at most one node per level.

Q3: An algorithm does n/2 work in the first iteration, n/4 in the second, n/8 in the third, and so on. What is the total work? A3: n/2 + n/4 + n/8 + ... = n(1/2 + 1/4 + 1/8 + ...) = n * 1 = O(n). The geometric series sums to less than n.

Q4: What is the space complexity of recursive merge sort on an array of n elements? A4: O(n) for the temporary merge array, plus O(log n) for the recursion stack. Total: O(n).

Q5: An algorithm makes two recursive calls, each on input of size n-1, with O(1) work per call. What is its time complexity? A5: T(n) = 2T(n-1) + O(1). This is O(2^n) — exponential! Each call spawns two more, creating a binary tree of calls with n levels. Naive Fibonacci branches the same way (on inputs n-1 and n-2), which is why it too is exponential.


26.13 Summary

Complexity analysis is the lens through which we evaluate algorithms. It tells us not how fast a program runs today, but how it will behave tomorrow when the data is ten times larger.

Big-O notation describes the upper bound on growth rate, ignoring constants and lower-order terms. We defined it formally (f(n) <= c * g(n) for n >= n0) and informally (how fast does the running time grow?). We proved Big-O relationships for specific functions.

The common complexity classes — O(1), O(log n), O(n), O(n log n), O(n^2), O(n^3), O(2^n) — form a hierarchy of growth rates. The difference between adjacent classes is enormous for large inputs. Moving from O(n^2) to O(n log n) is often the difference between a program that is usable and one that is not.

Analyzing loops is systematic: count the iterations and multiply. Sequential loops add; nested loops multiply. Halving loops give logarithmic complexity. We analyzed 11 code snippets covering every common loop pattern. Analyzing recursion uses recurrence relations solved by the recursion tree method or the Master Theorem (T(n) = aT(n/b) + O(n^c)).

Space complexity measures memory usage, and there is often a time-space tradeoff: you can run faster by using more memory (memoization, hash tables) or use less memory by running slower.

Best/worst/average cases distinguish between different input scenarios. Worst case provides guarantees; average case reflects typical behavior. We examined these cases for insertion sort, binary search, quicksort, and hash tables.

Amortized analysis computes the average cost of operations over time, capturing the insight that expensive operations (like array doubling) can be infrequent enough to have low average cost. We traced a dynamic array's growth step by step and proved amortized O(1) insertion cost.

In practice, profiling is essential. We showed three timing methods in Pascal (Now/MilliSecondsBetween, GetTickCount64, and QueryPerformanceCounter) and demonstrated empirical growth analysis with doubling ratios. The optimization case study on duplicate detection showed the progression from O(n^2) to O(n log n) to O(n).

And always remember the optimization hierarchy: improve the algorithm first, optimize the code last.

This chapter completes Part IV. You now have the theoretical and practical toolkit to analyze any algorithm, predict its scalability, and make informed choices about data structures and algorithms. Wirth's vision is complete: Algorithms + Data Structures = Programs — and complexity analysis tells you whether those programs are fast enough.

📊 Key Concepts at a Glance

| Concept | Definition |
|---------|------------|
| Big-O notation | Upper bound on growth rate; f(n) is O(g(n)) if f(n) <= c*g(n) for large n |
| O(1) | Constant time — independent of input size |
| O(log n) | Logarithmic — halving at each step |
| O(n) | Linear — proportional to input size |
| O(n log n) | Linearithmic — efficient sorting |
| O(n^2) | Quadratic — nested loops over input |
| O(n^3) | Cubic — triply nested loops |
| O(2^n) | Exponential — combinatorial explosion |
| Master Theorem | Formula for solving T(n) = aT(n/b) + O(n^c) |
| Space complexity | Memory usage as a function of input size |
| Best/worst/average case | Performance variation across different inputs |
| Amortized analysis | Average cost per operation over a sequence |
| Profiling | Measuring actual execution time to find bottlenecks |


This concludes Part IV: Algorithms and Problem Solving. In Part V, we turn to a new challenge — building graphical user interfaces with Lazarus.