Case Study 1: From O(n²) to O(n log n) — Optimizing a Real Program
The Problem
A student named Tomás writes a program to find all duplicate expenses in PennyWise. His first implementation uses a brute-force approach:
```pascal
function FindDuplicatesBrute(const Expenses: TExpenseArray;
  Count: Integer): Integer;
var
  I, J: Integer;
  Found: Integer;
begin
  Found := 0;
  for I := 0 to Count - 2 do
    for J := I + 1 to Count - 1 do
      if (Expenses[I].Amount = Expenses[J].Amount) and
         (Expenses[I].Date = Expenses[J].Date) then
      begin
        Inc(Found);
        WriteLn(' Duplicate: $', Expenses[I].Amount:0:2,
                ' on ', FormatDateTime('yyyy-mm-dd', Expenses[I].Date));
      end;
  WriteLn(' Total duplicates: ', Found);
  Result := Found;
end;
```
Performance on Tomás's Data
| Expenses | Time (brute force) |
|---|---|
| 100 | 1 ms |
| 1,000 | 85 ms |
| 5,000 | 2,100 ms |
| 10,000 | 8,400 ms |
| 50,000 | 210,000 ms (3.5 minutes!) |
Each time the input size doubles (for example, 5,000 → 10,000 expenses), the running time roughly quadruples (2,100 ms → 8,400 ms) — the signature of O(n²).
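This extrapolation can be checked directly: for an O(n²) algorithm, a single measurement predicts the rest, since T(n) ≈ T(n₀) · (n/n₀)². A minimal Free Pascal sketch using the measured 85 ms at n = 1,000 from the table (the function name is illustrative, not from the original program):

```pascal
program PredictQuadratic;

{ If T(n) grows quadratically, then T(n) ~ T(BaseN) * (N / BaseN)^2.
  BaseN/BaseMs are one measured data point; N is the size to predict. }
function PredictMs(BaseN: Integer; BaseMs: Double; N: Integer): Double;
begin
  PredictMs := BaseMs * Sqr(N / BaseN);
end;

begin
  { Extrapolate from the 1,000-expense measurement to 50,000 expenses }
  WriteLn('Predicted for n = 50,000: ',
          PredictMs(1000, 85.0, 50000):0:0, ' ms');
  { Prints 212500, close to the measured 210,000 ms }
end.
```

The small gap between the predicted 212,500 ms and the measured 210,000 ms is ordinary measurement noise; the quadratic model clearly fits.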
The Optimization
Tomás realizes he can sort the expenses first, then scan for adjacent duplicates:
```pascal
function FindDuplicatesSorted(var Expenses: TExpenseArray;
  Count: Integer): Integer;
var
  I: Integer;
  Found: Integer;
begin
  { Sort by date, then amount — O(n log n) }
  QuickSortExpenses(Expenses, 0, Count - 1, @CompareByDateThenAmount);

  { Scan for adjacent duplicates — O(n) }
  Found := 0;
  for I := 0 to Count - 2 do
    if (Expenses[I].Amount = Expenses[I+1].Amount) and
       (Expenses[I].Date = Expenses[I+1].Date) then
    begin
      Inc(Found);
      WriteLn(' Duplicate: $', Expenses[I].Amount:0:2,
              ' on ', FormatDateTime('yyyy-mm-dd', Expenses[I].Date));
    end;
  WriteLn(' Total duplicates: ', Found);
  Result := Found;
end;
```
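The sorting helper `QuickSortExpenses` and the comparator `CompareByDateThenAmount` are referenced above but not shown, and their exact signatures are not in the original. A minimal sketch of the record type and comparator, assuming `Date` is a `TDateTime` and `Amount` a `Currency` (both are assumptions for illustration):

```pascal
{$mode objfpc}
uses
  SysUtils;

type
  TExpense = record
    Date: TDateTime;
    Amount: Currency;
  end;
  TExpenseArray = array of TExpense;

{ Orders by date first, then amount, so records with equal
  (Date, Amount) end up adjacent after sorting. }
function CompareByDateThenAmount(const A, B: TExpense): Integer;
begin
  if A.Date <> B.Date then
  begin
    if A.Date < B.Date then
      Result := -1
    else
      Result := 1;
  end
  else if A.Amount < B.Amount then
    Result := -1
  else if A.Amount > B.Amount then
    Result := 1
  else
    Result := 0;
end;
```

`Currency` is a fixed-point type, so the `=` comparisons in the duplicate scan are exact; with a floating-point `Amount`, equality checks would be less reliable.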
Performance After Optimization
| Expenses | Brute Force | Sort + Scan | Speedup |
|---|---|---|---|
| 100 | 1 ms | 1 ms | 1x |
| 1,000 | 85 ms | 2 ms | 42x |
| 5,000 | 2,100 ms | 8 ms | 262x |
| 10,000 | 8,400 ms | 18 ms | 466x |
| 50,000 | 210,000 ms | 95 ms | 2,210x |
The optimized version is over 2,000 times faster for 50,000 expenses. What took 3.5 minutes now takes under a tenth of a second.
Analysis
Brute force is O(n²): it examines all n(n-1)/2 pairs, which is about 1.25 billion comparisons for n = 50,000. Sort + scan is O(n log n) for the sort plus O(n) for the scan, so O(n log n) overall: on the order of 50,000 × log₂ 50,000 ≈ 800,000 comparison steps.
The doubling ratio for the optimized version is approximately 2.2x — consistent with O(n log n).
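The 2.2x figure follows from the math: if T(n) = c · n · log₂ n, then T(2n)/T(n) = 2 · log₂(2n) / log₂(n), which is slightly above 2 and shrinks toward 2 as n grows. A quick check at the table's midrange (a sketch, not part of Tomás's program):

```pascal
program NLogNDoubling;
uses
  Math;
begin
  { Doubling ratio of T(n) = c * n * log2(n) at n = 5,000 }
  WriteLn('Ratio: ', 2 * Log2(10000) / Log2(5000):0:2);
  { About 2.16, close to the observed 2.2x }
end.
```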
Verification
To verify that the optimization produces the same results, Tomás runs both versions on the same data and compares the returned counts:

```pascal
{ Assert: both approaches find the same number of duplicates }
BruteCount := FindDuplicatesBrute(Expenses, Count);
SortedCount := FindDuplicatesSorted(Expenses, Count);
Assert(BruteCount = SortedCount,
  'Optimization error: different duplicate counts!');
```

One subtlety: the two counts agree only when no expense appears more than twice. For k identical expenses, the brute-force version counts every pair, k(k-1)/2, while the adjacent scan counts only the k-1 extra copies, so the assertion can fail on a triplicate even though both versions behave as designed. Note also that `FindDuplicatesSorted` reorders the array in place, so the brute-force pass should run first (or on a copy).
The Lesson
Tomás did not change his hardware. He did not rewrite his program in assembly. He did not use compiler optimization flags. He changed the algorithm — from O(n²) brute-force comparison to O(n log n) sort-and-scan — and gained a 2,000x speedup.
This is the most important lesson in all of computer science optimization: improve the algorithm first.
Discussion Questions
1. For n = 100, the brute force and optimized versions both take about 1 ms. Why would you still prefer the optimized version?
2. Could Tomás achieve even better performance using a hash table? What would the expected complexity be?
3. The sort changes the order of Tomás's expense array. What if he needs to preserve the original order? How would he modify the approach?
4. Tomás's optimization assumes that duplicates have exactly matching amounts and dates. What if "duplicate" means amounts within $0.01 and dates within 1 day? How does this affect the algorithm choice?