Chapter 26 Exercises: Complexity Analysis
Part A: Conceptual Understanding
A.1. Explain in your own words what Big-O notation means. Why do we ignore constants and lower-order terms?
Guidance
Big-O describes how an algorithm's running time grows as input size increases. We ignore constants because they depend on hardware/implementation, not the algorithm. We ignore lower-order terms because for large n, the highest-order term dominates. 3n² + 100n + 5000 behaves like n² when n is large enough — the n² term dwarfs everything else.
A.2. Rank the following functions from slowest growth to fastest growth: n², 2ⁿ, n log n, 1, log n, n, n³, n!
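The ranking asked for here can be checked numerically. A minimal Python sketch (note that for small n some orderings flip, e.g. 2ⁿ < n³ at n = 8; the ranking only holds eventually):

```python
import math

# Candidate growth functions, listed in the claimed order: slowest to fastest.
growth = [
    ("1",       lambda n: 1),
    ("log n",   lambda n: math.log2(n)),
    ("n",       lambda n: n),
    ("n log n", lambda n: n * math.log2(n)),
    ("n^2",     lambda n: n ** 2),
    ("n^3",     lambda n: n ** 3),
    ("2^n",     lambda n: 2 ** n),
    ("n!",      lambda n: math.factorial(n)),
]

# For a sufficiently large n, each function is strictly below the next one.
values = [f(64) for _, f in growth]
assert all(a < b for a, b in zip(values, values[1:]))
```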
Guidance
Slowest to fastest: 1, log n, n, n log n, n², n³, 2ⁿ, n!. The key insight is that polynomial growth (n^k) is always eventually slower than exponential growth (2^n), which is always eventually slower than factorial growth (n!).
A.3. What is the Big-O complexity of each expression?
(a) 5n + 3
(b) 2n² + 100n + 50
(c) n³ + n² + n + 1
(d) 100
(e) n * log₂(n) + 3n
(f) 2ⁿ + n³
Guidance
(a) O(n). (b) O(n²). (c) O(n³). (d) O(1). (e) O(n log n). (f) O(2ⁿ). In each case, the highest-order term dominates.
A.4. Explain the difference between time complexity and space complexity. Give an example of an algorithm with O(n) time and O(1) space, and another with O(n) time and O(n) space.
Guidance
Time complexity: how many operations as a function of n. Space complexity: how much extra memory. O(n) time / O(1) space: linear search (no extra storage). O(n) time / O(n) space: copying an array (allocates a new array of size n).
A.5. Explain best-case, worst-case, and average-case analysis using insertion sort as an example. Which case is most important for practical decisions?
Guidance
Best: O(n) on sorted input. Worst: O(n²) on reverse-sorted input. Average: O(n²) on random input. Worst case is most important for guarantees ("this will never be slower than X"). Average case is most important for predicting typical behavior. Best case is least useful — it only shows the optimistic extreme.
A.6. What is amortized analysis? Explain using the dynamic array doubling example.
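The doubling argument is easy to check by simulation. A Python sketch that counts element writes instead of measuring time:

```python
def append_cost(n):
    """Simulate n appends into a doubling dynamic array; count element writes."""
    capacity, size, writes = 1, 0, 0
    for _ in range(n):
        if size == capacity:          # full: allocate double, copy everything
            writes += size
            capacity *= 2
        writes += 1                   # write the appended element itself
        size += 1
    return writes

# Total work for n appends stays under 3n writes, so the amortized cost is
# O(1), even though a single append costs O(n) when it triggers a copy.
for n in (100, 10_000, 1_000_000):
    assert append_cost(n) < 3 * n
```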
Guidance
Amortized analysis computes the average cost per operation over a sequence. A dynamic array doubles capacity when full: this costs O(n) (copying all elements), but it happens only after n insertions. The total cost of n insertions is about 3n operations, so the amortized cost per insertion is O(1), even though individual insertions can be O(n).
A.7. A student claims: "My algorithm is O(n²) in the worst case and O(n) in the best case, so on average it is O(n * n/2) = O(n^1.5)." Is this correct? Why or why not?
Guidance
Incorrect. The average case is NOT the arithmetic mean of best and worst cases. Average-case analysis requires computing the expected running time over all possible inputs weighted by their probability. For insertion sort, the average case is O(n²) (with a smaller constant than worst case), not O(n^1.5).
Part B: Applied Analysis
B.1. Determine the Big-O complexity of each code fragment:
(a)
for I := 1 to N do
  for J := 1 to N do
    for K := 1 to N do
      Sum := Sum + 1;
(b)
I := N;
while I > 1 do begin
  Process(I);
  I := I div 2;
end;
(c)
for I := 1 to N do
  for J := 1 to 100 do
    Process(I, J);
(d)
for I := 1 to N do begin
  J := 1;
  while J < N do begin
    Process(I, J);
    J := J * 2;
  end;
end;
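Fragments (b) and (d) are the subtle ones; their iteration counts can be verified by direct translation (a Python sketch, with Pascal's `div` as `//`; n is a power of two so the logarithms come out exact):

```python
def count_b(n):
    # Fragment (b): I starts at n and is halved until it reaches 1.
    i, ops = n, 0
    while i > 1:
        ops += 1
        i //= 2
    return ops

def count_d(n):
    # Fragment (d): linear outer loop; the inner loop doubles J,
    # so it runs log2(n) times per outer iteration.
    ops = 0
    for _ in range(n):
        j = 1
        while j < n:
            ops += 1
            j *= 2
    return ops

assert count_b(1024) == 10          # log2(1024) halvings
assert count_d(1024) == 1024 * 10   # n * log2(n) inner-loop bodies
```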
Guidance
(a) O(n³) — three nested loops each running n times. (b) O(log n) — I is halved each iteration. (c) O(n) — outer loop is n, inner loop is constant (100). (d) O(n log n) — outer loop is n, inner loop is log n (J doubles each time).
B.2. An algorithm takes 1 second for n = 1000. Estimate its running time for n = 10,000 if the algorithm is: (a) O(n), (b) O(n log n), (c) O(n²), (d) O(2ⁿ).
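Estimates like these come from scaling the measured time by the ratio of the growth function at the two sizes. A small Python sketch of that arithmetic:

```python
import math

def scaled_time(t_old, n_old, n_new, f):
    """Estimate the new running time by scaling with growth function f."""
    return t_old * f(n_new) / f(n_old)

# 1 second at n = 1000, estimated at n = 10,000:
assert scaled_time(1.0, 1000, 10_000, lambda n: n) == 10.0
assert round(scaled_time(1.0, 1000, 10_000, lambda n: n * math.log2(n)), 1) == 13.3
assert scaled_time(1.0, 1000, 10_000, lambda n: n * n) == 100.0
# The O(2^n) case scales by a factor of 2**9000: beyond any feasible runtime.
```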
Guidance
Factor of increase: n goes from 1000 to 10,000 (10x). (a) O(n): 10 seconds. (b) O(n log n): approximately 13 seconds (10 × log₂10,000 / log₂1,000 = 10 × 13.3/10.0 ≈ 13.3). (c) O(n²): 100 seconds. (d) O(2ⁿ): approximately 2^9000 seconds — effectively infinite. The exponential case is why O(2ⁿ) algorithms are impractical.
B.3. Rosa's PennyWise has 5,000 expenses. She performs 100 searches per session. Compare the total time for: (a) Linear search each time, (b) Sort once with quicksort then binary search each time.
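The operation counts behind this comparison can be tabulated directly (a Python sketch; "operations" here means comparisons, with log₂ 5000 ≈ 12.3 per binary search):

```python
import math

n, searches = 5000, 100
log_n = math.log2(n)                         # ≈ 12.3

linear_total = searches * n                  # scan the unsorted list each time
sorted_total = n * log_n + searches * log_n  # one sort, then binary searches

assert linear_total == 500_000
assert sorted_total < 63_000
assert linear_total / sorted_total > 7       # roughly an 8x win for 100 searches

# Breakeven: each search saves (n - log n) comparisons; the sort costs n log n.
breakeven = (n * log_n) / (n - log_n)
assert 12 < breakeven < 13
```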
Guidance
(a) 100 × 5000 = 500,000 operations. (b) About 5000 × 12 ≈ 60,000 operations for the sort plus 100 × 12 = 1,200 for the searches, roughly 61,200 in total. Sorting first is about 8x faster for 100 searches. The breakeven point is approximately 12 searches: one sort costs n log n ≈ 60,000 operations, and each binary search saves about n - log n ≈ 4988 operations.
B.4. Give the time complexity of each operation on these data structures:
| Operation | Unsorted Array | Sorted Array | Linked List | BST (balanced) | Hash Table |
|---|---|---|---|---|---|
| Search | ? | ? | ? | ? | ? |
| Insert | ? | ? | ? | ? | ? |
| Delete | ? | ? | ? | ? | ? |
Guidance
Search: O(n), O(log n), O(n), O(log n), O(1) avg. Insert: O(1), O(n), O(1), O(log n), O(1) avg. Delete: O(n), O(n), O(1) if at pointer, O(log n), O(1) avg. Note: sorted array insert/delete is O(n) because elements must be shifted.
Part C: Code Exercises
C.1. Write a program that empirically verifies that selection sort is O(n²) by timing it on arrays of size 1000, 2000, 4000, 8000, and 16000. Show that doubling n approximately quadruples the time.
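One possible shape for the harness (a Python sketch; `time.perf_counter` stands in for Pascal's `GetTickCount64`):

```python
import random
import time

def selection_sort(a):
    # Select the minimum of the unsorted suffix each pass: O(n^2) comparisons.
    for i in range(len(a)):
        m = min(range(i, len(a)), key=a.__getitem__)
        a[i], a[m] = a[m], a[i]

def time_sort(n):
    a = [random.random() for _ in range(n)]
    start = time.perf_counter()
    selection_sort(a)
    return time.perf_counter() - start

# Doubling n should roughly quadruple the time for an O(n^2) sort.
t1, t2 = time_sort(1000), time_sort(2000)
print(f"ratio for doubled n: {t2 / t1:.1f}")
```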
Hint
Use GetTickCount64 for timing. Run the sort on random arrays. Compute the ratio of times for consecutive sizes. Expected ratio: approximately 4.0 for O(n²).
C.2. Add comparison counters to binary search and linear search. For a sorted array of 1,000,000 elements, how many comparisons does each make for a search that fails (element not present)?
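Instrumented versions might look like this (a Python sketch; each loop iteration is counted as one comparison, i.e. one probe of the array):

```python
def linear_search(a, x):
    comps = 0
    for v in a:
        comps += 1
        if v == x:
            return True, comps
    return False, comps

def binary_search(a, x):
    lo, hi, comps = 0, len(a) - 1, 0
    while lo <= hi:
        comps += 1                    # one probe of a[mid] per iteration
        mid = (lo + hi) // 2
        if a[mid] == x:
            return True, comps
        if a[mid] < x:
            lo = mid + 1
        else:
            hi = mid - 1
    return False, comps

a = list(range(1_000_000))
found, lc = linear_search(a, -1)      # failed search: scans everything
_, bc = binary_search(a, -1)          # failed search: ~log2(n) probes
assert not found and lc == 1_000_000 and bc <= 20
```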
Hint
Linear: 1,000,000 comparisons (checks every element). Binary: ceil(log₂(1,000,000)) = 20 comparisons. This is a 50,000x difference.
C.3. Write a function that determines whether a given array contains any duplicate values. Implement three versions: (a) O(n²) using nested loops, (b) O(n log n) by sorting first, (c) O(n) using a hash set (use a Boolean array if values are bounded).
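All three versions, sketched in Python (a hash set stands in for the Boolean array suggested in the hint, which drops the bounded-values restriction):

```python
def has_dup_quadratic(a):             # (a) O(n^2): compare every pair
    return any(a[i] == a[j]
               for i in range(len(a))
               for j in range(i + 1, len(a)))

def has_dup_sorting(a):               # (b) O(n log n): sort, scan neighbours
    b = sorted(a)
    return any(x == y for x, y in zip(b, b[1:]))

def has_dup_hashing(a):               # (c) O(n): remember what we've seen
    seen = set()
    for x in a:
        if x in seen:
            return True
        seen.add(x)
    return False

for f in (has_dup_quadratic, has_dup_sorting, has_dup_hashing):
    assert f([3, 1, 4, 1, 5]) and not f([3, 1, 4, 5])
```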
Hint
(a) Compare every pair: for I, for J>I, if Arr[I]=Arr[J]. (b) Sort, then scan for adjacent duplicates. (c) Use a Boolean array Seen[0..MaxVal]; mark seen values and check.
C.4. Profile the PennyWise expense operations from the project checkpoint. Measure linear search, binary search, quicksort, and insertion sort on arrays of 100, 1000, 5000, and 10000 expenses. Present results in a table.
Hint
Generate synthetic expense data. Run each operation in a loop (e.g., 1000 iterations) for more accurate timing. Compute the ratio between consecutive sizes to verify theoretical complexity.
Part D: Challenge Exercises
D.1. Two-Sum Problem. Given an array and a target value, find two elements that sum to the target. Implement three solutions: O(n²) brute force, O(n log n) with sorting + two pointers, and O(n) with a hash set. Time all three.
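Sketches of the three approaches in Python (each returns one qualifying pair, or None if no pair sums to the target):

```python
def two_sum_brute(a, target):         # O(n^2): try all pairs
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            if a[i] + a[j] == target:
                return a[i], a[j]
    return None

def two_sum_pointers(a, target):      # O(n log n): sort + two pointers
    b = sorted(a)
    lo, hi = 0, len(b) - 1
    while lo < hi:
        s = b[lo] + b[hi]
        if s == target:
            return b[lo], b[hi]
        if s < target:
            lo += 1                   # sum too small: advance the low end
        else:
            hi -= 1                   # sum too large: retreat the high end
    return None

def two_sum_hash(a, target):          # O(n): hash set of values seen so far
    seen = set()
    for x in a:
        if target - x in seen:
            return target - x, x
        seen.add(x)
    return None

for f in (two_sum_brute, two_sum_pointers, two_sum_hash):
    assert sum(f([8, 2, 11, 7, 15], 9)) == 9
    assert f([1, 2, 3], 100) is None
```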
Hint
Brute force: check all pairs. Sort + two pointers: sort, then use one pointer at start and one at end, moving them inward. Hash set: for each element, check if (target - element) is in the set.
D.2. Complexity of Tree Operations. Empirically measure BST operations (insert, search) for balanced and degenerate trees. Insert 10,000 values in random order (balanced) vs. sorted order (degenerate). Plot the results.
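A measurement sketch in Python (iterative insert and height, since the degenerate tree is far deeper than a default recursion limit; 2,000 keys keep the demo fast, and the same contrast holds at 10,000):

```python
import random

class Node:
    __slots__ = ("key", "left", "right")
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    # Iterative BST insert: walk down until an empty child slot is found.
    if root is None:
        return Node(key)
    cur = root
    while True:
        side = "left" if key < cur.key else "right"
        nxt = getattr(cur, side)
        if nxt is None:
            setattr(cur, side, Node(key))
            return root
        cur = nxt

def height(root):
    # Iterative height (in edges) via an explicit stack.
    best, stack = 0, [(root, 0)]
    while stack:
        node, d = stack.pop()
        if node is not None:
            best = max(best, d)
            stack.append((node.left, d + 1))
            stack.append((node.right, d + 1))
    return best

keys = list(range(2000))
random.seed(1)
shuffled = random.sample(keys, len(keys))

balanced = degenerate = None
for k in shuffled:
    balanced = insert(balanced, k)
for k in keys:
    degenerate = insert(degenerate, k)

assert height(degenerate) == 1999     # a pure chain: one node per level
assert height(balanced) < 60          # random order stays O(log n) deep
```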
Hint
For random insertion the tree stays shallow: its expected height is O(log n), typically around 30 for 10,000 random keys (the perfectly balanced ideal would be log₂ 10,000 ≈ 14), and an average search takes roughly 2 ln n ≈ 18 comparisons. For sorted insertion the tree degenerates into a chain of height 9999, and an average search takes ~5000 comparisons. The difference is dramatic.
D.3. Space vs. Time. Implement memoized Fibonacci with different cache sizes: O(n), O(sqrt(n)), and O(1). How does reducing cache size affect time complexity?
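The two endpoints of the space spectrum, sketched in Python (the O(√n) checkpoint variant is left as the actual exercise; `functools.lru_cache` plays the role of the O(n) memo table):

```python
from functools import lru_cache

@lru_cache(maxsize=None)              # O(n) cache entries, O(n) time
def fib_memo(n):
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

def fib_iter(n):                      # O(1) space: just the last two values
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

assert fib_iter(10) == 55
assert fib_memo(300) == fib_iter(300)
```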
Hint
O(n) cache: standard memoization, O(n) time. O(sqrt(n)) cache: checkpoint every sqrt(n) values, recompute between checkpoints, O(n * sqrt(n)) time. O(1) "cache" (two variables): O(n) time — the iterative solution. The O(1) space solution is actually the fastest because it avoids function call and cache lookup overhead.
D.4. Empirical Growth Analysis Tool. Write a general-purpose timing framework that takes a procedure and an array of input sizes, runs the procedure on each size, and automatically determines whether the growth is O(n), O(n log n), O(n²), etc., by computing growth ratios.
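The classification step can be sketched as picking the known ratio closest to the measured average (Python; fed synthetic "timings" here so the classifier itself can be tested without noisy measurements):

```python
import math

EXPECTED = {"O(n)": 2.0, "O(n log n)": 2.2, "O(n^2)": 4.0, "O(n^3)": 8.0}

def classify(times):
    """Guess the growth order from timings taken at doubling input sizes."""
    ratios = [b / a for a, b in zip(times, times[1:])]
    avg = sum(ratios) / len(ratios)
    # Pick the class whose expected doubling ratio is nearest the average.
    return min(EXPECTED, key=lambda name: abs(EXPECTED[name] - avg))

sizes = [1000, 2000, 4000, 8000]
assert classify([n * n for n in sizes]) == "O(n^2)"
assert classify([n * math.log2(n) for n in sizes]) == "O(n log n)"
assert classify([float(n) for n in sizes]) == "O(n)"
```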
Hint
Growth ratio = time(2n) / time(n). For O(n): ratio ≈ 2. For O(n log n): ratio ≈ 2.2. For O(n²): ratio ≈ 4. For O(n³): ratio ≈ 8. Classify based on the average ratio across size doublings.
Part M: Interleaving Review (Spaced Practice)
M.1. (From Ch 22) What is the space complexity of recursive factorial? How does it compare to iterative factorial?
Guidance
Recursive factorial: O(n) space (n stack frames). Iterative factorial: O(1) space (one loop variable and one accumulator). Both have O(n) time. This is a case where iteration is strictly better — same time, less space.
M.2. (From Ch 23) Merge sort is O(n log n) time and O(n) space. Quicksort is O(n log n) average time and O(log n) space. When would you prefer merge sort despite its higher space usage?
Guidance
When you need guaranteed O(n log n) worst case (quicksort can degrade to O(n²)). When you need a stable sort. When sorting linked lists (merge on linked lists is O(1) extra space). When predictability is more important than raw speed.
M.3. (From Ch 24) Dijkstra's algorithm is O(V²) with a simple implementation. How does using a priority queue (min-heap) improve this to O((V+E) log V)?
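A compact heap-based version, sketched in Python (heapq with "lazy deletion" in place of a true decrease-key, which keeps each heap operation O(log V); the tiny graph is made up for the demo):

```python
import heapq

def dijkstra(adj, src):
    """Min-heap Dijkstra: O((V + E) log V). adj[u] = list of (v, weight)."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)          # O(log V) extraction
        if d > dist.get(u, float("inf")):
            continue                        # stale entry: lazy deletion
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))   # O(log V) "decrease-key"
    return dist

adj = {0: [(1, 4), (2, 1)], 2: [(1, 2)], 1: [(3, 1)]}
assert dijkstra(adj, 0) == {0: 0, 1: 3, 2: 1, 3: 4}
```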
Guidance
The simple implementation scans all V vertices to find the minimum distance: O(V) per extraction, V extractions = O(V²). A min-heap extracts the minimum in O(log V) and updates distances in O(log V). With V extractions and E relaxations, total is O((V+E) log V). For sparse graphs (E much less than V²), this is a significant improvement.
M.4. (From Ch 25) The 0/1 knapsack DP is O(n × W). Is this polynomial time? What is "pseudo-polynomial"?