Chapter 28 Quiz: Performance Optimization
Test your understanding of performance optimization concepts, tools, and strategies. Each question has one best answer unless otherwise indicated.
Question 1
What is the first step you should take when you suspect an application has a performance problem?
- A) Rewrite the slowest-looking function in a faster language
- B) Add caching to all database queries
- C) Measure and profile to identify the actual bottleneck
- D) Increase server resources (more CPU, more RAM)
Answer
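For illustration, a minimal measure-first sketch using the stdlib `cProfile` and `pstats` modules; the workload function `slow_sum` is invented for the example:

```python
import cProfile
import io
import pstats

def slow_sum(n: int) -> int:
    # Hypothetical workload: deliberately naive summation.
    total = 0
    for i in range(n):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(100_000)
profiler.disable()

# Print the top entries sorted by cumulative time; the report tells you
# where time is actually spent before you change any code.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
```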
**C) Measure and profile to identify the actual bottleneck.** The measure-first principle is the foundation of performance optimization. Intuition about where bottlenecks lie is unreliable. Without profiling, you risk spending effort optimizing code that is not actually the bottleneck, while the real problem remains unaddressed. Tools like `cProfile`, `line_profiler`, and `py-spy` reveal where time is actually spent.

Question 2
According to Amdahl's Law, if database queries consume 70% of your application's total execution time and you optimize those queries to run twice as fast, what is the overall speedup?
- A) 2x
- B) 1.7x
- C) 1.54x
- D) 1.43x
Answer
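The arithmetic can be checked with a one-line helper (a sketch, not code from the chapter):

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of runtime is made s times faster."""
    return 1.0 / ((1.0 - p) + p / s)

speedup = amdahl_speedup(0.7, 2.0)  # 1 / (0.3 + 0.35) ≈ 1.54
```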
**C) 1.54x.** Amdahl's Law: Speedup = 1 / ((1 - p) + p/s) where p = 0.7 (fraction that can be improved) and s = 2 (speedup factor). Speedup = 1 / (0.3 + 0.35) = 1 / 0.65 ≈ 1.54. Even though 70% of the time was optimized, the 30% that was not optimized limits the overall improvement.

Question 3
Which Python profiling tool can attach to a running production process without modifying or restarting it?
- A) cProfile
- B) line_profiler
- C) memory_profiler
- D) py-spy
Answer
**D) py-spy.** py-spy is a sampling profiler that can attach to a running Python process using its process ID (`py-spy top --pid 12345`). It requires no code changes, no restarts, and has minimal overhead, making it ideal for production profiling. `cProfile`, `line_profiler`, and `memory_profiler` all require code modifications or must be started with the program.

Question 4
What is the time complexity of checking whether an item exists in a Python set?
- A) O(n)
- B) O(log n)
- C) O(1) average case
- D) O(n log n)
Answer
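A quick way to see the difference with the stdlib `timeit` module; absolute timings vary by machine, but the ordering is stable:

```python
import timeit

items = list(range(100_000))
as_set = set(items)
target = 99_999  # worst case for the linear list scan

# 100 membership checks each: O(n) per check for the list,
# O(1) average per check for the set.
list_time = timeit.timeit(lambda: target in items, number=100)
set_time = timeit.timeit(lambda: target in as_set, number=100)
```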
**C) O(1) average case.** Python sets are implemented as hash tables, providing average-case O(1) time for membership testing, insertion, and deletion. This is why converting a list to a set before performing many membership checks is a common and effective optimization. The worst case is O(n) due to hash collisions, but this is rare with a good hash function.

Question 5
What is the N+1 query problem?
- A) A database query that returns N+1 rows when you expected N
- B) Issuing 1 query to fetch a collection, then N additional queries for each item's related data
- C) A query that joins N+1 tables together
- D) A bug where a query runs N+1 times due to a retry loop
Answer
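The pattern and its fix can be sketched with the stdlib `sqlite3` module and an invented books/authors schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO books VALUES (1, 'B1', 1), (2, 'B2', 2), (3, 'B3', 1);
""")

def fetch_naive() -> list[tuple[str, str]]:
    # N+1 pattern: one query for books, then one query per book for its author.
    books = conn.execute("SELECT title, author_id FROM books").fetchall()
    return [
        (title, conn.execute("SELECT name FROM authors WHERE id = ?", (aid,)).fetchone()[0])
        for title, aid in books
    ]  # 1 + N queries total

def fetch_joined() -> list[tuple[str, str]]:
    # Fix: a single JOIN fetches books and their authors together.
    return conn.execute(
        "SELECT b.title, a.name FROM books b JOIN authors a ON a.id = b.author_id"
    ).fetchall()
```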
**B) Issuing 1 query to fetch a collection, then N additional queries for each item's related data.** The N+1 problem occurs when code fetches a list of N records with one query, then issues a separate query for each record to fetch related data (e.g., the author for each book). This results in N+1 total queries instead of 1-2 queries using JOINs or eager loading. It is the single most common database performance issue in web applications.

Question 6
Which caching approach is most appropriate for data that changes frequently but can tolerate up to 30 seconds of staleness?
- A) `functools.lru_cache` with no expiration
- B) Redis cache with a 30-second TTL
- C) HTTP caching with `Cache-Control: immutable`
- D) Write-through cache that updates on every database write
Answer
**B) Redis cache with a 30-second TTL.** A TTL of 30 seconds ensures that data is never more than 30 seconds stale, matching the staleness tolerance. `lru_cache` has no TTL mechanism and would serve stale data indefinitely. `Cache-Control: immutable` tells browsers to never revalidate, which is wrong for changing data. A write-through cache is more complex than needed when 30-second staleness is acceptable.

Question 7
Why do Python threads NOT provide speedup for CPU-bound tasks?
- A) Python threads are not real OS threads
- B) The Global Interpreter Lock (GIL) prevents parallel execution of Python bytecode
- C) Python threads have too much memory overhead
- D) The `threading` module is deprecated in favor of `asyncio`
Answer
**B) The Global Interpreter Lock (GIL) prevents parallel execution of Python bytecode.** The GIL is a mutex that allows only one thread to execute Python bytecode at a time, even on multi-core machines. For CPU-bound work, threads cannot run in parallel and may even be slower than sequential code due to thread-switching overhead. For I/O-bound work, threads provide speedup because the GIL is released while waiting for I/O. For CPU-bound parallelism, use `multiprocessing`.

Question 8
Which concurrency model is most appropriate for making 1,000 HTTP API requests as quickly as possible?
- A) `multiprocessing.Pool`
- B) Sequential `requests.get()` calls
- C) `asyncio` with `aiohttp`
- D) `threading` with one thread per request
Answer
**C) `asyncio` with `aiohttp`.** HTTP requests are I/O-bound; the program is mostly waiting for network responses. asyncio efficiently handles thousands of concurrent I/O operations with minimal overhead since it uses a single thread with cooperative multitasking. Threading would work but creates 1,000 OS threads with significant memory overhead. Multiprocessing is for CPU-bound work and has high per-process overhead. Sequential execution would be slowest.

Question 9
What does `EXPLAIN ANALYZE` show you in PostgreSQL?
- A) The table schema and column types
- B) The query execution plan with actual timing information
- C) A list of all indexes on the queried tables
- D) The query cache hit rate
Answer
**B) The query execution plan with actual timing information.** `EXPLAIN ANALYZE` actually runs the query and shows the execution plan the database chose, including actual row counts, actual execution times, and which indexes (if any) were used. This is essential for understanding why a query is slow. `EXPLAIN` alone shows the planned execution without running the query; `ANALYZE` adds real measurements.

Question 10
What is the primary benefit of using Python generators for processing large datasets?
- A) Generators run faster than list comprehensions
- B) Generators use constant memory regardless of data size
- C) Generators automatically parallelize computation
- D) Generators provide type safety
Answer
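The constant-memory claim can be seen with `sys.getsizeof` (a sketch; note that `getsizeof` measures only the container object itself, which is exactly the point here):

```python
import sys

def squares_gen(n: int):
    # Lazily yields one square at a time; nothing is materialized up front.
    for i in range(n):
        yield i * i

big_list = [i * i for i in range(100_000)]  # materializes 100,000 ints
gen = squares_gen(100_000)                  # a tiny generator object

list_bytes = sys.getsizeof(big_list)  # size of the list's pointer array
gen_bytes = sys.getsizeof(gen)        # a few hundred bytes, regardless of n
total = sum(squares_gen(100_000))     # still visits every item, one at a time
```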
**B) Generators use constant memory regardless of data size.** Generators yield one item at a time using lazy evaluation, so they hold only the current item in memory rather than the entire dataset. This means a generator processing 10 billion records uses the same memory as one processing 10 records. Generators are not inherently faster per-item (they may be slightly slower due to the yield overhead), but they prevent out-of-memory errors for large datasets.

Question 11
Which of the following is the best reason to add a database index?
- A) The table has more than 100 rows
- B) A column is frequently used in WHERE clauses and the query is slow
- C) You want to enforce unique values in a column
- D) You want to speed up INSERT operations
Answer
**B) A column is frequently used in WHERE clauses and the query is slow.** Indexes speed up reads (SELECT with WHERE, JOIN, ORDER BY) at the cost of slightly slower writes (INSERT, UPDATE, DELETE). The best reason to add an index is when profiling reveals that queries filtering on a column are slow due to sequential scans. While unique indexes (option C) do enforce uniqueness, the *performance* reason is query speed. Indexes actually slow down INSERTs, and a table having more than 100 rows alone does not justify an index.

Question 12
What is a "cache stampede"?
- A) When the cache grows too large and uses all available memory
- B) When many requests simultaneously try to regenerate the same expired cache entry
- C) When cache entries are evicted faster than they are created
- D) When stale cached data is served to users after the source data changes
Answer
**B) When many requests simultaneously try to regenerate the same expired cache entry.** A cache stampede (also called "thundering herd") occurs when a popular cache entry expires and many concurrent requests all find the cache empty, all triggering the expensive computation or database query simultaneously. This can overwhelm the database or backend service. Prevention strategies include locking (only one request regenerates), staggered TTLs, and preemptive refresh before expiration.

Question 13
What does `__slots__` do in a Python class?
- A) Limits which methods can be called on instances
- B) Restricts instance attribute storage to a fixed set, eliminating the per-instance `__dict__`
- C) Makes the class thread-safe by adding locks to attribute access
- D) Converts the class to use C-level data structures for speed
Answer
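A minimal sketch showing both the effect and the trade-off:

```python
class Point:
    __slots__ = ("x", "y")  # fixed attribute set; no per-instance __dict__

    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y

p = Point(1.0, 2.0)
has_dict = hasattr(p, "__dict__")  # False: the dict overhead is gone

try:
    p.z = 3.0  # the trade-off: dynamic attributes now fail
    added = True
except AttributeError:
    added = False
```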
**B) Restricts instance attribute storage to a fixed set, eliminating the per-instance `__dict__`.** By default, each Python object has a `__dict__` dictionary to store its attributes, which adds approximately 100+ bytes of overhead per instance. `__slots__` replaces this with a fixed-size structure, significantly reducing memory for classes with many instances. The trade-off is that you cannot dynamically add new attributes to instances. For a million objects, this can save hundreds of megabytes.

Question 14
In the context of load testing, what does "P95 response time" mean?
- A) 95% of the server's processing power is utilized
- B) 95% of requests complete within this time
- C) The response time after 95 seconds of sustained load
- D) The average response time with 5% error margin
Answer
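One way to compute percentiles is the nearest-rank method; this sketch uses invented latency numbers with a deliberate slow tail:

```python
import math

def percentile(values: list[float], pct: float) -> float:
    # Nearest-rank method: the smallest value with at least pct% of
    # the samples at or below it.
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Invented latencies (ms): 94 fast requests and a slow tail of 6.
latencies = [50.0] * 94 + [900.0] * 6
p50 = percentile(latencies, 50)  # the median stays fast
p95 = percentile(latencies, 95)  # the slow tail dominates P95
```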
**B) 95% of requests complete within this time.** P95 (95th percentile) means that 95% of requests are faster than this value, and 5% are slower. It is more meaningful than the average because averages can hide a long tail of slow requests. For example, a P95 of 500ms means that while most users get fast responses, 1 in 20 users experiences latency of 500ms or more.

Question 15
Which approach correctly solves the N+1 problem in SQLAlchemy?
- A) Adding `lazy="select"` to the relationship definition
- B) Using `session.query(Order).options(joinedload(Order.customer)).all()`
- C) Calling `session.flush()` before accessing related objects
- D) Using `session.query(Order).limit(100).all()` with pagination
Answer
**B) Using `session.query(Order).options(joinedload(Order.customer)).all()`.** `joinedload` tells SQLAlchemy to fetch related objects using a JOIN in the same query, eliminating the need for separate queries per record. `lazy="select"` (option A) is actually the default lazy-loading behavior that causes N+1 problems. Flushing (C) and pagination (D) do not address N+1.

Question 16
When is multiprocessing the correct choice over threading in Python?
- A) When you need to share mutable state between concurrent tasks
- B) When the work is CPU-bound and you need true parallelism
- C) When you need to run more than 100 concurrent tasks
- D) When the work is I/O-bound and you want to avoid the GIL
Answer
**B) When the work is CPU-bound and you need true parallelism.** `multiprocessing` creates separate Python processes, each with its own GIL, enabling true parallel execution on multiple CPU cores. This is the correct choice for CPU-bound work like number crunching, image processing, or data transformation. Threading does not provide speedup for CPU-bound work due to the GIL. Multiprocessing has higher overhead than threading, making it less ideal for I/O-bound work.

Question 17
What is the recommended approach when `functools.lru_cache` is used on a function that takes a list argument?
- A) Convert the list to a tuple before passing it to the cached function
- B) Increase the `maxsize` parameter to accommodate lists
- C) Use `@lru_cache(typed=True)` to handle different types
- D) Lists work fine with `lru_cache`; no changes are needed
Answer
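A sketch of both the failure and the wrapper fix (the function names are invented for the example):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def total(values: tuple[float, ...]) -> float:
    # Tuples are immutable and hashable, so they can serve as cache keys.
    return sum(values)

def total_of_list(values: list[float]) -> float:
    # Wrapper: convert the unhashable list to a tuple before caching.
    return total(tuple(values))

first = total_of_list([1.0, 2.0, 3.0])   # cache miss
second = total_of_list([1.0, 2.0, 3.0])  # cache hit: same tuple key

try:
    total([1.0, 2.0, 3.0])  # passing the list directly fails
    raised = False
except TypeError:
    raised = True
```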
**A) Convert the list to a tuple before passing it to the cached function.** `lru_cache` requires all arguments to be hashable so they can be used as dictionary keys. Lists are not hashable (they are mutable), so passing a list directly will raise a `TypeError`. Converting to a tuple (which is immutable and hashable) solves this. Alternatively, you can create a wrapper function that converts list arguments to tuples.

Question 18
You have a function that makes 50 sequential HTTP API calls, each taking about 200ms. The total execution time is approximately 10 seconds. Using `asyncio.gather()` with an async HTTP client, what total time would you expect?
- A) About 200ms (same as one call)
- B) About 400-600ms (slightly more than one call due to overhead)
- C) About 5 seconds (half the time)
- D) About 10 seconds (no improvement)
Answer
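The effect can be demonstrated without a real network by simulating each 200ms request with `asyncio.sleep` (a sketch; real HTTP calls add connection and DNS overhead, and assume the server tolerates 50 concurrent requests):

```python
import asyncio
import time

async def fake_request(i: int) -> int:
    # Simulate a 200 ms network wait; no CPU is consumed while waiting.
    await asyncio.sleep(0.2)
    return i

async def main() -> list[int]:
    # All 50 "requests" wait concurrently instead of back-to-back.
    return await asyncio.gather(*(fake_request(i) for i in range(50)))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
# elapsed is roughly 0.2 s, not 50 * 0.2 = 10 s
```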
**B) About 400-600ms (slightly more than one call due to overhead).** With `asyncio.gather()`, all 50 requests are initiated concurrently. Since the bottleneck is network I/O (waiting for responses), all requests can wait simultaneously. The total time is approximately the time for the slowest single request plus some overhead for DNS resolution, connection establishment, and event loop scheduling. Assuming the server can handle 50 concurrent requests, the total time would be roughly 200-600ms rather than 50 * 200ms = 10 seconds.

Question 19
Which statement about database connection pooling is correct?
- A) Connection pooling reduces the number of database queries executed
- B) Connection pooling eliminates the overhead of establishing new database connections for each request
- C) Connection pooling automatically caches query results
- D) Connection pooling distributes queries across multiple database servers
Answer
**B) Connection pooling eliminates the overhead of establishing new database connections for each request.** Establishing a database connection involves TCP handshake, authentication, and session setup, which can take 10-100ms. Connection pooling maintains a set of pre-established connections that are reused across requests. This eliminates per-request connection overhead. Pooling does not reduce the number of queries, cache results, or distribute queries.

Question 20
Given this code, what is the most impactful optimization?
```python
def find_expensive_orders(orders: list[dict], threshold: float) -> list[str]:
    result = ""
    for order in orders:  # 100,000 orders
        if order["total"] > threshold:
            result = result + order["id"] + ","
    return result.split(",")
```
- A) Use a faster comparison operator
- B) Use a list and `",".join()` instead of string concatenation
- C) Use `filter()` instead of a for loop
- D) Pre-sort the orders by total
Answer
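A sketch of option B applied to the quiz's function (the sample orders are invented); since the caller ultimately wants a list of IDs, the CSV build-then-split detour can be dropped entirely, and if a CSV string is ever needed it is produced with one O(n) join:

```python
def find_expensive_orders(orders: list[dict], threshold: float) -> list[str]:
    # Collect matches in a list: O(n) instead of O(n^2) string concatenation.
    return [order["id"] for order in orders if order["total"] > threshold]

orders = [{"id": f"order-{i}", "total": 10.0 * i} for i in range(5)]
result = find_expensive_orders(orders, 25.0)
csv_line = ",".join(result)  # a single O(n) join, only if a string is needed
```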
**B) Use a list and `",".join()` instead of string concatenation.** String concatenation with `+` in a loop is O(n^2) because strings are immutable; each concatenation creates a new string, copying all previous content. For 100,000 orders with many matches, this becomes extremely slow. Using a list to collect matches and `",".join()` at the end is O(n). This is the classic Python string-building anti-pattern.

Question 21
What does the `maxsize` parameter in `@lru_cache(maxsize=128)` control?
- A) The maximum number of times the function can be called
- B) The maximum number of cached results stored simultaneously
- C) The maximum size of each cached result in bytes
- D) The maximum time in seconds that a result is cached
Answer
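Eviction can be observed directly with a tiny cache and `cache_info()` (a sketch; `square` is invented for the example):

```python
from functools import lru_cache

@lru_cache(maxsize=2)  # at most 2 argument/result pairs are kept
def square(n: int) -> int:
    return n * n

square(1); square(2)  # two misses; cache now holds 1 and 2
square(3)             # miss; inserting 3 evicts 1 (least recently used)
square(1)             # miss again: 1 was evicted (and re-adding it evicts 2)
square(3)             # hit: 3 is still cached
info = square.cache_info()  # hits=1, misses=4, maxsize=2, currsize=2
```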
**B) The maximum number of cached results stored simultaneously.** `maxsize` controls how many distinct argument/result pairs are cached. When the cache is full and a new result needs to be stored, the least-recently-used entry is evicted. Setting `maxsize=None` creates an unbounded cache that never evicts (useful when the set of possible arguments is finite and known). There is no time-based expiration in `lru_cache`.

Question 22
A flame graph shows a wide bar for the function `serialize_response()`. What does this indicate?
- A) The function is called many times
- B) The function (and its children) consumes a large fraction of total execution time
- C) The function uses a lot of memory
- D) The function has many parameters
Answer
**B) The function (and its children) consumes a large fraction of total execution time.** In a flame graph, the width of a bar is proportional to the time spent in that function and all functions it calls. A wide bar means that call path dominates execution. The vertical axis shows the call hierarchy: in a classic flame graph, callees are stacked above their callers (some tools render the inverted "icicle" layout instead). Flame graphs make it visually intuitive to spot which call paths consume the most time.

Question 23
Which optimization should you try FIRST when a database query is slow?
- A) Cache the result in Redis
- B) Run `EXPLAIN ANALYZE` to understand the query plan
- C) Rewrite the query using a different SQL dialect
- D) Move the query logic into application code
Answer
**B) Run `EXPLAIN ANALYZE` to understand the query plan.** Before applying any optimization, you need to understand *why* the query is slow. `EXPLAIN ANALYZE` reveals whether the database is doing sequential scans (suggesting a missing index), inefficient joins, or unnecessary sorts. This information guides whether you need an index, a query rewrite, or caching. Applying caching without understanding the root cause is often premature.

Question 24
What is the key difference between `asyncio.gather()` and `asyncio.create_task()`?
- A) `gather()` runs tasks sequentially; `create_task()` runs them concurrently
- B) `gather()` waits for all tasks to complete; `create_task()` schedules a task for execution without waiting
- C) `gather()` works with threads; `create_task()` works with coroutines
- D) There is no difference; they are aliases
Answer
**B) `gather()` waits for all tasks to complete and returns their results; `create_task()` schedules a task for execution without waiting.** `asyncio.create_task(coro)` schedules a coroutine to run on the event loop and returns a `Task` object immediately. You can `await` the task later or let it run in the background. `asyncio.gather(*coros)` is a convenience that schedules multiple coroutines concurrently and awaits all of them, returning results in the same order as the input. Both run tasks concurrently, but `gather()` provides a convenient "wait for all" semantic.

Question 25
A performance budget specifies that your API's P99 response time must be under 2 seconds. After load testing, you find P50 = 120ms, P95 = 450ms, P99 = 1.8s. What should you do?
- A) The P99 is under 2 seconds, so no action is needed — continue monitoring
- B) Optimize immediately because P99 is close to the budget
- C) Focus on reducing the median response time
- D) Increase the budget to 3 seconds since 1.8s is close to the limit
Answer
**A) The P99 is under 2 seconds, so no action is needed — continue monitoring.** The P99 of 1.8 seconds is within the 2-second budget. While it is close to the limit, the correct response is to continue monitoring rather than optimizing code that meets its targets. However, the gap between P95 (450ms) and P99 (1.8s) suggests a significant tail latency issue that is worth understanding — investigate what causes the requests in that slow tail in case it indicates an emerging problem. Proactive monitoring is appropriate; premature optimization is not.

Scoring Guide
- 23-25 correct: Excellent mastery of performance optimization concepts. You are ready for production-level performance work.
- 18-22 correct: Strong understanding with some gaps. Review the sections corresponding to missed questions.
- 13-17 correct: Solid foundation but needs more practice. Work through the Tier 1-2 exercises.
- Below 13: Review the chapter thoroughly before attempting the exercises. Focus on the core concepts: measure first, profiling tools, Big-O basics, and caching fundamentals.