Chapter 23 — Exercises
Load the large dataset first:
psql -d mercado -f sql/generate_data.sql(so indexes actually matter). UseEXPLAIN ANALYZEto see scans. (answer in Appendix) = worked solution in Answers. ⭐ = stretch.
Group A — The anchor (do this!)
23.1 On the large dataset, run EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 4267;. Note the scan type and time. (answer in Appendix)
23.2 Confirm there's an index on orders(customer_id) (Mercado pre-builds it). Drop it (DROP INDEX ...), re-run 23.1, and observe the Seq Scan and slower time. Recreate it. Describe the before/after.
23.3 ⭐ Find a column that is not indexed (e.g., orders.status or a join column you can drop), demonstrate a Seq Scan, add an index, and measure the improvement. This is your own 45s→12ms moment.
Group B — What an index is
23.4 In your own words, why is an indexed lookup O(log n) while a full scan is O(n)? Use the B-tree height idea. (answer in Appendix)
23.5 Roughly how many levels deep is a B-tree on a 100-million-row table, and why does that make lookups fast?
23.6 Name four query situations where an index can help (WHERE, JOIN, ORDER BY, ...). (answer in Appendix)
Group C — Index types
23.7 Which index type for: JSONB containment (@>); full-text search; a range-overlap query; equality on an integer? (answer in Appendix)
23.8 Create a GIN index on products.attributes and show (with EXPLAIN) that attributes @> '{"color":"black"}' uses it.
23.9 ⭐ Create a multi-column index on orders(customer_id, order_date). Which of these can use it: filter on customer_id? on customer_id + order_date? on order_date alone? Explain (leftmost prefix).
23.10 ⭐ Create a partial index on orders(customer_id) WHERE status = 'pending' and explain when it helps and why it's smaller.
Group D — Expression & covering
23.11 WHERE lower(email) = 'alice@example.com' doesn't use a plain index on email. Create an expression index that fixes it. (answer in Appendix)
23.12 ⭐ What is an index-only scan, and how does INCLUDE enable it? Give an example query that could be served entirely from an index.
Group E — Costs & when not to
23.13 List three costs of adding an index. (answer in Appendix)
23.14 For each, say whether indexing is likely worthwhile: a boolean is_active column; an email you look up constantly; a notes text column never filtered; a foreign key used in joins.
23.15 ⭐ Why might the optimizer choose a Seq Scan even when an index exists (e.g., the query returns most of the table)? Is that a bug?
Group F — Safety
23.16 Why use CREATE INDEX CONCURRENTLY on a production table? What's the trade-off? (answer in Appendix)
Group G — Progressive project
23.17 Profile your app's top queries with EXPLAIN ANALYZE; find one missing index, add it, and measure the gain.
23.18 Index your join-used foreign keys.
23.19 ⭐ Identify one column you should NOT index and justify it.
Self-check. If you can spot a
Seq Scanthat should be an index scan, add the right index, and measure the improvement — and you know when an index would just be write-tax — you have the core performance skill. Next: reading the whole plan.