Chapter 31 — Exercises
Hands-on with
COPY/\copyagainstmercado. (answer in Appendix) = worked solution in Answers. ⭐ = stretch.
Group A — COPY basics
31.1 Export all products to a CSV with COPY ... TO (or \copy). Open the file and check the header row. (answer in Appendix)
31.2 Bulk-load that CSV into a fresh products_copy table with \copy.
31.3 Explain the difference between COPY ... FROM '/path' and \copy ... FROM 'path'. When does each apply? (answer in Appendix)
Group B — Speed
31.4 Load ~10,000 rows two ways — an INSERT loop and a single COPY — and compare the time. (answer in Appendix)
31.5 Rank these slowest→fastest and explain: single-row inserts, multi-row inserts, batched transactions, COPY.
31.6 ⭐ From Python (psycopg), load rows via the driver's copy API instead of an execute() loop. Why is it faster?
Group C — Staging & transform
31.7 Create a loose staging_products table (text columns), COPY raw CSV into it, then INSERT ... SELECT into the real products (cast types, resolve category name→id, skip bad rows, upsert). (answer in Appendix)
31.8 Why load into a loose staging table first instead of straight into the constrained table?
31.9 ⭐ Why might you make the staging table UNLOGGED, and what's the trade-off?
Group D — Pipelines
31.10 Explain ETL vs ELT and when you'd choose each. (answer in Appendix)
31.11 What makes a load idempotent, and why does that matter for retries? (Tie to ON CONFLICT, Ch. 13.)
31.12 ⭐ Sketch an incremental load that only processes new rows since the last run (high-water mark).
Group E — Big-load performance
31.13 Why might you drop indexes before a huge initial load and rebuild after? When is this NOT safe? (answer in Appendix)
31.14 Why run ANALYZE after a big load? (Tie to Chapter 24.)
Group F — Progressive project
31.15 Bulk-load a dataset into your project with COPY/\copy; compare to an INSERT loop.
31.16 Use a staging table to load + transform + upsert; make it idempotent and re-run it.
31.17 ⭐ Describe how you'd turn this into an incremental, scheduled pipeline.
Self-check. If you reach for
COPY(not loops) for bulk, stage-then-transform raw data, and build idempotent loads, you can move data at scale. Next: securing it.