Chapter 31 — Quiz
12 questions. Answers at the bottom.
Multiple choice
Q1. The fast way to load millions of rows is:
- A) INSERT one row at a time
- B) COPY
- C) UPDATE
- D) SELECT
Q2. COPY ... FROM '/data/file.csv' reads the file:
- A) On the client machine
- B) On the database server
- C) From the internet
- D) From memory
Q3. To load a file on your (client) machine via psql, use:
- A) COPY
- B) \copy
- C) IMPORT
- D) LOAD
Q4. Slowest to fastest for loading bulk data: - A) COPY → batched → multi-row → single-row - B) single-row inserts → multi-row inserts → batched transactions → COPY - C) They're all equal - D) COPY → single-row
Q5. A staging table is used to: - A) Permanently store production data - B) Load raw data fast, then transform/validate it with SQL into real tables - C) Back up the database - D) Index data
Q6. ETL means: - A) Extract, Transform, Load (transform before loading) - B) Extract, Load, Transform - C) Export, Trim, Load - D) Evaluate, Test, Launch
Q7. ELT (load raw, then transform in SQL) is increasingly favored with: - A) Tiny databases - B) Powerful warehouses (transform in the database) - C) Spreadsheets - D) No databases
Q8. An idempotent load: - A) Can only run once - B) Produces the same result if re-run (safe to retry) - C) Is always slow - D) Skips constraints
Q9. UNLOGGED staging tables load faster because they:
- A) Have no columns
- B) Skip the WAL (not crash-safe, fine for re-loadable data)
- C) Are in memory only
- D) Have no indexes ever
Q10. After a big bulk load, you should run:
- A) VACUUM FULL always
- B) ANALYZE (refresh statistics for the planner)
- C) DROP TABLE
- D) Nothing
True/False
Q11. A multi-row INSERT is faster than many single-row INSERTs, but COPY is faster still for bulk. (True / False)
Q12. Loading raw, unvalidated data directly into constrained production tables is the recommended approach. (True / False)
Short answer
Q13. You must import a 5-million-row CSV from your laptop into a constrained products table that needs type conversion and category-name→id lookup. Outline the fast, safe approach.
---
Answer key
Q1 — B. COPY — orders of magnitude faster than insert loops.
Q2 — B. Server-side file (needs file privileges).
Q3 — B. \copy streams a client-side file via psql.
Q4 — B. single → multi-row → batched → COPY.
Q5 — B. Load raw fast, then transform with SQL into real tables.
Q6 — A. Extract, Transform, Load (transform first).
Q7 — B. Powerful warehouses make in-database (ELT) transformation attractive.
Q8 — B. Re-running yields the same result — safe for retries.
Q9 — B. Skip the WAL (faster, not crash-safe) — fine for re-loadable staging data.
Q10 — B. ANALYZE so the optimizer plans well on the new data.
Q11 — True. Multi-row beats single-row; COPY beats both for bulk.
Q12 — False. Stage raw data, validate/transform, then load into constrained tables.
Q13. Use \copy (client file) to bulk-load the 5M rows into a loose staging table (text columns, optionally UNLOGGED) — fast, no constraint checks. Then transform with SQL: INSERT INTO products (...) SELECT ...::numeric, ... JOIN categories ON name → id, WHERE <valid rows> ON CONFLICT (sku) DO UPDATE ... to cast types, resolve the category id, skip/quarantine bad rows, and upsert (idempotent). Finally ANALYZE (and drop the staging table). This separates fast loading from correct transformation and is safe to re-run.
Scoring: 10–12 you can move data at scale; 7–9 review COPY vs \copy and staging; below 7, do Exercises A–C.