Chapter 31 — Quiz

DataField.Dev

Chapter 31 — Quiz

12 questions. Answers at the bottom.

Multiple choice

Q1. The fast way to load millions of rows is: - A) INSERT one row at a time - B) COPY - C) UPDATE - D) SELECT

Q2. COPY ... FROM '/data/file.csv' reads the file: - A) On the client machine - B) On the database server - C) From the internet - D) From memory

Q3. To load a file on your (client) machine via psql, use: - A) COPY - B) \copy - C) IMPORT - D) LOAD

Q4. Slowest to fastest for loading bulk data: - A) COPY → batched → multi-row → single-row - B) single-row inserts → multi-row inserts → batched transactions → COPY - C) They're all equal - D) COPY → single-row

Q5. A staging table is used to: - A) Permanently store production data - B) Load raw data fast, then transform/validate it with SQL into real tables - C) Back up the database - D) Index data

Q6. ETL means: - A) Extract, Transform, Load (transform before loading) - B) Extract, Load, Transform - C) Export, Trim, Load - D) Evaluate, Test, Launch

Q7. ELT (load raw, then transform in SQL) is increasingly favored with: - A) Tiny databases - B) Powerful warehouses (transform in the database) - C) Spreadsheets - D) No databases

Q8. An idempotent load: - A) Can only run once - B) Produces the same result if re-run (safe to retry) - C) Is always slow - D) Skips constraints

Q9. UNLOGGED staging tables load faster because they: - A) Have no columns - B) Skip the WAL (not crash-safe, fine for re-loadable data) - C) Are in memory only - D) Have no indexes ever

Q10. After a big bulk load, you should run: - A) VACUUM FULL always - B) ANALYZE (refresh statistics for the planner) - C) DROP TABLE - D) Nothing

True/False

Q11. A multi-row INSERT is faster than many single-row INSERTs, but COPY is faster still for bulk. (True / False)

Q12. Loading raw, unvalidated data directly into constrained production tables is the recommended approach. (True / False)

Short answer

Q13. You must import a 5-million-row CSV from your laptop into a constrained products table that needs type conversion and category-name→id lookup. Outline the fast, safe approach.

---

Answer key

Q1 — B. COPY — orders of magnitude faster than insert loops.

Q2 — B. Server-side file (needs file privileges).

Q3 — B. \copy streams a client-side file via psql.

Q4 — B. single → multi-row → batched → COPY.

Q5 — B. Load raw fast, then transform with SQL into real tables.

Q6 — A. Extract, Transform, Load (transform first).

Q7 — B. Powerful warehouses make in-database (ELT) transformation attractive.

Q8 — B. Re-running yields the same result — safe for retries.

Q9 — B. Skip the WAL (faster, not crash-safe) — fine for re-loadable staging data.

Q10 — B. ANALYZE so the optimizer plans well on the new data.

Q11 — True. Multi-row beats single-row; COPY beats both for bulk.

Q12 — False. Stage raw data, validate/transform, then load into constrained tables.

Q13. Use \copy (client file) to bulk-load the 5M rows into a loose staging table (text columns, optionally UNLOGGED) — fast, no constraint checks. Then transform with SQL: INSERT INTO products (...) SELECT ...::numeric, ... JOIN categories ON name → id, WHERE <valid rows> ON CONFLICT (sku) DO UPDATE ... to cast types, resolve the category id, skip/quarantine bad rows, and upsert (idempotent). Finally ANALYZE (and drop the staging table). This separates fast loading from correct transformation and is safe to re-run.

Scoring: 10–12 you can move data at scale; 7–9 review COPY vs \copy and staging; below 7, do Exercises A–C.