Chapter 31 — Key Takeaways

The big idea

Row-by-row INSERT is the slow way to move bulk data; COPY is the fast path — orders of magnitude faster. And raw data should be staged and validated before it reaches production tables. Together: load fast, transform safely.

COPY

  • COPY ... FROM file (load) / COPY ... TO file (unload); formats: csv, text, binary.
  • COPY = server-side file (+ privileges); \copy = client-side file via psql. Same speed/options.
  • From code, use the driver's copy support (psycopg3 cursor.copy()), not an execute() loop.
  • Loading hierarchy (slow→fast): single-row inserts → multi-row inserts → batched transactions → COPY. (Case Study 1: 6 hours → 4 minutes.)
  • Per-row commit() is a disaster (a WAL flush per row) — use one/few transactions.

Staging-table pattern (load → transform)

  1. Bulk-COPY raw data into a loose (text, no constraints, optionally UNLOGGED) staging table.
  2. Validate there (report/quarantine bad rows) — before touching production.
  3. Transform + upsert good rows into real tables with SQL (cast types, resolve keys, ON CONFLICT), in one transaction.
  4. Drop/truncate staging.

This isolates raw data from production and separates fast loading from correct transformation. (Case Study 2: it would have stopped a corrupt import.)

ETL vs ELT, and robust pipelines

  • ETL = transform before load; ELT = load raw, transform in SQL (favored with powerful warehouses, Ch. 34). The staging pattern is ELT-in-miniature.
  • Robust pipelines are incremental (load only new/changed data) and idempotent (re-run = same result, via upserts) so retries are safe.

Big-load performance

COPY not loops · drop/rebuild indexes for big initial loads · batch / few transactions · UNLOGGED staging · ANALYZE after (fresh stats, Ch. 24).

Common mistakes

INSERT loops for millions of rows; confusing COPY (server) vs \copy (client); loading raw data into constrained production; non-idempotent pipelines (duplicates on retry); forgetting ANALYZE.

You can now…

  • ☐ Bulk-load/unload with COPY/\copy and from code.
  • ☐ Explain why COPYINSERT loops.
  • ☐ Apply the staging-table pattern (load → validate → transform → upsert).
  • ☐ Distinguish ETL/ELT and build incremental, idempotent loads.

Looking ahead

Chapter 32 — Database Security (closes Part V). SQL injection (revisited), roles & privileges, row-level security, encryption, backups, and data privacy — protecting the data you can now move.

One sentence to carry forward: Use COPY (never insert loops) for bulk, and always stage-validate-transform raw data into production with idempotent upserts — fast loading and safe transformation are two different jobs.