Chapter 31 — Key Takeaways
The big idea
Row-by-row INSERT is the slow way to move bulk data; COPY is the fast path — orders of magnitude faster. And raw data should be staged and validated before it reaches production tables. Together: load fast, transform safely.
COPY
COPY ... FROM file(load) /COPY ... TO file(unload); formats:csv,text,binary.COPY= server-side file (+ privileges);\copy= client-side file via psql. Same speed/options.- From code, use the driver's copy support (psycopg3
cursor.copy()), not anexecute()loop. - Loading hierarchy (slow→fast): single-row inserts → multi-row inserts → batched transactions →
COPY. (Case Study 1: 6 hours → 4 minutes.) - Per-row
commit()is a disaster (a WAL flush per row) — use one/few transactions.
Staging-table pattern (load → transform)
- Bulk-
COPYraw data into a loose (text, no constraints, optionallyUNLOGGED) staging table. - Validate there (report/quarantine bad rows) — before touching production.
- Transform + upsert good rows into real tables with SQL (cast types, resolve keys,
ON CONFLICT), in one transaction. - Drop/truncate staging.
This isolates raw data from production and separates fast loading from correct transformation. (Case Study 2: it would have stopped a corrupt import.)
ETL vs ELT, and robust pipelines
- ETL = transform before load; ELT = load raw, transform in SQL (favored with powerful warehouses, Ch. 34). The staging pattern is ELT-in-miniature.
- Robust pipelines are incremental (load only new/changed data) and idempotent (re-run = same result, via upserts) so retries are safe.
Big-load performance
COPY not loops · drop/rebuild indexes for big initial loads · batch / few transactions · UNLOGGED staging · ANALYZE after (fresh stats, Ch. 24).
Common mistakes
INSERT loops for millions of rows; confusing COPY (server) vs \copy (client); loading raw data into constrained production; non-idempotent pipelines (duplicates on retry); forgetting ANALYZE.
You can now…
- ☐ Bulk-load/unload with
COPY/\copyand from code. - ☐ Explain why
COPY≫INSERTloops. - ☐ Apply the staging-table pattern (load → validate → transform → upsert).
- ☐ Distinguish ETL/ELT and build incremental, idempotent loads.
Looking ahead
Chapter 32 — Database Security (closes Part V). SQL injection (revisited), roles & privileges, row-level security, encryption, backups, and data privacy — protecting the data you can now move.
One sentence to carry forward: Use
COPY(never insert loops) for bulk, and always stage-validate-transform raw data into production with idempotent upserts — fast loading and safe transformation are two different jobs.