Chapter 31 — Further Reading
Official reference (everyone)
- PostgreSQL Docs: "COPY." The full command — formats, options, FROM/TO, STDIN/STDOUT. https://www.postgresql.org/docs/current/sql-copy.html
- psql \copy reference — client-side copy, when the file is on your machine.
- PostgreSQL Docs: "Populating a Database" — the official guide to fast bulk loads (disable indexes/constraints, COPY, max_wal_size, ANALYZE). Read this before any large load.
Loading from code (💻 Developer · 📊 Data Engineer)
- psycopg 3: cursor.copy() and psycopg2: copy_expert / copy_from — fast bulk loads from Python.
- "Fastest way to load data into PostgreSQL" comparisons — single vs. multi-row vs. COPY, with numbers (Case Study 1).
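As a rough orientation before following the psycopg links, here is a minimal sketch of a bulk load through the psycopg 3 cursor.copy() context manager. The helper name copy_rows and the events table in the usage comment are hypothetical, and the function only assumes that the cursor it receives exposes copy() and write_row() the way a psycopg 3 cursor does.

```python
from typing import Iterable, Sequence


def copy_rows(cur, table: str, columns: Sequence[str],
              rows: Iterable[Sequence]) -> int:
    """Stream rows into `table` with COPY ... FROM STDIN; return rows written.

    Assumption: `cur` is a psycopg 3 cursor (or anything with the same
    copy() context manager). Table and column names are interpolated
    directly, so they must come from trusted code, never from user input.
    """
    stmt = f"COPY {table} ({', '.join(columns)}) FROM STDIN"
    written = 0
    with cur.copy(stmt) as copy:
        for row in rows:
            copy.write_row(row)  # one Python sequence per table row
            written += 1
    return written


# Hypothetical usage against a real database (requires psycopg 3):
#
#   import psycopg
#   with psycopg.connect("dbname=example") as conn, conn.cursor() as cur:
#       copy_rows(cur, "events", ["id", "payload"], [(1, "a"), (2, "b")])
```

The whole batch travels in one COPY statement instead of one round trip per row, which is the difference the "fastest way to load" comparisons measure.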
Pipelines & ETL/ELT (📊 Data Engineer · 🏗️ DBA)
- dbt (data build tool) — the standard for ELT transformations in SQL inside the warehouse; pairs with Chapter 34.
- Airflow / Dagster / Prefect — pipeline orchestration (scheduling, dependencies, retries) for incremental, idempotent loads.
- "ETL vs ELT" explainers — when to transform before vs. after loading.
- "Idempotent data pipelines" / "incremental loads with high-water marks" — robustness patterns (Case Study 2).
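The high-water-mark pattern named in the last item can be sketched in a few lines. This is an illustration under stated assumptions, not the Case Study 2 code: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (the ON CONFLICT upsert shown is accepted by both, though SQLite needs 3.24 or later and PostgreSQL drivers use different placeholders), and the orders and load_state tables, their columns, and the sample rows are all hypothetical.

```python
import sqlite3


def incremental_load(conn, source_rows):
    """Load only rows newer than the stored high-water mark; safe to re-run."""
    (mark,) = conn.execute(
        "SELECT last_seen FROM load_state WHERE pipeline = 'orders'"
    ).fetchone()
    fresh = [r for r in source_rows if r[2] > mark]  # r = (id, amount, updated_at)
    with conn:  # one transaction: the data and the mark advance together
        conn.executemany(
            "INSERT INTO orders (id, amount, updated_at) VALUES (?, ?, ?) "
            "ON CONFLICT (id) DO UPDATE SET amount = excluded.amount, "
            "updated_at = excluded.updated_at",
            fresh,
        )
        if fresh:
            conn.execute(
                "UPDATE load_state SET last_seen = ? WHERE pipeline = 'orders'",
                (max(r[2] for r in fresh),),
            )
    return len(fresh)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT);
    CREATE TABLE load_state (pipeline TEXT PRIMARY KEY, last_seen TEXT);
    INSERT INTO load_state VALUES ('orders', '');
""")

source = [(1, 9.5, "2024-01-01"), (2, 20.0, "2024-01-02")]
first = incremental_load(conn, source)    # initial load
second = incremental_load(conn, source)   # re-run with no new data
source.append((1, 12.0, "2024-01-03"))    # row 1 changes upstream
third = incremental_load(conn, source)    # only the changed row is picked up
```

A strict greater-than comparison can miss late-arriving rows that share the mark's timestamp, so a real pipeline may prefer to re-read a small overlap window and let the upsert absorb the duplicates.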
Staging & data quality (🏗️ DBA · 📊 Analyst)
- Staging-table / "load then transform" patterns — articles on isolating raw data and validating before production.
- UNLOGGED tables docs — faster, non-crash-safe tables for staging/throwaway data.
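The "load then transform" idea can be tried at small scale before reading further. The sketch below is a simplified stand-in, not a production recipe: it again uses sqlite3 in place of PostgreSQL (where the staging table might be declared UNLOGGED), and the staging_users and users tables, the validation rules, and the sample CSV are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw input: one clean row and three rows with different defects.
RAW_CSV = (
    "id,email,age\n"
    "1,a@example.com,34\n"
    "2,not-an-email,29\n"
    "x,c@example.com,41\n"
    "4,d@example.com,-5\n"
)


def is_valid(row):
    """Deliberately simple checks: numeric id, '@' in email, non-negative age."""
    return row["id"].isdigit() and "@" in row["email"] and row["age"].isdigit()


def run_pipeline(conn, raw_csv):
    """Stage raw text, then upsert only validated rows into production."""
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    good = [(int(r["id"]), r["email"], int(r["age"])) for r in rows if is_valid(r)]
    with conn:  # staging refresh and upsert commit together
        conn.execute("DELETE FROM staging_users")
        conn.executemany(
            "INSERT INTO staging_users VALUES (:id, :email, :age)", rows
        )
        conn.executemany(
            "INSERT INTO users (id, email, age) VALUES (?, ?, ?) "
            "ON CONFLICT (id) DO UPDATE SET email = excluded.email, "
            "age = excluded.age",
            good,
        )
    return len(good), len(rows) - len(good)  # (loaded, rejected)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging_users (id TEXT, email TEXT, age TEXT);  -- loose: all text
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL, age INTEGER);
""")

result_first = run_pipeline(conn, RAW_CSV)
result_again = run_pipeline(conn, RAW_CSV)  # re-run leaves production unchanged
```

Keeping every staging column as text means a malformed value can never abort the load itself; bad rows stay in staging_users where they can be inspected, and production only ever sees rows that passed validation.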
Reference (this book)
- Chapter 13 — Data Modification: INSERT / ON CONFLICT (upsert) for idempotent loads.
- Chapter 23 — Indexing: why dropping/rebuilding indexes speeds bulk loads.
- Chapter 24 — Optimization: why ANALYZE after a load matters.
- Chapter 34 — Data Warehousing: the destination of many ELT pipelines.
Do, don't just read
- Race COPY vs. an INSERT loop on ~10K rows and time both (Case Study 1).
- Build the staging pipeline (Case Study 2): load raw CSV into a loose table, validate, upsert good rows into production, re-run to confirm idempotency.
- Load generate_data.sql and note that it uses bulk techniques and ends with ANALYZE.
Next: Chapter 32 — Database Security: protecting your data (closes Part V).