Chapter 31 — Further Reading

Official reference (everyone)

  • PostgreSQL Docs: "COPY." The full command — formats, options, FROM/TO, STDIN/STDOUT. https://www.postgresql.org/docs/current/sql-copy.html
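  • psql \copy reference — client-side copy: psql reads or writes the file on your machine and streams it over the connection, so no server-side file access is needed.
  • PostgreSQL Docs: "Populating a Database" — the official guide to fast bulk loads (drop indexes and foreign-key constraints before loading, use COPY, raise max_wal_size, run ANALYZE afterwards). Read this before any large load.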

Loading from code (💻 Developer · 📊 Data Engineer)

  • psycopg 3: cursor.copy(); psycopg2: cursor.copy_expert() / cursor.copy_from() — fast bulk loads from Python.
  • "Fastest way to load data into PostgreSQL" comparisons — single vs. multi-row vs. COPY, with numbers (Case Study 1).
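
As a rough illustration of the psycopg 3 route, the sketch below serializes rows to CSV in memory and streams them through cursor.copy(). The helper names rows_to_csv and copy_rows, and the idea of passing the target as a trusted "table (columns)" string, are assumptions made for this example; the connection is expected to be an open psycopg 3 connection supplied by the caller.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize an iterable of tuples to CSV text for COPY ... (FORMAT csv).

    None is written as an empty field, which COPY's csv format reads as
    NULL by default.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


def copy_rows(conn, target, rows):
    """Bulk-load rows with a single COPY statement.

    conn   -- an open psycopg 3 connection (assumed, supplied by the caller)
    target -- trusted text such as "events (id, label)"; hypothetical name,
              never build it from user input
    """
    with conn.cursor() as cur:
        with cur.copy(f"COPY {target} FROM STDIN (FORMAT csv)") as copy:
            copy.write(rows_to_csv(rows))
    conn.commit()
```

A call might look like copy_rows(conn, "events (id, label)", [(1, "a,b"), (2, None)]). Building the whole payload in memory keeps the sketch short; for very large loads, writing in chunks inside the same copy block would avoid holding everything at once.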

Pipelines & ETL/ELT (📊 Data Engineer · 🏗️ DBA)

  • dbt (data build tool) — the standard for ELT transformations in SQL inside the warehouse; pairs with Chapter 34.
  • Airflow / Dagster / Prefect — pipeline orchestration (scheduling, dependencies, retries) for incremental, idempotent loads.
  • "ETL vs ELT" explainers — when to transform before vs. after loading.
  • "Idempotent data pipelines" / "incremental loads with high-water marks" — robustness patterns (Case Study 2).
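
The high-water-mark idea can be sketched in a few lines. The example below is only a stand-in: it uses Python's built-in sqlite3 module so it runs without a server, and the orders table, its columns, and the incremental_load helper are hypothetical. SQLite's ON CONFLICT ... DO UPDATE clause is close to PostgreSQL's, but a real pipeline would need the statement adapted to its own schema and driver.

```python
import sqlite3


def incremental_load(conn, source_rows):
    """Upsert rows at or after the stored high-water mark.

    Rows are (id, amount, updated_at). Using >= re-applies rows that sit
    exactly on the mark; the upsert makes that repetition harmless.
    Returns the number of rows applied.
    """
    (mark,) = conn.execute(
        "SELECT COALESCE(MAX(updated_at), '') FROM orders"
    ).fetchone()
    fresh = [row for row in source_rows if row[2] >= mark]
    conn.executemany(
        """
        INSERT INTO orders (id, amount, updated_at) VALUES (?, ?, ?)
        ON CONFLICT (id) DO UPDATE
        SET amount = excluded.amount, updated_at = excluded.updated_at
        """,
        fresh,
    )
    conn.commit()
    return len(fresh)


# In-memory stand-in for the production table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"
)

batch = [(1, 10.0, "2024-01-01"), (2, 20.0, "2024-01-02")]
first = incremental_load(conn, batch)   # initial load
second = incremental_load(conn, batch)  # re-run of the same batch
```

Re-running the same batch leaves the table unchanged, which is the property the idempotency articles focus on. Deriving the mark from the target table is one option; a separate watermark table is another common choice.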

Staging & data quality (🏗️ DBA · 📊 Analyst)

  • Staging-table / "load then transform" patterns — articles on isolating raw data and validating before production.
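  • UNLOGGED tables (PostgreSQL Docs: "CREATE TABLE") — faster because they skip write-ahead logging, but not crash-safe: their contents are truncated after a crash and are not replicated to standbys. Suited to staging/throwaway data.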

Reference (this book)

  • Chapter 13 — Data Modification: INSERT/ON CONFLICT (upsert) for idempotent loads.
  • Chapter 23 — Indexing: why dropping/rebuilding indexes speeds bulk loads.
  • Chapter 24 — Optimization: why ANALYZE after a load matters.
  • Chapter 34 — Data Warehousing: the destination of many ELT pipelines.

Do, don't just read

  • Race COPY vs. an INSERT loop on ~10K rows and time both (Case Study 1).
  • Build the staging pipeline (Case Study 2): load raw CSV into a loose table, validate, upsert good rows into production, re-run to confirm idempotency.
  • Load generate_data.sql and note that it uses bulk-loading techniques and ends with ANALYZE.

Next: Chapter 32 — Database Security: protecting your data (closes Part V).