Chapter 34 — Further Reading
Foundations (📊 Analyst · 🏗️ Data Engineer)
- Ralph Kimball & Margy Ross, The Data Warehouse Toolkit. The canonical book on dimensional modeling: star schemas, fact grain, slowly changing dimensions. If you do warehousing, read it.
- Bill Inmon's writings — the "warehouse as integrated subject-oriented store" perspective (the Inmon vs. Kimball contrast is worth knowing).
- Kimball Group "Dimensional Modeling Techniques" (free online) — concise summaries of facts, dimensions, SCD types.
Modern data stack (🏗️ Data Engineer)
- dbt (data build tool) — the standard for ELT transformation in SQL inside the warehouse; how modern teams build star schemas (ties to Chapter 31).
- Snowflake / BigQuery / Redshift docs — cloud columnar warehouses; architecture (separation of storage and compute), loading, and SQL dialects.
- DuckDB (https://duckdb.org/) — "SQLite for analytics"; superb for local columnar analytics on files. Great for learning OLAP hands-on.
- ClickHouse — real-time columnar analytics.
Concepts (🔬 CS Student · 🏗️ DBA)
- "OLTP vs OLAP" and "row vs column storage" explainers — why the storage model follows the workload (Chapter 28).
- "Slowly changing dimensions explained" — Type 1/2/3 with examples (the stretch goal in Case Study 1).
- Star vs snowflake schema comparisons.
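To make the slowly changing dimension types concrete before reading the explainers, here is a minimal sketch of Type 2 (keep history by closing the old row and adding a new version). Everything in it is illustrative and not from the book: the `CustomerRow` fields, the `apply_scd2` helper, and the sample customer are hypothetical names, and a real warehouse would do this in SQL or a transformation tool rather than in application code.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CustomerRow:
    """One version of a customer in a Type 2 dimension (hypothetical layout)."""
    customer_key: int          # surrogate key: one per version
    customer_id: str           # natural (business) key: shared by all versions
    city: str                  # the attribute whose history we track
    valid_from: date
    valid_to: Optional[date]   # None means "this is the current version"


def apply_scd2(dim: list, customer_id: str, new_city: str, as_of: date) -> None:
    """Apply a Type 2 change: expire the current row, append a new version."""
    current = next(
        (r for r in dim if r.customer_id == customer_id and r.valid_to is None),
        None,
    )
    if current is not None and current.city == new_city:
        return  # attribute unchanged, so no new version is needed
    if current is not None:
        current.valid_to = as_of  # close the old version instead of overwriting it
    next_key = max((r.customer_key for r in dim), default=0) + 1
    dim.append(CustomerRow(next_key, customer_id, new_city, as_of, None))


dim = []
apply_scd2(dim, "C42", "Lisbon", date(2023, 1, 1))  # first version
apply_scd2(dim, "C42", "Porto", date(2024, 6, 1))   # customer moves: new version
apply_scd2(dim, "C42", "Porto", date(2024, 7, 1))   # no change: nothing happens

for row in dim:
    print(row)
```

A Type 1 change would instead overwrite `city` in place and lose the Lisbon history; Type 3 would keep a single row with an extra "previous city" column.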
PostgreSQL for analytics (💻 Developer · 🏗️ DBA)
- Citus (distributed PostgreSQL) and columnar storage extensions — pushing PostgreSQL toward warehouse workloads.
- Read replicas for reporting — the quick-win isolation from Case Study 2 (also Chapter 35).
Reference (this book)
- Chapters 19–20 — Normalization/Denormalization: the design trade-off, here resolved toward denormalization.
- Chapter 28 — Internals: row vs column storage.
- Chapter 31 — ETL/ELT: how data gets into the warehouse.
- Chapter 35 — Distributed Databases: read replicas and scale-out.
Do, don't just read
- Design a star schema for a domain you know (Case Study 1 / Exercise 34.4): pick the fact grain, list dimensions, include a date dimension.
- Try DuckDB on a large CSV: load it and run aggregations; feel how fast columnar analytics is.
- Contrast your warehouse star with your OLTP schema and articulate why they differ.
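As a starting point for the first exercise, the sketch below builds a tiny star schema and runs one aggregate query over it. It is only an illustration under stated assumptions: the retail domain, the table and column names (`fact_sales`, `dim_date`, `dim_product`), and the sample rows are all made up, and it uses Python's built-in `sqlite3` module purely because it needs no installation. SQLite is row-oriented, so this shows the shape of the model, not columnar speed.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Two dimensions plus one fact table. All names here are hypothetical.
con.executescript("""
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,   -- e.g. 20240115
    full_date TEXT    NOT NULL,
    year      INTEGER NOT NULL,
    month     INTEGER NOT NULL
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    category    TEXT NOT NULL
);
-- Fact grain: one row per product per day.
CREATE TABLE fact_sales (
    date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
    quantity    INTEGER NOT NULL,
    revenue     REAL    NOT NULL
);
""")

con.executemany("INSERT INTO dim_date VALUES (?, ?, ?, ?)", [
    (20240115, "2024-01-15", 2024, 1),
    (20240210, "2024-02-10", 2024, 2),
])
con.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", [
    (1, "Espresso beans", "Coffee"),
    (2, "Green tea", "Tea"),
])
con.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
    (20240115, 1, 3, 30.0),
    (20240115, 2, 1, 8.0),
    (20240210, 1, 2, 20.0),
])

# A typical star-schema query: join the fact to its dimensions,
# then group by dimension attributes.
rows = con.execute("""
    SELECT d.year, d.month, p.category, SUM(f.revenue) AS revenue
    FROM fact_sales AS f
    JOIN dim_date    AS d ON d.date_key    = f.date_key
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY d.year, d.month, p.category
    ORDER BY d.year, d.month, p.category
""").fetchall()

for row in rows:
    print(row)
```

Note how the descriptive attributes (`category`, `month`) live in the dimensions while the fact table holds only keys and measures; comparing this layout with a normalized OLTP orders schema is a quick way into the third exercise.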
Next: Chapter 35 — Distributed Databases: replication, sharding, and the CAP theorem.