Chapter 34 — Key Takeaways

The big idea

Analytics (OLAP) and running an app (OLTP) are different jobs with opposite needs. A data warehouse is a separate, analytics-optimized database — denormalized, read/aggregation-optimized, often columnar — fed from OLTP via ETL/ELT.

OLTP vs OLAP

OLTP OLAP (warehouse)
Workload many small reads/writes few huge aggregating reads
Schema normalized (Part III) denormalized (star)
Storage row-oriented column-oriented
Optimized for write integrity, low latency read/aggregation over history

Don't mix them on one database under load — analytics starves transactions. (Case Study 2: a dashboard query took down checkout; fixed with a read replica → warehouse.)

The star schema

  • Fact table — the measurements/events at a defined grain (one row = ?); numeric measures + foreign keys to dimensions. Huge but narrow.
  • Dimension tables — descriptive context to slice by (date, product, customer…); small, denormalized (flatten product→category→department into one table).
  • Date dimension — almost always present; pre-computed year/quarter/month/etc. for clean time slicing.
  • Analytical queries are all "measures from the fact, sliced by dimension attributes, grouped, aggregated."
  • Fact grain is the foundational decision. (Case Study 1.)

Snowflake & SCD

  • Snowflake = normalized dimensions (more joins; star usually preferred).
  • Slowly Changing Dimensions: Type 1 (overwrite), Type 2 (versioned history with validity dates — most common), Type 3 (previous-value). Type 2 preserves "as it was."

Storage & tools

  • Columnar storage → aggregating a few columns over billions of rows reads far less I/O and compresses well (why analytical DBs are columnar — Chapter 28).
  • Modern warehouses: Snowflake, BigQuery, Redshift, DuckDB, ClickHouse — columnar, SQL-queried, ELT-fed. For small/medium analytics, PostgreSQL/DuckDB may suffice (theme #4).

Denormalization is the norm here

Chapter 20 vindicated: warehouses denormalize by default because the workload (batch-loaded, read/aggregation-heavy) makes it correct. Design follows workload.

You can now…

  • ☐ Distinguish OLTP/OLAP and why analytics needs a separate database.
  • ☐ Design a star schema and choose the fact grain.
  • ☐ Explain snowflake schemas and SCD strategies.
  • ☐ Explain columnar storage's advantage; name modern warehouses.
  • ☐ Avoid running heavy analytics on the live OLTP database.

Looking ahead

Chapter 35 — Distributed Databases. Scaling across machines: replication, sharding, the CAP theorem, eventual consistency, distributed transactions, and NewSQL.

One sentence to carry forward: Analytics belongs in a separate, denormalized, columnar warehouse modeled as a star schema — design follows workload, so normalize for transactions and denormalize for analysis.