Chapter 34 — Key Takeaways

DataField.Dev

Chapter 34 — Key Takeaways

The big idea

Analytics (OLAP) and running an app (OLTP) are different jobs with opposite needs. A data warehouse is a separate, analytics-optimized database — denormalized, read/aggregation-optimized, often columnar — fed from OLTP via ETL/ELT.

OLTP vs OLAP

	OLTP	OLAP (warehouse)
Workload	many small reads/writes	few huge aggregating reads
Schema	normalized (Part III)	denormalized (star)
Storage	row-oriented	column-oriented
Optimized for	write integrity, low latency	read/aggregation over history

Don't mix them on one database under load — analytics starves transactions. (Case Study 2: a dashboard query took down checkout; fixed with a read replica → warehouse.)

The star schema

Fact table — the measurements/events at a defined grain (one row = ?); numeric measures + foreign keys to dimensions. Huge but narrow.
Dimension tables — descriptive context to slice by (date, product, customer…); small, denormalized (flatten product→category→department into one table).
Date dimension — almost always present; pre-computed year/quarter/month/etc. for clean time slicing.
Analytical queries are all "measures from the fact, sliced by dimension attributes, grouped, aggregated."
Fact grain is the foundational decision. (Case Study 1.)

Snowflake & SCD

Snowflake = normalized dimensions (more joins; star usually preferred).
Slowly Changing Dimensions: Type 1 (overwrite), Type 2 (versioned history with validity dates — most common), Type 3 (previous-value). Type 2 preserves "as it was."

Storage & tools

Columnar storage → aggregating a few columns over billions of rows reads far less I/O and compresses well (why analytical DBs are columnar — Chapter 28).
Modern warehouses: Snowflake, BigQuery, Redshift, DuckDB, ClickHouse — columnar, SQL-queried, ELT-fed. For small/medium analytics, PostgreSQL/DuckDB may suffice (theme #4).

Denormalization is the norm here

Chapter 20 vindicated: warehouses denormalize by default because the workload (batch-loaded, read/aggregation-heavy) makes it correct. Design follows workload.

You can now…

☐ Distinguish OLTP/OLAP and why analytics needs a separate database.
☐ Design a star schema and choose the fact grain.
☐ Explain snowflake schemas and SCD strategies.
☐ Explain columnar storage's advantage; name modern warehouses.
☐ Avoid running heavy analytics on the live OLTP database.

Looking ahead

Chapter 35 — Distributed Databases. Scaling across machines: replication, sharding, the CAP theorem, eventual consistency, distributed transactions, and NewSQL.

One sentence to carry forward: Analytics belongs in a separate, denormalized, columnar warehouse modeled as a star schema — design follows workload, so normalize for transactions and denormalize for analysis.