Chapter 34 — Key Takeaways
The big idea
Analytics (OLAP) and running an app (OLTP) are different jobs with opposite needs. A data warehouse is a separate, analytics-optimized database — denormalized, read/aggregation-optimized, often columnar — fed from OLTP via ETL/ELT.
OLTP vs OLAP
| OLTP | OLAP (warehouse) | |
|---|---|---|
| Workload | many small reads/writes | few huge aggregating reads |
| Schema | normalized (Part III) | denormalized (star) |
| Storage | row-oriented | column-oriented |
| Optimized for | write integrity, low latency | read/aggregation over history |
Don't mix them on one database under load — analytics starves transactions. (Case Study 2: a dashboard query took down checkout; fixed with a read replica → warehouse.)
The star schema
- Fact table — the measurements/events at a defined grain (one row = ?); numeric measures + foreign keys to dimensions. Huge but narrow.
- Dimension tables — descriptive context to slice by (date, product, customer…); small, denormalized (flatten product→category→department into one table).
- Date dimension — almost always present; pre-computed year/quarter/month/etc. for clean time slicing.
- Analytical queries are all "measures from the fact, sliced by dimension attributes, grouped, aggregated."
- Fact grain is the foundational decision. (Case Study 1.)
Snowflake & SCD
- Snowflake = normalized dimensions (more joins; star usually preferred).
- Slowly Changing Dimensions: Type 1 (overwrite), Type 2 (versioned history with validity dates — most common), Type 3 (previous-value). Type 2 preserves "as it was."
Storage & tools
- Columnar storage → aggregating a few columns over billions of rows reads far less I/O and compresses well (why analytical DBs are columnar — Chapter 28).
- Modern warehouses: Snowflake, BigQuery, Redshift, DuckDB, ClickHouse — columnar, SQL-queried, ELT-fed. For small/medium analytics, PostgreSQL/DuckDB may suffice (theme #4).
Denormalization is the norm here
Chapter 20 vindicated: warehouses denormalize by default because the workload (batch-loaded, read/aggregation-heavy) makes it correct. Design follows workload.
You can now…
- ☐ Distinguish OLTP/OLAP and why analytics needs a separate database.
- ☐ Design a star schema and choose the fact grain.
- ☐ Explain snowflake schemas and SCD strategies.
- ☐ Explain columnar storage's advantage; name modern warehouses.
- ☐ Avoid running heavy analytics on the live OLTP database.
Looking ahead
Chapter 35 — Distributed Databases. Scaling across machines: replication, sharding, the CAP theorem, eventual consistency, distributed transactions, and NewSQL.
One sentence to carry forward: Analytics belongs in a separate, denormalized, columnar warehouse modeled as a star schema — design follows workload, so normalize for transactions and denormalize for analysis.