Case Study 1 — Denormalizing the Product Page, Correctly

Denormalization done right: a read-heavy product page that paid an expensive aggregation on every view was made fast — with a deliberate technique choice and an explicit consistency plan. The contrast with Case Study 2 (where the same idea drifted) is the whole lesson.

Background

An e-commerce product listing showed, for each product, its name, price, review count, and average rating. The reviews lived (correctly, per Chapter 19) in a separate reviews table. So every page render aggregated:

SELECT p.product_id, p.name, p.price,
       COUNT(r.review_id)  AS review_count,
       AVG(r.rating)       AS avg_rating
FROM products p
LEFT JOIN reviews r ON r.product_id = p.product_id
GROUP BY p.product_id, p.name, p.price;

With a large catalog and millions of reviews, this aggregation ran on every listing view — and listings were the most-visited pages on the site. It was slow and it dominated database load. The data (reviews) changed far less often than the page was read: a textbook denormalization candidate (read-heavy, expensive aggregation, slowly-changing data).

The decision process (the right order)

The team didn't denormalize first. They followed the professional sequence:

  1. Could an index fix it? An index on reviews(product_id) helped the join, but the aggregation over millions of rows on every view was still the cost. Indexing alone wasn't enough here. (For many problems it would be — always check first.)
  2. Could a materialized view fix it? Yes — a matview of per-product review stats, refreshed periodically, would make reads instant. This was viable and low-risk.
  3. Did they need real-time freshness? Review counts/ratings updating within a few minutes was perfectly acceptable for a listing page. Slight staleness was fine.

Given "slight staleness OK," they chose the lowest-risk denormalization: a materialized view (the database manages the redundant copy), with a scheduled refresh.

The implementation

CREATE MATERIALIZED VIEW product_review_stats AS
SELECT product_id,
       COUNT(*)            AS review_count,
       ROUND(AVG(rating),2) AS avg_rating
FROM reviews
GROUP BY product_id;

CREATE UNIQUE INDEX ON product_review_stats (product_id);   -- enables CONCURRENTLY refresh

The listing query now joins the small, pre-aggregated matview (one row per product) instead of aggregating millions of reviews:

SELECT p.product_id, p.name, p.price,
       COALESCE(s.review_count, 0) AS review_count,
       s.avg_rating
FROM products p
LEFT JOIN product_review_stats s ON s.product_id = p.product_id;

And — the part that distinguishes this from a disaster — they scheduled the refresh and documented the staleness:

-- Refresh every 10 minutes, without blocking reads (pg_cron)
SELECT cron.schedule('refresh_review_stats', '*/10 * * * *',
    $$REFRESH MATERIALIZED VIEW CONCURRENTLY product_review_stats$$);

They also surfaced the freshness contract to the product team ("ratings update within ~10 minutes") so no one expected real-time counts.

The outcome

Listing pages dropped from slow-and-load-heavy to instant. Crucially, because the redundant data was a managed materialized view with a scheduled refresh, it never silently drifted — the database recomputes it from the source of truth on every refresh, so it's always at most ~10 minutes stale, never wrong. Contrast this with a hand-maintained products.review_count column, which (Case Study 2) can diverge permanently if any write path forgets to update it.

The analysis

  1. Denormalize in the right order. Index → materialized view → hand-maintained redundancy. They stopped at the materialized view — the lowest-risk option that solved the problem. Don't jump to manual denormalization when a matview suffices.

  2. A materialized view is denormalization the database manages for you. It stores the redundant aggregate, but recomputes it from the truth on refresh — so it can be stale but not silently wrong. That property (vs. a hand-maintained column) is why it's often the preferred technique.

  3. Match the consistency strategy to the freshness need. Listing ratings tolerate ~10-minute staleness, so a scheduled refresh is ideal — no per-write trigger overhead, no fragile app code. A different metric (live cart total) would demand a different strategy.

  4. Document the freshness contract. Telling the product team "ratings update within ~10 minutes" turns staleness from a hidden surprise into an agreed, understood trade-off (Chapter 15's lesson).

  5. This is denormalization as a feature. Deliberate, measured, lowest-risk technique, with a consistency plan and documentation. Everything the accidental redundancy of Chapter 19 wasn't.

Discussion questions

  1. Why didn't an index alone solve the problem here, when it often does?
  2. Why is a materialized view "denormalization the database manages," and how does that make it safer than a hand-maintained column?
  3. Why was a scheduled refresh the right consistency strategy for this page?
  4. What role does documenting the freshness contract play?
  5. ⭐ For a page that did need real-time review counts, which technique would you choose instead, and what would it cost?