Case Study 1 — Denormalizing the Product Page, Correctly
Denormalization done right: a read-heavy product page that paid an expensive aggregation on every view was made fast — with a deliberate technique choice and an explicit consistency plan. The contrast with Case Study 2 (where the same idea drifted) is the whole lesson.
Background
An e-commerce product listing showed, for each product, its name, price, review count, and average rating. The reviews lived (correctly, per Chapter 19) in a separate reviews table. So every page render aggregated:
SELECT p.product_id, p.name, p.price,
COUNT(r.review_id) AS review_count,
AVG(r.rating) AS avg_rating
FROM products p
LEFT JOIN reviews r ON r.product_id = p.product_id
GROUP BY p.product_id, p.name, p.price;
With a large catalog and millions of reviews, this aggregation ran on every listing view — and listings were the most-visited pages on the site. It was slow and it dominated database load. The data (reviews) changed far less often than the page was read: a textbook denormalization candidate (read-heavy, expensive aggregation, slowly-changing data).
The decision process (the right order)
The team didn't denormalize first. They followed the professional sequence:
- Could an index fix it? An index on
reviews(product_id)helped the join, but the aggregation over millions of rows on every view was still the cost. Indexing alone wasn't enough here. (For many problems it would be — always check first.) - Could a materialized view fix it? Yes — a matview of per-product review stats, refreshed periodically, would make reads instant. This was viable and low-risk.
- Did they need real-time freshness? Review counts/ratings updating within a few minutes was perfectly acceptable for a listing page. Slight staleness was fine.
Given "slight staleness OK," they chose the lowest-risk denormalization: a materialized view (the database manages the redundant copy), with a scheduled refresh.
The implementation
CREATE MATERIALIZED VIEW product_review_stats AS
SELECT product_id,
COUNT(*) AS review_count,
ROUND(AVG(rating),2) AS avg_rating
FROM reviews
GROUP BY product_id;
CREATE UNIQUE INDEX ON product_review_stats (product_id); -- enables CONCURRENTLY refresh
The listing query now joins the small, pre-aggregated matview (one row per product) instead of aggregating millions of reviews:
SELECT p.product_id, p.name, p.price,
COALESCE(s.review_count, 0) AS review_count,
s.avg_rating
FROM products p
LEFT JOIN product_review_stats s ON s.product_id = p.product_id;
And — the part that distinguishes this from a disaster — they scheduled the refresh and documented the staleness:
-- Refresh every 10 minutes, without blocking reads (pg_cron)
SELECT cron.schedule('refresh_review_stats', '*/10 * * * *',
$$REFRESH MATERIALIZED VIEW CONCURRENTLY product_review_stats$$);
They also surfaced the freshness contract to the product team ("ratings update within ~10 minutes") so no one expected real-time counts.
The outcome
Listing pages dropped from slow-and-load-heavy to instant. Crucially, because the redundant data was a managed materialized view with a scheduled refresh, it never silently drifted — the database recomputes it from the source of truth on every refresh, so it's always at most ~10 minutes stale, never wrong. Contrast this with a hand-maintained products.review_count column, which (Case Study 2) can diverge permanently if any write path forgets to update it.
The analysis
-
Denormalize in the right order. Index → materialized view → hand-maintained redundancy. They stopped at the materialized view — the lowest-risk option that solved the problem. Don't jump to manual denormalization when a matview suffices.
-
A materialized view is denormalization the database manages for you. It stores the redundant aggregate, but recomputes it from the truth on refresh — so it can be stale but not silently wrong. That property (vs. a hand-maintained column) is why it's often the preferred technique.
-
Match the consistency strategy to the freshness need. Listing ratings tolerate ~10-minute staleness, so a scheduled refresh is ideal — no per-write trigger overhead, no fragile app code. A different metric (live cart total) would demand a different strategy.
-
Document the freshness contract. Telling the product team "ratings update within ~10 minutes" turns staleness from a hidden surprise into an agreed, understood trade-off (Chapter 15's lesson).
-
This is denormalization as a feature. Deliberate, measured, lowest-risk technique, with a consistency plan and documentation. Everything the accidental redundancy of Chapter 19 wasn't.
Discussion questions
- Why didn't an index alone solve the problem here, when it often does?
- Why is a materialized view "denormalization the database manages," and how does that make it safer than a hand-maintained column?
- Why was a scheduled refresh the right consistency strategy for this page?
- What role does documenting the freshness contract play?
- ⭐ For a page that did need real-time review counts, which technique would you choose instead, and what would it cost?