Case Study 2 — The Pagination That Skipped and Repeated Rows

DataField.Dev

Case Study 2 — The Pagination That Skipped and Repeated Rows

Two everyday habits — LIMIT without a deterministic ORDER BY, and SELECT * in application code — combined to produce a baffling user-facing bug. The root cause is the lesson from Chapter 3: rows have no inherent order.

Background

A product-listing page showed 20 products per page, with "Next" and "Previous" buttons. The backend fetched each page like this:

-- Page N (0-based): skip N*20 rows, take 20
SELECT * FROM products
ORDER BY price
LIMIT 20 OFFSET 40;     -- e.g., page 3

Users started reporting two strange things: some products appeared on two consecutive pages, and others seemed to vanish — present on neither page 2 nor page 3. The data was fine; nothing was being deleted. Yet the catalog felt unreliable, and a few customers complained they couldn't find a product they'd seen moments earlier.

What went wrong

The ORDER BY price was not deterministic. Many products share the same price (several books at $45, several accessories at $129). When the sort key has ties, the relative order of the tied rows is undefined — the database may return them in any order, and that order can differ from one query execution to the next (depending on the plan, caching, or concurrent activity).

Pagination with LIMIT/OFFSET assumes a stable, total ordering: page 2 is "rows 21–40 of one fixed sequence," and page 3 is "rows 41–60 of the same sequence." But if rows tied on price reshuffle between the page-2 query and the page-3 query, a product that was row 40 in one run can become row 41 in the next — so it appears on both pages, while its neighbor falls into the gap and appears on neither. The pages were slices of different orderings that only looked like one.

The fix is to make the sort order total by adding a unique tiebreaker — the primary key:

-- Deterministic: ties in price are broken by product_id, which is unique
SELECT product_id, name, price
FROM products
ORDER BY price, product_id
LIMIT 20 OFFSET 40;

Now every product has exactly one position in one well-defined sequence, identical across queries, so pages don't overlap or drop rows.

Rule of thumb: any ORDER BY used for pagination must end in a column (or set of columns) that is unique, so the ordering is total. Usually that's the primary key.

The second problem: SELECT * in the API

While fixing the pagination, the team hit a second issue rooted in SELECT *. The product API serialized whatever columns came back. When a developer later added an internal cost column to products (the wholesale cost — not for customers' eyes), SELECT * happily included it, and the new field leaked into the public JSON response. No code "decided" to expose it; * simply meant "every column, including ones that don't exist yet." A customer poking at the API noticed the margins.

The fix was to name columns explicitly everywhere the data crossed a trust boundary:

-- Name exactly the public fields; new internal columns can't leak in.
SELECT product_id, name, price, attributes
FROM products
ORDER BY price, product_id
LIMIT 20 OFFSET 40;

The analysis

LIMIT is only meaningful with a deterministic ORDER BY. "The next 20" presupposes a fixed sequence. If your sort key has ties, append a unique column (the primary key) to make the order total. This is the single most common pagination bug in the wild.
Rows have no inherent order — including across two runs of the same query. Chapter 3's lesson has teeth here: not only is there no default order, there's no stable order for tied rows unless you specify one. Never assume two executions return ties in the same sequence.
SELECT * is a liability across boundaries. In code — especially where results are serialized to users — name your columns. * couples your output to the table's current shape, so a future schema change silently changes (or leaks) your output. Interactive exploration is the only place * truly belongs.
(Foreshadowing) OFFSET pagination has performance limits too. Even with a deterministic order, large OFFSETs are slow because the database must scan and discard all the skipped rows. For deep pagination, "keyset pagination" (WHERE (price, product_id) > (:lastPrice, :lastId)) is far faster. You'll have the tools to appreciate this after Chapters 23–24; for now, just know OFFSET 100000 is not free.

Discussion questions

Explain precisely why tied prices caused rows to repeat on one page and vanish from another.
What property must a pagination ORDER BY have, and how do you guarantee it?
Why did adding a cost column cause a data leak, and how does naming columns prevent it?
Give one scenario where SELECT * is perfectly fine and one where it's dangerous.
⭐ The team later replaced OFFSET with keyset pagination for performance. Sketch the WHERE clause for "the 20 products after the last one shown (price P, id I)" using the (price, product_id) ordering. Why is it faster than OFFSET?