Case Study 2 — The Month-End Orders That Vanished
A subtle date bug that recurs in countless systems: using
BETWEENwith a date literal against a timestamp column, so the last day's transactions silently disappear. Combined with a time-zone misunderstanding, it under-reported revenue on exactly the days that mattered most.
Background
A retailer ran monthly revenue reports. The orders.order_date column was a timestamptz (date and time, time-zone aware). The finance query selected a month's orders like this:
-- "All of March 2024" ❌
SELECT SUM(amount) AS march_revenue
FROM orders
WHERE order_date BETWEEN '2024-03-01' AND '2024-03-31';
Every month, the reported revenue was slightly low — and the discrepancy was worst at month-end pushes and holiday sales. Over a year, the cumulative under-report was material enough that finance and the data team had a tense reconciliation meeting. The orders weren't missing from the database. They were missing from the query.
What went wrong: two compounding bugs
Bug 1 — BETWEEN with a bare date excludes most of the last day. When you compare a timestamptz column to the date literal '2024-03-31', SQL interprets that literal as the instant 2024-03-31 00:00:00 (midnight at the start of the day). BETWEEN ... AND '2024-03-31' therefore includes only orders timestamped at exactly midnight on the 31st — and excludes every order placed during March 31 (i.e., almost all of them):
order at 2024-03-31 14:22 → 14:22 > 00:00 → EXCLUDED by BETWEEN ... '2024-03-31'
So the entire last day of every month vanished from the report. Month-end and holiday surges — the highest-volume days — were exactly the ones being dropped, which is why the error was worst when it hurt most.
Bug 2 — time-zone drift at the boundaries. The literal '2024-03-01' is interpreted in the session's time zone, while business reporting expected a specific zone (say, US/Eastern). Orders placed late on the last day of February in Eastern time could land on March 1 in UTC (or vice versa), shifting a handful of orders across the month boundary. Small, but enough to make reconciliation fail to the penny and erode trust in every number.
The fix
Use a half-open range — >= start AND < next-month-start — which includes all instants in the month regardless of time, and be explicit about the time zone:
-- Correct: half-open range covers every instant in March
SELECT SUM(amount) AS march_revenue
FROM orders
WHERE order_date >= '2024-03-01'
AND order_date < '2024-04-01';
The half-open pattern (>= start, < next start) is the gold standard for any period query — days, weeks, months, years — because it has no gaps and no overlaps when you tile consecutive periods, and it never depends on knowing how many days a month has or how precise the timestamps are.
To remove the time-zone ambiguity, convert to the business zone explicitly before bucketing:
-- Bucket by calendar month in a specific business time zone:
SELECT date_trunc('month', order_date AT TIME ZONE 'America/New_York') AS month,
SUM(amount) AS revenue
FROM orders
GROUP BY 1
ORDER BY 1;
AT TIME ZONE makes the intended zone explicit, so an order's calendar month is defined by business time, not by whatever zone the session happens to use.
The analysis
-
BETWEENand timestamps don't mix for "a whole day/month." A date literal is midnight;BETWEEN ... datestops at that midnight and drops the rest of the day. Reach for half-open ranges (>= start AND < end) for any period over a timestamp column. -
Half-open ranges tile perfectly. Consecutive half-open periods (
[Mar1, Apr1),[Apr1, May1)) cover every instant exactly once — no gaps, no double-counting at boundaries.BETWEEN(inclusive on both ends) double-counts the shared endpoint when periods are chained. -
Be explicit about time zones at boundaries.
timestamptzstores an absolute instant; which calendar day/month it falls in depends on the zone. For business reporting, convert withAT TIME ZONEto the intended zone rather than relying on the session default. -
The error hid in the highest-value data. Dropping each month's last day disproportionately removed month-end and holiday spikes — the bug was biased toward the most important days. Systematic, biased errors are more dangerous than random noise because they consistently mislead.
-
Reconcile against a no-filter total.
SELECT SUM(amount) FROM orders;for the full year, compared to the sum of twelve monthly reports, would have revealed the leak (the months wouldn't add up to the year). Tiling checks — "do the parts sum to the whole?" — catch boundary bugs.
Discussion questions
- Explain why
BETWEEN '2024-03-01' AND '2024-03-31'drops almost all of March 31 for atimestamptzcolumn. - Write the half-open version, and explain why it's correct regardless of timestamp precision.
- Why do half-open ranges tile consecutive months without gaps or overlaps, while
BETWEENdoes not? - How can time zone affect which month an order belongs to? How does
AT TIME ZONEresolve it? - ⭐ Propose a reconciliation check that verifies the twelve monthly totals sum to the annual total, and explain what kind of bug it would catch.