Case Study 2 — The Month-End Orders That Vanished

A subtle date bug that recurs in countless systems: using BETWEEN with a date literal against a timestamp column, so the last day's transactions silently disappear. Combined with a time-zone misunderstanding, it under-reported revenue on exactly the days that mattered most.

Background

A retailer ran monthly revenue reports. The orders.order_date column was a timestamptz (date and time, time-zone aware). The finance query selected a month's orders like this:

-- "All of March 2024"   ❌
SELECT SUM(amount) AS march_revenue
FROM orders
WHERE order_date BETWEEN '2024-03-01' AND '2024-03-31';

Every month, the reported revenue was slightly low — and the discrepancy was worst at month-end pushes and holiday sales. Over a year, the cumulative under-report was material enough that finance and the data team had a tense reconciliation meeting. The orders weren't missing from the database. They were missing from the query.

What went wrong: two compounding bugs

Bug 1 — BETWEEN with a bare date excludes most of the last day. When you compare a timestamptz column to the date literal '2024-03-31', SQL interprets that literal as the instant 2024-03-31 00:00:00 (midnight at the start of the day). BETWEEN ... AND '2024-03-31' therefore includes only orders timestamped at exactly midnight on the 31st — and excludes every order placed during March 31 (i.e., almost all of them):

   order at 2024-03-31 14:22  →  14:22 > 00:00  →  EXCLUDED by BETWEEN ... '2024-03-31'

So the entire last day of every month vanished from the report. Month-end and holiday surges — the highest-volume days — were exactly the ones being dropped, which is why the error was worst when it hurt most.

Bug 2 — time-zone drift at the boundaries. The literal '2024-03-01' is interpreted in the session's time zone, while business reporting expected a specific zone (say, US/Eastern). Orders placed late on the last day of February in Eastern time could land on March 1 in UTC (or vice versa), shifting a handful of orders across the month boundary. Small, but enough to make reconciliation fail to the penny and erode trust in every number.

The fix

Use a half-open range>= start AND < next-month-start — which includes all instants in the month regardless of time, and be explicit about the time zone:

-- Correct: half-open range covers every instant in March
SELECT SUM(amount) AS march_revenue
FROM orders
WHERE order_date >= '2024-03-01'
  AND order_date <  '2024-04-01';

The half-open pattern (>= start, < next start) is the gold standard for any period query — days, weeks, months, years — because it has no gaps and no overlaps when you tile consecutive periods, and it never depends on knowing how many days a month has or how precise the timestamps are.

To remove the time-zone ambiguity, convert to the business zone explicitly before bucketing:

-- Bucket by calendar month in a specific business time zone:
SELECT date_trunc('month', order_date AT TIME ZONE 'America/New_York') AS month,
       SUM(amount) AS revenue
FROM orders
GROUP BY 1
ORDER BY 1;

AT TIME ZONE makes the intended zone explicit, so an order's calendar month is defined by business time, not by whatever zone the session happens to use.

The analysis

  1. BETWEEN and timestamps don't mix for "a whole day/month." A date literal is midnight; BETWEEN ... date stops at that midnight and drops the rest of the day. Reach for half-open ranges (>= start AND < end) for any period over a timestamp column.

  2. Half-open ranges tile perfectly. Consecutive half-open periods ([Mar1, Apr1), [Apr1, May1)) cover every instant exactly once — no gaps, no double-counting at boundaries. BETWEEN (inclusive on both ends) double-counts the shared endpoint when periods are chained.

  3. Be explicit about time zones at boundaries. timestamptz stores an absolute instant; which calendar day/month it falls in depends on the zone. For business reporting, convert with AT TIME ZONE to the intended zone rather than relying on the session default.

  4. The error hid in the highest-value data. Dropping each month's last day disproportionately removed month-end and holiday spikes — the bug was biased toward the most important days. Systematic, biased errors are more dangerous than random noise because they consistently mislead.

  5. Reconcile against a no-filter total. SELECT SUM(amount) FROM orders; for the full year, compared to the sum of twelve monthly reports, would have revealed the leak (the months wouldn't add up to the year). Tiling checks — "do the parts sum to the whole?" — catch boundary bugs.

Discussion questions

  1. Explain why BETWEEN '2024-03-01' AND '2024-03-31' drops almost all of March 31 for a timestamptz column.
  2. Write the half-open version, and explain why it's correct regardless of timestamp precision.
  3. Why do half-open ranges tile consecutive months without gaps or overlaps, while BETWEEN does not?
  4. How can time zone affect which month an order belongs to? How does AT TIME ZONE resolve it?
  5. ⭐ Propose a reconciliation check that verifies the twelve monthly totals sum to the annual total, and explain what kind of bug it would catch.