> Where you are: Part II, Chapter 5 of 40 — the first chapter of SQL Mastery, the heart of the book. You know the model (Ch. 3) and the algebra (Ch. 4). Now you write SQL in earnest. By the end of this chapter, basic queries should start to feel...
In This Chapter
- The query you'll write ten thousand times
- The simplest query
- The logical order of evaluation
- WHERE: filtering rows
- DISTINCT — removing duplicate rows
- ORDER BY — sorting the result
- LIMIT and OFFSET — just some of the rows
- Comments and formatting
- Putting it together
- Literals, quoting, and type casts
- Filtering on dates and times
- CASE: choosing a value per row
- Three-valued logic in WHERE, made concrete
- A wider gallery of queries
- A performance note (a preview of Chapter 23)
- SELECT without a table, and other small surprises
- Equivalent spellings: when several queries say the same thing
- A reference: the operators you'll use in WHERE
- A method for fixing a query that's wrong
- Progressive project: your first real queries
- Summary
Chapter 5: SELECT, FROM, WHERE — Your First Queries
Where you are: Part II, Chapter 5 of 40 — the first chapter of SQL Mastery, the heart of the book. You know the model (Ch. 3) and the algebra (Ch. 4). Now you write SQL in earnest. By the end of this chapter, basic queries should start to feel automatic.
Learning paths: 💻 📊 🔬 🏗️ — everyone, no exceptions. Every later chapter assumes total fluency with what's here. Type every example.
The query you'll write ten thousand times
The SELECT statement is the workhorse of SQL. You will write it more than any other single thing in your database career — to answer questions, to check your work, to debug, to build reports, to feed applications. Master its three core clauses — SELECT, FROM, WHERE — and you can already answer a huge fraction of the questions anyone will ever ask of a database.
This is theme #2 in full force: SQL is a language, learned by writing it. Reading this chapter will teach you about SELECT. Typing every query, breaking it, and fixing it will teach you SELECT. Keep a psql window open and follow along against the mercado database (loaded in Chapter 2 with seed-sample.sql, so your output matches the book's).
A complete basic query has this shape:
SELECT <columns> -- WHAT to show (projection, π)
FROM <table> -- WHERE FROM (the relation)
WHERE <condition> -- WHICH rows (selection, σ)
ORDER BY <columns> -- in what order
LIMIT <n>; -- how many
You met SELECT, FROM, and WHERE as relational-algebra operations in Chapter 4 — projection (π), the source relation, and selection (σ). Now you'll use them for real.
The simplest query
FROM names the table; the SELECT list names the columns. The asterisk * means "all columns":
SELECT * FROM categories;
category_id | name | slug | parent_category_id | description
-------------+-------------+-------------+--------------------+----------------------------
1 | Electronics | electronics | | Devices and gadgets
2 | Computers | computers | 1 | Laptops, desktops, and parts
3 | Laptops | laptops | 2 | Portable computers
...
(8 rows)
Usually you want only some columns. List them, comma-separated, in the order you want them displayed:
SELECT name, price FROM products;
This is projection (π): you're choosing columns. Selecting only the columns you need is also a good habit for performance and clarity — more on that at the end.
Common mistake.
SELECT *in application code and saved reports. It's fine for interactive exploration, but in code it's fragile: if someone adds a column, your result shape changes unexpectedly, and you pull more data than you need over the network. Name the columns you actually want. (Interactively,*is your friend — use it to explore.)
Expressions and aliases in the SELECT list
The SELECT list isn't limited to bare column names. You can compute values, and you can rename outputs with AS (an alias):
SELECT
name,
price,
price * 0.9 AS sale_price, -- a computed column
ROUND(price * 0.9, 2) AS sale_rounded
FROM products;
name | price | sale_price | sale_rounded
-------------------+---------+------------+--------------
UltraBook 14 | 1299.00 | 1169.100 | 1169.10
UltraBook 16 Pro | 1899.00 | 1709.100 | 1709.10
...
The alias is the column's name in the result. Aliases make output readable and let you refer to computed columns in ORDER BY. You can also concatenate text with ||:
SELECT first_name || ' ' || last_name AS full_name, loyalty_tier
FROM customers;
Dialect Difference. String concatenation with
||is standard SQL and works in PostgreSQL, Oracle, and SQLite. MySQL usesCONCAT(a, b)by default (its||means OR unless configured otherwise), and SQL Server uses+. See Appendix J. TheASkeyword is itself optional —price * 0.9 sale_priceworks too — but writingASis clearer.
The logical order of evaluation
Here's something that confuses nearly everyone at first, and clears up a lot once understood. You write the clauses in this order:
SELECT … FROM … WHERE … ORDER BY … LIMIT …
But the database logically evaluates them in a different order:
1. FROM pick the source relation (and joins, Ch. 6)
2. WHERE keep rows matching the condition (σ)
3. SELECT compute and choose the output columns (π)
4. ORDER BY sort the result
5. LIMIT keep the first n rows
This is exactly the order of the algebra expression from Chapter 4: source → selection → projection → sort → limit. It explains several things that otherwise seem arbitrary:
- Why
WHEREcan't use aSELECT-list alias. BecauseWHEREruns before theSELECTlist is computed, an alias defined there doesn't exist yet:sql -- ERROR: column "sale_price" does not exist SELECT price * 0.9 AS sale_price FROM products WHERE sale_price < 100; -- Correct: repeat the expression, since WHERE runs first SELECT price * 0.9 AS sale_price FROM products WHERE price * 0.9 < 100; - Why
ORDER BYcan use an alias. BecauseORDER BYruns after theSELECTlist, the alias already exists:sql SELECT price * 0.9 AS sale_price FROM products ORDER BY sale_price; -- fine
Keep this order in your head; it resolves a surprising number of "why doesn't this work?" moments throughout Part II.
WHERE: filtering rows
The WHERE clause is selection (σ) — it keeps only the rows for which a condition is true. (Recall from Chapter 3: rows where the condition is NULL/unknown are excluded.)
Comparison operators
SELECT name, price FROM products WHERE price > 200;
SELECT * FROM orders WHERE status = 'delivered';
SELECT * FROM products WHERE price <> 1299.00; -- <> means "not equal"; != also works
The operators: = (equal), <> or != (not equal), <, >, <=, >=. They work on numbers, text (alphabetical/collation order), dates, and more.
Combining conditions: AND, OR, NOT
-- Active products under $100
SELECT name, price FROM products
WHERE price < 100 AND is_active = true;
-- Laptops OR desktops (by category_id)
SELECT name FROM products
WHERE category_id = 3 OR category_id = 4;
-- Everything except cancelled orders
SELECT order_id, status FROM orders
WHERE NOT status = 'cancelled';
Precedence matters, so use parentheses. AND binds tighter than OR, which causes a classic bug:
-- BUG: reads as category_id = 3 OR (category_id = 4 AND price < 600)
SELECT name, price, category_id FROM products
WHERE category_id = 3 OR category_id = 4 AND price < 600;
-- FIXED: parentheses make the intent explicit
SELECT name, price, category_id FROM products
WHERE (category_id = 3 OR category_id = 4) AND price < 600;
Common mistake. Relying on
AND/ORprecedence instead of parentheses. When a query mixesANDandOR, always parenthesize. It costs nothing and prevents a whole class of silent logic errors.
BETWEEN — ranges
BETWEEN is inclusive shorthand for two comparisons:
-- price >= 100 AND price <= 500 (both endpoints included)
SELECT name, price FROM products WHERE price BETWEEN 100 AND 500;
IN — membership in a list
IN tests whether a value is in a set — far cleaner than chaining ORs:
-- The verbose way:
SELECT name FROM products WHERE category_id = 3 OR category_id = 5 OR category_id = 8;
-- The clean way:
SELECT name FROM products WHERE category_id IN (3, 5, 8);
-- Works for text too:
SELECT order_id, status FROM orders WHERE status IN ('paid','shipped');
The list inside IN can also be a subquery — a powerful idea you'll meet in Chapter 9.
Common mistake.
NOT INwith a list that containsNULL. Because of three-valued logic (Chapter 3),x NOT IN (1, 2, NULL)is nevertrue— it returns no rows even when you expect matches. PreferNOT EXISTSfor "not in" against query results that might containNULL(Chapter 9).
LIKE — pattern matching
LIKE matches text against a pattern, with two wildcards: % (any sequence of characters, including none) and _ (exactly one character).
SELECT name FROM products WHERE name LIKE 'Ultra%'; -- starts with "Ultra"
SELECT email FROM customers WHERE email LIKE '%@example.com'; -- ends with that domain
SELECT name FROM products WHERE name LIKE '%Book%'; -- contains "Book"
SELECT sku FROM products WHERE sku LIKE 'LAP-00_'; -- LAP-00 then one char
LIKE is case-sensitive. PostgreSQL adds ILIKE for case-insensitive matching:
SELECT name FROM products WHERE name ILIKE 'ultrabook%'; -- matches "UltraBook…"
Dialect Difference.
ILIKEis PostgreSQL-specific. Standard SQL has onlyLIKE; for case-insensitive matching elsewhere you typically useLOWER(col) LIKE LOWER(pattern). For serious text search (ranking, stemming, large volumes),LIKE '%term%'is the wrong tool — you'll learn full-text search in Chapter 16.
IS NULL — testing for unknown
As you learned in Chapter 3, you cannot use = with NULL:
SELECT first_name, last_name FROM customers WHERE phone IS NULL; -- no phone
SELECT first_name, last_name FROM customers WHERE phone IS NOT NULL; -- has a phone
DISTINCT — removing duplicate rows
Because SQL uses bag semantics (Chapter 4), a projection can produce duplicate values. DISTINCT removes them:
SELECT category_id FROM products; -- 15 rows, category values repeat
SELECT DISTINCT category_id FROM products; -- one row per distinct category_id
DISTINCT applies to the whole row of selected columns, not just the first column:
-- distinct COMBINATIONS of (loyalty_tier, the existence of a phone)
SELECT DISTINCT loyalty_tier, (phone IS NOT NULL) AS has_phone FROM customers;
Performance note.
DISTINCTisn't free — the database must sort or hash the rows to find duplicates. Don't sprinkle it in "just in case." If you're getting unexpected duplicates, the usual cause is a join producing more rows than you expected (Chapter 6), and the right fix is usually to fix the join, not to paper over it withDISTINCT.
ORDER BY — sorting the result
Recall the lesson from Chapter 3: rows have no inherent order. If you want a specific order, you must ask for it. ORDER BY sorts the result:
SELECT name, price FROM products ORDER BY price; -- ascending (default)
SELECT name, price FROM products ORDER BY price DESC; -- descending
Sort by multiple columns — the second breaks ties in the first:
-- by tier, and within each tier alphabetically by last name
SELECT last_name, first_name, loyalty_tier
FROM customers
ORDER BY loyalty_tier, last_name;
You can sort by an alias, by an expression, or by SELECT-list position (though naming is clearer than position):
SELECT name, price * 0.9 AS sale_price FROM products ORDER BY sale_price DESC;
Where do NULLs sort? In PostgreSQL, NULLs sort last in ascending order and first in descending — you can control this explicitly:
SELECT first_name, phone FROM customers ORDER BY phone NULLS FIRST;
Dialect Difference.
NULLS FIRST/NULLS LASTis standard but databases differ in their defaults (PostgreSQL and Oracle putNULLs last on ascending; MySQL and SQL Server put them first). If null ordering matters, state it explicitly.
LIMIT and OFFSET — just some of the rows
LIMIT caps how many rows come back; OFFSET skips rows first. Together they're the basis of "top N" and pagination:
SELECT name, price FROM products ORDER BY price DESC LIMIT 3; -- 3 most expensive
SELECT name, price FROM products ORDER BY price DESC LIMIT 3 OFFSET 3; -- the next 3
name | price
-------------------+---------
UltraBook 16 Pro | 1899.00
TowerPro Desktop | 1499.00
UltraBook 14 | 1299.00
(3 rows)
Common mistake.
LIMITwithoutORDER BY. "Give me 10 rows" with no order returns some 10 rows — and which 10 can change between runs, because rows have no inherent order. A "top 10" query is only meaningful with anORDER BYthat defines what "top" means. Always pairLIMITwithORDER BY.Dialect Difference.
LIMIT/OFFSETis used by PostgreSQL, MySQL, and SQLite. The SQL standard spells itFETCH FIRST n ROWS ONLY(also supported by PostgreSQL); SQL Server usesTOPorOFFSET … FETCH; older Oracle usedROWNUM. See Appendix J.
Comments and formatting
Document and format your SQL — future-you will thank you:
-- This is a single-line comment.
/* This is a
multi-line comment. */
SELECT
name, -- the product's display name
price -- in dollars
FROM products
WHERE is_active -- boolean column; "= true" is optional
ORDER BY price DESC;
Note WHERE is_active — for a boolean column, the = true is redundant. Readable formatting (one clause per line, aligned) isn't cosmetic; it's how you catch logic errors in complex queries later.
Putting it together
A few realistic queries that combine everything:
-- The 5 cheapest active products, name and price:
SELECT name, price
FROM products
WHERE is_active = true
ORDER BY price
LIMIT 5;
-- Gold and platinum customers whose last name starts with N–Z, alphabetized:
SELECT first_name, last_name, loyalty_tier
FROM customers
WHERE loyalty_tier IN ('gold','platinum')
AND last_name >= 'N'
ORDER BY last_name, first_name;
-- Orders placed in 2024 that aren't cancelled, newest first, top 10:
SELECT order_id, customer_id, order_date, status
FROM orders
WHERE order_date >= DATE '2024-01-01'
AND status <> 'cancelled'
ORDER BY order_date DESC
LIMIT 10;
Notice how each reads almost like its English description. That's the goal: a query you can read aloud and understand. When you can do that reliably, you're fluent in basic SQL.
Literals, quoting, and type casts
Every value you type directly into a query — a number, a piece of text, a date — is a literal, and PostgreSQL is particular about how you write each kind. Getting this right eliminates a surprising share of beginner errors.
Text literals use single quotes. Always.
SELECT * FROM customers WHERE loyalty_tier = 'gold'; -- correct: single quotes
Double quotes mean something entirely different in SQL: they quote identifiers (table and column names), not text values. WHERE loyalty_tier = "gold" doesn't compare to the string "gold" — it tries to compare the column to a column named gold, and you get ERROR: column "gold" does not exist. This trips up everyone arriving from other programming languages, where the two quote styles are interchangeable. In SQL they are not: single quotes for data, double quotes for names (and you rarely need double quotes at all, since lowercase unquoted names work fine).
Apostrophes inside text are escaped by doubling them:
-- A product literally named O'Brien's Special → double the inner apostrophe:
SELECT * FROM products WHERE name = 'O''Brien''s Special';
(This doubling is also the seed of a security lesson: building SQL by pasting user text between quotes is how injection happens. We hold that thought for Chapter 32 — the fix is parameterized queries, not careful doubling.)
Number literals are written bare — 200, 19.99, -5. No quotes. Quoting a number ('200') makes it text, which PostgreSQL will often coerce back to a number, but relying on that is a habit worth avoiding.
Casts convert a value from one type to another, with the ::type shorthand (PostgreSQL-specific) or the standard CAST(value AS type):
SELECT '2024-03-15'::date; -- text → date (PostgreSQL shorthand)
SELECT CAST('42' AS integer); -- the standard, portable spelling
SELECT price::text || ' USD' AS label FROM products; -- number → text to concatenate
You'll cast constantly — to compare across types, to format output, to force exact arithmetic. When a query complains that an operator does not exist for two types, a cast is usually the fix.
Dialect Difference. The
::cast is PostgreSQL's;CAST(... AS ...)is the SQL standard and works everywhere. Identifier quoting with double quotes is standard, but several databases also accept backticks (MySQL) or square brackets (SQL Server). Stick to standard double quotes — or better, lowercase unquoted names — for portability.
Filtering on dates and times
A large fraction of real queries filter by when something happened — orders this month, customers who joined last year, shipments overdue today. PostgreSQL's date/time handling is excellent, and a few patterns cover most needs.
Write a date literal with the DATE keyword (or cast a string), and compare with the ordinary operators:
-- Orders placed on or after a specific date:
SELECT order_id, order_date FROM orders
WHERE order_date >= DATE '2024-01-01';
-- Orders in a specific month — note the half-open range, which is the
-- correct and bug-free way to express "all of January":
SELECT order_id, order_date FROM orders
WHERE order_date >= DATE '2024-01-01'
AND order_date < DATE '2024-02-01';
That half-open pattern (>= start AND < next-start) deserves emphasis. It's tempting to write BETWEEN '2024-01-01' AND '2024-01-31', but orders.order_date is a timestamptz — it carries a time, not just a day — so an order placed at 2024-01-31 14:30 is after midnight on the 31st and BETWEEN would still catch it, while an order at 2024-01-31 23:59 would too, yet you might accidentally exclude the boundary depending on times. The half-open >= Jan 1 AND < Feb 1 is unambiguous: it captures every instant in January and nothing else, regardless of time-of-day. Burn this pattern into muscle memory; it prevents a classic off-by-one-day reporting bug.
PostgreSQL also gives you the current moment and interval arithmetic:
SELECT now(); -- current date and time (timestamptz)
SELECT CURRENT_DATE; -- just today's date
-- Customers who joined in the last 90 days:
SELECT first_name, last_name, created_at FROM customers
WHERE created_at >= now() - INTERVAL '90 days';
-- Orders older than one year:
SELECT order_id FROM orders
WHERE order_date < now() - INTERVAL '1 year';
And you can extract parts of a date for filtering or grouping (you'll group by these in Chapter 7):
-- All orders placed in the year 2024, however you stored the time:
SELECT order_id FROM orders WHERE EXTRACT(YEAR FROM order_date) = 2024;
Common mistake. Using
EXTRACT(YEAR FROM order_date) = 2024when you have an index onorder_date. Wrapping the column in a function can prevent the database from using that index (Chapter 23), forcing a full scan. The half-open range (>= '2024-01-01' AND < '2025-01-01') returns the same rows and stays index-friendly. Prefer ranges over function-wrapping when filtering large tables — a perfect preview of theme #5.
CASE: choosing a value per row
Sometimes you want the SELECT list to compute different values depending on a condition — the SQL equivalent of an if/else. That's the CASE expression, and it's one of the most useful tools in the language:
SELECT
name,
price,
CASE
WHEN price >= 1000 THEN 'premium'
WHEN price >= 200 THEN 'mid-range'
ELSE 'budget'
END AS price_band
FROM products
ORDER BY price DESC;
name | price | price_band
-------------------+---------+------------
UltraBook 16 Pro | 1899.00 | premium
TowerPro Desktop | 1499.00 | premium
NordicSound Headp | 249.00 | mid-range
USB-C Cable | 12.00 | budget
CASE walks its WHEN clauses top to bottom and returns the first match's value (or the ELSE, or NULL if there's no ELSE). It works anywhere an expression is allowed — the SELECT list, ORDER BY, even WHERE — and it's how you bucket, label, and conditionally transform data without leaving SQL. There's also a shorter form for equality checks:
SELECT order_id,
CASE status
WHEN 'delivered' THEN 'done'
WHEN 'cancelled' THEN 'done'
ELSE 'in progress'
END AS phase
FROM orders;
You'll reach for CASE constantly — to create report categories, to pivot data (Chapter 7), and to handle special values inline. It pairs beautifully with the aggregation you'll learn next, where CASE inside SUM becomes a powerful counting trick.
Three-valued logic in WHERE, made concrete
Chapter 3 introduced three-valued logic; now that you're filtering for real, let's make its effect on WHERE precise, because it causes bugs that are maddening until you understand them. Recall there are three truth values — true, false, and unknown (the result of any comparison involving NULL) — and WHERE returns a row only when its condition evaluates to exactly true. unknown is treated like false: the row is dropped.
Here's how the connectives behave when unknown (call it U) is involved:
AND | T F U OR | T F U NOT
------+--------- -----+--------- -----
T | T F U T | T T T T → F
F | F F F F | T F U F → T
U | U F U U | T U U U → U
The practical consequences bite in real queries:
-- Intends "every order not cancelled". If status could be NULL, those
-- rows yield NULL <> 'cancelled' = U, and are DROPPED — silently missing!
SELECT order_id FROM orders WHERE status <> 'cancelled';
-- Safe version when NULLs are possible — handle them explicitly:
SELECT order_id FROM orders
WHERE status <> 'cancelled' OR status IS NULL;
This is why Chapter 3 urged you to declare columns NOT NULL whenever "missing" is meaningless: it removes unknown from the picture entirely, so your filters behave the way intuition expects. (Mercado's status is NOT NULL, so the first query is safe there — but in the wild, always ask "can this column be NULL?" before writing <> or NOT IN.)
Why this matters. Almost every "my
WHEREclause is mysteriously dropping rows" bug is three-valued logic meeting an unexpectedNULL. Internalize the truth tables and the habit of asking about nullability, and you'll debug in seconds what otherwise costs hours.
A wider gallery of queries
Fluency is pattern recognition, so here's a broader set of realistic single-table queries. Read each English description, predict the SQL, then check yourself:
-- "The 3 most expensive active products in category 3."
SELECT name, price FROM products
WHERE is_active = true AND category_id = 3
ORDER BY price DESC
LIMIT 3;
-- "Customers whose email is at example.com, alphabetized by last name."
SELECT last_name, first_name, email FROM customers
WHERE email ILIKE '%@example.com'
ORDER BY last_name;
-- "All products priced between $50 and $150, labeled by price band."
SELECT name, price,
CASE WHEN price < 100 THEN 'under 100' ELSE '100 and up' END AS band
FROM products
WHERE price BETWEEN 50 AND 150
ORDER BY price;
-- "Orders from 2024 that are paid or shipped, newest first, second page of 10."
SELECT order_id, order_date, status FROM orders
WHERE order_date >= DATE '2024-01-01' AND order_date < DATE '2025-01-01'
AND status IN ('paid','shipped')
ORDER BY order_date DESC
LIMIT 10 OFFSET 10;
-- "Distinct loyalty tiers actually present among customers."
SELECT DISTINCT loyalty_tier FROM customers ORDER BY loyalty_tier;
-- "Products whose SKU follows the laptop pattern LAP-0NN."
SELECT sku, name FROM products WHERE sku LIKE 'LAP-0__' ORDER BY sku;
Each of these is just the same skeleton — SELECT (π) the columns, FROM the table, WHERE (σ) the rows, then order and limit — varied by which operators fill the WHERE. There is no eighth secret clause; mastery of basic queries really is mastery of these pieces, recombined. The reason to type all six is theme #2 once more: the patterns move from your eyes to your fingers only by writing them.
Try this. Take the last query and break it on purpose: change
LIKE 'LAP-0__'toLIKE 'LAP-0%'and explain the difference in results (_is exactly one character;%is any number). Then add aCASEcolumn that flags products over $1000 as'flagship'. Deliberately modifying a working query — and predicting the change before you run it — is one of the fastest ways to build real intuition.
A performance note (a preview of Chapter 23)
The WHERE clause is where performance is won or lost. When you filter WHERE customer_id = 5, the database has two options: scan every row checking the condition (a sequential scan), or jump straight to the matching rows using an index (an index scan). On a small table it doesn't matter; on millions of rows it's the difference between 12 milliseconds and 45 seconds — the book's recurring anchor.
You don't need to act on this yet. Just register the idea, theme #5: the conditions you put in WHERE interact with indexes to determine speed. When we reach Chapters 23–24, you'll learn exactly which WHERE patterns can use an index (e.g., price > 200 can; WHERE LOWER(name) = '…' might not without help) and how to read the plan. For now: write correct queries, and know that how you filter will later matter for how fast they run.
SELECT without a table, and other small surprises
A SELECT doesn't strictly need a FROM at all. PostgreSQL is happy to evaluate a bare expression, which makes the prompt a handy calculator and a quick way to test how a function or an operator behaves before you trust it in a real query:
SELECT 2 + 2; -- 4
SELECT 'Hello, ' || 'world'; -- Hello, world
SELECT now(), CURRENT_DATE; -- the current timestamp and date
SELECT ROUND(19.99 * 0.9, 2); -- 17.99 — test a calculation in isolation
This habit — pulling an expression out of a complex query and testing it on its own with a no-FROM SELECT — is one of the most practical debugging moves you'll learn. When a computed column looks wrong, isolate just that expression, feed it known inputs, and confirm it does what you think before plugging it back in. Small surprises lurk in expressions, and isolating them is how you find them fast.
Speaking of small surprises: a few behaviors catch nearly everyone at least once. Knowing them now saves a confusing afternoon later.
- Integer division truncates.
SELECT 7 / 2;returns3, not3.5, because both operands are integers and integer division discards the remainder. To get a decimal, make at least one side non-integer:7 / 2.0, or cast:7::numeric / 2. This single gotcha is behind a remarkable number of "my percentages are all zero" bugs —count_a / count_bis0whenevercount_a < count_bbecause both are integers. NULLis contagious in arithmetic. Any arithmetic withNULLyieldsNULL:price * NULL,5 + NULL, evenNULL || 'text'all produceNULL(the absence of a value poisons the whole expression). If a computed column is mysteriously empty for some rows, aNULLinput is the usual culprit — wrap the nullable input inCOALESCEto supply a fallback.- Division by zero is an error, not
NULL.SELECT 1 / 0;raisesERROR: division by zeroand aborts the statement. When a denominator might be zero, guard it — aCASEorNULLIF(denominator, 0)(which turns a zero intoNULL, making the whole divisionNULLrather than an error) is the standard trick.
-- Safe average price per item, guarding against a zero quantity:
SELECT order_id,
SUM(unit_price * quantity) / NULLIF(SUM(quantity), 0) AS avg_unit_price
FROM order_items
GROUP BY order_id; -- (grouping is Chapter 7; note the NULLIF guard)
Common mistake. Computing a ratio of two counts and getting
0everywhere. It's integer division:successes / attemptswith both integers truncates to zero whenever successes < attempts. Cast one side —successes::numeric / attempts— and the percentages appear. This is so common it's worth checking first whenever a rate or percentage looks suspiciously like all zeros.
Equivalent spellings: when several queries say the same thing
SQL often gives you more than one way to express the same filter, and part of fluency is choosing the spelling that reads most clearly. Consider "products in category 3, 5, or 8." Three equivalent forms:
WHERE category_id = 3 OR category_id = 5 OR category_id = 8 -- explicit, verbose
WHERE category_id IN (3, 5, 8) -- the clean idiom
WHERE category_id = ANY (ARRAY[3, 5, 8]) -- the array form
All three return identical results. The IN (...) form is almost always the most readable for a literal list, so prefer it. The = ANY (array) form looks odd here, but it becomes invaluable later, when the set of values lives in an array variable passed from application code (Chapter 29) — you can't drop an array into an IN list, but = ANY takes one directly. Knowing they're equivalent means you can pick whichever fits the situation without changing the meaning.
Likewise, a range has two spellings:
WHERE price >= 100 AND price <= 500 -- two comparisons
WHERE price BETWEEN 100 AND 500 -- inclusive shorthand, identical result
BETWEEN is tidy and reads well — just remember it's inclusive on both ends, which is occasionally not what you want (recall the date-range discussion, where a half-open >= ... AND < ... is safer for timestamps). The skill isn't memorizing which form is "correct" — they're equal — but developing taste for which one makes a given query easiest for the next human to read. Readable SQL is correct SQL that stays correct, because people can see what it means.
A reference: the operators you'll use in WHERE
For quick recall, here is the working set of filtering tools from this chapter in one place. Keep it handy until they're reflexes:
| Operator | Meaning | Example |
|---|---|---|
= <> != |
equal / not equal | status = 'paid' |
< > <= >= |
ordered comparison | price >= 200 |
AND OR NOT |
combine conditions (parenthesize when mixing) | a AND (b OR c) |
BETWEEN x AND y |
inclusive range | price BETWEEN 100 AND 500 |
IN (…) |
membership in a list | category_id IN (3,5,8) |
NOT IN (…) |
absence — beware NULL in the list |
id NOT IN (1,2,3) |
LIKE / ILIKE |
pattern match (%, _); ILIKE is case-insensitive |
name LIKE 'Ultra%' |
IS NULL / IS NOT NULL |
test for unknown/absent | phone IS NULL |
IS DISTINCT FROM |
NULL-safe "not equal" |
phone IS DISTINCT FROM '555' |
A few notes that elevate this from a list to working knowledge:
- Comparisons respect type. On text,
<and>follow collation (roughly alphabetical); on dates, they follow chronology; on numbers, magnitude.WHERE last_name >= 'N'works because text is ordered. NOTcan wrap any condition, but it's often clearer to negate the operator instead:NOT price > 200versusprice <= 200. Prefer the form a human reads fastest.- The dangerous trio with
NULL—<>,NOT IN, andNOT BETWEEN— can all silently drop rows where the column isNULL, thanks to three-valued logic. When a column is nullable, pair these with an explicitOR col IS NULL, or useIS DISTINCT FROM. LIKEanchoring matters for performance.LIKE 'Ultra%'(anchored at the start) can use an index;LIKE '%book'(leading wildcard) generally cannot, because the database can't know where to start looking. Hold that thought for Chapter 23.
This compact toolkit, combined with AND/OR and the logical-evaluation order, is genuinely most of what WHERE ever needs. Everything fancier — subquery filters (Chapter 9), full-text search (Chapter 16) — builds on exactly these foundations. Master the small set here, and the large set later will feel like natural extensions rather than new material.
A method for fixing a query that's wrong
When a query returns the wrong rows — too many, too few, or strange values — flailing at it rarely helps. A reliable method, which leans on everything in this chapter:
- Check the row count first. Far more rows than expected often means an accidental cross join (Chapter 6) or a missing
WHEREcondition. Far fewer often means aNULLinteracting with<>orNOT IN(three-valued logic), or an over-tightAND. - Strip it down, then build it back up. Remove clauses until it works, then add them back one at a time. The clause that breaks it is the culprit. Start with just
SELECT * FROM table LIMIT 5to confirm the data looks how you expect, then layer on theWHERE. - Isolate suspicious expressions with a no-
FROMSELECT, as above. If aCASEor a calculation is misbehaving, test it on known inputs alone. - Make
NULLs visible (\pset null '∅'from Chapter 2) so you can see whether a blank is an empty string or aNULL— they behave completely differently in filters. - Read the clauses in logical order, not written order:
FROM, thenWHERE, thenSELECT. Many "why doesn't my alias work inWHERE?" puzzles dissolve the moment you rememberWHEREruns before theSELECTlist exists.
This is a debugging discipline, and it pays off for the rest of the book — the queries only get more complex, but the method scales. Most query bugs are not exotic; they're a NULL you forgot, a precedence you assumed, a join condition you dropped, or an integer division you didn't expect. The method above catches all four.
Why this matters. Theme #3 — understand the WHY — is what turns debugging from guesswork into diagnosis. When you know why
WHEREcan't see an alias, why<>dropsNULLrows, and why7/2is3, a wrong result stops being a mystery and becomes a short checklist. That shift, from hoping to knowing, is the whole point of learning the fundamentals rather than just the syntax.
Progressive project: your first real queries
Against your project database (or, if it's still empty, against Mercado as a stand-in), answer five of the questions you listed back in Chapter 1 using only SELECT/FROM/WHERE/ORDER BY/LIMIT. For example, for a library: "the 10 most recently added books," "all members in the 'student' category," "books priced (or fined) above a threshold." For each:
- Write the query.
- Identify the projection (π — your
SELECTlist) and the selection (σ — yourWHERE). - Add an
ORDER BYthat makes the result genuinely useful, and aLIMITif it's a "top N."
Save them in project-notes.md. These are the first queries of the application you're building — and you'll extend them with joins and aggregation in the chapters ahead.
Summary
The SELECT statement is SQL's workhorse. FROM chooses the source relation; the SELECT list is projection (π — which columns, plus computed expressions and AS aliases); WHERE is selection (σ — which rows). The clauses are evaluated FROM → WHERE → SELECT → ORDER BY → LIMIT, which is why WHERE can't see SELECT-list aliases but ORDER BY can. WHERE conditions combine comparison operators with AND/OR/NOT (parenthesize when mixing!), plus BETWEEN, IN, LIKE/ILIKE, and IS NULL. DISTINCT removes duplicate result rows; ORDER BY imposes order (rows have none by default); LIMIT/OFFSET take a slice (always with ORDER BY). Write these constantly until they're automatic — everything in Part II builds on them.
You can now: - Write
SELECT … FROM … WHERE … ORDER BY … LIMITqueries fluently. - Project columns, compute expressions, and alias withAS. - Filter with comparisons,AND/OR/NOT(with correct parenthesization),BETWEEN,IN,LIKE/ILIKE, andIS NULL. - Explain the logical order of evaluation and why it governs alias visibility. - UseDISTINCT,ORDER BY(includingNULLS FIRST/LAST), andLIMIT/OFFSETcorrectly. - Recognize thatWHEREconditions will later interact with indexes for performance.
What's next. Chapter 6 — JOINs — the most important SQL skill, where you connect Mercado's tables and answer questions that span customers, orders, and products. You saw a join as "product + selection" in Chapter 4; now you'll wield every kind, building from a two-table join to a six-table business query.
Practice in exercises.md, test yourself with the quiz, apply it in the case studies, review the key takeaways, and go deeper with further reading.