22 min read

> Where you are: Part II, Chapter 5 of 40 — the first chapter of SQL Mastery, the heart of the book. You know the model (Ch. 3) and the algebra (Ch. 4). Now you write SQL in earnest. By the end of this chapter, basic queries should start to feel...

Chapter 5: SELECT, FROM, WHERE — Your First Queries

Where you are: Part II, Chapter 5 of 40 — the first chapter of SQL Mastery, the heart of the book. You know the model (Ch. 3) and the algebra (Ch. 4). Now you write SQL in earnest. By the end of this chapter, basic queries should start to feel automatic.

Learning paths: 💻 📊 🔬 🏗️ — everyone, no exceptions. Every later chapter assumes total fluency with what's here. Type every example.


The query you'll write ten thousand times

The SELECT statement is the workhorse of SQL. You will write it more than any other single thing in your database career — to answer questions, to check your work, to debug, to build reports, to feed applications. Master its three core clauses — SELECT, FROM, WHERE — and you can already answer a huge fraction of the questions anyone will ever ask of a database.

This is theme #2 in full force: SQL is a language, learned by writing it. Reading this chapter will teach you about SELECT. Typing every query, breaking it, and fixing it will teach you SELECT. Keep a psql window open and follow along against the mercado database (loaded in Chapter 2 with seed-sample.sql, so your output matches the book's).

A complete basic query has this shape:

SELECT   <columns>      -- WHAT to show     (projection, π)
FROM     <table>        -- WHERE FROM       (the relation)
WHERE    <condition>    -- WHICH rows       (selection, σ)
ORDER BY <columns>      -- in what order
LIMIT    <n>;           -- how many

You met SELECT, FROM, and WHERE as relational-algebra operations in Chapter 4 — projection (π), the source relation, and selection (σ). Now you'll use them for real.


The simplest query

FROM names the table; the SELECT list names the columns. The asterisk * means "all columns":

SELECT * FROM categories;
 category_id |    name     |    slug     | parent_category_id |        description
-------------+-------------+-------------+--------------------+----------------------------
           1 | Electronics | electronics |                    | Devices and gadgets
           2 | Computers   | computers   |                  1 | Laptops, desktops, and parts
           3 | Laptops     | laptops     |                  2 | Portable computers
 ...
(8 rows)

Usually you want only some columns. List them, comma-separated, in the order you want them displayed:

SELECT name, price FROM products;

This is projection (π): you're choosing columns. Selecting only the columns you need is also a good habit for performance and clarity — more on that at the end.

Common mistake. SELECT * in application code and saved reports. It's fine for interactive exploration, but in code it's fragile: if someone adds a column, your result shape changes unexpectedly, and you pull more data than you need over the network. Name the columns you actually want. (Interactively, * is your friend — use it to explore.)

Expressions and aliases in the SELECT list

The SELECT list isn't limited to bare column names. You can compute values, and you can rename outputs with AS (an alias):

SELECT
    name,
    price,
    price * 0.9 AS sale_price,           -- a computed column
    ROUND(price * 0.9, 2) AS sale_rounded
FROM products;
       name        |  price  | sale_price | sale_rounded
-------------------+---------+------------+--------------
 UltraBook 14      | 1299.00 | 1169.100   |      1169.10
 UltraBook 16 Pro  | 1899.00 | 1709.100   |      1709.10
 ...

The alias is the column's name in the result. Aliases make output readable and let you refer to computed columns in ORDER BY. You can also concatenate text with ||:

SELECT first_name || ' ' || last_name AS full_name, loyalty_tier
FROM customers;

Dialect Difference. String concatenation with || is standard SQL and works in PostgreSQL, Oracle, and SQLite. MySQL uses CONCAT(a, b) by default (its || means OR unless configured otherwise), and SQL Server uses +. See Appendix J. The AS keyword is itself optional — price * 0.9 sale_price works too — but writing AS is clearer.


The logical order of evaluation

Here's something that confuses nearly everyone at first, and clears up a lot once understood. You write the clauses in this order:

SELECT … FROM … WHERE … ORDER BY … LIMIT …

But the database logically evaluates them in a different order:

   1. FROM      pick the source relation (and joins, Ch. 6)
   2. WHERE     keep rows matching the condition          (σ)
   3. SELECT    compute and choose the output columns     (π)
   4. ORDER BY  sort the result
   5. LIMIT     keep the first n rows

This is exactly the order of the algebra expression from Chapter 4: source → selection → projection → sort → limit. It explains several things that otherwise seem arbitrary:

  • Why WHERE can't use a SELECT-list alias. Because WHERE runs before the SELECT list is computed, an alias defined there doesn't exist yet: sql -- ERROR: column "sale_price" does not exist SELECT price * 0.9 AS sale_price FROM products WHERE sale_price < 100; -- Correct: repeat the expression, since WHERE runs first SELECT price * 0.9 AS sale_price FROM products WHERE price * 0.9 < 100;
  • Why ORDER BY can use an alias. Because ORDER BY runs after the SELECT list, the alias already exists: sql SELECT price * 0.9 AS sale_price FROM products ORDER BY sale_price; -- fine

Keep this order in your head; it resolves a surprising number of "why doesn't this work?" moments throughout Part II.


WHERE: filtering rows

The WHERE clause is selection (σ) — it keeps only the rows for which a condition is true. (Recall from Chapter 3: rows where the condition is NULL/unknown are excluded.)

Comparison operators

SELECT name, price FROM products WHERE price > 200;
SELECT * FROM orders WHERE status = 'delivered';
SELECT * FROM products WHERE price <> 1299.00;   -- <> means "not equal"; != also works

The operators: = (equal), <> or != (not equal), <, >, <=, >=. They work on numbers, text (alphabetical/collation order), dates, and more.

Combining conditions: AND, OR, NOT

-- Active products under $100
SELECT name, price FROM products
WHERE price < 100 AND is_active = true;

-- Laptops OR desktops (by category_id)
SELECT name FROM products
WHERE category_id = 3 OR category_id = 4;

-- Everything except cancelled orders
SELECT order_id, status FROM orders
WHERE NOT status = 'cancelled';

Precedence matters, so use parentheses. AND binds tighter than OR, which causes a classic bug:

-- BUG: reads as  category_id = 3  OR  (category_id = 4 AND price < 600)
SELECT name, price, category_id FROM products
WHERE category_id = 3 OR category_id = 4 AND price < 600;

-- FIXED: parentheses make the intent explicit
SELECT name, price, category_id FROM products
WHERE (category_id = 3 OR category_id = 4) AND price < 600;

Common mistake. Relying on AND/OR precedence instead of parentheses. When a query mixes AND and OR, always parenthesize. It costs nothing and prevents a whole class of silent logic errors.

BETWEEN — ranges

BETWEEN is inclusive shorthand for two comparisons:

-- price >= 100 AND price <= 500  (both endpoints included)
SELECT name, price FROM products WHERE price BETWEEN 100 AND 500;

IN — membership in a list

IN tests whether a value is in a set — far cleaner than chaining ORs:

-- The verbose way:
SELECT name FROM products WHERE category_id = 3 OR category_id = 5 OR category_id = 8;
-- The clean way:
SELECT name FROM products WHERE category_id IN (3, 5, 8);

-- Works for text too:
SELECT order_id, status FROM orders WHERE status IN ('paid','shipped');

The list inside IN can also be a subquery — a powerful idea you'll meet in Chapter 9.

Common mistake. NOT IN with a list that contains NULL. Because of three-valued logic (Chapter 3), x NOT IN (1, 2, NULL) is never true — it returns no rows even when you expect matches. Prefer NOT EXISTS for "not in" against query results that might contain NULL (Chapter 9).

LIKE — pattern matching

LIKE matches text against a pattern, with two wildcards: % (any sequence of characters, including none) and _ (exactly one character).

SELECT name FROM products WHERE name LIKE 'Ultra%';   -- starts with "Ultra"
SELECT email FROM customers WHERE email LIKE '%@example.com';  -- ends with that domain
SELECT name FROM products WHERE name LIKE '%Book%';    -- contains "Book"
SELECT sku FROM products WHERE sku LIKE 'LAP-00_';     -- LAP-00 then one char

LIKE is case-sensitive. PostgreSQL adds ILIKE for case-insensitive matching:

SELECT name FROM products WHERE name ILIKE 'ultrabook%';   -- matches "UltraBook…"

Dialect Difference. ILIKE is PostgreSQL-specific. Standard SQL has only LIKE; for case-insensitive matching elsewhere you typically use LOWER(col) LIKE LOWER(pattern). For serious text search (ranking, stemming, large volumes), LIKE '%term%' is the wrong tool — you'll learn full-text search in Chapter 16.

IS NULL — testing for unknown

As you learned in Chapter 3, you cannot use = with NULL:

SELECT first_name, last_name FROM customers WHERE phone IS NULL;       -- no phone
SELECT first_name, last_name FROM customers WHERE phone IS NOT NULL;   -- has a phone

DISTINCT — removing duplicate rows

Because SQL uses bag semantics (Chapter 4), a projection can produce duplicate values. DISTINCT removes them:

SELECT category_id FROM products;            -- 15 rows, category values repeat
SELECT DISTINCT category_id FROM products;   -- one row per distinct category_id

DISTINCT applies to the whole row of selected columns, not just the first column:

-- distinct COMBINATIONS of (loyalty_tier, the existence of a phone)
SELECT DISTINCT loyalty_tier, (phone IS NOT NULL) AS has_phone FROM customers;

Performance note. DISTINCT isn't free — the database must sort or hash the rows to find duplicates. Don't sprinkle it in "just in case." If you're getting unexpected duplicates, the usual cause is a join producing more rows than you expected (Chapter 6), and the right fix is usually to fix the join, not to paper over it with DISTINCT.


ORDER BY — sorting the result

Recall the lesson from Chapter 3: rows have no inherent order. If you want a specific order, you must ask for it. ORDER BY sorts the result:

SELECT name, price FROM products ORDER BY price;          -- ascending (default)
SELECT name, price FROM products ORDER BY price DESC;     -- descending

Sort by multiple columns — the second breaks ties in the first:

-- by tier, and within each tier alphabetically by last name
SELECT last_name, first_name, loyalty_tier
FROM customers
ORDER BY loyalty_tier, last_name;

You can sort by an alias, by an expression, or by SELECT-list position (though naming is clearer than position):

SELECT name, price * 0.9 AS sale_price FROM products ORDER BY sale_price DESC;

Where do NULLs sort? In PostgreSQL, NULLs sort last in ascending order and first in descending — you can control this explicitly:

SELECT first_name, phone FROM customers ORDER BY phone NULLS FIRST;

Dialect Difference. NULLS FIRST/NULLS LAST is standard but databases differ in their defaults (PostgreSQL and Oracle put NULLs last on ascending; MySQL and SQL Server put them first). If null ordering matters, state it explicitly.


LIMIT and OFFSET — just some of the rows

LIMIT caps how many rows come back; OFFSET skips rows first. Together they're the basis of "top N" and pagination:

SELECT name, price FROM products ORDER BY price DESC LIMIT 3;        -- 3 most expensive
SELECT name, price FROM products ORDER BY price DESC LIMIT 3 OFFSET 3;  -- the next 3
       name        |  price
-------------------+---------
 UltraBook 16 Pro  | 1899.00
 TowerPro Desktop  | 1499.00
 UltraBook 14      | 1299.00
(3 rows)

Common mistake. LIMIT without ORDER BY. "Give me 10 rows" with no order returns some 10 rows — and which 10 can change between runs, because rows have no inherent order. A "top 10" query is only meaningful with an ORDER BY that defines what "top" means. Always pair LIMIT with ORDER BY.

Dialect Difference. LIMIT/OFFSET is used by PostgreSQL, MySQL, and SQLite. The SQL standard spells it FETCH FIRST n ROWS ONLY (also supported by PostgreSQL); SQL Server uses TOP or OFFSET … FETCH; older Oracle used ROWNUM. See Appendix J.


Comments and formatting

Document and format your SQL — future-you will thank you:

-- This is a single-line comment.
/* This is a
   multi-line comment. */
SELECT
    name,           -- the product's display name
    price           -- in dollars
FROM products
WHERE is_active     -- boolean column; "= true" is optional
ORDER BY price DESC;

Note WHERE is_active — for a boolean column, the = true is redundant. Readable formatting (one clause per line, aligned) isn't cosmetic; it's how you catch logic errors in complex queries later.


Putting it together

A few realistic queries that combine everything:

-- The 5 cheapest active products, name and price:
SELECT name, price
FROM products
WHERE is_active = true
ORDER BY price
LIMIT 5;

-- Gold and platinum customers whose last name starts with N–Z, alphabetized:
SELECT first_name, last_name, loyalty_tier
FROM customers
WHERE loyalty_tier IN ('gold','platinum')
  AND last_name >= 'N'
ORDER BY last_name, first_name;

-- Orders placed in 2024 that aren't cancelled, newest first, top 10:
SELECT order_id, customer_id, order_date, status
FROM orders
WHERE order_date >= DATE '2024-01-01'
  AND status <> 'cancelled'
ORDER BY order_date DESC
LIMIT 10;

Notice how each reads almost like its English description. That's the goal: a query you can read aloud and understand. When you can do that reliably, you're fluent in basic SQL.


Literals, quoting, and type casts

Every value you type directly into a query — a number, a piece of text, a date — is a literal, and PostgreSQL is particular about how you write each kind. Getting this right eliminates a surprising share of beginner errors.

Text literals use single quotes. Always.

SELECT * FROM customers WHERE loyalty_tier = 'gold';   -- correct: single quotes

Double quotes mean something entirely different in SQL: they quote identifiers (table and column names), not text values. WHERE loyalty_tier = "gold" doesn't compare to the string "gold" — it tries to compare the column to a column named gold, and you get ERROR: column "gold" does not exist. This trips up everyone arriving from other programming languages, where the two quote styles are interchangeable. In SQL they are not: single quotes for data, double quotes for names (and you rarely need double quotes at all, since lowercase unquoted names work fine).

Apostrophes inside text are escaped by doubling them:

-- A product literally named  O'Brien's Special  →  double the inner apostrophe:
SELECT * FROM products WHERE name = 'O''Brien''s Special';

(This doubling is also the seed of a security lesson: building SQL by pasting user text between quotes is how injection happens. We hold that thought for Chapter 32 — the fix is parameterized queries, not careful doubling.)

Number literals are written bare — 200, 19.99, -5. No quotes. Quoting a number ('200') makes it text, which PostgreSQL will often coerce back to a number, but relying on that is a habit worth avoiding.

Casts convert a value from one type to another, with the ::type shorthand (PostgreSQL-specific) or the standard CAST(value AS type):

SELECT '2024-03-15'::date;                 -- text → date (PostgreSQL shorthand)
SELECT CAST('42' AS integer);              -- the standard, portable spelling
SELECT price::text || ' USD' AS label FROM products;   -- number → text to concatenate

You'll cast constantly — to compare across types, to format output, to force exact arithmetic. When a query complains that an operator does not exist for two types, a cast is usually the fix.

Dialect Difference. The :: cast is PostgreSQL's; CAST(... AS ...) is the SQL standard and works everywhere. Identifier quoting with double quotes is standard, but several databases also accept backticks (MySQL) or square brackets (SQL Server). Stick to standard double quotes — or better, lowercase unquoted names — for portability.


Filtering on dates and times

A large fraction of real queries filter by when something happened — orders this month, customers who joined last year, shipments overdue today. PostgreSQL's date/time handling is excellent, and a few patterns cover most needs.

Write a date literal with the DATE keyword (or cast a string), and compare with the ordinary operators:

-- Orders placed on or after a specific date:
SELECT order_id, order_date FROM orders
WHERE order_date >= DATE '2024-01-01';

-- Orders in a specific month — note the half-open range, which is the
-- correct and bug-free way to express "all of January":
SELECT order_id, order_date FROM orders
WHERE order_date >= DATE '2024-01-01'
  AND order_date <  DATE '2024-02-01';

That half-open pattern (>= start AND < next-start) deserves emphasis. It's tempting to write BETWEEN '2024-01-01' AND '2024-01-31', but orders.order_date is a timestamptz — it carries a time, not just a day — so an order placed at 2024-01-31 14:30 is after midnight on the 31st and BETWEEN would still catch it, while an order at 2024-01-31 23:59 would too, yet you might accidentally exclude the boundary depending on times. The half-open >= Jan 1 AND < Feb 1 is unambiguous: it captures every instant in January and nothing else, regardless of time-of-day. Burn this pattern into muscle memory; it prevents a classic off-by-one-day reporting bug.

PostgreSQL also gives you the current moment and interval arithmetic:

SELECT now();                              -- current date and time (timestamptz)
SELECT CURRENT_DATE;                       -- just today's date

-- Customers who joined in the last 90 days:
SELECT first_name, last_name, created_at FROM customers
WHERE created_at >= now() - INTERVAL '90 days';

-- Orders older than one year:
SELECT order_id FROM orders
WHERE order_date < now() - INTERVAL '1 year';

And you can extract parts of a date for filtering or grouping (you'll group by these in Chapter 7):

-- All orders placed in the year 2024, however you stored the time:
SELECT order_id FROM orders WHERE EXTRACT(YEAR FROM order_date) = 2024;

Common mistake. Using EXTRACT(YEAR FROM order_date) = 2024 when you have an index on order_date. Wrapping the column in a function can prevent the database from using that index (Chapter 23), forcing a full scan. The half-open range (>= '2024-01-01' AND < '2025-01-01') returns the same rows and stays index-friendly. Prefer ranges over function-wrapping when filtering large tables — a perfect preview of theme #5.


CASE: choosing a value per row

Sometimes you want the SELECT list to compute different values depending on a condition — the SQL equivalent of an if/else. That's the CASE expression, and it's one of the most useful tools in the language:

SELECT
    name,
    price,
    CASE
        WHEN price >= 1000 THEN 'premium'
        WHEN price >= 200  THEN 'mid-range'
        ELSE 'budget'
    END AS price_band
FROM products
ORDER BY price DESC;
       name        |  price  | price_band
-------------------+---------+------------
 UltraBook 16 Pro  | 1899.00 | premium
 TowerPro Desktop  | 1499.00 | premium
 NordicSound Headp |  249.00 | mid-range
 USB-C Cable       |   12.00 | budget

CASE walks its WHEN clauses top to bottom and returns the first match's value (or the ELSE, or NULL if there's no ELSE). It works anywhere an expression is allowed — the SELECT list, ORDER BY, even WHERE — and it's how you bucket, label, and conditionally transform data without leaving SQL. There's also a shorter form for equality checks:

SELECT order_id,
       CASE status
           WHEN 'delivered' THEN 'done'
           WHEN 'cancelled' THEN 'done'
           ELSE 'in progress'
       END AS phase
FROM orders;

You'll reach for CASE constantly — to create report categories, to pivot data (Chapter 7), and to handle special values inline. It pairs beautifully with the aggregation you'll learn next, where CASE inside SUM becomes a powerful counting trick.


Three-valued logic in WHERE, made concrete

Chapter 3 introduced three-valued logic; now that you're filtering for real, let's make its effect on WHERE precise, because it causes bugs that are maddening until you understand them. Recall there are three truth values — true, false, and unknown (the result of any comparison involving NULL) — and WHERE returns a row only when its condition evaluates to exactly true. unknown is treated like false: the row is dropped.

Here's how the connectives behave when unknown (call it U) is involved:

   AND  | T  F  U          OR   | T  F  U          NOT
   ------+---------         -----+---------         -----
    T    | T  F  U           T   | T  T  T           T → F
    F    | F  F  F           F   | T  F  U           F → T
    U    | U  F  U           U   | T  U  U           U → U

The practical consequences bite in real queries:

-- Intends "every order not cancelled". If status could be NULL, those
-- rows yield NULL <> 'cancelled' = U, and are DROPPED — silently missing!
SELECT order_id FROM orders WHERE status <> 'cancelled';

-- Safe version when NULLs are possible — handle them explicitly:
SELECT order_id FROM orders
WHERE status <> 'cancelled' OR status IS NULL;

This is why Chapter 3 urged you to declare columns NOT NULL whenever "missing" is meaningless: it removes unknown from the picture entirely, so your filters behave the way intuition expects. (Mercado's status is NOT NULL, so the first query is safe there — but in the wild, always ask "can this column be NULL?" before writing <> or NOT IN.)

Why this matters. Almost every "my WHERE clause is mysteriously dropping rows" bug is three-valued logic meeting an unexpected NULL. Internalize the truth tables and the habit of asking about nullability, and you'll debug in seconds what otherwise costs hours.


Fluency is pattern recognition, so here's a broader set of realistic single-table queries. Read each English description, predict the SQL, then check yourself:

-- "The 3 most expensive active products in category 3."
SELECT name, price FROM products
WHERE is_active = true AND category_id = 3
ORDER BY price DESC
LIMIT 3;

-- "Customers whose email is at example.com, alphabetized by last name."
SELECT last_name, first_name, email FROM customers
WHERE email ILIKE '%@example.com'
ORDER BY last_name;

-- "All products priced between $50 and $150, labeled by price band."
SELECT name, price,
       CASE WHEN price < 100 THEN 'under 100' ELSE '100 and up' END AS band
FROM products
WHERE price BETWEEN 50 AND 150
ORDER BY price;

-- "Orders from 2024 that are paid or shipped, newest first, second page of 10."
SELECT order_id, order_date, status FROM orders
WHERE order_date >= DATE '2024-01-01' AND order_date < DATE '2025-01-01'
  AND status IN ('paid','shipped')
ORDER BY order_date DESC
LIMIT 10 OFFSET 10;

-- "Distinct loyalty tiers actually present among customers."
SELECT DISTINCT loyalty_tier FROM customers ORDER BY loyalty_tier;

-- "Products whose SKU follows the laptop pattern LAP-0NN."
SELECT sku, name FROM products WHERE sku LIKE 'LAP-0__' ORDER BY sku;

Each of these is just the same skeleton — SELECT (π) the columns, FROM the table, WHERE (σ) the rows, then order and limit — varied by which operators fill the WHERE. There is no eighth secret clause; mastery of basic queries really is mastery of these pieces, recombined. The reason to type all six is theme #2 once more: the patterns move from your eyes to your fingers only by writing them.

Try this. Take the last query and break it on purpose: change LIKE 'LAP-0__' to LIKE 'LAP-0%' and explain the difference in results (_ is exactly one character; % is any number). Then add a CASE column that flags products over $1000 as 'flagship'. Deliberately modifying a working query — and predicting the change before you run it — is one of the fastest ways to build real intuition.


A performance note (a preview of Chapter 23)

The WHERE clause is where performance is won or lost. When you filter WHERE customer_id = 5, the database has two options: scan every row checking the condition (a sequential scan), or jump straight to the matching rows using an index (an index scan). On a small table it doesn't matter; on millions of rows it's the difference between 12 milliseconds and 45 seconds — the book's recurring anchor.

You don't need to act on this yet. Just register the idea, theme #5: the conditions you put in WHERE interact with indexes to determine speed. When we reach Chapters 23–24, you'll learn exactly which WHERE patterns can use an index (e.g., price > 200 can; WHERE LOWER(name) = '…' might not without help) and how to read the plan. For now: write correct queries, and know that how you filter will later matter for how fast they run.


SELECT without a table, and other small surprises

A SELECT doesn't strictly need a FROM at all. PostgreSQL is happy to evaluate a bare expression, which makes the prompt a handy calculator and a quick way to test how a function or an operator behaves before you trust it in a real query:

SELECT 2 + 2;                       -- 4
SELECT 'Hello, ' || 'world';        -- Hello, world
SELECT now(), CURRENT_DATE;         -- the current timestamp and date
SELECT ROUND(19.99 * 0.9, 2);       -- 17.99 — test a calculation in isolation

This habit — pulling an expression out of a complex query and testing it on its own with a no-FROM SELECT — is one of the most practical debugging moves you'll learn. When a computed column looks wrong, isolate just that expression, feed it known inputs, and confirm it does what you think before plugging it back in. Small surprises lurk in expressions, and isolating them is how you find them fast.

Speaking of small surprises: a few behaviors catch nearly everyone at least once. Knowing them now saves a confusing afternoon later.

  • Integer division truncates. SELECT 7 / 2; returns 3, not 3.5, because both operands are integers and integer division discards the remainder. To get a decimal, make at least one side non-integer: 7 / 2.0, or cast: 7::numeric / 2. This single gotcha is behind a remarkable number of "my percentages are all zero" bugs — count_a / count_b is 0 whenever count_a < count_b because both are integers.
  • NULL is contagious in arithmetic. Any arithmetic with NULL yields NULL: price * NULL, 5 + NULL, even NULL || 'text' all produce NULL (the absence of a value poisons the whole expression). If a computed column is mysteriously empty for some rows, a NULL input is the usual culprit — wrap the nullable input in COALESCE to supply a fallback.
  • Division by zero is an error, not NULL. SELECT 1 / 0; raises ERROR: division by zero and aborts the statement. When a denominator might be zero, guard it — a CASE or NULLIF(denominator, 0) (which turns a zero into NULL, making the whole division NULL rather than an error) is the standard trick.
-- Safe average price per item, guarding against a zero quantity:
SELECT order_id,
       SUM(unit_price * quantity) / NULLIF(SUM(quantity), 0) AS avg_unit_price
FROM order_items
GROUP BY order_id;   -- (grouping is Chapter 7; note the NULLIF guard)

Common mistake. Computing a ratio of two counts and getting 0 everywhere. It's integer division: successes / attempts with both integers truncates to zero whenever successes < attempts. Cast one side — successes::numeric / attempts — and the percentages appear. This is so common it's worth checking first whenever a rate or percentage looks suspiciously like all zeros.


Equivalent spellings: when several queries say the same thing

SQL often gives you more than one way to express the same filter, and part of fluency is choosing the spelling that reads most clearly. Consider "products in category 3, 5, or 8." Three equivalent forms:

WHERE category_id = 3 OR category_id = 5 OR category_id = 8   -- explicit, verbose
WHERE category_id IN (3, 5, 8)                                -- the clean idiom
WHERE category_id = ANY (ARRAY[3, 5, 8])                      -- the array form

All three return identical results. The IN (...) form is almost always the most readable for a literal list, so prefer it. The = ANY (array) form looks odd here, but it becomes invaluable later, when the set of values lives in an array variable passed from application code (Chapter 29) — you can't drop an array into an IN list, but = ANY takes one directly. Knowing they're equivalent means you can pick whichever fits the situation without changing the meaning.

Likewise, a range has two spellings:

WHERE price >= 100 AND price <= 500     -- two comparisons
WHERE price BETWEEN 100 AND 500         -- inclusive shorthand, identical result

BETWEEN is tidy and reads well — just remember it's inclusive on both ends, which is occasionally not what you want (recall the date-range discussion, where a half-open >= ... AND < ... is safer for timestamps). The skill isn't memorizing which form is "correct" — they're equal — but developing taste for which one makes a given query easiest for the next human to read. Readable SQL is correct SQL that stays correct, because people can see what it means.


A reference: the operators you'll use in WHERE

For quick recall, here is the working set of filtering tools from this chapter in one place. Keep it handy until they're reflexes:

Operator Meaning Example
= <> != equal / not equal status = 'paid'
< > <= >= ordered comparison price >= 200
AND OR NOT combine conditions (parenthesize when mixing) a AND (b OR c)
BETWEEN x AND y inclusive range price BETWEEN 100 AND 500
IN (…) membership in a list category_id IN (3,5,8)
NOT IN (…) absence — beware NULL in the list id NOT IN (1,2,3)
LIKE / ILIKE pattern match (%, _); ILIKE is case-insensitive name LIKE 'Ultra%'
IS NULL / IS NOT NULL test for unknown/absent phone IS NULL
IS DISTINCT FROM NULL-safe "not equal" phone IS DISTINCT FROM '555'

A few notes that elevate this from a list to working knowledge:

  • Comparisons respect type. On text, < and > follow collation (roughly alphabetical); on dates, they follow chronology; on numbers, magnitude. WHERE last_name >= 'N' works because text is ordered.
  • NOT can wrap any condition, but it's often clearer to negate the operator instead: NOT price > 200 versus price <= 200. Prefer the form a human reads fastest.
  • The dangerous trio with NULL<>, NOT IN, and NOT BETWEEN — can all silently drop rows where the column is NULL, thanks to three-valued logic. When a column is nullable, pair these with an explicit OR col IS NULL, or use IS DISTINCT FROM.
  • LIKE anchoring matters for performance. LIKE 'Ultra%' (anchored at the start) can use an index; LIKE '%book' (leading wildcard) generally cannot, because the database can't know where to start looking. Hold that thought for Chapter 23.

This compact toolkit, combined with AND/OR and the logical-evaluation order, is genuinely most of what WHERE ever needs. Everything fancier — subquery filters (Chapter 9), full-text search (Chapter 16) — builds on exactly these foundations. Master the small set here, and the large set later will feel like natural extensions rather than new material.


A method for fixing a query that's wrong

When a query returns the wrong rows — too many, too few, or strange values — flailing at it rarely helps. A reliable method, which leans on everything in this chapter:

  1. Check the row count first. Far more rows than expected often means an accidental cross join (Chapter 6) or a missing WHERE condition. Far fewer often means a NULL interacting with <> or NOT IN (three-valued logic), or an over-tight AND.
  2. Strip it down, then build it back up. Remove clauses until it works, then add them back one at a time. The clause that breaks it is the culprit. Start with just SELECT * FROM table LIMIT 5 to confirm the data looks how you expect, then layer on the WHERE.
  3. Isolate suspicious expressions with a no-FROM SELECT, as above. If a CASE or a calculation is misbehaving, test it on known inputs alone.
  4. Make NULLs visible (\pset null '∅' from Chapter 2) so you can see whether a blank is an empty string or a NULL — they behave completely differently in filters.
  5. Read the clauses in logical order, not written order: FROM, then WHERE, then SELECT. Many "why doesn't my alias work in WHERE?" puzzles dissolve the moment you remember WHERE runs before the SELECT list exists.

This is a debugging discipline, and it pays off for the rest of the book — the queries only get more complex, but the method scales. Most query bugs are not exotic; they're a NULL you forgot, a precedence you assumed, a join condition you dropped, or an integer division you didn't expect. The method above catches all four.

Why this matters. Theme #3 — understand the WHY — is what turns debugging from guesswork into diagnosis. When you know why WHERE can't see an alias, why <> drops NULL rows, and why 7/2 is 3, a wrong result stops being a mystery and becomes a short checklist. That shift, from hoping to knowing, is the whole point of learning the fundamentals rather than just the syntax.


Progressive project: your first real queries

Against your project database (or, if it's still empty, against Mercado as a stand-in), answer five of the questions you listed back in Chapter 1 using only SELECT/FROM/WHERE/ORDER BY/LIMIT. For example, for a library: "the 10 most recently added books," "all members in the 'student' category," "books priced (or fined) above a threshold." For each:

  1. Write the query.
  2. Identify the projection (π — your SELECT list) and the selection (σ — your WHERE).
  3. Add an ORDER BY that makes the result genuinely useful, and a LIMIT if it's a "top N."

Save them in project-notes.md. These are the first queries of the application you're building — and you'll extend them with joins and aggregation in the chapters ahead.


Summary

The SELECT statement is SQL's workhorse. FROM chooses the source relation; the SELECT list is projection (π — which columns, plus computed expressions and AS aliases); WHERE is selection (σ — which rows). The clauses are evaluated FROM → WHERE → SELECT → ORDER BY → LIMIT, which is why WHERE can't see SELECT-list aliases but ORDER BY can. WHERE conditions combine comparison operators with AND/OR/NOT (parenthesize when mixing!), plus BETWEEN, IN, LIKE/ILIKE, and IS NULL. DISTINCT removes duplicate result rows; ORDER BY imposes order (rows have none by default); LIMIT/OFFSET take a slice (always with ORDER BY). Write these constantly until they're automatic — everything in Part II builds on them.

You can now: - Write SELECT … FROM … WHERE … ORDER BY … LIMIT queries fluently. - Project columns, compute expressions, and alias with AS. - Filter with comparisons, AND/OR/NOT (with correct parenthesization), BETWEEN, IN, LIKE/ILIKE, and IS NULL. - Explain the logical order of evaluation and why it governs alias visibility. - Use DISTINCT, ORDER BY (including NULLS FIRST/LAST), and LIMIT/OFFSET correctly. - Recognize that WHERE conditions will later interact with indexes for performance.

What's next. Chapter 6 — JOINs — the most important SQL skill, where you connect Mercado's tables and answer questions that span customers, orders, and products. You saw a join as "product + selection" in Chapter 4; now you'll wield every kind, building from a two-table join to a six-table business query.


Practice in exercises.md, test yourself with the quiz, apply it in the case studies, review the key takeaways, and go deeper with further reading.