27 min read

> Where you are: Part II, Chapter 11 of 40. You've been nesting subqueries in FROM (Chapter 9). CTEs give those nested queries names, turning tangled queries into readable, step-by-step pipelines — and unlock something subqueries can't do at all...

Chapter 11: Common Table Expressions and Recursive Queries

Where you are: Part II, Chapter 11 of 40. You've been nesting subqueries in FROM (Chapter 9). CTEs give those nested queries names, turning tangled queries into readable, step-by-step pipelines — and unlock something subqueries can't do at all: recursion.

Learning paths: 💻 📊 🔬 🏗️ — everyone. Recursive CTEs (the second half) are among SQL's most powerful and least-known features; 🔬 CS students will recognize the base-case/recursive-case structure immediately.


WITH: giving your subqueries names

A complex query built from nested subqueries quickly becomes unreadable — parentheses inside parentheses, the logic buried. A Common Table Expression (CTE) lets you define a named, temporary result up front with the WITH clause, then use it like a table in the main query:

WITH per_order AS (
    SELECT o.order_id, SUM(oi.quantity * oi.unit_price) AS order_total
    FROM orders o
    JOIN order_items oi ON oi.order_id = o.order_id
    GROUP BY o.order_id
)
SELECT AVG(order_total) AS avg_order_value,
       MAX(order_total) AS biggest_order
FROM per_order;

Compare this to the equivalent derived-table version from Chapter 9 — the CTE reads top to bottom like a recipe: "first compute the per-order totals, then average them." The name per_order documents intent, and the main query is clean.

Why this matters. CTEs are about readability, and readable SQL is correct SQL more often. A query you can read aloud as a sequence of named steps is one you (and your reviewers) can reason about. This is theme #2 — SQL is a language — applied to style: the same result, expressed so a human can follow it.


Multiple CTEs: a pipeline of steps

You can define several CTEs in one WITH, separated by commas, and each can reference the ones before it. This turns a complex question into a readable pipeline:

WITH per_customer AS (                       -- step 1: revenue per customer
    SELECT o.customer_id, SUM(oi.quantity * oi.unit_price) AS revenue
    FROM orders o
    JOIN order_items oi ON oi.order_id = o.order_id
    GROUP BY o.customer_id
),
ranked AS (                                  -- step 2: who's above average?
    SELECT customer_id, revenue,
           (SELECT AVG(revenue) FROM per_customer) AS avg_revenue
    FROM per_customer
)
SELECT c.first_name, c.last_name, r.revenue   -- step 3: label and filter
FROM ranked r
JOIN customers c ON c.customer_id = r.customer_id
WHERE r.revenue > r.avg_revenue
ORDER BY r.revenue DESC;

Each CTE is a named stage; the final SELECT assembles the answer. This is far easier to write, read, and debug than the same logic crammed into nested subqueries — you can even run each CTE's SELECT on its own to check it.


CTEs vs. subqueries

A CTE and a derived table (subquery in FROM) are often interchangeable. Choose a CTE when:

  • The same intermediate result is used more than once (define it once, reference it by name repeatedly).
  • The query is complex enough that names aid understanding (most non-trivial queries).
  • You're writing a recursive query (only CTEs can recurse — see below).

Dialect Difference / performance note. Historically, PostgreSQL materialized every CTE — it computed the CTE fully and stored the result, acting as an "optimization fence" that sometimes hurt performance. Since PostgreSQL 12, simple CTEs that are referenced once are inlined (optimized together with the outer query) by default, like subqueries. You can force the old behavior with WITH x AS MATERIALIZED (...) or prevent it with NOT MATERIALIZED. For most queries you don't need to think about this; just know the knob exists for when a CTE is unexpectedly slow (Chapter 24).


Recursive CTEs: querying hierarchies

Here is the feature that subqueries and joins simply cannot replicate. A recursive CTE repeatedly applies a query to its own output until no new rows appear — perfect for hierarchies of unknown depth: organizational charts, category trees, threaded comments, bill-of-materials, file systems, graph traversal.

Mercado has two natural hierarchies (by design — Chapter 3): categories.parent_category_id (a category tree) and employees.manager_id (an org chart). A self-join (Chapter 6) can reach one level up or down; a recursive CTE reaches all levels.

The structure is always the same — WITH RECURSIVE, an anchor (base case), UNION ALL, and a recursive term that references the CTE itself:

WITH RECURSIVE category_tree AS (
    -- ANCHOR: the starting point(s) — top-level categories (no parent)
    SELECT category_id, name, parent_category_id, 1 AS depth
    FROM categories
    WHERE parent_category_id IS NULL

    UNION ALL

    -- RECURSIVE TERM: children of rows already in category_tree
    SELECT c.category_id, c.name, c.parent_category_id, ct.depth + 1
    FROM categories c
    JOIN category_tree ct ON c.parent_category_id = ct.category_id
)
SELECT repeat('  ', depth - 1) || name AS category, depth
FROM category_tree
ORDER BY depth, category;
       category      | depth
---------------------+-------
 Books               |     1
 Electronics         |     1
 Home                |     1
   Audio             |     2
   Computers         |     2
   Kitchen           |     2
     Desktops        |     3
     Laptops         |     3

How it runs:

  1. Anchor runs once, producing the top-level categories (depth 1).
  2. Recursive term runs repeatedly: each pass finds the children of the rows produced last pass, adding depth. Pass 1 finds depth-2 categories, pass 2 finds depth-3, and so on.
  3. It stops when a pass produces no new rows (no more children).

The depth column (incremented each level) lets us indent the output into a visual tree — and is itself a useful pattern (computing the level of each node).

Walking up: from a node to its ancestors

Recurse the other direction to find a node's chain of ancestors — e.g., the full category path of "Laptops":

WITH RECURSIVE ancestry AS (
    SELECT category_id, name, parent_category_id
    FROM categories WHERE name = 'Laptops'        -- anchor: the starting node
    UNION ALL
    SELECT c.category_id, c.name, c.parent_category_id
    FROM categories c
    JOIN ancestry a ON c.category_id = a.parent_category_id   -- step to the parent
)
SELECT name FROM ancestry;     -- Laptops, Computers, Electronics

The org chart

The same shape walks Mercado's employees.manager_id. "Everyone under the CEO, with their level":

WITH RECURSIVE org AS (
    SELECT employee_id, first_name, last_name, manager_id, 1 AS level
    FROM employees WHERE manager_id IS NULL                  -- the CEO
    UNION ALL
    SELECT e.employee_id, e.first_name, e.last_name, e.manager_id, org.level + 1
    FROM employees e
    JOIN org ON e.manager_id = org.employee_id
)
SELECT repeat('  ', level - 1) || first_name || ' ' || last_name AS who, level
FROM org ORDER BY level, who;

Common mistakes with recursion

  • Infinite recursion from cycles. If the data has a cycle (A's parent is B, B's parent is A — which a missing constraint could allow), the recursion never terminates. Guard against it by tracking the path and stopping if a node repeats, or use PostgreSQL's CYCLE clause: sql WITH RECURSIVE t AS ( ... ) -- ... add: CYCLE category_id SET is_cycle USING path Also useful: a depth limit (WHERE depth < 100) as a safety net.
  • Using UNION instead of UNION ALL. UNION deduplicates on every pass (slower, and can interact oddly with the recursion). UNION ALL is standard for recursive CTEs unless you specifically need dedup.
  • Recursive term doesn't reference the CTE. The recursive term must join to the CTE's own name — that self-reference is what makes it recurse.
  • Anchor selects the wrong starting set. The anchor defines where traversal begins (roots for top-down, the target node for bottom-up). Get it wrong and you traverse from the wrong place.

Dialect Difference. WITH RECURSIVE is standard SQL and supported by PostgreSQL, SQL Server (where RECURSIVE is implied, not written), Oracle, and modern MySQL/SQLite. The CYCLE clause is more PostgreSQL/standard-specific. See Appendix J.


CTEs as a refactoring tool

Before the recursion, it's worth dwelling on the everyday superpower of ordinary CTEs, because you'll use it far more often than recursion: CTEs are how you tame a query that has grown into an unreadable knot. Real analytical queries accrete complexity — a subquery here to compute a benchmark, a derived table there to pre-aggregate, a correlated subquery to compare against a group — until they become a wall of nested parentheses that no one, including their author a month later, can confidently modify. The CTE is the refactoring that rescues them.

The move is mechanical and reliable. Find the innermost subquery, lift it out, and give it a name in a WITH clause; repeat outward. Each nested computation becomes a named stage, and the final query reads as a sequence of clearly-labeled steps instead of a Russian doll of parentheses. The logic is unchanged — the optimizer, since PostgreSQL 12, generally treats the inlined CTE identically to the subquery — but the legibility is transformed. A reviewer can read per_customer, then ranked, then the final select, understanding each in isolation, rather than trying to hold five levels of nesting in their head at once. And crucially, you can debug each stage independently: run just the per_customer CTE's body to confirm it produces what you expect before trusting the stages built on top of it.

This is the same principle that makes well-factored code readable in any language: name your intermediate results, and let the names tell the story. A CTE pipeline is essentially a paragraph of named steps — "first compute revenue per customer; then find the average; then keep the above-average ones; then attach their names" — and that narrative quality is exactly what makes complex SQL maintainable. When you inherit a horrifying nested query, your first instinct should be to decompose it into CTEs; you'll understand it in the process of refactoring it, and leave it better for the next person. Theme #2 — SQL is a language — extends to prose style, and CTEs are the paragraph breaks.

Why this matters. The difference between a query that lives for years and one that gets rewritten from scratch every time it needs a change is usually readability. A CTE pipeline that a new team member can follow is an asset; a clever five-level nested subquery that only its author understands is a liability, however correct. Choosing readability is choosing maintainability, and over a system's life that choice compounds enormously.


Data-modifying CTEs: WITH meets INSERT, UPDATE, DELETE

PostgreSQL extends CTEs in a powerful, somewhat unusual direction: a CTE's body can be an INSERT, UPDATE, or DELETE — not just a SELECT — and combined with RETURNING, this lets you chain modifications and queries into a single atomic statement. These are sometimes called writable CTEs (wCTEs), and they solve real problems elegantly.

The classic use is "move rows from one table to another in one statement" — delete from the source, and insert exactly what was deleted into the destination, with no risk of the two operations disagreeing:

-- Archive old orders: delete them from orders, insert them into orders_archive,
-- all in one atomic statement.
WITH moved AS (
    DELETE FROM orders
    WHERE order_date < DATE '2020-01-01'
    RETURNING *
)
INSERT INTO orders_archive
SELECT * FROM moved;

The DELETE ... RETURNING * removes the old orders and hands the deleted rows to the moved CTE; the outer INSERT then stores exactly those rows in the archive. Because it's one statement, it's atomic — either both happen or neither does — so you can never end up having deleted orders you failed to archive. Writing this as two separate statements would open a window where a failure between them loses data; the wCTE closes that window entirely.

wCTEs also let you capture what a modification did and use it immediately. "Insert a new order and immediately insert its line items, referencing the new order's generated id" is a wCTE that inserts the order with RETURNING order_id, then inserts the items selecting from that result. This is a cleaner alternative to round-tripping to the application to fetch the new id between two statements (a pattern you'll revisit in Chapter 13's data modification and Chapter 29's application code).

Dialect Difference. Data-modifying CTEs are a PostgreSQL strength; not all databases support INSERT/UPDATE/DELETE inside WITH. There's also a subtlety worth knowing: all the sub-statements in a wCTE see the same snapshot of the data and execute in an unspecified order relative to each other, so you can't, for example, have one CTE update rows and another CTE in the same statement see those updates. They're powerful but have precise semantics — consult the docs when chaining several modifications.


How recursion actually executes

To use recursive CTEs confidently — and to debug them when they misbehave — it helps to know the actual machinery, which is simpler than the magic it appears to be. PostgreSQL maintains two internal tables during a recursive CTE: a working table (the rows produced by the most recent step) and an accumulating result.

The execution is a precise loop. First, the anchor query runs once; its rows go into both the result and the working table. Then the engine repeats: run the recursive term, but with the CTE's self-reference reading only the current working table (not the whole accumulated result); whatever rows come back are added to the result and become the new working table, replacing the old one. The loop continues until a step produces zero rows — an empty working table means there's nothing left to expand, and recursion halts. That's the entire algorithm: anchor once, then "expand the frontier" repeatedly until the frontier is empty.

This model demystifies the behavior you observe. The reason a top-down category traversal produces depth-1 rows, then depth-2, then depth-3 is that each step expands exactly the rows found in the previous step — the working table is the current frontier of the tree. The reason cycles cause infinite loops is that a cyclic reference means the frontier never empties: A produces B, B produces A, forever, and the working table never goes to zero. And the reason UNION ALL is standard (not UNION) is partly performance and partly that the frontier logic is cleanest without per-step deduplication. Once you picture the working table advancing through the hierarchy level by level, recursive CTEs stop being mysterious and become a tool you can reason about precisely — including predicting roughly how many steps a given traversal will take (as many as the hierarchy is deep).

Common mistake. Expecting the recursive term to see the entire accumulated result, not just the latest frontier. It sees only the working table — the most recent step's output. If your recursive logic needs to check against everything found so far (for cycle detection, say), you must carry that information forward explicitly in a column (an array of visited nodes, which the CYCLE clause automates). Misunderstanding what the self-reference "sees" is the source of most confusing recursive-CTE bugs.


A worked recursive scenario: the full category path

Let's build something genuinely useful with recursion: for every product, the full category path from the root down to its immediate category — "Electronics › Computers › Laptops" — the kind of breadcrumb a shopping site shows. This combines a recursive walk with string-building and demonstrates carrying data down through the levels.

The approach: recurse down the category tree (as before), but at each step accumulate the path as a string and track the root. Then join products to this enriched category data:

WITH RECURSIVE category_paths AS (
    -- Anchor: roots start a path with just their own name.
    SELECT category_id, name,
           name AS path,
           1 AS depth
    FROM categories
    WHERE parent_category_id IS NULL

    UNION ALL

    -- Recursive term: each child extends its parent's path.
    SELECT c.category_id, c.name,
           cp.path || ' › ' || c.name AS path,   -- carry the path down, appending
           cp.depth + 1
    FROM categories c
    JOIN category_paths cp ON c.parent_category_id = cp.category_id
)
SELECT p.name AS product, cp.path AS category_path
FROM products p
JOIN category_paths cp ON cp.category_id = p.category_id
ORDER BY cp.path, p.name;
       product        |        category_path
----------------------+------------------------------------
 UltraBook 14         | Electronics › Computers › Laptops
 TowerPro Desktop     | Electronics › Computers › Desktops
 NordicSound Headph…  | Electronics › Audio

The key technique is the cp.path || ' › ' || c.name in the recursive term: each level takes its parent's accumulated path and appends its own name, so the string grows as the recursion descends. This pattern — carrying and extending an accumulator through the levels — is the recursive-CTE equivalent of building up a result in a loop, and it generalizes far beyond breadcrumbs. You can accumulate a running total down a chain, collect an array of every ancestor's id, or compute a cumulative quantity in a bill-of-materials explosion (how many of each raw component a finished product requires, summed across all sub-assemblies). The recursion walks the structure; the accumulator column carries forward whatever you need to compute along the way.


Recursion beyond hierarchies; and when not to use it

Recursive CTEs aren't only for trees and org charts — they handle any transitive relationship and even pure generation. A few horizons worth knowing:

Graph traversal. "All people reachable from a given user through the follows-relationship," "all parts that depend on this part, directly or transitively" — these are graph reachability questions, and a recursive CTE walks them exactly as it walks a tree, except cycles are likely, so the CYCLE clause (or a visited-set column) becomes essential rather than optional. The structure is identical; only the cycle-guarding changes. Sequence generation. A recursive CTE can manufacture rows that don't exist in any table — WITH RECURSIVE n AS (SELECT 1 UNION ALL SELECT n+1 FROM n WHERE n < 100) generates the numbers 1 to 100 — though for the common case generate_series (Chapter 8) is simpler. Iterative computation. Anything naturally expressed as "repeat until stable" — running balances, certain simulations — can sometimes be a recursive CTE, blurring the line between query and algorithm.

But recursion is not always the right tool, and maturity is knowing when to reach past it. When a hierarchy is queried constantly and changes rarely — a category tree read on every page load — re-traversing it with a recursive CTE every time is wasteful. The alternatives, covered in Chapter 21, precompute the structure: a closure table stores every ancestor-descendant pair explicitly, turning "all descendants" into a simple indexed lookup with no recursion at all; a materialized path stores each node's full path as a string column, making subtree queries a LIKE 'Electronics/%'. These trade write-time and storage cost for dramatically faster reads. The rule of thumb: recursive CTEs are perfect for occasional traversal of modest hierarchies, but for hot, read-heavy hierarchy access, a precomputed structure usually wins. Knowing both, and choosing by the read/write balance, is the design judgment Part III will sharpen.

Why this matters (theme #5). "It works" and "it works at scale, queried thousands of times a second" are different bars. A recursive CTE that's elegant for an admin report run twice a day may be the wrong choice for a hierarchy read on every customer page view. The query is correct either way; the engineering is in matching the technique to the access pattern. Recognizing when elegance must yield to a precomputed structure is exactly the kind of judgment that separates a working query from a production-ready one.


A performance note (preview of Chapters 23–24)

Recursive CTEs traverse row by row, level by level; their cost scales with the size of the hierarchy reached. An index on the recursion column (categories.parent_category_id, employees.manager_id) makes each step's join fast. For very deep or very wide hierarchies queried constantly, alternatives like the closure table or materialized path patterns (Chapter 21) can be faster to read. EXPLAIN (Chapter 24) shows the recursive plan. For the common case — modest trees and org charts — recursive CTEs are both elegant and efficient.


Termination and safety: making recursion stop

The single thing that distinguishes a recursive query from any other is that it loops, and anything that loops can loop forever. Treating termination as a first-class concern, rather than an afterthought, is what separates a recursive CTE you can trust in production from one that takes down a database. There are three lines of defense, and serious code uses more than one.

The first and most fundamental is correct data plus correct logic: a recursion over a genuine tree (where every node has at most one parent and there are no cycles) terminates naturally, because each step moves strictly toward the leaves and the frontier eventually empties. If your hierarchy is guaranteed acyclic — enforced, ideally, by the schema design and constraints of Part III — the recursion simply stops on its own. The trouble begins when the data can contain a cycle (A's parent is B, B's parent is A), which an under-constrained schema permits; then the frontier never empties and the query runs until it exhausts memory or you kill it.

The second defense is the CYCLE clause, PostgreSQL's built-in cycle detector. It tracks the path of nodes visited and stops expanding a branch the moment it revisits a node, setting a flag column you can inspect. This is the correct, declarative way to handle data that might contain cycles — graph traversal especially, where cycles are normal rather than errors. The third defense is a plain depth limit: carry a depth column (as our examples do) and add WHERE depth < 100 (or whatever exceeds your real maximum) to the recursive term, so even if cycle detection is somehow defeated, the recursion can't run away. This belt-and-suspenders limit costs nothing and has saved many a production incident.

The professional habit is to ask, before writing any recursive CTE, "can this structure contain a cycle?" If it's a strict tree by design, the natural termination suffices, though a depth limit is cheap insurance. If cycles are possible — anytime you're traversing a graph rather than a tree, or a hierarchy whose acyclicity isn't enforced — use the CYCLE clause and a depth limit. Recursion is powerful precisely because it loops, and respecting that power by guaranteeing termination is simply part of using it responsibly. A recursive CTE without a termination story is a latent outage; one with a clear story is a reliable tool.


A second recursive scenario: subtree aggregation

The category-path example carried data down the tree. An equally common need runs the other way: aggregating up a hierarchy — "how many products are in each category, counting everything in its subcategories too?" A product in "Laptops" should count toward Laptops, toward its parent "Computers," and toward the grandparent "Electronics." This is subtree aggregation, and it's a beautiful demonstration of combining recursion with the aggregation of Chapter 7.

The strategy has two stages, naturally expressed as CTEs. First, a recursive CTE maps every category to all of its descendants (including itself). Then a join-and-group counts the products falling under each:

WITH RECURSIVE subtree AS (
    -- Anchor: every category is its own descendant (depth 0).
    SELECT category_id AS root_id, category_id AS descendant_id
    FROM categories
    UNION ALL
    -- Recursive term: extend each root to its children's children, etc.
    SELECT s.root_id, c.category_id
    FROM categories c
    JOIN subtree s ON c.parent_category_id = s.descendant_id
)
SELECT cat.name,
       COUNT(p.product_id) AS products_in_subtree
FROM subtree s
JOIN categories cat ON cat.category_id = s.root_id
LEFT JOIN products  p ON p.category_id  = s.descendant_id
GROUP BY cat.category_id, cat.name
ORDER BY products_in_subtree DESC;

The subtree CTE produces, for every category (root_id), the full set of categories beneath it (descendant_id), including itself. Joining products onto the descendants and grouping by the root then counts every product anywhere in each category's subtree. "Electronics" will show the sum of products across Computers, Laptops, Desktops, Audio, and itself — exactly the rolled-up number a category landing page wants to display. This pattern — recurse to enumerate a subtree, then aggregate over it — answers a huge family of hierarchical questions: total budget under each manager (salaries summed down the org chart), total size of each folder (file sizes summed down the directory tree), total cost of each assembly (component costs summed down the bill of materials).

Notice how the two halves play to their strengths: recursion handles the structure (who is under whom, to any depth), and ordinary GROUP BY aggregation handles the measurement (counting, summing). Most sophisticated hierarchical reporting is exactly this division of labor. Once you can write the "enumerate the subtree" recursive CTE, attaching any aggregate to it is just Chapter 7 applied to the result. The hard part was never the counting; it was reaching every descendant, and recursion does that.


Recursion's search order and the SEARCH clause

When a recursive CTE produces rows, in what order do they come out? By default the order is an implementation detail you shouldn't rely on, but the SQL standard (and PostgreSQL) lets you control it explicitly with the SEARCH clause, which matters when you want output that reads as a properly ordered tree. There are two natural traversal orders, and they're the same two from any computer-science treatment of trees.

Breadth-first visits all of level 1, then all of level 2, then level 3 — which is what our earlier ORDER BY depth approximated. Depth-first follows each branch all the way down before backtracking — so you'd see Electronics, then Computers, then Laptops, then Desktops, then back up to Audio, mirroring how a file explorer expands one folder fully before the next. Depth-first is usually what produces a visually correct indented tree, because children appear immediately under their parent rather than being scattered across a level. PostgreSQL's SEARCH DEPTH FIRST BY ... SET ordercol clause generates an ordering column you can then ORDER BY:

WITH RECURSIVE category_tree AS (
    SELECT category_id, name, parent_category_id FROM categories WHERE parent_category_id IS NULL
    UNION ALL
    SELECT c.category_id, c.name, c.parent_category_id
    FROM categories c JOIN category_tree ct ON c.parent_category_id = ct.category_id
) SEARCH DEPTH FIRST BY name SET ord
SELECT name FROM category_tree ORDER BY ord;

The SEARCH DEPTH FIRST BY name SET ord tells PostgreSQL to compute an ordering column ord that, when sorted, yields a depth-first walk ordered by name within each level. This is the clean, standard way to get a correctly-ordered tree without hand-rolling a sort key. (Before this clause existed, people built the ordering manually by accumulating a path string and sorting on it — the same path accumulator from the breadcrumb example, repurposed as a sort key.) For many reports the default order plus ORDER BY depth is good enough, but when you need a true, properly-nested tree display, SEARCH DEPTH FIRST is the tool. The companion CYCLE clause (mentioned earlier for cycle detection) uses the same machinery to track visited nodes.


CTEs versus temporary tables versus views

CTEs are one of several ways to name and reuse an intermediate result, and choosing among them is a recurring decision worth settling now. The three main options — CTEs, temporary tables, and views — overlap but serve different needs.

A CTE exists only for the duration of the single statement that defines it. It's perfect for structuring one complex query into readable stages, and (since PostgreSQL 12) it's typically optimized together with the surrounding query. But its scope is exactly one statement: you can't define a CTE and then reference it in a separate later query. A temporary table (CREATE TEMP TABLE ...), by contrast, persists for the whole session and can be referenced by many subsequent statements, indexed, and analyzed — making it the right choice when you need to compute an expensive intermediate result once and reuse it across several queries, or when you want to add an index to a derived dataset before querying it further. A view (Chapter 15) is a named query stored permanently in the database, reusable by everyone, across sessions and forever — the choice when an intermediate result is a reusable abstraction others should build on, not a one-off.

The decision tree is straightforward. If you need readable structure within a single query, use a CTE — it's the lightest-weight option and requires no cleanup. If you need to reuse an expensive computation across several statements in one session, or to index a derived result, use a temporary table. If you need a permanent, shared abstraction over a query, use a view. Reaching for a heavier tool than the situation needs (a temp table where a CTE would do) adds ceremony; reaching for a lighter one than needed (a CTE re-evaluated five times where a temp table computed once would serve) wastes work. As always, match the tool to the lifetime and scope of the thing you're naming — a theme that recurs throughout database work.

Why this matters. These three tools look similar — all "name a query result" — but their lifetimes differ by orders of magnitude: one statement, one session, forever. Choosing wrong rarely breaks correctness, but it costs either performance (re-computing what should have been stored) or cleanliness (a permanent view for a one-off, or a temp table littering a session). Knowing the lifetimes is knowing which to reach for.


Recursion in the wild: where these patterns appear

It's worth cataloguing where recursive CTEs earn their keep in real systems, because once you have the tool, you start seeing the structures it fits everywhere — and recognizing them is what turns a textbook technique into a daily instrument. The unifying trait is self-reference: any time a thing relates to other things of the same kind, you have a hierarchy or graph, and recursion can traverse it.

The most common are the org chart (employees.manager_id) and the category tree (categories.parent_category_id) you've already met — but the same shape recurs constantly. Threaded comments and replies, where a comment's parent is another comment, form a tree you recurse to render a discussion. Bills of materials, where a product is assembled from sub-assemblies that are themselves assembled from parts, recurse to compute total component requirements or rolled-up cost. File systems and folder structures, where a folder contains folders, recurse to compute total sizes or full paths. Geographic hierarchies (continent › country › region › city) recurse to roll up populations or sales. Course prerequisites and dependency graphs, where completing one thing unlocks others, recurse to find everything reachable. Social and follow graphs recurse to find connections within N degrees. Each is the same conceptual structure — nodes pointing at other nodes — and each yields to the anchor-plus-recursive-term pattern you now know.

This breadth is why the chapter insisted recursive CTEs are among SQL's most powerful and most underused features. Many developers, not knowing recursion is available in SQL, drag hierarchical data into application code and write loops there — fetching one level, then querying again for the next, in a chatty back-and-forth that's slow and verbose. The recursive CTE does the whole traversal in a single query, in the database, where the data lives. Recognizing "this is a hierarchy / graph, so it's a recursive CTE" is a genuine professional edge, because so many people reach for clumsier tools simply from unfamiliarity. The structures are everywhere; the technique is one pattern; and now it's yours.


Progressive project: name your steps, walk your hierarchy

In your domain:

  1. Refactor a complex query into a multi-CTE pipeline — break it into named steps and confirm it's more readable than the nested version.
  2. If your domain has a hierarchy (categories, course prerequisites, an org chart, threaded comments, sub-tasks), write a recursive CTE to traverse it — list all descendants of a node, or the ancestor chain of a node, with a depth/level column.
  3. If you have no natural hierarchy, add one (e.g., a self-referencing parent_id) and traverse it — recursion is worth practicing.

Add these to project-notes.md.


The bigger picture: readability and the shape of hard problems

This chapter quietly taught two lessons that reach far beyond the WITH keyword, and they're worth naming explicitly. The first is that readability is a feature, not a luxury. Ordinary CTEs add no new computational power — every CTE pipeline could, in principle, be written as nested subqueries — yet they are among the most valuable tools in SQL precisely because they let you express a complex query as a sequence of named, comprehensible steps. The query that a teammate can read, reason about, and safely modify six months later is worth more than a cleverer one that only its author understands. As databases become the long-lived heart of systems that outlast any individual developer, the ability to write SQL that communicates — that reads like a well-structured argument rather than a tangle — becomes a genuinely senior skill. CTEs are the primary instrument for it, and reaching for them to decompose a gnarly query should become reflexive.

The second lesson is deeper: recursion lets SQL describe problems whose size you don't know in advance. Every other tool in Part II operates on a fixed structure — this many tables, this many conditions. Recursion breaks that ceiling. A recursive CTE can follow a chain of unknown length, walk a tree of unknown depth, traverse a graph of unknown breadth, because it's defined not by a fixed number of steps but by a rule — a base case and a way to extend — that repeats until it's exhausted. This is the same conceptual leap that recursion represents in any programming language: from "do this a fixed number of times" to "do this until a condition is met, however many times that takes." That a declarative query language can express it at all is remarkable, and it means the relational model can handle hierarchical and graph-shaped data — org charts, category trees, dependency graphs, social networks, bills of materials — that people often wrongly assume require a specialized graph database. (Chapter 33 returns to when a true graph database earns its place; the answer is "less often than you'd think," precisely because of recursive CTEs.)

Together these two ideas — name your steps for clarity, and use recursion to conquer unknown-size structure — represent a maturing of how you think about queries. Early in Part II, a query was a single flat request: filter, join, project. By now a query can be a program: a pipeline of named transformations, possibly with a recursive loop at its heart, expressing logic of real sophistication while remaining, if you've written it well, readable top to bottom. That progression mirrors the whole arc of the book's second part. You began by asking the database simple questions; you can now hand it intricate, multi-stage, even self-referential problems and trust it to work them out. The remaining chapters of Part II — window functions, then the data-modification and definition statements — round out the toolkit, but the conceptual summit of querying is right here: composition and recursion, expressed readably. Master these, and there are very few questions about your data you cannot ask.


Summary

A CTE (WITH name AS (...)) names a temporary result, turning nested subqueries into a readable, top-to-bottom pipeline; multiple CTEs chain into named steps, each able to reference the previous. CTEs and derived tables are often interchangeable (prefer a CTE for clarity, reuse, or recursion); since PostgreSQL 12 simple CTEs are inlined, with MATERIALIZED/NOT MATERIALIZED to control it. Recursive CTEs (WITH RECURSIVE, anchor UNION ALL recursive-term) traverse hierarchies of unknown depth — Mercado's category tree and org chart — running the anchor once, then the recursive term until no new rows appear. Guard against cycles (the CYCLE clause or a depth limit), use UNION ALL, and ensure the recursive term references the CTE. Index the recursion column for speed.

You can now: - Write single and multiple CTEs to structure complex queries readably. - Explain when to use a CTE vs. a subquery (clarity, reuse, recursion). - Write recursive CTEs to walk hierarchies both down (descendants) and up (ancestors), with a depth/level column. - Guard recursion against cycles and runaway depth. - Reason about CTE materialization and recursion performance.

What's next. Chapter 12 — Window Functions — the most underused feature in SQL: rankings, running totals, moving averages, and "compare each row to its group" — analytics without collapsing rows (the thing GROUP BY can't do).


Practice in exercises.md, test yourself with the quiz, apply it in the case studies, review the key takeaways, and go deeper with further reading.