Case Study 2 — From `LIKE '%term%'` to Real Search

DataField.Dev

Case Study 2 — From `LIKE '%term%'` to Real Search

A product-search box built on LIKE was slow and gave bad results. Switching to PostgreSQL full-text search made it fast (with a GIN index) and genuinely useful (stemming, ranking) — and saved the team from standing up Elasticsearch.

Background

A storefront's search box ran this against a growing catalog:

SELECT name FROM products
WHERE name ILIKE '%' || :term || '%'
   OR description ILIKE '%' || :term || '%';

Two complaints piled up. Speed: as the catalog grew past a few hundred thousand products, every search did a full table scan (a leading-% LIKE pattern can't use a normal index), and search latency climbed into seconds. Quality: searching running shoes didn't match a product described as "shoe for runners" (no exact substring); wireless didn't match "wireless," "Wireless," and "WIRELESS" consistently without ILIKE; results came back in arbitrary order with no notion of relevance; and common words like "the" matched everything.

Someone proposed adding Elasticsearch. Before taking on a whole new system, the team tried PostgreSQL's built-in full-text search.

The fix: full-text search + GIN index

They added a stored tsvector (the normalized search document) and a GIN index:

ALTER TABLE products ADD COLUMN search_doc tsvector
    GENERATED ALWAYS AS (
        to_tsvector('english', name || ' ' || coalesce(description, ''))
    ) STORED;

CREATE INDEX idx_products_search ON products USING GIN (search_doc);

And rewrote the query to match and rank:

SELECT name,
       ts_rank(search_doc, websearch_to_tsquery('english', :term)) AS rank
FROM products
WHERE search_doc @@ websearch_to_tsquery('english', :term)
ORDER BY rank DESC
LIMIT 20;

The results were transformed:

Fast: the GIN index turned the full scan into an index lookup; search dropped from seconds to milliseconds, and stayed fast as the catalog grew.
Stemming: to_tsvector reduces running/runs/run to a common stem, so running shoes matches "shoe for runners."
Relevance ranking: ts_rank scores matches so the best results come first (a product with the term in its name outranks one that merely mentions it in a long description).
Stop words: common words ("the", "a", "for") are dropped, so they don't pollute matches.
Friendly query syntax: websearch_to_tsquery accepts Google-style input (wireless "noise cancelling" -refurbished), so the search box "just works" for users.

Why not Elasticsearch (here)

Elasticsearch is excellent — but adopting it would have meant: running and securing a second datastore, building and maintaining a pipeline to sync product data from PostgreSQL into it (and handling the inevitable drift and lag), and operating two systems instead of one. For a catalog search of this size and sophistication, PostgreSQL full-text search delivered fast, relevant results with the data staying in one place, always consistent. The team kept Elasticsearch on the table for future needs (Chapter 36 discusses where it genuinely wins — very large scale, advanced relevance tuning, faceting, fuzzy/typo tolerance), but didn't pay its cost prematurely.

The analysis

LIKE '%term%' is substring matching, not search. It can't use a normal index (leading wildcard), can't stem, can't rank, and can't drop stop words. It's fine for tiny tables or exact-prefix matches (LIKE 'term%' can use an index); it's the wrong tool for a search box on a real catalog.
PostgreSQL full-text search is real search. tsvector/tsquery/@@ plus ts_rank give stemming, boolean queries, relevance ranking, and stop-word handling — the things users expect from "search."
Index it (GIN) or it's not fast. The whole speed win comes from the GIN index on the tsvector. A stored, generated tsvector column keeps it maintained automatically.
It kept search consistent with the data. No separate system, no sync pipeline, no drift — the search index is part of the same transactional database, so it's never out of date with the products it searches.
Theme #4, with honesty about the limits. PostgreSQL FTS handles the vast majority of application search needs and avoids a second system. When you do outgrow it (massive scale, advanced relevance, faceting, typo tolerance), Chapter 36 covers the move to a dedicated engine — a deliberate decision, not a reflex.

Discussion questions

Give three reasons LIKE '%term%' is the wrong tool for catalog search.
What specifically does to_tsvector do to the text, and why does each transformation improve search?
Why is the GIN index essential to the performance win?
What costs would adopting Elasticsearch have added, and why were they unnecessary here?
⭐ Describe a search requirement that would justify Elasticsearch over PostgreSQL FTS, and why.

Case Study 2 — From LIKE '%term%' to Real Search

Background

The fix: full-text search + GIN index

Why not Elasticsearch (here)

The analysis

Discussion questions

Case Study 2 — From `LIKE '%term%'` to Real Search