Case Study 1 — One JSONB Column Instead of Forty Nullable Ones

The constructive use of JSONB: a product catalog whose attributes differed wildly by category was drowning in sparse, mostly-null columns. A single jsonb column — properly indexed, and alongside the real relational columns — fixed it without adding a second database.

Background

An electronics retailer's products table had grown a column for every attribute any product might have: ram_gb, screen_inches, cpu_cores, battery_mah, page_count, isbn, wattage, material, shoe_size, voltage, and on and on — about 40 columns. The problem: a book has none of the laptop columns, a blender has none of the book columns. Every row was mostly NULL. Adding a new product category meant an ALTER TABLE to add more columns (a risky, locking operation on a big table — Chapter 14's Case Study 2). The table was a sparse, ever-growing mess, and merchandising waited on engineering every time they added a category.

One faction wanted to "just use MongoDB for products." Another proposed a notorious anti-pattern (an "EAV" key-value table). The team chose a third path: keep the relational columns relational, and put the variable attributes in a single jsonb column.

The redesign

The columns that every product has, and that the business queries, joins, and constrains on, stayed as real typed columns: product_id, sku, name, price, category_id, supplier_id, is_active. The dozens of category-specific, sparse attributes moved into one jsonb column:

ALTER TABLE products ADD COLUMN attributes jsonb NOT NULL DEFAULT '{}';
-- a laptop:  {"ram_gb":16,"screen_in":14,"color":"silver"}
-- a book:    {"format":"paperback","pages":420,"isbn":"978-..."}
-- a blender: {"watts":1200,"color":"red"}

(This is exactly Mercado's design — products.attributes is that column.) Querying is clean, and crucially, indexable:

CREATE INDEX idx_products_attributes ON products USING GIN (attributes);

-- "16GB laptops" — uses the GIN index
SELECT name FROM products
WHERE category_id = 3 AND attributes @> '{"ram_gb": 16}';

Now adding a new category needs no schema change — merchandising just stores whatever keys that category needs. The 40 sparse columns collapsed to one. And because it's still PostgreSQL, products kept their foreign keys, their CHECK (price >= 0), transactions, and joins — none of which a separate document database would have given them for free.

Why not MongoDB, and why not EAV

  • Not MongoDB: the core of a product (price, category, supplier, active flag) is deeply relational — it joins to orders, categories, suppliers, and needs integrity and transactions. Moving products to a document store would have sacrificed all of that to solve a problem (variable attributes) that jsonb solves inside the relational database. (This is the Lumen lesson from Chapter 1, avoided.)
  • Not EAV (an attributes(product_id, key, value) table): the entity-attribute-value pattern is flexible but makes every "products with 16GB RAM" query a self-joining nightmare and loses type safety. jsonb gives the same flexibility with far better ergonomics and a real index.

The balance that made it work

The design's success came from drawing the line correctly:

  • Relational columns for what's universal, queried, joined, and constrained → typed columns with constraints.
  • JSONB for what's variable, sparse, and category-specific → one indexed jsonb column.

They also resisted the temptation to push everything into JSONB. price stayed a real numeric column (you SUM it, constrain it >= 0, and want it typed), not attributes->>'price'. The rule they adopted: if you filter, join, aggregate, or constrain on it across all products, it's a column; if it's a category-specific spec you mostly read back whole, it's a JSONB key.

The analysis

  1. JSONB is the right tool for genuinely variable/sparse attributes. It collapses dozens of mostly-null columns into one flexible, indexable column and removes the ALTER TABLE-per-category treadmill.

  2. It complements relational design — it doesn't replace it. The universal, queried, related fields stayed as typed columns with constraints and foreign keys. JSONB handled the variable part. Theme #1 (design) and theme #4 (PostgreSQL's power) work together.

  3. Index it or it's slow. The GIN index on attributes is what makes @> containment queries fast; without it, every attribute query is a full scan (Chapter 23).

  4. It avoided a second database. One system — with transactions, joins, integrity, and document flexibility — instead of PostgreSQL + MongoDB to deploy, secure, sync, and reconcile. Theme #4 in action.

  5. Draw the column/JSONB line deliberately. "Filter/join/aggregate/constrain on it → column; category-specific spec read back whole → JSONB key." Getting this line right is the whole skill.

Discussion questions

  1. Why is one jsonb column better than 40 sparse nullable columns and better than an EAV table?
  2. Why did price and category_id stay as real columns instead of moving into attributes?
  3. What does the GIN index enable, and what happens to attribute queries without it?
  4. How does this design avoid the Lumen mistake (Chapter 1) of modeling relational data as documents?
  5. ⭐ State the rule for deciding whether a given attribute should be a column or a JSONB key, and apply it to three example attributes.