Chapter 33: NoSQL — Document, Key-Value, Column-Family, and Graph

DataField.Dev



In This Chapter

Honoring the argument by testing it
Why NoSQL emerged
The four families
Document databases, in depth
Key-value, column-family, and graph, in depth
The trade-offs they make
When PostgreSQL does the job better (theme #4)
JSONB versus a document database: a closer look
When NoSQL genuinely wins
Polyglot persistence and its costs
A worked decision and NoSQL's trajectory
Common mistakes
ACID, BASE, and the consistency spectrum
PostgreSQL covers most of it (theme #4, fully)
NewSQL: having it both ways
The verdict: relational by default, specialist by need
Progressive project: evaluate the alternatives
Summary



Where you are: Part VI, Chapter 33 of 40 — the start of Beyond Relational. For 32 chapters we argued the relational model is right for most problems (and PostgreSQL stretches far). Now we test the limits honestly: the non-relational databases, what each is genuinely good at, and — theme #4 — when PostgreSQL still wins anyway.

Learning paths: 💻 🏗️ 📊 🔬 — everyone benefits from the landscape and the judgment. This is a survey chapter; you won't master MongoDB here, but you'll know when (and when not) to reach for it.

Honoring the argument by testing it

This book has made a strong case for relational databases. Part VI honors that case by examining its boundaries: there are problems the relational model serves poorly, and tools built for them. "NoSQL" (originally "non-SQL," later "not only SQL") is the umbrella for databases that drop some relational guarantees, and sometimes SQL itself, in exchange for other benefits. Understanding them lets you say, with evidence, "we don't need a separate database here," and also recognize the genuine cases where you do.

Why NoSQL emerged

In the 2000s, web-scale companies (Google, Amazon, Facebook) hit volumes and traffic the relational systems of the day struggled with, and they wanted things relational databases didn't easily offer:

Horizontal scaling — spreading data across many commodity machines, rather than scaling up one big server.
Flexible/schemaless data — storing varied, evolving structures without migrations.
Massive throughput — extreme write or read rates for specific access patterns.

To get these, NoSQL systems often relaxed the relational model's guarantees — especially strict consistency and the rich query model — in favor of availability, partition tolerance, and scale (the CAP trade-offs of Chapter 35). The result is several families, each optimized for a different shape of problem.

The four families

Document databases (MongoDB, Couchbase)

Store documents — JSON-like, self-contained, with flexible structure. A "customer" document can embed its addresses and orders, varying field-by-field. No fixed schema; you can add fields freely.

Good at: genuinely variable/hierarchical data, rapid prototyping, content/catalogs where each item differs, storing whole API payloads.
Trade-offs: joins are weak or absent (you embed or do app-side lookups); cross-document transactions and integrity are limited (improving in newer versions); "schemaless" means the schema lives, unenforced, in your application (Chapter 1's Lumen case).

Key-value stores (Redis, DynamoDB, Memcached)

The simplest model: a giant dictionary — store and fetch a value by its key, extremely fast.

Good at: caching (the canonical use), session storage, rate limiting, queues, leaderboards, anything that's "look up X by its key, fast." Redis keeps data in memory for sub-millisecond access.
Trade-offs: you query almost entirely by key (no rich queries, no joins); it's a different tool, not a general database. Often used alongside a relational database (cache in front of PostgreSQL), not instead of it.

Column-family / wide-column stores (Cassandra, ScyllaDB, HBase)

Store data in rows with flexible columns, distributed across many nodes, optimized for massive write throughput and linear horizontal scaling.

Good at: enormous write volumes, time-series/event data at scale, write-heavy workloads across data centers where you can design around fixed query patterns.
Trade-offs: you must design tables around your queries (limited ad-hoc querying); consistency is often eventual (Cassandra's is tunable per query, while HBase offers strongly consistent row operations); no joins. Powerful at scale, awkward for general use.
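"Design tables around your queries" is easier to grasp with a small model. The Python sketch below is not Cassandra's API. It is a toy model of the wide-column layout idea: rows are grouped by a partition key and kept sorted by a clustering key inside each partition. The `WideColumnTable` class, the `readings_by_sensor` table, and the data are all hypothetical. A query that supplies the partition key and a clustering-key range touches one partition. A query on any other column has to examine every row, which is why the table is shaped for one known query.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class WideColumnTable:
    """Toy wide-column table: partition key -> rows sorted by clustering key.

    Hypothetical illustration, not a real driver API. Real stores such as
    Cassandra place each partition on nodes chosen by hashing its partition key.
    """

    def __init__(self):
        # partition key -> (sorted clustering keys, matching row dicts)
        self.partitions = defaultdict(lambda: ([], []))

    def insert(self, partition_key, clustering_key, columns):
        keys, rows = self.partitions[partition_key]
        i = bisect_right(keys, clustering_key)  # keep the partition sorted
        keys.insert(i, clustering_key)
        rows.insert(i, columns)

    def query_range(self, partition_key, start, end):
        """Planned query: one partition, binary search on the clustering key."""
        keys, rows = self.partitions.get(partition_key, ([], []))
        lo, hi = bisect_left(keys, start), bisect_right(keys, end)
        return list(zip(keys[lo:hi], rows[lo:hi]))

    def scan_where(self, column, value):
        """Unplanned query: no index on `column`, so every row is examined."""
        matches, scanned = [], 0
        for pk, (keys, rows) in self.partitions.items():
            for ck, row in zip(keys, rows):
                scanned += 1
                if row.get(column) == value:
                    matches.append((pk, ck, row))
        return matches, scanned


# Hypothetical table shaped for one query: "readings for sensor X in a time window".
readings_by_sensor = WideColumnTable()
for sensor in ("s1", "s2", "s3"):
    for minute in range(60):
        readings_by_sensor.insert(sensor, minute, {"temp": 20 + minute % 5, "status": "ok"})
readings_by_sensor.insert("s2", 61, {"temp": 99, "status": "alarm"})

window = readings_by_sensor.query_range("s1", 10, 14)
alarms, scanned = readings_by_sensor.scan_where("status", "alarm")
print(len(window))           # 5 rows from a single partition
print(len(alarms), scanned)  # 1 match, found only by examining all 181 rows
```

To answer "find all alarms" efficiently, a wide-column design would typically add a second table keyed by status, duplicating data for that query.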

Graph databases (Neo4j, Amazon Neptune)

Model data as nodes and edges — relationships are first-class and traversing them is the core operation.

Good at: highly-connected data where you traverse relationships deeply: social networks ("friends of friends of friends"), recommendation engines, fraud rings, knowledge graphs, network/dependency analysis.
Trade-offs: specialized; overkill for data that isn't deeply relationship-traversal-heavy. (Note: relational joins handle moderate relationship queries well; graph databases shine when you traverse many hops, which joins do awkwardly.)

Document databases, in depth

Document databases are the most prominent NoSQL family and the one most often considered as a relational alternative, so understanding them deeply — their genuine strengths, their real trade-offs, and where they fit — is valuable for making informed decisions. A document database stores documents — self-contained, JSON-like structures that can nest data and vary in structure from document to document — and queries them by their content, without a fixed schema.

The core strength of document databases is flexibility for truly variable, hierarchical data. When your data items differ from one another (a product catalog where electronics, books, and clothing have wildly different attributes), when a natural unit of data is hierarchical (a document that embeds its sub-parts), or when the structure evolves rapidly during development, the document model fits naturally. You store each item as it is, without forcing it into a rigid table structure or running a migration for every change. A document can embed related data (a customer document containing its addresses and recent orders), so retrieving the whole thing is one operation with no joins. For rapid prototyping, content management, storing whole API payloads, and schema-variable data in general, document databases offer real convenience. This is the same flexibility PostgreSQL's JSONB provides inside a relational database, which is exactly the theme-#4 point developed below.

The trade-offs, though, are significant and often underappreciated. Joins are weak or absent — the document model assumes you embed related data or do lookups in application code, so the rich relational querying across entities (the join progression of Chapter 6) isn't available; if your data is actually relational (entities that relate in many ways, queried across those relationships), the document model fights you. Cross-document transactions and integrity are limited — the ACID guarantees and referential integrity of relational databases are weaker or absent (improving in newer document databases, but not equivalent), so maintaining consistency across documents is the application's job. And "schemaless" means the schema is unenforced, not absent — your data still has a structure (the application expects certain fields), but the database doesn't enforce it, so the schema lives implicitly in application code, and bad or inconsistent data accumulates without the database catching it (Chapter 1's Lumen disaster). This is the document model's central trade: flexibility gained, but integrity, joins, and transactional guarantees given up.
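Here is a small Python sketch of the three trade-offs in that paragraph. Plain dicts stand in for documents; no real document-database driver is involved. The collection, the field names, and the `REQUIRED_FIELDS` check are hypothetical:

- Reading one customer with embedded orders takes a single lookup.
- A question that cuts across customers needs an application-side scan.
- A misspelled field is stored without complaint until the application's own check trips over it. That check is the "schema" that lives in code.

```python
# Hypothetical "customers" collection: each document embeds its own orders.
customers = {
    "c1": {"name": "Ana", "email": "ana@example.com",
           "orders": [{"sku": "A-1", "qty": 2}, {"sku": "B-7", "qty": 1}]},
    "c2": {"name": "Ben", "email": "ben@example.com",
           "orders": [{"sku": "A-1", "qty": 5}]},
    # Misspelled field name: nothing in the store rejects it.
    "c3": {"name": "Cy", "e-mail": "cy@example.com", "orders": []},
}

# 1. Whole-unit read: one key lookup returns the customer and their orders.
ana = customers["c1"]


# 2. Cross-document question ("who ordered SKU A-1?"): no join is available,
#    so the application walks every document itself.
def customers_who_ordered(sku):
    return sorted(cid for cid, doc in customers.items()
                  if any(order["sku"] == sku for order in doc["orders"]))


# 3. The schema lives in application code and is enforced only if someone checks.
REQUIRED_FIELDS = {"name", "email", "orders"}


def invalid_documents():
    return sorted(cid for cid, doc in customers.items()
                  if not REQUIRED_FIELDS.issubset(doc))


print(len(ana["orders"]))            # 2
print(customers_who_ordered("A-1"))  # ['c1', 'c2']
print(invalid_documents())           # ['c3']
```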

The honest assessment is that document databases fit data that is genuinely document-shaped — variable, hierarchical, self-contained, queried as whole units, not richly cross-related — and that a lot of data people put in document databases is actually relational and would be better served by a relational database (often PostgreSQL with JSONB for the variable parts). The mistake the chapter keeps flagging — choosing a document database for fundamentally relational data because it's trendy or seems flexible — sacrifices the integrity, joins, and transactions that relational data needs, for a flexibility it didn't need. Document databases are a real tool for genuinely document-shaped data; they're a poor choice for relational data wearing a flexible disguise. Recognizing which your data is — genuinely document-shaped, or relational — is the key judgment, and it's why understanding the relational model deeply (the whole book) is what lets you evaluate the document alternative honestly.

Key-value, column-family, and graph, in depth

The other three NoSQL families each excel at a specific access pattern that relational databases serve less ideally, and understanding what makes each shine — and its trade-offs — completes the picture of when non-relational tools genuinely fit.

Key-value stores (Redis, DynamoDB, Memcached) are the simplest model: a giant dictionary, often distributed across many nodes, where you store and fetch a value by its key, extremely fast. Their strength is speed and simplicity for key-based access. Redis keeps data in memory for sub-millisecond lookups, making it ideal for caching (the canonical use: a fast cache in front of a relational database), session storage, rate-limiting counters, queues, and leaderboards. Anything that's "look up this value by its key, very fast" fits. The trade-off is that you query almost entirely by key: there are no rich queries, no joins, and little or no ad-hoc filtering, so a key-value store is a specialized tool, not a general database. Crucially, key-value stores are most often used alongside a relational database (Redis caching PostgreSQL query results), not instead of it. The relational database is the system of record, and the key-value store is the fast-access layer. This complementary use, not replacement, is the common and correct pattern.
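The "Redis in front of PostgreSQL" arrangement is usually cache-aside. The application reads the cache first and falls back to the system of record on a miss. It then populates the cache with an expiry. Below is a minimal Python sketch of that logic. A dict stands in for Redis and a function stands in for a PostgreSQL query. `TTLCache`, `load_product_from_db`, and the 30-second TTL are hypothetical. The clock is injected so the expiry behavior is deterministic.

```python
class TTLCache:
    """Dict-backed stand-in for a key-value cache with per-key expiry."""

    def __init__(self, clock):
        self.clock = clock   # callable returning "now" in seconds
        self.entries = {}    # key -> (value, expires_at)

    def get(self, key):
        item = self.entries.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self.clock() >= expires_at:
            del self.entries[key]  # lazily evict the expired entry
            return None
        return value

    def set(self, key, value, ttl):
        self.entries[key] = (value, self.clock() + ttl)


db_reads = 0


def load_product_from_db(product_id):
    """Hypothetical stand-in for a query against the system of record."""
    global db_reads
    db_reads += 1
    return {"id": product_id, "name": f"product-{product_id}"}


def get_product(cache, product_id, ttl=30):
    """Cache-aside read: try the cache, fall back to the database on a miss."""
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_product_from_db(product_id)
    cache.set(key, value, ttl)
    return value


now = [0.0]
cache = TTLCache(clock=lambda: now[0])
get_product(cache, 42)  # miss: database read
get_product(cache, 42)  # hit: served from the cache
now[0] = 31.0           # past the 30 s TTL
get_product(cache, 42)  # expired: database read again
print(db_reads)         # 2
```

The database stays the source of truth, and the cache only holds expiring copies. Losing the cache therefore costs speed, not data.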

Column-family / wide-column stores (Cassandra, ScyllaDB, HBase) optimize for massive write throughput and linear horizontal scaling across many nodes. They store data in rows with flexible columns, distributed across a cluster. They excel at enormous write volumes, such as time-series and event data at web scale and write-heavy workloads spanning data centers. Their strength is scaling writes and storage across many commodity machines while staying available. The trade-offs are substantial. You must design your tables around your specific queries (ad-hoc querying is very limited, so you model for the exact access patterns you'll use). Consistency is often eventual (Chapter 35's CAP trade-offs; Cassandra lets you tune it per query). And there are no joins. They're powerful precisely at the scale where a single relational server can't keep up with the write volume, and awkward for general-purpose use.

Graph databases (Neo4j, Amazon Neptune) make relationships first-class: data is nodes and edges, and traversing relationships is the core, optimized operation. Their strength is deep, multi-hop relationship traversal: "friends of friends of friends" in a social network, fraud rings (chains of suspicious connections), recommendation engines, knowledge graphs, dependency analysis. These queries traverse many relationship hops, which relational joins handle awkwardly (each hop is another join, and many-hop joins get unwieldy). The trade-off is specialization. Graph databases are overkill for data that isn't deeply traversal-heavy, and relational joins handle moderate relationship queries (a few hops) perfectly well. The graph database's real win comes when traversal depth and connectedness are the central characteristic of both the data and the queries.
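The difference between "a few hops" and "many hops" is easiest to see as a traversal. Here is a minimal Python sketch of the breadth-first, hop-limited expansion that a "friends of friends of friends" query performs. It assumes an in-memory adjacency list with hypothetical names. A graph database optimizes exactly this neighbor-to-neighbor walk. In SQL, each additional hop is another self-join on a friendships table, or another step of a recursive CTE.

```python
from collections import deque

# Hypothetical undirected friendship graph, built as an adjacency list.
friendships = [
    ("ana", "ben"), ("ben", "cy"), ("cy", "dee"),
    ("dee", "eli"), ("ana", "fay"), ("fay", "cy"),
]
graph = {}
for a, b in friendships:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def within_hops(start, max_hops):
    """Breadth-first search returning {person: hop distance} up to max_hops."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distance[person] == max_hops:
            continue  # do not expand past the hop limit
        for friend in graph.get(person, ()):
            if friend not in distance:
                distance[friend] = distance[person] + 1
                queue.append(friend)
    del distance[start]
    return distance


print(within_hops("ana", 1))  # ben and fay, each at distance 1
print(within_hops("ana", 3))  # adds cy (2) and dee (3); eli is 4 hops away
```

Each pass of the loop costs only the neighbors of one node. That per-hop cost is what a native graph store keeps cheap as depth grows.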

The pattern across all four families is that each makes a specific trade — giving up some relational generality (rich queries, joins, strong consistency, enforced schema) to excel at a specific access pattern (flexible documents, fast key lookups, massive writes, deep traversal). None is a general-purpose replacement for a relational database; each is a specialist. This is why "NoSQL" as a monolithic category is misleading — the families are as different from each other as they are from relational, united only by not being relational. Understanding what each genuinely excels at (and its trade-offs) is what lets you recognize the specific situations where one is the right tool, versus the many situations where a relational database (often PostgreSQL) serves better. The judgment is always: does my data and access pattern match this specialist's strength enough to justify giving up relational generality? For most data, the answer is no; for the specific cases that match a specialist's strength, the answer is yes — and recognizing the difference is the skill.

The trade-offs they make

NoSQL databases generally trade away some of relational's guarantees:

ACID → BASE / eventual consistency. Many NoSQL systems favor availability and partition tolerance over strict consistency (Chapter 35's CAP theorem), so a read might return slightly stale data. Acceptable for a social feed; not for a bank balance.
Rich queries/joins → simple access patterns. You often give up ad-hoc SQL and joins, designing instead around specific known queries.
Enforced schema/integrity → application responsibility. "Schemaless" flexibility means the database won't catch the bad data a relational schema would (Chapters 1 and 14).

These aren't flaws — they're trades. The question is always whether the trade fits your problem.

When PostgreSQL does the job better (theme #4)

Here's the crucial counterpoint, the book's recurring theme #4: PostgreSQL often covers NoSQL use cases without a separate system.

Document needs → JSONB (Chapter 16). PostgreSQL stores and indexes JSON documents inside the relational database, with transactions and joins still available. For most "we need flexible documents" cases, JSONB suffices — and you keep integrity for the relational parts (the Lumen lesson, Chapter 1).
Key-value needs → a simple table, or PostgreSQL's UNLOGGED tables / hstore; and for genuine caching, Redis in front of PostgreSQL is a common complement, not a replacement.
Search needs → full-text search (Chapter 16), not Elasticsearch, for most apps.
Moderate graph/relationship needs → joins and recursive CTEs (Chapter 11) handle hierarchies and moderate traversals well.
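The recursive-CTE point can be sketched with a query that walks an org-chart hierarchy of arbitrary depth. It runs against SQLite through Python's standard `sqlite3` module, purely so the example is self-contained. The `WITH RECURSIVE` syntax is standard SQL, and PostgreSQL accepts the same query. The `employees` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        manager_id INTEGER REFERENCES employees(id)
    );
    INSERT INTO employees (id, name, manager_id) VALUES
        (1, 'Ada',   NULL),
        (2, 'Brook', 1),
        (3, 'Cole',  1),
        (4, 'Dana',  2),
        (5, 'Eve',   4);
""")

# Everyone under Ada at any depth, with their level below her.
rows = conn.execute("""
    WITH RECURSIVE reports(id, name, depth) AS (
        SELECT id, name, 0 FROM employees WHERE id = 1
        UNION ALL
        SELECT e.id, e.name, r.depth + 1
        FROM employees AS e
        JOIN reports AS r ON e.manager_id = r.id
    )
    SELECT name, depth FROM reports WHERE depth > 0 ORDER BY depth, name
""").fetchall()

for name, depth in rows:
    print(depth, name)
```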

The pattern: reach for PostgreSQL's features first; you avoid an entire extra system to deploy, secure, back up, and keep consistent (Chapter 16's case studies). One well-understood database beats several drifting ones.

JSONB versus a document database: a closer look

Because the document-database-versus-PostgreSQL-JSONB decision is the single most common NoSQL choice teams face, it deserves a focused comparison — when does JSONB genuinely suffice, and when does a dedicated document database earn its place? This is theme #4 at its most practically consequential, since "we need MongoDB" is the most frequent unnecessary-NoSQL impulse.

JSONB suffices — and is the better choice — when your data is mostly relational with some variable parts. This describes a huge fraction of real applications: you have structured entities with relationships (customers, orders, products) that need joins, transactions, and integrity, plus some genuinely variable data (product attributes that differ by type, flexible metadata, captured payloads). JSONB lets you store the variable parts flexibly within your relational schema, so you get document flexibility where you need it and keep relational power (joins, ACID, constraints, SQL) for the structured majority — in one system. You query JSONB with operators and index it with GIN (Chapter 16), getting indexed document querying without leaving PostgreSQL. For this extremely common shape — relational with variable parts — JSONB is not a compromise but the superior choice, because a document database would force you to give up the relational power your structured data needs, while JSONB gives you both.
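The following Python sketch shows the "relational with variable parts" shape without a running PostgreSQL server. Each product row has fixed columns plus an `attrs` dict that plays the role of a JSONB column. The `contains` function is a simplified re-implementation of the idea behind JSONB's `@>` containment operator: objects match key by key, and arrays match element by element. It is written for illustration, is not PostgreSQL's code, and skips some edge cases. In PostgreSQL itself, the filter would be written `WHERE attrs @> '{"color": "red"}'`, and a GIN index on `attrs` can serve it.

```python
def contains(doc, query):
    """Simplified containment check modeled on the idea behind JSONB's @>."""
    if isinstance(query, dict):
        return isinstance(doc, dict) and all(
            key in doc and contains(doc[key], value) for key, value in query.items()
        )
    if isinstance(query, list):
        return isinstance(doc, list) and all(
            any(contains(item, wanted) for item in doc) for wanted in query
        )
    return doc == query


# Fixed relational columns (id, name, category, price_cents) plus variable attrs.
products = [
    {"id": 1, "name": "Laptop", "category": "electronics", "price_cents": 99900,
     "attrs": {"ram_gb": 16, "ports": ["usb-c", "hdmi"]}},
    {"id": 2, "name": "Novel", "category": "books", "price_cents": 1500,
     "attrs": {"author": "A. Writer", "pages": 320}},
    {"id": 3, "name": "T-shirt", "category": "clothing", "price_cents": 2000,
     "attrs": {"size": "M", "color": "red"}},
    {"id": 4, "name": "Monitor", "category": "electronics", "price_cents": 25900,
     "attrs": {"ports": ["hdmi"], "color": "red"}},
]

# A structured filter on a fixed column combined with a document filter on attrs,
# roughly: WHERE category = 'electronics' AND attrs @> '{"ports": ["hdmi"]}'
hdmi_electronics = [p["name"] for p in products
                    if p["category"] == "electronics"
                    and contains(p["attrs"], {"ports": ["hdmi"]})]

red_items = [p["name"] for p in products if contains(p["attrs"], {"color": "red"})]

print(hdmi_electronics)  # ['Laptop', 'Monitor']
print(red_items)         # ['T-shirt', 'Monitor']
```

The fixed columns are where constraints, foreign keys, and joins apply. The `attrs` part absorbs the per-category variation.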

A dedicated document database earns its place when your data is genuinely and overwhelmingly document-shaped, and you need things PostgreSQL's JSONB doesn't provide at your scale. The clearest cases: when essentially all your data is self-contained documents with no meaningful cross-entity relationships (so you never need joins or cross-document transactions, and the relational power JSONB preserves would go unused), and when you need the specific operational characteristics of a document database — effortless horizontal sharding across many nodes, or particular document-database features and tooling — at a scale where PostgreSQL's single-node-plus-replication model strains. Even then, the bar is high, because PostgreSQL scales further than people assume (Chapters 25, 35) and JSONB is genuinely capable. The honest summary: most "we need a document database" situations are actually "relational with variable parts," for which JSONB is better (not just adequate) — and the genuine document-database cases are those where data is purely document-shaped and scale or operational needs exceed what PostgreSQL offers.

The practical test, then, when tempted toward a document database: ask "is my data purely documents with no relationships I need to query across, or is it relational with some variable parts?" If the latter (the common case), JSONB in PostgreSQL is the better choice — flexibility where needed, relational power throughout, one system. If genuinely the former and at a scale or with operational needs PostgreSQL can't meet, a document database may be justified. This single judgment — JSONB for relational-with-variable-parts (most cases), document database for purely-document-at-scale (rare cases) — resolves the most common NoSQL decision correctly, and it's a direct application of knowing PostgreSQL's full capabilities (theme #4). The teams that reach for MongoDB by default, without considering JSONB, are usually leaving the better choice on the table — which is exactly the mistake understanding this comparison prevents.

When NoSQL genuinely wins

But sometimes a NoSQL database is genuinely the right tool — and recognizing that is just as important:

Caching / sub-millisecond key lookups at scale → Redis (almost always alongside a relational DB).
Truly massive write throughput across data centers with fixed query patterns → Cassandra.
Deep, multi-hop relationship traversal (social graphs, fraud rings, recommendations) → a graph database.
Document storage where you genuinely never need joins/transactions/integrity and want effortless horizontal scaling → a document store.
Scale beyond a single big server for the right workload → distributed NoSQL (or NewSQL, Chapter 35).

The deciding factors — data model, consistency needs, query patterns, scale — are exactly Chapter 37's database decision framework.

Polyglot persistence and its costs

"Polyglot persistence" — using multiple specialized databases within one system, each for what it's best at — became a fashionable architecture, and understanding both its appeal and its substantial costs is key to making sound decisions about when it's worth it. The appeal is intuitive: use a relational database for transactional data, a document store for flexible content, a search engine for search, a cache for fast lookups, a graph database for relationships — each tool for its ideal job. In principle, each component is optimal for its part.

The costs, however, are severe and often underweighted. Each additional database is a separate system to deploy (infrastructure), secure (its own access control, its own attack surface), back up (its own backup strategy, tested), monitor (its own metrics and alerts), upgrade (its own patch cycle), and staff (someone must know it). That's substantial operational overhead per system, and it grows with each one. But the most insidious cost is consistency across systems: when the same logical data lives in multiple databases (a product in the relational store, in the search index, in the cache), keeping them in agreement becomes a hard, ongoing problem — the relational store and the search index drift apart, the cache goes stale, and reconciling them requires pipelines, careful invalidation, and constant vigilance. The cross-system consistency problem often becomes the dominant complexity of a polyglot architecture, dwarfing the benefit of each component's specialization. What looked like "optimal tool for each job" becomes "a zoo of systems that must be kept in sync," and the synchronization is where the bugs and the effort concentrate.
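The drift problem can be reproduced in a few lines. In the Python sketch below, dicts stand in for a relational store, a search index, and a cache; all of them are hypothetical. The code performs a naive "dual write" in which the update to the system of record succeeds but the follow-up write to the search index fails. It then runs the kind of reconciliation check a polyglot setup ends up needing. Production setups often rely on change-data-capture or transactional-outbox pipelines instead of a function like this. The point here is only how easily the copies diverge.

```python
# Three copies of the same logical product data (hypothetical stand-ins).
relational = {"p1": {"name": "Desk Lamp", "price_cents": 2500}}
search_index = {"p1": {"name": "Desk Lamp", "price_cents": 2500}}
cache = {"p1": {"name": "Desk Lamp", "price_cents": 2500}}


class SearchIndexDown(Exception):
    pass


def update_price(product_id, new_price, search_available=True):
    """Naive dual write: update the system of record, then each copy."""
    relational[product_id] = {**relational[product_id], "price_cents": new_price}
    if not search_available:
        # The search write fails after the relational write already succeeded.
        raise SearchIndexDown(product_id)
    search_index[product_id] = dict(relational[product_id])
    cache.pop(product_id, None)  # invalidate; the next read repopulates it


def drifted(copy):
    """Reconciliation check: which keys disagree with the system of record?"""
    return sorted(k for k, v in relational.items() if k in copy and copy[k] != v)


try:
    update_price("p1", 1999, search_available=False)
except SearchIndexDown:
    pass

print(drifted(search_index))  # ['p1']: search still shows the old price
print(drifted(cache))         # ['p1']: the cache was never invalidated either
```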

This is why the mature view is cautious about polyglot persistence — not opposed, but demanding genuine justification for each added system. The question for each potential addition is: does the benefit of this specialist exceed its full cost — deployment, security, backup, monitoring, staffing, and the cross-system consistency work? Often, the honest answer is no, especially when PostgreSQL's capabilities (theme #4) could cover the need within the existing system, avoiding the addition entirely. A single, capable database that does several jobs adequately is frequently better overall than several specialized databases that each do one job optimally but must be operated and reconciled — because the operational and consistency costs of the polyglot architecture outweigh the marginal performance gains of specialization. The teams that thrive often run lean — one database (PostgreSQL) doing most things, with a specialist added only where a measured need clearly justifies its full cost. The teams that struggle often have polyglot sprawl — many systems adopted enthusiastically, now a perpetual operational and consistency burden. Recognizing polyglot persistence's true cost — not just the per-system overhead but the cross-system consistency problem — is what lets you resist unnecessary sprawl and add specialists only when truly warranted. "Fewer systems, each well-understood" beats "many systems, optimally specialized but operationally and consistency-burdened" more often than the polyglot enthusiasm admits.

A worked decision and NoSQL's trajectory

Let's apply the chapter's judgment to a concrete decision, then situate NoSQL historically, because both the decision process and the trajectory inform how you should think about these tools. Suppose you're building a system with: transactional order data (must be correct), a product catalog (variable attributes by category), full-text product search, a need for fast session lookups, and a "customers who bought this also bought" recommendation feature. A naive polyglot design might use five databases: relational for orders, document for the catalog, Elasticsearch for search, Redis for sessions, and a graph database for recommendations. Apply the chapter's judgment instead.

Orders need strong consistency and rich relational querying → relational (PostgreSQL), clearly. The variable catalog → PostgreSQL JSONB covers it (theme #4), no separate document store needed. Search → PostgreSQL full-text search is adequate for most catalogs, no Elasticsearch needed unless at large scale with sophisticated relevance needs. Sessions → Redis genuinely fits (fast key-value, ephemeral) and is the one place a specialist is clearly justified — added alongside PostgreSQL. Recommendations → "also bought" is a moderate relationship query that PostgreSQL joins and aggregation handle (it's not deep multi-hop traversal), so no graph database needed unless the recommendation logic becomes genuinely graph-heavy. The judged design: PostgreSQL for almost everything (orders, catalog via JSONB, search via full-text, recommendations via joins), plus Redis for sessions — two systems, not five. This is dramatically simpler to operate, with far less cross-system consistency burden, and it serves the actual needs. That's theme #4 and the polyglot-cost reasoning, applied: reach for PostgreSQL's full power first, add a specialist (Redis) only where genuinely justified.
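A sketch can check the claim that "also bought" is a join-and-aggregate query rather than a graph traversal. The SQL below is a self-join on a hypothetical `order_items` table. For a given product, it finds the other products that appear in the same orders and ranks them by how many orders they share. It runs on SQLite via Python's `sqlite3` only to stay self-contained. The query itself is plain standard SQL that PostgreSQL also accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_items (
        order_id   INTEGER NOT NULL,
        product_id TEXT    NOT NULL,
        PRIMARY KEY (order_id, product_id)
    );
    INSERT INTO order_items (order_id, product_id) VALUES
        (1, 'kettle'), (1, 'mug'), (1, 'tea'),
        (2, 'kettle'), (2, 'mug'),
        (3, 'kettle'), (3, 'filter'),
        (4, 'mug'),    (4, 'tea');
""")

# Products that share orders with the given product, ranked by shared-order count.
# (The ? placeholder is sqlite3's parameter style; PostgreSQL drivers differ.)
ALSO_BOUGHT = """
    SELECT other.product_id, COUNT(*) AS shared_orders
    FROM order_items AS base
    JOIN order_items AS other
      ON other.order_id = base.order_id
     AND other.product_id <> base.product_id
    WHERE base.product_id = ?
    GROUP BY other.product_id
    ORDER BY shared_orders DESC, other.product_id
    LIMIT 3
"""

recommendations = conn.execute(ALSO_BOUGHT, ("kettle",)).fetchall()
print(recommendations)  # [('mug', 2), ('filter', 1), ('tea', 1)]
```

This is one hop (product, then orders, then products), which is well within what joins handle comfortably.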

Historically, NoSQL's trajectory reinforces this judgment. NoSQL emerged in the late 2000s amid web-scale pressures and considerable hype — some declared relational databases obsolete, and teams adopted NoSQL broadly, often for data that was fundamentally relational. The subsequent years brought a correction: many teams that had enthusiastically gone NoSQL discovered they missed transactions, joins, and schema discipline, and either returned to relational databases or struggled with the consequences. Meanwhile, relational databases absorbed NoSQL's good ideas — PostgreSQL added JSONB (document flexibility), full-text search, and rich extension capabilities — narrowing the cases where NoSQL was genuinely necessary. And NoSQL systems themselves added back stronger consistency options, acknowledging that the "scale at any cost, consistency be damned" stance went too far. The result, today, is a mature synthesis: NoSQL databases are recognized as specialists with genuine strengths for specific problems (not relational replacements), relational databases are recognized as the right default for most data (enhanced with NoSQL-inspired features), and the decision is made by matching tool to problem rather than by fashion. This trajectory — hype, correction, synthesis — is why the chapter's tone is balanced: NoSQL is neither the future-that-replaces-relational (the early hype) nor a mistake (the backlash), but a set of specialist tools to be chosen deliberately for the specific problems they fit, with relational as the well-justified default. Understanding this history helps you avoid both repeating the over-adoption mistake and dismissing genuinely useful tools.

Common mistakes

Choosing NoSQL for relational data because it's trendy — the Lumen disaster (Chapter 1): dental scheduling (deeply relational) in a document store sacrificed integrity, joins, and transactions for a flexibility it didn't need.
"Schemaless" = no schema — the schema just moves to your app, unenforced; bad data accumulates (Chapters 1, 14).
Replacing PostgreSQL when JSONB/full-text/CTEs would do — adding a whole system for a need PostgreSQL covers (theme #4).
Ignoring eventual consistency — using an eventually-consistent store for data that needs strong consistency (balances, inventory).
Polyglot sprawl — adding so many specialized databases that operating and keeping them consistent becomes the dominant cost.

ACID, BASE, and the consistency spectrum

The deepest trade-off NoSQL databases make is around consistency, and understanding the spectrum from strict ACID to relaxed BASE is essential to evaluating any non-relational option — because consistency is often the guarantee you're implicitly giving up, sometimes without realizing it. The contrast was previewed in Chapter 26; here it's central to the NoSQL decision.

Relational databases offer ACID (Chapter 26): strong consistency, where every read sees a consistent state and transactions are atomic, isolated, and durable. Many distributed NoSQL systems instead offer BASE — Basically Available, Soft state, Eventually consistent — trading strong consistency for availability and scale. "Eventually consistent" means that after a write, different parts of the system may temporarily disagree, converging to a consistent state over time — so a read shortly after a write might return stale data. This isn't a bug; it's a deliberate trade that enables the system to stay available and scale horizontally even when parts of it can't immediately coordinate (the CAP theorem of Chapter 35 formalizes why distributed systems face this trade). For some data, eventual consistency is perfectly acceptable: a social media like-count that's briefly off, a product-view counter, a recommendation that's slightly stale — these tolerate temporary inconsistency, and the availability and scale are worth it.
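A minimal Python simulation makes "a read shortly after a write might return stale data" concrete. It assumes one primary and one replica, with asynchronous replication modeled as a queue of pending changes. The `Primary` and `Replica` classes and the explicit `replicate()` step are hypothetical simplifications of what real systems do in the background.

```python
from collections import deque


class Replica:
    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)


class Primary:
    """Accepts writes immediately and ships them to replicas later."""

    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas
        self.pending = deque()  # changes not yet applied on the replicas

    def write(self, key, value):
        self.data[key] = value  # acknowledged to the client right away
        self.pending.append((key, value))

    def replicate(self):
        """Background step: apply queued changes to every replica, in order."""
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica.data[key] = value


replica = Replica()
primary = Primary([replica])

primary.write("likes:post-9", 41)
primary.replicate()
primary.write("likes:post-9", 42)

print(replica.read("likes:post-9"))  # 41: stale, the new write has not arrived
primary.replicate()
print(replica.read("likes:post-9"))  # 42: the replica has converged
```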

But for much data, strong consistency is essential, and eventual consistency is dangerous. An account balance that's "eventually consistent" could allow double-spending; an inventory count that's stale could oversell; a booking system that's eventually consistent could double-book. For these, the ACID guarantees are not optional — being briefly wrong is unacceptable. So the consistency requirement of your data is a primary factor in the relational-vs-NoSQL decision: data that must be correct at all times needs strong consistency (relational ACID, or a NoSQL system that offers it); data that tolerates brief staleness can accept eventual consistency in exchange for the availability and scale it buys. The mistake — using an eventually-consistent store for data that needs strong consistency — causes subtle, serious bugs (the double-spend, the oversold item) that pass testing and corrupt data in production under the concurrency and partition conditions where eventual consistency shows.
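The double-spend case is a check-then-act race. The Python sketch below uses hypothetical names and no real database. Two withdrawals each validate against their own stale view of the balance, both pass, and the merged result goes negative. The second function shows the shape of the strongly consistent alternative: a single authoritative balance with conditional debits. It is analogous to a hypothetical `UPDATE accounts SET balance = balance - 80 WHERE id = 1 AND balance >= 80` in a relational database, which lets only one of the withdrawals through.

```python
def withdraw_on_stale_copies(balance, amounts):
    """Each request checks its own snapshot of the balance, taken before any debit."""
    snapshots = [balance for _ in amounts]  # every replica saw the same old value
    approved = [amt for snap, amt in zip(snapshots, amounts) if snap >= amt]
    # Later, all approved debits are merged onto the real balance.
    return balance - sum(approved), approved


def withdraw_atomically(balance, amounts):
    """Debits are applied serially to one authoritative balance, so each check
    sees every earlier debit before deciding."""
    approved = []
    for amt in amounts:
        if balance >= amt:
            balance -= amt
            approved.append(amt)
    return balance, approved


print(withdraw_on_stale_copies(100, [80, 80]))  # (-60, [80, 80]): overdrawn
print(withdraw_atomically(100, [80, 80]))       # (20, [80]): second request refused
```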

The mature view, which Chapter 37's decision framework develops, is that consistency is a spectrum and the right point depends on the data: know which of your data needs strong consistency (and use a system that provides it) and which tolerates eventual consistency (and can use a system that trades it for scale). The early NoSQL movement sometimes oversold "scale at any cost," leading teams to put consistency-critical data in eventually-consistent stores and suffer for it; the mature reaction recognizes that most data benefits from strong consistency (which is why relational ACID databases remain the default), while specific data and scale requirements justify the eventual-consistency trade. Understanding the spectrum — what ACID and BASE each guarantee, what each costs, and which your data needs — is what lets you make this trade deliberately rather than stumble into a consistency mismatch. It's perhaps the single most important consideration in choosing a database for data that matters, and it's why "what consistency does this data need?" should be among the first questions in any database decision.

PostgreSQL covers most of it (theme #4, fully)

The book's recurring theme #4 — PostgreSQL's full power often eliminates the need for another database — reaches its fullest statement here, in the chapter surveying the alternatives. Having seen what each NoSQL family offers, the striking realization is how much of it PostgreSQL already provides, so that a great many "we need NoSQL" situations are actually "we haven't used PostgreSQL's full capabilities" situations.

The mapping is direct. Document needs → PostgreSQL's JSONB (Chapter 16) stores and indexes JSON documents inside the relational database, providing document flexibility for the genuinely variable parts of your data while keeping relational integrity, joins, and transactions for the structured parts — the best of both, in one system. For most "we need flexible documents" cases, JSONB suffices, and you avoid adding a whole document database. Search needs → PostgreSQL's full-text search (Chapter 16) provides ranked, stemmed search adequate for most applications, without adding Elasticsearch. Key-value needs → a simple PostgreSQL table, or hstore, handles key-value storage (though for genuine high-throughput caching, Redis alongside PostgreSQL remains the pattern). Moderate graph/relationship needs → PostgreSQL's joins and recursive CTEs (Chapter 11) handle hierarchies and moderate-depth traversals well (a graph database only wins for deep, many-hop traversal). Even specialized needs → PostgreSQL extensions (Chapter 36's pgvector, PostGIS, TimescaleDB) extend it further into vector, geospatial, and time-series territory.
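The claim that a simple table handles key-value storage can be sketched as a table with an upsert and a get. The sketch runs on SQLite via Python's `sqlite3` so it is self-contained. The `INSERT ... ON CONFLICT (...) DO UPDATE SET ... = excluded....` form is also PostgreSQL's upsert syntax; SQLite supports it from version 3.24. The `kv` table and the `put` and `get` helpers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE kv (
        item_key   TEXT PRIMARY KEY,
        item_value TEXT NOT NULL
    )
""")


def put(key, value):
    """Insert the key, or overwrite its value if it already exists (upsert)."""
    with conn:  # commits this statement as its own transaction
        conn.execute(
            """
            INSERT INTO kv (item_key, item_value) VALUES (?, ?)
            ON CONFLICT (item_key) DO UPDATE SET item_value = excluded.item_value
            """,
            (key, value),
        )


def get(key):
    """Primary-key lookup; returns None when the key is absent."""
    row = conn.execute("SELECT item_value FROM kv WHERE item_key = ?", (key,)).fetchone()
    return row[0] if row else None


put("session:abc", '{"user_id": 7}')
put("session:abc", '{"user_id": 7, "cart": 3}')  # second put overwrites
print(get("session:abc"))  # {"user_id": 7, "cart": 3}
print(get("session:zzz"))  # None
```

The primary key gives indexed lookups, and the value can sit right next to relational data in the same transaction.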

The practical force of this is that the default should be to reach for PostgreSQL's capabilities first, and to add a separate system only when PostgreSQL can't meet a measured need. The benefit is enormous operational simplicity. You have one well-understood database to deploy, secure, back up, monitor, and keep consistent. The alternative is a polyglot collection of specialized systems, each demanding its own operations and expertise, plus the cross-system work of keeping them in agreement. A team running a single PostgreSQL instance with JSONB, full-text search, and the right extensions can build remarkably capable applications without the sprawl, and it moves faster and runs more reliably for it. This isn't dogma ("never use another database") but an informed default ("use PostgreSQL's full power first, add specialists only for measured needs"). The industry so often over-reaches for NoSQL because it underestimates PostgreSQL. Teams don't know it has JSONB, full-text search, CTEs, and extensions, and this book's deep treatment of those features (Part II, especially Chapter 16) is meant to correct that. Knowing PostgreSQL's full capabilities is precisely what lets you recognize when a "NoSQL need" is already covered, which is most of the time. That recognition, theme #4 made actionable, is one of the most valuable judgments a database practitioner can have, because it saves the substantial cost of unnecessary systems.

NewSQL: having it both ways

A development worth knowing, because it reflects the field's maturation, is NewSQL — a category of databases that aim to provide the horizontal scalability of NoSQL with the strong consistency and SQL interface of relational databases. NewSQL represents the recognition that the NoSQL trade — giving up consistency and SQL for scale — wasn't always necessary, and that you might have both.

The original NoSQL premise was that scaling across many machines required giving up strong consistency and the relational model (the early interpretation of the CAP theorem, Chapter 35). NewSQL systems such as Google Spanner, CockroachDB, and YugabyteDB challenge that premise. They scale horizontally across many machines (and even data centers) like NoSQL stores, yet offer ACID transactions and a SQL interface like relational databases. They're harder to build, because distributed strong consistency is a genuinely difficult problem: these systems rely on consensus protocols (Paxos in Spanner, Raft in CockroachDB and YugabyteDB) and, in Spanner's case, tightly synchronized clocks. But they deliver something valuable: the scale you might have gone to NoSQL for, without giving up the consistency and SQL you'd otherwise have lost. For applications that genuinely need to scale beyond a single server and also need strong consistency and relational querying, NewSQL is increasingly the answer; you no longer have to choose between scale and ACID.
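Consensus protocols such as Paxos and Raft build on a simple property of majority quorums: if a write must be acknowledged by a majority of replicas and a read consults a majority, the two sets always share at least one replica, so the read can see the latest committed write. The toy below checks only that overlap property, by brute force over a hypothetical cluster. It is not a consensus protocol (no leader election, failures, or log replication) and not a description of how any particular NewSQL system is implemented; the function names are illustrative.

```python
from itertools import combinations


def majority(n):
    """Smallest number of replicas that is a strict majority of n."""
    return n // 2 + 1


def quorums_always_overlap(n, q):
    """True if every pair of size-q replica sets drawn from n replicas shares a replica."""
    quorums = [set(c) for c in combinations(range(n), q)]
    return all(a & b for a in quorums for b in quorums)


def latest_value_always_visible(n):
    """Write version 1 to each possible majority, then read from each possible
    majority, keeping the value with the highest version seen."""
    q = majority(n)
    for write_set in combinations(range(n), q):
        replicas = [(0, "old")] * n          # (version, value) held by each replica
        for i in write_set:
            replicas[i] = (1, "new")        # only the acknowledging majority has the write
        for read_set in combinations(range(n), q):
            newest = max((replicas[i] for i in read_set), key=lambda vv: vv[0])
            if newest[1] != "new":
                return False
    return True


for n in (3, 4, 5):
    print(n, majority(n), quorums_always_overlap(n, majority(n)), latest_value_always_visible(n))
print(quorums_always_overlap(4, 2))  # False: {0, 1} and {2, 3} are disjoint
```

Dropping to two of four replicas breaks the guarantee, because two disjoint pairs could each "succeed" independently. Requiring strict majorities is also why the minority side of a network partition cannot accept writes in such systems, a trade-off that connects to the CAP discussion in Chapter 35.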

NewSQL matters for the chapter's overall message because it further narrows the cases where you must give up relational guarantees. The decision used to be framed as "relational (consistent, but scales only vertically) versus NoSQL (scales horizontally, but eventually consistent)." NewSQL adds a third option, distributed and consistent and SQL, which removes that forced trade-off for many scale scenarios. The modern landscape is therefore a spectrum rather than a binary:
Single-node relational (PostgreSQL): the default, which handles vast scale with partitioning and replication (Chapters 25 and 35).
NewSQL: distributed relational with strong consistency, for genuine multi-node scale that still needs consistency.
NoSQL specialists: for specific access patterns, and for cases where eventual consistency is acceptable and beneficial.
So "we need to scale beyond one server" no longer automatically means "give up consistency and SQL"; you can often have horizontal scale, consistency, and SQL together. This is part of why the relational model and SQL remain central even at scale: the techniques for scaling them (replication, partitioning, NewSQL) have advanced, shrinking the set of scenarios that force a move to eventually consistent NoSQL. That vindicates the book's relational focus while honestly acknowledging the genuine specialist and extreme-scale cases that remain.

The verdict: relational by default, specialist by need

It's time to pull the chapter together, because "when should I use NoSQL?" deserves a direct answer grounded in everything covered so far. The verdict: use a relational database (PostgreSQL) by default, and reach for a specialist only when a specific, measured need genuinely justifies it. This isn't relational dogma; it's the conclusion that follows from honestly weighing the trade-offs.

The default is relational for sound reasons. Most data is relational (entities that relate to one another, queried across those relationships); most data benefits from strong consistency (being correct matters); most applications benefit from rich ad-hoc querying (SQL) and enforced integrity (constraints); and a single well-understood database is operationally far simpler than a polyglot collection. PostgreSQL extends this default's reach enormously through JSONB, full-text search, extensions, partitioning, and replication, covering many "specialist" needs within one system (theme #4) and scaling much further than people assume. The default therefore handles the large majority of cases well, and choosing it avoids the costs (operational overhead, cross-system consistency) of unnecessary specialists.

The specialists earn their place in specific situations:
Genuine sub-millisecond caching at scale: Redis, alongside the relational database.
Truly massive write throughput with fixed query patterns: Cassandra.
Deep multi-hop graph traversal as the central workload: a graph database.
Document data that never needs joins or transactions and benefits from effortless horizontal scaling: a document store.
Scale beyond one server that also needs strong consistency: NewSQL.
Each is justified by a concrete characteristic, measured rather than anticipated, that the relational default genuinely can't meet.
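The verdict is essentially a rule: start from PostgreSQL, and add (or, for NewSQL, swap in) a specialist only for a measured characteristic. A toy encoding of that rule follows, purely as a reading aid. The `MeasuredNeed` fields, the `choose_stores` helper, and the returned labels are hypothetical names chosen for this sketch, and it is no substitute for Chapter 37's fuller framework.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MeasuredNeed:
    """Hypothetical flags; each should be set only from measurement, never anticipation."""
    sub_ms_cache_at_scale: bool = False
    massive_writes_fixed_queries: bool = False
    deep_traversal_is_central: bool = False
    docs_without_joins_or_txns: bool = False
    beyond_one_server_with_consistency: bool = False


def choose_stores(need: MeasuredNeed) -> list[str]:
    """Return the relational default plus any specialists the measured need justifies."""
    # NewSQL replaces single-node PostgreSQL as the relational system; otherwise PostgreSQL is the default.
    stores = ["NewSQL" if need.beyond_one_server_with_consistency else "PostgreSQL"]
    if need.sub_ms_cache_at_scale:
        stores.append("key-value cache (e.g. Redis)")
    if need.massive_writes_fixed_queries:
        stores.append("column-family store (e.g. Cassandra)")
    if need.deep_traversal_is_central:
        stores.append("graph database")
    if need.docs_without_joins_or_txns:
        stores.append("document store")
    return stores


print(choose_stores(MeasuredNeed()))                             # ['PostgreSQL']
print(choose_stores(MeasuredNeed(sub_ms_cache_at_scale=True)))
```

With no flags set, the answer is just PostgreSQL, which is the chapter's verdict in one line; every addition has to be earned by a flag backed by measurement.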

The decision factors (data model, consistency requirements, query patterns, and scale) are exactly those of Chapter 37's database-decision framework, which this chapter's survey feeds into. The skill is to characterize your actual need along those dimensions and match it to the right tool: relational for the common case, a particular specialist for the particular case it fits, chosen deliberately and with the trade-offs understood. There are two failure modes to avoid. The first is over-reaching for NoSQL: choosing a specialist by fashion for data that's fundamentally relational (the Lumen disaster, and the dominant historical mistake). The second is under-using PostgreSQL: adding a system for a need PostgreSQL already covers (failing theme #4). Between them lies the mature path: relational by default, because it fits most needs and PostgreSQL extends that fit remarkably far; specialists by genuine, measured need, because some problems really do fit them better. That balanced judgment, neither relational dogmatism nor NoSQL fashion but tool matched to problem with relational as the well-justified default, is the verdict. The next four chapters of Part VI (warehousing, distributed systems, specialized databases, and the decision framework) refine it into a complete approach to choosing where data should live.

Progressive project: evaluate the alternatives

For your project (don't change it — analyze):

Identify any part that might fit a NoSQL family (a cache? variable documents? a deep relationship traversal? massive write volume?).
For each, ask: does PostgreSQL already cover it? (JSONB for documents, full-text for search, recursive CTEs for hierarchies, a table for key-value.)
Decide honestly whether a separate system is justified — and write down the data-model/consistency/scale/query reason (or why PostgreSQL suffices).
Identify one realistic scenario (perhaps hypothetical for your domain) where a NoSQL database would genuinely be the right choice, and which family.
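For the key-value check in the second step, a plain table often suffices. Below is a minimal sketch of a session-style key-value store with optional expiry, again using sqlite3 as a runnable stand-in; the `kv` table, the `put`/`get`/`purge_expired` helpers, and the TTL scheme are hypothetical. The upsert uses `INSERT ... ON CONFLICT ... DO UPDATE`, which both PostgreSQL (9.5 and later) and SQLite (3.24 and later) support. It is not meant to match Redis at high throughput, which the chapter still recommends for genuine caching at scale.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE kv (
        item_key   TEXT PRIMARY KEY,
        item_value TEXT NOT NULL,
        expires_at REAL            -- Unix seconds; NULL means the entry never expires
    )
    """
)


def put(key, value, ttl=None, now=None):
    """Insert or overwrite a key (an upsert), optionally expiring after ttl seconds."""
    now = time.time() if now is None else now
    expires_at = None if ttl is None else now + ttl
    with conn:
        conn.execute(
            """
            INSERT INTO kv (item_key, item_value, expires_at) VALUES (?, ?, ?)
            ON CONFLICT (item_key) DO UPDATE
            SET item_value = excluded.item_value, expires_at = excluded.expires_at
            """,
            (key, value, expires_at),
        )


def get(key, now=None):
    """Return the value for key, or None if it is missing or expired."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT item_value FROM kv WHERE item_key = ? AND (expires_at IS NULL OR expires_at > ?)",
        (key, now),
    ).fetchone()
    return None if row is None else row[0]


def purge_expired(now=None):
    """Delete expired entries (reads already ignore them); return how many were removed."""
    now = time.time() if now is None else now
    with conn:
        cur = conn.execute(
            "DELETE FROM kv WHERE expires_at IS NOT NULL AND expires_at <= ?", (now,)
        )
    return cur.rowcount


put("session:42", "user=ana", ttl=60, now=1000.0)
print(get("session:42", now=1030.0))  # user=ana
print(get("session:42", now=1061.0))  # None (expired at 1060)
put("session:42", "user=bob", now=1100.0)  # overwrite, this time with no expiry
print(get("session:42", now=1e12))    # user=bob
```

Passing `now` explicitly keeps the sketch deterministic; in real use you would omit it and let the helpers read the clock.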

Summary

NoSQL databases drop some relational guarantees in exchange for scale, flexibility, or specialized access. The four families are document (MongoDB: flexible JSON documents, weak joins and integrity), key-value (Redis: fast lookups by key, for caching and sessions), column-family (Cassandra: massive write throughput, with tables designed around queries), and graph (Neo4j: first-class relationships, deep traversal). To varying degrees, they trade ACID and strong consistency for BASE and eventual consistency, rich queries and joins for simple access patterns, and an enforced schema for application-side responsibility. Crucially (theme #4), PostgreSQL often covers these needs without a separate system (JSONB for documents, full-text search for search, CTEs and joins for moderate graphs), so reach for it first and avoid polyglot sprawl. But NoSQL genuinely wins for caching, extreme write scale, deep graph traversal, and scale beyond one server where eventual consistency is acceptable; the choice is decided by data model, consistency, query patterns, and scale (Chapter 37). The mistake to avoid is choosing NoSQL by fashion for data that's fundamentally relational.

You can now:
Name the four NoSQL families and what each is genuinely good at.
Explain the trade-offs NoSQL makes (consistency, queries, schema).
Judge when PostgreSQL's features cover a "NoSQL" need (theme #4).
Recognize when NoSQL genuinely wins, and which family fits.
Avoid choosing NoSQL for fundamentally relational data.

What's next. Chapter 34 — Data Warehousing — databases built for analytics rather than transactions: OLTP vs. OLAP, the star schema and dimensional modeling, column-oriented storage, and the modern analytical databases (Snowflake, BigQuery, DuckDB).

Practice in exercises.md, test yourself with the quiz, apply it in the case studies, review the key takeaways, and go deeper with further reading.