Case Study 2 — Choosing Badly: The Microservice-Database Zoo

The cautionary twin of Case Study 1. A team chose databases by hype and "best practice" slogans rather than by fit — ending with a sprawl of systems no one could fully operate, constant sync bugs, and a re-architecture. The framework would have prevented all of it.

Background

A mid-size startup rebuilt its product as microservices, and someone declared a principle: "each microservice owns its own database, and we pick the 'best' database for each service." Within a year the result was a zoo. The orders service used PostgreSQL, the catalog service used MongoDB ("products are JSON"), the search service used Elasticsearch, the recommendations service used a graph database, the activity feed used Cassandra, and Redis was added "for speed." That made six different database technologies, each chosen by whoever built the service, often on the strength of a conference talk or a "what's hot" article.

No one had run a decision framework. Each choice was locally plausible ("products are documents → MongoDB") and globally disastrous.

What went wrong

The framework's neglected factors came due:

  • Operational complexity exploded. Six database technologies meant six sets of backups, monitoring, tuning, upgrades, failure modes, and on-call expertise. No single engineer understood all of them. Each was operated poorly because attention was split six ways. (The "operational complexity / team expertise" factor, ignored.)
  • Data was scattered and had to be synced. Information about a product lived in MongoDB (catalog), Elasticsearch (search), and the graph DB (recommendations), and it was referenced from PostgreSQL (orders). Keeping these copies consistent required brittle sync pipelines that constantly drifted: a product would be updated in MongoDB but remain stale in Elasticsearch and the graph DB. No transaction could span the stores, so a change could land in one and fail in another, and data integrity across services was a perpetual fire.
  • Wrong fits hidden inside "best per service." The catalog's products were actually relational (they reference categories and suppliers, and are referenced by orders), and MongoDB's limited joins and lack of enforced referential integrity caused the Lumen problems (Chapter 1). The activity feed didn't have Cassandra-scale volume; it just inherited Cassandra's awkwardness for no benefit. Each "best" choice was made on the data's surface ("looks like JSON," "it's a feed") rather than its essential nature and actual scale.
  • Cost ballooned — licensing/managed-service fees for six systems, plus the enormous people cost of operating them, plus the velocity cost of engineers fighting infrastructure instead of building features.

The breaking point: a data-integrity incident in which a product's price was right in one store and wrong in three others, surfacing wrong prices to customers. With no authoritative copy, reconciling the competing sources of "truth" took the team days and shook confidence in all their data.
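
A minimal Python sketch of why this incident was so hard to untangle. The store names come from the case; the SKUs, the prices (in integer cents), and the function name find_price_drift are hypothetical, invented for illustration. The function groups each product's copies by store and reports the products whose copies disagree.

```python
from collections import defaultdict


def find_price_drift(copies):
    """Return {product_id: {store: price}} for products whose copies disagree.

    `copies` maps a store name to that store's {product_id: price_in_cents}.
    """
    by_product = defaultdict(dict)
    for store, prices in copies.items():
        for product_id, price in prices.items():
            by_product[product_id][store] = price

    # A product has drifted if its copies hold more than one distinct price.
    return {
        product_id: stores
        for product_id, stores in by_product.items()
        if len(set(stores.values())) > 1
    }


# Hypothetical snapshot: one store has the new price, three are stale.
copies = {
    "mongodb": {"sku-1": 1999, "sku-2": 500},
    "elasticsearch": {"sku-1": 1799, "sku-2": 500},
    "graph": {"sku-1": 1799, "sku-2": 500},
    "postgresql_orders": {"sku-1": 1799, "sku-2": 500},
}

drift = find_price_drift(copies)
for product_id, stores in drift.items():
    print(product_id, stores)
```

Note what the check cannot do: it shows that the copies disagree, but with no system of record it cannot say which copy is right. A majority vote would even pick the stale price here. That gap is why reconciliation was slow and why confidence in the data suffered.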

The fix: consolidate, then justify

The team ran the framework retroactively and consolidated:

  • PostgreSQL became the system of record for the relational core (orders, catalog, customers) — products went back into PostgreSQL (with JSONB for the genuinely variable attributes, Chapter 16), regaining joins, integrity, and transactions.
  • Search stayed on Elasticsearch, but only because the search workload had genuinely outgrown PostgreSQL full-text search (FTS). It was a justified addition, fed from PostgreSQL.
  • Recommendations: the graph features that needed deep traversal kept a graph database (justified, Chapter 33); the rest moved back to SQL.
  • Cassandra was dropped — the feed fit PostgreSQL fine at its actual scale.
  • Redis stayed as a cache (justified).

Six technologies became three core systems (PostgreSQL, Elasticsearch, and Redis) plus a small graph store for one feature, each justified by a concrete need, with PostgreSQL as the authoritative source of truth. Operational burden plummeted, sync bugs largely vanished (there was less to sync), and the team could finally reason about their data.
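
What the team regained by putting products back into a relational system of record can be sketched in a few lines. This example uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs anywhere. The table names, columns, and sample rows are hypothetical, and the attributes column holds JSON text where PostgreSQL would use a JSONB column. It is an illustration of the idea, not the team's actual schema.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
    CREATE TABLE categories (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE products (
        id          INTEGER PRIMARY KEY,
        category_id INTEGER NOT NULL REFERENCES categories(id),
        name        TEXT NOT NULL,
        price_cents INTEGER NOT NULL CHECK (price_cents >= 0),
        attributes  TEXT NOT NULL DEFAULT '{}'  -- JSON text; JSONB in PostgreSQL
    );
    CREATE TABLE order_items (
        id         INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES products(id),
        quantity   INTEGER NOT NULL CHECK (quantity > 0)
    );
""")

# One transaction: the category and the product are committed together.
with conn:
    conn.execute("INSERT INTO categories (id, name) VALUES (1, 'lamps')")
    conn.execute(
        "INSERT INTO products (id, category_id, name, price_cents, attributes) "
        "VALUES (?, ?, ?, ?, ?)",
        (1, 1, "desk lamp", 1999, json.dumps({"color": "black", "watts": 40})),
    )

# Integrity: an order line for a product that does not exist is rejected,
# and the enclosing transaction is rolled back.
fk_rejected = False
try:
    with conn:
        conn.execute(
            "INSERT INTO order_items (product_id, quantity) VALUES (?, ?)",
            (999, 1),
        )
except sqlite3.IntegrityError:
    fk_rejected = True

# Join: fixed fields come from columns, variable ones from the JSON document.
row = conn.execute("""
    SELECT p.name, c.name, p.price_cents, p.attributes
    FROM products AS p
    JOIN categories AS c ON c.id = p.category_id
    WHERE p.id = 1
""").fetchone()
attrs = json.loads(row[3])
print(row[0], row[1], row[2], attrs, "FK rejected:", fk_rejected)
```

The design choice is the one the case describes: fields that other tables depend on (price, category) are ordinary typed columns with constraints, and only the genuinely variable attributes live in the JSON document.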

The analysis

  1. "Best database per service" is a trap without the framework. Choosing locally per service, by surface appearance or hype, ignores the global costs (operational complexity, sync, integrity across systems) — the factors that dominate at the system level. Each choice was plausible; the sum was a disaster.

  2. Operational complexity and team expertise are first-class factors. Six databases no one can fully operate is worse than three understood deeply. Every system added multiplies backups, monitoring, failure modes, and required expertise — a cost that's invisible at choice time and crushing later.

  3. Polyglot sprawl vs. justified polyglot. Case Study 1 added stores deliberately, each justified, with a system of record; this team accumulated them by fashion, with no authoritative source. Same "multiple databases," opposite discipline and outcome (Chapter 33).

  4. Match the data's essential nature, not its surface. "Products are JSON" → MongoDB ignored that products are deeply relational. "It's a feed" → Cassandra ignored the actual (modest) scale. The framework's "data model" and "honest scale" factors would have caught both.

  5. Consolidation is a valid, powerful move. Fewer, well-understood databases — with PostgreSQL as the default system of record and specialized stores added only on demonstrated need — is usually the right architecture. When in doubt, consolidate.
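
Points 2 and 5 can be made concrete with some deliberately crude counting. The sketch below is not a cost model: the list of operational concerns is taken from the "What went wrong" section, while the function names and the idea of counting "system times concern" pairs are assumptions made for illustration. It also compares how many pairs of copies can disagree when no store is authoritative with how many one-way feeds are needed when one store is the system of record.

```python
from math import comb

# Operational concerns named in the case; each applies to every system run.
OPS_CONCERNS = [
    "backups",
    "monitoring",
    "tuning",
    "upgrades",
    "failure modes",
    "on-call expertise",
]


def ops_surface(num_systems):
    """Number of (system, concern) pairs someone must stay on top of."""
    return num_systems * len(OPS_CONCERNS)


def pairwise_checks(num_copies):
    """Pairs of copies that can disagree when no copy is authoritative."""
    return comb(num_copies, 2)


def feeds_from_source(num_copies):
    """One-way feeds needed when one copy is the system of record."""
    return max(num_copies - 1, 0)


before = ops_surface(6)  # the original zoo
after = ops_surface(4)   # the four stores that remained after consolidation

print("ops surface:", before, "->", after)
print("4 copies, no source of truth:", pairwise_checks(4), "pairs to reconcile")
print("4 copies, one system of record:", feeds_from_source(4), "feeds to maintain")
```

Under these assumptions the operational surface drops from 36 to 24 pairs, and four copies of a record go from 6 possible pairwise disagreements to 3 one-way feeds. The real saving in the case was larger than the raw count suggests, because the remaining systems were each justified and better understood.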

Discussion questions

  1. Which framework factors did "best database per service" ignore, and what was the cost of each?
  2. Why did scattering product data across stores cause the integrity incident?
  3. Contrast this team's polyglot persistence with Case Study 1's. What's the difference in discipline?
  4. Why was moving products back into PostgreSQL (with JSONB) the right call?
  5. ⭐ Write a one-paragraph database-decision policy for a microservices org that avoids both extremes (one giant shared DB vs. a zoo).