Chapter 35 — Further Reading

The essential book (everyone)

  • Martin Kleppmann, Designing Data-Intensive Applications — Parts II–III (replication, partitioning/sharding, consistency, consensus, distributed transactions) are the modern treatment. If you read only one resource, make it this one. It makes CAP, eventual consistency, and consensus genuinely clear.

CAP & consistency (🔬 CS Student)

  • Eric Brewer's CAP conjecture (the original statement) and his follow-up "CAP Twelve Years Later" — the follow-up is the nuanced, accurate version (the trade-off bites only during partitions).
  • PACELC — the extension covering latency-vs-consistency even without a partition.
  • Explainers on consistency models (linearizability, causal, eventual) — the spectrum beyond "strong vs eventual."

Replication & PostgreSQL (🏗️ DBA)

  • PostgreSQL Docs: "High Availability, Load Balancing, and Replication." Streaming replication, synchronous replication, failover, logical replication.
  • Patroni / repmgr — tools for PostgreSQL HA/failover (if you run your own servers rather than a managed service).
  • Managed services: Amazon RDS/Aurora, Google Cloud SQL/AlloyDB, Azure Database for PostgreSQL docs — replicas and failover without the ops.

Sharding & NewSQL (🏗️ DBA · 💻 Developer)

  • Citus (distributed PostgreSQL) docs — sharding PostgreSQL while keeping SQL.
  • CockroachDB / YugabyteDB / Google Spanner docs and papers — distributed SQL with ACID; the Spanner paper and the Raft and Paxos consensus papers are worth reading for how it works.
  • "When to shard" essays — the premature-sharding warning (Case Study 2).

Reference (this book)

  • Chapter 25 — Partitioning: sharding is partitioning across machines.
  • Chapter 26 — Transactions: distributed transactions extend these ideas.
  • Chapter 28 — Internals (WAL): the basis of streaming replication.
  • Chapter 37 — The Database Decision: where distribution fits the overall choice.

Do, don't just read

  • Set up a read replica (locally or on a managed service) and route reads to it; observe replication lag.
  • Reason through CAP for three datasets in your domain — which need CP, which can be AP?
  • Resist the urge to shard — estimate how far one server + replicas would carry your project before you'd genuinely need more.
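
The last exercise can be started as back-of-envelope arithmetic. The sketch below is one possible way to frame it, not a sizing method: it assumes load grows at a fixed compound monthly rate, that every write goes through the single primary, and that reads spread evenly over the primary and its replicas. The function names and all figures are hypothetical; substitute measurements from your own system.

```python
import math


def months_until_limit(current: float, capacity: float, monthly_growth: float) -> float:
    """Months of compound growth before `current` load reaches `capacity`."""
    if current >= capacity:
        return 0.0  # already at or past the limit
    if current <= 0 or monthly_growth <= 0:
        return math.inf  # no load, or flat/shrinking load, never reaches the limit
    # Solve current * (1 + g)^m = capacity for m.
    return math.log(capacity / current) / math.log(1 + monthly_growth)


def runway(write_load, read_load, node_write_cap, node_read_cap, replicas, monthly_growth):
    """Estimate which limit a primary-plus-replicas setup hits first."""
    # Every write goes through the single primary, so replicas add no write capacity.
    write_months = months_until_limit(write_load, node_write_cap, monthly_growth)
    # Simplifying assumption: reads spread evenly over the primary and all replicas.
    read_capacity = node_read_cap * (1 + replicas)
    read_months = months_until_limit(read_load, read_capacity, monthly_growth)
    bottleneck = "writes" if write_months <= read_months else "reads"
    return {
        "write_months": write_months,
        "read_months": read_months,
        "bottleneck": bottleneck,
    }


if __name__ == "__main__":
    # Hypothetical figures (operations per second); replace with your own measurements.
    estimate = runway(
        write_load=200, read_load=3_000,
        node_write_cap=5_000, node_read_cap=20_000,
        replicas=2, monthly_growth=0.05,
    )
    print(estimate)
```

If reads are the bottleneck, adding a replica extends the runway. If writes are, replicas do not help, and that is the point at which vertical scaling, partitioning, or sharding enters the conversation.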

Next: Chapter 36 — Time-Series, Vector, and Specialized Databases: the right tool for unusual data.