Chapter 35 — Further Reading
The essential book (everyone)
- Martin Kleppmann, Designing Data-Intensive Applications: the chapters on distributed data (replication, partitioning/sharding, transactions, consistency, consensus, and distributed transactions) are the modern treatment. If you read only one resource, make it this one. It makes CAP, eventual consistency, and consensus genuinely clear.
CAP & consistency (🔬 CS Student)
- Eric Brewer's original CAP conjecture and his follow-up "CAP Twelve Years Later": the nuanced, accurate version, in which the trade-off bites only during partitions.
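- PACELC: extends CAP with the latency-versus-consistency trade-off that applies even when there is no partition (if Partitioned, choose Availability or Consistency; Else, choose Latency or Consistency).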
- Explainers on consistency models (linearizability, causal, eventual): the spectrum beyond "strong vs. eventual."
Replication & PostgreSQL (🏗️ DBA)
- PostgreSQL docs: the "High Availability, Load Balancing, and Replication" chapter (streaming replication, synchronous replication, failover, hot standby) and the separate "Logical Replication" chapter.
- Patroni / repmgr — tools for PostgreSQL HA/failover (if self-managing).
- Managed-service docs (Amazon RDS/Aurora, Google Cloud SQL/AlloyDB, Azure Database for PostgreSQL): replicas and failover without running the operations yourself.
Sharding & NewSQL (🏗️ DBA · 💻 Developer)
- Citus (distributed PostgreSQL) docs — sharding PostgreSQL while keeping SQL.
- CockroachDB / YugabyteDB / Google Spanner docs and papers: distributed SQL with ACID. The Spanner paper and the Raft and Paxos consensus papers are worth reading to understand how it works under the hood.
- "When to shard" essays: they reinforce the premature-sharding warning from Case Study 2.
Reference (this book)
- Chapter 25 — Partitioning: sharding is partitioning across machines.
- Chapter 26 — Transactions: distributed transactions extend these ideas.
- Chapter 28 — Internals (WAL): the basis of streaming replication.
- Chapter 37 — The Database Decision: where distribution fits the overall choice.
Do, don't just read
- Set up a read replica (locally or on a managed service) and route reads to it; observe replication lag.
- Reason through CAP for three datasets in your domain — which need CP, which can be AP?
- Resist the urge to shard — estimate how far one server + replicas would carry your project before you'd genuinely need more.
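To make the first exercise concrete, here is a minimal Python sketch of application-side read/write routing with a staleness check. Everything in it is hypothetical: the `ReadWriteRouter` class, its `max_lag_seconds` budget, and the keyword-based write detection are illustrative assumptions, not any library's API. The only real piece is the PostgreSQL lag query named in the comment, which you would run on the replica; be aware that it overstates lag when the primary is idle, because there are no new transactions to replay.

```python
from dataclasses import dataclass

# Leading keywords that mark a statement as a write, which must go to the primary.
# Assumption: this is a crude heuristic. It misses cases such as SELECT ... FOR UPDATE
# or a WITH ... INSERT CTE; real setups usually route per transaction, not per statement.
WRITE_KEYWORDS = {"INSERT", "UPDATE", "DELETE", "MERGE", "CREATE", "ALTER", "DROP", "TRUNCATE"}


@dataclass
class ReadWriteRouter:
    """Hypothetical router: writes go to the primary, reads go to a replica,
    and reads fall back to the primary when the replica lags too far behind."""
    max_lag_seconds: float = 1.0      # assumed staleness budget for reads
    replica_lag_seconds: float = 0.0  # last observed lag, fed in by a monitor

    def observe_lag(self, lag_seconds: float) -> None:
        # On a PostgreSQL replica, a monitor could measure this with:
        #   SELECT now() - pg_last_xact_replay_timestamp();
        self.replica_lag_seconds = lag_seconds

    def route(self, sql: str) -> str:
        # Take the first word of the statement, case-insensitively.
        words = sql.strip().split(None, 1)
        keyword = words[0].upper() if words else ""
        if keyword in WRITE_KEYWORDS:
            return "primary"
        if self.replica_lag_seconds > self.max_lag_seconds:
            # Replica is too stale for this read; pay the primary's load for freshness.
            return "primary"
        return "replica"


if __name__ == "__main__":
    router = ReadWriteRouter(max_lag_seconds=0.5)
    print(router.route("SELECT * FROM orders"))           # replica
    print(router.route("insert into orders VALUES (1)"))  # primary
    router.observe_lag(2.0)
    print(router.route("SELECT * FROM orders"))           # primary (replica too stale)
```

Falling back to the primary when lag exceeds the budget is one design choice among several (others include waiting for the replica to catch up, or pinning a user's reads to the primary briefly after their own writes); it is the same latency-versus-consistency trade-off PACELC describes, made per request.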
Next: Chapter 36 — Time-Series, Vector, and Specialized Databases: the right tool for unusual data.