# Chapter 33: Key Takeaways

## Why Scaling Matters
- Prediction market traffic is extremely bursty: peak-to-normal ratios of 50:1 to 500:1 are common. Election nights, breaking news, and surprise events produce the most valuable information and the most traffic simultaneously.
- A platform that buckles under peak load fails at the exact moment it matters most. Scaling is not a luxury; it is a core requirement.
- Service Level Objectives (SLOs) drive architecture: order latency p50 < 50 ms, p99 < 200 ms, availability 99.95%, and exactly-once trade semantics.

## Event Sourcing
- Event sourcing stores all changes as an immutable, append-only sequence of events. Current state is derived by replaying events: $S(t) = \text{fold}(f, S_0, [e_1, \ldots, e_n])$.
- Benefits specific to prediction markets: regulatory audit trails, dispute resolution, temporal queries ("What was the price at 9:47 PM?"), replay-based debugging, and event-driven architecture.
- Key event types: `MarketCreated`, `OrderPlaced`, `OrderMatched`, `OrderCancelled`, `MarketResolved`, `FundsDeposited`, `PayoutIssued`.
- Snapshots prevent expensive full replays: capture state periodically, then replay only the events recorded after the snapshot (a minimal sketch follows this list).
- Optimistic concurrency control prevents conflicting updates: each event carries a version number checked against the aggregate's current version.
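
To make the fold, snapshot, and versioning bullets concrete, here is a minimal Python sketch. The `OrderPlaced`/`MarketState` shapes and the in-memory list standing in for the event log are assumptions for illustration, not the book's actual code; a real store persists events durably.

```python
from dataclasses import dataclass, field

# Hypothetical event and state shapes for illustration; real aggregates carry more fields.
@dataclass
class OrderPlaced:
    order_id: str
    price: float
    quantity: int
    version: int  # the aggregate version the writer expects to see

@dataclass
class MarketState:
    version: int = 0
    open_orders: dict = field(default_factory=dict)

def apply(state: MarketState, event) -> MarketState:
    """One fold step: f(S, e) -> S'."""
    if isinstance(event, OrderPlaced):
        state.open_orders[event.order_id] = (event.price, event.quantity)
    state.version += 1
    return state

def append_event(log: list, state: MarketState, event) -> MarketState:
    """Write path: optimistic concurrency check, then append to the immutable log."""
    if event.version != state.version:
        raise RuntimeError("version conflict: aggregate changed concurrently")
    log.append(event)
    return apply(state, event)

def replay(events, snapshot: MarketState | None = None) -> MarketState:
    """S(t) = fold(f, S0, [e1, ..., en]); start from the latest snapshot when one exists."""
    state = snapshot or MarketState()
    for event in events:
        state = apply(state, event)
    return state
```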

## CQRS (Command Query Responsibility Segregation)
- Separate the write model (commands: place order, create market) from the read model (queries: display prices, show portfolio).
- The write path optimizes for consistency and correctness. The read path optimizes for throughput and flexible querying.
- Read models are projections of the event stream, denormalized for specific access patterns: order book view, market summary view, portfolio view, leaderboard view.
- Eventual consistency between write and read models is acceptable. Staleness bound: $\text{staleness}_{\max} = \delta + \pi < 10\text{ ms}$ in practice.
- CQRS enables independent scaling: a single, powerful write server paired with many read replicas.
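
A minimal sketch of a read-model projection, assuming events arrive as plain dicts with a `type` field; the class and field names are illustrative, and a real projection would consume from a queue or event store and write to its own datastore.

```python
class MarketSummaryProjection:
    """Denormalized read model backing the 'market summary' query path."""

    def __init__(self):
        self.summaries: dict[str, dict] = {}  # market_id -> summary row

    def handle(self, event: dict) -> None:
        """Apply one event from the stream to the read model."""
        if event["type"] == "MarketCreated":
            self.summaries[event["market_id"]] = {
                "question": event["question"],
                "last_price": None,
                "volume": 0,
            }
        elif event["type"] == "OrderMatched":
            row = self.summaries[event["market_id"]]
            row["last_price"] = event["price"]
            row["volume"] += event["quantity"]

    def get(self, market_id: str) -> dict | None:
        """Query side: a cheap lookup with no joins, eventually consistent."""
        return self.summaries.get(market_id)
```

Because the projection only ever reads the event stream, it can be rebuilt from scratch or scaled out independently of the write path.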

## Database Optimization
- Schema design must reflect access patterns: current prices (very frequent), open orders (matching engine), trader positions (portfolio), recent trades (charts).
- Partial indexes are critical: `CREATE INDEX ... WHERE status = 'open'` keeps the matching engine's index small as the orders table grows.
- Connection pooling (PgBouncer): pool size = $C \times (Q_{\text{avg}} + L_{\text{avg}}) / 1000$ (worked example after this list). Transaction pooling mode allows thousands of app connections to share dozens of database connections.
- Read replicas distribute read-heavy workloads. Route read-only queries (market listings, portfolios) to replicas; reserve the primary for writes.
- Table partitioning by time for trades: queries with time range predicates automatically target only relevant partitions.
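
A worked example of the pool-sizing formula. The reading assumed here, that $C$ is database requests per second and $Q_{\text{avg}}$, $L_{\text{avg}}$ are milliseconds, follows Little's law; the numbers are illustrative.

```python
import math

def pool_size(requests_per_second: float, avg_query_ms: float, avg_latency_ms: float) -> int:
    """pool = C x (Q_avg + L_avg) / 1000: arrival rate times time spent per query."""
    return math.ceil(requests_per_second * (avg_query_ms + avg_latency_ms) / 1000)

# Assumed workload: 2,000 queries/s, 4 ms execution, 1 ms round trip
# -> roughly 10 connections busy at any instant; size the PgBouncer pool slightly above that.
print(pool_size(2000, 4, 1))  # 10
```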

## Caching Strategies
- Cache value formula: $V_{\text{cache}} = f_{\text{access}} \times c_{\text{compute}} / r_{\text{change}}$. High-value targets: market list, leaderboard, user portfolios.
- Multi-level caching: L1 (in-process, nanoseconds), L2 (Redis, microseconds), L3 (database, milliseconds).
- Cache invalidation: hybrid strategy combining TTL-based expiration (safety net), event-driven invalidation (freshness), and write-through for critical data.
- Market prices use short TTLs (2 seconds) with event-driven invalidation. Leaderboards use long TTLs (5 minutes).
- Cache stampede prevention: use locks to ensure only one request recomputes an expired value while others wait.
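
A sketch of lock-based stampede prevention, assuming the redis-py client and a running Redis instance; the key naming, 5-second lock TTL, and 50 ms wait are illustrative choices.

```python
import json
import time

import redis  # assumes the redis-py client

r = redis.Redis()

def get_with_stampede_guard(key: str, ttl_s: int, recompute, lock_ttl_s: int = 5):
    """Return a cached value; on a miss, only one caller recomputes while others wait."""
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)

    # SET NX doubles as a lock: the first caller past the expired key wins it.
    if r.set(f"lock:{key}", "1", nx=True, ex=lock_ttl_s):
        try:
            value = recompute()
            r.set(key, json.dumps(value), ex=ttl_s)
            return value
        finally:
            r.delete(f"lock:{key}")

    # Losers back off briefly and re-read instead of hammering the database.
    time.sleep(0.05)
    cached = r.get(key)
    return json.loads(cached) if cached is not None else recompute()
```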

## Message Queues and Async Processing
- Inter-service communication via message queues decouples producers from consumers and enables independent scaling.
- Priority queues: order matching (critical), settlement (high), notifications (normal), analytics (low).
- Dead letter queues: messages that fail after max retries are captured for manual inspection, not silently dropped.
- Retry strategy: exponential backoff with jitter: $t_{\text{retry}}(n) = t_{\text{base}} \cdot 2^{n-1} + \text{jitter}$.
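
A small sketch of the retry policy; the base delay, cap, and attempt count are assumed parameters, and the jitter term spreads retries so failing clients do not resynchronize.

```python
import random
import time

def retry_delay(attempt: int, base_s: float = 0.1, cap_s: float = 30.0) -> float:
    """t_retry(n) = t_base * 2^(n-1) + jitter, capped so delays stay bounded."""
    backoff = min(cap_s, base_s * 2 ** (attempt - 1))
    return backoff + random.uniform(0, backoff)  # jitter de-synchronizes retrying clients

def with_retries(fn, max_attempts: int = 5):
    """Run fn, retrying on failure; messages that exhaust retries go to the dead letter queue."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # caller routes the message to the DLQ instead of dropping it
            time.sleep(retry_delay(attempt))
```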

## Horizontal Scaling
- Stateless API servers: sessions in Redis, no local state. Scale by adding instances behind a load balancer.
- Matching engine sharding: distribute markets across shards using consistent hashing. Each shard processes its markets independently.
- WebSocket separation: dedicated WebSocket servers subscribe to Redis Pub/Sub for price updates. IP-hash load balancing for session affinity.
- Auto-scaling policies: scale up on leading indicators (CPU > 70%, p99 latency > 150 ms, queue depth > 1000). Scale down conservatively with longer evaluation periods.
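
A compact consistent-hash ring sketch for routing markets to matching-engine shards. The shard names, MD5 hash, and virtual-node count are illustrative assumptions, not the book's implementation.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Map market IDs to matching-engine shards with minimal movement when shards change."""

    def __init__(self, shards: list[str], vnodes: int = 100):
        self.ring: list[tuple[int, str]] = []
        for shard in shards:
            for i in range(vnodes):  # virtual nodes smooth out the load distribution
                self.ring.append((self._hash(f"{shard}:{i}"), shard))
        self.ring.sort()
        self._keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def shard_for(self, market_id: str) -> str:
        idx = bisect.bisect(self._keys, self._hash(market_id)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["engine-1", "engine-2", "engine-3"])
print(ring.shard_for("us-election-2028"))  # a given market always routes to the same shard
```

Adding or removing a shard remaps only the keys adjacent to it on the ring, so most markets stay on their current matching engine.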

## Monitoring and Alerting
- The four golden signals: latency, traffic, errors, saturation. Every service must emit all four.
- Prediction-market-specific metrics: matching engine queue depth, bid-ask spread, order-to-trade latency, WebSocket connection count.
- Prometheus + Grafana is the standard stack: structured metrics, flexible dashboards, alerting rules.
- Anomaly detection: rolling z-score on order volume flags unusual activity (potential manipulation or system issues).
- Alert on symptoms (user-facing impact), not causes. Page on error rate, not CPU.
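
A minimal rolling z-score detector for order volume; the window size and threshold are assumed parameters to be tuned per market.

```python
from collections import deque
from statistics import mean, stdev

class RollingZScore:
    """Flag order-volume samples whose z-score against a rolling window exceeds a threshold."""

    def __init__(self, window: int = 60, threshold: float = 3.0):
        self.samples: deque[float] = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, volume: float) -> bool:
        """Return True when the new sample looks anomalous versus the rolling window."""
        anomalous = False
        if len(self.samples) >= 2:
            mu, sigma = mean(self.samples), stdev(self.samples)
            if sigma > 0 and abs(volume - mu) / sigma > self.threshold:
                anomalous = True
        self.samples.append(volume)
        return anomalous
```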

## Security
- Rate limiting with token bucket algorithm: different tiers for different endpoints (order submission vs. price reads).
- DDoS mitigation: CDN-level rate limiting, challenge pages for suspicious traffic, circuit breakers.
- Input validation: all prices between 0 and 1, quantities positive, idempotency keys on all mutations.
- TLS everywhere: all inter-service communication encrypted in transit.
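
A sketch of the token bucket algorithm with per-endpoint tiers; the specific rates and burst sizes shown are assumptions for illustration.

```python
import time

class TokenBucket:
    """Per-client token bucket; order submission gets a smaller bucket than price reads."""

    def __init__(self, rate_per_s: float, capacity: float):
        self.rate = rate_per_s    # refill rate (tokens per second)
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        """Refill based on elapsed time, then spend tokens if enough are available."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# Hypothetical tiers: 10 orders/s with bursts of 20, vs. 100 price reads/s with bursts of 200.
order_limiter = TokenBucket(rate_per_s=10, capacity=20)
read_limiter = TokenBucket(rate_per_s=100, capacity=200)
```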

## Disaster Recovery
- Recovery Time Objective (RTO): maximum acceptable downtime. Target: < 5 minutes.
- Recovery Point Objective (RPO): maximum acceptable data loss. Target: 0 (no data loss via continuous WAL archiving).
- Regular failover drills: test that automated failover works before you need it in production.
- Event sourcing naturally supports disaster recovery: the event log is the recovery mechanism.

## Key Formulas
| Formula | Description |
|---|---|
| $R = T_{\text{peak}} / T_{\text{normal}}$ | Peak-to-normal traffic ratio |
| $S(t) = \text{fold}(f, S_0, [e_1, \ldots, e_n])$ | Event sourcing state reconstruction |
| $\text{staleness}_{\max} = \delta + \pi$ | CQRS consistency bound |
| $V_{\text{cache}} = f \times c / r$ | Cache value formula |
| $t_{\text{retry}} = t_{\text{base}} \cdot 2^{n-1}$ | Exponential backoff retry delay |
| $\text{pool} = C \times (Q + L) / 1000$ | Connection pool sizing |