Chapter 42 Exercises: Extending the Capstone Trading System

These exercises guide you through extending, hardening, and improving the trading system built in this chapter. They are organized into five parts of increasing complexity, starting with component-level improvements and ending with full system extensions.


Part A: Data Pipeline Extensions (Exercises 1-6)

Exercise 1: Adding a New Platform Client

Write a complete API client class for Kalshi (or another prediction market platform of your choice). Your client must inherit from BasePlatformClient, implement all three abstract methods (fetch_markets, fetch_market, get_order_book), and normalize data into MarketSnapshot objects. Include proper error handling for rate limiting (HTTP 429 responses), using exponential backoff with jitter.
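
A minimal sketch of the retry logic, assuming an httpx-based async client; the function name and parameters are illustrative rather than part of the chapter's code:

    import asyncio
    import random

    import httpx

    async def fetch_with_backoff(client: httpx.AsyncClient, url: str,
                                 max_retries: int = 5) -> httpx.Response:
        """Retry on HTTP 429 using exponential backoff with jitter."""
        for attempt in range(max_retries):
            response = await client.get(url)
            if response.status_code != 429:
                response.raise_for_status()
                return response
            # Base delay doubles each attempt; jitter avoids synchronized retries.
            delay = (2 ** attempt) + random.uniform(0, 1)
            await asyncio.sleep(delay)
        raise RuntimeError(f"still rate limited after {max_retries} retries: {url}")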

Exercise 2: WebSocket Streaming

Replace the polling-based data collection with a WebSocket client that receives real-time price updates. Implement the following:

  • A WebSocketFeed class that connects to a platform's WebSocket endpoint
  • Automatic reconnection with exponential backoff on disconnect
  • A callback mechanism so the DataAggregator receives updates as they arrive
  • A heartbeat checker that detects stale connections

Compare the latency of your WebSocket implementation versus polling every 60 seconds. How much faster do you receive price updates?
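
One possible shape for the reconnect loop, using the third-party websockets package; the endpoint URL and the callback wiring into the DataAggregator are left as placeholders:

    import asyncio
    import json
    import random
    from typing import Callable

    import websockets  # third-party: pip install websockets

    async def run_feed(url: str, on_update: Callable[[dict], None]) -> None:
        """Connect, forward each message to a callback, reconnect with backoff."""
        attempt = 0
        while True:
            try:
                async with websockets.connect(url) as ws:
                    attempt = 0  # reset the backoff after a successful connection
                    async for raw in ws:
                        on_update(json.loads(raw))
            except (websockets.ConnectionClosed, OSError):
                delay = min(60, 2 ** attempt) + random.uniform(0, 1)
                attempt += 1
                await asyncio.sleep(delay)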

Exercise 3: Data Quality Validation

Add a DataValidator class that checks every incoming MarketSnapshot for:

  • Price sanity: yes_price and no_price are both in [0, 1]
  • Spread sanity: spread is non-negative and below a configurable threshold (e.g., 0.20)
  • Staleness: the snapshot timestamp is within the last N minutes
  • Completeness: no required fields are None or NaN
  • Consistency: yes_price + no_price is approximately 1.0 (within a tolerance for AMM spreads)

Write at least five unit tests verifying your validator correctly accepts valid data and rejects invalid data.
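
A skeleton for the first two checks, assuming MarketSnapshot exposes yes_price, no_price, and spread attributes; the remaining checks follow the same pattern:

    import math
    from dataclasses import dataclass

    @dataclass
    class ValidationResult:
        ok: bool
        reason: str = ""

    class DataValidator:
        def __init__(self, max_spread: float = 0.20):
            self.max_spread = max_spread

        def validate(self, snapshot) -> ValidationResult:
            # Price sanity: both prices present and inside [0, 1].
            for price in (snapshot.yes_price, snapshot.no_price):
                if price is None or math.isnan(price) or not 0.0 <= price <= 1.0:
                    return ValidationResult(False, f"price out of range: {price}")
            # Spread sanity: non-negative and below the configured threshold.
            if snapshot.spread < 0 or snapshot.spread > self.max_spread:
                return ValidationResult(False, f"bad spread: {snapshot.spread}")
            return ValidationResult(True)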

Exercise 4: Historical Data Backfill

Write a script that backfills historical price data from Polymarket's API into the DataStore. Your script should:

  • Accept a date range as command-line arguments
  • Handle API pagination correctly
  • Implement rate limiting to avoid being blocked
  • Resume from the last successfully stored timestamp if interrupted
  • Report progress and estimated time remaining

Exercise 5: Alternative Data Integration

Extend the SupplementaryDataFetcher to include at least two additional data sources:

  1. Polling data: For political markets, fetch aggregated polling data from a public source (e.g., FiveThirtyEight, RealClearPolitics, or a polling API). Parse the data and compute features such as polling average, trend direction, and poll-to-market divergence.

  2. Social media sentiment: Fetch recent posts related to a market's topic from a social platform's API. Compute a sentiment score using a simple keyword-based approach or a pre-trained sentiment model.

Write tests to verify that your fetchers handle API errors gracefully and return sensible defaults when data is unavailable.

Exercise 6: Data Pipeline Monitoring

Add instrumentation to the data pipeline that tracks:

  • Number of markets fetched per platform per cycle
  • API response latency percentiles (p50, p95, p99)
  • Error rate by platform
  • Data freshness (time since last successful fetch per platform)

Create a PipelineHealthCheck class that returns a status of HEALTHY, DEGRADED, or UNHEALTHY based on these metrics. Define reasonable thresholds for each status level.
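
The shape of the class might look like the sketch below; the thresholds are placeholder values, not recommendations:

    from enum import Enum

    class Status(Enum):
        HEALTHY = "healthy"
        DEGRADED = "degraded"
        UNHEALTHY = "unhealthy"

    class PipelineHealthCheck:
        def check(self, error_rate: float, staleness_seconds: float) -> Status:
            # Unhealthy: data is very stale or most requests are failing.
            if staleness_seconds > 600 or error_rate > 0.50:
                return Status.UNHEALTHY
            # Degraded: data slightly stale or an elevated error rate.
            if staleness_seconds > 180 or error_rate > 0.10:
                return Status.DEGRADED
            return Status.HEALTHY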


Part B: Model Improvements (Exercises 7-12)

Exercise 7: Feature Importance Analysis

After training the ensemble model on historical data, perform a feature importance analysis:

  1. Use XGBoost's built-in feature importance (gain, weight, and cover)
  2. Compute permutation importance by shuffling each feature and measuring the increase in Brier score
  3. Calculate SHAP values for a sample of predictions

Create a visualization (using matplotlib) that compares the three methods. Which features are most important? Do the methods agree? Discuss any discrepancies.

Exercise 8: Calibration Improvement

The ensemble model uses isotonic calibration for the logistic regression component. Extend this by:

  1. Applying Platt scaling (sigmoid calibration) as an alternative and comparing the calibration curves
  2. Implementing a post-hoc recalibration step for the full ensemble output using a held-out calibration set
  3. Building a calibration monitoring system that alerts when the model's expected calibration error (ECE) exceeds a threshold on recent predictions

Plot reliability diagrams before and after each calibration method. Which approach produces the lowest ECE?
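
For the monitoring step, a small helper using one common binned definition of ECE may be useful; the bin count is arbitrary:

    import numpy as np

    def expected_calibration_error(y_true: np.ndarray, y_prob: np.ndarray,
                                   n_bins: int = 10) -> float:
        """Weighted average of |observed frequency - mean prediction| per bin."""
        y_prob = np.clip(y_prob, 0.0, 1.0 - 1e-9)  # keep p = 1.0 inside the last bin
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        ece = 0.0
        for lo, hi in zip(edges[:-1], edges[1:]):
            mask = (y_prob >= lo) & (y_prob < hi)
            if not mask.any():
                continue
            observed = y_true[mask].mean()     # empirical outcome frequency in the bin
            predicted = y_prob[mask].mean()    # average predicted probability in the bin
            ece += mask.mean() * abs(observed - predicted)
        return float(ece)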

Exercise 9: Online Learning

Modify the EnsembleModel to support incremental learning:

  1. Replace the logistic regression with an SGDClassifier (with log loss), which can be updated incrementally on new data via partial_fit
  2. Implement a windowed retraining approach where the model is updated daily using the last 90 days of data
  3. Compare the performance of the incrementally updated model versus the periodically retrained model

Track the Brier score over time for both approaches. Does online learning improve or degrade performance?

Exercise 10: Category-Specific Models

Instead of one model for all markets, train separate models for different market categories (e.g., politics, sports, crypto, science). Implement:

  1. A ModelRegistry class that maps categories to trained models
  2. Automatic model selection based on the market's category
  3. A fallback to the general model when a category-specific model has insufficient training data

Compare the category-specific models' Brier scores against the general model. For which categories does specialization help?

Exercise 11: Ensemble Weight Optimization

The capstone system uses fixed weights of 0.4 (logistic regression) and 0.6 (XGBoost). Implement a weight optimization procedure:

  1. On a validation set, search over the logistic regression weight w in [0, 1] with step size 0.05 (the XGBoost weight is 1 - w) to find the pair that minimizes the Brier score
  2. Implement a more sophisticated stacking approach where a meta-learner (another logistic regression) learns the optimal combination
  3. Add a third model (e.g., a random forest or neural network) and optimize the three-model ensemble weights

Report the optimal weights and the improvement in Brier score over the fixed-weight baseline.
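
The grid search in step 1 is only a few lines; this sketch assumes you already have validation-set predictions from both component models:

    import numpy as np
    from sklearn.metrics import brier_score_loss

    def best_ensemble_weight(p_lr: np.ndarray, p_xgb: np.ndarray,
                             y_val: np.ndarray, step: float = 0.05) -> tuple[float, float]:
        """Search the logistic regression weight w; XGBoost gets 1 - w."""
        best_w, best_score = 0.0, float("inf")
        for w in np.arange(0.0, 1.0 + step, step):
            score = brier_score_loss(y_val, w * p_lr + (1 - w) * p_xgb)
            if score < best_score:
                best_w, best_score = float(w), float(score)
        return best_w, best_score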

Exercise 12: Prediction Confidence Intervals

Extend the model to output not just a point probability estimate, but a confidence interval:

  1. Use bootstrap resampling: train the ensemble on 50 bootstrap samples and take the 5th and 95th percentile of predictions as the confidence interval
  2. Implement conformal prediction to produce statistically valid prediction intervals
  3. Use the confidence interval width to modulate position sizing — wider intervals mean less confidence and smaller positions

Test that your confidence intervals have the correct coverage rate (i.e., the true outcome falls within the 90% CI approximately 90% of the time).
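
A sketch of the bootstrap approach in step 1; train_fn stands in for whatever function retrains your ensemble and returns a fitted model with predict_proba:

    import numpy as np

    def bootstrap_interval(train_fn, X_train: np.ndarray, y_train: np.ndarray,
                           X_new: np.ndarray, n_boot: int = 50, seed: int = 0):
        """Train on bootstrap resamples; return 5th/95th percentile predictions."""
        rng = np.random.default_rng(seed)
        preds = []
        for _ in range(n_boot):
            idx = rng.integers(0, len(X_train), size=len(X_train))  # sample with replacement
            model = train_fn(X_train[idx], y_train[idx])
            preds.append(model.predict_proba(X_new)[:, 1])
        preds = np.vstack(preds)
        return np.percentile(preds, 5, axis=0), np.percentile(preds, 95, axis=0)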


Part C: Strategy and Risk Extensions (Exercises 13-18)

Exercise 13: Multi-Outcome Market Support

The capstone system handles binary markets (YES/NO). Extend it to support multi-outcome markets (e.g., "Which candidate will win?"):

  1. Modify MarketSnapshot to support N outcomes with N prices
  2. Adjust the edge calculation to identify the most mispriced outcome
  3. Implement the Kelly criterion for multi-outcome markets (the multi-asset Kelly formula from Chapter 15)

Test your implementation on a market with 4 outcomes.

Exercise 14: Dynamic Kelly Adjustment

Instead of using a fixed Kelly fraction (0.25), implement a dynamic fraction that adjusts based on:

  1. Recent model accuracy — if the model has been less accurate recently, reduce the fraction
  2. Market regime — in high-volatility periods, use a smaller fraction
  3. Portfolio concentration — if the portfolio is already concentrated, use a smaller fraction for new positions

Backtest the dynamic Kelly approach against the fixed approach. Does it improve the Sharpe ratio or reduce maximum drawdown?
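
One simple multiplicative scheme is sketched below; the specific multipliers are arbitrary starting points to be tuned in the backtest:

    def dynamic_kelly_fraction(base_fraction: float,
                               recent_brier: float,
                               baseline_brier: float,
                               high_volatility: bool,
                               concentration: float) -> float:
        """Scale the base Kelly fraction down as conditions deteriorate."""
        fraction = base_fraction
        if recent_brier > baseline_brier:            # model doing worse than usual
            fraction *= baseline_brier / recent_brier
        if high_volatility:                          # be more conservative in turbulent regimes
            fraction *= 0.5
        fraction *= max(0.25, 1.0 - concentration)   # concentration in [0, 1]
        return min(fraction, base_fraction)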

Exercise 15: Correlation-Based Risk Limits

Implement a correlation-based position sizing system:

  1. Estimate the correlation matrix between all open markets using historical price data
  2. When adding a new position, check if it is highly correlated (above the configured threshold) with any existing position
  3. If correlated, reduce the new position size proportionally to the correlation
  4. Implement a portfolio-level risk measure (e.g., Value-at-Risk) that accounts for correlations

Test with synthetic data where you create deliberately correlated markets.
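
The core of step 3 can be expressed as a small helper; it assumes aligned return series of equal length for the new and existing markets:

    import numpy as np

    def correlation_adjusted_size(base_size: float,
                                  new_returns: np.ndarray,
                                  open_returns: list[np.ndarray],
                                  threshold: float = 0.6) -> float:
        """Shrink a new position in proportion to its highest correlation
        with any existing position once that correlation exceeds the threshold."""
        if not open_returns:
            return base_size
        max_corr = max(abs(np.corrcoef(new_returns, r)[0, 1]) for r in open_returns)
        if max_corr <= threshold:
            return base_size
        return base_size * (1.0 - max_corr)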

Exercise 16: Stop-Loss and Take-Profit Orders

Add stop-loss and take-profit functionality to the position manager:

  1. For each position, set a stop-loss at a configurable percentage below entry (e.g., 30% loss)
  2. Set a take-profit at a configurable percentage above entry (e.g., 50% profit)
  3. Implement trailing stops that move up as the position becomes more profitable
  4. When a stop is triggered, automatically generate a sell signal

Backtest the strategy with and without stops. Do stops improve the maximum drawdown? Do they hurt the total return?
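
A minimal trailing-stop sketch; prices here are the position's mark price, and the trailing percentage is a placeholder:

    from dataclasses import dataclass

    @dataclass
    class TrailingStop:
        """Stop level trails the best price seen since entry by a fixed fraction."""
        entry_price: float
        trail_pct: float = 0.30
        high_water: float = 0.0

        def __post_init__(self) -> None:
            self.high_water = self.entry_price

        def update(self, current_price: float) -> bool:
            """Return True when the stop triggers (i.e., generate a sell signal)."""
            self.high_water = max(self.high_water, current_price)
            stop_level = self.high_water * (1.0 - self.trail_pct)
            return current_price <= stop_level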

Exercise 17: Transaction Cost Optimization

Implement strategies to minimize transaction costs:

  1. Order timing: Analyze historical spread data to find times of day with the lowest spreads
  2. Order splitting: For large orders, split into smaller chunks to reduce market impact
  3. Patience parameter: Allow orders to rest at a better price for a configurable time before falling back to the current market price

Estimate the savings from each optimization technique on a simulated order flow.

Exercise 18: Portfolio Rebalancing Strategy

Implement a rebalancing strategy that decides when to adjust positions:

  1. Threshold-based: Only rebalance when positions deviate from target weights by more than a threshold
  2. Time-based: Rebalance at fixed intervals (hourly, daily)
  3. Event-driven: Rebalance when the model's prediction changes by more than a threshold

Backtest all three approaches. Which minimizes transaction costs while keeping the portfolio close to optimal?


Part D: Infrastructure and Operations (Exercises 19-25)

Exercise 19: Database Migration to PostgreSQL

Migrate the system from SQLite to PostgreSQL:

  1. Create a DatabaseFactory that returns the appropriate connection based on configuration
  2. Write migration scripts that transfer existing data from SQLite to PostgreSQL
  3. Implement connection pooling using asyncpg or psycopg2
  4. Add database health checks to the monitoring dashboard

Verify that all queries produce identical results on both databases.

Exercise 20: Logging and Audit Trail

Implement a comprehensive audit trail system:

  1. Every trading decision must be logged with full context: the model prediction, the market price, the computed edge, the Kelly fraction, and the resulting order
  2. Use structured logging (JSON format) so logs can be parsed by log aggregation tools
  3. Implement log rotation with configurable retention (e.g., keep 30 days)
  4. Create a TradeReconstructor class that can replay the decision process for any historical trade given the audit log

Write a test that submits a trade and then reconstructs the decision from the audit trail.

Exercise 21: Graceful Shutdown

Implement a robust graceful shutdown procedure:

  1. Handle SIGTERM and SIGINT signals
  2. Stop accepting new trading signals
  3. Cancel all open (unfilled) orders
  4. Wait for pending orders to reach a terminal state (filled, cancelled, or expired)
  5. Save the current portfolio state to the database
  6. Flush all metrics and logs
  7. Exit cleanly

Test by sending a SIGTERM during an active trading cycle and verifying no data is lost.
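
The signal-handling skeleton might look like this (Unix-only, since add_signal_handler is not available on Windows); the cleanup functions are placeholders for your own shutdown steps:

    import asyncio
    import signal

    async def run_trading_cycle() -> None:
        ...  # placeholder: one pass of the real trading loop

    async def shutdown_cleanup() -> None:
        ...  # placeholder: cancel open orders, wait for fills, persist state, flush logs

    async def main() -> None:
        stop = asyncio.Event()
        loop = asyncio.get_running_loop()
        # Translate SIGTERM and SIGINT into an event the trading loop checks.
        for sig in (signal.SIGTERM, signal.SIGINT):
            loop.add_signal_handler(sig, stop.set)
        while not stop.is_set():
            await run_trading_cycle()
        await shutdown_cleanup()

    if __name__ == "__main__":
        asyncio.run(main())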

Exercise 22: Health Check Endpoint

Create an HTTP health check endpoint using a lightweight framework (e.g., aiohttp or FastAPI):

  • GET /health returns 200 if the system is healthy, 503 if degraded
  • GET /metrics returns the current metrics summary as JSON
  • GET /portfolio returns the current portfolio state
  • GET /risk returns the current risk report

Include authentication (API key in header) to prevent unauthorized access.
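
A FastAPI sketch of the health route with header-based authentication; the status function is a placeholder to be wired to your own monitoring:

    from fastapi import Depends, FastAPI, Header, HTTPException, Response

    app = FastAPI()
    API_KEY = "change-me"  # load from configuration or an environment variable in practice

    def require_api_key(x_api_key: str = Header(default="")) -> None:
        # FastAPI maps the X-API-Key request header onto this parameter.
        if x_api_key != API_KEY:
            raise HTTPException(status_code=401, detail="invalid API key")

    def system_is_healthy() -> bool:
        return True  # placeholder: delegate to PipelineHealthCheck from Exercise 6

    @app.get("/health", dependencies=[Depends(require_api_key)])
    def health() -> Response:
        return Response(status_code=200 if system_is_healthy() else 503)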

Exercise 23: Configuration Hot Reload

Implement the ability to reload configuration without restarting the system:

  1. Watch the configuration file for changes (using watchdog or similar)
  2. When a change is detected, validate the new configuration
  3. Apply safe changes immediately (e.g., risk limits, alert thresholds)
  4. Flag dangerous changes (e.g., disabling dry_run) that require manual confirmation
  5. Log all configuration changes with before/after values

Test by modifying max_daily_loss while the system is running and verifying it takes effect.

Exercise 24: Multi-Instance Architecture

Design and implement a multi-instance deployment:

  1. A "coordinator" instance that runs the data pipeline and model inference
  2. Multiple "executor" instances, one per platform, that handle order execution
  3. A shared message queue (e.g., Redis pub/sub) for communication between instances
  4. Distributed locking to prevent duplicate order submission

Draw an architecture diagram and explain why this design improves reliability.

Exercise 25: Disaster Recovery

Implement a disaster recovery plan:

  1. Automated database backups every 6 hours
  2. Model checkpoint backups after every retraining
  3. A SystemRecovery class that can restore the system from backups
  4. Position reconciliation that compares local state with platform API state on startup
  5. A "safe start" mode that loads the last known good state and pauses before trading

Test by simulating a database corruption and recovering from backup.


Part E: Advanced Extensions (Exercises 26-30)

Exercise 26: Cross-Platform Arbitrage Detector

Build an arbitrage detection module:

  1. Identify the same event listed on multiple platforms (using question text similarity)
  2. Compute the arbitrage opportunity: if Platform A prices YES at 0.40 and Platform B prices YES at 0.65, buying YES on A (0.40) and NO on B (1 - 0.65 = 0.35) costs 0.75 combined and pays 1.00 at resolution, a 0.25 arbitrage before fees
  3. Account for transaction costs, settlement times, and counterparty risk
  4. Generate alerts when arbitrage opportunities exceed a configurable threshold

Test with historical data: how often do cross-platform arbitrage opportunities occur? How large are they? How long do they persist?
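
The core calculation from step 2 fits in one function; fees are modeled here as a flat per-contract cost, which is a simplification:

    def cross_platform_arbitrage(yes_price_a: float, yes_price_b: float,
                                 fee_per_contract: float = 0.0) -> float:
        """Profit per contract from buying YES where it is cheap and NO on the
        other platform; a positive result indicates an arbitrage."""
        yes_cost = min(yes_price_a, yes_price_b)        # buy YES on the cheaper platform
        no_cost = 1.0 - max(yes_price_a, yes_price_b)   # buy NO where YES is expensive
        return 1.0 - (yes_cost + no_cost) - 2 * fee_per_contract

    # With the prices above: cross_platform_arbitrage(0.40, 0.65) == 0.25 before fees.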

Exercise 27: Market Making Bot

Extend the system to act as a market maker:

  1. Post both buy and sell orders at prices around the model's estimated probability
  2. Adjust the spread based on inventory risk — widen the spread as position grows
  3. Implement inventory management to prevent accumulating a large directional position
  4. Calculate expected profit from market making (spread earned minus adverse selection cost)

This exercise combines concepts from Chapter 7 (order books), Chapter 15 (Kelly criterion), and Chapter 30 (market maker incentives).

Exercise 28: Natural Language Pipeline

Build an NLP-powered pipeline that:

  1. Reads the question text of each market
  2. Extracts named entities (people, organizations, events) using spaCy or a similar library
  3. Searches for recent news about those entities
  4. Generates a text-based summary of relevant information
  5. Feeds the summary through a sentiment model to produce a feature for the prediction model

Test the impact on model performance: does adding NLP features improve the Brier score?

Exercise 29: Reinforcement Learning Trader

Replace the strategy engine with a reinforcement learning agent:

  1. Define the state space: current prices, model predictions, position sizes, portfolio value
  2. Define the action space: buy, sell, hold (with continuous sizing)
  3. Define the reward function: risk-adjusted returns (Sharpe ratio or similar)
  4. Train the agent using PPO (Proximal Policy Optimization) on historical data
  5. Compare the RL agent's backtest performance against the Kelly-based strategy

This is a challenging exercise that combines ML concepts from Part IV with the trading frameworks from Part III.

Exercise 30: Full System Integration Test

Write a comprehensive integration test that:

  1. Starts the full system with mock API responses (use respx or httpx.MockTransport)
  2. Feeds in a scripted sequence of market data over 100 simulated time steps
  3. Verifies that the system generates the expected trades at each step
  4. Triggers a circuit breaker and verifies trading stops
  5. Simulates a market resolution and verifies P&L is calculated correctly
  6. Checks that all metrics, logs, and database records are consistent

This test should run in under 30 seconds and provide confidence that all components work together correctly.
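
If you use httpx.MockTransport, the wiring looks roughly like this; the response payload shape is illustrative and should match whatever your platform clients expect:

    import httpx

    def handler(request: httpx.Request) -> httpx.Response:
        # Return canned market data for any markets endpoint the system calls.
        if "markets" in request.url.path:
            return httpx.Response(200, json={"markets": [{"id": "m1", "yes_price": 0.42}]})
        return httpx.Response(404)

    transport = httpx.MockTransport(handler)
    client = httpx.AsyncClient(transport=transport)
    # Inject `client` into the platform clients under test so no real HTTP calls occur.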


Submission Guidelines

For each exercise:

  1. Write clean, documented Python code following PEP 8
  2. Include type hints on all function signatures
  3. Write at least two unit tests per exercise
  4. Document your design decisions in comments or docstrings
  5. If the exercise asks you to compare approaches, include a brief writeup (200-500 words) analyzing your results

The exercises are designed to be completed independently. You do not need to finish them in order, though later exercises in each part build on earlier ones.