Chapter 6 Exercises
These exercises are organized into five parts of increasing complexity. Part A focuses on environment setup and verification. Part B covers API interaction patterns. Part C addresses the pmtools module. Part D deals with data management and visualization. Part E presents integration challenges that combine multiple skills.
Part A: Environment Setup and Verification
Exercise 1: Python Version Check Script
Write a Python script called version_check.py that:
- Prints the Python version, including major, minor, and micro components
- Checks whether the version is at least 3.9
- Prints the path to the Python executable
- Prints the path to the current working directory
- Lists all paths in `sys.path`
If the version is below 3.9, the script should print a clear error message and exit with a non-zero exit code.
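One possible sketch of the core logic (the 3.9 floor comes from the exercise; in the real `version_check.py` you would finish with `sys.exit(0 if check_version() else 1)` to produce the non-zero exit code):

```python
import os
import sys

def check_version(required: tuple[int, int] = (3, 9)) -> bool:
    """Print interpreter details and report whether the version meets `required`."""
    vi = sys.version_info
    print(f"Python {vi.major}.{vi.minor}.{vi.micro}")
    print(f"Executable: {sys.executable}")
    print(f"Working directory: {os.getcwd()}")
    for entry in sys.path:
        print(f"  sys.path: {entry}")
    if (vi.major, vi.minor) < required:
        print(f"ERROR: Python {required[0]}.{required[1]}+ is required", file=sys.stderr)
        return False
    return True
```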
Exercise 2: Virtual Environment Exploration
Perform the following tasks and record your observations:
- Create a new virtual environment called `pm-test-env`
- Activate it and run `pip list` — how many packages are installed in a fresh environment?
- Install `numpy` and run `pip list` again — how many packages now? Why more than one?
- Run `python -c "import numpy; print(numpy.__file__)"` — where is numpy installed?
- Deactivate the environment and run the same command — what happens?
- Delete the virtual environment directory
Write your answers as comments in a Python script.
Exercise 3: Requirements File Parsing
Write a Python function parse_requirements(filepath: str) -> list[dict] that reads a requirements.txt file and returns a list of dictionaries. Each dictionary should contain:
- `name`: the package name
- `min_version`: the minimum version (or None if not specified)
- `max_version`: the maximum version (or None if not specified)
- `exact_version`: the exact version if pinned with `==` (or None)
The function should handle comments (lines starting with #) and blank lines. Test it with the requirements.txt from the chapter.
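A minimal sketch of one way to approach the parser, assuming only plain `==`, `>=`, and `<=` specifiers (no extras, hashes, or environment markers):

```python
def parse_requirements(filepath: str) -> list[dict]:
    """Parse a simple requirements.txt into structured records."""
    results = []
    with open(filepath) as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            entry = {"name": line, "min_version": None,
                     "max_version": None, "exact_version": None}
            # split the package name from any version specifiers
            for i, ch in enumerate(line):
                if ch in "<>=!":
                    entry["name"] = line[:i].strip()
                    for spec in line[i:].split(","):
                        spec = spec.strip()
                        if spec.startswith("=="):
                            entry["exact_version"] = spec[2:].strip()
                        elif spec.startswith(">="):
                            entry["min_version"] = spec[2:].strip()
                        elif spec.startswith("<="):
                            entry["max_version"] = spec[2:].strip()
                    break
            results.append(entry)
    return results
```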
Exercise 4: Dependency Conflict Simulation
Create two separate requirements.txt files:
- `project_a_requirements.txt` with `numpy==1.24.0` and `pandas==2.0.0`
- `project_b_requirements.txt` with `numpy>=1.26.0` and `scipy==1.11.0`
Demonstrate (in writing or code) how virtual environments solve the conflict of Project A needing numpy 1.24 while Project B needs numpy 1.26+.
Exercise 5: Project Scaffolding Script
Write a Python script called scaffold_project.py that creates the complete project directory structure described in Section 6.1. The script should:
- Accept a project name as a command-line argument
- Create all directories (`pmtools/`, `notebooks/`, `scripts/`, `data/raw/`, `data/processed/`, `tests/`, `configs/`, `logs/`)
- Create starter files: `__init__.py`, `.gitignore`, `requirements.txt`, `.env.example`
- The `.env.example` file should contain placeholder keys with comments
- Print a summary of everything created
Example usage: `python scaffold_project.py my-market-project`
Part B: API Interaction Patterns
Exercise 6: Simple HTTP Client
Using the requests library, write a function fetch_json(url: str) -> dict that:
- Makes a GET request to the provided URL
- Sets a timeout of 15 seconds
- Checks that the response status code is 200
- Parses and returns the JSON body
- Handles `ConnectionError`, `Timeout`, and `HTTPError` exceptions with informative messages
Test it with https://httpbin.org/get and https://httpbin.org/status/404.
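One way the function could look with `requests` (the exception classes live in `requests.exceptions`; the error messages below are illustrative, not prescribed by the exercise):

```python
import requests

def fetch_json(url: str, timeout: float = 15.0) -> dict:
    """GET a URL and return its parsed JSON body."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # turns 4xx/5xx status codes into HTTPError
        return response.json()
    except requests.exceptions.ConnectionError:
        print(f"Could not connect to {url} — is the host reachable?")
        raise
    except requests.exceptions.Timeout:
        print(f"Request to {url} timed out after {timeout}s")
        raise
    except requests.exceptions.HTTPError as exc:
        print(f"HTTP error from {url}: {exc}")
        raise
```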
Exercise 7: Rate Limiter Class
Implement a RateLimiter class that enforces a maximum number of requests per second:
```python
class RateLimiter:
    def __init__(self, max_per_second: float):
        ...

    def wait(self):
        """Block until it is safe to make another request."""
        ...
```
Test it by making 10 requests in a loop with max_per_second=2 and verifying that the total elapsed time is at least 4.5 seconds.
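A sketch of one simple implementation, which tracks the time of the previous call and sleeps out the remainder of the minimum interval:

```python
import time

class RateLimiter:
    """Blocking rate limiter: at most max_per_second calls per second."""

    def __init__(self, max_per_second: float):
        self.min_interval = 1.0 / max_per_second
        self._last_call = 0.0  # monotonic timestamp of the previous call

    def wait(self) -> None:
        """Block until it is safe to make another request."""
        now = time.monotonic()
        elapsed = now - self._last_call
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last_call = time.monotonic()
```

With `max_per_second=2` the first call returns immediately and each of the next 9 waits roughly 0.5 s, which is why the exercise expects at least 4.5 s total.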
Exercise 8: Exponential Backoff Implementation
Write a function retry_with_backoff(func, max_retries=5, base_delay=1.0) that:
- Calls `func()` and returns the result if it succeeds
- If `func()` raises an exception, waits and retries
- Uses exponential backoff: delay = base_delay * 2^attempt
- Adds random jitter between 0 and 1 second to each delay
- Raises the last exception after all retries are exhausted
- Logs each retry attempt with the delay and exception message
Test it with a function that fails randomly 70% of the time.
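The retry loop above can be sketched as follows (logging format and the choice to re-raise only the final exception are judgment calls, not mandated by the exercise):

```python
import logging
import random
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("backoff")

def retry_with_backoff(func, max_retries: int = 5, base_delay: float = 1.0):
    """Call func(), retrying with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return func()
        except Exception as exc:
            if attempt == max_retries - 1:
                raise  # retries exhausted: surface the last exception
            # exponential backoff with jitter in [0, 1) seconds
            delay = base_delay * 2 ** attempt + random.random()
            log.info("attempt %d failed (%s); retrying in %.2fs",
                     attempt + 1, exc, delay)
            time.sleep(delay)
```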
Exercise 9: Response Caching
Implement a simple response cache for API calls:
```python
class CachedClient:
    def __init__(self, ttl_seconds: int = 300):
        """Cache API responses for ttl_seconds."""
        ...

    def get(self, url: str) -> dict:
        """Return cached response if available and not expired."""
        ...
```
Requirements:
- Cache key is the URL
- Cached responses expire after ttl_seconds
- Provide a clear_cache() method
- Provide a cache_stats() method that returns hit/miss counts
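A sketch of the TTL-cache mechanics. Note one deviation from the exercise's skeleton: an injectable `fetcher` callable (defaulting to a stdlib `urllib` GET) is added here so the cache logic can be exercised without a network.

```python
import json
import time
import urllib.request

class CachedClient:
    """Cache GET responses in memory for ttl_seconds."""

    def __init__(self, ttl_seconds: int = 300, fetcher=None):
        self.ttl = ttl_seconds
        self.fetcher = fetcher or self._fetch  # injectable for testing
        self._cache: dict[str, tuple[float, dict]] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _fetch(url: str) -> dict:
        with urllib.request.urlopen(url, timeout=15) as resp:
            return json.load(resp)

    def get(self, url: str) -> dict:
        """Return cached response if available and not expired."""
        entry = self._cache.get(url)
        if entry and time.monotonic() - entry[0] < self.ttl:
            self.hits += 1
            return entry[1]
        self.misses += 1
        data = self.fetcher(url)
        self._cache[url] = (time.monotonic(), data)
        return data

    def clear_cache(self) -> None:
        self._cache.clear()

    def cache_stats(self) -> dict:
        return {"hits": self.hits, "misses": self.misses}
```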
Exercise 10: Async API Fetcher
Using httpx and asyncio, write an async function that fetches data from multiple URLs concurrently:
```python
async def fetch_all(urls: list[str], max_concurrent: int = 5) -> list[dict]:
    """
    Fetch all URLs concurrently with a concurrency limit.

    Args:
        urls: List of URLs to fetch
        max_concurrent: Maximum number of concurrent requests

    Returns:
        List of response dictionaries (in same order as urls)
    """
```
Use asyncio.Semaphore to limit concurrency. Test with 20 URLs to https://httpbin.org/delay/1 and verify the total time is significantly less than 20 seconds.
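The semaphore pattern can be sketched as below. Unlike the exercise's signature, this version takes the per-URL coroutine (`fetch_one`, e.g. a small `httpx.AsyncClient` wrapper) as a parameter so the concurrency logic is testable without a network:

```python
import asyncio

async def fetch_all(urls: list[str], fetch_one, max_concurrent: int = 5) -> list[dict]:
    """Run fetch_one(url) for every URL, with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(url: str) -> dict:
        async with semaphore:  # block until a concurrency slot is free
            return await fetch_one(url)

    # gather returns results in the same order as the input coroutines
    return await asyncio.gather(*(bounded(u) for u in urls))
```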
Part C: The pmtools Module
Exercise 11: Implied Probability with Multiple Outcomes
Extend the implied_probability function to handle markets with more than two outcomes. Write:
```python
def implied_probabilities_multi(prices: list[float]) -> list[float]:
    """
    Calculate implied probabilities for a multi-outcome market.
    Removes the overround proportionally across all outcomes.

    Args:
        prices: List of prices for each outcome (should sum to > 1.0)

    Returns:
        List of probabilities that sum to 1.0
    """
```
Test with a three-outcome market: prices = [0.40, 0.35, 0.30] (total = 1.05, so 5% overround).
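Proportional overround removal is just normalization by the price total, so the body can be as short as:

```python
def implied_probabilities_multi(prices: list[float]) -> list[float]:
    """Remove the overround proportionally: scale prices so they sum to 1."""
    total = sum(prices)
    if total <= 0:
        raise ValueError("prices must sum to a positive number")
    return [p / total for p in prices]
```

For the test case, each probability is the price divided by 1.05, e.g. the first outcome becomes 0.40 / 1.05 ≈ 0.381.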
Exercise 12: Kelly Criterion Simulation
Write a simulation that demonstrates the Kelly criterion in action:
- Start with a bankroll of $1,000
- Simulate 500 binary markets where your true probability estimate is 0.60
- The market price is 0.50 (so you have edge)
- In one simulation, bet the full Kelly amount each time
- In another, bet half-Kelly
- In a third, bet a fixed 10% of bankroll each time
- Plot all three bankroll trajectories on the same chart
- Run 100 simulations of each strategy and plot the distribution of final bankrolls
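The heart of the simulation is the Kelly fraction itself. A sketch of the two helpers you would build on (plotting and the 100-run distribution are omitted; `simulate` is a hypothetical helper name): for a contract bought at price c paying 1, the net odds are b = (1 − c)/c, and with the exercise's numbers (p = 0.60, c = 0.50) the full Kelly fraction is 0.20, so half-Kelly is 0.10.

```python
import random

def kelly_fraction(p: float, price: float) -> float:
    """Kelly fraction for buying at `price` a contract that pays 1 if the
    event (true probability p) occurs; net odds are b = (1 - price) / price."""
    b = (1 - price) / price
    return (p * (b + 1) - 1) / b

def simulate(fraction: float, n_markets: int = 500, p: float = 0.60,
             price: float = 0.50, bankroll: float = 1000.0, seed: int = 0) -> float:
    """Bet a fixed fraction of bankroll on each of n_markets independent markets."""
    rng = random.Random(seed)
    b = (1 - price) / price  # net payout per unit staked on a win
    for _ in range(n_markets):
        stake = bankroll * fraction
        if rng.random() < p:
            bankroll += stake * b
        else:
            bankroll -= stake
    return bankroll
```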
Exercise 13: Data Model Validation
Extend the Market dataclass to include validation:
- `yes_price` must be between 0 and 1
- `no_price` must be between 0 and 1
- `volume` must be non-negative
- `close_date` must be after `created_at` (if both are set)
- `status` must be a valid `MarketStatus` enum value
Use __post_init__ for validation and raise ValueError with descriptive messages for any violation. Write test cases for each validation rule.
Exercise 14: Market Comparison Function
Write a function that compares two markets and returns a summary:
```python
def compare_markets(market_a: Market, market_b: Market) -> dict:
    """
    Compare two markets and return analysis.

    Returns dict with:
    - price_difference: difference in yes prices
    - probability_difference: difference in implied probabilities
    - volume_ratio: ratio of volumes
    - overround_comparison: which market has more vig
    - arbitrage_opportunity: bool, True if prices allow arbitrage
    """
```
Exercise 15: Brier Score Decomposition
Implement the Murphy decomposition of the Brier score into reliability, resolution, and uncertainty components:
```python
def brier_decomposition(
    predicted: list[float],
    actual: list[int],
    n_bins: int = 10
) -> dict:
    """
    Decompose Brier score into components.

    Returns:
        Dictionary with keys: reliability, resolution, uncertainty, brier_score
        where brier_score = reliability - resolution + uncertainty
    """
```
Test with both well-calibrated and poorly-calibrated prediction sets.
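A sketch of the binned computation. One caveat worth knowing: the identity brier_score = reliability − resolution + uncertainty is exact only when all forecasts within a bin are equal; with continuous forecasts and coarse bins it holds approximately.

```python
def brier_decomposition(predicted: list[float], actual: list[int],
                        n_bins: int = 10) -> dict:
    """Murphy decomposition of the Brier score over equal-width forecast bins."""
    n = len(predicted)
    base_rate = sum(actual) / n  # overall frequency of the event
    # group forecast indices by bin
    bins: dict[int, list[int]] = {}
    for i, p in enumerate(predicted):
        k = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into the top bin
        bins.setdefault(k, []).append(i)
    reliability = resolution = 0.0
    for idx in bins.values():
        f_bar = sum(predicted[i] for i in idx) / len(idx)  # mean forecast in bin
        o_bar = sum(actual[i] for i in idx) / len(idx)     # observed frequency in bin
        reliability += len(idx) * (f_bar - o_bar) ** 2
        resolution += len(idx) * (o_bar - base_rate) ** 2
    reliability /= n
    resolution /= n
    uncertainty = base_rate * (1 - base_rate)
    brier = sum((p, a) and (p - a) ** 2 for p, a in zip(predicted, actual)) / n
    brier = sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n
    return {"reliability": reliability, "resolution": resolution,
            "uncertainty": uncertainty, "brier_score": brier}
```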
Part D: Data Management and Visualization
Exercise 16: Database Migration Script
Write a script that adds a new column to the price_snapshots table called spread (the difference between yes and no prices). The script should:
- Check if the column already exists (idempotent operation)
- Add the column if it does not exist
- Populate it for all existing rows
- Verify the migration succeeded
This teaches you how to evolve your database schema over time.
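The idempotency check can be done with SQLite's `PRAGMA table_info`. A sketch, assuming the table already has `yes_price` and `no_price` columns as in the chapter's schema:

```python
import sqlite3

def add_spread_column(db_path: str) -> bool:
    """Add a `spread` column to price_snapshots if missing; True if added."""
    conn = sqlite3.connect(db_path)
    try:
        # PRAGMA table_info rows are (cid, name, type, notnull, default, pk)
        cols = [row[1] for row in conn.execute("PRAGMA table_info(price_snapshots)")]
        if "spread" in cols:
            return False  # idempotent: migration already applied
        conn.execute("ALTER TABLE price_snapshots ADD COLUMN spread REAL")
        conn.execute("UPDATE price_snapshots SET spread = yes_price - no_price")
        conn.commit()
        return True
    finally:
        conn.close()
```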
Exercise 17: CSV to SQLite Importer
Write a script that imports CSV files of market data into the SQLite database:
```python
def import_csv_to_db(
    csv_path: str,
    db_path: str,
    table_name: str,
    column_mapping: dict[str, str] | None = None
):
    """
    Import a CSV file into a SQLite database table.

    Args:
        csv_path: Path to the CSV file
        db_path: Path to the SQLite database
        table_name: Target table name
        column_mapping: Optional dict mapping CSV columns to DB columns
    """
```
Handle common issues: missing values, date parsing, duplicate detection, and type mismatches.
Exercise 18: Custom Visualization — Market Dashboard
Create a function that generates a four-panel dashboard for a single market:
- Top left: Price history over time (line chart)
- Top right: Volume by day (bar chart)
- Bottom left: Return distribution (histogram)
- Bottom right: Summary statistics table
The function should accept a DataFrame and market title, and optionally save to a file.
Exercise 19: Probability Heatmap
Create a visualization function that displays a heatmap of probabilities across multiple related markets over time. For example, if there are 5 candidates in an election market, the heatmap should show how each candidate's probability evolves.
```python
def plot_probability_heatmap(
    data: pd.DataFrame,
    title: str = "Market Probability Heatmap",
    figsize: tuple = (14, 6)
) -> plt.Figure:
    """
    data: DataFrame with DatetimeIndex, columns are market/candidate names,
    values are probabilities
    """
```
Exercise 20: Animated Price Chart
Using matplotlib's FuncAnimation, create a function that generates an animated chart showing market price evolution over time. The animation should:
- Start from the beginning of the price history
- Progressively reveal the price line
- Show the current price and timestamp as text
- Optionally save as a GIF or MP4
Part E: Integration Challenges
Exercise 21: End-to-End Data Pipeline
Build a complete data pipeline script that:
- Reads API configuration from `.env`
- Connects to a prediction market API (use a mock if no real key)
- Fetches the top 10 active markets
- Stores market metadata in SQLite
- Fetches price history for each market
- Stores price snapshots
- Generates a summary visualization (one chart showing all 10 price histories)
- Logs each step and any errors
- Saves a run summary to a JSON file
Exercise 22: Configuration Validation
Write a function that validates the complete project configuration:
```python
def validate_config() -> list[str]:
    """
    Validate the project configuration and return a list of issues.

    Checks:
    - .env file exists and contains required keys
    - configs/settings.yaml is valid YAML with required sections
    - Database is accessible and schema is correct
    - All required directories exist
    - Log directory is writable
    - All Python dependencies are installed with correct versions

    Returns:
        List of issue descriptions (empty list = all good)
    """
```
Exercise 23: Mock API Server
Using Python's http.server module or Flask, create a simple mock prediction market API server that:
- Returns realistic-looking market data at `/api/markets`
- Returns price history at `/api/markets/{id}/prices`
- Simulates rate limiting (429 responses after 10 requests per minute)
- Simulates occasional server errors (500 responses, 5% of the time)
Use this mock server to test your API client without hitting real APIs.
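A bare-bones `http.server` starting point (one endpoint only; rate limiting, error injection, and the sample payload are left to you — the market data below is invented for illustration):

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class MockMarketHandler(BaseHTTPRequestHandler):
    """Serves canned JSON for /api/markets; everything else is 404."""

    def do_GET(self):
        if self.path != "/api/markets":
            self.send_response(404)
            self.end_headers()
            return
        body = json.dumps([{"id": "mkt-1",
                            "question": "Will it rain tomorrow?",
                            "yes_price": 0.62, "no_price": 0.40}]).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request stderr logging

def start_mock_server() -> tuple[HTTPServer, int]:
    """Start the server on a free port in a daemon thread; return (server, port)."""
    server = HTTPServer(("127.0.0.1", 0), MockMarketHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]
```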
Exercise 24: Notebook to Script Converter
Write a Python script that converts a Jupyter notebook (.ipynb) into a clean Python script:
- Extract all code cells
- Convert Markdown cells to comments
- Remove magic commands (`%matplotlib inline`, etc.)
- Add proper imports at the top (deduplicate imports scattered across cells)
- Add an `if __name__ == "__main__":` block
- Save as a `.py` file
Test it with a sample notebook you create.
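Since `.ipynb` files are just JSON, the extraction step can be sketched with the standard library (this covers cell extraction, comment conversion, and magic removal; import deduplication and the `__main__` block are left as the exercise's remaining work):

```python
import json

def notebook_to_script(ipynb_path: str) -> str:
    """Convert a notebook's cells to Python source: markdown cells become
    comments, %-magics and !-shell lines are dropped."""
    with open(ipynb_path) as fh:
        nb = json.load(fh)
    parts = []
    for cell in nb.get("cells", []):
        source = "".join(cell.get("source", []))
        if cell["cell_type"] == "markdown":
            parts.append("\n".join("# " + line for line in source.splitlines()))
        elif cell["cell_type"] == "code":
            kept = [line for line in source.splitlines()
                    if not line.lstrip().startswith(("%", "!"))]
            parts.append("\n".join(kept))
    return "\n\n".join(parts) + "\n"
```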
Exercise 25: Portfolio Tracker
Build a complete portfolio tracking system using the pmtools module:
- Define a `Portfolio` class that tracks positions across multiple markets
- Implement methods:
  - `add_trade(market_id, side, quantity, price)` — record a trade
  - `current_value(current_prices: dict)` — calculate total portfolio value
  - `pnl_report()` — generate a profit/loss report
  - `risk_summary()` — calculate portfolio risk metrics (total exposure, largest position, diversification score)
- Store all data in SQLite
- Generate a dashboard visualization showing:
  - Position sizes across markets
  - Unrealized P&L by market
  - Portfolio value over time
Submission Guidelines
For each exercise:
- Create a separate Python file named `exercise_XX.py` (e.g., `exercise_01.py`)
- Include docstrings explaining your approach
- Include test cases that demonstrate your solution works
- Handle edge cases and errors gracefully
- Follow the code style established in this chapter (type hints, logging, docstrings)
For exercises that require running against live APIs, include instructions for using mock data as a fallback.