Chapter 6 Exercises

These exercises are organized into five parts of increasing complexity. Part A focuses on environment setup and verification. Part B covers API interaction patterns. Part C addresses the pmtools module. Part D deals with data management and visualization. Part E presents integration challenges that combine multiple skills.


Part A: Environment Setup and Verification

Exercise 1: Python Version Check Script

Write a Python script called version_check.py that:

  1. Prints the Python version, including major, minor, and micro components
  2. Checks whether the version is at least 3.9
  3. Prints the path to the Python executable
  4. Prints the path to the current working directory
  5. Lists all paths in sys.path

If the version is below 3.9, the script should print a clear error message and exit with a non-zero exit code.
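One possible shape for version_check.py (the exact wording of the messages is up to you):

```python
import os
import sys

def main() -> None:
    """Print interpreter details and fail if Python is older than 3.9."""
    vi = sys.version_info
    print(f"Python version: {vi.major}.{vi.minor}.{vi.micro}")
    print(f"Executable: {sys.executable}")
    print(f"Working directory: {os.getcwd()}")
    print("sys.path entries:")
    for entry in sys.path:
        print(f"  {entry}")
    if vi < (3, 9):
        # Tuple comparison against version_info handles major and minor together
        print(f"ERROR: Python 3.9 or newer is required "
              f"(found {vi.major}.{vi.minor}).", file=sys.stderr)
        sys.exit(1)

if __name__ == "__main__":
    main()
```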

Exercise 2: Virtual Environment Exploration

Perform the following tasks and record your observations:

  1. Create a new virtual environment called pm-test-env
  2. Activate it and run pip list — how many packages are installed in a fresh environment?
  3. Install numpy and run pip list again — how many packages now? Why more than one?
  4. Run python -c "import numpy; print(numpy.__file__)" — where is numpy installed?
  5. Deactivate the environment and run the same command — what happens?
  6. Delete the virtual environment directory

Write your answers as comments in a Python script.

Exercise 3: Requirements File Parsing

Write a Python function parse_requirements(filepath: str) -> list[dict] that reads a requirements.txt file and returns a list of dictionaries. Each dictionary should contain:

  • name: the package name
  • min_version: the minimum version (or None if not specified)
  • max_version: the maximum version (or None if not specified)
  • exact_version: the exact version if pinned with == (or None)

The function should handle comments (lines starting with #) and blank lines. Test it with the requirements.txt from the chapter.

Exercise 4: Dependency Conflict Simulation

Create two separate requirements.txt files:

  • project_a_requirements.txt with numpy==1.24.0 and pandas==2.0.0
  • project_b_requirements.txt with numpy>=1.26.0 and scipy==1.11.0

Demonstrate (in writing or code) how virtual environments solve the conflict of Project A needing numpy 1.24 while Project B needs numpy 1.26+.

Exercise 5: Project Scaffolding Script

Write a Python script called scaffold_project.py that creates the complete project directory structure described in Section 6.1. The script should:

  1. Accept a project name as a command-line argument
  2. Create all directories (pmtools/, notebooks/, scripts/, data/raw/, data/processed/, tests/, configs/, logs/)
  3. Create starter files: __init__.py, .gitignore, requirements.txt, .env.example
  4. The .env.example file should contain placeholder keys with comments
  5. Print a summary of everything created

Example invocation:

python scaffold_project.py my-market-project

Part B: API Interaction Patterns

Exercise 6: Simple HTTP Client

Using the requests library, write a function fetch_json(url: str) -> dict that:

  1. Makes a GET request to the provided URL
  2. Sets a timeout of 15 seconds
  3. Checks that the response status code is 200
  4. Parses and returns the JSON body
  5. Handles ConnectionError, Timeout, and HTTPError exceptions with informative messages

Test it with https://httpbin.org/get and https://httpbin.org/status/404.
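A sketch of the error-handling structure. The `getter` parameter is an addition beyond the exercise spec so the function can be tested without a network connection:

```python
import requests

def fetch_json(url: str, timeout: float = 15.0, getter=requests.get) -> dict:
    """GET `url` and return the parsed JSON body."""
    try:
        response = getter(url, timeout=timeout)
        response.raise_for_status()    # raises HTTPError for 4xx/5xx
        if response.status_code != 200:  # e.g. a 204 or cached 304
            raise RuntimeError(
                f"Unexpected status {response.status_code} from {url}")
        return response.json()
    except requests.exceptions.Timeout:
        # Catch Timeout before ConnectionError: ConnectTimeout inherits both
        raise RuntimeError(f"Request to {url} timed out after {timeout}s")
    except requests.exceptions.ConnectionError:
        raise RuntimeError(f"Could not connect to {url}")
    except requests.exceptions.HTTPError as exc:
        raise RuntimeError(f"HTTP error from {url}: {exc}")
```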

Exercise 7: Rate Limiter Class

Implement a RateLimiter class that enforces a maximum number of requests per second:

class RateLimiter:
    def __init__(self, max_per_second: float):
        ...

    def wait(self):
        """Block until it is safe to make another request."""
        ...

Test it by making 10 requests in a loop with max_per_second=2 and verifying that the total elapsed time is at least 4.5 seconds.
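One way to meet the spec is to track the timestamp of the previous request and sleep out the remainder of the minimum interval; a sketch:

```python
import time

class RateLimiter:
    """Enforce a minimum gap of 1/max_per_second between requests."""

    def __init__(self, max_per_second: float):
        self.min_interval = 1.0 / max_per_second
        self._last_call = 0.0  # monotonic timestamp of the previous request

    def wait(self) -> None:
        """Block until it is safe to make another request."""
        elapsed = time.monotonic() - self._last_call
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last_call = time.monotonic()
```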

Exercise 8: Exponential Backoff Implementation

Write a function retry_with_backoff(func, max_retries=5, base_delay=1.0) that:

  1. Calls func() and returns the result if it succeeds
  2. If func() raises an exception, waits and retries
  3. Uses exponential backoff: delay = base_delay * 2^attempt
  4. Adds random jitter between 0 and 1 second to each delay
  5. Raises the last exception after all retries are exhausted
  6. Logs each retry attempt with the delay and exception message

Test it with a function that fails randomly 70% of the time.
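A sketch of the retry loop. The `max_jitter` parameter is an addition beyond the exercise spec so tests can run quickly; the exercise asks for jitter between 0 and 1 second:

```python
import logging
import random
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def retry_with_backoff(func, max_retries=5, base_delay=1.0, max_jitter=1.0):
    """Call func(), retrying on any exception with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return func()
        except Exception as exc:
            if attempt == max_retries - 1:
                raise  # retries exhausted: re-raise the last exception
            delay = base_delay * 2 ** attempt + random.uniform(0, max_jitter)
            logger.warning("Attempt %d failed (%s); retrying in %.2fs",
                           attempt + 1, exc, delay)
            time.sleep(delay)
```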

Exercise 9: Response Caching

Implement a simple response cache for API calls:

class CachedClient:
    def __init__(self, ttl_seconds: int = 300):
        """Cache API responses for ttl_seconds."""
        ...

    def get(self, url: str) -> dict:
        """Return cached response if available and not expired."""
        ...

Requirements:

  • Cache key is the URL
  • Cached responses expire after ttl_seconds
  • Provide a clear_cache() method
  • Provide a cache_stats() method that returns hit/miss counts
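A sketch using a monotonic-clock TTL check. The injectable `fetcher` parameter is an assumption added for testability, not part of the exercise spec:

```python
import time

class CachedClient:
    """Cache responses keyed by URL, expiring after ttl_seconds."""

    def __init__(self, ttl_seconds: int = 300, fetcher=None):
        self.ttl = ttl_seconds
        self.fetcher = fetcher or (lambda url: {})  # stand-in for a real GET
        self._cache: dict[str, tuple[float, dict]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> dict:
        """Return cached response if available and not expired."""
        entry = self._cache.get(url)
        if entry is not None and time.monotonic() - entry[0] < self.ttl:
            self.hits += 1
            return entry[1]
        self.misses += 1
        data = self.fetcher(url)
        self._cache[url] = (time.monotonic(), data)
        return data

    def clear_cache(self) -> None:
        self._cache.clear()

    def cache_stats(self) -> dict:
        return {"hits": self.hits, "misses": self.misses}
```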

Exercise 10: Async API Fetcher

Using httpx and asyncio, write an async function that fetches data from multiple URLs concurrently:

async def fetch_all(urls: list[str], max_concurrent: int = 5) -> list[dict]:
    """
    Fetch all URLs concurrently with a concurrency limit.

    Args:
        urls: List of URLs to fetch
        max_concurrent: Maximum number of concurrent requests

    Returns:
        List of response dictionaries (in same order as urls)
    """

Use asyncio.Semaphore to limit concurrency. Test with 20 URLs to https://httpbin.org/delay/1 and verify the total time is significantly less than 20 seconds.


Part C: The pmtools Module

Exercise 11: Implied Probability with Multiple Outcomes

Extend the implied_probability function to handle markets with more than two outcomes. Write:

def implied_probabilities_multi(prices: list[float]) -> list[float]:
    """
    Calculate implied probabilities for a multi-outcome market.

    Removes the overround proportionally across all outcomes.

    Args:
        prices: List of prices for each outcome (should sum to > 1.0)

    Returns:
        List of probabilities that sum to 1.0
    """

Test with a three-outcome market: prices = [0.40, 0.35, 0.30] (total = 1.05, so 5% overround).
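Removing the overround proportionally reduces to normalizing the prices by their sum; a minimal sketch:

```python
def implied_probabilities_multi(prices: list[float]) -> list[float]:
    """Remove the overround proportionally across all outcomes."""
    total = sum(prices)
    if total <= 0:
        raise ValueError("prices must sum to a positive number")
    # Dividing each price by the total scales the 5% overround away
    # while preserving the relative odds between outcomes.
    return [price / total for price in prices]
```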

Exercise 12: Kelly Criterion Simulation

Write a simulation that demonstrates the Kelly criterion in action:

  1. Start with a bankroll of $1,000
  2. Simulate 500 binary markets where your true probability estimate is 0.60
  3. The market price is 0.50 (so you have edge)
  4. In one simulation, bet the full Kelly amount each time
  5. In another, bet half-Kelly
  6. In a third, bet a fixed 10% of bankroll each time
  7. Plot all three bankroll trajectories on the same chart
  8. Run 100 simulations of each strategy and plot the distribution of final bankrolls

Exercise 13: Data Model Validation

Extend the Market dataclass to include validation:

  1. yes_price must be between 0 and 1
  2. no_price must be between 0 and 1
  3. volume must be non-negative
  4. close_date must be after created_at (if both are set)
  5. status must be a valid MarketStatus enum value

Use __post_init__ for validation and raise ValueError with descriptive messages for any violation. Write test cases for each validation rule.

Exercise 14: Market Comparison Function

Write a function that compares two markets and returns a summary:

def compare_markets(market_a: Market, market_b: Market) -> dict:
    """
    Compare two markets and return analysis.

    Returns dict with:
        - price_difference: difference in yes prices
        - probability_difference: difference in implied probabilities
        - volume_ratio: ratio of volumes
        - overround_comparison: which market has more vig
        - arbitrage_opportunity: bool, True if prices allow arbitrage
    """

Exercise 15: Brier Score Decomposition

Implement the Murphy decomposition of the Brier score into reliability, resolution, and uncertainty components:

def brier_decomposition(
    predicted: list[float],
    actual: list[int],
    n_bins: int = 10
) -> dict:
    """
    Decompose Brier score into components.

    Returns:
        Dictionary with keys: reliability, resolution, uncertainty, brier_score
        where brier_score = reliability - resolution + uncertainty
    """

Test with both well-calibrated and poorly-calibrated prediction sets.
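A sketch of the binned decomposition. Note the identity brier_score = reliability - resolution + uncertainty holds exactly only when all forecasts inside a bin share the same value; with mixed forecasts per bin, a within-bin variance term appears:

```python
def brier_decomposition(
    predicted: list[float],
    actual: list[int],
    n_bins: int = 10
) -> dict:
    """Murphy decomposition of the Brier score via forecast binning."""
    n = len(predicted)
    base_rate = sum(actual) / n
    uncertainty = base_rate * (1 - base_rate)

    # Group (forecast, outcome) pairs into equal-width forecast bins
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, o in zip(predicted, actual):
        idx = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 to last bin
        bins[idx].append((p, o))

    reliability = resolution = 0.0
    for members in bins:
        if not members:
            continue
        n_k = len(members)
        mean_p = sum(p for p, _ in members) / n_k  # mean forecast in bin
        mean_o = sum(o for _, o in members) / n_k  # outcome rate in bin
        reliability += n_k * (mean_p - mean_o) ** 2
        resolution += n_k * (mean_o - base_rate) ** 2

    reliability /= n
    resolution /= n
    return {
        "reliability": reliability,
        "resolution": resolution,
        "uncertainty": uncertainty,
        "brier_score": reliability - resolution + uncertainty,
    }
```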


Part D: Data Management and Visualization

Exercise 16: Database Migration Script

Write a script that adds a new column to the price_snapshots table called spread (the difference between yes and no prices). The script should:

  1. Check if the column already exists (idempotent operation)
  2. Add the column if it does not exist
  3. Populate it for all existing rows
  4. Verify the migration succeeded

This teaches you how to evolve your database schema over time.

Exercise 17: CSV to SQLite Importer

Write a script that imports CSV files of market data into the SQLite database:

def import_csv_to_db(
    csv_path: str,
    db_path: str,
    table_name: str,
    column_mapping: dict[str, str] | None = None
):
    """
    Import a CSV file into a SQLite database table.

    Args:
        csv_path: Path to the CSV file
        db_path: Path to the SQLite database
        table_name: Target table name
        column_mapping: Optional dict mapping CSV columns to DB columns
    """

Handle common issues: missing values, date parsing, duplicate detection, and type mismatches.

Exercise 18: Custom Visualization — Market Dashboard

Create a function that generates a four-panel dashboard for a single market:

  1. Top left: Price history over time (line chart)
  2. Top right: Volume by day (bar chart)
  3. Bottom left: Return distribution (histogram)
  4. Bottom right: Summary statistics table

The function should accept a DataFrame and market title, and optionally save to a file.

Exercise 19: Probability Heatmap

Create a visualization function that displays a heatmap of probabilities across multiple related markets over time. For example, if there are 5 candidates in an election market, the heatmap should show how each candidate's probability evolves.

def plot_probability_heatmap(
    data: pd.DataFrame,
    title: str = "Market Probability Heatmap",
    figsize: tuple = (14, 6)
) -> plt.Figure:
    """
    data: DataFrame with DatetimeIndex, columns are market/candidate names,
          values are probabilities
    """

Exercise 20: Animated Price Chart

Using matplotlib's FuncAnimation, create a function that generates an animated chart showing market price evolution over time. The animation should:

  1. Start from the beginning of the price history
  2. Progressively reveal the price line
  3. Show the current price and timestamp as text
  4. Optionally save as a GIF or MP4

Part E: Integration Challenges

Exercise 21: End-to-End Data Pipeline

Build a complete data pipeline script that:

  1. Reads API configuration from .env
  2. Connects to a prediction market API (use a mock if no real key)
  3. Fetches the top 10 active markets
  4. Stores market metadata in SQLite
  5. Fetches price history for each market
  6. Stores price snapshots
  7. Generates a summary visualization (one chart showing all 10 price histories)
  8. Logs each step and any errors
  9. Saves a run summary to a JSON file

Exercise 22: Configuration Validation

Write a function that validates the complete project configuration:

def validate_config() -> list[str]:
    """
    Validate the project configuration and return a list of issues.

    Checks:
    - .env file exists and contains required keys
    - configs/settings.yaml is valid YAML with required sections
    - Database is accessible and schema is correct
    - All required directories exist
    - Log directory is writable
    - All Python dependencies are installed with correct versions

    Returns:
        List of issue descriptions (empty list = all good)
    """

Exercise 23: Mock API Server

Using Python's http.server module or Flask, create a simple mock prediction market API server that:

  1. Returns realistic-looking market data at /api/markets
  2. Returns price history at /api/markets/{id}/prices
  3. Simulates rate limiting (429 responses after 10 requests per minute)
  4. Simulates occasional server errors (500 responses, 5% of the time)

Use this mock server to test your API client without hitting real APIs.

Exercise 24: Notebook to Script Converter

Write a Python script that converts a Jupyter notebook (.ipynb) into a clean Python script:

  1. Extract all code cells
  2. Convert Markdown cells to comments
  3. Remove magic commands (%matplotlib inline, etc.)
  4. Add proper imports at the top (deduplicate imports scattered across cells)
  5. Add an if __name__ == "__main__": block
  6. Save as a .py file

Test it with a sample notebook you create.

Exercise 25: Portfolio Tracker

Build a complete portfolio tracking system using the pmtools module:

  1. Define a Portfolio class that tracks positions across multiple markets
  2. Implement methods:
       • add_trade(market_id, side, quantity, price) — record a trade
       • current_value(current_prices: dict) — calculate total portfolio value
       • pnl_report() — generate profit/loss report
       • risk_summary() — calculate portfolio risk metrics (total exposure, largest position, diversification score)
  3. Store all data in SQLite
  4. Generate a dashboard visualization showing:
       • Position sizes across markets
       • Unrealized P&L by market
       • Portfolio value over time

Submission Guidelines

For each exercise:

  1. Create a separate Python file named exercise_XX.py (e.g., exercise_01.py)
  2. Include docstrings explaining your approach
  3. Include test cases that demonstrate your solution works
  4. Handle edge cases and errors gracefully
  5. Follow the code style established in this chapter (type hints, logging, docstrings)

For exercises that require running against live APIs, include instructions for using mock data as a fallback.