Chapter 6 Key Takeaways
Environment and Setup
-
Always use virtual environments. Whether venv or Conda, isolating dependencies per project prevents version conflicts and ensures reproducibility. A project without a virtual environment is a project waiting to break.
-
Pin your dependencies. Use version ranges in requirements.txt (e.g., numpy>=1.24.0,<2.0.0) for development and exact pins via pip freeze for deployment. Reproducibility is not optional in quantitative work.
-
Python 3.9+ is the minimum. We rely on modern type hinting syntax, dictionary union operators, and library features that require 3.9 or later. Prefer 3.10 or 3.11 for the best experience.
API Interaction
-
Build a base client class. The retry logic, rate limiting, error handling, and logging code is the same regardless of which prediction market platform you connect to. Write it once in a base class and subclass for each platform.
-
Exponential backoff with jitter is essential. When rate-limited, double the wait time on each retry and add a small random delay. This prevents the "thundering herd" problem where all clients retry simultaneously.
-
Handle every error type differently. A 429 (rate limited) means wait and retry. A 500 (server error) means the platform has a problem — retry cautiously. A 400 (bad request) means your code has a bug — do not retry.
-
Never trust API responses. Validate the data you receive. Check for missing fields, unexpected types, and unreasonable values before storing or acting on API data.
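The retry rules above can be sketched in a few lines. This is a minimal illustration, not the chapter's actual client: the names ApiError, backoff_delay, and call_with_retry are hypothetical, the set of retryable status codes is an assumption, and send stands in for whatever function performs one HTTP request and returns a (status, body) pair. In a base client class, the same logic would live in a shared request method that each platform subclass inherits.

```python
import random
import time
from typing import Any, Callable

# Assumption: these status codes are treated as transient and worth retrying.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


class ApiError(Exception):
    """Raised when a request fails and will not be retried (again)."""

    def __init__(self, status: int, detail: str = "") -> None:
        super().__init__(f"HTTP {status}: {detail}")
        self.status = status


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  jitter: float = 0.5,
                  rng: Callable[[], float] = random.random) -> float:
    """Seconds to wait before retry `attempt` (0-based).

    The base delay doubles on each attempt, is capped at `cap`, and gets a
    random extra of up to `jitter` seconds so clients do not retry in lockstep.
    """
    return min(cap, base * 2 ** attempt) + rng() * jitter


def call_with_retry(send: Callable[[], tuple[int, Any]], max_retries: int = 4,
                    sleep: Callable[[float], Any] = time.sleep,
                    rng: Callable[[], float] = random.random) -> Any:
    """Call `send` until it succeeds, fails permanently, or retries run out."""
    for attempt in range(max_retries + 1):
        status, body = send()
        if 200 <= status < 300:
            return body
        # A 4xx other than 429 points at a bug in our request: fail at once.
        if status not in RETRYABLE_STATUSES or attempt == max_retries:
            raise ApiError(status, str(body))
        sleep(backoff_delay(attempt, rng=rng))
    raise AssertionError("unreachable")
```

Passing sleep and rng as parameters keeps the function testable without real waiting; in normal use the defaults (time.sleep and random.random) apply.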
Code Organization
-
Separate concerns into modules. The pmtools package divides functionality into probability.py, data_models.py, visualization.py, database.py, and api_client.py. Each module has a single purpose and can be tested independently.
-
Use dataclasses for data structures. Python dataclasses provide type annotations, auto-generated methods, and IDE support. They make your data structures explicit and self-documenting, unlike plain dictionaries.
-
Notebooks are for exploration; scripts are for production. Use Jupyter to investigate ideas interactively, then extract validated logic into tested Python modules. The notebook-to-module refactoring process is a core skill.
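As an illustration of the dataclass point, here is a small sketch of a data model that also validates what an API hands back. The class name PriceSnapshot, its fields, the payload keys, the Unix-seconds timestamp, and the assumption that prices lie between 0 and 1 are all hypothetical; the actual classes in data_models.py may look different.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class PriceSnapshot:
    """One observed price for one market (hypothetical field layout)."""

    market_id: str
    timestamp: datetime
    price: float

    def __post_init__(self) -> None:
        # Reject unreasonable values as soon as the object is created.
        if not self.market_id:
            raise ValueError("market_id must be non-empty")
        # Assumption: prices are quoted as probabilities in [0, 1].
        if not 0.0 <= self.price <= 1.0:
            raise ValueError(f"price out of range: {self.price}")

    @classmethod
    def from_api(cls, payload: dict) -> "PriceSnapshot":
        """Build a snapshot from a raw API dictionary, checking it first."""
        required = ("market_id", "timestamp", "price")
        missing = [key for key in required if key not in payload]
        if missing:
            raise ValueError(f"missing fields: {missing}")
        price = payload["price"]
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(price, bool) or not isinstance(price, (int, float)):
            raise ValueError(f"price has unexpected type: {type(price).__name__}")
        return cls(
            market_id=str(payload["market_id"]),
            # Assumption: the API reports Unix seconds in UTC.
            timestamp=datetime.fromtimestamp(payload["timestamp"], tz=timezone.utc),
            price=float(price),
        )
```

Compared with a plain dictionary, a missing field or a string where a number belongs fails loudly at the boundary instead of surfacing later in an analysis.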
Data Management
-
SQLite is sufficient for single-user analysis. No server, no configuration, and no separate installation (the sqlite3 module ships with Python): the whole database is a single file. It handles millions of price snapshots efficiently and supports most of standard SQL.
-
Use indexes on columns you query frequently. An index on (market_id, timestamp) in the price snapshots table makes price history lookups orders of magnitude faster.
-
Prefer Parquet over CSV for internal storage. Parquet preserves column types, is compressed, and is typically much faster to read, especially when you need only a few columns. Use CSV only when sharing data with tools that cannot read Parquet.
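To make the indexing advice concrete, the following sketch builds a small price snapshots table with Python's built-in sqlite3 module and asks SQLite how it plans to run a history lookup. The table name, column names, and index name are assumptions for illustration, not necessarily the schema used in database.py.

```python
import sqlite3

# Hypothetical history lookup: one market, from a given time onward.
HISTORY_QUERY = (
    "SELECT timestamp, price FROM price_snapshots "
    "WHERE market_id = ? AND timestamp >= ? ORDER BY timestamp"
)


def create_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure table and index exist."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS price_snapshots ("
        "market_id TEXT NOT NULL, timestamp INTEGER NOT NULL, price REAL NOT NULL)"
    )
    # Composite index matching the WHERE clause of the history lookup.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_snapshots_market_time "
        "ON price_snapshots (market_id, timestamp)"
    )
    return conn


def price_history(conn: sqlite3.Connection, market_id: str, since: int) -> list:
    """Return (timestamp, price) rows for one market, oldest first."""
    return conn.execute(HISTORY_QUERY, (market_id, since)).fetchall()


def query_plan(conn: sqlite3.Connection, market_id: str, since: int) -> list:
    """Return SQLite's plan descriptions for the history lookup."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + HISTORY_QUERY, (market_id, since))
    return [row[-1] for row in rows]  # the last column holds the description
```

Printing the result of query_plan is a quick way to confirm that a lookup searches the index rather than scanning the whole table.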
Security and Configuration
-
Never hardcode secrets. API keys, passwords, and private keys go in .env files loaded at runtime with python-dotenv. These files must be listed in .gitignore. A single leaked API key can result in financial loss.
-
Layer your configuration. Start with defaults in code, then apply config files, then environment variables, then command-line arguments, with each layer overriding the one before it. This hierarchy gives you flexibility without complexity.
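The layering order can be sketched with the standard library alone. Everything named here is an assumption for illustration: the setting names, the PM_ environment prefix, the JSON config file format, and the load_config function itself. Values loaded from a .env file by python-dotenv end up in the process environment, so they would enter at the environment layer.

```python
import argparse
import json
import os
from pathlib import Path
from typing import Optional

# Layer 1: defaults in code (hypothetical settings).
DEFAULTS = {"db_path": "data/markets.db", "log_level": "INFO"}


def load_config(config_file: Optional[str] = None,
                environ: Optional[dict] = None,
                argv: Optional[list] = None) -> dict:
    """Merge settings so that each later layer overrides the earlier ones."""
    config = dict(DEFAULTS)

    # Layer 2: optional JSON config file.
    if config_file and Path(config_file).exists():
        config |= json.loads(Path(config_file).read_text())

    # Layer 3: environment variables such as PM_LOG_LEVEL (assumed prefix).
    env = os.environ if environ is None else environ
    for key in DEFAULTS:
        env_key = f"PM_{key.upper()}"
        if env_key in env:
            config[key] = env[env_key]

    # Layer 4: command-line flags such as --log-level.
    # Pass sys.argv[1:] as argv in a real script; None means "no flags" here.
    parser = argparse.ArgumentParser()
    for key in DEFAULTS:
        parser.add_argument(f"--{key.replace('_', '-')}", dest=key, default=None)
    args = parser.parse_args([] if argv is None else argv)
    config |= {k: v for k, v in vars(args).items() if v is not None}
    return config
```

The in-place dictionary union (|=) used for merging is one of the Python 3.9 features the chapter relies on.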
Monitoring and Quality
-
Use the logging module, not print statements. Logging supports severity levels, file output, rotation, and filtering. Print statements are for interactive exploration only.
-
Test your code with pytest. Unit tests catch bugs early, serve as documentation, and give you confidence to refactor. Test edge cases: empty inputs, single elements, boundary values.
-
Run an environment verification script. Before starting any analysis, verify that all libraries are installed, the database is accessible, and API connections work. Five minutes of checking prevents five hours of debugging.
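A verification script along these lines might look like the sketch below. The function name check_environment and the logger name are hypothetical, and the API connectivity check is left out because it depends on the platform; this version covers only the Python version, importable libraries, and database access, and reports problems through the logging module rather than print.

```python
import importlib.util
import logging
import sqlite3
import sys

logger = logging.getLogger("envcheck")  # hypothetical logger name


def check_environment(required_modules, db_path, min_python=(3, 9)) -> list[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []

    if sys.version_info < min_python:
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ required, found {found}")

    # find_spec returns None for a top-level module that is not installed.
    for name in required_modules:
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing library: {name}")

    # Note: connecting to a path that does not exist yet creates a new file.
    try:
        conn = sqlite3.connect(db_path)
        conn.execute("SELECT 1")
        conn.close()
    except sqlite3.Error as exc:
        problems.append(f"database not accessible: {exc}")

    for problem in problems:
        logger.error(problem)
    return problems
```

A script could call logging.basicConfig(level=logging.INFO) once at startup, run check_environment with its own list of libraries and database path, and exit with a non-zero status if the returned list is not empty.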
Version Control
-
Use Git from day one. Initialize a repository before writing your first line of code. Commit frequently with descriptive messages. Branch for experiments.
-
Know what to exclude. Your .gitignore must exclude .env files, data directories, virtual environments, compiled Python files, and IDE-specific configuration. When in doubt, exclude it.