Chapter 6 Quiz

Test your understanding of Python environment setup, API interaction patterns, and software engineering best practices for prediction market work. Each question has one best answer unless stated otherwise.

Question 1

What is the primary purpose of a Python virtual environment?

A) To make Python run faster by optimizing the interpreter
B) To isolate project dependencies from the system Python and other projects
C) To encrypt your Python source code for security
D) To allow running multiple Python scripts simultaneously

Question 2

You run pip freeze > requirements-lock.txt in your virtual environment. What does this command produce?

A) A file listing only the packages you explicitly installed
B) A file listing every installed package with exact versions, including dependencies of dependencies
C) A file listing packages that need to be updated
D) A file listing packages that are incompatible with each other

Question 3

Which of the following is the correct way to specify a compatible version range in requirements.txt that allows minor updates but prevents major version changes?

A) numpy==1.24.0
B) numpy~=1.24.0
C) numpy>=1.24.0,<2.0.0
D) Both B and C are acceptable approaches

Question 4

In the PredictionMarketClient base class, what is the purpose of the _throttle() method?

A) To cancel requests that are taking too long
B) To compress request data for faster transmission
C) To enforce a minimum time interval between consecutive API requests
D) To prioritize certain requests over others

Question 5

When implementing exponential backoff with jitter, why is the jitter (random delay component) important?

A) It makes the code more unpredictable, which helps with security
B) It prevents multiple clients from retrying at exactly the same time after a rate limit
C) It introduces randomness that helps the API server load-balance requests
D) It is not actually important; it is included only for theoretical completeness

Question 6

What HTTP status code indicates that you have exceeded the API's rate limit?

A) 401 Unauthorized
B) 403 Forbidden
C) 429 Too Many Requests
D) 503 Service Unavailable

Question 7

In our API client design, why do we catch ConnectionError and Timeout exceptions separately rather than using a single broad except Exception clause?

A) Broad exception handlers are syntactically invalid in Python
B) Different exception types require different handling strategies, and broad handlers can mask unexpected bugs
C) Python does not support catching multiple exception types
D) The requests library only raises ConnectionError and Timeout, never other exceptions

Question 8

What is the advantage of using Python dataclasses (as in data_models.py) over plain dictionaries for representing markets and orders?

A) Dataclasses are faster to create and access than dictionaries
B) Dataclasses provide type safety, auto-generated methods, IDE support, and serve as documentation of the data structure
C) Dictionaries cannot store the same types of data as dataclasses
D) Dataclasses automatically validate data types at runtime

Question 9

In the MarketDatabase class, what does PRAGMA journal_mode=WAL do?

A) Enables Write-Ahead Logging for better concurrent read/write access
B) Activates a Web Application Layer for remote database access
C) Turns on Warning And Logging for debugging database queries
D) Enables Write Access Locking to prevent data corruption

Question 10

Why does the chapter recommend Parquet over CSV for internal data storage?

A) Parquet files are human-readable, while CSV files are not
B) Parquet is smaller (compressed), faster to read, and preserves data types; CSV loses type information
C) CSV is a proprietary format that requires a license, while Parquet is open source
D) Parquet supports real-time data streaming, while CSV is only for batch processing

Question 11

Consider this logging configuration:

console_handler.setLevel(logging.INFO)
file_handler.setLevel(logging.DEBUG)

What is the practical effect of these two different levels?

A) The console shows only INFO and above, while the log file captures everything including DEBUG messages
B) The console shows only DEBUG messages, while the file shows only INFO messages
C) Both handlers show the same messages; the levels are redundant
D) The file handler is disabled because DEBUG is lower priority than INFO

Question 12

What is wrong with this code for loading API keys?

# config.py
POLYMARKET_KEY = "pk_live_abc123def456"
KALSHI_KEY = "kalshi_prod_xyz789"

A) The variable names should be lowercase
B) The keys should be stored in a separate .env file and loaded at runtime, never hardcoded in source files
C) Python strings cannot store API keys
D) The keys should be encrypted with base64 encoding

Question 13

In the .gitignore file, what does the pattern *.py[cod] match?

A) Only files ending in .pyc
B) Files ending in .pyc, .pyo, or .pyd (Python compiled files)
C) All Python files
D) Files named pycod with any extension

Question 14

When should you use a Jupyter notebook instead of a Python script, according to the chapter?

A) Always: notebooks are superior to scripts in every way
B) For exploratory data analysis, visualization development, and prototyping
C) Only when you need to share code with non-programmers
D) For production trading logic and scheduled data collection

Question 15

What does the @abstractmethod decorator do in the PredictionMarketClient base class?

A) It marks the method as deprecated and scheduled for removal
B) It prevents the method from being called directly
C) It forces subclasses to implement the method; any class that leaves it unimplemented, including the base class itself, cannot be instantiated
D) It makes the method run in a separate thread for performance

Question 16

The Brier score for a set of predictions is calculated as the mean of (prediction - outcome)^2. A forecaster achieves a Brier score of 0.18. What does this tell you?

A) The forecaster is perfectly calibrated
B) The forecaster's predictions have moderate accuracy; lower would be better, and 0 would be perfect
C) The forecaster is performing worse than random chance
D) The Brier score of 0.18 is meaningless without knowing the base rate of outcomes

Question 17

In the kelly_fraction function, what does the fractional parameter control?

A) The fraction of your total wealth to use as bankroll
B) The fraction of the full Kelly bet to actually wager, as a risk management measure
C) The fractional probability used in the calculation
D) The minimum bet size as a fraction of the market price

Question 18

What is the purpose of creating indexes on the price_snapshots table?

CREATE INDEX idx_snapshots_market_time ON price_snapshots(market_id, timestamp);

A) To prevent duplicate entries
B) To encrypt the data for security
C) To speed up queries that filter by market_id, or by market_id and timestamp together
D) To reduce the size of the database file

Question 19

You are building a data collection script that runs every 15 minutes via a cron job. Which combination of features from this chapter is most important for this script?

A) Jupyter notebook integration and seaborn visualization
B) Error handling with retry logic, logging, database storage, and configuration management
C) Type hints and dataclass validation
D) Git branching and virtual environment management

Question 20

You have the following market data:

Market A: Yes price = 0.55, No price = 0.50
Market B: Yes price = 0.62, No price = 0.42

Calculate the overround for each market and identify which has higher vig.

A) Market A overround = 5%, Market B overround = 4%. Market A has higher vig.
B) Market A overround = 5%, Market B overround = 4%. Market B has higher vig.
C) Market A overround = 0.5%, Market B overround = 0.4%. Market A has higher vig.
D) Market A overround = 10%, Market B overround = 8%. Market A has higher vig.

Answer Key

1. B — Virtual environments isolate dependencies per project, preventing version conflicts between projects and the system Python.

2. B — pip freeze outputs every installed package with exact version pins, including transitive dependencies (dependencies of your dependencies).

3. C — Only >=1.24.0,<2.0.0 meets the stated requirement. The compatible-release operator ~=1.24.0 expands to >=1.24.0, ==1.24.*, so it permits patch releases (1.24.x) but blocks the minor update to 1.25.0. To get "any 1.x at or above 1.24" with ~=, drop the last component: ~=1.24 is equivalent to >=1.24, ==1.*, the same range as option C.
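
A minimal sketch of the compatible-release rule, handling plain numeric versions only (no pre-release or local tags). The helper names are hypothetical; in real projects the packaging library's SpecifierSet implements the full PEP 440 rules.

```python
from __future__ import annotations


def parse_version(text: str) -> tuple[int, ...]:
    """Split a plain numeric version such as '1.24.0' into (1, 24, 0)."""
    return tuple(int(part) for part in text.split("."))


def satisfies_compatible_release(spec: str, candidate: str) -> bool:
    """True if `candidate` satisfies `~=spec` under PEP 440's compatible-release rule.

    ~=X.Y.Z means >=X.Y.Z and ==X.Y.*; ~=X.Y means >=X.Y and ==X.*.
    """
    base = parse_version(spec)
    if len(base) < 2:
        raise ValueError("~= requires at least two version components")
    cand = parse_version(candidate)
    prefix = base[:-1]                          # every component except the last must match
    if cand[:len(prefix)] != prefix:
        return False
    width = max(len(base), len(cand))           # pad with zeros so 1.24 == 1.24.0
    return cand + (0,) * (width - len(cand)) >= base + (0,) * (width - len(base))


print(satisfies_compatible_release("1.24.0", "1.24.3"))  # True: patch update
print(satisfies_compatible_release("1.24.0", "1.25.0"))  # False: minor update blocked
print(satisfies_compatible_release("1.24", "1.26.4"))    # True: minor update allowed
print(satisfies_compatible_release("1.24", "2.0.0"))     # False: major update blocked
```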

4. C — _throttle() ensures a minimum time gap between requests to avoid exceeding the API's rate limit. It checks the time since the last request and sleeps if necessary.
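
A sketch of how a _throttle() method like this might work, assuming a fixed min_interval in seconds. ThrottledClient is a hypothetical stand-in, not the chapter's PredictionMarketClient.

```python
from __future__ import annotations

import time


class ThrottledClient:
    """Hypothetical client showing only the throttling logic."""

    def __init__(self, min_interval: float = 0.5) -> None:
        self.min_interval = min_interval          # seconds required between requests
        self._last_request: float | None = None   # monotonic time of the previous request

    def _throttle(self) -> None:
        """Sleep just long enough to keep requests min_interval seconds apart."""
        if self._last_request is not None:
            elapsed = time.monotonic() - self._last_request
            if elapsed < self.min_interval:
                time.sleep(self.min_interval - elapsed)
        self._last_request = time.monotonic()
```

Each request method would call self._throttle() immediately before sending. time.monotonic() is used instead of time.time() so that system clock adjustments cannot produce negative or inflated gaps.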

5. B — Jitter prevents the "thundering herd" problem. Without jitter, clients that all hit the rate limit simultaneously would all retry at the same time, causing another rate limit event.
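
The sketch below uses the "full jitter" variant, where each delay is drawn uniformly between zero and the exponential ceiling. The chapter's client may combine backoff and jitter differently; the function name and the base and cap defaults are assumptions.

```python
from __future__ import annotations

import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                  rng: random.Random | None = None) -> float:
    """Full-jitter backoff: a random delay in [0, min(cap, base * 2**attempt)] seconds."""
    if rng is None:
        rng = random.Random()
    ceiling = min(cap, base * 2 ** attempt)   # exponential growth, capped
    return rng.uniform(0.0, ceiling)           # jitter spreads clients across the window


# Five simulated clients hitting a rate limit on the same attempt (index 3)
delays = [backoff_delay(3, rng=random.Random(seed)) for seed in range(5)]
print([round(d, 2) for d in delays])
```

Seeding each simulated client differently shows the retries spreading across the window from 0 to 8 seconds instead of all landing on the same instant.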

6. C — HTTP 429 (Too Many Requests) is the standard status code for rate limiting. 401 signals missing or invalid authentication, 403 means the request was understood but is not permitted, and 503 means the server is temporarily unavailable (overloaded or under maintenance); some APIs also return 503 when throttling.
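
Rate-limited responses often carry a Retry-After header, which HTTP allows as either a number of seconds or an HTTP date. This hypothetical helper honours the seconds form and otherwise falls back to a fixed default. It assumes headers arrive as a plain dict with the exact key "Retry-After" (requests exposes response headers case-insensitively, which avoids that concern).

```python
from __future__ import annotations


def retry_wait_seconds(status_code: int, headers: dict[str, str],
                       fallback: float = 5.0) -> float | None:
    """Seconds to wait before retrying, or None if the status is not worth retrying."""
    if status_code not in (429, 503):
        return None                      # e.g. 401/403: retrying will not help
    retry_after = headers.get("Retry-After")
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after)        # delay-seconds form, e.g. "Retry-After: 12"
    return fallback                      # header absent or in HTTP-date form
```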

7. B — Different exceptions call for different responses: a ConnectionError means the server could not be reached (wait, then retry), while a Timeout means the server was reachable but did not answer in time (retry, possibly with a longer timeout or a smaller request). A broad except Exception would also swallow programming errors such as TypeError or KeyError, hiding bugs.
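
A sketch of separate handling paths, using Python's built-in TimeoutError and ConnectionError as stand-ins for requests.exceptions.Timeout and requests.exceptions.ConnectionError. With requests, note that ConnectTimeout subclasses both, so the order of the except clauses decides which branch handles it. The fetch callable, logger name, and retry parameters are hypothetical.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("collector")  # hypothetical logger name


def call_with_retries(fetch: Callable[[float], T], timeout: float = 5.0,
                      max_attempts: int = 3, pause: float = 1.0) -> T:
    """Call fetch(timeout), retrying only on network failures."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(timeout)
        except TimeoutError:
            # Server reachable but slow: retry immediately with a longer timeout
            logger.warning("timed out after %.1fs (attempt %d)", timeout, attempt)
            timeout *= 2
        except ConnectionError:
            # Server unreachable: pause before the next attempt
            logger.warning("connection failed (attempt %d)", attempt)
            time.sleep(pause)
        # Anything else (TypeError, KeyError, ...) propagates: it signals a bug, not an outage
    raise RuntimeError(f"gave up after {max_attempts} attempts")
```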

8. B — Dataclasses provide type annotations (documentation and IDE autocomplete), automatically generate __init__, __repr__, and __eq__ methods, and make the data structure explicit. They do not enforce types at runtime without additional validation.
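
A small illustration of what @dataclass generates, using a hypothetical Market record (the field names are assumptions, not the contents of data_models.py). The last two lines show that the annotations are not enforced at runtime.

```python
from dataclasses import dataclass


@dataclass
class Market:
    """Hypothetical market record; field names are illustrative."""
    market_id: str
    question: str
    yes_price: float
    volume: float = 0.0


a = Market("m1", "Will it rain?", 0.55)
b = Market("m1", "Will it rain?", 0.55)
print(a)        # Market(market_id='m1', question='Will it rain?', yes_price=0.55, volume=0.0)
print(a == b)   # True: generated __eq__ compares field by field
c = Market("m2", "Typo in price", "0.55")   # accepted: annotations are not checked at runtime
print(type(c.yes_price).__name__)           # str
```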

9. A — WAL (Write-Ahead Logging) is a SQLite journaling mode that allows concurrent reads while a write is in progress, improving performance for our use case of writing price snapshots while reading data for analysis.
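
The demo below enables WAL on a throwaway database file, leaves a write transaction open, and shows a second connection still reading the last committed state without blocking. WAL needs a file-backed database (an in-memory one reports "memory"); the table and function names are illustrative.

```python
from __future__ import annotations

import os
import sqlite3
import tempfile


def wal_demo() -> tuple[str, int, int]:
    """Enable WAL, then read from a second connection while a write is uncommitted."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "markets.db")
        writer = sqlite3.connect(path)
        mode = writer.execute("PRAGMA journal_mode=WAL").fetchall()[0][0]
        writer.execute("CREATE TABLE snapshots (yes_price REAL)")
        writer.execute("INSERT INTO snapshots VALUES (0.55)")
        writer.commit()

        writer.execute("INSERT INTO snapshots VALUES (0.57)")   # transaction left open
        reader = sqlite3.connect(path)
        during = reader.execute("SELECT COUNT(*) FROM snapshots").fetchall()[0][0]
        writer.commit()
        after = reader.execute("SELECT COUNT(*) FROM snapshots").fetchall()[0][0]

        reader.close()
        writer.close()
    return mode, during, after


print(wal_demo())   # ('wal', 1, 2): the reader saw the committed row, not the pending one
```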

10. B — Parquet is a columnar binary format that is compressed (smaller files), preserves column types (no need to specify parse_dates when reading), and is significantly faster to read for large datasets. CSV is text-based and loses type information.
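
Writing Parquet requires a third-party engine (pandas, for instance, uses pyarrow or fastparquet), so this standard-library sketch shows only the CSV half of the argument: a datetime and a float go in, and strings come back out. The column names are illustrative.

```python
import csv
import io
from datetime import datetime

rows = [{"market_id": "m1", "timestamp": datetime(2024, 1, 1, 12, 0), "yes_price": 0.55}]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["market_id", "timestamp", "yes_price"])
writer.writeheader()
writer.writerows(rows)          # values are converted to text with str()

buffer.seek(0)
restored = next(csv.DictReader(buffer))
print(type(restored["timestamp"]).__name__, type(restored["yes_price"]).__name__)  # str str
```

Every reader of the CSV has to know to re-parse these columns, which is what parse_dates and explicit dtypes are for; Parquet stores the column types alongside the data.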

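11. A — Each handler filters independently by its own level. The console handler passes only messages at INFO and above (INFO, WARNING, ERROR, CRITICAL), while the file handler passes everything at DEBUG and above, providing a complete record for troubleshooting. This assumes the logger itself is set to DEBUG: the logger's level is checked before any handler sees a record, so a logger left at its default effective level of WARNING would send no DEBUG or INFO messages to either handler.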
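
To check this behaviour, the sketch attaches both handlers to one logger, writing to in-memory StringIO streams as stand-ins for the terminal and a log file. The logger name and messages are made up.

```python
import io
import logging

logger = logging.getLogger("market_collector")
logger.setLevel(logging.DEBUG)   # the logger's own level is checked before any handler
logger.propagate = False         # keep this example's records away from the root logger

console_stream, file_stream = io.StringIO(), io.StringIO()  # stand-ins for stderr and a file
console_handler = logging.StreamHandler(console_stream)
console_handler.setLevel(logging.INFO)
file_handler = logging.StreamHandler(file_stream)
file_handler.setLevel(logging.DEBUG)

for handler in (console_handler, file_handler):
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)

logger.debug("raw response: 200 bytes")
logger.info("fetched 50 markets")

print(repr(console_stream.getvalue()))  # 'INFO fetched 50 markets\n'
print(repr(file_stream.getvalue()))     # 'DEBUG raw response: 200 bytes\nINFO fetched 50 markets\n'
```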

12. B — Hardcoding secrets in source files means they will be committed to version control and potentially exposed. The correct approach is to store them in a .env file (excluded by .gitignore) and load them with python-dotenv at runtime.
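
A minimal sketch of reading keys at runtime, assuming they are already in the process environment (python-dotenv's load_dotenv() copies entries from a .env file into os.environ). The require_env helper is hypothetical.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable, failing loudly if it is missing or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to .env or export it in the shell")
    return value
```

At startup a collector might call load_dotenv() and then require_env("POLYMARKET_KEY"), so a missing key stops the script immediately instead of surfacing later as a confusing 401.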

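13. B — The pattern *.py[cod] uses character-class syntax: [cod] matches a single character that is either c, o, or d. So it matches .pyc files (compiled bytecode), .pyo files (optimized bytecode from older Python versions), and .pyd files (compiled extension modules on Windows), none of which belong in version control.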
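
Python's fnmatch module uses the same [...] character-class syntax, so it offers a quick way to check which names the pattern catches. Git's own matcher has extra rules (for example around slashes and directories), but for a simple filename pattern like this one the result is the same. The file names are made up.

```python
from fnmatch import fnmatch

names = ["cache.pyc", "opt.pyo", "ext.pyd", "script.py", "notes.pyx"]
matches = [name for name in names if fnmatch(name, "*.py[cod]")]
print(matches)  # ['cache.pyc', 'opt.pyo', 'ext.pyd']
```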

14. B — Notebooks are ideal for exploratory work, visualization development, and prototyping where you need to see results immediately. Production code (automated pipelines, trading logic) should be in scripts with proper error handling and testing.

15. C — @abstractmethod makes the class abstract: you cannot create an instance of PredictionMarketClient directly. You must create a subclass that implements all abstract methods. This enforces the contract that every platform client must define its own _setup_session, _parse_markets, and _markets_endpoint.
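
A trimmed-down illustration of the contract. The three method names come from the answer above, but their signatures and the two subclasses are hypothetical; the real base class also carries shared logic such as _throttle().

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class PredictionMarketClient(ABC):
    """Trimmed-down stand-in; the method signatures are assumptions."""

    @abstractmethod
    def _setup_session(self) -> None: ...

    @abstractmethod
    def _markets_endpoint(self) -> str: ...

    @abstractmethod
    def _parse_markets(self, payload: dict) -> list[dict]: ...


class IncompleteClient(PredictionMarketClient):
    """Implements only one of the three abstract methods."""

    def _setup_session(self) -> None:
        self.headers = {"Accept": "application/json"}


class ExampleClient(PredictionMarketClient):
    """Hypothetical platform client that fulfils the whole contract."""

    def _setup_session(self) -> None:
        self.headers = {"Accept": "application/json"}

    def _markets_endpoint(self) -> str:
        return "/markets"                       # hypothetical path

    def _parse_markets(self, payload: dict) -> list[dict]:
        return payload.get("markets", [])


for cls in (PredictionMarketClient, IncompleteClient, ExampleClient):
    try:
        cls()
        print(cls.__name__, "instantiated")
    except TypeError as exc:
        print(cls.__name__, "rejected:", exc)
```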

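16. B — A Brier score of 0.18 indicates moderate accuracy. For binary outcomes, with the definition above, the score ranges from 0 (perfect) to 1 (every prediction fully confident and wrong). Always predicting 50% scores exactly 0.25 regardless of the outcomes, so 0.18 beats that uninformed baseline, which rules out C. Context such as the base rate still helps interpretation: on a heavily imbalanced set of events, always forecasting the base rate can score well below 0.25, so 0.18 is decent but not automatically impressive.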
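
A direct translation of the formula into Python, with three reference forecasters that anchor the scale: always 50%, always right with full confidence, and always wrong with full confidence. The outcome list is made up.

```python
from __future__ import annotations


def brier_score(predictions: list[float], outcomes: list[int]) -> float:
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if not predictions or len(predictions) != len(outcomes):
        raise ValueError("predictions and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(predictions, outcomes)) / len(predictions)


outcomes = [1, 0, 1, 1, 0]
print(brier_score([0.5] * 5, outcomes))                      # 0.25: always say 50%
print(brier_score([float(o) for o in outcomes], outcomes))   # 0.0: always right, fully confident
print(brier_score([1.0 - o for o in outcomes], outcomes))    # 1.0: always wrong, fully confident
```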

17. B — The fractional parameter implements "fractional Kelly", a common risk-management technique in which you bet a fraction (often 0.5, "half Kelly") of the theoretically optimal Kelly amount. This reduces the variance of returns at the cost of a lower expected growth rate; under the usual continuous approximation, half Kelly retains about three-quarters of the full-Kelly growth rate.
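
One common form of Kelly for a binary contract, assuming you buy YES at price p for a payout of 1 and believe the true probability is q: the full-Kelly stake is (q - p) / (1 - p) of bankroll. The argument names and order here are assumptions and may not match the chapter's kelly_fraction.

```python
def kelly_fraction(prob: float, price: float, fractional: float = 1.0) -> float:
    """Fraction of bankroll to stake on a YES contract costing `price` that pays 1.

    prob is your own estimate of P(YES). Full Kelly is (prob - price) / (1 - price);
    `fractional` scales it down (0.5 = half Kelly). Returns 0.0 when there is no edge.
    """
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    full_kelly = (prob - price) / (1.0 - price)
    return max(0.0, full_kelly) * fractional


print(round(kelly_fraction(0.60, 0.50), 4))                   # 0.2: full Kelly
print(round(kelly_fraction(0.60, 0.50, fractional=0.5), 4))   # 0.1: half Kelly
```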

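18. C — Indexes create a sorted data structure (typically a B-tree) that lets the database engine locate matching rows without scanning the entire table. On a large table, a query like WHERE market_id = 'X' AND timestamp > '2024-01-01' can run orders of magnitude faster with this index. Because the index is composite, column order matters: it serves filters on market_id alone or on market_id plus timestamp, but a filter on timestamp alone generally cannot use it efficiently.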
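
SQLite's EXPLAIN QUERY PLAN shows whether a query uses the index. This sketch builds an empty in-memory table with assumed columns; the first query should report a search using idx_snapshots_market_time, while the timestamp-only query typically falls back to a full scan. The exact wording of the plan varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE price_snapshots (market_id TEXT, timestamp TEXT, yes_price REAL)")
conn.execute(
    "CREATE INDEX idx_snapshots_market_time ON price_snapshots(market_id, timestamp)"
)


def plan(sql: str) -> str:
    """Return SQLite's query-plan text; the last column of each row is the description."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


print(plan("SELECT * FROM price_snapshots "
           "WHERE market_id = 'X' AND timestamp > '2024-01-01'"))
print(plan("SELECT * FROM price_snapshots WHERE timestamp > '2024-01-01'"))
```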

19. B — A scheduled data collection script needs: error handling with retries (APIs fail), logging (to diagnose issues when you are not watching), database storage (to persist collected data), and configuration management (API keys, schedule parameters). Visualization and notebooks are for interactive analysis.

20. A — Overround = yes_price + no_price - 1.0. Market A: 0.55 + 0.50 - 1.0 = 0.05 (5%). Market B: 0.62 + 0.42 - 1.0 = 0.04 (4%). Market A has higher vig, meaning the market maker takes a larger cut.
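
The same arithmetic as a small function; overround is simply a label for yes_price + no_price - 1.0.

```python
def overround(yes_price: float, no_price: float) -> float:
    """Amount by which the YES and NO prices together exceed 1.0."""
    return yes_price + no_price - 1.0


for name, yes, no in [("A", 0.55, 0.50), ("B", 0.62, 0.42)]:
    print(f"Market {name}: overround = {overround(yes, no):.1%}")
# Market A: overround = 5.0%
# Market B: overround = 4.0%
```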