Chapter 41: Exercises
Foundations and Concepts
Exercise 1: LLM Calibration Analysis
You have the following LLM probability forecasts and outcomes for 10 questions:
| Question | LLM Forecast | Outcome |
|---|---|---|
| Q1 | 0.85 | 1 |
| Q2 | 0.30 | 0 |
| Q3 | 0.70 | 1 |
| Q4 | 0.55 | 0 |
| Q5 | 0.90 | 1 |
| Q6 | 0.20 | 1 |
| Q7 | 0.60 | 1 |
| Q8 | 0.75 | 0 |
| Q9 | 0.40 | 0 |
| Q10 | 0.95 | 1 |
(a) Compute the Brier score for this set of forecasts. (b) Divide forecasts into 5 bins ([0, 0.2], (0.2, 0.4], (0.4, 0.6], (0.6, 0.8], (0.8, 1.0]) and compute the expected calibration error (ECE). (c) Is this forecaster overconfident or underconfident? Justify your answer.
Exercise 2: Prompt Strategy Comparison
Design three different prompts (base rate, adversarial, decomposition) for the following prediction question: "Will the James Webb Space Telescope discover evidence of biosignatures in an exoplanet atmosphere before 2030?" Write out the full prompts and explain which cognitive biases each strategy is designed to mitigate.
Exercise 3: Human-AI Forecast Aggregation
A human forecaster estimates $P(\text{event}) = 0.60$ and an AI forecaster estimates $P(\text{event}) = 0.80$. The market price is 0.65.
(a) Compute the extremized combined forecast with $\alpha = 0.5$. (b) Compute the Kelly optimal bet size for the combined forecast against the market price, assuming a bankroll of $1000 and the market uses a standard binary contract paying $1. (c) How would you determine the optimal $\alpha$ in practice?
Exercise 4: Reinforcement Learning MDP Formulation
Formalize the market-making problem for a binary prediction market as an MDP. Specify: (a) The state space (what variables matter?) (b) The action space (what can the market maker do?) (c) The transition dynamics (how does the state evolve?) (d) The reward function (what are we optimizing, and what constraints matter?) (e) Why is this problem harder than standard market making in financial markets?
Exercise 5: Zero-Knowledge Proof Concepts
A trader wants to prove they have a sufficient balance to execute a trade without revealing their actual balance.
(a) Formalize this as a zero-knowledge proof: what is the statement, the witness, and the relation? (b) Explain why a simple "show the balance, then forget it" approach does not satisfy zero-knowledge. (c) Describe at a high level how a Pedersen commitment scheme could be used. (d) What is the difference between computational and statistical zero-knowledge? Which is more practical?
Mathematical Analysis
Exercise 6: Differential Privacy for Price Publication
A prediction market with LMSR (liquidity parameter $b = 100$) wants to publish the current price with $\epsilon = 0.1$ differential privacy.
(a) What is the sensitivity of the price to a single trader's action (assuming maximum trade size is 10 shares)? (b) What is the standard deviation of the Laplace noise that must be added? (c) If the true price is 0.65, what is the probability that the published price is between 0.55 and 0.75? (d) How does increasing $b$ affect the privacy-utility tradeoff?
Exercise 7: Shamir Secret Sharing
Using Shamir's (3, 5) secret sharing scheme with prime modulus $p = 97$:
(a) Share the secret $s = 42$ by choosing random coefficients $a_1 = 15, a_2 = 7$. (b) Compute the shares for parties $1, 2, 3, 4, 5$. (c) Using shares from parties 1, 3, and 5, reconstruct the secret using Lagrange interpolation. (d) Show that shares from only parties 1 and 2 do not uniquely determine the secret.
Exercise 8: Bayesian Truth Serum Scoring
In a BTS implementation with 5 agents, each reporting on a binary question:
| Agent | Report | Prediction of "Yes" frequency |
|---|---|---|
| A1 | Yes | 0.60 |
| A2 | Yes | 0.70 |
| A3 | No | 0.40 |
| A4 | Yes | 0.50 |
| A5 | No | 0.30 |
(a) Compute the empirical frequency of "Yes" (excluding each agent's own report for their score). (b) Compute the geometric mean prediction for "Yes" (excluding each agent). (c) Compute the information score for each agent. (d) Which agents received the highest information scores, and why?
Exercise 9: Neural AMM Properties
Consider a neural AMM where the cost function is parameterized as $C_\theta(\mathbf{q}) = \text{NN}_\theta(\mathbf{q})$.
(a) What mathematical properties must $C_\theta$ satisfy for the AMM to produce valid prices? (b) How would you enforce the constraint $\sum_i \partial C_\theta / \partial q_i = 1$ during training? (c) What is the relationship between the convexity of $C_\theta$ and the market maker's worst-case loss? (d) Compare the expressiveness of a neural AMM to the LMSR. Can the neural AMM learn the LMSR as a special case?
Exercise 10: Causal vs. Observational Markets
Consider two markets: - Market A: "What will unemployment be in 12 months?" (observational) - Market B: "If the minimum wage increases by 20%, what will unemployment be in 12 months?" (causal)
(a) Formally express each market's target using the potential outcomes framework. (b) Explain why the prices in these two markets should differ even if the minimum wage increase is expected. (c) How would you design a settlement mechanism for Market B? (d) What manipulation concerns arise in Market B that do not arise in Market A?
Implementation
Exercise 11: LLM Forecasting Pipeline
Implement a Python function that takes a question string and returns structured LLM forecasts using three prompting strategies (base rate, adversarial, decomposition). The function should: - Generate the three prompts - Simulate LLM responses (since we cannot call the API in an exercise) - Extract probability estimates from each response - Return the individual estimates and their geometric mean
Exercise 12: Calibration Visualization
Write a Python function that takes arrays of forecasts and outcomes and produces: (a) A calibration plot (predicted probability vs. actual frequency) (b) A reliability diagram with confidence intervals (using binomial confidence intervals) (c) The Brier score decomposition (reliability, resolution, uncertainty)
Exercise 13: Pedersen Commitment Implementation
Implement a simplified Pedersen commitment scheme in Python (using modular arithmetic with a large prime, not elliptic curves): (a) Key generation: Choose generators $g$ and $h$ for a group $\mathbb{Z}_p^*$. (b) Commit: Given a value $v$ and randomness $r$, compute $C = g^v h^r \mod p$. (c) Open: Given $v, r, C$, verify that $C = g^v h^r \mod p$. (d) Demonstrate the hiding and binding properties.
Exercise 14: Differential Privacy Mechanism
Implement a differentially private mechanism for publishing prediction market volume statistics: (a) Implement the Laplace mechanism with configurable $\epsilon$. (b) Implement a privacy budget tracker using basic and advanced composition. (c) Simulate 1000 trading periods, publishing DP volume after each period. (d) Plot the privacy budget consumption over time under both composition theorems.
Exercise 15: Peer Prediction Simulator
Implement a simulation of the Bayesian Truth Serum mechanism: (a) Generate a population of agents with known types (honest, lazy, strategic). (b) Simulate signal observations from a common prior. (c) Compute BTS scores for each agent. (d) Verify that honest reporting yields higher expected scores than lazy or strategic reporting.
Advanced Problems
Exercise 16: Adversarial Robustness of LLM Forecasts
Design an experiment to test the adversarial robustness of LLM forecasters: (a) Define a set of 10 "adversarial prompts" — questions designed to exploit known LLM biases. (b) For each adversarial prompt, define the "correct" probability and the expected LLM response. (c) Propose a defense mechanism (e.g., adversarial training, prompt filtering). (d) Analyze the tradeoff between robustness and forecasting accuracy.
Exercise 17: Privacy-Utility Frontier
Derive the privacy-utility frontier for a simple prediction market model: - There are $n$ traders, each with a private signal $s_i \in \{0, 1\}$. - The market publishes an aggregate statistic $\hat{p} = \frac{1}{n}\sum_i s_i + \text{noise}$. - Utility is defined as $U = -(\hat{p} - p^*)^2$ where $p^* = \frac{1}{n}\sum_i s_i$ is the true proportion.
(a) Compute the utility as a function of $\epsilon$ for the Laplace mechanism. (b) Plot the privacy-utility frontier for $n \in \{10, 100, 1000\}$. (c) At what value of $n$ is $\epsilon = 1$ differential privacy "practically free" (utility loss < 1%)?
Exercise 18: Automated AMM Design
Design and implement a simple automated mechanism design pipeline: (a) Define a parameterized family of cost functions: $C_\alpha(\mathbf{q}) = \alpha \log(\sum_i e^{q_i/\alpha})$ (this is the LMSR with parameter $\alpha$). (b) Define an objective function that trades off information aggregation accuracy against market maker loss. (c) Implement gradient-based optimization to find the optimal $\alpha$ for a simulated trading environment. (d) Extend the family to include a second parameter and re-optimize. Does the extra parameter help?
Exercise 19: Decision Market Simulation
Simulate a decision market with the following setup: - There are two possible policies, $A$ and $B$. - Markets predict the outcome $Y \in [0, 1]$ under each policy. - 10 traders each observe a signal about the effectiveness of each policy. - The decision-maker commits to choosing the policy with the higher market price.
(a) Implement the simulation with honest traders and verify that the market converges to the correct decision. (b) Introduce one strategic trader who prefers policy $A$ regardless of effectiveness. How does this affect the market? (c) Implement a detection mechanism for this kind of manipulation.
Exercise 20: Cross-Chain Arbitrage
Model a prediction market that exists on two chains with different block times: - Chain A: 12-second block time, 0.1% trading fee - Chain B: 0.4-second block time, 0.3% trading fee
(a) If the prices on the two chains diverge by $\delta$, what is the minimum $\delta$ for cross-chain arbitrage to be profitable (accounting for bridge fees of 0.05%)? (b) Simulate price dynamics on both chains with random information shocks and compute arbitrage opportunities. (c) How does bridge latency (assume 10 minutes) affect the profitability and risk of cross-chain arbitrage?
Research and Design
Exercise 21: AI Safety Market Design
Design a prediction market for the question "Will there be an AI system that can autonomously conduct novel scientific research by 2030?" Address: (a) Resolution criteria: What exactly constitutes "autonomous" scientific research? (b) Oracle design: Who resolves the market, and what evidence do they consider? (c) Liquidity provision: How do you attract informed traders for a long-horizon question? (d) Incentive analysis: What are the manipulation risks, and how do you mitigate them?
Exercise 22: Scientific Replication Market
Design a market for predicting the replication of a specific published psychology experiment. Your design should address: (a) Market structure (binary, continuous, combinatorial?) (b) Participant incentives (who trades and why?) (c) Resolution mechanism (what counts as "replication"?) (d) Timeline and liquidity management (e) Ethical concerns (could the market influence the replication attempt?)
Exercise 23: Privacy-Preserving Market Architecture
Design a complete architecture for a privacy-preserving prediction market. Your design should include: (a) User registration and identity management (anonymous but Sybil-resistant) (b) Trade submission (encrypted, with ZK proofs of validity) (c) Price computation (on encrypted data or via MPC) (d) Settlement and payout (private amounts) (e) Identify the main performance bottlenecks and propose optimizations.
Exercise 24: Composing Multiple Research Directions
Consider a system that combines: - LLM forecasting (Section 41.2) - Privacy preservation (Section 41.4) - Peer prediction (Section 41.6)
Design a system where: (a) LLM-generated forecasts are submitted to a peer prediction mechanism. (b) The mechanism aggregates forecasts from both human and LLM participants. (c) All submissions are privacy-preserving. (d) Analyze the incentive properties of your system. Does privacy help or hurt incentive compatibility?
Exercise 25: Open Problem Analysis
Choose one of the open problems from Section 41.11 and: (a) Write a 500-word problem statement that would be suitable for a research grant proposal. (b) Identify the 3-5 most relevant prior works. (c) Propose a concrete research plan (methodology, timeline, expected results). (d) Identify the main obstacles and how you would address them. (e) Explain the practical impact if the problem were solved.