Chapter 11 Exercises: Information Aggregation Theory


Part A: Conceptual Foundations (Exercises 1-6)

Exercise 1: Hayek's Information Argument

Suppose there are three traders in a prediction market on whether a new drug will receive FDA approval:

  • Trader A is a biochemist who has read the Phase III trial results (publicly available but technically dense). She estimates the approval probability at 0.80.
  • Trader B is a regulatory consultant who knows the FDA panel composition and their historical voting patterns. He estimates 0.60.
  • Trader C is a pharmaceutical industry analyst who knows the company's manufacturing readiness and has heard rumors about a competitor's drug. She estimates 0.70.

(a) Explain how a prediction market can produce a price that is more accurate than any individual trader's estimate, even though no single trader has access to all the relevant information.

(b) What would the simple average of the three estimates be? Under what conditions would the market price differ from this simple average?

(c) Identify which of Surowiecki's four conditions are satisfied in this scenario and which might be violated.


Exercise 2: Efficiency and Martingales

A prediction market for a binary event has prices at the following times:

Time    Price
t=0     0.50
t=1     0.52
t=2     0.48
t=3     0.55
t=4     0.53
t=5     0.58

(a) Calculate the price changes (returns) for each period.

(b) Compute the sample autocorrelation at lag 1 for the returns. Does the result suggest weak-form efficiency or inefficiency?

(c) If you observed that prices consistently drifted upward on Mondays, what form of the EMH would this violate? How would you expect the market to correct this anomaly?
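
For part (b), one common convention for the lag-1 sample autocorrelation is sketched below. It assumes returns are defined as $r_t = p_t - p_{t-1}$ for $t = 1, \dots, T$ with sample mean $\bar{r}$; some texts normalize slightly differently, so check this against the definition used in the chapter.

$$\hat{\rho}_1 = \frac{\sum_{t=2}^{T} (r_t - \bar{r})(r_{t-1} - \bar{r})}{\sum_{t=1}^{T} (r_t - \bar{r})^2}$$

With only five returns the estimate is very noisy, which is worth keeping in mind when interpreting its sign and size.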


Exercise 3: Forms of EMH

For each of the following scenarios, identify which form of the EMH (weak, semi-strong, or strong) is violated:

(a) A trader notices that in a political prediction market, prices always drop on Fridays and recover on Mondays. They profitably exploit this pattern.

(b) A trader reads a publicly available poll showing a candidate leading by 10 points. The market price is 0.50. They buy and profit when the price rises to 0.65.

(c) A campaign insider knows that a major endorsement will be announced tomorrow. They buy contracts today and profit when the announcement moves the market.

(d) A sophisticated trader builds a model that combines publicly available polling data, economic indicators, and historical patterns. Their model consistently outperforms the market price.


Exercise 4: No-Trade Theorem Intuition

(a) Explain the Milgrom-Stokey No-Trade Theorem in your own words. Why is it surprising?

(b) Suppose you hold a prediction market contract priced at $0.60, and another trader offers to buy it from you at $0.65. Under the No-Trade Theorem assumptions, should you sell? Why or why not?

(c) List three real-world reasons why prediction markets have active trading despite the No-Trade Theorem. For each reason, identify which assumption of the theorem it violates.


Exercise 5: Crowd Wisdom Mathematics

A group of 400 analysts each independently estimate the probability that a company will beat its quarterly earnings forecast. The true probability is $\theta = 0.62$. Each analyst's estimate has a bias of 0 and a standard deviation of $\sigma = 0.20$.

(a) What is the standard deviation of the average of all 400 estimates?

(b) If the analysts' errors have a pairwise correlation of $\rho = 0.10$, what is the standard deviation of the average?

(c) How many independent analysts would you need to achieve the same standard deviation as 400 analysts with $\rho = 0.10$?

(d) What does this tell you about the relative importance of adding more analysts versus reducing correlation?
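
The calculations in this exercise rest on the variance of an average of equally correlated estimates, $\sigma^2\left[(1-\rho)/N + \rho\right]$. A minimal Python sketch of that formula follows; the function names are illustrative, not taken from the chapter, and the example uses made-up inputs rather than the exercise's values.

```python
import math


def std_of_crowd_mean(sigma: float, n: int, rho: float = 0.0) -> float:
    """Std of the average of n estimates that share std `sigma`
    and a common pairwise error correlation `rho`."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if n > 1 and rho < -1.0 / (n - 1):
        raise ValueError("rho is too negative for a valid correlation matrix")
    variance = sigma ** 2 * ((1.0 - rho) / n + rho)
    return math.sqrt(variance)


def equivalent_independent_n(n: int, rho: float) -> float:
    """Number of independent estimators whose average has the same
    variance as n estimators with pairwise correlation rho."""
    return n / (1.0 + (n - 1) * rho)


if __name__ == "__main__":
    # Illustrative inputs only (not the values from the exercise).
    print(std_of_crowd_mean(sigma=0.1, n=50, rho=0.2))
    print(equivalent_independent_n(n=50, rho=0.2))
```

Note that as $N$ grows the variance approaches $\rho\sigma^2$ rather than zero, which is the point part (d) asks you to interpret.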


Exercise 6: Information Cascades

Consider the Banerjee cascade model with signal quality $q = 0.7$ (each person's signal is correct with probability 0.7).

(a) Person 1 gets signal "event will occur" and buys. Person 2 gets signal "event will NOT occur." What should Person 2 do? Show the Bayesian calculation.

(b) If Person 2 also buys (against their signal), what should Person 3 do if their signal says "event will NOT occur"? Show the calculation.

(c) At what signal quality $q$ would it take three consecutive agreeing actions (instead of two) to start a cascade? Solve for the threshold.


Part B: Mathematical Analysis (Exercises 7-12)

Exercise 7: Bayesian Information Aggregation

Two traders in a prediction market each have a private signal about whether event $E$ will occur. The prior probability is $P(E) = 0.5$.

  • Trader A's signal is correct with probability 0.8.
  • Trader B's signal is correct with probability 0.6.

Both receive signals indicating $E$ will occur.

(a) If only Trader A's information were in the market, what should the price be?

(b) If both traders' information were fully reflected, what should the price be? (Assume signals are conditionally independent given the true state.)

(c) In a sequential trading model, Trader A trades first and moves the price to the answer from (a). Then Trader B arrives and observes this price. What should Trader B's posterior be after observing the price and their own signal? Does the final price match your answer from (b)?
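
To check your hand calculations, a small helper for Bayesian updating in odds form may be useful. It is only a sketch under the assumptions stated in part (b): a binary event, and signals that are conditionally independent given the true state. The name `posterior_after_signals` and the example inputs are hypothetical.

```python
def posterior_after_signals(prior, signals):
    """Posterior P(E) after a sequence of binary signals.

    `signals` is a list of (accuracy, says_event) pairs, where `accuracy`
    is the probability that the signal is correct and `says_event` is True
    if the signal indicates that E will occur. Signals are assumed to be
    conditionally independent given the true state.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    odds = prior / (1.0 - prior)
    for accuracy, says_event in signals:
        if not 0.0 < accuracy < 1.0:
            raise ValueError("accuracy must be strictly between 0 and 1")
        likelihood_ratio = accuracy / (1.0 - accuracy)
        # A signal for E multiplies the odds; a signal against E divides them.
        odds *= likelihood_ratio if says_event else 1.0 / likelihood_ratio
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Illustrative inputs only (not the values from the exercise).
    print(posterior_after_signals(0.5, [(0.75, True), (0.75, True)]))
    print(posterior_after_signals(0.5, [(0.7, True), (0.7, False)]))
```

Because multiplication of likelihood ratios is order-independent, the helper gives the same result whichever trader is processed first, which is relevant to part (c).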


Exercise 8: Variance Decomposition

The mean squared error of the crowd estimate can be decomposed as:

$$\text{MSE}(\bar{\theta}) = \text{Bias}^2 + \text{Variance}$$

Consider a crowd of $N = 100$ estimators. Each has bias $\beta$ and individual variance $\sigma^2 = 0.04$, with pairwise correlation $\rho$.

(a) Derive the expression for $\text{MSE}(\bar{\theta})$ in terms of $\beta$, $\sigma^2$, $N$, and $\rho$.

(b) Calculate $\text{MSE}(\bar{\theta})$ for $\rho = 0, 0.1, 0.5$ when $\beta = 0$.

(c) Calculate $\text{MSE}(\bar{\theta})$ for $\beta = 0, 0.05, 0.10$ when $\rho = 0$.

(d) Which is more damaging to crowd accuracy: a correlation of 0.3 with zero bias, or zero correlation with a bias of 0.05? Compute both and compare.


Exercise 9: Marginal Trader Effectiveness

A prediction market has $N = 500$ traders. Of these, $n_I = 25$ are informed (their estimates are centered on the true probability of 0.70, with noise $\sigma_I = 0.05$) and $n_N = 475$ are noise traders (their estimates are uniformly distributed on [0, 1]).

(a) If the market price were a simple average of all traders' estimates, what would the expected price be?

(b) Now suppose informed traders trade with intensity proportional to the perceived mispricing ($d_i = \gamma(v_i - p)$ with $\gamma = 5$, where $v_i$ is trader $i$'s estimate and $p$ is the current price) while noise traders trade with fixed intensity ($d_j = \epsilon_j$ where $\epsilon_j \sim N(0, 1)$). In the price-impact model $\Delta p = \lambda \sum_k d_k$, where the sum runs over all traders (informed and noise), derive the equilibrium price.

(c) Compare the effective weight of informed traders in parts (a) and (b). What accounts for the difference?


Exercise 10: Scoring Rule Connection

Consider an LMSR market maker with cost function $C(\mathbf{q}) = b \ln(\sum_i e^{q_i/b})$ for a binary event with current quantities $q_1$ (yes) and $q_2$ (no).

(a) Show that the current price of the "yes" contract is $p_1 = \frac{e^{q_1/b}}{e^{q_1/b} + e^{q_2/b}}$.

(b) Suppose a trader buys $\Delta$ units of the "yes" contract. Show that the cost is $C(q_1 + \Delta, q_2) - C(q_1, q_2)$.

(c) How does the parameter $b$ affect the ease of manipulation? Specifically, what is the cost to move the price from 0.5 to 0.9 as a function of $b$?

(d) Explain the trade-off the market designer faces when choosing $b$.
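
The cost function above is easy to evaluate numerically, which can help you sanity-check the algebra in parts (a) through (c). The sketch below is one possible implementation; the function names are hypothetical, and the max-shift inside the exponentials is an implementation choice for numerical stability, not part of the LMSR definition.

```python
import math


def lmsr_cost(quantities, b):
    """C(q) = b * ln(sum_i exp(q_i / b)), using a log-sum-exp shift."""
    shift = max(q / b for q in quantities)
    return b * (shift + math.log(sum(math.exp(q / b - shift) for q in quantities)))


def lmsr_prices(quantities, b):
    """Instantaneous prices p_i = exp(q_i / b) / sum_j exp(q_j / b)."""
    shift = max(q / b for q in quantities)
    weights = [math.exp(q / b - shift) for q in quantities]
    total = sum(weights)
    return [w / total for w in weights]


def cost_of_trade(quantities, b, outcome, delta):
    """Cost of buying `delta` units of contract `outcome`."""
    after = list(quantities)
    after[outcome] += delta
    return lmsr_cost(after, b) - lmsr_cost(quantities, b)


if __name__ == "__main__":
    q = [0.0, 0.0]  # fresh binary market: index 0 is "yes", index 1 is "no"
    print(lmsr_prices(q, b=10.0))
    print(cost_of_trade(q, b=10.0, outcome=0, delta=5.0))
```

For a very small `delta`, `cost_of_trade(...) / delta` should be close to the current price of that contract, which is a quick numerical check on part (a).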


Exercise 11: Cascade Breaking

In the cascade model from Section 11.6, suppose after an incorrect cascade has formed (everyone is buying, but the true state is $\theta = 0$), a new participant arrives with signal quality $q' > q$ (a more accurate signal).

(a) Derive the minimum signal quality $q'$ needed for this participant to break the cascade, as a function of $q$ and the number of agents $k$ already in the cascade.

(b) For $q = 0.6$ and $k = 5$, what is the minimum $q'$?

(c) How does this analysis change if the new participant is in a prediction market rather than a sequential decision-making setting? Why?


Exercise 12: Risk-Neutral vs. True Probabilities

In standard financial theory, EMH implies prices equal risk-neutral probabilities, which may differ from true (physical) probabilities.

(a) Explain why the risk premium is typically small in prediction markets compared to stock markets.

(b) A prediction market contract on "Will GDP growth exceed 3%?" is priced at 0.25. If traders are risk-averse and GDP growth above 3% is positively correlated with portfolio value, would the true probability be higher or lower than 0.25? Explain.

(c) For a prediction market contract on "Will it rain tomorrow?", would you expect the risk-neutral and true probabilities to be approximately equal? Why or why not?


Part C: Python Programming (Exercises 13-18)

Exercise 13: Implement a Calibration Checker

Write a Python function check_calibration(predictions, outcomes, n_bins=10) that:

  1. Takes arrays of predicted probabilities and binary outcomes.
  2. Groups predictions into n_bins equally-spaced bins from 0 to 1.
  3. For each bin, computes the mean predicted probability and the realized frequency.
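  4. Computes the Expected Calibration Error (ECE): $\text{ECE} = \sum_{b=1}^{B} \frac{n_b}{N} |p_b - o_b|$, where $B$ is the number of bins, $n_b$ is the number of predictions in bin $b$, $N$ is the total number of predictions, $p_b$ is the mean predicted probability in bin $b$, and $o_b$ is the realized frequency in bin $b$.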
  5. Returns a dictionary with bin-level data and the ECE.

Test your function with: (a) perfectly calibrated synthetic data, (b) overconfident predictions (probabilities pushed toward extremes), and (c) underconfident predictions (probabilities pushed toward 0.5).
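
One way to build the three test datasets is to draw a true probability for each event, sample the outcome from it, and then distort the reported prediction on the log-odds scale. The generator below is a sketch of that idea; `make_test_data` and its `sharpen` parameter are hypothetical names, and the uniform distribution of true probabilities is an assumption made for illustration.

```python
import math
import random


def _logit(p):
    return math.log(p / (1.0 - p))


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_test_data(n=5000, sharpen=1.0, seed=0):
    """Return (predictions, outcomes) for calibration tests.

    Each event has a true probability drawn from U(0.02, 0.98) and an
    outcome sampled from it. The reported prediction is
    sigmoid(sharpen * logit(true probability)):
      sharpen == 1 -> calibrated
      sharpen  > 1 -> overconfident (pushed toward 0 and 1)
      sharpen  < 1 -> underconfident (pulled toward 0.5)
    """
    rng = random.Random(seed)
    predictions, outcomes = [], []
    for _ in range(n):
        p_true = rng.uniform(0.02, 0.98)
        outcomes.append(1 if rng.random() < p_true else 0)
        predictions.append(_sigmoid(sharpen * _logit(p_true)))
    return predictions, outcomes


if __name__ == "__main__":
    calibrated = make_test_data(sharpen=1.0)
    overconfident = make_test_data(sharpen=2.0)
    underconfident = make_test_data(sharpen=0.5)
```

Using the same seed for all three cases keeps the outcomes identical, so any difference in ECE comes from the distortion alone.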


Exercise 14: Autocorrelation Analysis

Write a Python function that:

  1. Generates synthetic prediction market price data with a known autocorrelation structure.
  2. Implements the Ljung-Box test for autocorrelation.
  3. Runs the test on the price changes (not the price levels, which are highly autocorrelated even in an efficient market) of: (a) prices generated from a random walk (should pass the efficiency test), (b) prices generated from an AR(1) process with $\phi = 0.3$ (should fail), and (c) realistic-looking prediction market data where prices respond to "news events."

Report the test statistics and p-values. Discuss what the results imply about market efficiency.
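
As a starting point, the Ljung-Box statistic is $Q = n(n+2)\sum_{k=1}^{h} \hat{\rho}_k^2 / (n-k)$, compared against a chi-square distribution with $h$ degrees of freedom. The sketch below computes $Q$ with the standard library only; the helper names are hypothetical, and `ar1_series` generates an AR(1) series of price changes as one illustrative way to build test input. If SciPy is available, the p-value can be obtained with `scipy.stats.chi2.sf(Q, h)`.

```python
import random


def autocorrelation(x, lag):
    """Sample autocorrelation of the sequence x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom


def ljung_box_q(x, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(x)
    return n * (n + 2) * sum(
        autocorrelation(x, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


def ar1_series(n, phi, sigma=0.01, seed=0):
    """AR(1) series x_t = phi * x_{t-1} + e_t; phi = 0 gives white noise."""
    rng = random.Random(seed)
    series, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, sigma)
        series.append(prev)
    return series


if __name__ == "__main__":
    print(ljung_box_q(ar1_series(2000, phi=0.0), max_lag=5))
    print(ljung_box_q(ar1_series(2000, phi=0.3), max_lag=5))
```

Treating the generated series as price changes and cumulatively summing them gives a price path; clipping that path to [0, 1] is left to your implementation.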


Exercise 15: Diversity vs. Size Simulation

Write a simulation that explores the trade-off between crowd size and diversity:

  1. Create a crowd of $N$ estimators with pairwise correlation $\rho$.
  2. Compute the mean absolute error (MAE) of the crowd average.
  3. Plot MAE as a function of $N$ for $\rho \in \{0, 0.05, 0.1, 0.2, 0.5\}$.
  4. For each $\rho$, find the $N$ at which adding more estimators yields less than 1% improvement in MAE.
  5. Plot MAE as a function of $\rho$ for fixed $N = 100$.

What is the practical lesson for prediction market design?
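
One simple way to create estimators with a chosen pairwise correlation is a common-factor construction: each error is $\sqrt{\rho}$ times a shared shock plus $\sqrt{1-\rho}$ times an idiosyncratic shock. The sketch below uses that construction with hypothetical names and default values; it assumes $0 \le \rho \le 1$ and, as a simplification, does not clip estimates to [0, 1].

```python
import math
import random


def draw_crowd(n, rho, sigma, theta, rng):
    """One crowd of n estimates of theta whose errors have std sigma
    and pairwise correlation rho (common-factor construction)."""
    common = rng.gauss(0.0, 1.0)
    return [
        theta + sigma * (math.sqrt(rho) * common
                         + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0))
        for _ in range(n)
    ]


def crowd_mae(n, rho, sigma=0.2, theta=0.6, trials=2000, seed=0):
    """Monte Carlo estimate of the mean absolute error of the crowd average."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        estimates = draw_crowd(n, rho, sigma, theta, rng)
        total += abs(sum(estimates) / n - theta)
    return total / trials


if __name__ == "__main__":
    for rho in (0.0, 0.05, 0.1, 0.2, 0.5):
        print(rho, crowd_mae(n=100, rho=rho, trials=500))
```

Looping `crowd_mae` over a grid of `n` values for each `rho` gives the data for the plots in steps 3 and 5; the plotting itself is left to you.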


Exercise 16: Marginal Trader Simulation

Extend the marginal trader simulation from Section 11.5 to explore:

  1. How market accuracy varies with the fraction of informed traders (from 1% to 50%).
  2. How market accuracy varies with the noise level of informed traders' signals ($\sigma_I$ from 0.01 to 0.30).
  3. How market accuracy varies with the intensity of noise trading.
  4. Create a 3D surface plot showing accuracy as a function of (informed fraction, signal noise).

Identify the "minimum viable" informed trader population for different accuracy targets.


Exercise 17: Cascade Probability Calculator

Write a program that:

  1. Analytically computes the probability of an incorrect cascade forming as a function of signal quality $q$ and group size $N$.
  2. Simulates the cascade process 10,000 times and compares simulation results to the analytical calculation.
  3. Extends the analysis to a prediction market setting and compares cascade probabilities.
  4. Plots the probability of incorrect outcomes under both mechanisms.

Exercise 18: ABM Extension — Market Maker Comparison

Extend the ABM from Section 11.7 to compare two market-making mechanisms:

  1. Continuous double auction (CDA): Agents submit limit orders; trades execute when bid >= ask.
  2. LMSR market maker: Agents trade against an automated market maker.

For each mechanism, measure:

  • Speed of price convergence to the true probability
  • Final price accuracy
  • Total volume and number of trades
  • Profit distribution across agent types

Run 100 simulations for each mechanism and produce a statistical comparison.


Part D: Analysis and Critical Thinking (Exercises 19-24)

Exercise 19: The 2016 U.S. Election

In the 2016 U.S. presidential election, most prediction markets gave Hillary Clinton a 70-85% probability of winning on election day. She lost.

(a) Does this outcome prove that prediction markets "failed"? Explain using the concept of calibration.

(b) What information might the markets have failed to aggregate? Consider the four conditions for crowd wisdom.

(c) If you observed 100 elections in which the prediction market gave the Democrat a 75% chance of winning, in how many would you expect the Democrat to lose? Does the 2016 result fall within the expected range?

(d) Compare the prediction market performance in 2016 to the performance of polls, statistical models (like FiveThirtyEight), and expert judgment. Which performed best? Which performed worst?


Exercise 20: Manipulation Scenario Analysis

A wealthy individual wants to manipulate a prediction market to make it appear that a ballot initiative will fail (currently priced at 0.60, they want to push it below 0.40).

(a) In an LMSR market with $b = 100$, how much would they need to spend to move the price from 0.60 to 0.40?

(b) If there are 50 informed traders who believe the true probability is 0.60, and each has $500 to trade, how quickly would the price revert?

(c) Under what conditions could this manipulation succeed for an extended period? Be specific.

(d) Propose a market design modification that would make this manipulation harder.


Exercise 21: Comparing Aggregation Mechanisms

Compare prediction markets to three alternative information aggregation mechanisms:

  1. Delphi method: Iterative expert surveys with feedback.
  2. Prediction tournaments: Individual forecasters compete on a common set of questions.
  3. Simple polling average: The average of multiple opinion polls.

For each comparison, discuss:

(a) Information diversity: Which mechanism captures a wider range of information?

(b) Independence: Which mechanism better preserves independence of judgment?

(c) Incentive alignment: Which mechanism provides stronger incentives for accuracy?

(d) Practical limitations: What are the practical challenges of each?


Exercise 22: Thin Market Problem

A company wants to use an internal prediction market to forecast quarterly sales. The company has 200 employees who might participate.

(a) What problems might arise from having too few traders? List at least four.

(b) Of the 200 employees, estimate how many would be "informed" (have private information relevant to the forecast) and how many would be "noise" traders. Justify your estimate.

(c) Propose three specific design choices to maximize information aggregation in this small market.

(d) At what point would you recommend against using a prediction market and suggest an alternative mechanism instead?


Exercise 23: Conditional Markets Design Challenge

A government wants to use prediction markets to evaluate two climate policies: a carbon tax (Policy A) and a cap-and-trade system (Policy B).

(a) Design a set of conditional prediction markets that would help evaluate these policies. Specify the exact contracts you would create.

(b) What is the "conditional on action" interpretation of these markets? How would the government use the market prices to make a decision?

(c) Identify three potential problems with this approach and propose mitigations for each.

(d) What is the minimum liquidity you would need in each market for the prices to be informative? How would you ensure this liquidity?


Exercise 24: Agent-Based Model Critique

(a) List three assumptions of the ABM in Section 11.7 that are unrealistic. For each, explain how a more realistic assumption might change the results.

(b) The ABM assumes agents have fixed types (fundamentalist, noise, chartist). In reality, the same person might behave as a fundamentalist on some questions and a noise trader on others. How would you modify the ABM to capture this?

(c) ABMs can produce a wide variety of outcomes depending on parameter choices. How would you validate an ABM of a prediction market against real-world data? What metrics would you compare?

(d) Design an experiment (using simulation) to test whether your ABM produces realistic-looking price paths. Describe the statistical tests you would use.


Part E: Advanced and Research-Oriented (Exercises 25-30)

Exercise 25: Proving Convergence

Consider a prediction market with $N$ informed traders who each observe a signal $s_i = \theta + \epsilon_i$, where $\epsilon_i \sim N(0, \sigma^2)$ and $\theta \in [0, 1]$ is the true probability.

(a) In a competitive equilibrium where the market price equals the average signal, prove that $p_N \to \theta$ in probability as $N \to \infty$.

(b) Derive the rate of convergence. How does the error scale with $N$?

(c) Now suppose there are also $M$ noise traders whose signals are drawn from $U(0, 1)$. If $M = cN$ for some constant $c > 0$, does the market price still converge to $\theta$? Prove or disprove.

(d) What if $M = cN^2$ (noise traders grow much faster than informed)? Does convergence still hold?


Exercise 26: Endogenous Information Acquisition

In the models we have studied, traders have exogenously given private information. In reality, traders choose how much information to acquire, at a cost.

(a) Model a prediction market where each trader can pay a cost $c$ to acquire a signal of quality $q(c) = \sqrt{c}$. The trader's expected profit from having a signal of quality $q$ when the current market error is $e$ is approximately $q \cdot e$. Derive the optimal information acquisition $c^*$ as a function of $e$.

(b) Show that in equilibrium, the market error $e^*$ is positive (the market is never perfectly efficient when information is costly). This is the Grossman-Stiglitz (1980) paradox.

(c) How does the equilibrium market accuracy depend on the cost of information $c$? Sketch the relationship.

(d) What policy implications does this have for prediction market design? Should information acquisition be subsidized?


Exercise 27: Multi-Dimensional Information Aggregation

Most prediction markets focus on a single binary question. But real-world decisions often involve multiple correlated uncertainties.

(a) Consider three binary events: A (inflation rises), B (interest rates rise), C (stock market falls). Write down the full joint probability distribution (8 probabilities). How many independent contracts would you need to fully specify this distribution?

(b) Design a set of prediction market contracts that would allow traders to express beliefs about any conditional probability $P(X | Y)$ for $X, Y \in \{A, B, C\}$.

(c) Prove that a standard market with separate contracts for A, B, and C cannot capture the full joint distribution unless traders can also trade conditional contracts.

(d) Implement a Python simulation of a combinatorial market with three binary events. Show that it converges to the true joint distribution when informed traders participate.
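
Before building the full simulation, it may help to represent the joint distribution explicitly and compute marginals and conditionals from it. The sketch below stores a joint distribution as a dictionary keyed by (A, B, C) outcomes; the helper names and both example distributions are made up for illustration and do not come from the chapter.

```python
from itertools import product

OUTCOMES = list(product([0, 1], repeat=3))  # (A, B, C) as 0/1 flags


def marginal(joint, index):
    """P(event `index` occurs), where index 0=A, 1=B, 2=C."""
    return sum(p for outcome, p in joint.items() if outcome[index])


def conditional(joint, x, y):
    """P(event x | event y), computed from the joint distribution."""
    p_y = marginal(joint, y)
    if p_y == 0:
        raise ValueError("conditioning event has probability zero")
    p_xy = sum(p for outcome, p in joint.items() if outcome[x] and outcome[y])
    return p_xy / p_y


# Two illustrative joints with identical marginals (all 0.5):
# one with independent events, one where A and B always coincide.
joint_independent = {o: 1.0 / 8.0 for o in OUTCOMES}
joint_linked = {o: (0.25 if o[0] == o[1] else 0.0) for o in OUTCOMES}

if __name__ == "__main__":
    for name, joint in (("independent", joint_independent),
                        ("linked", joint_linked)):
        marginals = [marginal(joint, i) for i in range(3)]
        print(name, marginals, conditional(joint, 0, 1))
```

The two example distributions have the same prices for separate A, B, and C contracts but different values of $P(A \mid B)$, which is the kind of case part (c) asks you to reason about.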


Exercise 28: Social Learning and Market Dynamics

(a) Implement a model where traders learn from each other as well as from prices. Each trader has a social network, and they update their beliefs based on the beliefs of their neighbors (before trading).

(b) Show that when social learning is strong, market accuracy can decrease because it reduces the effective diversity of opinions (violating Surowiecki's independence condition).

(c) Find the optimal "social learning rate" that balances the benefits of information sharing with the costs of reduced diversity.

(d) Relate your findings to the real-world phenomenon of "echo chambers" and their potential impact on prediction market accuracy.


Exercise 29: Mechanism Comparison Experiment

Design and implement a comprehensive simulation experiment comparing four information aggregation mechanisms:

  1. Simple average of all estimates
  2. Prediction market (continuous double auction)
  3. LMSR market maker
  4. Weighted average (where weights are based on past accuracy)

For each mechanism:

(a) Implement the mechanism in Python.

(b) Test with 100, 500, and 2000 agents.

(c) Vary the fraction of informed agents from 5% to 50%.

(d) Measure accuracy (Brier score), convergence speed, and robustness to manipulation attempts.

(e) Present results in a clear comparison table and identify which mechanism performs best under which conditions.


Exercise 30: Research Proposal

Write a 500-word research proposal investigating an open question in information aggregation theory for prediction markets. Your proposal should:

(a) Clearly state the research question.

(b) Explain why the question is important, referencing specific concepts from this chapter.

(c) Describe the methodology you would use (theoretical, empirical, or simulation-based).

(d) Identify what data you would need and how you would obtain it.

(e) Describe the expected contribution and how it would advance the field.

Possible topics include (but are not limited to):

  • How does social media affect information aggregation in prediction markets?
  • Can we design markets that are robust to coordinated manipulation by AI agents?
  • What is the optimal market structure for aggregating information about low-probability, high-impact events (tail risks)?
  • How do position limits affect the marginal trader hypothesis?
  • Can prediction markets aggregate information about events where no single trader has relevant expertise?