Appendix A: Mathematical Foundations

This appendix provides a self-contained reference for the mathematical tools and concepts used throughout this book. Readers with a strong quantitative background can treat this as a quick-reference guide; those encountering some of these topics for the first time should work through the relevant sections before tackling the chapters that depend on them.


A.1 Notation Guide

The following table catalogs every mathematical symbol that appears in the main text. Symbols are organized by category for easy lookup.

General Conventions

Symbol Meaning
$x, y, z$ Scalar variables (lowercase italic)
$\mathbf{x}, \mathbf{y}$ Vectors (bold lowercase)
$\mathbf{A}, \mathbf{B}$ Matrices (bold uppercase)
$X, Y$ Random variables (uppercase italic)
$\hat{\theta}$ Estimator or predicted value of $\theta$
$\theta^*$ True or optimal value of $\theta$
$n$ Sample size or number of observations
$i, j, k$ Index variables
$t$ Time index or step

Set Notation

Symbol Meaning
$\{a, b, c\}$ Set containing elements $a$, $b$, $c$
$\in$ Element of (membership)
$\notin$ Not an element of
$\subset$ Proper subset
$\subseteq$ Subset (possibly equal)
$\cup$ Union of sets
$\cap$ Intersection of sets
$\emptyset$ Empty set
$\mathbb{R}$ Set of real numbers
$\mathbb{R}^n$ $n$-dimensional real vector space
$[0, 1]$ Closed interval from 0 to 1
$(a, b)$ Open interval from $a$ to $b$
$|S|$ Cardinality (number of elements) of set $S$

Summation, Products, and Limits

Symbol Meaning
$\sum_{i=1}^{n} x_i$ Sum of $x_1 + x_2 + \cdots + x_n$
$\prod_{i=1}^{n} x_i$ Product $x_1 \cdot x_2 \cdots x_n$
$\lim_{x \to a} f(x)$ Limit of $f(x)$ as $x$ approaches $a$
$\inf$, $\sup$ Infimum (greatest lower bound), supremum (least upper bound)
$\arg\max_x f(x)$ Value of $x$ that maximizes $f(x)$
$\arg\min_x f(x)$ Value of $x$ that minimizes $f(x)$

Probability and Statistics

Symbol Meaning
$P(A)$ Probability of event $A$
$P(A \mid B)$ Conditional probability of $A$ given $B$
$\mathbb{E}[X]$ Expected value of random variable $X$
$\text{Var}(X)$ Variance of $X$
$\text{Cov}(X, Y)$ Covariance of $X$ and $Y$
$\sigma$ Standard deviation
$\sigma^2$ Variance
$\mu$ Mean
$\rho$ Correlation coefficient
$f(x)$ Probability density function (continuous)
$p(x)$ Probability mass function (discrete)
$F(x)$ Cumulative distribution function
$\mathcal{N}(\mu, \sigma^2)$ Normal distribution with mean $\mu$ and variance $\sigma^2$
$\text{Bernoulli}(p)$ Bernoulli distribution with success probability $p$
$\text{Beta}(\alpha, \beta)$ Beta distribution with shape parameters $\alpha, \beta$
$\sim$ "is distributed as"
$\overset{d}{\to}$ Convergence in distribution

Calculus and Optimization

Symbol Meaning
$\frac{df}{dx}$ or $f'(x)$ Derivative of $f$ with respect to $x$
$\frac{\partial f}{\partial x}$ Partial derivative of $f$ with respect to $x$
$\nabla f$ Gradient vector of $f$
$\nabla^2 f$ or $\mathbf{H}$ Hessian matrix of $f$
$\int_a^b f(x)\,dx$ Definite integral of $f$ from $a$ to $b$
$\eta$ Learning rate (in gradient descent)
$\lambda$ Regularization parameter

Linear Algebra

Symbol Meaning
$\mathbf{x}^T$ Transpose of vector $\mathbf{x}$
$\mathbf{A}^{-1}$ Inverse of matrix $\mathbf{A}$
$\mathbf{I}$ Identity matrix
$\det(\mathbf{A})$ Determinant of $\mathbf{A}$
$\text{tr}(\mathbf{A})$ Trace of $\mathbf{A}$
$\mathbf{x} \cdot \mathbf{y}$ Dot product of $\mathbf{x}$ and $\mathbf{y}$
$\|\mathbf{x}\|$ Euclidean norm of $\mathbf{x}$

Information Theory

Symbol Meaning
$H(X)$ Shannon entropy of $X$
$H(X \mid Y)$ Conditional entropy of $X$ given $Y$
$D_{KL}(P \| Q)$ Kullback-Leibler divergence from $Q$ to $P$
$I(X; Y)$ Mutual information between $X$ and $Y$
$\log$ Natural logarithm (base $e$) unless stated otherwise
$\log_2$ Logarithm base 2

Prediction Market Specific

Symbol Meaning
$\pi$ Market price / implied probability
$p$ True or estimated probability
$b$ Liquidity parameter (in LMSR)
$C(\mathbf{q})$ Cost function (automated market maker)
$S(p, o)$ Scoring rule evaluated at prediction $p$ and outcome $o$
$f^*$ Optimal Kelly fraction
$W$ Wealth or bankroll
$\text{EV}$ Expected value
$\text{ROI}$ Return on investment

A.2 Probability Fundamentals

Axioms of Probability (Kolmogorov)

Given a sample space $\Omega$ and an event space $\mathcal{F}$, a probability measure $P$ satisfies:

  1. Non-negativity: $P(A) \geq 0$ for all $A \in \mathcal{F}$.
  2. Normalization: $P(\Omega) = 1$.
  3. Countable additivity: For any countable collection of mutually exclusive events $A_1, A_2, \ldots$, we have $P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$.

Basic Rules

Complement rule:

$$P(A^c) = 1 - P(A)$$

Addition rule:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

Multiplication rule:

$$P(A \cap B) = P(A) \cdot P(B \mid A)$$

Conditional Probability

The probability of $A$ given that $B$ has occurred is defined as:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0$$

Independence

Two events $A$ and $B$ are independent if and only if:

$$P(A \cap B) = P(A) \cdot P(B)$$

Equivalently, $P(A \mid B) = P(A)$.

Bayes' Theorem

$$P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)}$$

The extended form using the law of total probability in the denominator:

$$P(A_i \mid B) = \frac{P(B \mid A_i) \, P(A_i)}{\sum_{j=1}^{n} P(B \mid A_j) \, P(A_j)}$$

In prediction markets, Bayes' theorem is the engine of rational belief updating. When new information $B$ arrives, a trader updates the prior probability $P(A_i)$ to obtain the posterior $P(A_i \mid B)$. Markets that aggregate many Bayesian updaters tend to converge toward well-calibrated probabilities.
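As a concrete sketch of this updating step, the two-hypothesis form of Bayes' theorem can be written in a few lines of Python (the prior and likelihood values below are invented for illustration, not taken from the main text):

```python
# Bayesian update for a binary market outcome.
# Hypothetical numbers: prior P(A) = 0.30 that the event occurs; new evidence B
# arrives with likelihoods P(B | A) = 0.8 and P(B | not A) = 0.4.

def bayes_update(prior, lik_given_a, lik_given_not_a):
    """Posterior P(A | B) via Bayes' theorem with a two-hypothesis denominator."""
    evidence = lik_given_a * prior + lik_given_not_a * (1 - prior)
    return lik_given_a * prior / evidence

posterior = bayes_update(0.30, 0.8, 0.4)
print(round(posterior, 4))  # 0.4615
```

Note that evidence with equal likelihoods under both hypotheses ($0.9$ and $0.9$, say) leaves the prior unchanged, as the theorem requires.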

Law of Total Expectation

$$\mathbb{E}[X] = \mathbb{E}\left[\mathbb{E}[X \mid Y]\right]$$

This is used extensively in the analysis of conditional trading strategies and in proving that market prices form martingales under certain conditions.


A.3 Common Probability Distributions

Discrete Distributions

Bernoulli Distribution — $X \sim \text{Bernoulli}(p)$

Property Value
PMF $P(X = k) = p^k (1-p)^{1-k}$ for $k \in \{0, 1\}$
Mean $p$
Variance $p(1-p)$
Market use Models a single binary contract outcome (yes/no)

Binomial Distribution — $X \sim \text{Binomial}(n, p)$

Property Value
PMF $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$
Mean $np$
Variance $np(1-p)$
Market use Models the number of correct predictions in $n$ independent binary markets

Poisson Distribution — $X \sim \text{Poisson}(\lambda)$

Property Value
PMF $P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$
Mean $\lambda$
Variance $\lambda$
Market use Models rare event counts (e.g., number of market-moving events per day)

Continuous Distributions

Uniform Distribution — $X \sim \text{Uniform}(a, b)$

Property Value
PDF $f(x) = \frac{1}{b-a}$ for $x \in [a, b]$
Mean $\frac{a+b}{2}$
Variance $\frac{(b-a)^2}{12}$
Market use Represents maximum-ignorance prior over a bounded range; starting point before calibration

Normal (Gaussian) Distribution — $X \sim \mathcal{N}(\mu, \sigma^2)$

Property Value
PDF $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
Mean $\mu$
Variance $\sigma^2$
Market use Models aggregated forecast errors, log-returns, and confidence intervals around point estimates

Beta Distribution — $X \sim \text{Beta}(\alpha, \beta)$

Property Value
PDF $f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}$ where $B$ is the beta function
Mean $\frac{\alpha}{\alpha+\beta}$
Variance $\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$
Market use Conjugate prior for Bernoulli outcomes; models uncertainty about the true probability of a binary event

Exponential Distribution — $X \sim \text{Exponential}(\lambda)$

Property Value
PDF $f(x) = \lambda e^{-\lambda x}$ for $x \geq 0$
Mean $1/\lambda$
Variance $1/\lambda^2$
Market use Models time between market-moving events; memoryless property relevant to arrival-time markets
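The Beta distribution's role as a conjugate prior deserves a worked sketch: observing $k$ successes in $n$ Bernoulli trials turns a $\text{Beta}(\alpha, \beta)$ prior into a $\text{Beta}(\alpha + k, \beta + n - k)$ posterior. In Python, with hypothetical prior parameters and trial counts:

```python
# Beta-Bernoulli conjugate update (prior Beta(2, 2) is a made-up example).
# After observing 7 successes in 10 trials, the posterior is Beta(9, 5).

def beta_posterior(alpha, beta, successes, trials):
    """Conjugate update: add successes to alpha, failures to beta."""
    return alpha + successes, beta + (trials - successes)

def beta_mean(alpha, beta):
    """Mean of Beta(alpha, beta), per the table above."""
    return alpha / (alpha + beta)

a, b = beta_posterior(2, 2, successes=7, trials=10)
print((a, b), round(beta_mean(a, b), 4))  # (9, 5) 0.6429
```

The posterior mean $9/14 \approx 0.643$ sits between the prior mean $0.5$ and the sample frequency $0.7$, shrinking the raw estimate toward the prior.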

A.4 Statistics Review

Point Estimation

A point estimator $\hat{\theta}$ is a function of sample data that produces a single best guess for an unknown parameter $\theta$. Key properties include:

  • Bias: $\text{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta$. An estimator is unbiased if $\text{Bias}(\hat{\theta}) = 0$.
  • Consistency: $\hat{\theta}_n \overset{p}{\to} \theta$ as $n \to \infty$.
  • Efficiency: Among unbiased estimators, the one with smallest variance is most efficient.
  • Mean Squared Error: $\text{MSE}(\hat{\theta}) = \text{Var}(\hat{\theta}) + [\text{Bias}(\hat{\theta})]^2$.

Maximum Likelihood Estimation (MLE)

Given observations $x_1, \ldots, x_n$ drawn independently from $f(x \mid \theta)$, the likelihood function is:

$$L(\theta) = \prod_{i=1}^{n} f(x_i \mid \theta)$$

The log-likelihood is:

$$\ell(\theta) = \sum_{i=1}^{n} \log f(x_i \mid \theta)$$

The MLE is $\hat{\theta}_{MLE} = \arg\max_\theta \ell(\theta)$. In practice, we solve $\frac{\partial \ell}{\partial \theta} = 0$ and verify the solution is a maximum. MLE is used throughout the book when fitting models to historical market data.
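For the Bernoulli case the MLE has the closed form $\hat{p} = \frac{1}{n}\sum_i x_i$, the sample mean. A quick numerical check (the data below are made up) confirms that a grid search over the log-likelihood recovers the same answer:

```python
import math

# Bernoulli MLE: closed form vs. numerical maximization (illustrative data).
data = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]  # 7 successes in 10 trials

def log_likelihood(p, xs):
    """Bernoulli log-likelihood: sum of log p for successes, log(1-p) for failures."""
    return sum(math.log(p) if x == 1 else math.log(1 - p) for x in xs)

p_hat = sum(data) / len(data)  # closed-form MLE: the sample mean

# Grid search over (0, 1); the maximizer should match the sample mean.
grid = [i / 1000 for i in range(1, 1000)]
p_grid = max(grid, key=lambda p: log_likelihood(p, data))
print(p_hat, p_grid)  # 0.7 0.7
```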

Hypothesis Testing

A hypothesis test evaluates a null hypothesis $H_0$ against an alternative $H_1$:

  1. State $H_0$ and $H_1$.
  2. Choose a significance level $\alpha$ (commonly 0.05).
  3. Compute a test statistic from the data.
  4. Determine the $p$-value: the probability of observing a test statistic at least as extreme as the one computed, assuming $H_0$ is true.
  5. If $p < \alpha$, reject $H_0$.

Type I error ($\alpha$): Rejecting $H_0$ when it is true. Type II error ($\beta$): Failing to reject $H_0$ when it is false. Power $= 1 - \beta$: Probability of correctly rejecting a false $H_0$.

In prediction market analysis, we frequently test whether a trading strategy produces statistically significant returns above zero, or whether two models produce different calibration scores.

Confidence Intervals

A $100(1-\alpha)\%$ confidence interval for a parameter $\theta$ takes the form:

$$\hat{\theta} \pm z_{\alpha/2} \cdot \text{SE}(\hat{\theta})$$

where $z_{\alpha/2}$ is the critical value from the standard normal distribution and $\text{SE}$ is the standard error. For small samples or unknown variance, replace $z$ with the appropriate $t$-distribution critical value.
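As a sketch of the large-sample formula, here is a 95% normal-approximation interval for a proportion, with hypothetical sample numbers:

```python
import math

# 95% confidence interval for a proportion (sample numbers are invented).
successes, n = 540, 1000
p_hat = successes / n
z = 1.959963984540054                       # z_{alpha/2} for alpha = 0.05
se = math.sqrt(p_hat * (1 - p_hat) / n)     # standard error of a proportion
lo, hi = p_hat - z * se, p_hat + z * se
print(round(lo, 4), round(hi, 4))  # 0.5091 0.5709
```

Here the interval excludes 0.5, so at the 5% level one would reject the hypothesis that the true proportion is one half.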

Linear Regression

The ordinary least squares (OLS) model:

$$y = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$$

where $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I})$. The OLS estimator is:

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$$

Key diagnostics: $R^2$ measures explained variance; residual plots detect model violations; multicollinearity inflates standard errors.
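A minimal worked example, specializing the OLS estimator to a single regressor, where the matrix formula reduces to slope = sample covariance over sample variance (the data points are toy numbers):

```python
# One-feature OLS via the closed-form normal equations (toy data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Slope: sum of cross-deviations over sum of squared x-deviations.
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar  # intercept passes through the point of means

print(round(b0, 3), round(b1, 3))  # 0.05 1.99
```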

Logistic Regression

For binary outcomes (directly relevant to binary prediction markets):

$$P(Y = 1 \mid \mathbf{x}) = \frac{1}{1 + e^{-\mathbf{x}^T \boldsymbol{\beta}}}$$

The log-odds (logit) are a linear function of the features:

$$\log \frac{P(Y=1 \mid \mathbf{x})}{1 - P(Y=1 \mid \mathbf{x})} = \mathbf{x}^T \boldsymbol{\beta}$$

Parameters are estimated via maximum likelihood. Logistic regression is the foundational model for probability estimation in Chapters 8 through 11.
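The sigmoid and logit functions above are exact inverses of each other, which the following sketch demonstrates (the coefficient values are made up for illustration):

```python
import math

# Sigmoid maps a linear score to a probability; logit maps it back.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1 - p))

beta = [-1.0, 2.0]            # hypothetical intercept and slope
x = 0.8
z = beta[0] + beta[1] * x     # linear predictor x^T beta
p = sigmoid(z)
print(round(p, 4))            # 0.6457
print(abs(logit(p) - z) < 1e-9)  # True: logit inverts the sigmoid
```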


A.5 Linear Algebra Essentials

Vectors

A vector $\mathbf{x} \in \mathbb{R}^n$ is an ordered list of $n$ real numbers. The dot product of two vectors:

$$\mathbf{x} \cdot \mathbf{y} = \sum_{i=1}^{n} x_i y_i = \|\mathbf{x}\| \|\mathbf{y}\| \cos\theta$$

The Euclidean norm:

$$\|\mathbf{x}\| = \sqrt{\sum_{i=1}^{n} x_i^2}$$

Matrices

A matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$ has $m$ rows and $n$ columns. Key operations:

  • Multiplication: $(\mathbf{AB})_{ij} = \sum_{k} A_{ik} B_{kj}$. Requires inner dimensions to match.
  • Transpose: $(\mathbf{A}^T)_{ij} = A_{ji}$.
  • Inverse: $\mathbf{A}^{-1}$ exists if and only if $\det(\mathbf{A}) \neq 0$. Satisfies $\mathbf{A}\mathbf{A}^{-1} = \mathbf{I}$.

Eigenvalues and Eigenvectors

If $\mathbf{A}\mathbf{v} = \lambda \mathbf{v}$ for a nonzero vector $\mathbf{v}$, then $\lambda$ is an eigenvalue and $\mathbf{v}$ is the corresponding eigenvector. Eigendecomposition is used in:

  • Principal Component Analysis (PCA): Reducing the dimensionality of feature sets.
  • Covariance matrices: The eigenvalues of the covariance matrix describe the variance captured along each principal direction, which is relevant when analyzing correlated prediction market contracts.

Positive Definiteness

A symmetric matrix $\mathbf{A}$ is positive definite if $\mathbf{x}^T \mathbf{A} \mathbf{x} > 0$ for all nonzero $\mathbf{x}$. This guarantees that a quadratic cost function has a unique minimum, which is essential for the convergence of optimization algorithms discussed in the machine learning chapters.


A.6 Calculus Reference

Derivatives

The derivative $f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$ gives the instantaneous rate of change. Common derivatives:

$f(x)$ $f'(x)$
$x^n$ $nx^{n-1}$
$e^x$ $e^x$
$\ln(x)$ $1/x$
$\log_a(x)$ $1/(x \ln a)$
$\sin(x)$ $\cos(x)$
$1/(1+e^{-x})$ (sigmoid) $\sigma(x)(1-\sigma(x))$
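The sigmoid derivative in the last row has a convenient self-referential form that can be verified numerically against a finite-difference approximation:

```python
import math

# Verify the sigmoid derivative identity s'(x) = s(x) * (1 - s(x))
# against a central finite difference (x = 0.7 is an arbitrary test point).
def s(x):
    return 1.0 / (1.0 + math.exp(-x))

x, h = 0.7, 1e-6
analytic = s(x) * (1 - s(x))
numeric = (s(x + h) - s(x - h)) / (2 * h)  # central difference
print(abs(analytic - numeric) < 1e-8)  # True
```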

Key Rules

  • Chain rule: $\frac{d}{dx} f(g(x)) = f'(g(x)) \cdot g'(x)$
  • Product rule: $(fg)' = f'g + fg'$
  • Quotient rule: $(f/g)' = (f'g - fg')/g^2$

Partial Derivatives and the Gradient

For a multivariate function $f(\mathbf{x}) = f(x_1, \ldots, x_n)$, the gradient is the vector of all partial derivatives:

$$\nabla f = \left(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n}\right)$$

The gradient points in the direction of steepest ascent. Gradient descent iterates:

$$\mathbf{x}_{t+1} = \mathbf{x}_t - \eta \nabla f(\mathbf{x}_t)$$

to find a local minimum, where $\eta$ is the learning rate.
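The update rule is short enough to run on a toy convex objective, $f(x, y) = (x - 3)^2 + (y + 1)^2$ (the learning rate and iteration count below are arbitrary choices):

```python
# Gradient descent on f(x, y) = (x - 3)^2 + (y + 1)^2, minimized at (3, -1).
def grad(v):
    x, y = v
    return (2 * (x - 3), 2 * (y + 1))  # analytic gradient of f

eta = 0.1        # learning rate
v = (0.0, 0.0)   # starting point
for _ in range(200):
    g = grad(v)
    v = (v[0] - eta * g[0], v[1] - eta * g[1])

print(round(v[0], 6), round(v[1], 6))  # 3.0 -1.0
```

Because the objective is convex and quadratic, each step contracts the distance to the minimizer by a constant factor, so convergence is geometric.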

Optimization Conditions

For unconstrained optimization of a twice-differentiable function:

  • First-order necessary condition: $\nabla f(\mathbf{x}^*) = \mathbf{0}$.
  • Second-order sufficient condition: $\nabla^2 f(\mathbf{x}^*)$ is positive definite (for a minimum) or negative definite (for a maximum).

Convexity

A function $f$ is convex if for all $\mathbf{x}, \mathbf{y}$ and $\lambda \in [0,1]$:

$$f(\lambda \mathbf{x} + (1-\lambda)\mathbf{y}) \leq \lambda f(\mathbf{x}) + (1-\lambda) f(\mathbf{y})$$

Convex functions have no local minima other than the global minimum. The logarithmic market scoring rule cost function is convex, which guarantees well-behaved automated market makers.


A.7 Information Theory

Shannon Entropy

For a discrete random variable $X$ with possible values $\{x_1, \ldots, x_n\}$:

$$H(X) = -\sum_{i=1}^{n} P(x_i) \log P(x_i)$$

Entropy quantifies the average uncertainty or "surprise" in a random variable. In prediction markets, a high-entropy distribution over outcomes indicates deep uncertainty; a low-entropy distribution suggests the market has strong conviction.

Properties:

  • $H(X) \geq 0$ always.
  • $H(X) = 0$ if and only if one outcome has probability 1.
  • $H(X)$ is maximized when all outcomes are equally likely: $H(X) = \log n$.
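These properties can be checked directly. The sketch below computes entropy in bits (base-2 log, rather than the natural log used elsewhere in this appendix), so the maximum for $n$ outcomes is $\log_2 n$:

```python
import math

# Shannon entropy in bits (base-2 log) of a discrete distribution.
def entropy_bits(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy_bits([0.5, 0.5]))                # 1.0: a fair coin carries one bit
print(entropy_bits([0.25, 0.25, 0.25, 0.25]))  # 2.0: maximal for 4 outcomes
print(round(entropy_bits([0.9, 0.1]), 4))      # 0.469: a confident market
```

The last line illustrates the market interpretation: a contract priced near 0.9 has low entropy, reflecting strong conviction.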

Conditional Entropy

$$H(X \mid Y) = -\sum_{y} P(y) \sum_{x} P(x \mid y) \log P(x \mid y)$$

This measures the remaining uncertainty in $X$ after observing $Y$. If $Y$ is perfectly informative about $X$, then $H(X \mid Y) = 0$.

Kullback-Leibler Divergence

$$D_{KL}(P \| Q) = \sum_{i} P(x_i) \log \frac{P(x_i)}{Q(x_i)}$$

KL divergence measures the "information lost" when distribution $Q$ is used to approximate distribution $P$. Key properties:

  • $D_{KL}(P \| Q) \geq 0$ (Gibbs' inequality), with equality if and only if $P = Q$.
  • It is not symmetric: $D_{KL}(P \| Q) \neq D_{KL}(Q \| P)$ in general.
  • In prediction markets, minimizing KL divergence between a model's predicted distribution and the empirical outcome distribution is equivalent to maximizing the log scoring rule.
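A short computation makes the asymmetry concrete (the two distributions below are arbitrary examples; natural log, so the result is in nats):

```python
import math

# KL divergence between two discrete distributions (in nats).
def kl_divergence(p, q):
    """D_KL(P || Q); terms with p_i = 0 contribute zero by convention."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.7, 0.3]   # "true" distribution (hypothetical)
q = [0.5, 0.5]   # approximating distribution
print(round(kl_divergence(p, q), 4))  # 0.0823
print(round(kl_divergence(q, p), 4))  # 0.0872 -- not symmetric
print(kl_divergence(p, p))            # 0.0 when the distributions match
```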

Mutual Information

$$I(X; Y) = H(X) - H(X \mid Y) = D_{KL}\left(P(X, Y) \| P(X)P(Y)\right)$$

Mutual information quantifies how much knowing $Y$ reduces uncertainty about $X$. This is used to measure how much information a signal (polling data, economic indicator) carries about a market outcome, and therefore to assess whether incorporating that signal into a trading model is worthwhile.

Connection to Scoring Rules

The logarithmic scoring rule $S_{\log}(p, o) = \log p_o$ is directly related to entropy. When the true distribution is $P$ and the forecaster reports $Q$, the expected score is:

$$\mathbb{E}_P[S_{\log}(Q, O)] = -H(P) - D_{KL}(P \| Q)$$

Because KL divergence is non-negative, the expected log score is maximized when the forecaster reports $Q = P$. This proves that the logarithmic scoring rule is strictly proper, which is a central result used in Chapters 3 and 4.


A.8 Key Proofs and Derivations

A.8.1 Properness of the Brier Score

The Brier score for a binary event is:

$$BS(p, o) = (p - o)^2$$

where $p$ is the reported probability and $o \in \{0, 1\}$ is the outcome. The expected Brier score when the true probability is $q$ is:

$$\mathbb{E}_q[BS(p, O)] = q(p - 1)^2 + (1-q)(p - 0)^2 = qp^2 - 2qp + q + (1-q)p^2$$

$$= p^2 - 2qp + q$$

To minimize, take the derivative with respect to $p$ and set it to zero:

$$\frac{d}{dp}\mathbb{E}_q[BS] = 2p - 2q = 0 \implies p = q$$

The second derivative is $2 > 0$, confirming a minimum. Therefore the Brier score is minimized (optimal) when the reported probability equals the true probability, proving it is a strictly proper scoring rule. Forecasters in prediction markets that use Brier scoring have no incentive to misreport their beliefs.
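The calculus argument can be mirrored numerically: fix a true probability $q$ (an arbitrary example value below) and search a grid of reported probabilities for the one minimizing the expected Brier score.

```python
# Numerical check that the expected Brier score is minimized at p = q.
def expected_brier(p, q):
    """E_q[(p - O)^2] for a binary outcome O with P(O = 1) = q."""
    return q * (p - 1) ** 2 + (1 - q) * p ** 2

q = 0.65  # hypothetical true probability
grid = [i / 1000 for i in range(1001)]
best_p = min(grid, key=lambda p: expected_brier(p, q))
print(best_p)  # 0.65
```

The grid minimizer coincides with the true probability, as the derivation above predicts.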

A.8.2 Kelly Criterion Derivation

Suppose a bettor has wealth $W$ and faces a binary bet that pays even money ($b = 1$). The true probability of winning is $p$ and of losing is $q = 1-p$. The bettor wagers a fraction $f$ of wealth. After one bet, wealth becomes:

  • $W(1 + f)$ with probability $p$
  • $W(1 - f)$ with probability $q$

To maximize long-run geometric growth rate, we maximize the expected log-wealth:

$$G(f) = p \log(1 + f) + q \log(1 - f)$$

Taking the derivative:

$$G'(f) = \frac{p}{1+f} - \frac{q}{1-f} = 0$$

Solving:

$$p(1-f) = q(1+f)$$

$$p - pf = q + qf$$

$$p - q = f(p + q) = f$$

Therefore the optimal fraction is:

$$f^* = p - q = 2p - 1$$

For general odds $b$ (net payout per unit wagered), the Kelly fraction generalizes to:

$$f^* = \frac{bp - q}{b} = p - \frac{q}{b} = \frac{bp - (1-p)}{b}$$

The Kelly criterion maximizes the expected logarithm of wealth, which is equivalent to maximizing the long-run compound growth rate. This is why "full Kelly" produces the fastest asymptotic wealth growth but can entail large drawdowns. The "fractional Kelly" approach (using $\alpha f^*$ for $\alpha \in (0, 1)$) trades some growth rate for reduced variance, as discussed in Chapter 5.
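The general-odds formula translates directly into code; a minimal sketch (the probabilities and odds below are example values):

```python
# Kelly fraction for general net odds b; b = 1 is the even-money case.
def kelly_fraction(p, b=1.0):
    """f* = (b*p - (1 - p)) / b. A negative result means the bet is
    unfavorable and (absent shorting) no stake should be placed."""
    return (b * p - (1 - p)) / b

print(round(kelly_fraction(0.6), 4))          # 0.2  (even money: f* = 2p - 1)
print(round(kelly_fraction(0.6, b=2.0), 4))   # 0.4  ((2*0.6 - 0.4) / 2)
print(round(0.5 * kelly_fraction(0.6), 4))    # 0.1  (half-Kelly)
```

The last line shows the fractional Kelly variant: scaling $f^*$ by $\alpha = 0.5$ halves the stake in exchange for lower variance.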

A.8.3 LMSR Cost Function Derivation

Hanson's Logarithmic Market Scoring Rule (LMSR) defines the cost function for $n$ outcomes as:

$$C(\mathbf{q}) = b \log\left(\sum_{i=1}^{n} e^{q_i / b}\right)$$

where $q_i$ is the total number of shares of outcome $i$ that have been purchased and $b > 0$ is the liquidity parameter.

Price function. The price (instantaneous cost) of buying a marginal share of outcome $j$ is:

$$p_j = \frac{\partial C}{\partial q_j} = \frac{e^{q_j / b}}{\sum_{i=1}^{n} e^{q_i / b}}$$

Note that this is exactly the softmax function. This ensures that prices are always in $(0, 1)$ and sum to 1, so they can be interpreted as probabilities.

Cost of a trade. A trader who changes the share vector from $\mathbf{q}$ to $\mathbf{q}'$ pays:

$$\text{Cost} = C(\mathbf{q}') - C(\mathbf{q})$$

Bounded loss. Suppose the market opens at $\mathbf{q} = \mathbf{0}$, so $C(\mathbf{0}) = b \log n$. If outcome $j$ occurs, the market maker pays out $q_j$ and has collected $C(\mathbf{q}) - C(\mathbf{0})$ in trading revenue, so its net loss is:

$$q_j - \left[C(\mathbf{q}) - C(\mathbf{0})\right] \leq b \log n$$

where the bound follows because $C(\mathbf{q}) \geq \max_i q_i \geq q_j$.

This bounded loss property is what makes automated market makers economically viable. The parameter $b$ controls the tradeoff: larger $b$ means greater liquidity (lower price impact per trade) but higher potential loss.

Convexity proof. The Hessian of $C$ is:

$$\frac{\partial^2 C}{\partial q_j \partial q_k} = \frac{1}{b}\left(\delta_{jk} p_j - p_j p_k\right)$$

This is $\frac{1}{b}(\text{diag}(\mathbf{p}) - \mathbf{p}\mathbf{p}^T)$, which is the covariance matrix of a categorical distribution scaled by $1/b$, and is therefore positive semidefinite. Hence $C$ is convex, guaranteeing that there is no arbitrage in the pricing mechanism.
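The cost and price functions derived above are a direct transcription into code. The sketch below uses an arbitrary liquidity parameter $b = 100$ and a hypothetical trade in a two-outcome market:

```python
import math

# LMSR cost function and softmax price function, as derived above.
def lmsr_cost(q, b):
    """C(q) = b * log(sum_i exp(q_i / b))."""
    return b * math.log(sum(math.exp(qi / b) for qi in q))

def lmsr_prices(q, b):
    """p_j = exp(q_j / b) / sum_i exp(q_i / b): the softmax of q / b."""
    weights = [math.exp(qi / b) for qi in q]
    total = sum(weights)
    return [w / total for w in weights]

b = 100.0
q0 = [0.0, 0.0]    # fresh two-outcome market: prices start at 0.5 each
q1 = [10.0, 0.0]   # after a trader buys 10 shares of outcome 1

print(lmsr_prices(q0, b))                        # [0.5, 0.5]
trade_cost = lmsr_cost(q1, b) - lmsr_cost(q0, b)
print(round(trade_cost, 4))                      # 5.1249
print([round(p, 4) for p in lmsr_prices(q1, b)]) # [0.525, 0.475]
```

Note the trade costs slightly more than 5 (the value of 10 shares at the opening price of 0.5) because the price rises as the trader buys, and the post-trade prices still sum to 1.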

A.8.4 Log Score Properness (Continuous Case)

For a density forecast $q(x)$ when the true density is $p(x)$, the expected log score is:

$$\mathbb{E}_p[\log q(X)] = \int p(x) \log q(x) \, dx$$

We can write:

$$\mathbb{E}_p[\log q(X)] = \int p(x) \log p(x) \, dx - \int p(x) \log \frac{p(x)}{q(x)} \, dx$$

$$= -H(p) - D_{KL}(p \| q)$$

Since $D_{KL}(p \| q) \geq 0$ with equality if and only if $q = p$ almost everywhere, the expected log score is uniquely maximized at $q = p$. This proves strict properness: a forecaster maximizes their expected log score by reporting their true beliefs.

A.8.5 Martingale Property of Market Prices

Under the efficient market hypothesis, prediction market prices form a martingale with respect to the public information filtration. Formally, if $\pi_t$ is the market price at time $t$:

$$\mathbb{E}[\pi_{t+1} \mid \mathcal{F}_t] = \pi_t$$

Proof sketch: If $\mathbb{E}[\pi_{t+1} \mid \mathcal{F}_t] > \pi_t$, a risk-neutral trader could profit in expectation by buying at $\pi_t$, driving the price up until the inequality vanishes. The symmetric argument holds for the case where the conditional expectation is less than the current price. In equilibrium, no predictable profit opportunity exists and the martingale condition holds.

This result is important because it implies that in an efficient prediction market, price changes are unpredictable. Any predictable pattern in prices represents an exploitable inefficiency, which is exactly what the trading strategies in Part III of this book attempt to identify and profit from.


Summary. This appendix collects the mathematical prerequisites for the entire book. Readers should be comfortable with the material in Sections A.2 through A.4 before beginning Part I, with Section A.5 and A.6 before Part III's machine learning chapters, and with Section A.7 for a deeper understanding of scoring rules. The proofs in Section A.8 provide rigorous foundations for the most important theoretical results invoked in the main text.