Chapter 20 — Key Takeaways

Discrete Probability: Outcomes, Random Variables, Expectation. One-page review card. Reread before the exam.


The model in four objects

Object What it is Symbol
Sample space set of all mutually-exclusive, exhaustive outcomes $S$
Outcome one element of $S$ (exactly one occurs) $s \in S$
Event a subset of outcomes; occurs when the outcome lies in it $E \subseteq S$
Probability function $P(E) = \sum_{s\in E} p(s)$, with $p(s)\ge 0$, $\sum_s p(s)=1$ $P$

Events are sets → use Chapter 8 operations: $E\cup F$ ("or"), $E\cap F$ ("and"), $\overline{E}$ ("not"), $E\cap F=\emptyset$ ("mutually exclusive").


Computing $P(E)$

Model Formula When it applies
Equally-likely (Laplace) $P(E)=\dfrac{\lvert E\rvert}{\lvert S\rvert}=\dfrac{\#\text{favorable}}{\#\text{possible}}$ only if all outcomes equally likely
General discrete $P(E)=\sum_{s\in E} p(s)$ always

Laplace turns every probability into two counting problems (Ch. 15–16).


Axioms & their consequences

Rule Statement
Non-negativity $P(E)\ge 0$
Normalization $P(S)=1$
Additivity (disjoint) $E\cap F=\emptyset \Rightarrow P(E\cup F)=P(E)+P(F)$
Complement $P(\overline{E})=1-P(E)$
Impossible event $P(\emptyset)=0$
Monotonicity $E\subseteq F \Rightarrow P(E)\le P(F)$
Inclusion–exclusion $P(E\cup F)=P(E)+P(F)-P(E\cap F)$
Union bound $P\!\left(\bigcup_i E_i\right)\le \sum_i P(E_i)$

Complement first: for "at least one…", compute $1-P(\text{none})$. Birthday: $P(\text{share})=1-\dfrac{P(365,n)}{365^n}$ → crosses $1/2$ at $n=23$. Same math = hash collisions (collisions likely at $n\approx\sqrt{m}$).


Random variables

Term Definition
Random variable a function $X\colon S\to\mathbb{R}$ (not random, not a variable)
Event $\{X=a\}$ $\{s\in S: X(s)=a\}$; has probability $P(X=a)$
Distribution the values of $X$ with their probabilities (must sum to 1)
Indicator $\mathbf{1}_A$ $1$ if $s\in A$, else $0$; switches on when $A$ occurs

Build a distribution: apply $X$ to every outcome, group by value (Counter).


Expected value

$$E[X]=\sum_{s\in S} X(s)\,p(s)=\sum_a a\,P(X=a)\quad(\text{long-run average / balance point})$$

Fact Statement
Indicator expectation $E[\mathbf{1}_A]=P(A)$
Linearity $E[aX+bY]=aE[X]+bE[Y]$, and $E[\sum_i X_i]=\sum_i E[X_i]$
Independence needed? NO — linearity holds even for dependent variables

The one move to remember: to find a hard $E[X]$, write $X=\sum_i \mathbf{1}_{A_i}$ (a sum of indicators), then $E[X]=\sum_i P(A_i)$. No joint distribution required.

Linearity worked examples

Problem Setup Answer
Hat-check (fixed points of random perm.) $X_i=\mathbf{1}[\text{own hat}]$, $E[X_i]=1/n$ $E[X]=1$ (any $n$)
Heads in $n$ fair flips $E[X_i]=1/2$ $E[X]=n/2$
Empty hash buckets ($n$ keys, $m$ buckets) $P(\text{bucket empty})=(1-\tfrac1m)^n$ $m(1-\tfrac1m)^n\approx m/e$ at $n=m$

Variance & standard deviation

$$\operatorname{Var}(X)=E[(X-\mu)^2]=\underbrace{E[X^2]-(E[X])^2}_{\textbf{use this}},\qquad \sigma=\sqrt{\operatorname{Var}(X)}$$

  • Measures spread around the mean; squaring removes signs and punishes large deviations.
  • $\operatorname{Var}$ is in squared units; report $\sigma$ (original units).
  • $\operatorname{Var}\ge 0$ always; $\operatorname{Var}=0$ iff $X$ is constant.
  • Fair die: $E[X]=\tfrac72$, $E[X^2]=\tfrac{91}{6}$, $\operatorname{Var}=\tfrac{35}{12}\approx 2.92$, $\sigma\approx 1.71$.

Simulation vs. proof (theme 4)

Monte-Carlo simulation Proof
Gives an estimate $\hat p=\text{hits}/k$ the exact value
Cost / error cheap; error $\sim 1/\sqrt{k}$ (×100 trials per extra digit) one derivation, then certain
Guarantee none (seed-dependent) settles all cases
Basis law of large numbers axioms, counting, linearity

Rule: simulate to find/check; prove to settle. "Ran it a million times" ≠ proof.


The probabilistic method

To prove an object with property $Q$ exists: pick a random object and show $$P(\text{random object has } Q) > 0 \;\Rightarrow\; \text{such an object exists.}$$ (Proof by contradiction: if none existed, that probability would be $0$.)

Recipe (two-coloring template): 1. Randomize (color each item by a fair coin). 2. Bound $P(\text{one bad event})$ (one block monochromatic: $2^{1-k}$). 3. Union bound over the $m$ bad events: $P(\text{any bad})\le m\,2^{1-k}$. 4. If that bound $<1$ (here $m<2^{k-1}$), then $P(\text{good})>0$ → a good object exists. $\blacksquare$

Averaging form: if $E[X]=\mu$, some outcome has $X\ge\mu$ and some has $X\le\mu$.


Common pitfalls

Pitfall Fix
Using $P=\lvert E\rvert/\lvert S\rvert$ on a non-equally-likely space (e.g. $\{$0,1,2 heads$\}$) Build an equally-likely space first ($\{HH,HT,TH,TT\}$).
Computing "at least one" directly Use the complement: $1-P(\text{none})$.
$E[XY]=E[X]E[Y]$ in general False unless independent (Ch. 21); linearity is for sums, not products.
$E[X^2]=(E[X])^2$ False — the gap is $\operatorname{Var}(X)$.
"Simulation said so" = proof Evidence only; derive the exact value.
Forgetting linearity needs no independence It always holds — exploit it.

Toolkit additions (dmtoolkit/probability.py)

Function Returns Idea
simulate(trial_fn, k) empirical $\hat p$ = fraction of $k$ trials where trial_fn() is True law of large numbers (estimate)
expected_value(rv, space, weight=None) exact $E[\text{rv}]$ over a finite space (uniform by default) $\sum_s X(s)p(s)$ with Fraction (exact)

(Chapter 21 adds bayes(pa, pb_given_a, pb).) simulateexpected_value of an indicator — the two sit on opposite sides of the estimate-vs-exact line.


The two ideas to remember

  1. A random variable is a function; its expectation is a weighted average — and expectation is linear, independence or not. Write hard expectations as sums of indicators.
  2. Positive probability proves existence. The probabilistic method turns a counting/existence question into one expectation or union-bound calculation.