Key Takeaways: Probability — The Foundation of Inference

One-Sentence Summary

Probability provides the mathematical language for quantifying uncertainty, using a small set of rules — complement, addition, and multiplication — that underpin every statistical inference, AI prediction, and data-driven decision that follows.

Core Concepts at a Glance

| Concept | Definition | Why It Matters |
|---|---|---|
| Probability | A number between 0 and 1 measuring how likely an event is to occur | The foundation for all statistical inference |
| Three approaches | Classical (equally likely outcomes), relative frequency (data-driven), subjective (expert judgment) | Different situations call for different approaches; all follow the same rules |
| Law of large numbers | As trials increase, observed proportions approach the true probability | Why more data gives better estimates; why casinos always win long-term |
| Complement rule | P(not A) = 1 − P(A) | Turns hard "at least one" problems into easy "none" problems |
| Contingency tables | Two-way tables showing frequencies for combinations of categorical variables | The bridge from data to probability; the format for joint and marginal probabilities |

The Three Approaches to Probability

| Approach | Formula / Method | Best For | Example |
|---|---|---|---|
| Classical | $P(A) = \frac{\text{favorable outcomes}}{\text{total equally likely outcomes}}$ | Games of chance, simple random processes | Rolling dice, drawing cards |
| Relative Frequency | $P(A) \approx \frac{\text{times A occurred}}{\text{total trials}}$ | Situations with historical data | Shooting percentages, defect rates |
| Subjective | Expert assessment based on evidence and judgment | One-time events, complex predictions | Election forecasts, outbreak risk |
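The first two approaches can be compared directly in a few lines: the classical answer for a fair die is exact, while the relative-frequency answer converges to it as the number of simulated rolls grows (a minimal sketch with simulated, not real, data):

```python
import numpy as np

rng = np.random.default_rng(42)

# Classical approach: 1 favorable outcome out of 6 equally likely outcomes
p_classical = 1 / 6

# Relative-frequency approach: estimate the same probability from many trials
rolls = rng.integers(1, 7, size=100_000)   # simulated fair-die rolls
p_freq = np.mean(rolls == 3)               # proportion of 3s observed

# The two estimates agree to within sampling error
print(p_classical, p_freq)
```

With 100,000 rolls the relative-frequency estimate typically lands within about ±0.003 of 1/6, illustrating why the approaches obey the same rules.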

Probability Rules Quick Reference

Rule 1: Boundaries

$$0 \leq P(A) \leq 1$$

  • P(A) = 0 means impossible
  • P(A) = 1 means certain

Rule 2: All Outcomes Sum to 1

$$\sum P(\text{all outcomes}) = 1$$

Rule 3: Complement

$$\boxed{P(\text{not } A) = 1 - P(A)}$$

When to use: When the complement ("none," "not A") is easier to calculate than the event itself — especially for "at least one" problems.
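A classic "at least one" calculation shows the rule at work (a standard illustration, not an example from the chapter): the chance of at least one six in four rolls of a fair die is much easier via the complement than by enumerating every way a six can appear.

```python
# P(at least one six in 4 rolls) = 1 - P(no six in 4 rolls)
p_no_six_per_roll = 5 / 6
p_at_least_one_six = 1 - p_no_six_per_roll ** 4   # ≈ 0.518

print(p_at_least_one_six)
```

Direct computation would require summing over all outcomes with one, two, three, or four sixes; the complement collapses it to one subtraction.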

Rule 4: Addition Rule

Mutually exclusive events (no overlap): $$P(A \text{ or } B) = P(A) + P(B)$$

General (any events): $$\boxed{P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)}$$

When to use: Any time you need P(A or B). Always subtract the overlap unless you know the events are mutually exclusive.
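A quick card-deck check (a standard illustration, not from the chapter) shows why the overlap must be subtracted: hearts and face cards share three cards, which would otherwise be double-counted.

```python
# P(heart or face card) in a standard 52-card deck
p_heart = 13 / 52
p_face = 12 / 52
p_heart_and_face = 3 / 52      # jack, queen, king of hearts (the overlap)

p_heart_or_face = p_heart + p_face - p_heart_and_face   # 22/52

print(p_heart_or_face)
```

Forgetting the subtraction gives 25/52; direct counting (13 hearts + 9 non-heart face cards = 22) confirms the general addition rule.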

Rule 5: Multiplication Rule (Independent Events)

$$\boxed{P(A \text{ and } B) = P(A) \times P(B)}$$

When to use: When events are independent (one doesn't affect the other). Always verify independence before using this rule.
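A coin flip and a die roll are physically independent, so the multiplication rule applies; a simulation (illustrative sketch) confirms the theoretical product:

```python
import numpy as np

rng = np.random.default_rng(0)

# Theoretical: P(heads AND six) = P(heads) * P(six), valid because
# the coin and the die cannot influence each other
p_theory = (1 / 2) * (1 / 6)   # 1/12

# Simulation check
n = 200_000
coins = rng.integers(0, 2, size=n)   # 1 = heads
dice = rng.integers(1, 7, size=n)    # fair die
p_sim = np.mean((coins == 1) & (dice == 6))

print(p_theory, p_sim)
```

If the events were dependent, the simulated joint proportion would drift away from the product of the marginals — which is exactly the check to run before trusting this rule.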

Decision Guide: Which Rule Do I Need?

What probability are you calculating?
│
├── P(not A)?
│   └── COMPLEMENT RULE: P(not A) = 1 − P(A)
│
├── P(A OR B)?
│   ├── Are A and B mutually exclusive?
│   │   ├── YES → P(A or B) = P(A) + P(B)
│   │   └── NO  → P(A or B) = P(A) + P(B) − P(A and B)
│   └── TIP: If you have a contingency table, count directly
│       and divide by the grand total to verify
│
├── P(A AND B)?
│   ├── Are A and B independent?
│   │   ├── YES → P(A and B) = P(A) × P(B)
│   │   └── NO  → Need conditional probability (Ch. 9)
│   └── TIP: In a contingency table, this is cell ÷ grand total
│
└── P(at least one)?
    └── COMPLEMENT TRICK:
        P(at least one) = 1 − P(none)
        Often combine with multiplication rule for P(none)

Key Distinctions

Mutually Exclusive vs. Independent

| | Mutually Exclusive | Independent |
|---|---|---|
| Meaning | A and B CANNOT both happen | Knowing A doesn't change P(B) |
| P(A and B) | = 0 | = P(A) × P(B) |
| Asks | "Can these co-occur?" | "Do these influence each other?" |
| If both have P > 0 | They CANNOT be independent | They might or might not be mutually exclusive |
| Example | Rolling 2 and 5 on one die | Rolling a 2 on one die, flipping heads on a coin |

Critical point: Mutually exclusive events (with non-zero probability) are ALWAYS dependent. Knowing A happened tells you B didn't — that's information, which means dependence.
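The critical point can be verified with the die example from the table above (a three-line sketch): for mutually exclusive events, the joint probability is 0, which can never equal the product of two positive probabilities.

```python
# A = roll a 2, B = roll a 5, on a single roll of one die
p_a, p_b = 1 / 6, 1 / 6
p_a_and_b = 0.0                 # mutually exclusive: both cannot happen

# Independence would require P(A and B) == P(A) * P(B)
is_independent = (p_a_and_b == p_a * p_b)   # False: 0 != 1/36
print(is_independent)
```

Since 0 ≠ 1/36, the events fail the independence test — knowing a 2 was rolled guarantees a 5 was not.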

Contingency Table Probability Cheat Sheet

Given a contingency table with two categorical variables:

| | B | not B | Total |
|---|---|---|---|
| A | a | b | a+b |
| not A | c | d | c+d |
| Total | a+c | b+d | n |

| Probability | Formula | Name |
|---|---|---|
| P(A) | (a+b) / n | Marginal probability |
| P(B) | (a+c) / n | Marginal probability |
| P(A and B) | a / n | Joint probability |
| P(A or B) | (a+b+c) / n = P(A)+P(B)−P(A and B) | Addition rule |
| P(not A) | (c+d) / n = 1 − P(A) | Complement rule |
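The cheat-sheet formulas can be checked on a tiny dataset with `pd.crosstab` (the variable names and counts here are hypothetical, chosen only to make the arithmetic easy to follow):

```python
import pandas as pd

# Hypothetical data: 10 observations of two yes/no variables
df = pd.DataFrame({
    'A': ['yes'] * 6 + ['no'] * 4,
    'B': ['yes', 'yes', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no', 'no'],
})

counts = pd.crosstab(df['A'], df['B'])
n = counts.to_numpy().sum()                     # grand total

p_a = counts.loc['yes'].sum() / n               # marginal: (a+b) / n
p_b = counts['yes'].sum() / n                   # marginal: (a+c) / n
p_a_and_b = counts.loc['yes', 'yes'] / n        # joint: a / n
p_a_or_b = p_a + p_b - p_a_and_b                # addition rule

print(p_a, p_b, p_a_and_b, p_a_or_b)
```

Counting directly from the table (7 of 10 rows have A = yes or B = yes) reproduces the addition-rule answer, which is the verification tip from the decision guide.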

Common Misconceptions

| Misconception | Reality |
|---|---|
| "The coin is due for tails after 5 heads" | Each flip is independent; the coin has no memory (gambler's fallacy) |
| "Two remaining options means 50/50" | Only if both outcomes are equally likely (the Monty Hall trap) |
| "More data always means exact probabilities" | More data gives better estimates; the true probability may never be known exactly |
| "Mutually exclusive means independent" | The opposite — mutually exclusive events (with P > 0) are always dependent |
| "Probability predicts individual events" | Probability describes long-run patterns, not individual outcomes |

The Law of Large Numbers — What It Says and Doesn't Say

| It DOES Say | It Does NOT Say |
|---|---|
| Proportions converge to the true probability as n increases | You'll get exactly 50% heads in any specific set of flips |
| More data → more reliable estimates | The universe "corrects" for streaks |
| Long-run averages are predictable | Individual events are predictable |
| Casinos always win over millions of bets | Any particular gambler will lose |
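Both columns of the table can be seen in one simulation (illustrative sketch): the running proportion of heads wanders early on but settles near 0.5 as the number of flips grows — without any individual flip being predictable.

```python
import numpy as np

rng = np.random.default_rng(1)

flips = rng.integers(0, 2, size=100_000)        # 1 = heads
running_prop = np.cumsum(flips) / np.arange(1, len(flips) + 1)

early = running_prop[9]      # proportion after only 10 flips: can be far from 0.5
late = running_prop[-1]      # proportion after 100,000 flips: very close to 0.5

print(early, late)
```

The early estimate can easily be 0.3 or 0.7; the late one is reliably within a fraction of a percent of 0.5. Convergence of the average, not correction of streaks, is the whole content of the law.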

Python Quick Reference

```python
import numpy as np
import pandas as pd

# --- Simulation ---
np.random.seed(42)

# Coin flip simulation (1 = heads, 0 = tails)
flips = np.random.choice([0, 1], size=10000)
prop_heads = np.mean(flips)   # Proportion of heads

# Die roll simulation
rolls = np.random.randint(1, 7, size=10000)
prop_six = np.mean(rolls == 6)  # Proportion of sixes

# --- Contingency Tables ---
# Small hypothetical DataFrame so the calls below are runnable as-is
df = pd.DataFrame({'var1': ['a', 'a', 'b', 'b'],
                   'var2': ['x', 'y', 'x', 'x']})

# Create a contingency table from the DataFrame
contingency = pd.crosstab(df['var1'], df['var2'], margins=True)

# Joint probabilities (all cells / grand total)
joint_probs = pd.crosstab(df['var1'], df['var2'],
                          margins=True, normalize='all')

# Row-wise proportions (conditional probabilities preview)
row_probs = pd.crosstab(df['var1'], df['var2'],
                        margins=True, normalize='index')

# --- Counting ---
from math import comb, factorial
comb(23, 2)     # "23 choose 2" = 253 (number of pairs)
factorial(5)    # 5! = 120
```

Key Terms

| Term | Definition |
|---|---|
| Probability | A number between 0 and 1 measuring how likely an event is to occur |
| Event | A collection of one or more outcomes of interest |
| Sample space | The set of all possible outcomes of a random process |
| Outcome | A single result of a random process |
| Classical probability | P(A) = favorable outcomes / total equally likely outcomes |
| Relative frequency | The proportion of times an event occurs over many trials |
| Law of large numbers | As trials increase, the relative frequency approaches the true probability |
| Addition rule | P(A or B) = P(A) + P(B) − P(A and B) |
| Multiplication rule | P(A and B) = P(A) × P(B) for independent events |
| Mutually exclusive | Events that cannot both occur simultaneously |
| Independent events | Events where knowing one occurred doesn't change the probability of the other |
| Complement | The event that A does NOT occur; P(not A) = 1 − P(A) |
| Contingency table | A two-way table showing frequencies for combinations of two categorical variables |
| Joint probability | The probability that two events occur simultaneously; cell count / grand total |
| Gambler's fallacy | The mistaken belief that past random events influence future independent events |

The One Thing to Remember

If you forget everything else from this chapter, remember this:

Probability is the language of uncertainty — and uncertainty is not a flaw. It's the raw material of every statistical inference you'll ever make. The complement, addition, and multiplication rules are your entire toolkit for basic probability. Master them, and you're ready for everything that follows: conditional probability, distributions, sampling, confidence intervals, hypothesis tests. It all starts here.