Key Takeaways: Conditional Probability and Bayes' Theorem

One-Sentence Summary

Bayes' theorem is the mathematical engine for updating beliefs with evidence — revealing why even "accurate" tests produce false alarms, why P(A|B) is not the same as P(B|A), and why every AI system from spam filters to self-driving cars runs on conditional probability.

Core Concepts at a Glance

| Concept | Definition | Why It Matters |
|---|---|---|
| Conditional probability | $P(A \mid B)$ — the probability of A given that B is true | Lets you reason about probability with new information |
| Bayes' theorem | Updates a prior probability using evidence to produce a posterior probability | The foundation of AI, medical testing, and rational reasoning under uncertainty |
| $P(A \mid B) \ne P(B \mid A)$ | The probability of A given B is almost never equal to the probability of B given A | Confusing them is called the prosecutor's fallacy and has real-world consequences |
| Base rate fallacy | Ignoring the prior probability (prevalence) when evaluating evidence | Even a 99% accurate test is unreliable when the condition is very rare |
| Natural frequency approach | Restating probability problems as counts (e.g., "out of 10,000 people...") | More intuitive than formulas; produces the same answer with less error |

Bayes' Theorem — Three Ways

1. The Formula

$$\boxed{P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B \mid A) \cdot P(A) + P(B \mid \text{not } A) \cdot P(\text{not } A)}}$$

2. The Plain-English Version

$$\text{Posterior} = \frac{\text{Likelihood} \times \text{Prior}}{\text{Total probability of the evidence}}$$

| Component | Symbol | Question It Answers |
|---|---|---|
| Prior | $P(A)$ | "Before I see evidence, how likely is A?" |
| Likelihood | $P(B \mid A)$ | "If A is true, how likely is this evidence?" |
| False alarm rate | $P(B \mid \text{not } A)$ | "If A is false, how likely is this evidence?" |
| Posterior | $P(A \mid B)$ | "After seeing evidence, how likely is A?" |

3. The Natural Frequency Approach

  1. Start with a large round number (e.g., 10,000 people)
  2. Split by the base rate (how many have vs. don't have the condition)
  3. Apply the test to each group (true positives, false positives, etc.)
  4. Count: "Of everyone who tested positive, how many actually have the condition?"
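The four steps above can be sketched in a few lines of Python, using assumed numbers (a 1% base rate with 99% sensitivity and 99% specificity):

```python
# Natural frequency walk-through with assumed numbers:
# 1% base rate, 99% sensitivity, 99% specificity.
population = 10_000

# Step 2: split by the base rate
have = round(population * 0.01)        # 100 people have the condition
dont = population - have               # 9,900 do not

# Step 3: apply the test to each group
true_positives = round(have * 0.99)    # 99 of the sick test positive
false_positives = round(dont * 0.01)   # 99 of the healthy also test positive

# Step 4: of everyone who tested positive, how many actually have it?
ppv = true_positives / (true_positives + false_positives)
print(f"{true_positives} of {true_positives + false_positives} positives are real (PPV = {ppv:.0%})")
```

With these numbers, 99 of 198 positives are real, so a positive result means only a 50% chance of actually having the condition.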

Conditional Probability Quick Reference

From a Contingency Table

Given this table:

| | B | not B | Total |
|---|---|---|---|
| A | $a$ | $b$ | $a+b$ |
| not A | $c$ | $d$ | $c+d$ |
| Total | $a+c$ | $b+d$ | $n$ |

| Probability | Formula | What It Means |
|---|---|---|
| $P(A \mid B)$ | $\frac{a}{a+c}$ | Among those with B, what fraction also has A? |
| $P(B \mid A)$ | $\frac{a}{a+b}$ | Among those with A, what fraction also has B? |
| $P(A \mid \text{not } B)$ | $\frac{b}{b+d}$ | Among those without B, what fraction has A? |
| $P(\text{not } A \mid B)$ | $\frac{c}{a+c}$ | Among those with B, what fraction doesn't have A? |

Key insight: For conditional probability, the "given" condition becomes the denominator. You restrict your universe to the column or row of the given condition.
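A minimal sketch of reading those quantities off a table, with assumed toy counts in the same $a, b, c, d$ layout:

```python
import numpy as np

# Toy counts (assumed) in the same layout as the table above:
#            B   not B
table = np.array([[30, 20],    # A:     a, b
                  [10, 40]])   # not A: c, d
a, b = table[0]
c, d = table[1]

p_A_given_B = a / (a + c)       # restrict to the B column -> 0.75
p_B_given_A = a / (a + b)       # restrict to the A row    -> 0.60
p_notA_given_B = c / (a + c)    # complement within B      -> 0.25
print(p_A_given_B, p_B_given_A, p_notA_given_B)
```

Note how the two conditionals share the numerator $a$ but divide by different totals, which is exactly why $P(A \mid B) \ne P(B \mid A)$ in general.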

The General Multiplication Rule

$$P(A \text{ and } B) = P(A) \times P(B \mid A) = P(B) \times P(A \mid B)$$

This is the full version of the multiplication rule from Chapter 8. The independent version ($P(A) \times P(B)$) is a special case where $P(B \mid A) = P(B)$.
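As a quick numeric check, here is the standard card-drawing example (assumed, not from the chapter): drawing two aces without replacement needs the general rule, while drawing with replacement collapses to the independent special case:

```python
# Drawing two cards without replacement: the events are dependent,
# so the general multiplication rule applies.
p_first_ace = 4 / 52
p_second_ace_given_first = 3 / 51            # one ace and one card are gone
p_both_aces = p_first_ace * p_second_ace_given_first    # ~0.0045

# With replacement the draws are independent, so P(B|A) = P(B)
# and the rule collapses to the special case P(A) * P(B).
p_both_with_replacement = (4 / 52) ** 2                 # ~0.0059

print(p_both_aces, p_both_with_replacement)
```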

Medical Testing Vocabulary

| | Has Condition | No Condition |
|---|---|---|
| Test Positive | True Positive (TP) | False Positive (FP) |
| Test Negative | False Negative (FN) | True Negative (TN) |

| Measure | Formula | In Words |
|---|---|---|
| Sensitivity | $\frac{TP}{TP + FN}$ | "If sick, will the test catch it?" |
| Specificity | $\frac{TN}{TN + FP}$ | "If healthy, will the test say so?" |
| PPV | $\frac{TP}{TP + FP}$ | "If positive, am I really sick?" |
| NPV | $\frac{TN}{TN + FN}$ | "If negative, am I really healthy?" |
| False positive rate | $1 - \text{specificity}$ | "If healthy, what's the chance of a false alarm?" |
| False negative rate | $1 - \text{sensitivity}$ | "If sick, what's the chance of being missed?" |

Critical: Sensitivity and specificity are properties of the test. PPV and NPV depend on the test AND the base rate.
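The distinction shows up immediately in code. With assumed counts from a hypothetical screening of 10,000 people (1% prevalence, 99% sensitivity, 99% specificity), sensitivity and specificity look excellent while PPV does not:

```python
# Confusion-matrix counts (assumed) from a hypothetical screening
# of 10,000 people: 1% prevalence, 99% sensitivity, 99% specificity.
TP, FN = 99, 1          # the 100 people with the condition
FP, TN = 99, 9_801      # the 9,900 people without it

sensitivity = TP / (TP + FN)   # property of the test: 0.99
specificity = TN / (TN + FP)   # property of the test: 0.99
ppv = TP / (TP + FP)           # also depends on the base rate: 0.50
npv = TN / (TN + FN)           # ~0.9999
print(sensitivity, specificity, ppv, npv)
```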

Decision Guide: When to Use What

```
What probability question are you asking?
│
├── "Given B happened, what's P(A)?"
│   └── CONDITIONAL PROBABILITY: P(A|B) = P(A and B) / P(B)
│       └── From a table: cell ÷ column or row total
│
├── "I know P(B|A), but I need P(A|B)"
│   └── BAYES' THEOREM: P(A|B) = P(B|A)·P(A) / P(B)
│       └── TIP: Try natural frequencies first
│
├── "Are A and B independent?"
│   └── CHECK: Does P(A|B) = P(A)?
│       ├── YES → Independent
│       └── NO  → Dependent (use general multiplication rule)
│
└── "What's the overall P(B) when there are
     multiple pathways to B?"
    └── LAW OF TOTAL PROBABILITY:
        P(B) = P(B|A)·P(A) + P(B|not A)·P(not A)
        └── TIP: This is the denominator of Bayes' theorem
```
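The law of total probability in the last branch is just a weighted sum over the pathways. A minimal sketch with assumed numbers (1% prior, 99% hit rate, 1% false alarm rate):

```python
# Assumed numbers: 1% prior, 99% hit rate, 1% false alarm rate.
p_A = 0.01                 # prior
p_B_given_A = 0.99         # evidence rate when A is true
p_B_given_notA = 0.01      # evidence rate when A is false

# Sum over both pathways to B
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)   # ~0.0198

# This same quantity is the denominator of Bayes' theorem:
posterior = (p_B_given_A * p_A) / p_B
print(p_B, posterior)
```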

How PPV Changes with Prevalence

(For a test with 99% sensitivity and 99% specificity)

| Prevalence | PPV | False Positives per True Positive |
|---|---|---|
| 10% | 91.7% | 0.09 |
| 1% | 50.0% | 1 |
| 0.1% | 9.0% | 10 |
| 0.01% | 1.0% | 100 |
| 0.001% | 0.1% | 1,000 |

Takeaway: The rarer the condition, the less a positive test means — even with an excellent test.
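The table above can be reproduced in a few lines; the 99%/99% test characteristics come from the parenthetical, while the helper function is ours:

```python
def ppv_at(prevalence, sensitivity=0.99, specificity=0.99):
    """PPV = true positives / all positives, at a given prevalence."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp)

for prev in [0.10, 0.01, 0.001, 0.0001, 0.00001]:
    print(f"prevalence {prev:>8.3%} -> PPV {ppv_at(prev):6.1%}")
```

Holding the test fixed and sliding only the prevalence down collapses the PPV from over 90% to a fraction of a percent, which is the whole point of the table.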

Common Misconceptions

| Misconception | Reality |
|---|---|
| "A 99% accurate test means a positive result is 99% reliable" | PPV depends on the base rate — for rare conditions, most positives are false |
| "P(A\|B) = P(B\|A)" | Almost never true (the prosecutor's fallacy) |
| "A positive test proves I have the disease" | It updates the probability; you need Bayes' theorem to know by how much |
| "Base rates don't matter if the test is good enough" | Base rates always matter — they determine how many false alarms overwhelm true positives |
| "Bayes' theorem is only for medical testing" | It's the engine behind spam filters, recommendation engines, language models, weather forecasts, and all of AI |
| "You can only apply Bayes once" | Yesterday's posterior becomes today's prior — you can update repeatedly with new evidence |

The Prosecutor's Fallacy — A Checklist

When someone presents probabilistic evidence, ask:

  • [ ] "The probability of WHAT given WHAT?" — Identify which direction the conditional goes.
  • [ ] "What's the base rate?" — How common is this event before considering the evidence?
  • [ ] "What's the alternative?" — You must compare competing explanations, not evaluate one in isolation.
  • [ ] "Was independence assumed?" — If probabilities were multiplied, were the events truly independent?

Python Quick Reference

```python
import pandas as pd

# --- Conditional probability from a DataFrame ---
# (assumes df is an existing DataFrame with these columns)

# Method 1: pd.crosstab with normalize='index'
cond_probs = pd.crosstab(df['var1'], df['var2'], normalize='index')

# Method 2: filter to the "given" condition, then calculate
subset = df[df['condition'] == 'value']
p_conditional = (subset['target'] == 'outcome').mean()

# Method 3: groupby
df.groupby('var1')['var2'].value_counts(normalize=True)

# --- Bayes' theorem function ---
def bayes_theorem(prior, likelihood, false_alarm):
    """P(A|B) from P(A), P(B|A), and P(B|not A)."""
    p_evidence = likelihood * prior + false_alarm * (1 - prior)
    return (likelihood * prior) / p_evidence

# Example: disease screening
posterior = bayes_theorem(
    prior=0.001,       # base rate
    likelihood=0.99,   # sensitivity
    false_alarm=0.02   # 1 - specificity
)
# posterior is about 0.047: a positive result still leaves under 5% probability
```
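The misconceptions table notes that yesterday's posterior becomes today's prior. A sketch of repeated updating with the same screening numbers, assuming a second, conditionally independent run of the test:

```python
def bayes_theorem(prior, likelihood, false_alarm):
    """P(A|B) from P(A), P(B|A), and P(B|not A)."""
    p_evidence = likelihood * prior + false_alarm * (1 - prior)
    return (likelihood * prior) / p_evidence

# Assumes the two test results are conditionally independent.
p = 0.001                          # initial base rate
p = bayes_theorem(p, 0.99, 0.02)   # after one positive: ~4.7%
p = bayes_theorem(p, 0.99, 0.02)   # after a second positive: ~71%
print(p)
```

One positive result barely moves a 0.1% prior, but feeding that posterior back in as the prior for a second positive pushes the probability above 70%.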

Key Terms

| Term | Definition |
|---|---|
| Conditional probability | $P(A \mid B)$ — the probability of A given that B is known to be true |
| Bayes' theorem | $P(A \mid B) = P(B \mid A) \cdot P(A) / P(B)$ — the formula for updating probability with evidence |
| Prior probability | The probability of an event before considering new evidence |
| Posterior probability | The updated probability after incorporating new evidence |
| Sensitivity | $P(\text{positive} \mid \text{disease})$ — the test's ability to detect true positives |
| Specificity | $P(\text{negative} \mid \text{no disease})$ — the test's ability to correctly clear healthy people |
| False positive | A positive test result when the condition is absent |
| False negative | A negative test result when the condition is present |
| Tree diagram | A branching visual showing all paths through a multi-step probability problem |
| Prosecutor's fallacy | Confusing $P(\text{evidence} \mid \text{innocent})$ with $P(\text{innocent} \mid \text{evidence})$ |
| Base rate fallacy | Ignoring the prior probability (prevalence) when interpreting evidence |
| Positive predictive value (PPV) | $P(\text{disease} \mid \text{positive test})$ — what a positive result actually means |
| Law of total probability | $P(B) = \sum P(B \mid A_i) \cdot P(A_i)$ — total probability by summing over all pathways |
| Likelihood ratio | $P(B \mid A) / P(B \mid \text{not } A)$ — measures how strongly evidence supports a hypothesis |

The One Thing to Remember

If you forget everything else from this chapter, remember this:

Probability is not fixed — it changes with evidence. Bayes' theorem is the machine that does the updating: it takes what you believed before (the prior), combines it with what you've just learned (the evidence), and produces what you should believe now (the posterior). But the update depends critically on the base rate — the prior probability you started with. Ignoring the base rate is one of the most common and most dangerous reasoning errors humans make. Every time you encounter a probability claim, ask three questions: "The probability of what?" "Given what?" and "Compared to what?" Those three questions will protect you from the prosecutor's fallacy, the base rate fallacy, and a hundred other traps.