Key Takeaways: Chapter 6
Core Concepts
- Probability is a number between 0 and 1 representing likelihood. Two interpretations: frequentist (long-run frequency) and Bayesian (degree of belief). Both are useful in different contexts.
- The gambler's fallacy: Past independent events do not influence future independent events. Coins and dice have no memory.
- The hot hand fallacy: Streaks in random processes do not increase the probability of continuation. (Exception: in skill-influenced processes, there may be real hot hands, but that claim requires different evidence.)
- Base rate neglect: We systematically fail to incorporate background frequencies when updating beliefs. The medical test example shows how dramatically this distorts conclusions.
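The medical test arithmetic can be made concrete with Bayes' theorem. A minimal sketch; the numbers (1% prevalence, 90% sensitivity, 9% false-positive rate) are illustrative assumptions, not figures from the chapter:

```python
# Illustrative numbers (assumptions for this sketch): a condition with 1%
# prevalence, a test with 90% sensitivity and a 9% false-positive rate.
prevalence = 0.01
sensitivity = 0.90          # P(positive | condition)
false_positive_rate = 0.09  # P(positive | no condition)

# P(positive) via the law of total probability
p_positive = (sensitivity * prevalence
              + false_positive_rate * (1 - prevalence))

# Bayes' theorem: P(condition | positive)
p_condition_given_positive = sensitivity * prevalence / p_positive

print(f"P(condition | positive test) = {p_condition_given_positive:.3f}")
```

Despite the positive result, the posterior is only about 9%, because the 1% base rate means false positives from the healthy majority swamp the true positives.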
The Five Foundational Principles
- The probabilities of all possible outcomes must sum to 1
- Independent events: knowing one tells you nothing about the other; dependent events: knowing one changes the other's probability
- Conditional probability: P(A|B) = probability of A given B has occurred
- Multiplication rule (independent events): P(A and B) = P(A) × P(B)
- Addition rule (mutually exclusive events): P(A or B) = P(A) + P(B)
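All five rules can be verified exactly on a small sample space. A sketch using two fair dice (the `p` helper and the dice examples are mine, not the chapter's):

```python
from fractions import Fraction

# Sample space for two fair dice: 36 equally likely outcomes.
space = [(a, b) for a in range(1, 7) for b in range(1, 7)]

def p(event):
    """Exact probability of an event, given as a predicate over outcomes."""
    return Fraction(sum(event(o) for o in space), len(space))

# 1. All outcomes sum to probability 1.
assert sum(Fraction(1, 36) for _ in space) == 1

# 2./4. Independence and the multiplication rule:
# P(both dice show 6) = P(first is 6) * P(second is 6)
assert (p(lambda o: o == (6, 6))
        == p(lambda o: o[0] == 6) * p(lambda o: o[1] == 6))

# 3. Conditional probability: P(A|B) = P(A and B) / P(B)
# P(sum is 12 | first die is 6) = (1/36) / (6/36) = 1/6
p_cond = p(lambda o: sum(o) == 12 and o[0] == 6) / p(lambda o: o[0] == 6)
assert p_cond == Fraction(1, 6)

# 5. Addition rule for mutually exclusive events:
# P(sum is 2 or sum is 12) = P(sum is 2) + P(sum is 12)
assert (p(lambda o: sum(o) in (2, 12))
        == p(lambda o: sum(o) == 2) + p(lambda o: sum(o) == 12))
```

Using `Fraction` keeps the probabilities exact, so each rule is checked as an equality rather than within floating-point tolerance.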
The Reference Class Problem
- To use base rates, you must choose the right reference class.
- Too broad: misleading (not specific enough to your situation)
- Too narrow: unreliable (insufficient data)
- The right reference class: most specific for which you have reliable data
Bayesian Updating
- Start with a prior probability based on base rates
- Update it as evidence arrives
- Strong evidence moves your belief a lot; weak evidence moves it a little
- Evidence that's equally likely under both hypotheses should barely move your belief
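The updating step above can be sketched as one function. The prior and the likelihoods below are illustrative assumptions chosen to show the strong/weak contrast:

```python
def update(prior, p_evidence_given_h, p_evidence_given_not_h):
    """One Bayesian update: return P(H | evidence) from prior P(H)."""
    numer = p_evidence_given_h * prior
    denom = numer + p_evidence_given_not_h * (1 - prior)
    return numer / denom

prior = 0.01  # starting belief from the base rate (assumed value)

# Strong evidence: 10x more likely under H than under not-H.
strong = update(prior, 0.90, 0.09)   # belief jumps to roughly 9%

# Weak evidence: equally likely under both hypotheses.
weak = update(prior, 0.50, 0.50)     # belief stays at the prior
```

When the likelihoods are equal, the numerator and denominator scale by the same factor, so the posterior equals the prior exactly: evidence that cannot discriminate between hypotheses should not move your belief.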
The Denominator Problem
- We're surprised by dramatic events without asking "out of how many?"
- Coincidences are less surprising once you count all the opportunities for a coincidence to occur
- Psychic performances, lucky streaks, and viral content all look less magical when the denominator is visible
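The psychic example yields to simple arithmetic once the denominator is made explicit. The attempt count below is an assumed illustration:

```python
# A "psychic" who calls 10 coin flips correctly looks impressive:
# the chance for any single attempt is 1 in 2**10.
p_one = 1 / 2**10            # ≈ 0.001 per attempt

# But suppose 5,000 people try (an assumed denominator).
attempts = 5000
p_at_least_one = 1 - (1 - p_one) ** attempts

print(f"P(at least one perfect run) = {p_at_least_one:.2f}")  # ≈ 0.99
```

With thousands of attempts, at least one "miraculous" success is nearly guaranteed; the performance is only surprising if you never see the failed attempts.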
Applied to Social Media
- Content analytics are extremely noisy — high variance, small samples
- The multiple comparisons problem means many spurious patterns will appear in any data set
- "Cracking the algorithm" is mostly myth — creators who succeed consistently optimize for content quality and quantity, not metadata timing
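The multiple comparisons problem is easy to quantify. With m independent tests at significance level alpha, the chance of at least one spurious "pattern" is 1 − (1 − alpha)^m; the alpha and test counts below are illustrative assumptions:

```python
# Probability of at least one false positive when running m independent
# tests on pure noise at significance level alpha.
alpha = 0.05
for m in (1, 24, 100):
    p_spurious = 1 - (1 - alpha) ** m
    print(f"{m:3d} tests -> P(>=1 false positive) = {p_spurious:.2f}")
# 1 test -> 0.05, 24 tests -> 0.71, 100 tests -> 0.99
```

Checking 24 posting hours against noisy analytics already gives about a 71% chance of finding a "best hour" that is pure chance, which is why apparent algorithm hacks rarely replicate.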
Practical Toolkit
- Estimate the reference class
- Adjust for conditioning
- Consider the denominator
- Think in distributions, not points
- Distinguish confidence from probability (calibration)