Key Takeaways: Chapter 6


Core Concepts

  • Probability is a number between 0 and 1 representing likelihood. Two interpretations: frequentist (long-run frequency) and Bayesian (degree of belief). Both are useful in different contexts.

  • The gambler's fallacy: Past independent events do not influence future independent events. Coins and dice have no memory.

  • The hot hand fallacy: Streaks in random processes do not increase the probability of continuation. (Exception: in skill-influenced processes, there may be real hot hands — but this requires different evidence.)

  • Base rate neglect: We systematically fail to incorporate background frequencies when updating beliefs. The medical test example shows how dramatically this distorts conclusions.
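The medical test example can be worked through numerically. The figures below (1% prevalence, 99% sensitivity, 95% specificity) are illustrative assumptions, not necessarily the chapter's exact numbers, but they show the characteristic result: even a positive result on an accurate test can leave the probability of disease well below 50% when the base rate is low.

```python
# Illustrative numbers (assumptions, not the chapter's exact figures):
prevalence = 0.01    # base rate: 1% of people have the condition
sensitivity = 0.99   # P(positive test | disease)
specificity = 0.95   # P(negative test | no disease)

# Bayes' rule: P(disease | positive) = P(positive | disease) * P(disease) / P(positive)
p_positive = prevalence * sensitivity + (1 - prevalence) * (1 - specificity)
p_disease_given_positive = prevalence * sensitivity / p_positive

print(f"P(disease | positive) = {p_disease_given_positive:.3f}")  # ≈ 0.167
```

Intuitively: out of 10,000 people, about 99 true positives are swamped by about 495 false positives, so a positive result means roughly a 1-in-6 chance of disease.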


The Five Foundational Principles

  1. The probabilities of all possible outcomes must sum to 1
  2. Independent events: knowing one tells you nothing about the other; dependent events: knowing one changes the other's probability
  3. Conditional probability: P(A|B) = probability of A given B has occurred
  4. Multiplication rule (independent events): P(A and B) = P(A) × P(B)
  5. Addition rule (mutually exclusive events): P(A or B) = P(A) + P(B)
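The five rules can be checked exactly by enumerating a small sample space. This sketch uses two fair dice (my choice of example, not necessarily the chapter's) and exact fractions rather than floats:

```python
from itertools import product
from fractions import Fraction

# Enumerate every outcome of rolling two fair dice; probabilities are exact.
outcomes = list(product(range(1, 7), repeat=2))

def p(event):
    """Exact probability that event(outcome) is true."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

# Rule 1: the probabilities of all possible outcomes sum to 1.
assert sum(p(lambda o, t=t: o == t) for t in outcomes) == 1

# Rule 4 (multiplication, independent dice): P(two sixes) = 1/6 * 1/6.
assert p(lambda o: o == (6, 6)) == Fraction(1, 6) * Fraction(1, 6)

# Rule 5 (addition, mutually exclusive): P(sum is 2 or sum is 12).
assert p(lambda o: sum(o) in (2, 12)) == p(lambda o: sum(o) == 2) + p(lambda o: sum(o) == 12)

# Rule 3 (conditional): P(sum = 8 | first die shows 6) = P(both) / P(first = 6).
p_cond = p(lambda o: sum(o) == 8 and o[0] == 6) / p(lambda o: o[0] == 6)
print(p_cond)  # 1/6
```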

The Reference Class Problem

  • To use base rates, you must choose the right reference class.
  • Too broad: misleading (not specific enough to your situation)
  • Too narrow: unreliable (insufficient data)
  • The right reference class: most specific for which you have reliable data
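The broad-versus-narrow tradeoff can be made quantitative: as the reference class narrows, the sample shrinks, and the estimated base rate gets noisier. A minimal sketch using the standard error of a proportion (the 30% rate and the sample sizes are illustrative assumptions):

```python
# Standard error of an estimated proportion p from a sample of size n.
def std_error(p, n):
    return (p * (1 - p) / n) ** 0.5

# The same underlying 30% base rate, estimated from reference classes
# of decreasing size (broad -> narrow). Sizes are illustrative.
for n in (10_000, 100, 5):
    print(f"n = {n:>6}: estimate 0.30 ± {std_error(0.3, n):.3f}")
```

With 10,000 cases the estimate is pinned down to within about half a percentage point; with 5 cases the uncertainty is around ±20 points, which is why an over-narrow reference class is unreliable.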

Bayesian Updating

  • Start with a prior probability based on base rates
  • Update it as evidence arrives
  • Strong evidence moves your belief a lot; weak evidence moves it a little
  • Evidence that's equally likely under both hypotheses should barely move your belief
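The updating rule above is easiest to compute in odds form: posterior odds = prior odds × likelihood ratio. A minimal sketch (the 1% prior and the likelihood ratios are illustrative assumptions):

```python
def update(prior, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

prior = 0.01  # base rate as the starting belief

print(update(prior, 20))   # strong evidence (20x likelier if true): ≈ 0.168
print(update(prior, 1.2))  # weak evidence: barely moves, ≈ 0.012
print(update(prior, 1.0))  # equally likely under both hypotheses: unchanged, 0.01
```

The third call illustrates the last bullet directly: a likelihood ratio of 1 leaves the prior exactly where it was.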

The Denominator Problem

  • We're surprised by dramatic events without asking "out of how many?"
  • Coincidences are less surprising once you count all the opportunities they had to occur
  • Psychic performances, lucky streaks, and viral content all look less magical when the denominator is visible
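"Out of how many?" can be made concrete with arithmetic. The numbers below are illustrative assumptions: a 1-in-10,000 coincidence feels impossible when it happens to you, but given a million opportunities (people × days, say) it is all but guaranteed to happen to someone:

```python
p_event = 1 / 10_000        # a "shocking" one-in-ten-thousand coincidence
opportunities = 1_000_000   # people x days across which it could occur (illustrative)

# Probability that it happens at least once, and the expected count.
p_at_least_one = 1 - (1 - p_event) ** opportunities
expected = p_event * opportunities

print(f"P(at least one occurrence) = {p_at_least_one:.6f}")  # essentially 1.0
print(f"expected occurrences       = {expected:.0f}")         # about 100
```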

Applied to Social Media

  • Content analytics are extremely noisy — high variance, small samples
  • The multiple comparisons problem means many spurious patterns will appear in any data set
  • "Cracking the algorithm" is mostly myth — creators who succeed consistently optimize for content quality and quantity, not metadata timing

Practical Toolkit

  1. Estimate the reference class
  2. Adjust for conditioning (use probabilities conditional on what you already know)
  3. Consider the denominator
  4. Think in distributions, not points
  5. Distinguish confidence from probability (calibration)