Case Study 2: The Central Limit Theorem in Action — Simulating Sample Means


Tier 3 — Illustrative/Composite Example: This case study is a guided simulation exercise designed to build deep intuition for the Central Limit Theorem. All data is generated within the case study itself. The goal is not to analyze real-world data but to demonstrate one of the most profound results in statistics through hands-on experimentation.


The Setting

Priya has been skeptical.

"You're telling me," she says to her data science instructor, "that if I take averages of random samples from any distribution — even a weird, lopsided one — those averages will form a bell curve? Every time? That sounds like magic."

"It does sound like magic," the instructor replies. "So let's test it. Let's create the weirdest, most non-normal distribution we can think of, and see if the CLT still works."

This case study follows Priya as she builds increasingly aggressive tests of the Central Limit Theorem — starting with simple examples and ending with a distribution so bizarre that she's convinced the theorem must fail. Spoiler: it doesn't.

Experiment 1: The Bimodal Distribution

Priya starts with a distribution that couldn't look less like a bell curve — a bimodal distribution with two sharp peaks and a valley in the middle.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

np.random.seed(42)

# Create a bimodal population: half from N(20, 3), half from N(80, 3)
population = np.concatenate([
    np.random.normal(20, 3, 500000),
    np.random.normal(80, 3, 500000)
])
np.random.shuffle(population)

pop_mean = np.mean(population)
pop_std = np.std(population)

print(f"Population mean: {pop_mean:.2f}")
print(f"Population std:  {pop_std:.2f}")
print(f"Population skewness: {stats.skew(population):.4f}")

# Visualize the population
fig, axes = plt.subplots(2, 3, figsize=(16, 9))

axes[0, 0].hist(population, bins=100, density=True, color='coral',
                edgecolor='white', alpha=0.8)
axes[0, 0].set_title(f'Population: Bimodal\nmean={pop_mean:.1f}, std={pop_std:.1f}')
axes[0, 0].set_xlabel('Value')

# Take samples of increasing size and plot distribution of means
sample_sizes = [2, 5, 10, 30, 100]
for idx, n in enumerate(sample_sizes):
    row = (idx + 1) // 3
    col = (idx + 1) % 3
    ax = axes[row, col]

    sample_means = [np.mean(np.random.choice(population, n)) for _ in range(10000)]

    ax.hist(sample_means, bins=50, density=True, color='steelblue',
            edgecolor='white', alpha=0.8)

    # Overlay CLT prediction
    se = pop_std / np.sqrt(n)
    x = np.linspace(min(sample_means), max(sample_means), 100)
    ax.plot(x, stats.norm.pdf(x, pop_mean, se), 'r-', linewidth=2)

    ax.set_title(f'Sample means (n={n})\nSE={se:.2f}')
    ax.set_xlabel('Sample Mean')

plt.suptitle('Experiment 1: Bimodal Population → Normal Sample Means', fontsize=14)
plt.tight_layout()
plt.savefig('clt_bimodal.png', dpi=150, bbox_inches='tight')
plt.show()

Priya is stunned. With n=2, the distribution of means looks like a rough three-peaked shape. With n=5, it's starting to smooth out. By n=30, it's a clear bell curve. By n=100, it's a nearly perfect normal distribution.

"The population has TWO peaks," she says. "How is the average bell-shaped?"

The answer: when you average together values from two groups (some around 20, some around 80), most averages end up near the overall mean of 50 — the most likely outcome is that you get roughly equal numbers from each group. Extreme averages (all from the low group, or all from the high group) are rare. This creates the bell shape.
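This intuition can be checked with a quick back-of-the-envelope calculation (a sketch, not part of the experiment's code; it ignores the small within-group spread of 3). In a sample of n values, the number drawn from the low-mean group follows a Binomial(n, 0.5) distribution, so the sample mean is approximately a mixture-weighted combination of 20 and 80:

```python
import numpy as np
from scipy import stats

n = 30
# Number of draws from the low-mean group is Binomial(n, 0.5)
k = np.arange(n + 1)
pmf = stats.binom.pmf(k, n, 0.5)

# Ignoring within-group spread, the sample mean is (k*20 + (n-k)*80) / n
approx_means = (k * 20 + (n - k) * 80) / n

# Probability mass concentrates near k = n/2, i.e., a mean near 50
expected_mean = np.sum(pmf * approx_means)
print(f"Expected value of the sample mean: {expected_mean:.1f}")  # → 50.0

# An extreme mean (below 35) requires k >= 23 of 30 draws from one group
print(f"P(mean < 35): {stats.binom.sf(22, n, 0.5):.5f}")
```

Extreme imbalances between the groups are binomially rare, which is exactly why the mixture of two peaks averages out into a single bell.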

Experiment 2: The Discrete Distribution

Priya tries something even further from normal — a discrete distribution where values can only be 1, 2, 3, 4, 5, or 6 (a die roll).

np.random.seed(42)

# Population: discrete uniform (die rolls)
die_population = np.random.randint(1, 7, size=1000000)

fig, axes = plt.subplots(2, 3, figsize=(16, 9))

axes[0, 0].hist(die_population, bins=np.arange(0.5, 7.5, 1), density=True,
                color='coral', edgecolor='white', alpha=0.8, rwidth=0.7)
axes[0, 0].set_title('Population: Die Rolls\n(Discrete Uniform)')
axes[0, 0].set_xticks(range(1, 7))

pop_mean = 3.5
pop_std = np.std(die_population)

for idx, n in enumerate([2, 3, 5, 10, 30]):
    row = (idx + 1) // 3
    col = (idx + 1) % 3
    ax = axes[row, col]

    sample_means = [np.mean(np.random.randint(1, 7, n)) for _ in range(10000)]

    ax.hist(sample_means, bins=40, density=True, color='steelblue',
            edgecolor='white', alpha=0.8)

    se = pop_std / np.sqrt(n)
    x = np.linspace(min(sample_means), max(sample_means), 100)
    ax.plot(x, stats.norm.pdf(x, pop_mean, se), 'r-', linewidth=2)

    ax.set_title(f'Mean of {n} dice\nSE={se:.2f}')

plt.suptitle('Experiment 2: Discrete Uniform → Normal Sample Means', fontsize=14)
plt.tight_layout()
plt.savefig('clt_dice.png', dpi=150, bbox_inches='tight')
plt.show()

Even with just n=5 dice, the distribution of the average is starting to look bell-shaped. By n=30, it's unmistakably normal. A discrete, flat, uniform distribution has been transformed into a continuous bell curve — just by averaging.
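For dice, the effect can be computed exactly rather than simulated (a small enumeration sketch, separate from Priya's code). The mean of two dice is already non-flat because more combinations produce middle values than extreme ones:

```python
from itertools import product
from collections import Counter

# Enumerate all 36 equally likely outcomes of two fair dice
counts = Counter((a + b) / 2 for a, b in product(range(1, 7), repeat=2))

# Middle means have many more combinations than extreme means
for mean, c in sorted(counts.items()):
    print(f"mean={mean:.1f}: {'#' * c}  ({c}/36)")
```

Only one combination gives a mean of 1.0 (snake eyes), but six combinations give a mean of 3.5. Averaging just two dice already replaces the flat distribution with a triangle, and each additional die smooths the triangle further toward a bell.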

Experiment 3: The Extreme Skew

"Okay," Priya says. "But what about something REALLY skewed?"

np.random.seed(42)

# Population: extremely right-skewed exponential with scale 2
# (NumPy parameterizes by the scale = 1/rate, so this is rate 0.5, mean 2)
population = np.random.exponential(2, 1000000)

fig, axes = plt.subplots(2, 4, figsize=(18, 8))

axes[0, 0].hist(population, bins=100, density=True, color='coral',
                edgecolor='white', alpha=0.8, range=(0, 20))
axes[0, 0].set_title(f'Population: Exponential\nskewness={stats.skew(population):.2f}')

pop_mean = np.mean(population)
pop_std = np.std(population)

for idx, n in enumerate([2, 3, 5, 10, 30, 50, 100]):
    row = (idx + 1) // 4
    col = (idx + 1) % 4
    ax = axes[row, col]

    sample_means = [np.mean(np.random.exponential(2, n)) for _ in range(10000)]

    ax.hist(sample_means, bins=50, density=True, color='steelblue',
            edgecolor='white', alpha=0.8)

    se = pop_std / np.sqrt(n)
    x = np.linspace(min(sample_means), max(sample_means), 100)
    ax.plot(x, stats.norm.pdf(x, pop_mean, se), 'r-', linewidth=2)

    ax.set_title(f'n={n}, skew of means: {stats.skew(sample_means):.2f}')

plt.suptitle('Experiment 3: Extremely Skewed Population → Normal Means (Eventually)', fontsize=14)
plt.tight_layout()
plt.savefig('clt_exponential.png', dpi=150, bbox_inches='tight')
plt.show()

# Track how skewness of means decreases with n
print("\nSkewness of sample means by sample size:")
for n in [2, 5, 10, 30, 50, 100, 500]:
    means = [np.mean(np.random.exponential(2, n)) for _ in range(10000)]
    print(f"  n={n:4d}: skewness of means = {stats.skew(means):+.3f}")

With the exponential distribution (skewness ≈ 2), the CLT needs larger sample sizes to kick in. At n=5, the means are still visibly skewed. At n=30, they're approximately normal. At n=100, the bell is clean. The rule of thumb that n=30 is enough works for most distributions, but highly skewed distributions may need more.
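The decay Priya observes follows a known pattern: for i.i.d. samples, the skewness of the sample mean equals the population skewness divided by √n. A quick check of this prediction against the exponential distribution (whose skewness is exactly 2) is a sketch along these lines:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
pop_skew = 2.0  # exact skewness of any exponential distribution

for n in [5, 10, 30, 100]:
    # 20,000 simulated sample means for each sample size
    means = rng.exponential(2, size=(20000, n)).mean(axis=1)
    predicted = pop_skew / np.sqrt(n)
    print(f"n={n:3d}: predicted skew={predicted:.3f}, simulated={stats.skew(means):.3f}")
```

This also explains the rule of thumb: to drive the skewness of the means below some target, a more skewed population needs a proportionally larger n (skew 2 at n=100 leaves the same residual skew as skew 0.2 at n=1).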

Experiment 4: Priya's Ultimate Challenge

Priya constructs the strangest distribution she can think of:

np.random.seed(42)

# A deliberately weird distribution:
# 50% chance of exactly 0
# 30% chance of a value from Uniform(10, 20)
# 15% chance of a value from Normal(100, 5)
# 5% chance of a value of exactly 1000

n_pop = 1000000
choices = np.random.choice([0, 1, 2, 3], size=n_pop, p=[0.50, 0.30, 0.15, 0.05])
population = np.zeros(n_pop)

# Assign values by group (vectorized; equivalent to an element-by-element loop)
population[choices == 1] = np.random.uniform(10, 20, np.sum(choices == 1))
population[choices == 2] = np.random.normal(100, 5, np.sum(choices == 2))
population[choices == 3] = 1000
# Entries where choices == 0 stay at exactly 0

pop_mean = np.mean(population)
pop_std = np.std(population)

fig, axes = plt.subplots(2, 3, figsize=(16, 9))

axes[0, 0].hist(population, bins=100, density=True, color='coral',
                edgecolor='white', alpha=0.8)
axes[0, 0].set_title(f"Priya's Nightmare Distribution\nmean={pop_mean:.1f}, std={pop_std:.1f}")

for idx, n in enumerate([5, 10, 30, 50, 100]):
    row = (idx + 1) // 3
    col = (idx + 1) % 3
    ax = axes[row, col]

    sample_means = [np.mean(np.random.choice(population, n)) for _ in range(10000)]

    ax.hist(sample_means, bins=50, density=True, color='steelblue',
            edgecolor='white', alpha=0.8)

    se = pop_std / np.sqrt(n)
    x = np.linspace(pop_mean - 3*se, pop_mean + 3*se, 100)
    ax.plot(x, stats.norm.pdf(x, pop_mean, se), 'r-', linewidth=2)

    ax.set_title(f'n={n}')

plt.suptitle("Experiment 4: Even This Weird Distribution Can't Break the CLT", fontsize=14)
plt.tight_layout()
plt.savefig('clt_nightmare.png', dpi=150, bbox_inches='tight')
plt.show()

This population is absurd: a spike at exactly 0, a bump near 15, another bump near 100, and a second spike at exactly 1000. It looks nothing like any standard distribution. And yet, by n=50, the distribution of sample means is recognizably bell-shaped. By n=100, it's almost perfectly normal.

The Quantitative Verification

Priya doesn't just want to eyeball it — she wants numbers.

# Formal verification: compare simulated means to CLT predictions
print("=" * 65)
print("QUANTITATIVE CLT VERIFICATION")
print("=" * 65)

populations = {
    'Bimodal': np.concatenate([np.random.normal(20, 3, 500000),
                                np.random.normal(80, 3, 500000)]),
    'Exponential': np.random.exponential(2, 1000000),
    'Uniform': np.random.uniform(0, 100, 1000000),
    'Nightmare': population  # from Experiment 4
}

for pop_name, pop_data in populations.items():
    pop_mu = np.mean(pop_data)
    pop_sigma = np.std(pop_data)

    print(f"\n--- {pop_name} Population ---")
    print(f"  Population mean: {pop_mu:.2f}, std: {pop_sigma:.2f}")

    for n in [10, 30, 100]:
        means = [np.mean(np.random.choice(pop_data, n)) for _ in range(10000)]

        predicted_mean = pop_mu
        predicted_se = pop_sigma / np.sqrt(n)
        actual_mean = np.mean(means)
        actual_se = np.std(means)

        # Check: what % of means fall within predicted 95% interval?
        lower = predicted_mean - 1.96 * predicted_se
        upper = predicted_mean + 1.96 * predicted_se
        within_95 = np.mean((np.array(means) >= lower) & (np.array(means) <= upper)) * 100

        print(f"  n={n:3d}: predicted SE={predicted_se:.3f}, actual SE={actual_se:.3f}, "
              f"within 95% CI: {within_95:.1f}%")

The results confirm: for all four populations, even the "nightmare" distribution, the actual standard errors match the predicted values closely, and approximately 95% of sample means fall within the predicted 95% interval. The CLT works.
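Beyond eyeballing histograms, Priya could also run a formal normality test on the simulated means, e.g. SciPy's D'Agostino–Pearson test (a sketch, not part of the verification code above). One caveat worth noting: with 10,000 simulated means, such tests detect even tiny departures from normality, so the statistic is best read as a measure of how far from normal the means are, and compared across sample sizes:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Normality test on sample means from the skewed exponential population.
# The statistic shrinks as n grows, tracking the CLT's convergence.
for n in [5, 30, 100]:
    means = rng.exponential(2, size=(10000, n)).mean(axis=1)
    stat, p = stats.normaltest(means)
    print(f"n={n:3d}: normaltest statistic={stat:10.1f}, p={p:.2e}")
```

The p-values may stay tiny even at n=100 simply because the simulation is so large; the steadily shrinking statistic is the meaningful signal.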

Why This Matters

Priya summarizes what she's learned:

The Central Limit Theorem isn't just a theorem — it's the reason statistical inference works.

When Elena computes the average vaccination rate from a sample of 50 countries, she can build a confidence interval around that average using the normal distribution — even though individual vaccination rates are left-skewed.

When Marcus computes his average daily sales from 30 days of data, the uncertainty in that average is normally distributed — even though daily sales are right-skewed (a few big days pull the average up).

When pollsters estimate voter preferences from a sample of 1,000 respondents, the margin of error comes from the normal distribution — even though individual responses are binary (yes/no).
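The pollster example can be made concrete. A single yes/no answer is a Bernoulli draw, so the sample proportion is itself a sample mean, and the CLT gives its standard error as √(p(1−p)/n). A quick calculation using the commonly cited poll size of n = 1,000 and the worst-case p = 0.5:

```python
import math

n = 1000
p = 0.5  # worst case: p(1-p) is maximized at 0.5

se = math.sqrt(p * (1 - p) / n)   # CLT standard error of a proportion
margin_of_error = 1.96 * se        # half-width of the 95% interval

print(f"Standard error:  {se:.4f}")
print(f"Margin of error: ±{margin_of_error * 100:.1f} percentage points")  # ≈ ±3.1
```

This is where the familiar "±3 points" of election polling comes from: it is the CLT applied to binary data.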

The CLT is the bridge between the messy, non-normal real world and the clean, normal-distribution-based methods of statistical inference. Without it, most of what we'll learn in Chapters 22-27 wouldn't work.

Key Quantitative Results

Population Shape         n=10              n=30                   n=100
Bimodal                  Roughly normal    Very normal            Perfectly normal
Exponential (skewed)     Still skewed      Approximately normal   Very normal
Uniform (flat)           Triangular        Very normal            Perfectly normal
"Nightmare" (bizarre)    Lumpy             Approximately normal   Very normal

General rule of thumb:
  - Symmetric populations: n=10 is often enough
  - Mildly skewed: n=20-30 is usually enough
  - Heavily skewed or very unusual: n=50-100 may be needed

Discussion Questions

  1. Priya's "nightmare" distribution includes values of exactly 1000 that make up 5% of the population. What would happen to the CLT if these extreme values were even more extreme — say, 1,000,000 instead of 1,000? Would you need larger n?

  2. The CLT requires the population to have a finite mean and finite variance. Can you think of a distribution that violates this? (Hint: the Cauchy distribution.) What happens to sample means in that case?

  3. In practice, we usually don't know the population's shape. How does the CLT help us proceed even in the absence of this knowledge?

  4. A common criticism of the CLT in applied settings is: "Sure, the distribution of means is normal, but I only have ONE sample mean. How does knowing the sampling distribution help me?" How would you respond?

Connection to the Chapter

This case study is a deep dive into Section 21.7 (the Central Limit Theorem). It extends the chapter's simulations with more extreme populations, adds quantitative verification, and explicitly connects the CLT to the upcoming chapters on confidence intervals (Chapter 22) and hypothesis testing (Chapter 23). The four experiments build a visceral, simulation-based understanding that no amount of formula memorization could match.