
Learning Objectives

  • Explain what a confidence interval represents (and what it does NOT mean)
  • Construct and interpret a confidence interval for a population mean
  • Construct and interpret a confidence interval for a population proportion
  • Determine the sample size needed for a desired margin of error
  • Understand how confidence level, sample size, and margin of error are related

Chapter 12: Confidence Intervals: Estimating with Uncertainty

"It is better to be roughly right than precisely wrong." — John Maynard Keynes

Chapter Overview

You're about to do something you've never done in this course: make an inference.

For eleven chapters, you've been building tools. You've learned to summarize data, visualize distributions, think probabilistically, and — in Chapter 11 — you discovered the Central Limit Theorem, the single result that makes inference possible. But everything you've done so far has been descriptive. You've been describing the data you have. Now you're going to say something about the data you don't have.

Here's the scenario. Dr. Maya Chen surveys 120 adults in her county and finds that their average systolic blood pressure is 128.3 mmHg. That's the sample mean. But she doesn't care about those 120 people specifically — she wants to know the average blood pressure of all 500,000 adults in the county. She can't measure everyone. So what can she say?

Before Chapter 11, the answer was "not much." She'd have a single number — 128.3 — with no idea how close it was to the truth.

But now she knows the CLT. She knows that sample means follow a predictable, approximately normal distribution centered on the true population mean, with a spread of $\sigma / \sqrt{n}$. She knows that her sample mean of 128.3 came from that normal distribution. And she can use that knowledge to build a confidence interval — a range of plausible values for the true population mean.

Instead of saying "the average blood pressure is 128.3," she'll say "we're 95% confident the true average blood pressure is between 125.1 and 131.5 mmHg."

That's inference. That's what this whole course has been building toward. And you're about to learn how to do it yourself.

In this chapter, you will learn to:

  • Explain what a confidence interval represents (and what it does NOT mean)
  • Construct and interpret a confidence interval for a population mean
  • Construct and interpret a confidence interval for a population proportion
  • Determine the sample size needed for a desired margin of error
  • Understand how confidence level, sample size, and margin of error are related

Fast Track: If you've constructed confidence intervals before, skim Sections 12.1–12.4 and jump to Section 12.8 (the tradeoff triangle). Complete quiz questions 1, 8, and 15 to verify your understanding.

Deep Dive: After this chapter, read Case Study 1 (Maya's disease prevalence confidence interval) for a detailed public health application, then Case Study 2 (polling margins of error) for a look at how the "± 3 points" you hear on election night actually works.


12.1 A Puzzle Before We Start (Productive Struggle)

Here's an exercise I want you to try before I explain anything.

The Fishing Net Metaphor

Imagine you're fishing in a lake, trying to catch a specific golden fish. You don't know exactly where it is. But you have two types of nets:

  • Net A is narrow (3 feet wide) and catches the golden fish 70% of the time.
  • Net B is wide (10 feet wide) and catches the golden fish 99% of the time.
  • Net C is medium (6 feet wide) and catches the golden fish 95% of the time.

(a) Which net would you choose if the cost of missing the fish is catastrophic (e.g., medical diagnosis)?

(b) Which net would you choose if you need a precise answer (e.g., calibrating a machine to the fish's exact location)?

(c) Is there a way to get both high capture probability AND a narrow net? What would you need?

(d) You've already cast Net C and caught... nothing. What is the probability that the golden fish is somewhere within the area your net covered?

Take 3 minutes. Think about (d) carefully — it's trickier than it sounds.

Here's what I hope you realized:

For parts (a) and (b), the answer depends on what matters more: certainty that you'll catch the fish (choose the wider net) or precision about where the fish is (choose the narrower net). There's a tradeoff between width and catch rate, just like there's a tradeoff between the width of a confidence interval and the confidence level.

For part (c), the answer is: use a better net — one that targets the fish more precisely. In statistical terms, that means getting more data (larger sample size), which lets you build a narrower interval without sacrificing confidence.

Part (d) is the trick question. You cast the net, and the fish isn't there. What's the probability the fish is in the net's area? Zero. The fish is either in the net or it's not. The 95% was about the process — if you cast Net C many times in random locations, 95% of your casts would capture the fish. But once you've cast a specific net in a specific location, the outcome is determined. The fish is either there or it isn't.

This is exactly how confidence intervals work. And the distinction in part (d) — between the process and the specific outcome — is the single most important idea in this chapter.


12.2 From Point Estimates to Interval Estimates

Let's start with what you already know.

When Alex Rivera calculates the average watch time for a sample of StreamVibe users, she gets a single number — say, $\bar{x} = 52.4$ minutes. This is called a point estimate: a single value used to estimate a population parameter.

🔄 Spaced Review 1 (from Ch.11): Standard Error — The Denominator of Uncertainty

In Chapter 11, you learned that sample means vary from sample to sample. If Alex took a different random sample of the same size, she'd get a different $\bar{x}$. The standard error measures how much:

$$\text{SE}_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

And when $\sigma$ is unknown (the usual case), she estimates it with the sample standard deviation $s$:

$$\widehat{\text{SE}}_{\bar{x}} = \frac{s}{\sqrt{n}}$$

The standard error is the "typical distance" of a sample mean from the population mean — the same concept as standard deviation (Ch.6), but applied to the sampling distribution instead of the population distribution. This distinction matters because $\text{SE}$ is about to become the building block of every inference procedure in the rest of this course.
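The estimated standard error is a one-line computation. A minimal sketch with made-up summary numbers (a hypothetical sample with $s = 24.1$ and $n = 200$):

```python
import numpy as np

# Estimated standard error of the mean: s / sqrt(n)
# (illustrative values, not from any particular dataset)
s = 24.1   # sample standard deviation
n = 200    # sample size

se = s / np.sqrt(n)
print(f"SE = {se:.3f}")  # typical distance of x-bar from mu
```

With raw data in hand, `scipy.stats.sem()` computes the same quantity directly (it uses $n - 1$ in the standard deviation, matching the formula above).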

Point estimates are useful, but they have a problem: they don't tell you how close you are to the truth. Is Alex's 52.4 minutes within 1 minute of the real population average? Within 5? Within 20? The point estimate alone gives no indication.

An interval estimate, by contrast, provides a range of plausible values for the parameter:

$$\text{We're 95\% confident that } \mu \text{ is between 49.8 and 55.0 minutes.}$$

That interval tells you two things the point estimate couldn't:

  1. How precise the estimate is (the interval is 5.2 minutes wide — that's pretty good)
  2. How confident you are (95% — you'll learn exactly what that means shortly)

The interval estimate is called a confidence interval (CI), and it has a beautifully simple structure:

$$\boxed{\text{Confidence Interval} = \text{Point Estimate} \pm \text{Margin of Error}}$$

The margin of error (MOE) is the maximum amount by which the sample statistic is likely to differ from the population parameter at the given confidence level. It's computed as:

$$\text{Margin of Error} = \text{Critical Value} \times \text{Standard Error}$$

So the full formula is:

$$\text{CI} = \text{Point Estimate} \pm (\text{Critical Value} \times \text{Standard Error})$$

Let's break down each component:

| Component | What It Is | Where It Comes From |
|---|---|---|
| Point estimate | $\bar{x}$ or $\hat{p}$ | Your sample data |
| Critical value | $z^*$ or $t^*$ | The confidence level (from the normal or t-distribution) |
| Standard error | $s/\sqrt{n}$ or $\sqrt{\hat{p}(1-\hat{p})/n}$ | Your sample data + sample size |

Here's the structure visually:

                    Margin of Error          Margin of Error
                  ◄─────────────────►    ◄─────────────────►

    ──────────────┼─────────────────┼─────────────────┼──────────────
                 Lower           Point               Upper
                 bound          estimate              bound
                                (x̄ or p̂)

                  ◄──────────────────────────────────►
                        95% Confidence Interval

This is the architecture of every confidence interval you'll ever build: start at the middle (your best guess) and stretch outward by the margin of error.
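That architecture translates directly into code. A sketch with made-up numbers for the point estimate and standard error, using the 95% critical value of 1.96:

```python
# Confidence interval = point estimate +/- (critical value x standard error)
point_estimate = 52.4   # e.g., a sample mean (made-up value)
standard_error = 1.70   # e.g., s / sqrt(n) (made-up value)
critical_value = 1.96   # z* for 95% confidence

margin_of_error = critical_value * standard_error
ci = (point_estimate - margin_of_error, point_estimate + margin_of_error)
print(f"{point_estimate} +/- {margin_of_error:.2f} -> ({ci[0]:.2f}, {ci[1]:.2f})")
```

Every interval in this chapter is an instance of these three lines; only the source of the critical value and the form of the standard error change.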


12.3 What "95% Confidence" Really Means

This is the most important section in this chapter. Read it carefully.

Here's the question: when we say "we're 95% confident that $\mu$ is between 125.1 and 131.5," what does the 95% mean?

Most people's instinct is to say: "There's a 95% probability that $\mu$ is in the interval."

That's wrong. And understanding why it's wrong is the threshold concept of this chapter.

The Problem with "95% Probability"

Think about it. The population mean $\mu$ is a fixed number. It's not random. The average blood pressure of all adults in Maya's county is whatever it is — say, 127.0 mmHg. It's not bouncing around. It doesn't have a probability distribution. It's just a number sitting there, waiting to be discovered.

Maya's confidence interval of (125.1, 131.5) either contains 127.0 or it doesn't. There's no randomness left. The interval was calculated, the boundaries are fixed, and either 127.0 is inside or it's not. The probability is 1 (it's in there) or 0 (it's not). There's no 0.95 about it.

So where does the 95% come from?

The Repeated Sampling Interpretation

The 95% refers to the process, not the specific interval.

Imagine this: Maya repeats her study 100 times. Each time, she draws a new random sample of 120 adults from the same county, computes a new $\bar{x}$ and a new confidence interval. Every study gives a different interval because every sample is different. Some intervals are a bit to the left, some a bit to the right, some wider, some narrower.

Here's the key: about 95 of those 100 intervals will contain the true population mean $\mu$. About 5 will miss.

Study 1:  ├────────────┤                                  ✓ (captures μ)
Study 2:       ├────────────┤                              ✓
Study 3:                ├────────────┤                     ✓
Study 4:                              ├────────────┤       ✗ (misses μ!)
Study 5:             ├────────────┤                        ✓
Study 6:          ├────────────┤                           ✓
Study 7:  ├────────────┤                                   ✓
Study 8:                   ├────────────┤                  ✓
   ⋮                        ⋮
Study 95:           ├────────────┤                         ✓
Study 96:                         ├────────────┤           ✗ (misses μ!)
Study 97:        ├────────────┤                            ✓
Study 98:                ├────────────┤                    ✓
Study 99:      ├────────────┤                              ✓
Study 100:              ├────────────┤                     ✓

                          ▲
                          │
                          μ (true population mean, fixed)

The "95% confidence" doesn't describe a specific interval. It describes the method. If you use this method over and over, 95% of the intervals you produce will capture the true parameter. It's a statement about the long-run reliability of the procedure, not about any single interval.

🎯 Threshold Concept: What "95% Confidence" Really Means

Here's the precise interpretation:

"If we repeated this sampling procedure many times and constructed a 95% confidence interval from each sample, approximately 95% of those intervals would contain the true population parameter."

Notice what this says: the randomness is in which interval you get, not in where $\mu$ is. The parameter is fixed; the interval moves. The confidence level measures how often the moving interval captures the fixed target.

This is like the fishing net in Section 12.1. The fish (population parameter) is in a fixed location. You cast the net (construct an interval) in a random location. The "95% catch rate" describes how often your random casts land on the fish — not the probability that a specific cast caught it.

Once you internalize this, you'll understand confidence intervals more deeply than most people who use them. It takes a shift in thinking: from "how likely is it that the truth is in here?" (wrong) to "how reliable is my method at capturing the truth?" (right).

🔄 Spaced Review 2 (from Ch.8): What Does 95% Mean in Probability Terms?

In Chapter 8, you learned about probability as long-run relative frequency: the probability of an event is the proportion of times it would occur in many, many repetitions. That's exactly what's happening here.

The "95%" in a 95% confidence interval is a probability in the relative frequency sense — it's the proportion of intervals that would capture $\mu$ across many repetitions of the sampling process. It's not a subjective probability about one interval. It's the long-run performance of the procedure.

This connects directly to the law of large numbers from Chapter 8: as you construct more and more 95% CIs, the proportion that capture $\mu$ converges to 95%.
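That long-run claim can be checked with a quick simulation. The sketch below borrows Maya's setting and pretends we know the truth ($\mu = 127.0$, $\sigma = 18.6$, normal population, an assumption made only for simulation), builds a 95% t-interval from each of 10,000 samples, and counts how often $\mu$ is captured:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, sigma, n = 127.0, 18.6, 120   # pretend "truth", for simulation only
reps = 10_000

t_star = stats.t.ppf(0.975, df=n - 1)
captured = 0
for _ in range(reps):
    sample = rng.normal(mu, sigma, size=n)
    x_bar = sample.mean()
    se = sample.std(ddof=1) / np.sqrt(n)
    if x_bar - t_star * se <= mu <= x_bar + t_star * se:
        captured += 1

print(f"Coverage: {captured / reps:.3f}")  # close to 0.950
```

Each individual interval either captures $\mu$ or misses; only the long-run proportion is 95%, which is exactly the repeated-sampling interpretation.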

The Correct Way to State Confidence Intervals

Here are proper and improper ways to express a confidence interval:

| Statement | Correct? | Why? |
|---|---|---|
| "We are 95% confident that $\mu$ is between 125.1 and 131.5." | ✅ Yes | "Confident" refers to the reliability of the method |
| "There is a 95% probability that $\mu$ is between 125.1 and 131.5." | ❌ No | $\mu$ is fixed — it's either in there or it isn't |
| "95% of all possible 95% CIs would contain $\mu$." | ✅ Yes | This is the formal definition |
| "The interval (125.1, 131.5) has a 95% chance of containing $\mu$." | ❌ No | Same error as #2 — assigns probability to a fixed parameter |
| "We are 95% confident that $\bar{x}$ is between 125.1 and 131.5." | ❌ No | We know $\bar{x}$ — the CI estimates $\mu$, not $\bar{x}$ |
| "If we repeated this study many times, about 95% of our intervals would contain $\mu$." | ✅ Yes | The repeated sampling interpretation |

💡 Myth vs. Reality: The #1 Confidence Interval Misconception

The Myth: "A 95% confidence interval means there is a 95% probability that the population parameter falls within the interval."

The Reality: The population parameter is a fixed (unknown) number. It doesn't "fall" anywhere — it just is. The 95% describes how often the procedure captures the parameter, not how likely it is that any particular interval got it right. Think of it this way: a basketball player who shoots 95% from the free-throw line has a 95% chance of making the next shot — but once the shot is taken, it either went in or it didn't. The 95% described the process, not the outcome.


12.4 The t-Distribution: When Reality Gets in the Way

To build a confidence interval, you need a critical value — a number from a reference distribution that determines how wide the interval is. In a perfect world, you'd use the standard normal distribution (z-scores) and the formula:

$$\bar{x} \pm z^* \cdot \frac{\sigma}{\sqrt{n}}$$

But there's a problem. This formula requires $\sigma$, the population standard deviation. And you almost never know $\sigma$. If you knew enough about the population to know its standard deviation, you probably wouldn't need to estimate its mean.

So in practice, you replace $\sigma$ with the sample standard deviation $s$:

$$\bar{x} \pm \text{something} \cdot \frac{s}{\sqrt{n}}$$

But this substitution introduces extra uncertainty. Not only is $\bar{x}$ a random variable (it varies from sample to sample), but now $s$ is also random — it varies from sample to sample too. You've got two sources of randomness instead of one.

A brilliant English statistician named William Sealy Gosset figured this out in 1908. Working at the Guinness brewery in Dublin (yes, that Guinness), he discovered that when you replace $\sigma$ with $s$, the resulting distribution isn't quite normal. It's a bit wider, a bit flatter, with heavier tails. He published his result under the pseudonym "Student" because Guinness didn't allow employees to publish under their own names, and the distribution became known as Student's t-distribution — or simply the t-distribution.

How the t-Distribution Differs from the Normal

The t-distribution looks a lot like the standard normal (bell-shaped, symmetric, centered at 0), but with one crucial difference: it has heavier tails. This means extreme values are more likely with the t-distribution than with the normal.

Probability
Density
    ▲
    │       ╱──── Normal (z) ────╲
    │      ╱                      ╲
    │     ╱ ╱── t (df = 5) ──╲     ╲
    │    ╱ ╱                  ╲     ╲
    │   ╱ ╱                    ╲     ╲
    │  ╱ ╱                      ╲     ╲
    │ ╱ ╱                        ╲     ╲
    │╱_╱____________________________╲_____╲______
    ┼────────────────┼────────────────────────►
   -4              0                         4

    The t-distribution has heavier tails (more spread)
    than the normal, especially for small sample sizes.

Why heavier tails? Because when $n$ is small, $s$ is an unreliable estimate of $\sigma$ — it bounces around a lot from sample to sample. Sometimes $s$ underestimates $\sigma$ (making your interval too narrow), and sometimes it overestimates (making it too wide). The t-distribution accounts for this extra variability by being more spread out than the normal, which produces wider intervals — appropriately capturing the added uncertainty.
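The cost of skipping this correction shows up in coverage. A simulation sketch (assuming a standard normal population and a deliberately small sample, $n = 5$) compares intervals built with $z^* = 1.96$ against intervals built with the proper $t^*$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps = 5, 20_000
z_star = stats.norm.ppf(0.975)         # 1.960
t_star = stats.t.ppf(0.975, df=n - 1)  # about 2.776 for df = 4

z_hits = t_hits = 0
for _ in range(reps):
    sample = rng.normal(0.0, 1.0, size=n)  # true mu = 0, true sigma = 1
    x_bar = sample.mean()
    se = sample.std(ddof=1) / np.sqrt(n)
    if abs(x_bar) <= z_star * se:   # does the z-interval capture mu = 0?
        z_hits += 1
    if abs(x_bar) <= t_star * se:   # does the t-interval capture mu = 0?
        t_hits += 1

print(f"z-interval coverage: {z_hits / reps:.3f}")  # noticeably below 0.95
print(f"t-interval coverage: {t_hits / reps:.3f}")  # close to 0.95
```

The z-based intervals capture the true mean well under 95% of the time; the t-based intervals sit near 95%, which is exactly the correction Gosset's heavier tails provide.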

Degrees of Freedom

The t-distribution isn't a single distribution — it's a family of distributions, indexed by a parameter called degrees of freedom (df). For a one-sample confidence interval for a mean:

$$df = n - 1$$

The degrees of freedom determine how heavy the tails are:

| Degrees of Freedom | t-Distribution Shape |
|---|---|
| $df = 1$ | Very heavy tails — the most different from normal |
| $df = 5$ | Noticeably heavier tails than normal |
| $df = 10$ | Getting closer to normal |
| $df = 30$ | Nearly indistinguishable from normal |
| $df = \infty$ | Exactly the standard normal distribution |

This makes intuitive sense. When $n$ is small (so $df$ is small), $s$ is an unreliable estimate of $\sigma$, and you need wider intervals to compensate. When $n$ is large (so $df$ is large), $s$ is very close to $\sigma$, and the t-distribution converges to the normal.

Why $n - 1$ Degrees of Freedom?

Here's the intuition. You have $n$ data values, and you're using them to estimate two things: the mean ($\bar{x}$) and the standard deviation ($s$). But $s$ is calculated using deviations from $\bar{x}$, and those deviations must sum to zero (this is a mathematical fact). So once you know $n - 1$ of the deviations, the last one is determined — it has no "freedom" to vary. The $n - 1$ free deviations give you $n - 1$ degrees of freedom.
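The constraint is easy to verify: whatever the data, the deviations from $\bar{x}$ sum to zero, so once $n - 1$ of them are known, the last is forced. A quick check with arbitrary numbers:

```python
import numpy as np

data = np.array([3.0, 7.0, 8.0, 12.0])   # any numbers work
deviations = data - data.mean()
print(deviations)          # the deviations from x-bar
print(deviations.sum())    # always 0 (up to floating-point rounding)

# Given the first n-1 deviations, the last has no freedom to vary:
forced_last = -deviations[:-1].sum()
print(forced_last == deviations[-1])  # True
```

Only $n - 1$ of the deviations carry independent information, which is why the sample variance divides by $n - 1$ and why the t-distribution uses $df = n - 1$.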

You first encountered degrees of freedom briefly in Chapter 6 (Section 6.5), where we divided by $n - 1$ instead of $n$ when calculating the sample variance. That $n - 1$ was the same idea. Now you see why it matters: it connects directly to which t-distribution to use.

Critical Values from the t-Distribution

A critical value $t^*$ is the value from the t-distribution that marks the boundary of the middle 95% (or 90%, or 99%) of the distribution. For a 95% confidence interval:

  • The middle 95% of the t-distribution is bounded by $-t^*$ and $+t^*$
  • The remaining 5% is split equally: 2.5% in each tail

Here are some common critical values:

| Confidence Level | $z^*$ (normal) | $t^*$ (df = 10) | $t^*$ (df = 30) | $t^*$ (df = 100) |
|---|---|---|---|---|
| 90% | 1.645 | 1.812 | 1.697 | 1.660 |
| 95% | 1.960 | 2.228 | 2.042 | 1.984 |
| 99% | 2.576 | 3.169 | 2.750 | 2.626 |

Notice that for any confidence level, the t critical values are larger than the z critical values (wider intervals), and they approach the z values as $df$ increases. With $df = 100$, the difference is tiny. With $df = 10$, it's substantial.
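Tables like this come from the inverse CDF (the percent-point function). A sketch reproducing the entries with scipy:

```python
from scipy import stats

dfs = (10, 30, 100)
for conf in (0.90, 0.95, 0.99):
    upper_tail = 1 - (1 - conf) / 2   # e.g. 0.975 for a 95% CI
    z = stats.norm.ppf(upper_tail)
    ts = [stats.t.ppf(upper_tail, df=df) for df in dfs]
    print(f"{conf:.0%}: z* = {z:.3f}; " +
          "; ".join(f"t*(df={df}) = {t:.3f}" for df, t in zip(dfs, ts)))
```

The same calls replace any printed t-table: pick the upper-tail probability from the confidence level, then ask `ppf` for the boundary value.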

When to use $z^*$ vs. $t^*$:

| Situation | Use | Why |
|---|---|---|
| Known $\sigma$ (rare in practice) | $z^*$ | No extra uncertainty from estimating $\sigma$ |
| Unknown $\sigma$, using $s$ (the usual case) | $t^*$ with $df = n-1$ | Must account for uncertainty in estimating $\sigma$ |
| Large $n$ (say $n > 100$) | Either — they're nearly identical | $t^*$ converges to $z^*$ |
| Proportions | $z^*$ | The SE formula doesn't involve estimating $\sigma$ separately |

12.5 Confidence Interval for a Population Mean

Now let's put it all together. Here's the formula for a confidence interval for a population mean $\mu$:

$$\boxed{\bar{x} \pm t^* \cdot \frac{s}{\sqrt{n}}}$$

where:

  • $\bar{x}$ is the sample mean (your best guess for $\mu$)
  • $t^*$ is the critical value from the t-distribution with $df = n - 1$
  • $s$ is the sample standard deviation
  • $n$ is the sample size

The margin of error is:

$$E = t^* \cdot \frac{s}{\sqrt{n}}$$

And the confidence interval is:

$$\left(\bar{x} - E, \quad \bar{x} + E\right)$$

Conditions for Using This Formula

Before constructing a confidence interval for $\mu$, check these conditions:

  1. Random sample (or random assignment): The data must come from a random process. Without this, the CLT can't guarantee the sampling distribution is centered on $\mu$.

  2. Independence: Observations must be independent. For sampling without replacement, the 10% condition must be met: $n \leq 0.10 \times N$.

  3. Nearly normal population or large sample size: Either the population is approximately normal, or $n \geq 30$ (CLT kicks in). For smaller samples, check a histogram or QQ-plot for severe non-normality.

Worked Example: Maya's Blood Pressure Study

Dr. Maya Chen collects systolic blood pressure readings from a random sample of 120 adults in her county. Here are the summary statistics:

$$\bar{x} = 128.3 \text{ mmHg}, \quad s = 18.6 \text{ mmHg}, \quad n = 120$$

She wants to construct a 95% confidence interval for the true mean systolic blood pressure of all adults in the county.

Step 1: Check conditions.

  • Random sample? Yes — Maya used a random sample from the county's health records.
  • Independence? The county has 500,000 adults. Is $120 \leq 0.10 \times 500{,}000 = 50{,}000$? Yes, easily.
  • Nearly normal or large $n$? $n = 120 \geq 30$, so the CLT guarantees the sampling distribution of $\bar{x}$ is approximately normal. ✓

Step 2: Find the critical value.

For a 95% confidence interval with $df = 120 - 1 = 119$:

$$t^* = 1.980$$

(From a t-table or calculator. Note: this is very close to $z^* = 1.960$ because $df = 119$ is large.)

Step 3: Calculate the margin of error.

$$E = t^* \cdot \frac{s}{\sqrt{n}} = 1.980 \times \frac{18.6}{\sqrt{120}} = 1.980 \times \frac{18.6}{10.954} = 1.980 \times 1.698 = 3.362$$

Step 4: Construct the interval.

$$128.3 \pm 3.362$$

$$\text{Lower bound: } 128.3 - 3.362 = 124.94$$

$$\text{Upper bound: } 128.3 + 3.362 = 131.66$$

Step 5: Interpret.

"We are 95% confident that the true mean systolic blood pressure of adults in this county is between 124.9 and 131.7 mmHg."

What does this mean practically? The American Heart Association classifies blood pressure above 130 mmHg as Stage 1 hypertension. Maya's confidence interval spans from 124.9 to 131.7 — it includes 130. This means she can't be sure whether the county's average is above or below the hypertension threshold. She might need a larger sample to narrow the interval and get a definitive answer.

Worked Example: Alex's Watch Time

Alex Rivera wants to estimate the average daily watch time for all StreamVibe users. She takes a random sample of 200 users and finds:

$$\bar{x} = 52.4 \text{ minutes}, \quad s = 24.1 \text{ minutes}, \quad n = 200$$

95% CI for the population mean watch time:

Step 1: Check conditions. Random sample? Yes. Independence? StreamVibe has 2 million users; $200 \leq 0.10 \times 2{,}000{,}000$. ✓ Large sample? $n = 200 \geq 30$. ✓

Step 2: Critical value. $df = 199$, so $t^* \approx 1.972$ (nearly identical to $z^* = 1.960$).

Step 3: Margin of error.

$$E = 1.972 \times \frac{24.1}{\sqrt{200}} = 1.972 \times 1.704 = 3.360 \text{ minutes}$$

Step 4: Interval. $52.4 \pm 3.4 = (49.0, 55.8)$

Step 5: Interpretation. "We are 95% confident that the true mean daily watch time for all StreamVibe users is between 49.0 and 55.8 minutes."

🔄 Spaced Review 3 (from Ch.6): The Ingredients of a Confidence Interval

Look at the confidence interval formula: $\bar{x} \pm t^* \cdot s / \sqrt{n}$. Every piece of this comes from earlier chapters:

  • $\bar{x}$ is the sample mean from Chapter 6 (Section 6.1) — the balance point of your data
  • $s$ is the sample standard deviation from Chapter 6 (Section 6.5) — the typical distance of observations from the mean, computed with $n - 1$ in the denominator
  • $\sqrt{n}$ connects to the standard error from Chapter 11 — dividing by $\sqrt{n}$ converts individual-level spread into mean-level precision
  • $t^*$ is new (the critical value from the t-distribution), but it's built on the z-score concept from Chapter 6 (Section 6.8) and the normal distribution from Chapter 10

You've been learning the ingredients of confidence intervals for six chapters without knowing it. Now they snap together.


12.6 Confidence Interval for a Population Proportion

The logic is identical for proportions, but the formula changes slightly because the standard error has a different form.

When Sam Okafor wants to estimate Daria's true three-point shooting percentage, or Maya wants to estimate the proportion of adults in her county with a particular condition, they're estimating a population proportion $p$.

The confidence interval for $p$ is:

$$\boxed{\hat{p} \pm z^* \cdot \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}}$$

where:

  • $\hat{p}$ is the sample proportion (your best guess for $p$)
  • $z^*$ is the critical value from the standard normal distribution (not the t-distribution — see below)
  • $n$ is the sample size

Why $z^*$ Instead of $t^*$?

For means, we use $t^*$ because estimating $\sigma$ with $s$ introduces extra uncertainty. For proportions, the standard error formula $\sqrt{\hat{p}(1-\hat{p})/n}$ already accounts for the variability in one unified expression — there's no separate $\sigma$ to estimate. The standard practice is to use $z^*$ for proportion CIs. (Some statisticians argue for using $t^*$ even here, or for the Wilson interval, a more accurate alternative we'll mention in the further reading. For this course, $z^*$ is standard.)

Conditions for the Proportion CI

  1. Random sample (or random assignment)
  2. Independence: 10% condition
  3. Success-failure condition: $n\hat{p} \geq 10$ and $n(1-\hat{p}) \geq 10$ — ensures the sampling distribution of $\hat{p}$ is approximately normal (from the CLT for proportions in Chapter 11)
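The last two conditions are mechanical enough to script. A small helper (hypothetical, not from any library; the randomness condition still has to be judged from the study design) that runs the numeric checks:

```python
def check_proportion_conditions(p_hat, n, N):
    """Numeric conditions for the proportion CI, each with a pass/fail flag.
    (Hypothetical helper for illustration only.)"""
    return {
        "10% condition (n <= 0.10 N)": n <= 0.10 * N,
        "successes (n * p_hat >= 10)": n * p_hat >= 10,
        "failures (n * (1 - p_hat) >= 10)": n * (1 - p_hat) >= 10,
    }

# Example: 96 of 800 adults sampled from a county of 500,000
print(check_proportion_conditions(96 / 800, 800, 500_000))
```

If any flag comes back `False`, the normal approximation behind the interval is suspect and the formula shouldn't be trusted.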

Worked Example: Maya's Disease Prevalence

Dr. Maya Chen surveys a random sample of 800 adults in her county and finds that 96 of them have been diagnosed with Type 2 diabetes.

She wants to construct a 95% confidence interval for the true prevalence of Type 2 diabetes in the county.

Step 1: Calculate the sample proportion.

$$\hat{p} = \frac{96}{800} = 0.12 \quad (12\%)$$

Step 2: Check conditions.

  • Random sample? Yes.
  • Independence? $800 \leq 0.10 \times 500{,}000$. ✓
  • Success-failure? $n\hat{p} = 800 \times 0.12 = 96 \geq 10$ ✓ and $n(1-\hat{p}) = 800 \times 0.88 = 704 \geq 10$ ✓

Step 3: Find the critical value. For a 95% CI: $z^* = 1.960$.

Step 4: Calculate the margin of error.

$$E = 1.960 \times \sqrt{\frac{0.12 \times 0.88}{800}} = 1.960 \times \sqrt{\frac{0.1056}{800}} = 1.960 \times \sqrt{0.000132} = 1.960 \times 0.01149 = 0.02252$$

Step 5: Construct the interval.

$$0.12 \pm 0.023$$

$$\text{Lower bound: } 0.12 - 0.023 = 0.097$$

$$\text{Upper bound: } 0.12 + 0.023 = 0.143$$

Step 6: Interpret.

"We are 95% confident that the true proportion of adults in this county with Type 2 diabetes is between 9.7% and 14.3%."

Context matters. The national prevalence of Type 2 diabetes in U.S. adults is approximately 11.3% (CDC data). Maya's interval of (9.7%, 14.3%) contains 11.3%, suggesting her county's prevalence is consistent with the national average. If the interval had been entirely above 11.3% — say, (12.5%, 16.1%) — that would suggest her county has a higher-than-average prevalence, which would have important public health implications.

Worked Example: Sam's Shooting Percentage

Sam Okafor has been tracking Daria Williams's three-point shooting. Over 65 recent attempts, she's made 25. Sam wants a 95% confidence interval for her true three-point percentage.

Step 1: $\hat{p} = 25/65 = 0.385$ (38.5%)

Step 2: Conditions. Random selection of game situations? Reasonable to assume yes. Independence? Each shot is essentially independent. Success-failure? $65 \times 0.385 = 25 \geq 10$ ✓ and $65 \times 0.615 = 40 \geq 10$ ✓.

Step 3: $z^* = 1.960$

Step 4: $E = 1.960 \times \sqrt{0.385 \times 0.615 / 65} = 1.960 \times \sqrt{0.003641} = 1.960 \times 0.06034 = 0.1183$

Step 5: $0.385 \pm 0.118 = (0.267, 0.503)$

Step 6: "We are 95% confident that Daria's true three-point shooting percentage is between 26.7% and 50.3%."

That's a wide interval! And it should worry Sam. The interval ranges from below average (the NBA three-point average is about 36%) to exceptional (above 50%). With only 65 attempts, there's a lot of uncertainty. To get a more precise estimate, Sam would need more data — which connects to sample size determination in Section 12.9.

Notice that this interval contains 0.31 (Daria's historical average from Chapter 11's analysis). The confidence interval is telling us the same story the CLT analysis told: we can't be confident that her shooting has truly improved. The interval includes both "she got better" and "she's the same as always."
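Sam's interval is a direct application of the Section 12.6 formula. A quick check in code (using the exact fraction 25/65 rather than the rounded 0.385, so the bounds differ from the hand calculation by about a thousandth):

```python
import numpy as np
from scipy import stats

x, n = 25, 65                    # made shots, attempts
p_hat = x / n
z_star = stats.norm.ppf(0.975)   # 1.960

moe = z_star * np.sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - moe, p_hat + moe
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
print("Contains 0.31 (historical average)?", lower <= 0.31 <= upper)
```

Because 0.31 sits inside the interval, the data can't distinguish "genuinely improved" from "same shooter as always", which is the same verdict the CLT analysis gave.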


12.7 Computing Confidence Intervals in Python and Excel

You'll rarely compute confidence intervals by hand outside of a statistics course. Here's how to do it with technology.

Python: Confidence Interval for a Mean

import numpy as np
from scipy import stats

# --- Maya's blood pressure data ---
x_bar = 128.3    # sample mean
s = 18.6         # sample standard deviation
n = 120          # sample size

# Method 1: Using scipy.stats.t.interval()
# This function returns the confidence interval directly
ci = stats.t.interval(
    confidence=0.95,       # confidence level
    df=n - 1,              # degrees of freedom
    loc=x_bar,             # center (sample mean)
    scale=s / np.sqrt(n)   # standard error
)
print(f"95% CI for mean: ({ci[0]:.2f}, {ci[1]:.2f})")
# Output: 95% CI for mean: (124.94, 131.66)

# Method 2: Manual calculation (to see each step)
alpha = 0.05                               # 1 - confidence level
t_star = stats.t.ppf(1 - alpha/2, df=n-1)  # critical value
se = s / np.sqrt(n)                        # standard error
moe = t_star * se                          # margin of error

lower = x_bar - moe
upper = x_bar + moe
print(f"\nt* = {t_star:.4f}")
print(f"SE = {se:.4f}")
print(f"MOE = {moe:.4f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
# Output:
# t* = 1.9801
# SE = 1.6979
# MOE = 3.3621
# 95% CI: (124.94, 131.66)

Python: Confidence Interval from Raw Data

import numpy as np
from scipy import stats

# If you have the raw data (not just summary statistics):
np.random.seed(42)
blood_pressure = np.random.normal(loc=127.0, scale=18.6, size=120)

# Method 1: Direct from data
ci = stats.t.interval(
    confidence=0.95,
    df=len(blood_pressure) - 1,
    loc=np.mean(blood_pressure),
    scale=stats.sem(blood_pressure)  # stats.sem() computes s/√n
)
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")

# Method 2: Using a pandas Series
import pandas as pd
bp = pd.Series(blood_pressure)
n = len(bp)
x_bar = bp.mean()
se = bp.sem()  # pandas .sem() also computes s/√n
t_star = stats.t.ppf(0.975, df=n-1)
print(f"\n95% CI: ({x_bar - t_star*se:.2f}, {x_bar + t_star*se:.2f})")

Python: Confidence Interval for a Proportion

import numpy as np
from scipy import stats

# --- Maya's diabetes prevalence ---
x = 96       # number of "successes"
n = 800      # sample size
p_hat = x / n  # sample proportion

# Method 1: Manual calculation with z*
z_star = stats.norm.ppf(0.975)  # z* = 1.960
se = np.sqrt(p_hat * (1 - p_hat) / n)
moe = z_star * se

lower = p_hat - moe
upper = p_hat + moe
print(f"Sample proportion: {p_hat:.4f}")
print(f"z* = {z_star:.4f}")
print(f"SE = {se:.4f}")
print(f"MOE = {moe:.4f}")
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
print(f"95% CI: ({lower*100:.1f}%, {upper*100:.1f}%)")
# Output:
# Sample proportion: 0.1200
# z* = 1.9600
# SE = 0.0115
# MOE = 0.0225
# 95% CI: (0.0975, 0.1425)
# 95% CI: (9.7%, 14.3%)

# Method 2: Using statsmodels (more advanced, includes Wilson interval)
# pip install statsmodels
from statsmodels.stats.proportion import proportion_confint

ci_wald = proportion_confint(x, n, alpha=0.05, method='normal')
ci_wilson = proportion_confint(x, n, alpha=0.05, method='wilson')
print(f"\nWald CI:   ({ci_wald[0]:.4f}, {ci_wald[1]:.4f})")
print(f"Wilson CI: ({ci_wilson[0]:.4f}, {ci_wilson[1]:.4f})")

Python: Visualizing Confidence Interval Coverage

This simulation demonstrates what "95% confidence" really means:

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Simulate the repeated-sampling interpretation
np.random.seed(42)
mu = 127.0           # true population mean (unknown in practice)
sigma = 18.6         # true population SD
n = 120              # sample size per study
n_studies = 100      # number of studies to simulate
confidence = 0.95

# Simulate 100 studies
captured = 0
fig, ax = plt.subplots(figsize=(10, 12))

for i in range(n_studies):
    sample = np.random.normal(loc=mu, scale=sigma, size=n)
    x_bar = sample.mean()
    s = sample.std(ddof=1)
    se = s / np.sqrt(n)
    t_star = stats.t.ppf(1 - (1 - confidence)/2, df=n-1)
    lower = x_bar - t_star * se
    upper = x_bar + t_star * se

    contains_mu = lower <= mu <= upper
    if contains_mu:
        captured += 1
        color = 'steelblue'
    else:
        color = 'red'

    ax.plot([lower, upper], [i, i], color=color, linewidth=1.5)
    ax.plot(x_bar, i, 'o', color=color, markersize=3)

ax.axvline(x=mu, color='black', linestyle='--', linewidth=1.5,
           label=f'True μ = {mu}')
ax.set_xlabel('Blood Pressure (mmHg)', fontsize=12)
ax.set_ylabel('Study Number', fontsize=12)
ax.set_title(f'{n_studies} Confidence Intervals ({confidence:.0%} level)\n'
             f'{captured} of {n_studies} captured μ '
             f'({100 * captured / n_studies:.0f}% coverage)',
             fontsize=14, fontweight='bold')
ax.legend(fontsize=11)
plt.tight_layout()
plt.show()
print(f"\n{captured} out of {n_studies} intervals captured μ.")
print(f"Expected: about {int(confidence * n_studies)}.")

Excel: Confidence Intervals

For a Mean:
  Sample mean (x̄):                    =AVERAGE(A2:A121)
  Sample SD (s):                       =STDEV.S(A2:A121)
  Sample size (n):                     =COUNT(A2:A121)
  Standard error:                      =STDEV.S(A2:A121)/SQRT(COUNT(A2:A121))

  Margin of error (95%):               =CONFIDENCE.T(0.05, STDEV.S(A2:A121), COUNT(A2:A121))

  Lower bound:                         =AVERAGE(A2:A121) - CONFIDENCE.T(0.05, STDEV.S(A2:A121), COUNT(A2:A121))
  Upper bound:                         =AVERAGE(A2:A121) + CONFIDENCE.T(0.05, STDEV.S(A2:A121), COUNT(A2:A121))

  Note: CONFIDENCE.T(alpha, s, n) computes t* × s/√n directly.
        alpha = 1 - confidence level (so alpha = 0.05 for 95% CI).

For a Proportion (manual in Excel):
  Count of successes (x):              =COUNTIF(A2:A801, "Yes")   [or however coded]
  Sample size (n):                     =COUNT(A2:A801)
  Sample proportion (p̂):              =COUNTIF(A2:A801, "Yes")/COUNT(A2:A801)
  z* for 95%:                          =NORM.S.INV(0.975)        [returns 1.960]
  Standard error:                      =SQRT(p̂*(1-p̂)/n)
  Margin of error:                     =NORM.S.INV(0.975)*SQRT(p̂*(1-p̂)/n)
  Lower bound:                         =p̂ - NORM.S.INV(0.975)*SQRT(p̂*(1-p̂)/n)
  Upper bound:                         =p̂ + NORM.S.INV(0.975)*SQRT(p̂*(1-p̂)/n)

  Note: CONFIDENCE.NORM(alpha, sd, n) computes z* × sd/√n, so it can reproduce this
        margin of error if you pass SQRT(p̂*(1-p̂)) as the sd argument; the explicit
        formulas above are clearer about what's being computed.

Summary of Excel Functions:
  CONFIDENCE.T(alpha, s, n)    → Margin of error for a mean using the t-distribution
  CONFIDENCE.NORM(alpha, σ, n) → Margin of error for a mean using the normal distribution
  NORM.S.INV(probability)      → z* critical value (e.g., NORM.S.INV(0.975) = 1.960)
  T.INV.2T(alpha, df)          → t* critical value (e.g., T.INV.2T(0.05, 119) ≈ 1.980)

12.8 The Confidence Level / Sample Size / Margin of Error Tradeoff

There are three quantities in every confidence interval, and they're locked in a tug-of-war:

  1. Confidence level (how sure you are)
  2. Sample size (how much data you have)
  3. Margin of error (how precise you are)

You can pick any two, and the third is determined. Or you can adjust one and watch the others change. Understanding this tradeoff is one of the most practical things you'll learn in this entire course.

The Tradeoff Triangle

                  Confidence Level
                  (90%, 95%, 99%)
                       ╱╲
                      ╱  ╲
                     ╱    ╲
                    ╱  Can ╲
                   ╱  only  ╲
                  ╱  improve  ╲
                 ╱  two at the ╲
                ╱   expense of  ╲
               ╱    the third    ╲
              ╱____________________╲
   Sample Size ──────────────── Margin of Error
      (n)                         (E)

Here's how the tradeoff works:

  If you want...                         You can...                                    But...
  Higher confidence (95% → 99%)          Increase the critical value                   The margin of error gets wider
  Narrower margin of error               Increase the sample size                      That costs more time and money
  Smaller sample size (cheaper study)    Accept lower confidence or a wider MOE        You lose precision or certainty

Let's see this with numbers, using Maya's blood pressure example ($s = 18.6$):

Effect of confidence level (holding $n = 120$ fixed):

  Confidence Level    t* (df = 119)    Margin of Error             Confidence Interval
  90%                 1.658            1.658 × 1.698 = 2.82        (125.5, 131.1)
  95%                 1.980            1.980 × 1.698 = 3.36        (124.9, 131.7)
  99%                 2.618            2.618 × 1.698 = 4.45        (123.9, 132.7)

Higher confidence = wider interval. If you want to be more sure the interval captures $\mu$, you need a wider net.

Effect of sample size (holding 95% confidence):

  Sample Size (n)    SE = s/√n    Margin of Error    Interval Width
       30            3.40         6.90               13.8
      120            1.70         3.36                6.7
      480            0.85         1.66                3.3
    1,920            0.42         0.83                1.7

Notice the pattern: quadrupling $n$ (from 30 to 120, or from 120 to 480) roughly halves the margin of error. This is the diminishing returns phenomenon from Chapter 11 — the $\sqrt{n}$ in the denominator means precision improves slowly.
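The quadrupling pattern in the table above is easy to verify. A minimal sketch (using $z^*$ rather than $t^*$ for simplicity, so the small-$n$ value differs slightly from the t-based table):

```python
import numpy as np
from scipy import stats

s = 18.6                          # Maya's sample SD
z_star = stats.norm.ppf(0.975)    # 95% critical value

moes = []
for n in [30, 120, 480, 1920]:
    moe = z_star * s / np.sqrt(n)
    moes.append(moe)
    print(f"n = {n:>5}: MOE = {moe:.2f}")

# Each quadrupling of n exactly halves the z-based margin of error:
for a, b in zip(moes, moes[1:]):
    print(f"ratio = {a / b:.2f}")   # always 2.00
```

The ratio is exactly 2 because $\sqrt{4n} = 2\sqrt{n}$: the $\sqrt{n}$ in the denominator is doing all the work.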

Real-World Implication: Why Polls Use About 1,000 People

Here's a secret of the polling industry. For proportions near 50%, a sample of about 1,000 gives a margin of error of roughly ±3 percentage points at 95% confidence. Going to 4,000 only improves it to ±1.5 points. The cost quadruples, but the precision only doubles. For most purposes, ±3 points is "good enough" — which is why nearly every major national poll uses samples of 1,000 to 1,500 people.

When a news anchor says "The poll shows 52% support, with a margin of error of plus or minus 3 points," they're reporting a 95% confidence interval: $(49\%, 55\%)$. Now you know exactly what that means — and what it doesn't.
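You can check the anchor's arithmetic yourself. A quick sketch assuming the worst-case $\hat{p} = 0.5$, which is the standard basis for reported poll margins:

```python
import numpy as np
from scipy import stats

z_star = stats.norm.ppf(0.975)
for n in [1000, 4000]:
    moe = z_star * np.sqrt(0.5 * 0.5 / n)   # worst-case SE at p = 0.5
    print(f"n = {n}: MOE = ±{moe * 100:.1f} percentage points")
# n = 1000 gives about ±3.1 points; n = 4000 gives about ±1.5.
```

Quadrupling the sample halves the margin of error, exactly as the tradeoff table predicts.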


12.9 Sample Size Determination

Sometimes you need to work backward. Instead of computing a confidence interval from data you already have, you need to figure out how much data to collect before you start. This is sample size determination — deciding how large your sample needs to be to achieve a desired margin of error.

Sample Size for Estimating a Mean

Start with the margin of error formula:

$$E = z^* \cdot \frac{\sigma}{\sqrt{n}}$$

(We use $z^*$ here because we're planning the study before we collect data — we don't have $s$ yet, so we use a planning estimate of $\sigma$ and the normal distribution.)

Solve for $n$:

$$\sqrt{n} = \frac{z^* \cdot \sigma}{E}$$

$$\boxed{n = \left(\frac{z^* \cdot \sigma}{E}\right)^2}$$

Always round up to the next whole number. You can't survey a fraction of a person.

Worked Example: Maya Plans a Study

Maya wants to estimate the mean systolic blood pressure of adults in her county with a margin of error of no more than 2 mmHg at 95% confidence. From prior studies, she estimates $\sigma \approx 18$ mmHg.

$$n = \left(\frac{1.960 \times 18}{2}\right)^2 = \left(\frac{35.28}{2}\right)^2 = (17.64)^2 = 311.17$$

Maya needs at least 312 adults in her sample. (Always round up.)

Compare this to her original study of 120 — she needs about 2.6 times as many people to cut the margin of error from 3.4 to 2.0.

Sample Size for Estimating a Proportion

Start with:

$$E = z^* \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

Solve for $n$:

$$\boxed{n = \left(\frac{z^*}{E}\right)^2 \cdot \hat{p}(1-\hat{p})}$$

But wait — you're planning the study, so you don't know $\hat{p}$ yet! There are two approaches:

  1. Use a prior estimate. If previous studies suggest $p \approx 0.12$, use $\hat{p} = 0.12$.
  2. Use the conservative estimate $\hat{p} = 0.5$. The product $\hat{p}(1-\hat{p})$ is maximized at $\hat{p} = 0.5$ (where it equals 0.25), so $\hat{p} = 0.5$ gives the largest sample size — guaranteeing your desired margin of error no matter what $p$ turns out to be.
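The claim behind the second approach, that $\hat{p}(1-\hat{p})$ peaks at 0.5, is easy to confirm numerically. A quick sketch scanning a grid of candidate values:

```python
import numpy as np

p = np.linspace(0.01, 0.99, 99)      # candidate values of p-hat
product = p * (1 - p)
best = p[np.argmax(product)]

print(f"p(1-p) peaks at p = {best:.2f} with value {product.max():.2f}")
# Required n scales with p(1-p), so p = 0.5 demands the largest sample.
```

This is why $\hat{p} = 0.5$ is the conservative choice: any other true proportion needs a smaller sample than the one it prescribes.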

Worked Example: Planning a Diabetes Prevalence Study

Maya wants to estimate the county's Type 2 diabetes prevalence with a margin of error of 2 percentage points (0.02) at 95% confidence. She expects the prevalence to be around 12% based on prior data.

$$n = \left(\frac{1.960}{0.02}\right)^2 \times 0.12 \times 0.88 = (98)^2 \times 0.1056 = 9604 \times 0.1056 = 1014.2$$

She needs at least 1,015 adults.

If she uses the conservative $\hat{p} = 0.5$ instead:

$$n = \left(\frac{1.960}{0.02}\right)^2 \times 0.5 \times 0.5 = 9604 \times 0.25 = 2401$$

The conservative approach requires 2,401 adults — more than double the estimate-based approach. This is why having a prior estimate of $p$ matters: it can dramatically reduce the required sample size.

Python: Sample Size Calculation

import numpy as np
from scipy import stats

# --- Sample size for a mean ---
def sample_size_mean(sigma, E, confidence=0.95):
    """Calculate minimum sample size to estimate a mean."""
    z_star = stats.norm.ppf(1 - (1 - confidence) / 2)
    n = (z_star * sigma / E) ** 2
    return int(np.ceil(n))  # always round up

# Maya's planning: σ ≈ 18, E = 2, 95% confidence
n_needed = sample_size_mean(sigma=18, E=2, confidence=0.95)
print(f"Sample size needed (mean, E=2): {n_needed}")
# Output: Sample size needed (mean, E=2): 312

# --- Sample size for a proportion ---
def sample_size_proportion(p_hat, E, confidence=0.95):
    """Calculate minimum sample size to estimate a proportion."""
    z_star = stats.norm.ppf(1 - (1 - confidence) / 2)
    n = (z_star / E) ** 2 * p_hat * (1 - p_hat)
    return int(np.ceil(n))  # always round up

# Maya's planning: p̂ ≈ 0.12, E = 0.02, 95% confidence
n_needed_p = sample_size_proportion(p_hat=0.12, E=0.02, confidence=0.95)
print(f"Sample size needed (proportion, E=0.02): {n_needed_p}")
# Output: Sample size needed (proportion, E=0.02): 1015

# Conservative estimate
n_conservative = sample_size_proportion(p_hat=0.50, E=0.02, confidence=0.95)
print(f"Conservative sample size (p̂=0.5): {n_conservative}")
# Output: Conservative sample size (p̂=0.5): 2401

# --- How margin of error changes with sample size ---
print("\n--- Margin of Error vs. Sample Size (σ=18, 95% CI) ---")
print(f"{'n':>6}  {'MOE':>8}")
print("-" * 16)
for n in [30, 50, 100, 200, 500, 1000, 2000]:
    z = stats.norm.ppf(0.975)
    moe = z * 18 / np.sqrt(n)
    print(f"{n:>6}  {moe:>8.2f}")

12.10 The Big Picture: Why Confidence Intervals Matter

Let me step back for a moment and tell you why what you've just learned matters beyond this course.

Theme 4 Connection: Embracing Uncertainty

The confidence interval is the single most important tool for communicating uncertainty. A point estimate pretends you know the answer. A confidence interval says: "Here's my best guess, and here's how uncertain I am about it."

This is what Theme 4 — uncertainty is not failure — has been building toward since Chapter 1. Reporting uncertainty isn't a weakness. It's intellectual honesty. It's what separates rigorous science from hand-waving. When a pharmaceutical company reports that a drug reduces blood pressure by "5 to 11 mmHg" rather than "8 mmHg," they're not being vague — they're being precise about their imprecision.

The confidence interval gives you a language for doing this in every domain:

  • Medicine: "The drug reduced mortality by 12% to 24%" (we know it works, but the exact effect is uncertain)
  • Business: "Average customer spend is between $42 and $51" (we can plan around this range)
  • Public health: "Between 9.7% and 14.3% of adults have diabetes" (resources should be allocated for this range)
  • Polling: "Support is between 49% and 55%" (the election is too close to call)

Every time you hear a margin of error, you're hearing a confidence interval. Now you understand what's behind it.

Theme 1 Connection: The Superpower Activated

In Chapter 11, I said the CLT was the superpower that makes inference possible. Confidence intervals are the first use of that superpower.

Think about what you can do now that you couldn't do 50 pages ago. Given a sample of data, you can:

  1. Estimate any population mean or proportion
  2. Quantify exactly how uncertain that estimate is
  3. Determine how much data you'd need for a given level of precision
  4. Communicate your findings with appropriate humility

That's not just a statistical technique. That's a way of thinking about evidence that applies to every decision you'll ever make with incomplete information — which is to say, every decision you'll ever make.

Confidence Intervals in the Age of AI

Here's something worth noting: AI and machine learning systems produce point estimates all the time. A recommendation algorithm predicts you'll rate a movie 4.2 stars. A language model estimates a 73% probability that a sentence is positive. A medical AI says there's a 15% chance of disease.

But increasingly, the best AI systems also produce uncertainty estimates — essentially, confidence intervals for their predictions. A well-calibrated AI system might say "4.2 stars ± 0.8" instead of just "4.2 stars." This tells you whether the prediction is reliable (narrow interval) or shaky (wide interval).

Professor Washington might note that a criminal justice algorithm's risk score of 7.3 is much more meaningful if it comes with a confidence interval of (6.8, 7.8) than if it comes with an interval of (3.1, 11.5). The same point estimate carries very different practical implications depending on the uncertainty around it.

Understanding confidence intervals helps you be a critical consumer of all statistical claims — including those made by algorithms.


12.11 Common Mistakes and Misconceptions

Before we move on, let's address the most common mistakes students make with confidence intervals.

  • Mistake: "There's a 95% probability that $\mu$ is in the interval."
    Correction: No — $\mu$ is fixed. The 95% describes the method, not the interval.
  • Mistake: "95% of the data falls within the CI."
    Correction: No — the CI estimates the population parameter, not individual data values. Most data points fall outside the CI.
  • Mistake: "A wider interval is worse."
    Correction: Not necessarily — a wider interval reflects either greater honesty (higher confidence) or greater uncertainty (small $n$, large $s$).
  • Mistake: "If two CIs overlap, the populations are the same."
    Correction: Not necessarily — this is a common but unreliable shortcut. Formal comparison requires a two-sample test (Chapter 16).
  • Mistake: "If the CI doesn't contain a specific value, that value is impossible."
    Correction: No — it's implausible at the chosen confidence level, but not impossible. With 95% confidence, 5% of intervals miss.
  • Mistake: "The confidence level should always be 95%."
    Correction: It's conventional, but 90% and 99% are also common. The choice depends on context — see Section 12.8.
  • Mistake: Using $z^*$ when you should use $t^*$.
    Correction: For means with unknown $\sigma$ (the usual case), use $t^*$. For proportions, use $z^*$.
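The last mistake is worth quantifying. This sketch (with made-up summary numbers) compares the $z^*$- and $t^*$-based intervals for a small sample, where the difference is largest:

```python
import numpy as np
from scipy import stats

# Hypothetical small study: n = 10, x-bar = 50, s = 8
n, x_bar, s = 10, 50.0, 8.0
se = s / np.sqrt(n)

z_star = stats.norm.ppf(0.975)           # 1.960
t_star = stats.t.ppf(0.975, df=n - 1)    # 2.262 with df = 9

print(f"z-based 95% CI: ({x_bar - z_star * se:.2f}, {x_bar + z_star * se:.2f})")
print(f"t-based 95% CI: ({x_bar - t_star * se:.2f}, {x_bar + t_star * se:.2f})")
# The t interval is wider: it accounts for estimating sigma with s.
```

Using $z^*$ here would understate the uncertainty and produce intervals that capture $\mu$ less than 95% of the time.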

12.12 Data Detective Portfolio: Construct Confidence Intervals

Time to apply confidence intervals to your own dataset. This is the Chapter 12 component of the Data Detective Portfolio.

Your Task

Construct confidence intervals for key variables in your dataset.

  1. For a numerical variable: Choose one continuous variable and construct a 95% confidence interval for its population mean.
     - Report $\bar{x}$, $s$, $n$, $df$, $t^*$, SE, MOE, and the interval
     - Check all conditions
     - Interpret the interval in context

  2. For a categorical variable: Choose one categorical variable (or dichotomize a variable) and construct a 95% confidence interval for its population proportion.
     - Report $\hat{p}$, $n$, $z^*$, SE, MOE, and the interval
     - Check all conditions (especially the success-failure condition)
     - Interpret the interval in context

  3. The tradeoff: For one of your intervals, also compute the 90% and 99% versions. Present all three in a table and discuss how the width changes.

  4. Sample size planning: If you wanted to estimate one of your parameters with a margin of error half as large, how many observations would you need? Show the calculation.

  5. Write a paragraph: Discuss what your confidence intervals tell you about the population your data represents. Are the intervals narrow enough to be useful? What would you do differently to improve precision?

Template Code

import pandas as pd
import numpy as np
from scipy import stats

# Load your dataset
df = pd.read_csv('your_dataset.csv')

# ============================================================
# Part 1: CI for a Population Mean
# ============================================================
variable = 'your_numerical_variable'
data = df[variable].dropna()

n = len(data)
x_bar = data.mean()
s = data.std(ddof=1)
se = s / np.sqrt(n)

# Check conditions
print("=== CI for Population Mean ===")
print(f"Variable: {variable}")
print(f"n = {n}")
print(f"x̄ = {x_bar:.4f}")
print(f"s = {s:.4f}")
print(f"SE = {se:.4f}")
print(f"\nConditions:")
print(f"  Random sample: [assess based on your data source]")
print(f"  n ≥ 30 (CLT): {'✓' if n >= 30 else '✗ — check histogram for normality'}")

# 95% CI
alpha = 0.05
t_star = stats.t.ppf(1 - alpha/2, df=n-1)
moe = t_star * se
ci_lower = x_bar - moe
ci_upper = x_bar + moe

print(f"\n95% Confidence Interval:")
print(f"  t* = {t_star:.4f}")
print(f"  MOE = {moe:.4f}")
print(f"  CI = ({ci_lower:.4f}, {ci_upper:.4f})")

# Compare 90%, 95%, 99%
print("\n--- Confidence Level Comparison ---")
for conf in [0.90, 0.95, 0.99]:
    t_val = stats.t.ppf(1 - (1-conf)/2, df=n-1)
    m = t_val * se
    print(f"  {int(conf*100)}% CI: ({x_bar - m:.4f}, {x_bar + m:.4f})  "
          f"[width = {2*m:.4f}]")

# ============================================================
# Part 2: CI for a Population Proportion
# ============================================================
cat_variable = 'your_categorical_variable'
target_value = 'your_target_category'

data_cat = df[cat_variable].dropna()
n_cat = len(data_cat)
successes = (data_cat == target_value).sum()
p_hat = successes / n_cat

print(f"\n=== CI for Population Proportion ===")
print(f"Variable: {cat_variable} = '{target_value}'")
print(f"n = {n_cat}, successes = {successes}")
print(f"p̂ = {p_hat:.4f}")

# Check success-failure condition
print(f"\nConditions:")
print(f"  np̂ = {n_cat * p_hat:.1f} ≥ 10? "
      f"{'✓' if n_cat * p_hat >= 10 else '✗'}")
print(f"  n(1-p̂) = {n_cat * (1-p_hat):.1f} ≥ 10? "
      f"{'✓' if n_cat * (1-p_hat) >= 10 else '✗'}")

# 95% CI
z_star = stats.norm.ppf(0.975)
se_p = np.sqrt(p_hat * (1 - p_hat) / n_cat)
moe_p = z_star * se_p

print(f"\n95% Confidence Interval:")
print(f"  z* = {z_star:.4f}")
print(f"  SE = {se_p:.4f}")
print(f"  MOE = {moe_p:.4f}")
print(f"  CI = ({p_hat - moe_p:.4f}, {p_hat + moe_p:.4f})")
print(f"  CI = ({(p_hat - moe_p)*100:.1f}%, {(p_hat + moe_p)*100:.1f}%)")

# ============================================================
# Part 3: Sample Size Planning
# ============================================================
desired_moe = moe / 2  # half the current margin of error
z_plan = stats.norm.ppf(0.975)
n_needed = int(np.ceil((z_plan * s / desired_moe) ** 2))
print(f"\n=== Sample Size Planning ===")
print(f"To halve MOE to {desired_moe:.4f}:")
print(f"  n needed = {n_needed}")
print(f"  (currently have n = {n})")

Portfolio Tip: If you're using the CDC BRFSS dataset, try BMI for the mean CI and smoking status for the proportion CI. For the World Happiness Report, try the happiness score for the mean and the proportion of countries above a threshold (e.g., score > 6.0). In your write-up, connect your CI to a real question: "What is the average BMI of American adults?" or "What proportion of countries have high happiness scores?" Make the statistics serve the question, not the other way around.


12.13 Chapter Summary

You've just learned the first formal inference tool in statistics. Here's what you now know:

  1. A confidence interval is a range of plausible values for a population parameter, calculated as: point estimate ± margin of error.

  2. For a population mean: $\bar{x} \pm t^* \cdot \frac{s}{\sqrt{n}}$, where $t^*$ comes from the t-distribution with $df = n - 1$.

  3. For a population proportion: $\hat{p} \pm z^* \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$, where $z^*$ comes from the standard normal distribution.

  4. The t-distribution is used when estimating $\sigma$ from data. It has heavier tails than the normal to account for the extra uncertainty, and it converges to the normal as $df$ increases.

  5. "95% confidence" means the procedure captures the true parameter 95% of the time — not that any specific interval has a 95% probability of being correct. The parameter is fixed; the interval is random.

  6. The tradeoff triangle: Confidence level, sample size, and margin of error are locked in a three-way relationship. With sample size fixed, higher confidence means a wider margin of error; the only way to improve both confidence and precision is to collect more data.

  7. Sample size determination lets you plan studies to achieve a desired margin of error: $n = (z^* \sigma / E)^2$ for means and $n = (z^*/E)^2 \hat{p}(1-\hat{p})$ for proportions.

What's Next

In Chapter 13, you'll learn hypothesis testing — the other major inference tool. Instead of estimating a parameter with a range of plausible values, you'll test specific claims: "Is the average blood pressure above 130?" "Did the new algorithm increase watch time?" "Has Daria's shooting percentage improved?"

Confidence intervals and hypothesis tests are two sides of the same coin. In fact, every hypothesis test has a confidence interval hiding inside it, and every confidence interval implies a set of hypothesis tests. You'll see the connection clearly in Chapter 13.

Sam's question about Daria — the one that's been building since Chapter 1 — is about to get its full, rigorous answer.


Key Formulas at a Glance

  • CI structure: $\text{Point Estimate} \pm \text{Margin of Error}$ (the form of every confidence interval)
  • CI for a mean: $\bar{x} \pm t^* \cdot \frac{s}{\sqrt{n}}$, $df = n - 1$ (population mean with unknown $\sigma$)
  • CI for a proportion: $\hat{p} \pm z^* \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ (population proportion)
  • Margin of error (mean): $E = t^* \cdot \frac{s}{\sqrt{n}}$ (quantifying precision for a mean)
  • Margin of error (proportion): $E = z^* \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ (quantifying precision for a proportion)
  • Sample size (mean): $n = \left(\frac{z^* \cdot \sigma}{E}\right)^2$ (planning a study for a mean)
  • Sample size (proportion): $n = \left(\frac{z^*}{E}\right)^2 \hat{p}(1 - \hat{p})$ (planning a study for a proportion)
  • Critical values: $z^* = 1.645$ (90%), $z^* = 1.960$ (95%), $z^* = 2.576$ (99%); $t^*$ depends on $df$
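If you ever need to verify these critical values, or get one for a nonstandard confidence level, one line per level suffices:

```python
from scipy import stats

for conf in [0.90, 0.95, 0.99]:
    z = stats.norm.ppf(1 - (1 - conf) / 2)   # two-sided critical value
    print(f"{conf:.0%}: z* = {z:.3f}")
# Prints 1.645, 1.960, and 2.576, matching the table.
```

Swap in stats.t.ppf with the appropriate df for the corresponding $t^*$ value.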