Case Study 2: Does Money Buy Health? Disentangling GDP, Spending, and Outcomes

Contributors to Introduction to Data Science

Tier 2 — Attributed Findings: This case study discusses a well-studied relationship in global health economics. Statistics and patterns are based on widely published data from the World Health Organization, the World Bank, and published health economics research. The specific data used in simulations is constructed to reflect documented real-world patterns but is not drawn from any specific country. The relationship between national income and health outcomes is one of the most studied topics in development economics, and the patterns described here are well-established in the literature.

The Question Everyone Thinks They Can Answer

Ask anyone whether money can buy health, and they'll probably say something like: "Obviously. Rich countries have better healthcare, so their people live longer." It feels like common sense. And in a broad sense, it's true — there is a strong, well-documented correlation between a country's GDP per capita and its population health outcomes (life expectancy, infant mortality, vaccination rates, and more).

But "does money buy health?" is actually a much harder question than it appears. It requires disentangling multiple causal pathways, identifying confounders, dealing with reverse causation, and confronting the limits of observational data. It's the perfect case study for applying everything in Chapter 24.

The Raw Correlation: What the Data Shows

Let's start with what we can observe directly:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

np.random.seed(42)

# Simulate 150 countries with realistic relationships
n = 150

# GDP per capita (log-normal, reflecting real-world distribution)
log_gdp = np.random.normal(9.0, 1.4, n)  # log(GDP per capita)
gdp = np.exp(log_gdp)

# Life expectancy (strongly related to log GDP)
life_exp = 45 + 8 * np.log(gdp / 1000) + np.random.normal(0, 4, n)
life_exp = np.clip(life_exp, 35, 88)

# Vaccination rate
vax_rate = 30 + 12 * np.log(gdp / 1000) + np.random.normal(0, 10, n)
vax_rate = np.clip(vax_rate, 10, 99)

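# Healthcare spending per capita
# (a coefficient of 0.001 would make the GDP term contribute only a few
# dollars, leaving spending as pure noise; 0.5 lets spending rise with GDP)
health_spend = 0.5 * gdp**0.8 + np.random.exponential(200, n)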
health_spend = np.clip(health_spend, 20, 12000)

countries = pd.DataFrame({
    'GDP per capita': gdp,
    'Life expectancy': life_exp,
    'Vaccination rate': vax_rate,
    'Health spending per capita': health_spend
})

# The headline correlation
r_gdp_life, p = stats.pearsonr(np.log(gdp), life_exp)
print(f"Correlation between log(GDP) and life expectancy: r = {r_gdp_life:.3f}")
print("This is very strong. But what does it mean?")

fig, axes = plt.subplots(1, 3, figsize=(16, 5))

# GDP vs Life Expectancy
r1, _ = stats.pearsonr(np.log(gdp), life_exp)
axes[0].scatter(gdp, life_exp, alpha=0.5, color='steelblue', s=30,
                edgecolor='white')
axes[0].set_xscale('log')
axes[0].set_xlabel('GDP per capita (log scale)')
axes[0].set_ylabel('Life Expectancy (years)')
axes[0].set_title(f'GDP vs Life Expectancy\nr = {r1:.3f} (using log GDP)')

# GDP vs Vaccination Rate
r2, _ = stats.pearsonr(np.log(gdp), vax_rate)
axes[1].scatter(gdp, vax_rate, alpha=0.5, color='#e74c3c', s=30,
                edgecolor='white')
axes[1].set_xscale('log')
axes[1].set_xlabel('GDP per capita (log scale)')
axes[1].set_ylabel('Vaccination Rate (%)')
axes[1].set_title(f'GDP vs Vaccination Rate\nr = {r2:.3f} (using log GDP)')

# Health Spending vs Life Expectancy
r3, _ = stats.pearsonr(np.log(health_spend), life_exp)
axes[2].scatter(health_spend, life_exp, alpha=0.5, color='#2ecc71', s=30,
                edgecolor='white')
axes[2].set_xscale('log')
axes[2].set_xlabel('Health Spending per capita (log scale)')
axes[2].set_ylabel('Life Expectancy (years)')
axes[2].set_title(f'Health Spending vs Life Expectancy\nr = {r3:.3f} (using log)')

plt.suptitle('The GDP-Health Relationship: Strong but Complex',
             fontsize=14, fontweight='bold', y=1.03)
plt.tight_layout()
plt.savefig('gdp_health_relationships.png', dpi=150, bbox_inches='tight')
plt.show()

The pattern is visually striking and statistically robust. Richer countries really do have longer life expectancies and higher vaccination rates. But notice something important in the scatter plots: the relationship is logarithmic, not linear. Going from $1,000 to $10,000 GDP per capita buys a LOT more life expectancy than going from $30,000 to $40,000.

This logarithmic pattern — called the Preston Curve after the demographer Samuel Preston who documented it in 1975 — has been one of the most replicated findings in development economics.
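One quick way to check the "logarithmic, not linear" claim is to compare the correlation computed on raw GDP with the one computed on log GDP. The sketch below regenerates a small simulated dataset using the same assumed model as above (`45 + 8 * ln(GDP / 1000)` plus noise); the seed and variable names are illustrative choices, and clipping is omitted for brevity.

```python
import numpy as np

# Hypothetical simulation reusing the case study's assumed model:
# life expectancy = 45 + 8 * ln(GDP / 1000) + noise
rng = np.random.default_rng(0)
n = 150
log_gdp = rng.normal(9.0, 1.4, n)
gdp = np.exp(log_gdp)
life_exp = 45 + 8 * np.log(gdp / 1000) + rng.normal(0, 4, n)

# Pearson correlation with GDP on its raw scale versus its log scale
r_raw = np.corrcoef(gdp, life_exp)[0, 1]
r_log = np.corrcoef(np.log(gdp), life_exp)[0, 1]

print(f"r using raw GDP: {r_raw:.3f}")
print(f"r using log GDP: {r_log:.3f}")
```

If the relationship really is logarithmic, the log-scale correlation should be clearly higher, because Pearson's r only measures how well a straight line fits.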

The Causal Tangle: Five Pathways, Not One

The naive story is simple: more money → more healthcare → better health. But the real causal structure is much more complex.

Pathway 1: GDP → Healthcare Spending → Health Outcomes

The most obvious pathway. Wealthier countries can afford to spend more on healthcare — hospitals, doctors, medicines, vaccines, sanitation infrastructure. This directly improves health outcomes.

But this pathway alone doesn't explain the full correlation. Many studies have found that health spending accounts for only a portion of the GDP-health relationship. Other factors matter too.
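To see what "accounts for only a portion" can mean in practice, here is a minimal sketch of a linear mediation decomposition on simulated data. All coefficients are made up for illustration, the `ols` helper is a hypothetical convenience function, and the decomposition is only meaningful under the strong assumption that there are no unmeasured confounders.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000  # hypothetical sample size, large so the estimates are stable

# Assumed (made-up) structural coefficients, for illustration only
log_gdp = rng.normal(9.0, 1.4, n)
log_spend = 0.8 * log_gdp + rng.normal(0, 0.5, n)      # GDP -> spending
life_exp = (20 + 3.0 * log_gdp                          # GDP -> health (other routes)
            + 2.5 * log_spend                           # spending -> health
            + rng.normal(0, 3, n))

def ols(y, *predictors):
    """Return OLS coefficients [intercept, slope_1, slope_2, ...]."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

total_effect = ols(life_exp, log_gdp)[1]                # spending left out
direct_effect, spend_effect = ols(life_exp, log_gdp, log_spend)[1:]
gdp_to_spend = ols(log_spend, log_gdp)[1]
indirect_effect = gdp_to_spend * spend_effect           # path through spending

print(f"Total effect of log GDP:      {total_effect:.2f}")
print(f"Direct (not via spending):    {direct_effect:.2f}")
print(f"Indirect (via spending):      {indirect_effect:.2f}")
print(f"Share mediated by spending:   {indirect_effect / total_effect:.0%}")
```

With these assumed coefficients, the total effect is 3.0 + 0.8 × 2.5 = 5.0, so spending carries about 40% of it. Real data will not decompose this cleanly.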

Pathway 2: GDP → Education → Health Outcomes

Wealthier countries have better education systems. More educated people make better health decisions (nutrition, preventive care, hygiene), are more likely to seek vaccination, and are better equipped to navigate healthcare systems. Education improves health through individual behavior, not just through the healthcare system.

Pathway 3: GDP → Infrastructure → Health Outcomes

Wealth enables investments in clean water, sanitation, roads, electricity, and communication networks — all of which improve health outcomes independently of formal healthcare. A village with clean water and a paved road to the nearest clinic will have better health outcomes regardless of how much the government spends specifically on "healthcare."

Pathway 4: Health → GDP (Reverse Causation!)

Here's where it gets interesting. The causation doesn't just run from money to health — it runs from health to money too. Healthier populations are more productive: fewer sick days, longer working lives, better cognitive function, more energy. Healthy children learn better in school and grow up to be more productive adults.

This means the GDP-health correlation reflects a feedback loop, not a one-way arrow:

GDP → Health (wealthier countries invest in health)
Health → GDP (healthier populations produce more)

Estimating the causal effect of GDP on health is hard precisely because health also causes GDP. This is a form of endogeneity known as simultaneity (or simultaneous causation), and it's one of the most challenging problems in economics.
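A small simulation can show why the feedback loop matters. The sketch below assumes a two-equation system with made-up effect sizes (`a` for income on health, `b` for health on income) and standardized variables, then runs a naive regression of health on income. Nothing here is estimated from real data.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

a = 0.3   # assumed true effect of income on health
b = 0.5   # assumed true effect of health on income
u = rng.normal(0, 1, n)   # independent shocks to health
v = rng.normal(0, 1, n)   # independent shocks to income

# Solve the system  health = a*income + u,  income = b*health + v
income = (b * u + v) / (1 - a * b)
health = (a * v + u) / (1 - a * b)

# Naive regression slope of health on income
ols_slope = np.cov(income, health)[0, 1] / np.var(income, ddof=1)

print(f"True effect of income on health: {a}")
print(f"Naive regression slope:          {ols_slope:.3f}")
```

With equal shock variances, the naive slope converges to (b + a) / (b² + 1) = 0.64, more than double the true effect of 0.3. The regression cannot tell the two arrows apart.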

Pathway 5: Institutional Quality → Both GDP and Health

Perhaps the most important confounder. Countries with effective, transparent, accountable governance tend to have both stronger economies AND better health systems. The quality of institutions — rule of law, property rights, bureaucratic efficiency, democratic accountability — independently drives both economic growth and public health.

# Simulating the institutional quality confounder
np.random.seed(42)

# Institutional quality: the hidden driver
institutional_q = np.random.normal(50, 15, 150)

# GDP is driven by institutional quality (plus other factors)
log_gdp_sim = 7 + 0.05 * institutional_q + np.random.normal(0, 0.8, 150)
gdp_sim = np.exp(log_gdp_sim)

# Health outcomes driven by institutional quality AND GDP
life_exp_sim = (35 +
                0.3 * institutional_q +        # Direct effect of institutions
                3 * np.log(gdp_sim / 1000) +   # Direct effect of GDP
                np.random.normal(0, 4, 150))
life_exp_sim = np.clip(life_exp_sim, 35, 88)

# Correlations
r_raw = stats.pearsonr(np.log(gdp_sim), life_exp_sim)[0]

# Partial correlation controlling for institutional quality
from sklearn.linear_model import LinearRegression

iq = institutional_q.reshape(-1, 1)
gdp_resid = np.log(gdp_sim) - LinearRegression().fit(iq, np.log(gdp_sim)).predict(iq)
life_resid = life_exp_sim - LinearRegression().fit(iq, life_exp_sim).predict(iq)
r_partial = stats.pearsonr(gdp_resid, life_resid)[0]

print("=== Disentangling GDP and Institutional Quality ===")
print(f"Raw correlation (log GDP vs life expectancy):     r = {r_raw:.3f}")
print(f"Partial correlation (controlling for institutions): r = {r_partial:.3f}")
print(f"\nRelative drop in r after controlling for institutions: "
      f"{(1 - r_partial/r_raw)*100:.0f}%")

The partial correlation is smaller — some of the apparent GDP-health relationship was actually driven by institutional quality causing both.
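The same idea can be expressed as regression adjustment: compare the GDP slope with and without institutional quality in the model. This sketch reuses the simulation's assumed coefficients but draws a larger hypothetical sample (and skips clipping) so the estimates are less noisy; `ols` is an illustrative helper, not part of the original analysis.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000  # larger than the 150 countries above, purely to reduce sampling noise

# Same assumed data-generating process as the confounder simulation
institutional_q = rng.normal(50, 15, n)
log_gdp = 7 + 0.05 * institutional_q + rng.normal(0, 0.8, n)
life_exp = (35 + 0.3 * institutional_q
            + 3 * (log_gdp - np.log(1000))   # true direct GDP effect is 3
            + rng.normal(0, 4, n))

def ols(y, *predictors):
    """Return OLS coefficients [intercept, slope_1, slope_2, ...]."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

naive_slope = ols(life_exp, log_gdp)[1]
adjusted_slope = ols(life_exp, log_gdp, institutional_q)[1]

print(f"GDP slope, institutions ignored:    {naive_slope:.2f}")
print(f"GDP slope, institutions controlled: {adjusted_slope:.2f}")
```

The naive slope comes out well above 3 because it absorbs part of the institutional effect. Adjustment only works here because the confounder is measured; in real cross-country data, institutional quality is hard to measure well.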

The Natural Experiment Approach

Since we can't randomly assign countries to different GDP levels, researchers have used natural experiments — events that change a country's wealth for reasons unrelated to health — to estimate the causal effect of income on health.

Example: Oil Price Shocks

When oil prices spike, oil-producing countries suddenly get richer while oil-importing countries get poorer. These wealth changes are driven by global commodity markets, not by the countries' health systems or policies. By comparing health changes in oil-rich vs. oil-poor countries before and after a price shock, researchers can estimate the causal effect of income on health with fewer confounders.
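The before-and-after comparison described above is a difference-in-differences design. Here is a minimal sketch on simulated data. The group sizes, baseline levels, common trend, and effect size are all assumptions chosen for illustration, and for simplicity the importers are treated as an unaffected comparison group.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300             # hypothetical units per group
true_effect = 0.8   # assumed health gain (years) caused by the income windfall
common_trend = 1.0  # change over time shared by both groups

# Baseline levels differ between groups; the design does not need them to match
exporters_before = 68 + rng.normal(0, 0.5, n)
importers_before = 72 + rng.normal(0, 0.5, n)
exporters_after = 68 + common_trend + true_effect + rng.normal(0, 0.5, n)
importers_after = 72 + common_trend + rng.normal(0, 0.5, n)

# Naive cross-sectional comparison after the shock (misleading)
naive_gap = exporters_after.mean() - importers_after.mean()

# Difference-in-differences: change in exporters minus change in importers
did = ((exporters_after.mean() - exporters_before.mean())
       - (importers_after.mean() - importers_before.mean()))

print(f"Naive post-shock gap: {naive_gap:.2f} years")
print(f"DiD estimate:         {did:.2f} years (assumed true effect: {true_effect})")
```

The naive gap is negative because exporters started from a lower baseline, while the difference-in-differences estimate lands close to the assumed effect. The key assumption is parallel trends: without the shock, both groups would have changed by the same amount.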

Example: Currency Crises

When a country experiences a sudden currency devaluation (like the Asian Financial Crisis of 1997-98), its effective wealth drops rapidly. If health outcomes deteriorate in the aftermath, that's stronger evidence of a causal relationship because the crisis was caused by financial factors, not by health conditions.

These natural experiments generally support the conclusion that income does causally affect health — but the effect is smaller than the raw correlation suggests. Much of the correlation is due to confounders (institutional quality, education, historical factors) rather than the direct causal path from money to health.

The Diminishing Returns Puzzle

One of the most important findings in the GDP-health literature is the pattern of diminishing returns:

# The diminishing returns curve
gdp_range = np.linspace(500, 80000, 1000)

# Logarithmic relationship
predicted_life_exp = 45 + 8 * np.log(gdp_range / 1000)
predicted_life_exp = np.clip(predicted_life_exp, 35, 88)

fig, ax = plt.subplots(figsize=(10, 6))

# The curve
ax.plot(gdp_range, predicted_life_exp, color='steelblue', linewidth=2.5)

# Annotate the diminishing returns
# $1K to $5K
# Arrow endpoints follow 45 + 8 * ln(GDP / 1000); '$' is escaped so
# matplotlib does not interpret the labels as mathtext
ax.annotate('', xy=(5000, 57.9), xytext=(1000, 45),
            arrowprops=dict(arrowstyle='->', color='red', lw=2))
ax.text(2500, 48, '\\$1K→\\$5K:\n+12.9 years', fontsize=10, ha='center',
        color='red', fontweight='bold')

# $20K to $40K
ax.annotate('', xy=(40000, 74.5), xytext=(20000, 69.0),
            arrowprops=dict(arrowstyle='->', color='orange', lw=2))
ax.text(30000, 67, '\\$20K→\\$40K:\n+5.5 years', fontsize=10, ha='center',
        color='orange', fontweight='bold')

# $40K to $80K
ax.annotate('', xy=(80000, 80.0), xytext=(40000, 74.5),
            arrowprops=dict(arrowstyle='->', color='gray', lw=2))
# '$' is escaped so matplotlib does not interpret the label as mathtext
ax.text(58000, 73, '\\$40K→\\$80K:\n+5.5 years', fontsize=10, ha='center',
        color='gray', fontweight='bold')

ax.scatter(gdp, life_exp, alpha=0.3, s=20, color='steelblue', zorder=0)
ax.set_xlabel('GDP per Capita ($)', fontsize=12)
ax.set_ylabel('Life Expectancy (years)', fontsize=12)
ax.set_title('The Preston Curve: Diminishing Returns of Wealth on Health',
             fontsize=14)
ax.set_xlim(0, 80000)
ax.grid(alpha=0.3)
plt.tight_layout()
plt.savefig('preston_curve.png', dpi=150, bbox_inches='tight')
plt.show()

The implications of diminishing returns are profound:

For poor countries: Small increases in GDP can produce large health gains. Under the simulated curve, moving from $1,000 to $5,000 per capita adds about 13 years of life expectancy (8 × ln 5 ≈ 12.9).
For rich countries: Further GDP growth produces minimal health gains. The United States has a GDP per capita several times that of Costa Rica, yet its life expectancy is no higher and in recent years has been somewhat lower. At high income levels, factors like healthcare system design, inequality, and lifestyle choices matter more than raw wealth.
For policy: It may be more effective to redistribute global health resources toward the poorest countries (where a dollar goes furthest) than to increase spending in countries already on the flat part of the curve.
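The arithmetic behind these statements is short enough to check by hand. The sketch below uses the slope of 8 from the simulated curve, which is this case study's illustrative value rather than an empirical estimate; `life_exp_gain` is a hypothetical helper name.

```python
import math

def life_exp_gain(gdp_from, gdp_to, slope=8.0):
    """Years gained under the assumed curve 45 + slope * ln(GDP / 1000).

    The intercept cancels, so the gain depends only on the GDP ratio.
    """
    return slope * math.log(gdp_to / gdp_from)

steps = [(1_000, 5_000), (20_000, 40_000), (40_000, 80_000), (30_000, 40_000)]
for gdp_from, gdp_to in steps:
    gain = life_exp_gain(gdp_from, gdp_to)
    print(f"${gdp_from:>6,} -> ${gdp_to:>6,}: +{gain:.1f} years")
```

Under a logarithmic curve, every doubling of GDP buys the same number of years (8 × ln 2 ≈ 5.5 here), no matter how many dollars that doubling costs. That is why $20K→$40K and $40K→$80K show identical gains.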

The United States Paradox

The US illustrates why the GDP-health relationship is not simply causal. The United States:

- Has one of the highest GDP per capita levels in the world
- Spends far more on healthcare per capita than any other country
- Has a life expectancy lower than many countries that spend less

This anomaly (high spending, mediocre outcomes) cannot be explained by the simple "money buys health" model. The US has specific features that weaken the GDP-health link:

- High income inequality (the benefits of wealth are concentrated, not distributed)
- Gaps in insurance coverage (millions of Americans remain uninsured, even after recent coverage expansions)
- High healthcare prices (Americans pay more per unit of care, so more spending doesn't mean more care)
- Lifestyle and social factors (obesity, gun violence, the opioid epidemic)
- Weak social safety net (compared to other wealthy countries)

This example demonstrates that the GDP-health correlation is not a law of nature. It's a tendency that can be overridden by policy choices, institutional design, and social factors.

What This Means for Vaccination Analysis

Bringing this back to the progressive project:

The Correlation You Found

GDP per capita and vaccination rates are positively correlated across countries. This is robust, well-documented, and not in dispute.
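When you report a correlation like this, it helps to attach an interval rather than a single number. A common approach is the Fisher z-transformation, sketched below. The values `r = 0.80` and `n = 150` are placeholders rather than results from any dataset, and the method assumes roughly bivariate-normal data and independent observations.

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a Pearson correlation.

    Uses the Fisher z-transformation: atanh(r) is roughly normal
    with standard error 1 / sqrt(n - 3).
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Placeholder inputs: substitute the r and n from your own analysis
low, high = fisher_ci(0.80, 150)
print(f"r = 0.80, 95% CI approximately [{low:.2f}, {high:.2f}]")
```

An interval makes it clear how precisely the association is estimated. It says nothing about whether the association is causal.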

What You CAN Say

"Countries with higher GDP tend to have higher vaccination rates. This association is strong, consistent across datasets, and plausible given the multiple mechanisms through which wealth could affect vaccine delivery (healthcare infrastructure, cold chain logistics, trained personnel, public education, government capacity)."

What You CANNOT Say

"Higher GDP causes higher vaccination rates." This implies a simple, direct causal pathway that we haven't established. The correlation might reflect:

- GDP → vaccination (direct causal effect through funding)
- Vaccination → GDP (healthier populations are more productive)
- Institutional quality → both (effective governance drives both)
- Education → both (more educated populations are both wealthier and more vaccine-accepting)
- Historical factors → both (colonial history shaped both economic development and health infrastructure)

What You SHOULD Say

"GDP and vaccination rates are strongly associated, likely reflecting multiple causal pathways including direct effects of wealth on healthcare capacity, shared underlying causes such as institutional quality and education, and possible reverse causation through the economic benefits of population health. The logarithmic pattern suggests that GDP increases would produce the largest vaccination gains in the poorest countries. However, the US example demonstrates that wealth alone does not guarantee strong health outcomes; policy design, equity, and healthcare system organization also play crucial roles."

This kind of nuanced, honest analysis is what distinguishes professional data science from casual data commentary.

Discussion Questions

The Preston Curve and policy: If the GDP-health relationship follows a logarithmic curve, what does this imply for global health policy? Should international health funding focus on the poorest countries? What about middle-income countries on the steep part of the curve?
The US paradox: What specific policy features do you think explain why the US has lower life expectancy than countries with similar or lower GDP? What does this tell us about the limits of the GDP-health correlation?
Reverse causation: Design a hypothetical study (or natural experiment) that could help determine whether health improvements cause GDP growth. What would you measure, and how would you address confounders?
For your project: Write a "Limitations" section for your vaccination analysis that honestly discusses the confounders in the GDP-vaccination relationship. How does acknowledging these limitations strengthen (not weaken) your analysis?

Key Takeaway: "Does money buy health?" is not a yes-or-no question. The GDP-health relationship is real, robust, and practically important. But it reflects a complex web of causal pathways, reverse causation, and confounding variables. Understanding this complexity doesn't make the data less useful — it makes your analysis more honest and your recommendations more likely to succeed.