Case Study 2: Counting What Counts — Data Science in the Fight Against Maternal Mortality
Tier 2 — Attributed Findings: This case study discusses a real and ongoing global health crisis. Statistics, estimation methods, and research findings are attributed to the World Health Organization (WHO), UNICEF, the United Nations Population Fund (UNFPA), the World Bank Group, and the United Nations Population Division, whose joint reports on maternal mortality are the primary global reference. Specific figures cited are drawn from widely published estimates; minor variations may exist between report editions. Illustrative examples of individual countries and programs are based on documented patterns in the global health literature, though precise local details have been simplified for pedagogical clarity.
The Crisis
Here is a number that should stop you: approximately 287,000 women and girls died in 2020 from causes related to pregnancy and childbirth, according to estimates from the WHO and partner agencies. That's roughly one death every two minutes.
Here is another number that should stop you even longer: in some high-income countries, the maternal mortality ratio is around 3 to 5 deaths per 100,000 live births. In certain low-income countries, the ratio exceeds 500 per 100,000 — sometimes reaching over 1,000. That means a woman giving birth in the highest-risk countries faces a risk of death that is one hundred to three hundred times higher than a woman in the lowest-risk countries.
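The back-of-the-envelope arithmetic behind both claims is worth checking yourself. A quick sketch (the mortality ratios here are illustrative round numbers, not official estimates):

```python
# Roughly one death every two minutes: 287,000 deaths spread over a year.
deaths_per_year = 287_000
minutes_per_year = 365 * 24 * 60          # 525,600 minutes
minutes_per_death = minutes_per_year / deaths_per_year
print(f"One death every {minutes_per_death:.1f} minutes")  # about 1.8

# Relative risk between a low-mortality and a high-mortality setting.
mmr_low = 4        # deaths per 100,000 live births (illustrative)
mmr_high = 1000    # upper end cited for the highest-burden countries
print(f"Risk ratio: {mmr_high / mmr_low:.0f}x")            # 250
```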
This is not because pregnancy is inherently more dangerous in some parts of the world. The vast majority of maternal deaths are from causes that are preventable or treatable with known medical interventions: severe bleeding after birth, infections, high blood pressure during pregnancy, complications from delivery, and unsafe abortion. The difference between countries where women survive childbirth and countries where they don't is largely a matter of access — to skilled birth attendants, to emergency obstetric care, to blood transfusion services, to clean facilities, to transportation when complications arise.
This is a data science problem in ways that might not be immediately obvious. Not because an algorithm can deliver a baby, but because the fight against maternal mortality depends fundamentally on measurement: knowing how many women are dying, where, why, and whether interventions are working. And measurement, it turns out, is extraordinarily difficult.
The Data Challenge
When Deaths Go Uncounted
In countries with functioning civil registration systems — where births and deaths are reliably recorded, and causes of death are certified by a physician — counting maternal deaths is straightforward (though even then, misclassification happens). Most high-income countries and an increasing number of middle-income countries have such systems.
But in many of the countries where maternal mortality is highest, civil registration is incomplete or nonexistent. A woman who dies during childbirth in a rural village may never appear in any official record. Her death may be known to her family and community but invisible to the health system, the government, and the international organizations trying to track progress.
The scale of this invisibility is staggering. The WHO has estimated that in some sub-Saharan African countries, fewer than half of all deaths are registered with civil authorities. For maternal deaths specifically, the undercount may be even worse, because maternal deaths can be misattributed to other causes (a woman who dies from postpartum hemorrhage might be recorded as dying from "anemia" or "weakness"), or they may occur outside of health facilities entirely and never receive any medical documentation.
This creates a cruel paradox: the countries where the most women are dying are the countries where the fewest deaths are counted. The data is worst precisely where the need is greatest.
Structured vs. Unstructured Data in Maternal Health
To understand the data challenge, it helps to think about the different types of information that exist about maternal mortality — a distinction that will recur throughout this course.
Structured data is information that fits neatly into rows and columns: a patient's age, blood pressure, gestational age, the date and time of delivery, the outcome. Hospital records, when they exist and are digitized, produce structured data. It can be counted, aggregated, and analyzed with standard tools.
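A minimal sketch of what "rows and columns" means in practice, using invented delivery records in plain Python (this previews the pandas DataFrames of Chapter 7; all values are hypothetical):

```python
# Hypothetical structured delivery records: every row has the same
# well-defined fields, so aggregation is direct and unambiguous.
records = [
    {"age": 24, "systolic_bp": 118, "gestational_weeks": 39, "outcome": "live birth"},
    {"age": 31, "systolic_bp": 142, "gestational_weeks": 37, "outcome": "live birth"},
    {"age": 19, "systolic_bp": 110, "gestational_weeks": 40, "outcome": "live birth"},
]

# Structured data can be counted and summarized with one line.
mean_age = sum(r["age"] for r in records) / len(records)
print(f"Mean maternal age: {mean_age:.1f}")  # 24.7
```

A midwife's handwritten notes admit no such one-liner; that gap is the structured/unstructured divide.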
Unstructured data is information that doesn't fit neatly into a spreadsheet: a midwife's handwritten notes, a community health worker's verbal account of what happened during a delivery, a family member's description of a woman's symptoms before she died. This information is rich and important, but it can't be directly plugged into a statistical model.
In maternal health, some of the most critical data sources fall somewhere between these categories:
- Verbal autopsies: When a death occurs outside a hospital and no physician was present, researchers may interview family members to reconstruct what happened. A trained interviewer asks a structured set of questions: Did the woman have a fever? Was there heavy bleeding? How long after delivery did she die? The answers are then used to assign a probable cause of death. This method is imperfect — family members may not know or remember medical details, cultural norms may affect what people are willing to describe, and the algorithm that translates interview responses into a cause of death has known error rates. But verbal autopsies are often the only source of cause-of-death information in settings where physicians are scarce.
- Health facility records: Even when hospitals keep records, the records may be on paper, stored in filing cabinets, in varying formats, and difficult to aggregate. A 2019 review in The Lancet Global Health described researchers in one country spending months manually transcribing handwritten delivery room logbooks into digital formats — and finding that 15-20% of entries were incomplete or illegible.
- Household surveys: Large-scale surveys like the Demographic and Health Surveys (DHS) program ask women of reproductive age about their pregnancy history, including whether any sisters have died during pregnancy or childbirth. This "sisterhood method" provides estimates of maternal mortality, but with wide uncertainty ranges and significant time lags — the deaths being reported may have occurred 5-12 years before the survey.
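The verbal-autopsy pipeline described above, structured interview answers mapped to a probable cause of death, can be caricatured in a few lines. The rules and categories below are invented for illustration; real tools (such as the InterVA family) use far more elaborate probabilistic models:

```python
def probable_cause(answers):
    """Toy rule-based classifier for verbal-autopsy responses.

    `answers` is a dict of yes/no interview responses. These rules are
    illustrative only, not a real verbal-autopsy algorithm.
    """
    if answers.get("heavy_bleeding_after_delivery"):
        return "postpartum hemorrhage (probable)"
    if answers.get("fever") and answers.get("died_within_42_days"):
        return "puerperal infection (probable)"
    if answers.get("convulsions") and answers.get("high_blood_pressure"):
        return "eclampsia (probable)"
    return "undetermined"

interview = {"heavy_bleeding_after_delivery": True, "fever": False}
print(probable_cause(interview))  # postpartum hemorrhage (probable)
```

Note how much the design embeds: which questions get asked, which symptoms count as decisive, and what falls through to "undetermined."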
🔗 Connection: The distinction between structured and unstructured data is one of the foundational concepts in data science. In Chapter 7, you'll learn to work with structured data in pandas DataFrames. In Chapter 10, you'll encounter text data — a form of unstructured data. The challenges of converting messy, real-world information into analyzable data are central to Chapter 8 (Cleaning Messy Data). The maternal mortality measurement problem is an extreme example of a challenge that faces every data science project: the gap between the messy reality of the world and the clean rows and columns that our tools require.
How the WHO Estimates Maternal Mortality
Given all these data problems, how does anyone produce a number like "287,000 maternal deaths in 2020"?
The answer involves a sophisticated estimation methodology developed jointly by the WHO, UNICEF, UNFPA, the World Bank Group, and the United Nations Population Division — a group known as the Maternal Mortality Estimation Inter-Agency Group (MMEIG).
Their approach, in simplified terms, works like this:
1. Collect every available data source for each country: civil registration records, hospital data, surveys, censuses, verbal autopsy studies, surveillance systems. Different countries have different combinations of sources, and some have very few.
2. Assess the quality of each source. Is the civil registration system complete? Are maternal deaths likely to be misclassified? Is the survey sample large enough to produce reliable estimates? Each source gets an assessment of its strengths and limitations.
3. Build a statistical model that combines all available sources, weighting them by quality, and accounts for known biases. For countries with reliable vital registration data, the model stays close to the observed numbers. For countries with limited data, the model relies more heavily on statistical relationships — for instance, the known correlation between maternal mortality and factors like GDP per capita, skilled birth attendant coverage, and fertility rate.
4. Produce estimates with uncertainty ranges. This is crucial. The MMEIG doesn't report a single number; it reports a point estimate with an 80% uncertainty interval. For a country like Sierra Leone, the 2020 estimate was a maternal mortality ratio of 443 per 100,000 live births, with an uncertainty interval from 282 to 640. That wide range is honest — it reflects how much we don't know.
5. Revise and update. Every few years, the entire set of estimates is recalculated using the latest data and improved methods. This means that the estimated maternal mortality ratio for, say, 2015 may change between a report published in 2019 and a report published in 2023 — not because the past changed, but because our knowledge of the past improved.
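A drastically simplified sketch of the "combine sources, weight by quality" step, using classical inverse-variance weighting so that more precise sources count for more. The real MMEIG model is a far richer Bayesian hierarchical model, and every number below is invented:

```python
# Each (estimate, standard_error) pair is a hypothetical data source for
# one country: a household survey, a surveillance study, a census.
sources = [
    (520, 120),   # household survey: noisy, wide error
    (430, 60),    # surveillance study: more precise
    (470, 90),    # census-based estimate
]

# Inverse-variance weights: precision (1/SE^2) determines influence.
weights = [1 / se**2 for _, se in sources]
pooled = sum(w * est for (est, _), w in zip(sources, weights)) / sum(weights)

# Pooled standard error, then a rough interval (±1.28 SE is about 80%
# coverage under a normal approximation).
pooled_se = (1 / sum(weights)) ** 0.5
low, high = pooled - 1.28 * pooled_se, pooled + 1.28 * pooled_se
print(f"Pooled MMR: {pooled:.0f} (80% interval roughly {low:.0f}-{high:.0f})")
```

The pooled estimate lands nearest the most precise source, which is exactly the behavior described above: where data is good, the model stays close to it.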
This process is data science in action. Not the flashy kind with machine-learning demos and cool visualizations, but the foundational kind: gathering imperfect data from multiple sources, wrestling with incompleteness and bias, building models that account for uncertainty, and communicating results honestly. It's also a reminder that some of the most important data science work happens at the level of measurement itself — before any analysis can even begin.
Data Science in Action
Let's look at how the three types of data science questions — descriptive, predictive, and causal — apply to maternal mortality.
Descriptive: What Does the Data Show?
Descriptive analysis answers the question: What is happening?
When researchers map maternal mortality ratios by country, clear geographic patterns emerge. Sub-Saharan Africa carries a disproportionate burden — the region accounts for roughly two-thirds of all maternal deaths worldwide, despite representing about 14% of the global population. Within regions, there is enormous variation: some countries have made dramatic progress over the past two decades while their neighbors have stagnated.
But geographic patterns are only the beginning. Descriptive analysis also reveals demographic disparities:
- Age: Adolescent girls (under 20) and women over 35 face higher risks.
- Wealth: Within countries, women in the poorest quintile may be three to five times more likely to die in childbirth than women in the wealthiest quintile, according to DHS data.
- Rural vs. urban: Women in rural areas are at higher risk, largely because of distance from emergency obstetric care.
- Education: Women with more years of education tend to have better maternal outcomes — a correlation that likely reflects both direct effects (health literacy, ability to recognize danger signs) and indirect effects (education correlates with income, autonomy, and access to services).
- Ethnicity and marginalization: In many countries, indigenous women, ethnic minorities, and refugees face substantially higher maternal mortality rates than national averages.
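Disparity patterns like the wealth gradient above come from simple grouped summaries. A minimal sketch with invented counts (real DHS analyses involve survey weights, sampling design, and far more care):

```python
# Hypothetical counts per wealth quintile: (maternal deaths, live births).
by_quintile = {
    "poorest": (45, 10_000),
    "richest": (9, 10_000),
}

for quintile, (deaths, births) in by_quintile.items():
    mmr = deaths / births * 100_000   # deaths per 100,000 live births
    print(f"{quintile}: {mmr:.0f} per 100,000")

# Ratio between the extremes: the "three to five times" pattern.
ratio = (45 / 10_000) / (9 / 10_000)
print(f"Poorest-to-richest risk ratio: {ratio:.0f}x")  # 5
```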
Each of these patterns tells a story. And each suggests a different point of intervention. Descriptive analysis doesn't tell you what to do, but it tells you where to look — which is where all good analysis starts.
Predictive: Can We See Crises Before They Happen?
Predictive analysis asks: Based on what we know, what is likely to happen next?
In maternal health, predictive approaches take several forms:
At the individual level: Can we identify women who are at high risk for complications before those complications occur? Researchers have developed risk-scoring tools that combine factors like age, obstetric history, blood pressure, nutritional status, and distance from a health facility to flag high-risk pregnancies. These aren't perfect — many complications arise in women with no identifiable risk factors — but they can help target scarce resources. If a community health worker knows that a particular pregnant woman has a high risk score, she can prioritize follow-up visits and help arrange a delivery plan that includes transportation to a facility.
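A toy version of such a risk-scoring tool, combining the same kinds of factors. The thresholds and weights are invented for illustration; real tools are clinically derived and validated:

```python
def risk_score(age, prior_complications, systolic_bp, km_to_facility):
    """Toy additive risk score for illustration only, not a clinical tool."""
    score = 0
    if age < 20 or age > 35:
        score += 2          # elevated risk at both age extremes
    if prior_complications:
        score += 3          # obstetric history is a strong signal
    if systolic_bp >= 140:
        score += 3          # possible pre-eclampsia warning sign
    if km_to_facility > 25:
        score += 2          # distance delays emergency care
    return score

# A hypothetical high-risk pregnancy: flag for priority follow-up.
score = risk_score(age=17, prior_complications=True,
                   systolic_bp=145, km_to_facility=40)
print(score, "-> priority follow-up" if score >= 5 else "-> routine care")
```

Even this caricature shows the trade-off in the text: a woman with no flagged factors scores zero, yet may still develop complications.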
At the population level: Can we identify regions or health systems that are at risk of worsening outcomes? Researchers have built models that use economic indicators, health system capacity data, conflict and displacement data, and disease surveillance to flag countries where maternal mortality is likely to increase. These early-warning models could, in theory, allow international agencies to direct resources before a crisis reaches its peak, rather than after.
At the systems level: Can we predict bottlenecks in the health system? For instance, if a district hospital has 200 deliveries per month and one functioning operating theater, can we estimate when the rate of emergency cesarean sections will exceed the hospital's capacity? This kind of operational forecasting is data science applied to health system planning.
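The capacity question in that example is a back-of-the-envelope forecast. A sketch with invented growth and cesarean rates:

```python
# Hypothetical district hospital: delivery volume growing, surgical
# capacity fixed. All rates are invented for illustration.
deliveries_per_month = 200
monthly_growth = 0.02            # 2% monthly growth in deliveries
cesarean_rate = 0.10             # fraction of deliveries needing surgery
theater_capacity = 30            # cesareans one theater can handle monthly

month = 0
while deliveries_per_month * cesarean_rate <= theater_capacity:
    month += 1
    deliveries_per_month *= 1 + monthly_growth

print(f"Capacity exceeded in about {month} months")  # 21
```

Crude as it is, a forecast like this turns "we will run out of theater time eventually" into "plan for a second theater within two years."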
Each of these predictions is uncertain. Predictive models in global health are far less precise than, say, weather forecasts or movie recommendations. But in settings where any advance warning can save lives, even imperfect predictions have value.
Causal: What Actually Reduces Maternal Mortality?
Causal analysis asks the hardest question: If we change X, will Y improve?
Correlation is easier to find than causation. We know that countries with more skilled birth attendants have lower maternal mortality. But does training and deploying more birth attendants cause maternal mortality to decline? Or do countries that invest in birth attendants also invest in roads, hospitals, blood banks, education, and clean water — and is it the whole package that matters, not any single intervention?
This distinction is not academic. Governments and international agencies have limited budgets. If they invest heavily in training midwives but the real bottleneck is transportation to emergency facilities, the investment won't save as many lives as expected. If they build hospitals but can't staff them, the hospitals won't help.
The gold standard for establishing causation is the randomized controlled trial (RCT), but RCTs for maternal mortality interventions face deep ethical and practical challenges. You can't randomly assign some communities to receive emergency obstetric care and deny it to others. You can't randomly assign some women to give birth with a skilled attendant and force others to deliver alone.
Researchers use several alternative approaches:
- Natural experiments: When a policy change or program rollout happens in some regions but not others (often for administrative rather than experimental reasons), researchers can compare outcomes before and after, between covered and uncovered areas. Studies of this type have provided evidence that access to emergency obstetric care and skilled birth attendance reduce maternal mortality.
- Interrupted time series: If a country implements a new policy (such as free maternal health services), researchers can look at the trend in maternal mortality before and after the policy and see whether the trajectory changed at the point of implementation.
- Quasi-experimental methods: Techniques like difference-in-differences, instrumental variables, and regression discontinuity designs allow researchers to estimate causal effects from observational data under certain assumptions. These methods are imperfect, but they're often the best available option.
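The interrupted-time-series idea can be sketched as a before/after comparison of trends. All numbers below are invented, and the mean year-on-year change is a crude stand-in for a properly fitted slope:

```python
# Hypothetical annual MMR for one country; a free-maternal-care policy
# takes effect after the sixth observation (index 5).
mmr = [610, 600, 592, 585, 578, 571,   # pre-policy: slow decline
       540, 512, 486, 461, 438]        # post-policy: steeper decline

def avg_annual_change(series):
    """Mean year-on-year change: a crude stand-in for a fitted slope."""
    return sum(b - a for a, b in zip(series, series[1:])) / (len(series) - 1)

pre, post = mmr[:6], mmr[5:]
print(f"Pre-policy trend:  {avg_annual_change(pre):+.1f} per year")
print(f"Post-policy trend: {avg_annual_change(post):+.1f} per year")
# A sharper post-policy decline is consistent with an effect, but it is
# not proof: something else may have changed in the same year.
```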
The bottom line is that identifying what works in reducing maternal mortality is genuinely difficult. It requires not just data, but careful reasoning about what the data can and cannot tell us. This is a theme that will run through the entire statistical thinking section of this book (Part IV) and come to a head in Chapter 24, "Correlation, Causation, and the Danger of Confusing the Two."
Ethical Dimensions
The data science of maternal mortality is inseparable from its ethics. Every methodological choice carries moral weight.
Who Is Missing from the Data?
Remember the data challenge: in the countries with the highest maternal mortality, data systems are the weakest. This means the women who are most at risk are also the women least likely to be counted when they die.
Within countries, the same pattern holds. Women in remote rural areas, women in conflict zones, women from marginalized ethnic groups, women in extreme poverty — these are the women most likely to die in childbirth, and the women least likely to appear in any database. When researchers build models using available data, they are building models that systematically underrepresent the most vulnerable populations.
This has consequences. If a model trained on hospital data suggests that postpartum hemorrhage is the leading cause of maternal death, but the women dying at home from obstructed labor never reach a hospital, the model will point resources toward hemorrhage management in hospitals and away from transportation and access for home deliveries. The data doesn't just reflect the world — it shapes the interventions, which shape the world in return.
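This selection effect is easy to simulate with a toy population (every record below is invented):

```python
# Toy population of maternal deaths: cause, and where the death occurred.
deaths = [
    ("hemorrhage", "hospital"), ("hemorrhage", "hospital"),
    ("hemorrhage", "home"),
    ("obstructed labor", "home"), ("obstructed labor", "home"),
    ("obstructed labor", "home"), ("obstructed labor", "home"),
]

def leading_cause(records):
    """Return the most frequent cause among the given death records."""
    counts = {}
    for cause, _ in records:
        counts[cause] = counts.get(cause, 0) + 1
    return max(counts, key=counts.get)

# Restricting the analysis to hospital records drops the home deaths
# entirely, and the apparent leading cause changes.
hospital_only = [d for d in deaths if d[1] == "hospital"]
print("Hospital data says:", leading_cause(hospital_only))  # hemorrhage
print("All deaths say:    ", leading_cause(deaths))         # obstructed labor
```

The model is not wrong about the data it sees; it is wrong about the world, because the data was never a sample of the world.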
How Do Measurement Choices Affect Policy?
The way you define and measure something changes what you see. "Maternal mortality" has a specific medical definition: the death of a woman while pregnant or within 42 days of the end of pregnancy, from any cause related to or aggravated by the pregnancy or its management, but not from accidental or incidental causes. Even so, this definition excludes women who die from pregnancy-related causes after 42 days (late maternal deaths), women who die from suicide linked to postpartum depression, and women who survive but suffer severe complications (the concept of "maternal near-miss" or severe maternal morbidity).
Depending on which definition a country uses and how rigorously it's applied, the same set of events can produce very different numbers. A country that counts only deaths within health facilities will report a lower number than one that includes community deaths. A country that classifies a post-cesarean infection death as "infection" rather than "maternal death" will undercount maternal mortality. These are not abstractions — they are the choices that determine what appears in global reports, which countries are flagged as being in crisis, and where funding flows.
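The effect of these counting rules can be made concrete. A sketch with invented counts, comparing a facility-only rule against one that includes community deaths:

```python
# Hypothetical deaths in one district in one year, with attributes that
# different counting rules treat differently.
deaths = [
    {"place": "facility", "days_after_pregnancy": 2},
    {"place": "facility", "days_after_pregnancy": 10},
    {"place": "home",     "days_after_pregnancy": 1},
    {"place": "home",     "days_after_pregnancy": 5},
    {"place": "home",     "days_after_pregnancy": 60},  # a "late" death
]
live_births = 8_000

def mmr(count):
    """Maternal mortality ratio: deaths per 100,000 live births."""
    return count / live_births * 100_000

# Rule A: facility deaths only, within the standard 42-day window.
rule_a = sum(1 for d in deaths
             if d["place"] == "facility" and d["days_after_pregnancy"] <= 42)
# Rule B: all deaths within 42 days, wherever they occurred.
rule_b = sum(1 for d in deaths if d["days_after_pregnancy"] <= 42)

print(f"Facility-only MMR: {mmr(rule_a):.0f} per 100,000")  # 25
print(f"All-deaths MMR:    {mmr(rule_b):.0f} per 100,000")  # 50
```

Same district, same year, same deaths: the reported ratio doubles depending on the counting rule.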
The Danger of "Data Colonialism"
A difficult but important concept: who decides what questions get asked?
Much of the data collection and analysis in global maternal health is funded by institutions in high-income countries — the WHO (headquartered in Geneva), the World Bank (Washington, D.C.), and bilateral aid agencies from Europe and North America. The surveys are designed by international researchers, the statistical models are built by global teams, and the results are published in journals read primarily by international audiences.
This creates a dynamic that some scholars have called "data colonialism" — a pattern where data about communities in the Global South is collected, analyzed, and interpreted by institutions in the Global North, with the communities themselves having limited voice in what questions are asked, how data is gathered, or how results are used.
Consider: when an international research team designs a verbal autopsy questionnaire, they embed assumptions about how symptoms should be categorized and what counts as a "cause" of death. When a statistical model uses GDP per capita as a predictor, it encodes a particular view of what drives health outcomes. When a report publishes national-level estimates, it may obscure the subnational inequalities that matter most to people on the ground.
None of this means the research is worthless or ill-intentioned. The MMEIG's estimates have been instrumental in galvanizing global action and directing resources. But data science at its best is reflexive — it asks not only "What does the data show?" but also "Whose perspective does this data represent?" and "Who benefits from the way we've framed the question?"
🚪 Threshold Concept: Data is not neutral. Every dataset reflects choices about what to measure, how to measure it, and who gets measured. These choices have consequences — especially when the data is used to make decisions about vulnerable populations. This idea may seem abstract now, but it will become concrete as you work with real datasets throughout this course. Chapter 32 (Ethics in Data Science) explores these themes in depth.
The Human Story
It's easy, in a discussion about estimation methods and statistical models, to lose sight of what the numbers represent.
Behind the number 287,000 are individual women: a 19-year-old in rural Chad who bled to death after giving birth because the nearest health facility was 80 kilometers away on an unpaved road. A 34-year-old in Afghanistan who developed eclampsia and couldn't get the magnesium sulfate that would have saved her life. A 28-year-old in India who survived a complicated delivery but suffered a fistula injury that left her incontinent, shamed, and isolated from her community.
Data science cannot replace the midwife, the surgeon, or the ambulance. What it can do is help us understand where these resources are needed most, whether they're reaching the people who need them, and whether our efforts are making a difference. The numbers are not the point. The people are the point. The numbers are how we hold ourselves accountable to the people.
If that sounds like a heavy burden for a field that's often associated with tech companies and Silicon Valley, good. Data science is powerful precisely because it can influence how resources are allocated, how policies are designed, and whose needs are prioritized. That power comes with responsibility.
Connections to the Course
This case study previewed several concepts and skills that you'll develop throughout this book:
| Concept | Where You'll Learn It |
|---|---|
| Structured vs. unstructured data | Chapters 7 (pandas) and 10 (Text Data) |
| Missing data and its consequences | Chapter 8 (Cleaning Messy Data) |
| Descriptive analysis and summary statistics | Chapter 19 (Descriptive Statistics) |
| Uncertainty ranges and confidence intervals | Chapter 22 (Sampling and Estimation) |
| The difference between correlation and causation | Chapter 24 (Correlation and Causation) |
| Predictive modeling | Chapters 25-30 (Part V: First Models) |
| Communicating data to decision-makers | Chapter 31 (Communicating Results) |
| Ethical reasoning about data | Chapter 32 (Ethics in Data Science) |
You don't need to understand any of these topics now. But when you reach Chapter 8 and read about how to handle missing values in a pandas DataFrame, remember the missing maternal deaths — the women who never appeared in any data system. The technical skill of handling missing data is connected to the human reality that missing data often means missing people.
Discussion Questions
1. The measurement paradox. The countries with the highest maternal mortality are the countries with the least reliable data about maternal mortality. What are the consequences of this paradox for global health policy? If you were advising a government on where to invest limited resources, would you prioritize improving data systems or directly providing health services — and can you do one without the other?
2. Uncertainty as honesty. The MMEIG reports its estimates with wide uncertainty intervals — for some countries, the range spans from hundreds to over a thousand deaths per 100,000 live births. Some critics say these intervals are so wide as to be useless for planning. Others say that reporting a precise number when the true value is so uncertain would be dishonest and misleading. Where do you stand? Is it better to report an uncertain truth or a precise estimate that might be wrong?
3. Who is missing? Think about a dataset you've encountered in everyday life — a customer satisfaction survey, a product rating system, a school ranking. Who might be systematically missing from that data? How might their absence affect the conclusions drawn from it?
4. The causation challenge. We know that countries with more skilled birth attendants have lower maternal mortality. But proving that training more birth attendants causes mortality to decline is much harder. Why is this distinction important? Can you think of a policy that was adopted because of a correlation that turned out not to be causal?
5. Data colonialism. The concept of "data colonialism" suggests that when data about one community is collected, analyzed, and interpreted by outsiders, the resulting analysis may not serve the community's actual needs. Do you find this argument compelling? Can you think of examples closer to your own experience where data was collected about a group of people but not by or for them?
6. Numbers and people. This case study emphasizes that "behind every data point is a person." How should this awareness affect the way you work with data? Is there a risk that thinking too much about the human stories behind the data could actually make analysis worse (for example, by making you reluctant to draw conclusions)? How do you balance rigor and empathy?
Research Extension: Explore Maternal Mortality Data for Your Country
This exercise asks you to do something you'll do repeatedly throughout this course: find real data and spend time understanding it before analyzing it.
Step 1: Find the data. Search for your country's maternal mortality ratio on the WHO's Global Health Observatory (GHO) data portal or the World Bank Open Data site. Both are freely accessible. Look for the most recent estimate and the trend over the past 20 years.
Step 2: Read the fine print. What is the source of the estimate? Is it from civil registration data, a survey, or a model? What is the uncertainty interval? If your country is high-income with reliable vital registration, the interval will be narrow. If your country has weaker data systems, the interval may be wide.
Step 3: Compare. Find the maternal mortality ratio for two other countries — one with a much higher ratio than yours, and one with a much lower ratio (or roughly similar, if yours is already very low). What differences between these countries might help explain the gap? Think about health system factors, economic factors, geographic factors, and social factors.
Step 4: Ask a question. Based on what you've found, write down one specific question that you would want a data scientist to investigate. For example: "Why did Country X's maternal mortality ratio decline so sharply between 2005 and 2015, while Country Y's remained flat?" or "What explains the gap in maternal outcomes between urban and rural areas in my country?"
Step 5: Reflect. What was easy about finding and understanding this data? What was hard? Did anything surprise you? Write 2-3 sentences about the experience.
📝 Note: You do not need any coding skills for this exercise. A web browser and your curiosity are sufficient. If you find the WHO or World Bank data portals confusing to navigate, that's useful information — part of data science is learning to find your way through unfamiliar data sources, and it gets easier with practice. In Chapter 12, you'll learn to download and work with data like this programmatically.