Case Study 1: Target's Pregnancy Prediction — When Data Science Gets Too Good

Background

In 2012, a New York Times Magazine article by Charles Duhigg brought an extraordinary story to public attention. Target, the American retail giant, had developed a predictive analytics system so powerful that it could identify pregnant customers — and estimate their due dates — based solely on their purchasing behavior. The system was designed to help Target capture a share of one of the most lucrative moments in consumer life: the transition to parenthood, when shopping habits are in flux and brand loyalties are open to renegotiation.

The story became one of the most widely discussed examples of predictive analytics in business history. It raised questions that remain profoundly relevant today: What are the boundaries of corporate data use? When does helpful personalization become invasive surveillance? And what happens when a company's analytical capabilities outpace its ethical frameworks?

The Business Problem

Target's marketing team, led by statistician Andrew Pole, understood a fundamental truth about consumer behavior: people are creatures of habit. Most of the time, shoppers buy the same products from the same stores in the same patterns. Disrupting those patterns is extremely difficult — and extremely expensive — through conventional marketing.

But there are moments when habits break down naturally. Getting married. Graduating from college. Moving to a new city. And most dramatically: having a baby. New parents need entirely new categories of products — diapers, cribs, formula, car seats — and their established shopping routines are disrupted by the radical life change. If Target could identify expectant parents early enough, it could offer them relevant products and coupons, establish a relationship before the baby arrived, and potentially capture a customer whose new shopping habits would be worth tens of thousands of dollars over the following years.

The challenge: how do you identify pregnant women before they announce their pregnancies?

The Analytical Approach

Pole's team framed the problem as supervised classification on purchase data. They began with Target's baby shower registry — a dataset of customers who had explicitly identified themselves as expecting. These customers served as the "positive class" in a classification problem.

The team analyzed the purchasing patterns of these known expectant mothers, looking for changes in buying behavior that preceded the registration date. What products did women start buying — or buying more of — when they became pregnant?

The team identified approximately 25 products that, when analyzed together, could predict pregnancy with remarkable confidence. The specific products included:

  • Unscented lotion — Pregnant women in the second trimester often switch to unscented products as their skin becomes more sensitive and their sense of smell heightens.
  • Mineral supplements — Particularly calcium, magnesium, and zinc, which are commonly recommended during pregnancy.
  • Extra-large bags of cotton balls — Often purchased in preparation for baby care.
  • Hand sanitizer — Purchased in increased quantities as parents-to-be become more conscious of hygiene.
  • Washcloths — Larger-than-usual purchases.
  • Dietary shifts — Changes in food purchases consistent with pregnancy-related changes in appetite and nutrition.

No single product was predictive on its own. A woman buying unscented lotion is probably just buying unscented lotion. But the combination of several of these products, purchased within a specific time window, created a signal that was highly predictive.

The model didn't just predict whether someone was pregnant — it estimated how far along she was. By correlating product purchase timing with known due dates from the registry data, the team could assign a "pregnancy prediction score" and an estimated due date. This allowed Target to send precisely timed coupons: prenatal vitamins in the first trimester, maternity clothing in the second, diapers and nursery items in the third.
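The scoring logic described above can be sketched in a few lines. This is a hedged illustration only, not Target's actual system: the product names, weights, and bias are invented for the example, and the real model combined roughly 25 products with weights learned from the registry data rather than set by hand.

```python
import math

# Hypothetical weights for a handful of signal products (illustrative only).
WEIGHTS = {
    "unscented_lotion": 1.4,
    "mineral_supplements": 1.1,
    "cotton_balls_xl": 0.9,
    "hand_sanitizer": 0.6,
    "washcloths": 0.7,
}
BIAS = -3.5  # chosen so no single product trips the classifier on its own

def pregnancy_score(basket: set) -> float:
    """Logistic score in [0, 1] from the combination of signal products."""
    z = BIAS + sum(w for p, w in WEIGHTS.items() if p in basket)
    return 1 / (1 + math.exp(-z))

# One product alone is a weak signal; several together cross a threshold.
single = pregnancy_score({"unscented_lotion"})
combo = pregnancy_score({"unscented_lotion", "mineral_supplements",
                         "cotton_balls_xl", "washcloths"})
print(f"single item: {single:.2f}, combination: {combo:.2f}")
# single item stays low (~0.11); the combination clears 0.5
```

The key design point matches the text: the bias term keeps any individual purchase below the decision threshold, so only the *combination* of products produces a confident prediction.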

The Incident

The story that captured public attention involved a man in Minneapolis who walked into a Target store and demanded to speak with a manager. He was furious. His teenage daughter had received Target mailings with coupons for maternity clothing, nursery furniture, and baby products. "She's still in high school," he told the manager. "Are you trying to encourage her to get pregnant?"

The manager apologized profusely. He called the father a few days later to apologize again. But this time, the father was more subdued.

"I had a talk with my daughter," he said. "It turns out there's been some activities in my house I haven't been completely aware of. She's due in August. I owe you an apology."

Target's algorithm had detected the pregnancy before the girl's own father knew.

The Aftermath and Ethical Dimensions

The story went viral — and not in a way that benefited Target. Public reaction was a mixture of fascination and alarm. The predictive capability was impressive; the use case was unsettling.

Target's response was instructive. Rather than stopping the pregnancy prediction program, they refined its presentation. They discovered that sending only baby-related coupons creeped people out — it made the surveillance too visible. So they began mixing baby product coupons with random, unrelated items — a lawnmower coupon next to a diaper coupon, a wine glass ad next to infant clothing. This made the targeting look like generic marketing rather than personal surveillance. The analytical capability was identical; only the packaging changed.

This decision reveals something important about the relationship between corporate data science and ethics. Target's solution to the public's discomfort was not to reconsider its practices, but to conceal them more effectively.

The Power of Predictive Analytics

The Target case demonstrates several key capabilities of predictive analytics:

Pattern recognition in transaction data. Individual product purchases are noisy signals. But combinations of purchases, analyzed over time, can reveal underlying life circumstances with startling accuracy.

The value of non-obvious features. No one would intuitively connect unscented lotion purchases to pregnancy prediction. The analytical model discovered relationships that domain experts hadn't identified — relationships that emerged from data, not from theory.

Feature engineering matters. The model didn't just use raw purchase data; it used changes in purchasing behavior — deviations from a customer's established baseline. This is a form of feature engineering that dramatically improved predictive power.

Temporal patterns are powerful. The model didn't just predict pregnancy; it estimated timing. The sequential nature of pregnancy-related purchases — different products at different stages — provided rich temporal features.
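The baseline-deviation idea behind the feature engineering point can be illustrated with a toy sketch. The numbers and the single-product focus are invented for the example; the actual system tracked deviations across many product categories.

```python
from statistics import mean

def baseline_lift(history: list, recent: list) -> float:
    """Relative change of recent purchase rate vs. the customer's own baseline.

    history: monthly purchase counts establishing the customer's baseline
    recent:  monthly purchase counts for the most recent window
    """
    baseline = mean(history)
    if baseline == 0:
        return 0.0
    return (mean(recent) - baseline) / baseline

# Hypothetical monthly units of unscented lotion for one customer:
history = [1, 0, 1, 1, 0, 1]   # long-run baseline of about 0.67 units/month
recent = [3, 4, 3]             # sharp jump in the last quarter
lift = baseline_lift(history, recent)
print(f"lift over baseline: {lift:.1f}x")
# → lift over baseline: 4.0x
```

A raw count of "3 lotions this month" means little on its own; a fourfold jump relative to *that customer's* established pattern is exactly the kind of engineered feature the text describes.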

The Ethical Chasm

The Target case also exposes a fundamental gap in how organizations approach data science: the chasm between "Can we?" and "Should we?"

Consent and expectation. Target customers provided their transaction data by shopping at Target — they had no reasonable expectation that it would be used to infer their reproductive status. The model operated entirely within the bounds of data that Target legally possessed, yet its inferences went far beyond what customers would have anticipated or approved.

Information asymmetry. Target knew something intimate about its customers that the customers didn't know Target knew. This information asymmetry created an inherent power imbalance. Customers were making decisions (about which coupons to use, which stores to visit) without understanding the depth of knowledge informing those nudges.

The vulnerability factor. Pregnancy is a private, often sensitive life event. Revealing it — even through marketing materials — could have serious consequences for some individuals: teenagers in conservative households, women in abusive relationships, employees who haven't yet told their managers. The model treated all pregnancies as marketing opportunities without regard for the context in which the information might be received.

Proxy discrimination. Pregnancy is a protected status in many legal contexts. Using purchasing data to infer pregnancy status and then targeting different marketing at those individuals raises questions about discriminatory treatment, even if the intent is commercial rather than malicious.

The "creepiness camouflage" problem. Target's solution — hiding pregnancy coupons among random ones — addressed the optics without addressing the ethics. The practice continued; it just became less visible. This is a pattern that recurs throughout the data science industry: when the public objects to a practice, companies often adjust the presentation rather than the practice itself.

The Broader Implications

The Target case has become a touchstone for discussions of data ethics in business for several reasons:

It's relatable. Unlike abstract discussions of algorithmic bias or data governance, the Target story involves a familiar company, a familiar activity (shopping), and a situation anyone can imagine themselves in. This relatability makes it a powerful teaching tool.

It demonstrates the externalities of data science. The model created value for Target (more effective marketing) and some value for customers (relevant coupons for products they actually needed). But it also created costs: loss of privacy, potential for harm in sensitive situations, and erosion of trust when the practice was revealed. These costs were borne by customers who had no voice in the decision to deploy the model.

It predates current regulation. The events occurred before GDPR, before the California Consumer Privacy Act (CCPA), and before most corporate data ethics frameworks. The lack of regulatory guardrails allowed the program to proceed without formal review. Today, similar programs would face greater scrutiny — but the underlying capabilities have grown far more powerful.

It illustrates the gap between legality and ethics. Everything Target did was legal. Its customers had agreed to terms of service. The data was collected in the ordinary course of business. No laws were broken. Yet the public reaction demonstrated that legality and ethical acceptability are not the same thing — a distinction that every data science practitioner must grapple with.

Connections to Chapter Concepts

This case study illustrates several key concepts from Chapter 2:

  • Predictive analytics (Section 2.6): Target built a classification model predicting pregnancy status, demonstrating the power and the risk of prediction from observational data.
  • Correlation and confounding (Section 2.5): The model found correlations between purchasing patterns and pregnancy, but the causal mechanism (hormonal changes driving product preferences) was not directly measured. The model worked on correlational patterns.
  • The data pipeline (Section 2.10): Transaction data flowed from point-of-sale systems through data warehouses to predictive models to marketing campaign engines — a full pipeline from data generation to business action.
  • The last mile (Section 2.7): Target solved the last-mile problem effectively — their insights directly drove action (targeted coupons). But this case shows that closing the last-mile gap creates its own responsibilities.
  • Structured data and its power (Section 2.2): This case used purely structured data — transaction records — to infer deeply personal information. It demonstrates that the distinction between "innocuous" structured data and "sensitive" information is thinner than most people assume.

Discussion Questions

1. Target's model used purchasing behavior to infer a private health status (pregnancy). Where should companies draw the line on what they infer from customer data? Consider: Is inferring pregnancy different from inferring income level? Political affiliation? Sexual orientation? Health conditions? If so, what principle distinguishes them?

2. Target chose to camouflage its targeting by mixing baby coupons with unrelated offers. Was this an acceptable response to consumer discomfort, or was it a deception that made the underlying problem worse? What would you have recommended instead?

3. If you were building a similar predictive model today, what ethical safeguards would you implement? Consider: opt-in vs. opt-out, transparency about data use, sensitivity categories, third-party review, and the "newspaper test" (would you be comfortable if this practice were reported on the front page of a major newspaper?).

4. The teenager's father was angry — until he discovered the prediction was correct. Does the accuracy of the prediction matter ethically? Would the situation be more or less troubling if the model had been wrong?

5. Apply the CRISP-DM framework to Target's project. At which phase(s) should ethical concerns have been raised? How would a robust Evaluation phase (Phase 5) that included ethical criteria have changed the outcome?

6. Consider this scenario: a health insurance company uses the same type of purchasing-pattern analysis to predict which members are likely to develop expensive chronic conditions, and uses those predictions to adjust premiums or coverage. How does this differ from Target's use case? Is one more ethically acceptable than the other? Why?

7. The chapter discusses the difference between descriptive, diagnostic, predictive, and prescriptive analytics. Target's model was predictive (who is pregnant?) but was embedded in a prescriptive system (what coupon should we send?). How does the combination of prediction and prescription change the ethical calculus compared to prediction alone?


Further Reading

  • Duhigg, C. (2012). "How Companies Learn Your Secrets." The New York Times Magazine, February 16, 2012.
  • Duhigg, C. (2012). The Power of Habit. Random House. (Chapter on Target, pp. 182–212.)
  • Calo, R. (2014). "Digital Market Manipulation." George Washington Law Review, 82(4), 995–1051.
  • Zuboff, S. (2019). The Age of Surveillance Capitalism. PublicAffairs. (Chapters 3–6 for corporate data use.)