Chapter 24: Epidemiological Surveillance — Tracking Disease and Population

In This Chapter

Opening: The Contact Tracer's Call
24.1 What Is Epidemiological Surveillance?
24.2 John Snow and the Origin Story of Epidemiological Surveillance
24.3 The Architecture of Modern Disease Surveillance
24.4 Vital Statistics: The Surveillance of Birth and Death
24.5 Mandatory Reporting: The Legal Infrastructure of Epidemiological Surveillance
24.6 COVID-19 and the Expansion of Epidemiological Surveillance
24.7 Biobanks and Genetic Surveillance
24.8 Hartwell University Health Center as a Micro-Surveillance System
24.9 The Tension: Public Health Necessity vs. Individual Privacy
24.10 Toward Trustworthy Epidemiological Surveillance
24.11 Summary
Key Terms


Opening: The Contact Tracer's Call

It is March 2021. Jordan Ellis has been back in their dorm for two weeks after returning to Hartwell University for the spring semester. The pandemic is ongoing; Hartwell has a testing program and a series of protocols, most of which Jordan has followed diligently.

The call comes on a Tuesday. A contact tracer — working for the county health department under contract to Hartwell — explains that Jordan has been identified as a close contact of someone who tested positive for COVID-19. The tracer asks a series of questions: When did Jordan last see this person? Where were they together, and for how long? Were they masked? Have they experienced any symptoms? Have they been tested recently?

Jordan answers honestly, though the conversation is uncomfortable. Jordan knows who the positive case is — the information is inferrable from the combination of questions and timeline, even though the contact tracer does not name the person explicitly.

After the call, Jordan thinks about the data flow that made this call possible. The positive case tested at Hartwell's health center. The health center reported the positive result to the county health department under mandatory reporting requirements. The county health department assigned a contact tracer who called Jordan — which means the positive case had provided Jordan's name and contact information during their own contact tracing interview. Jordan's name was in someone else's disease disclosure.

Jordan also thinks about what they disclosed in the call: a timeline of their own movements and social contacts over the previous ten days. The contact tracer now has a partial map of Jordan's social network, their locations during a specific period, and their health status as Jordan reported it. That information is in a public health database.

And then Jordan remembers that they were at a protest organized by Yara last Saturday — a gathering that was technically permitted but that had generated some administrative scrutiny. Jordan's response to one of the tracer's questions had included a reference to "being outside with some friends" on Saturday. The distinction between a protest and "being outside with some friends" felt suddenly important.

Jordan did not lie. But they also did not fully describe what that outdoor gathering was. The line between public health disclosure and political disclosure felt, in that moment, uncomfortably thin.

24.1 What Is Epidemiological Surveillance?

Epidemiological surveillance is the continuous, systematic collection, analysis, and interpretation of health-related data for the purpose of monitoring population health trends and informing public health action. The term "epidemiology" derives from the Greek epi (upon) + demos (people) + logos (study) — it is literally the study of what falls upon people.

Unlike clinical medicine — which focuses on the individual patient — epidemiology is inherently population-level. Its subjects are not patients but populations. Its data are not medical records but patterns. Its concern is not "what is wrong with this person" but "what is happening to these people."

This population-level orientation is what makes epidemiological surveillance different from other forms of medical data collection — and what creates its distinctive relationship to privacy and civil liberties. Epidemiological surveillance is not primarily interested in any individual; it is interested in the collective. But it must collect data from individuals to understand the collective.

💡 Intuition: The Epidemiological Paradox

Epidemiology faces a paradox: it needs individual data to understand population patterns, but its purpose is population understanding rather than individual surveillance. A single case of cholera is a tragedy and a clinical problem. Ten cases of cholera in the same neighborhood is an epidemiological signal that may reveal a contaminated water source. One hundred cases across multiple neighborhoods is an emergency. You cannot see the pattern without the individual data points — but the individual data points are not the point; the pattern is.

This paradox generates the central tension of epidemiological surveillance: a legitimate population-level purpose is served by individual-level data, and that data has personal implications for people who never specifically consented to being surveilled.

The Key Functions of Epidemiological Surveillance

Public health surveillance serves five core functions that justify its existence and scope:

Detection: Identifying new health threats — novel pathogens, emerging syndromes, unusual clusters of illness
Monitoring: Tracking the prevalence and distribution of known conditions over time
Assessment: Evaluating the effectiveness of public health interventions
Prediction: Modeling disease trajectories to guide resource allocation
Accountability: Documenting health outcomes to evaluate whether programs and policies are working

Each of these functions requires data — from clinical reports, laboratory tests, vital statistics, behavioral surveys, and increasingly from environmental monitoring systems and commercial data brokers.

24.2 John Snow and the Origin Story of Epidemiological Surveillance

The canonical origin story of modern epidemiology is the cholera investigation by British physician John Snow in the Soho neighborhood of London in 1854.

Cholera was then poorly understood. The dominant theory — the miasma theory — held that disease spread through "bad air" from decaying organic matter. Snow was skeptical and proposed an alternative: disease spread through contaminated water.

In August 1854, a severe cholera outbreak erupted in the Soho neighborhood, killing hundreds within days. Snow proceeded systematically. He interviewed residents to determine who was sick and who was not. He plotted cases on a street map of the neighborhood — creating one of the first examples of disease mapping as a surveillance tool. The pattern was unmistakable: cases clustered around a specific water pump on Broad Street.

Snow's analysis went further. He investigated the exceptions — households near the Broad Street pump where people were not sick, and households far away where people were sick. The pattern held: people who used the Broad Street pump got sick; people who avoided it did not, even if they lived nearby. Workers at a brewery near the pump did not get sick because they drank beer rather than water. Residents of a workhouse near the pump did not get sick because the workhouse had its own water supply.

Snow persuaded local authorities to remove the handle from the Broad Street pump, ending access to it. The outbreak was already declining, but the pump removal became the iconic moment of the story.

What Snow's Investigation Illustrates:

Snow's methodology is recognizable as modern epidemiological surveillance: systematic data collection from identified individuals (who is sick, where do they live, what do they drink), spatial analysis (the map), investigation of exceptions to test the hypothesis, and action based on the findings (pump handle removal).

But it also illustrates something else: the data Snow collected was personal. He knew which households had sick members. He knew where people obtained their water — a private domestic decision. He had, in effect, conducted a health and behavior survey of the neighborhood without formal consent processes. The investigation was legitimate and its findings saved lives. The absence of consent was not problematic in its context. But the structure — individual data collected for population analysis, personal behavior surveilled to understand collective patterns — is the structure of all epidemiological surveillance.

📊 Real-World Application: Disease Mapping Today

John Snow's street map of cholera cases is the direct ancestor of today's disease mapping systems. The CDC's BioSense Platform provides near-real-time geographic mapping of emergency department chief complaints across thousands of hospitals, enabling detection of unusual disease clusters. Google Flu Trends, launched in 2008, attempted to estimate flu activity from search query patterns. It missed the early spread of the 2009 H1N1 pandemic and later dramatically overestimated flu prevalence, most notably in the 2012–2013 season. That record illustrates that behavioral data from commercial platforms is not a reliable substitute for actual clinical surveillance. The map remains the fundamental tool of epidemiology; its digital descendants are faster, finer-grained, and raise more complex privacy concerns than Snow's hand-drawn original.

24.3 The Architecture of Modern Disease Surveillance

The contemporary U.S. epidemiological surveillance system is a layered architecture connecting individual clinical encounters to national databases, operating under a combination of mandatory reporting laws, professional norms, and federal guidance.

The National Notifiable Disease Surveillance System (NNDSS)

The National Notifiable Disease Surveillance System (NNDSS) is the core infrastructure of U.S. disease surveillance. It operates as follows:

Level 1: Clinical Encounter

A patient presents to a healthcare provider (physician, hospital, clinic) with symptoms. The provider orders laboratory tests, and a laboratory receives the specimen.

Level 2: Laboratory and Clinical Reporting

The laboratory may identify a notifiable disease. This is one of the approximately 120 nationally notifiable diseases and conditions recommended by the Council of State and Territorial Epidemiologists (CSTE) in collaboration with CDC and made reportable under state law. When it does, the laboratory is required to report the case to the state health department. The reporting physician may also have an independent obligation.

What information is reported varies by disease and state but typically includes:

- The condition diagnosed
- Date of onset
- Patient demographics (age, sex, race/ethnicity, geographic area)
- Relevant clinical and laboratory information
- Exposure history (for some conditions)

Note: Patient name and address are typically reported to the state health department but stripped from the data before transmission to CDC national databases.

Level 3: State Health Department

State epidemiologists aggregate case reports, conduct follow-up investigations for priority conditions, report to the CDC, and respond to outbreaks.

Level 4: CDC and NNDSS

The CDC aggregates state data into the national NNDSS database, which tracks trends over time, detects unusual clusters, and produces the published surveillance reports that inform public health policy.

Process Diagram: NNDSS Data Flow

Patient presents with illness
        ↓
Healthcare provider diagnoses and tests
        ↓
Laboratory identifies notifiable condition
        ↓
Lab AND/OR Provider reports to State Health Department
  (State-specific reporting form: demographics, clinical data)
        ↓
State Health Department:
  Records case
  Investigates if priority condition
  May conduct contact tracing
  Reports to CDC (de-identified or aggregate)
        ↓
CDC National NNDSS Database:
  Aggregates all state data
  Monitors trends
  Detects unusual clusters
  Publishes surveillance reports
  Shares data with WHO for international reporting
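The reporting chain above can be sketched as a small data model. This is a hypothetical illustration, not the NNDSS message format; real reports travel as standardized electronic messages. The notifiable-condition set `NOTIFIABLE`, the `CaseReport` fields, and the functions `lab_should_report` and `forward_to_cdc` are all made up for this example. The sketch shows the two steps the text describes. A laboratory reports only conditions on the state's list. The state then removes name and address before forwarding a record to CDC.

```python
from dataclasses import dataclass, asdict
from typing import Optional

# Hypothetical, abbreviated list; real notifiable-condition lists are set by each state.
NOTIFIABLE = {"tuberculosis", "measles", "covid-19", "hiv", "syphilis"}

# Fields the state keeps for follow-up but drops before national transmission.
DIRECT_IDENTIFIERS = ("name", "address")


@dataclass
class CaseReport:
    """Hypothetical case report as received by a state health department."""
    name: str
    address: str
    condition: str
    onset_date: str          # ISO date, e.g. "2021-03-02"
    age: int
    sex: str
    county: str              # coarse geographic area
    lab_result: str
    exposure_history: Optional[str] = None


def lab_should_report(condition: str) -> bool:
    """A lab reports to the state only if the condition is on the notifiable list."""
    return condition.lower() in NOTIFIABLE


def forward_to_cdc(report: CaseReport) -> dict:
    """Copy the state's record and remove direct identifiers before forwarding."""
    record = asdict(report)  # a new dict; the state's own record is unchanged
    for field in DIRECT_IDENTIFIERS:
        record.pop(field)
    return record


report = CaseReport(
    name="Test Patient", address="1 Example Street", condition="Measles",
    onset_date="2021-03-02", age=20, sex="F", county="Example County",
    lab_result="IgM positive",
)
if lab_should_report(report.condition):
    print(forward_to_cdc(report))  # no "name" or "address" keys
```

Dropping name and address is not the same as anonymity. A record that still carries age, sex, county, and onset date can narrow a case to a handful of people in a small community. Which fields are forwarded therefore matters as much as which are removed.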

Syndromic Surveillance

NNDSS tracks confirmed diagnoses. Syndromic surveillance tracks disease-like symptoms before diagnosis is confirmed — attempting to detect outbreaks earlier by monitoring emergency department visits and chief complaints, pharmacy sales of over-the-counter medications, and other proxies for illness.

The CDC's BioSense Platform aggregates emergency department data from thousands of hospitals nationwide, tracking the daily numbers of visits for fever, respiratory complaints, gastrointestinal illness, and other syndromes. By comparing current patterns to historical baselines, the system can detect unusual increases that may indicate emerging outbreaks — sometimes days before confirmed diagnoses are reported through traditional channels.
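The baseline comparison can be illustrated with a simplified detector. This is a hypothetical sketch loosely in the spirit of the moving-baseline methods used in syndromic surveillance, not the BioSense Platform's actual algorithm. It compares one day's count of visits for a syndrome against the mean and standard deviation of a seven-day baseline that ends two days earlier. That gap acts as a "guard band," so an outbreak that is just beginning does not inflate its own baseline. The day is flagged if the count exceeds the baseline mean by more than three standard deviations. The window sizes, threshold, and counts are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def flag_anomalies(daily_counts, baseline_days=7, guard_days=2,
                   threshold=3.0, min_sd=1.0):
    """Return indices of days whose count exceeds baseline mean + threshold * SD.

    For day t the baseline is counts[t - guard_days - baseline_days : t - guard_days],
    so the `guard_days` days just before t are excluded.
    `min_sd` keeps a perfectly flat baseline from producing a zero SD.
    """
    flagged = []
    for t in range(baseline_days + guard_days, len(daily_counts)):
        baseline = daily_counts[t - guard_days - baseline_days : t - guard_days]
        mu = mean(baseline)
        sd = max(stdev(baseline), min_sd)
        if (daily_counts[t] - mu) / sd > threshold:
            flagged.append(t)
    return flagged


# Invented daily emergency department visits for fever in one county.
fever_visits = [20, 22, 19, 21, 23, 20, 22, 21, 19, 22, 20, 45, 60]
print(flag_anomalies(fever_visits))  # → [11, 12]
```

A production system would also adjust for day-of-week patterns, holidays, and changes in how many hospitals are reporting, which this sketch ignores.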

Syndromic surveillance raises additional privacy concerns because it involves monitoring of people who have not (yet) been diagnosed with any specific condition. A person who visits an emergency department for a fever becomes part of a surveillance dataset even if their complaint turns out to be entirely benign.

24.4 Vital Statistics: The Surveillance of Birth and Death

The most fundamental form of epidemiological surveillance predates NNDSS by centuries: vital statistics — the systematic recording of births, deaths, and marriages. Every person who is born, who dies, and in many jurisdictions who marries, is a data point in a government surveillance database.

Birth Registration

When a child is born in the United States (and in virtually every country with modern administrative infrastructure), a birth certificate is completed. The birth certificate records:

- Date, time, and location of birth
- Infant's name, sex, and birth weight
- Gestational age and plurality (twins, etc.)
- Mother's name, age, race/ethnicity, address, and educational attainment
- Father's name and information (if provided)
- Prenatal care received
- Complications of pregnancy and delivery
- Whether the infant received certain medical interventions at birth

This information is used for public health purposes — tracking preterm birth rates, infant mortality, disparities by race and ethnicity — and for administrative purposes (establishing citizenship and identity). It is also a surveillance record that links an individual's existence, from the first moments of life, to government databases.

Death Registration

Death certificates are similarly comprehensive:

- Date, time, and place of death
- Manner of death (natural, accident, suicide, homicide, undetermined)
- Cause of death and contributing conditions
- Decedent's demographics, occupation, and residence
- Whether an autopsy was performed

The cause-of-death information on death certificates is a critical epidemiological data source. Mortality statistics — how many people die from heart disease, cancer, COVID-19, drug overdose — are derived primarily from death certificate data. The accuracy of mortality statistics depends entirely on the accuracy of cause-of-death documentation, which varies substantially by cause (well-documented for common conditions, less reliable for conditions that carry stigma or require specific knowledge to identify).

⚠️ Common Pitfall: Cause of Death as Contested Surveillance Data

Death certificate cause-of-death data is not simply objective fact — it reflects decisions made by certifying physicians, medical examiners, and coroners about how to classify complex clinical situations. Drug overdose deaths, for example, were substantially undercounted for decades because certifying physicians were reluctant to record drug overdose as a cause of death — a stigma effect that meant surveillance data underestimated the opioid epidemic's scale. COVID-19 death counts were similarly contested throughout the pandemic, with some governments accused of undercounting COVID deaths for political reasons. Surveillance data is produced by human decisions; those decisions reflect the same social pressures and biases as any other human activity.

24.5 Mandatory Reporting: The Legal Infrastructure of Epidemiological Surveillance

Epidemiological surveillance requires that people and institutions report health information to government authorities — often without the patient's specific consent. This mandatory reporting is the legal foundation of the surveillance system, and it represents one of the most direct applications of state power to health data collection.

What Is Required to Report, and by Whom

In the United States, mandatory reporting requirements are established by state law. Public health is primarily a state responsibility, grounded in the police powers reserved to the states under the Tenth Amendment. Each state designates a list of notifiable conditions, which varies somewhat from state to state, and requires healthcare providers, laboratories, and other institutions to report cases.

The legal obligation falls primarily on institutions and professionals rather than patients:

- Healthcare providers (physicians, nurse practitioners, etc.)
- Hospitals and clinics
- Clinical laboratories
- Schools (for certain childhood diseases)
- Coroners and medical examiners

Patients do not, in general, have a legal obligation to report their own health conditions. However, patients are affected indirectly. Many states require healthcare providers to report HIV status to public health authorities, and some states require patients to participate in contact tracing for certain conditions.

What Mandatory Reporting Means for Privacy

Mandatory reporting means that certain health information — your HIV status, your tuberculosis diagnosis, your sexually transmitted infection — flows automatically to government health databases, without your specific consent to that transmission. You consented to being treated by your healthcare provider; you did not individually consent to your provider reporting your condition to the health department.

This is not unique to health surveillance. Tax information, financial reports, and building permits involve similar mandatory disclosures. But health information carries particular sensitivity because of its potential for stigma, discrimination, and social harm.

The legal framework that governs this information — primarily HIPAA (Health Insurance Portability and Accountability Act) in the U.S. — explicitly permits disclosure for public health purposes. Public health reporting is a carved-out exception to HIPAA's general privacy protections. The exception is appropriate — epidemiological surveillance requires it — but it is important to understand that HIPAA's famous privacy protections do not apply to the reports that flow from your healthcare provider to the state health department.

📝 Note: The HIV Surveillance Example

The surveillance history of HIV/AIDS is among the most contested in public health. Early in the epidemic, before effective treatment was available, gay rights and HIV advocacy groups strongly opposed name-based reporting to health departments. They feared that lists of HIV-positive individuals in government databases would enable discrimination and persecution, and they advocated for anonymous testing and aggregate reporting. As effective treatment became available and linking diagnosed people to care became a public health priority, the calculus shifted. Name-based reporting enabled health departments to ensure that HIV-positive individuals were connected to care. The HIV debate remains a paradigm case for how surveillance necessity and civil liberties concerns must be continually renegotiated as circumstances change.

24.6 COVID-19 and the Expansion of Epidemiological Surveillance

The COVID-19 pandemic transformed epidemiological surveillance in ways that will persist long after the pandemic's acute phase has ended. Three developments deserve particular attention: contact tracing, mobility data, and wastewater surveillance.

Contact Tracing

Contact tracing is one of the oldest epidemiological techniques — identifying people who have been exposed to an infectious disease and notifying them so they can be tested, treated, or isolated. It was used in smallpox eradication, tuberculosis control, and HIV/AIDS management before COVID-19.

What COVID-19 added was scale and technology. The need to trace contacts for millions of cases simultaneously drove rapid development of digital contact tracing tools.

Exposure Notification Apps

Apple and Google jointly announced a COVID-19 Exposure Notification (EN) system in April 2020 and released it to public health authorities the following month. It was a remarkable collaboration between competing tech giants, made possible by the scale of the public health emergency. The EN system used Bluetooth Low Energy (BLE) signals to detect proximity between smartphones, without using GPS or any location data.

How it worked:

- Each phone periodically broadcasts a random, rotating identifier via Bluetooth.
- Nearby phones record these identifiers and store them locally; they are not transmitted to any server.
- When a person tests positive and registers this with the app, the keys from which their recent identifiers were generated are uploaded to a server.
- Other users' phones download these keys, regenerate the corresponding identifiers, and compare them against locally stored identifiers.
- If a match is found, the person receives a notification that they may have been exposed, without learning who the positive case was or where the exposure occurred.
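The decentralized matching described above can be sketched as follows. This is a simplified, hypothetical model, not the Apple/Google specification. The published protocol derives rolling identifiers from a daily key using HKDF and AES. This sketch substitutes HMAC-SHA256 from Python's standard library and ignores Bluetooth signal strength and exposure duration. The `Phone` class, `rolling_id`, and the ten-minute rotation are illustrative assumptions. The sketch preserves the architecture: identifiers come from secret keys that stay on the phone unless the user tests positive and uploads them, and all matching happens on each phone.

```python
import hashlib
import hmac
import secrets

INTERVALS_PER_DAY = 144  # assumption: identifiers rotate every 10 minutes


def rolling_id(daily_key: bytes, interval: int) -> bytes:
    """Derive a short-lived broadcast identifier from a secret daily key (HMAC stand-in)."""
    return hmac.new(daily_key, interval.to_bytes(4, "little"), hashlib.sha256).digest()[:16]


class Phone:
    def __init__(self):
        self.daily_keys = {}  # day -> secret key, kept on this device
        self.heard = set()    # identifiers received from nearby phones, stored locally

    def key_for(self, day: int) -> bytes:
        if day not in self.daily_keys:
            self.daily_keys[day] = secrets.token_bytes(16)
        return self.daily_keys[day]

    def broadcast(self, day: int, interval: int) -> bytes:
        """The identifier this phone would send over Bluetooth during one interval."""
        return rolling_id(self.key_for(day), day * INTERVALS_PER_DAY + interval)

    def hear(self, identifier: bytes) -> None:
        self.heard.add(identifier)

    def keys_to_upload(self) -> dict:
        """After a positive test, the user may publish daily keys (no contacts, no locations)."""
        return dict(self.daily_keys)

    def check_exposure(self, published_keys: dict) -> bool:
        """Regenerate every identifier from published keys and compare with the local log."""
        for day, key in published_keys.items():
            for interval in range(INTERVALS_PER_DAY):
                if rolling_id(key, day * INTERVALS_PER_DAY + interval) in self.heard:
                    return True
        return False


alice, bob, carol = Phone(), Phone(), Phone()
bob.hear(alice.broadcast(day=1, interval=50))  # Alice and Bob were near each other
published = alice.keys_to_upload()             # Alice tests positive and uploads her keys
print(bob.check_exposure(published))    # True
print(carol.check_exposure(published))  # False
```

The server in this model sees only Alice's daily keys. It never learns that Bob was the one who matched, or where the encounter happened.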

This architecture was specifically designed to avoid creating a centralized database of individual movements or contacts. The privacy protections were genuine: matching happened on each user's phone, so the central server never learned who had been near whom, or where.

The EN architecture stands in contrast to the centralized approaches adopted by some governments. China and South Korea drew on GPS and other location data to build centralized records of individual movements. These records enabled authorities to reconstruct detailed accounts of where individuals had been and who they had been with. Singapore's TraceTogether app used Bluetooth rather than GPS. However, its contact logs were uploaded to the health ministry, which could decrypt them and identify the people involved.

🌍 Global Perspective: Differential COVID Surveillance

South Korea's COVID response used credit card transaction data, cell phone location data, and CCTV footage to reconstruct the movements of COVID-positive individuals in extraordinary detail — publishing these reconstructed routes publicly (initially with identifiable information, later anonymized). South Korea's relatively low early COVID mortality rate was cited as evidence that this intensive surveillance was effective. Critics noted that it was also a dramatic demonstration of what government COVID surveillance could look like when civil liberties constraints were minimized. The comparison between South Korea's GPS-based contact tracing and Apple/Google's decentralized Bluetooth system illustrates how the same public health goal can be pursued through dramatically different surveillance architectures with dramatically different implications for privacy and civil liberties.

Mobility Data in Pandemic Response

In April 2020, Google released COVID-19 Community Mobility Reports: aggregated and anonymized summaries of how population movement changed relative to a pre-pandemic baseline. The reports were derived from location data collected from users who had turned on the Location History setting in their Google accounts. They showed changes in visits to retail and recreation, grocery and pharmacy, parks, transit stations, workplaces, and residential locations for more than 100 countries.

Public health officials used this data to assess whether social distancing policies were changing behavior, to identify communities where mobility remained high despite stay-at-home orders, and to calibrate policy responses. Apple released a similar product — Mobility Trends Reports — showing changes in routing requests for driving, transit, and walking.
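The "relative to baseline" figures in these reports can be illustrated with a small calculation. Google's documentation described the baseline as the median value for the same day of the week over a five-week period in early 2020. The sketch follows that idea with invented visit counts and a hypothetical function name. It omits the anonymization Google applied before release.

```python
from statistics import median


def percent_change_from_baseline(baseline_visits, day_of_week, todays_visits):
    """Percent change of today's visits versus the median for the same weekday.

    baseline_visits maps a weekday name to visit counts observed on that weekday
    during the baseline period (for example, five Mondays).
    """
    base = median(baseline_visits[day_of_week])
    return round(100 * (todays_visits - base) / base)


# Invented counts of visits to transit stations in one region.
baseline = {
    "Monday": [980, 1010, 1000, 995, 1020],
    "Saturday": [600, 640, 620, 610, 630],
}
print(percent_change_from_baseline(baseline, "Monday", 450))    # -55
print(percent_change_from_baseline(baseline, "Saturday", 500))  # -19
```

Comparing each day with the same weekday keeps the ordinary weekend dip in activity from being mistaken for the effect of a stay-at-home order.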

This use of commercial location data for public health purposes raises important questions:

On the positive side: The data provided rapid, large-scale insight into population behavior that no traditional public health survey could have generated. It helped governments understand whether their policies were working within days rather than months.

On the concerns side: The data was not collected for this purpose. Users who had turned on Google's Location History setting consented to a commercial service, not to contributing to a government epidemiological database. The aggregation and anonymization were performed by Google, with no external verification. Governments and researchers using the data had no way to audit the underlying collection methodology.

The use of commercial behavioral data for public health surveillance is an example of the textbook's "consent as fiction" theme applied to epidemiology: the consent obtained (to use Google Maps) is a different consent than the one exercised (contributing to pandemic surveillance). The gap between them is bridged by aggregation and anonymization, but those technical protections are not the same as genuine consent.

Wastewater Surveillance

As detailed in Chapter 23's Case Study 23-2, COVID-19 wastewater surveillance became a major tool for early detection of community transmission. The National Wastewater Surveillance System (NWSS) expanded rapidly and has been extended to monitor influenza, RSV, mpox, and other pathogens.

The wastewater surveillance case illustrates the pandemic's accelerating effect on surveillance expansion: infrastructure that would have taken years to develop and deploy under normal conditions was built in months under emergency authorization. The speed of deployment meant that community consultation, privacy impact assessment, and democratic deliberation about the scope of monitoring were largely bypassed. The emergency justified the speed; the speed foreclosed the deliberation.

24.7 Biobanks and Genetic Surveillance

The most rapidly expanding frontier of epidemiological surveillance is genetic. Biobanks — large collections of biological samples (blood, tissue, saliva) and associated health and behavioral data — have become essential infrastructure for genetic epidemiology.

What Biobanks Are and How They Work

A biobank is a repository of biological samples, typically collected from research participants who consent to broad future use of their samples for research purposes. Major biobanks include:

UK Biobank: 500,000 participants, recruited between 2006 and 2010; biological samples, health records, lifestyle data, imaging, and genetic data
All of Us Research Program (NIH): Targeting 1 million diverse U.S. participants; comprehensive health data including genetics, wearable device data, and electronic health records
Million Veteran Program (VA): Veterans' health and genetic data; one of the largest genomic databases in the world

Biobank participants typically consent to having their samples used for broad future research — research that was not specified at the time of consent and that will be conducted by researchers they will never know. This "broad consent" model was developed because it is impossible to specify in advance all the research that will be done with biobank samples; requiring specific consent for each study would make biobanks impractical.

But broad consent also means that biobank participants have consented to surveillance without knowing what they have consented to — exactly the pattern the textbook calls "consent as fiction."

The Re-Identification Problem

Genetic data is uniquely re-identifiable. Even if your name is stripped from a genetic database entry, your genome is uniquely yours — it can in principle be linked to you if any other genetic data linked to your identity exists (such as a direct-to-consumer ancestry test, a convicted offender DNA sample, or a relative's genetic data in any database).

Researchers have demonstrated that even aggregate genetic statistics — the kind published in academic papers — can be used to determine whether a specific individual's genome is in the dataset. This means that de-identification of genetic data does not provide the privacy protection it provides for other types of health data.
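The membership test can be illustrated with a toy simulation. The sketch loosely follows the distance statistic used in Homer and colleagues' 2008 study. For each genetic marker (SNP), it asks whether a person's genotype sits closer to the study's published allele frequency than to the reference population's frequency. It assumes idealized, independent SNPs and a reference population whose allele frequencies are known exactly. The function names and every number are hypothetical, and real attacks and defenses are more involved.

```python
import random


def genotype(p, rng):
    """One person's genotype at a SNP, coded as allele frequency 0, 0.5, or 1."""
    return (int(rng.random() < p) + int(rng.random() < p)) / 2


def membership_score(person, pool_freqs, ref_freqs):
    """Average of |Y - Pop| - |Y - Mix| over SNPs.

    Positive values mean the person's genotypes sit closer to the study's
    published frequencies (Mix) than to the reference population's (Pop).
    """
    total = 0.0
    for y, mix, pop in zip(person, pool_freqs, ref_freqs):
        total += abs(y - pop) - abs(y - mix)
    return total / len(person)


rng = random.Random(42)
n_snps, n_members = 10_000, 40
# Assumption: the reference population's allele frequencies are known exactly.
ref_freqs = [rng.uniform(0.1, 0.9) for _ in range(n_snps)]
members = [[genotype(p, rng) for p in ref_freqs] for _ in range(n_members)]
# The only thing "published": each SNP's allele frequency across the study group.
pool_freqs = [sum(column) / n_members for column in zip(*members)]
outsider = [genotype(p, rng) for p in ref_freqs]

print(membership_score(members[0], pool_freqs, ref_freqs))  # expected clearly above 0
print(membership_score(outsider, pool_freqs, ref_freqs))    # expected near 0
```

With these settings the member's score should come out clearly positive and the outsider's near zero. With far fewer SNPs, or a much larger study group, the gap shrinks. This is why the risk grows with the number of markers for which aggregate statistics are released.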

The implications for biobank participants are significant: genetic information contributed to a research biobank may be linkable to the participant's identity regardless of the de-identification measures applied at collection.

🎓 Advanced: Third-Party Genetic Exposure

One of the most distinctive features of genetic data surveillance is that it extends to people who never participated in any database. Your genome encodes information about your relatives: parents, siblings, children, cousins. When you contribute your genetic data to a biobank or a consumer DNA service, you are simultaneously contributing information about people who made no such choice. Law enforcement has extended the effective reach of offender DNA databases, such as those linked through CODIS (Combined DNA Index System), through familial DNA searching, which some states permit. Investigators look for database records that partially match an unknown sample, which suggests that a relative of the unknown person is in the database. They then narrow the search within that relative's family to identify the suspect. This is genetic surveillance of non-consenting individuals through relatives whose DNA is already on file. No individual governance mechanism can fully address this extension, because the information is inherently shared.
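The partial-match idea can be illustrated with a deliberately crude sketch. It checks, locus by locus, whether two STR profiles share at least one allele. Because a child inherits one allele from each parent at every locus, a parent and child are expected to share at least one allele at every locus, barring mutation. An unrelated person is much less likely to do so when many loci are typed. Real familial searching ranks candidates with likelihood ratios that account for how common each allele is; this sketch does not. The locus names are real CODIS core loci, but the profiles, allele values, and function names are invented.

```python
# Invented STR profiles: locus -> pair of allele repeat numbers.
crime_scene = {
    "D8S1179": (12, 14), "D21S11": (29, 30), "D7S820": (10, 11),
    "CSF1PO": (11, 12), "TH01": (6, 9.3), "vWA": (16, 17),
}
database = {
    "offender_A": {"D8S1179": (14, 15), "D21S11": (28, 30), "D7S820": (8, 10),
                   "CSF1PO": (12, 12), "TH01": (7, 9.3), "vWA": (17, 18)},
    "offender_B": {"D8S1179": (10, 13), "D21S11": (31, 32), "D7S820": (9, 11),
                   "CSF1PO": (10, 10), "TH01": (8, 8), "vWA": (14, 15)},
}


def shared_loci(profile_a, profile_b):
    """Count loci at which the two profiles share at least one allele."""
    return sum(
        1 for locus in profile_a
        if locus in profile_b and set(profile_a[locus]) & set(profile_b[locus])
    )


def familial_candidates(sample, db):
    """Return database entries sharing an allele at every locus typed in the sample."""
    return [name for name, profile in db.items()
            if shared_loci(sample, profile) == len(sample)]


print(familial_candidates(crime_scene, database))  # ['offender_A']
```

The flagged entry is not the suspect. It points investigators toward offender_A's relatives, none of whom gave a sample, which is the surveillance extension the paragraph describes.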

24.8 Hartwell University Health Center as a Micro-Surveillance System

Jordan's encounter with contact tracing is a specific instance of a broader surveillance apparatus that operates at every institution with a health center.

Hartwell University's health center serves as a node in multiple intersecting surveillance systems simultaneously:

Clinical surveillance: When students receive care, their diagnoses are recorded in electronic health records. These records are subject to HIPAA, which permits disclosure to campus administration for certain purposes (though usually not without student consent) and requires disclosure to public health authorities for notifiable conditions.

COVID-era surveillance: During the pandemic, many universities implemented testing programs with reporting obligations. A positive test triggers a mandatory report to the county health department, which triggers contact tracing, which may involve administrative staff who are not healthcare providers.

Title IX surveillance: When a student seeks health care for a sexual assault, universities have complex obligations that vary by institutional policy, Title IX regulations, and state law. Some students are surprised to learn that a healthcare visit generates administrative records with implications beyond clinical care.

Mental health surveillance: University mental health services are subject to HIPAA but also to the "duty to warn" doctrine. The doctrine originated in the California Supreme Court's decision in Tarasoff v. Regents of the University of California (1976) and has since been adopted in varying forms by many states. It can require mental health providers to warn or protect identifiable third parties facing credible threats. Students seeking mental health care may not fully understand the circumstances under which confidentiality can be broken.

Vaccination records: Many universities require vaccination documentation for enrollment. This documentation — maintained by the health center — is a health surveillance record that can be accessed in legal proceedings, disclosed under certain circumstances to parents (for dependent students), and used for administrative decisions about enrollment status.

The health center presents itself as a confidential medical resource. It is also a site of multiple overlapping surveillance systems, each justified by a legitimate purpose, each collecting data that flows beyond the clinical relationship.

✅ Best Practice: Understanding Your Health Rights on Campus

Students have specific rights regarding their health information at university health centers:

- HIPAA privacy rights apply to your medical records held by the health center
- FERPA (Family Educational Rights and Privacy Act) may also apply to certain health records maintained as student records
- You have the right to receive a Notice of Privacy Practices explaining how your information may be used and disclosed
- You can request access to your own records
- You can request restrictions on certain disclosures (though healthcare providers are not required to agree)
- Exceptions exist for public health reporting, imminent safety concerns, and court-ordered disclosures

Read the health center's privacy notice carefully: it will tell you what actually governs your data, not what you assume.

24.9 The Tension: Public Health Necessity vs. Individual Privacy

Epidemiological surveillance is, at its core, a tension between two legitimate values: the public's interest in understanding and controlling disease (which requires collecting individual data), and the individual's interest in privacy and autonomy (which resists surveillance and disclosure). This tension cannot be fully resolved — it can only be managed.

The Arguments for Strong Epidemiological Surveillance

Lives are at stake: Inadequate surveillance allows disease to spread. The AIDS epidemic was worsened by delayed recognition, inadequate data systems, and political resistance to surveillance. The COVID-19 pandemic demonstrated repeatedly that early detection enables earlier response; in a number of communities, wastewater surveillance signaled rising infections days before clinical case counts rose.

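Epidemiology requires population-level data: Population-level phenomena such as disease incidence, risk factors, and the effects of interventions become visible only when data from many individuals are combined; a handful of individual cases cannot reveal them. Establishing the link between smoking and lung cancer required population studies with thousands of participants. Measuring how well vaccines work in the real world (their effectiveness, as distinct from the efficacy measured in clinical trials) requires large-scale surveillance of vaccinated and unvaccinated populations. Epidemiology is inherently a surveillance enterprise.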

Benefits are broadly shared: Unlike some surveillance systems that benefit narrow institutional interests, epidemiological surveillance benefits everyone. Disease control is a public good; the benefits of surveillance extend to the people being surveilled.

The Arguments for Strong Privacy Protection

Surveillance chills health-seeking behavior: Fear that health information will be disclosed to employers, insurance companies, immigration authorities, or law enforcement can deter people from seeking care. This effect has been documented for HIV testing, treatment for sexually transmitted infections, mental health treatment, and care-seeking among immigrants. Surveillance that deters care-seeking is counterproductive for public health.

Data security cannot be guaranteed: Health databases are valuable targets for attackers, and a single breach can expose sensitive information about millions of people. The 2015 Anthem health insurance breach exposed the records of 78.8 million people; the 2021 ransomware attack on Elekta, a supplier of radiation-oncology software, affected cancer treatment centers across the many health systems that relied on its cloud-hosted platform.

Data is repurposed for non-health purposes: There is a documented history of health surveillance data being used for purposes beyond public health — immigration enforcement, law enforcement investigations, insurance discrimination, and employment discrimination. Once health data is collected and stored, its future uses are difficult to control.

Surveillance affects populations unequally: The populations whose health behaviors are most intensively surveilled — communities of color, low-income communities, incarcerated populations — are also the populations most likely to experience discrimination based on health status. Surveillance that falls disproportionately on already-marginalized populations exacerbates existing inequalities.

🔗 Connection: Jordan's Protest and the Political Dimensions of Health Disclosure

Jordan's discomfort during the contact tracing call (the awareness that disclosing "being outside with friends" might also disclose participation in a political protest) illustrates a specific dimension of this tension. Public health surveillance that requires people to disclose their locations and social contacts creates a record of their political associations. Even if the health system never intends to use this information for political purposes, the chilling effect is real: people on a contact tracing call may describe their activities differently depending on whether they trust that the information will stay within the public health system. That chilling effect on political activity is a genuine civil liberties concern, even when the public health purpose is legitimate.

24.10 Toward Trustworthy Epidemiological Surveillance

The tension between public health necessity and individual privacy is not resolved by choosing one side. It is managed through institutional design — the specific rules, constraints, and oversight mechanisms that govern how health surveillance data is collected, used, protected, and eventually deleted.

Key elements of trustworthy epidemiological surveillance:

Minimum necessary collection: Collect only the data required for the specific public health purpose. If aggregate reporting is sufficient for a given condition, don't collect individual-level data. If age and geographic area are sufficient, don't collect name and address.
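To make the principle concrete, here is a minimal Python sketch of minimization applied at the point of reporting. It assumes a hypothetical case record with illustrative field names (`name`, `address`, `condition`, `onset_date`, `age`, `zip_code`) and a hypothetical reporting purpose that needs only the condition, the onset date, a 10-year age band, and a coarse area (the first three ZIP digits). Real reporting requirements are defined by each jurisdiction; the sketch only shows the pattern of copying an allow-list of fields and coarsening what is kept.

```python
from typing import Any

# Hypothetical allow-list: the only fields this (assumed) purpose needs.
REQUIRED_FIELDS = {"condition", "onset_date", "age", "zip_code"}

def age_band(age: int, width: int = 10) -> str:
    """Coarsen an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def minimize(record: dict[str, Any]) -> dict[str, Any]:
    """Copy only allow-listed fields, then coarsen the quasi-identifiers.

    Direct identifiers (name, address) are never copied because they are
    not in REQUIRED_FIELDS.
    """
    kept = {k: v for k, v in record.items() if k in REQUIRED_FIELDS}
    if "age" in kept:
        kept["age_band"] = age_band(kept.pop("age"))
    if "zip_code" in kept:
        # Keep only the 3-digit ZIP prefix as a coarse geographic area.
        kept["area"] = str(kept.pop("zip_code"))[:3]
    return kept

# Hypothetical example record.
case = {
    "name": "A. Student",
    "address": "12 Example Hall",
    "condition": "COVID-19",
    "onset_date": "2021-03-02",
    "age": 20,
    "zip_code": "02138",
}
print(minimize(case))
# {'condition': 'COVID-19', 'onset_date': '2021-03-02', 'age_band': '20-29', 'area': '021'}
```

Because the function copies from an allow-list rather than deleting from a block-list, a field added to the source record later (a phone number, say) is dropped by default instead of leaking into the report.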

Purpose limitation: Data collected for public health purposes should be used for public health purposes, not shared with law enforcement, immigration authorities, or employers. This requires both legal protections and active institutional defense against requests for data that exceed the collection purpose.

Proportional retention: Health surveillance data should be retained for as long as needed for public health purposes and no longer. Historical archives have scientific value, but indefinite retention of individual-level data creates risks that compound over time.
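One way to reconcile the scientific value of archives with proportional retention is to roll expired individual-level records up into aggregate counts and carry only those counts forward. The Python sketch below assumes a hypothetical two-year retention period and records with hypothetical `id`, `condition`, and `reported_on` fields; actual retention periods are set by statute and policy and vary by condition and jurisdiction.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical retention period for individual-level case records.
RETENTION_PERIOD = timedelta(days=730)

def apply_retention(records: list[dict], today: date) -> tuple[list[dict], Counter]:
    """Keep records inside the retention window; reduce older ones to counts."""
    kept: list[dict] = []
    archive: Counter = Counter()
    for rec in records:
        if today - rec["reported_on"] <= RETENTION_PERIOD:
            kept.append(rec)
        else:
            # Only (year, condition) is carried forward; the record itself is not.
            archive[(rec["reported_on"].year, rec["condition"])] += 1
    return kept, archive

records = [
    {"id": 1, "condition": "measles", "reported_on": date(2020, 1, 10)},
    {"id": 2, "condition": "measles", "reported_on": date(2020, 6, 1)},
    {"id": 3, "condition": "pertussis", "reported_on": date(2023, 3, 1)},
]
kept, archive = apply_retention(records, today=date(2023, 6, 1))
print([r["id"] for r in kept])  # [3]
print(archive)                  # Counter({(2020, 'measles'): 2})
```

The archive preserves the historical trend (two measles reports in 2020) without preserving anyone's individual record. Very small counts can still be sensitive in small populations, which is why published aggregates are often subject to small-cell suppression.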

Community engagement: Surveillance programs that involve communities in their design and governance are more likely to be trusted and more likely to be used appropriately. Community advisory boards, public comment processes, and ongoing accountability reporting are institutional mechanisms for this engagement.

Equity monitoring: Surveillance systems should themselves be monitored for equity — whether they collect data disproportionately from marginalized populations, whether their benefits accrue disproportionately to advantaged populations, whether their chilling effects fall disproportionately on those who already distrust institutions.
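Equity monitoring can begin with a simple comparison of each group's share of surveillance records against its share of the population. The Python sketch below computes that representation ratio for hypothetical groups and counts; a ratio above 1 means a group is over-represented in the data relative to its size. It addresses only the first question above (disproportionate collection); the distribution of benefits and of chilling effects requires other kinds of evidence.

```python
def representation_ratios(records_by_group: dict[str, int],
                          population_by_group: dict[str, int]) -> dict[str, float]:
    """Return (group's share of records) / (group's share of population).

    Computed as (r_g * P) / (p_g * R) rather than as a ratio of two
    pre-computed shares. Assumes every group has a positive population.
    """
    total_records = sum(records_by_group.values())
    total_population = sum(population_by_group.values())
    if total_records == 0 or total_population == 0:
        raise ValueError("need at least one record and a nonzero population")
    return {
        group: (records_by_group.get(group, 0) * total_population)
        / (pop * total_records)
        for group, pop in population_by_group.items()
    }

# Hypothetical counts: group_b is 20% of the population but 40% of records.
ratios = representation_ratios(
    records_by_group={"group_a": 300, "group_b": 200},
    population_by_group={"group_a": 800, "group_b": 200},
)
print(ratios)  # {'group_a': 0.75, 'group_b': 2.0}
```

A ratio of 2.0 does not by itself show that collection is unjust (a group may carry a higher disease burden), but it flags where a surveillance program should be able to explain why its data fall so unevenly.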

24.11 Summary

Epidemiological surveillance is the systematic monitoring of population health, a practice often traced to John Snow's cholera investigation in 1854, which has evolved into a comprehensive infrastructure connecting individual clinical encounters to national databases and international reporting systems. It operates through mandatory reporting by healthcare providers and laboratories, vital statistics registration, syndromic surveillance, and increasingly through environmental monitoring (wastewater epidemiology) and commercial behavioral data (mobility data).

The COVID-19 pandemic accelerated the development and deployment of epidemiological surveillance technologies at unprecedented speed, raising governance challenges that institutions have not yet fully addressed. Biobanks and genetic databases extend the surveillance frontier into the genome, creating new forms of re-identification risk and third-party exposure.

The fundamental tension — public health necessity versus individual privacy — cannot be resolved but must be continually managed through institutional design, legal frameworks, and community engagement. Chapter 31 will examine the specific legal frameworks that govern health data in the United States and internationally, including HIPAA, GDPR, and the growing body of state health privacy law.

Key Terms

Epidemiological surveillance: Continuous, systematic collection, analysis, and interpretation of health-related data for monitoring population health trends and informing public health action.

NNDSS (National Notifiable Disease Surveillance System): The primary infrastructure of U.S. disease surveillance; covers approximately 120 nationally notifiable conditions. State laws require healthcare providers and laboratories to report cases to state and local health departments, which then voluntarily share case data with the CDC.

Syndromic surveillance: Monitoring of symptoms and other pre-diagnostic signals, using emergency department visits, pharmacy sales, and other proxies, to detect outbreaks earlier than diagnosis-based reporting systems allow.

Mandatory reporting: Legal requirements that healthcare providers, laboratories, and other institutions report specified conditions to government health authorities, without requiring patient consent for each disclosure.

Contact tracing: The process of identifying and notifying people who have been exposed to an infectious disease case; one of the oldest epidemiological techniques, scaled through digital tools during COVID-19.

Biobank: A large repository of biological samples and associated health and behavioral data collected from research participants for broad future research use.

Vital statistics: Systematically recorded data on births, deaths, and marriages — the foundational dataset of epidemiology and demography.

Wastewater epidemiology: The analysis of sewage for biological and chemical markers to estimate population-level health conditions without individual testing.

Mobility data: Location and movement data derived from smartphones and other devices, used in pandemic response to assess compliance with social distancing policies.