Chapter 32 Exercises: Ethics in Data Science: Bias, Privacy, Consent, and Responsible Practice
How to use these exercises: Part A tests conceptual understanding of bias, fairness, and privacy. Part B requires applying ethical frameworks to realistic scenarios. Part C involves evaluating real-world cases. Part D pushes toward synthesis — developing your own ethical reasoning about complex tradeoffs.
Difficulty key: ⭐ Foundational | ⭐⭐ Intermediate | ⭐⭐⭐ Advanced | ⭐⭐⭐⭐ Extension
Note: Many of these exercises do not have single "correct" answers. Ethics involves reasoning through competing values. The guidance provided offers one perspective — yours may differ, and that is fine, provided you reason carefully and consider multiple viewpoints.
Part A: Conceptual Understanding ⭐
Exercise 32.1 — Identifying where bias enters the pipeline ⭐
For each scenario, identify the stage of the data science pipeline where bias enters (problem definition, data collection, feature selection, model training, or deployment):
- A medical AI trained primarily on data from academic hospitals performs poorly on patients from community clinics.
- A company builds a "productivity prediction" model for job applicants, but "productivity" is defined as "hours logged in the office" — disadvantaging remote workers and caregivers.
- A loan approval model includes zip code as a feature, which correlates with race due to residential segregation.
- A content recommendation system that maximizes click-through rate promotes increasingly extreme content because extremity generates clicks.
- A hiring model is tested on overall accuracy but not broken down by demographic groups, hiding the fact that it rejects women at twice the rate of men.
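The last scenario is easy to miss and easy to check. Below is a minimal sketch, using entirely invented numbers, of how a per-group audit exposes what a single aggregate metric hides:

```python
# Hypothetical hiring-model results, constructed so that one aggregate
# metric (accuracy) hides a large subgroup disparity in rejection rates.
records = (
    [("men", 1, 1)] * 50 + [("men", 1, 0)] * 10      # (group, predicted_hire, qualified)
    + [("men", 0, 0)] * 35 + [("men", 0, 1)] * 5
    + [("women", 1, 1)] * 15 + [("women", 1, 0)] * 5
    + [("women", 0, 0)] * 60 + [("women", 0, 1)] * 20
)

# A single aggregate metric: looks acceptable.
accuracy = sum(pred == truth for _, pred, truth in records) / len(records)

# The same predictions broken down by group: women are rejected
# at twice the rate of men.
def rejection_rate(group):
    preds = [pred for g, pred, _ in records if g == group]
    return 1 - sum(preds) / len(preds)

print(f"overall accuracy:      {accuracy:.2f}")          # 0.80
print(f"rejection rate, men:   {rejection_rate('men'):.2f}")    # 0.40
print(f"rejection rate, women: {rejection_rate('women'):.2f}")  # 0.80
```

Nothing here is specific to hiring; the same two-line breakdown applies to any model evaluated only in aggregate.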
Guidance
1. **Data collection.** The training data is not representative of the full patient population. Academic hospital patients differ systematically from community clinic patients in demographics, conditions, and treatment patterns.
2. **Problem definition.** The bias enters before any data is collected — "productivity" is defined in a way that encodes assumptions about what good work looks like.
3. **Feature selection.** Zip code serves as a proxy for race. Even though race is not explicitly included, the model can learn to discriminate through the proxy variable.
4. **Deployment/feedback loop.** The model is deployed with an optimization objective (maximize clicks) that produces harmful outcomes. The feedback loop reinforces the pattern: extreme content gets clicks, which trains the model to recommend more extreme content.
5. **Model training/evaluation.** The model is evaluated using a single aggregate metric that masks subgroup disparities. The bias is in how the model is assessed, not in the data itself.

Exercise 32.2 — Understanding fairness definitions ⭐
A university uses an algorithm to predict which admitted students will graduate within four years. The algorithm is used to allocate scholarship funding. Consider two groups: students from high-income families and students from low-income families.
For each definition of fairness, explain what it would require in this context and identify one potential problem:
- Demographic parity
- Equal opportunity
- Predictive parity
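Before reasoning about them in prose, the three criteria can be expressed as per-group statistics. A minimal sketch, with a tiny hypothetical dataset (all names and numbers are invented for illustration):

```python
# Sketch: the three fairness criteria as per-group statistics.
# Rows are hypothetical: (group, selected_for_scholarship, graduated_on_time).
rows = [
    ("high_income", 1, 1), ("high_income", 1, 0),
    ("high_income", 0, 1), ("high_income", 0, 0),
    ("low_income", 1, 1), ("low_income", 0, 1),
    ("low_income", 0, 1), ("low_income", 0, 0),
]

def rates(group):
    g = [(s, y) for grp, s, y in rows if grp == group]
    selected = [y for s, y in g if s == 1]      # outcomes among those selected
    positives = [s for s, y in g if y == 1]     # selections among true graduates
    return {
        "selection_rate": sum(s for s, _ in g) / len(g),  # demographic parity compares this
        "tpr": sum(positives) / len(positives),           # equal opportunity compares this
        "precision": sum(selected) / len(selected),       # predictive parity compares this
    }

for grp in ("high_income", "low_income"):
    print(grp, rates(grp))
```

Each definition demands equality of a *different* one of these three numbers across groups, which is why they can pull in different directions.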
Guidance
1. **Demographic parity** would require that the same proportion of high-income and low-income students receive scholarships. Problem: if low-income students genuinely face more barriers and are less likely to graduate without financial support, then demographic parity might under-serve the group that most needs help.
2. **Equal opportunity** would require that among students who *would* graduate on time (the "positive" class), the algorithm identifies them at equal rates for both groups. Problem: if the algorithm is less accurate at identifying successful low-income students (perhaps because the features that predict success differ between groups), then equal opportunity is violated — capable low-income students are unfairly denied support.
3. **Predictive parity** would require that among students the algorithm selects for scholarships, the same proportion from each group actually graduates. Problem: this can coexist with very different false negative rates — the algorithm might miss many more deserving low-income students while still being "calibrated" for those it selects.

The deeper issue: all three definitions are reasonable, but they may conflict. If low-income students have lower average graduation rates (due to financial stress, not ability), it becomes mathematically impossible to satisfy all three simultaneously.

Exercise 32.3 — Proxy discrimination ⭐
For each feature, explain how it could serve as a proxy for a protected attribute (race, gender, age, disability, religion):
- First name
- University attended
- Zip code
- LinkedIn profile photo (presence/absence)
- Web browsing history
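One rough empirical test for proxy status is to ask how well a feature predicts the protected attribute, compared with the base rate of always guessing the majority group. A minimal sketch; the zip codes and group labels are hypothetical:

```python
# Sketch: a quick proxy check. If predicting the protected attribute from a
# feature beats the base rate by a wide margin, the feature is a proxy.
from collections import Counter, defaultdict

pairs = [  # (feature_value, protected_attribute) — invented for illustration
    ("10001", "group_a"), ("10001", "group_a"), ("10001", "group_b"),
    ("10002", "group_b"), ("10002", "group_b"), ("10002", "group_b"),
]

# Accuracy of always guessing the overall majority group.
base_rate = max(Counter(attr for _, attr in pairs).values()) / len(pairs)

# Accuracy of guessing the majority group *within each feature value*.
by_value = defaultdict(Counter)
for feat, attr in pairs:
    by_value[feat][attr] += 1
proxy_acc = sum(c.most_common(1)[0][1] for c in by_value.values()) / len(pairs)

print(f"base rate: {base_rate:.2f}, proxy accuracy: {proxy_acc:.2f}")
```

The gap between the two accuracies is a crude measure of how much demographic information the feature leaks; real audits would use mutual information or a held-out classifier, but the idea is the same.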
Guidance
1. **First name → race/ethnicity and gender.** Names are strongly correlated with demographic characteristics. A model that learns to prefer "traditional" Western names may discriminate against applicants with names from other cultural backgrounds.
2. **University → race and socioeconomic status.** Some universities are historically Black colleges (HBCUs), some are women's colleges, and university attendance strongly correlates with family income and geography, which correlate with race.
3. **Zip code → race and income.** Due to historical segregation and ongoing economic disparities, zip codes in the U.S. are strongly correlated with race and income.
4. **LinkedIn photo → race, gender, age, disability.** Presence of a photo may correlate with certain demographics, and if the system analyzes the photo itself, it directly accesses protected attributes.
5. **Browsing history → religion, political affiliation, health status, sexual orientation.** Websites visited can reveal sensitive personal characteristics — religious sites, political news, health information searches, dating platforms.

The key insight: removing protected attributes from a model does not prevent discrimination if other features serve as proxies.

Exercise 32.4 — Privacy and anonymization ⭐
A health department releases an "anonymized" dataset of patient records for research. The following fields are included: zip code, date of birth, gender, diagnosis, treatment, and outcome.
- Explain why this dataset is not truly anonymous.
- Describe a scenario where someone could re-identify an individual.
- Propose two changes that would improve privacy protection.
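A useful way to quantify the problem is k-anonymity over the quasi-identifiers: the size of the smallest group of records sharing identical quasi-identifier values. A sketch with hypothetical records, showing how generalization raises k:

```python
# Sketch: measuring k-anonymity over the quasi-identifiers
# (zip code, date of birth, gender). Records are hypothetical.
from collections import Counter

records = [
    ("60614", "1985-03-12", "F"),
    ("60615", "1985-07-30", "F"),
    ("60613", "1990-01-05", "M"),
    ("60614", "1990-11-22", "M"),
]

def k_anonymity(rows):
    # k = size of the smallest group sharing identical quasi-identifier
    # values; k == 1 means at least one record is uniquely identifiable.
    return min(Counter(rows).values())

def generalize(zip_code, dob, gender):
    # Coarsen: 3-digit zip prefix and birth year only.
    return (zip_code[:3], dob[:4], gender)

print(k_anonymity(records))                            # -> 1 (every record unique)
print(k_anonymity([generalize(*r) for r in records]))  # -> 2 (generalization raises k)
```

Real releases would also need to handle the sensitive columns (diagnosis, outcome), since k-anonymity alone does not stop an attacker who learns that everyone in a group shares the same diagnosis.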
Guidance
1. Research has shown that 87% of the U.S. population can be uniquely identified using only zip code, date of birth, and gender. These three fields together form a "quasi-identifier" — they do not individually identify anyone, but combined, they narrow the possibilities to one person in most cases.
2. Scenario: A journalist knows that a local politician was hospitalized in March. The journalist finds the anonymized dataset, filters to the politician's zip code, approximate date of birth, and gender, and identifies a single matching record showing a diagnosis of substance abuse. The politician's private medical information is now public.
3. Improvements:
   - **Generalize quasi-identifiers:** Replace exact dates of birth with age ranges (30-39), replace 5-digit zip codes with 3-digit prefixes, or add random noise to dates.
   - **Apply k-anonymity:** Ensure that every combination of quasi-identifiers matches at least k individuals (e.g., k=5), so no single record can be uniquely identified.
   - **Apply differential privacy:** Add calibrated noise to the data or query results so that individual records cannot be distinguished.

Exercise 32.5 — Informed consent evaluation ⭐
Evaluate the quality of informed consent in each scenario (strong, weak, or absent):
- A university researcher surveys students about study habits, explaining the purpose, how data will be stored, and that participation is voluntary.
- A social media platform collects location data from users who agreed to a 12,000-word terms of service document.
- A smart TV manufacturer records viewing habits and sells the data to advertisers; this is disclosed in paragraph 47 of the privacy policy.
- A fitness app tracks users' exercise patterns and shares aggregated (not individual) data with public health researchers.
- A city installs surveillance cameras with facial recognition capabilities in public parks without notifying residents.
Guidance
1. **Strong consent.** The researcher clearly explains purpose, storage, and voluntariness. This meets IRB standards for informed consent.
2. **Weak consent.** The user technically "agreed," but a 12,000-word document is not meaningfully informative. Most users do not read it. The consent is legal but not truly "informed."
3. **Very weak/absent consent.** Burying critical information in paragraph 47 of a privacy policy is designed to avoid scrutiny, not to inform users. This is consent in name only.
4. **Moderate consent.** The aggregation provides some privacy protection, and users consented to tracking. But did they understand their data would be shared externally? The quality depends on how clearly this was disclosed.
5. **Absent consent.** No notification was given. Being in a public space does not constitute consent to facial recognition surveillance. This is particularly concerning because residents cannot reasonably avoid public parks.

Part B: Applied Scenarios ⭐⭐
Exercise 32.6 — The hiring algorithm ⭐⭐
You are asked to build a model that predicts whether a job applicant will be a "successful" employee. You have access to 5 years of performance review data from the company.
- What assumptions are embedded in using historical performance review data as the definition of "success"?
- If certain demographic groups have historically received lower performance ratings (due to biased reviewers), what will your model learn?
- Propose an alternative approach that reduces the risk of perpetuating historical bias.
Guidance
1. **Assumptions:** That past performance reviews accurately measure actual performance (they may reflect reviewer bias). That the criteria for "success" have remained constant (they may have evolved). That the people who were hired represent the best possible candidates (they represent only the candidates who passed the previous — possibly biased — hiring process).
2. **The model will learn to discriminate.** If women or minorities systematically received lower reviews (research shows this happens), the model will learn that features associated with these groups predict "lower performance." The model will then screen out similar candidates, perpetuating the cycle.
3. **Alternative approaches:**
   - Use objective performance metrics (sales numbers, code commits, project completion rates) instead of subjective reviews.
   - Blind the training data by removing demographic information AND potential proxies (names, photos, university names).
   - Audit the model's predictions across demographic groups before deployment.
   - Use the model as one input among many, not as an automated decision-maker.
   - Continuously monitor outcomes for demographic disparities after deployment.

Exercise 32.7 — The predictive policing dilemma ⭐⭐
A city police department wants to deploy a predictive policing model that uses historical crime data to identify "high-risk" neighborhoods for increased patrols.
- Explain the feedback loop problem this creates.
- Who benefits from this system? Who is potentially harmed?
- What safeguards could reduce the risk of harm?
- Some police departments have abandoned predictive policing entirely. What is the argument for abandonment vs. reform?
Guidance
1. **Feedback loop:** Historical crime data reflects where police have been deployed, not just where crime has occurred. Heavily policed neighborhoods generate more arrests (even for minor offenses), which produces more "crime data," which makes the model direct more policing to those neighborhoods. The model becomes a self-fulfilling prophecy, regardless of actual crime distribution.
2. **Benefits:** Police departments (efficiency, resource allocation), residents of truly high-crime areas (if the model is accurate). **Harms:** Residents of over-policed neighborhoods (increased surveillance, more stops, more arrests for minor offenses, psychological burden, community distrust), disproportionately communities of color.
3. **Safeguards:**
   - Use victim-reported crime data (calls for service) rather than arrest data
   - Exclude minor offenses (drug possession, loitering) that are enforcement-dependent
   - Cap the amount of additional policing the model can direct to any single area
   - Regular audits of the model's racial and geographic impact
   - Community oversight board with veto power
   - Sunset clauses requiring periodic re-evaluation
4. **Abandonment argument:** The data is so thoroughly corrupted by historical policing patterns that no amount of correction can make it fair. The system will always reflect and amplify existing disparities. **Reform argument:** Policing decisions are already being made — at least a model can be audited and adjusted, whereas human intuition cannot. The question is whether a biased model is better or worse than biased human judgment. This is a genuine dilemma with reasonable people on both sides.

Exercise 32.8 — Privacy vs. public health ⭐⭐
During a pandemic, a government proposes using mobile phone location data to trace contacts of infected individuals. The data would identify everyone who spent more than 15 minutes within 6 feet of a confirmed case.
- What are the public health benefits of this approach?
- What are the privacy risks?
- How would you design a system that balances both concerns?
- Does the answer change depending on the severity of the pandemic? Why?
Guidance
1. **Benefits:** Faster identification of exposed individuals, more targeted quarantine (reducing the need for blanket lockdowns), better understanding of transmission patterns, potential to save lives.
2. **Privacy risks:** The government would know who was near whom, when, and where — information that reveals social networks, romantic relationships, political associations, and daily habits. This data could be misused for surveillance, law enforcement, or political targeting. Once collected, the data could be retained and repurposed beyond the pandemic.
3. **Design for balance:**
   - Use decentralized architecture (phones exchange anonymous tokens, not identities)
   - Implement automatic data deletion after 14-21 days
   - Make participation voluntary, not mandatory
   - Use differential privacy for aggregate analysis
   - Prohibit use of contact tracing data for law enforcement
   - Establish independent oversight and sunset the system when the pandemic ends
   - Be transparent about exactly what data is collected and how it is used
4. **Severity matters** because the ethical calculus shifts with the stakes. A disease with a 0.1% fatality rate does not justify the same privacy intrusions as one with a 30% fatality rate. But even in severe cases, safeguards matter — both because privacy is a fundamental right and because public trust (which depends on privacy protections) is essential for voluntary compliance.

Exercise 32.9 — The credit scoring question ⭐⭐
A fintech startup wants to use social media data (number of friends, posting frequency, content sentiment) to supplement traditional credit scoring for people with limited credit history.
- What privacy concerns does this raise?
- Could this approach reduce bias (by providing credit to people excluded from traditional scoring) or increase bias? Explain both possibilities.
- What would GDPR require if this system were deployed in the EU?
Guidance
1. **Privacy concerns:** Social media data reveals political views, religious beliefs, relationship status, mental health indicators, and social circles — all of which are sensitive personal information that people did not share for the purpose of credit evaluation. Using this data for credit decisions repurposes it in ways users did not anticipate or consent to.
2. **Could reduce bias:** Traditional credit scoring excludes people without credit history — disproportionately young people, immigrants, and low-income individuals. Alternative data could give these groups access to credit they would otherwise be denied. **Could increase bias:** Social media behavior correlates with demographics. Number of friends, posting frequency, and content type vary by culture, age, and socioeconomic status. A model trained on this data could learn to discriminate along these dimensions. People who cannot afford smartphones or who avoid social media (often older, poorer, or more privacy-conscious) would still be excluded.
3. **GDPR requirements:**
   - Users must explicitly consent to their social media data being used for credit decisions (consent must be specific, informed, and freely given)
   - The company must explain the logic of the decision-making process (Article 22)
   - Users have the right to contest automated decisions
   - The data must be adequate, relevant, and limited to what is necessary (data minimization)
   - Users can request deletion of their data
   - A Data Protection Impact Assessment would likely be required before deployment

Exercise 32.10 — Auditing a dataset ⭐⭐
You have been given a dataset of 50,000 patient records from a clinical trial for a new medication. Perform an ethical audit by answering these questions:
- What demographic information would you check, and why?
- If you find that 85% of participants are white and 70% are male, what are the implications?
- The dataset includes genetic data. What additional privacy concerns does this raise?
- The trial was conducted only at hospitals in major U.S. cities. How does this affect generalizability?
- What would you include in an "ethical limitations" section of your report?
Guidance
1. **Check:** Age distribution, sex/gender balance, race/ethnicity, geographic location, socioeconomic indicators, and any other characteristics that might affect how the medication works. Why: medication efficacy can vary across demographic groups due to genetic, metabolic, and environmental factors. A trial that underrepresents a group cannot make reliable claims about the medication's safety or efficacy for that group.
2. **Implications:** The results may not generalize to women, non-white patients, or other underrepresented groups. Side effects that predominantly affect underrepresented groups may go undetected. If the medication is approved based on this data, it may be prescribed to populations for whom its safety has not been established.
3. **Genetic data** raises heightened privacy concerns because: (a) it cannot be changed or anonymized (your genome is uniquely yours), (b) it reveals information about biological relatives who did not consent, (c) it could be used for discrimination in insurance or employment, (d) it can reveal sensitive information about ancestry, disease predisposition, and parentage.
4. **Restriction to major U.S. cities** introduces selection bias: urban hospital patients may differ from rural patients in health status, access to care, diet, environmental exposures, and socioeconomic factors. Results may not apply to rural populations, international populations, or populations with limited healthcare access.
5. **Ethical limitations section:** This section should disclose the demographic imbalance, the geographic limitation, the potential for the findings to not generalize to underrepresented groups, the sensitivity of genetic data, and a recommendation that follow-up studies include more diverse populations.

Exercise 32.11 — The A/B test ethics ⭐⭐
Your company runs an A/B test where half of users see a "dark pattern" design (a UI that makes it difficult to cancel a subscription) and half see a straightforward cancellation flow. The dark pattern group has 40% lower cancellation rates.
- Is this A/B test ethical? Why or why not?
- The company argues that the test is "just measuring user behavior." Evaluate this claim.
- What ethical framework would you apply to evaluate this decision?
Guidance
1. **This test is ethically problematic.** The "treatment" (dark pattern) is designed to manipulate users against their own interests. Unlike medical A/B tests, where participants provide informed consent, users in this test do not know they are being experimented on, and the "treatment" is designed to harm them (making it harder to exercise a choice they want to make).
2. **The "just measuring" argument is flawed.** The test is not neutrally observing behavior — it is actively manipulating the user experience to prevent desired actions. "Measuring" the effect of a manipulative design is not neutral; it is evaluating the effectiveness of a manipulation tool with the intent to deploy it.
3. **Ethical framework application:**
   - **Who benefits?** The company (reduced churn). **Who is harmed?** Users who want to cancel but cannot easily do so (financial harm, frustration, erosion of trust).
   - **Was there consent?** No — users did not agree to be part of this test.
   - **Is it transparent?** No — the design intentionally obscures the cancellation option.
   - **The categorical imperative:** Would it be acceptable if *every* company used dark patterns? The resulting world — where no one can easily cancel anything — is clearly undesirable.
   - **Conclusion:** The test produces knowledge about how to manipulate users. Even "just measuring" this serves an unethical purpose.

Exercise 32.12 — Disparate impact analysis ⭐⭐
A company's promotion algorithm considers: years of experience, number of completed projects, manager ratings, and "leadership score" (based on peer evaluations). The algorithm does not include gender.
Analysis reveals:
- Men receive an average leadership score of 7.2/10
- Women receive an average leadership score of 5.8/10
- The promotion rate for men is 23%; for women, it is 11%
- Does this system have disparate impact? Explain.
- Is the system "biased" even though it does not include gender as a feature?
- What could explain the difference in leadership scores?
- What would you recommend?
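The standard screening test here is the four-fifths (80%) rule from U.S. employment guidance, which can be stated in a few lines of code. A minimal sketch, using the promotion rates given above:

```python
# Sketch: the four-fifths (80%) rule used as a screening test for
# adverse impact in U.S. employment guidance.
def four_fifths_check(rate_protected, rate_majority):
    """Return (impact_ratio, flagged): flagged when the protected group's
    selection rate is below 80% of the majority group's rate."""
    ratio = rate_protected / rate_majority
    return ratio, ratio < 0.8

# Promotion rates from the exercise: women 11%, men 23%.
ratio, flagged = four_fifths_check(rate_protected=0.11, rate_majority=0.23)
print(f"impact ratio: {ratio:.2f}, adverse impact flagged: {flagged}")  # 0.48, True
```

The rule is a screening heuristic, not a legal conclusion: a flag triggers further investigation into whether the disparity is justified by job-related criteria.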
Guidance
1. **Yes, disparate impact exists.** The four-fifths rule (used in U.S. employment law) states that if the selection rate for a protected group is less than 80% of the rate for the majority group, there is evidence of adverse impact. Women's promotion rate (11%) is less than four-fifths of men's rate (23% × 0.8 = 18.4%), so the system has disparate impact.
2. **Yes, the system is biased** in its effects, even though gender is not an explicit feature. The leadership score functions as a proxy for gender. The model does not need to "know" someone's gender if it has a feature that correlates with gender.
3. **Possible explanations for the score gap:**
   - **Bias in peer evaluations:** Research consistently shows that identical behaviors are perceived as "leadership" in men and "bossiness" in women. Peer evaluations may reflect gender stereotypes.
   - **Opportunity gap:** If women are less likely to be assigned to high-visibility projects, they may have fewer opportunities to demonstrate leadership.
   - **Different communication styles:** If "leadership" is culturally defined in male-coded terms (assertiveness, dominance), women who lead differently may receive lower scores.
4. **Recommendations:**
   - Audit the leadership score for gender bias specifically
   - Consider using structured evaluation criteria rather than subjective peer ratings
   - Test the promotion algorithm with and without the leadership score to measure its contribution to the gender gap
   - If the feature cannot be debiased, consider removing it or down-weighting it
   - Implement regular disparate impact monitoring

Part C: Case Evaluation ⭐⭐⭐
Exercise 32.13 — COMPAS deep dive ⭐⭐⭐
Read about the ProPublica analysis of the COMPAS recidivism prediction tool (the case described in Section 32.2). Then answer:
- Explain the specific fairness definitions that ProPublica used and that Northpointe used, and why both could claim their position was correct.
- A judge uses COMPAS scores as one factor among many in sentencing. Does this make the ethical concerns better or worse? Why?
- The COMPAS tool uses 137 features, but race is not one of them. Explain how the tool can still produce racially disparate outcomes.
- If you were advising a court system, would you recommend using, reforming, or abandoning algorithmic risk assessment tools? Justify your position.
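The tension at the heart of the first question can be reproduced numerically: when base rates differ between groups, equal precision (predictive parity) forces unequal false positive rates. A sketch with hypothetical confusion-matrix counts chosen to show this:

```python
# Sketch: the two competing COMPAS fairness metrics computed from
# per-group confusion-matrix counts. The counts are hypothetical,
# chosen so the groups have different base rates of reoffending.
def metrics(tp, fp, tn, fn):
    return {
        "fpr": fp / (fp + tn),   # error-rate balance (ProPublica's focus)
        "ppv": tp / (tp + fp),   # predictive parity (Northpointe's focus)
    }

group_a = metrics(tp=60, fp=40, tn=60, fn=40)   # base rate 50%
group_b = metrics(tp=30, fp=20, tn=120, fn=30)  # base rate 30%

# Equal PPV (both 0.60), yet very different FPRs (0.40 vs. ~0.14):
# with unequal base rates, the two criteria cannot both hold.
print(group_a)
print(group_b)
```

Each side can point at the metric on which the tool looks fair; the arithmetic guarantees the other metric looks unfair.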
Guidance
1. ProPublica focused on **error rate balance** (equal false positive and false negative rates across racial groups). By this standard, COMPAS was unfair because Black defendants had a much higher false positive rate. Northpointe focused on **predictive parity** (among defendants scored as high-risk, the proportion who actually reoffended was similar across racial groups). By this standard, COMPAS was fair. Both are valid definitions of fairness, but they cannot be simultaneously satisfied when base rates (actual reoffending rates) differ between groups.
2. **Both arguments exist.** Better, because the judge can apply human judgment, context, and mitigating factors that the algorithm cannot consider. Worse, because research shows that judges tend to anchor on the algorithm's score even when they have other information — the presence of a "scientific" number can make judges less likely to exercise independent judgment. Also, inconsistent use (some judges relying heavily on it, others ignoring it) creates its own unfairness.
3. **Proxy variables.** Many of the 137 features correlate with race: neighborhood characteristics, prior arrest history (which reflects policing patterns), employment status, education level, family history of incarceration. Race need not be explicitly included for the model to effectively learn racial patterns.
4. **This is a genuine dilemma.** Arguments for reform: algorithms can be audited and improved; human judges are also biased but less transparent. Arguments for abandonment: the data is too corrupted by systemic racism to produce fair predictions; the veneer of objectivity makes algorithmic bias harder to challenge. A middle position: use algorithms only to flag cases for additional review, never to increase punishment, with mandatory annual bias audits.

Exercise 32.14 — Facial recognition policy ⭐⭐⭐
Several cities have banned or restricted the use of facial recognition technology by government agencies. Evaluate the arguments for and against a complete ban.
- List three legitimate uses of facial recognition technology.
- List three ways it could be misused.
- What is the argument that regulation (not a ban) is sufficient?
- What is the argument that a ban is necessary?
- What is your position, and what evidence supports it?
Guidance
1. **Legitimate uses:** Finding missing children or persons, identifying victims in mass casualty events, airport security screening (with consent), unlocking personal devices (voluntary use).
2. **Potential misuses:** Mass surveillance of public spaces without consent, tracking political protesters or journalists, racial profiling (given documented accuracy disparities), creating a database of citizens' movements and associations, enabling authoritarian social control.
3. **Regulation argument:** The technology itself is not inherently bad — it depends on how it is used. Well-designed regulations (accuracy standards, warrant requirements, prohibited uses, bias audits, transparency reports) can allow beneficial uses while preventing harmful ones. A blanket ban prevents legitimate uses.
4. **Ban argument:** History shows that surveillance technology, once deployed, tends to expand in scope. Regulations can be weakened, exceptions can be exploited, and enforcement is difficult. The accuracy disparities disproportionately harm communities of color. The chilling effect on free speech and assembly exists regardless of whether the data is "misused." Some technologies are too dangerous to regulate — they must be prohibited.
5. Students should develop their own position with supporting reasoning. Strong answers acknowledge the legitimate concerns on both sides while articulating a clear position.

Exercise 32.15 — Cambridge Analytica analysis ⭐⭐⭐
Research the Cambridge Analytica scandal and answer:
- At what point in the data pipeline did the primary ethical violation occur?
- Was the problem the technology (data analysis and targeting) or the governance (policies and oversight)?
- What changes resulted from the scandal (regulatory, corporate, public awareness)?
- Could the same type of data harvesting happen today? Why or why not?
Guidance
1. **Multiple points:** Data *collection* (harvesting friends' data without consent), data *use* (repurposing academic research data for political manipulation), and *deployment* (using psychological profiles to target voters with personalized messaging). The most fundamental violation was the collection of data from people who never consented — the friends of quiz-takers.
2. **Both, but primarily governance.** The analytical techniques (psychometric profiling, targeted advertising) are standard tools. The problem was that Facebook's policies allowed mass data extraction by third parties, there was no effective oversight of how the data was used, and the boundary between academic research and commercial/political use was not enforced.
3. **Changes:** GDPR enforcement intensified. Facebook restricted third-party data access. Cambridge Analytica was shut down. Public awareness of data privacy increased significantly. Several countries launched investigations. The FTC fined Facebook $5 billion. But targeted political advertising continues, and many of the underlying dynamics have not changed.
4. **Partially addressed, partially not.** The specific mechanism (third-party apps accessing friends' data) has been closed. But social media companies still collect vast amounts of behavioral data, and targeted political advertising based on behavioral profiles remains common. The underlying business model — surveillance capitalism — has not fundamentally changed.

Part D: Synthesis and Ethical Reasoning ⭐⭐⭐–⭐⭐⭐⭐
Exercise 32.16 — Ethical audit of the vaccination project ⭐⭐⭐
Apply the five-question ethical framework from Section 32.6 to your vaccination rate analysis project. Write a 400-500 word ethical audit addressing:
- Who benefits from your analysis, and who could be harmed?
- Is your data representative? What groups are underrepresented or missing?
- What are the failure modes of your analysis?
- How could your findings be misused?
- Are you being transparent about limitations?
Guidance
A strong audit will address specific issues rather than generalities. For example:
- **Who is harmed:** Countries with low vaccination rates could be stigmatized, reduced to a number, or blamed for systemic failures beyond their control. If your analysis is used to allocate aid, the criteria you choose could advantage some countries over others.
- **Representation:** Country-level data masks within-country variation. Small countries and conflict-affected states often have unreliable data or are excluded entirely. Indigenous and refugee populations are often invisible in national statistics.
- **Failure modes:** If your correlation between GDP and vaccination rates is interpreted as "just increase GDP," it oversimplifies. If you identify trends using incomplete data, your projections could misdirect resources.
- **Misuse:** Rankings could be used to shame countries rather than help them. Correlations with governance indicators could be weaponized in political discourse.
- **Transparency:** Are you clear about data quality variation? About what your analysis can and cannot conclude?

Exercise 32.17 — The trolley problem of data science ⭐⭐⭐
A hospital uses a model to predict which patients will benefit most from an organ transplant. The model is highly accurate overall, but it systematically underestimates the survival benefit for elderly patients (because the training data contains fewer elderly transplant recipients, the model has less information about this group).
- If you deploy the model as-is, what happens to elderly patients?
- If you add a correction factor to equalize the model's performance for elderly patients, you will slightly decrease overall accuracy. Is this tradeoff justified? Why or why not?
- Who should make this decision — the data scientist, the hospital administration, an ethics board, or the patients?
- How does this differ from the COMPAS case? How is it similar?
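The tradeoff in the second question can be made concrete with a toy sketch: estimate each group's average prediction error on held-out data, then shift that group's predictions by the estimated bias. Everything below (the record values, group names, and the additive-correction approach) is invented for illustration, not the hospital's actual model.

```python
# Hypothetical validation data: (predicted_benefit, actual_benefit, group).
# Invented numbers, chosen so the model underestimates the elderly group.
records = [
    (8.0, 8.2, "young"),   (7.5, 7.4, "young"),   (9.0, 8.9, "young"),
    (5.0, 7.1, "elderly"), (4.5, 6.8, "elderly"), (5.5, 7.5, "elderly"),
]

def mean_residual(rows, group):
    """Average (actual - predicted) for one group; > 0 means underestimation."""
    resids = [actual - pred for pred, actual, g in rows if g == group]
    return sum(resids) / len(resids)

# Per-group additive correction, estimated from the held-out records.
corrections = {g: mean_residual(records, g) for g in ("young", "elderly")}

def corrected_prediction(pred, group):
    # Shift each group's raw prediction by that group's estimated bias.
    return pred + corrections[group]
```

On these toy numbers the elderly correction comes out positive (roughly +2.1 benefit points) while the young correction is near zero, so the adjustment raises elderly rankings while leaving the rest essentially untouched. Whether that tradeoff is acceptable is exactly the values question the exercise poses.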
Guidance
1. Elderly patients will be systematically ranked lower on the transplant list, receiving organs less often than their actual medical need warrants. Some elderly patients who would have benefited will die waiting.
2. This is a values question. Arguments for the correction: the reduced accuracy is small and diffuse, while the harm of underserving elderly patients is large and concentrated. The current model is not truly "more accurate" — it is more accurate for young patients at the expense of elderly patients. Arguments against: any correction that reduces overall accuracy means some non-elderly patients who would have benefited will now wait longer.
3. This should be a shared decision. The data scientist should identify the problem and propose solutions. An ethics board (ideally including patient advocates and ethicists) should evaluate the tradeoffs. The hospital administration should implement the decision. Transparency with patients about how allocation decisions are made is essential.
4. **Similar:** Both involve models that perform differently for different groups, both require value judgments about which type of error is more tolerable, and both affect life outcomes. **Different:** COMPAS involves the criminal justice system and racial bias; this involves healthcare and age. The stakes are similarly high, but the social context differs. In healthcare, there may be more consensus about the goal (save the most lives) than in criminal justice (where the purpose of risk assessment itself is debated).
Exercise 32.18 — Designing an ethical data science practice ⭐⭐⭐
You are hired as the first data scientist at a small company (50 employees). The CEO asks you to "set up data science the right way." Design an ethical data science practice by outlining:
- Three policies you would implement immediately
- A process for ethical review of new data science projects
- How you would handle a request from marketing to use customer data in a way you consider unethical
- How you would communicate data ethics principles to non-technical colleagues
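One policy you might propose is an automated disparate-impact check before any customer-facing model ships. A minimal sketch follows; the approval counts are invented, and the 0.8 threshold is the "four-fifths rule" heuristic from U.S. employment law — treat both as illustrative assumptions, not a complete fairness audit.

```python
# Sketch of a disparate-impact check using the "four-fifths rule":
# flag the model if any group's positive-outcome rate falls below
# 80% of the most-favored group's rate. Counts below are invented.
approvals = {          # group -> (approved, total applicants)
    "group_a": (90, 120),
    "group_b": (45, 100),
}

def disparate_impact(outcomes, threshold=0.8):
    """Return each group's rate ratio vs. the best-off group, plus flagged groups."""
    rates = {g: approved / total for g, (approved, total) in outcomes.items()}
    best = max(rates.values())
    ratios = {g: rate / best for g, rate in rates.items()}
    flagged = [g for g, ratio in ratios.items() if ratio < threshold]
    return ratios, flagged

ratios, flagged = disparate_impact(approvals)
# group_a approves at 0.75, group_b at 0.45, so group_b's ratio is 0.6 -> flagged.
```

A flagged result would not automatically block deployment; it would trigger the review meeting described in point 2 of the guidance, where the disparity is examined in context.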
Guidance
1. **Three immediate policies:**
   - **Data inventory and access control:** Document what data the company collects, where it is stored, who can access it, and what it is used for. Restrict access to a need-to-know basis.
   - **Privacy-by-design:** All new products and features must include a privacy review before launch. Default to collecting less data, not more.
   - **Bias testing:** Any model that affects customers (recommendations, pricing, credit, access) must be tested for demographic disparities before deployment and monitored after.
2. **Ethical review process:** Before starting a new project, complete a one-page "ethical impact assessment" covering purpose, affected populations, data sources, potential for bias, potential for misuse, and privacy implications. For high-risk projects (affecting health, employment, credit, or freedom), convene a review meeting with stakeholders outside the data team.
3. **Handling an unethical request:** Document your concerns in writing. Explain the specific risks (legal, reputational, ethical) in business terms the CEO will understand. Propose an alternative approach that achieves the business goal without the ethical problem. If overruled, escalate to the appropriate level. Know your own ethical boundaries.
4. **Communicating to non-technical colleagues:** Use real-world case studies (COMPAS, Amazon hiring) rather than abstract principles. Conduct a 30-minute "data ethics" workshop. Create a simple decision tree: "Before using data for X, check: Do we have consent? Could this discriminate? What happens if the model is wrong?"
Exercise 32.19 — The cost of not being biased ⭐⭐⭐⭐
A car insurance company discovers that using a customer's credit score to set insurance premiums is highly predictive of accident risk — customers with lower credit scores have more accidents, on average. However, credit scores are correlated with race and income, so using them results in higher premiums for minority and low-income drivers.
- Is it ethical to use credit scores for insurance pricing? Present arguments on both sides.
- If the company removes credit score from its model, overall accuracy drops by 8%. What are the consequences of this accuracy reduction?
- Is there a middle ground between "use it" and "don't use it"?
- Several U.S. states have banned the use of credit scores in insurance pricing. Do you agree with this approach? Why?
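The third question asks about a middle ground; one commonly proposed form is to let the credit-based factor adjust premiums but clamp its effect. A minimal sketch, with an invented base premium and an assumed 15% cap (both are hypothetical parameters, not industry figures):

```python
BASE_PREMIUM = 1000.0  # invented base annual premium
MAX_IMPACT = 0.15      # assumed cap: credit score may move the premium at most 15%

def premium_with_cap(credit_multiplier):
    """Clamp the raw credit-based multiplier into [1 - cap, 1 + cap]."""
    clamped = min(max(credit_multiplier, 1.0 - MAX_IMPACT), 1.0 + MAX_IMPACT)
    return BASE_PREMIUM * clamped

high_risk = premium_with_cap(1.40)  # raw model wants +40%; clamped to +15%
mild_risk = premium_with_cap(1.05)  # within the cap; passes through unchanged
```

This retains some of the predictive signal while bounding the disparate impact on any one customer. Note that choosing the cap is itself a value judgment, not a technical one.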
Guidance
This is one of the most debated questions in insurance ethics. Strong answers engage with both sides:
**Arguments for using credit score:** It is genuinely predictive (people with lower credit scores do file more claims). Insurance is about pricing risk accurately. If the company cannot use credit scores, premiums for all customers increase to compensate, effectively asking good drivers to subsidize higher-risk drivers. The correlation with race may reflect causal factors (financial stress → distracted driving, deferred vehicle maintenance) rather than discrimination.
**Arguments against:** The correlation with race means that the practice has disparate impact on minorities, effectively charging them more for insurance regardless of their individual driving record. Credit score reflects systemic economic disadvantage, not individual moral character. Using it punishes people for being poor. It creates a vicious cycle: financial stress leads to worse credit, worse credit leads to higher insurance costs, higher costs increase financial stress.
**Consequences of removal:** Higher premiums for currently low-risk customers (cross-subsidization), lower premiums for currently high-risk customers, slightly worse predictions of accident risk, potentially more equitable outcomes overall.
**Middle ground options:** Use credit score only as one factor among many, with a maximum allowable impact on premiums. Allow customers to appeal credit-based surcharges with evidence of safe driving. Use alternative metrics that are predictive but less correlated with protected attributes.
Exercise 32.20 — Writing a data ethics statement ⭐⭐⭐⭐
Write a 200-300 word personal data ethics statement that articulates your principles as a data scientist. Address:
- What values guide your work?
- What would you refuse to build?
- How do you balance business objectives with ethical concerns?
- How do you handle uncertainty about whether something is ethical?
This is a reflective exercise — there is no single right answer. The goal is to articulate your own ethical framework.
Guidance
A strong ethics statement is specific, not generic. Compare:
**Weak:** "I believe in using data for good and not for evil."
**Strong:** "I will test every model I build for disparate impact across demographic groups before deployment. If I discover that a model harms a vulnerable group and cannot be corrected, I will recommend against deployment even if the model is profitable."
Your statement should reflect your actual values, not what you think sounds good. If you prioritize business outcomes, say so — and explain what limits you would set. If you prioritize equity, explain how you would navigate situations where equity and accuracy conflict.
The process of writing this statement is as valuable as the product. It forces you to articulate principles that may have been implicit and to confront tradeoffs you may not have considered.
Exercise 32.21 — Historical harms and data ⭐⭐⭐⭐
Research one of the following historical cases where data and statistics were used to cause harm:
- The use of census data to identify and intern Japanese Americans during World War II
- The Tuskegee syphilis study (1932-1972)
- The use of IQ testing to justify forced sterilization in the eugenics movement
- The use of data analytics by authoritarian regimes for political repression
Write a 300-word analysis connecting the historical case to modern data science ethics. What parallels exist? What safeguards have been established? What risks remain?
Guidance
Each case illustrates a theme relevant to modern data science:
- **Census data and internment:** Data collected for one purpose (population statistics) was repurposed for another (identifying an ethnic group for mass imprisonment). This connects directly to the GDPR principle of purpose limitation and to debates about government access to commercial data.
- **Tuskegee:** Researchers deliberately withheld treatment from Black men with syphilis to study the disease's progression — without informed consent. This case led to the establishment of IRBs and modern informed consent requirements. The parallel to data science: are we conducting experiments on users (A/B tests, algorithmic changes) without meaningful consent?
- **Eugenics and IQ testing:** Statistics were used to create a scientific veneer for racist ideology. "Objective" measurements were used to justify discrimination. The parallel to modern ML: are we using "objective" models to encode and legitimize existing social biases?
- **Authoritarian data use:** Surveillance data enables political repression. The parallel: facial recognition, location tracking, and social media monitoring are tools that can be used by any government, democratic or authoritarian.
Exercise 32.22 — The future of data ethics ⭐⭐⭐⭐
Consider three emerging technologies: (1) large language models that can generate synthetic text, code, and images; (2) emotion recognition AI that claims to detect emotions from facial expressions; (3) brain-computer interfaces that can read neural signals.
For each technology, write 3-4 sentences addressing: What new ethical questions does it raise? How are existing ethical frameworks (consent, privacy, fairness) insufficient for this technology?
Guidance
**Large language models:** These raise questions about attribution (who "authored" AI-generated text?), manipulation (personalized persuasion at scale), misinformation (synthetic media that is indistinguishable from reality), and labor displacement. Existing frameworks address how data is collected and used, but not what happens when AI creates new "data" (synthetic text, images) that can be mistaken for human output.
**Emotion recognition:** The scientific validity of reading emotions from facial expressions is disputed — many researchers argue that facial expressions do not map reliably to internal emotional states, and that the mapping varies across cultures. Using this technology for hiring, security, or education could disadvantage people whose facial expressions do not match the system's cultural assumptions. Existing consent frameworks do not address whether people can be "emotionally surveilled" in public spaces.
**Brain-computer interfaces:** These raise unprecedented privacy questions — thought privacy. If a device can read neural signals, who owns that data? Can an employer require brain monitoring for "safety"? Can law enforcement access neural data? Existing privacy frameworks were designed for behavioral data (what you do), not cognitive data (what you think). The concept of "informed consent" for neural monitoring is entirely uncharted.
Exercise 32.23 — Ethical case debate ⭐⭐⭐
Organize a class debate (or write arguments for both sides) on one of the following propositions:
- "Data scientists should be professionally licensed, like doctors and engineers."
- "Companies should be required to open-source all algorithms that make decisions about people."
- "Individuals should have the right to a human decision-maker (not an algorithm) for any decision that significantly affects their life."
Write a 200-word argument FOR and a 200-word argument AGAINST your chosen proposition.
Guidance
Example for proposition 3:
**FOR:** Algorithms lack the ability to consider context, nuance, and the full complexity of human circumstances. A person denied a loan, a parole, or a medical treatment by an algorithm has no one to explain their situation to — no one who can recognize extenuating circumstances or exercise compassion. Human decision-makers are imperfect, but they can be reasoned with, appealed to, and held accountable in ways that algorithms cannot. The right to face your decision-maker is a fundamental principle of justice.
**AGAINST:** Human decision-makers are also biased — but their biases are invisible, inconsistent, and impossible to audit. Judges give harsher sentences before lunch. Doctors treat pain less aggressively in Black patients. Loan officers favor applicants who "seem trustworthy." At least an algorithm can be examined, tested, and improved. Mandating human decision-makers would be enormously expensive, slow, and would not eliminate bias — it would merely make bias opaque. The right approach is to improve algorithms, not to replace them with equally flawed human judgment.
Exercise 32.24 — Ethical impact assessment ⭐⭐⭐⭐
Choose a data science application that interests you (social media recommendation, autonomous vehicles, predictive healthcare, educational technology, etc.) and write a formal ethical impact assessment covering:
- Description of the system and its purpose
- Stakeholder analysis (who benefits, who is affected)
- Data sources and potential biases
- Fairness considerations (which definition of fairness applies?)
- Privacy implications
- Potential for misuse
- Recommended safeguards
- Overall ethical assessment (should this system be built? Under what conditions?)
Target length: 500-800 words.
Guidance
This exercise synthesizes everything in the chapter into a single practical document. A strong assessment will:
- Be specific about the system (not "AI" in general, but a particular application)
- Identify concrete harms, not just abstract risks
- Acknowledge benefits as well as risks (most systems have both)
- Propose specific, actionable safeguards rather than vague principles
- Reach a clear conclusion with conditions ("this system should be built IF these safeguards are in place")
Exercise 32.25 — Reflection: your ethical compass ⭐
Write a one-paragraph reflection (150-200 words) answering the question: After studying this chapter, what is the single most important ethical principle you will carry into your data science practice, and why?
This is a personal reflection — there is no wrong answer.
Guidance
Strong reflections are specific and grounded in the chapter material. Rather than "I will be ethical," consider: "The most important lesson for me was that bias enters before any code is written — at the problem definition stage. I now realize that the question 'what are we optimizing for?' is an ethical question disguised as a technical one. I will make it a habit to ask, for every project, whether the metric I'm optimizing actually measures what matters, or just what's easy to measure."
Reflection
Ethics is not a chapter you complete and move on from. It is a lens you carry into every project, every analysis, and every communication for the rest of your career. The cases in this chapter — COMPAS, Amazon hiring, facial recognition, Cambridge Analytica — are not historical curiosities. Similar situations are unfolding right now, and you may encounter them in your own work.
The goal is not to become paralyzed by ethical anxiety. The goal is to build the habit of asking ethical questions alongside technical ones. "Is this model accurate?" AND "Is this model fair?" "Does this analysis work?" AND "Does this analysis consider who it might harm?"
Those two sets of questions are not in conflict. They are two halves of doing data science well.