Chapter 27: Exercises — Privacy-Preserving AI Techniques
These 25 exercises range from technical exploration to policy analysis and organizational design. They are designed for individual completion, paired work, or classroom discussion.
Conceptual Understanding
Exercise 1: The Trade-Off Spectrum
Describe the privacy-accuracy trade-off in your own words using a concrete, non-technical analogy — not mathematical notation. Then explain why this trade-off can be reduced by privacy-preserving techniques but not eliminated. What would it mean to claim that a technique "eliminates" the trade-off? Why should such a claim be viewed skeptically?
Exercise 2: Epsilon Intuition
The epsilon parameter controls the privacy-accuracy trade-off in differential privacy. In your own words (no formulas), explain what it means for epsilon to be very small (say, 0.01) and what it means for it to be very large (say, 100). A marketing team wants to describe the company's use of epsilon = 1.0 as "perfect privacy." Write a one-paragraph response explaining why that description is inaccurate and what a more accurate description would be.
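Exercise 2 can be grounded with a few lines of code. The sketch below is not part of the chapter's demo; the count of 500 and the survey framing are made-up illustrations. It applies the standard Laplace mechanism to a count query at the three epsilon values the exercise mentions:

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_count(true_count, epsilon):
    """Laplace mechanism for a count query (sensitivity = 1)."""
    return true_count + rng.laplace(scale=1.0 / epsilon)

true_count = 500  # hypothetical: 500 of 1,000 respondents answered "yes"
for eps in (0.01, 1.0, 100.0):
    errors = [abs(dp_count(true_count, eps) - true_count) for _ in range(2000)]
    print(f"epsilon = {eps:>6}: typical error ~ {np.mean(errors):.2f}")
```

The expected absolute error of Laplace noise is 1/epsilon, so epsilon = 0.01 produces errors around 100 (strong privacy, useless answers), while epsilon = 100 produces almost none (accurate answers, almost no privacy). Epsilon = 1.0 sits in between, which is why calling it "perfect privacy" is inaccurate.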
Exercise 3: Federated vs. Centralized
A healthcare company is deciding whether to train its clinical AI model using centralized patient data or federated learning. List three specific advantages of federated learning for this use case. Then list three limitations or residual risks of federated learning that the company should not ignore. What additional technique would you recommend pairing with federated learning to address the most significant residual risk?
Exercise 4: The GAN Synthetic Data Problem
A data science team generates a synthetic patient dataset using a GAN trained on real patient records. They plan to share the synthetic data publicly, claiming it is "anonymous because it contains no real patients." Identify at least two technical failure modes that could allow the synthetic data to reveal information about real individuals. What additional step would provide formal privacy guarantees?
Exercise 5: Privacy Budget Exhaustion
A government statistics agency uses differential privacy with an annual privacy budget of epsilon = 5.0 for its survey data. The agency wants to answer 20 queries per year from different policy analysts, each requiring epsilon = 0.5. Will the budget last the year? What should the agency do when the budget is exhausted? Propose a budget management policy with specific rules.
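A sanity check on the arithmetic in Exercise 5: under basic sequential composition, per-query epsilons simply add, so 20 queries at 0.5 each would need a total epsilon of 10 against a budget of 5. A minimal tracker sketch (the function and its parameter names are illustrative):

```python
def run_queries(annual_budget, cost_per_query, n_requested):
    """Sequential composition: each answered query spends its epsilon."""
    answered, remaining = 0, annual_budget
    while answered < n_requested and remaining >= cost_per_query:
        remaining -= cost_per_query
        answered += 1
    return answered, remaining

answered, remaining = run_queries(annual_budget=5.0, cost_per_query=0.5, n_requested=20)
print(f"answered {answered} of 20 queries; {remaining} epsilon left")
# → answered 10 of 20 queries; 0.0 epsilon left
```

The budget runs out halfway through the year, which is exactly the situation the exercise's management policy needs to address.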
Technical Application
Exercise 6: Running the Demonstration Code
If you have Python installed with numpy and matplotlib, run the differential_privacy_demo.py file in the code/ subdirectory of this chapter. Document what happens to the accuracy of the count query as epsilon decreases from 10 to 0.1. At what epsilon value does the error become large enough that the DP answer would be practically useless? What does this reveal about the relationship between population size and the feasibility of strong privacy?
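If you cannot run the chapter's script, the key relationship can still be simulated in a few lines. This is a separate sketch, not the chapter's differential_privacy_demo.py: the expected absolute error of a Laplace-noised count is 1/epsilon regardless of how many people are in the data, so relative error shrinks as the population grows. The population sizes and the 50% true proportion below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

def relative_error(n, epsilon, trials=5000):
    """Mean relative error of a DP count query over a population of size n."""
    true_count = n // 2  # assume half the population satisfies the predicate
    noise = rng.laplace(scale=1.0 / epsilon, size=trials)
    return np.mean(np.abs(noise)) / true_count

for eps in (10, 1, 0.1):
    print(f"epsilon = {eps:>4}: n=1,000 error ~ {relative_error(1_000, eps):.3%}; "
          f"n=1,000,000 error ~ {relative_error(1_000_000, eps):.5%}")
```

At epsilon = 0.1 the relative error is around 2% for a population of 1,000 but vanishingly small for a million people, which is why strong privacy is far more feasible for large populations.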
Exercise 7: Sensitivity Calculation
Calculate the sensitivity for each of the following queries on a dataset of 1,000 employees:
- (a) The count of employees who are over 50 years old
- (b) The average salary of employees in the Engineering department, where salaries range from $50,000 to $200,000
- (c) The maximum age in the dataset, where ages range from 22 to 65
- (d) The proportion of employees who attended a training
For each, explain how one employee's data could change the query result and by how much in the worst case.
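One way to check your answers: the sketch below computes worst-case (global) sensitivities under the bounded-DP model, in which one person's record is replaced rather than added or removed. The Engineering department size of 100 is an assumption not given in the exercise; note how the answer to (b) depends on it.

```python
n = 1_000                        # employees in the dataset
salary_lo, salary_hi = 50_000, 200_000
age_lo, age_hi = 22, 65
m_engineering = 100              # assumed department size (not given in the exercise)

sens_count = 1                                             # (a) one person moves a count by at most 1
sens_avg_salary = (salary_hi - salary_lo) / m_engineering  # (b) one swapped salary moves the mean by at most this
sens_max_age = age_hi - age_lo                             # (c) the max can swing across the full age range
sens_proportion = 1 / n                                    # (d) one person changes the proportion by 1/n

print(sens_count, sens_avg_salary, sens_max_age, sens_proportion)
# → 1 1500.0 43 0.001
```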
Exercise 8: Local vs. Central DP Comparison
A company is choosing between local differential privacy (noise added on users' devices) and central differential privacy (data collected raw; noise added on server). For each of the following dimensions, explain which approach is better and why:
- (a) Protection against company server breach
- (b) Protection against adversarial company employees
- (c) Accuracy of derived statistics
- (d) Trust model (what users must trust about the company)
Under what circumstances would central DP be preferable despite its weaker trust model?
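The accuracy gap in part (c) can be made concrete with a simulation. The sketch below compares randomized response (the canonical local-DP mechanism for a yes/no question) against a single Laplace-noised aggregate (central DP) for estimating a proportion; the data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)

def local_dp_proportion(bits, epsilon):
    """Randomized response: each user flips their own bit before reporting."""
    p_truth = np.exp(epsilon) / (np.exp(epsilon) + 1)  # prob. of reporting truthfully
    reported = np.where(rng.random(len(bits)) < p_truth, bits, 1 - bits)
    return (reported.mean() - (1 - p_truth)) / (2 * p_truth - 1)  # debias the aggregate

def central_dp_proportion(bits, epsilon):
    """Server sees raw bits and adds one dose of Laplace noise to the mean."""
    return bits.mean() + rng.laplace(scale=1.0 / (epsilon * len(bits)))

bits = rng.integers(0, 2, size=10_000)
print(f"true proportion:     {bits.mean():.4f}")
print(f"local DP estimate:   {local_dp_proportion(bits, 1.0):.4f}")
print(f"central DP estimate: {central_dp_proportion(bits, 1.0):.4f}")
```

At the same epsilon, central DP's error shrinks like 1/(epsilon·n) while local DP's shrinks only like 1/sqrt(n), so the central estimate is typically far closer to the truth, at the cost of the server seeing raw data.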
Exercise 9: SMPC Feasibility Analysis
Three competing banks want to jointly train a fraud detection model without sharing raw transaction data. Evaluate the feasibility of using secure multi-party computation for this use case. Consider: (a) What computation would need to be performed via SMPC? (b) What is the approximate scale of computation? (c) What is the communication overhead? (d) What would a practical hybrid approach look like? Under what conditions would SMPC be practically feasible for this use case?
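To make part (a) concrete, the basic building block of most SMPC protocols is secret sharing. The toy sketch below shows additive secret sharing of three banks' private totals; the values are fabricated, and real SMPC model training involves vastly more machinery than summing one number each:

```python
import random

P = 2**61 - 1  # field modulus; arithmetic on shares is done mod P

def share(secret, n_parties=3):
    """Split a secret into n additive shares that sum to it mod P."""
    shares = [random.randrange(P) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

bank_fraud_counts = [120, 340, 75]  # each bank's private value (fabricated)
all_shares = [share(v) for v in bank_fraud_counts]
# Party i sums the i-th share of every bank; the partial sums are then combined.
partials = [sum(col) % P for col in zip(*all_shares)]
joint_total = sum(partials) % P
print(joint_total)  # → 535, computed without any bank revealing its input
```

Each individual share is a uniformly random field element, so no single party learns anything about any bank's input; only the combined total is revealed.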
Exercise 10: Homomorphic Encryption Use Cases
Rate each of the following potential homomorphic encryption applications as "currently practical," "near-term feasible," or "future research" based on the performance characteristics described in this chapter. Explain your reasoning for each:
- (a) Encrypted inference: a user submits an encrypted medical image; a remote service returns an encrypted diagnostic prediction
- (b) Training a large language model on encrypted text data
- (c) Computing a private database lookup (is record X in database Y?) using encrypted queries
- (d) Real-time encrypted video analysis for security applications
Policy and Regulatory Analysis
Exercise 11: GDPR Compliance Analysis
A company wants to train a customer behavior prediction model on GDPR-covered personal data (EU customer clickstream data). The company considers using federated learning: model training happens on users' browsers; only model updates (gradients) flow to the company's server. Analyze this approach against the following GDPR provisions: (a) lawful basis for processing (Article 6), (b) data minimization (Article 5(1)(c)), (c) purpose limitation (Article 5(1)(b)), (d) whether gradient transmission constitutes a "disclosure" or "transfer" of personal data.
Exercise 12: HIPAA Analysis
Apply the analysis from Exercise 11 to a hospital using federated learning to train a sepsis prediction model. Replace the GDPR provisions with: (a) HIPAA's minimum necessary standard, (b) HIPAA's requirements for de-identification, (c) the Business Associate Agreement framework. Does federated learning eliminate the need for a BAA between the hospital and the technology vendor? Why or why not?
Exercise 13: Census Bureau Trade-Off Analysis
The US Census Bureau used epsilon = 19.61 for its 2020 Census differential privacy implementation. Civil rights groups argued the accuracy loss for small minority communities was too high. Privacy researchers argued that the swapping methodology used in earlier censuses was inadequate. Evaluate both arguments. What epsilon value would you have recommended, and what criteria would guide your recommendation? Who should make this decision?
Exercise 14: Regulatory Gaps Analysis
Identify three specific regulatory gaps that currently make it difficult for organizations to know whether their privacy-preserving AI implementations satisfy GDPR, HIPAA, or CCPA requirements. For each gap, propose a regulatory clarification that would provide actionable guidance to organizations.
Exercise 15: FTC Enforcement Standards
The FTC has brought enforcement actions against companies for making deceptive privacy claims. A company claims its AI product is "privacy-preserving" in its marketing materials, but the product uses federated learning without gradient differential privacy and without auditing for gradient inversion vulnerabilities. Evaluate whether this claim could constitute a deceptive trade practice under Section 5 of the FTC Act. What specific disclosures would be required to make the claim accurate?
Organizational Implementation
Exercise 16: Technique Selection Workshop
Your organization operates in the healthcare space and is considering three AI development initiatives:
- (a) A cancer screening tool to be trained on imaging data from 15 partner hospitals
- (b) A clinical trial participant matching system trained on EHR data from a single institution
- (c) An aggregate analytics dashboard showing population health trends for public release
For each initiative, recommend the most appropriate privacy-preserving technique or combination of techniques. Justify your recommendation based on data type, number of participants, accuracy requirements, and regulatory environment.
Exercise 17: Privacy Budget Policy
Draft a privacy budget policy for a financial services company that uses differential privacy to answer internal analytics queries on customer transaction data. The policy should specify: (a) how the annual privacy budget is set, (b) how budget is allocated across business units, (c) how individual queries are charged against the budget, (d) what happens when the budget is exhausted, and (e) who has authority to approve exceptional budget usage.
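A policy like this can be prototyped as a simple ledger, which is one way to make requirements (b) through (e) testable. The skeleton below is a sketch; the unit names, amounts, and the chief-privacy-officer approval rule are all invented for illustration:

```python
class PrivacyBudgetLedger:
    """Track per-business-unit epsilon allocations drawn from one annual budget."""

    def __init__(self, annual_epsilon, allocations):
        assert abs(sum(allocations.values()) - annual_epsilon) < 1e-9
        self.remaining = dict(allocations)

    def charge(self, unit, epsilon, cpo_approved=False):
        """Charge a query's epsilon to a unit; over-budget spends need approval."""
        if epsilon > self.remaining[unit] and not cpo_approved:
            raise PermissionError(f"{unit}: budget exhausted; CPO approval required")
        self.remaining[unit] -= epsilon
        return self.remaining[unit]

ledger = PrivacyBudgetLedger(
    annual_epsilon=4.0,
    allocations={"fraud": 2.0, "marketing": 1.0, "risk": 1.0},
)
print(ledger.charge("marketing", 0.25))  # → 0.75
```

A real policy would add audit logging of every charge and a defined renewal rule when the annual budget resets.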
Exercise 18: Vendor Evaluation
You are evaluating two federated learning vendors for a healthcare AI project:
Vendor A: Claims "zero-knowledge federated learning" with no differential privacy implementation. Offers accuracy benchmarks from a single large hospital. Does not disclose gradient protection methodology. Price: $200,000/year.
Vendor B: Implements federated learning with differential privacy applied to gradients, epsilon = 2.0 per training round. Publishes accuracy benchmarks across diverse institutions including rural and community hospitals. Third-party audited implementation. Price: $350,000/year.
Evaluate these vendors on privacy, accuracy, and governance dimensions. Which would you recommend and why? What additional information would you request before making a final decision?
Exercise 19: Communicating PETs to the Board
You need to present the company's adoption of federated learning and differential privacy to the board of directors in five minutes. The board members are not technical. Draft a five-minute talking-points script that: (a) explains what the company does with customer data for AI training, (b) explains what federated learning and differential privacy are without jargon, (c) explains the privacy benefit in concrete terms, (d) honestly addresses the limitations, and (e) explains the business case.
Exercise 20: Skills Gap Assessment
Your organization wants to build in-house capability in privacy-preserving AI. Conduct a skills gap analysis by identifying: (a) what technical skills are needed for each major technique (DP, FL, SMPC, HE, synthetic data), (b) what organizational roles currently hold relevant partial expertise, (c) what training programs or hiring profiles would address the gap, and (d) what tasks should be sourced from specialized vendors even after internal capability is built.
Critical Thinking
Exercise 21: The "Ethics Washing" Risk
The key takeaways for this chapter note that claiming a system is "privacy-preserving" without specifying technique, parameters, and limitations is a form of ethics washing. Identify three specific ways that a company could make technically true but misleading claims about its privacy-preserving AI practices. For each, draft an alternative disclosure that would be both accurate and honest.
Exercise 22: The Census Equity Problem
The Census Bureau case study documents that equal formal privacy protection (same epsilon for everyone) produces unequal accuracy impacts, with minority communities in small geographic areas bearing larger accuracy costs. This is a general property of differential privacy, not unique to the Census.
Consider a city that wants to use differential privacy to publish neighborhood-level public health data while protecting residents' privacy. Given that smaller, more vulnerable neighborhoods will face higher accuracy costs:
- (a) What are the ethical arguments for applying uniform epsilon across all neighborhoods?
- (b) What are the ethical arguments for applying lower epsilon (more privacy, lower accuracy) in large neighborhoods and higher epsilon (less privacy, higher accuracy) in small ones?
- (c) Is there a way to resolve this equity tension within the DP framework?
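The size effect at the heart of this exercise follows directly from the mathematics: under a uniform epsilon, the expected absolute noise on a count is the same everywhere, so relative error is inversely proportional to neighborhood size. A back-of-the-envelope sketch (the neighborhood sizes are invented):

```python
epsilon = 1.0  # same formal protection for every neighborhood

neighborhoods = {
    "large (50,000 residents)": 50_000,
    "mid-size (5,000 residents)": 5_000,
    "small (200 residents)": 200,
}
for name, population in neighborhoods.items():
    # Expected absolute Laplace noise is 1/epsilon, independent of population.
    rel_error = (1 / epsilon) / population
    print(f"{name}: expected relative error ~ {rel_error:.3%}")
```

The small neighborhood's published statistics carry a relative error 250 times larger than the large one's, even though every resident receives identical formal protection.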
Exercise 23: Gradient Inversion Attack Impact
A hospital system has deployed federated learning for cancer imaging AI, relying on the "data doesn't leave the hospital" argument in its patient consent documentation and IRB application. Subsequent research demonstrates that gradient inversion attacks can reconstruct high-quality images from the gradients the hospital transmits. Evaluate the following options for the hospital's response:
- (a) Retrofit gradient differential privacy immediately, potentially reducing model accuracy
- (b) Disclose the vulnerability to IRBs and seek amended protocols
- (c) Continue operations while monitoring for evidence of an actual attack
- (d) Discontinue federated learning pending updated guidance
- (e) Consult legal counsel before taking any public action
Which options are ethically required? Which are legally required? What timeline is appropriate?
Exercise 24: Power Asymmetry in Privacy-Preserving AI
This chapter notes that privacy-preserving techniques address information leakage but not the structural power imbalance between large organizations and individuals. Consider a scenario: a large tech company uses federated learning to train a behavioral model on users' phone data. Users' raw data never leaves their phones. The company holds the global model, which encodes patterns across billions of users' behavior.
Does federated learning address the users' privacy interests adequately in this scenario? What information does the company gain that users did not consent to sharing? What governance mechanisms would be needed beyond the technical privacy guarantee?
Exercise 25: Designing a Privacy-Preserving AI Development Lifecycle
Design a comprehensive Privacy-Preserving AI Development Lifecycle for a financial services organization. The lifecycle should specify:
- At what stage in the development process each PET technique is most relevant (data collection, exploratory analysis, model development, testing, deployment, monitoring)
- What privacy review checkpoints should be built into the process
- What documentation is required at each stage
- What governance approvals are needed for different levels of data sensitivity
- How the lifecycle addresses the limits of technical solutions (consent, purpose, power)
Present your design as a visual flow diagram with accompanying narrative (approximately 500 words).
Chapter 27 | AI Ethics for Business Professionals