Chapter 27: Quiz — Privacy-Preserving AI Techniques
20 questions. Mix of multiple choice, true/false, and short answer.
Multiple Choice
1. Apple's implementation of differential privacy for emoji usage statistics uses which variant?
a) Central differential privacy, where user data is collected raw on Apple's servers and noise is added during processing
b) Local differential privacy, where noise is added on each user's device before any data leaves the device
c) Federated differential privacy, where each model update is privacy-protected during federated training
d) Synthetic differential privacy, where user data is replaced with synthetic equivalents before transmission
Answer: b
2. In the Laplace mechanism for differential privacy, the scale of the added noise is determined by:
a) The size of the dataset divided by epsilon
b) The sensitivity of the query divided by epsilon
c) The epsilon value multiplied by the dataset size
d) The variance of the underlying data distribution
Answer: b. Noise scale = sensitivity / epsilon, where sensitivity is the maximum change one individual's data could cause in the query result and epsilon is the privacy budget parameter.
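The formula above can be illustrated with a minimal Python sketch (the function name and the example count query are illustrative, not from any particular library; a production system would also need budget accounting):

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Return a differentially private estimate of true_value.

    Noise is drawn from Laplace(0, sensitivity / epsilon): smaller
    epsilon (stronger privacy) means a larger noise scale."""
    rng = rng or np.random.default_rng()
    return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Count query: adding or removing one person changes the count by at
# most 1, so sensitivity = 1.
noisy_count = laplace_mechanism(true_value=1024, sensitivity=1, epsilon=0.5)
```

Note that the noise scale depends only on the query's sensitivity and epsilon, not on the dataset size, which is why option (a) is wrong.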
3. In federated learning, what is transmitted between participants and the central server?
a) Raw training data, encrypted with the participant's private key
b) Model predictions on held-out validation data
c) Model parameters or gradients (updates to model weights), not raw data
d) A compressed and anonymized subset of the training data
Answer: c
4. Which of the following best describes a "gradient inversion attack" in the federated learning context?
a) An attack that gradually increases the epsilon value of a differentially private system, degrading its privacy guarantees over time
b) A technique for reconstructing aspects of training data from model gradients shared in federated learning
c) An attack that flips the labels in a federated training dataset to corrupt the global model
d) A method for inverting the federated averaging algorithm to identify each participant's contribution
Answer: b
5. Secure Multi-Party Computation (SMPC) is best described as:
a) A technique for securely encrypting data before uploading it to cloud servers for computation
b) A cryptographic protocol allowing multiple parties to jointly compute a function over their combined data without revealing any party's data to the others
c) A method for distributing computation across multiple servers to improve performance while maintaining encryption
d) An access control framework for ensuring that only authorized parties can participate in model training
Answer: b
6. What is the primary practical limitation of fully homomorphic encryption (FHE) for large-scale AI applications?
a) It requires all parties to be present simultaneously, limiting asynchronous use
b) It only works for linear operations, ruling out neural network inference
c) The computational overhead is typically orders of magnitude higher than plaintext computation
d) It requires a trusted third party to hold the encryption keys, creating a security vulnerability
Answer: c
7. Which property is NOT guaranteed by differential privacy?
a) That an individual's participation in the dataset barely affects the output distribution
b) That no individual in the dataset can be re-identified from the published results
c) That the output's probability distribution changes by at most a factor of e^epsilon depending on any individual
d) A mathematical upper bound on the privacy loss associated with the computation
Answer: b. This is a critical distinction: DP provides a bound on how much the output distribution changes, not an absolute guarantee against re-identification. For large epsilon values, re-identification may still be possible in practice.
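The bound in option (c) can be stated precisely. This is the standard definition of pure differential privacy: a mechanism M is epsilon-differentially private if, for all neighboring datasets D and D' (differing in one individual's data) and every set of outputs S,

```latex
\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S]
```

For small epsilon, e^epsilon is close to 1 and the two distributions are nearly indistinguishable; for a large budget such as epsilon = 19.61 (the Census figure cited in question 8), e^epsilon is roughly 3.3 x 10^8, which is why a large epsilon offers little practical protection against re-identification.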
8. The US Census Bureau's Disclosure Avoidance System (DAS) for the 2020 Census used a total epsilon of approximately 19.61. Civil rights groups criticized this choice because:
a) The epsilon was too small, making census data so noisy that it was useless for policy purposes
b) The accuracy loss was concentrated in small geographic areas and minority populations, who would bear larger percentage errors in data about their communities
c) The DAS did not apply differential privacy at the national level, only at state and below
d) The epsilon value was set by career technical staff rather than politically accountable officials
Answer: b
9. Which of the following best describes the "privacy budget exhaustion" problem?
a) Organizations that collect too much data eventually exceed what is permitted under GDPR, at which point they must delete records
b) Under differential privacy composition, multiple queries on the same dataset accumulate privacy loss; eventually the cumulative epsilon makes the privacy guarantee meaningless
c) Differential privacy systems become computationally prohibitive after a fixed number of training iterations
d) The Laplace mechanism adds so much noise after many queries that the data becomes statistically unusable
Answer: b
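The accumulation described in option (b) can be sketched as a simple budget tracker under basic sequential composition, where the epsilons of successive queries simply add (the class is illustrative; real systems use tighter accountants such as Rényi DP):

```python
class PrivacyBudget:
    """Track cumulative epsilon under basic sequential composition:
    each query on the same dataset adds its epsilon to the total spent."""

    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    def spend(self, epsilon):
        """Charge a query's epsilon; refuse once the budget is exhausted."""
        if self.spent + epsilon > self.total:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon
        return self.total - self.spent  # remaining budget

budget = PrivacyBudget(total_epsilon=1.0)
budget.spend(0.4)   # first query: 0.6 remaining
budget.spend(0.4)   # second query: 0.2 remaining
# A third spend(0.4) would raise: cumulative epsilon would exceed 1.0.
```

Once the budget is exhausted, the system must either refuse further queries or accept a cumulative epsilon large enough that the formal guarantee becomes meaningless.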
10. GAN-based synthetic data WITHOUT formal privacy guarantees (i.e., without DP applied to the generation process) is potentially vulnerable to which type of attack?
a) Gradient inversion attacks that reconstruct the GAN training data from the published synthetic dataset
b) Statistical linkage attacks that use the synthetic data's statistical properties, combined with auxiliary information, to make inferences about real individuals
c) Homomorphic attacks that decrypt the synthetic data using properties of the GAN architecture
d) Federated poisoning attacks that corrupt the synthetic data by injecting false records
Answer: b
True/False
11. Federated learning provides formal, mathematical privacy guarantees equivalent to those of differential privacy.
Answer: False. Federated learning provides practical privacy improvement (raw data does not leave the participant's device or institution) but does not provide formal privacy guarantees unless combined with differential privacy applied to model updates (gradients). Gradient inversion attacks have demonstrated that model gradients can, under some conditions, be used to reconstruct aspects of training data. The combination of federated learning with gradient differential privacy provides formal guarantees; federated learning alone does not.
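The combination described above, protecting each client's update before it leaves the device, can be sketched as follows. The Gaussian-noise variant shown is one common choice; the function name and parameters are illustrative, and a real deployment would also track cumulative epsilon with a privacy accountant:

```python
import numpy as np

def privatize_update(update, clip_norm, noise_multiplier, rng):
    """Clip a client's model update to a fixed L2 norm, then add
    Gaussian noise calibrated to that norm, before transmission."""
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

rng = np.random.default_rng(42)
raw_update = np.array([3.0, 4.0])          # L2 norm = 5
safe_update = privatize_update(raw_update, clip_norm=1.0,
                               noise_multiplier=1.1, rng=rng)
# The server aggregates only these clipped, noised updates.
```

Clipping bounds any one client's influence on the aggregate (the sensitivity), and the noise provides the formal guarantee; without both steps, the raw gradients remain vulnerable to inversion attacks.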
12. Under the GDPR data minimization principle, using federated learning instead of centralized data collection can support a stronger compliance argument because less personal data is transmitted to the data controller.
Answer: True. GDPR Article 5(1)(c) requires data to be "limited to what is necessary in relation to the purposes." Federated learning, which processes data locally and transmits only model updates (not raw data) to the central controller, demonstrates stronger data minimization than an approach that centralizes raw personal data. This does not automatically satisfy GDPR's lawful basis requirements, but it supports the data minimization compliance argument.
13. The sensitivity of a query in differential privacy refers to how sensitive the underlying data topic is (e.g., health data is more sensitive than shopping data).
Answer: False. In the technical DP context, "sensitivity" refers to the maximum amount one individual's data can change the result of the query — it is a mathematical property of the function being computed, not a description of the data's ethical sensitivity. For a count query, sensitivity is 1. For a sum query over values in [0, B], sensitivity is B. This technical definition is independent of how personally sensitive the data is in everyday language.
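The count and sum examples can be checked by brute force on a toy dataset: compare the query's result with and without each record. (This empirical check is only a lower bound on global sensitivity, which must hold over all possible datasets, not just this one; the function name and data are illustrative.)

```python
def empirical_sensitivity(query, dataset):
    """Largest change in query(dataset) caused by removing one record.

    A lower bound on global sensitivity, which is defined over all
    possible neighboring datasets rather than a single fixed one."""
    base = query(dataset)
    return max(abs(base - query(dataset[:i] + dataset[i + 1:]))
               for i in range(len(dataset)))

ages = [23, 45, 67, 12, 89]
print(empirical_sensitivity(len, ages))   # count query -> 1
print(empirical_sensitivity(sum, ages))   # sum query -> 89 (largest value)
```

For a sum over values clipped to [0, B], the worst any single record can do is contribute B, which is why the sum query's sensitivity is B regardless of what the data is about.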
14. Synthetic data generated by a GAN satisfies the GDPR definition of "anonymized" data, which would remove it from GDPR's scope.
Answer: False (contested, but most likely False under current regulatory interpretation). Whether synthetic data constitutes truly anonymized data under GDPR is an open legal question, and most EU data protection authorities take a cautious approach. The Article 29 Working Party (now EDPB) requires that anonymization be robust against attack using "all means reasonably likely" to be used by an attacker. Given documented statistical linkage attacks on synthetic datasets without formal privacy guarantees, regulators are unlikely to consider standard GAN-based synthetic data as definitively anonymized. GAN-based generation combined with differential privacy provides stronger arguments but is still not definitively resolved under EU law.
15. The database reconstruction attack that motivated the Census Bureau's adoption of differential privacy was performed using data that was not publicly released — it required access to confidential census microdata.
Answer: False. The database reconstruction attack was performed using only publicly released 2010 Census tabulations — the same data the Bureau had released for public use. Researchers treated the published tables as mathematical constraints and, using integer programming, reconstructed a dataset that matched the underlying census records with high accuracy. This is precisely what made the finding alarming: a genuine privacy breach was achievable using only published public data, with no access to confidential records.
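The constraint-solving idea can be illustrated at toy scale. The three-person block and the published statistics below are hypothetical, but the mechanics mirror the attack: each published table becomes an equation, and a solver searches for records consistent with all of them (the Census attack used integer programming at vastly larger scale):

```python
from itertools import product

# Hypothetical published tabulations for a block of 3 residents (ages 0-100):
#   sum of ages 96, median age 30, one resident over 60,
#   mean age of the under-60 residents 17.5.
# Reconstruction: search for all age combinations consistent with them.
solutions = [
    (a, b, c)
    for a, b, c in product(range(101), repeat=3)
    if a <= b <= c                             # canonical order (dedupe)
    and a + b + c == 96                        # published sum of ages
    and b == 30                                # published median
    and sum(x > 60 for x in (a, b, c)) == 1    # published count over 60
    and (a + b) / 2 == 17.5                    # published under-60 mean
]
print(solutions)  # -> [(5, 30, 61)]: the block is uniquely reconstructed
```

Four innocuous-looking aggregate statistics pin down every individual record exactly; with thousands of published tables per block, the 2010 data admitted a similarly narrow solution space.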
16. The Privacy by Design framework's "positive-sum" principle argues that privacy and functionality are in a zero-sum trade-off that must be explicitly negotiated in each deployment.
Answer: False. The positive-sum principle argues the opposite: Privacy by Design rejects the zero-sum framing. The principle holds that it is a false choice to frame privacy and functionality as competing values. Systems can be designed to achieve both privacy and function simultaneously, and the zero-sum framing often reflects design choices rather than technical necessity. Apple's differential privacy implementation exemplifies the positive-sum principle: it delivered both privacy protection and improved autocomplete capability.
Short Answer
17. Explain the "trust dividend" of privacy-preserving AI in business terms. What specific business outcomes does demonstrably strong privacy practice produce? Provide at least two concrete examples.
Model Answer: The trust dividend refers to the measurable business benefits that accrue to organizations whose privacy practices are demonstrably sound, not merely asserted. Concretely: (1) User adoption and retention — healthcare apps and financial services that can demonstrate federated learning and differential privacy as specific, verifiable privacy protections, rather than vague policy commitments, achieve higher user adoption in privacy-sensitive categories. Apple's public documentation of its differential privacy methodology contributes to its premium brand positioning on privacy. (2) Regulatory goodwill — organizations with documented PET implementations are better positioned in regulatory investigations and enforcement proceedings. Regulators distinguishing good actors from bad are more likely to treat PET-implementing organizations as compliant and cooperative. (3) Data access — organizations with strong privacy credentials are more successful in negotiating data access agreements with partners, research institutions, and government agencies that would not share data with organizations lacking documented privacy protections. (4) Talent recruitment — researchers and engineers from the privacy-conscious technical community increasingly consider privacy practice when evaluating employers; organizations known for genuine PET implementation attract talent that might otherwise decline.
18. What is meant by the claim that "equal formal privacy protection can produce unequal practical accuracy impacts"? Use the Census Bureau case to illustrate.
Model Answer: Differential privacy's formal guarantee is equal across all individuals in a dataset: each person's data receives the same mathematical privacy protection, calibrated by the same epsilon value. But "equal protection" does not produce "equal accuracy" in the data about different groups. The accuracy of statistics derived from differentially private data depends on the size of the group: for large populations (say, a county with 500,000 residents), DP noise is small relative to the actual count, and aggregate statistics are highly accurate. For small populations (a rural census block with 50 residents), the same DP noise is large relative to the count, and estimates can have large percentage errors. The Census Bureau's DAS illustrated this precisely: civil rights groups argued that Native American tribal areas, small-town minority communities, and other small geographic populations would see the data about their communities distorted by DP noise at rates that were much higher proportionally than for large, predominantly white suburban areas — even though every individual's privacy was equally protected. The equity concern is not about individual privacy but about the accuracy of statistics used for resource allocation, representation, and policy — statistics that matter most to communities that have historically been underrepresented or undercounted.
19. A data science team claims their model is "federated-learning-trained, so it's privacy-preserving." Identify at least three specific questions you would ask to evaluate whether this claim is accurate.
Model Answer: (1) "Are model gradients protected with differential privacy before transmission?" Federated learning without gradient DP provides practical but not formal privacy protection; gradient inversion attacks have demonstrated that unprotected gradients can leak training data. (2) "What is the epsilon value and how was it chosen?" The privacy guarantee of any DP-protected system depends critically on the epsilon value. A system with epsilon = 50 provides very weak formal privacy. If epsilon is not specified, the claim of "privacy-preserving" is not meaningful. (3) "Has the system been audited for gradient inversion vulnerability?" Given the research literature on gradient attacks, claims of privacy preservation in federated systems should be backed by independent evaluation, not just theoretical analysis. (4) "What happens if a participant's device or server is compromised?" Federated learning protects against central server breach; it does not protect against compromise of individual participants' local systems. (5) "Has the global model been evaluated for membership inference attacks?" Even a correctly implemented federated system with gradient DP may be vulnerable to membership inference attacks on the final trained model. Has this been assessed?
20. In one paragraph, explain why privacy-preserving AI techniques are necessary but not sufficient for ethical AI, using one specific example to illustrate each gap.
Model Answer: Privacy-preserving techniques address one important dimension of AI ethics — preventing individual information from being extracted from AI outputs and training processes — while leaving other critical dimensions untouched. They do not address consent: a federated learning system that keeps raw data on users' devices still requires a lawful basis for the underlying data processing; differential privacy on a mass surveillance system does not make the surveillance consensual (consider: a city that uses federated learning and DP to train a real-time criminal prediction model without residents' knowledge). They do not address purpose limitation: data collected with formal DP protection for one purpose can still be used for another without users' agreement (an employment platform that trains a DP-protected model on productivity data and then uses the model's outputs to rank workers for termination, a purpose users did not consent to). They do not address discriminatory outputs: a model trained with perfect formal differential privacy can still produce biased predictions (a credit-scoring model trained with DP on historical lending data will still reproduce historical racial discrimination in loan approvals if that discrimination is embedded in the training labels). And they do not address power imbalance: a company using federated learning controls the global model that encodes population-level behavioral patterns, and individual users have no meaningful say in what that model is used for — privacy-preserving architecture does not change this asymmetry. Ethical AI requires privacy-preserving techniques and consent frameworks, purpose limitation, bias auditing, and genuine accountability for power.
Chapter 27 | AI Ethics for Business Professionals