Chapter 27: Further Reading — Privacy-Preserving AI Techniques

A curated selection of 18 sources spanning foundational theory, technical implementation, regulatory context, and organizational practice. Organized by category.


Foundational Papers

1. Dwork, C., McSherry, F., Nissim, K., & Smith, A. (2006). Calibrating Noise to Sensitivity in Private Data Analysis. In Proceedings of the Theory of Cryptography Conference (TCC), LNCS 3876, pp. 265–284. The paper that introduced the formal definition of differential privacy and the Laplace mechanism. Written for a theoretical computer science audience, but the introduction is accessible and the definitions are clear. Essential for anyone who wants to understand what "epsilon-differential privacy" means at a technical level rather than as a marketing claim.
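The mechanism the paper introduces can be sketched in a few lines: add Laplace noise whose scale is the query's sensitivity divided by epsilon. A minimal illustration under our own naming (not the paper's notation or any particular library's API):

```python
import math
import random

def laplace_mechanism(true_value, sensitivity, epsilon, rng=random):
    """Release true_value plus Laplace noise with scale sensitivity/epsilon.

    Sensitivity is the most any one individual's data can change the
    query answer; epsilon is the privacy-loss budget. Smaller epsilon
    means larger noise and a stronger privacy guarantee.
    """
    scale = sensitivity / epsilon
    # Inverse-CDF sampling of the Laplace distribution.
    u = rng.uniform(-0.5, 0.5)
    sign = 1.0 if u >= 0 else -1.0
    return true_value - scale * sign * math.log(1.0 - 2.0 * abs(u))
```

For a counting query (sensitivity 1) at epsilon = 1, the noise has standard deviation about 1.41, so a count of 100 is typically reported within a few units of the truth.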

2. McMahan, H.B., Moore, E., Ramage, D., Hampson, S., & y Arcas, B.A. (2017). Communication-Efficient Learning of Deep Networks from Decentralized Data. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS). The Google paper that introduced FedAvg — the foundational federated averaging algorithm that most federated learning systems build on. Describes the architecture in which local model updates are averaged to produce a global model. The paper is readable for those with basic ML background, and its introduction to federated learning remains the clearest available.
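The server-side core of FedAvg is simply a size-weighted average of the clients' locally trained models. A toy sketch under our own naming, with model parameters stood in by flat lists of floats:

```python
def fed_avg(client_weights, client_sizes):
    """One aggregation round of federated averaging, sketched.

    Each client trains locally and sends back its model weights; the
    server averages them, weighted by local dataset size, to produce
    the next global model.
    """
    total = sum(client_sizes)
    num_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(num_params)
    ]
```

The raw training data never leaves the clients; only the weight vectors (or weight deltas) are communicated, which is the communication-efficiency and privacy argument the paper develops.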

3. Bonawitz, K., Ivanov, V., Kreuter, B., Marcedone, A., McMahan, H.B., Patel, S., Ramage, D., Segal, A., & Seth, K. (2017). Practical Secure Aggregation for Privacy-Preserving Machine Learning. Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (CCS). A Google paper describing the cryptographic protocols for secure aggregation in federated learning — ensuring that the aggregation server learns only the sum of participants' updates, not individual contributions. Demonstrates how federated learning can be made more privacy-preserving through cryptographic techniques.
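The protocol's central idea can be caricatured with pairwise additive masks: each pair of clients shares a random value that one adds and the other subtracts, so each masked update looks random to the server while the masks cancel in the sum. The real protocol derives masks from key agreement and tolerates client dropouts; this toy version, with names of our choosing, does neither:

```python
import random

def masked_updates(updates, seed=0):
    """Mask a list of scalar client updates with cancelling pairwise masks.

    For every pair (i, j) with i < j, a shared random mask is added to
    client i's update and subtracted from client j's. Individually the
    masked values reveal nothing useful; their sum equals the true sum.
    """
    rng = random.Random(seed)
    masked = list(updates)
    n = len(updates)
    for i in range(n):
        for j in range(i + 1, n):
            m = rng.uniform(-100.0, 100.0)  # shared pairwise mask
            masked[i] += m
            masked[j] -= m
    return masked
```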

4. Abadi, M., Chu, A., Goodfellow, I., McMahan, H.B., Mironov, I., Talwar, K., & Zhang, L. (2016). Deep Learning with Differential Privacy. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS). The paper introducing DP-SGD (Differentially Private Stochastic Gradient Descent) — the technique for training neural networks with differential privacy. Essential for understanding how DP is applied to model training, rather than just to query outputs.
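Per-example gradient clipping plus Gaussian noise is the heart of DP-SGD. A minimal sketch under our own naming, using plain lists and omitting the paper's moments accountant, which tracks cumulative privacy loss across steps:

```python
import math
import random

def dp_sgd_step(weights, per_example_grads, clip_norm, noise_multiplier,
                lr, rng=random):
    """One DP-SGD update: clip each example's gradient to clip_norm,
    sum, add Gaussian noise scaled to clip_norm, average, then step."""
    dim = len(weights)
    summed = [0.0] * dim
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        factor = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i in range(dim):
            summed[i] += g[i] * factor
    n = len(per_example_grads)
    noisy_avg = [
        (summed[i] + rng.gauss(0.0, noise_multiplier * clip_norm)) / n
        for i in range(dim)
    ]
    return [w - lr * ga for w, ga in zip(weights, noisy_avg)]
```

Clipping bounds any single example's influence on the update (its sensitivity), which is what lets the added Gaussian noise translate into a formal differential privacy guarantee.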


Accessible Books and Monographs

5. Wood, A., Altman, M., Bembenek, A., Bun, M., Gaboardi, M., Honaker, J., Nissim, K., O'Brien, D.R., Steinke, T., & Vadhan, S. (2018). Differential Privacy: A Primer for a Non-Technical Audience. Vanderbilt Journal of Entertainment and Technology Law, 21(1), 209–276. The best introduction to differential privacy for a non-technical audience, written by leading researchers in collaboration with legal scholars. Covers the formal definition, key mechanisms, practical applications, and policy implications without requiring mathematical background. Freely available from SSRN.

6. Garfinkel, S. (2019). Understanding Differential Privacy (a series of blog posts and working papers). US Census Bureau. The Census Bureau's own accessible explanations of differential privacy as applied to the 2020 Census. Simson Garfinkel, then a senior computer scientist at the Bureau, produced a series of public-facing documents explaining the Disclosure Avoidance System (DAS). These are practical, oriented to a government audience, and directly relevant to the Census Bureau case study in this chapter. Available from the Census Bureau website.



Technical Implementation Resources

7. OpenDP. Harvard Privacy Tools Project. (opendp.org) The OpenDP library is the leading open-source implementation of differential privacy for practical data analysis. Developed at Harvard with support from multiple institutions and funded in part by the Sloan Foundation, it provides vetted implementations of the Laplace mechanism, the Gaussian mechanism, and other DP techniques for tabular data analysis. The project's documentation includes tutorials accessible to practitioners.

8. Google Research. (2020). TensorFlow Federated: Machine Learning on Decentralized Data. (tensorflow.org/federated) Google's open-source federated learning framework, built on TensorFlow. The framework includes implementations of FedAvg, gradient clipping for DP-SGD in federated settings, and secure aggregation protocols. The documentation includes tutorials ranging from introductory to research-level. Essential reference for practitioners building federated learning systems.

9. Microsoft Research. SEAL: Simple Encrypted Arithmetic Library. (github.com/microsoft/SEAL) Microsoft's open-source homomorphic encryption library, implementing BFV and CKKS schemes — the two most widely used HE schemes for practical applications. SEAL provides C++ and .NET implementations with documentation and examples. Represents the practical starting point for organizations exploring homomorphic encryption.
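The property SEAL implements can be illustrated with a toy additively homomorphic cipher — a one-time pad mod n, emphatically not BFV or CKKS, just the shape of the idea: adding two ciphertexts yields a ciphertext of the sum, so an untrusted party can compute on data it cannot read.

```python
import random

def toy_he_demo(seed=0):
    """Toy additive homomorphism: Enc(a) + Enc(b) decrypts to a + b.

    Encryption here is (m + key) mod n. The untrusted party adds the
    two ciphertexts without ever seeing 20 or 22; the key holder
    decrypts the result with the summed keys.
    """
    n = 2**32
    rng = random.Random(seed)
    key1, key2 = rng.randrange(n), rng.randrange(n)
    enc = lambda m, k: (m + k) % n
    dec = lambda c, k: (c - k) % n
    c1, c2 = enc(20, key1), enc(22, key2)
    c_sum = (c1 + c2) % n          # computed by the untrusted party
    return dec(c_sum, (key1 + key2) % n)  # -> 42
```

Real schemes such as BFV and CKKS support both addition and multiplication on ciphertexts under a single reusable key, at substantial computational cost — which is why library maturity, as with SEAL, matters in practice.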


Research Papers on Limitations and Attacks

10. Zhu, L., Liu, Z., & Han, S. (2019). Deep Leakage from Gradients. Advances in Neural Information Processing Systems (NeurIPS) 32. The research paper demonstrating gradient inversion attacks in federated learning — showing that high-quality images can be reconstructed from model gradients in image classification tasks. Essential reading for understanding why federated learning without gradient differential privacy does not provide formal privacy guarantees.
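For the simplest possible model the leakage is exact: a single-example logistic-regression gradient is a scalar multiple of the raw input, so whoever observes the gradient recovers the input's direction. A toy illustration of our own, far simpler than the deep-network reconstruction in the paper:

```python
import math

def single_example_gradient(w, x, y):
    """Logistic-regression gradient for one example:
    dL/dw = (sigmoid(w . x) - y) * x, a scalar multiple of the input x."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    s = 1.0 / (1.0 + math.exp(-z))
    return [(s - y) * xi for xi in x]

def recover_input_direction(grad):
    """An observer of the gradient alone recovers the training input
    up to scale and sign -- the toy version of gradient inversion."""
    norm = math.sqrt(sum(g * g for g in grad))
    return [g / norm for g in grad]
```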

11. Shokri, R., Strobel, M., & Zick, Y. (2021). On the Privacy Risks of Model Explanations. Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society. Demonstrates that model explanations — saliency maps, feature attributions, and similar interpretability outputs — can leak information about training data. Relevant for organizations that use both explainable AI (Chapter 14) and privacy-preserving techniques, since these goals can be in tension.

12. Narayanan, A., & Shmatikov, V. (2008). Robust De-anonymization of Large Sparse Datasets. Proceedings of the IEEE Symposium on Security and Privacy. The landmark paper demonstrating de-anonymization of Netflix's "anonymous" movie rating dataset using only the ratings data and auxiliary information from IMDb. Provides the empirical grounding for skepticism about anonymization claims and the context for differential privacy's formal approach. Freely available.
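The attack's logic fits in a few lines: score each "anonymous" record by how well it agrees with the adversary's auxiliary information, and pick the best match. A toy sketch with our own, much cruder scoring than the paper's:

```python
def linkage_match(anon_records, aux_record, min_overlap=2):
    """Return the index of the anonymized record best matching the
    adversary's auxiliary data.

    Records are dicts mapping item -> rating. In a sparse dataset,
    even a handful of known (item, rating) pairs is often enough to
    single out one record.
    """
    def score(record):
        shared = set(record) & set(aux_record)
        if len(shared) < min_overlap:
            return -1.0
        agree = sum(1.0 for k in shared
                    if abs(record[k] - aux_record[k]) <= 1)
        return agree / len(aux_record)
    return max(range(len(anon_records)),
               key=lambda i: score(anon_records[i]))
```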


Regulatory and Policy Documents

13. UK Information Commissioner's Office (ICO). (2023). Privacy-Enhancing Technologies: A Guide for the Public Sector. Information Commissioner's Office. The ICO's practical guidance for public sector organizations on deploying PETs to support GDPR compliance. Covers differential privacy, federated learning, secure multi-party computation, and synthetic data in the context of specific use cases. Freely available from ico.org.uk.

14. National Institute of Standards and Technology. (2023). Privacy-Enhancing Technologies and AI (Draft NIST SP 600-4 series). NIST. NIST's technical standards documentation for PETs in AI contexts. As NIST's standards are widely referenced in US regulatory and procurement contexts, this document shapes how PETs are evaluated in government procurement and in compliance frameworks referencing NIST. Available from nist.gov.

15. European Data Protection Board. (2021). Opinion 05/2021 on the Interplay between the Clinical Trials Regulation and the GDPR. EDPB. While focused on clinical trials, this EDPB opinion addresses the use of pseudonymization and privacy-enhancing techniques in healthcare data processing and provides the most current EU regulatory interpretation of anonymization requirements. Relevant to the Google/Mayo case study context.


Case Studies and Applied Research

16. Roth, H., Chang, K., Singh, P., Neumark, N., Li, W., Gupta, V., Gupta, S., Qu, L., Ihsani, A., Bizzo, B.C., et al. (2020). Federated Learning for Breast Density Classification: A Real-World Implementation. Domain Adaptation and Representation Transfer, and Distributed and Collaborative Learning. A documented real-world implementation of federated learning for medical imaging AI across multiple healthcare institutions without patient data centralization. Provides a concrete technical and organizational model for the healthcare federated learning use case discussed in this chapter.

17. Garfinkel, S., Abowd, J.M., & Powazek, S. (2018). Issues Encountered Deploying Differential Privacy. Proceedings of the 2018 Workshop on Privacy in the Electronic Society (WPES). A reflective paper by Census Bureau researchers documenting the practical challenges of deploying differential privacy at scale — the engineering problems, the communication challenges, the parameter selection debates, and the organizational resistance. An unusually honest account of implementation reality versus theoretical ideals.


News and Long-form Journalism

18. Mervis, J. (2019). Can a Set of Equations Keep U.S. Census Data Private? Science. An accessible long-form article explaining the Census Bureau's adoption of differential privacy for the 2020 Census, the database reconstruction attack that motivated it, and the controversy that followed. Written for a general scientific audience, it provides excellent context for the Census case study and illustrates the challenges of explaining technical privacy trade-offs to non-technical stakeholders.


Note on Open-Source Resources

Practitioners implementing privacy-preserving AI should monitor:

  • OpenMined (openmined.org): Community developing open-source tools for privacy-preserving ML, including PySyft for federated learning and PyDP for differential privacy.
  • Google's DP library (github.com/google/differential-privacy): Well-maintained implementations of the Laplace and Gaussian mechanisms, continuously updated.
  • The Alan Turing Institute (turing.ac.uk): UK research institute producing applied research on PETs, including guidance for healthcare and public sector applications.
  • IAPP (International Association of Privacy Professionals) (iapp.org): Regular publication of PET-related regulatory developments and practitioner guidance.

Chapter 27 | AI Ethics for Business Professionals