Chapter 29 Exercises: Privacy, Security, and AI
Section A: Recall and Comprehension
Exercise 29.1 Define the following terms in your own words, using no more than two sentences each: (a) differential privacy, (b) federated learning, (c) homomorphic encryption, (d) model inversion attack, (e) prompt injection, (f) privacy by design.
Exercise 29.2 Explain the three distinctive privacy challenges that AI systems create beyond traditional IT systems: the data appetite problem, the inference problem, and the opacity problem. For each, provide one example not mentioned in the chapter.
Exercise 29.3 List and briefly describe the seven foundational principles of privacy by design as articulated by Ann Cavoukian. For each principle, identify one way Athena violated it prior to the breach.
Exercise 29.4 Compare and contrast GDPR and CCPA/CPRA across five dimensions: scope, consent model (opt-in vs. opt-out), enforcement mechanisms, penalties, and provisions specific to AI or automated decision-making.
Exercise 29.5 Describe the six steps in Athena's breach response timeline (Detection, Containment, Assessment, Legal notification, Customer notification, Public statement). For each step, identify the most critical success factor.
Exercise 29.6 Explain the difference between evasion attacks and data poisoning attacks on AI systems. Why might a data poisoning attack be harder to detect than an evasion attack?
Exercise 29.7 What is the epsilon (ε) parameter in differential privacy? Explain the privacy-utility tradeoff it controls. Why is the choice of epsilon a business decision, not just a technical one?
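The tradeoff in Exercise 29.7 can be made concrete with a small sketch (all numbers illustrative): the Laplace mechanism answers a counting query with noise of scale 1/ε, so a smaller ε means more noise and stronger privacy.

```python
import numpy as np

def dp_count(values, predicate, epsilon, rng):
    """Epsilon-DP count via the Laplace mechanism: a counting query has
    sensitivity 1, so noise drawn from Laplace(0, 1/epsilon) suffices."""
    true_count = sum(1 for v in values if predicate(v))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

rng = np.random.default_rng(0)
ages = [34, 29, 51, 42, 38, 47, 33, 60]  # true count of ages >= 40 is 4

# Small epsilon: strong privacy, noisy answer (low utility)
strict = dp_count(ages, lambda a: a >= 40, epsilon=0.1, rng=rng)
# Large epsilon: weak privacy, near-exact answer (high utility)
loose = dp_count(ages, lambda a: a >= 40, epsilon=10.0, rng=rng)
```

Choosing ε is exactly the business decision the exercise asks about: it fixes how much a published statistic is allowed to reveal about any single individual.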
Section B: Application
Exercise 29.8: Privacy Impact Assessment
Select an AI system you interact with regularly (a recommendation engine, a virtual assistant, a content moderation system, etc.). Conduct a simplified privacy impact assessment by answering:
- (a) What personal data does the system collect, either directly or by inference?
- (b) What is the stated purpose of data collection? Is the data proportionate to that purpose?
- (c) What privacy risks does the system create? Identify at least three specific risks.
- (d) What privacy-preserving technologies (differential privacy, federated learning, synthetic data, etc.) could reduce these risks? For each recommendation, assess the feasibility and tradeoffs.
- (e) What organizational controls (policies, audits, governance) should supplement the technical controls?
Exercise 29.9: Breach Response Planning
You are the VP of Data & AI at a mid-sized healthcare company (500 employees, 2 million patient records). Your data science team has deployed a patient readmission prediction model that accesses clinical records, demographics, and insurance information.
Design a breach response plan for a scenario in which the model's API is compromised and patient records are accessed by an unauthorized party. Your plan should include:
- (a) A detection strategy (what monitoring would catch the breach?)
- (b) A containment protocol (what happens in the first four hours?)
- (c) A communication plan (who is notified internally and externally, in what order, and within what timeframes?)
- (d) Regulatory obligations (which regulations apply, and what do they require?)
- (e) A post-breach remediation roadmap (what changes to prevent recurrence?)
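For part (a), one concrete detection signal is rate-anomaly monitoring on the model's API. A minimal sketch (the class name, window size, and threshold are illustrative assumptions, not from the chapter):

```python
from collections import deque

class AccessRateMonitor:
    """Sliding-window monitor: alert when a caller's per-minute request
    rate exceeds a multiple of its own historical baseline."""
    def __init__(self, window=100, threshold_ratio=5.0):
        self.window = window
        self.threshold_ratio = threshold_ratio
        self.history = {}  # caller -> recent per-minute request counts

    def observe(self, caller, requests_this_minute):
        hist = self.history.setdefault(caller, deque(maxlen=self.window))
        if len(hist) >= 10:
            baseline = sum(hist) / len(hist)
            alert = requests_this_minute > self.threshold_ratio * max(baseline, 1.0)
        else:
            alert = False  # not enough history to judge yet
        hist.append(requests_this_minute)
        return alert

mon = AccessRateMonitor()
for minute in range(30):
    mon.observe("svc-analytics", 20)      # steady baseline traffic
exfil_alert = mon.observe("svc-analytics", 500)  # sudden spike triggers an alert
```

A real deployment would pair this with record-level access auditing, since a slow, low-rate exfiltration would evade a pure rate monitor.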
Exercise 29.10: Evaluating Privacy-Enhancing Technologies
A financial services company wants to train a fraud detection model using transaction data from three partner banks. No bank is willing to share raw customer data with the others or with a central party. Evaluate four approaches for enabling collaborative model training:
- (a) Federated learning
- (b) Synthetic data generation (each bank generates synthetic data and shares it)
- (c) Secure multi-party computation
- (d) Homomorphic encryption
For each approach, assess: technical feasibility, privacy guarantees, model quality implications, computational cost, and organizational complexity. Which approach (or combination) would you recommend, and why?
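To ground option (a), here is a minimal FedAvg-style sketch under simplifying assumptions (linear model, synthetic data, no secure aggregation): each bank computes an update on its own records, and only model weights cross organizational boundaries.

```python
import numpy as np

def local_gradient_step(weights, X, y, lr=0.1):
    """One gradient step of linear regression on a bank's private data.
    Only the updated weights leave the bank, never the raw records."""
    grad = X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

def federated_round(global_weights, bank_datasets):
    """FedAvg-style round: average the locally updated weights,
    weighted by each bank's record count."""
    updates, sizes = [], []
    for X, y in bank_datasets:
        updates.append(local_gradient_step(global_weights.copy(), X, y))
        sizes.append(len(y))
    return np.average(updates, axis=0, weights=np.array(sizes, dtype=float))

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
banks = []
for n in (100, 150, 200):  # three partner banks of different sizes
    X = rng.normal(size=(n, 2))
    banks.append((X, X @ true_w))

w = np.zeros(2)
for _ in range(200):
    w = federated_round(w, banks)
# w converges toward true_w without any bank sharing raw transactions
```

Note what this does and does not buy: raw data never leaves a bank, but shared weight updates can still leak information, which is why federated learning is often combined with differential privacy or secure aggregation.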
Exercise 29.11: Consent Redesign
Find a real-world cookie consent dialog or privacy consent interface (screenshot it or describe it). Analyze it for dark patterns using criteria (a) through (d), then complete the redesign task in (e):
- (a) Is the default privacy-protective (opt-in) or privacy-invasive (opt-out)?
- (b) How many clicks does it take to accept all cookies vs. reject all cookies?
- (c) Is the language clear and understandable, or is it designed to confuse?
- (d) Are the visual cues (button size, color, placement) neutral or biased toward acceptance?
- (e) Redesign the interface to comply with both GDPR requirements and privacy-by-design principles. Explain your design choices.
Exercise 29.12: Data Minimization Audit
Consider Athena's recommendation engine as described in the chapter and in Chapter 10. The engine originally had access to: customer name, email address, mailing address, phone number, date of birth, loyalty tier, purchase history (items, dates, amounts, categories), browsing history, search queries, and account creation date.
- (a) For the purpose of generating product-to-product recommendations, which of these data elements are necessary? Justify each inclusion or exclusion.
- (b) For the purpose of personalized category ranking (showing a customer their most relevant product categories first), which additional data elements, if any, are necessary beyond those identified in (a)?
- (c) Design a data access architecture that provides each function with only the data it needs. Specify what data each API endpoint should receive and what it should not.
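Part (c) can be prototyped as a field-level allowlist enforced at the data access layer. The endpoint names and field groupings below are hypothetical, not Athena's actual architecture:

```python
# Illustrative per-endpoint allowlists: each service receives only
# the fields its stated purpose requires (deny by default).
ENDPOINT_FIELDS = {
    "/recommendations/item-to-item": {"purchase_history", "browsing_history"},
    "/recommendations/category-rank": {"purchase_history", "browsing_history",
                                       "search_queries"},
    "/account/profile": {"customer_name", "email_address", "loyalty_tier"},
}

def minimize(record: dict, endpoint: str) -> dict:
    """Return only the fields the endpoint is allowed to see;
    unknown endpoints get an empty payload."""
    allowed = ENDPOINT_FIELDS.get(endpoint, set())
    return {k: v for k, v in record.items() if k in allowed}

customer = {
    "customer_name": "A. Shopper",
    "email_address": "a@example.com",
    "date_of_birth": "1990-01-01",
    "purchase_history": ["sku-1", "sku-7"],
    "browsing_history": ["sku-3"],
    "search_queries": ["running shoes"],
    "loyalty_tier": "gold",
}

payload = minimize(customer, "/recommendations/item-to-item")
# payload contains only purchase_history and browsing_history
```

The design choice worth defending in your answer: filtering happens in one shared layer rather than in each consuming service, so a new endpoint cannot silently over-collect.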
Exercise 29.13: Adversarial Threat Modeling
You are the security lead for a company that has deployed an AI-powered content moderation system for a social media platform. The system uses computer vision to detect prohibited images and NLP to detect hate speech.
- (a) Identify three specific evasion attacks that bad actors might use against the computer vision component.
- (b) Identify three specific evasion attacks against the NLP component.
- (c) For each attack, propose a defense. Be specific about the technical approach and its limitations.
- (d) How would you monitor for adversarial attacks in production? What signals would indicate that the model is being systematically targeted?
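To make part (a) concrete, here is a toy evasion attack: the Fast Gradient Sign Method (FGSM) applied to a hypothetical logistic-regression "prohibited content" scorer. Real moderation models are deep networks, but the gradient-sign idea is the same.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, w, b, y_true, eps):
    """FGSM against a logistic-regression classifier: nudge each input
    feature by +/- eps in the direction that increases the loss for the
    true label, pushing the score across the decision boundary."""
    p = sigmoid(x @ w + b)
    grad_x = (p - y_true) * w  # d(cross-entropy)/dx for a logistic model
    return x + eps * np.sign(grad_x)

# Toy "prohibited content" detector over two features (weights invented).
w = np.array([3.0, -2.0])
b = -0.5
x = np.array([1.0, 0.2])          # flagged as prohibited: score > 0.5

x_adv = fgsm_perturb(x, w, b, y_true=1.0, eps=0.6)
# The perturbed input now evades the detector: score < 0.5
```

The defense discussion in part (c) maps directly onto this sketch: adversarial training adds such perturbed examples to the training set, and input-gradient masking or rate-limited query access raises the cost of estimating `grad_x`.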
Section C: Analysis and Evaluation
Exercise 29.14: The Privacy-Utility Tradeoff
The chapter argues that "the marginal model improvement from additional data is not worth the marginal privacy risk" beyond a certain point. But organizations face pressure to maximize model performance.
- (a) Develop a framework for evaluating the privacy-utility tradeoff for a specific AI use case. What factors should be considered? How should they be weighted?
- (b) Apply your framework to two contrasting scenarios: (i) a healthcare AI that predicts patient deterioration in ICUs, and (ii) a retail AI that recommends products on an e-commerce site. How does the calculus differ?
- (c) Who should make the final decision about where to set the tradeoff: data scientists, product managers, legal counsel, or executive leadership? Argue your position.
Exercise 29.15: Privacy Regulation Gaps
Current privacy regulations were largely designed before the widespread deployment of generative AI, autonomous agents, and multimodal models.
- (a) Identify three privacy risks created by large language models that existing regulations (GDPR, CCPA) do not adequately address.
- (b) For each gap, propose a specific regulatory provision that would address the risk. Consider enforcement feasibility.
- (c) Evaluate the EU AI Act's provisions on general-purpose AI models. Do they adequately fill the gaps you identified? Why or why not?
Exercise 29.16: The Athena Post-Mortem
Write a post-mortem analysis of the Athena data breach, structured as follows:
- (a) Timeline of events (summarize the key milestones from detection to remediation)
- (b) Root cause analysis (distinguish between the proximate cause and the systemic causes)
- (c) What went right (identify at least three things Athena did well during the response)
- (d) What went wrong (identify at least three failures or shortcomings, including pre-breach decisions that contributed to the incident)
- (e) Recommendations (propose five specific changes to prevent similar incidents, organized by priority)
- (f) Accountability (who, if anyone, should be held accountable, and what form should accountability take?)
Exercise 29.17: Synthetic Data Evaluation
A hospital wants to share a dataset of 50,000 patient records with a university research team for AI model development. The dataset includes demographic information, diagnoses, treatment history, and outcomes. The hospital's legal team has rejected sharing the real data due to HIPAA restrictions.
- (a) Design a synthetic data generation strategy that preserves the statistical properties needed for model training while eliminating re-identification risk.
- (b) What validation steps would you perform to ensure the synthetic data is fit for the research purpose?
- (c) What residual privacy risks remain even with synthetic data? How would you mitigate them?
- (d) Under what circumstances, if any, would synthetic data be inadequate for this use case?
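As a starting point for part (a), consider the simplest possible baseline, sketched below with invented toy records: sample each column independently from its empirical marginal. It preserves per-column distributions but destroys cross-column correlations, which is exactly the kind of gap the validation in part (b) should catch.

```python
import random

def independent_marginals_synthesizer(records, n_synthetic, seed=0):
    """Naive synthetic-data baseline: draw each field independently from
    that field's observed values. No record is copied whole, but joint
    structure (e.g. diagnosis-outcome correlation) is lost."""
    rng = random.Random(seed)
    columns = {k: [r[k] for r in records] for k in records[0]}
    return [{k: rng.choice(vals) for k, vals in columns.items()}
            for _ in range(n_synthetic)]

real = [
    {"age_band": "60-69", "diagnosis": "dx-a", "outcome": "readmitted"},
    {"age_band": "30-39", "diagnosis": "dx-b", "outcome": "recovered"},
    {"age_band": "60-69", "diagnosis": "dx-a", "outcome": "readmitted"},
    {"age_band": "40-49", "diagnosis": "dx-c", "outcome": "recovered"},
]
synthetic = independent_marginals_synthesizer(real, n_synthetic=100)
```

A defensible answer to the exercise would replace this baseline with a generator that models joint structure (and ideally trains under differential privacy), then measure both utility (does a model trained on synthetic data match one trained on real data?) and privacy (nearest-neighbor distance to real records).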
Section D: Integration and Strategy
Exercise 29.18: Privacy Strategy for a Startup
You are advising a Series B startup ($30M raised, 80 employees) that is building an AI-powered personal finance application. The app analyzes users' bank transactions, credit card statements, and investment accounts to provide budgeting recommendations and financial health scores. The company operates in the US and plans to expand to Europe within 12 months.
Develop a comprehensive privacy strategy that includes:
- (a) A data minimization plan (what data to collect, what to avoid, and why)
- (b) A consent architecture (how to obtain meaningful consent given the complexity of AI processing)
- (c) A privacy-preserving technology stack (which PETs to implement and in what priority order)
- (d) A regulatory compliance roadmap (CCPA now, GDPR for European expansion)
- (e) A breach response capability (proportionate to a startup's resources)
- (f) A governance structure (who owns privacy decisions, and how are they made?)
- (g) A privacy-as-differentiator marketing strategy
Exercise 29.19: The Board Presentation
After Athena's breach, Ravi must present to the board of directors. The board wants to understand: what happened, why it happened, what it will cost, and what Athena is doing to prevent recurrence.
Draft a 10-slide board presentation outline. For each slide, specify:
- The key message (one sentence)
- The supporting data or visual
- The anticipated board question and your prepared response
Your presentation should balance transparency (the board needs the full picture) with strategic framing (the board needs confidence that leadership is handling the situation).
Exercise 29.20: Cross-Chapter Integration
Drawing on concepts from Chapter 10 (recommendation systems), Chapter 23 (cloud security), Chapter 25 (bias), Chapter 26 (explainability), Chapter 27 (governance), and this chapter, design a comprehensive "AI Trust Framework" for Athena Retail Group that integrates:
- Privacy protections (data minimization, PETs, consent)
- Security measures (access control, monitoring, incident response)
- Fairness safeguards (bias detection, demographic parity testing)
- Transparency mechanisms (model explanations, audit logs)
- Governance structures (roles, responsibilities, escalation paths)
Present your framework as a one-page visual diagram and a two-page narrative explanation.
Exercises 29.1-29.7 test recall and comprehension. Exercises 29.8-29.13 require application to specific scenarios. Exercises 29.14-29.17 demand critical analysis and evaluation. Exercises 29.18-29.20 integrate concepts across chapters. Selected answers are available in Appendix B.