In This Chapter
- Opening: The Privacy by Design Premise
- Section 1: Privacy by Design — The Cavoukian Framework
- Section 2: Technical Approaches to Privacy
- Section 3: Policy Responses
- Section 4: The Limits of Technical Solutions
- Section 5: Jordan's Privacy Policy Exercise
- Chapter Summary
- Key Terms
- Discussion Questions
Chapter 39: Designing for Privacy: Architecture, Technology, and Policy Responses
Opening: The Privacy by Design Premise
In the late 1990s, Ontario's Information and Privacy Commissioner, Ann Cavoukian, was watching the data protection landscape and seeing a pattern that worried her. Privacy regulations were designed to respond to privacy violations — to penalize companies after they had collected data inappropriately, used it for unauthorized purposes, or failed to secure it adequately. Privacy was being treated as a compliance problem: you built your system however was technically convenient, and then, if regulators came looking, you demonstrated that you had checked the necessary boxes.
Cavoukian saw the problem clearly: if privacy is addressed at the end of the design process, as a compliance layer on top of systems built without it in mind, the result is privacy protection that is weaker, more expensive, more easily circumvented, and less effective than privacy built into the system from the beginning. She called her alternative "Privacy by Design" — the principle that privacy should be proactively engineered into systems before data flows begin, rather than reactively patched onto systems after data flows have already been designed.
This chapter is about what privacy by design looks like in practice, and what it cannot do.
We will cover the technical approaches (differential privacy, federated learning, end-to-end encryption), the product design approaches (data minimization, purpose limitation, privacy as default), the policy approaches (GDPR, the EU AI Act, community surveillance ordinances), and the governance approaches (privacy impact assessments, algorithmic auditing, community oversight boards). We will also be honest about the limits: why privacy by design is necessary but not sufficient, why technical solutions alone cannot resolve political problems, and why design without power analysis is incomplete.
Jordan will draft a privacy policy for a hypothetical campus application as a class project. This exercise will reveal, as design exercises often do, how easy it is to write good principles and how difficult it is to build good systems.
Section 1: Privacy by Design — The Cavoukian Framework
1.1 The Seven Foundational Principles
Ann Cavoukian's Privacy by Design framework, published in its canonical form in 2009 and endorsed in a 2010 resolution of the International Conference of Data Protection and Privacy Commissioners, comprises seven foundational principles:
Principle 1: Proactive not Reactive; Preventive not Remedial. Privacy protection is designed into systems before events occur, rather than remediated after privacy violations are detected. The system architect anticipates and prevents privacy risks rather than responding to complaints.
Principle 2: Privacy as the Default Setting. Without any action by the individual, the system provides maximum privacy protection. Users do not have to do anything to protect their privacy; they would have to take deliberate action to reduce their privacy. The default setting maximizes privacy rather than minimizing it.
Principle 3: Privacy Embedded into Design. Privacy is not a feature added on top of the system — it is integral to the system's architecture. Privacy and system functionality are achieved together as an integrated design goal, not separately or in tension.
Principle 4: Full Functionality — Positive-Sum, Not Zero-Sum. Privacy does not come at the expense of legitimate system functions. Cavoukian explicitly rejects the framing that privacy and security, or privacy and functionality, are necessarily in tension. Well-designed systems can be both private and functional.
Principle 5: End-to-End Security — Full Lifecycle Protection. Privacy protection extends across the entire data lifecycle: collection, use, retention, and secure destruction. Data security is maintained from the moment of collection through deletion.
Principle 6: Visibility and Transparency — Keep It Open. System operators make their data practices transparent to individuals and to regulators. Accountability is built in: the system's privacy claims can be independently verified.
Principle 7: Respect for User Privacy — Keep It User-Centric. The system is designed around the interests of individual users, not the convenience of system operators. User controls are meaningful, not merely formal.
💡 Intuition Check: Read through Cavoukian's seven principles again and ask: how many of them describe the actual design philosophy of the surveillance systems you have encountered in this book? The Ring network? GoGuardian? Clearview AI? ShotSpotter? The surveillance systems we have examined were almost uniformly designed without these principles — which is not a coincidence. Designing with privacy as a priority requires designing against the default incentive structure of the surveillance economy, which rewards data collection and is indifferent to privacy costs.
1.2 Privacy as the Default — What This Means in Practice
The "privacy as the default" principle is perhaps the most consequential of Cavoukian's seven, because it reverses the burden of action. Most current digital systems place the burden of privacy protection on the user: you must actively choose not to share your location, not to enable cookies, not to allow behavioral advertising, not to link your accounts. The default is sharing; privacy requires effort.
Privacy as the default means reversing this: the default is non-sharing; data collection requires affirmative action. In practice (a short sketch follows this list):
- Location tracking is off unless the user turns it on for a specific purpose at a specific time
- Data is not retained after the purpose for which it was collected is complete
- Third-party data sharing requires explicit opt-in, not opt-out
- Analytics and behavioral tracking are disabled unless the user deliberately enables them
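One way to make the reversal concrete is to encode it in the application's settings object itself, so the privacy-maximizing state is what a new user gets with no action at all. A minimal sketch in Python; the class and field names are hypothetical, chosen to mirror the list above:

from dataclasses import dataclass

@dataclass
class UserPrivacySettings:
    # Every default is the privacy-maximizing choice: users must take
    # deliberate action to reduce their privacy, never to protect it.
    location_tracking: bool = False      # off until enabled for a specific purpose
    third_party_sharing: bool = False    # explicit opt-in, never opt-out
    behavioral_analytics: bool = False   # disabled unless deliberately enabled
    retain_after_purpose: bool = False   # data deleted when its purpose completes

settings = UserPrivacySettings()   # a brand-new user gets maximum privacy
assert settings.location_tracking is False

The point of putting the defaults in code rather than in a settings screen is that no interface change can silently weaken them without a reviewable code change.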
Several existing products and systems approximate this design philosophy. Signal, the encrypted messaging application, was designed with privacy as a structural commitment rather than a feature: it stores as little data as possible about users and their communications, making it technically unable to respond to law enforcement requests for communication content that other platforms can fulfill. This is not because Signal's developers are anti-law enforcement — it is because they designed a system that, as a technical matter, does not have the data.
Apple's approach to privacy in iOS represents a significant (if imperfect) commercial commitment to privacy as default: the App Tracking Transparency framework, introduced in 2021, requires apps to obtain explicit user permission before tracking behavior across other apps and websites. The result — reported by multiple analyses — was that the majority of iOS users declined to allow cross-app tracking, demonstrating that privacy is the default preference when users are actually given a meaningful choice.
Section 2: Technical Approaches to Privacy
2.1 Data Minimization — Collecting Only What Is Necessary
Data minimization is the principle that systems should collect, retain, and process only the data that is necessary for the specific, stated purpose of the system. It is simple to state and consistently violated.
The violation is not usually deliberate; it is structural. Building a surveillance system — or any data system — requires making decisions about what data to collect. The default incentive in almost every commercial context is to collect more rather than less: more data provides more analytical flexibility, more potential product development opportunities, more ability to respond to future use cases not yet imagined. Data that has been collected is an asset; the costs of data minimization (opportunity costs of uncollected data) are immediate and visible, while the privacy costs of excessive collection (potential harms to data subjects, regulatory risk) are diffuse and future.
Data minimization as a design practice requires actively resisting this default by (a schema sketch follows the list):
- Defining the specific purpose of data collection before beginning to collect
- Identifying the minimum dataset necessary to serve that purpose
- Building in automatic deletion or anonymization of data that is no longer needed for its original purpose
- Resisting the extension of collection to additional data that might be useful in the future
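One way to operationalize the first three steps is to make purpose and retention mandatory properties of every field in the collection schema, so a field without a stated purpose cannot even be declared. A minimal sketch, assuming hypothetical field names:

from dataclasses import dataclass

@dataclass(frozen=True)
class CollectedField:
    name: str
    purpose: str         # the specific, stated purpose this field serves
    retention_days: int  # deletion deadline once collected; 0 = session only

def validate_schema(fields: list[CollectedField]) -> None:
    """Reject any field lacking a stated purpose or a bounded retention period."""
    for f in fields:
        if not f.purpose.strip():
            raise ValueError(f"{f.name}: no stated purpose; do not collect it")
        if f.retention_days < 0:
            raise ValueError(f"{f.name}: retention must be explicitly bounded")

validate_schema([
    CollectedField("enrollment_status", "verify the user is a student", 0),
    CollectedField("campus_zone", "suggest nearby study spaces", 0),
])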
GDPR's data minimization requirement (Article 5(1)(c)) mandates that personal data be "adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed." This legal requirement creates accountability for data minimization but does not automatically produce minimized systems; it creates a standard that regulators can enforce and that data subjects can invoke.
2.2 Purpose Limitation — Using Data Only for Stated Purposes
Purpose limitation is the companion to data minimization: data collected for one purpose should not be used for a different purpose without additional authorization. This principle addresses what regulators call "function creep" — the tendency of data collected for one ostensibly narrow purpose to be repurposed for broader, often less bounded, uses.
We have seen function creep throughout this book. Location data collected by a fitness app for the purpose of tracking workouts is repurposed for advertising targeting. Cookies placed by a news website for the purpose of session management are repurposed for cross-site behavioral tracking. DNA submitted to a genealogy service for the purpose of ancestry research is repurposed for law enforcement familial searching. In each case, the data subject agreed (nominally) to one use and is subjected to others.
Purpose limitation as a design principle means (an enforcement sketch follows the list):
- Articulating specific purposes before collection begins
- Building technical mechanisms (access controls, data labeling, retention schedules) that enforce purpose limitations
- Conducting review processes before extending data use to new purposes
- Providing users with notification and opportunity to consent when purpose extension is being considered
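The technical mechanisms in the second item can be as simple as tagging each record with the purposes it may serve and checking the tag on every access. A minimal sketch, with hypothetical names; a production system would back this with real access controls and audit logs:

class PurposeViolation(Exception):
    pass

class PurposeLimitedStore:
    """A store whose records carry the purposes they may be used for."""

    def __init__(self):
        self._records = {}  # key -> (value, frozenset of allowed purposes)

    def put(self, key, value, allowed_purposes):
        self._records[key] = (value, frozenset(allowed_purposes))

    def get(self, key, purpose):
        value, allowed = self._records[key]
        if purpose not in allowed:
            # Access for an unauthorized purpose fails loudly and is auditable
            raise PurposeViolation(f"{key!r} may not be used for {purpose!r}")
        return value

store = PurposeLimitedStore()
store.put("location:user42", "campus-zone-3", {"study_spot_suggestions"})
store.get("location:user42", "study_spot_suggestions")  # permitted
# store.get("location:user42", "advertising")           # raises PurposeViolation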
📝 Note: Purpose limitation and data minimization together address what Daniel Solove calls the "aggregation problem" — the fact that many individually innocuous pieces of information, combined, produce a privacy invasion that none of them individually constitutes. Your zip code is not private. Your birth date is not private. Your sex is not private. Combined with each other, they identify 87% of Americans uniquely (Sweeney, 2000). Minimizing collection and constraining use limits the aggregation that creates privacy violations out of innocuous inputs.
2.3 Privacy Impact Assessments
A Privacy Impact Assessment (PIA) — called a Data Protection Impact Assessment (DPIA) under GDPR — is a systematic process for evaluating the privacy risks of a system before deployment. PIAs are mandated under GDPR for processing that is "likely to result in a high risk" to individuals' rights and freedoms; they represent best practice for any significant data system.
A PIA typically involves (a structured template sketch follows the list):
- Description: What data is collected, from whom, for what purpose, by what process?
- Necessity and proportionality: Is the collection necessary for the stated purpose? Are there less privacy-invasive ways to achieve the same goal?
- Risk assessment: What are the potential harms to data subjects from collection, use, breach, or misuse?
- Mitigation: What technical and organizational measures will reduce identified risks?
- Residual risk evaluation: After mitigation, what risks remain? Are they acceptable?
- Documentation: Is the assessment documented and available for regulatory review?
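Even the documentation step can be made structural: representing the assessment as a typed record forces every question above to be answered before the record exists at all. A sketch of a hypothetical in-house template:

from dataclasses import dataclass

@dataclass
class PrivacyImpactAssessment:
    system_name: str
    data_description: str     # what is collected, from whom, for what purpose
    necessity_rationale: str  # why less invasive alternatives do not suffice
    identified_risks: list[str]
    mitigations: list[str]
    residual_risks: list[str]
    acceptable: bool          # must be allowed to be False

    def recommends_redesign(self) -> bool:
        # A genuine PIA must be able to conclude against deployment
        return bool(self.residual_risks) and not self.acceptable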
PIAs serve several functions simultaneously: they force system designers to articulate their data practices explicitly, identify risks they might otherwise overlook, demonstrate regulatory compliance, and create a record of the design choices made and why. When conducted genuinely rather than as a compliance exercise, they can significantly improve the privacy characteristics of deployed systems.
⚠️ Common Pitfall: PIAs conducted as compliance formalities — structured to reach a predetermined conclusion that the system is acceptable — are a common failure mode. A meaningful PIA requires genuine engagement with privacy risks, including the possibility that the assessment will recommend against deployment or require significant redesign. Organizations whose PIAs consistently conclude that proposed systems are low-risk should be asked to explain why their risk assessment process keeps reaching the same conclusion.
2.4 Differential Privacy — Mathematical Privacy Guarantees
Differential privacy is a mathematical framework for analyzing datasets in ways that provide quantifiable guarantees about individual privacy. Invented by Cynthia Dwork and colleagues at Microsoft Research in 2006, it has been adopted by Apple, Google, the U.S. Census Bureau, and other organizations as a mechanism for extracting aggregate statistical insights from data while protecting individual-level information.
The core idea of differential privacy is elegant: a data analysis mechanism is differentially private if its outputs are essentially the same whether or not any individual's data is included in the dataset. An individual cannot, by examining the output of a differentially private analysis, determine whether their own data was part of the analysis — because the output would look the same either way.
In practice, differential privacy works by adding carefully calibrated random noise to query results. The noise makes individual-level inference impossible while allowing accurate aggregate statistics to be extracted across a large population. The "privacy budget" (epsilon, ε) controls the trade-off between privacy protection and data accuracy: smaller epsilon provides stronger privacy guarantees but noisier results; larger epsilon provides more accurate results but weaker privacy.
A conceptual implementation of the Laplace mechanism — the most commonly used differential privacy mechanism — illustrates the approach:
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """
    Applies the Laplace mechanism to provide differential privacy.

    Parameters
    ----------
    true_value : float
        The actual data value to be privatized (e.g., a count or sum).
    sensitivity : float
        The global sensitivity — the maximum amount a single individual's
        data can change the query result. For a COUNT query, sensitivity = 1.
        For a SUM query, sensitivity = max possible individual value.
    epsilon : float
        The privacy budget parameter (epsilon > 0). Smaller epsilon = stronger
        privacy, more noise. Typical values: 0.1 (strong) to 10.0 (weak).

    Returns
    -------
    float
        The privatized value — the true value plus Laplace noise.

    Notes
    -----
    The Laplace distribution has scale parameter b = sensitivity / epsilon.
    Lower epsilon → larger b → more noise → stronger privacy protection.
    The trade-off: stronger privacy requires accepting less precise answers.
    """
    # Scale of the Laplace noise
    b = sensitivity / epsilon
    # Draw noise from a Laplace distribution with mean 0 and scale b
    noise = np.random.laplace(loc=0.0, scale=b)
    # Return the privatized result
    return true_value + noise

def differentially_private_count(data: list, query_fn, sensitivity: float, epsilon: float) -> float:
    """
    Counts elements in data satisfying query_fn, with differential privacy.

    Parameters
    ----------
    data : list
        The dataset to query.
    query_fn : callable
        A function that returns True for elements to count.
    sensitivity : float
        Query sensitivity (1.0 for counting queries).
    epsilon : float
        Privacy budget.

    Returns
    -------
    float
        Differentially private count (may be fractional due to noise).
    """
    true_count = sum(1 for item in data if query_fn(item))
    return laplace_mechanism(true_count, sensitivity, epsilon)

# ----- EXAMPLE: Student Wellness Survey -----
# A university wants to know how many students reported experiencing
# significant anxiety in the past month — without exposing individual responses.

# Simulate 500 student survey responses (True = reported anxiety)
np.random.seed(42)
student_responses = [bool(np.random.binomial(1, 0.35)) for _ in range(500)]
true_count = sum(student_responses)

print(f"True number of students reporting anxiety: {true_count}")
print(f"True proportion: {true_count / len(student_responses):.2%}")
print()

# Apply differential privacy with different epsilon values
for eps in [0.1, 0.5, 1.0, 5.0]:
    private_count = differentially_private_count(
        student_responses,
        query_fn=lambda x: x,   # Count all True responses
        sensitivity=1.0,        # Adding/removing one person changes count by at most 1
        epsilon=eps,
    )
    print(f"ε = {eps:3.1f} | Private count: {private_count:6.1f} | "
          f"Error: {abs(private_count - true_count):5.1f} | "
          f"Privacy: {'Strong' if eps <= 0.5 else 'Moderate' if eps <= 2.0 else 'Weak'}")

print()
print("Key trade-off: Smaller ε = stronger privacy = larger error in the count.")
print("The university learns approximately how many students reported anxiety,")
print("but learns nothing about whether any specific student did.")
What differential privacy enables and what it doesn't:
Differential privacy enables organizations to publish aggregate statistics about populations without exposing individual data. The U.S. Census Bureau used differential privacy in the 2020 Census to provide demographic data at the census tract level without enabling identification of individual households. Apple uses differential privacy in iOS to identify commonly misspelled words and popular emoji without learning individual users' input patterns. Google uses differential privacy in Chrome to analyze browser usage patterns.
What differential privacy does not enable:
- Individual-level analysis (the noise makes individual records uninterpretable)
- Data accuracy at small sample sizes (noise that is proportionately small in a million-person dataset is very large in a hundred-person dataset; see the sketch after this list)
- Protection of data that is inherently identifying even in aggregate (geographic queries with high precision can enable individual identification even with noise)
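The small-sample limitation follows directly from the mechanism: the noise scale b = sensitivity / ε does not depend on how many people are in the dataset, so the same absolute error that is negligible against a count in the hundreds of thousands overwhelms a count in the tens. A quick illustration, reusing the laplace_mechanism function defined above (the counts are invented):

# Identical noise scale in both cases; only the relative error differs.
for label, true_count in [("small campus survey", 35), ("national dataset", 350_000)]:
    noisy = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.1)
    rel_error = abs(noisy - true_count) / true_count
    print(f"{label:>20} | true = {true_count:>7,} | noisy = {noisy:11.1f} | "
          f"relative error = {rel_error:.4%}")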
🎓 Advanced Note: Differential privacy is a formal mathematical guarantee, but it is only as strong as the epsilon value chosen and the implementation that applies it. An organization can claim to use differential privacy with an epsilon value so large that the guarantee is essentially meaningless. Evaluating a differential privacy claim requires understanding both the mathematical guarantee and the epsilon parameter — which is why the technical literacy of privacy regulators is a significant governance question.
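A further subtlety is composition: privacy loss accumulates across queries. Under the basic sequential composition bound, answering a series of differentially private queries yields a total guarantee no better than the sum of their epsilons, which is why serious deployments track a cumulative budget and refuse queries once it is spent. A minimal sketch; the class is hypothetical, but the additive accounting is the standard basic-composition bound:

class PrivacyBudget:
    """Tracks cumulative epsilon under basic sequential composition."""

    def __init__(self, total_epsilon: float):
        self.total = total_epsilon
        self.spent = 0.0

    def spend(self, epsilon: float) -> None:
        if self.spent + epsilon > self.total:
            raise RuntimeError("Privacy budget exhausted; refuse the query")
        self.spent += epsilon

budget = PrivacyBudget(total_epsilon=1.0)
budget.spend(0.5)    # first query: fine
budget.spend(0.5)    # second query: budget now fully spent
# budget.spend(0.1)  # a third query would raise, protecting the guarantee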
2.5 Federated Learning — Keeping Data on Device
Federated learning is a machine learning approach in which the training of a model is distributed across many devices, with each device training on its local data and sharing only the resulting model updates (not the underlying data) with a central server. The central server aggregates model updates to improve the global model, without ever accessing the raw data from which those updates were derived.
Google developed federated learning and uses it in Gboard (the Android keyboard app) to improve text prediction without transmitting users' typed text to Google's servers. Apple uses federated learning for Siri suggestions and emoji recommendations. The approach allows AI systems to improve from diverse user data while maintaining the principle that raw personal data stays on the device where it was generated.
Federated learning does not provide perfect privacy — the model updates that are transmitted can potentially reveal information about the underlying data, and sophisticated inference attacks on federated learning systems are an active research area. But it represents a significant architectural departure from the dominant model, in which all user data is transmitted to a central server for processing, and it demonstrates that functional AI systems can be built without centralizing raw personal data.
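The mechanics can be sketched in a few lines: each client improves the shared model using only its local data, and the server averages the resulting weights without ever seeing that data. A toy federated-averaging loop for a linear model in NumPy; this illustrates the idea, not Google's production algorithm:

import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1):
    """One gradient step computed on this client's private data."""
    grad = 2 * X.T @ (X @ weights - y) / len(y)  # least-squares gradient
    return weights - lr * grad

# Three clients, each holding private local data that never leaves the device
true_w = np.array([1.0, -2.0, 0.5])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 3))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    clients.append((X, y))

global_w = np.zeros(3)
for _ in range(100):
    # Clients compute updates locally; only model weights are transmitted
    updates = [local_update(global_w, X, y) for X, y in clients]
    # The server aggregates updates without access to any raw data
    global_w = np.mean(updates, axis=0)

print("Learned weights:", np.round(global_w, 2))  # approaches [1.0, -2.0, 0.5]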
2.6 Architectural Responses — End-to-End Encryption and Data Localization
End-to-end encryption (E2EE) means that data is encrypted at the source and can only be decrypted at the intended destination — no intermediate party, including the service provider, can read the content. Signal uses E2EE for all messages; iMessage uses it for messages between Apple devices; WhatsApp uses it for message content (though not for metadata).
E2EE is the most powerful technical protection for communications privacy because it eliminates the ability of the service provider to comply with law enforcement requests for message content — there is no content to provide, because the provider doesn't have the decryption keys. This is precisely why law enforcement agencies, including the FBI, have repeatedly sought legislation that would mandate "exceptional access" mechanisms — technical backdoors that would allow law enforcement to access E2EE communications with legal authorization.
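The architectural point, that the provider relays ciphertext it cannot read, can be illustrated with standard public-key primitives. A minimal sketch using the PyNaCl library (assuming pip install pynacl); real E2EE protocols such as Signal's add key ratcheting, authentication, and forward secrecy on top of this basic idea:

from nacl.public import PrivateKey, Box

# Each endpoint generates its own keypair; private keys never leave the device
alice_sk = PrivateKey.generate()
bob_sk = PrivateKey.generate()

# Alice encrypts using her private key and Bob's public key
sending_box = Box(alice_sk, bob_sk.public_key)
ciphertext = sending_box.encrypt(b"meet at the library at 4pm")

# The provider stores and forwards only ciphertext. Holding no private key,
# it has no content to hand over in response to a legal demand.

# Bob decrypts with his private key and Alice's public key
receiving_box = Box(bob_sk, alice_sk.public_key)
assert receiving_box.decrypt(ciphertext) == b"meet at the library at 4pm"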
The cryptographic community's position on exceptional access has been consistent: there is no way to build a backdoor that is accessible only to law enforcement and not to adversaries. A system with a backdoor has a vulnerability; a system with a vulnerability can be exploited by anyone who finds it. The debate is not between privacy and public safety — it is between a communications infrastructure that is secure for everyone and one that is insecure for everyone.
Data localization means storing data in the jurisdiction where it was collected, rather than transmitting it to servers in other countries. Data localization requirements are motivated partly by privacy (keeping data within a legal regime that provides certain protections) and partly by national sovereignty (preventing foreign governments from accessing data about citizens). The EU's restrictions on data transfers to countries without adequate privacy protections are a form of data localization requirement.
Section 3: Policy Responses
3.1 GDPR — The Global Standard
The European Union's General Data Protection Regulation (GDPR), in force since 2018, is the most comprehensive and influential privacy regulatory framework in the world. Its reach extends beyond the EU: any organization processing personal data of EU residents, regardless of where the organization is located, is subject to GDPR. This extraterritorial effect has made GDPR a de facto global privacy standard for multinational companies.
GDPR's key requirements include:
- Lawful basis for processing: Organizations must identify a legal basis for processing personal data (consent, contract, legal obligation, vital interests, public task, or legitimate interests)
- Data minimization: Personal data must be "adequate, relevant and limited to what is necessary"
- Purpose limitation: Data collected for one purpose cannot be reused for an incompatible purpose
- Data subject rights: Access, correction, deletion, portability, and restriction of processing
- Privacy by design and by default: Technical and organizational measures must implement these principles
- Data Protection Impact Assessments: Required for high-risk processing
- Data breach notification: Notification to supervisory authorities within 72 hours of discovering a breach
GDPR's enforcement has been inconsistent but strengthening. The largest fine issued to date is the €1.2 billion penalty against Meta in 2023. The regulation has demonstrably changed corporate data practices, driven the development of GDPR-compliance infrastructure, and set a global standard that other jurisdictions (California, Canada, Brazil, India) have increasingly used as a template.
📊 Real-World Application: The United States, uniquely among major democracies, lacks a comprehensive federal privacy law. Several sector-specific laws (HIPAA for health data, FERPA for education, COPPA for children) and state laws (California's CCPA and CPRA, Illinois's BIPA) provide partial coverage. The absence of a comprehensive federal framework means that privacy protection in the U.S. is fragmented, varies by state, and leaves large categories of personal data and many categories of data subjects without adequate protection.
3.2 The EU AI Act — Risk-Based Regulation of AI Surveillance
The European Union AI Act, provisionally agreed in 2023 and entering into force in stages from 2024, is the world's first comprehensive framework for regulating artificial intelligence systems. Its approach is risk-based: AI systems are classified by the risk of harm they pose, with more intensive regulation for higher-risk systems.
For surveillance, the most significant provisions are:
Prohibited AI practices (the "unacceptable risk" category): The AI Act prohibits:
- Real-time remote biometric identification in publicly accessible spaces for law enforcement purposes (with narrow exceptions for specific serious crimes and judicial authorization)
- AI systems that exploit psychological vulnerabilities to distort behavior
- Social scoring by public authorities
- Untargeted scraping of facial images from the internet to build facial recognition databases (addressing the Clearview AI model)
High-risk AI systems: The AI Act places law enforcement AI (including predictive policing and crime analytics), border management AI, and employment screening AI in the "high-risk" category, requiring:
- Conformity assessments before deployment
- Logging and documentation requirements
- Human oversight requirements
- Accuracy and robustness requirements
The AI Act's prohibition on real-time biometric identification in public spaces represents the most significant regulatory intervention against facial recognition surveillance yet enacted by a major government. It does not prohibit all facial recognition — post-event use for serious crime investigation is permitted with judicial authorization. But it establishes a clear default against the ambient facial recognition surveillance that several authoritarian governments have deployed.
🌍 Global Perspective: The EU AI Act's approach — risk-based classification, prohibition of the highest-risk applications, accountability requirements for high-risk uses — represents one policy model. Other models include: the U.S. approach (no comprehensive AI regulation, reliance on existing law and sector-specific guidance), China's approach (AI regulation focused on content recommendation and generative AI, with extensive carve-outs for security and law enforcement use), and various national approaches in between. The global governance of AI surveillance remains fragmented and is a central challenge for international law.
3.3 Municipal Surveillance Ordinances — Community Control Over Police Surveillance
While federal regulation has lagged, a significant number of U.S. municipalities have enacted surveillance ordinances through what advocates call the "Community Control Over Police Surveillance" (CCOPS) model. Pioneered in California jurisdictions such as Santa Clara County and Oakland, CCOPS ordinances typically require:
- Pre-acquisition approval: Police departments must obtain city council approval before acquiring any new surveillance technology
- Public disclosure: The proposed technology must be publicly disclosed, including its capabilities, the vendor, the cost, and how data will be used and retained
- Community input: A public comment period is required before council votes on acquisition
- Use policies: Approved surveillance technologies must be governed by detailed use policies approved by the council
- Annual reporting: Departments must report annually on how each approved technology was used
- Prohibition on unapproved technology: Using surveillance technologies not approved through this process is prohibited
As of 2024, versions of the CCOPS model have been adopted in over 20 U.S. cities, including Oakland, Santa Cruz, and Nashville, with variations reflecting local political contexts.
✅ Best Practice: The CCOPS model represents best practice in democratic governance of surveillance because it addresses the governance problem rather than merely the technology problem. It does not prohibit surveillance; it requires that surveillance decisions be made through democratic deliberation, with public transparency and community input. This model is scalable and adaptable to different political environments, and it addresses the accountability deficit that characterizes most surveillance deployment — the fact that most surveillance technologies are acquired and deployed with no public awareness and no opportunity for democratic input.
3.4 The Role of Procurement — Cities and Companies as Surveillance Consumers
Much of the surveillance infrastructure described in this book was not built by governments — it was purchased from private companies. School districts purchase GoGuardian; police departments purchase PredPol; cities purchase ShotSpotter and Axon body cameras and license plate reader databases. The procurement decision is, in many cases, the surveillance decision.
Privacy advocates have increasingly focused on procurement as a leverage point for surveillance governance. If purchasing entities — cities, school districts, healthcare systems, universities — required privacy standards as conditions of purchase (the way they require competitive bidding, financial transparency, and vendor insurance), the market for surveillance technology would be restructured. Vendors who could not meet privacy standards would lose market access. Privacy-protective design would be commercially rewarded.
This approach has been implemented in limited forms. Several cities have adopted procurement standards that require privacy impact assessments for technology purchases. The State of California requires privacy assessments for technology purchases by state agencies. Federal requirements for data protection in vendor contracts have also gradually strengthened since 2021. But comprehensive privacy procurement standards at the scale that would restructure the surveillance technology market remain aspirational.
3.5 Algorithmic Auditing
Algorithmic auditing — independent technical and policy review of AI systems for accuracy, fairness, bias, and compliance with stated specifications — has emerged as a potential governance tool for AI surveillance. The idea is to apply to AI systems the kind of external review that financial statements receive from independent auditors: independent experts examine the system and provide a professional assessment of its properties.
Several challenges complicate algorithmic auditing as currently practiced:
- Access: Auditors need access to training data, model specifications, and deployment contexts that companies typically treat as trade secrets
- Technical standards: There is no agreed professional standard for what an algorithmic audit should examine or conclude
- Meaningful interpretation: The results of technical audits need to be interpretable by non-technical stakeholders (regulators, community boards, courts)
- Consequential authority: Audit findings have legal force only if regulatory frameworks require compliance
Despite these challenges, algorithmic auditing is a growing practice. The GDPR's "right to explanation" creates implicit audit requirements for certain AI decisions. The EU AI Act requires conformity assessments for high-risk AI systems. Several jurisdictions have enacted laws requiring audits of specific algorithmic systems; New York City's Local Law 144, for example, requires bias audits of automated employment decision tools. The field is developing rapidly.
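What a bias audit computes can be simple at its core. Local Law 144, for example, centers on impact ratios: each group's selection rate divided by the highest group's selection rate. A minimal sketch of that calculation, with invented numbers:

def impact_ratios(selected: dict, total: dict) -> dict:
    """Selection rate per group, normalized by the highest group's rate."""
    rates = {g: selected[g] / total[g] for g in total}
    best = max(rates.values())
    return {g: rate / best for g, rate in rates.items()}

# Hypothetical screening outcomes from an automated hiring tool
selected = {"group_a": 120, "group_b": 45}
total = {"group_a": 400, "group_b": 300}

for group, ratio in impact_ratios(selected, total).items():
    flag = "review" if ratio < 0.8 else "ok"  # four-fifths rule of thumb
    print(f"{group}: impact ratio = {ratio:.2f} ({flag})")

The hard parts of auditing are not this arithmetic but everything around it: obtaining the data, defining the groups, and attaching consequences to the result.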
Section 4: The Limits of Technical Solutions
4.1 Privacy by Design Is Necessary But Not Sufficient
Ann Cavoukian's Privacy by Design framework is genuinely valuable. Systems built with privacy by design principles are less privacy-invasive than systems built without them. Data minimization, purpose limitation, and privacy as default produce real, measurable improvements in individual privacy protection.
But privacy by design operates within a political economy that creates powerful incentives against its adoption. The surveillance economy rewards data collection. Businesses that collect less data have less to sell. Platforms that default to non-sharing get less advertising revenue. Products built with privacy as a structural commitment are commercially disadvantaged in a market where privacy is a feature rather than a baseline.
Cavoukian's Principle 4 — that privacy and functionality are "positive-sum, not zero-sum" — is an aspiration, not a description of how current markets work. In many commercial contexts, privacy and revenue generation are directly in tension. A product designed to maximize user privacy will collect less behavioral data, deliver less targeted advertising, and generate less revenue than a product designed to maximize data extraction. The privacy by design framework does not resolve this political economy; it provides tools that only become effective at scale when the political economy changes.
🔗 Connection to Chapter 34: In Chapter 34, we examined the capitalism critique of surveillance — the argument that surveillance is not a failure of capitalism but a feature of it, an expression of capital's fundamental drive to extract behavioral data as a new form of raw material. This analysis implies that privacy by design, while useful as a technical framework, cannot be the primary response to surveillance capitalism. Technical solutions that do not address the economic incentives that produce surveillance will be adopted only by a minority of producers who can afford to sacrifice commercial advantage, while the mainstream continues to extract data because the market rewards extraction.
4.2 Technical Solutions Require Political Will
Differential privacy, federated learning, and end-to-end encryption are powerful technical tools. They are also tools that require political will to mandate and economic conditions to sustain.
Apple's App Tracking Transparency framework — which required apps to obtain explicit user permission before tracking across apps — was implemented unilaterally by Apple as a product decision. It was effective because Apple controls the iOS platform and could mandate compliance by all apps in the App Store. This kind of unilateral imposition of privacy standards is possible only for platforms with sufficient market power to make compliance non-optional for the vendors who depend on their platform.
The same technical approach applied by a small company competing in an open market with incumbents who do not implement it would simply cause the small company to lose competitive ground. Technical solutions reach the scale needed to change the surveillance landscape only when mandated by regulators, implemented by platform controllers with market power, or adopted voluntarily at sufficient scale to become competitive requirements.
Section 5: Jordan's Privacy Policy Exercise
5.1 The Assignment
Dr. Osei has assigned Jordan a class project: draft a privacy policy for "Hartwell Connect" — a hypothetical campus application that enables students to find study partners, coordinate group projects, share course notes, and access campus event information. Hartwell Connect would use campus card data to verify enrollment, collect location data to suggest nearby study spots, and enable direct messaging between students. The university is considering building the application and wants students in the surveillance seminar to evaluate its privacy implications.
5.2 Jordan's Draft
Jordan spends several evenings working through the privacy implications. The exercise turns out to be harder than they expected.
The first problem: it is difficult to know what to collect. Hartwell Connect's functions require some data — without knowing who is enrolled, you can't verify students; without knowing approximate location, you can't suggest nearby spaces. But the functional minimum and the technological maximum are very different. Jordan drafts what they think is a minimal data set, then asks: what does each data element actually enable? Which functions require which data?
The second problem: data flows are harder to bound than they initially appear. Hartwell Connect might share anonymous aggregate data with the facilities department to improve space allocation. It might share course affiliation data with student organizations. It might share event attendance data with the student government. Each of these secondary uses seems benign in isolation; together, they create the aggregation problem.
The third problem: retention is always assumed. Jordan realizes that no one has designed an automatic deletion mechanism for Hartwell Connect. By default, all data would simply persist indefinitely. Designing out that default requires deliberate architecture.
Jordan's final draft policy includes these commitments:
Data minimization: Hartwell Connect collects only the following data: verified enrollment status (yes/no, from the Registrar's API — no other student record data); coarse location (campus zone, not GPS coordinates); user-created profile information (self-entered, not derived from other university systems); direct message content (end-to-end encrypted, not accessible to university administrators).
Purpose limitation: Data collected for study partner matching is used only for study partner matching. Location data is not retained beyond the session in which it was used. Message content is not accessible to university administrators or law enforcement except through formal legal process applied to the device (not the app's servers, since the app holds no unencrypted content).
Privacy as default: All profile information is private by default. Location sharing is off unless the user turns it on for a specific session. No behavioral analytics are conducted.
Retention: Enrollment verification data expires when the student graduates or leaves. Profile data is deleted 90 days after the student's last active session. Message data is end-to-end encrypted and not retained on servers beyond delivery.
Transparency: The university publishes an annual report on how many law enforcement requests were received, how many were complied with, and what categories of data were involved.
User controls: Students can access all data the application holds about them, download it in a portable format, and request deletion at any time.
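Of these commitments, retention is the one that most needs to live in code rather than in prose: a scheduled job that deletes profiles 90 days after the last active session turns the policy sentence into system behavior. A sketch of what such a job might look like; every name here is hypothetical, since Hartwell Connect is itself hypothetical:

from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)  # from the policy: 90 days after last active session

def purge_inactive_profiles(profiles: dict, now: datetime) -> list:
    """Delete profiles whose last activity falls outside the retention window."""
    expired = [uid for uid, last_active in profiles.items()
               if now - last_active > RETENTION]
    for uid in expired:
        del profiles[uid]  # deletion is the default, not the exception
    return expired

now = datetime.now(timezone.utc)
profiles = {
    "student_1": now - timedelta(days=10),   # active recently: kept
    "student_2": now - timedelta(days=120),  # inactive beyond 90 days: purged
}
print(purge_inactive_profiles(profiles, now))  # ['student_2']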
💡 Intuition Check: Jordan's policy is better than most actual application privacy policies. It applies data minimization, purpose limitation, privacy as default, and transparency principles. It includes end-to-end encryption for messages. It specifies a meaningful retention period with automatic deletion. And yet Jordan knows it is incomplete: it does not address the privacy policy of whatever cloud infrastructure Hartwell Connect would run on; it does not address the possibility that the app could be redesigned after launch with different privacy characteristics; it does not address the governance question of who at Hartwell has access to aggregate usage data. Privacy by design is harder than it looks.
Chapter Summary
Privacy by design — the proactive engineering of privacy into systems rather than the reactive remediation of privacy violations — is the foundational principle of this chapter's approach. Ann Cavoukian's seven principles provide a framework for what privacy-respecting design looks like: privacy as the default, data minimization, purpose limitation, transparency, user-centricity.
The technical implementation of these principles draws on specific tools: differential privacy (mathematical noise mechanisms that allow aggregate analysis without individual exposure), federated learning (distributing model training so raw data never leaves the device), end-to-end encryption (communications content inaccessible to anyone but sender and receiver), and data localization (keeping data in jurisdictions with protective legal frameworks).
Policy responses — GDPR, the EU AI Act, municipal CCOPS ordinances, algorithmic auditing requirements — represent attempts to make privacy by design mandatory rather than optional, and to create accountability mechanisms when surveillance systems fail.
Jordan's privacy policy exercise illustrated the difficulty of translating good principles into concrete design: the functional minimum of data collection, the aggregation problem, the automatic assumption of permanent retention. It also illustrated what is achievable when someone tries.
The chapter ended with honesty about limits: privacy by design is necessary but not sufficient. Technical solutions operate within a political economy that rewards data extraction. The scale at which privacy protections matter requires political will, regulatory mandate, and changes in the economic incentives that currently produce the surveillance landscape this book has documented.
Key Terms
- Privacy by Design: Cavoukian's framework for proactive engineering of privacy into systems before data flows begin
- Privacy as the default: Design principle that maximum privacy is achieved without any action by the user; sharing requires affirmative choice
- Data minimization: Collection and retention of only the data necessary for a specific, stated purpose
- Purpose limitation: Restriction of data use to the purposes for which it was collected
- Privacy Impact Assessment (PIA) / Data Protection Impact Assessment (DPIA): Systematic evaluation of the privacy risks of a system before deployment
- Differential privacy: Mathematical framework for providing quantifiable privacy guarantees through calibrated noise addition; does not reveal individual-level data while enabling accurate aggregate statistics
- Epsilon (ε): The privacy budget parameter in differential privacy; smaller epsilon means stronger privacy guarantees and more noise
- Federated learning: Distributed machine learning approach in which model training occurs on individual devices with only model updates (not raw data) shared centrally
- End-to-end encryption (E2EE): Encryption scheme in which only sender and recipient can decrypt messages; the service provider holds no readable content
- Exceptional access: Law enforcement requests for technical mechanisms enabling access to E2EE communications; cryptographers argue this is technically incompatible with secure encryption
- Community Control Over Police Surveillance (CCOPS): Municipal ordinance model requiring democratic approval before police acquire surveillance technology
- Algorithmic auditing: Independent technical and policy review of AI systems for accuracy, fairness, and compliance
- GDPR: EU General Data Protection Regulation, the most comprehensive privacy regulatory framework in force globally
- EU AI Act: EU regulation providing risk-based framework for AI systems; prohibits real-time biometric identification in public spaces for law enforcement
Discussion Questions
- Cavoukian's Principle 4 holds that privacy and functionality are "positive-sum, not zero-sum." Under what conditions is this true? Under what conditions does a real tension between privacy and functionality exist?
- The chapter argues that privacy by design is "necessary but not sufficient." What would be sufficient? What combination of technical, legal, economic, and political changes would be required to produce a surveillance landscape substantially less invasive than the present one?
- Jordan's privacy policy exercise revealed that designing for privacy is harder than it looks. What was the most significant design challenge Jordan encountered? Does this challenge suggest a limit to privacy by design as an approach?
- The EU AI Act prohibits real-time biometric identification in public spaces for law enforcement purposes, with narrow exceptions. Evaluate this policy choice: what are its benefits and limitations? What alternative approaches would you consider?
- Differential privacy provides mathematical privacy guarantees but requires accepting less precise data. How should regulators think about the epsilon parameter — who should set it, for what use cases, with what accountability?