Chapter 11: The Data Economy — Your Attention Is the Product

Opening: The Invisible Transaction

Every morning, Jordan Ellis reaches for their phone before their feet touch the floor. A quick scroll through Instagram, a weather check, a news headline or two. The ritual takes perhaps eight minutes, and Jordan thinks nothing of it — it is simply how mornings work now.

What Jordan does not think about is the other side of those eight minutes. In that brief window, Jordan's phone has silently reported their location to seventeen different advertising networks; their scroll behavior has been logged by three data brokers; and the time they spent hovering over a post about student loan forgiveness has been recorded, analyzed, and added to a behavioral profile — a profile that now estimates Jordan's political leanings with 73% confidence. None of this required Jordan to click anything, agree to anything visible, or type a single character.

They got the apps for free. The data was the price.

This chapter is about the economy that makes that transaction work — an economy so large, so diffuse, and so structurally embedded in everyday digital life that most participants never see it operating. We will examine what data is collected and how, who collects it and why, what it is worth and to whom, and — crucially — what it means for human autonomy when your behavior becomes a raw material extracted for someone else's profit.


11.1 A New Kind of Economy: Behavioral Data as Raw Material

Economic historians have a useful framework for understanding capitalism's evolution. Each era of capitalism is defined by the raw material it extracts from the environment. Industrial capitalism extracted coal, iron, and timber — physical resources that could be quantified, shipped, and sold. Agricultural capitalism extracted grain, cattle, and cotton — biological resources governed by seasons and soil. The capitalism that has emerged in the twenty-first century extracts something different: it extracts human experience.

Not experience in any romantic sense. Not wisdom, or memory, or feeling. What contemporary capitalism extracts is the trace that experience leaves — the click, the pause, the search query, the location ping, the purchase, the message, the hover. The digital detritus of living. Scholars have given this trace a name: behavioral residue.

Shoshana Zuboff, the Harvard Business School professor whose 2019 book The Age of Surveillance Capitalism provided the field's most comprehensive theoretical framework, describes this extraction in stark terms. She argues that a new economic logic has emerged — one that claims human experience as free raw material, translates it into behavioral data, uses much of that data to improve products and services, and claims the surplus data to fabricate "prediction products" that anticipate what we will do now, soon, and later. These prediction products are then sold in a marketplace Zuboff calls "behavioral futures markets."

The claim is sweeping, and some scholars have contested its scope and its implicit technological determinism. But its core observation — that the dominant business model of the internet is the monetization of user behavior — is, at this point, empirically uncontroversial.

💡 Intuition: Think of behavioral data the way a factory owner thinks of timber. The forest (your life online) is full of raw material. The factory owner does not need the whole tree — just the lumber. Data companies do not need your whole experience — just the behavioral patterns extractable from it. And like timber, your behavioral residue is not naturally occurring in a form that can be sold; it must be processed, cut, shaped, and standardized before it enters the market.

How the Data Economy Got Here: A Brief Genealogy

The data economy did not spring fully formed from a single inventor's mind. It emerged from the collision of three historical forces.

The first was the architecture of the early internet. When the World Wide Web went commercial in the mid-1990s, its designers made a foundational choice: access would be free, and costs would be offset by advertising. This was not a novel idea — radio and television had operated on the same model. But digital advertising had a property broadcast advertising did not: it was measurable. You could know exactly how many people saw an ad, whether they clicked it, and what they did afterward.

The second force was the invention of the cookie in 1994 (treated at length in Chapter 12), which for the first time allowed websites to recognize returning visitors and accumulate information about their browsing behavior. The cookie made personalization possible — and personalization made targeted advertising possible.

The third force was the realization, gradual and then sudden, that the most valuable thing about users was not their eyeballs but their patterns. In the early 2000s, Google discovered what would become the template for modern surveillance capitalism: the behavioral data generated by users searching for information was more valuable than the searches themselves. That data could be used to sell advertising that was orders of magnitude more effective than any prior medium. The model worked so well that it became the template — copied, refined, and extended across every major platform that followed.

🔗 Connection: In Chapter 5, we introduced Zuboff's framework alongside Foucault and Bentham. Here we move from theory to mechanism: how the economic logic of surveillance capitalism actually operates in practice. The full critique — its implications for democracy, freedom, and selfhood — will return in Chapter 34.


11.2 The Attention Economy: Advertising as Foundation

Before we can understand the data economy, we must understand the advertising economy that underlies it, because the collection of behavioral data is not an end in itself. It is a means to sell advertising more effectively. And advertising, in the digital context, requires a commodity more fundamental than data: attention.

Herbert Simon, the Nobel Prize-winning economist and cognitive scientist, observed in 1971 that "a wealth of information creates a poverty of attention." The problem of an information-rich world, Simon argued, is not scarcity of information but scarcity of the human attention available to process it. Attention became a resource — finite, rivalrous, and therefore economically valuable.

By the 2010s, the major technology companies had explicitly organized their businesses around the capture and sale of human attention. The mechanism is straightforward: platforms create compelling content environments (social media feeds, search results, video recommendations) that attract users and keep them engaged. They then sell that engagement — that captive attention — to advertisers. The longer users stay, the more attention they deliver, the more inventory the platform has to sell.

This is why "engagement" became the dominant metric of platform success. Not user satisfaction. Not information quality. Not wellbeing. Engagement — which correlates with time spent, interactions produced, and emotional activation (positive or negative). Internal Facebook documents, revealed through the 2021 whistleblower disclosures, showed that the company's own researchers had found its systems optimizing for engagement in ways that amplified outrage, anxiety, and divisive content, because these generated more interaction than calm or nuanced material.

📊 Real-World Application: The digital advertising market generated approximately $600 billion in global revenue in 2023. Google and Meta together account for roughly half of all digital advertising revenue in the United States. This duopoly is built entirely on the attention-capture model: create a platform compelling enough to attract users, monitor their behavior comprehensively, and sell that behavioral intelligence to advertisers.

The attention economy has structural consequences that extend far beyond which advertisements you see. If platforms profit from engagement, and engagement is maximized by content that provokes strong emotional responses, then the platform has a financial incentive to surface outrage, fear, and controversy over accuracy, nuance, and calm. The surveillance of your attention does not just tell advertisers which ads to show you; it shapes the information environment you live in.

⚠️ Common Pitfall: Students sometimes assume that if they ignore ads, the surveillance system fails and becomes irrelevant. This misunderstands the architecture. Even users who never click ads generate valuable behavioral data. Your patterns, preferences, and reactions have value regardless of whether you convert to a purchase. The data extracted from your behavior is sold and used in ways that have nothing to do with whether you bought the shoes in that banner ad.


11.3 What Data Is Collected: A Taxonomy of Behavioral Residue

"Data" is a frustratingly vague term. When companies say they "collect data," the statement is technically accurate but profoundly incomplete. Understanding the data economy requires a more precise vocabulary. Scholars and industry analysts have developed a four-category taxonomy that distinguishes between types of data based on how they originate.

Declared Data

Declared data is what users explicitly and knowingly provide. When you create an account and enter your name, email address, date of birth, and zip code, that information is declared. When you fill out a survey, complete a purchase form, or answer a questionnaire, the data you enter is declared.

Declared data has legal and ethical status that is relatively clear: you typed it in, you knew it was being collected, and there is usually a terms-of-service document (however unread) that covers its use. This does not make declared data unproblematic — data brokers routinely purchase declared data from retailers, loyalty programs, and subscription services and combine it with other sources — but its collection is at least visible to the user.

Observed Data

Observed data is generated by your behavior but collected without your active input. When a website logs which pages you visit, how long you spend on each, which links you click, and where your cursor moves, it is collecting observed data. When your smartphone reports your location every few minutes, that is observed data. When a loyalty card program tracks every item you purchase, that is observed data.

Observed data is far more voluminous than declared data. A single user visiting a news website might generate thousands of data points per session: page loads, scroll positions, hover events, click events, time stamps. Multiply this by billions of users across thousands of websites and the scale becomes almost incomprehensible. The Princeton WebTAP study, which analyzed tracking on the top 1 million websites, found that 88% of them transmitted user data to at least one third party — and the average page loaded resources from over twenty different tracking domains.
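The volume of observed data is easiest to appreciate in code. The following is a minimal sketch of the kind of event records a tracking script might emit during a single page session; the session ID, field names, and event types are illustrative, not any real tracker's schema.

```python
import json
import time

def make_event(session_id, event_type, **details):
    """Build one observed-data point, as a tracking script might.

    All field names here are illustrative, not a real tracker's schema.
    """
    return {
        "session": session_id,
        "ts": time.time(),       # real trackers log millisecond precision
        "type": event_type,      # e.g. pageview, scroll, hover, click
        **details,
    }

# A few seconds of reading a single article can yield dozens of records
# like these; none required the user to type anything.
events = [
    make_event("s-1842", "pageview", url="/news/student-loans"),
    make_event("s-1842", "scroll", depth_pct=35),
    make_event("s-1842", "hover", element="headline-3", duration_ms=2100),
    make_event("s-1842", "click", element="related-article-1"),
]

print(json.dumps(events[2], indent=2))
```

Multiply a stream like this across every page, every tab, and every third-party script on the page, and the scale described above follows directly.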

Inferred Data

Inferred data is generated not by collection but by analysis. It is what data companies calculate about you based on patterns in your declared and observed data. If your purchase history shows frequent purchases of baby items in March 2023, a retailer can infer with high probability that you had or were expecting a baby around that time. If your browsing behavior shows repeated visits to cancer support forums, an algorithm can infer a probable health condition. If your political news consumption skews heavily toward certain outlets, a political leaning can be inferred.

The famous Target pregnancy prediction case, documented by Charles Duhigg in The New York Times in 2012, illustrated the power and the invasiveness of inference. Target's data scientists discovered that women who bought certain combinations of products — unscented lotion, magnesium supplements, a large purse — were frequently pregnant and, crucially, that pregnant women were undergoing a period of shopping habit formation that made them unusually valuable targets for retail loyalty cultivation. Target began mailing targeted pregnancy-related coupons to women before they had announced their pregnancies — in one case, before a teenager's father had learned she was pregnant.

Inferred data occupies a deeply ambiguous ethical space. No one "collected" the pregnancy inference — it emerged from patterns. The data subject never provided the information. But the inference may be more accurate, and more intimate, than the declared data.
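The Target example can be caricatured in a few lines. The signal products and weights below are invented for illustration; real retail models are trained on millions of purchase histories rather than hand-written rules like these.

```python
# A deliberately simplified sketch of pattern-based inference, in the
# spirit of the Target example. Products and weights are invented.
PREGNANCY_SIGNALS = {
    "unscented lotion": 0.30,
    "magnesium supplement": 0.25,
    "large purse": 0.15,
    "cotton balls": 0.10,
}

def pregnancy_score(basket):
    """Sum the weights of signal products present in a shopping basket."""
    return sum(w for item, w in PREGNANCY_SIGNALS.items() if item in basket)

basket = {"unscented lotion", "magnesium supplement", "large purse", "bread"}
score = pregnancy_score(basket)
print(f"inferred score: {score:.2f}")  # 0.30 + 0.25 + 0.15 = 0.70
```

Note what the sketch makes visible: no single purchase is revealing, and the shopper declared nothing. The inference lives entirely in the combination.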

Derived Data

Derived data is a subset of inference that describes data generated by combining two or more sources. Your credit score is derived data — it does not exist in any raw form but is calculated from your payment history, debt levels, credit utilization, and account age. A "psychographic profile" combining your social media behavior, purchase history, and location data is derived data.

Derived data is particularly powerful because it can reveal things that no single data source could. Your location history might not reveal your religion. Your purchase history might not reveal your political views. But combined, they might produce a profile that infers both with high accuracy. This combination effect is one reason privacy researchers emphasize that the aggregation problem — the way that combining individually innocuous data points can produce deeply invasive profiles — is central to understanding modern surveillance.

🎓 Advanced: Information theorists distinguish between data and information using the concept of entropy. "Data" is raw signal; "information" is the reduction of uncertainty that data provides. When data companies talk about the value of their data, what they mean is its information content — its ability to reduce uncertainty about what you will do, buy, believe, or want. Derived and inferred data typically have the highest information content because they represent the most powerful reduction of uncertainty about the data subject.

📝 Note: The four-category taxonomy (declared, observed, inferred, derived) was developed in part to support regulatory frameworks that distinguish between data subjects' expectations and their actual exposure. Europe's GDPR distinguishes between data "provided by the data subject," data "observed" about the subject, and data "inferred or derived" from other data. Each category may carry different legal obligations. We will examine these regulatory frameworks in detail in Chapter 28.
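One way to make the four-category taxonomy concrete is to imagine each stored attribute carrying a provenance tag, since regulatory regimes like GDPR can attach different obligations to each class. The schema below is hypothetical, not any broker's or regulator's actual data model.

```python
from dataclasses import dataclass
from enum import Enum

class Provenance(Enum):
    DECLARED = "declared"   # typed in by the user
    OBSERVED = "observed"   # logged from behavior
    INFERRED = "inferred"   # calculated from patterns
    DERIVED = "derived"     # combined from multiple sources

@dataclass
class Attribute:
    name: str
    value: object
    provenance: Provenance

# An illustrative profile fragment; every value here is invented.
profile = [
    Attribute("zip_code", "52240", Provenance.DECLARED),
    Attribute("pages_per_session", 14, Provenance.OBSERVED),
    Attribute("expecting_parent", 0.87, Provenance.INFERRED),
    Attribute("credit_score", 645, Provenance.DERIVED),
]

# A regulator asking "what does this profile contain that the subject
# never provided?" is asking for exactly this filter.
inferred_or_derived = [a.name for a in profile
                       if a.provenance in (Provenance.INFERRED, Provenance.DERIVED)]
print(inferred_or_derived)
```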


11.4 Metadata: The Map Is the Territory

One of the most persistent misconceptions about digital surveillance is that the content of communications is where the risk lies. The intuition is understandable: if no one reads your messages, what is the harm? But this intuition systematically underestimates the power of metadata — data about data.

Metadata describes the context of communication rather than its content. For a phone call, metadata includes: who called whom, when the call was made, how long it lasted, and the location of both phones at the time. For an email, metadata includes: sender, recipient, timestamp, and subject line. For a website visit, metadata includes: which site, what time, how long, from which device.

Stewart Baker, the former NSA General Counsel, acknowledged in a 2013 conference panel that "metadata absolutely tells you everything about somebody's life." His candor was unusual but his point was accurate. A sequence of metadata can reveal:

  • A person who called their doctor, then a specialist, then a cancer treatment center
  • A person who attended a mosque on Friday afternoons and a gun shop on Saturday mornings
  • A person whose phone was in the vicinity of a political protest on a specific date
  • A person whose browsing metadata (timestamps, site categories, session durations) suggests insomnia or depression

The content of any individual communication might be innocuous. But the pattern of communications — the metadata — reveals behavior, relationships, health, beliefs, and vulnerabilities in ways that content alone cannot.
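A tiny worked example shows how much a content-free record can say. The phone numbers, call records, and category lookup below are all invented.

```python
# Call-detail records carry no content: just who, whom, when, how long.
calls = [
    {"to": "555-0101", "when": "2024-03-04 09:12", "secs": 240},
    {"to": "555-0188", "when": "2024-03-04 11:40", "secs": 540},
    {"to": "555-0199", "when": "2024-03-05 08:55", "secs": 1260},
]

# A reverse-lookup table of the kind brokers and agencies maintain.
category = {
    "555-0101": "primary care clinic",
    "555-0188": "oncology specialist",
    "555-0199": "cancer treatment center",
}

# Without hearing a word of any call, the sequence of categories tells
# a story the caller never declared.
sequence = [category[c["to"]] for c in calls]
print(" -> ".join(sequence))
```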

This is why data brokers are often as interested in behavioral metadata as in message content, and why the legal protections around metadata have historically been weaker than those around content. In the United States, the Third Party Doctrine — the legal principle established in Smith v. Maryland (1979) — holds that information voluntarily shared with a third party (including phone companies) carries no reasonable expectation of privacy. This doctrine, developed for a world of landlines and paper records, now governs an era of continuous metadata generation by billions of smartphones.


11.5 The Data Broker Industry: The Hidden Market for You

Most people who think about data collection think about the platforms they use — Google, Facebook, Amazon. These are visible actors with recognizable products. But there exists a parallel industry, largely invisible to consumers, that is in many ways more comprehensive in its data collection and more opaque in its operations: the data broker industry.

Data brokers are companies whose primary business is the collection, aggregation, sale, and licensing of personal data about individuals who are not the brokers' direct customers. They do not have a product that users pay for. They have a product that users are. Their customers are businesses, insurers, marketers, employers, landlords, law enforcement agencies, political campaigns, and anyone else who wants to know about other people.

The data broker industry is large, diffuse, and poorly regulated. The Federal Trade Commission estimated in 2012 that there were approximately 4,000 data broker companies in the United States alone. The industry generates revenues estimated at over $250 billion annually. The three largest companies — Acxiom, Experian, and LexisNexis — are publicly traded, institutionally respectable, and almost entirely unknown to the people whose data they sell.

Acxiom: The Quiet Giant

Acxiom, based in Conway, Arkansas, describes itself as a "data and technology company" that helps businesses "connect with the right customers." What this means in practice is that Acxiom maintains a database of approximately 300 million American consumers, with an average of 1,500 data points per individual. Their profiles include:

  • Demographics: age, gender, race (inferred), marital status, household income, education level
  • Property: home ownership, estimated home value, mortgage details
  • Vehicles: makes, models, years, estimated purchase prices
  • Health: inferred health conditions, prescription drug categories, medical specialties visited
  • Finances: credit card usage categories, estimated net worth, investment behaviors
  • Lifestyle: hobbies, interests, religious affiliation (inferred), political affiliation (inferred)
  • Behavioral: purchase histories, catalog orders, magazine subscriptions, donation history
  • Social: estimated household composition, presence of children, ages of children

None of this data was obtained through any direct relationship with the individuals profiled. It was purchased, licensed, scraped, and aggregated from thousands of sources: retailer loyalty programs, public records, consumer surveys, real estate databases, vehicle registration records, credit card transaction processors, and hundreds of other streams.

📊 Real-World Application: In 2012, Acxiom launched a consumer portal called "AboutTheData" that allowed individuals to see some of what the company held about them. The experiment was instructive. Users who checked their profiles found information they had never consciously provided: their estimated income bracket, their "life stage" category, their inferred health interests, their modeled political behavior. The profiles were sometimes wrong — but they were often unsettlingly right.

Experian: Credit Bureau to Data Broker

Experian is best known as one of the three major credit bureaus, but its data brokerage operations have expanded far beyond credit reporting. Experian's marketing services division sells what it calls "audience intelligence" — detailed profiles of consumer segments that go well beyond creditworthiness. Its "ConsumerView" database covers approximately 300 million consumers and 126 million households in the United States.

The credit bureau origins of companies like Experian create a particularly interesting regulatory problem. Credit data is governed by the Fair Credit Reporting Act (FCRA), which provides consumers with certain rights — including the right to see their credit report and to dispute inaccurate information. But Experian's marketing data operations are not classified as "consumer reports" under the FCRA, and therefore do not carry the same protections. The same company, handling data from the same consumers, operates under two entirely different legal regimes depending on how the data is used.

LexisNexis: The Law Enforcement Vendor

LexisNexis, perhaps best known as a legal research platform, also operates one of the largest data brokerage operations in the world. Its "Risk Solutions" division sells data and analytics to insurance companies, law enforcement agencies, government agencies, and financial institutions. Unlike Acxiom and Experian, LexisNexis specializes in what it calls "public records" data — court records, property records, bankruptcy filings, address history, professional license records — combined with purchased commercial data.

LexisNexis has a significant law enforcement clientele. Its "CLEAR" platform is used by thousands of police departments and federal agencies to investigate individuals by aggregating public and commercial records. The platform allows investigators to map a person's address history, known associates, vehicle registrations, and social media presence in minutes. This represents a form of what scholars call "function creep" — data collected for one purpose (credit risk assessment, legal research) being used for a substantially different purpose (law enforcement investigation) without the knowledge or consent of the data subjects.

🔗 Connection: The concept of function creep — introduced in Chapter 5's theoretical framework — is central to understanding data broker operations. The data that Experian collects to assess creditworthiness, that LexisNexis collects to support legal research, and that Acxiom collects from retailer loyalty programs were not collected with law enforcement, political targeting, or insurance pricing in mind. But that is exactly how they are used.


11.6 What Your Data Is Worth: The $0.0005 Economy

Understanding the commercial logic of the data economy requires grappling with a number that seems, at first glance, to be trivially small: the per-record price at which personal data is typically bought and sold.

Raw individual records — a name, email address, and zip code — sell for approximately $0.0005 to $0.002 in bulk data markets: a small fraction of a cent. The apparent implication — that your data is nearly worthless — is a misreading of the economics. The low per-record price reflects the extreme volume at which data trades. When you purchase a list of one million email addresses for $500, you are paying a twentieth of a cent per record, but you are also addressing one million people with a single marketing message. The scale transforms value.
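The per-record arithmetic is worth checking directly. The bulk price below uses the figures quoted in the text; the "enriched" price is an invented illustration of the quality premium.

```python
# The per-record and aggregate views of the same transaction.
records = 1_000_000
bulk_price = 500.00                      # one million addresses for $500

per_record = bulk_price / records
print(f"per record: ${per_record:.4f}")  # $0.0005: a twentieth of a cent

# The same record enriched with income, purchase, and interest data
# might sell in a far higher band (illustrative figure, not a quote).
enriched_per_record = 0.25
print(f"enriched list value: ${enriched_per_record * records:,.0f}")
```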

But the per-record price also varies enormously based on data quality, specificity, and intended use. A raw email address has low commercial value. An email address linked to a confirmed purchase of high-end cookware, a household income over $150,000, and an interest in gourmet food has substantially higher value. A health data record linked to a specific diagnosis can sell for $50 or more. Financial data can sell for hundreds of dollars per record in specialized contexts.

The overall economic logic requires two layers of understanding:

Layer One: Aggregate Value vs. Individual Value. Your data, individually, is worth very little. Your data, combined with the data of millions of others and used to model population-level behavior, is worth an enormous amount. This is why surveillance infrastructure must be massive to be economically viable. Google and Facebook are not valuable because they know a lot about you; they are valuable because they know a lot about everyone.

Layer Two: Prediction Value. The most economically valuable form of data is not historical record but predictive intelligence. An advertiser does not just want to know what you bought; they want to know what you are about to buy. An insurer does not just want to know your health history; they want to know your health trajectory. The value of data is largely the value of the predictions it enables — which is why inferred and derived data often command higher prices than raw observed data.

💡 Intuition: Think of the difference between iron ore and steel. Iron ore has value, but steel — iron ore that has been processed, refined, and structured for a specific purpose — is worth far more. Raw behavioral data is like iron ore; predictive models trained on that data are like steel. The data pipeline that transforms raw behavioral residue into actionable prediction is where the real economic value is created.

🌍 Global Perspective: The economics of data differ across jurisdictions. In the European Union, GDPR has meaningfully constrained some forms of data trading by requiring explicit consent for many uses. In practice, this has created a situation where European users' data is sometimes treated differently by major platforms — with more opt-out mechanisms, more privacy notices, and somewhat less aggressive secondary use. Critics argue this creates a two-tier system: European users as citizens with rights; American users as products without them. The contrast illuminates how legal architecture shapes the surveillance economy.


11.7 The Data Pipeline: From Behavior to Commodity

The journey from your morning scroll to an advertising auction takes place through a structured infrastructure scholars call the data pipeline. Understanding this pipeline — even at a general level — demystifies the claim that data collection is somehow natural, inevitable, or technically necessary.

Stage One: Collection

Collection is the first stage and the most varied. Data enters the pipeline through multiple channels simultaneously:

First-party collection occurs when a platform collects data directly from its own users — Facebook logging your posts and reactions, Google logging your search queries, Amazon logging your purchases and product views. This data carries the highest quality and the most direct relationship to user intent.

Third-party collection occurs when code embedded in websites — tracking pixels, cookies, JavaScript libraries — reports your behavior back to advertising networks, analytics companies, and data brokers that have no direct relationship with you. When you visit a news site, the news organization may have embedded fifty different third-party scripts, each reporting your presence and behavior to a different company. You have never heard of most of them, agreed to their terms, or had any interaction with their products.

Offline data collection occurs through loyalty programs, purchase records, warranty registrations, and public records. This "offline" data is increasingly linked to online behavioral profiles through a technique called "offline-to-online matching" — matching email addresses, phone numbers, or other identifiers common to both offline and online records.

Passive device collection occurs through the sensors embedded in smartphones, smart TVs, and IoT devices. Accelerometers, GPS chips, microphones (in devices with wake words), and Wi-Fi radios all generate data streams that can be harvested with appropriate software permissions.

Stage Two: Aggregation

Raw data collected from individual sources has limited value. Aggregation — the combination of data from multiple sources into comprehensive profiles — is where value accumulates. Data management platforms (DMPs) specialize in this aggregation function, accepting data streams from hundreds of sources and merging them into unified consumer profiles.

The technical challenge of aggregation is matching — determining that the Jordan Ellis who visited a news site, the Jordan Ellis who used a loyalty card at a pharmacy, and the Jordan Ellis who created an Instagram account are the same person. This identity resolution uses a variety of linking mechanisms: email addresses, phone numbers, device identifiers, IP addresses, browser fingerprints, and probabilistic matching algorithms that can link records even without a common identifier.
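The simplest deterministic version of identity resolution can be sketched with a normalized, hashed email as the linking key. Everything below — the records, sources, and addresses — is invented; real systems add phone numbers, device IDs, and probabilistic matching on top of this mechanism.

```python
import hashlib

def match_key(email):
    """Deterministic linking key: a normalized, hashed email address.

    Hashing lets two parties link records without exchanging the raw
    address; normalization (strip + lowercase) is what makes records
    from different sources collide on the same key.
    """
    return hashlib.sha256(email.strip().lower().encode()).hexdigest()

# Three records from three unrelated sources (all details invented):
news_site = {"key": match_key("jordan.ellis@example.com"), "seen": "news site"}
pharmacy = {"key": match_key("Jordan.Ellis@Example.com "), "seen": "loyalty card"}
social = {"key": match_key("jellis@other-example.com"), "seen": "social app"}

# Merge records that share a key into unified profiles.
profiles = {}
for rec in (news_site, pharmacy, social):
    profiles.setdefault(rec["key"], []).append(rec["seen"])

print({k[:8]: v for k, v in profiles.items()})
```

The first two records link despite different casing and stray whitespace; the third stays separate until some other identifier (a phone number, a device ID) bridges it.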

Stage Three: Analysis

Aggregated profiles are analyzed to produce the behavioral scores, audience segments, and predictive models that are the actual commercial products. This analysis ranges from simple categorization (placing users in demographic buckets) to sophisticated machine learning (building predictive models of purchase behavior, health outcomes, or political persuasion).

The analysis stage is where inference and derivation occur — where patterns in observed data are used to produce inferred characteristics that the data subject never provided. A user who has been assigned to an "in-market for new car" audience segment did not declare that interest; it was inferred from patterns in their browsing behavior.

Stage Four: Monetization

Monetization takes multiple forms. The most common is advertising: profiles and audience segments are sold to advertisers who use them to target specific individuals or audience types. But data is also monetized through direct sale to third parties, through licensing for research and analytics, through government contracts, and through its incorporation into products like credit scoring, insurance pricing, and employment screening.

📊 Real-World Application: The real-time advertising auction that delivers the ad you see when you load a webpage represents a compressed version of the entire data pipeline. In approximately 100 milliseconds — while the page is loading — your browser identifier triggers a query to a data management platform, which retrieves your behavioral profile, which is sent to an ad exchange, where multiple advertisers bid to show you their ad, with the auction winner delivering an ad tailored to what the data pipeline predicts about your likely receptivity. This entire process — collection, aggregation, analysis, monetization — happens dozens of times on a single webpage load. Chapter 14 examines the real-time bidding system in depth.
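Stripped of its infrastructure, the auction logic in that 100-millisecond window is small. The bidders, bids, and profile below are invented, and real exchanges evaluate thousands of bidders per impression; many have used second-price rules like the one sketched here.

```python
# A stripped-down sketch of one real-time bidding auction.
profile = {"segments": ["in-market: running shoes", "age: 20-24"]}

def collect_bids(profile):
    """Each bidder prices the impression from the profile it receives.
    Bidder names and CPM bids are invented for illustration."""
    return {"shoe_brand": 2.40, "streaming_svc": 1.10, "credit_card": 1.85}

bids = collect_bids(profile)
winner = max(bids, key=bids.get)

# Second-price rule: the winner pays (roughly) the runner-up's bid
# rather than its own, which encourages truthful bidding.
second_price = sorted(bids.values())[-2]
print(f"{winner} wins, pays ~${second_price:.2f} CPM")
```

The profile retrieved from the data management platform is what separates a $0.50 impression from a $2.40 one: the bid is a price on the prediction, not on the pixels.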


11.8 The Data Broker Profile: What They Actually Know

Abstract descriptions of data categories are one thing. Encountering an actual profile is another. The following is a composite representation of the kind of profile a data broker like Acxiom might hold about a person like Jordan Ellis, constructed from categories publicly documented in FTC reports, academic studies, and data broker company disclosures.


Consumer Profile: Composite Individual (based on documented data broker categories)

Demographics:
  • Age range: 20-24
  • Gender: M/F/Unknown (brokers typically infer from name)
  • Race/ethnicity: Inferred from name, neighborhood demographics, and purchasing patterns (used in some datasets; explicitly prohibited in others but present through proxy variables)
  • Household income bracket: $25,000-$35,000
  • Education: Some college (no degree yet)
  • Renter: Yes (apartment)

Financial:
  • Estimated credit score range: 620-680
  • Credit card utilization: High (inferred from thin credit file)
  • Student loan debt: Present (inferred from age and education data)
  • Investment accounts: No
  • Recent major purchases: Electronics, athletic wear, fast food

Health (Inferred):
  • Health insurance status: Unknown/possibly employer or university
  • Interest categories: Fitness, mental health (inferred from search behavior)
  • Prescription categories: Unknown

Lifestyle and Interests:
  • Interest segments: Music streaming, social media, fast food, budget travel
  • Political affiliation: Leaning Democratic (inferred; 65% confidence)
  • Religious affiliation: Unknown/not inferred
  • Presence of children: No
  • Pet ownership: Unknown

Behavioral:
  • Online activity level: High
  • Primary device: Mobile
  • Shopping behavior: Research-heavy, price-sensitive, rarely converts on first visit
  • Response to email marketing: Low open rate
  • Estimated "lifetime value": $1,200 (low — young, low income, price-sensitive)


The profile is estimated, probabilistic, and in some respects wrong. But it is not a random guess — it is a systematic inference from real behavioral traces. And it will be used — for advertising targeting, for credit decisioning, for employment screening, for insurance pricing — as if it were authoritative.
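What "systematic inference" means in practice can be made concrete with a small sketch. The following toy model scores an inferred attribute like the profile's "political affiliation (65% confidence)" from behavioral segments; the segments, weights, and prior are invented for illustration, since real broker models are proprietary and far more elaborate:

```python
import math

# Hypothetical sketch of confidence-scored inference from behavioral
# segments. Segment names and weights are invented for illustration.

SEGMENT_WEIGHTS = {
    "follows_student_debt_news": 0.3,  # hypothetical log-odds contributions
    "urban_renter": 0.2,
    "budget_travel": 0.1,
    "fitness": 0.0,
}

def infer_affiliation(segments, prior_log_odds=0.0):
    """Sum segment weights and squash to a probability (a logistic model)."""
    log_odds = prior_log_odds + sum(SEGMENT_WEIGHTS.get(s, 0.0) for s in segments)
    p = 1 / (1 + math.exp(-log_odds))
    label = "liberal-leaning" if p >= 0.5 else "conservative-leaning"
    return label, round(p, 2)

label, confidence = infer_affiliation(
    ["follows_student_debt_news", "urban_renter", "budget_travel", "fitness"]
)
print(label, confidence)  # liberal-leaning 0.65
```

Note what the sketch makes visible: none of the inputs is a political statement. The output is a category attached to a person who never declared it, derived entirely from traces of unrelated behavior.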


11.9 Jordan's Scenario: The Request for a Data Report

Jordan Ellis had heard about data brokers before — Yara had mentioned them at a rally, Marcus had dismissed the concern as "conspiracy stuff." After a particularly frustrating conversation about privacy with both of them, Jordan decided to do something concrete: find out what data brokers actually knew.

The process was more difficult than it should have been. Jordan started by googling "see what data brokers know about me," which led to a series of websites, some legitimate, some themselves data-collecting operations dressed up as consumer advocacy. After an hour of research, Jordan had a list of five major brokers: Acxiom, Spokeo, Intelius, PeopleFinder, and Epsilon.

Each had a different opt-out process. Acxiom required creating an account (thereby providing more data to the company) to request a file. Spokeo required a copy of a government ID. Intelius' opt-out form worked on the fourth try. One company's opt-out link led to a 404 error.

When the Acxiom file finally arrived — formatted as a somewhat opaque CSV — Jordan was surprised by what it contained and what it didn't. It did not contain the complete profile Jordan had feared. But it contained enough: an estimated income bracket that was slightly wrong, an interest category labeled "budget-conscious shopper" that stung a little because it was accurate, and a category called "politically engaged — liberal-leaning" that Jordan hadn't told anyone.

Jordan showed the file to Marcus, who said, "That's not that bad, honestly. They don't even have your real income."

"They have a version of me," Jordan said. "That's being sold to people who make decisions about me. That's not nothing."

Dr. Osei, in the next class session, offered a framework for thinking about it: "The issue isn't whether any single data point is accurate or inaccurate. The issue is that you have a file — compiled without your knowledge, used without your consent, sold to parties you've never heard of, for purposes you were never told about. The accuracy question is secondary. The structural question is primary."

💡 Intuition: The experience of requesting your data broker file is instructive precisely because of its friction. The difficulty of the opt-out process — the different procedures, the broken links, the account-creation requirements — is not accidental. Friction in the opt-out process is a design choice. It is, in effect, a structural form of consent management that systematically favors data retention over data subjects' rights.


11.10 Visibility Asymmetry and the Data Economy

The data economy is, at its structural core, a visibility asymmetry engine. Those who operate the data pipeline have extraordinary visibility into the behavior, characteristics, and vulnerabilities of hundreds of millions of people. The people who generate that data have almost no visibility into how it is collected, what it contains, who has access to it, or how it is used.

This asymmetry has several dimensions:

Epistemic asymmetry: Data companies know things about you that you do not know about yourself — or, more precisely, that you have never articulated to yourself. The aggregate of your behavioral traces may reveal patterns in your mental state, your health trajectory, your financial vulnerability, or your susceptibility to persuasion that you would be surprised to learn have been identified and sold.

Temporal asymmetry: Data is archived. Your behavioral residue from 2019 — from a period of your life you may have moved beyond, from a context you no longer inhabit — may still be in a data broker's file, shaping how you are categorized in 2026. You have no mechanism to expire information that is old, contextually misleading, or simply no longer accurate.

Remedial asymmetry: If a data broker's profile contains inaccurate or harmful information, the practical remedies available to the data subject are limited. Unlike credit reporting (where the FCRA provides dispute rights), most data held by brokers carries no right of correction, no right of deletion, and no right to know how it has been used.

Power asymmetry: The companies that operate the data pipeline have lobbied successfully against comprehensive federal privacy legislation in the United States for decades. They have legal resources, regulatory relationships, and economic power that dwarf those of individual data subjects. The asymmetry is not just informational; it is structural and political.

🎓 Advanced: Some theorists, drawing on the concept of informational self-determination (a right recognized by Germany's Constitutional Court in 1983), argue that the data economy fundamentally violates a basic dignity right — the right to control information about oneself. On this view, the data broker industry is not merely inconvenient or annoying but constitutively wrong: it treats persons as means rather than ends by commodifying their behavioral traces without their meaningful consent. This Kantian framing contrasts with consequentialist analyses that focus on the harms data use produces. Both perspectives are examined in Chapter 27.


11.11 The Illusion of the Free Service

Perhaps the most effective ideological support for the data economy is the framing of digital platforms as "free." Google is free. Facebook is free. Instagram is free. Spotify has a free tier. The word "free" does the work of obscuring the actual transaction.

"If you're not paying for the product, you are the product" is a saying that has become so familiar it has lost some of its analytical force. But it identifies something real: the word "free" in the data economy is a commercial designation, not a description of the transaction. Users of "free" platforms pay with something other than money — they pay with behavioral data, which is then monetized by the platform.

The legal scholar Lior Strahilevitz has proposed a thought experiment that clarifies the situation: What if users were paid for their data rather than providing it for free? If Google paid users one cent per search query (a generous estimate of individual data value), a typical user conducting 1,000 searches per year would earn $10 annually — roughly consistent with the value estimates in data markets. But this framing makes visible something that the "free service" model obscures: there is a transaction happening, there is value being exchanged, and the terms of that transaction were set by one party without the meaningful participation of the other.
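The arithmetic behind the thought experiment is simple enough to show directly; the figures are the chapter's illustrative estimates, not measured market prices:

```python
# The thought experiment in code: a one-cent-per-search payout.
# Integer cents keep the money arithmetic free of float rounding.
searches_per_year = 1_000
payout_cents_per_search = 1  # the "generous estimate" of per-query value
annual_payout_dollars = searches_per_year * payout_cents_per_search / 100
print(annual_payout_dollars)  # 10.0
```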

The philosopher Michael Sandel has argued that there is something deeply important about which goods are allocated by markets and which are not — and that the transformation of personal experience into marketable behavioral data represents a marketization of something that was not previously a commodity. The question is not just whether users are being paid fairly; it is whether human behavioral experience should be a commodity at all.

These philosophical questions will receive a fuller treatment in Chapter 34. For now, the operative point is practical: the "free service" framing should be understood as a rhetorical construction that obscures the actual economic relationship between platforms and users. The service is not free; the payment is just invisible.


11.12 Structural vs. Individual Explanations

It is tempting to respond to the data economy with individual-level solutions: install an ad blocker, delete your cookies, refuse loyalty cards, use a VPN. These responses are not wrong — we will discuss practical counter-surveillance tools in Chapter 32 — but treating the data economy as a problem of individual behavior systematically misdiagnoses its character.

The data economy is a structural phenomenon. It is embedded in the business models of the dominant information platforms, the technical architecture of the internet, the regulatory frameworks (or lack thereof) that govern data markets, and the economic incentives that reward data collection at scale. An individual who successfully opts out of several data broker databases is, in effect, rearranging furniture in a burning building. The opt-out does not change the structure that produced the data collection in the first place.

This is not an argument against individual action — it is an argument for understanding the limits of individual action without structural change. Every major advance in data privacy protection — GDPR, the California Consumer Privacy Act, the Children's Online Privacy Protection Act — has been a regulatory achievement that changed the structure, not just the behavior of individual consumers.

The structural diagnosis also illuminates why the data economy's problems cannot be solved by market competition. Users who prefer privacy cannot meaningfully signal that preference through market behavior when the underlying business model of the most useful digital services depends on data collection. The market failure is fundamental: users cannot choose a version of Google that does not surveil them, because a non-surveillance Google would be less functional, less profitable, and effectively a different product. The surveillance is not incidental; it is constitutive.

✅ Best Practice: When analyzing any aspect of the data economy, practice distinguishing between structural and individual explanations. Ask: Is this problem solvable by individual behavior change? Or does it require changes to the rules, incentives, and architecture of the system? The answer shapes both the analysis and the appropriate response. Individual privacy hygiene matters — but it cannot substitute for systemic reform.


Summary: The Economy Built on Watching

The data economy is not a side effect of the digital world. It is the digital world's dominant economic logic — the engine that funds the platforms, services, and tools that most people use every day. It operates through the extraction of behavioral residue — the traces of digital life — which is collected, aggregated, analyzed, and monetized at industrial scale.

Understanding this economy requires understanding its components: the attention-based advertising model that creates the demand for behavioral data; the taxonomy of declared, observed, inferred, and derived data; the metadata problem; the data broker industry that operates outside users' direct relationships with platforms; the data pipeline that transforms behavior into commodity; and the visibility asymmetry that gives data companies profound insight into people who have almost no insight into the companies.

Jordan's struggle to navigate the data broker landscape — finding the opt-out processes deliberately opaque, encountering a profile that was partly wrong and partly uncomfortably accurate, and having no reliable mechanism to correct or delete it — is not an individual failure. It is the designed experience of a system built to maximize data collection and minimize meaningful user control.

In Chapter 12, we will descend one level deeper into the technical infrastructure of the data economy: the cookies, tracking pixels, and third-party ecosystems that form the collection layer of the data pipeline. Chapter 14 will examine what the pipeline produces — the behavioral targeting and real-time bidding systems that are the pipeline's commercial output. And Chapter 34 will return to Zuboff's framework for the full critique: what it means, as a political and philosophical matter, to live under surveillance capitalism.


Key Terms

Attention economy — An economic framework in which human attention is the scarce resource that media and technology companies compete to capture and sell to advertisers.

Behavioral residue — The traces of behavior — clicks, searches, locations, purchases — left by individuals in the course of digital activity, which are collected and used as raw material in the data economy.

Data broker — A company whose primary business is the collection, aggregation, and sale of personal data about individuals who are not the company's direct customers.

Data pipeline — The infrastructure through which behavioral data moves from collection through aggregation and analysis to commercial monetization.

Declared data — Data explicitly and knowingly provided by users, such as name, email address, or survey responses.

Derived data — Data generated by combining two or more data sources, such as a credit score or a psychographic profile.

Identity resolution — The technical process of linking records from different data sources to construct a unified profile of a single individual.

Inferred data — Data generated by analysis of observed patterns — characteristics that the data subject never provided but that algorithms calculate from behavioral traces.

Metadata — Data about data — describing the context of communication or activity rather than its content.

Observed data — Data generated by user behavior but collected without active user input, such as website visit logs, location pings, or purchase tracking.

Surveillance capitalism — Shoshana Zuboff's term for the economic logic that claims human behavioral experience as free raw material, transforms it into behavioral data, and monetizes it through the prediction and modification of human behavior.


Discussion Questions

  1. The data economy is often defended on the grounds that "free" services provide genuine value. Evaluate this defense. What would it mean for users to be genuinely compensating platforms for the services they receive? What would it mean for platforms to be genuinely compensating users for the data they provide?

  2. The four categories of data — declared, observed, inferred, and derived — carry different ethical implications. Rank them from most to least ethically problematic, and defend your ranking. Consider: Does the subject's awareness of collection matter? Does accuracy matter? Does use matter?

  3. Stewart Baker, former NSA General Counsel, said that metadata "tells you everything about somebody's life." Does this claim hold up? Consider cases where metadata might be misleading or insufficient. What does the claim get right, and where does it overstate?

  4. Jordan's roommate Marcus argues that data brokers aren't really a problem because the information they hold is sometimes wrong. Construct the strongest version of Marcus's argument. Then construct the strongest counter-argument. Which do you find more persuasive, and why?

  5. The chapter argues that the data economy's problems cannot be solved by market competition because "the surveillance is not incidental; it is constitutive." Evaluate this claim. Can you construct a counter-example — a successful business model that is both data-driven and genuinely privacy-respecting?


Chapter 11 of 40 | Part 3: Commercial Surveillance
Backward reference: Chapter 5 (Theoretical Frameworks)
Forward references: Chapter 12 (Cookies and Tracking), Chapter 14 (Behavioral Targeting), Chapter 34 (Surveillance Capitalism: Full Critique)