
Chapter 6: Who Builds These Systems?

The Engineers, Ethicists, and Incentives Inside Platform Companies


"The road to hell is paved with good intentions — and A/B tests."

— Anonymous former Facebook product manager, quoted in The Atlantic, 2022


Introduction: The Problem With Blaming Individuals

When a social media platform recommends content that radicalizes a teenager, or an algorithm amplifies misinformation during an election, or a notification system is engineered to interrupt sleep — our first instinct is to look for a villain. Who built this? Who decided this was acceptable? Whose fault is it?

This instinct is not unreasonable. Accountability matters. But it is also frequently misleading, because the most consequential harms produced by platform technology rarely emerge from individual malice. They emerge from systems — from the accumulated weight of thousands of individually reasonable-seeming decisions made by people who were, in most cases, doing what they were hired, trained, evaluated, and rewarded to do.

This chapter is about those people: the engineers, product managers, designers, data scientists, and ethicists who build the systems examined throughout this book. It is not an exculpatory account — the structural argument does not dissolve individual responsibility. But it is an accurate account, and accuracy requires understanding that most people who build engagement-maximizing systems are not mustache-twirling villains. They are bright, motivated, often genuinely idealistic human beings operating inside organizations with specific cultures, incentive structures, and constraints that shape what is possible, what is rewarded, and what gets built.

Understanding who these people are — and what forces act on them — is essential for anyone who wants to change what these systems do. Blaming engineers without changing incentives produces moral theater. Changing incentives without understanding culture produces policy that founders on implementation. The gap between intent and effect, one of the central themes of this book, is nowhere more visible than here: in the space between what engineers say they want to build and what the systems they build actually do.

This chapter will argue something that might seem contradictory at first: individual engineers genuinely matter AND structural incentive change is genuinely necessary. These are not competing claims. They are complementary ones, and the failure to hold both simultaneously — the tendency to collapse into either "engineers are villains" or "engineers are helpless cogs" — is itself one of the obstacles to meaningful change.


Who Works at These Companies?

The Talent Pipeline

The engineers who build the world's most consequential social platforms are not a random sample of humanity. They are drawn from a remarkably narrow pipeline. A 2019 analysis of LinkedIn profiles for software engineers at Facebook, Google, Twitter, and Amazon found that roughly 40% had attended just twelve universities — MIT, Stanford, Carnegie Mellon, UC Berkeley, and a small cluster of other elite institutions. The overwhelming majority hold computer science, mathematics, or electrical engineering degrees. Humanities backgrounds are rare; philosophy and ethics training, rarer still.

This concentration matters because educational environment shapes not just technical skills but conceptual frameworks — the implicit models people carry about what problems are worth solving, what metrics constitute success, and what questions are even worth asking. A training environment that prizes optimization, measurability, and elegant algorithmic solutions produces engineers who are exceptionally good at optimizing measurable things, and who may systematically undervalue things that resist measurement — like human dignity, social trust, or the long-term psychological health of a population.

Graduate computer science programs at elite institutions teach students to think about problems in terms of efficiency, scalability, and correctness. A function is correct if it produces the specified output for all valid inputs. A system is efficient if it processes operations with minimal resource expenditure. A design is scalable if it maintains performance as load increases. What these criteria do not include: whether the output is good for the people affected by it, whether the efficiency serves human ends, whether scaling is desirable when scaling means distributing a harmful design to a billion users instead of a thousand.

This is not a criticism of computer science education as such. Technical rigor is genuinely valuable. But a workforce trained almost exclusively in technical rigor, deployed to build systems with enormous social consequences, will produce systems that are technically rigorous and socially consequential in ways that the workforce was never trained to evaluate.

The compensation structures reinforce this concentration. As of 2023, total compensation packages for senior software engineers at Meta, Google, and Apple routinely exceeded $300,000 to $500,000 annually, including stock grants. These packages attract talent — but they also create a specific kind of employee: financially invested in the company's success, often carrying significant mortgage or lifestyle obligations tied to continued high income, and acutely aware that exit from the platform economy means a substantial pay cut. An engineer who has built her life around a $400,000 annual compensation package has a different relationship to internal ethical dissent than an engineer who arrived last year with minimal financial obligations. Financial stake in company performance is not a trivial consideration when evaluating the structural incentives for internal ethics work.

Demographics of the Engineering Workforce

The tech workforce at major platforms is young, predominantly male, and disproportionately Asian American and white. A 2022 diversity report from Meta revealed that women constituted 24.6% of its technical workforce. Black employees represented 3.9% of technical staff; Hispanic or Latino employees, 6.6%. Google's 2023 numbers tell a similar story: women at 33.4% of the technical workforce, Black employees at 5.5%, Hispanic or Latino at 7.3%.

These numbers are not merely a fairness concern, though they are that. They are also a design concern. Who builds a system shapes what that system does, whose experience it centers, whose blind spots it embeds. An engineering team that is demographically homogeneous will have shared blind spots — not because of bad intentions, but because shared life experience produces shared assumptions. When Facebook engineers designed systems for communities in Myanmar, Ethiopia, and India, they were designing for populations whose social contexts, political tensions, and risks from viral misinformation differed radically from Silicon Valley. The gap between designer and designed-for is not incidental to the harms that resulted; it is structural.

Researcher Safiya Umoja Noble, in her 2018 book Algorithms of Oppression, documented how search algorithm design reflected the assumptions and blind spots of the mostly white, mostly male teams that built them. Similar patterns appear throughout platform design. The features that get built are often features that solve problems the engineers themselves have. The experiences that go unconsidered are often the experiences of people unlike the engineers who built the platform.

The workforce also skews young. The median age at Facebook in 2022 was 29. The median age at Google was 30. The median age at Microsoft, which runs a substantially more mature and enterprise-focused business, was 38. A workforce median age of 29 means that many of the engineers making consequential design decisions are doing so without the benefit of extensive life experience — with families, with illness, with the specific vulnerabilities that come with age — that might make certain harms more legible. A 27-year-old without children is making different default assumptions about the effects of social media on children than a 38-year-old whose 12-year-old has an Instagram account.

The Hiring Funnel and What It Selects For

The interview process at major tech companies has been refined over decades and is engineered to select for specific abilities: the capacity to solve algorithmic puzzles under time pressure, to reason about system architecture at scale, to write clean code quickly and correctly. The canonical format — the "whiteboard interview" in which candidates solve data structures and algorithms problems while observed — is effective at assessing a specific kind of technical competence.

What it does not systematically test for: ethical reasoning, design empathy at scale, or the ability to anticipate second-order social effects. A candidate who solves a graph traversal problem brilliantly but has never thought about how graph traversal algorithms shape information flows in social networks will pass the interview. A candidate who has deep insight into the social dynamics of online communities but struggles with dynamic programming problems will not.

This is not an accident. The interview process selects for skills that are useful for building systems that are fast, efficient, and scalable. Ethical reasoning is harder to test, harder to score, harder to compare across candidates, and less immediately relevant to the task of making a news feed algorithm run at the scale of two billion daily users. The result is a hiring filter that systematically underweights precisely the skills that would be most useful for anticipating harm.

Some companies have begun adding behavioral components to interviews that probe ethical reasoning. Whether these additions change outcomes — whether they produce different hiring decisions, or merely add a component that candidates learn to perform — is an empirical question that has not been carefully studied.


Platform Culture: Moving Fast and Breaking Things

The Ethos and Its Origins

Mark Zuckerberg's famous injunction — "Move fast and break things" — was never merely a productivity philosophy. It was a complete worldview, encoding a set of assumptions: that speed produces competitive advantage, that friction is the enemy of progress, that breaking things is acceptable because things can be fixed, that the unknown costs of caution exceed the known costs of speed.

This ethos was adaptive in Facebook's early years. The company was a scrappy challenger competing against MySpace and Friendster, the stakes were limited to college social networks, and the "things" that broke were minor UX inconveniences rather than public health infrastructure. Moving fast and breaking things in that context made competitive sense.

It became progressively less adaptive as the platform scaled to billions of users and as the "things" that broke included democratic institutions, public health discourse, and the mental health of adolescents. By the time Facebook's algorithms were amplifying ethnic violence in Myanmar in 2017, the cultural infrastructure that might have slowed down, checked, and corrected the system had been systematically weakened by a decade of treating speed as a primary virtue and caution as an obstacle.

The "move fast" culture did not disappear as platforms matured — it calcified into organizational DNA. Zuckerberg himself modified the formulation in 2014, two years after the company went public, shifting to "move fast with stable infrastructure" as Facebook faced the realities of enterprise-scale systems. But the underlying orientation — toward speed, toward shipping, toward measuring success in releases per quarter — persisted. New employees absorbed it through onboarding, through management, through the vocabulary of product reviews. It was reproduced not through explicit instruction but through the daily repetition of cultural practices.

The contrast with industries that have developed more cautious cultures is instructive. Aviation developed its culture of procedural caution after decades of catastrophic failures; the phrase "tombstone technology" refers to safety improvements implemented only after fatal accidents made the cost of inaction visible. Pharmaceutical development built elaborate regulatory frameworks — clinical trial requirements, adverse event reporting, post-market surveillance — after tragedies like thalidomide demonstrated the costs of moving fast. Platform technology has not yet faced the equivalent forcing events that reshaped these industries, and its culture reflects the absence of those events.

OKRs: How Objectives Shape Design

Objectives and Key Results — the goal-setting framework developed at Intel by Andy Grove and popularized at Google by John Doerr, then adopted across the tech industry — shape design decisions more powerfully than any explicit policy. OKRs work by translating abstract organizational goals into measurable quarterly targets that determine individual performance reviews, team budgets, and career trajectories.

The specific OKRs that teams are given reveal organizational priorities with unusual clarity. In the mid-2010s, Facebook's growth teams were evaluated on metrics like Daily Active Users (DAU), time-on-platform, and return visit rate. Product teams were evaluated on engagement metrics: likes, comments, shares, reactions. Infrastructure teams were evaluated on uptime and latency. What was not in the OKRs: user wellbeing, emotional experience quality, accuracy of information encountered, or long-term retention as distinct from short-term engagement.

This matters because the OKR system creates a powerful alignment mechanism that is largely invisible precisely because it is so pervasive. Engineers and product managers are not stupid; they can read the signal clearly. The things that are measured are the things that matter for promotion, for team survival, for the company's quarterly earnings call. The things that are not measured — that exist only as qualitative concerns, as edge cases, as externalities — systematically receive less attention, less engineering time, and less budget.

The mechanism is compounded by the specificity of quarterly timelines. An OKR is evaluated every three months. Harms from social media design choices — effects on adolescent mental health, contribution to polarization, facilitation of addiction — operate on timescales of years or decades. A quarterly measurement framework will systematically undervalue long-term harms relative to short-term engagement gains, not because anyone decides to undervalue them, but because the temporal mismatch makes the long-term harms invisible in the measurement system.
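The temporal mismatch can be made concrete with a toy model — the numbers here are invented purely for illustration, not drawn from any platform's data:

```python
# Toy model (invented numbers): a feature lifts engagement +3% at launch,
# but erodes long-term retention by 1% per quarter afterward. The launch
# lift is attributed to the feature in that quarter's OKR review; the
# slow, diffuse decay is not attributed to anything in particular.

ENGAGEMENT_LIFT = 1.03   # immediate, visible in this quarter's metrics
RETENTION_DECAY = 0.99   # slow, absent from any single quarter's OKRs

def net_effect(quarters):
    """Cumulative multiplier on the metric after N quarters."""
    return ENGAGEMENT_LIFT * RETENTION_DECAY ** quarters

print(f"after 1 quarter:   {net_effect(1):.3f}")   # still reads as a win
print(f"after 12 quarters: {net_effect(12):.3f}")  # cumulative net loss
```

The +3% shows up in exactly the quarter when the feature is reviewed; the compounding loss surfaces only at a measurement horizon longer than any OKR cycle, which is the mismatch described above.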

Facebook's own internal research, revealed in the 2021 Facebook Papers leak, showed that the company's engineers and data scientists were aware by 2019 that engagement-maximizing algorithms were amplifying divisive and emotionally provocative content. The internal question was not primarily whether this was happening, but whether the cost to the company's public reputation of changing it exceeded the revenue cost of reducing engagement. That framing — in which user wellbeing is a variable in a reputation-management calculation — tells you everything about what the OKR culture had produced.

A/B Testing as Moral Distance-Creator

The A/B test is the primary epistemological tool of the platform engineer. Want to know whether a new notification design increases return visits? Run an A/B test: show version A to half a million users, version B to the other half, measure the outcome over 7 days, ship the variant that performs better. The discipline, elegance, and empirical rigor of this approach are genuinely admirable — it replaced opinion-based design decisions with evidence-based ones.
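The decision rule just described — split the sample, measure, ship the winner if the difference is statistically significant — is, at its core, a two-proportion z-test. A minimal sketch with invented numbers (no platform's actual experimentation pipeline is this simple):

```python
import math

def ab_test_decision(returns_a, n_a, returns_b, n_b, alpha=0.05):
    """Two-proportion z-test: ship variant B only if its lift over
    variant A is positive and significant at the alpha level."""
    p_a, p_b = returns_a / n_a, returns_b / n_b
    pooled = (returns_a + returns_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value
    return {"lift": p_b - p_a, "z": z, "p_value": p_value,
            "ship_b": p_b > p_a and p_value < alpha}

# Invented example: 500,000 users per arm; B returns at 41.2% vs 40.0%.
result = ab_test_decision(200_000, 500_000, 206_000, 500_000)
```

Note what the function cannot do: it evaluates only the outcome it was handed. If the inputs count return visits and nothing counts sleep disruption, sleep disruption does not exist for this decision procedure.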

It also created a specific form of moral distance.

When a designer changes a button color, they are making a choice. When a product manager ships a feature, they are making a choice. These feel like choices — they are made in a first-person way, with the designer or manager as an explicit agent. When a data scientist runs an A/B test and the test reveals that notification version B increases 7-day return visits by 2.3%, and that finding is reported to a product review meeting, and the product review meeting approves shipping version B — who made the choice? The A/B test appears to have made the choice. The data appears to have made the choice.

This is an illusion, but it is a remarkably effective one. The A/B test can only measure what it is designed to measure. If it measures return visits and not sleep disruption, it reports on return visits. If it measures session length and not emotional state after the session, it reports on session length. The decision about what to measure is a human decision — a choice about what matters, made early and quietly, in the design of the experiment rather than in the dramatic moment of feature approval. But because the dramatic moment feels data-driven, the human judgment embedded in the experimental design is rendered invisible.
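A toy illustration of the same point, again with invented numbers: identical experimental data, read once in aggregate and once broken out by time of day. The data does not change; the measurement design determines what the reviewer sees.

```python
# Invented numbers: 24-hour return visits for variants A and B,
# broken out by when the notification fired.
segments = {
    # segment: (returns_A, users_A, returns_B, users_B)
    "daytime (6AM-10PM)": (150_000, 350_000, 150_500, 350_000),
    "night (10PM-6AM)":   (50_000, 150_000, 55_500, 150_000),
}

def lift(returns_a, n_a, returns_b, n_b):
    """Absolute lift in return rate, variant B over variant A."""
    return returns_b / n_b - returns_a / n_a

# The aggregate read: a healthy-looking win for variant B.
totals = [sum(vals[i] for vals in segments.values()) for i in range(4)]
print(f"aggregate lift: {lift(*totals):+.2%}")

# The segmented read: nearly all of the gain comes from sleep hours.
for name, vals in segments.items():
    print(f"{name}: {lift(*vals):+.2%}")
```

If the experiment is designed to report only the aggregate number, the segmented read never happens — a choice made quietly, in the design of the measurement, rather than in the visible moment of approval.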

The researcher Emily Bender and colleagues, in their 2021 paper "On the Dangers of Stochastic Parrots," used the phrase "value-laden design decisions made to seem neutral" to describe a similar phenomenon in machine learning. The phrase applies equally well to the A/B test culture of platform engineering. The choice of what to measure encodes values. The metric optimization that follows appears value-neutral because it is quantitative. The values disappear into the design of the measurement instrument, and the measurement instrument appears to speak for itself.

This mechanism — what organizational psychologists call moral disengagement through procedural abstraction — is not unique to tech. Pharmaceutical manufacturers evaluate drugs on measurable outcomes while underpowering studies for rare side effects. Financial engineers design instruments whose full risk profile they do not model, then point to the mathematical rigor of their models when that risk profile produces failures. But the speed, scale, and cultural prestige of data-driven decision-making in platform technology make this abstraction particularly potent.


Velocity Media: Dr. Aisha Johnson's First Week

Arrival

Dr. Aisha Johnson's badge photo was taken at 9:07 AM on a Tuesday in March. By 9:45, she had been walked through a campus map, handed a laptop pre-loaded with development tools she had never heard of, and deposited in an open-plan workspace that hummed with the low-grade intensity of concentrated ambition.

She had come to Velocity Media from four years in academic research — computational social science at a mid-tier state university, studying the relationship between platform design choices and adolescent self-perception. She had published five peer-reviewed papers. She had testified (briefly, in written form) before a state legislative committee on social media and youth mental health. She had a doctorate from Columbia. She had arrived with a clear sense of purpose: she wanted to do her research inside the system, where it might actually change something.

She had also, she quickly discovered, almost no standing in the organization she had just joined.

Her title was "Head of User Wellbeing Research," which sounded significant. The reporting structure was less so: she reported to the VP of Trust and Safety, who reported to the Chief Operating Officer, who reported to the CEO. The growth teams reported to the Chief Product Officer, who also reported to the CEO. In organizational structure terms, she was a lateral cousin to the people whose work she was ostensibly there to assess. She had no authority over product decisions, no seat in product reviews (she would have to request an invitation), and no budget of her own. She had a headcount allocation for two researchers, both positions currently unfilled.

Her desk was in the Trust and Safety wing, physically separated from the product teams by a glass-walled corridor and roughly the cultural equivalent of an ocean. The Trust and Safety wing processed content policy violations, responded to regulatory inquiries, and worked on spam and fraud detection. It was staffed by people who were good at reactive work — responding to problems that had already surfaced — rather than proactive design review. Aisha was, as far as she could tell, the first person at Velocity whose job was to ask, before a feature shipped, whether it might cause harm.

Meeting Marcus Webb

On her second day, she met Marcus Webb at a coffee machine on the third floor. Marcus was a senior engineer on the recommendations team — five years at Velocity, one of the primary architects of the content ranking system that determined which posts appeared in users' feeds and in what order. He was 31, cheerful and slightly distracted in the way of someone whose mind is always running a background process, and wearing a t-shirt that said "Ship It."

"You're the ethics person," he said. It was not unkind — just classificatory, the way you might say "you're the new hire" or "you're from marketing."

"User wellbeing research," she said.

"Right." He poured his coffee. "We've been waiting for someone to fill that role. There was a position open for about eight months."

She asked why it had taken so long to fill.

He thought about it. "Finding someone who could actually engage with the technical side, I think. A lot of ethics and policy people can't read a model output or parse an A/B test result. That creates problems when you're trying to have conversations with engineering teams." He paused. "Also, I think there was some debate about what the role actually does."

"What do you think it does?" she asked.

He smiled. It was a genuine smile, slightly rueful. "That's a great question for your first week." He picked up his coffee. "Come to the Wednesday product review. You'll learn a lot."

The Wednesday Product Review

The Wednesday product review was held in Conference Room C, which was large enough for twenty people and contained thirty. Aisha had requested an invitation from Marcus, who had forwarded the request to the product org, which had approved it without comment. She found a seat near the wall with a clear view of the presentation screen. Marcus was at the table; she recognized several other people from the organizational chart she had been studying.

The review covered six features in various stages of development. The format was consistent across all six: a product manager presented the feature and its current state, showed metrics from recent A/B tests, the team discussed the results, and a decision was made about whether to proceed to full rollout, continue testing, or shelve the feature.

The metrics presented, in every case, were: daily active users (DAU), session length, return visit rate within 24 hours, content interaction rate (likes, comments, shares, reactions), notification open rate, and ad impression count. One presentation also included revenue per user. Every feature was evaluated against these metrics. A feature that moved any of these metrics positively, without moving others negatively, was a candidate for shipment. A feature that moved them all positively was approved quickly, with brief discussion.

What was not presented: user-reported emotional state, user-reported satisfaction, screen time patterns, age-segmented behavioral data, or any measure of whether users felt their time on the platform had been well spent. These things were not presented because they were not tracked for most features, and they were not tracked because they were not in anyone's OKRs.

Aisha watched and took notes. She did not speak for the first hour and forty minutes.

Then the fifth feature was presented: a new notification design from the growth team. The feature was a variation on the "red dot" indicator that appeared on the app icon when unread content was available. The current design showed a static red dot with a number. The new design added a subtle animation: the dot pulsed gently, expanding and contracting over a 2-second cycle, drawing the eye more effectively.

The product manager presenting the feature was perhaps 28, confident with her data, clearly good at her job. She showed the A/B test results: the animated notification increased 1-hour return visits by 3.1% and 24-hour return visits by 1.8%. Both findings were statistically significant, with p-values well below 0.05. The sample size was approximately 800,000 users. Confidence intervals were tight.

The team approved it for full rollout. The decision took about four minutes.

Aisha raised her hand. Several people looked at her with the mild surprise of people noticing, for the first time, that a new person is in the room.

"I have a question," she said. "Do we have any data on when users are receiving these notifications? Specifically, whether the animated notification is more effective during evening and nighttime hours?"

The product manager looked at her with the professional attention of someone encountering an unexpected question and deciding how to treat it. "We have time-of-day breakdowns, yes. Evening and night hours show higher notification engagement generally — users are less busy, more likely to be on their phones."

"Does the animated notification differentially increase return visits during the hours typically associated with sleep? Say, 10 PM to 6 AM?"

A brief pause. The Chief Product Officer, who had not spoken to Aisha before, looked at her with an expression that was not hostile but was very much the expression of someone assessing whether a new element in the environment is likely to cause problems or not.

"We haven't broken that out specifically," the product manager said. "The overall analysis was across all hours."

"Would it be possible to get that data before rollout?"

"We can add that to the analysis," the product manager said, with the careful neutrality of someone making a small concession. "But the overall metrics are positive."

"Right," said Aisha. "I'm just wondering whether 'positive overall' might be masking 'disrupts sleep' specifically. A 3.1% increase in 1-hour returns looks different if most of that gain is coming from 11 PM notifications. Sleep disruption has downstream effects on the wellbeing outcomes I'm here to track."

The room had gone slightly quiet. Not hostile-quiet — more like attention-quiet. Marcus was watching her with an expression she could not quite read.

The VP of Trust and Safety — her boss — was nodding slowly. The Chief Product Officer said, "Good point. Run the time-of-day breakdown before full rollout." He looked at the product manager, who nodded. Decision made. Moving on.

The sixth feature was presented. The meeting continued.

Walking out, Marcus fell into step beside her.

"Good question," he said.

"Will they actually run the breakout?"

He was quiet for a moment. "They'll run it. Whether it changes anything..." He trailed off. "The metrics are positive. If the time-of-day breakdown shows that the nighttime effect is driving the overall result, there will be a conversation about whether that's a problem. But the default frame is: metrics are positive, ship the feature."

"What would change that default?"

He considered. "Regulatory risk. Press risk. Or someone with authority who decides it matters." He looked at her sideways. "That last one is the hard one."

She understood him precisely. The metrics were positive. The question she had raised was about a metric that was not being measured — and in the world of OKRs and product reviews, what is not measured carries a weight that approaches zero. Her question had been heard, which was something. Whether it changed anything would depend on forces that her question alone could not determine.

What She Learned

In her first week, Aisha Johnson learned several things that were not in her job description or her offer letter.

She learned that Velocity Media's product teams had no formal process for evaluating features against wellbeing criteria before rollout. There were legal reviews (for regulatory compliance), security reviews (for data safety), and a Trust and Safety review (for content policy violations). There was no wellbeing review. Her predecessor in the role — a researcher who had left after eighteen months — had proposed creating one. She found the proposal in a shared drive folder. It had been acknowledged, discussed at one meeting, and not implemented.

She learned that her predecessor had produced three research reports in eighteen months on specific wellbeing topics: screen time patterns among teenage users, social comparison dynamics in photo-sharing features, and the relationship between notification frequency and self-reported anxiety. She found copies of all three. All had been circulated to relevant teams with a "please review and consider" email. She found no evidence that any of them had changed a product decision.

She learned that the phrase "user wellbeing" appeared frequently in company communications — in All Hands presentations, in press releases, in the excerpts from congressional testimony that were posted to the company intranet alongside the CEO's profile photo. It appeared rarely in product review meetings.

She learned that Marcus Webb was one of roughly a dozen engineers on the recommendations and growth teams who had, over the past two years, raised internal concerns about specific algorithmic behaviors. She found a Slack channel — #product-ethics — that contained their conversations. It had 47 members in a company of 4,000. Its most recent discussion was about the animated notification feature she had just raised in the product review. Three engineers had flagged the sleep disruption concern in the channel three days earlier. Their conversation had not made it into the product review.

She was not entirely surprised by any of this. She had read the research. She had read the whistleblower accounts. She had known, in the abstract, what she was walking into. But there is a difference between knowing something abstractly and sitting in a conference room where it is happening, and watching the machinery operate in real time.

She filed the animated notification concern in a memo and sent it to her VP, with a request to be included in the time-of-day data analysis before rollout. Then she opened a new document and started drafting a proposal for a formal wellbeing review process — something that would give user wellbeing criteria the same procedural weight as legal and security review, something that would require product teams to evaluate features against wellbeing metrics before approval rather than after.

She did not know whether the proposal would go anywhere. But she understood, by the end of her first week, that the answer to that question would tell her almost everything she needed to know about whether internal ethics work at Velocity Media was real or decorative.


The Whistleblower Record: What Insiders Have Revealed

Frances Haugen and the Facebook Papers

On October 5, 2021, Frances Haugen sat before the U.S. Senate Commerce Subcommittee on Consumer Protection, Product Safety, and Data Security and testified about what she had seen during her two years as a product manager on Facebook's civic integrity team. She had come to Congress after spending months systematically downloading and preserving thousands of internal Facebook documents — research reports, executive memos, product review slides, internal survey data — which she provided to regulators, lawmakers, and a consortium of news organizations under the protection of the SEC's whistleblower program.

Haugen was 37 years old, with a computer science degree from Olin College of Engineering and an MBA from Harvard Business School. She had worked at Google, Yelp, Pinterest, and Gigster before joining Facebook in 2019. She had specifically requested assignment to the civic integrity team because she had a personal history with radicalization — a close friend had been drawn into conspiracy theories through online communities — and she believed working on political misinformation was where she could do the most good.

What she found, and what the Facebook Papers revealed in extraordinary detail, was that Facebook had conducted extensive internal research on the harms its platform produced — and had, in multiple documented cases, chosen not to act on that research when acting would have reduced engagement metrics.

The findings were damning in their specificity. Internal research from 2019 showed that Facebook's recommendation systems — specifically, the tools that suggested Groups for users to join — were funneling users toward increasingly extreme content. The research team described this as a "rabbit hole" effect: users who joined one political group were systematically recommended progressively more extreme groups. Facebook's data scientists had proposed changes that would reduce this effect. The changes were not implemented because they were projected to reduce engagement.

Internal Instagram research, dating from 2019 to 2021, showed that the platform's design was associated with negative mental health outcomes for a significant proportion of teenage girls. One internal presentation, widely reported after the Papers were published, stated: "We make body image issues worse for one in three teen girls." The research found that 32% of teenage girls who said they already felt bad about their bodies reported that Instagram made them feel worse. Facebook had not published this research. It had not modified the platform design in response to it.

Perhaps most disturbing was the documented evidence regarding vaccine misinformation and ethnic violence. Internal reports showed that Facebook's systems were amplifying vaccine misinformation even as the company made public commitments to address it. Documents showed that Facebook's algorithms were contributing to ethnic violence in Ethiopia, India, and elsewhere — a pattern the company had been warned about by researchers and NGOs — and that resource constraints and organizational prioritization meant these warnings had not been adequately acted upon.

Facebook's response to the Papers combined three strategies: disputing the framing of specific findings, pointing to actions the company had taken since the documents were created, and attacking Haugen's credibility and motives. Mark Zuckerberg, in an October 2021 statement, said the reporting painted "a false picture of the company." He noted that the company had invested heavily in safety and integrity work. He did not dispute the specific findings documented in the internal research.

The congressional testimony produced significant public understanding of the gap between platform companies' public commitments and their internal knowledge. In the United States, it did not produce major regulatory legislation: hearings were held, senators expressed outrage across party lines, and the company rebranded (Facebook became Meta) while the algorithms continued to operate. European regulators moved more aggressively, using the Digital Services Act to impose new accountability requirements. But the foundational business model — advertising revenue dependent on engagement maximization — remained intact.

Sophie Zhang and the Coordinated Inauthentic Behavior Networks

Frances Haugen was not the first Facebook insider to raise concerns publicly. Sophie Zhang, a data scientist who worked on Facebook's integrity team from 2018 to 2020, posted a memo to the company's internal forum in September 2020, on her final day at the company after being fired, documenting her findings about fake engagement networks operating on the platform.

Zhang's work focused not on algorithmic recommendation but on coordinated inauthentic behavior: networks of fake accounts and pages, often operated by political actors, that used Facebook's engagement systems to artificially amplify political content in countries including Bolivia, Brazil, Ecuador, Honduras, India, and Azerbaijan. Zhang had spent months identifying and reporting these networks, often working alone outside her official job duties. In many cases, she found that her reports sat in queues or were deprioritized because the affected countries were not high-priority markets for Facebook's business.

Her memo, titled "I Have Blood on My Hands," was published by BuzzFeed News and subsequently reported around the world. The title was deliberately provocative and deliberately accurate: Zhang was documenting not just organizational failure but her own sense of personal complicity. "I'm not blameless," she wrote. "I've made mistakes... I've knowingly made decisions that led to harm."

This combination — insider critique with explicit self-implication — made Zhang's account unusually credible and unusually instructive. She was not positioning herself as a hero who had tried to stop the company; she was positioning herself as an employee who had tried to do the right thing within the constraints of the organization, had sometimes succeeded and sometimes failed, and was finally speaking publicly because the constraints had become untenable.

The Broader Pattern

Haugen and Zhang are the most prominent names in a longer record of internal dissent at platform companies. In 2020, a group of Twitter engineers wrote an internal letter raising concerns about the company's ad targeting practices and their potential for discriminatory outcomes. In 2018, Google employees organized to protest Project Maven, a Department of Defense contract to provide AI capabilities for drone targeting analysis; the company subsequently declined to renew the contract, a rare case of internal advocacy producing policy change. Also in 2018, Amazon employees wrote an open letter to company leadership raising concerns about the company's facial recognition product, Rekognition, being sold to law enforcement.

What the pattern reveals is not that platform companies are staffed by uniquely unethical people. It reveals that the gap between what engineers know and what organizations act on is systematic and predictable. The internal research disclosed by Haugen was not secret from the engineers who produced it. It was, in a specific sense, known — known to the researchers, to their managers, to some product leaders. What it lacked was organizational consequence: a mechanism by which knowledge of harm translated into change of practice.

This is the key observation. The problem at Facebook, as documented in the Papers, was not that the company lacked information about harms. It had the information. The problem was that it lacked the organizational structure to translate that information into action when acting would reduce engagement and, therefore, revenue.


What Individual Engineers Can Do

The Documented Cases

The history of platform technology contains genuine cases of individual engineers and designers successfully pushing back against decisions that would have caused harm. These cases are worth examining carefully — not because they are typical, but because they establish that individual action within organizations can sometimes matter.

The Google Project Maven case is the clearest large-scale example. In April 2018, a letter signed by more than 3,000 Google employees called on the company to cancel its contract with the Department of Defense to develop AI for drone targeting. Approximately a dozen employees resigned in protest. Google leadership initially defended the contract; it subsequently declined to renew it when it expired in 2019. The company also developed and published a set of AI principles — principles developed partly in response to the internal pressure — that explicitly ruled out developing AI for weapons or for surveillance violating internationally accepted norms. Google's subsequent decision not to bid on the Pentagon's JEDI cloud computing contract, citing in part those AI principles, suggested that the organizational change was at least partly real.

The Project Maven case is instructive because it shows what conditions enabled internal advocacy to produce change: a large number of employees willing to act publicly, including resignations; significant external media attention; and a company whose business model depended on its reputation for attracting elite technical talent. Engineers at Google had leverage because Google needed them. The threat of departure by talented engineers was a credible business risk, and the company calculated that the Maven contract was not worth that risk.

At the level of individual product decisions, there are documented cases of engineers declining to implement specific features, recommending redesigns that reduced dark patterns, and raising concerns in design review meetings that resulted in feature modifications. These cases are difficult to track systematically because they occur in internal settings and rarely produce public records. But qualitative research on platform engineers — including Brett Frischmann and Evan Selinger's 2018 book Re-Engineering Humanity and subsequent ethnographic work — suggests that individual dissent is not rare and that it sometimes changes specific decisions.

The field of value-sensitive design, developed by Batya Friedman and colleagues at the University of Washington, offers a formal methodology for incorporating human values into the engineering design process. Engineers trained in this approach have frameworks and vocabulary for raising ethical concerns in technical settings — for asking "what values does this design embed?" in a way that can be heard by colleagues trained in quantitative reasoning. Some platform companies have adopted elements of this framework, and some individual engineers have used it effectively as a lever for raising concerns that might otherwise be dismissed as subjective or unquantifiable.

The Structural Limits of Individual Action

But the honest accounting of what individual engineers can do must also account for what they cannot do.

An individual engineer can decline to implement a specific feature. The feature will likely be implemented by someone else. An individual engineer can raise concerns in a product review meeting. The review meeting will proceed on the basis of the metrics it was designed to track. An individual engineer can write a concerned internal memo. The memo will be filed in a shared drive folder where it will sit alongside the previous concerned memos.

This is not cynicism — it is structural analysis. The forces that shape platform design decisions at the macro level — OKR systems, quarterly earnings pressure, competitive dynamics between platforms, the foundational decision to build advertising-based business models that monetize user attention — are not changed by individual engineers exercising individual conscience. They require organizational change, which requires either internal power (held by executives, not individual contributors) or external pressure (from regulators, legislators, or markets).

The "heroic engineer" narrative — the idea that the right individual, with enough courage and skill, can change a harmful system from within — is emotionally satisfying but empirically weak as a general account. For every Project Maven, there are hundreds of cases where internal dissent produced no change. For every engineer who successfully argued against a harmful feature, there are dozens of harmful features that shipped despite internal concerns. The heroic cases are remembered; the routine cases are not.

More importantly, the heroic engineer narrative places the burden of structural change on individuals who are least positioned to achieve it, while relieving the organizations, investors, and regulators who are most positioned to change incentive structures of their accountability. If the problem is that platform incentive structures produce harmful design decisions, and the solution is "hire more ethical engineers," then the organizations that profit from the harmful design decisions bear no responsibility to change those structures. That framing serves the interests of the organizations at the expense of the people affected by the systems they build.


The Ethics Hire Problem

Why Companies Hire Ethicists

Major tech companies have, since roughly 2016, substantially increased their hiring of people with explicit ethics, safety, or responsible AI mandates. Google created a Responsible AI team. Facebook hired hundreds of Trust and Safety employees and created a Responsible Innovation team. Twitter built a Safety organization. Microsoft developed an AI ethics framework. Amazon created an AI fairness team.

The reasons for these hires are multiple. Some platform leaders genuinely believe that ethics expertise will improve their products. Some are responding to genuine internal pressure from employees who want their companies to do better. Many are responding to regulatory pressure and the reputational risk of recurring public scandals. Some are responding to research that suggests certain kinds of harms create legal liability. The motivations are mixed, and the mixed motivations produce mixed results.

But the organizational position of ethics hires — where they sit in the hierarchy, what authority they hold, whether they have the power to block or modify product decisions — tells you more about a company's actual commitments than the hiring decision itself does. An ethicist who produces reports submitted to external audiences is a different organizational actor than an ethicist who can veto a product launch. Most platform ethics hires are much closer to the former.

The legal scholar Frank Pasquale, in his 2020 book New Laws of Robotics, argued that ethics hires at tech companies often function as "ethics washing" — the appearance of ethical commitment without its structural implementation. This is perhaps too cynical as a universal account, but it is accurate as a description of the modal case. A company that hires an ethicist, places her below the product organization in the hierarchy, gives her no veto authority, no budget authority, and no independent reporting lines, and then fires her when her research produces findings that challenge commercial interests — that company has not made an ethical commitment. It has made a reputation management decision.

Timnit Gebru and the Structural Limits of Internal Dissent

The clearest illustration of the ethics hire problem is the case of Timnit Gebru at Google.

Gebru joined Google in 2018 as a research scientist and became co-lead of its Ethical AI team. She was, by any measure, a prominent researcher: with Joy Buolamwini, she had co-authored the Gender Shades study, which exposed dramatic racial and gender disparities in the facial analysis systems of major commercial vendors, and she was the lead author of "Datasheets for Datasets," an influential proposal for documenting the provenance and limitations of machine learning training data. She came to Google with a clear mandate and, by her account, with genuine excitement about the opportunity to do ethics research where it might actually influence product development.

In late 2020, Gebru co-authored a research paper with the linguist Emily M. Bender, Margaret Mitchell (her co-lead on the Ethical AI team), and several colleagues, titled "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?" The paper made several arguments about large language models — the kind underlying Google's search and natural language products — including: the enormous environmental costs of training them, the ways they encode and amplify biases present in their training data, and the risk that their outputs, despite appearing fluent and authoritative, could generate harmful or misleading content at scale.

Google leadership asked Gebru to either substantially revise the paper or remove her name from it before publication. The specific objections were framed as procedural — concerns about the paper's review process — but the paper's subject matter was clearly central to the dispute: the paper's arguments, if accepted, had implications for Google's core research programs and product roadmap. Gebru refused to withdraw or revise on the timeline demanded, and in December 2020 she was fired — in an email sent while she was traveling, which terminated her employment immediately without the normal transitional notice.

The firing was presented by Google management as the result of a dispute about internal review procedures. Gebru and her colleagues described it differently: as the termination of a researcher whose work had challenged the company's commercial interests. The specific mechanism was procedural, but the structural dynamic was clear.

What followed was illuminating. Margaret Mitchell, who had co-authored the paper and publicly supported Gebru, was fired in February 2021, after Google conducted a search of her corporate accounts looking for policy violations. The Ethical AI team was restructured under a new leader, Marian Croak. Multiple other AI ethics researchers left Google in the subsequent months, and the team's research agenda shifted. Google also announced changes to its internal publication review process — changes framed as improvements, and announced only after the controversy had become public.

The Gebru case illustrates what happens when ethics research genuinely threatens commercial interests. The organization's tolerance for ethics work is revealed. At Google, that tolerance extended to research that produced findings useful for reputation management, useful for attracting talent, or not threatening to core revenue streams. It did not extend to research that challenged core product development directions. When the collision occurred, the resolution was not "adjust the product development direction" but "remove the researcher."

This pattern — ethics work tolerated when decorative, suppressed when consequential — is not unique to Google. Variations of it are documented at Facebook, Twitter, and other major platforms. The Gebru case is notable for its visibility and for the unusual solidarity it produced in the AI ethics research community, which made it a public event rather than an internal one. But the underlying dynamic is common.


Organizational Culture and Moral Disengagement

Diffusion of Responsibility

Social psychologist Albert Bandura's concept of moral disengagement describes the psychological mechanisms through which people who consider themselves ethical come to participate in harmful actions without experiencing the cognitive dissonance that would normally result. Several of these mechanisms are endemic to large tech organizations.

Diffusion of responsibility operates when the number of people involved in a decision makes it possible for each individual to feel they bear only a fraction of the moral weight. In a product launch involving dozens of engineers, data scientists, product managers, legal reviewers, and executives, each individual can reasonably tell themselves they were responsible only for their piece — the code they wrote, the A/B test they designed, the meeting they attended, the approval they gave. No individual feels responsible for the whole, and so the whole lacks a moral owner.

This mechanism is intensified by the specific structure of platform product development. The distance between the engineer who writes the notification code and the 14-year-old who is awakened by a notification at 2 AM is vast: organizational, geographic, demographic, cultural. The engineer cannot see the teenager. The teenager does not know the engineer exists. The harm is real but abstract; the action that produces it is distant and mediated by layers of code, infrastructure, and organizational hierarchy.

Bandura's research on diffusion of responsibility found that it was not merely a rationalization applied after the fact — it was a genuine psychological experience that reduced the activation of moral self-regulatory processes before and during the harmful action. Engineers who feel that they bear only a small fraction of responsibility for a product decision are not lying to themselves; they are accurately describing the organizational structure. But the psychological effect — reduced moral activation — is the same whether the diffusion is real or constructed.

Euphemistic Labeling and the Vocabulary of Platform Culture

Platform culture has developed a rich vocabulary of euphemism that does cognitive work: it shapes what questions are asked, what alternatives are considered, and what feelings are appropriate about one's work.

Features designed to create compulsive checking behavior are described as "engagement optimization." Algorithms that amplify outrage and divisive content are described as "relevance ranking" or "quality content" systems. Users who exhibit behavioral patterns consistent with addiction are described as "highly engaged users" or "power users." Notifications engineered to interrupt sleep are described as "re-engagement campaigns." The design of variable reward schedules, modeled explicitly on slot machine mechanics, is described as "personalizing the user experience."

This language is not merely cosmetic. Cognitive linguistics research — including work by George Lakoff and others on conceptual metaphor — demonstrates that the language we use to describe an activity shapes how we reason about it. An engineer who thinks of their work as "increasing user engagement" is less likely to ask whether the engagement being increased is harmful than an engineer who thinks of their work as "designing systems that will affect billions of people's daily psychological states." The euphemism is not just a PR strategy; it is a cognitive frame that shapes what questions feel relevant.

Advantageous Comparison

A third mechanism from Bandura: the practice of evaluating one's actions by comparison to a worse alternative, rather than against an absolute ethical standard. Platform engineers who feel uncomfortable about specific features often comfort themselves — and this comfort is not entirely illegitimate — by noting that they are working to make the platform better, that without engineers who care the platform would be worse, that the alternative to internal reform is not regulation but a worse version of the same system built by people who care less.

This reasoning is not wholly wrong. Internal reform sometimes works. The presence of engineers who care does make some things better than they would otherwise be. But "better than the worst case" is a low bar, and the logic of advantageous comparison can sustain continued participation in systems that produce substantial harm, indefinitely, as long as one believes that one's presence reduces the harm by some increment.

The psychologist Philip Zimbardo, in his analysis of how ordinary people come to participate in harmful systems (developed partly through his work on the Stanford Prison Experiment and its aftermath), described this as "the Lucifer Effect": the gradual normalization of harmful actions through small steps, each of which seems justifiable by reference to the previous step. The tech industry is not the Stanford Prison Experiment. But the psychological mechanisms that Zimbardo identified — step-by-step rationalization, in-group loyalty, diffusion of responsibility, and the seductive logic of "I'm making things better from the inside" — operate in tech organizations as they do in other institutional settings.

Metric Normalization

Perhaps the most powerful mechanism of moral disengagement in platform culture is the simple normalization of metric-based reasoning. When "did this feature move the engagement metric?" becomes the default evaluative question — asked in every product review, tracked in every performance assessment, reported in every executive dashboard — it becomes increasingly difficult to ask "did this feature make people's lives better?"

The first question has a clean, quantitative answer. The second question is messy, contested, requires longitudinal data, and often lacks methodological consensus. Over time, the first question crowds out the second, not through explicit decision-making but through the gradual displacement of harder questions by easier ones. This is the logic Shoshana Zuboff described in her 2019 book The Age of Surveillance Capitalism under the name "behavioral surplus": user behavior, captured as data beyond what is needed to improve the service, becomes the raw material of the business. Once the data becomes the operative reality, the people it describes become secondary.


Conclusion: Both/And, Not Either/Or

The argument of this chapter is not that individual engineers are blameless because structural forces are powerful. It is that both individual responsibility and structural change are necessary, and that focusing exclusively on either one produces an incomplete — and ultimately ineffective — response to the problems examined in this book.

Individual engineers and designers make choices, every day, that have real consequences. The engineer who raises a concern in a product review meeting, who refuses to implement a feature she believes will cause harm, who writes a concerned memo and sends it up the chain, who starts a Slack channel where colleagues with similar concerns can find each other — these actions matter. They matter morally, because they constitute integrity. They sometimes matter practically, because they can shift specific decisions. And they matter cumulatively, because a culture in which some engineers push back is different from a culture in which no one does.

But individual integrity operating within an unreformed incentive structure does not change the structure. Frances Haugen's disclosure of the Facebook Papers produced remarkable public understanding of the gap between platform public commitments and internal practices. It produced some regulatory attention in Europe and some legislative momentum in the United States. It did not produce fundamental change in how Facebook's algorithms operate, because the algorithms are a product of business model logic and competitive dynamics that congressional testimony alone cannot change.

Dr. Aisha Johnson, in her first week at Velocity Media, asked a good question about notification timing and sleep disruption. Her question was heard — noted, acknowledged, treated as legitimate. A data breakdown will be run. Whether the data breakdown changes the feature decision will depend on organizational factors: whether there is anyone with authority who makes sleep disruption a criterion for product approval, whether the OKR system makes wellbeing a factor in performance assessment, whether the product review process has a formal mechanism for wellbeing concerns to create procedural delays. None of those things currently exist at Velocity.

Aisha knows this. She is drafting a proposal to change it. Whether the proposal succeeds will depend on factors beyond her control: how much organizational support the VP of Trust and Safety can muster, whether the CEO sees structural wellbeing review as a business interest or a business cost, whether external regulatory pressure creates incentives for internal change. She is one person with a mandate and no power. She is also, potentially, the beginning of a process that changes how Velocity Media makes decisions.

The gap between intent and effect is real, and understanding it clearly is necessary for closing it. It will not be closed by assuming engineers are villains. It will not be closed by assuming engineers are heroes. It will be closed by the kind of work that Aisha Johnson is beginning — unglamorous, structural, patient work to change the processes, metrics, and accountability mechanisms that shape what platform systems actually do — combined with the external pressure that has historically been most reliable in motivating structural change: regulatory accountability, legal liability, and the organized pressure of an informed public.

The people who build these systems are human beings making human choices inside human organizations. They are subject to the same psychological mechanisms, the same institutional pressures, and the same moral complexity as human beings in any other institutional setting. Understanding them clearly — their motivations, their constraints, their failures and their genuine possibilities — is not a distraction from the project of changing what they build. It is the beginning of it.


Chapter Summary

This chapter examined the human dimension of engagement-maximizing platform design: who builds these systems, under what pressures, within what organizational cultures, and with what degrees of freedom and constraint.

Key themes: the demographic and educational homogeneity of the tech workforce and its design implications; the role of OKR culture in systematically shaping design toward measurable engagement over unmeasured wellbeing; the moral distancing function of A/B testing as procedural abstraction; the whistleblower record (Haugen, Zhang, and others) and what it reveals about the gap between internal knowledge and organizational action; the structural limitations of ethics hires as illustrated by the Timnit Gebru case; and the mechanisms of moral disengagement — diffusion of responsibility, euphemistic labeling, advantageous comparison, metric normalization — that allow ethical people to participate in harmful systems without experiencing it as such.

The Velocity Media narrative introduced Dr. Aisha Johnson as the book's recurring inside-perspective case study: an ethics professional navigating the gap between her mandate and her organizational power, learning what it means to try to do ethics work inside a system that was not designed to support it.

The chapter concluded with the both/and argument: individual engineers genuinely matter AND structural incentive change is genuinely necessary. The failure to hold both simultaneously — collapsing into either "engineers are villains" or "engineers are helpless cogs" — is one of the primary obstacles to meaningful change.


Next: Chapter 7 — The Attention Economy: Business Models That Monetize Your Mind