Case Study 02: Social Media Algorithms and the Transparency Paradox
Introduction
In the fall of 2021, a former Facebook product manager named Frances Haugen walked out of the company's offices carrying thousands of pages of internal research documents. She shared them with the Wall Street Journal, US Senate investigators, and regulators in several countries. The documents — which became known as the "Facebook Files" — included internal research reports, strategy memos, and engineering analyses. What they revealed was a significant and troubling gap between what Facebook said about its algorithm publicly and what the company knew about its algorithm internally.
Among the most striking findings: Facebook's own researchers had documented, with considerable specificity, that the company's engagement-optimization algorithm had a systematic tendency to amplify divisive, emotionally provocative content — and that the company had considered and largely rejected interventions that would have reduced divisiveness at the cost of reducing engagement. The algorithm was designed to maximize the time users spent on the platform and the intensity of their interactions. It had learned, from vast behavioral data, that outrage and conflict accomplished that goal effectively.
This case study examines the opacity of social media recommendation algorithms, uses the Facebook Files as a window into what that opacity conceals, and considers what meaningful platform transparency would require.
1. How Social Media Recommendation Algorithms Work
Social media platforms all confront the same daily challenge: they have more content than users can possibly consume. At any given moment, a Facebook user's potential feed contains content from hundreds of friends, dozens of Pages they follow, and a vast pool of potential advertising. A Twitter user might follow thousands of accounts. An Instagram user might have thousands of connections. The platform must choose, for each user in each session, which content to surface — and in what order.
The solution is a recommendation system: an algorithmic model that predicts, for each piece of content and each user, some measure of likely engagement, and then ranks the content accordingly. The content predicted to generate the most engagement appears at the top of the feed; content predicted to generate less engagement is buried or omitted entirely.
The engagement signals these systems use vary by platform and evolve constantly, but typically include: how long a user pauses on a post (dwell time); whether the user likes, comments, shares, or reacts; whether the user clicks through to linked content; and how quickly the user scrolls past. Platforms weight these signals differently — Facebook has historically given heavier weight to comments and shares than to simple likes, on the theory that active engagement signals stronger interest than passive approval.
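The weighting described above can be made concrete with a small sketch. The signal names and weights here are invented for illustration — real platform weights are proprietary and change frequently — but the structure shows why weighting comments and shares far above likes tends to favor content that provokes active reactions:

```python
# Hypothetical weighted engagement scoring. Signal names and weights
# are invented for this sketch; real weights are proprietary.
ENGAGEMENT_WEIGHTS = {
    "like": 1.0,
    "comment": 15.0,       # active engagement weighted heavily
    "share": 30.0,
    "dwell_seconds": 0.1,  # small per-second credit for pause time
}

def predicted_engagement_score(predicted_signals: dict) -> float:
    """Combine per-signal predictions into one ranking score.

    `predicted_signals` maps a signal name to the model's predicted
    probability (or expected value, for dwell time) for a user-post pair.
    """
    return sum(
        ENGAGEMENT_WEIGHTS[signal] * value
        for signal, value in predicted_signals.items()
        if signal in ENGAGEMENT_WEIGHTS
    )

# A provocative post: few likes, but high predicted comments and shares.
outrage_post = {"like": 0.10, "comment": 0.08, "share": 0.05, "dwell_seconds": 12.0}
# A pleasant post: many likes, little active engagement.
neutral_post = {"like": 0.40, "comment": 0.01, "share": 0.005, "dwell_seconds": 8.0}

print(predicted_engagement_score(outrage_post))  # the provocative post wins
print(predicted_engagement_score(neutral_post))
```

Even though the neutral post is four times more likely to be liked, the provocative post scores higher, because the heavily weighted active signals dominate the sum.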
Modern recommendation systems are highly sophisticated multi-stage pipelines. A retrieval stage rapidly identifies a large candidate set of potentially relevant content from the full corpus. A ranking stage then applies more complex models to score and order the candidates. The ranking models may be deep neural networks trained on billions of behavioral data points. They are optimized end-to-end to maximize the chosen engagement metric on held-out validation data.
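The retrieval-then-ranking structure can be sketched in miniature. Everything here is a toy stand-in — real systems use learned embedding retrieval and deep ranking models over billions of examples — but the two-stage shape is the same:

```python
# Minimal two-stage recommendation pipeline, with toy data and a
# stand-in scoring function. Illustrative only.
import heapq

def engagement_model(user, post):
    """Stand-in for a learned ranking model: score = topic affinity."""
    return user["affinity"].get(post["topic"], 0.0)

def retrieve(corpus, user, k=500):
    """Stage 1 (retrieval): cheap filter from the full corpus to k candidates."""
    return [p for p in corpus if p["author"] in user["follows"]][:k]

def rank(candidates, user, slots=10):
    """Stage 2 (ranking): expensive per-candidate scoring, best first."""
    return heapq.nlargest(slots, candidates, key=lambda p: engagement_model(user, p))

corpus = [
    {"author": "a", "topic": "politics"},
    {"author": "b", "topic": "sports"},
    {"author": "a", "topic": "cooking"},
    {"author": "c", "topic": "politics"},  # dropped at retrieval: not followed
]
user = {"follows": {"a", "b"}, "affinity": {"politics": 0.9, "sports": 0.4}}
feed = rank(retrieve(corpus, user), user, slots=2)
# Feed leads with the followed post the model scores highest.
```

The division of labor matters: retrieval must be fast enough to scan the whole corpus, so it uses crude filters; ranking can afford an expensive model because it only sees the surviving candidates.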
These systems work in the sense that they are extraordinarily effective at maximizing the metric they are optimized for. They are also highly opaque: the specific weights, features, and interaction patterns that determine why one post is shown before another are not accessible to users, researchers, or regulators. The system's logic is distributed across a massive neural network, updated continuously, and never written down in any form that a human being can read.
2. What Facebook's Internal Research Showed: Engagement Optimization and Divisiveness
The Facebook Files contained internal research documents that shed remarkable light on what the company knew, and when, about the consequences of its engagement-optimization algorithm.
A core finding from Facebook's internal research was that the algorithm had a measurable tendency to promote "content that makes people angry or outraged." Internal researchers documented that content evoking strong negative emotions — fear, anger, disgust — reliably generated more comments, reactions, and shares than content evoking positive emotions or neutral information. Because the algorithm was optimized for engagement, it systematically elevated this emotionally provocative content.
Researchers inside Facebook ran experiments testing whether changing the ranking algorithm could reduce divisiveness and misinformation without proportionally reducing engagement. Some of these experiments showed that meaningful reductions in divisiveness and misinformation were achievable — though often at some cost to engagement metrics. Internal documents show that these findings were presented to senior leadership, including Mark Zuckerberg, and that decisions were made to prioritize engagement over reducing divisiveness in most cases.
A 2019 internal presentation explicitly stated that Facebook's engagement optimization had the side effect of "making angry content more prominent." Another internal document quoted a researcher's conclusion that the algorithm was "one of the main drivers of divisiveness and extremism on [Facebook]."
These were not conclusions reached by outside critics or journalists working with limited information. They were conclusions reached by Facebook's own researchers, working with full access to the company's systems and data, and presented to senior leadership. Yet they did not become the basis for fundamental changes to the algorithm's objective function.
3. The Opacity Problem: What Regulators, Researchers, and Users Could Not See
The significance of the Facebook Files is not only in what they revealed about Facebook's algorithm. It is in what the opacity of that algorithm had prevented anyone outside the company from knowing.
For years before the Haugen disclosure, external researchers, journalists, and public interest advocates had argued that Facebook's engagement-optimization algorithm was amplifying divisive, false, and harmful content. These arguments were largely based on observation of platform behavior — the kinds of content that went viral, the patterns of spread in the 2016 election, the documented role of Facebook in the Rohingya genocide in Myanmar — combined with theoretical reasoning about the incentive structures of engagement optimization.
Facebook consistently denied or minimized these characterizations. Company representatives testified before Congress that the company took these concerns seriously, that it had extensive systems for identifying and removing harmful content, and that it was constantly working to improve the quality of content on the platform. These statements were technically accurate in narrow senses while being profoundly misleading about what the company's own research showed.
The opacity of the algorithm was central to this dynamic. Without access to the internal research and the model's actual behavior, external researchers could observe symptoms but could not demonstrate their cause. They could show that misinformation spread widely on Facebook; they could not prove that the algorithm was designed in a way that systematically promoted it. They could observe that political content was more emotionally charged on Facebook than on other platforms; they could not prove that the algorithm was the cause. The opacity of the algorithm gave Facebook the ability to assert, with apparent plausibility, that these observed patterns were not the company's doing.
4. Facebook's Public Statements vs. Internal Reality: The Ethics Washing Pattern
The contrast between Facebook's public statements about its algorithm and its internal findings represents a clear instance of what this textbook calls "ethics washing" — the practice of making public commitments to ethical principles while internal practice does not honor those commitments.
Facebook's public communications about its algorithm consistently emphasized the company's commitment to meaningful social connection, to reducing misinformation, to connecting people with content they care about, and to building communities. When critics raised concerns about divisiveness, the company pointed to its investments in Trust and Safety infrastructure, its partnerships with third-party fact-checkers, and its stated values.
The internal documents tell a more complicated story. They show a company that had extensive knowledge of the algorithm's propensity to amplify divisiveness, had conducted research establishing that this propensity could be reduced, and had consistently chosen to prioritize engagement metrics over reducing divisiveness. The external narrative of an ethical technology company diligently addressing content quality problems was maintained in the face of internal evidence that the fundamental design of the algorithmic system was working against that goal.
This is not unique to Facebook. It reflects a structural tension common to commercially operated algorithmic platforms: the algorithmic objective (maximize engagement) is determined by commercial incentives (advertising revenue is proportional to user time spent), while the public narrative emphasizes social value. When internal research reveals that the algorithmic objective produces harmful social effects, the organization faces a choice between changing the objective — potentially sacrificing revenue — or maintaining the objective while sustaining a public narrative that obscures the conflict.
The Haugen disclosure made this structural tension visible in a uniquely detailed and documented way. But the structural tension itself is endemic to commercially operated algorithmic platforms.
5. The Haugen Documents: What Disclosure Changed, and What It Didn't
The Frances Haugen disclosure was significant in several ways. It confirmed, with internal documentation, claims that critics had been making for years. It catalyzed congressional hearings that posed more specific and informed questions to Facebook's leadership than had previously been possible. It contributed to the passage of the EU Digital Services Act, which imposed new transparency and algorithmic accountability requirements on large platforms. And it shifted the terms of public debate about social media: from "do these platforms cause harm?" to "how much harm do they cause, and what do they know about it?"
What the disclosure did not do is force fundamental changes to the algorithmic system at issue. Facebook (now Meta) made various adjustments following the disclosure — changes to how political content was ranked, modifications to some viral sharing features — but has not published its core ranking algorithm, does not provide researchers with systematic access to its systems, and continues to operate a recommendation system fundamentally oriented around engagement optimization.
This illustrates an important point about disclosure: disclosure of past misconduct does not automatically produce structural reform. Changing an algorithmic system as central to a company's business model as Facebook's ranking algorithm requires sustained regulatory pressure, not merely public disclosure.
6. Platform Transparency Initiatives and Regulatory Responses
The opacity of social media algorithms has attracted regulatory attention in both the US and Europe, with notably different results.
The Algorithmic Accountability Act. First introduced in the US Congress in 2019 and reintroduced in subsequent sessions, the Algorithmic Accountability Act would require companies above a certain size to conduct impact assessments of high-risk automated decision systems and to report the results to the FTC. The bill has not been enacted, and its prospects have been uncertain in a Congress where technology regulation has been politically contested.
EU Digital Services Act (DSA). The DSA, which entered into force in 2022, with obligations for the largest platforms applying from 2023 and full applicability in 2024, represents the most significant regulatory response to algorithmic platform opacity enacted to date. For "Very Large Online Platforms" (VLOPs) — platforms with more than 45 million monthly active users in the EU — the DSA requires:
- Annual algorithmic risk assessments, examining systemic risks including amplification of illegal content, disinformation, and threats to fundamental rights.
- Independent audits of risk assessments and mitigation measures.
- Access for researchers to platform data, under a structured researcher access regime administered by the European Commission.
- Transparency reports on content moderation.
- The right for users to opt out of recommendation systems based on profiling.
- A prohibition on targeted advertising based on special categories of sensitive data or targeting minors.
The DSA's researcher access provisions are particularly significant. The EU has created a mechanism by which vetted academic researchers can apply to access platform data — including information about algorithmic systems — for legitimate public interest research. This is a meaningful structural intervention that, if implemented effectively, could substantially improve the quality of external research on algorithmic harms.
Platform transparency reports. Several large platforms have voluntarily published "transparency reports" covering content removal, account suspension, and government data requests. These reports have become more detailed over time, but they remain limited in their coverage of algorithmic recommendation systems. Platforms disclose aggregate content removal statistics but not the criteria or logic of their ranking systems.
7. The Researcher Access Problem
Even before the Haugen disclosure, academic researchers studying social media had identified the lack of data access as the fundamental obstacle to understanding algorithmic behavior. To conduct rigorous research on how a recommendation algorithm affects user behavior, political attitudes, news consumption, or mental health, researchers need access to data that platforms control: individual-level data on what users were shown, what they engaged with, and how their feeds were constructed.
Platforms have made very limited data available to researchers through structured programs. Facebook's Social Science One program, launched in 2018 after congressional pressure, promised to provide academic researchers with access to data about URLs shared publicly on the platform. The program faced significant implementation problems, provided less data than promised, and satisfied few of the researchers who participated.
The challenges of research without data access are illustrated by the studies that have been done. Researchers have used surveys, browser plugins, and other indirect methods to study social media effects — but these approaches are limited in scale, subject to selection biases, and cannot directly measure what the algorithm was showing users. The most rigorous studies have been conducted by researchers inside platforms — like the Facebook researchers whose findings appeared in the Haugen documents — who cannot independently publish without the platform's consent.
The DSA's researcher access mechanism addresses this problem at the regulatory level, at least for EU markets. Whether equivalent mechanisms will emerge in the US is uncertain. Some US states have considered data access legislation, and the FTC has taken an interest in the question, but no comparable federal mechanism exists.
8. What Meaningful Platform Transparency Would Look Like
Genuine transparency for social media recommendation algorithms would require interventions at multiple levels:
Algorithmic disclosure. Platforms should be required to disclose, at a level of specificity sufficient for meaningful external analysis, the signals and features their recommendation algorithms use, the relative weights assigned to different signals, and the metrics those algorithms are optimized to maximize. This need not require disclosure of proprietary implementation details — but it must go well beyond current transparency reports, which describe algorithmic systems only at a level of generality that says essentially nothing.
Objective function transparency. Perhaps the most important dimension of algorithmic transparency for social media is the objective function: what is the algorithm trying to maximize? Facebook's decision to optimize primarily for engagement rather than for user welfare, connection quality, or information accuracy is not a neutral technical decision — it is a values choice with large social consequences. That choice should be visible and accountable.
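The point that the objective function is a values choice rather than a technical inevitability can be made vivid in code. The functions and numbers below are hypothetical, for illustration: the same predicted metrics, scored under two different objectives, produce very different rankings:

```python
# Hypothetical sketch: the "objective function" is a one-line values choice.
# All metric names and weights here are invented for illustration.

def engagement_objective(post_metrics):
    """Maximize time and interactions: the commercially favored objective."""
    return post_metrics["expected_dwell"] + 20 * post_metrics["expected_shares"]

def quality_objective(post_metrics):
    """An alternative objective discounting low-credibility sources."""
    return post_metrics["expected_dwell"] * post_metrics["source_credibility"]

# A highly engaging post from a low-credibility source.
post = {"expected_dwell": 30.0, "expected_shares": 0.2, "source_credibility": 0.3}

print(engagement_objective(post))  # high score: the post is promoted
print(quality_objective(post))     # low score: the post is demoted
```

Nothing in the machinery of ranking forces one choice over the other; the choice of which function to maximize is exactly the decision that objective function transparency would make visible and accountable.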
Researcher access infrastructure. Platforms should be required to provide structured access to data — including information about what content was recommended to whom and why — to vetted academic researchers. The EU DSA model provides a starting point; the US would benefit from a comparable framework.
User-facing explanations. Platforms should provide users with accessible explanations of why specific content appears in their feed — which signals drove the recommendation. Several platforms have made partial steps in this direction, providing "why am I seeing this?" explanations for advertisements and some organic content. These explanations are currently insufficient, but the infrastructure exists to make them more informative.
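A minimal sketch of what a more informative "why am I seeing this?" mechanism might do, assuming the ranking system can expose per-signal contributions (the signal names and wording here are hypothetical, not any platform's actual API):

```python
# Hypothetical user-facing explanation generator: surface the
# largest-weighted signals behind a recommendation as plain language.

def explain_recommendation(signal_contributions, top_n=3):
    """Render the top contributing signals as plain-language reasons."""
    reasons = {
        "follows_author": "You follow this account.",
        "friend_engaged": "A friend commented on this post.",
        "topic_affinity": "You often interact with posts about this topic.",
        "high_virality": "This post is popular right now.",
    }
    top = sorted(signal_contributions.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    return [reasons.get(name, name) for name, _ in top]

print(explain_recommendation({
    "friend_engaged": 0.6,
    "topic_affinity": 0.3,
    "high_virality": 0.25,
    "follows_author": 0.1,
}))
# Three reasons, most influential first.
```

The technical obstacle is not rendering the explanation but deciding to expose the underlying contributions at all; the main design question is how faithfully such summaries reflect what the model actually weighted.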
Independent auditing. Platforms should be subject to mandatory, independent algorithmic audits — not just audits of content removal processes, but audits of the recommendation algorithm's effects on user well-being, information quality, and democratic health. These audits should be conducted by qualified independent parties with appropriate data access, and their findings should be public.
Real-time monitoring. During high-stakes periods — elections, public health emergencies, major social crises — platforms should be subject to heightened transparency obligations, with real-time or near-real-time reporting to regulators about how their algorithmic systems are functioning.
The commercial resistance to these measures is predictable and substantial. Platforms argue that transparency would enable bad actors to game their systems, that disclosure of engagement data would undermine user privacy, and that algorithmic systems are proprietary assets that should be protected. These are legitimate concerns, but they are also pretexts for opacity that serves commercial interests at the expense of public accountability. The DSA and similar frameworks demonstrate that transparency requirements can be designed in ways that address legitimate concerns while providing meaningful accountability.
9. Discussion Questions
- Facebook's internal research showed that its engagement-optimization algorithm amplified divisive content, and that internal experts had documented this finding and presented it to leadership. What ethical obligations did the company have once it possessed this internal knowledge? At what point — if ever — does internal knowledge of algorithmic harm create an obligation to disclose that knowledge to regulators, users, or the public?
- Frances Haugen has been praised as a whistleblower who served the public interest and criticized as someone who disclosed confidential business information in breach of her employment obligations. How should we evaluate her actions? What framework should govern disclosures of this kind — and should the legal protections available to whistleblowers apply to people who disclose information about algorithmic harms?
- The EU Digital Services Act requires large platforms to provide researcher access to data about their algorithmic systems. Platform companies argue that this requirement raises privacy concerns (user data might be exposed) and competitive concerns (proprietary systems might be reverse-engineered). How should regulators weigh these concerns against the public interest in understanding how algorithmic systems affect democracy, public health, and user well-being?
- Social media platforms argue that their recommendation algorithms are editorial judgments protected by the First Amendment, and that compelling transparency about those editorial choices would infringe on their constitutionally protected editorial discretion. Evaluate this argument. Does the scale at which algorithmic editorial decisions are made — billions of decisions daily, affecting billions of users — change the constitutional analysis? How does it compare to the editorial discretion exercised by a traditional newspaper editor?
This case study connects to Chapter 13 (Sections 13.5 and 13.6 on institutional opacity and the audit problem), Chapter 10 (Platform Governance and Content Moderation), and Chapter 20 (AI and Democracy). The ethics washing theme is examined in depth in Chapter 8 (AI Ethics Theater).