Chapter 34: Exercises — Platform Content Moderation: Policies, Challenges, Trade-offs

Section A: Conceptual Understanding

Exercise 1: The Moderation Spectrum

Place the following moderation interventions in order from least to most restrictive, and explain why you've placed each where you have:

- Fact-check label applied to a post
- Interstitial warning screen before content plays
- Permanent account termination
- Reduced algorithmic recommendation (downranking)
- Account restricted from posting for 48 hours
- Content removed; account receives no warning
- Friction prompt asking whether the user wants to read an article before sharing it
- Platform-wide ban of an account across all of the company's services

Then: for each intervention, identify a category of content for which it would be the most appropriate response, and explain why more or less severe interventions would be inappropriate.

Exercise 2: False Positives and False Negatives

A content moderation system for health misinformation has the following performance characteristics in testing:

- Out of 10,000 posts containing genuine health misinformation, 7,200 are correctly flagged (removed or labeled)
- Out of 90,000 posts containing legitimate health content, 4,500 are incorrectly flagged

Calculate:

a) The true positive rate (sensitivity/recall)
b) The false positive rate
c) The false negative rate
d) The precision of the system
e) If the system processes 1 million health-related posts per day, how many legitimate posts are incorrectly actioned per day? (Assume the test set's 90% base rate of legitimate content carries over.)
f) If each incorrect action can be appealed at a cost of $2.50 in moderation labor, what is the daily cost of false-positive appeals?
g) What threshold change might reduce false positives, and what would it cost in increased false negatives?
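A quick way to check your arithmetic for parts (a) through (f): the sketch below computes the confusion-matrix quantities directly, carrying the test set's 90% legitimate base rate over to the 1-million-post day as assumed in part (e):

```python
# Confusion-matrix counts from the exercise's test data
TP = 7_200                      # misinformation correctly flagged
FN = 10_000 - TP                # misinformation missed
FP = 4_500                      # legitimate posts incorrectly flagged
TN = 90_000 - FP                # legitimate posts correctly passed

tpr = TP / (TP + FN)            # a) true positive rate (recall) = 0.72
fpr = FP / (FP + TN)            # b) false positive rate = 0.05
fnr = FN / (TP + FN)            # c) false negative rate = 0.28
precision = TP / (TP + FP)      # d) precision ~= 0.615

# e) Scale to 1M posts/day, assuming the same 90% legitimate base rate
daily_false_positives = 1_000_000 * 0.90 * fpr      # 45,000 posts/day
# f) Appeal labor at $2.50 per incorrect action
daily_appeal_cost = daily_false_positives * 2.50    # $112,500/day
print(tpr, fpr, fnr, round(precision, 3), daily_false_positives, daily_appeal_cost)
```

For part (g), note that raising the flagging threshold trades some of the 4,500 false positives in the test set for additional misses beyond the existing 2,800.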

Exercise 3: Platform Policy Comparison

Compare how Facebook, YouTube, and Twitter/X (as of their most recent publicly available policies) handle each of the following:

a) A video claiming that a specific approved vaccine causes autism in a large percentage of children
b) A post from a political candidate claiming the previous election was stolen, with no supporting evidence
c) A documentary including graphic footage of a mass atrocity, hosted by a human rights organization
d) A satirical video that mocks a political figure using fabricated quotes, clearly labeled as satire
e) A post by a credentialed scientist disputing mainstream scientific consensus on a health matter

Exercise 4: The Implied Truth Effect

Section 34.6 discusses the implied truth effect: the phenomenon whereby labeling some misinformation leads users to infer that unlabeled content is accurate.

Design a behavioral experiment that would:

a) Test whether the implied truth effect exists in your target population
b) Measure its magnitude
c) Distinguish the implied truth effect from other potential explanations (e.g., that seeing labels makes users think more carefully about all content)
d) Test whether the effect differs by content domain (health vs. political vs. financial)

Describe: sample, materials, conditions, measures, and analysis plan.

Exercise 5: YouTube Strikes System Analysis

A YouTube channel with 500,000 subscribers produces daily videos. Over 18 months:

- Month 2: Video removed for COVID-19 misinformation (Strike 1)
- Month 3: Channel appeals; appeal denied
- Month 5: Strike 1 expires (strikes expire 90 days after they are issued)
- Month 7: Video removed for promoting a dangerous health practice (Strike 2)
- Month 10: Strike 2 expires
- Month 12: Video removed for incitement to violence (the third strike chronologically, but Strikes 1 and 2 have already expired)

Walk through the YouTube strikes system as described to determine:

a) At what points did the channel face posting suspensions, and for how long?
b) At Month 12, does the third strike result in channel termination? Why or why not?
c) What is the status of the channel at Month 15?
d) Is the strikes system effective at addressing this channel's behavior? What alternative approach might work better?
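To make the expiry logic concrete, here is a minimal sketch of a 90-day strike tracker, with the exercise's months mapped onto arbitrary calendar dates:

```python
from datetime import date, timedelta

STRIKE_LIFETIME = timedelta(days=90)    # strikes expire 90 days after issuance

def active_strikes(strike_dates, on_date):
    """Strikes still counted against the channel on a given date."""
    return [d for d in strike_dates
            if timedelta(0) <= on_date - d < STRIKE_LIFETIME]

# Months 2, 7, and 12 mapped onto the first of each calendar month
strikes = [date(2024, 2, 1), date(2024, 7, 1), date(2024, 12, 1)]
for when in strikes + [date(2025, 3, 1)]:               # ...plus Month 15
    print(when, "active strikes:", len(active_strikes(strikes, when)))
```

The active count on each date determines which penalty tier applies, which is the crux of parts (a) and (b).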


Section B: Applied Analysis

Exercise 6: Moderation Decision Exercise

You are a content moderator reviewing the following posts. For each, identify: (1) the relevant platform policy (choose any major platform), (2) whether you would remove, label, restrict, or take no action, (3) your reasoning, and (4) what information you would need to make a more confident decision.

a) A post claiming "The flu vaccine gives you the flu. My doctor told me to never get it."

b) A video showing a political protest where some protesters are holding signs with profane language, and one demonstrator pushes a police officer who then subdues them.

c) A post by a verified account of a politician saying "Our opponents are using illegal voting machines programmed to steal elections. This is not speculation — it's proven."

d) A thread by an account claiming to be a doctor, advising followers to take high doses of a specific vitamin as a cancer treatment, with links to testimonials but not peer-reviewed research.

e) A historical documentary clip showing actual footage from a World War II concentration camp, posted by a reputable museum.

f) A comedy sketch mocking a religious practice, clearly labeled as satire, posted by a known comedy channel.

Exercise 7: Oversight Board Case Brief

Study one of the Oversight Board's published decisions (all decisions are available at oversightboard.com).

Prepare a case brief covering:

a) The content at issue and the platform's initial decision
b) The Board's analysis and decision (binding or advisory)
c) Whether Meta complied with the decision, and any response Meta issued
d) The Board's reasoning regarding applicable policies and human rights principles
e) Your assessment: was the outcome appropriate? What alternative decision would you have made?

Exercise 8: The Outsourced Moderation Problem

A major US social media platform has 10,000 content moderators: 2,000 employed directly and 8,000 through contracts with outsourcing firms in the Philippines, Kenya, and India.

The platform's direct employees:

- Earn an average of $65,000/year with full benefits
- Have access to 24/7 mental health support, including counseling and rotation away from disturbing content
- Work standard 40-hour weeks

The outsourced contractors:

- Earn an average of $5,000/year with minimal benefits
- Have access to limited counseling services (one hour per week of group sessions; no individual sessions)
- Are expected to review 1,000+ items per 8-hour shift

a) Calculate the annual labor cost differential between the two workforces for the same number of moderators.
b) If the platform were to bring all moderation in-house at US salaries and benefits, estimate the additional annual cost.
c) What ethical obligations does the platform have to outsourced moderators?
d) Design a minimum standard of care for content moderators that addresses both psychological support and fair compensation and could realistically be implemented.
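For parts (a) and (b), the salary arithmetic is straightforward; the benefits multiplier below is an illustrative assumption, not a figure from the exercise:

```python
n_direct, n_outsourced = 2_000, 8_000
direct_salary, outsourced_salary = 65_000, 5_000    # USD/year, from the exercise

# a) Annual salary differential across the outsourced workforce
differential = n_outsourced * (direct_salary - outsourced_salary)   # $480M

# b) Additional cost of bringing all moderation in-house at US terms.
# Benefits and overhead commonly add roughly 30-40% on top of salary;
# 1.35 is an illustrative assumption you should replace with your own.
BENEFITS_MULTIPLIER = 1.35
in_house = n_outsourced * direct_salary * BENEFITS_MULTIPLIER
current = n_outsourced * outsourced_salary
print(f"differential: ${differential:,}; added cost: ${in_house - current:,.0f}")
```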

Exercise 9: The Whack-a-Mole Problem

A coordinated health misinformation campaign operates as follows:

- A central coordinating website produces and updates false health claims
- These claims are distributed through 200 social media accounts across 5 platforms
- When accounts are removed, new accounts are created within 48 hours
- The website is hosted in a jurisdiction with no content regulation

You are the head of trust and safety for the largest of the 5 platforms.

a) What can your platform do unilaterally to disrupt this campaign?
b) What cross-platform coordination mechanisms exist (if any) that you could use?
c) How would you address the coordination infrastructure (the website)?
d) How would you evaluate whether your interventions are effective?
e) What residual harm is likely to persist despite your best efforts?

Exercise 10: Community Notes Analysis

Twitter/X's Community Notes system allows users to submit context notes on posts; a note appears publicly when a sufficient and politically diverse set of contributors rates it helpful.

Analyze Community Notes as a moderation approach:

a) What types of misinformation is Community Notes well suited to address?
b) What types is it poorly suited to address?
c) What potential biases might emerge from crowd-sourced fact-checking? Who might be systematically over- or under-fact-checked?
d) How does Community Notes compare to professional fact-checking on the dimensions of accuracy, speed, scale, and perceived legitimacy?
e) What does "sufficient and politically diverse agreement" mean in practice? What are the failure modes of this requirement?
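Part (e) is easier to reason about with the published scoring model in mind: X has open-sourced the Community Notes ranking algorithm, which scores notes with a matrix factorization in which a note's intercept captures helpfulness not explained by rater viewpoint. The toy below (invented data, drastically simplified model) shows the mechanism: a note rated helpful across both latent camps earns a higher intercept than one rated helpful by a single camp:

```python
import numpy as np

# Toy bridging model: rating ~= mu + rater_bias + note_intercept
# + rater_factor * note_factor. The factor term absorbs agreement that
# is explained by shared viewpoint; the intercept captures cross-camp
# helpfulness. All data here are invented for illustration.
rng = np.random.default_rng(0)
n_raters = 40
camp = np.repeat([1.0, -1.0], n_raters // 2)         # two viewpoint camps

ratings = np.zeros((n_raters, 2))
ratings[:, 0] = (camp > 0).astype(float)             # note 0: one camp loves it
ratings[:, 1] = (rng.random(n_raters) < 0.85) * 1.0  # note 1: both camps like it

mu, b_u, b_n = 0.0, np.zeros(n_raters), np.zeros(2)
f_u, f_n = rng.normal(0, 0.1, n_raters), rng.normal(0, 0.1, 2)
lr, lam = 0.05, 0.03
for _ in range(4000):                                # plain gradient descent
    pred = mu + b_u[:, None] + b_n[None, :] + np.outer(f_u, f_n)
    err = ratings - pred
    mu += lr * err.mean()
    b_u += lr * (err.mean(axis=1) - lam * b_u)
    b_n += lr * (err.mean(axis=0) - lam * b_n)
    f_u += lr * (err @ f_n / 2 - lam * f_u)
    f_n += lr * (err.T @ f_u / n_raters - lam * f_n)

print("note intercepts:", b_n.round(2))   # the cross-camp note scores higher
```

A failure mode worth raising in part (e): a true, useful note on a genuinely polarizing claim may never earn cross-camp agreement, and so never clears the display threshold.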


Section C: Policy Design

Exercise 11: Design a Fact-Check Label System

Given the evidence on fact-check label effectiveness and the implied truth effect, design a fact-check labeling system for a news aggregation platform that:

a) Addresses the implied truth effect
b) Minimizes label fatigue
c) Is scalable (can handle millions of posts per day without reviewing each individually)
d) Provides adequate notice to speakers whose content is labeled
e) Includes a meaningful appeals process

Evaluate your design: what trade-offs did you make? What evidence would you need to determine whether the system achieves its goals?

Exercise 12: Oversight Board Redesign

The Facebook Oversight Board has been criticized for its limited scope (it reviews only referred cases), small capacity (dozens of cases per year), advisory-only policy recommendations, and funding dependency on Meta.

Redesign the Oversight Board to address these limitations:

a) How should the Board's scope be expanded without making its caseload operationally impossible?
b) How should binding authority over policy recommendations be structured?
c) How should independence from Meta be better protected?
d) What resources would the reformed Board require?
e) Should a reformed board apply to Meta only, or should it be an industry-wide body? What governance challenges does each approach create?

Exercise 13: Content Moderation Policy for a New Platform

You are founding a new social media platform and need to write a content moderation policy for health information. Your platform will launch with 1 million users.

Your policy must:

a) Define what health misinformation is covered and what is not
b) Specify the moderation actions available
c) Establish the process for making moderation decisions
d) Provide for user appeals
e) Specify transparency reporting requirements

Then: identify the top three ways your policy is likely to fail in practice and what you would do to address them.

Exercise 14: The Scale Problem — Technical Design

You have been hired to design the automated moderation pipeline for a platform that processes 10 million posts per day in 50 languages. Your budget allows for:

- Human review capacity: 100,000 posts per day
- Automated system: unlimited throughput, but with a ~5% false positive rate and a ~15% false negative rate

Design a pipeline that:

a) Allocates human review capacity to the posts where it will have the most impact (a triage sketch follows this exercise)
b) Minimizes both false positive and false negative rates for the most serious categories of content
c) Handles cross-linguistic content appropriately
d) Creates feedback loops that improve automated system accuracy over time

What additional resources would make your pipeline meaningfully better?
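As a starting point for part (a), the sketch below triages posts by the expected impact of a human look. The category names, severity weights, and boundary-uncertainty heuristic are illustrative assumptions, not a known production design:

```python
import heapq

# Illustrative severity weights: the cost of a wrong automated call
SEVERITY = {"incitement": 10.0, "health_misinfo": 5.0, "spam": 1.0}
HUMAN_BUDGET = 100_000          # reviews per day, from the exercise

def review_value(p_violation: float, category: str, expected_reach: float) -> float:
    """Expected benefit of human review. The classifier is most likely to
    be wrong near its decision boundary, so uncertainty peaks at p = 0.5."""
    uncertainty = 1.0 - abs(p_violation - 0.5) * 2.0
    return SEVERITY[category] * uncertainty * expected_reach

def select_for_review(posts):
    """posts: iterable of (post_id, p_violation, category, expected_reach).
    Keeps the HUMAN_BUDGET highest-value posts using a bounded min-heap."""
    heap = []  # (value, post_id) pairs; least valuable at the root
    for post_id, p, cat, reach in posts:
        item = (review_value(p, cat, reach), post_id)
        if len(heap) < HUMAN_BUDGET:
            heapq.heappush(heap, item)
        else:
            heapq.heappushpop(heap, item)   # evict the least valuable
    return [post_id for _, post_id in heap]
```

A side effect helps with part (d): every human decision on a selected post is a labeled example near the classifier's decision boundary, which is exactly where retraining data is most valuable.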

Exercise 15: Moderator Mental Health Standards

Draft minimum standards for psychological support for content moderators, modeled on workplace health and safety regulations in a jurisdiction of your choice.

Your standards should address:

a) Exposure limits: the maximum number of graphic content items reviewed per shift and per day
b) Mandatory break and rotation requirements
c) Mental health support services that must be provided
d) Evaluation mechanisms for identifying moderators experiencing harm
e) Transparency requirements for platforms to report on moderator wellbeing
f) Applicability to outsourced and contracted workers


Section D: Research and Investigation

Exercise 16: Platform Transparency Report Analysis

Download and analyze a recent transparency report from a major platform (Meta, Google, TikTok, or X).

a) What categories of content removal are reported, and at what level of specificity?
b) Which languages and countries show the highest removal rates? What might explain this distribution?
c) What government removal requests are reported? Which countries make the most requests?
d) What information is NOT reported that would be necessary for meaningful accountability?
e) Compare the platform's stated policies against its reported actions: are there categories where the numbers seem inconsistent with stated policy?

Exercise 17: Investigating Moderation Consistency

Using any accessible dataset of content moderation decisions (several researchers have published such datasets; check repositories at Harvard Dataverse or ICWSM proceedings), test whether moderation decisions are consistent across:

a) Content topics (does political content receive different treatment than health content? A starter test is sketched below)
b) Account demographics (where identifiable from public information)
c) Viral status (does content with high engagement receive different treatment?)
d) Language

Document your methodology and findings. What limitations does your analysis have?
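For part (a), a natural first pass is a chi-square test of independence between topic and moderation action. The counts below are invented; a significant result shows differential treatment, not necessarily inconsistent enforcement, because true violation rates may differ by topic:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts: rows are topics, columns are (removed, not removed)
table = np.array([
    [120, 880],     # political content
    [210, 790],     # health content
])
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.2g}")
# Next step: regress action on topic while controlling for engagement,
# language, and a severity proxy, so the comparison is like-for-like.
```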

Exercise 18: The Musk Acquisition Natural Experiment

The Twitter/X acquisition in October 2022 created a natural experiment: a major change in content moderation approach on a large platform, with some pre-acquisition baseline data available.

Research what happened to the following metrics after the acquisition (using published academic research, news investigations, or available data):

a) Hate speech prevalence (e.g., slur usage)
b) Advertiser revenue and participation
c) User account activity (active users, posting rates)
d) Fact-check label coverage
e) Coordinated inauthentic behavior detection

Evaluate: what do these findings tell us about the relationship between content moderation investment and platform information quality?

Exercise 19: Reading a Moderation System from the Outside

Choose a social media platform you use regularly. Without access to internal systems, attempt to reverse-engineer its content moderation approach by:

a) Testing what content the algorithm surfaces and what it does not
b) Observing which posts in your own feed (if any) carry labels or warnings
c) Reporting specific categories of content and observing how quickly (if at all) action is taken
d) Examining the platform's public-facing policies and transparency reports

What can and cannot be learned about moderation systems from the outside? What information would require internal access to evaluate?

Exercise 20: Comparative Moderation Across Jurisdictions

The same piece of content may be moderated differently on different versions of a platform operating under different national laws.

Research how TikTok's content moderation differs between its Chinese version (Douyin) and its international version. Identify:

a) Categories of content available on TikTok internationally but not on Douyin
b) Categories of content available on Douyin but not on international TikTok
c) What this comparison reveals about the relationship between platform governance and national regulatory context
d) Whether TikTok's international version is affected by its Chinese ownership in detectable ways (examine existing research, not speculation)


Section E: Argumentation and Critical Analysis

Exercise 21: The Oversight Board Debate

The Oversight Board has been called both "Facebook's most important structural reform" (by advocates) and "a PR stunt designed to deflect regulatory pressure" (by critics).

Write a 750-word assessment of the Oversight Board that:

a) Acknowledges the strongest version of the critics' argument
b) Acknowledges the strongest version of the advocates' argument
c) Takes a position on balance: does the Oversight Board represent meaningful accountability, and why?
d) Proposes a specific structural reform that would address the critics' most compelling concern

Exercise 22: Moderator Testimony Preparation

You are a former content moderator who worked for an outsourcing firm reviewing content for a major social media platform. You have been asked to testify before a legislative committee examining working conditions for content moderators.

Prepare testimony that:

a) Describes the content you were required to review and its psychological impact
b) Describes the support (or lack thereof) you received
c) Explains the structural reasons why your employer had limited incentive to improve conditions
d) Proposes specific legal requirements that would protect future moderators

Exercise 23: The Borderline Content Debate

YouTube's borderline content policy reduces algorithmic amplification of content that approaches but does not clearly violate its rules. A YouTube creator has had their videos placed in the "borderline" category without being notified.

Make the strongest possible case for each of the following positions:

a) YouTube's borderline content policy is an appropriate exercise of editorial discretion that does not require notification or appeals.
b) YouTube's borderline content policy, as applied without notification or meaningful appeals, is a form of censorship requiring reform.
c) The borderline content policy should be replaced with a system that labels all borderline content for users while removing algorithmic restrictions.

Exercise 24: Scale and Justice

A fundamental tension in content moderation is between the demands of justice (individualized, contextual decisions) and the demands of scale (automated, categorical decisions).

At a scale of 10 million posts per day with a 1% policy violation rate, roughly how many posts per day require a moderation decision? If each human review takes 10 minutes, how many full-time moderators would be needed to review all borderline cases?

Calculate, then discuss:

a) Is meaningful individualized review possible at this scale?
b) What compromises between justice and scale are acceptable?
c) How should error rates be distributed: is it worse to err on the side of over-removal or under-removal, and does the answer depend on the type of content?
d) Is there a technology development path that could make more individualized review feasible? What would it require?
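The arithmetic itself is quick to run; the shift length and working days per year below are assumptions, not figures from the exercise:

```python
posts_per_day = 10_000_000
violation_rate = 0.01
minutes_per_review = 10

reviews_per_day = posts_per_day * violation_rate                   # 100,000/day
review_hours_per_day = reviews_per_day * minutes_per_review / 60   # ~16,667 hours

HOURS_PER_SHIFT = 8        # assumption
WORKDAYS_PER_YEAR = 250    # assumption: weekends, leave, training
seats_needed_daily = review_hours_per_day / HOURS_PER_SHIFT        # ~2,083 seats
fte_headcount = seats_needed_daily * 365 / WORKDAYS_PER_YEAR       # ~3,042 FTEs
print(round(seats_needed_daily), round(fte_headcount))
```

Roughly three thousand full-time moderators, before accounting for quality review, appeals, or language coverage, is the baseline against which to weigh part (a).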

Exercise 25: Designing Accountability Without Censorship

You are advising a democratic government that wants to improve platform content moderation without engaging in government censorship.

Design a regulatory framework that:

a) Does NOT require platforms to remove any specific categories of content
b) DOES create meaningful accountability for how platforms make moderation decisions
c) Addresses the documented under-moderation of non-English content
d) Provides meaningful recourse for users whose content has been incorrectly removed
e) Creates incentives for investment in content moderation capacity

What are the limits of this approach? What harms would it be unable to address?


Reflection Exercises

Exercise 26: Your Own Content Moderation

Recall a time when your content was removed, labeled, or restricted on a social media platform, or when you reported content and no action was taken.

a) What happened? What was the content, and what action (or non-action) did the platform take?
b) Did you understand why? Did the platform's notification give adequate information?
c) Did you appeal? If so, what was the outcome?
d) In retrospect, was the decision correct? Why or why not?
e) What would a better process have looked like?

Exercise 27: The Hidden Moderator

Section 34.8 describes the psychological harm experienced by content moderators. Before reading this chapter, were you aware that a human workforce reviews disturbing content as a condition of your social media use?

a) Does knowing about this hidden workforce change how you think about using social media? How?
b) As a user, what obligations (if any) do you have toward the workers who keep platforms safe?
c) What would you be willing to pay (in subscription fees, reduced algorithmic targeting, or other terms) to ensure content moderators receive fair wages and adequate mental health support?