Case Study 37.1: LLM-Generated News Articles and the Rise of Content Farms


Overview

In 2023, the media monitoring organization NewsGuard documented something that researchers in the disinformation space had been anticipating since the release of GPT-3 in 2020: the large-scale emergence of news websites operating entirely or primarily on AI-generated content. These sites had little or no human editorial staff, published hundreds or thousands of articles per week, and were monetized through programmatic advertising.

NewsGuard's researchers identified over 400 such sites by mid-2023, a number that continued to grow through the year. The sites ranged from generic "news aggregator" platforms covering everything from celebrity gossip to foreign policy, to more targeted operations producing what appeared to be local or regional news coverage. Some were nakedly commercial — optimization vehicles for advertising revenue — while others appeared to have additional goals, including partisan influence and the exploitation of specific geographic information vacuums.

This case study examines the structure, operation, and impact of AI-generated content farm journalism, with particular attention to the local news variant and its implications for community information infrastructure.


Background: The Local News Context

To understand why AI-generated local news sites represent a specific category of threat rather than simply another form of low-quality online content, it is necessary to understand the collapse of the local journalism ecosystem in which they are appearing.

Between 2004 and 2023, the United States lost approximately 2,500 local newspapers. In that period, newspaper newsroom employment fell by more than 57 percent. The losses were not uniform: large metropolitan newspapers retained more resources while smaller regional and community papers — the ones covering city council meetings, school board elections, local zoning disputes, and municipal budget decisions — collapsed at the highest rates.

The practical consequence of this collapse is a genuine informational vacuum. In hundreds of American counties and smaller cities, there is now no professional journalist who routinely covers local government. City council meetings happen without a reporter present. Zoning variances are granted without public coverage. Local elections occur with minimal civic information available to voters. School board decisions — often among the most consequential local government actions affecting community members directly — are made without journalistic accountability.

This vacuum is not experienced by community members as an absence. People do not notice the absence of news they never had. What they experience instead is a generalized sense that local news is available if they look for it, uncertainty about where to find good local coverage, and a willingness to accept apparently local sources as legitimate if they can find them.

AI-generated local news sites are designed, whether deliberately or by opportunistic coincidence, to fill this perceived gap with content that appears to be local journalism.


How AI-Generated Local News Sites Operate

The operational structure of AI-generated local news sites, as documented through multiple research investigations, follows a recognizable pattern.

Infrastructure: Sites are established under domain names designed to suggest local or regional coverage — names like "The [City] Reporter," "[County] Daily News," or "[State] Voice" are common patterns. They are typically built on standard content management systems (WordPress is disproportionately common) and visually styled to resemble conventional news sites, with masthead graphics, category navigation, and article formats indistinguishable from legitimate local journalism at a glance.

Content generation: Articles are generated using commercial LLM APIs or consumer interfaces. The generation is typically semi-automated: a human operator may set general topic areas or geographic focuses, and the system generates articles on those topics using a mix of prompted generation and automated summarization of wire service or other publicly available content. Some operations appear to have a human editor performing light review; others appear to publish generated content with no human review whatsoever.

Content mix: Importantly, not all articles on these sites are false or even inaccurate. Research into the content of identified AI-generated news sites reveals a typical mix: a substantial proportion of the content is accurate (or at least not identifiably false) — generic lifestyle articles, rewritten summaries of real news events, wire service material reformatted. The false or manipulative content — partisan fabrications, stories promoting specific political positions, content serving influence operation goals — is mixed into this base of credible-seeming material.

This content mix is not accidental. It is a trust-building strategy. Readers who encounter several accurate articles are more likely to extend credibility to subsequent articles, including inaccurate ones.

Monetization: The primary revenue model for commercially motivated sites is programmatic advertising — the same technology that funds legitimate news sites. Ad networks have historically not screened for content authenticity, only for traffic volume and basic content categories. A site publishing at high article volume accumulates high page views, and page views produce advertising revenue regardless of whether the articles are authentic. In documented cases, AI-generated local news sites were generating revenue from major consumer brands through automated ad placement, with those brands having no knowledge that their advertising was funding fabricated content.
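The economics of this model can be made concrete with a rough back-of-envelope calculation. The sketch below is illustrative only: every parameter value (traffic per article, ad rates, token costs) is an assumption for demonstration, not a figure documented in the research cited above.

```python
# Back-of-envelope model of content-farm economics.
# All parameter values are illustrative assumptions, not documented figures.

def monthly_ad_revenue(articles_per_day: int,
                       pageviews_per_article: float,
                       rpm_usd: float) -> float:
    """Estimated monthly programmatic ad revenue.

    rpm_usd: revenue per 1,000 page views (ad-industry "RPM").
    """
    monthly_pageviews = articles_per_day * 30 * pageviews_per_article
    return monthly_pageviews / 1000 * rpm_usd

def monthly_generation_cost(articles_per_day: int,
                            tokens_per_article: int,
                            usd_per_million_tokens: float) -> float:
    """Estimated monthly LLM API cost for generating the articles."""
    monthly_tokens = articles_per_day * 30 * tokens_per_article
    return monthly_tokens / 1_000_000 * usd_per_million_tokens

# Hypothetical operation: 100 articles/day, modest traffic, cheap API tier.
revenue = monthly_ad_revenue(articles_per_day=100,
                             pageviews_per_article=200,
                             rpm_usd=2.0)   # 600,000 views -> $1,200/month
cost = monthly_generation_cost(articles_per_day=100,
                               tokens_per_article=1500,
                               usd_per_million_tokens=1.0)  # -> $4.50/month
```

Even under these deliberately modest assumptions, generation cost is a rounding error against revenue, which is the structural point: volume is nearly free, so the only binding constraint is attracting page views.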


Case Analysis: The Pink Slime Journalism Pattern

Researchers have noted that AI-generated local news sites echo and amplify a pattern that preceded generative AI: "pink slime" journalism, a term coined around 2019 to describe networks of ostensibly local news sites that were actually producing partisan political content centrally, with a local veneer.

The original pink slime operations — documented by the Columbia Journalism Review, the Tow Center for Digital Journalism, and others — typically employed human writers producing locally branded partisan content at scale. Sites like Metric Media's network produced thousands of local-branded news articles per month, each formatted to appear as authentic local news but actually produced by a small central staff with partisan or commercial motivations.

The advent of LLM-based generation has lowered the operational cost of this model to near zero. Where the original pink slime networks required hundreds of human writers to maintain their volume, LLM-equipped successors can produce equivalent or greater volume with a single operator and a modest API budget. The geographic granularity available to the AI model — its ability to reference specific local place names, institutions, and relevant context — is sufficient to maintain the appearance of local coverage for audiences who are not themselves embedded in the specific community.

The specific harm of this model is its substitution effect on civic knowledge. A resident seeking information about a local school board election who finds an AI-generated "local news" article about that election does not simply encounter false information; the article fills the role in their information landscape that authentic local journalism would have occupied, reducing their motivation to seek other sources and leaving them liable to act on fabricated information.


Documented Impact

The full impact of AI-generated local news sites on civic information and political behavior is difficult to measure, because measurement requires identifying AI-generated content that has been deliberately designed to resist identification. However, several documented cases illustrate the range of potential effects.

Fabricated local election coverage: In the 2022 and 2023 election cycles, multiple research groups identified AI-generated sites producing fabricated coverage of local races — reporting on fictional candidate positions, fabricated endorsements from local organizations, and invented controversy about candidates. In at least some documented cases, the fabricated content was cited and shared by real community members who had found the sites through search and had no means of evaluating their authenticity.

Local zoning and development disputes: AI-generated sites have been identified producing content on local real estate and development issues, including content that appeared to serve the interests of specific development projects or to generate opposition to others. In one documented pattern, sites producing apparently oppositional "community concerns" content about development projects were traced to domain registrations associated with political consulting firms with interests in those projects.

Health misinformation in local framing: Several identified AI-generated sites have produced health misinformation — anti-vaccine content, COVID treatment misinformation — framed as local news rather than national-level commentary. Local framing adds apparent relevance ("health officials in [your county] warn...") and reduces the likelihood that readers will look for corroborating national coverage.


Analysis: Applying the Course Framework

Analyzing AI-generated local news sites through the propaganda analysis framework developed in this course yields several observations:

Source and channel analysis (Chapter 9): The defining characteristic of AI-generated local news sites, from a source analysis perspective, is the absence of identifiable editorial accountability. Authentic local journalism involves named editors, identifiable reporters with verifiable professional histories, institutional affiliations, and in some cases local ownership or community roots. AI-generated sites typically have no identifiable editorial staff, generic "about us" pages with vague descriptions of mission, and domain registrations that obscure ownership. Source analysis focused on editorial accountability is therefore the most reliable identification strategy.
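The accountability signals described above can be expressed as a simple checklist. The sketch below is a hypothetical scoring heuristic over manually collected site metadata: the field names and weights are illustrative choices for this example, not a validated detection method.

```python
# Hypothetical checklist of editorial-accountability signals.
# Signal names and weights are illustrative, not a validated classifier.

ACCOUNTABILITY_SIGNALS = {
    "named_editor_listed":   2,  # masthead names a responsible editor
    "bylines_with_history":  2,  # reporters have verifiable work elsewhere
    "specific_about_page":   1,  # mission page names real people and places
    "ownership_disclosed":   2,  # owner identifiable (not privacy-proxied)
    "local_street_address":  1,  # physical address in the covered community
    "corrections_policy":    1,  # published corrections/ethics policy
}

def accountability_score(site: dict) -> int:
    """Sum the weights of the signals a site exhibits (higher = more accountable)."""
    return sum(weight for signal, weight in ACCOUNTABILITY_SIGNALS.items()
               if site.get(signal, False))

suspect = {"specific_about_page": False, "ownership_disclosed": False}
legit = {signal: True for signal in ACCOUNTABILITY_SIGNALS}

print(accountability_score(suspect))  # 0
print(accountability_score(legit))    # 9
```

The point of the sketch is the shape of the check, not the numbers: sites documented by NewsGuard tend to fail nearly every item on such a list, while even small legitimate outlets typically satisfy most of them.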

Propaganda technique analysis (Chapter 10): The propaganda techniques employed by AI-generated content are the standard techniques analyzed throughout the course — not novel AI-specific techniques. This confirms Section 37.10's argument that technique-based inoculation transfers. The specific techniques vary by the site's purpose: commercially motivated content farms tend to rely on emotional engagement and clickbait framing; politically motivated sites tend to employ selective emphasis, appeal to authority, and fear appeals.

Manufactured consensus (Chapter 14): Multiple AI-generated sites covering the same topic from different apparent geographic origins can create the impression of widespread national concern about a particular issue — each appearing to be an independent local source but actually produced by a coordinated operation. This is manufactured consensus at a geographic scale that required an enormous coordinated human operation in the pre-AI era.
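One way researchers attribute such networks is by grouping apparently independent domains that share technical fingerprints, such as the same advertising or analytics account ID appearing in their page source. The sketch below shows only that grouping step; the domain names and IDs are invented for illustration.

```python
from collections import defaultdict

# Invented example data: domains mapped to the ad/analytics account IDs
# observed in their page source. Any shared ID links two domains.
observations = {
    "springfield-voice.example": {"pub-111", "UA-900"},
    "rivertown-daily.example":   {"pub-111"},
    "lakeside-reporter.example": {"UA-900"},
    "independent-site.example":  {"pub-222"},
}

def cluster_by_shared_ids(obs: dict) -> list:
    """Group domains into clusters connected by any shared tracking ID."""
    id_to_domains = defaultdict(set)
    for domain, ids in obs.items():
        for tracking_id in ids:
            id_to_domains[tracking_id].add(domain)
    # Merge groups that overlap, so transitively linked domains cluster together.
    clusters = []
    for group in id_to_domains.values():
        group = set(group)
        overlapping = [c for c in clusters if c & group]
        for c in overlapping:
            clusters.remove(c)
            group |= c
        clusters.append(group)
    return clusters

for cluster in cluster_by_shared_ids(observations):
    print(sorted(cluster))
```

In this toy data, three of the four "independent local" domains collapse into a single cluster: exactly the kind of evidence that turns an impression of distributed consensus into a documented coordinated operation.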

The industrial propaganda model (Chapter 12): The pattern of AI-generated local news content farms is structurally analogous to the Nazi propaganda ministry's use of centrally controlled regional press — local-appearing outlets that actually published centrally directed content. The Nazis maintained hundreds of regional newspapers that appeared to be independent but operated under Ministry of Propaganda editorial control. AI-generated content farms replicate this structure with far lower operational cost and without requiring the coercive state power that Goebbels needed to maintain press coordination.


Discussion Questions

  1. The chapter distinguishes between commercially motivated AI-generated content farms (optimizing for advertising revenue) and politically motivated ones (serving influence operation goals). Does this distinction matter for the harm they cause? Why or why not?

  2. The "trust-building content mix" strategy — mixing accurate content with false content to build source credibility — is a long-standing propaganda technique (discussed in Chapter 9). How does AI generation change the economics of this strategy? What are the implications for how readers should evaluate apparently accurate content from sources with uncertain credentials?

  3. Platform advertising networks are the primary revenue source for many AI-generated content farms. What responsibilities, if any, do those networks have for the content they fund? What practical steps could they take to reduce funding to AI-generated disinformation sites?

  4. The chapter describes local news deserts as creating the specific vulnerability that AI-generated local news exploits. What does this suggest about the relationship between the collapse of local journalism and the vulnerability to disinformation? Is rebuilding local journalism infrastructure a counter-disinformation strategy?

  5. Apply the lateral reading technique (Chapter 28) to the AI-generated local news problem. What would a reader using lateral reading discover when investigating a suspected AI-generated local news site? What limitations does lateral reading face in this specific context?


Further Research

  • NewsGuard's "AI-Generated News Sites" research series (2023–ongoing)
  • Tow Center for Digital Journalism, "The Expanding News Desert" reports
  • Columbia Journalism Review's coverage of "pink slime" journalism networks
  • Renée DiResta, "The Supply of Disinformation Will Soon Be Infinite," The Atlantic (2020)
  • Stanford Internet Observatory research on coordinated inauthentic behavior patterns