Chapter 22 Further Reading: Natural Language Processing for Misinformation Detection

Annotated Bibliography

Sources are organized by topic. Entries marked (*) are especially recommended for those new to the field. Academic papers are listed with full citations; all are available through Google Scholar or direct links.


Foundational NLP and Machine Learning

1. (*) Jurafsky, Daniel, and James H. Martin. Speech and Language Processing (3rd ed., draft), 2023. Available free at web.stanford.edu/~jurafsky/slp3/

The standard NLP textbook, comprehensively covering tokenization, language models, word embeddings, transformers, and information extraction. Chapters 6 (vector semantics and embeddings), 10 (transformers), and 11 (BERT and fine-tuning) are directly relevant to this chapter. The draft third edition includes updated coverage of neural methods. Essential background for anyone planning to implement NLP systems.


2. Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding." NAACL-HLT, 2019, pp. 4171–4186.

The BERT paper that initiated the transformer fine-tuning paradigm in NLP. Essential reading for understanding the masked language modeling pre-training objective, the [CLS] token representation for classification, and the GLUE benchmark results that demonstrated BERT's versatility. The paper's clarity makes it accessible to graduate students with ML background. Available on arXiv (1810.04805).
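
The masked language modeling objective the paper describes is easy to sketch in isolation. The function below is an illustrative simplification over a plain token list (real BERT operates on WordPiece subwords): roughly 15% of positions are selected, and each selected token is replaced with [MASK] 80% of the time, a random vocabulary token 10% of the time, and left unchanged 10% of the time.

```python
import random

def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """BERT-style masking sketch. Returns (inputs, labels), where labels
    holds the original token at masked positions and None elsewhere, so
    the MLM loss is computed only at the selected positions."""
    rng = rng or random.Random()
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)            # model must reconstruct this token
            r = rng.random()
            if r < 0.8:
                inputs.append("[MASK]")   # 80%: replace with mask token
            elif r < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: random token
            else:
                inputs.append(tok)        # 10%: keep the original token
        else:
            labels.append(None)           # position not scored in the loss
            inputs.append(tok)
    return inputs, labels

tokens = "the claim was rated false by fact checkers".split()
inputs, labels = mask_tokens(tokens, vocab=tokens, rng=random.Random(0))
```

The mixed replacement strategy (mask/random/keep) prevents a train-test mismatch: [MASK] never appears at fine-tuning time, so the model must also learn useful representations for unmasked tokens.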


3. Vaswani, Ashish, et al. "Attention Is All You Need." NeurIPS, 2017, pp. 5998–6008.

The paper introducing the transformer architecture. Somewhat technical, but important for understanding why transformers replaced RNNs as the dominant NLP architecture. The self-attention mechanism is explained mathematically in Section 3, and grasping it provides the foundation for why BERT, GPT, and all subsequent large language models work the way they do. Available on arXiv (1706.03762).
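
The core computation, scaled dot-product attention, fits in a few lines of plain Python. This is a deliberately naive single-head sketch over lists of vectors, without the learned projection matrices or batching of a real implementation:

```python
import math

def softmax(xs):
    m = max(xs)                          # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, computed
    naively. Q, K, V are lists of d_k-dimensional vectors, one per token."""
    d_k = len(K[0])
    outputs = []
    for q in Q:
        # similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)        # attention distribution over positions
        # output = attention-weighted average of the value vectors
        outputs.append([sum(w * v[j] for w, v in zip(weights, V))
                        for j in range(len(V[0]))])
    return outputs
```

A useful sanity check: when all keys are identical, every score ties, the softmax is uniform, and each output is simply the mean of the value vectors.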


Fake News Datasets and Benchmarks

4. (*) Wang, William Yang. "'Liar, Liar Pants on Fire': A New Benchmark Dataset for Fake News Detection." ACL, 2017, pp. 422–426.

The paper introducing the LIAR dataset. Short (5 pages) and clearly written. Describes dataset construction, presents baseline results using LSTM and CNN models with and without metadata, and honestly discusses limitations including the PolitiFact selection bias and the difficulty of the task. Required reading alongside Case Study 22-1. Available on arXiv (1705.00648).


5. Thorne, James, et al. "FEVER: A Large-Scale Dataset for Fact Extraction and VERification." NAACL, 2018, pp. 809–819.

The paper introducing the FEVER benchmark. Describes claim generation methodology, annotation process, evaluation metrics (including the strict FEVER score), and baseline results. The dataset's construction methodology — using Wikipedia mutations — is described in detail and is important for understanding the limitations discussed in Case Study 22-2. Available on arXiv (1803.05355).
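
The strict FEVER score can be sketched in a few lines. The data structures below (dicts with "label" and "evidence" keys, evidence as sets of (page, sentence-id) pairs) are invented for illustration and do not match the official scorer's JSON format:

```python
def fever_score(predictions, gold):
    """Simplified strict FEVER score (a sketch of the metric described in
    the paper, not the official scorer). A claim counts as correct only if
    the predicted label matches AND, for SUPPORTS/REFUTES claims, the
    predicted evidence contains at least one complete gold evidence group."""
    correct = 0
    for pred, ref in zip(predictions, gold):
        if pred["label"] != ref["label"]:
            continue                      # wrong label: never credited
        if ref["label"] == "NOT ENOUGH INFO":
            correct += 1                  # no evidence requirement for NEI
        elif any(group <= pred["evidence"] for group in ref["evidence_groups"]):
            correct += 1                  # one full gold group was retrieved
    return correct / len(gold)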


6. Shu, Kai, et al. "FakeNewsNet: A Data Repository with News Content, Social Context and Spatialtemporal Information for Studying Fake News on Social Media." Big Data, 7(3), 2019.

Describes the FakeNewsNet dataset combining news content with social context information (engagement, diffusion networks, publisher credibility). Important for research integrating textual and network features. The paper also proposes a taxonomy of fake news and discusses the data collection and verification methodology.


Classical Approaches to Misinformation Detection

7. Pérez-Rosas, Verónica, et al. "Automatic Detection of Fake News." COLING, 2018, pp. 3391–3401.

A well-structured paper demonstrating that stylometric and content features (n-grams, readability, sentiment) achieve competitive performance on fake news detection. Important for establishing that classical ML approaches are not obsolete compared to deep learning, and for demonstrating which feature types are most informative.
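
Features of this kind can be computed with nothing but the standard library. The sketch below is illustrative only: the feature names and formulas are simplified stand-ins, not the paper's exact feature set.

```python
import re
from collections import Counter

def stylometric_features(text):
    """Toy content/style features in the spirit of classical fake news
    detectors: surface statistics, lexical diversity, and repetitiveness."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    bigrams = Counter(zip(words, words[1:]))
    return {
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        "avg_word_len": sum(map(len, words)) / max(len(words), 1),
        # type-token ratio: crude lexical-diversity / readability proxy
        "type_token_ratio": len(set(words)) / max(len(words), 1),
        "exclamations": text.count("!"),
        # most frequent bigram: crude repetitiveness proxy
        "top_bigram_count": max(bigrams.values(), default=0),
    }
```

Vectors like this feed directly into a linear classifier (logistic regression, SVM), which is why classical pipelines remain cheap, fast, and interpretable baselines.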


8. Rashkin, Hannah, et al. "Truth of Varying Shades: Analyzing Language in Fake News and Political Fact-Checking." EMNLP, 2017, pp. 2931–2937.

Linguistic analysis of what distinguishes fake news from real news across multiple datasets. Examines hedging, certainty, sentiment, and argumentation patterns. Provides the theoretical grounding for why stylometric features work — fake news and real news use language differently in systematic, detectable ways.
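
Lexicon-based analysis of this kind is straightforward to sketch. The mini-lexicons below are illustrative placeholders only; Rashkin et al. draw on much larger published word lists for hedging, subjectivity, and intensity.

```python
import re

# Illustrative mini-lexicons (placeholders, not the paper's resources)
HEDGES = {"may", "might", "could", "possibly", "reportedly",
          "allegedly", "suggests"}
INTENSIFIERS = {"absolutely", "definitely", "undeniable",
                "always", "never", "shocking"}

def lexicon_rates(text):
    """Per-word rates of hedging and intensifying vocabulary, the kind of
    signal used to contrast cautious reporting with overclaiming prose."""
    words = re.findall(r"[a-z]+", text.lower())
    n = max(len(words), 1)
    return {
        "hedge_rate": sum(w in HEDGES for w in words) / n,
        "intensifier_rate": sum(w in INTENSIFIERS for w in words) / n,
    }
```

The intuition the paper supports empirically: reliable reporting hedges ("reportedly", "suggests") while unreliable sources overclaim ("undeniable", "always"), so these rates are discriminative features.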


Transformer Approaches

9. Liu, Yinhan, et al. "RoBERTa: A Robustly Optimized BERT Pretraining Approach." arXiv, 2019. arXiv:1907.11692.

Demonstrates that BERT's performance can be substantially improved by training longer, with larger batches, on more data, and without the Next Sentence Prediction objective. RoBERTa's improvements over BERT are largely engineering/hyperparameter rather than architectural. Important for understanding that pre-training choices matter significantly.


10. Zhou, Xinyi, and Reza Zafarani. "A Survey of Fake News: Fundamental Theories, Detection Methods, and Opportunities." ACM Computing Surveys, 53(5), 2020, Article 109.

A comprehensive survey covering: theoretical frameworks for understanding misinformation, datasets, feature engineering approaches, classical ML methods, deep learning methods, and knowledge-graph-based approaches. An excellent reference for the full landscape of the field as of its 2020 publication. The deep learning coverage is now slightly dated, but the conceptual framework and dataset survey remain valuable.


Limitations and Adversarial Robustness

11. Thorne, James, et al. "Evaluating Adversarial Attacks Against Multiple Fact Verification Systems." EMNLP, 2019.

Describes the FEVER 2.0 adversarial challenge, where humans wrote claims specifically designed to fool automated fact verification systems. Analyzes what types of attacks succeed and why, and evaluates models trained on adversarial examples. Essential reading for understanding the cat-and-mouse dynamics of adversarial robustness.


12. Horne, Benjamin D., et al. "This Just In: Fake News Packs a Lot in Title, Uses Simpler, Repetitive Content in Text Body, More Similar to Satire Than Real News." ICWSM, 2017.

Demonstrates that fake news and satire share distinctive stylistic features that differ from real news — providing empirical grounding both for why stylometric features work and for why they confound satire with misinformation. The finding that fake news titles differ more from real news titles than article bodies do has implications for feature selection in classification systems.


Ethical and Social Considerations

13. (*) Mitchell, Margaret, et al. "Model Cards for Model Reporting." ACM FAT* (now FAccT), 2019, pp. 220–229.

The paper introducing "Model Cards" — a structured documentation format for ML models that communicates intended use, performance characteristics, evaluation details, limitations, and ethical considerations. Essential for anyone building and deploying NLP systems. The framework has been adopted by major AI research labs (Google, Hugging Face) and is becoming a standard for responsible AI documentation.
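
The structure is simple enough to sketch as a data class. The field names below paraphrase the paper's sections (this is not an official schema), and every value in the example is hypothetical:

```python
from dataclasses import dataclass

@dataclass
class ModelCard:
    # Field names paraphrase the sections proposed by Mitchell et al.;
    # an illustrative sketch, not an official or complete schema.
    model_details: str
    intended_use: str
    out_of_scope_use: str
    metrics: dict
    evaluation_data: str
    limitations: str
    ethical_considerations: str

# All values below are hypothetical, for illustration only
card = ModelCard(
    model_details="Transformer classifier fine-tuned for claim triage",
    intended_use="Prioritizing claims for review by human fact-checkers",
    out_of_scope_use="Fully automated content removal without human review",
    metrics={"macro-F1": 0.62},  # hypothetical number
    evaluation_data="Held-out test split, with per-source breakdowns",
    limitations="English only; political claims; satire is confounded",
    ethical_considerations="Audit error rates across dialects and groups",
)
```

Explicitly documenting out-of-scope uses and disaggregated evaluation results is the paper's central contribution: the card forces deployment questions to be answered before deployment.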


14. Gillespie, Tarleton. Custodians of the Internet: Platforms, Content Moderation, and the Hidden Decisions That Shape Social Media. Yale University Press, 2018.

A sociological analysis of content moderation as a practice — how platforms make decisions about what content to allow, the labor involved, the consequences for speech, and the political dynamics of moderation policies. Essential context for understanding why technical solutions are insufficient and what organizational and political factors shape how automated detection is deployed. Not a technical book but essential for the ethical analysis in Section 22.9.


15. Raji, Inioluwa Deborah, et al. "Saving Face: Investigating the Ethical Concerns of Facial Recognition Auditing." AAAI/ACM AIES, 2020.

Although focused on facial recognition, Raji et al.'s auditing methodology — testing for disparate error rates across demographic groups — is directly applicable to NLP-based content moderation systems. The paper provides a rigorous framework for what "algorithmic auditing" means in practice and what findings should be required before deployment of high-stakes AI systems.


Online Resources and Tools

  • Papers With Code — Fake News Detection: paperswithcode.com/task/fake-news-detection — Tracks state-of-the-art results on major fake news benchmarks with links to code repositories.

  • FEVER Dataset and Leaderboard: fever.ai — Official site for the FEVER benchmark with dataset downloads, evaluation server, and leaderboard of published results.

  • HuggingFace Model Hub: huggingface.co/models — Pre-trained models for text classification, NLI, and fact verification, including several fine-tuned on FEVER and LIAR.

  • ClaimBuster: idir.uta.edu/claimbuster — A claim detection and fact-checking assistance tool from the University of Texas at Arlington. Its API can check whether input text contains check-worthy claims and cross-reference them with existing fact-checks.

  • NLTK Documentation: nltk.org — Reference documentation for all preprocessing functions used in this chapter.

  • spaCy Documentation: spacy.io — Documentation and interactive examples for spaCy's NLP pipeline, including tokenization, NER, and lemmatization.

  • The Data Statements Initiative: datastatements.washington.edu — Framework for documenting NLP datasets in terms of curation rationale, language variety, speaker demographics, and known limitations — complementing Model Cards for training data transparency.