Chapter 37 Further Reading

Foundational Text Analysis Methods

Bird, Steven, Ewan Klein, and Edward Loper. Natural Language Processing with Python. O'Reilly Media, 2009. Available free at nltk.org/book. The standard introductory text for Python NLP, covering tokenization, tagging, parsing, and classification. Essential background for the chapter's feature engineering approach.

Jurafsky, Daniel, and James H. Martin. Speech and Language Processing. 3rd ed. (draft chapters available free at web.stanford.edu/~jurafsky/slp3/). The graduate-level NLP textbook. Chapters on text classification, sentiment analysis, and information extraction are directly relevant. Chapter 4 (naive Bayes and sentiment) and Chapter 5 (logistic regression for text) are most useful for this chapter's methods.

VanderPlas, Jake. Python Data Science Handbook. O'Reilly Media, 2016. Available free at jakevdp.github.io/PythonDataScienceHandbook. Comprehensive reference for pandas, numpy, matplotlib, and scikit-learn. Chapter 5 (Machine Learning) covers classification, feature importance, and model selection.

Populism Text Analysis Methods

Rooduijn, Matthijs, and Teun Pauwels. "Measuring Populism: Comparing Two Methods of Content Analysis." West European Politics 34.6 (2011): 1272–1283. The foundational text for the dictionary-based approach implemented in this chapter. Read this before implementing your own populism dictionary.
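
As a rough illustration of what a dictionary-based measure looks like in code, the sketch below counts dictionary hits per word using only the standard-library `re` module. The term list, the `populism_score` function name, and the hits-per-word normalization are all assumptions made for this example; they are not the published dictionary or the chapter's exact implementation.

```python
import re

# Hypothetical term list for illustration only; NOT the published dictionary.
POPULISM_TERMS = [
    "elite", "elites", "establishment", "corrupt",
    "the people", "ordinary people",
]

# Longer terms come first so multiword phrases are tried before shorter ones.
# \b word boundaries prevent matches inside longer words.
_TERM_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(t) for t in sorted(POPULISM_TERMS, key=len, reverse=True))
    + r")\b",
    flags=re.IGNORECASE,
)
_WORD_PATTERN = re.compile(r"[A-Za-z']+")


def populism_score(text: str) -> float:
    """Return dictionary hits divided by word count (0.0 for empty text)."""
    n_words = len(_WORD_PATTERN.findall(text))
    if n_words == 0:
        return 0.0
    n_hits = len(_TERM_PATTERN.findall(text))
    return n_hits / n_words
```

Calling `populism_score(speech_text)` on each speech gives one number per document that can then be compared across speakers or over time. A multiword phrase counts as a single hit here, which is one of several defensible conventions; the choice of terms and normalization is exactly the kind of decision the article asks you to validate against human coding.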

Hawkins, Kirk A., Ryan E. Carlin, Levente Littvay, and Cristóbal Rovira Kaltwasser, eds. The Ideational Approach to Populism: Concept, Theory, and Analysis. Routledge, 2019. Comprehensive methodological review of approaches to measuring populism including survey, text, and expert-coding methods. The text analysis chapters are directly relevant.

Dai, Yining. "Does Populism Work? Anti-Establishment Campaign Rhetoric and Electoral Outcomes in the 2016 US Presidential Election." Research paper, Stanford University, 2017. Available free online. An example of quantitative rhetoric analysis applied to US political communication, using methods similar to those in this chapter.

Widmann, Tobias. "How Emotional Are Populists Really? Factors Explaining Emotional Appeals in the Communication of European Parties." Political Psychology 42.1 (2021): 163–181. Research combining populism measurement with linguistic analysis of emotional appeals — directly relevant to the urgency and emotional intensity features in this chapter.

Machine Learning for Political Text

Lucas, Christopher, et al. "Computer-Assisted Text Analysis for Comparative Politics." Political Analysis 23.2 (2015): 254–277. Methodological review of machine learning approaches for political science text analysis. Excellent on the trade-offs between supervised and unsupervised approaches.

Wilkerson, John, and Andreu Casas. "Large-Scale Computerized Text Analysis in Political Science: Opportunities and Challenges." Annual Review of Political Science 20 (2017): 529–544. Reviews computational text analysis in political science. Good on validity concerns and the relationship between technical and theoretical choices.

Grimmer, Justin, Margaret Roberts, and Brandon Stewart. Text as Data: A New Framework for Machine Learning and the Social Sciences. Princeton University Press, 2022. The most comprehensive and current methodological text for social science text analysis, including supervised classification, topic models, and the fundamental questions of validation and interpretation.

Scikit-learn and Classification

Pedregosa, Fabian, et al. "Scikit-learn: Machine Learning in Python." Journal of Machine Learning Research 12 (2011): 2825–2830. The original scikit-learn paper. The scikit-learn documentation (scikit-learn.org) is excellent and contains extensive examples for all methods used in this chapter.

Géron, Aurélien. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow. 3rd ed. O'Reilly Media, 2022. Chapters 3–4 cover classification in depth, including precision/recall trade-offs and the ROC curve.
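
The precision/recall trade-off mentioned in the Géron entry can be seen in a few lines of plain Python. This is a minimal sketch: the labels and scores are made up, the `precision_recall` helper is hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here for simplicity. In practice, scikit-learn's `precision_score` and `recall_score` in `sklearn.metrics` compute the same quantities.

```python
def precision_recall(y_true, scores, threshold):
    """Precision and recall when scores >= threshold are labeled positive."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, y_true) if p and y == 1)
    fp = sum(1 for p, y in zip(predicted, y_true) if p and y == 0)
    fn = sum(1 for p, y in zip(predicted, y_true) if not p and y == 1)
    # Convention for this sketch: report 0.0 when the denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up data: 1 = populist speech, 0 = not; scores are classifier outputs.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]

for t in (0.25, 0.5, 0.75):
    p, r = precision_recall(y_true, scores, t)
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
```

Raising the threshold makes the classifier more selective, so precision tends to rise while recall falls; lowering it does the reverse. Which end of that trade-off matters more depends on whether missing a populist speech or mislabeling a non-populist one is the costlier error for your research question.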

Critical Perspectives on Text Classification

Birhane, Abeba, et al. "The Values Encoded in Machine Learning Research." In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, 2022. Examines how research choices in machine learning embed values. Relevant to the measurement-shapes-reality theme.

Jacobs, Abigail Z., and Hanna Wallach. "Measurement and Fairness." In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 2021. Essential reading on construct validity and measurement in machine learning systems. Directly applicable to the question of whether the populism classifier measures what it claims to measure.

Data and Code Resources

ODA speeches dataset: The structure is described in Section 37.1. The dataset itself is fictional and was created for this textbook, but its structure and the analysis pipeline generalize to real speech corpora.

Real comparable datasets:
- Comparative Manifesto Project (MARPOR): manifesto-project.wzb.eu. Coded party manifestos for 50+ countries. Can be used for populism text analysis research.
- US Congressional Record: congress.gov/congressional-record. All floor speeches in Congress, accessible programmatically.
- VoxPopuli: voxpopuli.ai. Political speech database (varies by access level).
- Sunlight Foundation Congress API (archived): Documentation available through GitHub for historical congressional speech data.

Python libraries used in this chapter:
- pandas (pandas.pydata.org): Data manipulation
- numpy (numpy.org): Numerical computing
- matplotlib (matplotlib.org): Visualization
- seaborn (seaborn.pydata.org): Statistical visualization
- scikit-learn (scikit-learn.org): Machine learning
- scipy.stats (docs.scipy.org): Statistical testing
- re (Python standard library): Regular expressions