Further Reading: Text and NLP Visualization

Tier 1: Essential Reading

Harris, Jacob. "Word Clouds Considered Harmful." Nieman Lab, 2011. The canonical critique of word clouds from a working data journalist: short, direct, and influential. The post is freely available online (search for the title) and still makes its argument cleanly more than a decade later.

Sievert, Carson, and Kenneth Shirley. "LDAvis: A Method for Visualizing and Interpreting Topics." Proceedings of the Workshop on Interactive Language Learning, Visualization, and Interfaces, 2014. The paper introducing LDAvis, the R package that pyLDAvis ports to Python. Describes the design decisions behind the 2D topic scatter and the relevance slider used to rank terms within a topic.
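The relevance slider ranks a topic's terms by a weighted mix of two log-probabilities. In the paper's notation, relevance(w, k | λ) = λ·log(φ_kw) + (1 − λ)·log(φ_kw / p_w), where φ_kw is the probability of term w within topic k, p_w is the term's overall probability in the corpus, and λ is the slider value between 0 and 1. The following is a minimal Python sketch of that formula only, not pyLDAvis's internal code; the function names relevance and rank_terms and the toy probabilities are hypothetical.

```python
import math
from typing import Dict, List


def relevance(phi_kw: float, p_w: float, lam: float) -> float:
    """Relevance of term w to topic k, per the Sievert & Shirley formula.

    phi_kw: P(w | topic k); p_w: marginal P(w) in the corpus; lam in [0, 1].
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    # lam weights the topic-specific probability; (1 - lam) weights the lift.
    return lam * math.log(phi_kw) + (1.0 - lam) * math.log(phi_kw / p_w)


def rank_terms(topic_probs: Dict[str, float],
               marginal: Dict[str, float],
               lam: float) -> List[str]:
    """Return the topic's terms sorted by descending relevance (hypothetical helper)."""
    return sorted(topic_probs,
                  key=lambda w: relevance(topic_probs[w], marginal[w], lam),
                  reverse=True)


# Hypothetical toy numbers, not from any real model: "the" is frequent
# everywhere, while "goal" is rare overall but concentrated in this topic.
topic = {"the": 0.20, "goal": 0.05, "match": 0.04}
corpus = {"the": 0.25, "goal": 0.005, "match": 0.01}

print(rank_terms(topic, corpus, lam=1.0))  # ['the', 'goal', 'match']
print(rank_terms(topic, corpus, lam=0.0))  # ['goal', 'match', 'the']
print(rank_terms(topic, corpus, lam=0.6))  # ['goal', 'the', 'match']
```

At λ = 1 the ranking favors terms that are simply common in the topic, including corpus-wide filler like "the"; at λ = 0 it favors terms that are distinctive to the topic, even rare ones. The paper's user study suggested that a middle value around 0.6 tends to work well.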

Michel, Jean-Baptiste, et al. "Quantitative analysis of culture using millions of digitized books." Science 331, no. 6014 (2011): 176-182. The paper that introduced the Google Ngram dataset and the term "culturomics." Freely available and directly relevant to Case Study 2.

Tier 2: Recommended Specialized Sources

Bird, Steven, Ewan Klein, and Edward Loper. Natural Language Processing with Python. O'Reilly Media, 2009. The classic NLTK book, freely available at nltk.org/book. Covers tokenization, stopwords, stemming, and basic text analysis. It predates newer libraries such as spaCy but remains a good introduction to the concepts.

Jurafsky, Daniel, and James H. Martin. Speech and Language Processing. 3rd ed. draft, 2024. The most comprehensive modern NLP textbook, covering the full pipeline from tokenization to transformer models. Freely available at web.stanford.edu/~jurafsky/slp3. Essential for serious NLP work.

Kessler, Jason. "Scattertext: a Browser-Based Tool for Visualizing how Corpora Differ." Proceedings of ACL 2017, System Demonstrations, 2017. The paper introducing Scattertext. Explains the tool's design philosophy and its specific design decisions.

Cairo, Alberto. The Truthful Art. New Riders, 2016. Chapter on text visualization covers word clouds, topic models, and sentiment. Contains Cairo's critique of word clouds and recommendations for alternatives.

Blei, David M. "Probabilistic Topic Models." Communications of the ACM 55, no. 4 (2012): 77-84. A readable introduction to topic models (LDA and its variants) from the lead author of the original LDA paper.

Mohammad, Saif M., and Peter D. Turney. "Crowdsourcing a Word-Emotion Association Lexicon." Computational Intelligence 29, no. 3 (2013): 436-465. The paper introducing the NRC Emotion Lexicon, a widely used resource for emotion analysis.

Tier 3: Tools and Online Resources

Resource	URL / Source	Description
wordcloud	github.com/amueller/word_cloud	The Python wordcloud library. Despite the chapter's critique, still useful for decorative purposes.
nltk	nltk.org	The Python NLP toolkit. Comprehensive, with built-in visualization utilities.
spaCy	spacy.io	Modern industrial NLP library with fast tokenization, NER, and dependency parsing.
gensim	radimrehurek.com/gensim	Topic modeling and similarity in Python.
pyLDAvis	github.com/bmabey/pyLDAvis	Python port of LDAvis, the interactive topic model visualizer.
Scattertext	github.com/JasonKessler/scattertext	Comparative text visualization by Jason Kessler.
Google Ngram Viewer	books.google.com/ngrams	The interactive Ngram Viewer discussed in Case Study 2.
Google Ngram data	storage.googleapis.com/books/ngrams/books/datasetsv3.html	The raw Ngram dataset for bulk analysis.
Hugging Face Transformers	huggingface.co	Pretrained transformer models for sentiment, classification, and more.
textblob	textblob.readthedocs.io	Simple text-processing library built on NLTK, with an easy-to-use sentiment API.
VADER sentiment	github.com/cjhutto/vaderSentiment	Rule-based sentiment analysis tuned for social media.
Voyant Tools	voyant-tools.org	Web-based interactive text analysis with built-in visualizations.

A note on reading order: If you read only one additional source, make it Harris's "Word Clouds Considered Harmful" blog post: it's short, sharp, and still relevant. For serious NLP work, bookmark Jurafsky and Martin's free textbook. For practical design guidance, start with the Scattertext paper as an example of well-designed text visualization.