Further Reading: Text and NLP Visualization
Tier 1: Essential Reading
Harris, Jacob. "Word Clouds Considered Harmful." NiemanLab, 2011. The canonical critique of word clouds from a working data journalist. Short, direct, and influential. The post is freely available online (search for "word clouds considered harmful") and still makes the argument cleanly more than a decade later.
Sievert, Carson, and Kenneth Shirley. "LDAvis: A method for visualizing and interpreting topics." Proceedings of the Workshop on Interactive Language Learning, Visualization, and Interfaces, 2014. The paper introducing LDAvis, the R package that pyLDAvis ports to Python. Describes the design decisions behind the 2D topic scatter and the relevance slider.
Michel, Jean-Baptiste, et al. "Quantitative analysis of culture using millions of digitized books." Science 331, no. 6014 (2011): 176-182. The paper that introduced the Google Ngram dataset and the term "culturomics." Freely available and directly relevant to Case Study 2.
Tier 2: Recommended Specialized Sources
Bird, Steven, Ewan Klein, and Edward Loper. Natural Language Processing with Python. O'Reilly Media, 2009. The classic NLTK book, freely available at nltk.org/book. Covers tokenization, stopwords, stemming, and basic text analysis. It predates modern libraries such as spaCy but is still a good introduction to the concepts.
Jurafsky, Daniel, and James H. Martin. Speech and Language Processing. 3rd ed. draft, 2024. The most comprehensive modern NLP textbook. Covers the full pipeline from tokenization to modern transformers. Freely available at stanford.edu/~jurafsky/slp3. Essential for serious NLP work.
Kessler, Jason. "Scattertext: a Browser-Based Tool for Visualizing how Corpora Differ." ACL 2017 Demo Session, 2017. The paper introducing Scattertext. Covers the visualization philosophy and the specific design decisions.
Cairo, Alberto. The Truthful Art. New Riders, 2016. Chapter on text visualization covers word clouds, topic models, and sentiment. Contains Cairo's critique of word clouds and recommendations for alternatives.
Blei, David M. "Probabilistic Topic Models." Communications of the ACM 55, no. 4 (2012): 77-84. A readable introduction to topic models (LDA and variants) from one of the original authors of LDA.
Mohammad, Saif M., and Peter D. Turney. "Crowdsourcing a Word-Emotion Association Lexicon." Computational Intelligence 29, no. 3 (2013): 436-465. The paper introducing the NRC Emotion Lexicon, a widely used resource for emotion analysis.
Tier 3: Tools and Online Resources
| Resource | URL / Source | Description |
|---|---|---|
| wordcloud | github.com/amueller/word_cloud | The Python wordcloud library. Despite the chapter's critique, still useful for decorative purposes. |
| nltk | nltk.org | The Python NLP toolkit. Comprehensive, with built-in visualization utilities. |
| spaCy | spacy.io | Modern industrial NLP library with fast tokenization, NER, and dependency parsing. |
| gensim | radimrehurek.com/gensim | Topic modeling and similarity in Python. |
| pyLDAvis | github.com/bmabey/pyLDAvis | The interactive topic model visualizer. |
| Scattertext | github.com/JasonKessler/scattertext | Comparative text visualization by Jason Kessler. |
| Google Ngram Viewer | books.google.com/ngrams | The interactive Ngram Viewer discussed in Case Study 2. |
| Google Ngram data | storage.googleapis.com/books/ngrams/books/datasetsv3.html | The raw Ngram dataset for bulk analysis. |
| Hugging Face Transformers | huggingface.co | Pretrained transformer models for sentiment, classification, and more. |
| textblob | textblob.readthedocs.io | Simple text-processing library with built-in sentiment analysis (built on NLTK and pattern). |
| VADER sentiment | github.com/cjhutto/vaderSentiment | Rule-based sentiment analysis tuned for social media. |
| Voyant Tools | voyant-tools.org | Web-based interactive text analysis with built-in visualizations. |
A note on reading order: If you want one additional source, read Harris's "Word Clouds Considered Harmful" blog post: it is short, sharp, and still relevant. For serious NLP work, bookmark Jurafsky & Martin's free textbook. For practical visualization, start with the Scattertext paper, which is an example of good text visualization design.