Chapter 14 Further Reading: NLP for Business


Foundations of NLP

1. Jurafsky, D., & Martin, J. H. (2024). Speech and Language Processing (3rd edition draft). Stanford University. The definitive NLP textbook, continuously updated as a free online draft. Jurafsky and Martin cover everything from tokenization and language models to transformers and ethical considerations, with comprehensive treatment of modern transformer architectures. MBA students should focus on the introductory and application-oriented chapters; the mathematical detail is optional but rewarding for those with the background. Available at: https://web.stanford.edu/~jurafsky/slp3/

2. Lane, H., Howard, C., & Hapke, H. (2019). Natural Language Processing in Action. Manning Publications. A practitioner-focused guide to building NLP systems in Python. Lane, Howard, and Hapke walk through the complete NLP pipeline — from tokenization to deep learning — with working code examples. More hands-on than Jurafsky and Martin and better suited to readers who learn by building. The chapters on TF-IDF, word embeddings, and sequence models align directly with this chapter's coverage.
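
To make the TF-IDF weighting mentioned above concrete, here is a minimal sketch in pure Python. It uses the textbook definition (term frequency times log inverse document frequency) on an invented three-document toy corpus; production libraries such as scikit-learn apply additional smoothing and normalization.

```python
import math
from collections import Counter

def tfidf(docs):
    """TF-IDF weights for a list of tokenized documents.

    TF = term count / document length; IDF = log(N / number of
    documents containing the term). Toy version, no smoothing.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights

# Invented toy corpus
docs = [
    "the product arrived late".split(),
    "the product works great".split(),
    "great support great price".split(),
]
w = tfidf(docs)
# "the" appears in two of three documents, so its weight is low;
# "late" appears in only one document, so its weight is higher.
```

The key property — common words are down-weighted, distinctive words are up-weighted — is exactly why TF-IDF beats raw counts for search and classification.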

3. Bird, S., Klein, E., & Loper, E. (2009). Natural Language Processing with Python. O'Reilly Media. Built around NLTK (the Natural Language Toolkit), this book remains an excellent introduction to NLP concepts through hands-on programming, even though the original edition targets Python 2. An updated version covering Python 3 is available free online. Particularly useful for understanding tokenization, stemming, lemmatization, and corpus analysis at a foundational level.


Word Embeddings and Representation Learning

4. Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). "Efficient Estimation of Word Representations in Vector Space." arXiv preprint arXiv:1301.3781. The paper that introduced Word2Vec and launched the word embedding revolution. Mikolov and colleagues demonstrated that training a shallow neural network on word co-occurrences produces vector representations that capture semantic relationships — including the famous "king - man + woman = queen" arithmetic. Highly readable for a research paper, and foundational for understanding how modern NLP represents meaning.
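
The "king - man + woman" arithmetic can be sketched in a few lines. The 3-dimensional vectors below are invented purely for illustration — real Word2Vec embeddings have 100-300 dimensions and are learned from large corpora (pre-trained sets are loadable via libraries such as gensim) — but the mechanics of the analogy query are the same: vector arithmetic followed by a nearest-neighbour search under cosine similarity.

```python
import math

# Hypothetical 3-d vectors invented for illustration only
vecs = {
    "king":   [0.9, 0.8, 0.1],
    "queen":  [0.9, 0.1, 0.8],
    "man":    [0.1, 0.9, 0.1],
    "woman":  [0.1, 0.1, 0.9],
    "prince": [0.8, 0.9, 0.2],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# king - man + woman, computed component-wise
target = [k - m + w for k, m, w in
          zip(vecs["king"], vecs["man"], vecs["woman"])]

# Nearest neighbour by cosine similarity, excluding the query words
best = max(
    (word for word in vecs if word not in {"king", "man", "woman"}),
    key=lambda word: cosine(vecs[word], target),
)
# best == "queen" with these toy vectors
```

That semantic relationships survive simple arithmetic is the surprising empirical finding of the paper — it emerges from training, not from any rule built into the model.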

5. Pennington, J., Socher, R., & Manning, C. D. (2014). "GloVe: Global Vectors for Word Representation." Proceedings of EMNLP 2014, 1532-1543. GloVe (Global Vectors) is an alternative to Word2Vec that combines the benefits of count-based methods (like TF-IDF) and prediction-based methods (like Word2Vec). The resulting embeddings are widely used in practice and available as free, pre-trained downloads from Stanford. A good companion to the Mikolov paper for understanding how the field converged on dense word representations.


Transformers and Modern NLP

6. Vaswani, A., Shazeer, N., Parmar, N., et al. (2017). "Attention Is All You Need." Advances in Neural Information Processing Systems, 30. The paper that introduced the transformer architecture and changed the trajectory of NLP and AI research. While the mathematical detail is dense, the core intuition — replacing sequential processing with parallel self-attention — is accessible and essential for understanding why modern language models work. Widely regarded as one of the most influential machine learning papers of the decade.
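
The core computation the paper introduces — scaled dot-product attention, softmax(QK^T / sqrt(d)) V — is compact enough to sketch directly. The tiny 2-dimensional queries, keys, and values below are hand-made for illustration; in a real transformer they are learned projections of token embeddings, and the computation runs over many heads in parallel.

```python
import math

def softmax(xs):
    m = max(xs)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d)) V.

    Every query attends to every key at once -- no sequential
    recurrence, which is the transformer's central idea.
    """
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        weights = softmax(scores)    # how much this query attends to each position
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Hand-made toy inputs: the single query matches key 1 most strongly,
# so the output is pulled toward value 1.
Q = [[1.0, 0.0]]
K = [[0.1, 1.0], [1.0, 0.1], [0.0, 0.0]]
V = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
out = attention(Q, K, V)
```

Because each output row is a convex combination of the value rows, attention is best read as a soft, differentiable lookup table.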

7. Devlin, J., Chang, M., Lee, K., & Toutanova, K. (2019). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding." Proceedings of NAACL-HLT 2019, 4171-4186. The paper that made transfer learning practical for NLP. Devlin and colleagues demonstrated that pre-training a transformer on masked language modeling and next-sentence prediction, then fine-tuning on specific tasks, achieved state-of-the-art results on eleven NLP benchmarks. BERT's impact on business NLP cannot be overstated — it reduced the labeled data requirements for high-accuracy text classification from tens of thousands to hundreds of examples.

8. Rogers, A., Kovaleva, O., & Rumshisky, A. (2020). "A Primer in BERTology: What We Know About How BERT Works." Transactions of the Association for Computational Linguistics, 8, 842-866. A comprehensive survey of research analyzing what BERT learns and how it works. Useful for business leaders who want to understand the strengths and limitations of BERT-based models without reading dozens of individual papers. The authors synthesize findings on what knowledge BERT captures, where it fails, and how fine-tuning affects its behavior.


Sentiment Analysis and Opinion Mining

9. Liu, B. (2022). Sentiment Analysis and Opinion Mining (2nd edition). Morgan & Claypool. The most comprehensive academic treatment of sentiment analysis. Bing Liu covers lexicon-based, ML-based, and deep learning approaches, with extensive discussion of aspect-based sentiment analysis. The sections on challenges — sarcasm, negation, comparative opinions — are particularly relevant for practitioners. More technical than most MBA students will need in full, but the overview chapters are accessible and thorough.

10. Hutto, C. J., & Gilbert, E. (2014). "VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text." Proceedings of the Eighth International AAAI Conference on Weblogs and Social Media. The paper introducing VADER, the lexicon-based sentiment analysis tool discussed in this chapter. VADER is specifically tuned for social media text and handles punctuation emphasis (!!!), capitalization, emojis, and slang. While transformer-based models now surpass VADER in accuracy, VADER remains valuable for its speed, interpretability, and zero-training-data requirement. A good read for understanding the tradeoffs between rule-based and learned approaches.
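
The style of scoring VADER performs — a sentiment lexicon plus heuristic adjustments for emphasis and negation — can be sketched in miniature. The lexicon and multipliers below are invented for illustration and are not VADER's (which ships a roughly 7,500-entry, empirically validated lexicon via the vaderSentiment package); the sketch only shows why rules like these are cheap, fast, and interpretable.

```python
# Invented mini-lexicon and heuristic multipliers, for illustration only
LEXICON = {"great": 3.0, "good": 2.0, "bad": -2.0, "terrible": -3.0}
NEGATORS = {"not", "never", "no"}

def score(text):
    """Sum lexicon scores, adjusted by simple emphasis/negation rules."""
    total = 0.0
    tokens = text.split()
    for i, raw in enumerate(tokens):
        word = raw.strip("!").lower()
        if word not in LEXICON:
            continue
        s = LEXICON[word]
        if raw.strip("!").isupper():          # capitalization emphasis: GREAT
            s *= 1.5
        s *= 1.0 + 0.25 * raw.count("!")      # punctuation emphasis: great!!!
        if i > 0 and tokens[i - 1].lower() in NEGATORS:
            s *= -0.5                         # negation flips and dampens
        total += s
    return total

plain = score("the product is good")
emphatic = score("the product is GREAT!!!")   # boosted by caps and !!!
negated = score("not good")                   # flipped by the negator
```

No training data is required — which is precisely the tradeoff the paper discusses against learned models.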


Topic Modeling

11. Blei, D. M. (2012). "Probabilistic Topic Models." Communications of the ACM, 55(4), 77-84. David Blei, one of the inventors of LDA, wrote this accessible overview for a general computer science audience. It explains the intuition behind topic modeling — documents as mixtures of topics, topics as distributions over words — without requiring the full mathematical machinery. Essential reading for understanding the technique that discovered Athena's sustainability trend.
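
Blei's generative story — documents as mixtures of topics, topics as distributions over words — can be simulated forward in a few lines. The topics and mixture below are invented for illustration; real LDA runs this process in reverse, inferring the hidden topics and mixtures from observed documents.

```python
import random

# Invented topic-word distributions and one document's topic mixture
topics = {
    "sustainability": {"recycled": 0.5, "packaging": 0.3, "carbon": 0.2},
    "shipping":       {"late": 0.4, "delivery": 0.4, "box": 0.2},
}
doc_mixture = {"sustainability": 0.7, "shipping": 0.3}

def generate_doc(n_words, rng):
    """Run LDA's generative story forward for one document."""
    words = []
    for _ in range(n_words):
        # 1. Sample a topic from the document's topic mixture...
        topic, = rng.choices(list(doc_mixture),
                             weights=doc_mixture.values())
        # 2. ...then sample a word from that topic's word distribution.
        word_dist = topics[topic]
        word, = rng.choices(list(word_dist),
                            weights=word_dist.values())
        words.append(word)
    return words

doc = generate_doc(20, random.Random(0))
# Roughly 70% of the words will come from the sustainability topic
```

Inference — recovering `topics` and `doc_mixture` from thousands of `doc`s — is the hard part, and is what the LDA algorithm actually does.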

12. Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). "Latent Dirichlet Allocation." Journal of Machine Learning Research, 3, 993-1022. The original LDA paper, for readers who want the full technical treatment. Blei, Ng, and Jordan introduce the generative model, the inference algorithm, and experimental results on document collections. More mathematical than most MBA students will need, but foundational for anyone building topic modeling systems. The 2012 Communications of the ACM article (above) is the more accessible version.


NLP in Business and Finance

13. Loughran, T., & McDonald, B. (2011). "When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10-Ks." The Journal of Finance, 66(1), 35-65. A landmark paper demonstrating that general-purpose sentiment dictionaries (like the Harvard General Inquirer) perform poorly on financial text. Loughran and McDonald created a finance-specific sentiment dictionary that dramatically improved accuracy. The key lesson for business leaders: domain-specific NLP tools outperform general-purpose tools, and the difference matters for decisions. Directly relevant to the Bloomberg case study.
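
The paper's central point is easy to demonstrate. The two mini-lexicons below are invented for illustration (the real Loughran-McDonald word lists contain thousands of entries and are freely downloadable): under a general-purpose lexicon a routine accounting term like "liability" drags a filing's tone down, while a finance-aware lexicon treats it as neutral.

```python
# Invented mini-lexicons, for illustration only
GENERAL = {"liability": -1, "loss": -1, "growth": 1}
FINANCE = {"loss": -1, "growth": 1}   # "liability" is routine in 10-Ks

def tone(text, lexicon):
    """Net sentiment: sum of lexicon scores over the tokens."""
    return sum(lexicon.get(w.lower().strip(".,"), 0)
               for w in text.split())

sentence = "The company reported growth despite a new lease liability."
general_tone = tone(sentence, GENERAL)   # "liability" cancels "growth"
finance_tone = tone(sentence, FINANCE)   # net positive
```

Multiply this one-word difference across tens of thousands of filings and the measurement error becomes material — which is the paper's empirical contribution.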

14. Wu, S., Irsoy, O., Lu, S., et al. (2023). "BloombergGPT: A Large Language Model for Finance." arXiv preprint arXiv:2303.17564. The paper introducing BloombergGPT, discussed in Case Study 2. It describes the training data, model architecture, and benchmark results that demonstrate the value of domain-specific pre-training for financial NLP. A useful read for understanding when building a domain-specific LLM is justified versus fine-tuning a general-purpose model.

15. Gentzkow, M., Kelly, B., & Taddy, M. (2019). "Text as Data." Journal of Economic Literature, 57(3), 535-574. An extensive survey of how text data is used in economics and finance research. Gentzkow, Kelly, and Taddy cover text representation, dimensionality reduction, and applications to measuring policy uncertainty, media bias, and corporate disclosure. Written for economists but highly relevant for business leaders thinking about text as a strategic data source.


Practical NLP Tools and Libraries

16. Honnibal, M., & Montani, I. (2017). "spaCy 2: Natural Language Understanding with Bloom Embeddings, Convolutional Neural Networks and Incremental Parsing." Explosion AI. spaCy is the industry-standard NLP library for production Python applications. It provides tokenization, named entity recognition (NER), part-of-speech tagging, dependency parsing, and integration with transformer models — all optimized for speed. The documentation (spacy.io) is among the best in the open-source world. Any team building NLP applications in Python should start here.

17. Hugging Face. (2020-present). Transformers Library Documentation and Model Hub. Hugging Face has become the de facto platform for sharing and deploying pre-trained NLP models. The Transformers library provides a unified API for BERT, GPT, RoBERTa, DistilBERT, and hundreds of other models. The Model Hub hosts thousands of pre-trained and fine-tuned models for specific tasks and domains. For business teams adopting transformer-based NLP, Hugging Face is the essential resource. Available at: https://huggingface.co/


Ethics and Bias in NLP

18. Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?" Proceedings of FAccT 2021, 610-623. A widely cited paper examining the environmental costs, data biases, and societal risks of large language models. Bender and colleagues argue that the rush to build ever-larger models obscures serious concerns about training data representativeness, environmental impact, and the tendency of LLMs to amplify existing biases. Essential reading for any business leader deploying NLP at scale — connects to the Responsible Innovation theme of this textbook.

19. Bolukbasi, T., Chang, K., Zou, J., Saligrama, V., & Kalai, A. (2016). "Man Is to Computer Programmer as Woman Is to Homemaker? Debiasing Word Embeddings." Advances in Neural Information Processing Systems, 29. A foundational paper demonstrating that word embeddings trained on web text learn and amplify societal biases — associating "he" with "computer programmer" and "she" with "homemaker." The paper proposes debiasing techniques, but the broader lesson is that NLP models are not neutral: they reflect the biases in their training data. Any organization deploying NLP for decisions about people (hiring, performance evaluation, credit) must address embedding bias proactively.
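
The geometric core of the paper's debiasing proposal is a vector projection: remove a word vector's component along an identified gender direction. The 2-dimensional vectors below are invented for illustration (real embeddings are high-dimensional and the gender direction is estimated from pairs like he-she); the projection formula, however, is the one the paper uses.

```python
def project_out(v, g):
    """Remove v's component along direction g: v - (v.g / g.g) g."""
    scale = sum(a * b for a, b in zip(v, g)) / sum(a * a for a in g)
    return [a - scale * b for a, b in zip(v, g)]

# Invented 2-d vectors: axis 0 ~ "profession-ness", axis 1 ~ gender
gender_direction = [0.0, 1.0]   # hypothetical he-she direction
programmer = [0.9, 0.4]         # biased: leans toward the "he" end

debiased = project_out(programmer, gender_direction)
# debiased has zero component along the gender direction,
# while its profession component is unchanged
```

As the paper itself notes, this removes only the linear component of the bias; later work showed that bias can persist in other forms, which is why the organizational lesson — audit models used for decisions about people — matters more than any single fix.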


Industry Applications and Case Studies

20. Cambria, E., Poria, S., Gelbukh, A., & Thelwall, M. (2017). "Sentiment Analysis Is a Big Suitcase." IEEE Intelligent Systems, 32(6), 74-80. A concise, practitioner-oriented overview of the state of sentiment analysis, its applications, and its limitations. Cambria and colleagues argue that sentiment analysis is an umbrella term covering many different tasks (polarity detection, emotion recognition, aspect extraction, sarcasm detection) and that treating them as a single problem leads to poor system design. A useful framework for business leaders scoping sentiment analysis projects.

21. Hirschberg, J., & Manning, C. D. (2015). "Advances in Natural Language Processing." Science, 349(6245), 261-266. A high-level survey of NLP progress published in Science, written for a broad scientific audience. Hirschberg and Manning trace the field's evolution from rule-based to statistical to neural methods, highlighting commercial applications in machine translation, information extraction, and question answering. Accessible and concise — a good starting point for executives who want the big picture before diving deeper.

22. Eisenstein, J. (2019). Introduction to Natural Language Processing. MIT Press. A modern NLP textbook that balances theory and practice. Eisenstein covers classical NLP methods and modern neural approaches with clear explanations and mathematical rigor. More technical than this chapter but less dense than Jurafsky and Martin. Recommended for MBA students who want to go deeper on the algorithms behind the techniques described in Chapter 14.


Additional Resources

23. Explosion AI. (2020-present). "Advanced NLP with spaCy" (free online course). A hands-on course that teaches NLP concepts through building pipelines with spaCy. Covers tokenization, NER, text classification, and custom model training. Excellent for MBA students who want practical coding experience with production-grade NLP tools. Available at: https://course.spacy.io/

24. Stanford CS224N: Natural Language Processing with Deep Learning (course materials available online). Chris Manning's Stanford course on NLP with deep learning. Lecture videos, slides, and assignments are freely available. The course covers word embeddings, transformers, pre-training, and applications in depth. More advanced than this chapter but presented with exceptional clarity. Recommended for students who want to develop technical NLP skills beyond the MBA level.

25. Kowsari, K., Jafari Meimandi, K., Heidarysafa, M., et al. (2019). "Text Classification Algorithms: A Survey." Information, 10(4), 150. A comprehensive survey of text classification methods, from bag of words to deep learning. The paper compares algorithms across datasets and provides practical guidance on model selection. Useful for business teams evaluating which classification approach fits their specific requirements — directly relevant to the NLP decision framework presented in this chapter.
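
The classical baseline the survey starts from — multinomial Naive Bayes over bag-of-words counts — fits in a short sketch. The four labeled reviews below are invented toy data; the algorithm itself (log priors plus smoothed per-class word likelihoods) is the standard one.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts with add-one
    (Laplace) smoothing -- the classical text-classification baseline."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best, best_lp = None, -math.inf
        for label, count in self.class_counts.items():
            lp = math.log(count / total)   # log prior P(class)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for token in text.lower().split():
                # Smoothed log likelihood P(token | class)
                lp += math.log((self.word_counts[label][token] + 1) / denom)
            if lp > best_lp:
                best, best_lp = label, lp
        return best

# Invented toy training data
clf = NaiveBayes().fit(
    ["great product fast delivery", "love the quality",
     "terrible support", "arrived broken terrible experience"],
    ["pos", "pos", "neg", "neg"],
)
```

For small labeled datasets and well-separated classes, a baseline like this is often competitive — which is why the survey's advice is to benchmark it before reaching for deep models.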


All URLs verified as of March 2026. For the most current resources on NLP libraries and pre-trained models, check the documentation for spaCy (spacy.io), Hugging Face (huggingface.co), and scikit-learn (scikit-learn.org). The NLP field evolves rapidly — prioritize resources published or updated within the past two years.