Chapter 14 Exercises: NLP for Business

Section A: Recall and Comprehension

Exercise 14.1 Define the following NLP terms in your own words, using no more than two sentences each: (a) tokenization, (b) stopwords, (c) lemmatization, (d) TF-IDF, (e) word embedding.

Exercise 14.2 Explain the difference between a bag-of-words representation and a TF-IDF representation. Why does TF-IDF generally produce better results for text classification tasks?

Exercise 14.3 Describe the Word2Vec intuition in non-technical language suitable for a business executive. Why does the "king - man + woman = queen" example demonstrate that embeddings capture meaningful relationships?

Exercise 14.4 Compare and contrast the three approaches to sentiment analysis discussed in the chapter: lexicon-based, machine learning-based, and transformer-based. For each, state one advantage and one limitation.

Exercise 14.5 What is aspect-based sentiment analysis, and why is it more actionable for business than document-level sentiment analysis? Use the example of the review "The sizing runs small but the quality is amazing" to illustrate your answer.

Exercise 14.6 Explain the difference between topic modeling and text classification. Which is supervised and which is unsupervised? When would you use each?

Exercise 14.7 What is the attention mechanism, and why was its introduction in the transformer architecture considered revolutionary for NLP? Explain using the "it" pronoun resolution example from the chapter.

Exercise 14.8 Describe how transfer learning changed the economics of NLP projects. Specifically, how did BERT reduce the amount of labeled data required for high-accuracy text classification?

Section B: Application

Exercise 14.9: Preprocessing Pipeline Design You are building an NLP system to analyze social media posts about your company's products. The posts include hashtags (#AthenaFashion), mentions (@AthenaRetail), emojis, URLs, abbreviations (OMG, tbh, smh), intentional misspellings (soooo good, amazingggg), and mixed-case text.

(a) Design a preprocessing pipeline for this data. List each step in order and explain why it is necessary.
(b) Identify two preprocessing decisions that require domain judgment (not just default settings). Explain the tradeoff involved in each.
(c) Should you remove emojis or treat them as sentiment signals? Argue both sides and state your recommendation.
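
As a starting point for part (a), the sketch below shows one possible ordering of cleaning steps using only Python's standard `re` module. The function name `normalize_post`, the `<url>` and `<user>` placeholder tokens, and the choice to cap repeated characters at two are illustrative assumptions, not the chapter's pipeline. Emojis are deliberately left untouched so that part (c) remains open.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
ELONGATED_RE = re.compile(r"(.)\1{2,}")  # any character repeated 3+ times
WHITESPACE_RE = re.compile(r"\s+")


def normalize_post(text):
    """Apply a fixed sequence of cleaning steps to one social media post."""
    # 1. Replace URLs first so '#' or '@' inside a link is not misread.
    text = URL_RE.sub(" <url> ", text)
    # 2. Replace mentions with a placeholder (assumption: the handle itself
    #    carries no sentiment; a brand-tracking use case might keep it).
    text = MENTION_RE.sub(" <user> ", text)
    # 3. Keep the hashtag word but drop the '#' symbol.
    text = HASHTAG_RE.sub(r" \1 ", text)
    # 4. Lowercase (assumption: case is not needed as an emphasis signal).
    text = text.lower()
    # 5. Cap runs of a repeated character at two: 'soooo' becomes 'soo'.
    text = ELONGATED_RE.sub(r"\1\1", text)
    # 6. Collapse the extra whitespace introduced by earlier steps.
    return WHITESPACE_RE.sub(" ", text).strip()


example = "OMG @AthenaRetail this is soooo good!!! #AthenaFashion https://example.com/x"
print(normalize_post(example))
# omg <user> this is soo good!! athenafashion <url>
```

Steps 2, 4, and 5 are the kind of decision part (b) asks about: each discards information (who was mentioned, capitalization, degree of emphasis) that may or may not matter for your task.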

Exercise 14.10: TF-IDF Feature Engineering Consider the following five customer reviews for a software product:

"The software crashes constantly. Tech support is useless."
"Easy to install and the interface is intuitive. Love it."
"Good features but the software crashes during large exports."
"Tech support resolved my issue quickly. Great experience."
"The interface is confusing and the learning curve is steep."

(a) Construct the bag-of-words document-term matrix for these reviews (after lowercasing and removing common stopwords).
(b) Without computing exact TF-IDF values, identify which terms would have the highest IDF scores. Explain your reasoning.
(c) Which bigrams (two-word phrases) would be most useful for distinguishing between positive and negative reviews? List at least three.
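
To check your reasoning in part (b), it may help to see how IDF is computed. The sketch below uses the textbook definition idf(t) = ln(N / df(t)) on a small hypothetical corpus that is not the five reviews above. The function name `inverse_document_frequency` and the toy documents are assumptions for illustration. Note that scikit-learn's `TfidfVectorizer` uses a smoothed variant by default, so its numbers differ, although the ranking of rare versus common terms is the same.

```python
import math
from collections import Counter


def inverse_document_frequency(tokenized_docs):
    """Return {term: ln(N / df)} for a list of token lists."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        # set() ensures a term is counted once per document,
        # no matter how often it repeats inside that document.
        doc_freq.update(set(tokens))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


# Hypothetical, already-preprocessed documents (not the exercise reviews).
docs = [
    ["battery", "drains", "fast"],
    ["battery", "lasts", "long"],
    ["screen", "cracked", "fast"],
]

idf = inverse_document_frequency(docs)
for term, score in sorted(idf.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{term:10s} {score:.3f}")
```

Terms that appear in only one document receive the highest score, while terms shared by several documents are pushed toward zero.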

Exercise 14.11: Sentiment Classification for Your Industry Choose an industry you are familiar with (healthcare, finance, hospitality, technology, education, etc.).

(a) Identify three text data sources in that industry that would benefit from automated sentiment analysis.
(b) For each source, describe at least two domain-specific challenges that a general-purpose sentiment model would struggle with (e.g., domain jargon, sarcasm patterns, regulatory language).
(c) For each source, recommend an approach (lexicon-based, ML-based, or transformer-based) and justify your choice based on accuracy requirements, data availability, and cost constraints.

Exercise 14.12: NER for Competitive Intelligence Athena's competitive intelligence team wants to monitor news articles about competitors. They have a feed of 500 articles per day from industry publications.

(a) What entity types should the NER system extract? List at least five, with examples from a hypothetical article.
(b) How would you use the extracted entities to build a competitor tracking dashboard? Describe three specific visualizations or metrics.
(c) What are the limitations of NER-based competitive intelligence? Identify two types of competitive signals that NER would miss.

Exercise 14.13: Topic Modeling for Product Feedback You have 100,000 customer reviews for a consumer electronics company. You run LDA with different numbers of topics (k = 3, 5, 8, 12, 20).

(a) What criteria would you use to select the optimal number of topics? Describe at least two quantitative methods and one qualitative method.
(b) With k = 5, you discover the following topics (top words listed):
Topic 1: battery, charge, life, hours, dies
Topic 2: screen, display, bright, resolution, crack
Topic 3: delivery, shipping, box, arrived, damaged
Topic 4: price, money, worth, expensive, cheap
Topic 5: customer, service, support, call, wait

Assign a human-readable label to each topic. For each, describe one specific business action the product team could take based on negative sentiment within that topic.
(c) You notice that Topic 3 (shipping/delivery) accounts for 28 percent of all reviews. Does this mean shipping is the company's biggest problem? Why or why not?

Exercise 14.14: Text Classification System Design You are asked to design a text classification system for an insurance company that processes 2,000 claims per day via email. Claims must be routed to one of five departments: auto, property, health, life, and general inquiry.

(a) Describe the complete pipeline from raw email to routed claim. Include preprocessing, feature extraction, classification, and human-in-the-loop steps.
(b) How much labeled training data would you need? How would you obtain it?
(c) What metrics would you use to evaluate the system? Why is accuracy alone insufficient — what other metrics matter and why?
(d) How would you handle the case where the model's confidence is low (e.g., below 70 percent)? Design a fallback mechanism.
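
For part (d), one simple fallback is a confidence threshold: route automatically when the top predicted class is confident enough, and otherwise send the email to a human queue. The sketch below is a minimal illustration under stated assumptions: the function name `route_claim`, the `manual_review` queue name, and the example probabilities are hypothetical, and the probabilities are assumed to come from any classifier that outputs per-class scores.

```python
def route_claim(class_probabilities, threshold=0.70):
    """Return (destination, confidence) for one classified email.

    class_probabilities: dict mapping department name -> probability.
    threshold: minimum top-class probability for automatic routing.
    """
    department, confidence = max(
        class_probabilities.items(), key=lambda item: item[1]
    )
    if confidence < threshold:
        # Low confidence: hand off to a person instead of guessing.
        return "manual_review", confidence
    return department, confidence


# Hypothetical classifier outputs for two incoming emails.
confident = {"auto": 0.91, "property": 0.04, "health": 0.02,
             "life": 0.01, "general inquiry": 0.02}
uncertain = {"auto": 0.41, "property": 0.38, "health": 0.08,
             "life": 0.05, "general inquiry": 0.08}

print(route_claim(confident))
print(route_claim(uncertain))
```

A fuller design would also log the manually resolved cases so they can be added to the training set, and would check whether the model's probabilities are well calibrated before trusting a fixed cutoff.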

Exercise 14.15: ReviewAnalyzer Extension (Python) Using the ReviewAnalyzer class from the chapter as a starting point, extend it with a similarity search capability and evaluate the result:

(a) A find_similar_reviews method that takes a single review text as input and returns the top 5 most similar reviews from the corpus, using TF-IDF cosine similarity.
(b) Test your method with the query "The jacket quality is great but shipping was slow" and verify that the returned reviews are semantically relevant.
(c) Discuss: How would this similarity search feature be useful for Athena's customer support team?

from sklearn.metrics.pairwise import cosine_similarity

# Hint: After fitting the TF-IDF vectorizer, you can transform
# both the query and the corpus, then compute cosine similarity
# between the query vector and all corpus vectors.

def find_similar_reviews(self, query_text, corpus_df, top_n=5):
    """
    Find the most similar reviews to a query text.

    Parameters:
        query_text: str - the review to find matches for
        corpus_df: DataFrame with 'text' column
        top_n: int - number of similar reviews to return

    Returns:
        DataFrame with the top_n most similar reviews and their
        similarity scores
    """
    # Your implementation here
    pass
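
The hint above relies on cosine similarity, which scores two vectors by the angle between them rather than by their length. The plain-Python helper below illustrates the formula and a ranking step on hypothetical toy vectors. It is not a solution to the exercise: the name `cosine` and the three-dimensional vectors are assumptions for illustration, and in your own method the vectors would be TF-IDF rows compared with scikit-learn's `cosine_similarity`.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # An all-zero vector (e.g., a text with no known terms)
        # has no direction, so treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_u * norm_v)


# Toy vectors standing in for TF-IDF rows (hypothetical values).
query = [1.0, 2.0, 0.0]
corpus = [
    [2.0, 4.0, 0.0],  # same direction as the query, twice as long
    [0.0, 0.0, 3.0],  # shares no non-zero dimension with the query
    [1.0, 0.0, 1.0],  # partial overlap
]

scores = [cosine(query, row) for row in corpus]
# Indices of corpus rows, most similar first.
ranked = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
print(ranked)
```

The first corpus vector scores highest even though it is longer than the query, which is why cosine similarity suits comparing reviews of different lengths.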

Section C: Analysis and Evaluation

Exercise 14.16: The Sarcasm Problem The chapter identifies sarcasm as one of the hardest challenges in NLP. Consider the following sarcastic reviews:

"Oh wonderful, another app update that breaks everything."
"Sure, I just love paying premium prices for economy quality."
"Five stars for making me wait 45 minutes on hold. Truly world-class service."

(a) Why do traditional NLP approaches (lexicon-based and simple ML-based) fail on these examples? Be specific about which features mislead the model.
(b) What contextual clues do humans use to detect sarcasm in these reviews? Can any of these clues be captured computationally?
(c) A colleague proposes a "sarcasm detector" as a preprocessing step: flag likely sarcastic reviews, invert their sentiment, then proceed. Evaluate this proposal. What are its strengths and weaknesses?

Exercise 14.17: Build vs. Buy for NLP Athena is deciding between three approaches for its customer review analysis system:

Option A: Build in-house. Use scikit-learn and open-source NLP libraries. Estimated 4 months of development by a team of 2 data scientists.
Option B: Cloud NLP API. Use a managed service (e.g., AWS Comprehend, Google Cloud NLP). Pay-per-call pricing at approximately $0.001 per review.
Option C: Specialized vendor. Purchase a SaaS platform designed for customer feedback analysis. $120,000/year license plus integration costs.

For each option, analyze:
(a) Total cost over 2 years (include personnel, infrastructure, and licensing)
(b) Time to deployment
(c) Customizability for Athena's specific needs
(d) Ongoing maintenance requirements
(e) Data privacy implications

Based on your analysis, recommend an approach for Athena. Justify your recommendation.

Exercise 14.18: Ethical Considerations in NLP A retail company deploys sentiment analysis on employee performance reviews written by managers. The system flags employees who receive predominantly negative language in their reviews for "performance improvement plans."

(a) Identify at least three potential harms or biases in this application.
(b) Research has shown that NLP models can exhibit racial and gender bias (e.g., associating certain names or pronouns with negative sentiment). How could this manifest in the employee review scenario?
(c) What safeguards would you recommend before deploying NLP on employee performance data? List at least four specific measures.
(d) Should this application be deployed at all? Argue both sides and state your position.

Exercise 14.19: NLP ROI Calculation Athena's CFO asks Ravi to justify the NLP investment. Using data from the chapter, calculate the ROI:

Costs:
- Development: 2 data scientists x 4 months x $12,000/month fully loaded = ?
- Infrastructure: GPU computing at $2,500/month ongoing
- Maintenance: 0.5 FTE data scientist at $10,000/month ongoing

Benefits (from the chapter):
- Defect detection 3 weeks faster: estimated $2.1M in avoided returns (one-time, but expect recurring)
- Support ticket routing: 31 hours/week saved x $35/hour agent cost
- Product team speed: estimated 2 months faster to market on eco-friendly line

(a) Calculate first-year and second-year ROI.
(b) Which benefits are easiest to quantify? Which are hardest? Why?
(c) The CFO is skeptical of the $2.1M defect avoidance figure because it relies on a counterfactual ("what would have happened without the system"). How would you defend this estimate?
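
For part (a), a common definition of ROI is (total benefit - total cost) / total cost. The helpers below sketch that arithmetic with deliberately hypothetical figures so the exercise itself is left for you to work. The function names `roi` and `annualize` and all of the numbers are assumptions for illustration.

```python
def roi(total_benefit, total_cost):
    """Return ROI as a fraction: 0.25 means a 25 percent return."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (total_benefit - total_cost) / total_cost


def annualize(amount_per_period, periods_per_year):
    """Convert a recurring per-period amount into a yearly total."""
    return amount_per_period * periods_per_year


# Hypothetical figures, NOT the numbers from this exercise.
yearly_saving = annualize(1_000, 52)  # a weekly saving over 52 weeks
yearly_cost = annualize(2_000, 12)    # a monthly cost over 12 months

print(f"Yearly saving: ${yearly_saving:,}")
print(f"Yearly cost:   ${yearly_cost:,}")
print(f"ROI:           {roi(yearly_saving, yearly_cost):.1%}")
```

When you apply this to the exercise, remember that one-time development costs fall only in the first year, while ongoing costs and recurring benefits appear in both years.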

Exercise 14.20: Multi-Model NLP Architecture A large e-commerce company receives customer feedback through five channels: product reviews, support emails, social media, survey responses, and chatbot transcripts. Each channel has different characteristics:

Channel	Volume/day	Avg. length	Language style	Labeled data
Reviews	5,000	50-200 words	Informal, varied	10,000 labeled
Support emails	3,000	100-500 words	Semi-formal	5,000 labeled
Social media	15,000	10-50 words	Very informal, slang	2,000 labeled
Surveys	500	20-100 words	Formal, prompted	3,000 labeled
Chatbot	8,000	5-20 words	Very short, fragmented	1,000 labeled

(a) Should the company train one unified NLP model or separate models for each channel? Argue the pros and cons of each approach.
(b) Which channel would benefit most from a transformer-based approach, and why?
(c) Design a unified insight dashboard that aggregates sentiment, topics, and trends across all five channels. What normalization challenges arise when comparing sentiment scores across channels with very different language styles?
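
One candidate answer to the normalization question in part (c) is to standardize scores within each channel (a z-score), so that each score is read relative to that channel's own baseline. The sketch below uses only the standard library. The function name `standardize_by_channel` and the raw scores are hypothetical. The approach also assumes that baseline differences between channels reflect language style rather than real differences in customer satisfaction, which is an assumption you should evaluate in your answer.

```python
from statistics import mean, pstdev


def standardize_by_channel(scores_by_channel):
    """Convert raw sentiment scores to z-scores within each channel."""
    standardized = {}
    for channel, scores in scores_by_channel.items():
        mu = mean(scores)
        sigma = pstdev(scores)  # population standard deviation
        if sigma == 0:
            # Every score is identical, so none deviates from baseline.
            standardized[channel] = [0.0 for _ in scores]
        else:
            standardized[channel] = [(s - mu) / sigma for s in scores]
    return standardized


# Hypothetical raw scores on a -1..1 scale (not data from the chapter).
raw = {
    "social media": [-0.9, -0.2, 0.8, 0.9],  # wide swings, informal style
    "surveys": [0.1, 0.2, 0.2, 0.3],         # narrow range, formal style
}

for channel, z_scores in standardize_by_channel(raw).items():
    print(channel, [round(z, 2) for z in z_scores])
```

After standardizing, a survey score of 0.3 and a social media score of 0.9 can both register as clearly above their channel's norm, even though the raw values differ widely.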

Section D: Synthesis and Research

Exercise 14.21: NLP and the Customer Journey Map NLP applications to each stage of the customer journey (Awareness, Consideration, Purchase, Post-Purchase, Advocacy). For each stage, identify:
(a) The primary text data sources available
(b) The most valuable NLP technique to apply
(c) A specific business action enabled by the NLP insight
(d) A metric to measure the impact of the NLP application

Present your analysis as a table with five rows (one per stage) and four columns.

Exercise 14.22: The Future of NLP in Business The chapter describes the evolution from bag of words to transformers. Looking forward to Chapter 17 (LLMs):
(a) What NLP tasks described in this chapter will likely be performed by general-purpose LLMs within 2-3 years? Why?
(b) What NLP tasks will likely continue to use specialized, task-specific models? Why?
(c) A CEO tells you: "We don't need any of this NLP pipeline stuff anymore. We'll just use ChatGPT for everything." Write a diplomatic but evidence-based response explaining why this view is incomplete.

Exercises marked with (Python) require coding. All others can be completed with written analysis. Selected answers appear in Appendix B.