Case Study 35-1: Priya Reads 4,200 Support Tickets Before Breakfast
Characters
Priya Sharma — Data analyst at Acme Corp. Responsible for the quarterly operations report and increasingly the person Sandra Chen calls when she needs data to back up a decision.
Sandra Chen — COO of Acme Corp. She has a board meeting in three days and needs to explain why customer satisfaction scores dropped 12 points in Q4.
Marcus Webb — Head of Operations. He has a theory that the drop is all about shipping, but Priya suspects there is more to it.
The Setup
It is 8:47 on a Tuesday morning and Priya's calendar shows a 10 AM meeting with Sandra and Marcus. The subject line reads: "Q4 Customer Satisfaction — Board Prep."
Priya opens the Zendesk export Sandra's assistant sent yesterday. It is a 4,200-row CSV file containing every support ticket filed in Q4. The columns are: ticket_id, created_at, category (auto-assigned by Zendesk, not reliable), status, ticket_text, and star_rating (from the post-resolution survey, filled in by about 40% of customers).
Marcus's theory: shipping complaints are driving the satisfaction drop. His evidence is anecdotal — he heard about it at a warehouse meeting.
Priya does not argue with theories before she has data. She opens her laptop and gets to work.
Step 1: Loading and Inspecting the Data
import pandas as pd
import numpy as np
from pathlib import Path
from textblob import TextBlob
from sklearn.feature_extraction.text import TfidfVectorizer
from collections import Counter, defaultdict
import matplotlib.pyplot as plt
import re
DATA_PATH = Path("data/acme_q4_tickets.csv")
df = pd.read_csv(DATA_PATH, parse_dates=['created_at'])
print(f"Total tickets: {len(df):,}")
print(f"Date range: {df['created_at'].min().date()} to {df['created_at'].max().date()}")
print(f"\nColumn overview:")
print(df.dtypes)
print(f"\nStar rating coverage: {df['star_rating'].notna().mean():.1%} of tickets")
print(f"\nTop auto-categories from Zendesk:")
print(df['category'].value_counts().head(10))
Output:
Total tickets: 4,200
Date range: 2023-10-01 to 2023-12-31
Column overview:
ticket_id               int64
created_at     datetime64[ns]
category               object
status                 object
ticket_text            object
star_rating           float64

Star rating coverage: 38.7% of tickets

Top auto-categories from Zendesk:
Other                 1,847
General Inquiry         623
Technical Support       418
Billing                 312
Returns                 287
...
Priya notes that Zendesk has dumped nearly half the tickets into "Other." The auto-categorization is almost useless. She will need to classify them herself.
Step 2: Sentiment Analysis
def score_sentiment(text: str) -> dict:
    """Return polarity and subjectivity for a text string."""
    if not isinstance(text, str) or not text.strip():
        return {'polarity': 0.0, 'subjectivity': 0.0}
    blob = TextBlob(text)
    return {
        'polarity': blob.sentiment.polarity,
        'subjectivity': blob.sentiment.subjectivity,
    }
print("Scoring sentiment across 4,200 tickets...")
sentiment_scores = df['ticket_text'].apply(score_sentiment)
df['polarity'] = sentiment_scores.apply(lambda d: d['polarity'])
df['subjectivity'] = sentiment_scores.apply(lambda d: d['subjectivity'])
# Classify
df['sentiment_label'] = pd.cut(
    df['polarity'],
    bins=[-1.001, -0.1, 0.1, 1.001],
    labels=['NEGATIVE', 'NEUTRAL', 'POSITIVE']
)

print("\nSentiment breakdown:")
for label in ['POSITIVE', 'NEUTRAL', 'NEGATIVE']:
    n = (df['sentiment_label'] == label).sum()
    pct = n / len(df) * 100
    print(f"  {label:<10}: {n:>5,} ({pct:.1f}%)")
print(f"\nMean polarity: {df['polarity'].mean():.4f}")
Output:
Scoring sentiment across 4,200 tickets...
Sentiment breakdown:
  POSITIVE  :   892 (21.2%)
  NEUTRAL   :   831 (19.8%)
  NEGATIVE  : 2,477 (59.0%)
Mean polarity: -0.2341
Nearly 60% of tickets are negative. That is not surprising — customers usually only write in when something has gone wrong — but the severity of the average polarity score concerns Priya. A mean of -0.23 is notably worse than the Q3 benchmark of -0.15 that she calculated three months ago.
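The drop from -0.15 to -0.23 can be stress-tested with a one-sample t-test. The sketch below runs on synthetic scores standing in for `df['polarity']`; it is an extension, not part of Priya's original session:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Synthetic stand-in for the 4,200 Q4 polarity scores (real mean was -0.23).
q4_polarity = rng.normal(loc=-0.23, scale=0.35, size=4200)

Q3_BASELINE = -0.15  # mean polarity Priya computed last quarter

# One-sample t-test: is the Q4 mean really below the Q3 baseline?
result = stats.ttest_1samp(q4_polarity, popmean=Q3_BASELINE)
print(f"Q4 mean polarity: {q4_polarity.mean():.4f}")
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.2e}")
```

On these synthetic numbers the shift is highly significant; the same two lines on the real polarity column would tell Priya whether the quarter-over-quarter drop could plausibly be sampling noise before she puts it in a board deck.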
Step 3: Reclassifying Tickets by Topic
The Zendesk categories are unreliable. Priya builds a more granular keyword-based classifier:
TOPIC_KEYWORDS = {
    'shipping_delay': [
        'delayed', 'delay', 'late', 'still waiting', 'not arrived',
        'not received', 'not delivered', 'transit', 'tracking', 'lost',
        'where is my order', 'when will', 'hasn\'t arrived',
    ],
    'shipping_damage': [
        'damaged', 'broken', 'crushed', 'torn', 'wet', 'smashed',
        'arrived broken', 'box was damaged', 'not intact', 'open box',
    ],
    'wrong_item': [
        'wrong item', 'wrong product', 'incorrect item', 'not what i ordered',
        'received wrong', 'sent wrong', 'different color', 'wrong size',
    ],
    'billing': [
        'charged twice', 'double charged', 'overcharged', 'refund',
        'billing error', 'incorrect charge', 'price discrepancy',
        'credit card', 'invoice', 'coupon not applied',
    ],
    'product_quality': [
        'defective', 'faulty', 'not working', 'stopped working',
        'poor quality', 'cheap', 'broke', 'malfunction', 'doesn\'t work',
        'not as described', 'misleading',
    ],
    'returns_exchanges': [
        'return', 'exchange', 'send back', 'return label',
        'rma', 'replacement', 'swap', 'refund policy',
    ],
    'account_tech': [
        'login', 'password', 'account', 'can\'t sign in', 'locked out',
        'app not working', 'website down', 'error message', 'page not loading',
    ],
    'positive_feedback': [
        'great service', 'excellent', 'love it', 'thank you',
        'happy with', 'very pleased', 'well done', 'exceeded expectations',
    ],
}

def classify_ticket(text: str) -> str:
    """Classify a support ticket using keyword matching with scoring."""
    if not isinstance(text, str):
        return 'unclassified'
    text_lower = text.lower()
    scores = defaultdict(int)
    for topic, keywords in TOPIC_KEYWORDS.items():
        for kw in keywords:
            if kw in text_lower:
                # Multi-word keywords get higher weight
                word_count = len(kw.split())
                scores[topic] += word_count
    if not scores:
        return 'other'
    return max(scores, key=scores.get)
df['topic'] = df['ticket_text'].apply(classify_ticket)
print("Topic distribution after reclassification:")
topic_counts = df['topic'].value_counts()
for topic, count in topic_counts.items():
    pct = count / len(df) * 100
    print(f"  {topic:<25}: {count:>5,} ({pct:.1f}%)")
Output:
Topic distribution after reclassification:
  shipping_delay           : 1,283 (30.5%)
  other                    :   612 (14.6%)
  billing                  :   487 (11.6%)
  product_quality          :   412 (9.8%)
  returns_exchanges        :   387 (9.2%)
  shipping_damage          :   342 (8.1%)
  wrong_item               :   298 (7.1%)
  positive_feedback        :   241 (5.7%)
  account_tech             :   138 (3.3%)
Marcus was partly right. Shipping delays represent the single largest category at 30.5%. But billing issues and product quality together account for another 21%, which changes the story considerably.
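The scoring rule inside classify_ticket is easiest to see in isolation. Here is a stripped-down, self-contained version using a two-topic toy dictionary (illustrative, not Priya's full TOPIC_KEYWORDS):

```python
from collections import defaultdict

# Toy keyword dictionary; the real one has eight topics.
TOY_KEYWORDS = {
    'shipping_delay': ['late', 'still waiting'],
    'billing': ['refund', 'charged twice'],
}

def classify_toy(text: str) -> str:
    """Same rule as classify_ticket: each matched keyword contributes
    its word count, so a two-word phrase outweighs a single-word hit."""
    text_lower = text.lower()
    scores = defaultdict(int)
    for topic, keywords in TOY_KEYWORDS.items():
        for kw in keywords:
            if kw in text_lower:
                scores[topic] += len(kw.split())
    return max(scores, key=scores.get) if scores else 'other'

print(classify_toy("I was charged twice and want a refund"))   # billing
print(classify_toy("My order is late and I'm still waiting"))  # shipping_delay
print(classify_toy("Just a general question"))                 # other
```

The word-count weighting means specific phrases like "charged twice" dominate incidental single-word matches, which is exactly the behavior Priya wants from hand-written rules.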
Step 4: Sentiment by Topic
topic_sentiment = df.groupby('topic').agg(
    ticket_count=('polarity', 'count'),
    avg_polarity=('polarity', 'mean'),
    median_polarity=('polarity', 'median'),
    pct_negative=('sentiment_label', lambda x: (x == 'NEGATIVE').mean() * 100),
).sort_values('avg_polarity')

print("\nSentiment by Topic (most negative first):")
print(f"{'Topic':<25} {'Count':>7} {'Avg Pol':>9} {'Med Pol':>9} {'% Neg':>8}")
print("-" * 65)
for topic, row in topic_sentiment.iterrows():
    print(
        f" {topic:<23} {int(row['ticket_count']):>7,} "
        f"{row['avg_polarity']:>9.4f} "
        f"{row['median_polarity']:>9.4f} "
        f"{row['pct_negative']:>7.1f}%"
    )
Output:
Sentiment by Topic (most negative first):
Topic                       Count   Avg Pol   Med Pol    % Neg
-----------------------------------------------------------------
 shipping_damage              342   -0.4821   -0.5100    84.2%
 billing                      487   -0.3987   -0.4250    78.3%
 wrong_item                   298   -0.3744   -0.3900    75.2%
 shipping_delay             1,283   -0.3102   -0.3300    71.8%
 product_quality              412   -0.2876   -0.3100    68.9%
 returns_exchanges            387   -0.2341   -0.2500    61.0%
 account_tech                 138   -0.1892   -0.2100    54.3%
 other                        612   -0.0823   -0.0900    38.1%
 positive_feedback            241    0.6234    0.6500     3.7%
This is the first real surprise of the morning. Shipping delay is the largest category by volume, but shipping damage is significantly angrier. The average polarity for shipping damage tickets is -0.48, meaning customers who receive damaged packages write much more emotionally charged messages.
Billing errors are nearly as negative. Priya makes a note: this is not just a logistics problem.
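One way to reconcile "largest category" with "angriest category" is a volume-times-severity impact score. This is an extension, not something Priya ran that morning; the figures come straight from the table above:

```python
import pandas as pd

# Counts and mean polarities from the sentiment-by-topic table.
topic_stats = pd.DataFrame({
    'topic': ['shipping_delay', 'billing', 'shipping_damage',
              'product_quality', 'wrong_item'],
    'count': [1283, 487, 342, 412, 298],
    'avg_polarity': [-0.3102, -0.3987, -0.4821, -0.2876, -0.3744],
})

# Impact = volume x severity: total negativity each topic contributes.
topic_stats['impact'] = topic_stats['count'] * topic_stats['avg_polarity'].abs()
print(topic_stats.sort_values('impact', ascending=False).to_string(index=False))
```

By this measure shipping delays still dominate overall negativity, but billing edges out shipping damage even though damage tickets are individually angrier, which supports Priya's "not just a logistics problem" note.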
Step 5: The Day-of-Week Discovery
Marcus mentioned in a previous meeting that Mondays always felt busier at the support desk. Priya decides to check:
df['day_of_week'] = df['created_at'].dt.day_name()
df['week_number'] = df['created_at'].dt.isocalendar().week
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday',
             'Saturday', 'Sunday']

# Total ticket volume by day
daily_volumes = df.groupby('day_of_week')['ticket_id'].count()
daily_volumes = daily_volumes.reindex(day_order).fillna(0)

print("Ticket volume by day of week:")
for day, count in daily_volumes.items():
    bar = '█' * int(count / 20)
    print(f"  {day:<12}: {int(count):>4} {bar}")

# Now break down by topic and day
shipping_by_day = (
    df[df['topic'] == 'shipping_delay']
    .groupby('day_of_week')['ticket_id']
    .count()
    .reindex(day_order)
    .fillna(0)
)

print("\nShipping delay tickets by day:")
total_shipping = shipping_by_day.sum()
for day, count in shipping_by_day.items():
    pct = count / total_shipping * 100
    bar = '█' * int(count / 8)
    print(f"  {day:<12}: {int(count):>4} ({pct:.1f}%) {bar}")
Output:
Ticket volume by day of week:
  Monday      :  847 ████████████████████████████████████████
  Tuesday     :  631 ███████████████████████████████
  Wednesday   :  589 █████████████████████████████
  Thursday    :  612 ██████████████████████████████
  Friday      :  558 ███████████████████████████
  Saturday    :  489 ████████████████████████
  Sunday      :  474 ███████████████████████

Shipping delay tickets by day:
  Monday      :  441 (34.4%) █████████████████████████████████████████████████████
  Tuesday     :  183 (14.3%) ██████████████████████
  Wednesday   :  161 (12.5%) ████████████████████
  Thursday    :  174 (13.6%) █████████████████████
  Friday      :  152 (11.9%) ███████████████████
  Saturday    :   98 ( 7.6%) ████████████
  Sunday      :   74 ( 5.8%) █████████
There it is. Monday accounts for 34.4% of all shipping delay tickets, more than twice the share of the next-busiest day. Priya pauses and thinks about the pattern: customers shop over the weekend, carriers do not process pickups on Sundays, and customers who see no movement in their tracking by Monday morning write in.
She checks whether this pattern holds across all ticket types:
# What percentage of each topic's tickets land on Monday?
monday_pct = (
    df[df['day_of_week'] == 'Monday']
    .groupby('topic')['ticket_id']
    .count()
    .div(df.groupby('topic')['ticket_id'].count())
    .mul(100)
    .sort_values(ascending=False)
)

print("\nMonday concentration by topic (% of each topic's tickets):")
for topic, pct in monday_pct.items():
    bar = '█' * int(pct / 2)
    print(f"  {topic:<25}: {pct:.1f}% {bar}")
Output:
Monday concentration by topic (% of each topic's tickets):
  shipping_delay           : 34.4% █████████████████
  wrong_item               : 28.1% ██████████████
  positive_feedback        : 24.9% ████████████
  billing                  : 21.3% ██████████
  shipping_damage          : 19.6% █████████
  product_quality          : 18.4% █████████
  returns_exchanges        : 17.8% █████████
  account_tech             : 15.2% ███████
  other                    : 14.1% ███████
The shipping delay and wrong item categories are disproportionately Monday problems. Other categories are distributed more evenly. This points to a specific operational issue: weekend order processing and carrier handoffs, not a general service quality decline.
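Whether the 34.4% Monday share is meaningfully above the overall Monday baseline of roughly 20% can be checked with an exact binomial test. A sketch using the counts above (an extension to the original session):

```python
from scipy.stats import binomtest

# Counts from the day-of-week analysis above.
MONDAY_ALL, TOTAL_ALL = 847, 4200      # all tickets filed on Monday
MONDAY_SHIP, TOTAL_SHIP = 441, 1283    # shipping_delay tickets on Monday

baseline = MONDAY_ALL / TOTAL_ALL      # ~20.2% of all tickets land on Monday

# Exact binomial test: is the shipping-delay Monday share above baseline?
result = binomtest(MONDAY_SHIP, TOTAL_SHIP, p=baseline, alternative='greater')
print(f"Baseline Monday share:        {baseline:.1%}")
print(f"Shipping-delay Monday share:  {MONDAY_SHIP / TOTAL_SHIP:.1%}")
print(f"p-value: {result.pvalue:.2e}")
```

The p-value is vanishingly small, so the Monday concentration in shipping delays is not an artifact of the generally busier Monday desk. One caveat for a rigorous version: the baseline here includes the shipping tickets themselves, so a cleaner comparison would exclude them from the denominator.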
Step 6: Extracting the Most Negative Language
Priya wants specific quotes for Sandra's presentation. She extracts the language from the most negative shipping damage tickets:
most_negative_shipping = (
    df[df['topic'] == 'shipping_damage']
    .nsmallest(10, 'polarity')[['ticket_id', 'polarity', 'ticket_text']]
)

print("Top 5 most negative shipping damage tickets:")
for _, row in most_negative_shipping.head(5).iterrows():
    print(f"\n  ID: {row['ticket_id']} | Polarity: {row['polarity']:.4f}")
    # Truncate for display
    print(f"  Text: {row['ticket_text'][:200]}...")
She also extracts keyword phrases from the shipping damage category:
damage_texts = df[df['topic'] == 'shipping_damage']['ticket_text'].tolist()
damage_texts_clean = [t for t in damage_texts if isinstance(t, str)]

vec = TfidfVectorizer(
    stop_words='english',
    ngram_range=(2, 3),   # Two- and three-word phrases
    max_features=100,
    min_df=3,
)
vec.fit_transform(damage_texts_clean)
feature_names = vec.get_feature_names_out()

# Simple frequency count for the extracted phrases
phrase_counter = Counter()
for text in damage_texts_clean:
    text_lower = text.lower()
    for phrase in feature_names:
        if phrase in text_lower:
            phrase_counter[phrase] += 1

print("\nTop phrases in shipping damage tickets:")
for phrase, count in phrase_counter.most_common(15):
    print(f"  '{phrase}': {count}")
Output:
Top phrases in shipping damage tickets:
'package damaged': 187
'box crushed': 143
'arrived damaged': 138
'packaging terrible': 97
'not properly packaged': 84
'damaged on arrival': 76
'third party carrier': 71
'carrier damaged': 67
'never again': 58
'unacceptable condition': 52
'shrink wrap torn': 47
'clearly dropped': 41
'warehouse damage': 38
'not insured': 34
'filed claim': 31
The phrase "third party carrier" appears in 71 of the 342 damage tickets — 20.8%. This is a specific clue: customers are distinguishing between Acme's own delivery operations and third-party carriers. This is actionable for Marcus's team.
Step 7: The Summary for Sandra
By 9:45, Priya has assembled the key findings:
print("\n" + "=" * 60)
print("ACME CORP Q4 SUPPORT TICKET ANALYSIS")
print("Prepared for Sandra Chen | Board Prep Meeting")
print("=" * 60)
print(f"""
OVERVIEW
Total Q4 tickets: {len(df):,}
Overall sentiment: {df['polarity'].mean():.4f} (Q3 baseline: -0.1500)
Negative sentiment share: {(df['sentiment_label'] == 'NEGATIVE').mean():.1%}
TOP ISSUES BY VOLUME
1. Shipping delays {(df['topic'] == 'shipping_delay').sum():,} tickets (30.5%)
2. Billing issues {(df['topic'] == 'billing').sum():,} tickets (11.6%)
3. Product quality {(df['topic'] == 'product_quality').sum():,} tickets (9.8%)
MOST NEGATIVE CATEGORY
Shipping damage: -0.48 avg polarity, 84.2% negative
KEY PHRASE: 'third party carrier' in 20.8% of damage tickets
MONDAY SPIKE
34.4% of all shipping delay tickets arrive on Monday
(vs. 14.3% Tuesday through 8% Sunday/Saturday)
Pattern: weekend orders + no Sunday carrier processing
RECOMMENDATION
1. Immediate: Add proactive Monday delay notification emails
to all orders placed Thursday–Sunday
2. Short-term: Audit third-party carrier damage rates vs.
Acme direct delivery damage rates
3. Medium-term: Investigate Q4 billing error root causes
(487 tickets, 78% negative — this is not all shipping)
""")
The Meeting
At 10 AM, Priya walks Sandra and Marcus through the analysis on her laptop. Marcus initially pushes back on the billing numbers — "those are probably just coupon complaints" — but Priya shows him the specific phrases: "charged twice," "billing error," "incorrect charge." These are not coupon questions. These are errors in Acme's billing system.
Sandra asks for the Monday shipping analysis to be included in the board deck. She also asks Marcus to pull the Q4 carrier-by-carrier damage claim reports by end of week. The "third party carrier" cluster in the NLP output gives them a specific hypothesis to test against logistics data.
The meeting runs 22 minutes instead of the scheduled 60. Priya built the entire analysis between 8:47 and 9:45 AM.
Key Technical Decisions
Why rule-based classification instead of ML?
With 4,200 tickets, a trained ML classifier would likely outperform keyword rules. But Priya chose keywords for three reasons:
1. No labeled training data existed
2. Business stakeholders can read and audit keyword rules
3. The categories were well-defined, and domain-specific multi-word phrases ("charged twice," "still waiting") are highly reliable signals
For a production system processing ongoing ticket volume, she would label 500-800 tickets and train a Naive Bayes or logistic regression classifier (as shown in the chapter).
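A minimal sketch of what that future classifier could look like, assuming a hand-labeled sample. The six training tickets below are invented placeholders for the real 500-800:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Illustrative stand-ins for hand-labeled tickets.
texts = [
    "my order is late and still waiting on tracking",
    "package hasn't arrived, where is my order",
    "charged twice on my credit card, need a refund",
    "double charged, billing error on my invoice",
    "box arrived crushed and damaged",
    "package was smashed, arrived broken",
]
labels = ["shipping_delay", "shipping_delay", "billing",
          "billing", "shipping_damage", "shipping_damage"]

# Pipeline keeps vectorizer and model together for scoring new tickets.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), stop_words='english'),
    MultinomialNB(),
)
model.fit(texts, labels)

print(model.predict(["I was double charged, please refund me"]))
```

Because the fitted pipeline carries its own vocabulary, the same object can score next quarter's export with a single model.predict call.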
Why TextBlob instead of a more sophisticated model?
Aggregate sentiment patterns — "shipping damage has the most negative language" — are robust to individual misclassifications. Priya is not making decisions based on any single ticket's polarity score. She is looking for clusters and trends, where TextBlob's 70-80% accuracy is more than sufficient.
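That robustness claim is easy to demonstrate with a simulation (synthetic data, not the ticket set): corrupt a quarter of the individual scores, mimicking a roughly 75%-accurate scorer, and check that the group-level ordering survives.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic topics with genuinely different mean polarity.
damage = rng.normal(-0.48, 0.30, 342)    # angrier group
delay = rng.normal(-0.31, 0.30, 1283)

def noisy(scores, error_rate=0.25):
    """Replace a share of individual scores with junk values."""
    out = scores.copy()
    mask = rng.random(len(out)) < error_rate
    out[mask] = rng.uniform(-1, 1, mask.sum())
    return out

print(f"True means:  damage {damage.mean():.3f}, delay {delay.mean():.3f}")
print(f"Noisy means: damage {noisy(damage).mean():.3f}, "
      f"delay {noisy(delay).mean():.3f}")
```

Individual scores move a lot, but with hundreds of tickets per group the means barely shift relative to the gap between groups, which is why topic-level comparisons survive TextBlob's per-ticket error rate.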
Validation against star ratings
# Check sentiment accuracy where star ratings exist
rated = df[df['star_rating'].notna()].copy()
rated['rating_sentiment'] = pd.cut(
    rated['star_rating'],
    bins=[0, 2.5, 3.5, 5.1],
    labels=['NEGATIVE', 'NEUTRAL', 'POSITIVE']
)

from sklearn.metrics import accuracy_score, classification_report

accuracy = accuracy_score(rated['rating_sentiment'], rated['sentiment_label'])
print(f"\nSentiment accuracy vs. star ratings: {accuracy:.1%}")
print(classification_report(rated['rating_sentiment'], rated['sentiment_label']))
Output shows 76.3% accuracy against star ratings — which is about what Priya expected, and sufficient for the aggregate analysis she is performing.
What Priya Would Do Differently Next Quarter
- Set up this analysis to run automatically at the end of each month
- Add spaCy NER to extract specific product SKUs and carrier names from ticket text
- Build a simple dashboard that shows the Monday spike in real time
- Label 500 tickets to train a proper ML classifier for more nuanced categorization
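The first item on that list is mostly a wrapper around the existing pipeline. A sketch of what that function could look like (the name, signature, and demo frame are assumptions, not code from the case study):

```python
import pandas as pd

def monthly_sentiment_report(df: pd.DataFrame, year: int, month: int) -> pd.DataFrame:
    """Filter one month of tickets and aggregate sentiment by topic.
    Assumes the 'created_at', 'topic', and 'polarity' columns built above."""
    mask = (df['created_at'].dt.year == year) & (df['created_at'].dt.month == month)
    return (
        df.loc[mask]
        .groupby('topic')
        .agg(ticket_count=('polarity', 'size'),
             avg_polarity=('polarity', 'mean'))
        .sort_values('avg_polarity')
    )

# Tiny synthetic frame to show the shape of the output.
demo = pd.DataFrame({
    'created_at': pd.to_datetime(['2023-10-02', '2023-10-09', '2023-11-01']),
    'topic': ['shipping_delay', 'billing', 'billing'],
    'polarity': [-0.4, -0.2, -0.3],
})
print(monthly_sentiment_report(demo, 2023, 10))
```

Scheduling that call at month end (cron, Airflow, or similar) gets Priya most of the way to the automated report before any dashboard work starts.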
The code from this case study is in sentiment_analyzer.py. All of the techniques used here — preprocessing, TextBlob scoring, keyword classification, frequency analysis — are covered in detail in the main chapter.