Case Study 35-1: Priya Reads 4,200 Support Tickets Before Breakfast

Characters

Priya Sharma — Data analyst at Acme Corp. Responsible for the quarterly operations report and increasingly the person Sandra Chen calls when she needs data to back up a decision.

Sandra Chen — COO of Acme Corp. She has a board meeting in three days and needs to explain why customer satisfaction scores dropped 12 points in Q4.

Marcus Webb — Head of Operations. He has a theory that the drop is all about shipping, but Priya suspects there is more to it.


The Setup

It is 8:47 on a Tuesday morning and Priya's calendar shows a 10 AM meeting with Sandra and Marcus. The subject line reads: "Q4 Customer Satisfaction — Board Prep."

Priya opens the Zendesk export Sandra's assistant sent yesterday. It is a 4,200-row CSV file containing every support ticket filed in Q4. The columns are: ticket_id, created_at, category (auto-assigned by Zendesk, not reliable), status, ticket_text, and star_rating (from the post-resolution survey, filled in by about 40% of customers).

Marcus's theory: shipping complaints are driving the satisfaction drop. His evidence is anecdotal — he heard about it at a warehouse meeting.

Priya does not argue with theories before she has data. She opens her laptop and gets to work.


Step 1: Loading and Inspecting the Data

import pandas as pd
import numpy as np
from pathlib import Path
from textblob import TextBlob
from sklearn.feature_extraction.text import TfidfVectorizer
from collections import Counter, defaultdict
import matplotlib.pyplot as plt
import re

DATA_PATH = Path("data/acme_q4_tickets.csv")

df = pd.read_csv(DATA_PATH, parse_dates=['created_at'])
print(f"Total tickets: {len(df):,}")
print(f"Date range: {df['created_at'].min().date()} to {df['created_at'].max().date()}")
print("\nColumn overview:")
print(df.dtypes)
print(f"\nStar rating coverage: {df['star_rating'].notna().mean():.1%} of tickets")
print("\nTop auto-categories from Zendesk:")
print(df['category'].value_counts().head(10))

Output:

Total tickets: 4,200
Date range: 2023-10-01 to 2023-12-31

Column overview:
ticket_id       int64
created_at      datetime64[ns]
category        object
status          object
ticket_text     object
star_rating     float64

Star rating coverage: 38.7% of tickets

Top auto-categories from Zendesk:
Other                 1,847
General Inquiry         623
Technical Support       418
Billing                 312
Returns                 287
...

Priya notes that Zendesk has dumped nearly half the tickets into "Other." The auto-categorization is almost useless. She will need to classify them herself.


Step 2: Sentiment Analysis

def score_sentiment(text: str) -> dict:
    """Return polarity and subjectivity for a text string."""
    if not isinstance(text, str) or not text.strip():
        return {'polarity': 0.0, 'subjectivity': 0.0}
    blob = TextBlob(text)
    return {
        'polarity': blob.sentiment.polarity,
        'subjectivity': blob.sentiment.subjectivity,
    }

print("Scoring sentiment across 4,200 tickets...")
sentiment_scores = df['ticket_text'].apply(score_sentiment)
df['polarity'] = sentiment_scores.apply(lambda d: d['polarity'])
df['subjectivity'] = sentiment_scores.apply(lambda d: d['subjectivity'])

# Bucket polarity into three labels: below -0.1 is NEGATIVE, above 0.1 is
# POSITIVE, and the band in between is NEUTRAL. The outer bin edges are
# widened slightly so the endpoints -1.0 and 1.0 are included.
df['sentiment_label'] = pd.cut(
    df['polarity'],
    bins=[-1.001, -0.1, 0.1, 1.001],
    labels=['NEGATIVE', 'NEUTRAL', 'POSITIVE']
)

print("\nSentiment breakdown:")
for label in ['POSITIVE', 'NEUTRAL', 'NEGATIVE']:
    n = (df['sentiment_label'] == label).sum()
    pct = n / len(df) * 100
    print(f"  {label:<10}: {n:>5,}  ({pct:.1f}%)")

print(f"\nMean polarity: {df['polarity'].mean():.4f}")

Output:

Scoring sentiment across 4,200 tickets...

Sentiment breakdown:
  POSITIVE  :   892  (21.2%)
  NEUTRAL   :   831  (19.8%)
  NEGATIVE  : 2,477  (59.0%)

Mean polarity: -0.2341

Nearly 60% of tickets are negative. That is not surprising — customers usually only write in when something has gone wrong — but the magnitude of the average polarity concerns Priya. A mean of -0.23 is notably worse than the Q3 benchmark of -0.15 that she calculated three months ago.
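Whether a shift of that size could be noise is worth a one-line check. A sketch of the test, using simulated scores as a stand-in for the real polarity column (scipy is assumed available):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated stand-in for the 4,200 Q4 polarity scores, centred near the
# observed mean of -0.234; the notebook would use df['polarity'] directly.
q4_polarity = rng.normal(loc=-0.234, scale=0.35, size=4200)

# One-sample t-test against the Q3 baseline mean of -0.15
t_stat, p_value = stats.ttest_1samp(q4_polarity, popmean=-0.15)
print(f"t = {t_stat:.1f}, p = {p_value:.1e}")
```

With n = 4,200, even a modest per-ticket spread leaves the standard error of the mean near 0.005, so a drop of 0.08 is far outside sampling noise.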


Step 3: Reclassifying Tickets by Topic

The Zendesk categories are unreliable. Priya builds a more granular keyword-based classifier:

TOPIC_KEYWORDS = {
    'shipping_delay': [
        'delayed', 'delay', 'late', 'still waiting', 'not arrived',
        'not received', 'not delivered', 'transit', 'tracking', 'lost',
        'where is my order', 'when will', 'hasn\'t arrived',
    ],
    'shipping_damage': [
        'damaged', 'broken', 'crushed', 'torn', 'wet', 'smashed',
        'arrived broken', 'box was damaged', 'not intact', 'open box',
    ],
    'wrong_item': [
        'wrong item', 'wrong product', 'incorrect item', 'not what i ordered',
        'received wrong', 'sent wrong', 'different color', 'wrong size',
    ],
    'billing': [
        'charged twice', 'double charged', 'overcharged', 'refund',
        'billing error', 'incorrect charge', 'price discrepancy',
        'credit card', 'invoice', 'coupon not applied',
    ],
    'product_quality': [
        'defective', 'faulty', 'not working', 'stopped working',
        'poor quality', 'cheap', 'broke', 'malfunction', 'doesn\'t work',
        'not as described', 'misleading',
    ],
    'returns_exchanges': [
        'return', 'exchange', 'send back', 'return label',
        'rma', 'replacement', 'swap', 'refund policy',
    ],
    'account_tech': [
        'login', 'password', 'account', 'can\'t sign in', 'locked out',
        'app not working', 'website down', 'error message', 'page not loading',
    ],
    'positive_feedback': [
        'great service', 'excellent', 'love it', 'thank you',
        'happy with', 'very pleased', 'well done', 'exceeded expectations',
    ],
}


def classify_ticket(text: str) -> str:
    """Classify a support ticket using keyword matching with scoring."""
    if not isinstance(text, str):
        return 'unclassified'

    text_lower = text.lower()
    scores = defaultdict(int)

    for topic, keywords in TOPIC_KEYWORDS.items():
        for kw in keywords:
            if kw in text_lower:
                # Multi-word keywords get higher weight
                word_count = len(kw.split())
                scores[topic] += word_count

    if not scores:
        return 'other'

    return max(scores, key=scores.get)


df['topic'] = df['ticket_text'].apply(classify_ticket)

print("Topic distribution after reclassification:")
topic_counts = df['topic'].value_counts()
for topic, count in topic_counts.items():
    pct = count / len(df) * 100
    print(f"  {topic:<25}: {count:>5,}  ({pct:.1f}%)")

Output:

Topic distribution after reclassification:
  shipping_delay           : 1,283  (30.5%)
  other                    :   612  (14.6%)
  billing                  :   487  (11.6%)
  product_quality          :   412  (9.8%)
  returns_exchanges        :   387  (9.2%)
  shipping_damage          :   342  (8.1%)
  wrong_item               :   298  (7.1%)
  positive_feedback        :   241  (5.7%)
  account_tech             :   138  (3.3%)

Marcus was partly right. Shipping delays represent the single largest category at 30.5%. But billing issues and product quality together account for another 21%, which changes the story considerably.


Step 4: Sentiment by Topic

topic_sentiment = df.groupby('topic').agg(
    ticket_count=('polarity', 'count'),
    avg_polarity=('polarity', 'mean'),
    median_polarity=('polarity', 'median'),
    pct_negative=('sentiment_label', lambda x: (x == 'NEGATIVE').mean() * 100),
).sort_values('avg_polarity')

print("\nSentiment by Topic (most negative first):")
print(f"{'Topic':<25} {'Count':>7} {'Avg Pol':>9} {'Med Pol':>9} {'% Neg':>8}")
print("-" * 65)
for topic, row in topic_sentiment.iterrows():
    print(
        f"  {topic:<23} {int(row['ticket_count']):>7,} "
        f"{row['avg_polarity']:>9.4f} "
        f"{row['median_polarity']:>9.4f} "
        f"{row['pct_negative']:>7.1f}%"
    )

Output:

Sentiment by Topic (most negative first):
Topic                     Count   Avg Pol   Med Pol   % Neg
-----------------------------------------------------------------
  shipping_damage          342   -0.4821   -0.5100    84.2%
  billing                  487   -0.3987   -0.4250    78.3%
  wrong_item               298   -0.3744   -0.3900    75.2%
  shipping_delay         1,283   -0.3102   -0.3300    71.8%
  product_quality          412   -0.2876   -0.3100    68.9%
  returns_exchanges        387   -0.2341   -0.2500    61.0%
  account_tech             138   -0.1892   -0.2100    54.3%
  other                    612   -0.0823   -0.0900    38.1%
  positive_feedback        241    0.6234    0.6500     3.7%

This is the first real surprise of the morning. Shipping delay is the largest category by volume, but shipping damage is significantly angrier. The average polarity for shipping damage tickets is -0.48, meaning customers who receive damaged packages write much more emotionally charged messages.

Billing errors are nearly as negative. Priya makes a note: this is not just a logistics problem.


Step 5: The Day-of-Week Discovery

Marcus mentioned in a previous meeting that Mondays always felt busier at the support desk. Priya decides to check:

df['day_of_week'] = df['created_at'].dt.day_name()
df['week_number'] = df['created_at'].dt.isocalendar().week

day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday',
             'Saturday', 'Sunday']

# Total ticket volume by day
daily_volumes = df.groupby('day_of_week')['ticket_id'].count()
daily_volumes = daily_volumes.reindex(day_order).fillna(0)

print("Ticket volume by day of week:")
for day, count in daily_volumes.items():
    bar = '█' * int(count / 20)
    print(f"  {day:<12}: {int(count):>4}  {bar}")

# Now break down by topic and day
shipping_by_day = (
    df[df['topic'] == 'shipping_delay']
    .groupby('day_of_week')['ticket_id']
    .count()
    .reindex(day_order)
    .fillna(0)
)

print("\nShipping delay tickets by day:")
total_shipping = shipping_by_day.sum()
for day, count in shipping_by_day.items():
    pct = count / total_shipping * 100
    bar = '█' * int(count / 8)
    print(f"  {day:<12}: {int(count):>4}  ({pct:.1f}%)  {bar}")

Output:

Ticket volume by day of week:
  Monday      :  847  ████████████████████████████████████████
  Tuesday     :  631  ███████████████████████████████
  Wednesday   :  589  █████████████████████████████
  Thursday    :  612  ██████████████████████████████
  Friday      :  558  ███████████████████████████
  Saturday    :  489  ████████████████████████
  Sunday      :  474  ███████████████████████

Shipping delay tickets by day:
  Monday      :  441  (34.4%)  █████████████████████████████████████████████████████
  Tuesday     :  183  (14.3%)  ██████████████████████
  Wednesday   :  161  (12.5%)  ████████████████████
  Thursday    :  174  (13.6%)  █████████████████████
  Friday      :  152  (11.9%)  ███████████████████
  Saturday    :  98   ( 7.6%)  ████████████
  Sunday      :  74   ( 5.8%)  █████████

There it is. Monday accounts for 34.4% of all shipping delay tickets — roughly two and a half times a typical weekday's share. Priya pauses and thinks through the pattern: customers shop over the weekend, carriers don't process pickups on Sundays, and customers who see no movement in their tracking by Monday morning write in.
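The multiple is easy to sanity-check from the printed counts (plain arithmetic, no new assumptions):

```python
# Shipping delay ticket counts by weekday, copied from the output above
shipping_by_day = {
    "Monday": 441, "Tuesday": 183, "Wednesday": 161,
    "Thursday": 174, "Friday": 152,
}
other_weekdays = [n for day, n in shipping_by_day.items() if day != "Monday"]
typical = sum(other_weekdays) / len(other_weekdays)
print(f"Monday: 441 vs typical weekday: {typical:.0f} ({441 / typical:.1f}x)")
# Monday: 441 vs typical weekday: 168 (2.6x)
```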

She checks whether this pattern holds across all ticket types:

# What percentage of each topic's tickets land on Monday?
monday_pct = (
    df[df['day_of_week'] == 'Monday']
    .groupby('topic')['ticket_id']
    .count()
    .div(df.groupby('topic')['ticket_id'].count())
    .mul(100)
    .sort_values(ascending=False)
)

print("\nMonday concentration by topic (% of each topic's tickets):")
for topic, pct in monday_pct.items():
    bar = '█' * int(pct / 2)
    print(f"  {topic:<25}: {pct:.1f}%  {bar}")

Output:

Monday concentration by topic (% of each topic's tickets):
  shipping_delay           : 34.4%  █████████████████
  wrong_item               : 28.1%  ██████████████
  positive_feedback        : 24.9%  ████████████
  billing                  : 21.3%  ██████████
  shipping_damage          : 19.6%  █████████
  product_quality          : 18.4%  █████████
  returns_exchanges        : 17.8%  █████████
  account_tech             : 15.2%  ███████
  other                    : 14.1%  ███████

The shipping delay and wrong item categories are disproportionately Monday problems. Other categories are distributed more evenly. This points to a specific operational issue: weekend order processing and carrier handoffs, not a general service quality decline.


Step 6: Extracting the Most Negative Language

Priya wants specific quotes for Sandra's presentation. She extracts the language from the most negative shipping damage tickets:

most_negative_shipping = (
    df[df['topic'] == 'shipping_damage']
    .nsmallest(10, 'polarity')[['ticket_id', 'polarity', 'ticket_text']]
)

print("Top 5 most negative shipping damage tickets:")
for _, row in most_negative_shipping.head(5).iterrows():
    print(f"\n  ID: {row['ticket_id']} | Polarity: {row['polarity']:.4f}")
    # Truncate for display
    print(f"  Text: {row['ticket_text'][:200]}...")

She also extracts keyword phrases from the shipping damage category:

damage_texts = df[df['topic'] == 'shipping_damage']['ticket_text']
damage_texts_clean = [t for t in damage_texts if isinstance(t, str)]

vec = TfidfVectorizer(
    stop_words='english',
    ngram_range=(2, 3),  # Two and three word phrases
    max_features=100,
    min_df=3,
)
# Fit only to learn a vocabulary of up to 100 salient phrases;
# the TF-IDF matrix itself is never used
vec.fit(damage_texts_clean)
feature_names = vec.get_feature_names_out()
# Simple frequency for phrases
phrase_counter = Counter()
for text in damage_texts_clean:
    text_lower = text.lower()
    for phrase in feature_names:
        if phrase in text_lower:
            phrase_counter[phrase] += 1

print("\nTop phrases in shipping damage tickets:")
for phrase, count in phrase_counter.most_common(15):
    print(f"  '{phrase}': {count}")

Output:

Top phrases in shipping damage tickets:
  'package damaged': 187
  'box crushed': 143
  'arrived damaged': 138
  'packaging terrible': 97
  'not properly packaged': 84
  'damaged on arrival': 76
  'third party carrier': 71
  'carrier damaged': 67
  'never again': 58
  'unacceptable condition': 52
  'shrink wrap torn': 47
  'clearly dropped': 41
  'warehouse damage': 38
  'not insured': 34
  'filed claim': 31

The phrase "third party carrier" appears in 71 of the 342 damage tickets — 20.8%. This is a specific clue: customers are distinguishing between Acme's own delivery operations and third-party carriers. This is actionable for Marcus's team.


Step 7: The Summary for Sandra

By 9:45, Priya has assembled the key findings:

print("\n" + "=" * 60)
print("ACME CORP Q4 SUPPORT TICKET ANALYSIS")
print("Prepared for Sandra Chen | Board Prep Meeting")
print("=" * 60)

print(f"""
OVERVIEW
  Total Q4 tickets:          {len(df):,}
  Overall sentiment:          {df['polarity'].mean():.4f} (Q3 baseline: -0.1500)
  Negative sentiment share:   {(df['sentiment_label'] == 'NEGATIVE').mean():.1%}

TOP ISSUES BY VOLUME
  1. Shipping delays          {(df['topic'] == 'shipping_delay').sum():,} tickets (30.5%)
  2. Billing issues           {(df['topic'] == 'billing').sum():,} tickets (11.6%)
  3. Product quality          {(df['topic'] == 'product_quality').sum():,} tickets (9.8%)

MOST NEGATIVE CATEGORY
  Shipping damage:  -0.48 avg polarity, 84.2% negative
  KEY PHRASE: 'third party carrier' in 20.8% of damage tickets

MONDAY SPIKE
  34.4% of all shipping delay tickets arrive on Monday
  (vs. 14.3% on Tuesday, tapering to 5.8% on Sunday)
  Pattern: weekend orders + no Sunday carrier processing

RECOMMENDATION
  1. Immediate: Add proactive Monday delay notification emails
     to all orders placed Thursday–Sunday
  2. Short-term: Audit third-party carrier damage rates vs.
     Acme direct delivery damage rates
  3. Medium-term: Investigate Q4 billing error root causes
     (487 tickets, 78% negative — this is not all shipping)
""")

The Meeting

At 10 AM, Priya walks Sandra and Marcus through the analysis on her laptop. Marcus initially pushes back on the billing numbers — "those are probably just coupon complaints" — but Priya shows him the specific phrases: "charged twice," "billing error," "incorrect charge." These are not coupon questions. These are errors in Acme's billing system.
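The check she showed him amounts to a phrase filter over the billing tickets. A minimal sketch on toy rows (her notebook would run the same `str.contains` over the real frame filtered to the billing topic):

```python
import pandas as pd

# Toy billing tickets standing in for the 487 real ones
billing = pd.DataFrame({"ticket_text": [
    "I was charged twice for the same order",
    "my coupon was not applied at checkout",
    "billing error on my last invoice, please fix",
]})

# Phrases that indicate a billing-system error rather than a coupon question
ERROR_PHRASES = ["charged twice", "double charged", "billing error", "incorrect charge"]
is_system_error = billing["ticket_text"].str.contains(
    "|".join(ERROR_PHRASES), case=False, regex=True
)
print(f"{is_system_error.sum()} of {len(billing)} mention a billing-system error")
```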

Sandra asks for the Monday shipping analysis to be included in the board deck. She also asks Marcus to pull the Q4 carrier-by-carrier damage claim reports by end of week. The "third party carrier" cluster in the NLP output gives them a specific hypothesis to test against logistics data.

The meeting runs 22 minutes instead of the scheduled 60. Priya built the entire analysis between 8:47 and 9:45 AM.


Key Technical Decisions

Why rule-based classification instead of ML?

With 4,200 tickets, a trained ML classifier would likely outperform keyword rules. But Priya chose keywords for three reasons:

  1. No labeled training data existed
  2. Business stakeholders can read and audit keyword rules
  3. The categories were well-defined, and domain-specific multi-word phrases ("charged twice," "still waiting") are highly reliable signals

For a production system processing ongoing ticket volume, she would label 500-800 tickets and train a Naive Bayes or logistic regression classifier (as shown in the chapter).
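A sketch of what that production classifier could look like, with a toy hand-labeled sample standing in for the real 500-800 labels (the example texts and label names are illustrative):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy hand-labeled sample; a real effort would label 500-800 tickets
# drawn at random from the export
texts = [
    "my package is delayed and still hasn't arrived",
    "order is late, tracking shows no movement for days",
    "the box arrived crushed and the item inside was broken",
    "package damaged on arrival, unacceptable condition",
    "i was charged twice for the same order",
    "billing error on my invoice, please refund the extra charge",
]
labels = [
    "shipping_delay", "shipping_delay",
    "shipping_damage", "shipping_damage",
    "billing", "billing",
]

# TF-IDF features feeding a logistic regression, wrapped as one pipeline
clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, labels)

print(clf.predict(["where is my order, it still hasn't arrived"])[0])
```

The same pipeline object can then be pickled and reused on each month's export, which is what makes it viable for ongoing ticket volume.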

Why TextBlob instead of a more sophisticated model?

Aggregate sentiment patterns — "shipping damage has the most negative language" — are robust to individual misclassifications. Priya is not making decisions based on any single ticket's polarity score. She is looking for clusters and trends, where TextBlob's 70-80% accuracy is more than sufficient.
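That robustness claim can be demonstrated with a small simulation: inject heavy per-ticket noise into hypothetical topic means and observe that the topic ranking survives. The means and noise level below are illustrative, loosely echoing the table in Step 4:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical "true" mean polarity per topic
true_means = {"shipping_damage": -0.48, "billing": -0.40, "shipping_delay": -0.31}
n_tickets = 1000  # simulated tickets per topic

# Add heavy per-ticket scorer noise (std 0.35) to mimic an imperfect
# sentiment model, then average within each topic
noisy_means = {
    topic: float((mu + rng.normal(0, 0.35, n_tickets)).mean())
    for topic, mu in true_means.items()
}

# Despite the noise, sorting by the noisy means recovers the true ordering
ranked = sorted(noisy_means, key=noisy_means.get)
print(ranked)
```

The intuition: per-ticket errors average out at the topic level, so gaps of 0.08-0.09 between topic means are many standard errors wide once each topic has hundreds of tickets.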

Validation against star ratings

# Check sentiment accuracy where star ratings exist
rated = df[df['star_rating'].notna()].copy()
# Map 1-2 stars to NEGATIVE, 3 to NEUTRAL, and 4-5 to POSITIVE
rated['rating_sentiment'] = pd.cut(
    rated['star_rating'],
    bins=[0, 2.5, 3.5, 5.1],
    labels=['NEGATIVE', 'NEUTRAL', 'POSITIVE']
)

from sklearn.metrics import accuracy_score, classification_report
accuracy = accuracy_score(
    rated['rating_sentiment'],
    rated['sentiment_label']
)
print(f"\nSentiment accuracy vs. star ratings: {accuracy:.1%}")
print(classification_report(rated['rating_sentiment'], rated['sentiment_label']))

Output shows 76.3% accuracy against star ratings — which is about what Priya expected, and sufficient for the aggregate analysis she is performing.


What Priya Would Do Differently Next Quarter

  • Set up this analysis to run automatically at the end of each month
  • Add spaCy NER to extract specific product SKUs and carrier names from ticket text
  • Build a simple dashboard that shows the Monday spike in real time
  • Label 500 tickets to train a proper ML classifier for more nuanced categorization
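Before spaCy NER is wired in, a hand-maintained carrier list already gets partway toward the carrier-name extraction. A minimal sketch (the carrier names are illustrative, not Acme's actual carrier mix):

```python
import re

# Hand-maintained carrier list; NER would later catch names missing from it
CARRIERS = ["fedex", "ups", "usps", "dhl"]
CARRIER_RE = re.compile(r"\b(" + "|".join(CARRIERS) + r")\b", re.IGNORECASE)

def extract_carriers(text: str) -> list[str]:
    """Return the distinct carrier names mentioned in a ticket, lowercased."""
    return sorted({m.lower() for m in CARRIER_RE.findall(text)})

print(extract_carriers("The box from FedEx arrived crushed; UPS lost the return."))
# ['fedex', 'ups']
```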

The code from this case study is in sentiment_analyzer.py. All of the techniques used here — preprocessing, TextBlob scoring, keyword classification, frequency analysis — are covered in detail in the main chapter.