Case Study 35-2: Maya Reads Two Years of Client Feedback
Characters
Maya Reyes — Freelance management consultant specializing in operational process improvement. Sole proprietor, four to six active clients at any time, works out of a home office in Austin.
The Setup
Maya has been sending a five-question survey to clients at the end of every engagement for two years. Four questions are numerical scales (Net Promoter Score, overall satisfaction, value for money, quality of deliverables). The fifth question is open-text:
"In your own words, what did you find most valuable about this engagement — and is there anything you wish had been different?"
The open-text question was Maya's idea. She believed that the most useful feedback would come from letting clients describe things in their own words rather than clicking stars.
She was right. The problem is that she has never had the time to read all the responses carefully. She scrolls through them occasionally, notices a few quotes she likes, and moves on. After two years, she has 94 completed survey responses covering 31 distinct client engagements.
On a slow Friday afternoon in January, Maya decides to finally do the analysis she has been putting off.
Step 1: Loading and Cleaning the Data
import pandas as pd
import numpy as np
from pathlib import Path
from textblob import TextBlob
from sklearn.feature_extraction.text import TfidfVectorizer
from gensim import corpora, models
from gensim.utils import simple_preprocess
from gensim.models.coherencemodel import CoherenceModel
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
from collections import Counter
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
DATA_PATH = Path("data/maya_client_surveys.csv")
df = pd.read_csv(DATA_PATH)
print(f"Survey responses loaded: {len(df)}")
print(f"Engagements covered: {df['engagement_id'].nunique()}")
print(f"\nColumn overview:")
print(df[['client_size', 'industry', 'nps', 'satisfaction', 'open_text']].describe(include='all'))
# Filter to meaningful open-text responses
df['text_length'] = df['open_text'].fillna('').str.len()
print(f"\nResponse length distribution:")
print(df['text_length'].describe())
# Keep responses with at least 30 characters
meaningful = df[df['text_length'] >= 30].copy()
print(f"\nMeaningful responses (≥30 chars): {len(meaningful)} of {len(df)}")
Output:
Survey responses loaded: 94
Engagements covered: 31
Meaningful responses (≥30 chars): 81 of 94
Thirteen responses are too short to analyze (things like "Good job" or "n/a"). Maya has 81 usable responses.
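A length cutoff alone can miss padded placeholder answers such as "N/A - no comment." A complementary check is to test whether a response consists entirely of filler words — a sketch follows, with an illustrative filler vocabulary that is not Maya's actual rule:

```python
import re

# Illustrative filler vocabulary — not Maya's actual rule. A response whose
# tokens all fall in this set carries no analyzable content.
FILLER = {'n', 'a', 'na', 'none', 'no', 'comment', 'comments', 'good', 'job', 'thanks'}

def is_placeholder(text: str) -> bool:
    """True when every word in the response is a known filler term."""
    tokens = re.findall(r'[a-z]+', str(text).lower())
    return all(t in FILLER for t in tokens)

for text in ['Good job!', 'N/A - no comment.', 'The roadmap was great.']:
    print(f"{text!r}: placeholder={is_placeholder(text)}")
```

On Maya's data this would run alongside the 30-character filter rather than replace it.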
Step 2: Initial Sentiment Overview
Before looking at themes, Maya checks whether satisfaction scores and sentiment are aligned:
# Sentiment scoring
meaningful['polarity'] = meaningful['open_text'].apply(
lambda t: TextBlob(str(t)).sentiment.polarity
)
meaningful['subjectivity'] = meaningful['open_text'].apply(
lambda t: TextBlob(str(t)).sentiment.subjectivity
)
# Overall sentiment distribution
print("\nSentiment Distribution:")
bins = [-1, -0.1, 0.1, 1]
labels = ['NEGATIVE', 'NEUTRAL', 'POSITIVE']
meaningful['sentiment_label'] = pd.cut(
    meaningful['polarity'], bins=bins, labels=labels,
    include_lowest=True,  # a polarity of exactly -1.0 still lands in NEGATIVE
)
print(meaningful['sentiment_label'].value_counts())
# Correlation between polarity and NPS
correlation = meaningful[['polarity', 'nps', 'satisfaction']].corr()
print(f"\nCorrelation between polarity and NPS: {correlation.loc['polarity', 'nps']:.3f}")
print(f"Correlation between polarity and satisfaction: {correlation.loc['polarity', 'satisfaction']:.3f}")
Output:
Sentiment Distribution:
POSITIVE 62
NEUTRAL 14
NEGATIVE 5
Correlation between polarity and NPS: 0.631
Correlation between polarity and satisfaction: 0.587
Unsurprising: 76.5% of the open-text responses are positive. Maya's clients are, on the whole, satisfied. The modest correlations (0.63 and 0.59) confirm that the open-text sentiment and the numerical scores are measuring something related but not identical. Clients sometimes give high NPS scores but write cautious or hedging comments — and vice versa.
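Those score/text disagreements can be pulled out directly with a boolean filter. A sketch, using invented stand-in rows and illustrative thresholds (NPS ≥ 9 with flat text, NPS ≤ 6 with warm text) — on the real data, `meaningful` would take the place of `demo`:

```python
import pandas as pd

# Stand-in rows (values invented) to illustrate the query shape.
demo = pd.DataFrame({
    'nps':      [9, 10, 6, 8],
    'polarity': [0.05, 0.45, 0.35, 0.20],
    'open_text': [
        'Fine overall, though the rollout dragged.',
        'Outstanding, immediately useful roadmap.',
        'Not sure it moved the needle, but pleasant to work with.',
        'Solid work from start to finish.',
    ],
})
mismatched = demo[
    ((demo['nps'] >= 9) & (demo['polarity'] < 0.1)) |   # high score, flat text
    ((demo['nps'] <= 6) & (demo['polarity'] > 0.3))     # low score, warm text
]
print(f"Score/text disagreements: {len(mismatched)}")
print(mismatched[['nps', 'polarity']].to_string())
```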
The five negative responses are worth examining directly:
negative_responses = meaningful[meaningful['sentiment_label'] == 'NEGATIVE'].sort_values('polarity')
for _, row in negative_responses.iterrows():
print(f"\nNPS: {row.get('nps', 'N/A')} | Polarity: {row['polarity']:.4f}")
print(f"Industry: {row.get('industry', 'N/A')} | Client size: {row.get('client_size', 'N/A')}")
print(f"Text: {row['open_text']}")
print("-" * 60)
Output (representative examples):
NPS: 7 | Polarity: -0.2341
Industry: Manufacturing | Client size: Mid-market
Text: The process mapping work was thorough, but the timeline slipped
by three weeks and we had to delay a board presentation as a result.
Not ideal.
------------------------------------------------------------
NPS: 6 | Polarity: -0.3100
Industry: Healthcare | Client size: Enterprise
Text: Deliverables were fine but communication during the engagement
was inconsistent. We often didn't know where things stood. The final
report was excellent, the journey to get there was frustrating.
------------------------------------------------------------
These are genuine critiques, and both mention a similar theme: the quality of the work is acknowledged, but the process — timeline, communication — is where things broke down. Maya makes a mental note.
Step 3: What Do Clients Say They Valued?
The first part of the open-text question asks what clients found most valuable. Maya wants to know what language clients use to describe value.
def preprocess_for_nlp(text: str, lemmatize: bool = True) -> list[str]:
"""Tokenize, clean, and optionally lemmatize text."""
stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()
text = text.lower()
tokens = word_tokenize(text)
tokens = [t for t in tokens if t.isalpha() and len(t) > 2]
tokens = [t for t in tokens if t not in stop_words]
if lemmatize:
tokens = [lemmatizer.lemmatize(t) for t in tokens]
return tokens
# Positive responses only — these describe what clients valued
positive_texts = meaningful[meaningful['sentiment_label'] == 'POSITIVE']['open_text'].tolist()
# Build keyword frequency from positive responses
all_tokens_positive = []
for text in positive_texts:
all_tokens_positive.extend(preprocess_for_nlp(str(text)))
positive_freq = Counter(all_tokens_positive)
print("\nTop 25 words in positive responses:")
for word, count in positive_freq.most_common(25):
bar = '█' * int(count / 2)
print(f" {word:<20}: {count:>3} {bar}")
Output:
Top 25 words in positive responses:
process : 34 █████████████████
recommendation : 31 ███████████████
clear : 28 ██████████████
implementation : 27 █████████████
team : 26 █████████████
practical : 24 ████████████
analysis : 23 ███████████
stakeholder : 22 ███████████
actionable : 21 ██████████
communication : 19 █████████
structured : 18 █████████
data : 17 ████████
quickly : 16 ████████
facilitation : 15 ███████
insight : 15 ███████
workshop : 14 ███████
deliverable : 13 ██████
timeline : 12 ██████
understand : 11 █████
alignment : 11 █████
executive : 10 █████
strategic : 9 ████
framework : 9 ████
tailored : 8 ████
specific : 8 ████
The words that jump out to Maya: practical, actionable, structured, clear. Her clients are not praising her for abstract strategic thinking. They are praising her for making things concrete and usable. This aligns with her self-perception — but it is validating to see it in data.
Facilitation and workshop also appear, which surprises her slightly. She does not think of herself as a workshop facilitator, but apparently clients value that component more than she realized.
Step 4: N-gram Analysis — What Phrases Are Clients Using?
Single words tell part of the story. Phrases tell more:
# Extract bigrams and trigrams from positive responses
vec_bigram = TfidfVectorizer(
    ngram_range=(2, 3),
    stop_words=None,  # keep stopwords so phrases like 'easy to understand' survive
    max_features=200,
    min_df=3,
)
positive_texts_clean = [str(t) for t in positive_texts]
vec_bigram.fit(positive_texts_clean)
# Document-frequency count for phrases, using the vectorizer's own analyzer
# so punctuation and casing are normalized the same way as the features
analyzer = vec_bigram.build_analyzer()
feature_set = set(vec_bigram.get_feature_names_out())
phrase_counter = Counter()
for text in positive_texts_clean:
    for phrase in set(analyzer(text)) & feature_set:
        phrase_counter[phrase] += 1
print("\nTop phrases in positive responses:")
for phrase, count in phrase_counter.most_common(20):
print(f" '{phrase}': {count}")
Output:
Top phrases in positive responses:
'actionable recommendations': 18
'clear communication': 15
'practical advice': 14
'quick turnaround': 13
'stakeholder alignment': 12
'implementation roadmap': 11
'data driven': 10
'structured approach': 10
'executive presentation': 9
'easy to understand': 8
'real world examples': 8
'tailored to our': 7
'workshop facilitation': 7
'clear and concise': 7
'prioritized recommendations': 6
'immediately applicable': 6
'beyond the scope': 5
'exceeded expectations': 5
'went above and beyond': 4
'saved us time': 4
"Actionable recommendations" is by far the most frequent phrase. Maya's clients value outputs they can immediately act on. "Implementation roadmap" and "immediately applicable" reinforce this: clients want a path forward, not just analysis.
"Beyond the scope" and "went above and beyond" appear in five and four responses respectively. This is interesting — clients are noticing and commenting on times Maya exceeded the agreed scope. This is something she does deliberately (within reason), and the data confirms it is being received positively.
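The phrase counts above are document frequencies — each response counts a phrase at most once, no matter how often it repeats it. The core mechanic can be sketched in pure Python without sklearn; the three responses below are invented:

```python
import re
from collections import Counter

def bigrams(text: str) -> set[str]:
    """Return the set of adjacent word pairs in a text (deduplicated per doc)."""
    tokens = re.findall(r'[a-z]+', text.lower())
    return {' '.join(pair) for pair in zip(tokens, tokens[1:])}

# Invented responses for illustration.
docs = [
    'Actionable recommendations with a clear implementation roadmap.',
    'The recommendations were practical and the roadmap was clear.',
    'Clear communication and actionable recommendations throughout.',
]
doc_freq = Counter()
for doc in docs:
    doc_freq.update(bigrams(doc))   # set per doc -> counts docs, not occurrences
print(doc_freq['actionable recommendations'])  # appears in 2 of 3 responses
```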
Step 5: Topic Modeling — What Are the Latent Themes?
Maya wants to understand the structure of what clients discuss, without imposing her own categories:
stop_words = set(stopwords.words('english'))
stop_words.update(['would', 'could', 'also', 'one', 'two', 'really',
'engagement', 'project', 'work', 'team', 'maya'])
def preprocess_for_lda(text: str) -> list[str]:
tokens = simple_preprocess(str(text), deacc=True)
return [t for t in tokens if t not in stop_words and len(t) > 2]
processed = [preprocess_for_lda(t) for t in meaningful['open_text'].tolist()]
# Merge frequent word pairs (e.g. current_state, quick_win) into single
# tokens so the topic model can surface multiword terms
bigram = models.Phrases(processed, min_count=3, threshold=8)
processed = [bigram[doc] for doc in processed]
# Build dictionary and corpus
dictionary = corpora.Dictionary(processed)
dictionary.filter_extremes(no_below=3, no_above=0.85)
corpus = [dictionary.doc2bow(doc) for doc in processed]
# Train with 6 topics
NUM_TOPICS = 6
lda = models.LdaModel(
corpus=corpus,
id2word=dictionary,
num_topics=NUM_TOPICS,
passes=20,
random_state=42,
alpha='auto',
)
print("Discovered Topics (LDA with 6 topics):")
print("=" * 65)
for topic_id in range(NUM_TOPICS):
topic_terms = lda.show_topic(topic_id, topn=8)
terms_str = ' | '.join([f"{word} ({weight:.3f})" for word, weight in topic_terms])
print(f"\nTopic {topic_id + 1}:")
print(f" {terms_str}")
Output:
Discovered Topics (LDA with 6 topics):
=================================================================
Topic 1:
process (0.089) | mapping (0.071) | workflow (0.063) | current_state (0.058)
| documentation (0.047) | bottleneck (0.041) | efficiency (0.039) | improvement (0.036)
Topic 2:
recommendation (0.092) | actionable (0.081) | prioritized (0.067) | roadmap (0.059)
| implementation (0.055) | quick_win (0.048) | next_step (0.042) | clear (0.038)
Topic 3:
communication (0.087) | timeline (0.076) | update (0.064) | status (0.058)
| delayed (0.052) | unclear (0.044) | expected (0.038) | schedule (0.031)
Topic 4:
stakeholder (0.095) | alignment (0.083) | executive (0.071) | presentation (0.063)
| buy_in (0.052) | facilitation (0.047) | workshop (0.043) | consensus (0.038)
Topic 5:
data (0.088) | analysis (0.079) | dashboard (0.061) | metric (0.055)
| report (0.049) | kpi (0.043) | measurement (0.038) | baseline (0.032)
Topic 6:
tailored (0.091) | specific (0.083) | industry (0.072) | context (0.065)
| understand (0.058) | unique (0.048) | adapted (0.041) | relevant (0.036)
Maya reads the topics carefully and labels them:
| Topic | Label | Interpretation |
|---|---|---|
| 1 | Process Mapping Work | Clients discussing the process analysis deliverables |
| 2 | Actionable Deliverables | Quality and usability of recommendations |
| 3 | Communication & Timeline | Process concerns — delays, unclear status |
| 4 | Stakeholder Facilitation | Workshop and alignment facilitation work |
| 5 | Data & Analytics | Measurement, dashboards, KPI work |
| 6 | Customization | Clients noting Maya adapted to their specific context |
Topic 3 (Communication & Timeline) is the critical one. Although only five responses score as "negative" in sentiment, this topic appears across a much broader set of documents. Communication and timeline concerns are embedded in otherwise positive feedback: clients mention them without those mentions tipping the whole response negative.
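Claims like this rest on tagging each response with its single dominant topic and counting. A minimal sketch of that tally — the distributions here are made up, standing in for what `lda.get_document_topics(corpus[i])` would return:

```python
from collections import Counter

# Made-up (topic_id, probability) pairs per document, standing in for
# gensim's lda.get_document_topics output. Topic ids are 0-indexed.
doc_topics = [
    [(1, 0.62), (3, 0.21)],
    [(2, 0.55), (1, 0.30)],
    [(2, 0.48), (4, 0.35)],
    [(3, 0.71)],
]
# For each doc, take the topic with the highest probability, then count.
dominant = Counter(max(dist, key=lambda t: t[1])[0] for dist in doc_topics)
for topic_id, n in sorted(dominant.items()):
    print(f"Topic {topic_id + 1}: dominant in {n} of {len(doc_topics)} responses")
```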
Step 6: What Are Clients NOT Mentioning?
This is the question Maya finds most interesting. She looks at what is absent:
# Check for mentions of specific service components
service_components = {
'deliverable_quality': ['report', 'deliverable', 'document', 'output',
'presentation', 'deck'],
'communication_process': ['communication', 'update', 'status', 'progress',
'response time', 'availability'],
'pricing_value': ['price', 'cost', 'value', 'fee', 'budget', 'expensive',
'affordable', 'worth it'],
'speed': ['fast', 'quick', 'turnaround', 'timely', 'on time', 'quickly'],
'expertise': ['expertise', 'expert', 'knowledge', 'experienced', 'skilled',
'qualified', 'specialist'],
'personality': ['friendly', 'pleasant', 'professional', 'great to work with',
'easy to work with', 'approachable'],
'follow_up': ['follow up', 'follow-up', 'after', 'post-engagement',
'ongoing', 'continued'],
}
all_text = ' '.join(meaningful['open_text'].fillna('').str.lower().tolist())
print("\nMentions of key service components:")
print(f"{'Component':<28} {'Mentions':>10} {'% of Responses':>16}")
print("-" * 60)
import re
for component, keywords in service_components.items():
    # Word-boundary matching so 'value' does not also count 'valuable'
    patterns = [re.compile(r'\b' + re.escape(kw) + r'\b') for kw in keywords]
    mention_count = 0
    for response in meaningful['open_text'].fillna('').str.lower():
        if any(p.search(response) for p in patterns):
            mention_count += 1
    pct = mention_count / len(meaningful) * 100
    print(f" {component:<26} {mention_count:>10} {pct:>14.1f}%")
Output:
Mentions of key service components:
Component Mentions % of Responses
------------------------------------------------------------
deliverable_quality 71 87.7%
communication_process 34 42.0%
pricing_value 8 9.9%
speed 27 33.3%
expertise 19 23.5%
personality 22 27.2%
follow_up 4 4.9%
The silence around pricing and value is striking. Only 8 of 81 responses — under 10% — mention price or value for money in any form. For a freelance consultant whose engagements range from $8,000 to $45,000, you might expect clients to comment on whether the investment felt worthwhile. They almost never do. Maya reads this two ways: either clients consider the price appropriate and unremarkable, or they are focused on what they received rather than what they paid. Both readings beat the one she feared — clients feeling overcharged and saying so.
Follow-up is mentioned in only 4 responses. Maya makes a note: she has always assumed clients value the availability of post-engagement follow-up questions. The data suggests they either do not use it or do not think to mention it. It may not be a differentiator.
Step 7: Segmenting by Client Type
Maya's clients range from small businesses to enterprise. She wonders if the feedback differs:
# Client size segmentation
size_segments = meaningful[meaningful['client_size'].notna()].copy()
size_sentiment = size_segments.groupby('client_size').agg(
response_count=('polarity', 'count'),
avg_polarity=('polarity', 'mean'),
avg_nps=('nps', 'mean'),
avg_satisfaction=('satisfaction', 'mean'),
).round(3)
print("\nSentiment by Client Size:")
print(size_sentiment.sort_values('avg_polarity', ascending=False).to_string())
Output:
Sentiment by Client Size:
response_count avg_polarity avg_nps avg_satisfaction
client_size
Small 18 0.342 8.89 4.72
Mid-market 38 0.287 8.62 4.48
Enterprise 25 0.201 8.21 4.31
Small clients rate Maya most positively on all dimensions. Enterprise clients are still positive but less effusive. Maya suspects this is partly a writing style effect (enterprise respondents tend toward more formal, measured language) and partly genuine: enterprise engagements involve more stakeholders and more complex approval processes, which creates more friction.
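One way to probe the writing-style hypothesis: the `subjectivity` column from Step 2 should be lower for more formal enterprise prose. A sketch of the comparison, with invented values standing in for the real `meaningful` columns:

```python
import pandas as pd

# Invented stand-in values; on the real data this would be
# meaningful.groupby('client_size')[['polarity', 'subjectivity']].mean().
demo = pd.DataFrame({
    'client_size':  ['Small', 'Small', 'Enterprise', 'Enterprise'],
    'polarity':     [0.40, 0.30, 0.18, 0.22],
    'subjectivity': [0.65, 0.58, 0.41, 0.38],
})
styles = demo.groupby('client_size')[['polarity', 'subjectivity']].mean().round(3)
print(styles.to_string())
```

If enterprise subjectivity tracks enterprise polarity downward, some of the gap is style rather than sentiment.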
Step 8: Extracting Representative Quotes
For a testimonial page on her website and for her annual self-review, Maya wants to extract the most representative positive statements:
from textblob import TextBlob
# Find sentences within responses that are highly positive
def extract_positive_sentences(text: str, min_polarity: float = 0.4) -> list[str]:
"""Extract individual sentences with polarity above threshold."""
blob = TextBlob(str(text))
positive = []
for sentence in blob.sentences:
if sentence.sentiment.polarity >= min_polarity:
sentence_text = str(sentence).strip()
# Filter for reasonable length quotes
if 20 < len(sentence_text) < 200:
positive.append((sentence_text, sentence.sentiment.polarity))
return sorted(positive, key=lambda x: x[1], reverse=True)
all_positive_sentences = []
for _, row in meaningful.iterrows():
sentences = extract_positive_sentences(row['open_text'])
for sentence, polarity in sentences[:2]: # Max 2 per response
all_positive_sentences.append({
'quote': sentence,
'polarity': polarity,
'industry': row.get('industry', 'Unknown'),
'client_size': row.get('client_size', 'Unknown'),
})
quotes_df = (
pd.DataFrame(all_positive_sentences)
.sort_values('polarity', ascending=False)
.drop_duplicates(subset=['quote'])
)
print("\nTop 5 highest-polarity quotes:")
for _, row in quotes_df.head(5).iterrows():
print(f"\n \"{row['quote']}\"")
print(f" — {row['industry']} client ({row['client_size']}) | Polarity: {row['polarity']:.4f}")
Step 9: The Insight Summary
print("\n" + "=" * 65)
print("MAYA REYES CONSULTING — CLIENT FEEDBACK ANALYSIS")
print("2 Years of Post-Engagement Survey Open-Text Responses")
print("=" * 65)
print("""
WHAT CLIENTS VALUE MOST (by keyword frequency and LDA topic analysis):
1. Actionable, practical recommendations they can implement immediately
2. Clear, structured deliverables (reports, roadmaps, presentations)
3. Stakeholder facilitation and buy-in building
4. Adaptation to their specific context and industry
WHAT CLIENTS NOTICE BUT DON'T ALWAYS SAY POSITIVELY:
5. Communication during the engagement (timeline and status updates)
   → Dominant LDA topic (Topic 3) for roughly 20% of responses;
     keyword mentions in 42% (Step 6)
→ 2 of 5 negative responses cite communication specifically
→ Even satisfied clients mention this theme
WHAT CLIENTS BARELY MENTION:
- Pricing / value for money (9.9% of responses)
- Post-engagement follow-up (4.9% of responses)
- Maya's credentials / expertise (23.5% — mentioned, but rarely the focus)
PATTERN BY CLIENT SIZE:
- Small clients: highest satisfaction and NPS
- Enterprise clients: measurably lower, likely due to complexity
- Consider building an enterprise-specific communication protocol
ONE SPECIFIC FINDING:
'Workshop facilitation' appears in 7 positive responses.
Maya does not market this service explicitly.
Consider adding it to the website and proposal template.
""")
What Maya Does With This
Monday morning, Maya opens her website proposal template. She adds a section: Stakeholder Facilitation Services — something she has always done informally but never named.
She sets up a simple calendar reminder to send clients a brief mid-engagement "status check" message every ten days. It takes thirty seconds to send. The LDA and keyword analysis told her that communication during the engagement is the one area where positive clients still find something to mention. She wants to eliminate that friction entirely.
She also notes that she has never explicitly asked for reviews or testimonials. Now that she has the data, she identifies eight clients whose open-text responses contain highly positive, quotable sentences. She emails each of them directly, shares the quote from their survey response, and asks if they would be comfortable with it appearing on her website.
All eight say yes.
Technical Note: The Limits of 81 Responses
Maya is aware that her sample is small by data science standards. Some observations:
What 81 responses can support: Keyword frequency, broad theme identification via LDA, comparison between two or three segments, identification of obvious outliers.
What 81 responses cannot reliably support: Statistical significance testing between segments, nuanced topic models (she would ideally want 500+ documents), claims about rare subgroups (her healthcare clients, for example, account for only 11 responses).
For the purposes of practical business insight — improving her communication process, adding workshop facilitation to her website, identifying quotable testimonials — 81 responses is more than enough. The value of NLP is not in reaching statistical significance; it is in processing text you would otherwise not have time to read carefully, and surfacing patterns worth investigating further.
The code patterns used in this case study are drawn directly from the chapter. See Section 35.8 for the complete survey analysis pipeline.