Case Study 1: Alex's Brand Voice Assistant — From Custom GPT to Team Resource
Background
Alex Chen manages content marketing for a 50-person B2B software company. Her team is five people: herself, two content writers, a designer who sometimes writes social copy, and a freelance writer who works on longer-form content. The company has a distinctive brand voice — direct, expertise-forward, occasionally self-deprecating, never corporate-speak — that took three years to develop and is documented in a 40-page brand guide.
The problem: the brand guide is thorough but not actionable in the moment. Writers know to read it; they do not re-read it before every paragraph. The result is brand voice drift — posts that are technically well-written but sound like a different company. Alex spent roughly 90 minutes per week on brand voice corrections across the team's output, and freelance submissions regularly required a full revision pass.
Her goal was to make brand voice guidance available at the point of writing — not as a reference document to consult but as a check embedded in the workflow.
Design Phase
Alex spent one Saturday morning on the design. She did not open GPT Builder immediately. She spent the first two hours answering three questions on paper:
What should this GPT do? Two things: evaluate content for brand voice compliance and revise content to better match the brand voice. Not ideate, not strategize, not research — just evaluate and revise.
What information does it need? The brand voice principles (not the full 40-page guide — the ten pages that actually govern voice decisions). Sample content — both good examples and annotated examples of common mistakes. A vocabulary list of words and phrases the brand uses and avoids.
What should it not do? Review content for SEO, factual accuracy, or strategic alignment. Those require different tools or human judgment. The GPT should stay in its lane — brand voice only.
The clarity of the design phase turned a potentially vague tool into a specific, testable one.
Building the System Prompt
Alex wrote three drafts of the system prompt before she was satisfied.
Draft one: Two paragraphs about the brand voice. It produced inconsistent results — the GPT had too much latitude in how it interpreted "direct" and "expertise-forward."
Draft two: Added specific do's and don'ts, plus a list of prohibited phrases. Better, but the GPT was flagging things as off-brand that were actually fine, just unconventional.
Draft three (final): Added a "philosophy" section that explained why the brand sounds the way it does, which gave the GPT the interpretive foundation to make better judgment calls on ambiguous cases.
The final system prompt (condensed):
# Role
You are BrandVoice, a brand consistency assistant for [Company Name]'s marketing team.
Your job is to evaluate content for brand voice compliance and revise content to match the brand voice.
# What You Do
1. EVALUATE: When given content, assess it against the brand voice criteria below. Identify what is working and what is not, with specific examples.
2. REVISE: When asked, rewrite content to better match the brand voice. Show your changes clearly so writers understand why.
# What You Don't Do
- Comment on SEO, factual accuracy, or strategic decisions
- Judge whether content should exist — only whether it sounds like us
- Make up factual claims to improve voice
# The Brand Voice
## Philosophy
[Company Name] was built by practitioners for practitioners. Our voice reflects that: we are confident because we know what we're talking about, direct because our audience's time is valuable, and occasionally irreverent because we don't take ourselves too seriously. We are not a generic enterprise software company, and we don't sound like one.
## Core Principles
- DIRECT: Lead with the point. No wind-ups, no "In today's rapidly changing landscape..." Never end with a weak hedge when a strong statement is available.
- EXPERTISE-FORWARD: We make specific claims, not vague ones. "This reduces onboarding time by 40%" is on-brand. "This improves your onboarding experience" is not.
- HUMAN: We use contractions. We write like a smart colleague explaining something, not like a white paper.
- OCCASIONALLY IRREVERENT: One well-placed bit of self-awareness or dry humor per piece is fine and often great. Consistent snark is not our voice.
## Vocabulary
ON-BRAND: [list of characteristic phrases and constructions]
OFF-BRAND: [list of prohibited phrases and common violations]
## Escalation
If content raises questions about factual accuracy or strategic direction, say: "This is outside my scope — check with Alex or the relevant SME."
# Output Format
For evaluations: state an overall rating (strong/acceptable/needs revision), then list specific issues with line references.
For revisions: show the original and revised text side by side, with a brief note explaining each change.
Knowledge Files
Alex uploaded four files:
brand-voice-principles.pdf — The ten pages of the brand guide specifically about voice (extracted and reformatted). Section headers include: "The Direct Principle," "How We Handle Technical Complexity," "Tone Variation by Channel," and "Common Voice Violations."
vocabulary-guide.md — A Markdown table organized by category: "Opening sentence patterns we use," "Phrases we never use and why," "Words that signal off-brand hedging," and "Words that signal on-brand confidence."
on-brand-examples.md — Fifteen annotated examples of on-brand content — blog introductions, email subject lines, social posts, and product descriptions. Each example has a brief note on what makes it work.
off-brand-examples.md — Ten annotated examples with before/after comparisons. The "before" shows the off-brand version; the "after" shows the corrected version; the annotation explains the specific principle being applied.
The off-brand examples file turned out to be the highest-value addition. Learning from mistakes — seeing what goes wrong and why — made the GPT dramatically more consistent on the specific violation patterns Alex's team most commonly produced.
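A single entry in that file might look like the following. This is a hypothetical illustration (the phrases, example number, and annotation are invented for the sketch, not Alex's actual content), but it shows the before/after/annotation structure described above:

```markdown
## Example 7: Blog introduction with a hedged opening

**Before (off-brand):**
> In today's rapidly changing landscape, many organizations may find
> that onboarding can sometimes be a challenge.

**After (on-brand):**
> Onboarding is where most rollouts stall. Here's how to cut yours
> from six weeks to two.

**Principle applied:** DIRECT. Lead with the point, and replace vague
hedging ("may," "can sometimes") with a specific, confident claim.
```

Pairing each violation with its corrected version and the principle it illustrates is what lets the GPT generalize from ten examples to the team's recurring mistake patterns.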
Testing
Alex tested the GPT against 40 real content samples:
- 15 samples her team had produced that she rated as on-brand
- 15 samples that had required revision in real editorial review
- 10 samples from competitors (as a check against over-flagging)
Results from initial testing:
- 13 of 15 on-brand samples: correctly identified as strong or acceptable
- 11 of 15 off-brand samples: correctly identified as needing revision
- 7 of 10 competitor samples: correctly identified as not matching the company's voice (expected, since the GPT applies only the company's voice standards)
Two failures from the off-brand sample set pointed to the same gap: the GPT was too permissive about hedging language in technical contexts. Alex added a section to the vocabulary guide specifically addressing technical hedging ("it may be possible to," "in some cases," "depending on your situation") and how it differs from appropriate epistemic caution. On retest, the miss rate on that pattern dropped from 40% to 0%.
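A labeled test suite like Alex's is easy to score mechanically. The sketch below is hypothetical (it assumes each GPT response has already been reduced to its overall rating string) and tallies hits per sample category, reproducing the first-round numbers above:

```python
from collections import Counter

# Ratings that count as "passed" the brand voice check.
ACCEPT = {"strong", "acceptable"}

def score(results):
    """results: list of (expected_category, gpt_rating) pairs.

    On-brand samples should be rated strong/acceptable; off-brand and
    competitor samples should be rated as needing revision. Returns
    {category: (hits, total)}.
    """
    hits, totals = Counter(), Counter()
    for category, rating in results:
        totals[category] += 1
        ok = rating in ACCEPT if category == "on_brand" else rating not in ACCEPT
        hits[category] += ok
    return {c: (hits[c], totals[c]) for c in totals}

# Alex's reported first-round numbers, encoded as labeled outcomes:
results = (
    [("on_brand", "strong")] * 13 + [("on_brand", "needs revision")] * 2
    + [("off_brand", "needs revision")] * 11 + [("off_brand", "acceptable")] * 4
    + [("competitor", "needs revision")] * 7 + [("competitor", "strong")] * 3
)
print(score(results))
# → {'on_brand': (13, 15), 'off_brand': (11, 15), 'competitor': (7, 10)}
```

Keeping the suite in this form makes regression checks cheap: after a knowledge-file change like the technical-hedging fix, re-running the same 40 samples shows immediately whether a pattern's miss rate actually moved.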
Deployment
Alex deployed the GPT to the team via a shared link. Her deployment package included:
A one-page user guide covering:
- When to use it (before submitting any content for editorial review)
- What to paste in (the full draft, or specific sections you are unsure about)
- How to interpret the output (what "needs revision" means, what a "strong" rating means)
- What it does not review (factual accuracy, SEO, strategy)
- How to give feedback to Alex if it gets something wrong
A five-minute team walkthrough — not training, just a live demo using a real piece of content so everyone could see what the output looked like.
A feedback channel — a shared Slack thread where anyone can post examples of GPT assessments they thought were wrong, with a note on why.
Results After Six Months
Alex tracked three metrics:
Brand voice compliance in content audits: The company ran quarterly brand audits where an external reviewer rated 20 pieces of content per quarter on brand voice compliance. Pre-GPT average: 71%. Post-GPT average after two quarters: 88%.
Time spent on brand voice corrections: Alex tracked this manually for two months pre- and post-deployment. Pre: approximately 90 minutes per week across all team members' content. Post: approximately 25 minutes per week. The freelance writer's revision rate dropped most dramatically — from requiring a full revision pass on nearly every submission to requiring minor revisions on about 30%.
Team feedback: Qualitatively, the most common comment from writers was that the GPT helped them understand why their instincts were wrong, not just that they were wrong. The annotated vocabulary guide and the explanation of each revision in the output created a learning loop that ad hoc feedback ("this doesn't sound right, try again") did not.
What Alex Would Do Differently
Three specific things:
Start with the off-brand examples file. The on-brand examples helped, but the off-brand examples were the most impactful knowledge file. "I would have spent twice as much time on that file and less on documenting general principles."
Build a test suite before the first deployment. Alex built her 40-sample test suite after initial testing revealed issues. Having it before the first test would have caught the hedging-in-technical-contexts problem earlier and saved an additional iteration cycle.
Set clearer expectations about the revision output. Some writers initially treated GPT revision suggestions as final — they copied the revised text directly without reviewing it. Alex updated the user guide to specify: "The GPT's revisions are starting points, not final copy. You should still apply your own judgment." She now includes a line in the GPT's output format instructions: "Revised text is a suggestion. Review it with your own judgment before using."
That last lesson is about human-in-the-loop design, not GPT configuration. The tool worked exactly as designed; its designer forgot to account for how humans would actually use it.