Case Study 2: Raj's Email Assistant — A Custom Triage and Draft Bot

Background

Raj's team receives approximately 60-80 technical emails per day across three shared inboxes: a customer support alias, a developer API questions alias, and an internal escalations alias. Three engineers rotate through email duty on a weekly basis. The person on email duty spends between two and four hours per day handling the inboxes — reading, triaging, drafting responses, escalating, and logging.

The work is genuinely difficult because it requires technical knowledge. But much of it is also genuinely repetitive: the same questions appear week after week, and experienced engineers spend significant time drafting responses to questions they have answered a dozen times before.

Raj built an email assistant that handles triage automatically and drafts responses for the on-duty engineer's review, reducing email duty time by approximately 60%.

Architecture Overview

The email assistant is a Python application with four components:

Email fetcher — polls the shared inboxes via IMAP and retrieves unprocessed emails
Triage engine — classifies and prioritizes each email using the Anthropic API
Response drafter — generates draft responses for emails requiring one
Output handler — writes triage results and drafts to a shared document for the engineer to review, approve, or revise

The engineer remains in the loop at the output stage: they review and approve every draft before it is sent. The assistant never sends email autonomously.
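
The email fetcher is the one component the implementation listing does not show. The following is a minimal sketch of how it might look using only the standard library. The function names `parse_raw_email` and `fetch_unseen` are hypothetical, the host and credentials are placeholders, and tracking which UIDs have already been processed is left out. The returned dictionary uses the same field names as the `EmailItem` dataclass defined later.

```python
import email
import imaplib
from email.header import decode_header, make_header
from email.utils import parseaddr


def parse_raw_email(raw: bytes, uid: str, inbox: str) -> dict:
    """Turn raw RFC 822 bytes into the fields the triage step needs."""
    msg = email.message_from_bytes(raw)

    # Headers may be RFC 2047 encoded (e.g. "=?utf-8?q?...?="); decode them.
    subject = str(make_header(decode_header(msg.get("Subject", ""))))
    sender_name, sender = parseaddr(msg.get("From", ""))
    sender_name = str(make_header(decode_header(sender_name)))

    # Use the first text/plain part; leave the body empty if there is none.
    body = ""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            body = payload.decode(charset, errors="replace")
            break

    return {
        "uid": uid,
        "inbox": inbox,
        "sender": sender,
        "sender_name": sender_name,
        "subject": subject,
        "body": body,
        "date": msg.get("Date", ""),
    }


def fetch_unseen(host: str, user: str, password: str, inbox: str = "INBOX") -> list[dict]:
    """Fetch unread messages from one mailbox without marking them as read."""
    items = []
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        conn.select(inbox, readonly=True)  # read-only: flags stay untouched
        status, data = conn.uid("search", None, "UNSEEN")
        if status != "OK":
            return items
        for uid in data[0].split():
            status, parts = conn.uid("fetch", uid, "(RFC822)")
            if status == "OK" and parts and isinstance(parts[0], tuple):
                items.append(parse_raw_email(parts[0][1], uid.decode(), inbox))
    return items
```

Keeping the parsing step separate from the network call means it can be tested against saved messages without an IMAP server.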

The Knowledge Base

Before building the triage engine, Raj spent three hours creating a knowledge base that the assistant would draw on for draft generation. This was the most important investment in the project.

The knowledge base was a Markdown file containing:

- Common questions and authoritative answers, organized by category
- Escalation criteria: which types of issues should always be escalated to a senior engineer
- Response tone guidelines: how to match formality to the sender's style
- Standard disclaimers and links (documentation links, status page URL, etc.)
- Products and features the team supports, with brief descriptions

# Email Assistant Knowledge Base

## API Authentication Issues

### Q: "I'm getting a 401 Unauthorized error"
**Answer framework:**
- Ask for: SDK version, language, code snippet showing how they initialize the client
- Common causes: expired key, wrong key format, missing ANTHROPIC_API_KEY env variable
- First response: ask for the three items above before attempting to diagnose
- Known issue (as of Q3 2024): Python SDK versions < 0.28 have an auth header formatting bug

### Q: "My API key worked yesterday but not today"
**Answer framework:**
- Check: have they recently rotated keys in the dashboard?
- Ask: are they loading from environment variable or hardcoding?
- Standard response: include link to key management documentation
...

## Rate Limiting

### Q: "I'm getting 429 errors"
**Answer framework:**
- Explain: 429 = rate limit exceeded
- Ask: what is their current tier? How many requests per minute are they sending?
- Provide: rate limit table for their tier
- Suggest: exponential backoff implementation (include code snippet)
...

## ESCALATION CRITERIA
Always escalate to senior engineer (tag: ESCALATE):
- Any mention of data loss or data corruption
- Security vulnerabilities or suspected breaches
- Enterprise customer with SLA in their signature
- Any legal language or threats
- Issues with billing amounts > $500

## TONE GUIDELINES
- Developer emails: technical, direct, minimal pleasantries
- Non-technical stakeholders: clear, no jargon, explain acronyms
- Frustrated customers: acknowledge frustration explicitly before troubleshooting
- Short emails: match brevity; don't write three paragraphs in response to three sentences

The knowledge base was appended to the system prompt of every draft generation call. The triage calls used a separate, shorter system prompt that did not include it.
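
The rate-limiting entry in the knowledge base calls for an exponential backoff snippet to include in replies. Raj's actual snippet is not shown, so the following is only a sketch of what one could look like. `RateLimitError` is a hypothetical stand-in for whatever exception the customer's client raises on an HTTP 429, and the retry counts and delays are illustrative defaults.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the exception that signals an HTTP 429."""


def call_with_backoff(request, max_retries=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call request(); on RateLimitError wait with exponential backoff and retry."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # The delay ceiling doubles on each attempt: 1s, 2s, 4s, ... capped at max_delay
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: pick a random wait below the ceiling so that many
            # clients do not all retry at the same instant
            sleep(random.uniform(0, ceiling))
```

The `sleep` parameter exists only so the function can be exercised in tests without real waiting.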

Implementation

import anthropic
import imaplib
import email
import json
import os
from datetime import datetime
from dataclasses import dataclass, asdict
from typing import Optional
from pathlib import Path
from email.header import decode_header

client = anthropic.Anthropic()

@dataclass
class EmailItem:
    uid: str
    inbox: str
    sender: str
    sender_name: str
    subject: str
    body: str
    date: str
    thread_history: str = ""

@dataclass
class TriageResult:
    uid: str
    inbox: str
    category: str
    priority: str
    sentiment: str
    escalate: bool
    escalation_reason: str
    one_line_summary: str
    key_technical_issue: str
    confidence: float
    draft_response: Optional[str] = None
    draft_quality_note: Optional[str] = None


KNOWLEDGE_BASE = Path("email_knowledge_base.md").read_text(encoding="utf-8")

TRIAGE_SYSTEM = """You are an email triage specialist for a software development team.
You classify and prioritize technical support emails with precision.
Always respond with valid JSON only — no explanatory text outside the JSON object.
Base all assessments strictly on what the email explicitly states."""

DRAFT_SYSTEM = f"""You are a technical support engineer.
You draft professional, helpful email responses.
Always use the knowledge base below when relevant.

KNOWLEDGE BASE:
{KNOWLEDGE_BASE}

Guidelines:
- Match the sender's technical level and formality
- Be direct and helpful; avoid padding
- If you don't have the answer, say so and explain how to get it
- Never promise functionality or fixes that don't exist
- End with a clear next step for the sender"""


def triage_email(item: EmailItem) -> TriageResult:
    """Classify and prioritize an email."""
    # Very long emails are truncated to 3,000 characters. This comment must stay
    # outside the f-string, or it would be sent to the model as prompt text.
    triage_prompt = f"""Classify this email for a technical support team.

From: {item.sender_name} <{item.sender}>
Inbox: {item.inbox}
Subject: {item.subject}
Date: {item.date}

Body:
{item.body[:3000]}

{f'Thread context:{item.thread_history[:1000]}' if item.thread_history else ''}

Respond with JSON:
{{
  "category": "api_error|authentication|rate_limiting|billing|feature_request|documentation|integration|performance|security|general_question|escalation|spam",
  "priority": "critical|high|medium|low",
  "sentiment": "positive|neutral|frustrated|angry",
  "escalate": true|false,
  "escalation_reason": "reason if escalate=true, else empty string",
  "one_line_summary": "one sentence describing this email",
  "key_technical_issue": "the specific technical problem or question, if any",
  "confidence": 0.0 to 1.0
}}"""

    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=400,
        system=TRIAGE_SYSTEM,
        messages=[{"role": "user", "content": triage_prompt}]
    )

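    # Safe default: anything that cannot be parsed is escalated for manual review
    fallback = {
        "category": "parse_error",
        "priority": "medium",
        "sentiment": "neutral",
        "escalate": True,
        "escalation_reason": "Classification parsing failed; review manually",
        "one_line_summary": "Parse error",
        "key_technical_issue": "",
        "confidence": 0.0
    }

    try:
        data = json.loads(response.content[0].text)
        # Reject replies that are not a JSON object or that omit expected fields
        if not isinstance(data, dict) or not fallback.keys() <= data.keys():
            raise ValueError("unexpected triage payload")
        # Drop extra keys so TriageResult(**data) cannot raise TypeError
        data = {key: data[key] for key in fallback}
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        data = fallback

    return TriageResult(uid=item.uid, inbox=item.inbox, **data)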


def draft_response(item: EmailItem, triage: TriageResult) -> tuple[str, str]:
    """
    Draft a response to an email.
    Returns (draft_text, quality_note).
    """
    # Do not draft for escalations, spam, or very low confidence classifications
    if triage.escalate:
        return "", "Escalated — no draft generated"
    if triage.category == "spam":
        return "", "Spam — no draft needed"
    if triage.confidence < 0.5:
        return "", f"Low confidence ({triage.confidence:.2f}) — draft manually"

    draft_prompt = f"""Draft a response to this email.

TRIAGE CONTEXT:
- Category: {triage.category}
- Priority: {triage.priority}
- Sender sentiment: {triage.sentiment}
- Key issue: {triage.key_technical_issue}

EMAIL TO RESPOND TO:
From: {item.sender_name} <{item.sender}>
Subject: {item.subject}

{item.body[:3000]}

Requirements:
- Professional but not overly formal
- Address the specific issue identified in triage
- Use relevant information from the knowledge base if applicable
- Keep under 200 words unless the issue requires detailed explanation
- Do not include subject line — only the email body

After the draft, on a new line write: QUALITY_NOTE: [one sentence about confidence in this draft or anything the reviewer should check]"""

    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1024,
        system=DRAFT_SYSTEM,
        messages=[{"role": "user", "content": draft_prompt}]
    )

    full_response = response.content[0].text

    # Split draft from quality note
    if "QUALITY_NOTE:" in full_response:
        parts = full_response.split("QUALITY_NOTE:", 1)
        draft = parts[0].strip()
        quality_note = parts[1].strip()
    else:
        draft = full_response.strip()
        quality_note = "No quality note generated"

    return draft, quality_note


def generate_review_document(
    processed_emails: list[tuple[EmailItem, TriageResult]],
    output_file: Optional[str] = None
) -> str:
    """Generate a review document for the on-duty engineer."""
    if output_file is None:
        output_file = f"email_review_{datetime.now().strftime('%Y%m%d_%H%M')}.md"

    # Sort: escalations first, then by priority
    priority_order = {"critical": 0, "high": 1, "medium": 2, "low": 3}
    sorted_emails = sorted(
        processed_emails,
        key=lambda x: (0 if x[1].escalate else 1, priority_order.get(x[1].priority, 4))
    )

    doc = f"# Email Review — {datetime.now().strftime('%A, %B %d %Y')}\n\n"

    # Summary statistics
    total = len(sorted_emails)
    escalations = sum(1 for _, t in sorted_emails if t.escalate)
    critical = sum(1 for _, t in sorted_emails if t.priority == "critical" and not t.escalate)
    high = sum(1 for _, t in sorted_emails if t.priority == "high" and not t.escalate)
    with_drafts = sum(1 for _, t in sorted_emails if t.draft_response)

    doc += f"**{total} emails to review** | "
    doc += f"{escalations} escalations | "
    doc += f"{critical} critical | {high} high priority | "
    doc += f"{with_drafts} with AI drafts\n\n---\n\n"

    # Escalations section
    if escalations > 0:
        doc += "## 🔴 ESCALATIONS — Review First\n\n"
        for item, triage in sorted_emails:
            if triage.escalate:
                doc += f"### [{triage.inbox}] {item.subject}\n"
                doc += f"**From:** {item.sender_name} <{item.sender}>\n"
                doc += f"**Reason for escalation:** {triage.escalation_reason}\n"
                doc += f"**Summary:** {triage.one_line_summary}\n\n"
                doc += f"**Email preview:**\n> {item.body[:300]}...\n\n---\n\n"

    # Regular emails section
    doc += "## Regular Emails\n\n"
    for item, triage in sorted_emails:
        if triage.escalate:
            continue

        priority_indicator = {"critical": "🔴", "high": "🟠", "medium": "🟡", "low": "⚪"}.get(
            triage.priority, "⚪"
        )
        doc += f"### {priority_indicator} [{triage.inbox}] {item.subject}\n"
        doc += f"**From:** {item.sender_name} <{item.sender}> | "
        doc += f"**Category:** {triage.category} | "
        doc += f"**Sentiment:** {triage.sentiment}\n\n"
        doc += f"**Summary:** {triage.one_line_summary}\n\n"

        if triage.key_technical_issue:
            doc += f"**Technical issue:** {triage.key_technical_issue}\n\n"

        if triage.draft_response:
            doc += "**AI Draft Response:**\n"
            doc += f"> *Note: {triage.draft_quality_note}*\n\n"
            doc += "```\n"
            doc += triage.draft_response + "\n"
            doc += "```\n\n"
            doc += "[ ] Approve as-is  [ ] Edit and send  [ ] Discard and write manually\n\n"
        else:
            doc += f"**Draft status:** {triage.draft_quality_note or 'No draft generated'}\n\n"

        doc += "---\n\n"

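    # Explicit UTF-8: the document contains emoji that some platform default encodings reject
    Path(output_file).write_text(doc, encoding="utf-8")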
    print(f"Review document saved: {output_file}")
    return output_file
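
The listing defines the triage, drafting, and review-document functions but does not show how they are connected. One way to wire them together is sketched below. The helper `process_emails` is hypothetical, and it takes the triage and drafting functions as parameters so that it can be exercised without API calls.

```python
from typing import Callable, Iterable


def process_emails(items: Iterable, triage_fn: Callable, draft_fn: Callable) -> list:
    """Run triage and drafting for each email; return (item, triage) pairs."""
    processed = []
    for item in items:
        triage = triage_fn(item)
        draft, note = draft_fn(item, triage)
        # draft_fn returns "" when it declines to draft; store None instead
        # to match the Optional[str] field on TriageResult.
        triage.draft_response = draft or None
        triage.draft_quality_note = note
        processed.append((item, triage))
    return processed


# Hypothetical usage with the functions from the listing:
# pairs = process_emails(items, triage_email, draft_response)
# review_path = generate_review_document(pairs)
```

Because the quality note is stored even when no draft is produced, the review document can show why an email was left for manual handling.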

Workflow Integration

The email assistant runs twice daily: at 8 AM (catching overnight emails) and at 12 PM (catching morning emails). The on-duty engineer opens the review document, which is posted to their Slack channel automatically.
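
How the twice-daily schedule is triggered is not specified; a cron job or a hosted scheduler would both work. If the assistant ran as a long-lived Python process instead, the next run time could be computed roughly as follows. The name `next_run` is hypothetical, and the sketch assumes naive local datetimes.

```python
from datetime import datetime, time, timedelta

RUN_TIMES = (time(8, 0), time(12, 0))  # 8 AM and 12 PM, local time


def next_run(now: datetime, run_times=RUN_TIMES) -> datetime:
    """Return the first scheduled run strictly after `now`."""
    # Check the remaining slots today, then the slots tomorrow.
    for day_offset in (0, 1):
        day = now.date() + timedelta(days=day_offset)
        for slot in sorted(run_times):
            candidate = datetime.combine(day, slot)
            if candidate > now:
                return candidate
    raise ValueError("run_times must not be empty")
```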

The review document workflow:

1. Escalations appear at the top. The engineer handles these first, without AI assistance.
2. For each regular email with a draft: read the email preview, read the draft, mark one of the three checkboxes (approve/edit/discard). Most approvals take 30-45 seconds.
3. For emails without drafts (low confidence classifications, unusual requests): the engineer handles manually.
4. Approved drafts are copied into the email client and sent.

The engineer never receives raw emails through the assistant — they always see the triage summary and draft together, which primes their review with the key context.

Results and Iterations

Initial deployment (weeks 1-2): Draft approval rate was 52% — just over half the drafts were used with minimal editing. The team considered this a good starting baseline.

After knowledge base expansion (week 4): Raj asked the rotating engineers to document every case where they significantly rewrote a draft. After two weeks, he identified three categories responsible for 80% of rewrites: rate-limiting explanations (the knowledge base had outdated tier information), authentication questions (the draft was too generic and needed specific SDK-version guidance), and escalation judgment calls (the assistant sometimes drafted instead of escalating borderline cases).

He updated the knowledge base and tightened the escalation criteria. Draft approval rate rose to 74%.

Current state (after three months): Draft approval rate is 82%. Average email duty time dropped from 2.5-4 hours to 45-90 minutes. The primary remaining manual work is escalated items (which require senior engineer judgment) and genuinely novel questions the knowledge base does not cover well.
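
How the team computed the draft approval rate is not described. One plausible approach is to count the ticked checkboxes in the completed review documents, as sketched below. This assumes reviewers tick a box by replacing "[ ]" with "[x]" and that only "Approve as-is" counts as an approval; the function name `approval_stats` is hypothetical.

```python
import re

# Matches the three checkboxes that generate_review_document writes under each draft.
CHECKBOX = re.compile(
    r"\[([ xX])\]\s+(Approve as-is|Edit and send|Discard and write manually)"
)


def approval_stats(review_markdown: str) -> dict:
    """Count ticked review checkboxes and compute the share approved as-is."""
    counts = {"Approve as-is": 0, "Edit and send": 0, "Discard and write manually": 0}
    for mark, label in CHECKBOX.findall(review_markdown):
        if mark.lower() == "x":
            counts[label] += 1
    reviewed = sum(counts.values())
    approved = counts["Approve as-is"]
    return {
        "reviewed": reviewed,
        "approved": approved,
        # Guard against division by zero when nothing has been reviewed yet
        "approval_rate": approved / reviewed if reviewed else 0.0,
    }
```

If lightly edited drafts should also count as approvals, the numerator would include "Edit and send" as well.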

The team tracks one additional metric: customer response satisfaction, measured by the percentage of email threads that close in one exchange (the customer's issue is resolved without follow-up). This held steady at 68% after the assistant was deployed — it did not improve, but it did not decline. "We were worried the AI drafts would be less helpful than what we wrote ourselves," Raj notes. "The data showed they're about the same quality, which is what you want when the goal is efficiency, not improvement."

Human-in-the-Loop Design Decisions

Raj made three deliberate decisions to keep humans in the loop:

No autonomous sending. The assistant drafts; the engineer sends. This is not a technical limitation: IMAP only retrieves mail, but Python's standard library can send it over SMTP with smtplib. It is a deliberate choice. Customer-facing communication is too high-stakes for fully autonomous handling, and the engineer's review catches the approximately 18% of drafts that need revision.

Escalation logic is conservative. The escalation criteria were set to flag any borderline case. Raj designed the threshold knowing that false positives (flagging things that could have been handled routinely) are far less costly than false negatives (failing to escalate something that needed senior attention). The current false positive rate for escalation is approximately 15%.

No draft for low-confidence classifications. When the triage engine is uncertain about what kind of email it is looking at, it does not attempt to draft a response. An uncertain classification means uncertain context for the draft, which means a draft that will likely mislead the reviewer rather than help them. Better to flag the email for manual handling than to generate a plausible-looking but poorly-grounded draft.
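
The conservative escalation stance described above could be reinforced with a deterministic check that runs alongside the model. The sketch below is an illustration, not Raj's implementation: the patterns are loosely derived from the escalation criteria in the knowledge base, the name `should_escalate` is hypothetical, and the rules can only add escalations, never remove one the model has raised.

```python
import re

# Illustrative patterns only; a real list would be tuned against past emails.
ESCALATION_PATTERNS = {
    "data loss or corruption": re.compile(r"\bdata (loss|corruption)\b", re.I),
    "security": re.compile(r"\b(vulnerabilit(y|ies)|breach(es|ed)?|compromised)\b", re.I),
    "legal language": re.compile(r"\b(lawyer|attorney|legal action|lawsuit)\b", re.I),
    "SLA mentioned": re.compile(r"\bSLA\b"),
}


def should_escalate(model_escalate: bool, email_text: str) -> tuple[bool, str]:
    """Escalate if the model says so OR any rule matches."""
    if model_escalate:
        return True, "flagged by triage model"
    for reason, pattern in ESCALATION_PATTERNS.items():
        if pattern.search(email_text):
            return True, f"rule match: {reason}"
    return False, ""
```

Combining the two signals with a logical OR raises the false positive rate but lowers the chance of a missed escalation, which matches the trade-off Raj chose.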

These design choices reflect a principle Raj articulates clearly: "The assistant is here to reduce the mechanical work, not to make decisions. Every decision still has a human attached to it."