Case Study 2: AI-Powered Content Management System

Adding AI Content Generation and Classification to a CMS

The Scenario

Meridian Publishing runs an online magazine that publishes 40 articles per week across six content categories: Technology, Business, Health, Science, Culture, and Opinion. Their editorial team of 12 writers and 3 editors spends significant time on repetitive tasks: writing SEO metadata, categorizing articles, generating social media posts, and summarizing content for newsletter digests.

The CTO proposes adding AI features to their existing Django-based CMS to automate these tasks, freeing the editorial team to focus on original reporting and long-form content. The goal is not to replace writers but to eliminate the mechanical work that surrounds every article.

This case study follows the development of four AI features, from design through deployment and optimization.


The Existing System

Meridian's CMS is a Django application with PostgreSQL. The core data model includes:

class Article(models.Model):
    title = models.CharField(max_length=300)
    body = models.TextField()
    author = models.ForeignKey(User, on_delete=models.CASCADE)
    category = models.CharField(max_length=50, choices=CATEGORY_CHOICES)
    status = models.CharField(max_length=20, choices=STATUS_CHOICES)
    published_at = models.DateTimeField(null=True)
    seo_title = models.CharField(max_length=70, blank=True)
    seo_description = models.CharField(max_length=160, blank=True)
    tags = models.ManyToManyField(Tag, blank=True)
    social_posts = models.JSONField(default=dict)
    summary = models.TextField(blank=True)
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

The editorial workflow: writers draft articles, editors review them, and approved articles are published. SEO metadata, social media posts, and newsletter summaries are currently written manually by the editorial team — a process that takes 20-30 minutes per article.


Feature 1: Automatic Content Classification

The problem: When writers create a new article, they select a category and add tags manually. Miscategorization happens regularly (about 15% of articles need recategorization during review), and tag selection is inconsistent across writers.

The solution: An AI classifier that suggests the category and tags when the article is saved as a draft.

Implementation: The team builds a classification service that analyzes the article's title and first 500 words:

import json

class ContentClassifier:
    """AI-powered content classification for articles."""

    CLASSIFICATION_PROMPT = """Classify this article into exactly one category
and suggest 3-5 relevant tags.

Categories: Technology, Business, Health, Science, Culture, Opinion

Article Title: {title}
Article Content (excerpt): {content_excerpt}

Respond with JSON:
{{
    "category": "the single best category",
    "confidence": 0.0-1.0,
    "tags": ["tag1", "tag2", "tag3"],
    "reasoning": "brief explanation of classification"
}}"""

    def __init__(self, client):
        self.client = client

    def classify(self, title: str, body: str) -> dict:
        """Classify an article by category and tags."""
        excerpt = body[:3000]  # roughly the first 500 words
        prompt = self.CLASSIFICATION_PROMPT.format(
            title=title,
            content_excerpt=excerpt
        )
        response = self.client.messages.create(
            model="claude-3-5-haiku-20241022",  # Fast, cheap model
            max_tokens=200,
            messages=[{"role": "user", "content": prompt}]
        )
        return json.loads(response.content[0].text)

Key design decisions:

  • Uses Claude Haiku for speed and cost — classification is a simple task that does not need the most powerful model.
  • Returns a confidence score so the UI can highlight low-confidence suggestions for human review.
  • Runs automatically on save but does not override the author's choice — it suggests, and the author accepts or rejects.
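One practical wrinkle with this approach: calling json.loads on the raw response fails when the model wraps its JSON in a markdown fence. A small parsing helper along these lines (our own sketch, not part of Meridian's codebase) makes the classifier more forgiving:

```python
import json

def parse_json_response(text: str) -> dict:
    """Parse a JSON object from a model response, tolerating markdown fences."""
    cleaned = text.strip()
    if cleaned.startswith("```"):
        # Drop the opening fence line (which may carry a language tag like ```json)
        cleaned = cleaned.split("\n", 1)[1]
        # Drop everything from the closing fence onward
        cleaned = cleaned.rsplit("```", 1)[0]
    return json.loads(cleaned)
```

The classifier's final line would then become `return parse_json_response(response.content[0].text)`, and the same helper can back every JSON-returning feature in the system.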

Integration with Django:

from django.db.models.signals import post_save
from django.dispatch import receiver

@receiver(post_save, sender=Article)
def suggest_classification(sender, instance, **kwargs):
    """Suggest classification when an article is saved as draft."""
    if instance.status == "draft" and not instance.category:
        classifier = ContentClassifier(get_ai_client())
        suggestion = classifier.classify(instance.title, instance.body)
        ArticleAISuggestion.objects.update_or_create(
            article=instance,
            suggestion_type="classification",
            defaults={
                "suggestion_data": suggestion,
                "confidence": suggestion["confidence"],
                "status": "pending"
            }
        )

Results after one month: Category accuracy improved from 85% to 97%. Time spent on categorization and tagging dropped from 5 minutes per article to 15 seconds (reviewing and accepting the suggestion). Monthly cost: approximately $12 (Haiku is extremely cheap for short classification tasks).


Feature 2: SEO Metadata Generation

The problem: Writing SEO-optimized titles (under 70 characters) and meta descriptions (under 160 characters) for every article is tedious. Writers often skip it or write poor metadata because they find it boring. Missing or low-quality SEO metadata reduces search traffic.

The solution: Generate SEO metadata automatically when an article enters the review stage, with editor approval before publication.

Implementation: The team creates a pipeline that generates SEO metadata with brand voice consistency:

import json

class SEOGenerator:
    """Generate SEO-optimized metadata for articles."""

    SYSTEM_PROMPT = """You are an SEO specialist for Meridian Publishing,
an online magazine covering Technology, Business, Health, Science,
Culture, and Opinion.

Brand voice: Authoritative but accessible. We explain complex topics
clearly. We never use clickbait or sensationalism.

Your job is to write SEO metadata that:
1. Accurately represents the article content
2. Includes relevant keywords naturally
3. Compels users to click from search results
4. Follows our brand voice guidelines"""

    GENERATION_PROMPT = """Generate SEO metadata for this article.

Title: {title}
Category: {category}
Article excerpt: {excerpt}

Requirements:
- seo_title: Under 70 characters, include primary keyword
- seo_description: Under 160 characters, compelling and accurate
- focus_keyword: The single most important search keyword
- secondary_keywords: 2-3 additional relevant keywords

Respond with JSON:
{{
    "seo_title": "...",
    "seo_description": "...",
    "focus_keyword": "...",
    "secondary_keywords": ["...", "..."]
}}"""

    def __init__(self, client):
        self.client = client

    def generate(self, article: dict) -> dict:
        """Generate SEO metadata for an article."""
        prompt = self.GENERATION_PROMPT.format(
            title=article["title"],
            category=article["category"],
            excerpt=article["body"][:3000]
        )
        response = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=300,
            system=self.SYSTEM_PROMPT,
            messages=[{"role": "user", "content": prompt}]
        )
        result = json.loads(response.content[0].text)

        # Quality gate: verify character limits
        if len(result["seo_title"]) > 70:
            result["seo_title"] = result["seo_title"][:67] + "..."
        if len(result["seo_description"]) > 160:
            result["seo_description"] = result["seo_description"][:157] + "..."

        return result

Quality gate: The system validates that the SEO title is under 70 characters, the description is under 160 characters, and the focus keyword appears in both. In the full pipeline, violations trigger regeneration with explicit feedback about what needs to change; the truncation shown in the code above is the last-resort fallback when a regenerated result still exceeds the limits.
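The quality gate can be sketched as a validator that returns explicit feedback messages, suitable for feeding back into a regeneration prompt (the function name is our own; only the truncation fallback appears in the case-study code):

```python
def validate_seo_metadata(result: dict) -> list[str]:
    """Check SEO metadata against the quality gate; an empty list means it passes."""
    problems = []
    if len(result["seo_title"]) > 70:
        problems.append(
            f"seo_title is {len(result['seo_title'])} characters; the limit is 70"
        )
    if len(result["seo_description"]) > 160:
        problems.append(
            f"seo_description is {len(result['seo_description'])} characters; "
            f"the limit is 160"
        )
    keyword = result["focus_keyword"].lower()
    for field in ("seo_title", "seo_description"):
        if keyword not in result[field].lower():
            problems.append(f"focus_keyword '{keyword}' missing from {field}")
    return problems
```

On a non-empty result, the caller appends the messages to the original prompt and retries, which tells the model precisely what to fix rather than hoping a blind retry lands within limits.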

A/B testing: The team runs an A/B test comparing AI-generated SEO metadata against human-written metadata for 100 articles over four weeks. They track click-through rate (CTR) from Google Search Console.

Results:

  • AI-generated metadata: 3.2% average CTR
  • Human-written metadata: 2.8% average CTR
  • AI-generated metadata performed 14% better, likely because it was consistently optimized while human-written metadata quality varied

After this validation, the editorial team adopts AI-generated metadata as the default, with editors making adjustments as needed.


Feature 3: Social Media Post Generation

The problem: Each published article needs posts for three platforms: Twitter/X (under 280 characters), LinkedIn (under 700 characters), and a newsletter teaser (100-150 words). Writing three versions of promotional content for every article consumes 15-20 minutes of a writer's or social media manager's time.

The solution: Generate platform-specific social media posts automatically when an article is published.

Implementation: The team builds a multi-variant generator that produces all three formats in a single AI call:

import json

class SocialMediaGenerator:
    """Generate platform-specific social media posts."""

    PROMPT = """Generate social media posts for this article across three
platforms. Each post should capture the article's key insight and
encourage engagement.

Article Title: {title}
Article Summary: {summary}
Category: {category}
Key Quote (if available): {key_quote}

Platform requirements:
1. Twitter/X: Under 280 characters. Punchy, conversational. Include a hook.
   Do NOT include hashtags — our social team adds those manually.
2. LinkedIn: Under 700 characters. Professional, insightful. Include a
   question or call to discussion.
3. Newsletter teaser: 100-150 words. Informative, draws the reader in.
   End with a reason to read the full article.

Respond with JSON:
{{
    "twitter": "...",
    "linkedin": "...",
    "newsletter": "..."
}}"""

    def __init__(self, client):
        self.client = client

    def generate(self, article: dict) -> dict:
        """Generate social media posts for all platforms."""
        prompt = self.PROMPT.format(
            title=article["title"],
            summary=article.get("summary", article["body"][:500]),
            category=article["category"],
            key_quote=article.get("key_quote", "None available")
        )
        response = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=500,
            messages=[{"role": "user", "content": prompt}]
        )
        result = json.loads(response.content[0].text)

        # Quality gates
        if len(result["twitter"]) > 280:
            result["twitter_warning"] = "Exceeds 280 characters, needs trimming"
        if len(result["linkedin"]) > 700:
            result["linkedin_warning"] = "Exceeds 700 characters, needs trimming"

        return result

Prompt versioning: The social media team experiments with different prompt styles. They register three prompt versions in the prompt registry:

  • v1 (informative): Focus on the article's main finding or argument.
  • v2 (curiosity): Lead with a surprising fact or question.
  • v3 (narrative): Start with a brief story or scenario.

An A/B test across 200 articles over six weeks shows that v2 (curiosity) produces the highest engagement rate on Twitter (+22% vs. v1), while v3 (narrative) performs best on LinkedIn (+18% vs. v1). The newsletter performs similarly across all versions. The team configures platform-specific prompt versions: v2 for Twitter, v3 for LinkedIn, v1 for newsletters.
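The outcome of the A/B test can be captured as a simple mapping from platform to prompt version (a sketch with names of our own choosing; the case study does not show the prompt registry's actual interface):

```python
# Winning prompt version per platform, per the six-week A/B test
PLATFORM_PROMPT_VERSIONS = {
    "twitter": "v2",     # curiosity style: +22% engagement vs. v1
    "linkedin": "v3",    # narrative style: +18% engagement vs. v1
    "newsletter": "v1",  # informative style: no significant difference
}

def prompt_version_for(platform: str) -> str:
    """Return the prompt version to use for a platform, defaulting to v1."""
    return PLATFORM_PROMPT_VERSIONS.get(platform, "v1")
```

Keeping the routing in data rather than code means the next A/B test can change a platform's version with a one-line edit.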


Feature 4: Article Summarization for Newsletter Digest

The problem: Meridian publishes a weekly newsletter digest featuring summaries of the top 10 articles. An editor currently spends 3-4 hours writing these summaries, distilling 40 articles into 10 selections with 100-150 word summaries each.

The solution: A content pipeline that generates candidate summaries for all articles, with the editor selecting and refining the top 10.

Implementation: The pipeline has three stages:

import json

class ArticleSummarizer:
    """Multi-stage article summarization pipeline."""

    def __init__(self, client):
        self.client = client

    def summarize_for_digest(self, article: dict) -> dict:
        """Generate a newsletter-quality summary of an article."""
        # Stage 1: Extract key points
        key_points = self._extract_key_points(article["body"])

        # Stage 2: Generate summary from key points
        summary = self._generate_summary(
            article["title"],
            key_points,
            target_words=125
        )

        # Stage 3: Quality evaluation
        quality = self._evaluate_summary(
            article["title"],
            article["body"],
            summary
        )

        return {
            "summary": summary,
            "key_points": key_points,
            "quality_score": quality["overall_score"],
            "quality_details": quality
        }

    def _extract_key_points(self, body: str) -> list[str]:
        """Extract 3-5 key points from an article."""
        response = self.client.messages.create(
            model="claude-3-5-haiku-20241022",
            max_tokens=300,
            messages=[{
                "role": "user",
                "content": f"Extract 3-5 key points from this article. "
                           f"Return as a JSON array of strings.\n\n{body[:5000]}"
            }]
        )
        return json.loads(response.content[0].text)

    def _generate_summary(
        self,
        title: str,
        key_points: list[str],
        target_words: int
    ) -> str:
        """Generate a newsletter summary from key points."""
        points_text = "\n".join(f"- {p}" for p in key_points)
        response = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=250,
            messages=[{
                "role": "user",
                "content": f"Write a {target_words}-word newsletter summary for "
                           f"the article '{title}' based on these key points:\n"
                           f"{points_text}\n\nThe summary should entice readers "
                           f"to click through to the full article. Write in "
                           f"Meridian Publishing's voice: authoritative but "
                           f"accessible."
            }]
        )
        return response.content[0].text

    def _evaluate_summary(
        self,
        title: str,
        body: str,
        summary: str
    ) -> dict:
        """Evaluate summary quality."""
        response = self.client.messages.create(
            model="claude-3-5-haiku-20241022",
            max_tokens=200,
            messages=[{
                "role": "user",
                "content": f"Evaluate this newsletter summary on a 1-10 scale "
                           f"for: accuracy, engagement, completeness, and "
                           f"conciseness.\n\nArticle title: {title}\n"
                           f"Article (excerpt): {body[:2000]}\n"
                           f"Summary: {summary}\n\n"
                           f"Respond with JSON: "
                           f'{{"accuracy": N, "engagement": N, '
                           f'"completeness": N, "conciseness": N, '
                           f'"overall_score": N}}'
            }]
        )
        return json.loads(response.content[0].text)

Editor workflow: The system generates summaries for all 40 articles published that week. The editor reviews the summaries, sorted by quality score. They select the top 10 articles for the digest, make any necessary edits to the summaries, and publish the newsletter. The process takes 45 minutes instead of 3-4 hours.
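The "sorted by quality score" step is straightforward; a sketch using the dict shape returned by summarize_for_digest (the function name is our own):

```python
def rank_digest_candidates(results: list[dict], top_n: int = 10) -> list[dict]:
    """Sort candidate summaries by quality score, highest first."""
    ranked = sorted(results, key=lambda r: r["quality_score"], reverse=True)
    return ranked[:top_n]
```

The editor still makes the final selection — the ranking only determines the order in which candidates are reviewed.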


Cost Management and Monitoring

With four AI features running across 40+ articles per week, cost management is essential.

Cost breakdown (monthly):

  Feature                   Model            Calls/month   Estimated cost
  Classification            Haiku              320             $2
  SEO Generation            Sonnet             160            $12
  Social Media              Sonnet             160            $10
  Summarization (3-stage)   Haiku + Sonnet     480            $18
  Total                                      1,120            $42

The total AI cost is $42/month — trivial compared to the editorial time saved. Even at 10x the volume, costs would remain under $500/month.
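The per-article economics are easy to verify from the figures above (40 articles per week works out to roughly 173 per month):

```python
articles_per_month = 40 * 52 / 12          # ~173 articles per month
cost_per_article = 42 / articles_per_month  # total monthly AI cost / volume
print(round(cost_per_article, 2))           # about $0.24 per article
```

At roughly a quarter per article against 20-30 minutes of saved editorial time, the cost side of the decision is not close.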

Monitoring dashboard: The team builds a simple dashboard that tracks:

  • Suggestion acceptance rate per feature (how often editors use AI suggestions as-is vs. modifying them)
  • Quality scores per feature over time
  • Cost per article and cost per feature
  • Model latency per feature

Key monitoring insight: After three months, the dashboard reveals that classification acceptance rate is 94% but SEO metadata acceptance rate has dropped to 71%. Investigation shows that the SEO prompt was optimized for informational articles but produces poor results for opinion pieces, which have a different SEO strategy. The team creates a separate prompt variant for opinion articles, and acceptance rate recovers to 89%.


Architecture Decisions

Why an AI service layer: The team wraps all AI calls behind a ContentAIService class that all four features use. This gives them:

  • A single place to configure API clients, retries, and timeouts
  • Centralized cost tracking and logging
  • Easy model switching (they migrated from GPT-4 to Claude mid-project)
  • Consistent error handling across all features
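A minimal sketch of such a service layer, assuming a client with an Anthropic-style messages.create interface (the case study does not show ContentAIService's actual implementation, so the details here are our own):

```python
import time

class ContentAIService:
    """Single entry point for all AI calls: retries, backoff, call counting.

    Sketch only — the production service would also log per-feature cost
    and latency, and catch provider-specific error types rather than
    bare Exception.
    """

    def __init__(self, client, max_retries: int = 3):
        self.client = client
        self.max_retries = max_retries
        self.total_calls = 0

    def complete(self, **kwargs):
        last_error = None
        for attempt in range(self.max_retries):
            try:
                response = self.client.messages.create(**kwargs)
                self.total_calls += 1
                return response
            except Exception as exc:
                last_error = exc
                if attempt < self.max_retries - 1:
                    time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s
        raise last_error
```

Because every feature goes through complete(), swapping providers or adding cost logging is a change in one place instead of four.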

Why synchronous processing: All four features run synchronously because the results are needed immediately in the editorial workflow — when a writer saves a draft, they want to see the classification suggestion in the same page load. Latency is acceptable (1-3 seconds per feature) because it is much faster than the manual process it replaces.

Why not fine-tuning: The team considered fine-tuning a model on Meridian's content style but decided against it. RAG and prompt engineering achieve sufficient quality for their use cases, and they avoid the complexity and cost of maintaining fine-tuned models. The brand voice is effectively captured in the system prompts.


Lessons Learned

  1. AI features do not need to be perfect to be valuable. A 94% classification acceptance rate means the AI saves time on 94% of articles even though it is wrong 6% of the time. The editor catches the errors in seconds.

  2. Different features need different models. Classification and evaluation use Haiku (cheap, fast). Generation uses Sonnet (higher quality). Matching model capability to task complexity saved 60% on costs compared to using Sonnet for everything.

  3. Prompt versioning pays off quickly. The team has already created 14 prompt versions across four features. The ability to roll back when a change degrades quality has prevented several potential incidents.

  4. Editor acceptance rate is the most important metric. It directly measures whether the AI feature is actually useful. A feature with high automated quality scores but low acceptance rate is not serving the team's needs.

  5. Start with the most tedious task, not the most impressive one. SEO metadata generation is unglamorous but saved the most time per article. Article summarization is more impressive but was the last feature built. Delivering boring-but-useful features first builds organizational trust in AI features.


The complete implementation code for this case study is available in code/case-study-code.py.