Case Study 2: When Context Matters

How the Same Prompt Produces Different Results with Different Context Windows

The Scenario

Marcus is building a web application for a small business. He needs a function to validate customer email addresses. He plans to ask his AI coding assistant: "Write a function to validate an email address."

But Marcus has heard that context affects output quality. So he decides to run an experiment: he will submit the same core request three times, each time with a different amount of context. He wants to see firsthand how context changes the AI's output.


Experiment 1: Minimal Context

Prompt:

Write a function to validate an email address.

That is the entire prompt -- 8 words, approximately 12 tokens. The AI has no information about Marcus's project, programming language, existing codebase, quality requirements, or use case.

The AI's Output:

```python
def validate_email(email):
    import re
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    return bool(re.match(pattern, email))
```

Analysis of This Output:

The function is concise and works for basic cases. But let us examine the decisions the model made and why:

  1. Language choice: Python. The model chose Python because it is the most common language in code generation training data and the most "default" choice without other context. If Marcus needed JavaScript, he got the wrong language.

  2. No type hints. Without context suggesting a modern Python project with type annotations, the model defaulted to a simpler style.

  3. No docstring. The minimal prompt did not signal that documentation was important, so the model produced minimal code.

  4. Import inside function. Placing import re inside the function is generally considered poor practice. The model has seen both patterns in training data and chose the self-contained version since there was no module context to suggest top-level imports.

  5. Basic regex only. The regex pattern covers common email formats but misses many valid addresses. It does not handle internationalized domain names, quoted local parts, or plus addressing quirks.

  6. No error handling. What if the input is None? An integer? A list? The function will crash with an unhelpful error.

  7. Boolean return only. The function returns True or False but gives no information about why an email is invalid. This makes debugging user input difficult.

The model generated perfectly reasonable code given the information available. With only 12 tokens of context, the probability distribution was broad, and the model chose the most common, generic implementation pattern.
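These failure modes are easy to reproduce. A quick sanity check of the minimal function, reusing its regex verbatim (the sample addresses are illustrative):

```python
import re


def validate_email(email):
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    return bool(re.match(pattern, email))


# Handles the common case...
assert validate_email("user@example.com") is True

# ...but rejects a quoted local part that RFC 5321/5322 permit
assert validate_email('"john doe"@example.com') is False

# ...and crashes on non-string input instead of failing gracefully
try:
    validate_email(None)
except TypeError:
    pass  # re.match raises "expected string or bytes-like object"
```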


Experiment 2: Moderate Context

Prompt:

I'm building a Flask web application in Python 3.12. We use type hints throughout
the codebase and follow PEP 8 conventions.

Write a function to validate an email address. It should be thorough enough for
a customer registration form. Return both a boolean and an error message.

This prompt provides approximately 60 tokens of context. The AI now knows the language version, framework, coding standards, use case, and desired return format.

The AI's Output:

```python
import re
from typing import NamedTuple


class EmailValidationResult(NamedTuple):
    """Result of an email validation check."""

    is_valid: bool
    error_message: str


def validate_email(email: str) -> EmailValidationResult:
    """Validate an email address for customer registration.

    Performs multiple validation checks including format, length,
    and domain structure. Designed for use in registration forms
    where user-friendly error messages are important.

    Args:
        email: The email address string to validate.

    Returns:
        An EmailValidationResult with is_valid flag and error_message.
        The error_message is empty if the email is valid.

    Examples:
        >>> validate_email("user@example.com")
        EmailValidationResult(is_valid=True, error_message='')
        >>> validate_email("")
        EmailValidationResult(is_valid=False, error_message='Email address is required.')
    """
    if not email or not email.strip():
        return EmailValidationResult(
            is_valid=False,
            error_message="Email address is required.",
        )

    email = email.strip().lower()

    if len(email) > 254:
        return EmailValidationResult(
            is_valid=False,
            error_message="Email address is too long (maximum 254 characters).",
        )

    if "@" not in email:
        return EmailValidationResult(
            is_valid=False,
            error_message="Email address must contain an '@' symbol.",
        )

    local_part, _, domain = email.rpartition("@")

    if not local_part:
        return EmailValidationResult(
            is_valid=False,
            error_message="Email address is missing the username before '@'.",
        )

    if len(local_part) > 64:
        return EmailValidationResult(
            is_valid=False,
            error_message="The username part of the email is too long (maximum 64 characters).",
        )

    if not domain or "." not in domain:
        return EmailValidationResult(
            is_valid=False,
            error_message="Email address has an invalid domain.",
        )

    pattern = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
    if not re.match(pattern, email):
        return EmailValidationResult(
            is_valid=False,
            error_message="Email address format is invalid.",
        )

    return EmailValidationResult(is_valid=True, error_message="")
```

Analysis of This Output:

The difference is striking. With moderate context, the model produced:

  1. Type hints throughout. The context mentioned "type hints throughout the codebase," and the attention mechanism picked up on this requirement, applying it consistently.

  2. A structured return type. The prompt asked for "both a boolean and an error message," and the model chose a NamedTuple -- a Pythonic, type-safe way to return multiple values. This reflects the "Python 3.12" and "PEP 8" context signals.

  3. Comprehensive docstring. The coding standards context triggered more thorough documentation, including examples.

  4. User-friendly error messages. The "customer registration form" context told the model these messages would be shown to end users, so it generated clear, non-technical error messages.

  5. Step-by-step validation. Instead of a single regex, the function checks multiple conditions individually, providing specific error messages for each failure mode.

  6. Input sanitization. The function strips whitespace and converts to lowercase, reflecting awareness of real-world form input.

  7. RFC-aware length limits. The function checks the 254-character total limit and 64-character local part limit from the email specification, showing the model's knowledge of email standards.

  8. Import at module level. With the context of a Flask application (implying a module structure), the model correctly placed the import at the top level.

The moderate context prompt consumed about 5 times more tokens than the minimal prompt, but the output quality improved dramatically. The key context signals -- Python 3.12, Flask, type hints, PEP 8, registration form, return format -- each activated different knowledge in the model, steering the probability distribution toward a more specific, higher-quality output.
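Part of what makes the NamedTuple return type pleasant to consume is that callers can read fields by name or unpack positionally. A small sketch of both call-site styles (the result value here is constructed by hand for illustration):

```python
from typing import NamedTuple


class EmailValidationResult(NamedTuple):
    is_valid: bool
    error_message: str


result = EmailValidationResult(
    is_valid=False,
    error_message="Email address is required.",
)

# Named access reads clearly at call sites...
if not result.is_valid:
    print(result.error_message)

# ...and positional unpacking still works, because it is a tuple
ok, message = result
assert ok is False
assert message == "Email address is required."
```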


Experiment 3: Rich Context

Prompt:

I'm building a Flask web application in Python 3.12 for a B2B SaaS product. Here
is our existing validation module pattern:

```python
# validators/base.py
from dataclasses import dataclass
from enum import Enum


class ValidationSeverity(Enum):
    ERROR = "error"
    WARNING = "warning"


@dataclass(frozen=True)
class ValidationResult:
    is_valid: bool
    message: str
    severity: ValidationSeverity
    field_name: str


def create_error(field: str, message: str) -> ValidationResult:
    return ValidationResult(
        is_valid=False,
        message=message,
        severity=ValidationSeverity.ERROR,
        field_name=field,
    )


def create_warning(field: str, message: str) -> ValidationResult:
    return ValidationResult(
        is_valid=True,
        message=message,
        severity=ValidationSeverity.WARNING,
        field_name=field,
    )


def create_success(field: str) -> ValidationResult:
    return ValidationResult(
        is_valid=True,
        message="",
        severity=ValidationSeverity.ERROR,  # not used when valid
        field_name=field,
    )
```

Here is an example of an existing validator in our codebase:

```python
# validators/phone.py
import re
from validators.base import ValidationResult, create_error, create_success, create_warning


def validate_phone(phone: str, field_name: str = "phone") -> ValidationResult:
    """Validate a phone number for the registration form.

    Accepts formats: +1-555-555-5555, (555) 555-5555, 5555555555
    """
    if not phone or not phone.strip():
        return create_error(field_name, "Phone number is required.")

    cleaned = re.sub(r"[\s\-\(\)\.]", "", phone.strip())

    if not cleaned.lstrip("+").isdigit():
        return create_error(field_name, "Phone number contains invalid characters.")

    if len(cleaned.lstrip("+")) < 10:
        return create_error(field_name, "Phone number is too short.")

    if len(cleaned.lstrip("+")) > 15:
        return create_error(field_name, "Phone number is too long.")

    if not cleaned.startswith("+"):
        return create_warning(field_name, "Consider adding country code (e.g., +1).")

    return create_success(field_name)
```

Our email validation needs:
- Business emails only (warn on free providers like gmail.com, yahoo.com)
- Must handle the case where the email domain has no MX record (warn, don't error)
- Disposable email detection (error on known disposable domains)
- Follow the same pattern as validate_phone above

Write a validate_email function that fits into this validation module.


This prompt is approximately 400 tokens -- substantially larger. But it gives the model rich, specific context about the existing codebase.

The AI's Output:

```python
# validators/email.py
import re

from validators.base import (
    ValidationResult,
    create_error,
    create_success,
    create_warning,
)

# Common free email providers to warn about for B2B context
FREE_EMAIL_PROVIDERS: set[str] = {
    "gmail.com",
    "yahoo.com",
    "hotmail.com",
    "outlook.com",
    "aol.com",
    "icloud.com",
    "mail.com",
    "protonmail.com",
    "zoho.com",
    "yandex.com",
    "gmx.com",
    "live.com",
}

# Known disposable email domains
DISPOSABLE_EMAIL_DOMAINS: set[str] = {
    "mailinator.com",
    "guerrillamail.com",
    "tempmail.com",
    "throwaway.email",
    "temp-mail.org",
    "10minutemail.com",
    "trashmail.com",
    "fakeinbox.com",
    "sharklasers.com",
    "guerrillamailblock.com",
    "grr.la",
    "dispostable.com",
    "yopmail.com",
    "maildrop.cc",
    "discard.email",
}


def _check_mx_record(domain: str) -> bool:
    """Check if a domain has MX DNS records.

    Returns True if MX records are found, False otherwise.
    Falls back to True on DNS resolution errors to avoid
    false negatives from network issues.
    """
    try:
        import dns.resolver

        answers = dns.resolver.resolve(domain, "MX")
        return len(answers) > 0
    except ImportError:
        # dns.resolver not available; skip MX check
        return True
    except Exception:
        # DNS resolution failed; assume valid to avoid false negatives
        return True


def validate_email(email: str, field_name: str = "email") -> ValidationResult:
    """Validate an email address for B2B customer registration.

    Performs format validation, length checks, and business-context
    checks including free provider warnings and disposable email
    detection.

    Accepts standard email formats: user@domain.com, user+tag@domain.com
    """
    if not email or not email.strip():
        return create_error(field_name, "Email address is required.")

    cleaned = email.strip().lower()

    if len(cleaned) > 254:
        return create_error(
            field_name, "Email address is too long (maximum 254 characters)."
        )

    if "@" not in cleaned:
        return create_error(field_name, "Email address must contain an '@' symbol.")

    local_part, _, domain = cleaned.rpartition("@")

    if not local_part:
        return create_error(
            field_name, "Email address is missing the username before '@'."
        )

    if len(local_part) > 64:
        return create_error(
            field_name,
            "The username part of the email is too long (maximum 64 characters).",
        )

    if not domain or "." not in domain:
        return create_error(field_name, "Email address has an invalid domain.")

    pattern = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
    if not re.match(pattern, cleaned):
        return create_error(field_name, "Email address format is invalid.")

    # Check for disposable email domains (error)
    if domain in DISPOSABLE_EMAIL_DOMAINS:
        return create_error(
            field_name, "Disposable email addresses are not allowed."
        )

    # Check MX record (warn only)
    if not _check_mx_record(domain):
        return create_warning(
            field_name,
            f"The domain '{domain}' does not appear to accept email. "
            "Please verify the address.",
        )

    # Check for free email providers in B2B context (warn only)
    if domain in FREE_EMAIL_PROVIDERS:
        return create_warning(
            field_name,
            "Please use your business email address instead of a personal one.",
        )

    return create_success(field_name)
```

Analysis of This Output:

The rich context produced output that is dramatically different from both previous experiments:

  1. Perfect pattern matching. The function follows exactly the same structure as validate_phone: same parameter pattern (field_name with default), same return type, same import structure, same docstring style. The attention mechanism locked onto the existing pattern and replicated it precisely.

  2. Correct file placement. The model even included the file path comment # validators/email.py, matching the project structure it inferred from the context.

  3. Correct imports. Instead of importing re and defining its own return types, the model imports from validators.base exactly as the existing code does.

  4. Business logic awareness. The B2B context triggered domain-specific logic: warning about free email providers and blocking disposable addresses. These requirements were explicitly stated, but the model also chose appropriate severity levels (warning vs. error) matching the patterns in the existing code.

  5. MX record handling. The model implemented MX record checking with appropriate fallback behavior -- returning True (no warning) when DNS resolution fails, to avoid false negatives. This matches the "warn, don't error" requirement.

  6. Defensive import handling. The _check_mx_record helper gracefully handles the case where dns.resolver is not installed, which is smart defensive coding for an optional feature.

  7. Consistent code style. Variable naming, string formatting, comment style, and function structure all match the existing codebase perfectly. This is the attention mechanism doing its best work -- attending to the specific patterns in the provided code examples.
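The optional-dependency pattern behind _check_mx_record generalizes beyond DNS: attempt the import, and degrade gracefully when the library is absent. A minimal sketch of the same idea using importlib (the module names below are illustrative):

```python
import importlib


def load_optional(name):
    """Import a module if it is installed; return None otherwise."""
    try:
        return importlib.import_module(name)
    except ImportError:  # ModuleNotFoundError is a subclass
        return None


# The standard library's json always loads; a made-up name never does.
assert load_optional("json") is not None
assert load_optional("definitely_not_a_real_module") is None
```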


Comparing the Three Outputs

| Dimension | Minimal Context | Moderate Context | Rich Context |
|---|---|---|---|
| Lines of code | 4 | 55 | 95 |
| Type hints | None | Full | Full (matching existing pattern) |
| Docstring | None | Comprehensive | Matching existing style |
| Error handling | None | Yes | Yes, with severity levels |
| Error messages | None | User-friendly | User-friendly, B2B-appropriate |
| Return type | bool | Custom NamedTuple | Project's ValidationResult |
| Business logic | None | None | Disposable/free email checks |
| Pattern consistency | Generic | Good Python | Exact match to codebase |
| Integration readiness | Needs rewrite | Needs adaptation | Drop-in ready |

Why Context Matters: The Technical Explanation

This experiment illustrates several principles from Chapter 2 in action:

Probability distributions narrow with context. With minimal context, the model's probability distribution was spread across many plausible implementations. Each token had many reasonable alternatives. With rich context, the distribution was concentrated: the model "knew" which import style, return type, parameter pattern, and error handling approach to use.
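This "narrowing" can be made concrete with Shannon entropy. The distributions below are invented for illustration -- real next-token probabilities are not observable this way -- but the qualitative effect is the point: more context, lower entropy.

```python
import math


def entropy_bits(probs):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical distributions over a language choice {Python, JS, Java, Go}
minimal_context = [0.4, 0.3, 0.2, 0.1]   # many plausible continuations
rich_context = [0.97, 0.01, 0.01, 0.01]  # context pins the choice down

# Rich context yields a sharply lower-entropy (more peaked) distribution
assert entropy_bits(minimal_context) > entropy_bits(rich_context)
```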

Attention mechanisms leverage provided code. When the model generated the rich-context version, its attention mechanism heavily focused on the provided code examples. The phone validator served as a direct template, and the base module defined the types and helper functions to use. The model did not have to guess at architecture -- it could attend to specific examples.

Training patterns activate with appropriate signals. The mention of "B2B SaaS" activated patterns the model learned about business email validation. The mention of "Flask" activated patterns about web application structure. The mention of "Python 3.12" activated patterns about modern Python features. Each context signal acts as a key that unlocks relevant knowledge.

Context quality matters more than quantity. The moderate context was only about 60 tokens, but each token was meaningful: language version, framework, coding standards, use case, return format. The rich context was about 400 tokens, with most tokens being actual code examples. Neither prompt wasted tokens on irrelevant information.


Practical Lessons for Vibe Coders

Lesson 1: Always provide architectural context. If you have existing patterns, show them. The AI will match your patterns more faithfully than any verbal description of conventions.

Lesson 2: Show, do not just tell. "Follow our validation pattern" is vague. Providing the actual validation pattern (as in Experiment 3) is unambiguous. The model's attention mechanism works with concrete examples, not abstract descriptions.

Lesson 3: Business context shapes behavior. Mentioning "B2B SaaS" in Experiment 3 activated domain-specific knowledge that was completely absent in Experiments 1 and 2. A few words of business context can dramatically change the output.

Lesson 4: Context is an investment, not a cost. Yes, the rich context prompt consumed more tokens (and thus more of the context window budget). But the output required virtually no modification. The minimal context output would need extensive rewriting to fit into Marcus's project. The time saved in post-generation editing far outweighs the time spent providing good context.

Lesson 5: Diminishing returns exist. There is a point beyond which additional context does not improve output quality and may even dilute attention. If Marcus had included ten validators instead of one, the model might have been confused about which pattern to follow. Provide the most relevant example, not every example.


Conclusion

This case study demonstrates that AI coding assistants are not magic boxes that either work or do not work. They are sophisticated pattern-matching systems whose output quality is directly proportional to the quality and relevance of the context you provide. Understanding the context window as the AI's working memory -- and curating that working memory carefully -- is one of the most impactful skills a vibe coder can develop.

The difference between the minimal and rich outputs is not that the AI "tried harder" in one case. It is that the rich context constrained and guided the probability distributions at every single token, steering the output toward a specific, high-quality implementation that fits seamlessly into an existing codebase. That is the power of context, and it follows directly from how transformers and attention mechanisms work.