Case Study 1: TDD with AI — Building a URL Shortener

Overview

This case study walks through the complete process of building a URL shortener application using test-driven development with AI as the implementation partner. You will see how writing tests first gives you precise control over the AI-generated code, catches errors early, and produces a robust, well-verified application.

The URL shortener supports three core operations: shortening a long URL into a short code, resolving a short code back to the original URL, and tracking how many times each short URL has been accessed. We build the entire application test-first, prompting the AI to implement each component only after the tests are written.

The Application Architecture

Before writing any tests, we sketch the architecture:

  • URLShortener class: The core service that creates, stores, and resolves shortened URLs.
  • CodeGenerator module: Generates unique short codes.
  • Storage backend: An in-memory store, behind an interface that can later be swapped for a database.
  • Click CLI: A command-line interface for interacting with the shortener.
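
Before prompting for any implementation, it can help to pin this architecture down as interface stubs. The sketch below is our own illustration: the names match the components above, but the exact signatures are an assumption at this stage, to be fixed by the tests.

```python
# Illustrative interface stubs for the planned components.
# Signatures here are assumptions; the tests written next will pin them down.


class URLShortener:
    """Core service: create, resolve, and track shortened URLs."""

    def shorten(self, url: str) -> str:
        """Validate url and return its short code."""
        raise NotImplementedError

    def resolve(self, code: str) -> str:
        """Return the original URL and record the access."""
        raise NotImplementedError

    def get_access_count(self, code: str) -> int:
        """Return how many times code has been resolved."""
        raise NotImplementedError


def generate_code(url: str) -> str:
    """Map a URL deterministically to a short alphanumeric code."""
    raise NotImplementedError
```

Writing stubs like these first keeps the prompts focused: each phase asks the AI to fill in one component against an interface the tests already assume.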

Phase 1: Core Code Generation Logic

Writing the First Tests

We start with the most fundamental behavior: generating short codes from URLs.

# test_code_generator.py
import pytest
from url_shortener.code_generator import generate_code


class TestCodeGenerator:
    """Tests for the short code generation module."""

    def test_generate_code_returns_string(self):
        """Generated codes should be strings."""
        code = generate_code("https://example.com")
        assert isinstance(code, str)

    def test_generate_code_has_correct_length(self):
        """Generated codes should be exactly 6 characters."""
        code = generate_code("https://example.com")
        assert len(code) == 6

    def test_generate_code_is_alphanumeric(self):
        """Generated codes should contain only alphanumeric characters."""
        code = generate_code("https://example.com")
        assert code.isalnum()

    def test_generate_code_is_deterministic(self):
        """The same URL should produce the same code."""
        code1 = generate_code("https://example.com")
        code2 = generate_code("https://example.com")
        assert code1 == code2

    def test_different_urls_produce_different_codes(self):
        """Different URLs should produce different codes."""
        code1 = generate_code("https://example.com")
        code2 = generate_code("https://different.com")
        assert code1 != code2

    @pytest.mark.parametrize("url", [
        "https://example.com",
        "https://example.com/path/to/page",
        "https://example.com/path?query=value&other=123",
        "https://sub.domain.example.com/page#fragment",
        "http://example.com",
    ])
    def test_generate_code_handles_various_urls(self, url):
        """Code generation should work for various URL formats."""
        code = generate_code(url)
        assert len(code) == 6
        assert code.isalnum()

Prompting the AI

With these tests written, we prompt the AI:

"Implement a generate_code function in url_shortener/code_generator.py that passes all of the following tests. The function takes a URL string and returns a 6-character alphanumeric short code. The same URL should always produce the same code. Use hashing to generate deterministic codes."

AI Implementation

The AI produces:

# url_shortener/code_generator.py
"""Generate deterministic short codes from URLs."""
import hashlib
import string

ALPHABET = string.ascii_letters + string.digits
BASE = len(ALPHABET)


def generate_code(url: str, length: int = 6) -> str:
    """Generate a deterministic short code from a URL.

    Uses SHA-256 hashing to create a deterministic mapping from URLs
    to short alphanumeric codes.

    Args:
        url: The URL to generate a code for.
        length: The length of the generated code (default 6).

    Returns:
        A string of alphanumeric characters of the specified length.
    """
    hash_bytes = hashlib.sha256(url.encode("utf-8")).digest()
    hash_int = int.from_bytes(hash_bytes[:8], byteorder="big")

    code_chars = []
    for _ in range(length):
        hash_int, remainder = divmod(hash_int, BASE)
        code_chars.append(ALPHABET[remainder])

    return "".join(code_chars)
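
The conversion step in this implementation is ordinary base-62 encoding: each divmod peels one alphabet index off the integer, least-significant digit first. A stripped-down illustration of just that loop, applied to fixed integers rather than hash values:

```python
import string

ALPHABET = string.ascii_letters + string.digits  # 62 characters; index 0 is "a"


def to_base62(n: int, length: int) -> str:
    """Encode n as `length` base-62 digits, least-significant first.

    Same divmod loop as generate_code, isolated for illustration.
    """
    chars = []
    for _ in range(length):
        n, remainder = divmod(n, 62)
        chars.append(ALPHABET[remainder])
    return "".join(chars)


print(to_base62(0, 6))   # "aaaaaa" -- index 0, six times
print(to_base62(1, 6))   # "baaaaa" -- least-significant digit comes first
```

Because only the first 8 bytes of the SHA-256 digest are consumed and 62^6 is far smaller than 2^64, the encoding always has enough integer to draw from for a 6-character code.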

Running the Tests

$ pytest test_code_generator.py -v
========================= test session starts ==========================
test_code_generator.py::TestCodeGenerator::test_generate_code_returns_string PASSED
test_code_generator.py::TestCodeGenerator::test_generate_code_has_correct_length PASSED
test_code_generator.py::TestCodeGenerator::test_generate_code_is_alphanumeric PASSED
test_code_generator.py::TestCodeGenerator::test_generate_code_is_deterministic PASSED
test_code_generator.py::TestCodeGenerator::test_different_urls_produce_different_codes PASSED
test_code_generator.py::TestCodeGenerator::test_generate_code_handles_various_urls[url0] PASSED
test_code_generator.py::TestCodeGenerator::test_generate_code_handles_various_urls[url1] PASSED
test_code_generator.py::TestCodeGenerator::test_generate_code_handles_various_urls[url2] PASSED
test_code_generator.py::TestCodeGenerator::test_generate_code_handles_various_urls[url3] PASSED
test_code_generator.py::TestCodeGenerator::test_generate_code_handles_various_urls[url4] PASSED
========================= 10 passed in 0.02s ===========================

All tests pass on the first try. The specification was clear enough for the AI to generate correct code.

Phase 2: The URL Shortener Service

Writing Service Tests

Now we test the main service class:

# test_shortener_service.py
import pytest
from url_shortener.service import URLShortener


@pytest.fixture
def shortener():
    """Create a fresh URLShortener instance for each test."""
    return URLShortener()


class TestURLShortener:
    """Tests for the URL shortener service."""

    def test_shorten_returns_short_code(self, shortener):
        """Shortening a URL should return a short code string."""
        code = shortener.shorten("https://example.com")
        assert isinstance(code, str)
        assert len(code) == 6

    def test_resolve_returns_original_url(self, shortener):
        """Resolving a code should return the original URL."""
        code = shortener.shorten("https://example.com")
        original = shortener.resolve(code)
        assert original == "https://example.com"

    def test_resolve_unknown_code_raises_error(self, shortener):
        """Resolving an unknown code should raise KeyError."""
        with pytest.raises(KeyError, match="not found"):
            shortener.resolve("ZZZZZZ")

    def test_shorten_same_url_returns_same_code(self, shortener):
        """Shortening the same URL twice should return the same code."""
        code1 = shortener.shorten("https://example.com")
        code2 = shortener.shorten("https://example.com")
        assert code1 == code2

    def test_shorten_invalid_url_raises_error(self, shortener):
        """Shortening an invalid URL should raise ValueError."""
        with pytest.raises(ValueError, match="Invalid URL"):
            shortener.shorten("not-a-url")

    def test_shorten_empty_string_raises_error(self, shortener):
        """Shortening an empty string should raise ValueError."""
        with pytest.raises(ValueError, match="Invalid URL"):
            shortener.shorten("")

    def test_access_count_starts_at_zero(self, shortener):
        """A newly shortened URL should have zero access count."""
        code = shortener.shorten("https://example.com")
        assert shortener.get_access_count(code) == 0

    def test_resolve_increments_access_count(self, shortener):
        """Each resolution should increment the access count."""
        code = shortener.shorten("https://example.com")
        shortener.resolve(code)
        shortener.resolve(code)
        shortener.resolve(code)
        assert shortener.get_access_count(code) == 3

    def test_get_access_count_unknown_code_raises_error(self, shortener):
        """Getting access count for unknown code should raise KeyError."""
        with pytest.raises(KeyError):
            shortener.get_access_count("ZZZZZZ")

    def test_list_all_urls(self, shortener):
        """Listing all URLs should return all shortened entries."""
        shortener.shorten("https://example.com")
        shortener.shorten("https://other.com")
        all_urls = shortener.list_all()
        assert len(all_urls) == 2

Prompting the AI

"Implement a URLShortener class in url_shortener/service.py that passes all of the following tests. The class should use the generate_code function from url_shortener/code_generator.py. Store URLs in memory using a dictionary. Validate URLs using urllib.parse. Track access counts."

AI Implementation and First Failure

The AI generates the implementation. We run the tests:

$ pytest test_shortener_service.py -v
...
test_shortener_service.py::TestURLShortener::test_shorten_invalid_url_raises_error FAILED
...
E       ValueError not raised

The AI's URL validation was too lenient: it accepted any non-empty string. We share the failure with the AI:

"The test test_shorten_invalid_url_raises_error failed. Your URL validation accepts 'not-a-url' as valid. Please fix the validation to require that URLs start with 'http://' or 'https://' and contain a valid domain."

The AI corrects the validation, and all tests pass. This is the TDD cycle working exactly as intended: the test caught a real deficiency in the AI's implementation.
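
The case study does not reproduce the corrected service verbatim, but a plausible reconstruction, working backward from the tests and the prompts, looks like the following. The validator helper and its exact rule ("http(s) scheme plus a dotted domain") are our reading of the fix; generate_code is inlined so the sketch runs standalone, though in the project it is imported from url_shortener/code_generator.py.

```python
# A plausible reconstruction of url_shortener/service.py after the fix --
# not the AI's verbatim output. The _validate helper is our own naming.
import hashlib
import string
from urllib.parse import urlparse

ALPHABET = string.ascii_letters + string.digits


def generate_code(url: str, length: int = 6) -> str:
    """Inlined copy of the Phase 1 generator (normally imported)."""
    n = int.from_bytes(hashlib.sha256(url.encode("utf-8")).digest()[:8], "big")
    chars = []
    for _ in range(length):
        n, remainder = divmod(n, len(ALPHABET))
        chars.append(ALPHABET[remainder])
    return "".join(chars)


class URLShortener:
    """In-memory URL shortener with access tracking."""

    def __init__(self):
        self._urls = {}    # code -> original URL
        self._counts = {}  # code -> number of resolutions

    def _validate(self, url: str) -> None:
        # The corrected rule: require an http(s) scheme and a dotted domain.
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or "." not in parsed.netloc:
            raise ValueError(f"Invalid URL: {url!r}")

    def shorten(self, url: str) -> str:
        self._validate(url)
        code = generate_code(url)
        if code not in self._urls:
            self._urls[code] = url
            self._counts[code] = 0
        return code

    def resolve(self, code: str) -> str:
        if code not in self._urls:
            raise KeyError(f"Short code not found: {code}")
        self._counts[code] += 1
        return self._urls[code]

    def get_access_count(self, code: str) -> int:
        if code not in self._counts:
            raise KeyError(f"Short code not found: {code}")
        return self._counts[code]

    def list_all(self):
        return list(self._urls.items())
```

Note how every design decision here is traceable to a test: the "not found" wording in the KeyError messages exists because test_resolve_unknown_code_raises_error matches on it.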

Phase 3: Integration Testing

With individual components working, we write integration tests that verify the components work together:

# test_integration.py
import pytest
from url_shortener.service import URLShortener


@pytest.fixture
def populated_shortener():
    """Create a shortener with several URLs already stored."""
    shortener = URLShortener()
    urls = [
        "https://example.com",
        "https://python.org",
        "https://github.com/user/repo",
        "https://docs.python.org/3/library/unittest.html",
    ]
    codes = {}
    for url in urls:
        code = shortener.shorten(url)
        codes[url] = code
    return shortener, codes


class TestIntegration:
    """Integration tests for the URL shortener system."""

    def test_full_shorten_resolve_cycle(self, populated_shortener):
        """Test the complete cycle: shorten -> resolve -> verify."""
        shortener, codes = populated_shortener
        for original_url, code in codes.items():
            resolved = shortener.resolve(code)
            assert resolved == original_url

    def test_access_tracking_across_multiple_urls(self, populated_shortener):
        """Test that access counts are tracked independently."""
        shortener, codes = populated_shortener
        urls = list(codes.keys())

        # Access first URL 3 times
        for _ in range(3):
            shortener.resolve(codes[urls[0]])

        # Access second URL 1 time
        shortener.resolve(codes[urls[1]])

        assert shortener.get_access_count(codes[urls[0]]) == 3
        assert shortener.get_access_count(codes[urls[1]]) == 1
        assert shortener.get_access_count(codes[urls[2]]) == 0

    def test_many_urls_no_collision(self):
        """Test that 1000 different URLs produce unique codes."""
        shortener = URLShortener()
        codes = set()
        for i in range(1000):
            code = shortener.shorten(f"https://example.com/page/{i}")
            codes.add(code)
        assert len(codes) == 1000
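
A quick birthday-bound estimate shows why this test is safe to rely on rather than flaky: with 62^6 (roughly 5.7 × 10^10) possible codes, the chance of any collision among 1,000 distinct URLs is on the order of n^2 / (2 · 62^6). The sketch below works through the standard approximation:

```python
import math

n = 1_000          # distinct URLs shortened in the test
space = 62 ** 6    # possible 6-character alphanumeric codes (~5.7e10)

# Birthday-problem approximation: P(collision) ~ 1 - exp(-n(n-1) / (2 * space))
p_collision = 1 - math.exp(-n * (n - 1) / (2 * space))
print(f"{p_collision:.2e}")  # roughly 8.8e-06
```

At under one in 100,000, a failure of this test almost certainly signals a real bug (for example, a truncated hash) rather than bad luck.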

Phase 4: Property-Based Testing

Finally, we add Hypothesis tests to catch edge cases we might not think of:

# test_properties.py
from hypothesis import given, assume
from hypothesis import strategies as st
from url_shortener.service import URLShortener
from url_shortener.code_generator import generate_code


@given(st.from_regex(r"https://[a-z]{3,20}\.[a-z]{2,6}/[a-z0-9/]{0,50}", fullmatch=True))
def test_roundtrip_property(url):
    """Shortening and resolving should return the original URL."""
    shortener = URLShortener()
    code = shortener.shorten(url)
    assert shortener.resolve(code) == url


@given(st.from_regex(r"https://[a-z]{3,20}\.[a-z]{2,6}", fullmatch=True))
def test_code_length_property(url):
    """Generated codes should always be 6 characters."""
    code = generate_code(url)
    assert len(code) == 6


@given(st.from_regex(r"https://[a-z]{3,20}\.[a-z]{2,6}", fullmatch=True))
def test_code_determinism_property(url):
    """The same URL should always produce the same code."""
    assert generate_code(url) == generate_code(url)

These property-based tests run each check against 100 randomly generated URLs by default (a budget Hypothesis lets you raise), providing far stronger verification than our hand-picked examples alone.

Results and Lessons Learned

Final Test Count

  • Unit tests: 10 (code generator) + 10 (shortener service) = 20
  • Integration tests: 3
  • Property-based tests: 3
  • Total: 26 tests

Key Takeaways

  1. Tests caught a real bug. The AI's first URL validation was too lenient. Without the test, this would have gone unnoticed until a user supplied a malformed URL.

  2. Tests made prompts more precise. Including test code in our prompts gave the AI an unambiguous specification to work from, resulting in better first-attempt implementations.

  3. Incremental building works. By building one component at a time with tests, we caught problems early when they were cheap to fix.

  4. Property-based tests added confidence. Our hand-written tests covered the cases we thought of. Hypothesis covered the cases we did not.

  5. The developer stayed in control. At no point did the AI decide what the application should do. The tests defined the requirements; the AI implemented them. This is the TDD-AI workflow at its best.

Time Investment

The TDD approach took approximately 30% longer than generating the entire application in one prompt. However, the resulting code had zero bugs when deployed, while a one-shot generation approach (tested on the same specification) had three bugs that required debugging. The TDD approach was ultimately faster when you account for debugging time.

Conclusion

This case study demonstrates that TDD with AI is not just viable; it is a superior workflow for building reliable applications. By writing tests first, you maintain control over the specification while leveraging the AI's implementation speed. The tests serve as both verification and documentation, and they catch errors that reviewing the code by eye would miss. For any application where correctness matters, the TDD-AI approach should be your default workflow.