Case Study 1: Raj's Feature Sprint — Building a REST API Endpoint with AI Assistance
The Feature
It is Wednesday morning. Raj is building an endpoint that did not exist on Tuesday: a transaction search API that allows customer support teams to find payment records by a combination of filters — customer ID, date range, status, amount range, and merchant identifier. The business need is concrete: the support team is spending forty minutes per ticket doing manual database queries to answer basic customer questions.
The endpoint needs to be:

- Paginated (the transaction database has 80 million records)
- Performant (customer support cannot wait ten seconds for a search result)
- Secure (not all support agents should see all transaction data — scope is controlled by the agent's role and assigned customer segments)
- Audited (every search must be logged with the agent identity and query parameters)
Raj estimates that without AI assistance, designing and implementing this endpoint would take approximately three days. He has one day.
Architecture Discussion (45 Minutes)
Raj begins by providing Claude with complete context: the existing database schema (PostgreSQL), the current API conventions in the codebase (FastAPI, SQLAlchemy 2.0, Pydantic v2), the authentication model (JWT with role claims), and the performance requirement (p95 response time under 2 seconds).
He asks for an architecture discussion, not code:
"Before writing any code: given these constraints, discuss the architecture for the transaction search endpoint. Specifically: (1) how should the query be built dynamically from optional filter parameters, (2) how should pagination be implemented — offset vs keyset cursor — and why, (3) where should the permission scoping logic live, and (4) what indexing strategy would support the required query patterns?"
The discussion covers all four areas. On pagination, Claude recommends keyset (cursor-based) pagination over offset pagination for the 80-million-record table — offset pagination degrades significantly at high page numbers because the database must scan and skip all rows up to the offset. Raj already knew this, but the discussion surfaces a subtlety he had not considered: cursor-based pagination requires a stable sort order that includes a unique column, and his current transaction table does not have a single column suitable for this. The discussion leads to a composite cursor approach using (created_at, transaction_id) that Raj validates against his indexing strategy.
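The composite-cursor predicate at the heart of this decision can be sketched in plain Python. This is a simulation of the row-value comparison, not the production SQLAlchemy code; the names are illustrative:

```python
# Keyset pagination over a composite (created_at, transaction_id) cursor,
# simulated in plain Python. In SQL this is the row-value predicate
# (created_at, transaction_id) < (:cursor_ts, :cursor_id) on a table
# sorted by (created_at DESC, transaction_id DESC).

def next_page(rows, cursor, page_size):
    """rows: (created_at, transaction_id) tuples, sorted newest-first.
    cursor: the tuple for the last row already returned, or None for page 1."""
    if cursor is not None:
        # Python tuple comparison matches the SQL row-value comparison:
        # a row is on a later page iff its tuple sorts strictly below the cursor.
        rows = [r for r in rows if r < cursor]
    return rows[:page_size]
```

Unlike an OFFSET, the predicate lets the database seek directly into the composite index, so page 80,000 costs the same as page 1.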
On permission scoping: Claude suggests a query-level filter applied before any search predicates execute, rather than a results-level filter applied after. Raj agrees — results-level filtering is both less performant and potentially leaky (the total count in a paginated response could reveal information about records the agent is not supposed to see). He makes a note to verify this in security review.
Forty-five minutes of discussion. No code. But Raj knows exactly what he is building.
Implementation: Pydantic Models (20 Minutes)
Raj starts with the request and response models. He asks for an explanation first:
"For the search endpoint, I need Pydantic v2 models for the request parameters and the response. Before writing code, describe what fields should be in each model and any validation constraints that should be applied."
The proposed models look correct. He asks for the code. He reviews it:
The request model correctly uses Optional fields for all filter parameters and adds sensible validators — date range validation that ensures date_from is before date_to, amount range validation that ensures amount_min is less than amount_max, and a page_size constraint capped at 100 to prevent accidentally requesting 10,000 records per page.
One issue: the model allows page_size of 0. Raj adds a gt=0 constraint.
The response model is correct, including the cursor field for pagination. Raj adds one field AI did not include: has_more, a boolean indicating whether there are additional pages. This is convenient for the frontend team and not something AI would know was needed without seeing the frontend requirements.
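The validation rules described above can be sketched as follows. This uses a plain-Python dataclass standing in for the Pydantic v2 model, with illustrative field names; the real model would express the same constraints with `Field(gt=0, le=100)` and model validators:

```python
# Sketch of the request-model constraints: all filters optional,
# page_size bounded (gt=0 after Raj's fix, capped at 100), and
# cross-field checks on the date and amount ranges.
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional

@dataclass
class TransactionSearchRequest:
    customer_id: Optional[str] = None
    date_from: Optional[date] = None
    date_to: Optional[date] = None
    amount_min: Optional[Decimal] = None
    amount_max: Optional[Decimal] = None
    status: Optional[str] = None
    merchant_id: Optional[str] = None
    page_size: int = 50
    cursor: Optional[str] = None

    def __post_init__(self):
        # gt=0 rejects the page_size=0 case AI missed; le=100 caps the page.
        if not 0 < self.page_size <= 100:
            raise ValueError("page_size must be between 1 and 100")
        if self.date_from and self.date_to and self.date_from > self.date_to:
            raise ValueError("date_from must be on or before date_to")
        if (self.amount_min is not None and self.amount_max is not None
                and self.amount_min > self.amount_max):
            raise ValueError("amount_min must not exceed amount_max")
```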
Implementation: Query Builder (35 Minutes)
The query building logic is the most complex part. Raj asks for it with a context note:
"Write the SQLAlchemy 2.0 query builder for the transaction search. Important: (1) use the select() syntax, not the legacy query() syntax, (2) apply permission scoping as a base filter before any user-provided filters, (3) implement keyset pagination using (created_at DESC, transaction_id DESC) composite cursor, and (4) the filters should be applied only when the corresponding parameter is provided — they are all optional."
The generated query builder function is approximately 80 lines. Raj reads every line.
He finds three things to fix:
Issue 1: The cursor decoding assumes the cursor string is valid base64 without error handling. If a client sends an invalid cursor, the endpoint will return a 500 error instead of a 400. Raj adds try/except with an appropriate HTTP exception.
Issue 2: The permission scoping filter uses an in_ clause on a list that could be empty for agents with no assigned segments. An empty in_() clause generates WHERE column IN (), which PostgreSQL rejects as a syntax error. Raj adds a check: if the segment list is empty, return an empty result immediately.
Issue 3: The amount_min / amount_max filter uses floating point comparison for monetary values. Raj changes this to use the Decimal type throughout, consistent with how the rest of the codebase handles monetary amounts. This is a codebase convention AI did not know about.
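The shape of the fixed filter-assembly logic can be sketched in plain Python rather than SQLAlchemy (names and the predicate representation are illustrative, not from the codebase):

```python
# Filter assembly after the fixes: permission scoping runs first and
# short-circuits on an empty segment list (Fix 2); each user filter is
# applied only when provided; money uses Decimal, never float (Fix 3).
from decimal import Decimal

def build_predicates(agent_segments, filters):
    if not agent_segments:
        # Empty IN () is invalid SQL -- signal "no accessible rows"
        # instead of building a query at all.
        return None
    # Scoping is the base predicate, before any user-provided filters.
    preds = [("segment IN", tuple(agent_segments))]
    if filters.get("customer_id") is not None:
        preds.append(("customer_id =", filters["customer_id"]))
    if filters.get("amount_min") is not None:
        preds.append(("amount >=", Decimal(str(filters["amount_min"]))))
    if filters.get("amount_max") is not None:
        preds.append(("amount <=", Decimal(str(filters["amount_max"]))))
    return preds
```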
Audit Logging (20 Minutes)
Raj asks for the audit logging middleware using the "explain then generate" pattern. The approach: a FastAPI dependency that runs after the endpoint handler, captures the request parameters and response metadata (not the response body — that would be too expensive for 80M records), and writes a structured audit record to the audit_events table.
The generated implementation is clean. Raj adds one element: the audit record should capture the number of results returned, because this is forensically important (did an agent run a search that returned all records?). He modifies the audit dependency to receive the result count from the endpoint handler.
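The audit record itself might look like the following sketch (field names are assumptions). Note the result_count field Raj added by hand, and that the record carries request parameters and response metadata only, never the response body:

```python
# Sketch of the structured audit record written to audit_events.
from datetime import datetime, timezone

def build_audit_record(agent_id, query_params, result_count):
    return {
        "event_type": "transaction_search",
        "agent_id": agent_id,
        "query_params": query_params,   # the filters, not the matched rows
        "result_count": result_count,   # forensic question: how many rows came back?
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
```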
Security Review (25 Minutes)
Before writing tests, Raj runs a security review prompt focused on the specific risk areas he has identified:
"Security review this transaction search endpoint implementation. Specific concerns: (1) could a carefully crafted cursor allow information disclosure about records the agent is not permitted to see? (2) could the filter parameters be used for a time-based enumeration attack? (3) is the audit log tamper-evident — could an agent cover their tracks? (4) are there any injection risks in the query construction?"
The security review surfaces one significant issue: the cursor encoding uses base64 but stores the raw timestamp and transaction ID. If the cursor is predictable, an agent with limited permissions could construct a cursor that starts pagination at a specific point, potentially inferring information about the transaction timeline. Raj adds HMAC signing to the cursor to make it tamper-evident.
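A tamper-evident cursor along these lines can be built entirely from the standard library. This is a minimal sketch; the key handling and exact payload shape are assumptions, and it also folds in the earlier fix of returning a client error rather than a 500 on a malformed cursor:

```python
# HMAC-signed pagination cursor: the client cannot forge or modify a
# cursor without the server-side key, so it cannot choose an arbitrary
# pagination start point.
import base64
import hashlib
import hmac
import json

SECRET = b"server-side-key"  # illustrative; load from configuration in practice

def encode_cursor(created_at: str, transaction_id: int) -> str:
    payload = json.dumps([created_at, transaction_id]).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()  # 32 bytes
    return base64.urlsafe_b64encode(sig + payload).decode()

def decode_cursor(cursor: str):
    try:
        raw = base64.urlsafe_b64decode(cursor.encode())
        sig, payload = raw[:32], raw[32:]
    except Exception:
        # Malformed input is a client error (400), not a server error (500).
        raise ValueError("malformed cursor")
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("invalid cursor signature")
    created_at, transaction_id = json.loads(payload)
    return created_at, transaction_id
```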
The review also notes that the audit log is in the same database as the transaction data, meaning a database administrator (with direct database access) could modify it. Raj adds a note in the PR description: "Future improvement — route audit events to an append-only audit service. Out of scope for this implementation."
Test Generation (30 Minutes)
Raj generates tests in three rounds:
Round 1 — Unit tests for query builder: Tests for each filter combination, the pagination logic, and the permission scoping. AI generates approximately thirty tests. Raj adds five: specifically the empty segment list edge case, the invalid cursor edge case, the boundary case where page_size equals the exact number of results, and two business-logic edge cases related to date ranges that span daylight saving time boundaries.
Round 2 — Integration tests for the endpoint: Tests that make HTTP requests against a test database. Raj uses a real test database (not a mock) because the PostgreSQL-specific query behavior needs to be validated against an actual database engine.
Round 3 — Security test cases: Raj manually writes these, not AI. Tests that verify: an agent cannot construct a valid cursor for records outside their segment; a request without a valid JWT returns 401; a request with a valid JWT but insufficient permissions returns 403; the audit log contains a record for every search.
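The boundary case from Round 1, where page_size equals the exact number of matching rows, hinges on how has_more is computed. One common approach (a sketch, not necessarily Raj's implementation) fetches one extra row instead of running a second COUNT query:

```python
# Compute has_more by requesting page_size + 1 rows: the presence of the
# extra row proves another page exists, without a separate count query.

def paginate(rows, page_size):
    fetched = rows[:page_size + 1]      # stands in for LIMIT page_size + 1
    has_more = len(fetched) > page_size
    return fetched[:page_size], has_more
```

With this scheme, a result set of exactly page_size rows correctly reports has_more=False, which is precisely the boundary the added test pins down.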
Final Review and Merge
Before submitting the PR, Raj runs the "can I explain this?" check. He walks through the implementation end to end in his head. One section — the cursor HMAC signing — he needs to re-read twice. He writes a comment in the code explaining the security rationale for the signing, because a future developer might reasonably ask "why is this signed?"
The PR is submitted Thursday morning — 26 hours after the feature was assigned. In his PR description, Raj notes that AI assistance was used throughout and lists the specific issues he caught during code review. His team has a practice of noting AI assistance in PRs to maintain transparency and to help build a shared understanding of where AI assistance is and is not effective for their codebase.
The endpoint ships to staging Thursday afternoon, production Friday morning. The support team tickets that previously took forty minutes each are resolved in under three minutes with the search interface.
Lessons from the Feature Sprint
Raj's post-implementation notes:
The architecture discussion time was the highest-leverage investment. Forty-five minutes of discussion before any code was written prevented three potential re-implementations: the offset pagination approach, a results-level permission filter, and an unsafe cursor implementation.
AI caught its own issues when given the right review prompts. The HMAC signing requirement came from AI's own security review: the review surfaced an issue that AI's code generation had missed, which is exactly the layered review approach working as designed.
Code review quality depends on reading thoroughly. The three implementation issues he caught (cursor error handling, empty segment list, Decimal type for money) were all caught by reading every line. None were obvious from glancing at the code.
The audit logging decision to capture result count was human knowledge. AI generated a correct audit logging implementation. The result count field was added because Raj knows from experience that forensic investigators consistently need to know "how many results did this query return?" AI had no way to know this without being told.