Case Study 2: The Shared Prompt Library
Overview
- Company: Meridian Health, a health-tech company building clinical workflow software
- Team size: 14 backend engineers, 8 frontend engineers, 4 data engineers
- Tech stack: Python (Django), TypeScript (Next.js), PostgreSQL, Apache Kafka
- Timeline: September 2025 through February 2026
- Key challenge: Building and maintaining a team prompt library that improves quality and consistency over time
The Problem
Meridian Health's engineering team had been using AI coding assistants for nearly a year, but each developer maintained their own collection of prompts. Some developers had elaborate personal prompt collections stored in private Notion pages or text files. Others typed prompts from memory each time, producing slightly different versions each session.
The consequences were predictable: inconsistent code quality, duplicated effort, and a persistent knowledge gap between developers who had invested time in prompt engineering and those who had not.
The tipping point came during a sprint retrospective when three developers independently reported spending significant time on the same task: generating Django model serializers with proper validation. Each had written their own prompt. Each had produced working but stylistically different code. When the team compared the three approaches, they realized that the best prompt produced code that was roughly three times faster to review than the worst prompt, yet the developer with the worst prompt had no idea the better version existed.
Tech lead Priya Sharma proposed a solution: "What if we had a shared prompt library? Like a code library, but for prompts. Versioned, reviewed, and continuously improved."
Designing the Library
Requirements Gathering
Priya surveyed the team to understand their needs. The survey revealed:
- Developers used prompts for 12 distinct task categories, from model generation to migration scripts to API documentation.
- The average developer had 8-15 "go-to" prompts they used regularly.
- 85% of developers said they would use a shared library if it were easy to search and access.
- The top complaint about shared resources was staleness -- prompts and documentation that were not kept up to date.
Architecture Decisions
Based on the survey, the team made several key design decisions:
Storage: Git repository, not a wiki. The team chose to store the prompt library as a Git repository rather than a wiki or Notion workspace. The reasoning: Git provided version control (critical for tracking prompt evolution), pull request workflows (natural fit for prompt review), and proximity to the codebase (developers already had Git open).
Format: YAML with Markdown content. Each prompt was stored as a YAML file with structured metadata and Markdown-formatted prompt content. This made prompts both machine-parseable and human-readable:
```yaml
id: django-serializer-v3
category: code-generation
subcategory: django
title: "Django REST Framework Serializer"
description: "Generates a DRF serializer with validation, nested relationships, and custom field handling."
author: priya.sharma
created: "2025-09-20"
updated: "2025-12-15"
version: 3
tags:
  - django
  - drf
  - serializer
  - validation
  - api
effectiveness_rating: 4.6
usage_count: 89
min_ai_tool: "claude-3.5-sonnet or equivalent"
variables:
  - name: model_name
    description: "Name of the Django model to serialize"
    example: "Patient"
  - name: fields_spec
    description: "List of fields with types and validation requirements"
    example: "name (str, required, max 100 chars), date_of_birth (date, required), email (email, optional)"
  - name: nested_relations
    description: "Any nested serializers needed"
    example: "appointments (list, read-only), primary_physician (single, writable)"
  - name: custom_validation
    description: "Any custom validation rules"
    example: "date_of_birth must be in the past, email must be unique"
prompt: |
  Generate a Django REST Framework serializer for the {{model_name}} model.

  Fields:
  {{fields_spec}}

  Nested relationships:
  {{nested_relations}}

  Custom validation:
  {{custom_validation}}

  Requirements:
  - Use ModelSerializer as the base class
  - Add explicit field declarations with help_text for API documentation
  - Implement validate_<field> methods for field-level validation
  - Implement validate() for cross-field validation
  - Handle nested serializers with proper create/update methods
  - Include type hints on all methods
  - Use Google-style docstrings
  - Follow Meridian Health coding conventions (see .ai/conventions.md)
version_history:
  - version: 3
    date: "2025-12-15"
    changes: "Added support for nested writable serializers, improved validation method generation"
    author: priya.sharma
  - version: 2
    date: "2025-10-28"
    changes: "Added custom validation section, fixed help_text generation"
    author: james.wu
  - version: 1
    date: "2025-09-20"
    changes: "Initial version"
    author: priya.sharma
```
Organization: Category-based directory structure. The repository was organized by category:
```
prompt-library/
├── README.md
├── CONTRIBUTING.md
├── code-generation/
│   ├── django/
│   │   ├── django-model-v2.yaml
│   │   ├── django-serializer-v3.yaml
│   │   ├── django-view-v2.yaml
│   │   └── django-migration-v1.yaml
│   ├── api/
│   │   ├── api-endpoint-v3.yaml
│   │   └── api-error-handling-v2.yaml
│   └── typescript/
│       ├── react-component-v2.yaml
│       └── next-api-route-v1.yaml
├── testing/
│   ├── unit-test-v4.yaml
│   ├── integration-test-v2.yaml
│   └── fixture-generation-v1.yaml
├── refactoring/
│   ├── extract-function-v2.yaml
│   └── simplify-conditionals-v1.yaml
├── documentation/
│   ├── api-docs-v2.yaml
│   └── architecture-decision-v1.yaml
├── debugging/
│   ├── trace-analysis-v1.yaml
│   └── performance-diagnosis-v1.yaml
└── tools/
    ├── search.py
    ├── validate.py
    └── report.py
```
Tooling: A simple CLI. The team built a lightweight Python CLI tool (stored in the tools/ directory) that could search prompts, display them with variables filled in, and record usage statistics.
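The case study does not show the tools' source. A minimal sketch of the search-and-render logic such a CLI might contain, assuming the YAML schema above (the function names and file layout here are illustrative, not Meridian Health's actual tool):

```python
"""Sketch of a prompt-library CLI's core logic: load, search, render.

Assumes the prompt YAML schema shown earlier; all names are illustrative.
"""
from pathlib import Path


def load_prompts(root: Path) -> list[dict]:
    """Parse every prompt YAML file under the library root."""
    import yaml  # PyYAML; imported lazily so search/render work without it
    return [yaml.safe_load(p.read_text()) for p in root.rglob("*.yaml")]


def search(prompts: list[dict], tag: str) -> list[dict]:
    """Return prompts carrying the given tag."""
    return [p for p in prompts if tag in p.get("tags", [])]


def render(prompt: dict, values: dict[str, str]) -> str:
    """Fill {{variable}} placeholders with user-supplied values."""
    text = prompt["prompt"]
    for var in prompt.get("variables", []):
        name = var["name"]
        text = text.replace("{{" + name + "}}", values.get(name, ""))
    return text
```

Usage-statistics recording would sit alongside these functions; it is omitted here because the case study does not describe its mechanics.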
The Contribution Workflow
Submitting a New Prompt
Any engineer could submit a new prompt by creating a YAML file and opening a pull request. The CONTRIBUTING.md file documented the process:
- Create the YAML file following the template structure.
- Test the prompt at least three times with different realistic inputs. Include the test results in the PR description.
- Open a pull request with:
  - The prompt YAML file
  - A description of the prompt's purpose and when to use it
  - Three example outputs from testing
  - Any known limitations
The Review Process
Prompt reviews were lighter than code reviews but followed a defined process:
- Format check. An automated CI check validated the YAML structure and required fields.
- Peer review. One reviewer from a different sub-team tested the prompt with their own inputs and evaluated the output against team conventions.
- Merge. If the review was positive, the prompt was merged. If issues were found, the author revised the prompt and resubmitted.
Reviews typically took one to two days. The team agreed on a norm: prompt PRs should never block on review for more than 48 hours.
Updating Existing Prompts
Updates followed the same PR process. The key requirement: the version number must be incremented, and the version_history must be updated. This made it easy to track what changed and when.
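The CI format check and the version-increment rule could be enforced together in a short validation script. A sketch under the assumption that required fields match the YAML example above (the exact field set in the team's validate.py is not given in the case study):

```python
"""Sketch of an automated prompt-format check. The required-field set is
assumed from the YAML example; the team's actual validate.py is not shown."""

REQUIRED_FIELDS = {"id", "category", "title", "description",
                   "author", "version", "tags", "prompt"}


def validate_prompt(data: dict) -> list[str]:
    """Return a list of problems; an empty list means the prompt passes CI."""
    errors = [f"missing required field: {f}"
              for f in sorted(REQUIRED_FIELDS - data.keys())]
    history = data.get("version_history", [])
    if not history:
        errors.append("version_history must not be empty")
    elif history[0]["version"] != data.get("version"):
        # Update rule: bump `version` and record the change in version_history.
        errors.append("version does not match latest version_history entry")
    return errors
```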
Growing the Library
Month 1: Seeding (September 2025)
The team held a "prompt harvest" session where every developer contributed their three best prompts. This produced 39 candidate prompts. After deduplication and review, 22 prompts were merged into the library.
Initial reception was mixed. Some developers eagerly adopted the library. Others continued using their personal prompts out of habit.
Month 2: Building Momentum (October 2025)
Priya introduced two practices that accelerated adoption:
The "Prompt of the Week" highlight. Each Monday, one prompt was featured in the team's Slack channel with a short write-up explaining when and how to use it. This kept the library visible and gradually introduced all prompts to the team.
Usage tracking. The CLI tool logged each time a developer used a prompt from the library (with the developer's consent). Monthly usage reports showed which prompts were most popular and which were unused.
By the end of month two, the share of developers using the library at least once per week had risen from 40% to 72%.
Months 3-4: Maturation (November-December 2025)
The library grew to 38 prompts. More importantly, existing prompts were being improved. The Django serializer prompt, for example, went through three versions as different developers discovered edge cases and improved the output.
The team noticed a pattern: prompts that were updated frequently were the most used and highest rated. Active maintenance was a signal of value.
Two prompts were retired during this period. The migration-rollback-v1 prompt consistently produced incorrect rollback scripts for complex migrations involving data transformations. After two failed attempts to fix it, the team decided to retire the prompt and document the limitation. The docs-changelog-v1 prompt was retired because a new AI-powered documentation tool made it unnecessary.
Months 5-6: Optimization (January-February 2026)
By this point, the library was a natural part of the team's workflow. New features added during this phase:
Prompt composition. Some complex tasks required combining multiple prompts. The team added a "related_prompts" field that linked prompts together. For example, the API endpoint prompt linked to the serializer prompt and the test generation prompt, creating a workflow for building a complete API endpoint.
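The case study describes the "related_prompts" field but not how it is traversed. One plausible sketch resolves a chain of linked prompts into an ordered workflow (the traversal below is an illustrative assumption, not the team's actual tooling):

```python
"""Sketch of resolving a `related_prompts` chain into an ordered workflow.
The field name comes from the case study; the traversal is assumed."""


def resolve_workflow(library: dict[str, dict], start_id: str) -> list[str]:
    """Depth-first walk over related_prompts links, de-duplicated,
    yielding the order in which prompts would be applied."""
    seen: list[str] = []

    def visit(pid: str) -> None:
        if pid in seen or pid not in library:
            return
        seen.append(pid)
        for rel in library[pid].get("related_prompts", []):
            visit(rel)

    visit(start_id)
    return seen
```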
Context packs. The team created "context packs" -- pre-built context snippets that could be included with any prompt. For example, a "patient model context" pack included the model definition, related models, and relevant business rules. Developers could include a context pack when using any prompt, ensuring consistent context regardless of who was prompting.
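Mechanically, a context pack amounts to prepending shared background material to whatever prompt is being used. A minimal sketch, assuming packs are stored as labelled text sections (the structure and function name are illustrative):

```python
"""Sketch of applying a context pack to a rendered prompt. Context packs
are described only informally in the case study; this layout is assumed."""


def apply_context_pack(pack_sections: dict[str, str], prompt_text: str) -> str:
    """Prepend labelled context sections to a prompt so every developer
    sends the same background material to the AI tool."""
    header = "\n\n".join(f"## {title}\n{body}"
                         for title, body in pack_sections.items())
    return f"{header}\n\n---\n\n{prompt_text}"
```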
Effectiveness dashboards. Monthly reports showed usage trends, rating changes, and the relationship between prompt library usage and code quality metrics. The data showed a clear correlation: pull requests that used library prompts had 35% fewer review comments and 28% fewer post-merge defects than those using ad hoc prompts.
Impact Measurement
Quantitative Results
After six months, the team measured the library's impact:
| Metric | Before Library | After Library | Change |
|---|---|---|---|
| Avg. review comments per PR | 7.2 | 4.7 | -35% |
| Avg. review cycles per PR | 2.1 | 1.4 | -33% |
| Post-merge defects (per sprint) | 8.3 | 5.2 | -37% |
| Time to generate serializer code | 22 min avg | 8 min avg | -64% |
| Naming convention violations | 14 per sprint | 3 per sprint | -79% |
| New developer time to first PR | 6 days | 3 days | -50% |
Qualitative Results
The developer survey revealed softer but equally important improvements:
- Confidence. Junior developers reported feeling more confident because the prompts embedded senior developers' knowledge and conventions.
- Consistency. Code reviews became faster and more focused on business logic rather than style issues.
- Collaboration. The prompt library became a natural conversation starter. Developers discussed prompts in the way they discussed code: suggesting improvements, reporting bugs, and sharing discoveries.
- Onboarding. New hires described the prompt library as "the most useful onboarding resource" because it simultaneously taught them the team's AI workflow and coding conventions.
Unexpected Benefits
Several benefits were not anticipated:
Codified knowledge. The process of writing a prompt forced developers to articulate their expertise explicitly. The serializer prompt, for example, encoded Priya's deep knowledge of DRF best practices in a form that any developer could use. The library became an unintentional knowledge base.
AI model upgrade resilience. When the team upgraded their AI model in December 2025, some personal prompts broke. Library prompts were quickly updated by the team, and all developers immediately benefited from the fixes. Without the library, each developer would have had to fix their own prompts individually.
Convention enforcement. The prompts themselves enforced coding conventions more effectively than any linter. By including convention requirements in every prompt, the AI naturally produced compliant code. Convention violations dropped dramatically.
Challenges and Solutions
The Freshness Problem
The biggest ongoing challenge was keeping prompts up to date. AI models change, team conventions evolve, and frameworks release new versions. Stale prompts produce suboptimal or incorrect code.
Solution: The team implemented a "freshness check" policy. Every prompt had a "last verified" date. Prompts not verified within 60 days were flagged with a warning in the CLI output. High-usage prompts were verified monthly; low-usage prompts were verified quarterly.
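The 60-day rule is simple to express in code. A sketch, assuming each prompt carries a last-verified date (the field name `last_verified` is an assumption; the case study only describes the policy):

```python
"""Sketch of the freshness-check policy: flag prompts not re-verified
within 60 days. The 60-day window comes from the case study."""
from datetime import date, timedelta

FRESHNESS_WINDOW = timedelta(days=60)


def is_stale(last_verified: date, today: date) -> bool:
    """True when the prompt has not been re-verified within the window,
    so the CLI should print a staleness warning."""
    return today - last_verified > FRESHNESS_WINDOW
```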
The Governance Debate
Midway through the project, the team debated governance. Some engineers wanted stricter quality gates: multiple reviewers, formal testing requirements, mandatory usage quotas. Others wanted maximum openness: anyone can add anything, let the ratings sort it out.
Solution: The team chose a middle path. A single cross-team reviewer was required for merging, but there were no usage quotas or mandatory adoption. Prompts with ratings below 3.0 after ten uses were automatically flagged for review. This balanced quality with accessibility.
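The auto-flag rule reduces to one comparison. A sketch using the thresholds stated above (the function name is illustrative):

```python
"""Sketch of the governance auto-flag rule: prompts rated below 3.0 after
ten or more uses are queued for review. Thresholds are from the case study."""


def needs_review(effectiveness_rating: float, usage_count: int) -> bool:
    """True when a prompt has enough uses to judge and its rating is low."""
    return usage_count >= 10 and effectiveness_rating < 3.0
```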
The Discoverability Problem
With 38+ prompts, finding the right one was not always obvious, especially for new team members.
Solution: The team added three discovery mechanisms: (1) category-based browsing in the repository, (2) tag-based search in the CLI tool, and (3) a "getting started" guide that recommended the ten most important prompts for new developers, organized by common task.
The "Not Invented Here" Syndrome
A few developers resisted using shared prompts, preferring their own versions. They felt that shared prompts were too generic or did not match their personal style.
Solution: The team addressed this by showing concrete data: code generated from library prompts had fewer review comments and fewer defects. They also emphasized that developers were welcome to modify library prompts for their specific needs -- the library was a starting point, not a straitjacket.
Lessons Learned
- Seed the library with real prompts. Starting with prompts that developers were already using successfully created immediate value and credibility. Starting from scratch with theoretical prompts would have been slower and less compelling.
- Make usage easy and visible. The CLI tool and weekly highlights kept the library accessible and top-of-mind. A library that requires effort to use will not be used.
- Track and share metrics. Quantitative evidence of the library's impact was the strongest argument for continued investment. Without data, the library would have been perceived as "nice to have" rather than essential.
- Embrace retirement. Removing underperforming prompts was as important as adding good ones. A library with dead prompts loses credibility. Active curation signals quality.
- Prompts encode knowledge. The act of writing a prompt for the library forces developers to make their implicit knowledge explicit. This knowledge transfer is a valuable side effect that justifies the investment in prompt authorship.
- Composition beats monoliths. Small, focused prompts that can be combined are more flexible than large, all-in-one prompts. The "related prompts" feature enabled complex workflows without complex individual prompts.
- The library is a living system. A prompt library is not a project that is "done." It requires ongoing curation, updates, and investment. Budget time for maintenance just as you would for any codebase.
Current State and Future Plans
As of February 2026, Meridian Health's prompt library contains 52 prompts across six categories. It is used by 92% of engineers at least weekly. The team plans three enhancements for the next quarter:
- AI-powered prompt suggestions. Building a feature that analyzes the current file and task context and suggests relevant prompts from the library.
- Cross-organization sharing. Meridian Health is part of a health-tech consortium. The team plans to share a subset of non-proprietary prompts with partner companies and receive prompts in return.
- Automated testing. Building a CI pipeline that automatically tests each prompt against a set of standard inputs whenever the prompt or the AI model is updated, flagging regressions before they reach developers.
Discussion Questions
- Meridian Health chose YAML files in a Git repository. What would be the advantages and disadvantages of using a database-backed web application instead?
- The team required one cross-team reviewer for prompt PRs. How would you adapt this process for a 100-person engineering organization?
- The "Not Invented Here" syndrome was addressed with data. What other strategies could help developers who resist using shared prompts?
- How would you handle prompts that work well with one AI model but poorly with another, given that team members might use different tools?
- The case study mentions "context packs." Design a context pack for a domain you are familiar with. What would it include?