Case Study 1: Chain-of-Thought for Algorithm Design

Using Step-by-Step Reasoning Prompts to Implement a Complex Scheduling Algorithm


Background

MedShift Solutions is a healthcare technology company that builds workforce management software for hospitals. Their flagship product handles nurse scheduling — a notoriously difficult problem that involves balancing dozens of constraints: shift coverage requirements, nurse qualifications, union rules, fatigue regulations, personal preferences, and fair distribution of undesirable shifts.

The scheduling team had been using AI coding assistants for routine development tasks — CRUD endpoints, data models, utility functions — with good results. But when they attempted to use AI to implement the core scheduling algorithm, the results were consistently disappointing. Direct prompts like "write a nurse scheduling algorithm" produced naive implementations that ignored critical constraints, or overly complex solutions copied from academic papers that did not fit their specific domain.

The team lead, Priya Mehta, had recently studied chain-of-thought prompting and proposed an experiment: could structured, step-by-step reasoning prompts guide the AI through the complexity of the scheduling problem and produce a usable implementation?

The Problem

The scheduling system needed to solve a constraint satisfaction problem with the following requirements:

Hard constraints (must never be violated): - Every shift must have the minimum required number of nurses - No nurse works more than 5 consecutive days - At least 12 hours must separate the end of one shift from the start of the next - Nurses can only be assigned to shifts matching their qualifications (ICU, ER, general ward) - No nurse exceeds 40 hours per week (or 48 with overtime approval)

Soft constraints (should be satisfied when possible): - Distribute weekend shifts fairly across all nurses - Honor nurse shift preferences when possible - Minimize the number of split weekends (working Saturday but not Sunday, or vice versa) - Group shifts together (prefer 3 consecutive days on, 4 off over scattered single days) - Balance overtime equally across the team

Scale: The system needed to schedule 50-80 nurses across 3 shifts per day for a 4-week period, respecting all hard constraints and optimizing soft constraints.

The First Attempt: Direct Prompting

Priya's team first tried a direct prompt approach:

Write a Python function that generates a nurse scheduling
solution for a hospital. It should handle multiple shifts,
nurse qualifications, maximum hours, rest requirements, and
fair distribution of undesirable shifts. Use constraint
satisfaction programming.

The AI produced a response using the python-constraint library. The code was syntactically correct but had serious problems:

  1. Incomplete constraint modeling. The implementation handled shift coverage and qualifications but ignored the 12-hour rest requirement and consecutive-day limits.
  2. No optimization. All hard constraints were treated equally, and soft constraints were ignored entirely. The algorithm found a valid schedule but not a good one.
  3. Scalability failure. For 50 nurses over 4 weeks, the constraint solver ran for over 30 minutes without finding a solution because the search space was not adequately pruned.
  4. No explanation. The code was 200 lines of constraint definitions with no comments explaining the modeling decisions.

The Chain-of-Thought Approach

Priya redesigned the approach using chain-of-thought prompting, breaking the problem into reasoning phases. Here is how the process unfolded over a series of carefully structured prompts.

Phase 1: Problem Modeling

I need to implement a nurse scheduling algorithm. Before writing
any code, I need you to think through the problem carefully.

## Step 1: Problem Classification
What type of optimization problem is nurse scheduling? Compare
it to known problem classes (constraint satisfaction, integer
linear programming, combinatorial optimization). What are the
implications of this classification for our choice of algorithm?

## Step 2: Decision Variables
What are the decision variables? Define them formally. What does
the solution space look like? Estimate its size for 60 nurses,
3 shifts per day, and 28 days.

## Step 3: Constraint Analysis
For each constraint listed below, classify it as either a hard
constraint (absolute requirement) or a modeling challenge (how
do we represent this mathematically?):
- Every shift needs minimum staffing
- No more than 5 consecutive work days
- 12-hour minimum rest between shifts
- Qualification-based assignment
- 40-hour weekly limit
- Fair weekend distribution
- Preference honoring
- Split weekend minimization

For each constraint, describe how you would represent it
mathematically or programmatically.

Do not write any code yet. Focus only on analysis.

The AI's response was revelatory. It classified the problem as a multi-objective constraint optimization problem, related to the Nurse Scheduling Problem (NSP) — a well-studied NP-hard problem in operations research. It estimated the solution space at approximately 3^(60 x 84) possibilities (each of 60 nurses can be assigned to one of 3 shifts or off for each of 84 shifts), confirming that brute-force was infeasible. It recommended a hybrid approach: use constraint propagation to enforce hard constraints and a metaheuristic (simulated annealing or genetic algorithm) to optimize soft constraints.

Critically, the reasoning process identified a modeling subtlety that direct prompting had missed: the 12-hour rest constraint creates temporal dependencies between consecutive days, meaning shifts cannot be assigned independently day-by-day.

Phase 2: Algorithm Selection

Based on your analysis, you recommended a hybrid approach with
constraint propagation and metaheuristic optimization.

## Step 4: Algorithm Trade-off Analysis
Compare these three algorithmic approaches for our specific
problem:

A) Pure constraint programming (python-constraint or OR-Tools CP-SAT)
B) Integer Linear Programming (PuLP or OR-Tools LP)
C) Hybrid: greedy initial solution + simulated annealing refinement

For each approach, analyze:
1. Can it handle all our hard constraints?
2. Can it optimize our soft constraints?
3. What is the expected runtime for 60 nurses, 28 days?
4. How easy is it to add new constraints later?
5. How interpretable is the solution (can we explain why a
   nurse got a specific schedule)?

Recommend the best approach and justify your choice. Still no
code — just analysis.

The AI's step-by-step comparison was thorough. It recommended Approach C (hybrid greedy + simulated annealing) for three reasons: (1) it produces solutions quickly even for large instances, (2) the greedy initial solution guarantees all hard constraints are met from the start, (3) simulated annealing naturally handles multi-objective optimization of soft constraints by combining them into a weighted objective function. The AI also noted that approaches A and B, while theoretically optimal, were likely to struggle with the problem size and the complexity of soft constraints like "fair distribution."

Phase 3: Data Model Design

Good analysis. Now let's design the data structures.

## Step 5: Data Model
Design Python dataclasses for:
- Nurse (with qualifications, preferences, constraints)
- Shift (with time, required staffing, required qualifications)
- Schedule (the full assignment matrix)
- ScheduleConstraints (configuration of all constraints)

Think through:
- How will we efficiently check if a nurse can work a shift?
- How will we calculate the objective function quickly?
- What data structure allows fast neighbor generation for
  simulated annealing?

Design the data model with these performance requirements in
mind. Include type hints and docstrings. This is the first code
we'll write.

Because the AI had already thought through the algorithm requirements in phases 1 and 2, the data models were designed with the algorithm's needs in mind. The Schedule class used a NumPy array for the assignment matrix (nurses x shifts) for efficient manipulation, with helper methods for checking constraint violations and calculating the objective function.

Phase 4: Implementation with Reasoning

Now implement the scheduling algorithm. For each major function,
explain your reasoning before writing the code:

## Step 6: Greedy Initial Solution
Before coding the greedy assignment:
1. What order should nurses be assigned? (Most constrained first?
   Random?)
2. What order should shifts be filled? (Understaffed first?)
3. How do you handle situations where no valid assignment exists
   for a shift?

Then implement the greedy solver.

## Step 7: Simulated Annealing Refinement
Before coding the SA loop:
1. What is the neighbor function? (What changes do you make to
   explore nearby solutions?)
2. What is the objective function? (How do you weight the soft
   constraints?)
3. What cooling schedule? (Initial temperature, cooling rate,
   stopping condition)
4. How do you ensure hard constraints are never violated during
   neighbor generation?

Then implement the simulated annealing optimizer.

The reasoning before each function caught several important design decisions:

  • Nurse ordering: The AI reasoned that most-constrained-first ordering (nurses with the fewest qualification matches or the most restrictions) should be assigned first, because they have fewer valid options and are harder to place later. This insight significantly improved the quality of the greedy solution.

  • Neighbor function: The AI identified three neighbor operations — swap (exchange shifts between two nurses), move (reassign a shift from one nurse to another), and block swap (exchange entire day assignments). It reasoned that a mix of these operations would provide both fine-grained and coarse-grained exploration of the solution space.

  • Hard constraint preservation: The AI recognized that random neighbor generation might violate hard constraints, so it designed a "constraint-aware neighbor generator" that only proposes moves that maintain hard constraint satisfaction. This was a critical insight that direct prompting had missed entirely.

Phase 5: Verification

Now let's verify the implementation.

## Step 8: Verification
1. Design a small test case (5 nurses, 3 shifts, 7 days) where
   you can manually verify the correct solution
2. Trace through the greedy algorithm step by step on this test case
3. Identify any edge cases that could cause the algorithm to fail
4. Write pytest tests for:
   - All hard constraints are satisfied
   - The greedy solver produces a valid initial solution
   - Simulated annealing improves the objective function
   - The algorithm handles edge cases (more shifts than nurses,
     nurse with no valid shifts)

Results

The chain-of-thought approach produced a scheduling algorithm that was dramatically better than the direct-prompting attempt:

Metric Direct Prompt Chain-of-Thought
Hard constraints satisfied 3 of 5 5 of 5
Soft constraints modeled 0 of 5 5 of 5
Runtime (60 nurses, 28 days) >30 min (timeout) 45 seconds
Code lines 200 580
Comments and docstrings 5 47
Test cases 0 18
Correctness on manual test Failed Passed

The final implementation was not production-ready out of the box — the team spent another two weeks integrating it with their database layer, adding a web interface, and tuning the simulated annealing parameters. But the core algorithm was sound, well-structured, and well-documented. Priya estimated that the chain-of-thought approach saved the team approximately three weeks compared to implementing the algorithm from scratch, even accounting for the time spent crafting and iterating on prompts.

Key Lessons

Lesson 1: Separate reasoning from coding. The most impactful improvement came from explicitly asking the AI to analyze the problem before writing any code. The first three phases produced no code at all — only analysis — but that analysis shaped every subsequent decision.

Lesson 2: Phase the reasoning. Do not ask the AI to reason about everything at once. Phase 1 focused on problem classification. Phase 2 focused on algorithm selection. Phase 3 focused on data modeling. Each phase built on the previous one, and each phase got the AI's full attention.

Lesson 3: Ask for reasoning at the function level. In Phase 4, asking "before coding the greedy assignment, what order should nurses be assigned?" produced an insight (most-constrained-first) that the AI would not have discovered if asked to just "implement the greedy solver." The per-function reasoning caught design decisions that would otherwise have been made implicitly and often incorrectly.

Lesson 4: Use reasoning to catch omissions. The 12-hour rest constraint was missed by the direct prompt but caught during Phase 1's constraint analysis. The constraint-aware neighbor generator was not something the team had thought of, but emerged naturally from the Phase 4 reasoning about hard constraint preservation.

Lesson 5: Chain-of-thought scales with problem complexity. For simple CRUD operations, chain-of-thought adds overhead without benefit. But for NP-hard constraint satisfaction problems, the explicit reasoning is not just helpful — it is essential. The more complex the problem, the more value chain-of-thought provides.

Applying This to Your Work

You do not need to be solving NP-hard problems to benefit from chain-of-thought prompting. Any time you face a coding task where the "right approach" is not immediately obvious, try this three-phase structure:

  1. Analyze: Ask the AI to classify the problem, estimate complexity, and identify key challenges — without writing code.
  2. Design: Ask the AI to compare approaches, select the best one, and design data structures — with reasoning for each decision.
  3. Implement: Ask the AI to implement with per-function reasoning, explaining its approach before coding each piece.

The total prompting effort is higher than a single direct prompt, but the quality of the result makes the investment worthwhile for any problem of meaningful complexity.

Epilogue

Six months after implementing the chain-of-thought-developed scheduler, MedShift's team had extended it with new constraint types (nurse seniority preferences, training day requirements, float pool assignments) and reported that the modular, well-documented structure made each extension straightforward. The chain-of-thought reasoning in the original prompts served as design documentation, helping new team members understand not just what the algorithm does but why it works that way.

Priya's team now uses chain-of-thought prompting as their default approach for any algorithmic task. As Priya put it: "Direct prompting gives you code. Chain-of-thought gives you understanding and code. When you're building something complex, you need both."