Case Study 1: Refactoring a Monolith into Modules
Overview
In this case study, we take a realistic monolithic Pascal program — a student grade management system of approximately 800 lines — and methodically refactor it into a clean multi-unit architecture. The process illustrates how to identify module boundaries, manage dependencies, and verify that the refactored version preserves all original behavior.
This case study mirrors the real-world experience of inheriting a working but poorly organized codebase and improving its structure without breaking it.
The Monolithic Program
Imagine you have inherited GradeTracker.pas, a single-file program that:
- Defines a TStudent record with name, ID, and an array of grades
- Stores students in a global array
- Provides procedures for adding students, entering grades, and calculating averages
- Saves and loads data from a binary file
- Generates reports (individual transcripts, class summaries, honor roll lists)
- Has a text-based menu UI
The program works correctly but has grown organically. There are 47 procedures and functions, 12 global variables, and no clear separation between domain logic, storage, reporting, and user interface. Several functions are duplicated with slight variations (two different average calculators, three different formatting functions).
Here is a representative fragment of the structure:
```pascal
program GradeTracker;

uses
  SysUtils;

const
  MaxStudents = 200;
  MaxGrades = 50;

type
  TStudent = record
    Name: string[60];
    StudentID: string[10];
    Grades: array[1..MaxGrades] of Real;
    GradeCount: Integer;
    Active: Boolean;
  end;

var
  Students: array[1..MaxStudents] of TStudent;
  StudentCount: Integer;
  DataFile: string;
  Modified: Boolean;
  { ...8 more global variables... }

procedure AddStudent(const AName, AID: string);
begin { ...modifies Students array directly... } end;

procedure EnterGrade(Index: Integer; Grade: Real);
begin { ...modifies Students[Index] directly... } end;

function CalcAverage(Index: Integer): Real;
begin { ...reads Students[Index] directly... } end;

procedure SaveData;
begin { ...uses DataFile global, writes Students array... } end;

procedure LoadData;
begin { ...reads into Students array... } end;

procedure PrintTranscript(Index: Integer);
begin { ...reads Students[Index], formats output... } end;

procedure PrintClassSummary;
begin { ...reads entire Students array, formats output... } end;

procedure PrintHonorRoll;
begin { ...reads Students, calls CalcAverage, formats output... } end;

{ ...38 more procedures... }

procedure MainMenu;
begin
  repeat
    WriteLn('1. Add Student');
    WriteLn('2. Enter Grade');
    { ...etc... }
  until Choice = 0;
end;

begin
  LoadData;
  MainMenu;
  if Modified then SaveData;
end.
```
Step 1: Identify Responsibilities
Before writing any code, we analyze the monolith and categorize every procedure:
| Category | Procedures | Count |
|---|---|---|
| Domain Model | TStudent, AddStudent, EnterGrade, CalcAverage, CalcGPA, IsPassing | 6 |
| Storage | SaveData, LoadData, ExportCSV, ImportCSV, BackupData | 5 |
| Reporting | PrintTranscript, PrintClassSummary, PrintHonorRoll, PrintGradeDistribution | 4 |
| UI / Menu | MainMenu, GetStudentChoice, ShowStudentList, ValidateInput, ConfirmAction | 5 |
| Utility | FormatGrade, FormatName, PadString, RepeatChar | 4 |
| Duplicates | CalcAvg2 (duplicate of CalcAverage), FormatGrade2 (slightly different) | 2 |
This gives us a natural decomposition into five units (four categories plus a utility unit), and reveals two duplicates to eliminate.
Step 2: Design the Dependency Graph
Before splitting files, we design the dependency structure:
```text
GradeTracker.pas (main program)
  |
  +-- GradeUI
  |     |
  |     +-- GradeModels
  |     +-- GradeReports
  |     +-- GradeStorage
  |
  +-- GradeReports
  |     |
  |     +-- GradeModels
  |     +-- FormatUtils
  |
  +-- GradeStorage
  |     |
  |     +-- GradeModels
  |
  +-- GradeModels
  |     |
  |     (no dependencies beyond SysUtils)
  |
  +-- FormatUtils
        |
        (no dependencies beyond SysUtils)
```
Key decisions:
- GradeModels is the foundation, with no upward dependencies
- FormatUtils is a leaf utility, with no domain dependencies
- GradeStorage depends only on GradeModels
- GradeReports depends on GradeModels and FormatUtils
- GradeUI depends on GradeModels, GradeReports, and GradeStorage (it is the top-level coordinator)
- No circular dependencies exist
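The "no circular dependencies" claim can be checked mechanically before any file is split. The following is a minimal sketch, not part of the original project: it encodes the planned graph as an adjacency matrix and runs a depth-first search for cycles. The names `TUnitId`, `UsesUnit`, and `GraphHasCycle` are assumptions made for this illustration.

```pascal
program CheckDeps;
{$mode objfpc}{$H+}

type
  { One identifier per planned file; names are illustrative }
  TUnitId = (uMain, uUI, uReports, uStorage, uModels, uFormat);
  TVisitState = (vsUnvisited, vsInProgress, vsDone);

var
  { UsesUnit[A, B] = True means unit A lists unit B in a uses clause.
    Global variables start zeroed, so every entry is initially False. }
  UsesUnit: array[TUnitId, TUnitId] of Boolean;
  State: array[TUnitId] of TVisitState;

{ Depth-first search: reaching a node that is still in progress
  means we followed the edges back to where we started, i.e. a cycle }
function HasCycleFrom(Node: TUnitId): Boolean;
var
  Next: TUnitId;
begin
  if State[Node] = vsInProgress then Exit(True);
  if State[Node] = vsDone then Exit(False);
  State[Node] := vsInProgress;
  for Next := Low(TUnitId) to High(TUnitId) do
    if UsesUnit[Node, Next] and HasCycleFrom(Next) then
      Exit(True);
  State[Node] := vsDone;
  Result := False;
end;

function GraphHasCycle: Boolean;
var
  U: TUnitId;
begin
  for U := Low(TUnitId) to High(TUnitId) do
    State[U] := vsUnvisited;
  for U := Low(TUnitId) to High(TUnitId) do
    if HasCycleFrom(U) then Exit(True);
  Result := False;
end;

begin
  { Edges copied from the dependency diagram }
  UsesUnit[uMain, uUI] := True;
  UsesUnit[uMain, uReports] := True;
  UsesUnit[uMain, uStorage] := True;
  UsesUnit[uMain, uModels] := True;
  UsesUnit[uMain, uFormat] := True;
  UsesUnit[uUI, uModels] := True;
  UsesUnit[uUI, uReports] := True;
  UsesUnit[uUI, uStorage] := True;
  UsesUnit[uReports, uModels] := True;
  UsesUnit[uReports, uFormat] := True;
  UsesUnit[uStorage, uModels] := True;
  WriteLn('Cycle in planned graph: ', GraphHasCycle);

  { A deliberate design mistake: the domain model reaching up into reporting }
  UsesUnit[uModels, uReports] := True;
  WriteLn('Cycle after GradeModels uses GradeReports: ', GraphHasCycle);
end.
```

The first line printed reports FALSE and the second TRUE. In practice Free Pascal also rejects circular references between unit interface sections at compile time, so a check like this is mainly useful while the design still exists only on paper.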
Step 3: Extract the Domain Model
We start with GradeModels because everything depends on it:
```pascal
unit GradeModels;

{$mode objfpc}{$H+}

interface

uses
  SysUtils;

const
  MaxGrades = 50;

type
  TStudent = record
    Name: string;
    StudentID: string;
    Grades: array[1..MaxGrades] of Real;
    GradeCount: Integer;
    Active: Boolean;
  end;

  TStudentArray = array of TStudent;

function CreateStudent(const AName, AID: string): TStudent;
procedure AddGrade(var S: TStudent; Grade: Real);
function CalcAverage(const S: TStudent): Real;
function CalcGPA(const S: TStudent): Real;
function IsPassing(const S: TStudent): Boolean;
function LetterGrade(NumericGrade: Real): string;

implementation

function CreateStudent(const AName, AID: string): TStudent;
begin
  { Zero every field first so the Grades array is not left uninitialized }
  Result := Default(TStudent);
  Result.Name := AName;
  Result.StudentID := AID;
  Result.GradeCount := 0;
  Result.Active := True;
end;

procedure AddGrade(var S: TStudent; Grade: Real);
begin
  if S.GradeCount >= MaxGrades then
    raise Exception.Create('Maximum grades reached for student ' + S.Name);
  Inc(S.GradeCount);
  S.Grades[S.GradeCount] := Grade;
end;

function CalcAverage(const S: TStudent): Real;
var
  I: Integer;
  Sum: Real;
begin
  { A student with no grades averages 0 rather than dividing by zero }
  if S.GradeCount = 0 then Exit(0.0);
  Sum := 0;
  for I := 1 to S.GradeCount do
    Sum := Sum + S.Grades[I];
  Result := Sum / S.GradeCount;
end;

function CalcGPA(const S: TStudent): Real;
begin
  { Convert percentage average to 4.0 scale (linear: 100% maps to 4.0) }
  Result := CalcAverage(S) / 25.0;
  if Result > 4.0 then Result := 4.0;
end;

function IsPassing(const S: TStudent): Boolean;
begin
  Result := CalcAverage(S) >= 60.0;
end;

function LetterGrade(NumericGrade: Real): string;
begin
  { Buckets of ten points; values below 60 or at 110 and above yield 'F' }
  case Trunc(NumericGrade) div 10 of
    10, 9: Result := 'A';
    8: Result := 'B';
    7: Result := 'C';
    6: Result := 'D';
  else
    Result := 'F';
  end;
end;

end.
```
What changed from the monolith:
- Functions now take TStudent parameters instead of array indices
- No global state — each function operates on the student record it receives
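- The near-duplicate CalcAvg2 was eliminated. It differed from CalcAverage only in rounding, so any caller that relied on the rounded value must now round explicitly to keep its output unchanged
- LetterGrade was extracted from the reporting code where it was inline
- Name and StudentID changed from fixed-length string[60] and string[10] to string, and the fixed global array became the dynamic TStudentArray (so MaxStudents is no longer needed). Under {$H+} these strings are reference-counted, and a record containing them cannot be stored in a typed file, so the storage unit has to write the fields individually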
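To illustrate the first three points, here is a small self-contained sketch of a record-based average routine with rounding layered on top instead of living in a second copy of the loop. The source does not say how CalcAvg2 rounded, so `RoundedAverage` and its `Decimals` parameter are hypothetical; the trimmed `TStudent` is repeated only so the program compiles on its own.

```pascal
program AverageDemo;
{$mode objfpc}{$H+}

uses
  Math;

const
  MaxGrades = 50;

type
  { Trimmed copy of TStudent so the sketch compiles on its own }
  TStudent = record
    Name: string;
    Grades: array[1..MaxGrades] of Real;
    GradeCount: Integer;
  end;

{ The single average routine: takes the record, touches no globals }
function CalcAverage(const S: TStudent): Real;
var
  I: Integer;
  Sum: Real;
begin
  if S.GradeCount = 0 then Exit(0.0);
  Sum := 0;
  for I := 1 to S.GradeCount do
    Sum := Sum + S.Grades[I];
  Result := Sum / S.GradeCount;
end;

{ Hypothetical stand-in for CalcAvg2: rounding is applied on top of
  CalcAverage instead of duplicating the summation loop }
function RoundedAverage(const S: TStudent; Decimals: Integer): Real;
begin
  Result := RoundTo(CalcAverage(S), -Decimals);
end;

var
  S: TStudent;
begin
  { No global Students array is needed: the record is built locally }
  S := Default(TStudent);
  S.Name := 'Test Student';
  S.Grades[1] := 80.0;
  S.Grades[2] := 90.0;
  S.Grades[3] := 85.5;
  S.GradeCount := 3;
  WriteLn('Average: ', CalcAverage(S):0:4);
  WriteLn('Rounded: ', RoundedAverage(S, 1):0:1);
end.
```

Because the functions receive the record as a parameter, the test data can be constructed in a few lines without loading a file or populating a global array, which is what makes the extracted unit easy to check in isolation.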
Step 4: Extract Storage, Reports, and UI
With the domain model clean, we extract the other units following the same pattern. Each unit depends only on what it needs:
GradeStorage — Takes a TStudentArray and a filename, saves/loads. No formatting, no UI.
GradeReports — Takes a TStudentArray, produces formatted text output. No storage, no UI interaction.
GradeUI — Handles the menu loop, user input, and coordinates calls to the other units.
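As one possible shape for the storage unit, the sketch below writes each field explicitly through a `TFileStream`. This matters because the refactored `TStudent` uses `string` fields, which under `{$H+}` are reference-counted and cannot be placed in a typed file such as `file of TStudent`. `LoadStudents` matches the call in the main program; `SaveStudents`, the helper names, and the on-disk layout are assumptions for illustration. The layout is not compatible with the monolith's binary file, and the sketch does not validate corrupt input.

```pascal
program StorageSketch;
{$mode objfpc}{$H+}
uses
  SysUtils, Classes;

const
  MaxGrades = 50;
type
  { Same fields as GradeModels.TStudent, repeated so the sketch stands alone }
  TStudent = record
    Name: string;
    StudentID: string;
    Grades: array[1..MaxGrades] of Real;
    GradeCount: Integer;
    Active: Boolean;
  end;
  TStudentArray = array of TStudent;

{ Strings are stored as a 32-bit length followed by the raw bytes }
procedure WriteStr(Stream: TStream; const S: string);
var
  Len: LongInt;
begin
  Len := Length(S);
  Stream.WriteBuffer(Len, SizeOf(Len));
  if Len > 0 then
    Stream.WriteBuffer(S[1], Len);
end;

function ReadStr(Stream: TStream): string;
var
  Len: LongInt;
begin
  Stream.ReadBuffer(Len, SizeOf(Len));
  SetLength(Result, Len);
  if Len > 0 then
    Stream.ReadBuffer(Result[1], Len);
end;

procedure SaveStudents(const Students: TStudentArray; const FileName: string);
var
  Stream: TFileStream;
  Count, I: LongInt;
begin
  Stream := TFileStream.Create(FileName, fmCreate);
  try
    Count := Length(Students);
    Stream.WriteBuffer(Count, SizeOf(Count));
    for I := 0 to Count - 1 do
    begin
      WriteStr(Stream, Students[I].Name);
      WriteStr(Stream, Students[I].StudentID);
      Stream.WriteBuffer(Students[I].GradeCount, SizeOf(Students[I].GradeCount));
      Stream.WriteBuffer(Students[I].Grades, SizeOf(Students[I].Grades));
      Stream.WriteBuffer(Students[I].Active, SizeOf(Students[I].Active));
    end;
  finally
    Stream.Free;
  end;
end;

function LoadStudents(const FileName: string): TStudentArray;
var
  Stream: TFileStream;
  Count, I: LongInt;
begin
  Result := nil;
  { Assumption: a missing file means "no students yet", not an error }
  if not FileExists(FileName) then Exit;
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    Stream.ReadBuffer(Count, SizeOf(Count));
    SetLength(Result, Count);
    for I := 0 to Count - 1 do
    begin
      Result[I].Name := ReadStr(Stream);
      Result[I].StudentID := ReadStr(Stream);
      Stream.ReadBuffer(Result[I].GradeCount, SizeOf(Result[I].GradeCount));
      Stream.ReadBuffer(Result[I].Grades, SizeOf(Result[I].Grades));
      Stream.ReadBuffer(Result[I].Active, SizeOf(Result[I].Active));
    end;
  finally
    Stream.Free;
  end;
end;

var
  Saved, Loaded: TStudentArray;
begin
  SetLength(Saved, 1);           { new elements start zeroed }
  Saved[0].Name := 'Ada';
  Saved[0].Grades[1] := 91.5;
  Saved[0].GradeCount := 1;
  SaveStudents(Saved, 'grades_sketch.tmp');
  Loaded := LoadStudents('grades_sketch.tmp');
  DeleteFile('grades_sketch.tmp');
  WriteLn('Round trip ok: ', (Length(Loaded) = 1) and
    (Loaded[0].Name = 'Ada') and (Loaded[0].Grades[1] = 91.5));
end.
```

Note that neither routine formats output or prompts the user; both take everything they need as parameters, which is the property the unit description above asks for.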
The main program becomes minimal:
```pascal
program GradeTracker;

{$mode objfpc}{$H+}

uses
  SysUtils, GradeModels, GradeStorage, GradeReports, GradeUI;

var
  Students: TStudentArray;
  DataFile: string;

begin
  DataFile := 'grades.dat';
  Students := LoadStudents(DataFile);
  RunMainMenu(Students, DataFile);
end.
```
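The monolith ended with `if Modified then SaveData`, and the thin main program no longer contains that step. One plausible arrangement, sketched below under stated assumptions, is for `RunMainMenu` to own the modified flag as a local variable and save on exit. The source does not show the body of `RunMainMenu`, so the menu entries, the `SaveStudents` stub, and the simplified `TStudentArray` stand-in are all hypothetical.

```pascal
program MenuSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

type
  { Stand-in for GradeModels.TStudentArray so the sketch compiles alone }
  TStudentArray = array of string;

{ Placeholder for the real save routine in the storage unit }
procedure SaveStudents(const Students: TStudentArray; const FileName: string);
begin
  WriteLn('Saving ', Length(Students), ' student(s) to ', FileName);
end;

procedure RunMainMenu(var Students: TStudentArray; const DataFile: string);
var
  Line: string;
  Choice: Integer;
  Modified: Boolean;  { a global in the monolith, a local here }
begin
  Modified := False;
  repeat
    WriteLn('1. Add Student');
    WriteLn('0. Exit');
    if EOF then
      Choice := 0     { treat end of input as Exit }
    else
    begin
      ReadLn(Line);
      Choice := StrToIntDef(Trim(Line), -1);
    end;
    case Choice of
      0: ;            { nothing to do; the loop condition ends the menu }
      1:
        begin
          SetLength(Students, Length(Students) + 1);
          Students[High(Students)] := 'New student';
          Modified := True;
        end;
    else
      WriteLn('Invalid choice');
    end;
  until Choice = 0;

  { Same rule as the monolith: write the file only if something changed }
  if Modified then
    SaveStudents(Students, DataFile);
end;

var
  Students: TStudentArray;

begin
  Students := nil;
  RunMainMenu(Students, 'grades.dat');
end.
```

Passing `Students` as a `var` parameter keeps the array owned by the main program while letting the UI unit modify it, so the two remaining variables stay where the Results table says they are.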
Step 5: Verify Correctness
The refactored program must produce exactly the same output as the original for any given input. Verification steps:
- Compile all units. Any missing dependencies or type mismatches will appear as compile errors.
- Run with test data. Enter the same students and grades in both versions and compare the output character by character.
- Test edge cases. Empty student list, maximum grades, duplicate IDs.
- Test file compatibility. Load a data file created by the old version in the new version (and vice versa, if the format has not changed).
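The output comparison in the second step is tedious to do by eye. The following minimal sketch automates it, assuming each version's output has been captured to a text file. It compares line by line with `TStringList`, so differences in line-ending style or a missing final newline are not detected; a strict byte-level check would need a stream comparison instead. The program and function names are illustrative.

```pascal
program CompareOutput;
{$mode objfpc}{$H+}

uses
  Classes;

{ Returns 0 if both files contain the same lines, otherwise the
  1-based number of the first line that differs or is missing }
function FirstDifference(const FileA, FileB: string): Integer;
var
  A, B: TStringList;
  I, Shared: Integer;
begin
  Result := 0;
  A := TStringList.Create;
  B := TStringList.Create;
  try
    A.LoadFromFile(FileA);
    B.LoadFromFile(FileB);
    { Compare only the lines both files have }
    Shared := A.Count;
    if B.Count < Shared then
      Shared := B.Count;
    for I := 0 to Shared - 1 do
      if A[I] <> B[I] then
        Exit(I + 1);
    { Identical so far: a length mismatch is still a difference }
    if A.Count <> B.Count then
      Result := Shared + 1;
  finally
    A.Free;
    B.Free;
  end;
end;

var
  Line: Integer;

begin
  if ParamCount <> 2 then
  begin
    WriteLn('Usage: CompareOutput <old-output> <new-output>');
    Halt(2);
  end;
  Line := FirstDifference(ParamStr(1), ParamStr(2));
  if Line = 0 then
    WriteLn('Outputs match')
  else
  begin
    WriteLn('First difference at line ', Line);
    Halt(1);
  end;
end.
```

One way to use it, with hypothetical file names: put a scripted menu session in `session.txt`, run each build with its input redirected from that file and its output redirected to `old.out` and `new.out`, then run `CompareOutput old.out new.out`. Repeating this after each extracted unit catches a behavior change at the step that introduced it.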
Results
| Metric | Before (Monolith) | After (Modular) |
|---|---|---|
| Files | 1 | 6 (5 units + main) |
| Global variables | 12 | 2 (in main program only) |
| Duplicate functions | 2 | 0 |
| Lines per file (average) | 800 | 120 |
| Compilation time (full rebuild) | 1.2s | 1.3s |
| Compilation time (change one report) | 1.2s | 0.3s |
The full rebuild is slightly slower because the compiler opens more files, but rebuilding after a change to one report takes about a quarter of the time (0.3s versus 1.2s). More importantly, changing a report function now requires reading roughly 120 lines of context instead of 800.
Lessons Learned
- Start with the domain model. Extract the types and core operations first — everything else depends on them.
- Eliminate duplication during the refactoring. This is the perfect time to consolidate duplicate functions.
- Remove global state aggressively. Every global variable you eliminate is a coupling you break. Pass data through parameters instead.
- Test at each step. Extract one unit, compile, test. Extract the next, compile, test. Do not try to do it all at once.
- The main program should be thin. It coordinates the units but contains almost no logic of its own.
The total effort for this refactoring was approximately four hours — a one-time investment that will save far more time in every future maintenance session.