Case Study 1: Refactoring a Monolith into Modules

Overview

In this case study, we take a realistic monolithic Pascal program — a student grade management system of approximately 800 lines — and methodically refactor it into a clean multi-unit architecture. The process illustrates how to identify module boundaries, manage dependencies, and verify that the refactored version preserves all original behavior.

This case study mirrors the real-world experience of inheriting a working but poorly organized codebase and improving its structure without breaking it.


The Monolithic Program

Imagine you have inherited GradeTracker.pas, a single-file program that:

  1. Defines a TStudent record with name, ID, and an array of grades
  2. Stores students in a global array
  3. Provides procedures for adding students, entering grades, and calculating averages
  4. Saves and loads data from a binary file
  5. Generates reports (individual transcripts, class summaries, honor roll lists)
  6. Has a text-based menu UI

The program works correctly but has grown organically. There are 47 procedures and functions, 12 global variables, and no clear separation between domain logic, storage, reporting, and user interface. Several functions are duplicated with slight variations (two different average calculators, three different formatting functions).

Here is a representative fragment of the structure:

program GradeTracker;

uses SysUtils;

const
  MaxStudents = 200;
  MaxGrades = 50;

type
  TStudent = record
    Name: string[60];
    StudentID: string[10];
    Grades: array[1..MaxGrades] of Real;
    GradeCount: Integer;
    Active: Boolean;
  end;

var
  Students: array[1..MaxStudents] of TStudent;
  StudentCount: Integer;
  DataFile: string;
  Modified: Boolean;
  { ...8 more global variables... }

procedure AddStudent(const AName, AID: string);
begin { ...modifies Students array directly... } end;

procedure EnterGrade(Index: Integer; Grade: Real);
begin { ...modifies Students[Index] directly... } end;

function CalcAverage(Index: Integer): Real;
begin { ...reads Students[Index] directly... } end;

procedure SaveData;
begin { ...uses DataFile global, writes Students array... } end;

procedure LoadData;
begin { ...reads into Students array... } end;

procedure PrintTranscript(Index: Integer);
begin { ...reads Students[Index], formats output... } end;

procedure PrintClassSummary;
begin { ...reads entire Students array, formats output... } end;

procedure PrintHonorRoll;
begin { ...reads Students, calls CalcAverage, formats output... } end;

{ ...38 more procedures... }

procedure MainMenu;
begin
  repeat
    WriteLn('1. Add Student');
    WriteLn('2. Enter Grade');
    { ...etc... }
  until Choice = 0;
end;

begin
  LoadData;
  MainMenu;
  if Modified then SaveData;
end.

Step 1: Identify Responsibilities

Before writing any code, we analyze the monolith and categorize every procedure:

| Category | Types and procedures | Count |
|---|---|---|
| Domain Model | TStudent, AddStudent, EnterGrade, CalcAverage, CalcGPA, IsPassing | 6 |
| Storage | SaveData, LoadData, ExportCSV, ImportCSV, BackupData | 5 |
| Reporting | PrintTranscript, PrintClassSummary, PrintHonorRoll, PrintGradeDistribution | 4 |
| UI / Menu | MainMenu, GetStudentChoice, ShowStudentList, ValidateInput, ConfirmAction | 5 |
| Utility | FormatGrade, FormatName, PadString, RepeatChar | 4 |
| Duplicates | CalcAvg2 (duplicate of CalcAverage), FormatGrade2 (slightly different) | 2 |

This gives us a natural decomposition into five units (four categories plus a utility unit), and reveals two duplicates to eliminate.


Step 2: Design the Dependency Graph

Before splitting files, we design the dependency structure:

GradeTracker.pas (main program)
    |
    +-- GradeUI
    |     |
    |     +-- GradeModels
    |     +-- GradeReports
    |     +-- GradeStorage
    |
    +-- GradeReports
    |     |
    |     +-- GradeModels
    |     +-- FormatUtils
    |
    +-- GradeStorage
    |     |
    |     +-- GradeModels
    |
    +-- GradeModels
    |     |
    |     (no dependencies beyond SysUtils)
    |
    +-- FormatUtils
          |
          (no dependencies beyond SysUtils)

Key decisions:

  - GradeModels is the foundation: it has no upward dependencies
  - FormatUtils is a leaf utility with no domain dependencies
  - GradeStorage depends only on GradeModels
  - GradeReports depends on GradeModels and FormatUtils
  - GradeUI is the top-level coordinator and depends on GradeModels, GradeReports, and GradeStorage
  - No circular dependencies exist
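To make the "leaf utility" idea concrete, here is a minimal sketch of what FormatUtils might look like. The function names come from the Step 1 inventory, but the signatures and the formatting choices are assumptions, since the original bodies are not shown. The point is the uses clause: only SysUtils appears, so nothing in this unit can reach back into the domain model.

```pascal
unit FormatUtils;

{$mode objfpc}{$H+}

interface

uses
  SysUtils;   { no project units: this is what makes it a leaf }

{ Names are taken from the Step 1 inventory; signatures are assumed. }
function RepeatChar(C: Char; Count: Integer): string;
function PadString(const S: string; Width: Integer): string;
function FormatGrade(Grade: Real): string;

implementation

function RepeatChar(C: Char; Count: Integer): string;
begin
  if Count <= 0 then Exit('');
  Result := StringOfChar(C, Count);
end;

function PadString(const S: string; Width: Integer): string;
begin
  { Right-pad with spaces; strings already at or over Width are unchanged }
  Result := S + RepeatChar(' ', Width - Length(S));
end;

function FormatGrade(Grade: Real): string;
begin
  { One decimal place is an assumption; the monolith's format is not shown }
  Result := FormatFloat('0.0', Grade);
end;

end.
```

Because the unit compiles on its own, it can be extracted and tested before any of the domain units exist.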


Step 3: Extract the Domain Model

We start with GradeModels because everything depends on it:

unit GradeModels;

{$mode objfpc}{$H+}

interface

uses
  SysUtils;

const
  MaxGrades = 50;

type
  TStudent = record
    Name: string;
    StudentID: string;
    Grades: array[1..MaxGrades] of Real;
    GradeCount: Integer;
    Active: Boolean;
  end;

  TStudentArray = array of TStudent;

function CreateStudent(const AName, AID: string): TStudent;
procedure AddGrade(var S: TStudent; Grade: Real);
function CalcAverage(const S: TStudent): Real;
function CalcGPA(const S: TStudent): Real;
function IsPassing(const S: TStudent): Boolean;
function LetterGrade(NumericGrade: Real): string;

implementation

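function CreateStudent(const AName, AID: string): TStudent;
begin
  { Zero every field first so the Grades array is not left uninitialized }
  Result := Default(TStudent);
  Result.Name := AName;
  Result.StudentID := AID;
  Result.GradeCount := 0;
  Result.Active := True;
end;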

procedure AddGrade(var S: TStudent; Grade: Real);
begin
  if S.GradeCount >= MaxGrades then
    raise Exception.Create('Maximum grades reached for student ' + S.Name);
  Inc(S.GradeCount);
  S.Grades[S.GradeCount] := Grade;
end;

function CalcAverage(const S: TStudent): Real;
var
  I: Integer;
  Sum: Real;
begin
  if S.GradeCount = 0 then Exit(0.0);
  Sum := 0;
  for I := 1 to S.GradeCount do
    Sum := Sum + S.Grades[I];
  Result := Sum / S.GradeCount;
end;

function CalcGPA(const S: TStudent): Real;
begin
  { Convert percentage average to 4.0 scale }
  Result := CalcAverage(S) / 25.0;
  if Result > 4.0 then Result := 4.0;
end;

function IsPassing(const S: TStudent): Boolean;
begin
  Result := CalcAverage(S) >= 60.0;
end;

function LetterGrade(NumericGrade: Real): string;
begin
  case Trunc(NumericGrade) div 10 of
    10, 9: Result := 'A';
    8: Result := 'B';
    7: Result := 'C';
    6: Result := 'D';
  else
    Result := 'F';
  end;
end;

end.

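What changed from the monolith:

  - Functions now take TStudent parameters instead of array indices
  - No global state: each function operates on the student record it receives
  - The duplicate CalcAvg2 was eliminated (it was identical to CalcAverage except for rounding)
  - LetterGrade was extracted from the reporting code, where it was inline
  - Name and StudentID changed from the fixed-length string[60] and string[10] to string, which is a dynamic AnsiString under {$H+}. The new TStudent can therefore no longer be written directly to a typed binary file, so GradeStorage has to convert to and from the old fixed-length layout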
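Because the extracted functions no longer touch global state, they can be exercised in isolation. The following is a small smoke test, offered as a sketch rather than a complete test suite. It assumes the GradeModels unit above is on the compiler's unit path; the program name and the sample student are hypothetical.

```pascal
program TestGradeModels;

{$mode objfpc}{$H+}
{$assertions on}

uses
  SysUtils, GradeModels;

var
  S: TStudent;
begin
  S := CreateStudent('Ada Example', 'S001');   { hypothetical test data }
  Assert(S.GradeCount = 0);
  Assert(CalcAverage(S) = 0.0);                { no grades: average is 0 }

  AddGrade(S, 90.0);
  AddGrade(S, 80.0);
  Assert(S.GradeCount = 2);
  Assert(Abs(CalcAverage(S) - 85.0) < 1e-9);   { (90 + 80) / 2 }
  Assert(Abs(CalcGPA(S) - 3.4) < 1e-9);        { 85 / 25 }
  Assert(IsPassing(S));

  Assert(LetterGrade(100.0) = 'A');
  Assert(LetterGrade(85.0) = 'B');
  Assert(LetterGrade(59.9) = 'F');

  WriteLn('GradeModels smoke test passed');
end.
```

Running this after each extraction step gives a quick signal that the domain logic still behaves as before, without going through the menu.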


Step 4: Extract Storage, Reports, and UI

With the domain model clean, we extract the other units following the same pattern. Each unit depends only on what it needs:

GradeStorage — Takes a TStudentArray and a filename, saves/loads. No formatting, no UI.

GradeReports — Takes a TStudentArray, produces formatted text output. No storage, no UI interaction.

GradeUI — Handles the menu loop, user input, and coordinates calls to the other units.
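As an illustration of the storage boundary, here is one possible shape for GradeStorage. It is a sketch under several assumptions: the old data file is taken to be a plain typed file of the monolith's TStudent records, SaveStudents is a hypothetical name (only LoadStudents appears in the main program), and GradeCount is declared as SmallInt on the assumption that the monolith was built in the compiler's default mode, where Integer is 16 bits. If the monolith was built with -Mobjfpc, Integer would be the matching type instead.

```pascal
unit GradeStorage;

{$mode objfpc}{$H+}

interface

uses
  SysUtils, GradeModels;

{ LoadStudents is the name the main program calls.
  SaveStudents is a hypothetical counterpart. }
function LoadStudents(const FileName: string): TStudentArray;
procedure SaveStudents(const Students: TStudentArray; const FileName: string);

implementation

type
  { Assumed on-disk layout, mirroring the monolith's TStudent record }
  TStudentRec = record
    Name: string[60];
    StudentID: string[10];
    Grades: array[1..MaxGrades] of Real;
    GradeCount: SmallInt;
    Active: Boolean;
  end;

function FromRec(const Rec: TStudentRec): TStudent;
var
  I: Integer;
begin
  Result := Default(TStudent);
  Result.Name := Rec.Name;
  Result.StudentID := Rec.StudentID;
  Result.GradeCount := Rec.GradeCount;
  Result.Active := Rec.Active;
  for I := 1 to MaxGrades do
    Result.Grades[I] := Rec.Grades[I];
end;

function ToRec(const S: TStudent): TStudentRec;
var
  I: Integer;
begin
  FillChar(Result, SizeOf(Result), 0);   { deterministic padding bytes }
  Result.Name := S.Name;                 { truncated to 60 characters }
  Result.StudentID := S.StudentID;       { truncated to 10 characters }
  Result.GradeCount := S.GradeCount;
  Result.Active := S.Active;
  for I := 1 to MaxGrades do
    Result.Grades[I] := S.Grades[I];
end;

function LoadStudents(const FileName: string): TStudentArray;
var
  F: file of TStudentRec;
  Rec: TStudentRec;
  I: Integer;
begin
  Result := nil;
  if not FileExists(FileName) then Exit;   { first run: empty list }
  AssignFile(F, FileName);
  Reset(F);
  try
    SetLength(Result, FileSize(F));        { FileSize counts records here }
    for I := 0 to High(Result) do
    begin
      Read(F, Rec);
      Result[I] := FromRec(Rec);
    end;
  finally
    CloseFile(F);
  end;
end;

procedure SaveStudents(const Students: TStudentArray; const FileName: string);
var
  F: file of TStudentRec;
  Rec: TStudentRec;
  I: Integer;
begin
  AssignFile(F, FileName);
  Rewrite(F);
  try
    for I := 0 to High(Students) do
    begin
      Rec := ToRec(Students[I]);
      Write(F, Rec);
    end;
  finally
    CloseFile(F);
  end;
end;

end.
```

Keeping the fixed-length record private to the implementation section means the rest of the program sees only TStudent and TStudentArray, so a later change of file format would stay inside this unit.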

The main program becomes minimal:

program GradeTracker;

{$mode objfpc}{$H+}

uses
  SysUtils, GradeModels, GradeStorage, GradeReports, GradeUI;

var
  Students: TStudentArray;
  DataFile: string;
begin
  DataFile := 'grades.dat';
  Students := LoadStudents(DataFile);
  RunMainMenu(Students, DataFile);
end.

Step 5: Verify Correctness

The refactored program must produce exactly the same output as the original for any given input. Verification steps:

  1. Compile all units. Any missing dependencies or type mismatches will appear as compile errors.
  2. Run with test data. Enter the same students and grades in both versions and compare the output character by character.
  3. Test edge cases. Empty student list, maximum grades, duplicate IDs.
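  4. Test file compatibility. Load a data file created by the old version in the new version (and vice versa, if the format has not changed). The monolith stored Name and StudentID as fixed-length strings, while the new TStudent uses dynamic strings, so this test passes only if GradeStorage still reads and writes the old record layout.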
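The output comparison in step 2 can be automated. The sketch below assumes the output of each version has been captured to a text file (for example by redirecting standard output while feeding both programs the same scripted input); the program name and file arguments are hypothetical. It compares line by line, so differences in line-ending style or a trailing newline are not detected.

```pascal
program CompareOutput;

{$mode objfpc}{$H+}

uses
  SysUtils, Classes;

{ Returns 0 if the files match, otherwise the 1-based number of the
  first line that differs or is missing from one of the files. }
function FirstDifference(const OldFile, NewFile: string): Integer;
var
  A, B: TStringList;
  I: Integer;
begin
  Result := 0;
  A := TStringList.Create;
  B := TStringList.Create;
  try
    A.LoadFromFile(OldFile);
    B.LoadFromFile(NewFile);
    I := 0;
    while (I < A.Count) and (I < B.Count) do
    begin
      if A[I] <> B[I] then Exit(I + 1);
      Inc(I);
    end;
    { One file is a strict prefix of the other }
    if A.Count <> B.Count then Result := I + 1;
  finally
    A.Free;
    B.Free;
  end;
end;

var
  Line: Integer;
begin
  if ParamCount <> 2 then
  begin
    WriteLn('Usage: CompareOutput <old-output> <new-output>');
    Halt(2);
  end;
  Line := FirstDifference(ParamStr(1), ParamStr(2));
  if Line = 0 then
    WriteLn('Outputs match')
  else
  begin
    WriteLn('Outputs differ at line ', Line);
    Halt(1);
  end;
end.
```

The non-zero exit code on a mismatch lets the check run from a build script after each extraction step.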

Results

| Metric | Before (Monolith) | After (Modular) |
|---|---|---|
| Files | 1 | 6 (5 units + main) |
| Global variables | 12 | 2 (in main program only) |
| Duplicate functions | 2 | 0 |
| Lines per file (average) | 800 | 120 |
| Compilation time (full rebuild) | 1.2s | 1.3s |
| Compilation time (change one report) | 1.2s | 0.3s |

The full rebuild is slightly slower (1.3s versus 1.2s, because the compiler opens more files), but rebuilding after a change to one report drops from 1.2s to 0.3s. More importantly, changing a report function now requires reading about 120 lines of context instead of 800.


Lessons Learned

  1. Start with the domain model. Extract the types and core operations first — everything else depends on them.
  2. Eliminate duplication during the refactoring. This is the perfect time to consolidate duplicate functions.
  3. Remove global state aggressively. Every global variable you eliminate is a coupling you break. Pass data through parameters instead.
  4. Test at each step. Extract one unit, compile, test. Extract the next, compile, test. Do not try to do it all at once.
  5. The main program should be thin. It coordinates the units but contains almost no logic of its own.

The total effort for this refactoring was approximately four hours — a one-time investment that will save far more time in every future maintenance session.