Case Study 1: Parallel File Processing
Overview
In this case study, we build a parallel file processing system that scans a directory of text files, counts words in each file using multiple threads, and produces a summary report. This demonstrates thread creation, work distribution, result collection with synchronization, and the measurable performance improvement of parallel processing over sequential processing.
Problem Statement
Given a directory containing many text files, count the total words in each file and produce a report showing:
- Each filename with its word count
- The total word count across all files
- The processing time (sequential vs. parallel)
The files range from small (1 KB) to large (100 KB), so static work distribution (equal files per thread) may lead to imbalanced load.
Sequential Baseline
First, we build a sequential version to measure baseline performance:
{ The routines in this section need SysUtils (Format, ExtractFileName,
  GetTickCount64) and Classes (TStringList). }
function CountWordsInFile(const Filename: string): Integer;
var
  F: TextFile;
  Line: string;
  InWord: Boolean;
  I: Integer;
begin
  Result := 0;
  AssignFile(F, Filename);
  Reset(F);
  try
    while not Eof(F) do
    begin
      ReadLn(F, Line);
      { A word starts at each transition from whitespace to non-whitespace }
      InWord := False;
      for I := 1 to Length(Line) do
      begin
        if Line[I] in [' ', #9, #10, #13] then
          InWord := False
        else if not InWord then
        begin
          InWord := True;
          Inc(Result);
        end;
      end;
    end;
  finally
    CloseFile(F);
  end;
end;
procedure SequentialProcess(const Files: TStringList);
var
  I, Total, Count: Integer;
  StartTime: QWord;
begin
  StartTime := GetTickCount64;
  Total := 0;
  for I := 0 to Files.Count - 1 do
  begin
    Count := CountWordsInFile(Files[I]);
    WriteLn(Format('%-40s %8d words', [ExtractFileName(Files[I]), Count]));
    Total := Total + Count;
  end;
  WriteLn(Format('Sequential: %d files, %d words, %d ms',
    [Files.Count, Total, GetTickCount64 - StartTime]));
end;
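The routines above assume that Files is already populated. The following is a minimal sketch of how the directory scan might be done with FindFirst, FindNext, and FindClose from SysUtils. The helper name CollectTextFiles, the *.txt mask, and the directory name data are illustrative assumptions, not part of the original program.

```pascal
program CollectDemo;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

{ Hypothetical helper: appends the full path of every *.txt file found
  directly inside Dir (no recursion) to Files. }
procedure CollectTextFiles(const Dir: string; Files: TStringList);
var
  SR: TSearchRec;
  Base: string;
  Code: LongInt;
begin
  Base := IncludeTrailingPathDelimiter(Dir);
  Code := FindFirst(Base + '*.txt', faAnyFile, SR);
  try
    while Code = 0 do
    begin
      { Skip directories that happen to match the mask }
      if (SR.Attr and faDirectory) = 0 then
        Files.Add(Base + SR.Name);
      Code := FindNext(SR);
    end;
  finally
    FindClose(SR);
  end;
end;

var
  Files: TStringList;
begin
  Files := TStringList.Create;
  try
    CollectTextFiles('data', Files);  { 'data' is an assumed directory name }
    Files.Sort;                       { keep the report order stable across runs }
    WriteLn(Files.Count, ' text files found');
  finally
    Files.Free;
  end;
end.
```

The resulting list can be passed unchanged to SequentialProcess. Sorting is optional; it only makes the two reports easier to compare line by line.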
Parallel Version with Worker Threads
{ Requires Classes (TThread) and SyncObjs (TCriticalSection). On Unix-like
  systems, cthreads must be the first unit in the program's uses clause. }
type
  TFileResult = record
    Filename: string;
    WordCount: Integer;
  end;
  PFileResult = ^TFileResult;

  TWordCountThread = class(TThread)
  private
    FFilename: string;
    FResults: PFileResult;      { first element of the shared results array }
    FResultIndex: Integer;      { slot written by this thread }
    FLock: TCriticalSection;
  protected
    procedure Execute; override;
  public
    constructor Create(const AFilename: string;
      Results: PFileResult; Index: Integer; Lock: TCriticalSection);
  end;

constructor TWordCountThread.Create(const AFilename: string;
  Results: PFileResult; Index: Integer; Lock: TCriticalSection);
begin
  inherited Create(True);      { create suspended; the caller calls Start }
  FFilename := AFilename;
  FResults := Results;
  FResultIndex := Index;
  FLock := Lock;
  FreeOnTerminate := False;    { the caller waits for and frees the thread }
end;

procedure TWordCountThread.Execute;
var
  Count: Integer;
begin
  { Count outside the lock so that file reading runs fully in parallel }
  Count := CountWordsInFile(FFilename);
  FLock.Enter;
  try
    FResults[FResultIndex].Filename := FFilename;
    FResults[FResultIndex].WordCount := Count;
  finally
    FLock.Leave;
  end;
end;
Work Distribution with Thread Pool Pattern
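Creating one thread per file is wasteful when there are hundreds of files, so we cap concurrency by processing the files in batches of at most MaxThreads threads. This is simpler than a true thread pool, but each batch waits for its slowest file before the next batch starts: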
procedure ParallelProcess(const Files: TStringList; MaxThreads: Integer);
var
  Threads: array of TWordCountThread;
  Results: array of TFileResult;
  Lock: TCriticalSection;
  I, BatchStart, BatchSize: Integer;
  Total: Integer;
  StartTime: QWord;
begin
  { Guard: a batch size below 1 would never advance BatchStart }
  if MaxThreads < 1 then
    MaxThreads := 1;
  StartTime := GetTickCount64;
  SetLength(Results, Files.Count);
  Lock := TCriticalSection.Create;
  try
    { Process in batches of MaxThreads }
    BatchStart := 0;
    while BatchStart < Files.Count do
    begin
      BatchSize := Files.Count - BatchStart;
      if BatchSize > MaxThreads then
        BatchSize := MaxThreads;
      SetLength(Threads, BatchSize);
      { Create and start threads for this batch }
      for I := 0 to BatchSize - 1 do
      begin
        Threads[I] := TWordCountThread.Create(
          Files[BatchStart + I], @Results[0], BatchStart + I, Lock);
        Threads[I].Start;
      end;
      { Wait for all threads in this batch }
      for I := 0 to BatchSize - 1 do
      begin
        Threads[I].WaitFor;
        Threads[I].Free;
      end;
      BatchStart := BatchStart + BatchSize;
    end;
    { Display results }
    Total := 0;
    for I := 0 to High(Results) do
    begin
      WriteLn(Format('%-40s %8d words', [
        ExtractFileName(Results[I].Filename),
        Results[I].WordCount
      ]));
      Total := Total + Results[I].WordCount;
    end;
    WriteLn(Format('Parallel (%d threads): %d files, %d words, %d ms',
      [MaxThreads, Files.Count, Total, GetTickCount64 - StartTime]));
  finally
    Lock.Free;
  end;
end;
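Because the batching loop waits for the slowest file in each batch, a single 100 KB file can leave the other threads idle. One possible alternative, sketched here under stated assumptions, is a fixed set of workers that each claim the next unprocessed index from a shared counter, so a thread that finishes a small item immediately picks up another. The names TQueueWorker, QueueLock, and NextIndex are hypothetical, and CountWords operates on in-memory strings as a stand-in for CountWordsInFile so that the sketch runs without any input files.

```pascal
program QueueDemo;
{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}cthreads,{$ENDIF}  { must come first on Unix-like systems }
  Classes, SysUtils, SyncObjs;

type
  TTextArray = array of string;
  TIntArray = array of Integer;

  { Hypothetical worker: repeatedly claims the next unprocessed index }
  TQueueWorker = class(TThread)
  private
    FItems: TTextArray;
    FCounts: TIntArray;
  protected
    procedure Execute; override;
  public
    constructor Create(const AItems: TTextArray; const ACounts: TIntArray);
  end;

var
  QueueLock: TCriticalSection;
  NextIndex: Integer = 0;  { shared cursor, guarded by QueueLock }

{ Stand-in for CountWordsInFile: counts whitespace-separated words in S }
function CountWords(const S: string): Integer;
var
  I: Integer;
  InWord: Boolean;
begin
  Result := 0;
  InWord := False;
  for I := 1 to Length(S) do
    if S[I] in [' ', #9, #10, #13] then
      InWord := False
    else if not InWord then
    begin
      InWord := True;
      Inc(Result);
    end;
end;

constructor TQueueWorker.Create(const AItems: TTextArray; const ACounts: TIntArray);
begin
  inherited Create(True);  { created suspended; the caller calls Start }
  FreeOnTerminate := False;
  FItems := AItems;        { dynamic arrays are shared by reference }
  FCounts := ACounts;
end;

procedure TQueueWorker.Execute;
var
  Index: Integer;
begin
  while True do
  begin
    { Claim one index under the lock; the counting itself runs unlocked }
    QueueLock.Enter;
    try
      Index := NextIndex;
      Inc(NextIndex);
    finally
      QueueLock.Leave;
    end;
    if Index > High(FItems) then
      Break;
    FCounts[Index] := CountWords(FItems[Index]);  { each slot is written once }
  end;
end;

var
  Items: TTextArray;
  Counts: TIntArray;
  Workers: array[0..3] of TQueueWorker;
  I, Total: Integer;
begin
  SetLength(Items, 3);
  Items[0] := 'one two three';
  Items[1] := 'four';
  Items[2] := 'five  six';
  SetLength(Counts, Length(Items));
  QueueLock := TCriticalSection.Create;
  try
    for I := Low(Workers) to High(Workers) do
    begin
      Workers[I] := TQueueWorker.Create(Items, Counts);
      Workers[I].Start;
    end;
    for I := Low(Workers) to High(Workers) do
    begin
      Workers[I].WaitFor;
      Workers[I].Free;
    end;
  finally
    QueueLock.Free;
  end;
  Total := 0;
  for I := 0 to High(Counts) do
    Total := Total + Counts[I];
  WriteLn('Total words: ', Total);
end.
```

In this variant the lock guards only the shared cursor. Each result slot is written by exactly one worker and read by the main thread only after WaitFor returns, so the slots themselves need no extra locking.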
Performance Results
Testing with 50 text files totaling 2.5 MB on a 4-core machine:
| Approach | Time | Speedup |
|---|---|---|
| Sequential | 420 ms | 1.0x |
| 2 threads | 235 ms | 1.8x |
| 4 threads | 128 ms | 3.3x |
| 8 threads | 115 ms | 3.7x |
At 3.3x, the 4-thread run reaches about 82% of the theoretical 4x maximum. Going from 4 to 8 threads on the 4-core machine adds little (3.3x to 3.7x) because the extra threads compete for the same four cores.
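The speedup column can be reproduced from the measured times. In the worked figures below, the symbols are notation introduced here for illustration: S(p) is the speedup with p threads, T_seq the sequential time, T_p the time with p threads, and E(p) the efficiency per thread.

```latex
\begin{aligned}
S(p) &= \frac{T_{\text{seq}}}{T_p}, \qquad E(p) = \frac{S(p)}{p} \\
S(2) &= \frac{420}{235} \approx 1.79, \qquad E(2) \approx 0.89 \\
S(4) &= \frac{420}{128} \approx 3.28, \qquad E(4) \approx 0.82 \\
S(8) &= \frac{420}{115} \approx 3.65, \qquad E(8) \approx 0.46
\end{aligned}
```

The 8-thread efficiency is low when measured per thread, but measured against the four physical cores it is about 3.65 / 4 ≈ 0.91, which is why the extra threads still yield a small gain.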
Lessons Learned
- Measure before optimizing. The sequential baseline tells you whether parallelism is worth the complexity. For 10 small files, the threading overhead might exceed the time savings.
- Match threads to cores for CPU-bound work. Creating more threads than cores does not help for computation-heavy tasks and can hurt because of context-switching overhead.
- Use batching for many tasks. Creating 500 threads for 500 files wastes memory. Process them in batches of 4-8.
- Lock only what you must. The lock in this example protects only the result storage, not the file reading. Each thread reads its own file independently, so no lock is needed for that.
- WaitFor is essential. Without WaitFor, the main thread might read results before the workers have finished writing them, which is a race condition.
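The advice to match threads to cores can be turned into a small helper for choosing MaxThreads. This is a sketch only: ChooseThreadCount is a hypothetical name, and it assumes that TThread.ProcessorCount is available in your Free Pascal release and reports the number of logical cores on your platform, which is worth verifying before relying on it.

```pascal
program PickThreads;
{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}cthreads,{$ENDIF}
  Classes, SysUtils, Math;

{ Hypothetical helper: one thread per core, never more threads than files,
  and never fewer than one. }
function ChooseThreadCount(CoreCount, FileCount: Integer): Integer;
begin
  Result := Min(CoreCount, FileCount);
  if Result < 1 then
    Result := 1;
end;

begin
  { Assumption: TThread.ProcessorCount reports logical cores on this platform }
  WriteLn('Threads for 50 files on this machine: ',
    ChooseThreadCount(Integer(TThread.ProcessorCount), 50));
  WriteLn('Threads for 2 files on 4 cores: ', ChooseThreadCount(4, 2));
end.
```

The returned value could be passed as MaxThreads to ParallelProcess. For workloads dominated by disk reads rather than computation, a somewhat higher count may still help, as the 8-thread measurement suggests.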