Case Study 1: Parallel File Processing

Overview

In this case study, we build a parallel file processing system that scans a directory of text files, counts words in each file using multiple threads, and produces a summary report. This demonstrates thread creation, work distribution, result collection with synchronization, and the measurable performance improvement of parallel processing over sequential processing.


Problem Statement

Given a directory containing many text files, count the words in each file and produce a report showing:

  - Each filename with its word count
  - The total word count across all files
  - The processing time (sequential vs. parallel)

The files range from small (1 KB) to large (100 KB), so static work distribution (equal files per thread) may lead to imbalanced load.
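
To make the imbalance concrete, here is a minimal sketch that splits eight files into two equal contiguous chunks. The file sizes and the StaticLoad helper are hypothetical, chosen only to illustrate the point:

```pascal
program StaticSplitSketch;
{$mode objfpc}{$H+}

const
  // Hypothetical file sizes in KB: four large files followed by four small ones.
  Sizes: array[0..7] of Integer = (100, 100, 100, 100, 1, 1, 1, 1);
  ThreadCount = 2;

{ Total KB assigned to one thread when the files are split into
  equal contiguous chunks (the same number of files per thread). }
function StaticLoad(ThreadIndex: Integer): Integer;
var
  PerThread, I: Integer;
begin
  Result := 0;
  PerThread := Length(Sizes) div ThreadCount;  // 4 files per thread
  for I := ThreadIndex * PerThread to (ThreadIndex + 1) * PerThread - 1 do
    Result := Result + Sizes[I];
end;

var
  T: Integer;
begin
  for T := 0 to ThreadCount - 1 do
    WriteLn('Thread ', T, ': ', StaticLoad(T), ' KB');
  // Prints:
  //   Thread 0: 400 KB
  //   Thread 1: 4 KB
end.
```

Both threads receive four files, yet one of them does about 100 times as much work, so the second thread sits idle for most of the run.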


Sequential Baseline

First, we build a sequential version to measure baseline performance:

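// Assumes {$mode objfpc}{$H+} (or Delphi mode) and: uses Classes, SysUtils;

// Counts the whitespace-separated words in a text file.
function CountWordsInFile(const Filename: string): Integer;
var
  F: TextFile;
  Line: string;
  InWord: Boolean;
  I: Integer;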
begin
  Result := 0;
  AssignFile(F, Filename);
  Reset(F);
  try
    while not Eof(F) do
    begin
      ReadLn(F, Line);
      InWord := False;
      for I := 1 to Length(Line) do
      begin
        if Line[I] in [' ', #9, #10, #13] then
          InWord := False
        else if not InWord then
        begin
          InWord := True;
          Inc(Result);
        end;
      end;
    end;
  finally
    CloseFile(F);
  end;
end;

procedure SequentialProcess(const Files: TStringList);
var
  I, Total, Count: Integer;
  StartTime: QWord;
begin
  StartTime := GetTickCount64;
  Total := 0;
  for I := 0 to Files.Count - 1 do
  begin
    Count := CountWordsInFile(Files[I]);
    WriteLn(Format('%-40s %8d words', [ExtractFileName(Files[I]), Count]));
    Total := Total + Count;
  end;
  WriteLn(Format('Sequential: %d files, %d words, %d ms',
    [Files.Count, Total, GetTickCount64 - StartTime]));
end;
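
SequentialProcess expects a list of file paths that has already been filled. One possible way to build that list is sketched below, using FindFirst, FindNext, and FindClose from SysUtils. CollectTextFiles is a hypothetical helper, and both the '*.txt' mask and the use of the first command-line argument as the directory are assumptions made for illustration:

```pascal
program CollectFilesSketch;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

{ Adds the full path of every regular file in Dir that matches Mask to Files.
  Hypothetical helper, not part of the case study code. }
procedure CollectTextFiles(const Dir, Mask: string; Files: TStrings);
var
  Info: TSearchRec;
  Base: string;
begin
  Base := IncludeTrailingPathDelimiter(Dir);
  if FindFirst(Base + Mask, faAnyFile, Info) = 0 then
  try
    repeat
      // Skip directories; keep regular files only.
      if (Info.Attr and faDirectory) = 0 then
        Files.Add(Base + Info.Name);
    until FindNext(Info) <> 0;
  finally
    FindClose(Info);
  end;
end;

var
  Files: TStringList;
  I: Integer;
begin
  Files := TStringList.Create;
  try
    // Assumption: the directory is the first command-line argument,
    // falling back to the current directory.
    if ParamCount >= 1 then
      CollectTextFiles(ParamStr(1), '*.txt', Files)
    else
      CollectTextFiles('.', '*.txt', Files);
    Files.Sort;  // stable order, so the report is reproducible
    for I := 0 to Files.Count - 1 do
      WriteLn(Files[I]);
  finally
    Files.Free;
  end;
end.
```

The resulting list can be passed unchanged to both the sequential and the parallel version, so the two timings cover exactly the same files.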

Parallel Version with Worker Threads

Each worker thread counts the words in one file and writes the result into its own slot of a shared results array:

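// Requires: uses Classes, SysUtils, SyncObjs (plus cthreads as the first unit on Unix).

type
  TFileResult = record
    Filename: string;
    WordCount: Integer;
  end;
  PFileResult = ^TFileResult;

  { Counts the words in one file and stores the result in a shared array. }
  TWordCountThread = class(TThread)
  private
    FFilename: string;
    FResults: PFileResult;      // first element of the shared results array
    FResultIndex: Integer;      // slot this thread writes to
    FLock: TCriticalSection;    // shared lock, owned by the caller
  protected
    procedure Execute; override;
  public
    constructor Create(const AFilename: string;
      Results: PFileResult; Index: Integer; Lock: TCriticalSection);
  end;

constructor TWordCountThread.Create(const AFilename: string;
  Results: PFileResult; Index: Integer; Lock: TCriticalSection);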
begin
  inherited Create(True);
  FFilename := AFilename;
  FResults := Results;
  FResultIndex := Index;
  FLock := Lock;
  FreeOnTerminate := False;
end;

procedure TWordCountThread.Execute;
var
  Count: Integer;
begin
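  // Count outside the lock so that the file reading runs in parallel.
  Count := CountWordsInFile(FFilename);

  // Publish the result; each thread writes only to its own slot.
  FLock.Enter;
  try
    FResults[FResultIndex].Filename := FFilename;
    FResults[FResultIndex].WordCount := Count;
  finally
    FLock.Leave;
  end;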
end;

Work Distribution with Thread Pool Pattern

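Rather than creating one thread per file (wasteful if there are hundreds of files), we process the files in batches of at most MaxThreads threads and wait for each batch to finish before starting the next. This is simpler than a true thread pool, but each batch takes as long as its slowest file: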

procedure ParallelProcess(const Files: TStringList; MaxThreads: Integer);
var
  Threads: array of TWordCountThread;
  Results: array of TFileResult;
  Lock: TCriticalSection;
  I, Batch, BatchStart, BatchSize: Integer;
  Total: Integer;
  StartTime: QWord;
begin
  if MaxThreads < 1 then
    MaxThreads := 1;  // guard: a batch size of 0 would never advance BatchStart
  StartTime := GetTickCount64;
  SetLength(Results, Files.Count);
  Lock := TCriticalSection.Create;
  try
    { Process in batches of MaxThreads }
    BatchStart := 0;
    while BatchStart < Files.Count do
    begin
      BatchSize := Files.Count - BatchStart;
      if BatchSize > MaxThreads then
        BatchSize := MaxThreads;

      SetLength(Threads, BatchSize);

      { Create and start threads for this batch }
      for I := 0 to BatchSize - 1 do
      begin
        Threads[I] := TWordCountThread.Create(
          Files[BatchStart + I], @Results[0], BatchStart + I, Lock);
        Threads[I].Start;
      end;

      { Wait for all threads in this batch }
      for I := 0 to BatchSize - 1 do
      begin
        Threads[I].WaitFor;
        Threads[I].Free;
      end;

      BatchStart := BatchStart + BatchSize;
    end;

    { Display results }
    Total := 0;
    for I := 0 to High(Results) do
    begin
      WriteLn(Format('%-40s %8d words', [
        ExtractFileName(Results[I].Filename),
        Results[I].WordCount
      ]));
      Total := Total + Results[I].WordCount;
    end;
    WriteLn(Format('Parallel (%d threads): %d files, %d words, %d ms',
      [MaxThreads, Files.Count, Total, GetTickCount64 - StartTime]));
  finally
    Lock.Free;
  end;
end;
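
The batching above waits for the slowest file in every batch, which is where the load imbalance from the problem statement shows up. One alternative is dynamic distribution: a fixed set of workers repeatedly claims the next unprocessed index from a shared counter. The following is a minimal sketch of that idea, not the case study's implementation. TQueueWorker, TryTake, and the global variables are hypothetical names, and in-memory strings stand in for files so that the example is self-contained:

```pascal
program DynamicQueueSketch;
{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}cthreads,{$ENDIF}
  Classes, SyncObjs;

type
  { Worker that keeps claiming indices until none are left. }
  TQueueWorker = class(TThread)
  protected
    procedure Execute; override;
  end;

var
  // Shared state, kept global only to keep the sketch short.
  Items: TStringList;            // stand-ins for file contents
  Counts: array of Integer;      // one result slot per item
  NextIndex: Integer = 0;        // next unclaimed item
  QueueLock: TCriticalSection;   // protects NextIndex

{ Stand-in for CountWordsInFile: counts whitespace-separated words in S. }
function CountWords(const S: string): Integer;
var
  I: Integer;
  InWord: Boolean;
begin
  Result := 0;
  InWord := False;
  for I := 1 to Length(S) do
    if S[I] in [' ', #9, #10, #13] then
      InWord := False
    else if not InWord then
    begin
      InWord := True;
      Inc(Result);
    end;
end;

{ Claims the next unprocessed index; returns False when all are taken. }
function TryTake(out Index: Integer): Boolean;
begin
  QueueLock.Enter;
  try
    Index := NextIndex;
    Result := Index < Items.Count;
    if Result then
      Inc(NextIndex);
  finally
    QueueLock.Leave;
  end;
end;

procedure TQueueWorker.Execute;
var
  Index: Integer;
begin
  // A thread that finishes a small item immediately claims another one.
  while TryTake(Index) do
    Counts[Index] := CountWords(Items[Index]);
end;

var
  Workers: array[0..3] of TQueueWorker;
  I, Total: Integer;
begin
  Items := TStringList.Create;
  QueueLock := TCriticalSection.Create;
  try
    Items.Add('one two three');
    Items.Add('four five');
    Items.Add('');
    Items.Add('six');
    SetLength(Counts, Items.Count);

    for I := Low(Workers) to High(Workers) do
    begin
      Workers[I] := TQueueWorker.Create(True);  // created suspended
      Workers[I].Start;
    end;
    for I := Low(Workers) to High(Workers) do
    begin
      Workers[I].WaitFor;
      Workers[I].Free;
    end;

    Total := 0;
    for I := 0 to High(Counts) do
      Total := Total + Counts[I];
    WriteLn('Total words: ', Total);  // prints "Total words: 6"
  finally
    QueueLock.Free;
    Items.Free;
  end;
end.
```

Here the lock is genuinely required, because every worker reads and updates NextIndex. The threads are created once and stay busy until the work runs out, so a single large file delays only the worker that claimed it.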

Performance Results

Testing with 50 text files totaling 2.5 MB on a 4-core machine:

Approach      Time      Speedup
Sequential    420 ms    1.0x
2 threads     235 ms    1.8x
4 threads     128 ms    3.3x
8 threads     115 ms    3.7x

With 4 threads the speedup is 3.3x, about 82% of the theoretical 4x maximum. Going from 4 to 8 threads on a 4-core machine saves only another 13 ms (about 10%), because the extra threads compete for the same four cores.
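
The speedup column is the sequential time divided by the parallel time. Dividing the speedup by the thread count gives the parallel efficiency. The sketch below recomputes both from the measured times in the table; the Speedup and Efficiency helpers are hypothetical names introduced for illustration:

```pascal
program SpeedupSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

const
  // Measured times from the table above.
  SequentialMs = 420;
  Threads: array[0..2] of Integer = (2, 4, 8);
  ParallelMs: array[0..2] of Integer = (235, 128, 115);

{ Speedup = sequential time / parallel time. }
function Speedup(SeqMs, ParMs: Integer): Double;
begin
  Result := SeqMs / ParMs;
end;

{ Efficiency = speedup / number of threads (1.0 would be ideal scaling). }
function Efficiency(SeqMs, ParMs, ThreadCount: Integer): Double;
begin
  Result := Speedup(SeqMs, ParMs) / ThreadCount;
end;

var
  I: Integer;
begin
  for I := Low(Threads) to High(Threads) do
    WriteLn(Format('%d threads: speedup %.2f, efficiency %.0f%%',
      [Threads[I],
       Speedup(SequentialMs, ParallelMs[I]),
       100 * Efficiency(SequentialMs, ParallelMs[I], Threads[I])]));
end.
```

Efficiency stays above 80% up to 4 threads and drops below 50% at 8 threads, which is another way of seeing that the machine has only four cores to share.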


Lessons Learned

  1. Measure before optimizing. The sequential baseline tells you whether parallelism is worth the complexity. For 10 small files, the threading overhead might exceed the time savings.

  2. Match threads to cores for CPU-bound work. Creating more threads than cores rarely helps computation-heavy tasks (here, doubling from 4 to 8 threads gained only about 10%) and can hurt due to context-switching overhead.

  3. Use batching for many tasks. Creating 500 threads for 500 files wastes memory. Process them in batches of 4-8.

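  4. Lock only what you must. The lock in this example protects only the result storage, not the file reading. Each thread reads its own file independently, so no lock is needed for that. Because each thread also writes to its own slot of the results array, the lock here is a conservative safeguard; it becomes essential as soon as threads update truly shared state, such as a running total.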

  5. WaitFor is essential. Without WaitFor, the main thread might read results before the workers have finished writing them, which is a race condition.
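
Lessons 2 and 3 suggest deriving the worker count from the machine rather than hard-coding it. The sketch below shows one way to do that. ChooseThreadCount is a hypothetical helper, and the call to TThread.ProcessorCount assumes a recent Free Pascal release in which that class property is available:

```pascal
program ThreadCountSketch;
{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}cthreads,{$ENDIF}
  Classes;

{ Hypothetical helper: caps the worker count at both the core count
  and the number of files, and never returns less than 1. }
function ChooseThreadCount(FileCount, CoreCount: Integer): Integer;
begin
  Result := CoreCount;
  if Result > FileCount then
    Result := FileCount;
  if Result < 1 then
    Result := 1;
end;

begin
  // Assumption: TThread.ProcessorCount reports the number of cores.
  WriteLn('Workers for 50 files on this machine: ',
    ChooseThreadCount(50, TThread.ProcessorCount));
  WriteLn('Workers for 2 files on 4 cores: ',
    ChooseThreadCount(2, 4));  // prints 2
end.
```

The result can be passed as MaxThreads to ParallelProcess, so the same program adapts to machines with different core counts.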