Case Study 2: Log File Analysis

The Scenario

You have been hired as a summer intern at a small web hosting company. The senior developer hands you a USB drive containing a web server log file and says, "We need to know what is going on with our server. Can you write a tool that analyzes this log and gives us some useful statistics?"

The log file follows the Common Log Format used by Apache and many other web servers. Each line looks like this:

192.168.1.45 - - [15/Mar/2026:10:22:01 +0000] "GET /index.html HTTP/1.1" 200 5432
192.168.1.72 - - [15/Mar/2026:10:22:03 +0000] "GET /about.html HTTP/1.1" 200 3201
10.0.0.15 - - [15/Mar/2026:10:22:05 +0000] "GET /missing.html HTTP/1.1" 404 287
192.168.1.45 - - [15/Mar/2026:10:22:08 +0000] "POST /login HTTP/1.1" 200 0
203.0.113.5 - - [15/Mar/2026:10:22:10 +0000] "GET /admin HTTP/1.1" 403 156

The format is: IP - - [Timestamp] "Method Path Protocol" StatusCode BytesSent

Your task: read the log file, parse each line, and produce a useful summary report.


Parsing Strategy

Each log line has a predictable structure. We will extract:

- IP address: everything before the first space
- Timestamp: between the brackets [...]
- HTTP method: the first word inside the quotes
- Path: the second word inside the quotes
- Status code: the number after the closing quote
- Bytes sent: the last number on the line

The Record

type
  TLogEntry = record
    IP: string[45];
    Timestamp: string[30];
    Method: string[10];
    Path: string[100];
    StatusCode: Integer;
    BytesSent: LongInt;
  end;

The Parser

function ParseLogLine(const Line: string; var Entry: TLogEntry): Boolean;
var
  P, P2: Integer;
  StatusStr, BytesStr: string;
begin
  Result := False;

  { Extract IP address: everything before first space }
  P := Pos(' ', Line);
  if P = 0 then Exit;
  Entry.IP := Copy(Line, 1, P - 1);

  { Extract timestamp: between [ and ] }
  P := Pos('[', Line);
  P2 := Pos(']', Line);
  if (P = 0) or (P2 = 0) or (P2 <= P) then Exit;
  Entry.Timestamp := Copy(Line, P + 1, P2 - P - 1);

  { Extract method and path: between quotes }
  P := Pos('"', Line);
  if P = 0 then Exit;
  P2 := Pos('"', Copy(Line, P + 1, Length(Line)));
  if P2 = 0 then Exit;
  P2 := P + P2;  { Absolute position of closing quote }

  { Method is first word inside quotes }
  Entry.Method := '';
  Entry.Path := '';
  P := P + 1;  { Skip opening quote }
  while (P < P2) and (Line[P] <> ' ') do
  begin
    Entry.Method := Entry.Method + Line[P];
    Inc(P);
  end;
  Inc(P);  { Skip space }

  { Path is second word inside quotes }
  while (P < P2) and (Line[P] <> ' ') do
  begin
    Entry.Path := Entry.Path + Line[P];
    Inc(P);
  end;

  { Status code and bytes: after closing quote }
  { Find the closing quote, then parse remaining numbers }
  P := P2 + 2;  { Skip closing quote and space }
  StatusStr := '';
  while (P <= Length(Line)) and (Line[P] <> ' ') do
  begin
    StatusStr := StatusStr + Line[P];
    Inc(P);
  end;
  Val(StatusStr, Entry.StatusCode, P2);
  if P2 <> 0 then Entry.StatusCode := 0;

  { Bytes sent: last token }
  Inc(P);  { Skip space }
  BytesStr := '';
  while P <= Length(Line) do
  begin
    BytesStr := BytesStr + Line[P];
    Inc(P);
  end;
  Val(BytesStr, Entry.BytesSent, P2);
  if P2 <> 0 then Entry.BytesSent := 0;

  Result := True;
end;
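A quick sanity check helps confirm the parser before wiring it into the analyzer. This is a sketch of a small test program (the program name is ours; the sample line comes from the log above). The expected field values are noted in comments:

```pascal
program TestParse;

{ ... TLogEntry and ParseLogLine as defined above ... }

var
  Entry: TLogEntry;
begin
  if ParseLogLine('192.168.1.45 - - [15/Mar/2026:10:22:01 +0000] ' +
                  '"GET /index.html HTTP/1.1" 200 5432', Entry) then
  begin
    WriteLn('IP:     ', Entry.IP);         { 192.168.1.45 }
    WriteLn('Method: ', Entry.Method);     { GET }
    WriteLn('Path:   ', Entry.Path);       { /index.html }
    WriteLn('Status: ', Entry.StatusCode); { 200 }
    WriteLn('Bytes:  ', Entry.BytesSent);  { 5432 }
  end
  else
    WriteLn('Parse failed!');
end.
```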

The Analysis Engine

Data Structures for Aggregation

We need to count occurrences of various values. Since we have not yet covered more advanced data structures like hash maps, we will use parallel arrays:

const
  MAX_IPS = 1000;
  MAX_PATHS = 1000;
  MAX_STATUS = 10;

var
  { IP frequency tracking }
  UniqueIPs: array[0..MAX_IPS - 1] of string[45];
  IPCounts: array[0..MAX_IPS - 1] of Integer;
  IPCount: Integer;

  { Path frequency tracking }
  UniquePaths: array[0..MAX_PATHS - 1] of string[100];
  PathCounts: array[0..MAX_PATHS - 1] of Integer;
  PathCount: Integer;

  { Status code tracking }
  UniqueStatuses: array[0..MAX_STATUS - 1] of Integer;
  StatusCounts: array[0..MAX_STATUS - 1] of Integer;
  StatusCount: Integer;

Incrementing Counters

A helper function searches for a value and increments its count, or adds it if new:

procedure IncrementIP(const IP: string);
var
  I: Integer;
begin
  for I := 0 to IPCount - 1 do
  begin
    if UniqueIPs[I] = IP then
    begin
      Inc(IPCounts[I]);
      Exit;
    end;
  end;

  { Not found — add new entry }
  if IPCount < MAX_IPS then
  begin
    UniqueIPs[IPCount] := IP;
    IPCounts[IPCount] := 1;
    Inc(IPCount);
  end;
end;

Similar procedures would exist for IncrementPath and IncrementStatus.
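As a sketch, IncrementStatus applies the same search-or-append pattern to the integer arrays:

```pascal
procedure IncrementStatus(Code: Integer);
var
  I: Integer;
begin
  for I := 0 to StatusCount - 1 do
  begin
    if UniqueStatuses[I] = Code then
    begin
      Inc(StatusCounts[I]);
      Exit;
    end;
  end;

  { Not found; add a new entry }
  if StatusCount < MAX_STATUS then
  begin
    UniqueStatuses[StatusCount] := Code;
    StatusCounts[StatusCount] := 1;
    Inc(StatusCount);
  end;
end;
```

IncrementPath is identical to IncrementIP except that it works on UniquePaths and PathCounts.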

Processing the Log File

procedure AnalyzeLogFile(const FileName: string);
var
  F: TextFile;
  Line: string;
  Entry: TLogEntry;
  TotalLines, ParseErrors: Integer;
  TotalBytes: Int64;
  GetCount, PostCount, OtherMethodCount: Integer;
  Error4xx, Error5xx: Integer;
begin
  if not FileExists(FileName) then
  begin
    WriteLn('Error: File "', FileName, '" not found.');
    Exit;
  end;

  AssignFile(F, FileName);
  Reset(F);

  { Initialize counters }
  TotalLines := 0;
  ParseErrors := 0;
  TotalBytes := 0;
  GetCount := 0;
  PostCount := 0;
  OtherMethodCount := 0;
  Error4xx := 0;
  Error5xx := 0;
  IPCount := 0;
  PathCount := 0;
  StatusCount := 0;

  { Process each line }
  while not Eof(F) do
  begin
    ReadLn(F, Line);
    Inc(TotalLines);

    if not ParseLogLine(Line, Entry) then
    begin
      Inc(ParseErrors);
      Continue;
    end;

    { Aggregate statistics }
    TotalBytes := TotalBytes + Entry.BytesSent;

    { Count methods }
    if Entry.Method = 'GET' then
      Inc(GetCount)
    else if Entry.Method = 'POST' then
      Inc(PostCount)
    else
      Inc(OtherMethodCount);

    { Count error categories }
    if (Entry.StatusCode >= 400) and (Entry.StatusCode < 500) then
      Inc(Error4xx)
    else if (Entry.StatusCode >= 500) then
      Inc(Error5xx);

    { Track unique IPs, paths, and status codes }
    IncrementIP(Entry.IP);
    IncrementPath(Entry.Path);
    IncrementStatus(Entry.StatusCode);
  end;

  CloseFile(F);

  { Display results }
  PrintReport(TotalLines, ParseErrors, TotalBytes,
              GetCount, PostCount, OtherMethodCount,
              Error4xx, Error5xx);
end;

Generating the Report

procedure PrintReport(TotalLines, ParseErrors: Integer;
                      TotalBytes: Int64;
                      GetCount, PostCount, OtherMethodCount: Integer;
                      Error4xx, Error5xx: Integer);
var
  I, MaxIdx, MaxVal: Integer;
begin
  WriteLn;
  WriteLn('=============================================');
  WriteLn('         WEB SERVER LOG ANALYSIS REPORT      ');
  WriteLn('=============================================');
  WriteLn;

  WriteLn('--- General Statistics ---');
  WriteLn('  Total log lines:     ', TotalLines);
  WriteLn('  Parsed successfully: ', TotalLines - ParseErrors);
  WriteLn('  Parse errors:        ', ParseErrors);
  WriteLn('  Total bytes served:  ', TotalBytes);
  if TotalLines - ParseErrors > 0 then
    WriteLn('  Avg bytes/request:   ', TotalBytes div (TotalLines - ParseErrors));
  WriteLn;

  WriteLn('--- HTTP Methods ---');
  WriteLn('  GET:   ', GetCount);
  WriteLn('  POST:  ', PostCount);
  WriteLn('  Other: ', OtherMethodCount);
  WriteLn;

  WriteLn('--- Status Code Summary ---');
  for I := 0 to StatusCount - 1 do
    WriteLn('  ', UniqueStatuses[I], ': ', StatusCounts[I], ' requests');
  WriteLn('  Client errors (4xx): ', Error4xx);
  WriteLn('  Server errors (5xx): ', Error5xx);
  WriteLn;

  WriteLn('--- Top 10 IP Addresses ---');
  for I := 1 to 10 do
  begin
    if I > IPCount then Break;

    { Find the IP with the highest count (simple selection) }
    MaxIdx := FindMaxIndex(IPCounts, IPCount);
    if MaxIdx = -1 then Break;

    WriteLn('  ', I:2, '. ', UniqueIPs[MaxIdx]:20, ' ', IPCounts[MaxIdx], ' requests');
    IPCounts[MaxIdx] := -1;  { Mark as reported }
  end;
  WriteLn;

  WriteLn('--- Top 10 Requested Paths ---');
  for I := 1 to 10 do
  begin
    if I > PathCount then Break;

    MaxIdx := FindMaxIndex(PathCounts, PathCount);
    if MaxIdx = -1 then Break;

    WriteLn('  ', I:2, '. ', UniquePaths[MaxIdx]:40, ' ', PathCounts[MaxIdx], ' hits');
    PathCounts[MaxIdx] := -1;
  end;
  WriteLn;

  WriteLn('--- Unique Visitors ---');
  WriteLn('  Unique IP addresses: ', IPCount);
  WriteLn('  Unique paths:        ', PathCount);
  WriteLn;
  WriteLn('=============================================');
end;
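PrintReport calls a FindMaxIndex helper that has not been shown. A minimal version, using an open-array parameter so the same function works for both IPCounts and PathCounts:

```pascal
function FindMaxIndex(const Counts: array of Integer; Count: Integer): Integer;
var
  I, MaxIdx: Integer;
begin
  MaxIdx := -1;
  for I := 0 to Count - 1 do
  begin
    { Entries already reported were set to -1, so they are skipped }
    if (Counts[I] > 0) and ((MaxIdx = -1) or (Counts[I] > Counts[MaxIdx])) then
      MaxIdx := I;
  end;
  FindMaxIndex := MaxIdx;
end;
```

Returning -1 when every remaining count is zero or marked lets the report loops stop early via the Break check.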

Saving the Report to a File

The senior developer also wants the report saved as a text file for archival:

procedure SaveReport(const ReportFileName: string; TotalLines: Integer);
var
  F: TextFile;
begin
  AssignFile(F, ReportFileName);
  {$I-}
  Rewrite(F);
  {$I+}
  if IOResult <> 0 then
  begin
    WriteLn('Could not create report file.');
    Exit;
  end;

  { Send output to the file by passing F as the first
    argument to every WriteLn call }
  WriteLn(F, '=== Log Analysis Report ===');
  WriteLn(F, 'Generated: ', DateTimeToStr(Now));
  WriteLn(F, 'Total requests: ', TotalLines);
  { ... write all statistics to F ... }

  CloseFile(F);
  WriteLn('Report saved to: ', ReportFileName);
end;
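A minimal main program ties the pieces together (a sketch; it assumes all of the routines above live in one source file, and SysUtils provides FileExists and DateTimeToStr):

```pascal
program LogAnalyzer;

uses
  SysUtils;  { FileExists, DateTimeToStr }

{ ... TLogEntry, ParseLogLine, the counter arrays and their
  Increment procedures, FindMaxIndex, AnalyzeLogFile,
  PrintReport, and SaveReport as shown above ... }

begin
  if ParamCount < 1 then
  begin
    WriteLn('Usage: loganalyzer <logfile>');
    Halt(1);
  end;

  AnalyzeLogFile(ParamStr(1));
end.
```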

Testing with a Sample Log

Here is a sample log file we can use for testing. Save this as sample_access.log:

192.168.1.45 - - [15/Mar/2026:10:22:01 +0000] "GET /index.html HTTP/1.1" 200 5432
192.168.1.72 - - [15/Mar/2026:10:22:03 +0000] "GET /about.html HTTP/1.1" 200 3201
10.0.0.15 - - [15/Mar/2026:10:22:05 +0000] "GET /missing.html HTTP/1.1" 404 287
192.168.1.45 - - [15/Mar/2026:10:22:08 +0000] "POST /login HTTP/1.1" 200 0
203.0.113.5 - - [15/Mar/2026:10:22:10 +0000] "GET /admin HTTP/1.1" 403 156
192.168.1.72 - - [15/Mar/2026:10:22:15 +0000] "GET /index.html HTTP/1.1" 200 5432
192.168.1.45 - - [15/Mar/2026:10:22:18 +0000] "GET /products HTTP/1.1" 200 12044
10.0.0.15 - - [15/Mar/2026:10:22:20 +0000] "GET /index.html HTTP/1.1" 200 5432
192.168.1.45 - - [15/Mar/2026:10:22:25 +0000] "GET /contact.html HTTP/1.1" 200 2105
203.0.113.5 - - [15/Mar/2026:10:22:30 +0000] "GET /index.html HTTP/1.1" 200 5432
172.16.0.99 - - [15/Mar/2026:10:23:01 +0000] "GET /api/data HTTP/1.1" 500 89
192.168.1.45 - - [15/Mar/2026:10:23:05 +0000] "GET /images/logo.png HTTP/1.1" 200 24576
10.0.0.15 - - [15/Mar/2026:10:23:10 +0000] "DELETE /api/user/5 HTTP/1.1" 405 0

Expected output for this sample:

  Total log lines:     13
  Parsed successfully: 13
  Parse errors:        0
  Total bytes served:  64186

  GET:   11
  POST:  1
  Other: 1

  Client errors (4xx): 3
  Server errors (5xx): 1

  Top IP: 192.168.1.45 — 5 requests
  Most requested: /index.html — 4 hits
  Unique IPs: 5

Analysis: Lessons Learned

Text Parsing Is Harder Than It Looks

Even with a "standard" format, real log files contain surprises: lines with missing fields, unusual encodings, extremely long URLs, or embedded quotes. Robust parsing must handle these gracefully. Our ParseLogLine function returns False for unparseable lines rather than crashing — this is the right approach.

Sequential vs. Random Access

Log files are inherently sequential. We read from the beginning to the end, processing each line once. There is no need for random access, which is why a text file is the natural choice. Trying to use a typed file here would be awkward because log lines have variable length.

The Value of Text Files

The log file is a text file, and our report is a text file. Both are human-readable. Both can be opened in any text editor. Both can be processed by other tools — grep, sort, awk, Python scripts, or another Pascal program. This interoperability is the great strength of text files.

Memory vs. Disk Trade-offs

Our analysis accumulates statistics in memory (the parallel arrays) while reading the file sequentially from disk. This means we only need enough memory for the aggregated counters, not for the entire log file. A 10 GB log file with millions of entries can be analyzed with just a few kilobytes of memory — as long as the number of unique IPs and paths stays within our array bounds.


Exercises Based on This Case Study

  1. Time range filter: Modify the analyzer to accept a start and end timestamp. Only count log entries within that range.

  2. Hourly breakdown: Extract the hour from each timestamp and produce an hourly request count — a simple histogram showing traffic patterns through the day.

  3. Error detail report: For all 4xx and 5xx responses, write the full log line to a separate error report file. This helps the system administrator focus on problems.

  4. Multiple files: Modify the program to accept multiple log filenames and produce a combined report. This is useful when logs are rotated daily.

  5. Output to CSV: In addition to the human-readable report, generate a CSV file with columns for IP, request count, and bytes served. This can be imported into a spreadsheet for graphing.