Case Study 2: Log File Analysis
The Scenario
You have been hired as a summer intern at a small web hosting company. The senior developer hands you a USB drive containing a web server log file and says, "We need to know what is going on with our server. Can you write a tool that analyzes this log and gives us some useful statistics?"
The log file follows the Common Log Format used by Apache and many other web servers. Each line looks like this:
192.168.1.45 - - [15/Mar/2026:10:22:01 +0000] "GET /index.html HTTP/1.1" 200 5432
192.168.1.72 - - [15/Mar/2026:10:22:03 +0000] "GET /about.html HTTP/1.1" 200 3201
10.0.0.15 - - [15/Mar/2026:10:22:05 +0000] "GET /missing.html HTTP/1.1" 404 287
192.168.1.45 - - [15/Mar/2026:10:22:08 +0000] "POST /login HTTP/1.1" 200 0
203.0.113.5 - - [15/Mar/2026:10:22:10 +0000] "GET /admin HTTP/1.1" 403 156
The format is: IP - - [Timestamp] "Method Path Protocol" StatusCode BytesSent
Your task: read the log file, parse each line, and produce a useful summary report.
Parsing Strategy
Each log line has a predictable structure. We will extract:
- IP address: Everything before the first space
- Timestamp: Between the brackets [...]
- HTTP method: The first word inside the quotes
- Path: The second word inside the quotes
- Status code: The number after the closing quote
- Bytes sent: The last number on the line
The Record
type
  TLogEntry = record
    IP: string[45];
    Timestamp: string[30];
    Method: string[10];
    Path: string[100];
    StatusCode: Integer;
    BytesSent: LongInt;
  end;
The Parser
function ParseLogLine(const Line: string; var Entry: TLogEntry): Boolean;
var
  P, P2, Code: Integer;
  StatusStr, BytesStr: string;
begin
  Result := False;

  { Extract IP address: everything before the first space }
  P := Pos(' ', Line);
  if P = 0 then Exit;
  Entry.IP := Copy(Line, 1, P - 1);

  { Extract timestamp: between [ and ] }
  P := Pos('[', Line);
  P2 := Pos(']', Line);
  if (P = 0) or (P2 = 0) or (P2 <= P) then Exit;
  Entry.Timestamp := Copy(Line, P + 1, P2 - P - 1);

  { Extract method and path: between the quotes }
  P := Pos('"', Line);
  if P = 0 then Exit;
  P2 := Pos('"', Copy(Line, P + 1, Length(Line)));
  if P2 = 0 then Exit;
  P2 := P + P2; { Absolute position of the closing quote }

  { Method is the first word inside the quotes }
  Entry.Method := '';
  Entry.Path := '';
  P := P + 1; { Skip the opening quote }
  while (P < P2) and (Line[P] <> ' ') do
  begin
    Entry.Method := Entry.Method + Line[P];
    Inc(P);
  end;
  Inc(P); { Skip the space }

  { Path is the second word inside the quotes }
  while (P < P2) and (Line[P] <> ' ') do
  begin
    Entry.Path := Entry.Path + Line[P];
    Inc(P);
  end;

  { Status code: the first token after the closing quote }
  P := P2 + 2; { Skip the closing quote and the following space }
  StatusStr := '';
  while (P <= Length(Line)) and (Line[P] <> ' ') do
  begin
    StatusStr := StatusStr + Line[P];
    Inc(P);
  end;
  Val(StatusStr, Entry.StatusCode, Code);
  if Code <> 0 then Entry.StatusCode := 0;

  { Bytes sent: the last token on the line }
  Inc(P); { Skip the space }
  BytesStr := '';
  while P <= Length(Line) do
  begin
    BytesStr := BytesStr + Line[P];
    Inc(P);
  end;
  Val(BytesStr, Entry.BytesSent, Code);
  if Code <> 0 then Entry.BytesSent := 0;

  Result := True;
end;
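Before wiring the parser into a larger program, it is worth sanity-checking it against a single hard-coded line. This quick test program is a sketch; the program name and the sample line are arbitrary choices:

```pascal
program TestParse;

{ Assumes TLogEntry and ParseLogLine from above are in scope. }

var
  Entry: TLogEntry;
begin
  if ParseLogLine('10.0.0.15 - - [15/Mar/2026:10:22:05 +0000] ' +
                  '"GET /missing.html HTTP/1.1" 404 287', Entry) then
  begin
    WriteLn('IP:     ', Entry.IP);         { 10.0.0.15 }
    WriteLn('Method: ', Entry.Method);     { GET }
    WriteLn('Path:   ', Entry.Path);       { /missing.html }
    WriteLn('Status: ', Entry.StatusCode); { 404 }
    WriteLn('Bytes:  ', Entry.BytesSent);  { 287 }
  end
  else
    WriteLn('Parse failed');
end.
```

If any field comes out wrong, the parser can be debugged in isolation before the analysis code is layered on top.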
The Analysis Engine
Data Structures for Aggregation
We need to count occurrences of various values. Since we have not yet covered more advanced data structures like hash maps, we will use parallel arrays:
const
  MAX_IPS = 1000;
  MAX_PATHS = 1000;
  MAX_STATUS = 10;

var
  { IP frequency tracking }
  UniqueIPs: array[0..MAX_IPS - 1] of string[45];
  IPCounts: array[0..MAX_IPS - 1] of Integer;
  IPCount: Integer;

  { Path frequency tracking }
  UniquePaths: array[0..MAX_PATHS - 1] of string[100];
  PathCounts: array[0..MAX_PATHS - 1] of Integer;
  PathCount: Integer;

  { Status code tracking }
  UniqueStatuses: array[0..MAX_STATUS - 1] of Integer;
  StatusCounts: array[0..MAX_STATUS - 1] of Integer;
  StatusCount: Integer;
Incrementing Counters
A helper procedure searches for a value and increments its count, or adds it as a new entry if it has not been seen before:
procedure IncrementIP(const IP: string);
var
  I: Integer;
begin
  for I := 0 to IPCount - 1 do
  begin
    if UniqueIPs[I] = IP then
    begin
      Inc(IPCounts[I]);
      Exit;
    end;
  end;
  { Not found; add a new entry }
  if IPCount < MAX_IPS then
  begin
    UniqueIPs[IPCount] := IP;
    IPCounts[IPCount] := 1;
    Inc(IPCount);
  end;
end;
Similar procedures would exist for IncrementPath and IncrementStatus.
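For completeness, here is what IncrementStatus might look like. It mirrors IncrementIP exactly, except that it compares integers rather than strings (IncrementPath follows the same pattern with UniquePaths and PathCounts):

```pascal
procedure IncrementStatus(StatusCode: Integer);
var
  I: Integer;
begin
  for I := 0 to StatusCount - 1 do
  begin
    if UniqueStatuses[I] = StatusCode then
    begin
      Inc(StatusCounts[I]);
      Exit;
    end;
  end;
  { Not found; add a new entry }
  if StatusCount < MAX_STATUS then
  begin
    UniqueStatuses[StatusCount] := StatusCode;
    StatusCounts[StatusCount] := 1;
    Inc(StatusCount);
  end;
end;
```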
Processing the Log File
procedure AnalyzeLogFile(const FileName: string);
var
  F: TextFile;
  Line: string;
  Entry: TLogEntry;
  TotalLines, ParseErrors: Integer;
  TotalBytes: Int64;
  GetCount, PostCount, OtherMethodCount: Integer;
  Error4xx, Error5xx: Integer;
begin
  { FileExists requires the SysUtils unit }
  if not FileExists(FileName) then
  begin
    WriteLn('Error: File "', FileName, '" not found.');
    Exit;
  end;

  AssignFile(F, FileName);
  Reset(F);

  { Initialize counters }
  TotalLines := 0;
  ParseErrors := 0;
  TotalBytes := 0;
  GetCount := 0;
  PostCount := 0;
  OtherMethodCount := 0;
  Error4xx := 0;
  Error5xx := 0;
  IPCount := 0;
  PathCount := 0;
  StatusCount := 0;

  { Process each line }
  while not Eof(F) do
  begin
    ReadLn(F, Line);
    Inc(TotalLines);
    if not ParseLogLine(Line, Entry) then
    begin
      Inc(ParseErrors);
      Continue;
    end;

    { Aggregate statistics }
    TotalBytes := TotalBytes + Entry.BytesSent;

    { Count methods }
    if Entry.Method = 'GET' then
      Inc(GetCount)
    else if Entry.Method = 'POST' then
      Inc(PostCount)
    else
      Inc(OtherMethodCount);

    { Count error categories }
    if (Entry.StatusCode >= 400) and (Entry.StatusCode < 500) then
      Inc(Error4xx)
    else if Entry.StatusCode >= 500 then
      Inc(Error5xx);

    { Track unique IPs, paths, and status codes }
    IncrementIP(Entry.IP);
    IncrementPath(Entry.Path);
    IncrementStatus(Entry.StatusCode);
  end;
  CloseFile(F);

  { Display results }
  PrintReport(TotalLines, ParseErrors, TotalBytes,
              GetCount, PostCount, OtherMethodCount,
              Error4xx, Error5xx);
end;
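A minimal main program tying these pieces together might look like the sketch below. Reading the filename from the command line with ParamStr is one possible choice, and the program name is arbitrary:

```pascal
program LogAnalyzer;

uses
  SysUtils; { FileExists, Now, DateTimeToStr }

{ ... TLogEntry, ParseLogLine, the counter arrays, the Increment
  helpers, AnalyzeLogFile, and PrintReport go here ... }

begin
  if ParamCount >= 1 then
    AnalyzeLogFile(ParamStr(1))
  else
    AnalyzeLogFile('sample_access.log');
end.
```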
Generating the Report
procedure PrintReport(TotalLines, ParseErrors: Integer;
                      TotalBytes: Int64;
                      GetCount, PostCount, OtherMethodCount: Integer;
                      Error4xx, Error5xx: Integer);
var
  I, MaxIdx: Integer;
begin
  WriteLn;
  WriteLn('=============================================');
  WriteLn('      WEB SERVER LOG ANALYSIS REPORT');
  WriteLn('=============================================');
  WriteLn;
  WriteLn('--- General Statistics ---');
  WriteLn('  Total log lines:     ', TotalLines);
  WriteLn('  Parsed successfully: ', TotalLines - ParseErrors);
  WriteLn('  Parse errors:        ', ParseErrors);
  WriteLn('  Total bytes served:  ', TotalBytes);
  if TotalLines > 0 then
    WriteLn('  Avg bytes/request:   ', TotalBytes div TotalLines);
  WriteLn;
  WriteLn('--- HTTP Methods ---');
  WriteLn('  GET:   ', GetCount);
  WriteLn('  POST:  ', PostCount);
  WriteLn('  Other: ', OtherMethodCount);
  WriteLn;
  WriteLn('--- Status Code Summary ---');
  for I := 0 to StatusCount - 1 do
    WriteLn('  ', UniqueStatuses[I], ': ', StatusCounts[I], ' requests');
  WriteLn('  Client errors (4xx): ', Error4xx);
  WriteLn('  Server errors (5xx): ', Error5xx);
  WriteLn;
  WriteLn('--- Top 10 IP Addresses ---');
  for I := 1 to 10 do
  begin
    if I > IPCount then Break;
    { Find the IP with the highest count (simple selection) }
    MaxIdx := FindMaxIndex(IPCounts, IPCount);
    if MaxIdx = -1 then Break;
    WriteLn('  ', I:2, '. ', UniqueIPs[MaxIdx], ' (', IPCounts[MaxIdx], ' requests)');
    IPCounts[MaxIdx] := -1; { Mark as reported }
  end;
  WriteLn;
  WriteLn('--- Top 10 Requested Paths ---');
  for I := 1 to 10 do
  begin
    if I > PathCount then Break;
    MaxIdx := FindMaxIndex(PathCounts, PathCount);
    if MaxIdx = -1 then Break;
    WriteLn('  ', I:2, '. ', UniquePaths[MaxIdx], ' (', PathCounts[MaxIdx], ' hits)');
    PathCounts[MaxIdx] := -1;
  end;
  WriteLn;
  WriteLn('--- Unique Visitors ---');
  WriteLn('  Unique IP addresses: ', IPCount);
  WriteLn('  Unique paths:        ', PathCount);
  WriteLn;
  WriteLn('=============================================');
end;
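PrintReport relies on a FindMaxIndex helper that performs the selection step. One straightforward implementation scans the count array and returns the index of the largest positive entry, or -1 when none remains:

```pascal
function FindMaxIndex(const Counts: array of Integer; Count: Integer): Integer;
var
  I, MaxVal: Integer;
begin
  Result := -1;
  MaxVal := 0; { Entries already reported are marked -1 and thus skipped }
  for I := 0 to Count - 1 do
    if Counts[I] > MaxVal then
    begin
      MaxVal := Counts[I];
      Result := I;
    end;
end;
```

Because PrintReport overwrites each reported count with -1, repeated calls naturally yield the top entries in descending order. Note that this destroys the counts, so FindMaxIndex must be declared before PrintReport in the actual source file.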
Saving the Report to a File
The senior developer also wants the report saved as a text file for archival:
procedure SaveReport(const ReportFileName: string; TotalLines: Integer);
var
  F: TextFile;
begin
  AssignFile(F, ReportFileName);
  {$I-}
  Rewrite(F);
  {$I+}
  if IOResult <> 0 then
  begin
    WriteLn('Could not create report file.');
    Exit;
  end;

  { Direct output to the file by passing F as the first argument to WriteLn }
  WriteLn(F, '=== Log Analysis Report ===');
  WriteLn(F, 'Generated: ', DateTimeToStr(Now)); { Now and DateTimeToStr are in SysUtils }
  WriteLn(F, 'Total requests: ', TotalLines);
  { ... write all statistics to F ... }
  CloseFile(F);
  WriteLn('Report saved to: ', ReportFileName);
end;
Testing with a Sample Log
Here is a sample log file we can use for testing. Save this as sample_access.log:
192.168.1.45 - - [15/Mar/2026:10:22:01 +0000] "GET /index.html HTTP/1.1" 200 5432
192.168.1.72 - - [15/Mar/2026:10:22:03 +0000] "GET /about.html HTTP/1.1" 200 3201
10.0.0.15 - - [15/Mar/2026:10:22:05 +0000] "GET /missing.html HTTP/1.1" 404 287
192.168.1.45 - - [15/Mar/2026:10:22:08 +0000] "POST /login HTTP/1.1" 200 0
203.0.113.5 - - [15/Mar/2026:10:22:10 +0000] "GET /admin HTTP/1.1" 403 156
192.168.1.72 - - [15/Mar/2026:10:22:15 +0000] "GET /index.html HTTP/1.1" 200 5432
192.168.1.45 - - [15/Mar/2026:10:22:18 +0000] "GET /products HTTP/1.1" 200 12044
10.0.0.15 - - [15/Mar/2026:10:22:20 +0000] "GET /index.html HTTP/1.1" 200 5432
192.168.1.45 - - [15/Mar/2026:10:22:25 +0000] "GET /contact.html HTTP/1.1" 200 2105
203.0.113.5 - - [15/Mar/2026:10:22:30 +0000] "GET /index.html HTTP/1.1" 200 5432
172.16.0.99 - - [15/Mar/2026:10:23:01 +0000] "GET /api/data HTTP/1.1" 500 89
192.168.1.45 - - [15/Mar/2026:10:23:05 +0000] "GET /images/logo.png HTTP/1.1" 200 24576
10.0.0.15 - - [15/Mar/2026:10:23:10 +0000] "DELETE /api/user/5 HTTP/1.1" 405 0
Expected output for this sample:
Total log lines: 13
Parsed successfully: 13
Parse errors: 0
Total bytes served: 64186
GET: 11
POST: 1
Other: 1
Client errors (4xx): 3
Server errors (5xx): 1
Top IP: 192.168.1.45 — 5 requests
Most requested: /index.html — 4 hits
Unique IPs: 5
Analysis: Lessons Learned
Text Parsing Is Harder Than It Looks
Even with a "standard" format, real log files contain surprises: lines with missing fields, unusual encodings, extremely long URLs, or embedded quotes. Robust parsing must handle these gracefully. Our ParseLogLine function returns False for unparseable lines rather than crashing — this is the right approach.
Sequential vs. Random Access
Log files are inherently sequential. We read from the beginning to the end, processing each line once. There is no need for random access, which is why a text file is the natural choice. Trying to use a typed file here would be awkward because log lines have variable length.
The Value of Text Files
The log file is a text file, and our report is a text file. Both are human-readable. Both can be opened in any text editor. Both can be processed by other tools — grep, sort, awk, Python scripts, or another Pascal program. This interoperability is the great strength of text files.
Memory vs. Disk Trade-offs
Our analysis accumulates statistics in memory (the parallel arrays) while reading the file sequentially from disk. This means we only need enough memory for the aggregated counters, not for the entire log file. A 10 GB log file with millions of entries can be analyzed with just a few kilobytes of memory — as long as the number of unique IPs and paths stays within our array bounds.
Exercises Based on This Case Study
- Time range filter: Modify the analyzer to accept a start and end timestamp. Only count log entries within that range.
- Hourly breakdown: Extract the hour from each timestamp and produce an hourly request count — a simple histogram showing traffic patterns through the day.
- Error detail report: For all 4xx and 5xx responses, write the full log line to a separate error report file. This helps the system administrator focus on problems.
- Multiple files: Modify the program to accept multiple log filenames and produce a combined report. This is useful when logs are rotated daily.
- Output to CSV: In addition to the human-readable report, generate a CSV file with columns for IP, request count, and bytes served. This can be imported into a spreadsheet for graphing.