Case Study 1: GlobalBank's Dynamic Reporting System

Background

GlobalBank operates 47 branches across three states. Each branch manager needs regular reports on account balances, transaction volumes, dormant accounts, and overdraft trends. Historically, each report was a separate COBOL-DB2 batch program — 23 programs in total, each with its own SQL, formatting logic, and JCL.

Maria Chen's team spent an estimated 30% of their time maintaining these report programs. Every time the ACCOUNT table schema changed, all 23 programs required modification, retesting, and redeployment. When a branch manager requested a new report variation, the development cycle was 2-3 weeks.

The Problem

During a quarterly review, Priya Kapoor presented the numbers:

  • 23 report programs averaging 800 lines each = 18,400 lines of COBOL to maintain
  • 12 schema changes in the past year, each requiring updates to all 23 programs
  • 47 branch managers requesting an average of 2 custom report modifications per quarter
  • Total maintenance burden: approximately 1,200 developer-hours per year

"We are spending more time maintaining report programs than building new capabilities," Maria told the steering committee.

The Solution: A Dynamic SQL Report Engine

Maria's team designed a single program — RPTENGINE — that reads report specifications from a control table and constructs dynamic SQL at runtime.

Architecture

  1. REPORT_SPEC table: Stores report definitions including base table, column list, filter criteria, sort order, and output format
  2. RPTENGINE program: Reads the specification, builds dynamic SQL with parameter markers, executes the query, and formats output
  3. Report Catalog: A CICS screen (later, a web interface) where branch managers select predefined report templates or request new ones
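The core flow — read a spec row, assemble SQL from trusted identifiers, bind user values through parameter markers — can be sketched as follows. This is a minimal illustration only: Python with sqlite3 stands in for COBOL-DB2, and the REPORT_SPEC column layout, table contents, and function names here are assumptions, not GlobalBank's actual design.

```python
import sqlite3

# Stand-in environment: sqlite3 instead of DB2, Python instead of COBOL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ACCOUNT (
        ACCT_ID INTEGER, BRANCH_ID INTEGER, BALANCE REAL, STATUS TEXT
    );
    INSERT INTO ACCOUNT VALUES
        (1, 10, 500.0, 'OPEN'), (2, 10, -25.0, 'OPEN'), (3, 20, 0.0, 'DORMANT');
    CREATE TABLE REPORT_SPEC (
        REPORT_ID TEXT, BASE_TABLE TEXT, COLUMN_LIST TEXT,
        FILTER_SQL TEXT, SORT_SQL TEXT
    );
    INSERT INTO REPORT_SPEC VALUES
        ('OVERDRAFT', 'ACCOUNT', 'ACCT_ID, BALANCE',
         'BRANCH_ID = ? AND BALANCE < 0', 'BALANCE ASC');
""")

def run_report(report_id, params):
    """Read a specification row, build the SQL, execute with bound values."""
    base, cols, flt, sort = conn.execute(
        "SELECT BASE_TABLE, COLUMN_LIST, FILTER_SQL, SORT_SQL "
        "FROM REPORT_SPEC WHERE REPORT_ID = ?", (report_id,)).fetchone()
    # Identifiers come from the spec table (whitelisted per Decision 2);
    # all user-supplied filter VALUES travel as parameter markers.
    sql = f"SELECT {cols} FROM {base} WHERE {flt} ORDER BY {sort}"
    return conn.execute(sql, params).fetchall()

print(run_report("OVERDRAFT", (10,)))   # -> [(2, -25.0)]
```

A new report then really is just a new REPORT_SPEC row: no program change, no recompile.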

Key Design Decisions

Decision 1: Parameter markers for all filter values. The team initially considered building filter values directly into the SQL string for simplicity. Tomás Rivera vetoed this approach: "One branch manager will eventually figure out how to enter a semicolon followed by DROP TABLE. Use parameter markers." This decision prevented SQL injection and improved plan cache efficiency.
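The attack Tomás anticipated can be demonstrated in a few lines. This sketch again uses Python and sqlite3 as a stand-in (sqlite3's single-statement execution blocks the semicolon-and-DROP variant, so the classic tautology predicate is shown instead); the hostile string and table contents are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ACCOUNT (ACCT_ID INTEGER, STATUS TEXT)")
conn.execute("INSERT INTO ACCOUNT VALUES (1, 'OPEN')")

hostile = "X' OR '1'='1"   # a filter value a user might type

# Unsafe: concatenating the value lets it rewrite the predicate.
unsafe_sql = f"SELECT ACCT_ID FROM ACCOUNT WHERE STATUS = '{hostile}'"
print(conn.execute(unsafe_sql).fetchall())   # -> [(1,)]  every row leaks

# Safe: a parameter marker treats the hostile string as data only.
safe_sql = "SELECT ACCT_ID FROM ACCOUNT WHERE STATUS = ?"
print(conn.execute(safe_sql, (hostile,)).fetchall())   # -> []
```

Parameter markers also help the plan cache: the SQL text is identical across runs, so one prepared statement serves every filter value.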

Decision 2: Whitelist validation for table and column names. Since parameter markers cannot be used for table names, column names, or ORDER BY clauses, the team maintained a whitelist of allowed identifiers. Any table or column name not in the whitelist was rejected with an error message.
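A whitelist check of this kind might look like the following sketch. The table names, column sets, and function name are hypothetical (the team actually kept the whitelist in a DB2 table, per Discussion Question 4; in-memory sets are used here for brevity).

```python
# Hypothetical whitelist contents -- in the real system these rows
# lived in a DB2 table, not in code.
ALLOWED_TABLES = {"ACCOUNT", "TRANSACTION_LOG"}
ALLOWED_COLUMNS = {
    "ACCOUNT": {"ACCT_ID", "BRANCH_ID", "BALANCE", "STATUS"},
    "TRANSACTION_LOG": {"TXN_ID", "ACCT_ID", "AMOUNT", "TXN_DATE"},
}
ALLOWED_DIRECTIONS = {"ASC", "DESC"}

def validate_spec(table, columns, order_by):
    """Reject any identifier not on the whitelist. The raw input is
    never echoed into SQL; only normalized, known-good names survive."""
    table = table.upper().strip()
    if table not in ALLOWED_TABLES:
        raise ValueError("table not in whitelist")
    cols = [c.upper().strip() for c in columns]
    for c in cols:
        if c not in ALLOWED_COLUMNS[table]:
            raise ValueError("column not in whitelist")
    sort_col, _, direction = order_by.upper().strip().partition(" ")
    if sort_col not in ALLOWED_COLUMNS[table]:
        raise ValueError("ORDER BY column not in whitelist")
    if direction and direction not in ALLOWED_DIRECTIONS:
        raise ValueError("ORDER BY direction not in whitelist")
    return table, cols, f"{sort_col} {direction}".strip()

print(validate_spec("account", ["acct_id", "balance"], "balance desc"))
```

Because only values already on the list can ever reach the SQL string, there is no blacklist of "bad characters" to keep up to date.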

Decision 3: EXPLAIN before deployment. Every new report specification was tested with EXPLAIN to verify its access path. Reports that would cause tablespace scans on large tables were flagged for index review before activation.
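The same pre-activation check can be sketched with sqlite's EXPLAIN QUERY PLAN, a loose analogue of DB2's EXPLAIN (which populates plan tables rather than returning rows). Everything here — the helper name, the index name, the dummy binding — is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ACCOUNT (ACCT_ID INTEGER, BRANCH_ID INTEGER, BALANCE REAL)")

def causes_full_scan(sql, params=()):
    """EXPLAIN QUERY PLAN compiles the statement without running it.
    A plan step beginning 'SCAN' (rather than 'SEARCH' via an index)
    marks a full table scan."""
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return any(row[-1].startswith("SCAN") for row in plan)

query = "SELECT ACCT_ID FROM ACCOUNT WHERE BRANCH_ID = ?"
# A dummy value satisfies the parameter binding; the query never runs.
print(causes_full_scan(query, (10,)))   # no index yet -> True

conn.execute("CREATE INDEX IX_BRANCH ON ACCOUNT (BRANCH_ID)")
print(causes_full_scan(query, (10,)))   # index satisfies the filter -> False
```

A report spec that trips this check would be held back for index review, exactly as the team did before activating new reports.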

Results

After six months in production:

  • 1 program replaced 23, reducing code from 18,400 lines to 2,100 lines
  • Schema changes now require updating one program instead of 23
  • New report turnaround dropped from 2-3 weeks to 1-2 days (most requests need only a new row in REPORT_SPEC)
  • Maintenance effort dropped from 1,200 hours/year to approximately 240 hours/year
  • Zero SQL injection incidents thanks to parameter markers and whitelist validation

Lessons Learned

  1. Dynamic SQL introduces runtime failure modes. A typo in a REPORT_SPEC row causes a PREPARE failure at runtime, not a compile-time error. The team added extensive logging and a "test mode" that PREPAREs without executing.

  2. Access plan variability. Because dynamic SQL is optimized at PREPARE time, the access plan depends on current table statistics. The team scheduled RUNSTATS after every major data load to keep plans optimal.

  3. PREPARE caching matters. For frequently run reports, the team PREPAREd the statement once at program start and reused the prepared statement on every execution, avoiding repeated optimizer invocations.

Discussion Questions

  1. What are the risks of allowing users to define report specifications that become dynamic SQL? How would you mitigate those risks?
  2. Why is whitelist validation (allowing only known-good values) preferred over blacklist validation (blocking known-bad values) for SQL identifiers?
  3. How would you test the RPTENGINE program to ensure it handles all edge cases — empty result sets, very large result sets, NULL values in output columns, invalid specifications?
  4. The team chose to maintain the whitelist as a DB2 table rather than hardcoded values in the COBOL program. What are the advantages and disadvantages of this approach?