Case Study: Code Review Process at a Financial Services Firm

Background

Meridian Financial Services is a mid-sized financial services firm based in Charlotte, North Carolina, providing payment processing, merchant acquiring, and fraud detection services. Its technology infrastructure runs on an IBM z/OS mainframe, with a COBOL application portfolio comprising approximately 380 programs, 620 copybooks, and 1,800 JCL procedures. The team consists of eighteen COBOL developers whose experience ranges from two to thirty-one years.

In January 2022, Meridian experienced a production incident that cost the company $2.3 million. A routine change to a fee calculation program introduced a rounding error that went undetected for eleven business days. The root cause was straightforward: a developer had changed a field definition from PIC S9(7)V99 COMP-3 to PIC S9(7)V9 COMP-3, inadvertently dropping a decimal place. The change had been reviewed only by the developer's direct manager, who signed off on the change request without examining the code in detail.
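
In isolation the change looked innocuous. A minimal before-and-after sketch (reconstructed for illustration; the field name is invented, since the actual program is not reproduced here) shows how easily this class of defect hides:

      * BEFORE - two decimal places; holds amounts such as 1234567.89
           05  FEE-AMOUNT              PIC S9(7)V99  COMP-3.

      * AFTER (DEFECT) - one decimal place; any MOVE or COMPUTE into
      * this field silently truncates the low-order cent digit
           05  FEE-AMOUNT              PIC S9(7)V9   COMP-3.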

This incident became the catalyst for implementing a formal code review process. The CTO, David Okafor, commissioned a project to establish rigorous peer code reviews for all COBOL changes. He appointed senior developer Patricia Yamamoto as the Code Review Process Lead.

Designing the Review Process

Patricia spent the first six weeks researching code review practices from both the mainframe world and modern software development. She spoke with peers at other financial institutions, reviewed academic literature on code inspection techniques, and examined how open-source projects conducted reviews. From this research, she designed a process tailored to Meridian's environment and culture.

The process distinguished between three levels of review based on risk:

Level 1 (Standard Review) applied to routine changes such as report modifications, field additions, and straightforward business rule updates. These required one peer reviewer in addition to the author.

Level 2 (Enhanced Review) applied to changes affecting financial calculations, data transformations, or inter-system interfaces. These required two peer reviewers, at least one of whom had to be a designated Senior Reviewer.

Level 3 (Critical Review) applied to changes in the payment processing pipeline, fraud detection logic, and regulatory reporting programs. These required two Senior Reviewers plus a walkthrough meeting where the author presented the changes to the reviewers.

The classification was determined by a risk matrix that mapped programs to risk levels based on their function, processing volume, and regulatory significance. Patricia maintained this matrix as a simple reference table, updated quarterly.
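
A hypothetical excerpt conveys the flavor of the matrix (SETTLE01 and RPTDLY01 are invented names for illustration; TXNFEE01 appears later in this case study):

  Program     Function                    Volume   Regulatory   Level
  --------    ------------------------    ------   ----------   -----
  RPTDLY01    Daily report formatting     Low      No           1
  TXNFEE01    Transaction fee calc        High     Yes          2
  SETTLE01    Settlement pipeline         High     Yes          3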

The Review Checklist

The cornerstone of the process was a detailed review checklist. Patricia worked with the development team to identify the categories of defects most likely to appear in COBOL code changes. The checklist was organized into eight sections.

Section 1: Data Definition Accuracy

This section addressed the class of defect that had caused the original $2.3 million incident. Reviewers were required to verify every changed or added data definition against the relevant copybook, database schema, or interface specification.

      *================================================================*
      * REVIEW CHECKPOINT: Data Definition Verification                *
      *                                                                *
      * Verify: PIC clauses match source/target field definitions      *
      * Verify: COMP-3 used for financial amounts                      *
      * Verify: Signed fields (S) where negative values possible       *
      * Verify: USAGE clauses appropriate for field purpose            *
      * Verify: VALUE clauses for initialization are correct           *
      *================================================================*

      * EXAMPLE - Reviewer should verify these match the DB2 table DDL
       01  WS-TRANSACTION-RECORD.
           05  WS-TXN-ACCOUNT-NUMBER   PIC X(16).
           05  WS-TXN-AMOUNT           PIC S9(13)V99  COMP-3.
           05  WS-TXN-FEE-AMOUNT       PIC S9(07)V99  COMP-3.
           05  WS-TXN-DATE             PIC X(10).
           05  WS-TXN-TYPE-CODE        PIC X(02).
           05  WS-TXN-STATUS           PIC X(01).
               88  TXN-PENDING                        VALUE 'P'.
               88  TXN-APPROVED                       VALUE 'A'.
               88  TXN-DECLINED                       VALUE 'D'.
               88  TXN-REVERSED                       VALUE 'R'.

Section 2: Arithmetic and Financial Calculations

Given Meridian's business, this section received particular attention. Reviewers had to verify that all financial calculations used appropriate data types, handled rounding correctly, and preserved precision through intermediate calculations.

      *================================================================*
      * REVIEW CHECKPOINT: Financial Calculation Integrity             *
      *                                                                *
      * Verify: ROUNDED phrase used on financial assignments           *
      * Verify: ON SIZE ERROR handled for all COMPUTE/ADD/MULTIPLY     *
      * Verify: Intermediate results have sufficient precision         *
      * Verify: Rounding method matches business requirements          *
      *================================================================*

      * CORRECT - Proper rounding and size error handling
           COMPUTE WS-TXN-FEE-AMOUNT ROUNDED
               = WS-TXN-AMOUNT * WS-FEE-RATE / 100
               ON SIZE ERROR
                   MOVE 'FEE CALCULATION OVERFLOW'
                       TO ER-MESSAGE-TEXT
                   PERFORM 8500-LOG-ERROR
                   SET TXN-DECLINED TO TRUE
               NOT ON SIZE ERROR
                   CONTINUE
           END-COMPUTE

      * DEFECT PATTERN - Missing ROUNDED and no SIZE ERROR
      *    (Reviewer should flag this)
           COMPUTE WS-TXN-FEE-AMOUNT
               = WS-TXN-AMOUNT * WS-FEE-RATE / 100.

Section 3: Conditional Logic Completeness

Reviewers checked that all conditional structures handled every possible case, including boundary conditions and unexpected values.

      * DEFECT PATTERN - Missing WHEN OTHER in EVALUATE
      * Reviewer should flag: What happens for unknown status codes?
           EVALUATE WS-TXN-STATUS
               WHEN 'P'
                   PERFORM 3100-PROCESS-PENDING
               WHEN 'A'
                   PERFORM 3200-PROCESS-APPROVED
               WHEN 'D'
                   PERFORM 3300-PROCESS-DECLINED
           END-EVALUATE

      * CORRECTED - All cases handled
           EVALUATE WS-TXN-STATUS
               WHEN 'P'
                   PERFORM 3100-PROCESS-PENDING
               WHEN 'A'
                   PERFORM 3200-PROCESS-APPROVED
               WHEN 'D'
                   PERFORM 3300-PROCESS-DECLINED
               WHEN 'R'
                   PERFORM 3400-PROCESS-REVERSED
               WHEN OTHER
                   STRING 'UNKNOWN STATUS: ' WS-TXN-STATUS
                       DELIMITED BY SIZE
                       INTO ER-MESSAGE-TEXT
                   END-STRING
                   PERFORM 8500-LOG-ERROR
           END-EVALUATE

Section 4: File and Database Operations

This section ensured that every file operation checked its file status code and that every database operation examined the returned SQLCODE and handled it appropriately.
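
A minimal sketch of the patterns reviewers looked for, using illustrative file, field, and condition names alongside routines named elsewhere in this case study (the 8600- and 8700- handlers are assumptions):

      * CORRECT - file status verified after the READ; TXN-FILE-OK
      * and END-OF-FILE are illustrative 88-level condition names
           READ TXN-FILE INTO WS-TRANSACTION-RECORD
               AT END SET END-OF-FILE TO TRUE
           END-READ
           IF NOT TXN-FILE-OK AND NOT END-OF-FILE
               MOVE 'TXN FILE READ ERROR' TO ER-MESSAGE-TEXT
               PERFORM 8500-LOG-ERROR
           END-IF

      * CORRECT - SQLCODE examined after every EXEC SQL block
           EXEC SQL
               SELECT FEE_RATE INTO :WS-FEE-RATE
               FROM   FEE_SCHEDULE
               WHERE  TIER_CODE = :WS-TIER-CODE
           END-EXEC
           EVALUATE SQLCODE
               WHEN 0
                   CONTINUE
               WHEN +100
                   PERFORM 8600-HANDLE-NOT-FOUND
               WHEN OTHER
                   PERFORM 8700-HANDLE-SQL-ERROR
           END-EVALUATE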

Section 5: Loop Termination and Performance

Reviewers verified that all loops had proper termination conditions and that performance-sensitive sections did not contain unnecessary I/O operations or inefficient data access patterns.
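
A representative checkpoint, sketched with illustrative condition names (8000-READ-NEXT-TRANSACTION appears elsewhere in this case study): the compound UNTIL condition guarantees termination even when no matching record exists.

      * CORRECT - loop ends on a match OR at end of file, so a
      * missing record cannot cause an infinite loop
           PERFORM UNTIL TXN-MATCH-FOUND OR END-OF-FILE
               PERFORM 8000-READ-NEXT-TRANSACTION
               IF NOT END-OF-FILE
                   IF WS-TXN-ACCOUNT-NUMBER = WS-SEARCH-ACCOUNT
                       SET TXN-MATCH-FOUND TO TRUE
                   END-IF
               END-IF
           END-PERFORM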

Section 6: Copybook and Interface Consistency

When changes affected shared copybooks or inter-program interfaces, reviewers verified that all programs using the modified copybook had been identified and assessed for impact.

Section 7: Standards Compliance

This section checked adherence to Meridian's coding standards, including naming conventions, program structure, comment headers, and prohibited constructs.

Section 8: JCL and Deployment Artifacts

COBOL changes rarely exist in isolation. Reviewers verified that any necessary JCL changes, DBRM binds, or deployment instructions were included and correct.

Tooling and Automation

Patricia recognized that relying entirely on human reviewers for every checklist item was inefficient and error-prone. She worked with the infrastructure team to implement automated standards checking as the first line of defense, reserving human reviewers for the judgment-intensive items.

The automated tooling pipeline operated in three stages.

The first stage was a static analysis scan that ran automatically when a developer checked code into the version control system. This scan checked for standards violations, flagged potentially dangerous constructs such as ALTER statements and GO TO statements that did not target an EXIT paragraph, and verified that comment headers were present.
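
The kinds of constructs the scan flagged, sketched with invented paragraph names:

      * FLAGGED - ALTER rewrites control flow at run time
           ALTER 2100-SWITCH TO PROCEED TO 2200-ALTERNATE-PATH.

      * FLAGGED - GO TO whose target is not an EXIT paragraph
           GO TO 5000-REPROCESS.

      * NOT FLAGGED - GO TO whose target is an EXIT paragraph
           GO TO 2000-EXIT.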

The second stage was a comparison analysis that examined the differences between the current and previous versions of each changed program. This tool flagged changes to PIC clauses in financial fields, modifications to EVALUATE or IF structures that reduced the number of handled cases, and removal of error-handling code. These flags did not indicate definite defects but directed reviewer attention to the areas most likely to contain problems.

The third stage was a cross-reference analysis that identified all programs, copybooks, and JCL procedures potentially affected by the change. This ensured that reviewers could assess the full blast radius of a modification.

The output from these tools was bundled into a Review Package that was presented to reviewers alongside the code changes:

================================================================
MERIDIAN FINANCIAL SERVICES - CODE REVIEW PACKAGE
================================================================
Change Request:  CR-2023-0847
Program(s):      TXNFEE01.CBL, FEECALC.CPY
Developer:       R. MARTINEZ
Risk Level:      LEVEL 2 (Financial Calculation)
Reviewers:       P. YAMAMOTO (Senior), K. OKONKWO
================================================================

AUTOMATED SCAN RESULTS:
  Standards Compliance:     PASS (0 violations)
  Dangerous Constructs:     PASS (0 flags)

CHANGE ANALYSIS FLAGS:
  >> PIC clause change in FEECALC.CPY line 47
     OLD: 05 FC-TIER-RATE  PIC S9(3)V9(4) COMP-3.
     NEW: 05 FC-TIER-RATE  PIC S9(3)V9(6) COMP-3.
     NOTE: Financial field precision increased - verify all
           consumers handle extended precision correctly.

  >> New EVALUATE branch added in TXNFEE01.CBL line 312
     NOTE: Verify WHEN OTHER still present.

CROSS-REFERENCE IMPACT:
  FEECALC.CPY is referenced by:
     TXNFEE01.CBL  (modified in this CR)
     TXNFEE02.CBL  (NOT modified - review required?)
     MNTHFEE.CBL   (NOT modified - review required?)
     FEERPT01.CBL  (NOT modified - review required?)
================================================================

This review package became one of the most valued aspects of the process. Reviewers reported that it cut their review time roughly in half by directing their attention to the highest-risk areas rather than requiring them to read every line of every changed program from scratch.

Common Defects Found Through Reviews

Over the first eighteen months of the code review process, Patricia maintained a database of every defect found during reviews. By the end of that period, the database contained 847 defects across 1,243 reviewed change requests. Analysis of these defects revealed clear patterns.

The most common defect category, representing 23% of all findings, was incomplete error handling. Developers frequently added new file operations or database calls without implementing proper status checking. This was particularly common in quick-fix changes where developers were under time pressure.

The second most common category, at 19%, was data definition mismatches. These included PIC clause disagreements between programs and copybooks, missing COMP-3 specifications on financial fields, and signed versus unsigned mismatches.

The third category, at 15%, was logic coverage gaps. These were cases where conditional structures did not handle all possible input values. Missing WHEN OTHER clauses in EVALUATE statements were the single most common specific defect in the entire database.

The fourth category, at 12%, was copybook impact failures. Developers would modify a copybook used by multiple programs but only update and test the program they were working on, leaving other programs with potentially incompatible changes.

A defect that illustrated the value of the review process particularly well occurred in month eight. A developer modified the daily settlement batch program to add a new fee category. The change itself was correct, but reviewer Karen Okonkwo noticed that the developer had placed the new fee calculation inside a paragraph that was performed once per settlement batch rather than once per transaction. As submitted, the fee would have been computed once per batch from the final transaction's amount instead of once for each transaction. The potential financial impact was estimated at $180,000 per day in incorrect fee assessments.

      * DEFECT AS SUBMITTED - Fee calc in wrong paragraph
       3000-SETTLEMENT-SUMMARY.
           PERFORM 3100-CALC-TOTAL-VOLUME
           PERFORM 3200-CALC-STANDARD-FEES
           PERFORM 3250-CALC-PREMIUM-FEES
      *    >> REVIEWER FLAG: This paragraph runs once per batch.
      *    >> New fee should be calculated per-transaction in 2000.
           PERFORM 3260-CALC-NEW-SURCHARGE-FEE
           PERFORM 3300-GENERATE-SETTLEMENT-RECORD
           .

      * CORRECTED - Fee calc moved to per-transaction processing
       2000-PROCESS-TRANSACTION.
           PERFORM 2100-VALIDATE-TRANSACTION
           PERFORM 2200-CALC-STANDARD-TXN-FEE
           PERFORM 2250-CALC-PREMIUM-TXN-FEE
           PERFORM 2260-CALC-SURCHARGE-TXN-FEE
           PERFORM 2300-ACCUMULATE-TOTALS
           PERFORM 8000-READ-NEXT-TRANSACTION
           .

Cultural Transformation

The introduction of code reviews changed the development culture at Meridian in ways that went beyond defect detection.

Initially, some developers viewed reviews as an implicit criticism of their abilities. Two senior developers, in particular, were visibly uncomfortable having their code examined by peers. Patricia addressed this by establishing clear ground rules: reviews examined code, not people. Review comments were phrased as questions ("Could this EVALUATE benefit from a WHEN OTHER clause?") rather than directives ("You forgot the WHEN OTHER clause"). Every reviewer was also regularly reviewed, reinforcing that no one was exempt.

Over time, a shift occurred. Developers began to write code with the awareness that it would be reviewed, which naturally improved quality at the point of creation. Several developers reported that they caught their own errors while preparing code for review because the act of organizing changes for someone else to read forced them to think more carefully about what they had written.

The reviews also became a powerful knowledge-sharing mechanism. Junior developers who reviewed senior developers' code learned advanced techniques and patterns. Senior developers who reviewed junior developers' code gained insight into where the team's knowledge gaps lay and could target mentoring accordingly. One junior developer, Rafael Martinez, credited the review process with accelerating his understanding of DB2 cursor management by at least six months, because reviewing cursor-heavy programs written by senior developers exposed him to patterns he would not have encountered in his own assignments for years.

Cross-domain knowledge improved as well. Before the review process, developers tended to specialize narrowly. The payment processing developers rarely saw fraud detection code, and vice versa. The review assignments intentionally mixed domain expertise, so that developers gained familiarity with parts of the system outside their primary area. This paid dividends when developers needed to cover for colleagues during vacations or when changes spanned multiple subsystems.

Measuring Results Over Eighteen Months

Patricia tracked metrics rigorously throughout the initiative. The results over eighteen months told a clear story.

Production defects related to code changes dropped from an average of 4.1 per month in the twelve months before reviews to 0.8 per month in the final six months of the measurement period. This represented an 80% reduction in escaped defects.

The defect detection rate during reviews (defects found per change request) started at 0.52 in the first quarter and rose to 0.71 by the fourth quarter, not because code quality was declining but because reviewers were getting better at finding subtle issues. By the sixth quarter, the rate had dropped to 0.41, which Patricia interpreted as evidence that developers were writing better code in the first place due to the review discipline.

The cost of the review process was non-trivial. On average, each change request required 3.2 hours of reviewer time in addition to the author's development time. For Meridian's volume of approximately 70 change requests per month, this represented roughly 224 hours of reviewer effort monthly, equivalent to about 1.4 full-time developers. However, the reduction in production incidents (which previously consumed an average of 380 hours per month in emergency fixes, root cause analysis, and management reporting) more than offset this cost.

The mean time to deploy a change request increased by 1.8 days due to the review cycle. Patricia addressed this concern by implementing an expedited review track for emergency production fixes, which required a single Senior Reviewer and could be completed within four hours. This track was used sparingly, averaging twice per month, and was followed by a retrospective standard review within five business days.

Customer-facing incidents attributed to code defects dropped from seven in the twelve months before the review process to one in the eighteen months after. The single incident that did occur was traced to a defect in a third-party interface specification rather than a code error, meaning that zero customer-facing incidents were caused by code defects that the review process should have caught.

Process Refinements

The review process evolved over the eighteen months based on team feedback and metrics analysis.

The most significant refinement was the introduction of review templates specific to common change types. A developer modifying a batch fee calculation program would receive a template pre-populated with the checklist items most relevant to that type of change. This reduced both the time to complete a review and the likelihood of overlooking domain-specific concerns.

Another refinement was the "review buddy" system for junior developers. Each junior developer was paired with a senior developer who served as their default reviewer for the first year. This ensured consistent mentoring and helped junior developers calibrate their understanding of what constituted acceptable code quality.

Patricia also introduced quarterly review retrospectives where the team examined the defects found during the quarter, identified patterns, and updated the checklist accordingly. These retrospectives led to the addition of three new checklist items and the retirement of two items that had never yielded findings.

Conclusion

Meridian Financial Services' experience demonstrates that formal code review is one of the highest-leverage quality practices available to COBOL development teams. The combination of automated pre-screening to handle mechanical checks and human review to address design and logic concerns created a multi-layered defense against defects.

The financial case was unambiguous: the review process cost approximately $340,000 per year in reviewer time but prevented an estimated $2.1 million in production incident costs, yielding a return of more than six to one. Beyond the financial metrics, the process transformed the development culture from one of isolated individual work to collaborative shared ownership of code quality.

For organizations considering implementing code reviews for COBOL development, Meridian's experience offers several practical recommendations: start with a risk-based classification to focus effort where it matters most; invest in automated tooling to handle mechanical checks; build checklists grounded in the actual defect patterns observed in your codebase; treat the process as a learning opportunity rather than a compliance exercise; and measure results relentlessly to maintain organizational support and guide continuous improvement.