Case Study 2: MedClaim's Test Environment Disaster
Background
On a Tuesday morning in March, James Okafor arrived at MedClaim's office to find his phone already buzzing with messages. A junior developer on his team had run a claims processing test job the previous evening, but instead of pricing the test claims against the test pricing tables, the job had read a portion of the production provider pricing file. Approximately 3,200 test claims had been adjudicated using production pricing data, resulting in incorrect payment amounts being calculated and queued for the next payment run.
What Happened
The root cause was simple and devastating: a JCL error. The developer had copied a production JCL procedure to create a test job and had changed most of the DD statements to point to test datasets. But one DD statement — the provider pricing table — still pointed to a dataset name that, in the test LPAR (Logical Partition), resolved to production data due to a catalog search order issue.
The relevant JCL lines:
//* Developer intended test data:
//CLMIN    DD DSN=TEST.CLAIMS.SAMPLE,DISP=SHR     (Correct - test)
//MEMFILE  DD DSN=TEST.MEMBER.MASTER,DISP=SHR     (Correct - test)
//*
//* Developer forgot to change this one:
//PRICETB  DD DSN=PROD.PROVIDER.PRICING,DISP=SHR  (WRONG - production!)
//*
//* Output went to test dataset (not production), so the
//* incorrect results were contained — but the test results
//* were meaningless because they mixed test claims with
//* production pricing.
Fortunately, the output dataset was a test dataset, so no production data was corrupted. But the developer had wasted a full day of testing, the investigation consumed four hours of senior staff time, and the incident triggered a mandatory review of test environment practices.
The Fix
James and Tomás Rivera (the DBA) implemented several changes:
- Naming convention enforcement: Test JCL must use only dataset names with the TEST. high-level qualifier. A JCL validation script scans for any PROD. references before submission.
- Cataloged procedures for testing: Instead of copying and modifying production JCL, developers now use pre-built test procedures where all DD statements are pre-configured for the test environment.
- Separate test LPAR catalog: The test system's catalog was reconfigured so that unqualified dataset names cannot accidentally resolve to production datasets.
- Code review for JCL: JCL changes now require peer review, just like COBOL code changes.
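The validation script in the first safeguard might be sketched as follows. This is a minimal, illustrative Python example, not MedClaim's actual tooling: the assumption that every test dataset carries the TEST. high-level qualifier, the regular expression, and the file-based invocation are all assumptions made for the sketch.

```python
import re
import sys

# Matches the DSN= operand of a DD statement. The character class covers
# typical dataset-name characters; it stops at the comma before DISP=.
DSN_PATTERN = re.compile(r"DSN=([A-Z0-9$#@.]+)", re.IGNORECASE)

def find_violations(jcl_text):
    """Return (line_number, dsn) pairs for every dataset reference
    that does not use the TEST. high-level qualifier."""
    violations = []
    for lineno, line in enumerate(jcl_text.splitlines(), start=1):
        if line.startswith("//*"):  # skip JCL comment statements
            continue
        for match in DSN_PATTERN.finditer(line):
            dsn = match.group(1).upper().rstrip(".")
            if not dsn.startswith("TEST."):
                violations.append((lineno, dsn))
    return violations

if __name__ == "__main__" and len(sys.argv) > 1:
    problems = find_violations(open(sys.argv[1]).read())
    for lineno, dsn in problems:
        print(f"line {lineno}: non-TEST dataset reference {dsn}")
    sys.exit(1 if problems else 0)
```

Run against the job in this case study, the script would flag the PRICETB line and pass the two correctly changed DD statements. A real pre-submission check would also need to handle continuation lines, temporary datasets, and in-stream data, which this sketch ignores.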
Analysis Questions
- The developer copied production JCL and modified it for testing — a common practice. What safer alternatives exist for creating test JCL?
- Why is the DD statement connection between JCL and COBOL programs both a strength and a risk? How does the separation of logical and physical file names contribute to this type of error?
- James implemented a "JCL validation script" that scans for production dataset references. What other automated checks could prevent similar incidents?
- The incident was caught before payment processing ran, so no real financial harm occurred. How might the outcome have differed if the output had gone to a production dataset instead of a test dataset?
- This case illustrates the textbook's "Defensive Programming" theme applied to JCL rather than COBOL. How would you extend the concept of defensive programming to encompass the entire development workflow, not just the source code?
Lessons for Students
- JCL errors can be as consequential as COBOL bugs. The glue between programs and data is just as important as the programs themselves.
- Test environments must be isolated from production. This is not just good practice — in regulated industries like insurance, it is a compliance requirement.
- Never copy-and-modify production JCL without a checklist. Every DD statement must be verified against the intended environment.
- Peer review applies to everything, not just code. JCL, configuration files, and deployment scripts all deserve the same scrutiny as COBOL source.