Appendix J: Answers to Selected Exercises

This appendix provides detailed solutions to selected exercises from each major part of the textbook. These are not the only correct answers — in many cases there are multiple valid approaches. What matters is that your reasoning is sound and your solution handles production realities (error conditions, edge cases, scale).

If your answer differs from the one below but addresses all the requirements, it may be equally valid. The explanations here are meant to illuminate the thinking process, not just the final answer.


Chapter 1: The z/OS Ecosystem

Exercise A1: Address Space Isolation

Problem: Explain why z/OS uses separate address spaces for each major subsystem (DB2, CICS, MQ, JES2) rather than running them all within a single address space. Identify at least three architectural benefits.

Solution:

z/OS places each major subsystem in its own address space because the benefits of isolation far outweigh the cost of cross-memory communication. Three key benefits:

  1. Failure isolation. If DB2 abends, CICS can continue processing transactions that don't require DB2 access (VSAM-only transactions, for example). If all subsystems shared an address space, a storage corruption in DB2 could take down CICS, MQ, and JES2 simultaneously. In a Tier-1 bank like CNB processing 500 million transactions daily, the difference between "DB2 is down" and "the entire LPAR is down" is measured in millions of dollars.

  2. Independent resource management. Each address space has its own virtual storage, and WLM can manage each subsystem's CPU and I/O priority independently. DB2 can get more CPU during a batch window while CICS maintains online response times. In a single address space, resource management granularity would be lost — you'd be managing one giant process rather than tuning each subsystem independently.

  3. Independent maintenance. DB2 can be stopped, upgraded, and restarted without affecting CICS or MQ. In a shared address space, any subsystem maintenance would require stopping everything. At CNB, Kwame Mensah schedules DB2 maintenance windows that don't coincide with CICS maintenance — this is only possible because they're in separate address spaces.

Additional benefits include independent security contexts (each subsystem runs under its own userid with specific RACF authorities) and independent diagnostic capabilities (a dump of the DB2 address space doesn't include CICS storage, making analysis faster and cleaner).


Exercise B1: Batch Job Trace Analysis

Problem: Given the SMF data for job CNBAR200 (47 min elapsed, 2 min 14 sec CPU, 18 min DB2 CPU, DISCRETIONARY WLM class), explain the discrepancy between elapsed time and CPU time.

Solution:

(a) Reasons for elapsed time much higher than CPU time:

The job's own CPU time is 2 minutes 14 seconds out of 47 minutes elapsed — a CPU/elapsed ratio of about 4.8%. The "missing" 44+ minutes are consumed by:

  • DB2 cross-memory overhead: The 1,200,000 DB2 calls each involve a cross-memory call to the DB2 address space. The DB2 processing time (18 min) is consumed in the DB2 address space, not the job's address space, so it doesn't appear in the job's CPU time.
  • I/O wait time: The 342,187 VSAM EXCPs represent physical I/O operations. Even with good buffer management, physical reads to disk take milliseconds each.
  • WLM delay: The DISCRETIONARY service class means the job gets CPU only when higher-priority work doesn't need it. At CNB, DISCRETIONARY work can experience significant WLM queuing delays during busy periods.
  • Potential lock waits: If the DB2 calls are updates, the job may be waiting for locks held by other programs.
  • Paging: If the REGION is tight, the job may be paging, causing additional I/O waits.
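The headline arithmetic is worth checking before assigning blame; a quick sketch using the figures from the problem:

```python
# Sanity check on the SMF figures quoted above.
elapsed_s = 47 * 60                          # 2,820 seconds elapsed
cpu_s = 2 * 60 + 14                          # 134 seconds of job CPU
ratio = cpu_s / elapsed_s                    # CPU/elapsed ratio
unaccounted_min = (elapsed_s - cpu_s) / 60   # the "missing" minutes
print(f"{ratio:.1%}", round(unaccounted_min, 1))   # 4.8% 44.8
```

Those 44.8 minutes are what the bullet points above account for: cross-memory time, I/O waits, WLM queuing, lock waits, and paging.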

(b) DB2 CPU time higher than job CPU time:

DB2 CPU time (18 minutes) is consumed in the DB2 address space (DBM1), not in the job's address space. When the COBOL program issues an EXEC SQL call, control transfers via cross-memory to DB2's address space. The CPU time for parsing, optimizing (for dynamic SQL), executing access paths, managing locks, and writing log records is all charged to DB2, not to the calling program. The 2 min 14 sec of job CPU time is just the COBOL program's own processing (data manipulation, file I/O, business logic).

(c) DISCRETIONARY service class impact:

DISCRETIONARY means WLM treats this job as lowest priority — it only gets resources when nothing higher-priority wants them. On a busy system, this can cause significant delays. For a job processing 1.2 million DB2 calls, even small per-call delays accumulate.

Recommendation: If this job is on the critical path or has a deadline, it should be classified into a higher WLM service class with a velocity or completion goal. Change the JCL JOBCLASS or add a scheduling subsystem classification rule that maps this job to an appropriate service class.


Chapter 3: Language Environment Internals

Exercise A3: LE Runtime Options Impact

Problem: Describe the effects of the LE runtime option STORAGE(00,00,00,00) versus STORAGE(NONE,NONE,NONE,NONE) on a production COBOL program's behavior.

Solution:

STORAGE(00,00,00,00) sets fill values of binary zeros (X'00') for heap storage at allocation (first suboption), heap storage when it is freed (second suboption), and stack (DSA) frames at allocation (third suboption); the fourth suboption is the reserve stack size, not a fill value. This is the debug setting — it ensures fields start at known values and scrubs heap storage after use, making debugging easier but consuming measurable CPU for every storage operation.

STORAGE(NONE,NONE,NONE,NONE) performs no storage initialization. Allocated storage contains whatever was in that memory previously ("garbage"). Freed storage retains its contents until reused.

Production implications:

  1. Performance: STORAGE(NONE,...) is significantly faster for storage-intensive programs. In a CICS environment processing thousands of transactions per second, the per-transaction cost of zeroing storage accumulates. At CNB, switching from STORAGE(00,...) to STORAGE(NONE,...) for a high-volume CICS program reduced CPU per transaction by 8%.

  2. Data integrity risk: With STORAGE(NONE,...), any WORKING-STORAGE field that isn't explicitly initialized by the program may contain unpredictable data. COBOL VALUE clauses still work (they're compiled into initialization code), but any field without a VALUE clause gets whatever was in memory. This is why the coding standard requires VALUE clauses on all WORKING-STORAGE fields.

  3. Security risk: With STORAGE(00,...), freed storage is scrubbed, so sensitive data (account numbers, SSNs) doesn't linger in memory. With STORAGE(NONE,...), freed storage retains its contents. If the same storage is later allocated to another program or user, they could theoretically see residual data. For applications handling PCI or HIPAA data, consider STORAGE(NONE,00,NONE,NONE): the second suboption scrubs heap storage at free time only, not at allocation, roughly halving the overhead while still clearing sensitive residuals.

Recommendation for production: Use STORAGE(NONE,NONE,NONE,NONE) for performance-critical programs, but enforce VALUE clauses on all WORKING-STORAGE fields through code review standards. Use STORAGE(00,00,00,00) in test environments to catch uninitialized field bugs early.


Chapter 6: DB2 Optimizer Internals

Exercise 1: Basic Filter Factor Calculation

Problem: Table CUSTOMER has 2,000,000 rows, REGION has COLCARDF = 8, no frequency statistics. Calculate filter factors for several predicates.

Solution:

(a) WHERE REGION = 'NORTHEAST'

Without frequency statistics, the filter factor for an equality predicate is:

FF = 1 / COLCARDF = 1 / 8 = 0.125

Estimated rows: 2,000,000 * 0.125 = 250,000 rows

(b) WHERE REGION IN ('NORTHEAST', 'SOUTHEAST', 'MIDWEST')

For an IN-list with N values:

FF = N / COLCARDF = 3 / 8 = 0.375

Estimated rows: 2,000,000 * 0.375 = 750,000 rows

(c) WHERE REGION <> 'WEST'

For a not-equal predicate:

FF = 1 - (1 / COLCARDF) = 1 - (1/8) = 0.875

Estimated rows: 2,000,000 * 0.875 = 1,750,000 rows

(d) WHERE REGION = 'NORTHEAST' AND CUST_TYPE = 'RETAIL'

Under the independence assumption, DB2 multiplies the filter factors:

FF(REGION) = 1/8 = 0.125
FF(CUST_TYPE) = 1/4 = 0.25
FF(combined) = 0.125 * 0.25 = 0.03125

Estimated rows: 2,000,000 * 0.03125 = 62,500 rows

Why the independence assumption fails here: If all RETAIL customers happen to be in NORTHEAST (a highly correlated distribution), the actual filter factor for the combined predicate is:

FF(actual) = (number of RETAIL in NORTHEAST) / total rows
           = (say 500,000) / 2,000,000 = 0.25

The optimizer estimates 62,500 rows but the actual result is 500,000 — off by a factor of 8x. This is exactly the scenario where column group statistics (RUNSTATS COLGROUP) should be collected to give the optimizer the correlation information. Without column group statistics, the optimizer will underestimate the result set size and may choose an index scan when a tablespace scan would be more efficient.
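The default formulas above are mechanical enough to verify in a few lines; a sketch using the exercise's numbers (the CUST_TYPE COLCARDF of 4 comes from the problem statement):

```python
# Default DB2 filter-factor formulas (no frequency statistics),
# applied to the exercise's CUSTOMER table.
ROWS = 2_000_000
CARD_REGION = 8
CARD_CUST_TYPE = 4

ff_eq = 1 / CARD_REGION                 # REGION = 'NORTHEAST'      -> 0.125
ff_in = 3 / CARD_REGION                 # IN-list with 3 values     -> 0.375
ff_ne = 1 - 1 / CARD_REGION             # REGION <> 'WEST'          -> 0.875
ff_and = ff_eq * (1 / CARD_CUST_TYPE)   # ANDed, independence       -> 0.03125

est_rows = ROWS * ff_and                # optimizer's estimate: 62,500
actual_rows = 500_000                   # fully correlated distribution
print(est_rows, actual_rows / est_rows) # 62500.0 8.0
```

The 8x error in the last line is exactly the gap that RUNSTATS COLGROUP statistics exist to close.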


Exercise 6: Reading a Simple PLAN_TABLE

Problem: Interpret the PLAN_TABLE output showing EMPLOYEE accessed via index IX_EMP_DEPTDT with 2 matching columns, joined (NL) to DEPARTMENT accessed index-only via IX_DEPT_PK with 1 matching column.

Solution:

Step-by-step interpretation:

  • PLANNO 1, METHOD 0: EMPLOYEE is the outer table (METHOD 0 = first table accessed). ACCESSTYPE = I means index access. MATCHCOLS = 2 means two leading columns of IX_EMP_DEPTDT match predicates in the WHERE clause. INDEXONLY = N means DB2 must access the base table data pages (the index alone doesn't satisfy the query). PREFETCH = S means sequential prefetch is used.

  • PLANNO 2, METHOD 1: DEPARTMENT is the inner table, joined via nested loop join (METHOD 1). ACCESSTYPE = I via IX_DEPT_PK with MATCHCOLS = 1. INDEXONLY = Y means all columns needed from DEPARTMENT exist in the index — no base table access needed. No sort operations on either table.

A plausible query:

SELECT E.EMPLOYEE_ID, E.EMPLOYEE_NAME, E.HIRE_DATE,
       D.DEPARTMENT_NAME
FROM   EMPLOYEE E
       INNER JOIN DEPARTMENT D
         ON E.DEPARTMENT_ID = D.DEPARTMENT_ID
WHERE  E.DEPARTMENT_ID = :HV-DEPT
  AND  E.HIRE_DATE > :HV-DATE
ORDER BY E.HIRE_DATE

Here, IX_EMP_DEPTDT is likely defined on (DEPARTMENT_ID, HIRE_DATE) — explaining 2 matching columns. IX_DEPT_PK is the primary key index on (DEPARTMENT_ID) — explaining 1 matching column. The index-only access on DEPARTMENT means DEPARTMENT_NAME is likely an INCLUDE column in the PK index, or the query only selects the key column from DEPARTMENT.


Chapter 8: DB2 Locking, Concurrency, and Deadlock Resolution

Exercise 4: The Missing FOR UPDATE OF

Problem: Identify the locking defect in a cursor that reads accounts and then updates WHERE CURRENT OF without FOR UPDATE OF. Explain the deadlock scenario.

Solution:

(a) The defect: The cursor is declared without FOR UPDATE OF ACCT_BAL. This means the FETCH acquires an S (share) lock, not a U (update) lock. When the subsequent UPDATE WHERE CURRENT OF executes, it must promote the S lock to an X (exclusive) lock. This S-to-X promotion is the root cause of the deadlocks.

(b) The deadlock scenario:

  1. Program instance A FETCHes row 100, acquiring S lock on row 100.
  2. Program instance B FETCHes row 100, acquiring S lock on row 100. (S is compatible with S, so B's lock is granted.)
  3. Program instance A issues UPDATE WHERE CURRENT OF on row 100. It needs to promote its S lock to X. But B holds S on the same row — X is incompatible with S. A waits.
  4. Program instance B issues UPDATE WHERE CURRENT OF on row 100. It needs to promote its S lock to X. But A holds S on the same row. B waits.
  5. Deadlock. A waits for B; B waits for A. DB2 detects the deadlock and rolls back one of them (-911).

This is a classic "S-to-X promotion deadlock." Two holders of S locks each need X, and neither can get it because the other holds S.
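The promotion deadlock falls directly out of the S/U/X compatibility rules. A toy sketch (an illustration of the compatibility matrix, not DB2's actual IRLM logic; the helper name is invented for the example):

```python
# Lock compatibility: (holder's lock, requested lock) -> granted?
# S = share, U = update, X = exclusive.
COMPATIBLE = {
    ('S', 'S'): True,  ('S', 'U'): True,  ('S', 'X'): False,
    ('U', 'S'): True,  ('U', 'U'): False, ('U', 'X'): False,
    ('X', 'S'): False, ('X', 'U'): False, ('X', 'X'): False,
}

def can_acquire(requested, held_by_others):
    """True if `requested` is compatible with every other holder's lock."""
    return all(COMPATIBLE[(held, requested)] for held in held_by_others)

# Without FOR UPDATE OF: both instances FETCH with S locks...
assert can_acquire('S', ['S'])        # B's FETCH is granted alongside A's
# ...then each needs X to UPDATE, and the other's S blocks it: deadlock.
assert not can_acquire('X', ['S'])
# With FOR UPDATE OF: A's FETCH takes U, so B's FETCH for U waits here,
# before either program updates anything. No deadlock can form.
assert not can_acquire('U', ['U'])
```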

(c) The corrected cursor:

       EXEC SQL
           DECLARE CSR1 CURSOR FOR
           SELECT ACCT_BAL
           FROM ACCT_MASTER
           WHERE BRANCH_ID = :WS-BRANCH
           FOR UPDATE OF ACCT_BAL
       END-EXEC

With FOR UPDATE OF, the FETCH acquires a U (update) lock instead of S. U is compatible with S (so readers aren't blocked), but U is incompatible with U. This means the second program instance will wait at FETCH time — before it ever gets to the UPDATE. The wait is resolved when the first instance commits, and the deadlock is prevented entirely.

The general rule: any cursor that will be used with UPDATE WHERE CURRENT OF must be declared FOR UPDATE OF. This is not optional. It is a production correctness requirement.


Exercise 18: Retry Logic — Find the Bug

Problem: The retry logic performs UPDATE-ACCOUNT three times on -911 without rolling back or re-establishing position.

Solution:

Defects identified:

  1. No ROLLBACK before retry. When DB2 returns -911 (deadlock or timeout), it has already rolled back the unit of work. However, the program doesn't reset its state — WS-ACCT-ID might have been part of a multi-step operation where earlier steps were also rolled back. The retry must start from the beginning of the logical unit of work, not just replay the last UPDATE.

  2. No delay between retries. Immediately retrying a deadlocked operation will likely deadlock again because the competing program is in the same state. A brief delay allows the other transaction to complete; in batch COBOL, LE's CEE3DLY callable service (whole-second granularity) is the simplest way to get one.

  3. No logging of the retry. When production programs retry, operations needs to know. Silent retries mask systemic problems.

  4. Stacked IF structure instead of a loop. The code duplicates the retry logic three times — a maintenance hazard and a sign that the developer was working around the lack of a proper retry loop.

Corrected code:

       WORKING-STORAGE SECTION.
       01  WS-RETRY-COUNT       PIC 9(2)  VALUE 0.
       01  WS-MAX-RETRIES       PIC 9(2)  VALUE 3.
       01  WS-RETRY-DELAY       PIC S9(9) BINARY VALUE 1.
       01  WS-FC                PIC X(12) VALUE SPACES.
       01  WS-UOW-COMPLETE      PIC X     VALUE 'N'.

       PROCEDURE DIVISION.
       1000-MAIN.
           MOVE 'N' TO WS-UOW-COMPLETE
           MOVE 0   TO WS-RETRY-COUNT

           PERFORM UNTIL WS-UOW-COMPLETE = 'Y'
              OR WS-RETRY-COUNT > WS-MAX-RETRIES
               PERFORM 2000-UPDATE-ACCOUNT
               EVALUATE TRUE
                   WHEN SQLCODE = 0
                       EXEC SQL COMMIT END-EXEC
                       MOVE 'Y' TO WS-UOW-COMPLETE
                   WHEN SQLCODE = -911
                       EXEC SQL ROLLBACK END-EXEC
                       ADD 1 TO WS-RETRY-COUNT
                       DISPLAY 'BATCHUPD: RETRY '
                               WS-RETRY-COUNT
                               ' FOR ACCT ' WS-ACCT-ID
                               ' SQLCODE=-911'
                       IF WS-RETRY-COUNT <=
                                       WS-MAX-RETRIES
                            CALL 'CEE3DLY' USING
                                WS-RETRY-DELAY WS-FC
                       END-IF
                   WHEN OTHER
                       PERFORM 9000-ABEND-PROGRAM
               END-EVALUATE
           END-PERFORM

           IF WS-UOW-COMPLETE NOT = 'Y'
               DISPLAY 'BATCHUPD: MAX RETRIES EXCEEDED'
                       ' FOR ACCT ' WS-ACCT-ID
               PERFORM 9000-ABEND-PROGRAM
           END-IF.

       2000-UPDATE-ACCOUNT.
           EXEC SQL
               UPDATE ACCT_MASTER
               SET ACCT_BAL = :WS-NEW-BAL
               WHERE ACCT_ID = :WS-ACCT-ID
           END-EXEC.

Key corrections: explicit ROLLBACK after -911, retry counter in a loop, delay between retries, logging of every retry event, clean abend after max retries exceeded.


Chapter 13: CICS Architecture for Architects

Exercise 9: Routing Program Performance

Problem: A routing program performs a 2ms DB2 query per transaction. TOR processes 3,000 TPS. Calculate the overhead and propose alternatives.

Solution:

(a) Total CPU time consumed by routing per second:

3,000 transactions/sec * 2 ms/transaction = 6,000 ms = 6 seconds of DB2 time per second

Six seconds of routing delay per wall-clock second cannot be sustained: the routing program runs synchronously for every transaction, so sustaining 3,000 TPS would require the equivalent of six processors (or six concurrently blocked routing threads) doing nothing but routing decisions. In practice, the TOR would be unable to sustain 3,000 TPS; transactions would queue waiting for the routing program, and response times would skyrocket.
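The capacity arithmetic generalizes to any per-decision cost; a small sketch (the 0.5ms case from part (b) included):

```python
# Routing overhead: seconds of synchronous routing delay incurred per
# wall-clock second for a given transaction rate and per-decision cost.
def routing_load(tps, ms_per_decision):
    return tps * ms_per_decision / 1000.0

print(routing_load(3000, 2.0))   # 6.0  -> six processors' worth
print(routing_load(3000, 0.5))   # 1.5  -> still 150% of one processor
```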

(b) Why this is problematic:

The routing program executes synchronously in the TOR for every transaction before the transaction is shipped to an AOR. The routing program is on the critical path of every single transaction. Even 0.5ms per routing decision at 3,000 TPS would consume 1.5 seconds of CPU per second — 150% of one CPU. DB2 access in a routing program is an anti-pattern.

(c) Alternatives:

Option 1: In-memory lookup table. Load the account-type-to-AOR mapping into a CICS shared data table or a COBOL table at CICS startup. The routing program does an in-memory lookup instead of a DB2 query. Cost: microseconds instead of milliseconds. Refresh the table periodically (every N minutes) via a maintenance transaction.

Option 2: CICSPlex SM workload management. Use CICSPlex SM's built-in routing based on transaction ID and WLM health. If the routing needs to be account-type-aware, encode the account type in the COMMAREA and use a simple routing program that reads the first byte of the COMMAREA — no DB2 access, just a COMMAREA byte comparison.

Option 3: Static routing by transaction ID. Assign different 4-character transaction IDs per account type (XFRC for checking transfers, XFRS for savings transfers). Route by transaction ID, which CICSPlex SM does natively with zero custom code.

The lesson: a routing program must be stateless, in-memory, and sub-millisecond. Any I/O — DB2, VSAM, MQ — in a routing program is a design defect.


Chapter 19: IBM MQ for COBOL

Exercise 7: Transactional Boundaries

Problem: Explain why the code issues SYNCPOINT between the DB2 UPDATE and the MQPUT.

Solution:

The code is dangerous because it breaks the transactional boundary between the DB2 update and the MQ message. Here's what can go wrong:

  1. The EXEC SQL UPDATE debits the account by WS-AMOUNT.
  2. EXEC CICS SYNCPOINT commits the DB2 update — the debit is now permanent.
  3. The CALL 'MQPUT' attempts to send the notification message.
  4. If the MQPUT fails (queue full, authorization failure, queue manager quiescing), the debit has already been committed. The downstream system never receives the message. Money has been debited but no notification was sent.

The fundamental problem: The SYNCPOINT between the UPDATE and the MQPUT creates two separate units of work where there should be one. The DB2 update and the MQ message must be committed together — atomically.

The fix: Remove the SYNCPOINT between the two operations. Both DB2 and MQ participate in CICS's two-phase commit protocol, so a single SYNCPOINT at the end commits both:

           EXEC SQL
               UPDATE ACCOUNTS
               SET BALANCE = BALANCE - :WS-AMOUNT
               WHERE ACCOUNT_NUM = :WS-ACCT
           END-EXEC

           IF SQLCODE = 0
               MOVE MQPMO-SYNCPOINT TO MQPMO-OPTIONS
               ADD MQPMO-FAIL-IF-QUIESCING
                   TO MQPMO-OPTIONS
               CALL 'MQPUT1' USING WS-HCONN
                                   WS-OD
                                   MQMD
                                   MQPMO
                                   WS-MSG-LENGTH
                                   WS-MSG-BUFFER
                                   WS-COMPCODE
                                   WS-REASON
               IF WS-COMPCODE = MQCC-OK
                   EXEC CICS SYNCPOINT END-EXEC
               ELSE
                   EXEC CICS SYNCPOINT ROLLBACK END-EXEC
               END-IF
           ELSE
               EXEC CICS SYNCPOINT ROLLBACK END-EXEC
           END-IF

Note two critical details: (1) MQPMO-SYNCPOINT is set so the MQPUT participates in CICS's unit of work, and (2) MQPMO-FAIL-IF-QUIESCING is set so the program exits cleanly if the queue manager is shutting down. If either the SQL or the MQPUT fails, the entire unit of work is rolled back.


Chapter 23: Batch Window Engineering

Exercise 2: Critical Path Identification

Problem: Given the DAG from Exercise 1 with job durations, identify all paths, the critical path, and slack.

Solution:

From the DAG, the dependencies are: E depends on C and D; F depends on E; G depends on E; H depends on F; I depends on F and G; J depends on H and I.

All paths through the DAG:

Path 1: A → C → E → F → H → J   = 20+30+25+40+35+10 = 160 min  ← CRITICAL PATH
Path 2: A → C → E → F → I → J   = 20+30+25+40+15+10 = 140 min
Path 3: A → C → E → G → I → J   = 20+30+25+20+15+10 = 120 min
Path 4: B → D → E → F → H → J   = 15+10+25+40+35+10 = 135 min
Path 5: B → D → E → F → I → J   = 15+10+25+40+15+10 = 115 min
Path 6: B → D → E → G → I → J   = 15+10+25+20+15+10 =  95 min

(b) Critical path: A → C → E → F → H → J = 160 minutes

(c) Slack calculations:

Slack = Critical Path Duration - Longest Path Through This Job

  • Job A: On critical path → slack = 0 min
  • Job B: Longest path through B is Path 4 (135 min). Slack = 160 - 135 = 25 min
  • Job C: On critical path → slack = 0 min
  • Job D: Longest path through D is Path 4 (135 min). Slack = 160 - 135 = 25 min
  • Job E: On critical path → slack = 0 min
  • Job F: On critical path → slack = 0 min
  • Job G: Longest path through G is Path 3 (120 min). Slack = 160 - 120 = 40 min
  • Job H: On critical path → slack = 0 min
  • Job I: Longest path through I is Path 2 (140 min). Slack = 160 - 140 = 20 min
  • Job J: On critical path → slack = 0 min
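These slack figures follow from a standard forward/backward CPM pass; a sketch with the exercise's durations and dependencies:

```python
# Critical-path sketch: forward pass for earliest start/finish, backward
# pass for latest start/finish; slack = latest start - earliest start.
durations = {'A': 20, 'B': 15, 'C': 30, 'D': 10, 'E': 25,
             'F': 40, 'G': 20, 'H': 35, 'I': 15, 'J': 10}
preds = {'A': [], 'B': [], 'C': ['A'], 'D': ['B'], 'E': ['C', 'D'],
         'F': ['E'], 'G': ['E'], 'H': ['F'], 'I': ['F', 'G'],
         'J': ['H', 'I']}

order = list(durations)                       # already topologically sorted
es, ef = {}, {}
for j in order:                               # forward pass
    es[j] = max((ef[p] for p in preds[j]), default=0)
    ef[j] = es[j] + durations[j]

horizon = max(ef.values())                    # critical-path length
succs = {j: [k for k in order if j in preds[k]] for j in order}
ls, lf = {}, {}
for j in reversed(order):                     # backward pass
    lf[j] = min((ls[s] for s in succs[j]), default=horizon)
    ls[j] = lf[j] - durations[j]

slack = {j: ls[j] - es[j] for j in order}
critical = [j for j in order if slack[j] == 0]
print(horizon, critical)                      # 160 ['A', 'C', 'E', 'F', 'H', 'J']
```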

(d) Margin with 210-minute window: 210 - 160 = 50 minutes of margin. At 2% monthly growth with 0.9 elasticity, the critical path grows by approximately 1.8% per month. The 160-minute critical path will reach 210 minutes in approximately:

160 * 1.018^n = 210
n = ln(210/160) / ln(1.018) = ln(1.3125) / ln(1.018) ≈ 0.2719 / 0.01784 ≈ 15.2 months

About 15 months before the window is exhausted. This should be on the capacity planning radar now.
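The window-exhaustion estimate can be reproduced directly (the 1.8%/month figure is 2% volume growth * 0.9 elasticity, as above):

```python
import math

# Months until a 160-minute critical path, growing 1.8%/month,
# fills a 210-minute batch window.
growth = 0.02 * 0.9
months = math.log(210 / 160) / math.log(1 + growth)
print(round(months, 1))    # 15.2
```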


Chapter 28: Mainframe Security for COBOL Developers

Exercise A1: Hardware-Enforced Address Space Isolation

Problem: Explain why z/OS address space isolation is hardware-enforced rather than software-enforced, and how this differs from Linux/Windows.

Solution:

z/OS address space isolation is enforced by the z/Architecture hardware through the Dynamic Address Translation (DAT) mechanism. Each address space has its own set of page tables, managed by z/OS but enforced by the CPU. When a program in address space A references virtual address 0x00500000, the hardware translates this to a physical address using A's page tables. Address space B's reference to the same virtual address translates to a completely different physical location using B's page tables. There is no software bypass — the CPU itself prevents cross-address-space access at the hardware instruction level.

On Linux and Windows, process isolation also rests on per-process page tables managed by the kernel. The difference is the set of additional mechanisms z/OS layers on top of page-table isolation, each enforced directly by the z/Architecture hardware:

  1. Storage keys. z/OS assigns storage protection keys (0-15) to 4KB frames of real storage, and the CPU enforces key-matched access: a program executing with PSW key 8 cannot store into a frame whose key is 0. This protects system and subsystem storage from application code within the same address space, a layer of protection that page tables alone don't provide. (x86 memory protection keys are a partial analogue, but commodity operating systems don't use key-based protection pervasively the way z/OS does.)

  2. Authorized state. The z/Architecture PSW (Program Status Word) includes a state bit distinguishing supervisor state from problem state. Only programs in supervisor state (running from APF-authorized libraries) can issue privileged instructions. This is similar to ring 0/3 on x86, but z/OS's authority model is more granular.

  3. Cross-memory security. When authorized cross-memory communication occurs (via PC routines), the hardware enforces security through the linkage stack and PC number table. An unauthorized program cannot invoke a cross-memory routine.

Implications for exploitation: Because isolation is hardware-enforced, a buffer overflow in one address space cannot directly corrupt memory in another address space, even if the attacker has arbitrary write capability within their own address space. On commodity platforms, kernel vulnerabilities and DMA attacks can sometimes breach process isolation because parts of the isolation mechanism are software-managed. Be careful not to overstate the point, though: speculative-execution side channels (the Spectre class) affected IBM Z processors too and required firmware and OS mitigations. What hardware-enforced DAT closes is direct cross-address-space access; it does not, by itself, eliminate timing side channels.


Chapter 32: Modernization Strategy

Exercise A2: Why Automated COBOL-to-Java Conversion Produces "COBOL in a Java Costume"

Problem: Explain why automated conversion tools produce code that retains COBOL's structural characteristics within Java syntax.

Solution:

Automated conversion tools perform syntax transformation, not architectural transformation. They translate COBOL constructs into Java equivalents — but the result preserves COBOL's programming model within Java's syntax. Specific problems:

  1. PERFORM THRU. COBOL's PERFORM THRU executes a range of paragraphs sequentially and returns control to the caller. Java has no equivalent construct. Automated tools typically convert each paragraph to a Java method and create a dispatcher method that calls them in sequence. The result is Java code with dozens of small methods called in rigid sequence — no polymorphism, no composition, no object-oriented design. It's procedural Java.

  2. WORKING-STORAGE. COBOL's WORKING-STORAGE is a flat, global data area accessible by every paragraph in the program. Conversion tools typically translate this into a single Java class with hundreds of public fields — a "God object" anti-pattern that violates every principle of object-oriented encapsulation. There's no data hiding, no immutability, no separation of concerns.

  3. REDEFINES. COBOL's REDEFINES allows multiple data descriptions to overlay the same physical storage — a 10-byte field can be simultaneously a character string, a packed decimal number, and a date depending on which REDEFINES view you use. Java has no direct equivalent. Conversion tools typically use byte arrays with manual offset calculations — exactly the kind of unsafe, error-prone code that Java was designed to eliminate.

  4. Copybooks. COBOL's COPY statement literally pastes source code. Conversion tools convert each copybook to a Java class or interface, but the result doesn't respect Java's package structure or dependency management. You end up with circular dependencies and classes that exist solely because a copybook existed.

  5. Embedded SQL. COBOL's EXEC SQL is handled by a precompiler. Java uses JDBC, which is a fundamentally different programming model (connection management, prepared statements, result sets). Conversion tools generate JDBC code that mirrors the COBOL cursor-at-a-time pattern rather than using Java-idiomatic data access patterns (ORMs, batch operations, connection pools).

The fundamental issue is that COBOL programs encode business logic in procedural structure, data layout, and paragraph sequencing. These structural decisions are themselves the program's architecture. Converting the syntax to Java preserves the architecture — and COBOL's architecture is not a good Java architecture. True modernization requires understanding the business rules (what the program does) and re-implementing them using Java-appropriate patterns (what a Java architect would design). That's a human task, not an automated one.

Sandra Chen at FBA has a slide she shows to every vendor who proposes automated conversion: a before-and-after code sample where the converted Java is longer, less readable, and harder to maintain than the original COBOL. Her point is not that the conversion tool failed — it performed exactly as designed. The problem is that the design goal (syntactic transformation) is the wrong goal.


Chapter 38: Capstone — Architecting a High-Availability Payment Processing System

Exercise 38.1: Requirements Analysis (Selected Portion)

Problem: Expand the one-paragraph brief from the Head of Payments into a structured requirements document. (Solution covers the non-functional requirements portion.)

Solution (Non-Functional Requirements):

  • NFR-1 (Must): System availability of 99.999% (five nines) for real-time payment authorization. Rationale: FedNow and RTGS require near-continuous availability; five nines allows at most 5.26 minutes per year of planned plus unplanned downtime.
  • NFR-2 (Must): Transaction response time at the 95th percentile under 200ms for authorization and under 500ms for initiation. Rationale: the FedNow SLA requires sub-second processing, and the mobile/online channels demand a responsive user experience.
  • NFR-3 (Must): Throughput of 5,000 transactions per second at peak on Day 1, scalable to 15,000 TPS by Year 3. Rationale: based on projected payment volumes with 3x headroom for peak events (payroll days, tax season).
  • NFR-4 (Must): RPO = 0 (zero data loss) for all payment data. Rationale: financial transactions cannot be lost; any data loss creates reconciliation nightmares and regulatory exposure.
  • NFR-5 (Must): RTO < 2 hours for full DR failover. Rationale: regulatory requirement for systemically important payment systems.
  • NFR-6 (Must): PCI-DSS Level 1 compliance for all card-based payment processing. Rationale: required by the card networks; non-compliance results in fines and potential loss of processing rights.
  • NFR-7 (Must): SOX audit trail for all payment authorization, modification, and cancellation events. Rationale: public-company requirement; every payment decision must be traceable to a userid, timestamp, and terminal/IP.
  • NFR-8 (Should): All EOD settlement, reconciliation, and reporting must complete within a 6-hour batch window (target: 10PM-4AM). Rationale: the current window is 8 hours, and the business wants to extend online hours.
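The five-nines downtime budget quoted in NFR-1 checks out; a one-line verification:

```python
# Downtime budget implied by 99.999% availability (NFR-1).
minutes_per_year = 365.25 * 24 * 60          # 525,960 minutes
budget = minutes_per_year * (1 - 0.99999)    # allowed downtime per year
print(round(budget, 2))                      # 5.26
```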

Tension analysis (one example):

NFR-2 (response time < 200ms) vs. NFR-7 (complete audit trail):

Writing a comprehensive audit record for every transaction adds I/O latency. A synchronous DB2 INSERT for the audit record adds 1-3ms per transaction. At 5,000 TPS, this is manageable. Resolution: the audit trail write is included within the transaction's unit of work (committed with SYNCPOINT alongside the payment DB2 updates) so it adds minimal incremental cost. The audit table is in a separate tablespace with its own buffer pool to prevent interference with payment processing I/O. The accepted trade-off: we audit synchronously (not asynchronously) which adds a small amount of latency but guarantees that every committed payment has a committed audit record — no gaps.


A note on using these solutions. These answers demonstrate how an experienced practitioner thinks through production problems. The code, the calculations, and the architectural reasoning are all important — but the most valuable thing in each answer is the thinking process. When your answer differs from the one here, ask yourself: does my reasoning address the same concerns? Does my solution handle the failure cases? Would it survive the first week of production? If yes, you're on the right track.