Case Study 2: Federal Benefits Administration's Modernization Capacity Impact Assessment
Background
Federal Benefits Administration (FBA) runs a 40-year codebase — 15 million lines of COBOL and IMS on a z16 Model T02. The system processes benefit payments for 4.2 million recipients: veterans, retirees, disability recipients, and survivors. Every payment is someone's rent, medication, or groceries. The system doesn't have the luxury of "acceptable downtime."
Sandra Chen, FBA's modernization lead, is three years into a modernization program that will incrementally migrate public-facing inquiry functions to cloud-hosted microservices while keeping payment processing, eligibility determination, and the IMS master records on the mainframe. The modernization is driven by two forces: a congressional mandate to improve the beneficiary web portal experience, and the approaching retirement of Marcus Whitfield, whose undocumented knowledge of the IMS database structure and COBOL business logic is irreplaceable.
The capacity planning challenge is unique to FBA's situation: the modernization program simultaneously adds capacity demand (new API layers, new data synchronization, new middleware) and removes it (migrated inquiry workloads). The net effect depends on timing, and timing is controlled by the congressional appropriations cycle, not by Sandra's project plan.
The Modernization Architecture
FBA's modernization follows a strangler fig pattern (Chapter 33 provides the full architectural pattern). The key components:
Current State:
Beneficiary → IMS transaction (BNINQ) → IMS DB → response
Internal staff → CICS screen (BNSCREEN) → IMS DB → display
Target State (Phase 3):
Beneficiary → Cloud portal → REST API → z/OS Connect → MQ → response
Internal staff → Cloud SPA → REST API → z/OS Connect → MQ → response
Payments → (unchanged) IMS batch → IMS DB → ACH/wire
Interim State (current — Phase 1 complete, Phase 2 in progress):
Beneficiary → Cloud portal (50%) + IMS BNINQ (50%) → responses
Internal staff → CICS BNSCREEN (100%) → IMS DB → display
Payments → (unchanged) IMS batch
The modernization has three phases, each with a distinct capacity impact:
Phase 1 (Complete): Migrated 50% of beneficiary inquiry traffic to the cloud portal. Added z/OS Connect and MQ as the communication bridge. Retired 50% of IMS BNINQ transaction volume.
Phase 2 (In Progress): Migrate remaining 50% of beneficiary inquiry to cloud. Begin migrating internal staff screens from CICS to cloud SPA. Add real-time data synchronization between IMS and the cloud data store.
Phase 3 (Planned): Complete internal staff migration. Deploy new analytics capabilities on cloud. Maintain IMS as system of record for payments and eligibility.
The Capacity Impact Assessment
Sandra commissioned a formal Capacity Impact Assessment for Phases 2 and 3, recognizing that the capacity implications are complex enough to require dedicated analysis. She assigned Marcus Whitfield (who knows the IMS workload inside and out) and a contractor capacity planner to produce the assessment.
Current Baseline
Marcus started by establishing the current capacity baseline, measured post-Phase 1:
FBA Current Capacity Baseline (Post-Phase 1)
Workload Category          Daily CPU (sec)   Peak MSU   zIIP %   Service Class
────────────────────────   ───────────────   ────────   ──────   ──────────────
IMS Online (BNINQ)                  28,400        185       0%   SC_IMS_ONLINE
IMS Online (other txns)             42,000        265       0%   SC_IMS_ONLINE
CICS Online (BNSCREEN)              18,500        125       0%   SC_CICS_ONLINE
CICS Online (other txns)            12,200         85       0%   SC_CICS_ONLINE
Batch (payments)                    95,000        780       0%   SC_BATCH_PAY
Batch (eligibility)                 45,000        340       0%   SC_BATCH_ELIG
Batch (reporting)                   32,000        220       0%   SC_BATCH_RPT
DB2 (queries)                       15,000        105      35%   SC_DB2_QUERY
z/OS Connect (Phase 1)               8,200         60      82%   SC_API
MQ (Phase 1)                         3,500         28      15%   SC_MQ
Infrastructure                      12,000         80       0%   SC_INFRA
────────────────────────   ───────────────   ────────   ──────
Total                              311,800      2,273
  (GP component)                                2,180
  (zIIP component)                                 93

Machine rated capacity: 3,800 MSU (GP), 1,200 MSU (zIIP)
Current R4HA: 2,680 MSU (GP)
Headroom: 1,120 MSU (29%)
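The headroom and total figures in the baseline can be re-derived with a few lines of arithmetic. The following is a minimal Python sketch, not part of FBA's tooling; the variable and function names are illustrative, and every number is copied from the baseline table.

```python
# Peak MSU per workload category, copied from the baseline table.
peak_msu = {
    "IMS Online (BNINQ)": 185,
    "IMS Online (other txns)": 265,
    "CICS Online (BNSCREEN)": 125,
    "CICS Online (other txns)": 85,
    "Batch (payments)": 780,
    "Batch (eligibility)": 340,
    "Batch (reporting)": 220,
    "DB2 (queries)": 105,
    "z/OS Connect (Phase 1)": 60,
    "MQ (Phase 1)": 28,
    "Infrastructure": 80,
}

GP_RATED_MSU = 3_800      # machine rated GP capacity
CURRENT_R4HA_MSU = 2_680  # current rolling four-hour average (GP)


def headroom(rated_msu: int, r4ha_msu: int) -> tuple[int, float]:
    """Return (headroom in MSU, headroom as a fraction of rated capacity)."""
    free = rated_msu - r4ha_msu
    return free, free / rated_msu


total_peak_msu = sum(peak_msu.values())
free_msu, free_fraction = headroom(GP_RATED_MSU, CURRENT_R4HA_MSU)

print(f"Total peak MSU: {total_peak_msu:,}")
print(f"Headroom: {free_msu:,} MSU ({free_fraction:.0%})")
```

Note that headroom is measured against the R4HA, not against the sum of per-workload peaks, which is why the two figures are tracked separately.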
Phase 2 Impact Analysis
Phase 2 involves three changes with capacity implications:
Change 1: Complete beneficiary inquiry migration.
The remaining 50% of IMS BNINQ traffic moves to the cloud portal via z/OS Connect.
IMS BNINQ retirement (remaining 50%):
Current daily CPU: 28,400 sec
Retirement: 50% = -14,200 sec
Peak MSU reduction: -92 MSU (GP)
z/OS Connect increase:
Additional API calls: 1.8M/day
CPU per call: 0.0038 sec (measured from Phase 1)
Additional daily CPU: 6,840 sec
Peak MSU increase: +50 MSU (12 GP + 38 zIIP)
MQ increase (bridge messages):
Additional messages: 3.6M/day (request + reply per API call)
CPU per message: 0.0004 sec
Additional daily CPU: 1,440 sec
Peak MSU increase: +10 MSU (8 GP + 2 zIIP)
Net Change 1 impact:
GP: -92 + 12 + 8 = -72 MSU
zIIP: 0 + 38 + 2 = +40 MSU
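The Change 1 estimate follows one pattern that repeats for every change in the assessment: multiply the event volume by the measured CPU cost per event, then net the GP and zIIP contributions of each component. A minimal Python sketch of that pattern follows. The `MsuDelta` type and function names are hypothetical, and the peak MSU values are taken as given from the figures above, because the source does not state how daily CPU seconds are converted to peak MSU.

```python
from typing import Iterable, NamedTuple


class MsuDelta(NamedTuple):
    """Peak MSU change for one component, split by processor type."""
    gp: int
    ziip: int


def daily_cpu_seconds(events_per_day: int, cpu_sec_per_event: float) -> float:
    """Daily CPU seconds = event volume x measured CPU cost per event."""
    return events_per_day * cpu_sec_per_event


def net_impact(deltas: Iterable[MsuDelta]) -> MsuDelta:
    """Sum the GP and zIIP contributions of several components."""
    deltas = list(deltas)
    return MsuDelta(sum(d.gp for d in deltas), sum(d.ziip for d in deltas))


# Volumes and per-event costs from the Change 1 figures.
zos_connect_cpu = daily_cpu_seconds(1_800_000, 0.0038)  # API calls
mq_cpu = daily_cpu_seconds(3_600_000, 0.0004)           # request + reply messages

change_1 = net_impact([
    MsuDelta(gp=-92, ziip=0),   # IMS BNINQ retirement
    MsuDelta(gp=12, ziip=38),   # z/OS Connect increase
    MsuDelta(gp=8, ziip=2),     # MQ bridge increase
])

print(f"z/OS Connect: {zos_connect_cpu:,.0f} sec/day, MQ: {mq_cpu:,.0f} sec/day")
print(f"Change 1 net: {change_1.gp:+d} MSU GP, {change_1.ziip:+d} MSU zIIP")
```

The same two functions apply to Changes 2 and 3 with their own volumes and deltas.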
Change 2: Internal staff screen migration (Phase 2 partial — 40% of screens).
CICS BNSCREEN transactions migrate to a cloud-hosted Single Page Application that calls the same z/OS Connect APIs.
CICS BNSCREEN partial retirement (40%):
Current daily CPU: 18,500 sec
Retirement: 40% = -7,400 sec
Peak MSU reduction: -50 MSU (GP)
z/OS Connect increase (staff API calls):
Additional API calls: 800K/day
CPU per call: 0.0042 sec (staff APIs are more complex than beneficiary inquiry)
Additional daily CPU: 3,360 sec
Peak MSU increase: +25 MSU (4 GP + 21 zIIP)
Net Change 2 impact:
GP: -50 + 4 = -46 MSU
zIIP: +21 MSU
Change 3: Real-time data synchronization.
A new data synchronization process keeps the cloud data store in sync with IMS. This uses IMS log-based change data capture, MQ messaging, and a COBOL transformation program.
IMS log capture processing:
Daily CPU: 4,200 sec (new — GP only, IMS-based)
Peak MSU: +30 MSU (GP)
MQ synchronization messages:
Messages/day: 5M (one per IMS update)
CPU per message: 0.0005 sec (slightly higher due to transformation)
Daily CPU: 2,500 sec
Peak MSU: +18 MSU (15 GP + 3 zIIP)
COBOL transformation program:
Daily CPU: 6,800 sec (new COBOL program running on GP)
Peak MSU: +48 MSU (GP)
Net Change 3 impact:
GP: +30 + 15 + 48 = +93 MSU
zIIP: +3 MSU
Phase 2 Composite Impact:
                       GP Impact   zIIP Impact
Change 1 (inquiry):      -72 MSU       +40 MSU
Change 2 (screens):      -46 MSU       +21 MSU
Change 3 (sync):         +93 MSU        +3 MSU
──────────────────     ─────────   ───────────
Phase 2 Net:             -25 MSU       +64 MSU
Marcus was surprised. He expected Phase 2 to be a clear capacity reduction — after all, significant workload is being migrated off the mainframe. But the data synchronization requirement (Change 3) nearly offsets the savings from Changes 1 and 2. "The data has to live in two places now," Marcus explained. "That synchronization processing is the tax you pay for running a hybrid architecture."
Sandra was not surprised. "This is the strangler fig paradox," she told Marcus. "During the transition, you run both the old system and the new system plus the integration between them. Total capacity consumption increases before it decreases. I've seen this at every modernization I've been involved with."
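Sandra's point can be made concrete by looking at the order in which the Phase 2 changes land. The net GP effect is -25 MSU, but if every new workload goes live before any old workload is retired, GP consumption rises first. The Python sketch below computes that worst case from the Phase 2 GP figures. The cutover ordering is an assumption made for illustration; the source does not specify the order in which components are deployed or retired.

```python
from itertools import accumulate
from typing import Iterable

# GP peak MSU deltas for each Phase 2 component, from Changes 1 to 3.
phase2_gp_deltas = {
    "z/OS Connect (inquiry APIs)": 12,
    "MQ bridge messages": 8,
    "z/OS Connect (staff APIs)": 4,
    "IMS log capture": 30,
    "MQ sync messages": 15,
    "COBOL transformation": 48,
    "IMS BNINQ retirement (50%)": -92,
    "CICS BNSCREEN retirement (40%)": -50,
}


def transition_hump(deltas: Iterable[int]) -> tuple[int, int]:
    """Return (peak, final) cumulative MSU change for the worst-case order.

    Worst case (assumed): all increases are deployed before any
    retirement takes effect, so additions are applied first.
    """
    ordered = sorted(deltas, reverse=True)
    running = list(accumulate(ordered, initial=0))
    return max(running), running[-1]


peak_change, final_change = transition_hump(phase2_gp_deltas.values())
print(f"Worst-case transient: {peak_change:+d} MSU GP")
print(f"After all retirements: {final_change:+d} MSU GP")
```

Under this ordering the transient is +117 MSU GP before the phase settles at -25 MSU, which is the gap a capacity plan has to cover during cutover.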
Phase 3 Impact Analysis
Phase 3 completes the migration and introduces cloud-based analytics:
Change 4: Complete internal staff screen migration (remaining 60%).
CICS BNSCREEN complete retirement:
Remaining daily CPU: 11,100 sec
Peak MSU reduction: -75 MSU (GP)
z/OS Connect increase:
Additional API calls: 1.2M/day
Peak MSU increase: +38 MSU (6 GP + 32 zIIP)
Net Change 4: -69 MSU GP, +32 MSU zIIP
Change 5: IMS BNINQ retirement (BNINQ is fully retired; non-inquiry IMS transactions remain).
IMS BNINQ complete retirement:
Remaining daily CPU: 14,200 sec
Peak MSU reduction: -93 MSU (GP)
Note: IMS subsystem remains for payment processing and other online transactions
Change 6: Cloud analytics data feed.
A new nightly batch extract feeds IMS and DB2 data to the cloud analytics platform. This is an additional batch workload — it doesn't replace anything.
Analytics extract batch:
Daily CPU: 18,000 sec (COBOL extract + DFSORT + compression)
Peak MSU: +95 MSU (GP) — runs during batch window
Frequency: Nightly
Change 7: Reduced IMS data synchronization.
Once Phase 3 is complete and the cloud data store is the primary source for inquiries, the real-time synchronization from Change 3 can be simplified to a batch process (nightly rather than real-time). This reduces the continuous MQ and transformation overhead.
Synchronization simplification:
MQ processing: 2,500 sec/day → 800 sec/day (batch MQ), a change of -1,700 sec/day
Transformation: 6,800 sec/day → 2,200 sec/day (batch), a change of -4,600 sec/day
Net daily CPU change: -6,300 sec
Peak MSU change: -45 MSU (GP), -3 MSU (zIIP)
Note: the batch replacement contributes less peak MSU than the real-time processing it replaces, because the work is spread across the batch window rather than concentrated during online peak
Phase 3 Composite Impact:
                            GP Impact   zIIP Impact
Change 4 (staff screens):     -69 MSU       +32 MSU
Change 5 (IMS BNINQ):         -93 MSU             0
Change 6 (analytics):         +95 MSU             0
Change 7 (sync simplify):     -45 MSU        -3 MSU
────────────────────────    ─────────   ───────────
Phase 3 Net:                 -112 MSU       +29 MSU
Three-Phase Cumulative View
FBA Modernization — Cumulative Capacity Impact
Phase     GP MSU Change   zIIP MSU Change   Cumulative GP   Cumulative zIIP
───────   ─────────────   ───────────────   ─────────────   ───────────────
Base      0               0                 2,180           93
Phase 1   -55 (actual)    +45 (actual)      2,125           138
Phase 2   -25             +64               2,100           202
Phase 3   -112            +29               1,988           231

Total modernization impact: -192 MSU GP, +138 MSU zIIP
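The cumulative columns are running totals of the per-phase changes applied to the base values. A short Python sketch reproduces them; the names are illustrative and the figures come from the cumulative table.

```python
from itertools import accumulate

BASE_GP_MSU = 2_180
BASE_ZIIP_MSU = 93

# (phase, GP MSU change, zIIP MSU change) from the cumulative table.
phase_changes = [
    ("Phase 1", -55, 45),
    ("Phase 2", -25, 64),
    ("Phase 3", -112, 29),
]

gp_changes = [gp for _, gp, _ in phase_changes]
ziip_changes = [ziip for _, _, ziip in phase_changes]

# accumulate(..., initial=base) yields the base first; drop it with [1:].
cumulative_gp = list(accumulate(gp_changes, initial=BASE_GP_MSU))[1:]
cumulative_ziip = list(accumulate(ziip_changes, initial=BASE_ZIIP_MSU))[1:]

for (name, _, _), gp, ziip in zip(phase_changes, cumulative_gp, cumulative_ziip):
    print(f"{name}: {gp:,} MSU GP, {ziip:,} MSU zIIP")

print(f"Total: {sum(gp_changes):+d} MSU GP, {sum(ziip_changes):+d} MSU zIIP")
```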
Financial Analysis
Sandra asked Marcus to translate the capacity impact into dollars:
FBA Modernization — Financial Impact Analysis
                        GP MLC Savings     Cloud Cost    Net Annual
                        (at $14/MSU/mo)    (annual)      Impact
─────────────────────   ────────────────   ───────────   ──────────
Phase 1 (actual):       $9,240/year        $120,000      -$110,760
Phase 2:                $4,200/year        $95,000       -$90,800
Phase 3:                $18,816/year       $180,000      -$161,184
Total (at completion):  $32,256/year       $395,000      -$362,744
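Each row of the financial table is the GP MSU reduction for that phase multiplied by the monthly rate and by 12, less the annual cloud cost. The Python sketch below repeats that calculation, assuming the flat $14 per MSU per month rate the table uses; the function and variable names are illustrative.

```python
MLC_RATE_PER_MSU_MONTH = 14  # dollars, the flat rate used in the table

# phase -> (GP MSU reduction, annual cloud cost in dollars)
phases = {
    "Phase 1": (55, 120_000),
    "Phase 2": (25, 95_000),
    "Phase 3": (112, 180_000),
}


def annual_mlc_savings(gp_msu_reduction: int) -> int:
    """Annual MLC savings in dollars for a given GP MSU reduction."""
    return gp_msu_reduction * MLC_RATE_PER_MSU_MONTH * 12


total_savings = 0
total_cloud_cost = 0
for name, (gp_reduction, cloud_cost) in phases.items():
    savings = annual_mlc_savings(gp_reduction)
    total_savings += savings
    total_cloud_cost += cloud_cost
    print(f"{name}: ${savings:,} saved, ${cloud_cost:,} cloud, "
          f"net {savings - cloud_cost:+,}")

net_annual_impact = total_savings - total_cloud_cost
print(f"Total: ${total_savings:,} saved, ${total_cloud_cost:,} cloud, "
      f"net {net_annual_impact:+,}")
```

Keeping the rate as a single named constant makes it easy to rerun the analysis if the pricing assumption changes.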
The numbers stopped Sandra cold. The modernization, from a pure mainframe capacity perspective, costs money. The MLC savings from reduced GP consumption ($32,256/year) are dwarfed by the cloud hosting costs ($395,000/year). The net annual cost increase is $362,744.
"This is the wrong analysis," Sandra told the team. "We're not doing this to save money on the mainframe. We're doing this because Congress mandated an improved beneficiary experience, because Marcus is retiring and we need the inquiry logic in a maintainable codebase, and because the cloud portal can iterate features monthly instead of the quarterly release cycle we have on IMS. The capacity analysis tells us the cost — but the business case is about capability, not cost."
Marcus, who had spent weeks on the capacity analysis, took this in stride. "The analysis isn't wasted," he said. "Now we know exactly what the transition costs. And we know where the cost levers are — that data synchronization is the biggest single cost driver. If we can simplify the sync architecture, we reduce the cost significantly."
Risk Analysis
The assessment identified four capacity risks:
Risk 1: Phase 2 data synchronization exceeds estimate. The 5M messages/day estimate for IMS change data capture is based on current IMS update volumes. If a new benefits program is legislated (which happens unpredictably), update volumes could increase 30-50%. Mitigation: Design the sync architecture with backpressure — if MQ depth exceeds threshold, switch to batch synchronization for low-priority data.
Risk 2: Cloud service latency causes mainframe retry storms. If the cloud portal experiences latency, beneficiaries may retry requests, amplifying the z/OS Connect and MQ workload. Phase 1 experienced this during a cloud incident — API call volume spiked 3x for 20 minutes. Mitigation: Rate limiting at the z/OS Connect level (configured in Chapter 21's API design). Circuit breaker pattern to fail fast rather than queue.
Risk 3: Congressional mandate accelerates Phase 3 timeline. Congress has previously accelerated IT mandates with short notice. If Phase 3 is compressed from 18 months to 12 months, the capacity planning must accommodate the Phase 2 and Phase 3 impacts overlapping — running the synchronization and the analytics extract simultaneously, at higher volumes. Mitigation: Kwame's contingent planning approach — prepare a "compressed timeline" capacity scenario and identify the CoD requirements to bridge it.
Risk 4: Marcus Whitfield's retirement knowledge gap. Marcus is the only person who understands the IMS database structure well enough to design efficient data synchronization queries. If he retires before Phase 2's synchronization component is fully tested and optimized, the sync processing could be significantly less efficient than estimated. Mitigation: Knowledge transfer sessions (Chapter 40) are running in parallel with the capacity assessment. Marcus is documenting every IMS segment relationship, every DL/I call pattern, and every performance consideration.
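The circuit breaker named in Risk 2's mitigation can be sketched generically. The Python class below is a simplified illustration of the pattern, not FBA's implementation and not a z/OS Connect feature: the class name, thresholds, and the injectable clock are all hypothetical, and the source does not say where in the request path the breaker would run. After a set number of consecutive failures the breaker opens and rejects calls immediately, so retries fail fast instead of queueing against the mainframe.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Minimal circuit breaker: open after N consecutive failures,
    allow trial requests again once a cool-down period has elapsed."""

    def __init__(self, failure_threshold: int = 5,
                 reset_after_sec: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_sec = reset_after_sec
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        """Return False while the breaker is open (fail fast)."""
        if self._opened_at is None:
            return True
        # Half-open: after the cool-down, let requests through as trials.
        return self._clock() - self._opened_at >= self.reset_after_sec

    def record_success(self) -> None:
        """A successful call closes the breaker and clears the count."""
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        """Count a failure; open (or re-open) at the threshold."""
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()


# Usage with a fake clock so the behavior is deterministic.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=3, reset_after_sec=10.0,
                         clock=lambda: now[0])
for _ in range(3):
    breaker.record_failure()
print(breaker.allow_request())  # breaker is open, request rejected
now[0] = 10.0
print(breaker.allow_request())  # cool-down elapsed, trial allowed
```

A production version would typically limit the number of trial requests in the half-open state; this sketch omits that for brevity.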
The Presentation to Agency Leadership
Sandra presented the capacity impact assessment to FBA's deputy administrator and CIO. She organized the presentation around three questions:
Question 1: Will the mainframe support the modernization?
Yes. Current headroom is 29% (1,120 MSU). Phase 2 reduces GP consumption by 25 MSU. Phase 3 reduces it by another 112 MSU. At no point during the modernization does FBA exceed its current capacity. No hardware upgrade is needed for the modernization itself.
However, the organic growth of payment processing (batch) at 6% per year will eventually consume the headroom regardless of modernization. Sandra projects a hardware upgrade will be needed in FY2029 based on batch growth alone.
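A compound-growth projection of this kind can be sketched in a few lines of Python. The version below is deliberately simplified and rests on an assumption the source does not make: it applies the 6% rate to the entire R4HA rather than to the batch portion alone, so it gives a pessimistic bound, not Sandra's projection. It also ignores the GP reductions from Phases 2 and 3. The function name is hypothetical.

```python
def years_until_exhausted(current_msu: float, capacity_msu: float,
                          annual_growth: float) -> int:
    """Number of whole years of compound growth until usage exceeds capacity."""
    if annual_growth <= 0:
        raise ValueError("annual_growth must be positive")
    years = 0
    usage = current_msu
    while usage <= capacity_msu:
        usage *= 1 + annual_growth
        years += 1
    return years


# Assumption: the whole 2,680 MSU R4HA grows at 6% against 3,800 MSU rated.
years = years_until_exhausted(2_680, 3_800, 0.06)
print(f"Headroom exhausted after {years} years under this assumption")
```

Applying the growth rate only to the batch share of the R4HA, as the source describes, would push the date later; that split is not given in the assessment, so it is left as a parameter for the reader to supply.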
Question 2: What does the modernization cost from a mainframe perspective?
The mainframe savings are small: approximately $32,000 per year in MLC, against $395,000 per year in cloud hosting. The modernization is not a cost reduction exercise on the mainframe; it's a capability investment. The business case is justified by the congressional mandate, the improved beneficiary experience, and the reduced risk from Marcus's retirement.
Sandra emphasized that the capacity assessment is one input to the business case, not the entire business case. "We don't modernize to save MSUs. We modernize to serve beneficiaries better."
Question 3: What are the risks?
Sandra presented the four risks with mitigations. The deputy administrator focused on Risk 3 (congressional acceleration): "So if Congress tells us to speed this up, can we?" Sandra's answer: "Yes, with 60 days of Capacity on Demand at approximately $35,000. The technology can absorb the compressed timeline; the constraint is testing and deployment capacity, not mainframe capacity."
Outcome and Lessons
The capacity impact assessment was approved. Phase 2 proceeded on schedule. Three months into Phase 2, the data synchronization workload was 15% higher than estimated (Risk 1 materialized partially — a new cost-of-living adjustment triggered higher IMS update volumes). Sandra activated the backpressure mechanism, which throttled non-critical sync processing during peak hours without affecting beneficiary experience.
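The backpressure rule described in Risk 1 and activated here reduces to a small routing decision: when queue depth exceeds a threshold, low-priority changes are deferred to batch while high-priority changes stay real-time. The Python sketch below illustrates that decision only. The function name, the `SyncMode` enum, and the threshold value are hypothetical; the source does not give FBA's actual threshold or how priority is assigned.

```python
from enum import Enum


class SyncMode(Enum):
    REAL_TIME = "real_time"
    BATCH = "batch"


# Hypothetical threshold; the source does not state the real value.
DEFAULT_DEPTH_THRESHOLD = 100_000


def choose_sync_mode(queue_depth: int, high_priority: bool,
                     depth_threshold: int = DEFAULT_DEPTH_THRESHOLD) -> SyncMode:
    """Defer low-priority changes to batch when the queue is too deep.

    High-priority changes always stay real-time, so beneficiary-facing
    data is not delayed by the throttle.
    """
    if high_priority or queue_depth <= depth_threshold:
        return SyncMode.REAL_TIME
    return SyncMode.BATCH


print(choose_sync_mode(queue_depth=50_000, high_priority=False))
print(choose_sync_mode(queue_depth=250_000, high_priority=False))
print(choose_sync_mode(queue_depth=250_000, high_priority=True))
```

In practice a rule like this usually adds hysteresis (a lower threshold for switching back to real-time) so the mode does not flap when depth hovers near the limit.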
Marcus documented the assessment methodology in a capacity planning playbook for FBA, so that his successor could repeat the process for future phases. "The numbers will change," he wrote in the playbook's introduction. "The method doesn't."
Key Lessons
- Modernization temporarily increases total capacity consumption. The strangler fig pattern requires running both old and new systems plus the integration layer. Capacity planners must account for the "transition hump": the period where capacity consumption peaks before declining.
- Data synchronization is the hidden capacity cost of hybrid architectures. Keeping data consistent across platforms (mainframe IMS and cloud data store) requires continuous processing that has its own capacity footprint. This cost is often overlooked in modernization business cases.
- Capacity savings are rarely the business case for modernization. The financial analysis showed a net cost increase, not a savings. The modernization is justified by capability, agility, risk reduction, and compliance, not by mainframe cost reduction. Capacity analysis provides cost transparency, not cost justification.
- Contingent capacity planning enables organizational agility. Sandra could answer the deputy administrator's "can we speed this up?" question immediately because the capacity assessment included compressed timeline scenarios. Capacity planning is not just about predicting the future; it's about preparing for multiple futures.
- Knowledge transfer and capacity planning are linked. Marcus's retirement risk affected both the modernization timeline and the capacity estimate accuracy. His deep knowledge of IMS internals was essential for designing efficient data synchronization; without it, the sync processing would have been significantly less efficient, consuming more capacity and costing more.
Discussion Questions
- The financial analysis showed the modernization costs $362,744/year more than the current state. How would you present this to a budget-conscious CIO who is focused on mainframe cost reduction? What framing makes the investment justifiable?
- Risk 2 (retry storms from cloud latency) represents a capacity threat that originates outside the mainframe. How should mainframe capacity planning account for risks that originate in cloud or distributed systems?
- Marcus Whitfield's retirement creates a knowledge gap that directly affects capacity planning accuracy. Design a knowledge transfer plan that captures Marcus's capacity-relevant knowledge (IMS performance characteristics, efficient DL/I call patterns, data volume drivers) in a format that a successor can use.
- The data synchronization (Change 3) is the largest single capacity addition. Evaluate alternative synchronization architectures that might reduce this cost:
  a) Batch-only synchronization (no real-time)
  b) Log-based CDC without transformation (simpler processing)
  c) Direct IMS-to-cloud API calls (bypassing MQ)
  What are the trade-offs of each alternative from a capacity, cost, and functionality perspective?
- FBA's modernization is driven by congressional mandate rather than business choice. How does this change the capacity planning process compared to a commercial organization where projects can be deferred or cancelled based on ROI? What governance differences are required?