Case Study 1: Continental National Bank's Dataset Management Strategy
Naming Conventions, SMS Classes, and Storage Architecture for 500 Million Transactions Per Day
Background
Continental National Bank (CNB) processes over 500 million transactions per day across four LPARs in a Parallel Sysplex configuration. Their mainframe environment supports retail banking, commercial banking, treasury services, and regulatory reporting. The dataset estate numbers over 180,000 cataloged datasets, growing by roughly 6,000 GDG generations per nightly batch cycle.
Three years ago, CNB's dataset management was a mess. Kwame Mensah, then newly promoted to chief mainframe architect, inherited a system where:
- Dataset naming followed no consistent convention — some used account codes, others used project names, some used programmer initials
- SMS classes had been defined once during the DFSMS migration in 2004 and never updated
- Eleven different BLKSIZE values were in use for FB-80 datasets
- GDG limits ranged from 5 to 255 with no documented rationale
- Catalog contention during the batch window added up to 15 minutes of cumulative allocation delay to heavily allocated jobs
"The first thing I did was inventory the damage," Kwame says. "I pulled SMF records for three months. We had 40,000 VSAM datasets with SHAREOPTIONS(3 3). We had GDGs defined with EMPTY that no one realized were ticking time bombs — exceed the limit and every generation rolls off at once. We had datasets on volume pools that hadn't been defragged since the Bush administration. The first Bush."
The Naming Convention Redesign
Kwame's first initiative was a comprehensive naming convention. His design principles:
- Self-documenting: Any operator, any DBA, any auditor should understand a dataset's purpose from its name alone
- Security-friendly: RACF rules should map cleanly to naming patterns
- SMS-friendly: ACS routines should be able to assign classes based on name qualifiers
- Future-proof: The convention should accommodate new applications and environments without restructuring
The convention:
<org>.<env>.<app>.<type>.<subtype>.<descriptive>
Examples:
CNB.PROD.ACCTS.VSAM.KSDS.CUSTMAST Customer master KSDS
CNB.PROD.ACCTS.VSAM.ESDS.TXNJOURN Transaction journal ESDS
CNB.BATCH.GDG.DAILY.TXNEXT Daily transaction extract GDG (note the CNB.BATCH HLQ; see Catalog Architecture)
CNB.BATCH.GDG.WEEKLY.RECONSUM Weekly reconciliation summary
CNB.PROD.CICS.VSAM.KSDS.TRANTBL CICS transaction table
CNB.PROD.DB2.TS.ACCTDB.ACCTMAST DB2 tablespace
CNB.PROD.REG.SEQ.MONTHLY.SARFILE SAR regulatory report
CNB.TEST.ACCTS.VSAM.KSDS.CUSTMAST Test customer master
CNB.DEV.ACCTS.VSAM.KSDS.CUSTMAST Dev customer master
Qualifiers explained:
| Position | Qualifier | Values | Purpose |
|---|---|---|---|
| 1 | Organization | CNB | RACF top-level, catalog alias root |
| 2 | Environment | PROD, TEST, DEV, UAT | Catalog separation, security boundary |
| 3 | Application | ACCTS, LOANS, TREAS, BATCH, CICS, DB2, REG | Application ownership, ACS routing |
| 4 | Type | VSAM, SEQ, GDG, PDS, PDSE, TS, IX | Data organization, data class assignment |
| 5 | Subtype | KSDS, ESDS, RRDS, DAILY, WEEKLY, MONTHLY | Further classification |
| 6 | Descriptive | Free-form | Human-readable identifier |
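A convention this regular can be checked mechanically. Here is a minimal sketch in Python — the function name and value lists are illustrative (taken from the qualifier table above), not part of CNB's actual tooling, and it covers only the standard six-qualifier form, ignoring the CNB.BATCH exception described under Catalog Architecture:

```python
import re

# Allowed values per qualifier position, from the table above (illustrative).
VALID = {
    "env":  {"PROD", "TEST", "DEV", "UAT"},
    "app":  {"ACCTS", "LOANS", "TREAS", "BATCH", "CICS", "DB2", "REG"},
    "type": {"VSAM", "SEQ", "GDG", "PDS", "PDSE", "TS", "IX"},
}

# Each qualifier: 1-8 chars, alphabetic or national first, alphanumeric after.
QUAL = re.compile(r"^[A-Z@#$][A-Z0-9@#$]{0,7}$")

def conforms(dsn: str) -> bool:
    """True if dsn matches <org>.<env>.<app>.<type>.<subtype>.<descriptive>."""
    parts = dsn.split(".")
    if len(parts) != 6 or not all(QUAL.match(p) for p in parts):
        return False
    org, env, app, dtype, _subtype, _desc = parts
    return (org == "CNB"
            and env in VALID["env"]
            and app in VALID["app"]
            and dtype in VALID["type"])

print(conforms("CNB.PROD.ACCTS.VSAM.KSDS.CUSTMAST"))  # True
print(conforms("CNB.PROD.PAYROLL.STUFF"))             # False
```

In practice CNB enforces the convention with RACF rules at allocation time rather than a script, but the check is the same shape: position, vocabulary, and qualifier syntax.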
Catalog Architecture
Kwame redesigned the catalog hierarchy to match the naming convention:
MASTER CATALOG: MCAT.PROD.VCNB01
│
├── Alias CNB.PROD → UCAT.CNB.PROD (production online)
├── Alias CNB.TEST → UCAT.CNB.TEST (test)
├── Alias CNB.DEV → UCAT.CNB.DEV (development)
├── Alias CNB.UAT → UCAT.CNB.UAT (user acceptance)
└── Alias CNB.BATCH → UCAT.CNB.BATCH (production batch - separate!)
A subtle point: a dataset named CNB.PROD.BATCH.GDG.DAILY.TXNEXT would land in UCAT.CNB.PROD, not UCAT.CNB.BATCH. With the multilevel alias facility, the system routes a dataset to the catalog whose alias matches the most leading qualifiers of the name, and CNB.PROD matches the first two qualifiers of anything beginning CNB.PROD, regardless of what follows. To reach the batch catalog, batch datasets must begin with CNB.BATCH, which is exactly how CNB names them.
This is a critical distinction. Kwame deliberately put batch-created datasets under the CNB.BATCH high-level qualifier (not CNB.PROD.BATCH) so they would route to the batch-specific catalog. "The nightly cycle creates and deletes 6,000 datasets. That's 12,000 catalog updates in four hours. I don't want that serialization anywhere near the catalog that CICS uses to open files."
Lisa Tran pushed back initially: "I wanted all production datasets in one catalog for simplicity. But Kwame showed me the SMF data — catalog serialization was adding 200 milliseconds to every dataset open in the CICS region during batch. For online transactions, 200 milliseconds is eternity."
SMS Class Configuration
Storage Classes:
| Class | Availability | Accessibility | I/O Priority | Guaranteed Space | Target |
|---|---|---|---|---|---|
| SCPRODHI | Continuous | Continuous Update | High | Yes | CICS regions, DB2 |
| SCPRODONL | Continuous Preferred | Continuous Read | Standard | Yes | Online VSAM |
| SCBATCHCR | Standard | Standard | Standard | Yes | Critical batch |
| SCBATCHSTD | Standard | Standard | Low | No | Standard batch work |
| SCTEST | Standard | Standard | Low | No | Test/Dev |
| SCARCHIVE | Standard | Standard | Low | No | Archive/regulatory |
Key design decisions:
- SCPRODHI gets Continuous availability: If the primary volume is unavailable, z/OS automatically fails over to a duplex copy. This is essential for CICS — a volume failure shouldn't take down online banking.
- SCBATCHCR vs. SCBATCHSTD: Not all batch is equal. The transaction extract and GL reconciliation are critical (if they fail, the bank doesn't balance). Report generation is standard (if it fails, someone reads yesterday's report). Different storage classes mean different WLM treatment.
- Guaranteed Space on critical classes: Ensures space is reserved for critical datasets even if the volume is heavily utilized. Batch work datasets don't get this guarantee — they can wait for space constraint relief.
Management Classes:
| Class | Backup Freq | Versions | Migrate ML1 | Migrate ML2 | Expire | Target |
|---|---|---|---|---|---|---|
| MCPRODCRIT | Daily | 5 | Never | Never | Never | Online master files |
| MCPRODSTD | Daily | 3 | 60 days | 120 days | Never | Standard production |
| MCBATCHGDG | Per cycle | 2 | 7 days | 30 days | Per GDG LIMIT | GDG generations |
| MCBATCHWORK | Weekly | 1 | 1 day | 3 days | 3 days | Temporary batch work |
| MCREGULATORY | Daily | 5 | 365 days | Never* | 2555 days (7yr) | Regulatory/compliance |
| MCTEST | Never | 0 | 1 day | 7 days | 30 days | Test datasets |
*Regulatory datasets: "Never migrate to ML2" — CNB's compliance policy requires all regulatory data to remain on DASD (primary or ML1) for rapid auditor access; recalling from ML2 tape would be too slow for an audit request.
Data Classes:
| Class | RECFM | LRECL | BLKSIZE | CI Size | Free Space | SPACE | Target |
|---|---|---|---|---|---|---|---|
| DCFB80 | FB | 80 | 27920 | — | — | CYL(10,10) | JCL, copybooks |
| DCFB200 | FB | 200 | 27800 | — | — | CYL(50,10) | Extracts, reports |
| DCVB | VB | 32756 | 32760 | — | — | CYL(20,5) | Variable-length |
| DCKSDS | — | — | — | 4096 | 10/15 | CYL(100,20) | Standard KSDS |
| DCKSDSHI | — | — | — | 4096 | 10/20 | CYL(200,50) | High-volume KSDS |
| DCKSDSLG | — | — | — | 8192 | 5/10 | CYL(500,100) | Large-record KSDS |
| DCESDS | — | — | — | 4096 | 0/0 | CYL(100,20) | ESDS (journals) |
| DCPRINT | FBA | 133 | 27930 | — | — | CYL(5,5) | Print output |
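The fixed-block BLKSIZE values in the table all follow one rule: half-track blocking on 3390 geometry, i.e. the largest multiple of LRECL that fits in the 27,998 bytes usable per half track, so exactly two blocks fit on each track. A quick sketch (the helper name is invented):

```python
# Largest block <= 27,998 bytes gets two blocks per 3390 track.
HALF_TRACK = 27998

def half_track_blksize(lrecl: int) -> int:
    """Largest multiple of lrecl that fits in half a 3390 track."""
    return (HALF_TRACK // lrecl) * lrecl

for lrecl in (80, 133, 200):
    print(lrecl, half_track_blksize(lrecl))
# 80 -> 27920 (DCFB80), 133 -> 27930 (DCPRINT), 200 -> 27800 (DCFB200)
```

This is the same arithmetic system-determined blocksize performs, which is why encoding it once in a data class ends the "eleven different BLKSIZEs for FB-80" problem from the inventory.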
ACS Routine Design
Kwame's ACS routines implement a tiered decision tree. Here is the storage class ACS routine (simplified from the production version, which handles 47 edge cases):
PROC STORCLAS

  /* ---- PRODUCTION ONLINE ---- */
  FILTLIST ONLINE_HI  INCLUDE('CNB.PROD.CICS.**',
                              'CNB.PROD.DB2.**')
  FILTLIST ONLINE_STD INCLUDE('CNB.PROD.ACCTS.VSAM.**',
                              'CNB.PROD.LOANS.VSAM.**',
                              'CNB.PROD.TREAS.VSAM.**')

  /* ---- BATCH ---- */
  FILTLIST BATCH_CRIT INCLUDE('CNB.BATCH.GDG.DAILY.TXNEXT',
                              'CNB.BATCH.GDG.DAILY.GLRECON',
                              'CNB.BATCH.SEQ.RECONBRIDGE')
  FILTLIST BATCH_STD  INCLUDE('CNB.BATCH.**')

  /* ---- ARCHIVE/REGULATORY ---- */
  FILTLIST ARCHIVE    INCLUDE('CNB.PROD.REG.**',
                              'CNB.ARCHIVE.**')

  /* ---- NON-PRODUCTION ---- */
  FILTLIST NONPROD    INCLUDE('CNB.TEST.**',
                              'CNB.DEV.**',
                              'CNB.UAT.**')

  SELECT
    WHEN (&DSN = &ONLINE_HI)  SET &STORCLAS = 'SCPRODHI'
    WHEN (&DSN = &ONLINE_STD) SET &STORCLAS = 'SCPRODONL'
    WHEN (&DSN = &BATCH_CRIT) SET &STORCLAS = 'SCBATCHCR'
    WHEN (&DSN = &BATCH_STD)  SET &STORCLAS = 'SCBATCHSTD'
    WHEN (&DSN = &ARCHIVE)    SET &STORCLAS = 'SCARCHIVE'
    WHEN (&DSN = &NONPROD)    SET &STORCLAS = 'SCTEST'
    OTHERWISE                 SET &STORCLAS = 'SCBATCHSTD'
  END
END
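The SELECT is evaluated top to bottom and the first matching WHEN wins, so ordering carries real meaning: CNB.BATCH.** would also match the three critical batch datasets, which is why BATCH_CRIT must be tested before BATCH_STD. A toy Python model of the FILTLIST matching makes this concrete (the mask handling is deliberately simplified — in real ACS masks, '**' also matches zero qualifiers):

```python
import re

def matches(dsn: str, mask: str) -> bool:
    """Match a dataset name against a simplified ACS-style filter mask."""
    pieces = []
    for part in mask.split("."):
        if part == "**":
            pieces.append(r".+")      # one or more qualifiers (simplified)
        elif part == "*":
            pieces.append(r"[^.]+")   # exactly one qualifier
        else:
            pieces.append(re.escape(part))
    return re.fullmatch(r"\.".join(pieces), dsn) is not None

# First match wins, mirroring the WHEN order; the last entry is OTHERWISE.
RULES = [
    ({"CNB.BATCH.GDG.DAILY.TXNEXT",
      "CNB.BATCH.GDG.DAILY.GLRECON",
      "CNB.BATCH.SEQ.RECONBRIDGE"}, "SCBATCHCR"),
    ({"CNB.BATCH.**"}, "SCBATCHSTD"),
]

def storclas(dsn: str) -> str:
    for masks, sc in RULES:
        if any(matches(dsn, m) for m in masks):
            return sc
    return "SCBATCHSTD"  # OTHERWISE

print(storclas("CNB.BATCH.GDG.DAILY.TXNEXT"))   # SCBATCHCR
print(storclas("CNB.BATCH.GDG.DAILY.RPTFILE"))  # SCBATCHSTD
```

Swap the two rules and every critical batch dataset silently drops to SCBATCHSTD — the kind of regression Kwame's 47-edge-case production routine has to guard against.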
Results After Three Years
Kwame tracks six metrics to measure the health of the dataset estate:
| Metric | Before (2023) | After (2026) | Change |
|---|---|---|---|
| Avg batch job allocation time | 4.2 sec | 0.8 sec | -81% |
| Catalog-related abends/month | 12 | 0.3 | -97.5% |
| VSAM datasets with SHROPT(3,3) | 40,000 | 0 | -100% |
| Avg VSAM extent count | 23 | 4.2 | -82% |
| Batch window duration | 5.8 hrs | 3.9 hrs | -33% |
| Storage utilization efficiency | 61% | 89% | +46% |
"The 33% batch window improvement came from three things roughly equally," Kwame explains. "One-third from fixing block sizes, one-third from eliminating catalog contention, and one-third from proper VSAM tuning — buffer settings, free space, and regular reorgs. None of it was rocket science. It was just disciplined engineering that had been neglected for twenty years."
Lessons Learned
- Naming conventions must be enforced, not suggested. Kwame implemented RACF rules that reject dataset creation outside the naming standard. "If it doesn't match the convention, the allocation fails. Period. We got pushback for two months. Then it became normal."
- Separate catalogs for separate workloads. The batch/online catalog split was the single biggest performance win. "We didn't change a single line of COBOL or a single VSAM definition. We just moved the catalog entries and changed the aliases."
- SMS classes need regular review. "The classes we defined in 2023 were wrong by 2025 because our workload changed. Now we review SMS classes quarterly with the application teams."
- Data classes prevent drift. By encoding BLKSIZE, CI size, and free-space percentages in data classes — and having ACS routines assign the right data class — individual JCL coders can't override standards. "The data class is the standard. If you want an exception, you write an Architecture Decision Record and it goes through review."
- Monitor everything. Rob Calloway's daily capacity report and weekly VSAM health report are what keep the environment stable. "You can design the most beautiful storage architecture in the world, but without monitoring, it degrades in three months."
Discussion Questions
- Kwame chose to separate the batch catalog from the production online catalog by using different high-level qualifiers (CNB.PROD. vs. CNB.BATCH.). An alternative would be to keep all production datasets under CNB.PROD.* but use different user catalogs with multi-level aliases (CNB.PROD.BATCH → UCAT.BATCH, CNB.PROD.ONLINE → UCAT.ONLINE). What are the trade-offs of each approach?
- CNB's data class DCKSDS specifies CI=4096 and FREESPACE(10 15). But their high-volume KSDS data class DCKSDSHI uses the same CI size with FREESPACE(10 20). Why the different CA free space? What access pattern differences justify this?
- The management class MCREGULATORY specifies "Never migrate to ML2." This means 7 years of regulatory data stays on primary DASD. At CNB's scale, this is expensive. What alternative approaches could reduce cost while still meeting the regulatory requirement for rapid auditor access?
- Kwame enforces naming conventions through RACF rejection. This is an operational overhead — every new dataset type requires a RACF rule update. What's the alternative, and why did Kwame choose enforcement over education?
- The 33% batch window improvement came from three roughly equal sources. If Kwame had time to implement only one of the three (block sizes, catalog separation, or VSAM tuning), which would you prioritize and why?