Case Study 1: Continental National Bank's Dataset Management Strategy

Naming Conventions, SMS Classes, and Storage Architecture for 500 Million Transactions Per Day

Background

Continental National Bank (CNB) processes over 500 million transactions per day across four LPARs in a Parallel Sysplex configuration. Their mainframe environment supports retail banking, commercial banking, treasury services, and regulatory reporting. The dataset estate numbers over 180,000 cataloged datasets, growing by roughly 6,000 GDG generations per nightly batch cycle.

Three years ago, CNB's dataset management was a mess. Kwame Mensah, then newly promoted to chief mainframe architect, inherited a system where:

  • Dataset naming followed no consistent convention — some used account codes, others used project names, some used programmer initials
  • SMS classes had been defined once during the DFSMS migration in 2004 and never updated
  • Eleven different BLKSIZE values were in use for FB-80 datasets
  • GDG limits ranged from 5 to 255 with no documented rationale
  • Catalog contention during the batch window added roughly 15 minutes of cumulative allocation delay to each night's cycle

"The first thing I did was inventory the damage," Kwame says. "I pulled SMF records for three months. We had 40,000 VSAM datasets with SHAREOPTIONS(3 3). We had GDGs with EMPTY that no one realized were ticking time bombs. We had datasets on volume pools that hadn't been defragged since the Bush administration. The first Bush."

The Naming Convention Redesign

Kwame's first initiative was a comprehensive naming convention. His design principles:

  1. Self-documenting: Any operator, any DBA, any auditor should understand a dataset's purpose from its name alone
  2. Security-friendly: RACF rules should map cleanly to naming patterns
  3. SMS-friendly: ACS routines should be able to assign classes based on name qualifiers
  4. Future-proof: The convention should accommodate new applications and environments without restructuring

The convention:

<org>.<env>.<app>.<type>.<subtype>.<descriptive>

Examples:
CNB.PROD.ACCTS.VSAM.KSDS.CUSTMAST    Customer master KSDS
CNB.PROD.ACCTS.VSAM.ESDS.TXNJOURN    Transaction journal ESDS
CNB.BATCH.GDG.DAILY.TXNEXT           Daily transaction extract GDG
CNB.BATCH.GDG.WEEKLY.RECONSUM        Weekly reconciliation summary
CNB.PROD.CICS.VSAM.KSDS.TRANTBL      CICS transaction table
CNB.PROD.DB2.TS.ACCTDB.ACCTMAST      DB2 tablespace
CNB.PROD.REG.SEQ.MONTHLY.SARFILE     SAR regulatory report
CNB.TEST.ACCTS.VSAM.KSDS.CUSTMAST    Test customer master
CNB.DEV.ACCTS.VSAM.KSDS.CUSTMAST     Dev customer master

(Batch-created datasets are the one deliberate exception to the pattern: they carry BATCH in the environment position and drop the application qualifier, for reasons explained under Catalog Architecture.)

Qualifiers explained:

Position  Qualifier     Values                                      Purpose
1         Organization  CNB                                         RACF top-level, catalog alias root
2         Environment   PROD, TEST, DEV, UAT                        Catalog separation, security boundary
3         Application   ACCTS, LOANS, TREAS, BATCH, CICS, DB2, REG  Application ownership, ACS routing
4         Type          VSAM, SEQ, GDG, PDS, PDSE, TS, IX           Data organization, data class assignment
5         Subtype       KSDS, ESDS, RRDS, DAILY, WEEKLY, MONTHLY    Further classification
6         Descriptive   Free-form                                   Human-readable identifier
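The convention's mechanical rules are easy to check automatically: each qualifier is 1 to 8 characters starting with a letter or national character, the full name is at most 44 characters, and the first two qualifiers must come from the approved lists. The sketch below is a hypothetical Python validator for illustration only (CNB actually enforces the standard through RACF, as described under Lessons Learned); `ENVS` and `valid_name` are names invented here.

```python
import re

# z/OS dataset-name rules: each qualifier is 1-8 characters, first character
# alphabetic or national (@ # $), total name length at most 44 characters.
QUALIFIER = re.compile(r"^[A-Z@#$][A-Z0-9@#$]{0,7}$")

# Second-qualifier values in CNB's convention (BATCH is the deliberate
# exception that routes batch-created datasets to their own catalog).
ENVS = {"PROD", "TEST", "DEV", "UAT", "BATCH"}

def valid_name(dsn: str) -> bool:
    """Check a dataset name against the convention's mechanical rules."""
    if len(dsn) > 44:
        return False
    quals = dsn.split(".")
    if len(quals) < 3:                       # need at least org.env.something
        return False
    if not all(QUALIFIER.match(q) for q in quals):
        return False
    return quals[0] == "CNB" and quals[1] in ENVS
```

For example, valid_name("CNB.PROD.ACCTS.VSAM.KSDS.CUSTMAST") passes, while a legacy name like "PAYROLL.JSMITH.TEMP" fails on the organization qualifier.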

Catalog Architecture

Kwame redesigned the catalog hierarchy to match the naming convention:

MASTER CATALOG: MCAT.PROD.VCNB01
  │
  ├── Alias CNB.PROD    → UCAT.CNB.PROD    (production online)
  ├── Alias CNB.TEST    → UCAT.CNB.TEST    (test)
  ├── Alias CNB.DEV     → UCAT.CNB.DEV     (development)
  ├── Alias CNB.UAT     → UCAT.CNB.UAT     (user acceptance)
  └── Alias CNB.BATCH   → UCAT.CNB.BATCH   (production batch - separate!)

Wait: why would a dataset named CNB.PROD.BATCH.GDG.DAILY.TXNEXT be cataloged in UCAT.CNB.PROD rather than UCAT.CNB.BATCH? Because multi-level alias matching uses the longest matching alias, and for any name beginning CNB.PROD.*, the alias CNB.PROD wins. That is exactly why batch datasets are named CNB.BATCH.* rather than CNB.PROD.BATCH.*: they match the CNB.BATCH alias and land in the batch catalog.

This is a critical distinction. Kwame deliberately put batch-created datasets under the CNB.BATCH high-level qualifier (not CNB.PROD.BATCH) so they would route to the batch-specific catalog. "The nightly cycle creates and deletes 6,000 datasets. That's 12,000 catalog updates in four hours. I don't want that serialization anywhere near the catalog that CICS uses to open files."

Lisa Tran pushed back initially: "I wanted all production datasets in one catalog for simplicity. But Kwame showed me the SMF data — catalog serialization was adding 200 milliseconds to every dataset open in the CICS region during batch. For online transactions, 200 milliseconds is eternity."

SMS Class Configuration

Storage Classes:

Class       Availability          Accessibility      I/O Priority  Guaranteed Space  Target
SCPRODHI    Continuous            Continuous Update  High          Yes               CICS regions, DB2
SCPRODONL   Continuous Preferred  Continuous Read    Standard      Yes               Online VSAM
SCBATCHCR   Standard              Standard           Standard      Yes               Critical batch
SCBATCHSTD  Standard              Standard           Low           No                Standard batch work
SCTEST      Standard              Standard           Low           No                Test/Dev
SCARCHIVE   Standard              Standard           Low           No                Archive/regulatory

Key design decisions:

  • SCPRODHI gets Continuous availability: If the primary volume is unavailable, z/OS automatically fails over to a duplex copy. This is essential for CICS — a volume failure shouldn't take down online banking.
  • SCBATCHCR vs. SCBATCHSTD: Not all batch is equal. The transaction extract and GL reconciliation are critical (if they fail, the bank doesn't balance). Report generation is standard (if it fails, someone reads yesterday's report). Different storage classes mean different WLM treatment.
  • Guaranteed Space on critical classes: Ensures space is reserved for critical datasets even if the volume is heavily utilized. Batch work datasets don't get this guarantee — they can wait for space constraint relief.

Management Classes:

Class         Backup Freq  Versions  Migrate ML1  Migrate ML2  Expire           Target
MCPRODCRIT    Daily        5         Never        Never        Never            Online master files
MCPRODSTD     Daily        3         60 days      120 days     Never            Standard production
MCBATCHGDG    Per cycle    2         7 days       30 days      Per GDG LIMIT    GDG generations
MCBATCHWORK   Weekly       1         1 day        3 days       3 days           Temporary batch work
MCREGULATORY  Daily        5         365 days     Never*       2555 days (7yr)  Regulatory/compliance
MCTEST        Never        0         1 day        7 days       30 days          Test datasets

*Regulatory datasets never migrate to ML2 tape: CNB's compliance officers require all regulatory data to stay on DASD (primary volumes or ML1) for rapid auditor access.
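To make the retention columns concrete, here is a small Python sketch that turns two of the rows above into calendar dates. It is a hypothetical helper for illustration, not a DFSMShsm interface; `MGMT_CLASSES` and `lifecycle` are names invented here.

```python
from datetime import date, timedelta

# Two rows from the management class table above: days until ML1 migration,
# ML2 migration, and expiration (None = never).
MGMT_CLASSES = {
    "MCBATCHWORK":  {"ml1": 1,   "ml2": 3,    "expire": 3},
    "MCREGULATORY": {"ml1": 365, "ml2": None, "expire": 2555},  # 7 x 365 days
}

def lifecycle(mclass: str, created: date) -> dict:
    """Project the date on which each lifecycle action becomes eligible."""
    spec = MGMT_CLASSES[mclass]
    return {event: created + timedelta(days=days) if days is not None else None
            for event, days in spec.items()}
```

For a regulatory dataset created on 2026-01-01, lifecycle("MCREGULATORY", ...) puts expiration about seven years out and leaves "ml2" as None: the data never leaves DASD.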

Data Classes:

Class      RECFM  LRECL  BLKSIZE  CI Size  Free Space  SPACE         Target
DCFB80     FB     80     27920    -        -           CYL(10,10)    JCL, copybooks
DCFB200    FB     200    27800    -        -           CYL(50,10)    Extracts, reports
DCVB       VB     32756  32760    -        -           CYL(20,5)     Variable-length
DCKSDS     -      -      -        4096     10/15       CYL(100,20)   Standard KSDS
DCKSDSHI   -      -      -        4096     10/20       CYL(200,50)   High-volume KSDS
DCKSDSLG   -      -      -        8192     5/10        CYL(500,100)  Large-record KSDS
DCESDS     -      -      -        4096     0/0         CYL(100,20)   ESDS (journals)
DCPRINT    FBA    133    27930    -        -           CYL(5,5)      Print output
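The fixed-block BLKSIZE values in the table are all half-track blocked for 3390 DASD: the largest multiple of the LRECL that still allows two blocks per track (at most 27,998 bytes per block). The sketch below checks that arithmetic; `half_track_blksize` is an illustrative helper, and in practice system-determined blocksize (BLKSIZE omitted or 0) does this for you.

```python
HALF_TRACK_3390 = 27998   # largest block that still fits two per 3390 track

def half_track_blksize(lrecl: int) -> int:
    """Largest multiple of LRECL not exceeding half a 3390 track."""
    return (HALF_TRACK_3390 // lrecl) * lrecl
```

This reproduces the table: half_track_blksize(80) gives 27920 (DCFB80), 200 gives 27800 (DCFB200), and 133 gives 27930 (DCPRINT). DCVB instead uses the 32,760-byte maximum, large enough for one maximum-length record (LRECL 32756) plus the 4-byte block descriptor word.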

ACS Routine Design

Kwame's ACS routines implement a tiered decision tree. Here is the storage class ACS routine (simplified from the production version, which handles 47 edge cases):

PROC STORCLAS

  /* ---- PRODUCTION ONLINE ---- */
  FILTLIST ONLINE_HI INCLUDE(
    'CNB.PROD.CICS.**',
    'CNB.PROD.DB2.**')

  FILTLIST ONLINE_STD INCLUDE(
    'CNB.PROD.ACCTS.VSAM.**',
    'CNB.PROD.LOANS.VSAM.**',
    'CNB.PROD.TREAS.VSAM.**')

  /* ---- BATCH ---- */
  FILTLIST BATCH_CRIT INCLUDE(
    'CNB.BATCH.GDG.DAILY.TXNEXT.*',   /* trailing .* matches the       */
    'CNB.BATCH.GDG.DAILY.GLRECON.*',  /* GnnnnVnn generation qualifier */
    'CNB.BATCH.SEQ.RECONBRIDGE')

  FILTLIST BATCH_STD INCLUDE(
    'CNB.BATCH.**')

  /* ---- ARCHIVE/REGULATORY ---- */
  FILTLIST ARCHIVE INCLUDE(
    'CNB.PROD.REG.**',
    'CNB.ARCHIVE.**')

  /* ---- NON-PRODUCTION ---- */
  FILTLIST NONPROD INCLUDE(
    'CNB.TEST.**',
    'CNB.DEV.**',
    'CNB.UAT.**')

  SELECT
    WHEN (&DSN = &ONLINE_HI)   SET &STORCLAS = 'SCPRODHI'
    WHEN (&DSN = &ONLINE_STD)  SET &STORCLAS = 'SCPRODONL'
    WHEN (&DSN = &BATCH_CRIT)  SET &STORCLAS = 'SCBATCHCR'
    WHEN (&DSN = &BATCH_STD)   SET &STORCLAS = 'SCBATCHSTD'
    WHEN (&DSN = &ARCHIVE)     SET &STORCLAS = 'SCARCHIVE'
    WHEN (&DSN = &NONPROD)     SET &STORCLAS = 'SCTEST'
    OTHERWISE                  SET &STORCLAS = 'SCBATCHSTD'
  END

END
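The routine's behavior is first-match-wins down the SELECT, with FILTLIST masks in which `.**` matches any number of trailing qualifiers. The Python sketch below simulates that logic with deliberately simplified mask semantics (real ACS masks have more cases); `assign_storclas` and `mask_to_regex` are names invented here, and the BATCH_CRIT GDG masks carry a trailing `.*` so the GnnnnVnn generation qualifier also matches.

```python
import re

def mask_to_regex(mask: str) -> re.Pattern:
    """Simplified ACS mask: '.**' = zero or more trailing qualifiers,
    '.*' = exactly one more qualifier; everything else is literal."""
    pat = re.escape(mask)
    pat = pat.replace(r"\.\*\*", r"(\..+)?")
    pat = pat.replace(r"\.\*", r"\.[^.]+")
    return re.compile(f"^{pat}$")

# FILTLIST/SELECT pairs in the same order as the routine above.
RULES = [
    (["CNB.PROD.CICS.**", "CNB.PROD.DB2.**"],                   "SCPRODHI"),
    (["CNB.PROD.ACCTS.VSAM.**", "CNB.PROD.LOANS.VSAM.**",
      "CNB.PROD.TREAS.VSAM.**"],                                "SCPRODONL"),
    (["CNB.BATCH.GDG.DAILY.TXNEXT.*",
      "CNB.BATCH.GDG.DAILY.GLRECON.*",
      "CNB.BATCH.SEQ.RECONBRIDGE"],                             "SCBATCHCR"),
    (["CNB.BATCH.**"],                                          "SCBATCHSTD"),
    (["CNB.PROD.REG.**", "CNB.ARCHIVE.**"],                     "SCARCHIVE"),
    (["CNB.TEST.**", "CNB.DEV.**", "CNB.UAT.**"],               "SCTEST"),
]

def assign_storclas(dsn: str) -> str:
    """First matching FILTLIST wins, like the SELECT/WHEN chain."""
    for masks, storclas in RULES:
        if any(mask_to_regex(m).match(dsn) for m in masks):
            return storclas
    return "SCBATCHSTD"   # OTHERWISE
```

Ordering matters: a generation of the critical extract GDG hits the BATCH_CRIT rule before the broader CNB.BATCH.** mask can claim it, and anything outside the convention falls through to the OTHERWISE default.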

Results After Three Years

Kwame tracks six metrics to measure the health of the dataset estate:

Metric                          Before (2023)  After (2026)  Change
Avg batch job allocation time   4.2 sec        0.8 sec       -81%
Catalog-related abends/month    12             0.3           -97.5%
VSAM datasets with SHROPT(3,3)  40,000         0             -100%
Avg VSAM extent count           23             4.2           -82%
Batch window duration           5.8 hrs        3.9 hrs       -33%
Storage utilization efficiency  61%            89%           +46%

"The 33% batch window improvement came from three things roughly equally," Kwame explains. "One-third from fixing block sizes, one-third from eliminating catalog contention, and one-third from proper VSAM tuning — buffer settings, free space, and regular reorgs. None of it was rocket science. It was just disciplined engineering that had been neglected for twenty years."

Lessons Learned

  1. Naming conventions must be enforced, not suggested. Kwame implemented RACF rules that reject dataset creation outside the naming standard. "If it doesn't match the convention, the allocation fails. Period. We got pushback for two months. Then it became normal."

  2. Separate catalogs for separate workloads. The batch/online catalog split was the single biggest performance win. "We didn't change a single line of COBOL or a single VSAM definition. We just moved the catalog entries and changed the aliases."

  3. SMS classes need regular review. "The classes we defined in 2023 were wrong by 2025 because our workload changed. Now we review SMS classes quarterly with the application teams."

  4. Data classes prevent drift. By encoding BLKSIZE, CI size, and free-space percentages in data classes — and having ACS routines assign the right data class — individual JCL coders can't override standards. "The data class is the standard. If you want an exception, you write an Architecture Decision Record and it goes through review."

  5. Monitor everything. Rob Calloway's daily capacity report and weekly VSAM health report are what keep the environment stable. "You can design the most beautiful storage architecture in the world, but without monitoring, it degrades in three months."

Discussion Questions

  1. Kwame chose to separate the batch catalog from the production online catalog by using different high-level qualifiers (CNB.PROD. vs. CNB.BATCH.). An alternative would be to keep all production datasets under CNB.PROD.* but use different user catalogs with multi-level aliases (CNB.PROD.BATCH → UCAT.BATCH, CNB.PROD.ONLINE → UCAT.ONLINE). What are the trade-offs of each approach?

  2. CNB's data class DCKSDS specifies CI=4096 and FREESPACE(10 15). But their high-volume KSDS data class DCKSDSHI uses the same CI size with FREESPACE(10 20). Why the different CA free space? What access pattern differences justify this?

  3. The management class MCREGULATORY specifies "Never migrate to ML2." This means 7 years of regulatory data stays on primary DASD. At CNB's scale, this is expensive. What alternative approaches could reduce cost while still meeting the regulatory requirement for rapid auditor access?

  4. Kwame enforces naming conventions through RACF rejection. This is an operational overhead — every new dataset type requires a RACF rule update. What's the alternative, and why did Kwame choose enforcement over education?

  5. The 33% batch window improvement came from three roughly equal sources. If Kwame had time to implement only one of the three (block sizes, catalog separation, or VSAM tuning), which would you prioritize and why?