Case Study 1: Continental National Bank's Dataset Management Strategy
Naming Conventions, SMS Classes, and Storage Architecture for 500 Million Transactions Per Day
Background
Continental National Bank (CNB) processes over 500 million transactions per day across four LPARs in a Parallel Sysplex configuration. Their mainframe environment supports retail banking, commercial banking, treasury services, and regulatory reporting. The dataset estate numbers over 180,000 cataloged datasets, growing by roughly 6,000 GDG generations per nightly batch cycle.
Three years ago, CNB's dataset management was a mess. Kwame Mensah, then newly promoted to chief mainframe architect, inherited a system where:
- Dataset naming followed no consistent convention — some used account codes, others used project names, some used programmer initials
- SMS classes had been defined once during the DFSMS migration in 2004 and never updated
- Eleven different BLKSIZE values were in use for FB-80 datasets
- GDG limits ranged from 5 to 255 with no documented rationale
- Catalog contention during the batch window added up to 15 minutes of cumulative allocation delay to heavily allocated jobs
"The first thing I did was inventory the damage," Kwame says. "I pulled SMF records for three months. We had 40,000 VSAM datasets with SHAREOPTIONS(3 3). We had GDGs defined with EMPTY that no one realized were ticking time bombs — exceed the limit and every generation rolls off at once. We had datasets on volume pools that hadn't been defragged since the Bush administration. The first Bush."
The Naming Convention Redesign
Kwame's first initiative was a comprehensive naming convention. His design principles:
- Self-documenting: Any operator, any DBA, any auditor should understand a dataset's purpose from its name alone
- Security-friendly: RACF rules should map cleanly to naming patterns
- SMS-friendly: ACS routines should be able to assign classes based on name qualifiers
- Future-proof: The convention should accommodate new applications and environments without restructuring
The convention:
<org>.<env>.<app>.<type>.<subtype>.<descriptive>
Examples:
CNB.PROD.ACCTS.VSAM.KSDS.CUSTMAST Customer master KSDS
CNB.PROD.ACCTS.VSAM.ESDS.TXNJOURN Transaction journal ESDS
CNB.BATCH.GDG.DAILY.TXNEXT Daily transaction extract GDG (note the CNB.BATCH HLQ; see Catalog Architecture)
CNB.BATCH.GDG.WEEKLY.RECONSUM Weekly reconciliation summary
CNB.PROD.CICS.VSAM.KSDS.TRANTBL CICS transaction table
CNB.PROD.DB2.TS.ACCTDB.ACCTMAST DB2 tablespace
CNB.PROD.REG.SEQ.MONTHLY.SARFILE SAR regulatory report
CNB.TEST.ACCTS.VSAM.KSDS.CUSTMAST Test customer master
CNB.DEV.ACCTS.VSAM.KSDS.CUSTMAST Dev customer master
Qualifiers explained:
| Position | Qualifier | Values | Purpose |
|---|---|---|---|
| 1 | Organization | CNB | RACF top-level, catalog alias root |
| 2 | Environment | PROD, TEST, DEV, UAT | Catalog separation, security boundary |
| 3 | Application | ACCTS, LOANS, TREAS, BATCH, CICS, DB2, REG | Application ownership, ACS routing |
| 4 | Type | VSAM, SEQ, GDG, PDS, PDSE, TS, IX | Data organization, data class assignment |
| 5 | Subtype | KSDS, ESDS, RRDS, DAILY, WEEKLY, MONTHLY | Further classification |
| 6 | Descriptive | Free-form | Human-readable identifier |
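A convention this regular can be checked mechanically. Here is a minimal sketch in Python — the function name and value lists are illustrative (taken from the qualifier table above), not part of CNB's actual tooling, and it covers only the standard six-qualifier form, ignoring the CNB.BATCH exception described under Catalog Architecture:

```python
import re

# Allowed values per qualifier position, from the table above (illustrative).
VALID = {
    "env":  {"PROD", "TEST", "DEV", "UAT"},
    "app":  {"ACCTS", "LOANS", "TREAS", "BATCH", "CICS", "DB2", "REG"},
    "type": {"VSAM", "SEQ", "GDG", "PDS", "PDSE", "TS", "IX"},
}

# Each qualifier: 1-8 chars, alphabetic or national first, alphanumeric after.
QUAL = re.compile(r"^[A-Z@#$][A-Z0-9@#$]{0,7}$")

def conforms(dsn: str) -> bool:
    """True if dsn matches <org>.<env>.<app>.<type>.<subtype>.<descriptive>."""
    parts = dsn.split(".")
    if len(parts) != 6 or not all(QUAL.match(p) for p in parts):
        return False
    org, env, app, dtype, _subtype, _desc = parts
    return (org == "CNB"
            and env in VALID["env"]
            and app in VALID["app"]
            and dtype in VALID["type"])

print(conforms("CNB.PROD.ACCTS.VSAM.KSDS.CUSTMAST"))  # True
print(conforms("CNB.PROD.PAYROLL.STUFF"))             # False
```

In practice CNB enforces the convention with RACF rules at allocation time rather than a script, but the check is the same shape: position, vocabulary, and qualifier syntax.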
Catalog Architecture
Kwame redesigned the catalog hierarchy to match the naming convention:
MASTER CATALOG: MCAT.PROD.VCNB01
│
├── Alias CNB.PROD → UCAT.CNB.PROD (production online)
├── Alias CNB.TEST → UCAT.CNB.TEST (test)
├── Alias CNB.DEV → UCAT.CNB.DEV (development)
├── Alias CNB.UAT → UCAT.CNB.UAT (user acceptance)
└── Alias CNB.BATCH → UCAT.CNB.BATCH (production batch - separate!)
A subtle point: a dataset named CNB.PROD.BATCH.GDG.DAILY.TXNEXT would land in UCAT.CNB.PROD, not UCAT.CNB.BATCH. With the multilevel alias facility, the system routes a dataset to the catalog whose alias matches the most leading qualifiers of the name, and CNB.PROD matches the first two qualifiers of anything beginning CNB.PROD, regardless of what follows. To reach the batch catalog, batch datasets must begin with CNB.BATCH, which is exactly how CNB names them.
This is a critical distinction. Kwame deliberately put batch-created datasets under the CNB.BATCH high-level qualifier (not CNB.PROD.BATCH) so they would route to the batch-specific catalog. "The nightly cycle creates and deletes 6,000 datasets. That's 12,000 catalog updates in four hours. I don't want that serialization anywhere near the catalog that CICS uses to open files."
Lisa Tran pushed back initially: "I wanted all production datasets in one catalog for simplicity. But Kwame showed me the SMF data — catalog serialization was adding 200 milliseconds to every dataset open in the CICS region during batch. For online transactions, 200 milliseconds is eternity."
SMS Class Configuration
Storage Classes:
| Class | Availability | Accessibility | I/O Priority | Guaranteed Space | Target |
|---|---|---|---|---|---|
| SCPRODHI | Continuous | Continuous Update | High | Yes | CICS regions, DB2 |
| SCPRODONL | Continuous Preferred | Continuous Read | Standard | Yes | Online VSAM |
| SCBATCHCR | Standard | Standard | Standard | Yes | Critical batch |
| SCBATCHSTD | Standard | Standard | Low | No | Standard batch work |
| SCTEST | Standard | Standard | Low | No | Test/Dev |
| SCARCHIVE | Standard | Standard | Low | No | Archive/regulatory |
Key design decisions:
- SCPRODHI gets Continuous availability: If the primary volume is unavailable, z/OS automatically fails over to a duplex copy. This is essential for CICS — a volume failure shouldn't take down online banking.
- SCBATCHCR vs. SCBATCHSTD: Not all batch is equal. The transaction extract and GL reconciliation are critical (if they fail, the bank doesn't balance). Report generation is standard (if it fails, someone reads yesterday's report). Different storage classes mean different WLM treatment.
- Guaranteed Space on critical classes: Ensures space is reserved for critical datasets even if the volume is heavily utilized. Batch work datasets don't get this guarantee — they can wait for space constraint relief.
Management Classes:
| Class | Backup Freq | Versions | Migrate ML1 | Migrate ML2 | Expire | Target |
|---|---|---|---|---|---|---|
| MCPRODCRIT | Daily | 5 | Never | Never | Never | Online master files |
| MCPRODSTD | Daily | 3 | 60 days | 120 days | Never | Standard production |
| MCBATCHGDG | Per cycle | 2 | 7 days | 30 days | Per GDG LIMIT | GDG generations |
| MCBATCHWORK | Weekly | 1 | 1 day | 3 days | 3 days | Temporary batch work |
| MCREGULATORY | Daily | 5 | 365 days | Never* | 2555 days (7yr) | Regulatory/compliance |
| MCTEST | Never | 0 | 1 day | 7 days | 30 days | Test datasets |
*Regulatory datasets: "Never migrate to ML2" — CNB's compliance policy requires all regulatory data to remain on DASD (primary or ML1) for rapid auditor access; recalling from ML2 tape would be too slow for an audit request.
Data Classes:
| Class | RECFM | LRECL | BLKSIZE | CI Size | Free Space | SPACE | Target |
|---|---|---|---|---|---|---|---|
| DCFB80 | FB | 80 | 27920 | — | — | CYL(10,10) | JCL, copybooks |
| DCFB200 | FB | 200 | 27800 | — | — | CYL(50,10) | Extracts, reports |
| DCVB | VB | 32756 | 32760 | — | — | CYL(20,5) | Variable-length |
| DCKSDS | — | — | — | 4096 | 10/15 | CYL(100,20) | Standard KSDS |
| DCKSDSHI | — | — | — | 4096 | 10/20 | CYL(200,50) | High-volume KSDS |
| DCKSDSLG | — | — | — | 8192 | 5/10 | CYL(500,100) | Large-record KSDS |
| DCESDS | — | — | — | 4096 | 0/0 | CYL(100,20) | ESDS (journals) |
| DCPRINT | FBA | 133 | 27930 | — | — | CYL(5,5) | Print output |
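The fixed-block BLKSIZE values in the table all follow one rule: half-track blocking on 3390 geometry, i.e. the largest multiple of LRECL that fits in the 27,998 bytes usable per half track, so exactly two blocks fit on each track. A quick sketch (the helper name is invented):

```python
# Largest block <= 27,998 bytes gets two blocks per 3390 track.
HALF_TRACK = 27998

def half_track_blksize(lrecl: int) -> int:
    """Largest multiple of lrecl that fits in half a 3390 track."""
    return (HALF_TRACK // lrecl) * lrecl

for lrecl in (80, 133, 200):
    print(lrecl, half_track_blksize(lrecl))
# 80 -> 27920 (DCFB80), 133 -> 27930 (DCPRINT), 200 -> 27800 (DCFB200)
```

This is the same arithmetic system-determined blocksize performs, which is why encoding it once in a data class ends the "eleven different BLKSIZEs for FB-80" problem from the inventory.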
ACS Routine Design
Kwame's ACS routines implement a tiered decision tree. Here is the storage class ACS routine (simplified from the production version, which handles 47 edge cases):
PROC STORCLAS

  /* ---- PRODUCTION ONLINE ---- */
  FILTLIST ONLINE_HI  INCLUDE('CNB.PROD.CICS.**',
                              'CNB.PROD.DB2.**')
  FILTLIST ONLINE_STD INCLUDE('CNB.PROD.ACCTS.VSAM.**',
                              'CNB.PROD.LOANS.VSAM.**',
                              'CNB.PROD.TREAS.VSAM.**')

  /* ---- BATCH ---- */
  FILTLIST BATCH_CRIT INCLUDE('CNB.BATCH.GDG.DAILY.TXNEXT',
                              'CNB.BATCH.GDG.DAILY.GLRECON',
                              'CNB.BATCH.SEQ.RECONBRIDGE')
  FILTLIST BATCH_STD  INCLUDE('CNB.BATCH.**')

  /* ---- ARCHIVE/REGULATORY ---- */
  FILTLIST ARCHIVE    INCLUDE('CNB.PROD.REG.**',
                              'CNB.ARCHIVE.**')

  /* ---- NON-PRODUCTION ---- */
  FILTLIST NONPROD    INCLUDE('CNB.TEST.**',
                              'CNB.DEV.**',
                              'CNB.UAT.**')

  SELECT
    WHEN (&DSN = &ONLINE_HI)  SET &STORCLAS = 'SCPRODHI'
    WHEN (&DSN = &ONLINE_STD) SET &STORCLAS = 'SCPRODONL'
    WHEN (&DSN = &BATCH_CRIT) SET &STORCLAS = 'SCBATCHCR'
    WHEN (&DSN = &BATCH_STD)  SET &STORCLAS = 'SCBATCHSTD'
    WHEN (&DSN = &ARCHIVE)    SET &STORCLAS = 'SCARCHIVE'
    WHEN (&DSN = &NONPROD)    SET &STORCLAS = 'SCTEST'
    OTHERWISE                 SET &STORCLAS = 'SCBATCHSTD'
  END
END
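The SELECT is evaluated top to bottom and the first matching WHEN wins, so ordering carries real meaning: CNB.BATCH.** would also match the three critical batch datasets, which is why BATCH_CRIT must be tested before BATCH_STD. A toy Python model of the FILTLIST matching makes this concrete (the mask handling is deliberately simplified — in real ACS masks, '**' also matches zero qualifiers):

```python
import re

def matches(dsn: str, mask: str) -> bool:
    """Match a dataset name against a simplified ACS-style filter mask."""
    pieces = []
    for part in mask.split("."):
        if part == "**":
            pieces.append(r".+")      # one or more qualifiers (simplified)
        elif part == "*":
            pieces.append(r"[^.]+")   # exactly one qualifier
        else:
            pieces.append(re.escape(part))
    return re.fullmatch(r"\.".join(pieces), dsn) is not None

# First match wins, mirroring the WHEN order; the last entry is OTHERWISE.
RULES = [
    ({"CNB.BATCH.GDG.DAILY.TXNEXT",
      "CNB.BATCH.GDG.DAILY.GLRECON",
      "CNB.BATCH.SEQ.RECONBRIDGE"}, "SCBATCHCR"),
    ({"CNB.BATCH.**"}, "SCBATCHSTD"),
]

def storclas(dsn: str) -> str:
    for masks, sc in RULES:
        if any(matches(dsn, m) for m in masks):
            return sc
    return "SCBATCHSTD"  # OTHERWISE

print(storclas("CNB.BATCH.GDG.DAILY.TXNEXT"))   # SCBATCHCR
print(storclas("CNB.BATCH.GDG.DAILY.RPTFILE"))  # SCBATCHSTD
```

Swap the two rules and every critical batch dataset silently drops to SCBATCHSTD — the kind of regression Kwame's 47-edge-case production routine has to guard against.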
Results After Three Years
Kwame tracks six metrics to measure the health of the dataset estate:
| Metric | Before (2023) | After (2026) | Change |
|---|---|---|---|
| Avg batch job allocation time | 4.2 sec | 0.8 sec | -81% |
| Catalog-related abends/month | 12 | 0.3 | -97.5% |
| VSAM datasets with SHROPT(3,3) | 40,000 | 0 | -100% |
| Avg VSAM extent count | 23 | 4.2 | -82% |
| Batch window duration | 5.8 hrs | 3.9 hrs | -33% |
| Storage utilization efficiency | 61% | 89% | +46% |
"The 33% batch window improvement came from three things roughly equally," Kwame explains. "One-third from fixing block sizes, one-third from eliminating catalog contention, and one-third from proper VSAM tuning — buffer settings, free space, and regular reorgs. None of it was rocket science. It was just disciplined engineering that had been neglected for twenty years."
Lessons Learned
- Naming conventions must be enforced, not suggested. Kwame implemented RACF rules that reject dataset creation outside the naming standard. "If it doesn't match the convention, the allocation fails. Period. We got pushback for two months. Then it became normal."
- Separate catalogs for separate workloads. The batch/online catalog split was the single biggest performance win. "We didn't change a single line of COBOL or a single VSAM definition. We just moved the catalog entries and changed the aliases."
- SMS classes need regular review. "The classes we defined in 2023 were wrong by 2025 because our workload changed. Now we review SMS classes quarterly with the application teams."
- Data classes prevent drift. By encoding BLKSIZE, CI size, and free-space percentages in data classes — and having ACS routines assign the right data class — individual JCL coders can't override standards. "The data class is the standard. If you want an exception, you write an Architecture Decision Record and it goes through review."
- Monitor everything. Rob Calloway's daily capacity report and weekly VSAM health report are what keep the environment stable. "You can design the most beautiful storage architecture in the world, but without monitoring, it degrades in three months."
Discussion Questions
- Kwame chose to separate the batch catalog from the production online catalog by using different high-level qualifiers (CNB.PROD. vs. CNB.BATCH.). An alternative would be to keep all production datasets under CNB.PROD.* but use different user catalogs with multi-level aliases (CNB.PROD.BATCH → UCAT.BATCH, CNB.PROD.ONLINE → UCAT.ONLINE). What are the trade-offs of each approach?
- CNB's data class DCKSDS specifies CI=4096 and FREESPACE(10 15). But their high-volume KSDS data class DCKSDSHI uses the same CI size with FREESPACE(10 20). Why the different CA free space? What access pattern differences justify this?
- The management class MCREGULATORY specifies "Never migrate to ML2." This means 7 years of regulatory data stays on primary DASD. At CNB's scale, this is expensive. What alternative approaches could reduce cost while still meeting the regulatory requirement for rapid auditor access?
- Kwame enforces naming conventions through RACF rejection. This is an operational overhead — every new dataset type requires a RACF rule update. What's the alternative, and why did Kwame choose enforcement over education?
- The 33% batch window improvement came from three roughly equal sources. If Kwame had time to implement only one of the three (block sizes, catalog separation, or VSAM tuning), which would you prioritize and why?