Case Study 1: Designing the Dataset Architecture for a Banking System

Background

Pacific Mutual Savings (PMS) is a regional bank with 1.2 million customer accounts, 320 branches, and a mainframe-based core banking system that has been in continuous operation since 1988. Over 36 years, the dataset architecture has grown organically -- developers created datasets with ad hoc names, inconsistent organizations, and default space allocations. A recent audit identified 14,700 cataloged datasets related to the banking application, of which nearly 4,000 were duplicates, orphans, or datasets with incorrect naming that no job could reliably reference.

The symptoms of this architectural debt are severe:

  • Batch job failures: Jobs fail with dataset-not-found errors an average of 12 times per week, each requiring manual intervention by on-call operators
  • Storage waste: Over 800 GB of DASD is consumed by datasets that no active job references
  • Compliance risk: Auditors cannot determine which datasets contain sensitive customer data because the naming convention provides no classification signal
  • Onboarding difficulty: New developers require weeks to understand the dataset landscape before they can write or modify JCL

The bank's CTO has approved a dataset architecture redesign project. The goal is to establish enterprise naming conventions, select optimal dataset organizations for each data category, calculate appropriate space allocations, and assign SMS (Storage Management Subsystem) classes that align with data lifecycle and performance requirements.

This case study walks through the complete redesign process, from naming conventions through SMS class assignments.


Phase 1: Enterprise Naming Convention

The team established a hierarchical naming convention that encodes essential metadata directly in the dataset name. Every dataset name follows this pattern:

HLQ.APP.ENV.TYPE.OBJECT.QUALIFIER

Where:

  • HLQ (High-Level Qualifier): Organization code, always PMS for Pacific Mutual Savings
  • APP: Application identifier (2-4 characters)
  • ENV: Environment code
  • TYPE: Data classification
  • OBJECT: Business object name
  • QUALIFIER: Optional additional descriptor

Defined Values

Qualifier  Valid Values                                              Description
APP        CORE, LOAN, CARD, WIRE, GL, RPT                           Application area
ENV        PROD, UAT, SIT, DEV, MIGR                                 Environment
TYPE       MAST, TRAN, WORK, CTRL, REPT, COPY, JCL, LOAD, PROC, LOG  Data type
OBJECT     ACCT, CUST, BRNCH, RATE, FEE, etc.                        Business entity

Naming Convention Examples

PMS.CORE.PROD.MAST.ACCT              Account master (VSAM KSDS)
PMS.CORE.PROD.MAST.CUST              Customer master (VSAM KSDS)
PMS.CORE.PROD.TRAN.DAILY             Daily transaction file (PS)
PMS.CORE.PROD.TRAN.DAILY.G0001V00    GDG transaction archive (first generation)
PMS.CORE.PROD.WORK.SORTTEMP          Sort work file (PS, temporary)
PMS.CORE.PROD.CTRL.PARMS             Control parameter file (PS)
PMS.CORE.PROD.REPT.DAILYBAL          Daily balance report (PS)
PMS.CORE.PROD.LOG.BATCH              Batch processing log (PS)
PMS.CORE.PROD.COPY.ACCTLAY           Account copybook (PDS)
PMS.CORE.PROD.JCL.LIBRARY            JCL library (PDS)
PMS.CORE.PROD.LOAD.LIBRARY           Load module library (PDS)
PMS.CORE.PROD.PROC.LIBRARY           Cataloged procedures (PDS)
PMS.LOAN.PROD.MAST.LOANACCT          Loan account master (VSAM KSDS)
PMS.CARD.PROD.TRAN.AUTHLOG           Card authorization log (ESDS)
PMS.GL.PROD.REPT.MONTHEND            GL month-end report (PS)
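
Because the convention is positional, a dataset name can be checked and decoded mechanically. The following Python sketch is illustrative only and is not the bank's enforcement code: the function name `parse_dsn` and the returned field names are hypothetical, the allowed values are copied from the Defined Values table, and OBJECT is left open because the table ends in "etc." The qualifier pattern is a simplified form of the z/OS rules (1-8 characters, first character alphabetic or national).

```python
import re

# Allowed values copied from the "Defined Values" table.
APPS = {"CORE", "LOAN", "CARD", "WIRE", "GL", "RPT"}
ENVS = {"PROD", "UAT", "SIT", "DEV", "MIGR"}
TYPES = {"MAST", "TRAN", "WORK", "CTRL", "REPT",
         "COPY", "JCL", "LOAD", "PROC", "LOG"}

# Simplified z/OS qualifier syntax: 1-8 characters, first one alphabetic
# or national (@ # $), the rest alphanumeric or national.
QUALIFIER = re.compile(r"[A-Z@#$][A-Z0-9@#$]{0,7}")


def parse_dsn(dsn: str) -> dict:
    """Decode HLQ.APP.ENV.TYPE.OBJECT[.QUALIFIER] or raise ValueError."""
    if len(dsn) > 44:
        raise ValueError(f"{dsn}: name exceeds 44 characters")
    parts = dsn.split(".")
    if len(parts) not in (5, 6):
        raise ValueError(f"{dsn}: expected 5 or 6 qualifiers")
    for q in parts:
        if not QUALIFIER.fullmatch(q):
            raise ValueError(f"{dsn}: invalid qualifier {q!r}")
    hlq, app, env, dtype, obj = parts[:5]
    checks = [
        (hlq == "PMS", "HLQ must be PMS"),
        (app in APPS, f"unknown APP {app!r}"),
        (env in ENVS, f"unknown ENV {env!r}"),
        (dtype in TYPES, f"unknown TYPE {dtype!r}"),
    ]
    for ok, message in checks:
        if not ok:
            raise ValueError(f"{dsn}: {message}")
    return {
        "hlq": hlq, "app": app, "env": env, "type": dtype,
        "object": obj,
        "qualifier": parts[5] if len(parts) == 6 else None,
    }


# One conforming name and one legacy-style name.
for name in ("PMS.CORE.PROD.MAST.ACCT", "CORE.ACCT.MASTER.V2"):
    try:
        print(name, "->", parse_dsn(name))
    except ValueError as err:
        print(name, "-> rejected:", err)
```

The decoded TYPE and ENV fields are the "classification signal" the auditors were missing: a report over the catalog can group datasets by those fields without opening any of them.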

Naming Convention Enforcement

The team wrote an allocation-time exit routine and RACF generic dataset profiles to enforce the naming convention. The exit rejects any allocation whose name does not match the approved pattern with a descriptive error message, and the RACF profiles control who may create datasets under each prefix:

//*================================================================*
//* RACF FILTER TO ENFORCE NAMING CONVENTION
//* PERMIT ONLY DATASETS MATCHING PMS.xxxx.ENV.TYPE PATTERN
//*================================================================*
//* DEFINED IN RACF:
//*   ADDSD 'PMS.**' UACC(NONE)
//*   PERMIT 'PMS.**' ID(PMSBATCH) ACCESS(ALTER)
//*   ADDSD 'PMS.CORE.PROD.**' UACC(NONE)
//*   ADDSD 'PMS.CORE.DEV.**' UACC(READ)
//*   PERMIT 'PMS.CORE.DEV.**' ID(DEVGROUP) ACCESS(ALTER)

Phase 2: Dataset Organization Selection

The team categorized every dataset by its primary access pattern and assigned the optimal dataset organization.

Physical Sequential (PS) for Reports and Batch Files

Reports, daily transaction extracts, and work files are read and written sequentially from beginning to end. Physical sequential (PS) organization provides the best performance for this pattern because there is no index overhead and the system can use sequential buffer prefetching.

//*------------------------------------------------------------*
//* DAILY BALANCE REPORT - PHYSICAL SEQUENTIAL
//* WRITTEN BY EOD BATCH, READ BY REPORT DISTRIBUTION SYSTEM
//*------------------------------------------------------------*
//BALREPT  DD DSN=PMS.CORE.PROD.REPT.DAILYBAL,
//            DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,
//            SPACE=(CYL,(50,10),RLSE),
//            DCB=(RECFM=FBA,LRECL=133,BLKSIZE=26600),
//            STORCLAS=SCTEMP,
//            MGMTCLAS=MCREPORT,
//            DATACLAS=DCFB133

The DCB parameters deserve attention:

  • RECFM=FBA: Fixed-length blocked records with ASA carriage control characters. The A indicates that position 1 of each record contains a printer control character (space for single-space, 0 for double-space, 1 for page eject, + for overprint).
  • LRECL=133: 132 printable positions plus 1 byte for the ASA control character.
  • BLKSIZE=26600: Calculated as 200 records per block (200 x 133 = 26,600), fitting within the 27,998-byte half-track limit for 3390 DASD. Larger blocks reduce I/O operations because fewer blocks must be read to process the entire file.
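
The half-track arithmetic can be checked with a few lines of Python. This is a minimal sketch, assuming fixed-length records and the 27,998-byte half-track limit quoted above; the function name `half_track_blksize` is hypothetical. It returns the largest whole number of records that fits in a half-track block, so it gives an upper bound rather than the value any particular job must use.

```python
HALF_TRACK_3390 = 27_998  # largest block size that fits twice on a 3390 track


def half_track_blksize(lrecl: int, limit: int = HALF_TRACK_3390):
    """Return (records_per_block, blksize) for fixed-length records."""
    if not 0 < lrecl <= limit:
        raise ValueError("LRECL must be between 1 and the half-track limit")
    records = limit // lrecl          # whole records only
    return records, records * lrecl   # BLKSIZE must be a multiple of LRECL


for lrecl in (80, 133, 250):
    records, blksize = half_track_blksize(lrecl)
    print(f"LRECL={lrecl}: {records} records per block, BLKSIZE={blksize}")
```

For LRECL=133 the upper bound is 210 records (BLKSIZE=27930). The 26600 used in the DD statement holds 200 records and also fits within the limit, at a slightly lower track utilization.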

VSAM KSDS for Master Files

Master files require both random access (for online inquiries) and sequential access (for batch processing). VSAM KSDS provides both through its index structure.

//*------------------------------------------------------------*
//* ACCOUNT MASTER - VSAM KSDS
//* RANDOM ACCESS FOR CICS ONLINE, SEQUENTIAL FOR BATCH EOD
//*------------------------------------------------------------*
//* DEFINED VIA IDCAMS (HYPHEN = CONTINUATION):
//*   DEFINE CLUSTER (                 -
//*     NAME(PMS.CORE.PROD.MAST.ACCT)  -
//*     INDEXED                        -
//*     KEYS(10 0)                     -
//*     RECORDSIZE(500 500)            -
//*     FREESPACE(15 10)               -
//*     SHAREOPTIONS(2 3)              -
//*     STORCLAS(SCVSAM)               -
//*     MGMTCLAS(MCMASTER)             -
//*     DATACLAS(DCKSDS50)             -
//*   )

The master file category includes:

  • Account Master: keyed by account number
  • Customer Master: keyed by customer ID
  • Branch Master: keyed by branch code
  • Rate Table: keyed by rate type + effective date
  • Fee Schedule: keyed by fee code + account type

Each of these requires indexed access for online transaction processing and sequential access for batch reporting and maintenance.

PDS/PDSE for Copybooks, JCL, and Load Modules

Partitioned datasets store members that are individually accessed by name. The team uses PDSE (Partitioned Data Set Extended) rather than traditional PDS for all new libraries because PDSE supports dynamic member addition without periodic compression and eliminates the risk of space fragmentation.

//*------------------------------------------------------------*
//* COBOL COPYBOOK LIBRARY - PDSE
//* CONTAINS RECORD LAYOUTS SHARED ACROSS PROGRAMS
//*------------------------------------------------------------*
//COPYLIB  DD DSN=PMS.CORE.PROD.COPY.ACCTLAY,
//            DISP=SHR
//*
//* DEFINED VIA IEFBR14 OR ISPF 3.2:
//*   DSN=PMS.CORE.PROD.COPY.ACCTLAY
//*   DSORG=PO (PDSE)
//*   RECFM=FB
//*   LRECL=80
//*   BLKSIZE=27920
//*   SPACE=(TRK,(30,10,100))  <- DIRECTORY BLOCKS ARE IGNORED
//*                               FOR A PDSE (DIRECTORY GROWS)
//*   DSNTYPE=LIBRARY          <- PDSE INDICATOR
//*   STORCLAS=SCPDS
//*   MGMTCLAS=MCLIB

The library structure:

Library                      Content                   Naming
PMS.CORE.PROD.COPY.ACCTLAY   Account record copybooks  Member = copybook name
PMS.CORE.PROD.JCL.LIBRARY    Production JCL            Member = job name
PMS.CORE.PROD.LOAD.LIBRARY   Compiled load modules     Member = program name
PMS.CORE.PROD.PROC.LIBRARY   Cataloged procedures      Member = proc name

Phase 3: Space Calculations

The team developed a space calculation methodology based on projected data volumes, growth rates, and DASD geometry.

Calculation Formula

Primary allocation (cylinders) =
    (Record count x Record length) / (Bytes per cylinder)
    x (1 + Free space factor)
    x (1 + Growth factor)

Secondary allocation (cylinders) =
    Primary allocation x 0.20

For 3390 DASD:

  • Bytes per track = 56,664
  • Tracks per cylinder = 15
  • Bytes per cylinder = 849,960

Account Master KSDS Calculation

Records:           1,200,000 accounts
Record length:     500 bytes
CI size:           4,096 bytes
Records per CI:    Control information = 4-byte CIDF + two 3-byte RDFs = 10 bytes
                   (fixed-length records share one pair of RDFs)
                   Usable CI space = 4096 - 10 = 4086
                   floor(4086 / 500) = 8 records per CI
CIs needed:        1,200,000 / 8 = 150,000 CIs
                   At 15% CI free space: 150,000 / 0.85 = 176,471 CIs
Bytes for data:    176,471 x 4,096 = 722,825,216 bytes
                   At 10% CA free space: 722,825,216 / 0.90 = 803,139,129
Cylinders:         803,139,129 / 849,960 = 944.9, rounded up to 945 cylinders

Primary:           950 cylinders (945 rounded up)
Secondary:         190 cylinders (20% of primary)
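
The same estimate can be scripted so that it is easy to rerun as volumes change. The sketch below is an approximation under the assumptions used in this section, not a definitive sizing tool: the function name `ksds_cylinders` and its parameter names are hypothetical, 10 bytes of control information per CI is assumed (fixed-length records), and the full 849,960-byte cylinder capacity is used as the divisor. A production estimate would use the usable capacity for the chosen CI size, which is lower.

```python
import math

BYTES_PER_TRACK = 56_664
TRACKS_PER_CYL = 15
BYTES_PER_CYL = BYTES_PER_TRACK * TRACKS_PER_CYL  # 849,960


def ksds_cylinders(records, record_len, ci_size=4096,
                   ci_free=0.15, ca_free=0.10, ci_control_bytes=10):
    """Estimate data-component cylinders for a fixed-length KSDS."""
    per_ci = (ci_size - ci_control_bytes) // record_len  # records per CI
    if per_ci < 1:
        raise ValueError("record does not fit in one CI")
    cis = math.ceil(records / per_ci)                 # CIs with no free space
    cis = math.ceil(cis / (1 - ci_free))              # allow for CI free space
    total_bytes = cis * ci_size / (1 - ca_free)       # allow for CA free space
    return math.ceil(total_bytes / BYTES_PER_CYL)


estimate = ksds_cylinders(records=1_200_000, record_len=500)
primary = 950                       # the text rounds the estimate up to 950
secondary = round(primary * 0.20)   # 20% of primary
print(estimate, primary, secondary)
```

Keeping the free-space percentages as parameters makes it straightforward to see how a different FREESPACE setting would change the allocation before the cluster is redefined.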

Space Calculation Summary

Dataset               Organization  Records      LRECL   Primary  Secondary
Account Master        KSDS          1,200,000    500     950 CYL  190 CYL
Customer Master       KSDS          800,000      600     710 CYL  142 CYL
Daily Transactions    PS            3,500,000    250     120 CYL  24 CYL
Transaction Archive   GDG PS        3,500,000    250     120 CYL  24 CYL
Daily Balance Report  PS            1,200,000    133     25 CYL   5 CYL
JCL Library           PDSE          500 members  80      50 TRK   20 TRK
Copybook Library      PDSE          200 members  80      30 TRK   10 TRK
Load Module Library   PDSE          150 members  varies  100 TRK  50 TRK

Phase 4: SMS Class Assignments

The bank's SMS configuration defines storage classes, management classes, and data classes that automate dataset placement, backup, and lifecycle management.

Storage Classes

Storage classes determine where datasets are physically placed and how I/O is managed:

STORCLAS=SCVSAM    - VSAM master files
  Guaranteed space: YES
  Volume selection: VSAM pool (high-performance 3390-9 volumes)
  Cache: DFW (DASD Fast Write) enabled
  Multi-tiered storage: Tier 1 (solid-state)

STORCLAS=SCBATCH   - Batch processing files
  Guaranteed space: YES
  Volume selection: Batch pool (standard 3390 volumes)
  Cache: Sequential mode enabled
  Multi-tiered storage: Tier 2 (standard disk)

STORCLAS=SCTEMP    - Temporary and work files
  Guaranteed space: NO
  Volume selection: Any available
  Cache: None (files are short-lived)
  Multi-tiered storage: Tier 3 (lowest cost)

STORCLAS=SCPDS     - PDS/PDSE libraries
  Guaranteed space: YES
  Volume selection: Library pool
  Cache: DFW enabled
  Multi-tiered storage: Tier 1 (solid-state)

Management Classes

Management classes define backup, migration, and retention policies:

MGMTCLAS=MCMASTER  - Master files
  Backup frequency: Daily
  Backup copies: 3 versions retained
  Migration: Never (always on primary)
  Expire after: NOLIMIT (never auto-delete)

MGMTCLAS=MCTRANS   - Transaction files
  Backup frequency: Daily
  Backup copies: 2 versions retained
  Migration: After 7 days to ML2
  Expire after: 90 days

MGMTCLAS=MCREPORT  - Report files
  Backup frequency: None (regenerable)
  Backup copies: 0
  Migration: After 3 days to ML2
  Expire after: 30 days

MGMTCLAS=MCLIB     - Program libraries
  Backup frequency: Daily
  Backup copies: 5 versions retained
  Migration: Never
  Expire after: NOLIMIT
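
To make the lifecycle rules concrete, the following Python sketch models the four management classes as data and reports where a dataset would sit at a given age. It is a simplified illustration, not how DFSMShsm is configured or how it evaluates eligibility: the names `ManagementClass` and `lifecycle_state` are hypothetical, a single "age in days" stands in for the separate creation and last-reference dates a real system tracks, and treating the thresholds as inclusive is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ManagementClass:
    name: str
    migrate_after_days: Optional[int]  # None = never migrate
    expire_after_days: Optional[int]   # None = NOLIMIT


# Values copied from the management class definitions above.
CLASSES = {
    "MCMASTER": ManagementClass("MCMASTER", None, None),
    "MCTRANS": ManagementClass("MCTRANS", 7, 90),
    "MCREPORT": ManagementClass("MCREPORT", 3, 30),
    "MCLIB": ManagementClass("MCLIB", None, None),
}


def lifecycle_state(mgmtclas: str, age_days: int) -> str:
    """Return PRIMARY, ML2, or EXPIRED for a dataset of the given age."""
    mc = CLASSES[mgmtclas]
    if mc.expire_after_days is not None and age_days >= mc.expire_after_days:
        return "EXPIRED"
    if mc.migrate_after_days is not None and age_days >= mc.migrate_after_days:
        return "ML2"
    return "PRIMARY"


for mgmtclas, age in (("MCREPORT", 2), ("MCREPORT", 10),
                      ("MCTRANS", 120), ("MCMASTER", 5000)):
    print(mgmtclas, age, lifecycle_state(mgmtclas, age))
```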

Data Classes

Data classes provide default DCB attributes, reducing the JCL coding burden and enforcing standards:

DATACLAS=DCFB250   - Fixed block 250-byte records
  RECFM: FB
  LRECL: 250
  BLKSIZE: 27750 (111 records per block)
  Space allocation: Determined by JCL

DATACLAS=DCFB133   - Fixed block 133-byte report records
  RECFM: FBA
  LRECL: 133
  BLKSIZE: 26600 (200 records per block)

DATACLAS=DCKSDS50  - KSDS with 500-byte records
  RECFM: (not applicable to VSAM)
  RECORDSIZE: (500 500)
  CISIZE: 4096
  FREESPACE: (15 10)

DATACLAS=DCPDSE80  - PDSE with 80-byte records
  RECFM: FB
  LRECL: 80
  BLKSIZE: 27920
  DSNTYPE: LIBRARY

Putting It All Together: A Complete JCL Example

The following JCL fragment shows how a production batch job references datasets using the new naming convention and SMS classes:

//PMSEOD01 JOB (ACCT),'PMS EOD PROCESSING',
//         CLASS=A,MSGCLASS=X,MSGLEVEL=(1,1),
//         NOTIFY=&SYSUID
//*
//*================================================================*
//* END-OF-DAY BATCH PROCESSING - ACCOUNT POSTING
//* READS:  DAILY TRANSACTIONS, ACCOUNT MASTER
//* WRITES: UPDATED ACCOUNT MASTER, DAILY BALANCE REPORT,
//*         TRANSACTION ARCHIVE (GDG +1)
//*================================================================*
//*
//POSTING  EXEC PGM=PMSPOST1
//STEPLIB  DD DSN=PMS.CORE.PROD.LOAD.LIBRARY,DISP=SHR
//*
//* INPUT FILES
//TRANIN   DD DSN=PMS.CORE.PROD.TRAN.DAILY,
//            DISP=SHR
//ACCTMAST DD DSN=PMS.CORE.PROD.MAST.ACCT,
//            DISP=SHR
//CUSTMAST DD DSN=PMS.CORE.PROD.MAST.CUST,
//            DISP=SHR
//*
//* OUTPUT FILES
//BALREPT  DD DSN=PMS.CORE.PROD.REPT.DAILYBAL,
//            DISP=(NEW,CATLG,DELETE),
//            STORCLAS=SCTEMP,
//            MGMTCLAS=MCREPORT,
//            DATACLAS=DCFB133,
//            SPACE=(CYL,(25,5),RLSE)
//*
//TRANARCH DD DSN=PMS.CORE.PROD.TRAN.DAILY(+1),
//            DISP=(NEW,CATLG,DELETE),
//            STORCLAS=SCBATCH,
//            MGMTCLAS=MCTRANS,
//            DATACLAS=DCFB250,
//            SPACE=(CYL,(120,24),RLSE)
//*
//BATCHLOG DD DSN=PMS.CORE.PROD.LOG.BATCH,
//            DISP=MOD,
//            STORCLAS=SCBATCH,
//            MGMTCLAS=MCTRANS
//*
//SYSOUT   DD SYSOUT=*
//PARMFILE DD DSN=PMS.CORE.PROD.CTRL.PARMS,
//            DISP=SHR

Notice how the naming convention makes the JCL self-documenting. Any operator or developer can immediately understand each DD statement's purpose by reading the dataset name. The SMS class assignments (STORCLAS, MGMTCLAS, DATACLAS) replace individual DCB, UNIT, and VOL parameters, reducing JCL complexity while enforcing organizational standards.


Lessons Learned

1. Naming Conventions Are Infrastructure, Not Bureaucracy

The single most impactful change in this project was the naming convention. Before the redesign, finding all datasets related to the account master required searching through 14,700 dataset names with patterns like PROD.ACCTMST, ACCTMSTR.PROD.FILE, CORE.ACCT.MASTER.V2, and BANKACCT.DATA. After the redesign, a LISTCAT LEVEL(PMS.CORE.PROD.MAST), or an ISPF 3.4 search on PMS.CORE.PROD.MAST.**, returns exactly the master files for the core banking production environment. This transformed troubleshooting from a multi-hour investigation into a 30-second catalog search.
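
The effect of a consistent prefix is easy to demonstrate outside the mainframe. The Python sketch below is illustrative only: the list stands in for a catalog extract, the function name `under_level` is hypothetical, and the matching rule (the name begins with the given qualifiers followed by a period) is a simplified stand-in for a catalog level search.

```python
def under_level(dsn: str, level: str) -> bool:
    """True if dsn has `level` as its leading qualifiers."""
    return dsn.startswith(level + ".")


# A small stand-in for a catalog extract: new-style and legacy names.
catalog = [
    "PMS.CORE.PROD.MAST.ACCT",
    "PMS.CORE.PROD.MAST.CUST",
    "PMS.CORE.PROD.TRAN.DAILY",
    "PMS.LOAN.PROD.MAST.LOANACCT",
    "PROD.ACCTMST",
    "CORE.ACCT.MASTER.V2",
    "BANKACCT.DATA",
]

core_masters = [d for d in catalog if under_level(d, "PMS.CORE.PROD.MAST")]
print(core_masters)
```

One prefix selects the two core production master files. No single prefix selects the three legacy names, which is why the old landscape required name-by-name knowledge.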

2. SMS Classes Eliminate Human Error in Dataset Placement

Before SMS, JCL writers had to specify UNIT, VOL, and DCB parameters for every dataset allocation. Incorrect parameters led to datasets placed on wrong volumes (performance-sensitive VSAM on slow archive disks), wrong block sizes (space waste or I/O inefficiency), and missing backup schedules. SMS classes encode all of these decisions once, centrally, and apply them consistently to every dataset allocation.

3. Space Over-Allocation Is Cheaper Than Under-Allocation

The team initially tried to minimize space allocations to conserve DASD. This led to frequent B37 and E37 abends (out of space) during high-volume batch runs when transaction counts exceeded estimates. The team then adopted a policy of allocating 120% of projected need for primary space, with secondary extents at 20% of primary. The marginal cost of extra DASD is trivial compared to the operational cost of abended batch jobs.
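
The headroom this policy buys can be estimated with a short calculation. The sketch below is a simplification under stated assumptions: the function names are hypothetical, it applies to a basic-format sequential dataset limited to 16 extents on a single volume, and it assumes each allocation is satisfied in one contiguous extent and that the volume has the space available.

```python
MAX_EXTENTS_PER_VOLUME = 16  # basic-format sequential dataset, one volume


def sized_allocation(projected_cyl: int):
    """Apply the 120% primary / 20% secondary policy, rounding up."""
    primary = (projected_cyl * 120 + 99) // 100   # ceil(projected * 1.20)
    secondary = (primary * 20 + 99) // 100        # ceil(primary * 0.20)
    return primary, secondary


def max_cylinders(primary: int, secondary: int,
                  max_extents: int = MAX_EXTENTS_PER_VOLUME) -> int:
    """Largest size the dataset can reach before an out-of-space abend."""
    return primary + (max_extents - 1) * secondary


primary, secondary = sized_allocation(100)
print(primary, secondary, max_cylinders(primary, secondary))
```

With SPACE=(CYL,(120,24)) the dataset can grow to 480 cylinders on one volume, four times its primary allocation, before the job runs out of extents.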

4. PDSE Eliminates the PDS Compress Problem

Traditional PDS datasets accumulate dead space: when a member is deleted or replaced, the space it occupied is not reused. Eventually the data area fills even though much of it holds no live data, and an IEBCOPY compress is required to reclaim it. The fixed-size directory can also fill up, which a compress does not fix. PDSE (DSNTYPE=LIBRARY) reuses member space automatically, grows its directory as needed, and never requires compression. The team converted all PDS libraries to PDSE, eliminating a chronic source of library-full abends.

5. Dataset Organization Choice Affects Application Architecture

Choosing VSAM KSDS for the account master was not just a storage decision -- it shaped the entire application architecture. CICS programs use random READ by key for online inquiries. Batch programs use sequential START/READ NEXT for end-of-day processing. An alternate index can support lookups by a secondary key such as SSN. If the team had chosen a flat file organization, the application would have needed separate index structures, and online access would have required scanning the file sequentially or loading it into memory.


Discussion Questions

  1. The naming convention uses a fixed structure of HLQ.APP.ENV.TYPE.OBJECT. What happens when a dataset legitimately belongs to two applications (e.g., an interface file between CORE and LOAN)? How would you handle cross-application datasets in this naming scheme?

  2. The team chose 3390-specific block sizes optimized for half-track blocking. What would happen if the bank migrated to a different DASD type with different track geometry? How do SMS data classes help manage this transition?

  3. The space calculation for the Account Master KSDS accounts for CI free space and CA free space. Explain why both levels of free space are necessary, and describe the performance impact when free space is exhausted at each level.

  4. Management class MCREPORT specifies zero backup copies because reports are "regenerable." What risks does this policy accept, and what compensating controls would you recommend?

  5. The team converted all PDS libraries to PDSE. Are there any situations where a traditional PDS would be preferable to a PDSE? Consider compatibility, performance, and operational characteristics.


Connection to Chapter Concepts

This case study integrates several key concepts from Chapter 30:

  • Dataset naming conventions (Section: Dataset Naming Rules and Standards): The hierarchical naming convention demonstrates the 44-character name limit, qualifier rules, and the importance of meaningful high-level qualifiers.

  • Dataset organizations (Section: Physical Sequential, Partitioned, and VSAM): The selection of PS, VSAM KSDS, and PDS/PDSE for different data types illustrates the decision criteria for each organization.

  • Space allocation (Section: Space Allocation and Management): The cylinder-based space calculations show how record counts, record lengths, and blocking factors determine primary and secondary space allocations.

  • SMS classes (Section: Storage Management Subsystem): The storage class, management class, and data class definitions demonstrate how SMS automates dataset placement, backup, and lifecycle management.

  • DASD geometry (Section: Understanding DASD Volumes): The block size calculations reference 3390 track capacity and half-track blocking, connecting physical DASD characteristics to logical dataset design decisions.