Case Study 1: Designing the Dataset Architecture for a Banking System
Background
Pacific Mutual Savings (PMS) is a regional bank with 1.2 million customer accounts, 320 branches, and a mainframe-based core banking system that has been in continuous operation since 1988. Over 36 years, the dataset architecture has grown organically -- developers created datasets with ad hoc names, inconsistent organizations, and default space allocations. A recent audit identified 14,700 cataloged datasets related to the banking application, of which nearly 4,000 were duplicates, orphans, or datasets with incorrect naming that no job could reliably reference.
The symptoms of this architectural debt are severe:
- Batch job failures: JCL abends due to dataset-not-found errors average 12 per week, each requiring manual intervention by on-call operators
- Storage waste: Over 800 GB of DASD is consumed by datasets that no active job references
- Compliance risk: Auditors cannot determine which datasets contain sensitive customer data because the naming convention provides no classification signal
- Onboarding difficulty: New developers require weeks to understand the dataset landscape before they can write or modify JCL
The bank's CTO has approved a dataset architecture redesign project. The goal is to establish enterprise naming conventions, select optimal dataset organizations for each data category, calculate appropriate space allocations, and assign SMS (Storage Management Subsystem) classes that align with data lifecycle and performance requirements.
This case study walks through the complete redesign process, from naming conventions through SMS class assignments.
Phase 1: Enterprise Naming Convention
The team established a hierarchical naming convention that encodes essential metadata directly in the dataset name. Every dataset name follows this pattern:
HLQ.APP.ENV.TYPE.OBJECT.QUALIFIER
Where:
- HLQ (High-Level Qualifier): Organization code, always PMS for Pacific Mutual Savings
- APP: Application identifier (2-4 characters)
- ENV: Environment code
- TYPE: Data classification
- OBJECT: Business object name
- QUALIFIER: Optional additional descriptor
Defined Values
| Qualifier | Valid Values | Description |
|---|---|---|
| APP | CORE, LOAN, CARD, WIRE, GL, RPT | Application area |
| ENV | PROD, UAT, SIT, DEV, MIGR | Environment |
| TYPE | MAST, TRAN, WORK, CTRL, REPT, COPY, JCL, LOAD, PROC, LOG | Data type |
| OBJECT | ACCT, CUST, BRNCH, RATE, FEE, etc. | Business entity |
Naming Convention Examples
PMS.CORE.PROD.MAST.ACCT Account master (VSAM KSDS)
PMS.CORE.PROD.MAST.CUST Customer master (VSAM KSDS)
PMS.CORE.PROD.TRAN.DAILY Daily transaction file (PS)
PMS.CORE.PROD.TRAN.DAILY.G0001V00 GDG transaction archive (first generation)
PMS.CORE.PROD.WORK.SORTTEMP Sort work file (PS, temporary)
PMS.CORE.PROD.CTRL.PARMS Control parameter file (PS)
PMS.CORE.PROD.REPT.DAILYBAL Daily balance report (PS)
PMS.CORE.PROD.LOG.BATCH Batch processing log (PS)
PMS.CORE.PROD.COPY.ACCTLAY Account copybook (PDS)
PMS.CORE.PROD.JCL.LIBRARY JCL library (PDS)
PMS.CORE.PROD.LOAD.LIBRARY Load module library (PDS)
PMS.CORE.PROD.PROC.LIBRARY Cataloged procedures (PDS)
PMS.LOAN.PROD.MAST.LOANACCT Loan account master (VSAM KSDS)
PMS.CARD.PROD.TRAN.AUTHLOG Card authorization log (ESDS)
PMS.GL.PROD.REPT.MONTHEND GL month-end report (PS)
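The pattern and the defined values above can be checked mechanically. The following is a minimal Python sketch of such a check, not the bank's actual enforcement mechanism. The function name `check_dsname` and its message strings are hypothetical. It treats OBJECT as free-form because the table lists it as open-ended, and it applies the general z/OS rules of 1-8 characters per qualifier and 44 characters per name.

```python
import re

# Allowed values copied from the "Defined Values" table.
APPS = {"CORE", "LOAN", "CARD", "WIRE", "GL", "RPT"}
ENVS = {"PROD", "UAT", "SIT", "DEV", "MIGR"}
TYPES = {"MAST", "TRAN", "WORK", "CTRL", "REPT", "COPY",
         "JCL", "LOAD", "PROC", "LOG"}

# General z/OS qualifier rule: 1-8 characters, the first one
# alphabetic or national (#, @, $).
QUALIFIER = re.compile(r"[A-Z#@$][A-Z0-9#@$-]{0,7}")


def check_dsname(dsname: str) -> list[str]:
    """Return a list of problems; an empty list means the name conforms."""
    problems = []
    if len(dsname) > 44:
        problems.append("name exceeds 44 characters")
    parts = dsname.split(".")
    if len(parts) not in (5, 6):  # QUALIFIER is optional
        problems.append("expected 5 or 6 qualifiers")
        return problems
    for part in parts:
        if not QUALIFIER.fullmatch(part):
            problems.append(f"invalid qualifier: {part}")
    hlq, app, env, dtype = parts[:4]
    if hlq != "PMS":
        problems.append(f"unknown HLQ: {hlq}")
    if app not in APPS:
        problems.append(f"unknown APP: {app}")
    if env not in ENVS:
        problems.append(f"unknown ENV: {env}")
    if dtype not in TYPES:
        problems.append(f"unknown TYPE: {dtype}")
    return problems


if __name__ == "__main__":
    for name in ("PMS.CORE.PROD.MAST.ACCT", "PROD.ACCTMST"):
        print(name, check_dsname(name) or "OK")
```

A check like this could be run against a catalog listing during migration to find names that still need to be converted.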
Naming Convention Enforcement
The team wrote an IEFUSI exit routine and RACF generic dataset profiles to enforce the naming convention. Any attempt to allocate a dataset that does not match the approved pattern is rejected with a descriptive error message, and the RACF profiles control which IDs may create or update datasets under each prefix:
//*================================================================*
//* RACF FILTER TO ENFORCE NAMING CONVENTION
//* PERMIT ONLY DATASETS MATCHING PMS.xxxx.ENV.TYPE PATTERN
//*================================================================*
//* DEFINED IN RACF:
//* ADDSD 'PMS.**' UACC(NONE)
//* PERMIT 'PMS.**' ID(PMSBATCH) ACCESS(ALTER)
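//* ADDSD 'PMS.CORE.PROD.**' UACC(NONE)
//* (RACF USES THE MOST SPECIFIC PROFILE, SO PROD NEEDS ITS OWN PERMIT)
//* PERMIT 'PMS.CORE.PROD.**' ID(PMSBATCH) ACCESS(ALTER)
//* ADDSD 'PMS.CORE.DEV.**' UACC(READ)
//* PERMIT 'PMS.CORE.DEV.**' ID(DEVGROUP) ACCESS(ALTER)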
Phase 2: Dataset Organization Selection
The team categorized every dataset by its primary access pattern and assigned the optimal dataset organization.
Physical Sequential (PS) for Reports and Batch Files
Reports, daily transaction extracts, and work files are read and written sequentially from beginning to end. Physical sequential (PS) organization provides the best performance for this pattern because there is no index overhead and the system can use sequential buffer prefetching.
//*------------------------------------------------------------*
//* DAILY BALANCE REPORT - PHYSICAL SEQUENTIAL
//* WRITTEN BY EOD BATCH, READ BY REPORT DISTRIBUTION SYSTEM
//*------------------------------------------------------------*
//BALREPT DD DSN=PMS.CORE.PROD.REPT.DAILYBAL,
// DISP=(NEW,CATLG,DELETE),
// UNIT=SYSDA,
// SPACE=(CYL,(50,10),RLSE),
// DCB=(RECFM=FBA,LRECL=133,BLKSIZE=26600),
// STORCLAS=SCTEMP,
// MGMTCLAS=MCREPORT,
// DATACLAS=DCFB133
The DCB parameters deserve attention:
- RECFM=FBA: Fixed-length blocked records with ASA carriage control characters. The A indicates that position 1 of each record contains a printer control character (space for single-space, 0 for double-space, 1 for page eject, + for overprint).
- LRECL=133: 132 printable positions plus 1 byte for the ASA control character.
- BLKSIZE=26600: Calculated as 200 records per block (200 x 133 = 26,600), fitting within the 27,998-byte half-track limit for 3390 DASD. Larger blocks reduce I/O operations because fewer blocks must be read to process the entire file.
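The block size arithmetic can be reproduced with a short calculation. This Python sketch assumes only the 27,998-byte half-track limit quoted above; the function name `half_track_blksize` is hypothetical. It returns the largest fixed-block size that fits for a given LRECL. For LRECL=133 the limit allows up to 210 records per block, so the 200-record block used here sits comfortably inside it.

```python
HALF_TRACK_3390 = 27_998  # largest block size that fits twice on a 3390 track


def half_track_blksize(lrecl: int) -> tuple[int, int]:
    """Return (records_per_block, blksize) for the largest FB block that fits."""
    if not 0 < lrecl <= HALF_TRACK_3390:
        raise ValueError("LRECL must be between 1 and the half-track limit")
    records_per_block = HALF_TRACK_3390 // lrecl
    return records_per_block, records_per_block * lrecl


if __name__ == "__main__":
    for lrecl in (80, 133, 250):
        records, blksize = half_track_blksize(lrecl)
        print(f"LRECL={lrecl}: {records} records per block, BLKSIZE={blksize}")
```

For LRECL=80 and LRECL=250 the result matches the block sizes used elsewhere in this case study (27,920 and 27,750).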
VSAM KSDS for Master Files
Master files require both random access (for online inquiries) and sequential access (for batch processing). VSAM KSDS provides both through its index structure.
//*------------------------------------------------------------*
//* ACCOUNT MASTER - VSAM KSDS
//* RANDOM ACCESS FOR CICS ONLINE, SEQUENTIAL FOR BATCH EOD
//*------------------------------------------------------------*
//* DEFINED VIA IDCAMS:
//* DEFINE CLUSTER ( -
//* NAME(PMS.CORE.PROD.MAST.ACCT) -
//* INDEXED -
//* KEYS(10 0) -
//* RECORDSIZE(500 500) -
//* FREESPACE(15 10) -
//* SHAREOPTIONS(2 3) -
//* STORCLAS(SCVSAM) -
//* MGMTCLAS(MCMASTER) -
//* DATACLAS(DCKSDS50) -
//* )
The master file category includes:
- Account Master: keyed by account number
- Customer Master: keyed by customer ID
- Branch Master: keyed by branch code
- Rate Table: keyed by rate type + effective date
- Fee Schedule: keyed by fee code + account type
Each of these requires indexed access for online transaction processing and sequential access for batch reporting and maintenance.
PDS/PDSE for Copybooks, JCL, and Load Modules
Partitioned datasets store members that are individually accessed by name. The team uses PDSE (Partitioned Data Set Extended) rather than traditional PDS for all new libraries because PDSE supports dynamic member addition without periodic compression and eliminates the risk of space fragmentation.
//*------------------------------------------------------------*
//* COBOL COPYBOOK LIBRARY - PDSE
//* CONTAINS RECORD LAYOUTS SHARED ACROSS PROGRAMS
//*------------------------------------------------------------*
//COPYLIB DD DSN=PMS.CORE.PROD.COPY.ACCTLAY,
// DISP=SHR
//*
//* DEFINED VIA IEFBR14 OR ISPF 3.2:
//* DSN=PMS.CORE.PROD.COPY.ACCTLAY
//* DSORG=PO (PDSE)
//* RECFM=FB
//* LRECL=80
//* BLKSIZE=27920
//* SPACE=(TRK,(30,10,100)) <- DIRECTORY BLOCK COUNT IS NOT USED FOR A PDSE
//* DSNTYPE=LIBRARY <- PDSE INDICATOR
//* STORCLAS=SCPDS
//* MGMTCLAS=MCLIB
The library structure:
| Library | Content | Naming |
|---|---|---|
| PMS.CORE.PROD.COPY.ACCTLAY | Account record copybooks | Member = copybook name |
| PMS.CORE.PROD.JCL.LIBRARY | Production JCL | Member = job name |
| PMS.CORE.PROD.LOAD.LIBRARY | Compiled load modules | Member = program name |
| PMS.CORE.PROD.PROC.LIBRARY | Cataloged procedures | Member = proc name |
Phase 3: Space Calculations
The team developed a space calculation methodology based on projected data volumes, growth rates, and DASD geometry.
Calculation Formula
Primary allocation (cylinders) =
(Record count x Record length) / (Bytes per cylinder)
x (1 + Free space factor)
x (1 + Growth factor)
Secondary allocation (cylinders) =
Primary allocation x 0.20
For 3390 DASD:
- Bytes per track = 56,664
- Tracks per cylinder = 15
- Bytes per cylinder = 849,960
Account Master KSDS Calculation
Records: 1,200,000 accounts
Record length: 500 bytes
CI size: 4,096 bytes
Control information per CI: 4-byte CIDF + two 3-byte RDFs = 10 bytes
(fixed-length records share one pair of RDFs; there is no RDF per record)
Usable CI space = 4096 - 10 = 4086
Records per CI: floor(4086 / 500) = 8
CIs needed: 1,200,000 / 8 = 150,000 CIs
At 15% CI free space: 150,000 / 0.85 = 176,471 CIs
Bytes for data: 176,471 x 4,096 = 722,825,216 bytes
At 10% CA free space: 722,825,216 / 0.90 = 803,139,129
Cylinders: 803,139,129 / 849,960 = 944.9, rounded up to 945 cylinders
Primary: 945 cylinders (rounded to 950)
Secondary: 190 cylinders (20% of primary)
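The same steps can be written as a short Python sketch, which makes it easy to rerun the estimate for other record counts or free-space settings. The function names are hypothetical. The sketch shares the simplification used above of dividing by raw bytes per cylinder, so the result is an estimate rather than an exact figure. The `control_bytes` default of 10 is the CI control information assumed for fixed-length records.

```python
BYTES_PER_CYL_3390 = 56_664 * 15  # 849,960 bytes per cylinder


def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return -(-a // b)


def ksds_cylinders(records: int, record_len: int, ci_size: int = 4096,
                   control_bytes: int = 10, ci_free_pct: int = 15,
                   ca_free_pct: int = 10) -> int:
    """Estimate data-component cylinders for a fixed-length KSDS."""
    records_per_ci = (ci_size - control_bytes) // record_len
    cis = ceil_div(records, records_per_ci)
    # Inflate for CI free space, then for CA free space.
    cis_with_free = ceil_div(cis * 100, 100 - ci_free_pct)
    data_bytes = cis_with_free * ci_size
    bytes_with_ca_free = ceil_div(data_bytes * 100, 100 - ca_free_pct)
    return ceil_div(bytes_with_ca_free, BYTES_PER_CYL_3390)


def round_up(value: int, multiple: int = 10) -> int:
    """Round up to the next multiple, for example 945 to 950."""
    return ceil_div(value, multiple) * multiple


if __name__ == "__main__":
    cylinders = ksds_cylinders(1_200_000, 500)
    primary = round_up(cylinders)
    secondary = primary * 20 // 100
    print(cylinders, primary, secondary)
```

Integer arithmetic is used throughout so that the rounding at each step is explicit rather than hidden in floating-point results.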
Space Calculation Summary
| Dataset | Organization | Records | LRECL | Primary | Secondary |
|---|---|---|---|---|---|
| Account Master | KSDS | 1,200,000 | 500 | 950 CYL | 190 CYL |
| Customer Master | KSDS | 800,000 | 600 | 710 CYL | 142 CYL |
| Daily Transactions | PS | 3,500,000 | 250 | 120 CYL | 24 CYL |
| Transaction Archive GDG | PS | 3,500,000 | 250 | 120 CYL | 24 CYL |
| Daily Balance Report | PS | 1,200,000 | 133 | 25 CYL | 5 CYL |
| JCL Library | PDSE | 500 members | 80 | 50 TRK | 20 TRK |
| Copybook Library | PDSE | 200 members | 80 | 30 TRK | 10 TRK |
| Load Module Library | PDSE | 150 members | varies | 100 TRK | 50 TRK |
Phase 4: SMS Class Assignments
The bank's SMS configuration defines storage classes, management classes, and data classes that automate dataset placement, backup, and lifecycle management.
Storage Classes
Storage classes determine where datasets are physically placed and how I/O is managed:
STORCLAS=SCVSAM - VSAM master files
Guaranteed space: YES
Volume selection: VSAM pool (high-performance 3390-9 volumes)
Cache: DFW (DASD Fast Write) enabled
Multi-tiered storage: Tier 1 (solid-state)
STORCLAS=SCBATCH - Batch processing files
Guaranteed space: YES
Volume selection: Batch pool (standard 3390 volumes)
Cache: Sequential mode enabled
Multi-tiered storage: Tier 2 (standard disk)
STORCLAS=SCTEMP - Temporary and work files
Guaranteed space: NO
Volume selection: Any available
Cache: None (files are short-lived)
Multi-tiered storage: Tier 3 (lowest cost)
STORCLAS=SCPDS - PDS/PDSE libraries
Guaranteed space: YES
Volume selection: Library pool
Cache: DFW enabled
Multi-tiered storage: Tier 1 (solid-state)
Management Classes
Management classes define backup, migration, and retention policies:
MGMTCLAS=MCMASTER - Master files
Backup frequency: Daily
Backup copies: 3 versions retained
Migration: Never (always on primary)
Expire after: NOLIMIT (never auto-delete)
MGMTCLAS=MCTRANS - Transaction files
Backup frequency: Daily
Backup copies: 2 versions retained
Migration: After 7 days to ML2
Expire after: 90 days
MGMTCLAS=MCREPORT - Report files
Backup frequency: None (regenerable)
Backup copies: 0
Migration: After 3 days to ML2
Expire after: 30 days
MGMTCLAS=MCLIB - Program libraries
Backup frequency: Daily
Backup copies: 5 versions retained
Migration: Never
Expire after: NOLIMIT
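To make the lifecycle rules concrete, the following Python sketch encodes the four management classes as data and reports where a dataset of a given age would sit. It is an illustration only: the names `ManagementClass` and `lifecycle_state` are hypothetical, "after N days" is assumed to mean an age of N days or more, and age is treated as a single number even though actual migration and expiration processing depends on how the classes are defined by the storage administrator.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ManagementClass:
    backup_versions: int
    migrate_after_days: Optional[int]  # None means never migrate
    expire_after_days: Optional[int]   # None means NOLIMIT


# Values copied from the management class definitions above.
MGMT_CLASSES = {
    "MCMASTER": ManagementClass(3, None, None),
    "MCTRANS": ManagementClass(2, 7, 90),
    "MCREPORT": ManagementClass(0, 3, 30),
    "MCLIB": ManagementClass(5, None, None),
}


def lifecycle_state(mgmtclas: str, age_days: int) -> str:
    """Return 'on primary', 'migrated to ML2', or 'expired' (simplified)."""
    mc = MGMT_CLASSES[mgmtclas]
    if mc.expire_after_days is not None and age_days >= mc.expire_after_days:
        return "expired"
    if mc.migrate_after_days is not None and age_days >= mc.migrate_after_days:
        return "migrated to ML2"
    return "on primary"


if __name__ == "__main__":
    for age in (1, 10, 120):
        print(age, lifecycle_state("MCTRANS", age))
```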
Data Classes
Data classes provide default DCB attributes, reducing the JCL coding burden and enforcing standards:
DATACLAS=DCFB250 - Fixed block 250-byte records
RECFM: FB
LRECL: 250
BLKSIZE: 27750 (111 records per block)
Space allocation: Determined by JCL
DATACLAS=DCFB133 - Fixed block 133-byte report records
RECFM: FBA
LRECL: 133
BLKSIZE: 26600 (200 records per block)
DATACLAS=DCKSDS50 - KSDS with 500-byte records
RECFM: (not applicable to VSAM)
RECORDSIZE: (500 500)
CISIZE: 4096
FREESPACE: (15 10)
DATACLAS=DCPDSE80 - PDSE with 80-byte records
RECFM: FB
LRECL: 80
BLKSIZE: 27920
DSNTYPE: LIBRARY
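Because the TYPE qualifier sits at a fixed position in every name, class assignment can be driven by the dataset name alone. The Python sketch below illustrates that idea; it is a stand-in for selection logic that a real installation would implement in its SMS configuration, and the function name `select_classes` is hypothetical. The MAST, TRAN, LOG, REPT, and COPY pairings are taken from the JCL and IDCAMS examples in this case study. Applying the COPY pairing to JCL, LOAD, and PROC is an assumption, and WORK and CTRL are left unmapped because the text does not state their classes.

```python
from typing import Optional

# (storage class, management class) keyed by the TYPE qualifier.
CLASSES_BY_TYPE = {
    "MAST": ("SCVSAM", "MCMASTER"),
    "TRAN": ("SCBATCH", "MCTRANS"),
    "LOG": ("SCBATCH", "MCTRANS"),
    "REPT": ("SCTEMP", "MCREPORT"),
    "COPY": ("SCPDS", "MCLIB"),
    # Assumption: the other program libraries follow the copybook library.
    "JCL": ("SCPDS", "MCLIB"),
    "LOAD": ("SCPDS", "MCLIB"),
    "PROC": ("SCPDS", "MCLIB"),
}


def select_classes(dsname: str) -> Optional[tuple[str, str]]:
    """Return (STORCLAS, MGMTCLAS) for a name, or None if no rule applies."""
    parts = dsname.split(".")
    if len(parts) < 4:
        return None  # name does not follow HLQ.APP.ENV.TYPE...
    return CLASSES_BY_TYPE.get(parts[3])


if __name__ == "__main__":
    for name in ("PMS.CORE.PROD.MAST.ACCT", "PMS.CORE.PROD.REPT.DAILYBAL"):
        print(name, select_classes(name))
```

Keeping the mapping in one table mirrors the goal of SMS itself: the decision is made once, centrally, instead of being repeated in every job.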
Putting It All Together: A Complete JCL Example
The following JCL fragment shows how a production batch job references datasets using the new naming convention and SMS classes:
//PMSEOD01 JOB (ACCT),'PMS EOD PROCESSING',
// CLASS=A,MSGCLASS=X,MSGLEVEL=(1,1),
// NOTIFY=&SYSUID
//*
//*================================================================*
//* END-OF-DAY BATCH PROCESSING - ACCOUNT POSTING
//* READS: DAILY TRANSACTIONS, ACCOUNT MASTER
//* WRITES: UPDATED ACCOUNT MASTER, DAILY BALANCE REPORT,
//* TRANSACTION ARCHIVE (GDG +1)
//*================================================================*
//*
//POSTING EXEC PGM=PMSPOST1
//STEPLIB DD DSN=PMS.CORE.PROD.LOAD.LIBRARY,DISP=SHR
//*
//* INPUT FILES
//TRANIN DD DSN=PMS.CORE.PROD.TRAN.DAILY,
// DISP=SHR
//ACCTMAST DD DSN=PMS.CORE.PROD.MAST.ACCT,
// DISP=SHR
//CUSTMAST DD DSN=PMS.CORE.PROD.MAST.CUST,
// DISP=SHR
//*
//* OUTPUT FILES
//BALREPT DD DSN=PMS.CORE.PROD.REPT.DAILYBAL,
// DISP=(NEW,CATLG,DELETE),
// STORCLAS=SCTEMP,
// MGMTCLAS=MCREPORT,
// DATACLAS=DCFB133,
// SPACE=(CYL,(25,5),RLSE)
//*
//TRANARCH DD DSN=PMS.CORE.PROD.TRAN.DAILY(+1),
// DISP=(NEW,CATLG,DELETE),
// STORCLAS=SCBATCH,
// MGMTCLAS=MCTRANS,
// DATACLAS=DCFB250,
// SPACE=(CYL,(120,24),RLSE)
//*
//BATCHLOG DD DSN=PMS.CORE.PROD.LOG.BATCH,
// DISP=MOD,
// STORCLAS=SCBATCH,
// MGMTCLAS=MCTRANS
//*
//SYSOUT DD SYSOUT=*
//PARMFILE DD DSN=PMS.CORE.PROD.CTRL.PARMS,
// DISP=SHR
Notice how the naming convention makes the JCL self-documenting. Any operator or developer can immediately understand each DD statement's purpose by reading the dataset name. The SMS class assignments (STORCLAS, MGMTCLAS, DATACLAS) replace individual DCB, UNIT, and VOL parameters, reducing JCL complexity while enforcing organizational standards.
Lessons Learned
1. Naming Conventions Are Infrastructure, Not Bureaucracy
The single most impactful change in this project was the naming convention. Before the redesign, finding all datasets related to the account master required searching through 14,700 dataset names with patterns like PROD.ACCTMST, ACCTMASTER.PROD.FILE, CORE.ACCT.MASTER.V2, and BANKACCT.DATA. After the redesign, a LISTCAT LEVEL(PMS.CORE.PROD.MAST) returns exactly the master files for the core banking production environment. This transformed troubleshooting from a multi-hour investigation into a 30-second catalog search.
2. SMS Classes Eliminate Human Error in Dataset Placement
Before SMS, JCL writers had to specify UNIT, VOL, and DCB parameters for every dataset allocation. Incorrect parameters led to datasets placed on wrong volumes (performance-sensitive VSAM on slow archive disks), wrong block sizes (space waste or I/O inefficiency), and missing backup schedules. SMS classes encode all of these decisions once, centrally, and apply them consistently to every dataset allocation.
3. Space Over-Allocation Is Cheaper Than Under-Allocation
The team initially tried to minimize space allocations to conserve DASD. This led to frequent B37 and E37 abends (out of space) during high-volume batch runs when transaction counts exceeded estimates. They adopted a policy of allocating 120% of projected need for primary space, with secondary extents at 20% of primary. The marginal cost of extra DASD is trivial compared to the operational cost of abended batch jobs.
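The effect of this policy can be estimated with a few lines of arithmetic. The Python sketch below applies the 120% and 20% rules and then computes the most space a dataset could reach on one volume. It assumes a basic-format sequential dataset limited to 16 extents per volume, with the primary allocation satisfied in a single extent; extended-format and multivolume datasets have different limits. The function names are hypothetical.

```python
MAX_EXTENTS_PER_VOLUME = 16  # assumption: basic-format sequential dataset


def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return -(-a // b)


def recommended_space(projected_cyl: int) -> tuple[int, int]:
    """Apply the policy: primary = 120% of need, secondary = 20% of primary."""
    primary = ceil_div(projected_cyl * 120, 100)
    secondary = ceil_div(primary * 20, 100)
    return primary, secondary


def max_cylinders(primary: int, secondary: int,
                  max_extents: int = MAX_EXTENTS_PER_VOLUME) -> int:
    """Largest size reachable on one volume: one primary extent plus secondaries."""
    return primary + secondary * (max_extents - 1)


if __name__ == "__main__":
    primary, secondary = recommended_space(100)
    print(primary, secondary, max_cylinders(primary, secondary))
```

Under these assumptions, a dataset projected at 100 cylinders is allocated as SPACE=(CYL,(120,24)) and can grow to 480 cylinders on one volume before an out-of-space abend becomes possible.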
4. PDSE Eliminates the PDS Compress Problem
Traditional PDS libraries accumulate dead space when members are replaced or deleted, because the space a member occupied is not reused. Eventually the library runs out of room even though much of its data area holds no live members, and an IEBCOPY compress is needed to reclaim the space. PDSE (DSNTYPE=LIBRARY) reuses freed space automatically and never requires compression; its directory also grows as needed, so it cannot fill up the way a fixed-size PDS directory can. The team converted all PDS libraries to PDSE, eliminating a chronic source of library-full abends.
5. Dataset Organization Choice Affects Application Architecture
Choosing VSAM KSDS for the account master was not just a storage decision -- it shaped the entire application architecture. CICS programs use random READ by key for online inquiries. Batch programs use sequential START/READ NEXT for end-of-day processing. An alternate index supports lookups by SSN. If the team had chosen a flat file organization, the application would have needed separate index structures, and each online inquiry would have required either scanning the file sequentially or loading the entire file into memory.
Discussion Questions
- The naming convention uses a fixed structure of HLQ.APP.ENV.TYPE.OBJECT. What happens when a dataset legitimately belongs to two applications (e.g., an interface file between CORE and LOAN)? How would you handle cross-application datasets in this naming scheme?
- The team chose 3390-specific block sizes optimized for half-track blocking. What would happen if the bank migrated to a different DASD type with different track geometry? How do SMS data classes help manage this transition?
- The space calculation for the Account Master KSDS accounts for CI free space and CA free space. Explain why both levels of free space are necessary, and describe the performance impact when free space is exhausted at each level.
- Management class MCREPORT specifies zero backup copies because reports are "regenerable." What risks does this policy accept, and what compensating controls would you recommend?
- The team converted all PDS libraries to PDSE. Are there any situations where a traditional PDS would be preferable to a PDSE? Consider compatibility, performance, and operational characteristics.
Connection to Chapter Concepts
This case study integrates several key concepts from Chapter 30:
- Dataset naming conventions (Section: Dataset Naming Rules and Standards): The hierarchical naming convention demonstrates the 44-character name limit, qualifier rules, and the importance of meaningful high-level qualifiers.
- Dataset organizations (Section: Physical Sequential, Partitioned, and VSAM): The selection of PS, VSAM KSDS, and PDS/PDSE for different data types illustrates the decision criteria for each organization.
- Space allocation (Section: Space Allocation and Management): The cylinder-based space calculations show how record counts, record lengths, and blocking factors determine primary and secondary space allocations.
- SMS classes (Section: Storage Management Subsystem): The storage class, management class, and data class definitions demonstrate how SMS automates dataset placement, backup, and lifecycle management.
- DASD geometry (Section: Understanding DASD Volumes): The block size calculations reference 3390 track capacity and half-track blocking, connecting physical DASD characteristics to logical dataset design decisions.