Chapter 30 Further Reading: Disaster Recovery and Business Continuity

Tier 1: Verified IBM Documentation

These are primary sources available directly from IBM's documentation library. All URLs point to IBM's official documentation sites.

GDPS

GDPS Family: An Introduction to Concepts and Capabilities (IBM Redbook SG24-6374). The essential starting point for understanding the GDPS product family. Covers GDPS/Metro Mirror, GDPS/Global Mirror, GDPS/XRC, and GDPS/HyperSwap architecture, configuration concepts, and use cases. Start with the architecture overview chapter for the conceptual foundation, then read the chapters specific to your replication technology. Available at: IBM Redbooks (https://www.redbooks.ibm.com/)

GDPS/HyperSwap Manager: Technical Guide (IBM Redbook SG24-7833). A deep dive into HyperSwap architecture, configuration, and operational procedures. Includes detailed diagrams of the HyperSwap process, configuration examples for DS8000 storage, and operational procedures for planned and unplanned swap scenarios. Available at: IBM Redbooks

IBM System Automation for z/OS: Customizing and Programming (IBM Publication SC34-2715). System Automation (SA z/OS) is the engine that drives GDPS automation. This guide covers policy customization, automated recovery procedures, and the scripting interface. Essential for anyone who needs to customize GDPS behavior beyond the default policies. Available at: IBM Documentation, z/OS library (https://www.ibm.com/docs/en/zos)

DS8900 Copy Services. IBM Documentation for the DS8900 storage subsystem's copy services: Metro Mirror (PPRC), Global Mirror, FlashCopy, and consistency group management. This is the storage-level reference for understanding how replication works at the hardware level. Available at: IBM Documentation, DS8900 (https://www.ibm.com/docs/en/ds8900)

Parallel Sysplex and Recovery

z/OS Parallel Sysplex Overview (IBM Publication SA22-7661). Review this in the context of failure domain analysis: the Sysplex architecture chapter explains coupling facility structures, XCF signaling, and the mechanisms that enable peer recovery. This chapter assumes familiarity with Chapter 1; this publication fills any gaps. Available at: IBM Documentation, z/OS library

z/OS MVS Planning: Workload Management (IBM Publication SA23-1391). The WLM planning guide is relevant to DR because WLM's automatic workload redistribution is the first line of defense during LPAR failures. The sections on service class configuration and goal management explain how WLM absorbs a failed LPAR's workload. Available at: IBM Documentation, z/OS library

DB2 12 for z/OS: Data Sharing: Planning and Administration (IBM Publication SC27-8851). The definitive reference for DB2 data sharing, including peer recovery, group restart, and coupling facility structure management. The chapters on failure and recovery scenarios directly support this chapter's failure domain analysis. Available at: IBM Documentation, DB2 for z/OS (https://www.ibm.com/docs/en/db2-for-zos)

CICS TS for z/OS: Recovery and Restart Guide (IBM Publication SC34-7501). Covers CICS region recovery, warm start vs. cold start, journal recovery, and emergency restart procedures. Essential reading for the CICS portions of DR runbooks. Available at: IBM Documentation, CICS TS (https://www.ibm.com/docs/en/cics-ts)

Backup and Recovery

DB2 12 for z/OS: Utility Guide and Reference (IBM Publication SC27-8855). The RECOVER utility chapter is your definitive reference for point-in-time recovery, TORBA/TOLOGPOINT specifications, and recovery procedures for tablespaces and indexes. Also covers the COPY (image copy), MERGECOPY, and QUIESCE utilities that underpin the backup strategy. Available at: IBM Documentation, DB2 for z/OS

DFSMShsm Managing Your Own Data (IBM Publication SC23-6870). DFSMShsm (Hierarchical Storage Manager) manages backup, migration, and dump operations for DASD volumes. Relevant to understanding how non-DB2 datasets (VSAM files, sequential files, PDS libraries) are backed up and recovered. Available at: IBM Documentation, z/OS library

Tier 2: Industry Standards and Regulatory Guidance

NIST Publications

NIST SP 800-34 Rev. 1: Contingency Planning Guide for Federal Information Systems. The most comprehensive publicly available guide to IT contingency planning. Even if you're not in the federal sector, the BIA methodology, recovery strategy selection framework, and plan testing guidance are applicable to any organization. The appendices include templates for BIA worksheets and contingency plan documents. Available at: NIST Computer Security Resource Center (https://csrc.nist.gov/publications/detail/sp/800-34/rev-1/final)

NIST SP 800-53 Rev. 5: Security and Privacy Controls for Information Systems and Organizations. The CP (Contingency Planning) control family defines the federal baseline for DR controls. Even for non-federal organizations, the CP controls provide a useful checklist: CP-1 through CP-13 cover policy, plan, training, testing, alternate sites, backup, and recovery. Available at: NIST CSRC (https://csrc.nist.gov/publications/detail/sp/800-53/rev-5/final)

Financial Services

FFIEC IT Examination Handbook: Business Continuity Management booklet (2019; formerly the Business Continuity Planning booklet). The Federal Financial Institutions Examination Council's guidance for US banks and credit unions. Covers business impact analysis, risk assessment, recovery strategies, testing, and oversight. This is what OCC examiners use when evaluating your DR program. The 2019 revision added significant content on cyber resilience. Available at: FFIEC (https://ithandbook.ffiec.gov/it-booklets/business-continuity-management/)

OCC Bulletin 2023-17, Third-Party Relationships: Interagency Guidance on Risk Management. Relevant to DR because many mainframe shops rely on third parties for DR site hosting, GDPS management, or tape vaulting. The OCC expects banks to ensure that third-party DR capabilities meet the bank's own RTO/RPO requirements. Available at: OCC (https://www.occ.gov/news-issuances/bulletins/2023/bulletin-2023-17.html)
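The RTO/RPO comparison described in the entry above can be made mechanical during vendor review. The following is a minimal sketch, not anything prescribed by the OCC bulletin: the `RecoveryObjective` type, the `find_gaps` function, and the sample figures are all hypothetical and exist only to illustrate the check.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RecoveryObjective:
    """Recovery targets in minutes (hypothetical structure for illustration)."""
    rto_minutes: int  # maximum tolerable time to restore service
    rpo_minutes: int  # maximum tolerable window of lost data


def find_gaps(required: RecoveryObjective, offered: RecoveryObjective) -> List[str]:
    """List every objective where the provider's commitment is weaker than the requirement."""
    gaps = []
    if offered.rto_minutes > required.rto_minutes:
        gaps.append(
            f"RTO: provider offers {offered.rto_minutes} min, "
            f"requirement is {required.rto_minutes} min"
        )
    if offered.rpo_minutes > required.rpo_minutes:
        gaps.append(
            f"RPO: provider offers {offered.rpo_minutes} min, "
            f"requirement is {required.rpo_minutes} min"
        )
    return gaps


# Made-up figures: the bank needs a 4-hour RTO and a 15-minute RPO;
# the DR hosting contract commits to an 8-hour RTO and a 5-minute RPO.
bank = RecoveryObjective(rto_minutes=240, rpo_minutes=15)
vendor = RecoveryObjective(rto_minutes=480, rpo_minutes=5)

for gap in find_gaps(bank, vendor):
    print(gap)
```

In this example only the RTO is flagged: the vendor's RPO is tighter than required, but its restore-time commitment is twice what the bank can tolerate.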

Healthcare

HIPAA Security Rule: Contingency Plan Standard (§164.308(a)(7)). The HIPAA requirement for contingency planning. Notably less prescriptive than NIST or FFIEC guidance, which is both a flexibility and a risk (as Ahmad Rashidi's experience at Pinnacle Health illustrates). Available at: HHS (https://www.hhs.gov/hipaa/for-professionals/security/index.html)

European Regulation

DORA (Digital Operational Resilience Act), Regulation (EU) 2022/2554. Applicable from 17 January 2025, DORA establishes comprehensive ICT risk management requirements for EU financial entities, including mandatory DR testing and reporting. Relevant for any mainframe shop supporting EU financial services clients. Available at: EUR-Lex (https://eur-lex.europa.eu/)

Tier 3: Books and Practitioner Resources

"z/OS Mainframe Disaster Recovery: GDPS and Beyond." This IBM Redbook (SG24-7983, if available in current revision) covers end-to-end DR architecture for z/OS environments. It bridges the gap between GDPS product documentation and real-world implementation, with configuration examples and test scenarios. Available at: IBM Redbooks

"IT Disaster Recovery Planning For Dummies" by Peter H. Gregory. Despite the title, this is a solid practitioner's guide to DR planning methodology. Covers BIA, recovery strategy selection, plan documentation, and testing. Not mainframe-specific, but the methodology chapters are universally applicable. Good for communicating DR concepts to business stakeholders who don't speak z/OS.

"Business Continuity and Disaster Recovery Planning for IT Professionals" by Susan Snedaker. A comprehensive treatment of BC/DR from the business perspective. Strong on BIA methodology, risk assessment frameworks, and organizational governance. Recommended for architects who need to bridge the gap between technical DR capabilities and business continuity requirements.

SHARE Conference Proceedings. The SHARE user group (https://www.share.org/) regularly publishes sessions on GDPS implementation, DR testing experiences, and mainframe resilience. Search the session archive for "GDPS," "disaster recovery," and "business continuity" to find practitioner presentations from organizations that have implemented and tested these architectures.

Chapter | Relevance to DR
Chapter 1: The z/OS Ecosystem | Sysplex architecture: the foundation for failure domain analysis
Chapter 5: Workload Manager | WLM's role in workload redistribution during failures
Chapter 8: DB2 Locking and Concurrency | Lock structure in coupling facility: failure recovery implications
Chapter 9: DB2 Utilities | Image copy and log apply: the tools for data corruption recovery
Chapter 13: CICS Architecture | Region topology, MRO, AOR failover: CICS DR mechanisms
Chapter 18: CICS Failure and Recovery | Detailed CICS recovery procedures: the CICS portions of your DR runbook
Chapter 23: Batch Window Design | Batch dependency graph: becomes the batch restart plan after DR
Chapter 24: Checkpoint/Restart | Checkpoint data enables batch restart from point of interruption
Chapter 27: Batch Monitoring | SMF records and alerting: DR monitoring and detection
Chapter 28: Mainframe Security | RACF, encryption, audit trail: must be maintained during DR
Chapter 29: Capacity Planning | DR site sizing, capacity headroom for N-1 survivability
Chapter 31: Operational Automation | Automated recovery procedures: implementing the runbook actions
Chapter 37: Hybrid Architecture | DR for cloud-integrated mainframe components
Chapter 40: Knowledge Transfer | Preventing people-dependent single points of failure in DR