Chapter 18 Further Reading
IBM Official Documentation
CICS Recovery and Restart
- CICS Recovery and Restart Guide (SC34-7388): The definitive reference for CICS recovery architecture. Covers the system log, recovery manager, unit of recovery, syncpoint processing, emergency restart, and indoubt resolution. Chapters 1–4 cover the architecture; chapters 5–8 cover configuration and operational procedures. Required reading for anyone responsible for CICS recovery.
  - URL: https://www.ibm.com/docs/en/cics-ts/5.6?topic=recovery
- CICS System Definition Guide, Recovery Parameters: Detailed reference for all SIT parameters related to recovery: START, KEYINTV, LOGDEFER, RMRETRY, UOWNETQL, and others. Each parameter entry gives the default value, valid range, and impact on recovery behavior.
  - URL: https://www.ibm.com/docs/en/cics-ts/5.6?topic=tables-system-initialization
- CICS Resource Definition Guide, DB2CONN: Comprehensive reference for the DB2CONN resource definition, including the RESYNCMEMBER, STANDBYMODE, and THREADERROR parameters that affect recovery coordination between CICS and DB2.
- CICS Operations and Utilities Guide, DFHRMUTL: Documentation for the recovery manager utility that lists and resolves indoubt units of work. Includes command syntax, output format, and examples of manual resolution procedures.
Two-Phase Commit and XA
- CICS Intercommunication Guide, Syncpoint Processing (SC34-7390): Covers two-phase commit across MRO and ISC connections. Explains coordinator and participant roles, the PREPARE/COMMIT protocol, and indoubt handling for distributed transactions.
- CICS-DB2 Guide (SC34-7370): Covers the CICS-DB2 attachment facility, including thread management, two-phase commit coordination, and indoubt resolution. The sections on RESYNCMEMBER and group resynchronization are essential for Sysplex environments.
- CICS-MQ Adapter Guide: Covers integration between CICS and IBM MQ, including transactional coordination (MQ as a 2PC participant), connection failure handling, and session recovery. The sections on connection recovery and channel reconnection are relevant to the Pinnacle Health case study.
z/OS Automatic Restart Management
- z/OS MVS Setting Up a Sysplex (SA38-0680): Covers ARM policy definition, restart element registration, cross-system restart, and restart group coordination. Chapter 6 covers ARM in detail.
- z/OS MVS Programming: Sysplex Services Guide (SA38-0658): Programming interface for ARM registration. Relevant if you need to understand how CICS registers with ARM and how custom applications can participate in ARM-managed recovery.
IBM Redbooks
- CICS Transaction Server from Start to Finish (SG24-8347): Chapter 8 covers CICS recovery in depth: system log architecture, emergency restart processing, two-phase commit, and indoubt resolution. Includes practical examples and configuration guidance.
- CICS Recovery and Restart (SG24-6843): An older but still relevant Redbook focused entirely on CICS recovery. Covers failure scenarios, recovery procedures, log stream configuration, and recovery testing methodology. The testing methodology chapter is particularly valuable.
- DB2 and CICS: The Perfect Couple (SG24-7421): Chapter 5 covers two-phase commit between CICS and DB2, including indoubt resolution, thread recovery, and the RESYNCMEMBER parameter. Includes worked examples of indoubt scenarios.
- CICS and the Parallel Sysplex (SG24-6449): Chapter 7 covers Sysplex-aware recovery features: coupling facility log streams, shared temporary storage recovery, and cross-system indoubt resolution. The coupling facility sizing guidance for system logs is essential for capacity planning.
- IBM MQ and CICS Integration Patterns (SG24-8374): Covers MQ as a 2PC participant in CICS transactions, MQ connection recovery, and shared queue architectures. The shared queue chapters are directly relevant to the Pinnacle Health corrective action (MQ shared queues for cross-LPAR indoubt resolution).
Technical Articles and Papers
- "Understanding CICS Two-Phase Commit", IBM CICS Blog. A practical walkthrough of the 2PC protocol in CICS with sequence diagrams and failure scenario analysis. Covers the indoubt window, commit point, and resolution mechanisms.
- "CICS Recovery: Myths and Reality", SHARE Conference presentation. Addresses common misconceptions about CICS recovery, including the myth that cold start is safe, the myth that LOGDEFER has no risk, and the myth that indoubt transactions resolve themselves.
- "Designing Idempotent Transactions for Mainframe Systems", IBM Systems Magazine. Covers idempotency patterns for CICS and batch transactions, including unique request identifiers, duplicate detection, and compensating transactions.
- "XA Transaction Processing: Standards and Implementation", The Open Group Technical Standard. The formal specification of the XA interface. While CICS's implementation predates the XA standard (CICS has done 2PC since the 1970s), the XA standard provides the terminology and protocol description used in modern documentation.
Books
- Tresch, Bob, and Frank Kyne. System Programmer's Guide to z/OS System Logger. The CICS system log is built on z/OS System Logger. Understanding logger architecture (log stream types, coupling facility structures, offloading, staging datasets) is essential for configuring CICS recovery correctly. The chapters on coupling facility structure sizing and log stream duplexing are particularly relevant.
- Horswill, Matthew. CICS: A Practical Guide to System Fine Tuning, 2nd Edition. Chapter 9 covers recovery performance: optimizing emergency restart time, tuning log I/O, and sizing activity keypoint intervals. The benchmarks are dated, but the methodology still applies.
- Bernstein, Philip A., and Eric Newcomer. Principles of Transaction Processing, 2nd Edition. The academic foundation for distributed transaction processing, including two-phase commit, indoubt resolution, and compensating transactions. Chapter 8 covers the 2PC protocol in depth. While not mainframe-specific, the principles apply directly to CICS's implementation.
- Gray, Jim, and Andreas Reuter. Transaction Processing: Concepts and Techniques. The classic reference on transaction processing. Chapters 10–12 cover recovery management, logging, and two-phase commit. Jim Gray's work at IBM directly influenced CICS's recovery architecture. Essential reading for any architect who wants to understand why 2PC works, not just how.
Conference Presentations
- SHARE Conference, CICS Recovery Track: Annual presentations on CICS recovery architecture, testing methodologies, and real-world incident analysis. The "CICS Recovery War Stories" sessions are particularly valuable: production architects share anonymized failure scenarios and recovery procedures.
- IBM TechU, CICS Advanced Recovery Sessions: Deep-dive sessions on CICS recovery internals, two-phase commit optimization, and Sysplex-aware recovery features. IBM CICS Level 3 support engineers often present, providing insight into common recovery problems they see across their customer base.
- GSE (Guide Share Europe), CICS Operations Working Group: European focus on CICS operational procedures, including recovery testing frameworks and compliance-driven recovery requirements (GDPR, PSD2).
Related Chapters in This Book
- Chapter 1: z/OS Parallel Sysplex Architecture. Foundation for understanding the coupling facility, ARM, and cross-system recovery. The coupling facility log stream configuration in this chapter depends on the Sysplex infrastructure from Chapter 1.
- Chapter 8: DB2 Locking and Concurrency. DB2 locks held by CICS transactions persist through a region failure until CICS resolves the UOW. Chapter 8's locking strategy (row vs. page) directly affects the blast radius of a CICS failure. The spaced review in Section 18.8 revisits this interaction.
- Chapter 13: CICS Architecture for Architects. Region topology (TOR/AOR/FOR) creates the failure domains and MRO dependencies that recovery must navigate. The multi-region recovery patterns in this chapter assume the topology from Chapter 13.
- Chapter 16: CICS Security. Recovery procedures must maintain security boundaries. Manual indoubt resolution requires proper RACF authorization. The two-person authorization requirement for DFHRMUTL is a security architecture decision.
- Chapter 17: CICS Performance and Tuning. Performance tuning (MAXTASK, DSA sizing) interacts with recovery. Over-tuned regions (MAXTASK too high, storage too tight) are more likely to fail. Under-tuned regions (too much headroom) waste resources. The balance between performance and resilience is an architectural trade-off.
- Chapter 19: IBM MQ for the COBOL Architect. Covers MQ as a 2PC participant in CICS transactions, MQ connection recovery, and shared queue architecture for cross-LPAR indoubt resolution. The Pinnacle Health case study foreshadows the MQ topics covered in depth in Chapter 19.
- Chapter 24: Checkpoint/Restart for Batch. The batch equivalent of CICS recovery. While the mechanisms differ (checkpoint files vs. system logs), the principles are the same: bound the recovery window, make recovery automatic, test regularly.
- Chapter 30: Disaster Recovery. Extends the recovery concepts from this chapter to site-level failures. GDPS, Sysplex recovery, and site failover build on the single-region and multi-region recovery patterns established here.
Online Resources
- IBM CICS Community, Recovery Forum: Active discussion forum for CICS recovery questions. Topics frequently include indoubt resolution procedures, ARM configuration, and emergency restart troubleshooting.
  - URL: https://community.ibm.com/community/user/ibmz-and-linuxone/groups/topic-home?CommunityKey=cics
- IBM Support, CICS Recovery APARs: Search for APARs related to CICS recovery to stay current on known issues and fixes. Recovery-related APARs are among the most critical to apply promptly.
  - URL: https://www.ibm.com/support/pages/cics-transaction-server
- z/OS System Logger Best Practices: IBM's guidance on configuring z/OS Logger for CICS system logs, including coupling facility structure sizing, staging dataset configuration, and offloading policy.
Certification
- IBM Certified System Administrator, CICS Transaction Server V5.3 (Exam C5050-062): The "Recovery and Restart" domain covers system log configuration, emergency restart, indoubt resolution, and ARM integration. This chapter's content maps directly to exam objectives in this domain.
- IBM Certified Solution Developer, CICS Transaction Server V5.x: The "Transaction Management" domain covers syncpoint processing, two-phase commit, and recovery-aware application design (idempotency, retry logic). The application-level recovery patterns from Section 18.6 are relevant.