Further Reading — Chapter 27

IBM Documentation

SMF Records

  • z/OS MVS System Management Facilities (SMF) (SA38-0667) — The definitive reference for all SMF record types. The chapter on Type 30 records is essential reading for anyone writing SMF analysis programs. Pay particular attention to the self-defining section format and the mapping macros (IFASMFR7, SMF30WRK).
  • z/OS MVS System Management Facilities Record Descriptions — Companion volume with detailed field layouts for every SMF record type. Keep this bookmarked; you'll reference it constantly when writing SMF analysis code.

Automation and Monitoring

  • IBM System Automation for z/OS User's Guide (SC34-2710) — Covers SA z/OS automation policy design, message automation, and resource monitoring. The chapter on automation policy objects is particularly relevant to batch monitoring rule design.
  • IBM System Automation for z/OS Programmer's Reference (SC34-2712) — For writing custom automation routines that integrate with SA z/OS.
  • z/OS JES2 Initialization and Tuning Guide (SA32-0991) — The chapter on spool management and monitoring is relevant to JES spool utilization alerting.

Workload Management Integration

  • z/OS MVS Planning: Workload Management (SY38-0686) — Revisit Chapter 5's WLM concepts with specific attention to how batch service classes interact with monitoring thresholds. The section on WLM reporting and SMF data collection is directly applicable.
  • z/OS Workload Management Services Guide and Reference — API documentation for programmatic access to WLM data from COBOL programs.

Vendor Documentation

Scheduling and Automation Products

  • BMC CONTROL-M Documentation — CONTROL-M's monitoring and alerting capabilities, including SLA management, critical path analysis, and automated recovery configuration. The "Monitoring and Alerting" section of the administration guide covers threshold configuration in detail.
  • Broadcom (CA) AutoSys/CA-7 Documentation — CA-7's batch monitoring features, including job-level alerting, predecessor tracking, and automated restart. The CA-OPS/MVS integration guide covers message automation rules.
  • Broadcom CA OPS/MVS Event Management and Automation — Comprehensive guide to writing automation rules for console message processing. The rule syntax reference and the chapter on notification integration are directly applicable to the alerting architecture described in this chapter.
  • IBM Workload Automation (formerly TWS) — Monitoring capabilities including SLA management, dependency tracking, and event-driven automation.

Monitoring Platforms

  • BMC MainView for z/OS — Real-time monitoring of batch job performance, resource utilization, and system health. Provides the dashboard capabilities described in Section 27.3.
  • Broadcom SYSVIEW Performance Management — Comprehensive z/OS performance monitoring with historical data analysis. Useful for the Level 4 trend analysis described in this chapter.
  • IBM OMEGAMON for z/OS — Real-time and historical performance monitoring. The OMEGAMON data warehouse feature supports the trend analysis and capacity planning use cases.

Books

  • Introduction to the New Mainframe: z/OS Basics by Mike Ebbers et al. (IBM Redbook SG24-6366) — The chapter on system management provides an accessible introduction to SMF, WLM, and system monitoring for readers who need a refresher on the fundamentals before tackling the advanced material in this chapter.
  • System Programmer's Guide to: z/OS System Logger (SG24-6898) — Relevant to understanding how automation products log events and how to build searchable incident repositories.
  • ABCs of z/OS System Programming Volume 11 (SG24-6327) — Covers SMF alongside capacity planning, performance management, WLM, and RMF, including record formats, collection configuration, and analysis techniques. A good bridge between the IBM reference manuals and practical implementation.
  • Site Reliability Engineering: How Google Runs Production Systems by Betsy Beyer et al. (O'Reilly, 2016) — While focused on distributed systems rather than mainframes, the chapters on monitoring, alerting, and incident response articulate principles that apply directly to batch operations. The concept of error budgets translates naturally to batch window slack management. The chapter on being on-call is essential reading for anyone designing on-call rotation and escalation procedures.
  • The Phoenix Project by Gene Kim, Kevin Behr, and George Spafford (IT Revolution Press, 2013) — A novel about IT operations that illustrates many of the organizational and cultural challenges of batch monitoring, change management, and incident response. The "Brent problem" (organizational dependency on a single expert) maps directly to the Rob dependency described in this chapter.
  • Incident Management for Operations by Rob Schnepp, Ron Vidal, and Chris Hawley (O'Reilly, 2017) — Practical guide to building incident management processes, escalation procedures, and post-incident reviews. Translates well from general IT operations to mainframe-specific batch operations.
  • Accelerate: The Science of Lean Software and DevOps by Nicole Forsgren, Jez Humble, and Gene Kim (IT Revolution Press, 2018) — Research-based evidence that monitoring, measurement, and feedback loops drive performance improvement. The MTTR metric discussed in this chapter is one of the four key metrics identified in the DORA research.

Technical Articles and Papers

  • "SMF Type 30: A Deep Dive" — Search IBM Z Content Solutions for detailed technical articles on parsing and analyzing SMF Type 30 records. These articles cover the self-defining section parsing technique shown in this chapter in greater detail.
  • "Batch Window Optimization Using SMF Data" — IBM Systems Magazine periodically publishes articles on using SMF data for batch performance analysis and optimization.
  • "Automating z/OS Operations: Best Practices" — Broadcom and BMC both publish whitepapers on automation rule design, including the alert-tier approach described in this chapter.
  • SHARE Conference Proceedings — The annual SHARE conference frequently includes sessions on batch monitoring, automation, and operational excellence. Search the SHARE proceedings archive for presentations on batch operations, SMF analysis, and automated recovery.

Online Resources

  • IBM Z and LinuxONE Community (community.ibm.com) — Active forums where mainframe professionals discuss batch monitoring strategies, SMF analysis techniques, and automation approaches. The "Operations and Automation" forum is particularly relevant.
  • Planet Mainframe (planetmainframe.com) — Industry news and technical articles on mainframe operations, including batch monitoring and automation topics.
  • Mainframe DEV (mainframedev.com) — Technical tutorials and code examples for mainframe development, including COBOL programs for SMF processing.
  • IBM Redbooks (redbooks.ibm.com) — Searchable library of IBM technical publications. Search for "SMF," "batch monitoring," and "automation" for relevant Redbooks and Redpapers.

Prerequisite Chapters

  • Chapter 1 (COBOL Foundations) — Hard prerequisite. The COBOL programs in this chapter assume familiarity with file handling, working storage, and basic program structure.
  • Chapter 5 (Workload Management) — Soft prerequisite. WLM service classes determine batch job dispatching priority and resource allocation, which directly affects monitoring thresholds and performance analysis.
  • Chapter 23 (Batch Window Architecture) — Hard prerequisite. The batch window SLA milestones monitored in this chapter are designed in Chapter 23.
  • Chapter 24 (Checkpoint/Restart) — Soft prerequisite. Automated restart from checkpoint (a key self-healing mechanism) depends on the checkpoint/restart infrastructure covered in Chapter 24.
  • Chapter 25 (Batch JCL and Resource Management) — Soft prerequisite. JCL conditional execution, COND parameters, and IF/THEN/ELSE constructs used in error routing are covered in Chapter 25.
  • Chapter 26 (Batch Data Management) — Soft prerequisite. Dataset allocation, VSAM management, and GDG concepts referenced in space monitoring and data validation are covered in Chapter 26.