Chapter 5 Exercises: z/OS Workload Manager
Part A: Conceptual Foundations (Exercises 1–6)
Exercise 1: Goal-Based vs. Static Priority Management
Explain in your own words why z/OS moved from static dispatching priorities (compatibility mode) to WLM goal-based management (goal mode). Your answer should address:
a) What specific limitations of static priorities made them inadequate for modern workloads?
b) How does goal-based management handle the scenario where a CICS region needs maximum resources at 2:00 PM but minimal resources at 2:00 AM?
c) Why would a system running in compatibility mode "leave performance on the table"?
Exercise 2: Service Class Design Reasoning
Continental National Bank has thirty service classes. A junior sysprog proposes consolidating to eight service classes to "simplify management." Kwame Mensah pushes back.
a) What is the minimum number of service classes a production shop like CNB realistically needs, and why?
b) What problems would arise from having too few service classes?
c) What problems would arise from having too many (e.g., 60+)?
d) What is the sweet spot, and how do you determine it?
Exercise 3: Performance Index Interpretation
Given the following RMF data snapshot, answer the questions below:
| Service Class | Trans Count | Avg Resp (s) | Goal Type | Goal Value | Perf Index | Imp Lvl |
|---|---|---|---|---|---|---|
| SVCCLS01 | 8,200 | 0.18 | RT | 0.20 | 0.90 | 1 |
| SVCCLS02 | 42,100 | 0.52 | RT | 0.25 | 2.08 | 1 |
| SVCCLS03 | 380 | N/A | VEL | 50% | 0.65 | 2 |
| SVCCLS04 | 45 | N/A | VEL | 30% | 3.20 | 3 |
| SVCCLS05 | 12 | N/A | VEL | N/A | N/A | 5 |
a) Which service class is performing best relative to its goal? How do you know?
b) Which service class has the most serious performance problem? Why is it serious?
c) SVCCLS04 has a PI of 3.20. Should the operations team be alarmed? Under what conditions would this be acceptable?
d) What would happen to SVCCLS04's PI if the work in SVCCLS02 were reclassified to importance 2?
e) Why does SVCCLS05 have no PI?
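As a reminder of the arithmetic behind the PERF INDEX column, here is a minimal Python sketch. It assumes the standard definitions: for an average response-time goal, PI is actual divided by goal; for a velocity goal, PI is goal velocity divided by achieved velocity. The function names are illustrative only and are not part of any RMF or WLM interface.

```python
def pi_response_time(actual_seconds: float, goal_seconds: float) -> float:
    """PI for an average response-time goal: actual divided by goal."""
    return actual_seconds / goal_seconds


def pi_velocity(goal_velocity_pct: float, achieved_velocity_pct: float) -> float:
    """PI for a velocity goal: goal velocity divided by achieved velocity."""
    return goal_velocity_pct / achieved_velocity_pct


def achieved_velocity(goal_velocity_pct: float, pi: float) -> float:
    """Back out the achieved velocity implied by a reported velocity PI."""
    return goal_velocity_pct / pi


if __name__ == "__main__":
    # Rows taken from the snapshot above.
    print("SVCCLS01 PI:", round(pi_response_time(0.18, 0.20), 2))
    print("SVCCLS02 PI:", round(pi_response_time(0.52, 0.25), 2))
    print("SVCCLS03 achieved velocity %:", round(achieved_velocity(50.0, 0.65), 1))
```

In both forms a PI below 1.0 means the goal is being exceeded and a PI above 1.0 means it is being missed, which is the reading the questions above rely on.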
Exercise 4: Classification Rule Ordering
Explain why classification rules are evaluated top-down with first-match-wins semantics. Consider the following rule set:
JES Subsystem:
Rule 1: Job Name: * → BATCHSTD
Rule 2: Job Name: EOD* → BATCHCRT
Rule 3: Job Name: REG* → BATCHCRT
a) What is wrong with this rule set?
b) How should it be corrected?
c) What is the operational impact of this error?
d) How would you detect this error if it were deployed to production?
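To experiment with rule ordering, the following Python sketch models top-down, first-match-wins evaluation. It is a simplified stand-in, not how WLM itself is implemented: it handles only a flat list of job-name patterns with `*` wildcards, and the `classify` function and the `SYSOTHER` fallback used here are assumptions made for illustration.

```python
from fnmatch import fnmatchcase

# Rules from the exercise, in evaluation order: (job-name pattern, service class).
RULES = [
    ("*", "BATCHSTD"),
    ("EOD*", "BATCHCRT"),
    ("REG*", "BATCHCRT"),
]


def classify(job_name, rules, default="SYSOTHER"):
    """Return the service class of the first rule whose pattern matches.

    Rules are scanned top-down and scanning stops at the first match,
    so later rules are never consulted once an earlier one matches.
    """
    for pattern, service_class in rules:
        if fnmatchcase(job_name, pattern):
            return service_class
    return default  # hypothetical fallback when no rule matches


if __name__ == "__main__":
    for job in ("EODSETL1", "REGFED01", "RPTMGMT1"):
        print(job, "->", classify(job, RULES))
```

Running the loop against the original list, then against a reordered copy, is a quick way to check a proposed correction before answering part (b).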
Exercise 5: Importance Level Strategy
You are designing a WLM service definition for a mid-size insurance company. The CIO tells you: "Everything is important — put it all at importance 1." Explain:
a) Why this defeats the purpose of WLM
b) What the actual behavior would be if all work were at importance 1
c) How you would guide the CIO through a prioritization exercise
d) What business-level questions you would ask to establish the importance hierarchy
Exercise 6: Multi-Period Service Classes
Design a two-period service class for CICS transactions at Pinnacle Health Insurance. The requirements are:
- Normal claims adjudication transactions should respond in under 0.5 seconds
- A small percentage of transactions encounter complex cases requiring 5-10 seconds of processing
- These long-running transactions should not consume resources at the same priority as fast transactions
- But they must still complete — they cannot be starved entirely
Define the period durations, goal types, and importance levels for each period. Justify your choices.
Part B: CICS and WLM Integration (Exercises 7–11)
Exercise 7: CICS Transaction Classification Strategy
SecureFirst Retail Bank runs three CICS production regions. Yuki Nakamura wants to classify transactions for WLM. Compare and contrast the two approaches:
a) Classification by transaction name in WLM
b) Classification by CICS transaction class in WLM
For each approach, provide: (1) an example rule set for 4 transaction types, (2) the process for adding a new transaction type, and (3) the scenario where this approach is superior.
Exercise 8: LINK vs. START and WLM Priority
A COBOL developer at Federal Benefits Administration writes a claims processing transaction that performs four steps:
- Validate the claim (EXEC CICS LINK to CLMVAL)
- Check eligibility (EXEC CICS LINK to ELIGCHK)
- Calculate benefit (EXEC CICS LINK to BENCALC)
- Write audit record (EXEC CICS START TRANSID('AUDT'))
a) How many WLM classifications occur for this transaction flow?
b) If the main transaction is classified as CICSHIGH (importance 1) and transaction AUDT is classified as CICSINTN (importance 2), explain the priority behavior for each step.
c) Why is START appropriate for the audit write but not for the benefit calculation?
d) What would happen if step 3 (BENCALC) were changed from LINK to START?
Exercise 9: CICS Region Separation Strategy
CNB runs four CICS production regions (CICSPRD1-4). Explain the WLM-related reasons for separating CICS workloads into multiple regions rather than running everything in one large region. Your answer should address:
a) Region dispatching priority elevation
b) Isolation of runaway transactions
c) WLM classification granularity
d) Sysplex workload balancing
Exercise 10: Transaction Response Time Analysis
A CICS transaction at Pinnacle Health has a WLM goal of 0.3 seconds but is averaging 1.2 seconds. The RMF data shows:
- PI = 1.1 (the goal is only slightly missed, yet the goal is 0.3 s and the actual average is 1.2 s; something does not add up)
- Dispatching priority = 208 (high, consistent with importance 1)
- CPU service = normal
- I/O service = 4x normal
a) Why does the PI show only 1.1 when actual response time is 4x the goal? (Hint: think about what "average" means in the PI calculation.)
b) Where is the real bottleneck? Is this a WLM problem?
c) What additional data would you request to diagnose the root cause?
d) Which team should investigate this: the WLM team, the DBA team, or the storage team?
Exercise 11: WLM Impact on CICS MRO/ISC
When CICS regions communicate via MRO (Multi-Region Operation) or ISC (Inter-System Communication), a transaction in Region A may route to Region B for processing.
a) How does WLM classify the work in Region B? Does it inherit Region A's classification or receive a new one?
b) What are the performance implications if Region A is at importance 1 and Region B's work is classified at importance 3?
c) How would you design the WLM rules to ensure consistent priority across the MRO/ISC chain?
Part C: Batch and WLM (Exercises 12–17)
Exercise 12: WLM-Managed Initiators vs. Static Initiators
A colleague argues that static JES2 initiators give you "more control" over batch execution. Respond to each claim:
a) "With static initiators, I know exactly how many jobs can run concurrently."
b) "Static initiators let me reserve capacity for high-priority jobs."
c) "WLM-managed initiators might start too many initiators and overwhelm the system."
d) Under what very rare circumstances might static initiators actually be appropriate?
Exercise 13: Batch Window Optimization
Rob Calloway at CNB has a 4-hour batch window (11:00 PM to 3:00 AM) and the following jobs:
| Job | Depends On | Estimated Duration | Business Priority |
|---|---|---|---|
| EODSETL1 | (none) | 45 min | Critical — settlement |
| EODSETL2 | EODSETL1 | 30 min | Critical — settlement |
| REGFED01 | EODSETL2 | 60 min | Critical — regulatory |
| INTRST01 | EODSETL2 | 90 min | Standard — interest calc |
| STMTGEN1 | INTRST01 | 45 min | Standard — statements |
| RPTMGMT1 | (none) | 120 min | Low — management reports |
| EXTDATA1 | (none) | 60 min | Low — data extract |
a) Identify the critical path (longest chain of dependent jobs).
b) What is the total critical path duration?
c) Which jobs can run in parallel with the critical path?
d) Design a WLM service class assignment for each job that maximizes the probability of completing within the 4-hour window.
e) What happens if EODSETL1 runs 30 minutes over its estimate?
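To check a hand calculation for parts (a), (b), and (e), the following Python sketch computes the longest dependency chain from the table above. It assumes each job starts the moment its predecessors finish and that there is no limit on parallelism, which real initiator and CPU constraints may not allow. The `JOBS` structure and function names are illustrative only.

```python
from functools import lru_cache

# job name -> (estimated duration in minutes, list of predecessor jobs)
JOBS = {
    "EODSETL1": (45, []),
    "EODSETL2": (30, ["EODSETL1"]),
    "REGFED01": (60, ["EODSETL2"]),
    "INTRST01": (90, ["EODSETL2"]),
    "STMTGEN1": (45, ["INTRST01"]),
    "RPTMGMT1": (120, []),
    "EXTDATA1": (60, []),
}


def earliest_finish(jobs):
    """Earliest finish time of every job, in minutes from window start."""

    @lru_cache(maxsize=None)
    def finish(name):
        duration, deps = jobs[name]
        return duration + max((finish(dep) for dep in deps), default=0)

    return {name: finish(name) for name in jobs}


def critical_path(jobs):
    """Return (ordered list of jobs on the longest chain, total minutes)."""
    finish = earliest_finish(jobs)
    last = max(finish, key=finish.get)
    path, name = [last], last
    while jobs[name][1]:
        # Walk back through the predecessor that finishes latest.
        name = max(jobs[name][1], key=finish.get)
        path.append(name)
    return list(reversed(path)), finish[last]


if __name__ == "__main__":
    path, minutes = critical_path(JOBS)
    print(" -> ".join(path), f"({minutes} min)")
```

For part (e), copy `JOBS`, change the EODSETL1 duration, and compare the new total against the 240-minute window.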
Exercise 14: Scheduling Environment Design
Design WLM scheduling environments for the following batch workload categories at Federal Benefits Administration:
- Eligibility recalculation (requires access to DB2 production, must not run during peak online hours)
- Annual enrollment batch (runs only during open enrollment period, November 1-December 15)
- Data archive (requires access to tape drives, can run anytime)
- Emergency reprocessing (must be able to run immediately, regardless of other workloads)
For each, specify: (1) scheduling environment name, (2) associated service class, (3) resource requirements, and (4) activation conditions.
Exercise 15: Enclave Design for Stored Procedures
Sandra Chen at Federal Benefits Administration has a batch job (ELIGCALC) that calls DB2 stored procedure CALC_ELIGIBILITY 15 million times per run. The batch job is classified as BATCHCRT (importance 2), but the stored procedure runs in a WLM-managed SPAS classified at importance 3.
a) Explain why this mismatch degrades performance.
b) Calculate the approximate performance impact if the stored procedure accounts for 60% of the job's total elapsed time and importance-3 work receives 30% less CPU than importance-2 work.
c) Design the correct WLM configuration to resolve this issue.
d) What DB2 DDL change is required to implement your solution?
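One way to frame the arithmetic in part (b) is sketched below in Python. The model is an assumption, not a WLM formula: it treats the stored-procedure portion as purely CPU-bound, so its elapsed time scales inversely with the CPU it receives, and it leaves the rest of the job unchanged. The answer depends on whether the 60% figure is measured in the matched or the mismatched configuration, so both readings are shown. Function names are illustrative.

```python
def slowdown_from_baseline(affected_fraction: float, cpu_reduction: float) -> float:
    """Elapsed-time ratio (mismatched / matched).

    Assumes affected_fraction was measured when priorities matched and
    that the affected portion stretches by 1 / (1 - cpu_reduction).
    """
    unaffected = 1.0 - affected_fraction
    stretched = affected_fraction / (1.0 - cpu_reduction)
    return unaffected + stretched


def speedup_from_degraded(affected_fraction: float, cpu_reduction: float) -> float:
    """Elapsed-time ratio (matched / mismatched).

    Assumes affected_fraction was measured in the mismatched run and
    that fixing the mismatch shrinks that portion by (1 - cpu_reduction).
    """
    unaffected = 1.0 - affected_fraction
    shrunk = affected_fraction * (1.0 - cpu_reduction)
    return unaffected + shrunk


if __name__ == "__main__":
    print(f"{slowdown_from_baseline(0.60, 0.30):.3f}")
    print(f"{speedup_from_degraded(0.60, 0.30):.3f}")
```

State which reading you chose when you write up your answer; the two differ by several percentage points.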
Exercise 16: Month-End Batch Strategy
At CNB, month-end processing includes 2,400 additional jobs on top of the regular 800 nightly jobs. Rob Calloway needs a WLM strategy that:
- Ensures month-end critical path completes by 3:00 AM
- Allows regular nightly batch to complete, even if delayed
- Keeps wire transfer processing at maximum priority
- Does not require manual intervention
Design the MONTHEND service policy, including: (1) service class assignments, (2) importance levels for each, (3) the policy activation mechanism, and (4) a fallback plan if the critical path is at risk of missing the 3:00 AM deadline.
Exercise 17: Batch Job Name Convention Design
Design a job naming convention for a mainframe shop that enables effective WLM classification by job name prefix. Your convention must accommodate:
- 8-character JES job name limit
- 5 business applications
- 3 priority tiers (critical, standard, low)
- Daily, weekly, and monthly frequencies
- Test and production differentiation
Provide: (1) the naming convention template, (2) five example job names with explanations, (3) the corresponding WLM classification rules, and (4) the process for onboarding a new application.
Part D: Diagnostic and Analysis (Exercises 18–22)
Exercise 18: RMF Report Analysis
Analyze the following RMF Workload Activity Report from a 15-minute interval during CNB's batch window:
| Service Class | Trans Count | Avg Resp (s) | Goal Type | Goal Value | Perf Index | Avg DPRTY | Using% CPU | Using% STR | Using% I/O |
|---|---|---|---|---|---|---|---|---|---|
| CICSHIGH | 312 | 0.095 | RT | 0.100 | 0.95 | 212 | 2.1 | 1.4 | 1.8 |
| CICSPROD | 2,840 | 0.310 | RT | 0.250 | 1.24 | 198 | 11.2 | 5.8 | 8.2 |
| DB2PROD | 890 | 0.620 | RT | 0.500 | 1.24 | 195 | 4.8 | 3.2 | 6.1 |
| BATCHCRT | 8 | N/A | VEL | 50.0% | 1.68 | 188 | 32.4 | 18.2 | 28.4 |
| BATCHSTD | 22 | N/A | VEL | 30.0% | 2.45 | 145 | 14.2 | 12.8 | 22.1 |
| RPTPROD | 3 | N/A | VEL | 40.0% | 3.10 | 132 | 8.1 | 6.4 | 14.2 |
a) What is the total CPU utilization of the LPAR during this interval?
b) Which service classes are meeting their goals?
c) CICSPROD has a PI of 1.24 during what should be the batch window. Is this expected? What might be causing it?
d) BATCHCRT has a PI of 1.68. Given that this is the batch window and batch should have elevated priority, what is likely happening?
e) What specific investigation would you launch based on this data?
Exercise 19: SMF Type 72 Processing
You need to write a COBOL program that reads SMF type 72 subtype 3 records to produce a daily WLM performance summary report. Design:
a) The record layout (copybook) for the key fields you would extract
b) The PROCEDURE DIVISION logic for calculating average PI by service class over a 24-hour period
c) The output report format
d) How you would handle the transition between service policies (e.g., DAYTIME to BATCHWIN) in your calculations
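Before writing the COBOL, it can help to prototype the aggregation logic for parts (b) and (d) in a scripting language. The Python sketch below is one possible approach under stated assumptions: each input record has already been reduced to a dictionary with the hypothetical keys `policy`, `service_class`, `pi`, and `interval_seconds` (these are not SMF field names), and averages are weighted by interval length and kept separate per active policy so that a policy switch does not blend two different goals.

```python
from collections import defaultdict


def average_pi(records):
    """Time-weighted average PI per (policy, service class).

    records: iterable of dicts with hypothetical keys
    'policy', 'service_class', 'pi', 'interval_seconds'.
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # key -> [weighted PI sum, seconds]
    for rec in records:
        key = (rec["policy"], rec["service_class"])
        totals[key][0] += rec["pi"] * rec["interval_seconds"]
        totals[key][1] += rec["interval_seconds"]
    return {
        key: weighted / seconds
        for key, (weighted, seconds) in totals.items()
        if seconds > 0
    }


if __name__ == "__main__":
    # Made-up sample intervals, for illustration only.
    sample = [
        {"policy": "DAYTIME", "service_class": "BATCHCRT", "pi": 1.0, "interval_seconds": 900},
        {"policy": "DAYTIME", "service_class": "BATCHCRT", "pi": 2.0, "interval_seconds": 900},
        {"policy": "BATCHWIN", "service_class": "BATCHCRT", "pi": 0.8, "interval_seconds": 300},
    ]
    for key, value in sorted(average_pi(sample).items()):
        print(key, round(value, 2))
```

Keying on the policy name is a design choice: the same service class can carry different goals under different policies, so a single 24-hour average across both would be hard to interpret.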
Exercise 20: Performance Triage Flowchart
A production batch job (EODSETL1) at CNB has been running for 2 hours. Its normal elapsed time is 45 minutes. Walk through the WLM diagnostic flowchart from Section 5.7 and answer:
a) What is the first data point you check?
b) If the PI is 0.8 (the goal is being met), what does that tell you?
c) If the PI is 2.5 (the goal is being missed by a wide margin), what do you check next?
d) If the dispatching priority is 125 when it should be ~188, what is the most likely cause?
e) If everything looks correct from a WLM perspective, what non-WLM causes should you investigate?
Exercise 21: Capacity Planning with WLM Data
Using the following trend data showing BATCHCRT service class performance over six months, project when CNB will need additional capacity:
| Month | Avg PI | Peak PI | CPU Service (M) | Trans Count |
|---|---|---|---|---|
| Jan | 1.05 | 1.45 | 142 | 28,400 |
| Feb | 1.08 | 1.52 | 148 | 29,200 |
| Mar | 1.12 | 1.61 | 155 | 30,800 |
| Apr | 1.18 | 1.78 | 164 | 32,100 |
| May | 1.25 | 1.92 | 172 | 33,500 |
| Jun | 1.31 | 2.08 | 181 | 34,900 |
a) Calculate the monthly growth rate for CPU service and transaction count.
b) At what PI threshold should the team begin capacity planning action?
c) Project when the average PI will reach that threshold if the trend continues.
d) What are the options besides adding MIPS? List at least three.
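The calculations in parts (a) and (c) can be checked with a short Python sketch. It makes two assumptions that you should state in your own answer: growth is expressed as a compound monthly rate between the first and last month, and the PI trend is projected with an ordinary least-squares straight line. The threshold passed to `months_until` is whatever you chose in part (b); no particular value is implied here.

```python
CPU_SERVICE = [142, 148, 155, 164, 172, 181]            # Jan..Jun, millions
TRANS_COUNT = [28400, 29200, 30800, 32100, 33500, 34900]
AVG_PI = [1.05, 1.08, 1.12, 1.18, 1.25, 1.31]


def compound_monthly_growth(values):
    """Compound monthly growth rate between the first and last value."""
    periods = len(values) - 1
    return (values[-1] / values[0]) ** (1.0 / periods) - 1.0


def linear_fit(values):
    """Least-squares slope and intercept, with x = 0 for the first month."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def months_until(values, threshold):
    """Months after the first data point at which the fitted line hits threshold.

    Only meaningful when the fitted slope is positive.
    """
    slope, intercept = linear_fit(values)
    return (threshold - intercept) / slope


if __name__ == "__main__":
    print(f"CPU growth/month:   {compound_monthly_growth(CPU_SERVICE):.2%}")
    print(f"Trans growth/month: {compound_monthly_growth(TRANS_COUNT):.2%}")
    print(f"PI slope/month:     {linear_fit(AVG_PI)[0]:.4f}")
```

A straight line is the simplest choice; if you believe the PI is accelerating, note that a linear fit will place the threshold date later than a curve would.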
Exercise 22: Cross-Subsystem WLM Interaction
A single customer transaction at CNB flows through the following subsystems:
- CICS (front-end transaction) → classified CICSHIGH
- MQ (message to backend) → classified MQPROD
- CICS (backend processing) → classified CICSPROD
- DB2 (database update) → classified DB2PROD
- MQ (confirmation message) → classified MQPROD
a) Draw the priority transitions as this transaction flows through each subsystem.
b) Identify potential priority inversions (where a downstream step runs at lower priority than an upstream step that is waiting for it).
c) How would you redesign the classification rules to ensure consistent priority treatment for this end-to-end flow?
d) What is the trade-off of putting all steps at importance 1?
Part E: Design Challenges (Exercises 23–25)
Exercise 23: Greenfield WLM Design
You are the architect for a new Parallel Sysplex installation supporting a government health exchange. The system must handle:
- Real-time eligibility verification (200 TPS, < 1 second response)
- Batch enrollment processing (500,000 records nightly)
- Document generation (200,000 PDFs nightly)
- Provider payment batch (monthly, 2M payments)
- Ad-hoc reporting for compliance auditors
- API gateway for state marketplace integration (100 TPS, < 2 second response)
Design a complete WLM service definition including: (1) all service classes with goals, (2) importance levels, (3) classification rules, (4) at least two service policies, and (5) workloads for reporting.
Exercise 24: WLM Migration from Compatibility Mode
Federal Benefits Administration has a legacy LPAR still running in WLM compatibility mode. Sandra Chen needs to migrate it to goal mode. Design a migration plan that includes:
a) An inventory of the current IPS/ICS dispatching priorities and the mapping to service classes
b) A testing strategy that minimizes risk
c) A rollback plan
d) Success criteria for the migration
e) The communication plan (who needs to know, when, and what)
Exercise 25: WLM for Hybrid Workloads
SecureFirst Retail Bank is deploying a new mobile banking platform. The architecture includes:
- z/OS CICS for core banking transactions
- z/OS DB2 DDF for API access
- zCX containers running API gateway middleware
- MQ for asynchronous communication
- Batch for EOD processing
Carlos Vega and Yuki Nakamura need a WLM design that treats the entire mobile banking flow — from API gateway to CICS to DB2 and back — as a single workload with consistent priority treatment.
Design the WLM service definition, addressing: (1) zCX address space classification, (2) DB2 DDF classification by connection source, (3) MQ message priority, (4) end-to-end response time monitoring, and (5) the policy switch between business hours and batch window.
Part M: Spaced Review — Integration with Chapters 1–4
Exercise M1: z/OS Architecture and WLM (Chapter 1)
In Chapter 1, we described z/OS as a collection of cooperating subsystems. Explain how WLM's classification rules reflect this subsystem architecture. Specifically:
a) How does WLM's subsystem-type classifier (CICS, DB2, JES, STC, TSO) map to the z/OS subsystem model from Chapter 1?
b) Why does WLM need to classify work differently for each subsystem type?
c) Draw a diagram showing how a single customer transaction interacts with multiple subsystems and how WLM classifies each interaction.
Exercise M2: Address Space Sizing and WLM (Chapter 2)
Chapter 2 covered address space sizing and region configuration. Connect that knowledge to WLM:
a) How does the size of a CICS region (REGION parameter) affect WLM's ability to manage its priority?
b) If a batch job's REGION parameter is too small and the job ABENDs with S878, how does this appear in WLM data?
c) How does WLM's storage management bias interact with the LE HEAP and STACK settings from Chapter 3?
Exercise M3: LE Overhead and WLM Diagnostics (Chapter 3)
In Chapter 3, we discussed Language Environment runtime overhead. Connect that to WLM performance data:
a) A COBOL batch program has a WLM PI of 0.9 (meeting its velocity goal) but elapsed time has increased by 20% after an LE runtime option change. Explain how this is possible.
b) What WLM metric would show that LE overhead is consuming additional CPU service?
c) How would you use RMF data to differentiate between CPU-bound overhead (possibly LE) and I/O-bound delays?
Exercise M4: Cross-Chapter Scenario
At Federal Benefits Administration, Marcus Whitfield (the retiring SME) tells Sandra Chen that a critical eligibility batch job "always ran fine until they changed something last month." Sandra investigates and finds:
- The COBOL program was recompiled with Enterprise COBOL V6.4 (previously V5.2)
- The LE runtime options were updated to use 64-bit addressing for AMODE 64
- The WLM service definition was modified to add three new service classes
- The DB2 subsystem was migrated to a new version
Using knowledge from Chapters 1–5, design a diagnostic approach. Which of these four changes could affect batch elapsed time, and in what order would you investigate them?
Exercise M5: Integrated Performance Analysis
You receive the following problem report: "CICS transaction ACCTINQ response time degraded from 0.15 seconds to 0.85 seconds after last weekend's maintenance."
Using concepts from all five chapters, outline a systematic investigation:
a) Chapter 1 concepts: What subsystem interactions could contribute?
b) Chapter 2 concepts: What address space or LPAR configuration changes should you check?
c) Chapter 3 concepts: What LE runtime changes could affect CICS transaction performance?
d) Chapter 4 concepts: What dataset or VSAM-related changes should you check?
e) Chapter 5 concepts: What WLM data do you check first, and what would each finding tell you?