> "In a world that never sleeps, the database cannot afford to. Data sharing is how DB2 on z/OS stays awake — and stays consistent — across multiple systems simultaneously."

Chapter 28: Data Sharing and Parallel Sysplex — Multi-Member DB2 on z/OS

When a bank processes millions of transactions per day, a single DB2 subsystem — no matter how powerful — becomes a single point of failure and a scalability ceiling. The z/OS platform addresses this challenge through one of the most sophisticated clustering technologies ever engineered: DB2 data sharing on the IBM Parallel Sysplex. In this chapter, we explore how multiple DB2 members share concurrent read/write access to the same data, how coupling facility structures maintain coherency, and how the entire architecture delivers the continuous availability that mission-critical enterprises demand.

This is z/OS territory. If your Meridian National Bank deployment runs on Linux, Unix, or Windows, Chapter 29 covers the corresponding HA technologies (HADR, pureScale) for that platform. But even if LUW is your primary environment, understanding data sharing deepens your appreciation for distributed database coherency — concepts that echo across every clustered database system.


28.1 What Is Data Sharing?

The Core Concept

Data sharing is a DB2 for z/OS feature that allows multiple DB2 subsystems — called members — to read and write the same set of databases concurrently. Every member has full read/write access to every table, index, and tablespace in the shared data. There is no partitioning of ownership, no "this member owns table X while that member owns table Y." Every member can do everything, at any time.

This is fundamentally different from a shared-nothing architecture (like DB2 DPF on LUW or most MPP databases), where each node owns a partition of the data and queries are decomposed across partitions. In data sharing, the data is truly shared — the same physical DASD (direct-access storage device) volumes are accessible from every member, and DB2's internal protocols ensure that concurrent access remains serialized and coherent.

Why Banks Need Data Sharing

Consider Meridian National Bank's z/OS environment. The bank operates:

  • Online banking serving millions of customers via CICS transactions
  • ATM networks processing withdrawals and balance inquiries 24/7
  • Wire transfer systems with strict regulatory deadlines (Fedwire, SWIFT)
  • Batch processing for end-of-day settlement, interest calculation, and regulatory reporting
  • Real-time fraud detection that must analyze transactions as they occur

A single DB2 subsystem can be extraordinarily powerful — a modern z15 or z16 LPAR can drive tens of thousands of transactions per second. But a single subsystem means:

  1. Single point of failure. If DB2 comes down, everything stops.
  2. Maintenance windows. Applying PTFs (program temporary fixes) to DB2 requires an outage.
  3. Capacity ceiling. One LPAR has finite CPU, memory, and thread limits.
  4. Workload interference. Batch jobs compete with online transactions for the same resources.

Data sharing solves all four problems:

| Problem | Data Sharing Solution |
|---|---|
| Single point of failure | If one member fails, others continue processing |
| Maintenance windows | Apply maintenance to one member at a time (rolling) |
| Capacity ceiling | Add members to scale horizontally |
| Workload interference | Route batch to one member, online to another |

A Brief History

Data sharing was introduced with DB2 Version 4 in 1994, coinciding with the introduction of the Parallel Sysplex hardware. IBM invested billions of dollars in the coupling facility technology that makes data sharing possible. Over three decades, the technology has matured through every DB2 version, with each release improving performance, reducing coupling facility overhead, and expanding capacity. Today, data sharing groups with 32 members (the architectural maximum) are in production at the world's largest banks, airlines, and government agencies.

Data Sharing vs. Other Clustering Approaches

It is worth pausing to compare data sharing with other database clustering approaches, because the terminology can be confusing:

Shared-Disk (Data Sharing): All members read/write the same physical disks. A coordination layer (the coupling facility) ensures coherency. DB2 data sharing and Oracle RAC follow this model. The advantage is that any member can access any data immediately. The disadvantage is the coherency overhead.

Shared-Nothing (DB2 DPF, most MPP databases): Each node owns a partition of the data. Queries that span partitions require inter-node communication. The advantage is excellent scalability for analytic workloads. The disadvantage is that cross-partition transactions are expensive, and a node failure makes its partition temporarily unavailable.

Shared-Everything with Replication (HADR, standby databases): One primary accepts writes; standbys receive replicated copies. The advantage is simplicity. The disadvantage is that only one node is active for writes at any time — it is an active-passive model, not active-active.

Data sharing occupies the unique position of being both active-active (all members write simultaneously) and fully consistent (the coupling facility ensures ACID properties across all members). This is why it remains the technology of choice for the highest-tier banking workloads on z/OS.

The Economics of Data Sharing

Adopting data sharing is not a trivial decision. It requires:

  • Coupling facility hardware — dedicated processors and memory, typically 10-20% of the total system capacity
  • CF links — high-speed connections between every CPC and every CF
  • Operational complexity — managing multiple DB2 members, monitoring CF health, tuning GBPs and lock structures
  • Specialized skills — data sharing DBAs need deep knowledge of CF internals, XES protocols, and workload balancing

The return on investment comes from:

  • Eliminated downtime costs — for a major bank, one hour of unplanned downtime can cost millions of dollars in lost revenue, regulatory penalties, and reputational damage
  • Deferred hardware upgrades — adding a member is often cheaper than upgrading to a larger CPC
  • Workload flexibility — batch and online workloads can be isolated without separate databases

28.2 Parallel Sysplex Architecture

Data sharing does not exist in isolation. It depends on the Parallel Sysplex — an integrated hardware and software architecture that allows multiple z/OS systems to cooperate as a single logical entity.

Components of a Parallel Sysplex

Central Processor Complexes (CPCs) and LPARs

A CPC is a physical mainframe — a z16, z15, or z14 machine. Each CPC can be divided into multiple Logical Partitions (LPARs), each running its own z/OS image. A DB2 data sharing member runs within a single LPAR. Different members typically run on different LPARs, often on different physical CPCs for hardware isolation.

Physical CPC-1 (z16)              Physical CPC-2 (z16)
┌──────────┬──────────┐          ┌──────────┬──────────┐
│  LPAR-A  │  LPAR-B  │          │  LPAR-C  │  LPAR-D  │
│  z/OS-1  │  z/OS-2  │          │  z/OS-3  │  z/OS-4  │
│  DB2-M1  │  (other) │          │  DB2-M2  │  DB2-M3  │
└──────────┴──────────┘          └──────────┴──────────┘

The Coupling Facility (CF)

The coupling facility is the technological heart of data sharing. It is a specialized processor — either a dedicated LPAR on a CPC or a standalone machine — that provides three critical services:

  1. High-speed shared memory — accessible from every z/OS system in the sysplex
  2. Lock management — a hardware-assisted global lock manager
  3. List and cache structures — used by DB2 and other exploiters for shared state

The coupling facility connects to CPCs via Coupling Facility Links (CF links), which are high-speed point-to-point connections. Modern coupling links provide service times in the single-digit microsecond range — far faster than any network-based communication.

       ┌────────────────────┐
       │  Coupling Facility │
       │  ┌──────────────┐  │
       │  │ Lock Structure│  │
       │  ├──────────────┤  │
       │  │     SCA      │  │
       │  ├──────────────┤  │
       │  │    GBP0      │  │
       │  ├──────────────┤  │
       │  │    GBP1      │  │
       │  ├──────────────┤  │
       │  │    ...       │  │
       │  └──────────────┘  │
       └─────┬───────┬──────┘
        CF Link    CF Link
             │       │
      ┌──────┘       └──────┐
      │                     │
  ┌───┴───┐            ┌───┴───┐
  │ CPC-1 │            │ CPC-2 │
  │DB2-M1 │            │DB2-M2 │
  └───────┘            └───────┘

The Sysplex Timer

All z/OS images in a Parallel Sysplex synchronize their clocks via the Sysplex Timer (STP — Server Time Protocol) or the older External Time Reference (ETR). This ensures that timestamps across all members are consistent — critical for log record ordering, commit sequence numbering, and deadlock resolution.

The sysplex timer guarantees clock synchronization to within one microsecond across all connected systems. Without this, DB2 could not reliably determine the ordering of concurrent updates from different members.

Cross-System Coupling Facility (XCF)

XCF is the z/OS component that manages communication between systems in the sysplex. It provides:

  • Group services — members can form named groups and be notified when members join or leave
  • Signaling services — systems can send messages to each other
  • Monitoring — z/OS can detect when a system or member has failed

DB2 data sharing uses XCF extensively. The data sharing group is an XCF group. When a member starts, it joins the XCF group. When a member fails, XCF notifies the surviving members so they can initiate recovery.

Shared DASD

All members in a data sharing group access the same physical disk volumes. On z/OS, this means the same DASD (Direct Access Storage Device) subsystems — typically IBM DS8000 series storage. The storage subsystem is connected to every CPC via FICON channels, giving every LPAR — and thus every DB2 member — access to the same tablespace datasets, index datasets, log datasets, and bootstrap datasets.

There is no data replication between members. They all read and write the same bytes on the same disks. This is a shared-disk architecture, and the coupling facility is what makes it work without corruption.

The performance of data sharing depends critically on the speed of CF links. Over the generations of mainframe hardware, IBM has offered several link technologies:

| Link Type | Bandwidth | Latency | Distance | Notes |
|---|---|---|---|---|
| ISC (Inter-System Channel) | 1 Gbps | 5-10 µs | 10 km | Older fiber-based links |
| IC (Internal Coupling) | 6+ Gbps | 1-3 µs | Same CPC only | Fastest; CF LPAR on same CPC |
| ICA SR (Integrated Coupling Adapter) | 6 Gbps | 3-7 µs | 150 m | PCIe-based short-range adapter |
| 12x InfiniBand | 6 Gbps | 3-5 µs | 150 m | Supported through z14 |
| Coupling Express LR | 10+ Gbps | 5-15 µs | 10 km | z15/z16 long-range |

For Meridian Bank's configuration, Internal Coupling (IC) links are used between each CPC and its local CF LPAR, providing sub-3-microsecond service times. For cross-CPC CF access (CPC-1 to the CF on CPC-2), ICA SR links are used, with latencies of 3-7 microseconds.

The CF link configuration directly impacts the CFSERVICETIME metric — the time to complete a single CF operation. If this metric degrades, every data sharing operation (locking, GBP reads, GBP writes, castout) slows down proportionally. This is why CF link health monitoring is a top-tier operational priority.
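The proportional impact of CF service time is easy to see with back-of-envelope arithmetic. The sketch below is illustrative: the per-transaction operation counts and per-operation service times are assumptions for the sake of the example, not measurements from any real configuration.

```python
# Back-of-envelope model of how CF service time multiplies across the CF
# operations a single transaction drives. Counts and times are illustrative
# assumptions, not measured values.

def cf_overhead_per_txn(cf_ops_per_txn: int, service_time_us: float) -> float:
    """Total synchronous CF time added to one transaction, in microseconds."""
    return cf_ops_per_txn * service_time_us

# A transaction doing 10 page updates: roughly 10 GBP writes plus 10 global
# lock requests, i.e. ~20 CF operations.
ic_link = cf_overhead_per_txn(20, 2.0)   # IC link, ~2 us per operation
ib_link = cf_overhead_per_txn(20, 5.0)   # cross-CPC link, ~5 us per operation

print(f"IC link overhead per txn: {ic_link:.0f} us")   # 40 us
print(f"Cross-CPC overhead per txn: {ib_link:.0f} us") # 100 us
```

The same transaction pays 2.5x more coherency overhead on the slower link, which is exactly why CFSERVICETIME degradation shows up directly in transaction response times.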

Coupling Facility Duplexing

A single coupling facility is a single point of failure. If it fails, all CF structures (lock structure, SCA, GBPs) are lost, and the data sharing group must perform a group restart — a time-consuming process.

Duplexing eliminates this risk by maintaining identical copies of every CF structure on two separate coupling facilities. Every CF operation is performed on both copies simultaneously (system-managed duplexing) or sequentially (user-managed duplexing). If one CF fails, the other CF already has the complete, up-to-date copy of every structure.

The two duplexing modes:

  • System-managed duplexing: z/OS manages the duplexing automatically. Every CF operation writes to both copies. This is the recommended mode for all production data sharing environments.
  • User-managed duplexing: The user (or automation) manages rebuild and failover. Less overhead but more operational risk.

Duplexing approximately doubles the CF processing load because every operation is performed twice. This must be factored into CF sizing. For Meridian Bank, the CF LPARs are provisioned with sufficient capacity to handle the duplexed workload plus headroom for peak periods.


28.3 DB2 Data Sharing Groups

Group Definition and Member Naming

A data sharing group is defined by:

  • Group name — up to 8 characters, e.g., DSNDBGP
  • Group attach name — the name that applications use to connect, e.g., DBANKGRP
  • Member names — each DB2 subsystem in the group has a unique 4-character SSID (subsystem ID), e.g., DB1A, DB1B, DB1C

The group attach name is critical for application transparency. Applications connect to the group attach name, not to a specific member. The workload balancing layer (WLM, sysplex distributor, or VTAM generic resources) routes the connection to an available member. If one member is down, connections automatically go to surviving members.

-- Application connects to group attach name
-- CONNECT TO DBANKGRP
-- WLM routes to DB1A, DB1B, or DB1C based on policy

Starting a Data Sharing Group

Creating a data sharing group involves:

  1. Install the first member using the DB2 installation CLIST or JCL
  2. Enable data sharing in the DSNTIP1 installation panel (DATA SHARING = YES)
  3. Specify group name and member name in the installation panels
  4. Define coupling facility structures in the CFRM (Coupling Facility Resource Management) policy
  5. Start the first member — it creates the group
  6. Install and start additional members — they join the existing group

Each member has its own:

  • BSDS (bootstrap dataset) — contains log inventory and checkpoint information
  • Active log datasets — each member writes its own recovery log
  • Archive log datasets — each member's logs are archived independently
  • Work files (TEMP database) — each member has its own temporary tablespace
  • EDM pool, buffer pools, RID pool — each member has its own memory structures

But all members share:

  • The DB2 catalog (DSNDB06) and directory (DSNDB01)
  • All user tablespaces and indexes
  • The coupling facility structures

Group Buffer Pools

In a non-data-sharing DB2 subsystem, buffer pools are simple: DB2 caches pages in memory, reads from disk on a miss, and writes dirty pages back to disk. In data sharing, a fundamental problem arises: if member M1 updates a page in its local buffer pool, how does member M2 know that its cached copy of that page is now stale?

The answer is the Group Buffer Pool (GBP). A GBP is a cache structure in the coupling facility that serves as an intermediary between members' local buffer pools and DASD. When a member updates a page in a tablespace that is GBP-dependent, it writes the updated page to the GBP. When another member needs that page, it can read the current version from the GBP rather than getting a stale copy from its local buffer pool.

GBPs are numbered from GBP0 to GBP49, corresponding to buffer pool numbers BP0 through BP49 (and similarly for 8K, 16K, and 32K page sizes). Each local buffer pool maps to one GBP:

| Local Buffer Pool | Group Buffer Pool |
|---|---|
| BP0 | GBP0 |
| BP1 | GBP1 |
| BP8K0 | GBP8K0 |
| BP16K0 | GBP16K0 |
| BP32K | GBP32K |
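The mapping rule above is purely mechanical: the GBP name is the buffer pool name with a leading "G". A trivial sketch (an illustrative helper, not any DB2 API):

```python
# Sketch of the local-buffer-pool to group-buffer-pool naming rule:
# GBPn corresponds to BPn, and the 8K/16K/32K pools map the same way.

def gbp_for(bufferpool: str) -> str:
    """Map a local buffer pool name (e.g. 'BP0', 'BP8K0') to its GBP name."""
    if not bufferpool.startswith("BP"):
        raise ValueError(f"not a buffer pool name: {bufferpool}")
    return "G" + bufferpool   # BP0 -> GBP0, BP8K0 -> GBP8K0, BP32K -> GBP32K

print(gbp_for("BP0"))     # GBP0
print(gbp_for("BP16K0"))  # GBP16K0
```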

Lock Structure

The lock structure is a coupling facility structure that manages global locking across all members. Every lock request that requires inter-system serialization goes through the lock structure. The lock structure contains:

  • Lock table — a hash table where lock conflicts are detected
  • Lock names — identifying the resource being locked (page, row, tablespace, etc.)
  • Lock states — shared (S), exclusive (X), update (U), and others

The lock structure is sized based on the expected number of concurrent locks across all members. Undersizing it causes lock escalation in the coupling facility (not the same as DB2 lock escalation), which degrades performance.


28.4 Coupling Facility Structures

The Lock Structure (DSNDBGP_LOCK1)

The lock structure is the most performance-critical CF structure. Every global lock request — and in data sharing, most lock requests are global — involves the lock structure. The structure contains:

  • Lock entries — recording which member holds which locks
  • Contention detection — hardware-assisted detection of lock conflicts between members
  • Notify lists — when a lock conflict occurs, the CF notifies the holding member

The lock structure is named according to the convention groupname_LOCK1, e.g., DSNDBGP_LOCK1. A secondary lock structure (LOCK2) can be defined for rebuild purposes.

Sizing the lock structure:

The lock structure must be large enough to hold the lock table (a hash table for fast lookup) and the record list (actual lock entries). IBM provides the following guidance:

  • Lock table entries: at least 2x the maximum number of concurrent page locks
  • Record list entries: depends on the number of retained locks during recovery scenarios
  • Typical sizes range from 256 MB for small groups to several GB for large groups

A sample CFRM policy definition:

STRUCTURE NAME(DSNDBGP_LOCK1)
    SIZE(512M)
    INITSIZE(256M)
    PREFLIST(CF01, CF02)
    REBUILDPERCENT(5)
    DUPLEX(ENABLED)
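The "2x the maximum number of concurrent page locks" guideline above can be turned into a quick sizing estimate. One added assumption in this sketch: lock table sizes are rounded up to a power of two, since CF lock table entry counts are powers of two.

```python
# Rough lock-table sizing sketch following the 2x guideline above.
# Power-of-two rounding is an assumption about CF lock table geometry;
# always validate against IBM's sizing tooling (e.g. CFSizer).

def next_power_of_two(n: int) -> int:
    p = 1
    while p < n:
        p *= 2
    return p

def lock_table_entries(peak_concurrent_locks: int) -> int:
    """At least 2x peak concurrent locks, rounded up to a power of two."""
    return next_power_of_two(2 * peak_concurrent_locks)

print(lock_table_entries(100_000))   # 262144 entries
```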

The Shared Communications Area (SCA)

The SCA (groupname_SCA) is a list structure used by DB2 members to share control information. It contains:

  • Database exception table (DBET) — tracks the status of databases, tablespaces, and index spaces across members (e.g., which objects are in COPY PENDING, CHECK PENDING, etc.)
  • Function registration — members register their capabilities
  • CASTOUT ownership — which member is responsible for casting out (writing) dirty pages from the GBP to DASD for a given pageset/partition

The SCA is relatively small compared to GBPs and the lock structure — typically 32-128 MB. But it is accessed frequently, so it should be placed in a coupling facility with good connectivity.

Group Buffer Pool Structures

Each GBP is a cache structure in the coupling facility. A GBP contains:

  • Directory entries — tracking which pages are cached and their status
  • Data entries — the actual page images (4K, 8K, 16K, or 32K depending on the buffer pool)

The ratio of directory entries to data entries is critical. IBM recommends a directory-to-data ratio of at least 5:1 for most workloads. This is because the directory tracks not just pages that are currently cached in the GBP, but also pages that have been cast out (written to DASD) but whose directory entries are retained for cross-invalidation purposes.

Why the 5:1 ratio matters:

When a page is updated by member M1 and written to the GBP, a directory entry is created. When that page is later cast out to DASD, the data entry is reclaimed, but the directory entry is retained. If member M2 later reads the page from its local buffer pool, the CF checks the directory to determine if that cached copy is still valid. If the directory entry has been reclaimed (because the GBP directory is too small), the CF cannot confirm validity, and M2 must re-read the page from DASD — a costly operation.

A sample CFRM policy definition for GBP0:

STRUCTURE NAME(DSNDBGP_GBP0)
    SIZE(2G)
    INITSIZE(1G)
    PREFLIST(CF01, CF02)
    REBUILDPERCENT(5)
    DUPLEX(ENABLED)
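To make the 5:1 directory-to-data ratio concrete, the sketch below splits a GBP of a given size into directory and data entries. The directory entry size (~200 bytes) is an illustrative assumption; real entry sizes depend on CF level and structure attributes.

```python
# Sketch of how a directory-to-data ratio splits a GBP of a given size.
# The ~200-byte directory entry size is an assumption for illustration;
# the data entry size is the page size (4 KB for GBP0).

def gbp_split(total_bytes: int, ratio: int = 5, page_size: int = 4096,
              dir_entry_size: int = 200) -> tuple[int, int]:
    """Return (directory_entries, data_entries) for a ratio:1 split."""
    # Each "unit" of space holds one data entry plus `ratio` directory entries.
    unit = page_size + ratio * dir_entry_size
    data_entries = total_bytes // unit
    return ratio * data_entries, data_entries

dirs, datas = gbp_split(1 << 30)   # a 1 GB GBP at the default 5:1 ratio
print(f"{dirs:,} directory entries, {datas:,} data entries")
```

The takeaway: at 5:1, directory entries consume only a small fraction of the structure, yet they are what allows the CF to vouch for the validity of every locally cached page, including pages long since cast out.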

28.5 XES and Inter-System Communication

Cross-System Extended Services (XES)

XES is the z/OS component that provides the programming interface to coupling facility structures. DB2 does not communicate directly with the coupling facility hardware — it uses XES macros (IXLLOCK, IXLCACHE, IXLLIST) to perform operations on CF structures.

The key XES operations for data sharing are:

Lock Operations (IXLLOCK)

  • OBTAIN — request a lock on a resource
  • ALTER — change the state of a held lock (e.g., from S to X)
  • RELEASE — release a lock
  • PURGE — release all locks held by a member (used during member failure)

Cache Operations (IXLCACHE)

  • READ — read a page from the GBP
  • WRITE — write a page to the GBP
  • CASTOUT — write a dirty GBP page to DASD and update the directory
  • DELETE NAME — remove a page from the GBP
  • CROSS-INVALIDATE — mark a page in another member's local buffer pool as invalid

The Coherency Protocol: P-Locks and L-Locks

DB2 data sharing uses two distinct types of locks to maintain data coherency:

Physical Locks (P-Locks)

P-locks control cached data coherency. They are managed through the coupling facility's lock structure and are transparent to applications. P-locks ensure that when one member modifies a page, other members are notified that their cached copies are invalid.

P-locks operate at the pageset/partition level for most objects. When a member accesses a pageset, it obtains a P-lock. The P-lock state indicates the member's interest in the pageset:

  • IS (Intent Share) — the member may read pages from this pageset
  • IX (Intent Exclusive) — the member may read or update pages
  • S (Share) — the member has read the pageset; no one is updating it
  • SIX (Share with Intent Exclusive) — combination of S and IX
  • X (Exclusive) — the member is the only one accessing the pageset

When a P-lock conflict occurs (e.g., member M1 holds IS and member M2 requests IX), the coupling facility sends a notify message to M1. M1 must then negotiate — typically by writing its dirty pages to the GBP and downgrading its P-lock.
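The trigger for this negotiation can be modeled simply: as soon as the combined P-lock states show a writer plus any second accessor, the pageset must go GBP-dependent. This is an illustrative model of the trigger condition, not DB2's internal logic.

```python
# Sketch of the P-lock negotiation trigger described above: inter-system
# read/write interest (one writer plus any other accessor) forces coherency
# through the group buffer pool. Illustrative model only.

WRITE_STATES = {"IX", "SIX", "X"}

def gbp_dependent(held_states: list[str]) -> bool:
    """True if combined member P-lock states imply inter-system R/W interest."""
    writers = sum(1 for s in held_states if s in WRITE_STATES)
    return writers >= 1 and len(held_states) >= 2

print(gbp_dependent(["IS", "IX"]))   # True  -> M1 is notified; GBP-dependent
print(gbp_dependent(["IS", "IS"]))   # False -> read-only, no GBP traffic
print(gbp_dependent(["X"]))          # False -> single accessor
```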

Logical Locks (L-Locks)

L-locks are the traditional DB2 transaction locks — the locks you learned about in Chapter 18 (Concurrency and Locking). In data sharing, L-locks are still used for transaction isolation, but they are managed globally through the coupling facility's lock structure rather than locally within a single DB2 subsystem.

When a transaction on member M1 takes an X-lock on a row, that lock is registered in the CF lock structure. If a transaction on member M2 tries to read or update the same row, the CF detects the conflict and M2's request waits (or times out) just as it would in a non-data-sharing environment.

Cross-Invalidation

Cross-invalidation (XI) is the mechanism by which the coupling facility tells a member that its locally cached copy of a page is no longer valid. Here is the sequence:

  1. Member M1 updates page P in tablespace T
  2. M1 writes the updated page P to GBP
  3. The CF checks: does any other member have page P in its local buffer pool?
  4. If member M2 has a cached copy, the CF sends an XI signal to M2
  5. M2 marks its local copy of page P as invalid
  6. The next time M2 needs page P, it reads the current version from the GBP (not from its stale local cache)

This cross-invalidation protocol is what makes data sharing possible. Without it, members would read stale data, and the database would become corrupt.
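The six-step sequence above can be sketched as a toy model: a GBP directory that remembers which members hold each page locally, and invalidates the other members' copies on a write. This is illustrative only; real XI is a hardware signal that flips a bit in the member's local vector, not a message loop.

```python
# Toy model of cross-invalidation: the directory tracks registered local
# copies; a write invalidates every *other* member's copy.

class GroupBufferPool:
    def __init__(self):
        self.pages = {}        # page id -> latest contents
        self.cached_by = {}    # page id -> set of members with local copies

    def register_read(self, member, page, local_cache):
        """Member reads a page and registers interest in the directory."""
        local_cache[page] = self.pages.get(page)
        self.cached_by.setdefault(page, set()).add(member)

    def write(self, member, page, contents, caches):
        """Member writes a page; other registered copies are invalidated."""
        self.pages[page] = contents
        for other in self.cached_by.get(page, set()) - {member}:
            caches[other].pop(page, None)          # the XI "signal"
        self.cached_by[page] = {member}

gbp = GroupBufferPool()
caches = {"M1": {}, "M2": {}}
gbp.write("M1", "P", "v1", caches)
gbp.register_read("M2", "P", caches["M2"])   # M2 caches v1
gbp.write("M1", "P", "v2", caches)           # XI removes M2's stale copy
print("P" in caches["M2"])                   # False -> M2 must re-read
gbp.register_read("M2", "P", caches["M2"])
print(caches["M2"]["P"])                     # v2
```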

The Cost of Coherency

Every coherency operation has a cost — measured in CF service time (typically 5-30 microseconds per request on modern hardware). For high-throughput workloads, these microseconds add up:

  • A transaction that performs 10 page updates incurs 10 GBP writes + associated P-lock negotiations
  • At 20,000 transactions per second across the group, that is 200,000 CF operations per second just for page writes
  • Add lock requests, cross-invalidations, and castout processing, and the CF can be servicing millions of requests per second

This is why coupling facility sizing, placement, and link configuration are critical to data sharing performance.
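The aggregate arithmetic in the bullets above is worth sanity-checking. The rates are the illustrative figures from the text; the "roughly one lock request and one XI per page update" multiplier is an added simplifying assumption.

```python
# Quick check of the CF traffic arithmetic from the bullets above.

txn_per_sec = 20_000
page_updates_per_txn = 10

gbp_writes_per_sec = txn_per_sec * page_updates_per_txn
print(f"{gbp_writes_per_sec:,} GBP writes/second")    # 200,000

# Assume roughly one lock request and one XI signal per page update
# (a simplifying assumption) and total CF traffic triples.
total_cf_ops = gbp_writes_per_sec * 3
print(f"~{total_cf_ops:,} CF requests/second")        # ~600,000
```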

Detailed Walk-Through: A Cross-Member Update

To make the coherency protocol concrete, let us trace a single row update that involves two members. This walkthrough shows every step that occurs when member M1 updates a row that member M2 has cached.

Starting state:

  • Tablespace T is GBP-dependent (both M1 and M2 have P-locks in IX mode)
  • Page P (containing the row to be updated) is cached in both M1's and M2's local buffer pools
  • No transactions are actively modifying the row

Step 1: M1's transaction begins updating the row.

  • M1 requests an L-lock (exclusive) on the row via the CF lock structure
  • The CF grants the X lock (no conflict; no other transaction holds a lock on this row)
  • M1 modifies the row in its local buffer pool (in-memory)

Step 2: M1's transaction commits.

  • M1 writes the commit log record to its local active log
  • M1 writes the dirty page P to the GBP (this is synchronous in SYNC mode)
  • The CF registers that page P has been updated by M1
  • The CF sends a cross-invalidation (XI) signal to M2, because M2 has page P in its local buffer pool
  • M2 receives the XI signal and marks its local copy of page P as invalid
  • M1 releases the L-lock on the row via the CF lock structure

Step 3: M2's transaction reads the row.

  • M2 checks its local buffer pool for page P — finds it marked invalid
  • M2 reads page P from the GBP (a GBP read, counted as a GBP hit)
  • M2 now has the current version of the row
  • M2 requests an L-lock (shared) on the row — granted because M1 has released its X lock

Total CF operations for this single row update: 1 L-lock obtain, 1 GBP write, 1 XI signal, 1 L-lock release, 1 GBP read, 1 L-lock obtain — six coupling facility interactions. At 5-10 microseconds each, the total coherency overhead is 30-60 microseconds. For a single transaction, this is negligible. For 20,000 transactions per second with multiple row updates each, it adds up to millions of CF operations per second.


28.6 Group Buffer Pool Management

GBP-Dependent and GBP-Independent Pagesets

Not every tablespace needs to participate in the GBP protocol. DB2 classifies pagesets as:

  • GBP-dependent — multiple members are accessing the pageset concurrently for read/write. Pages must flow through the GBP for coherency.
  • GBP-independent — only one member is accessing the pageset, or all access is read-only. No GBP writes are needed.

DB2 automatically manages the transition between these states based on P-lock negotiations:

  1. Member M1 opens tablespace T and obtains a P-lock in IX mode. Since M1 is the only accessor, DB2 grants exclusive P-lock ownership. T is GBP-independent on M1.
  2. Member M2 opens the same tablespace T and requests a P-lock. The CF detects the conflict with M1's lock.
  3. M1 is notified and must write all its dirty pages for T to the GBP. T becomes GBP-dependent on M1.
  4. M2 obtains its P-lock. T is now GBP-dependent on both members.
  5. When M2 closes T (or M2 stops), M1 may revert T to GBP-independent.

Castout Processing

Castout is the process of writing dirty pages from the GBP to DASD. This is essential because:

  1. The GBP has limited space — dirty pages must be written out to make room
  2. The GBP is volatile — if the coupling facility fails, dirty pages in the GBP would be lost (unless duplexing is enabled)
  3. Disk-resident pages must be kept reasonably current for recovery purposes

Castout processing is performed by one member per pageset/partition — the castout owner, registered in the SCA. The castout owner periodically writes dirty GBP pages to DASD in the following scenarios:

  • Class castout threshold — when the percentage of dirty pages in the GBP for a given class reaches a threshold (default 10%)
  • Group castout threshold — when the overall GBP dirty page percentage reaches a threshold (default 35%)
  • Castout engine trigger — DB2 proactively schedules castout to avoid threshold breaches
  • System checkpoint — all dirty pages are cast out during a system checkpoint
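The two percentage thresholds above combine into a simple trigger condition. A minimal sketch, using the default values quoted in the text (10% class, 35% group):

```python
# Sketch of the castout trigger: fire when a single class exceeds the class
# threshold, or the whole GBP exceeds the group threshold.

def should_castout(dirty_in_class: int, class_capacity: int,
                   dirty_total: int, gbp_capacity: int,
                   class_threshold: float = 0.10,
                   group_threshold: float = 0.35) -> bool:
    return (dirty_in_class / class_capacity >= class_threshold or
            dirty_total / gbp_capacity >= group_threshold)

print(should_castout(12, 100, 500, 10_000))   # True:  class at 12% > 10%
print(should_castout(5, 100, 500, 10_000))    # False: 5% class, 5% group
print(should_castout(5, 100, 4_000, 10_000))  # True:  group at 40% > 35%
```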

GBP Hit Ratio and Performance

The GBP hit ratio measures how often a member finds a needed page in the GBP versus having to read from DASD:

GBP Hit Ratio = (GBP reads satisfied from GBP) / (Total GBP reads) * 100

A high GBP hit ratio (>80%) indicates that the GBP is effectively serving as a shared cache. A low ratio indicates that pages are being cast out before they are re-read — the GBP is too small, or the workload has poor temporal locality.
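The formula above, expressed as a small helper (the zero-reads convention is my own choice for the sketch):

```python
# The GBP hit ratio formula from the text. Every miss falls through to a
# DASD read, which is what actually costs you.

def gbp_hit_ratio(hits: int, total_reads: int) -> float:
    """Percentage of GBP read requests satisfied from the GBP."""
    if total_reads == 0:
        return 100.0   # convention: no GBP reads means nothing was missed
    return hits / total_reads * 100

print(gbp_hit_ratio(850, 1000))   # 85.0 -> healthy
print(gbp_hit_ratio(400, 1000))   # 40.0 -> likely undersized GBP
```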

Key metrics to monitor:

| Metric | Target | Concern Level |
|---|---|---|
| GBP hit ratio | >80% | <50% indicates undersized GBP |
| XI ratio | <10% of buffer reads | High XI = high cross-member contention |
| Directory-to-data ratio | >5:1 effective | <3:1 causes directory reclaims |
| Castout class threshold hit rate | Rare | Frequent hits = undersized GBP |
| GBP-dependent getpage rate | Minimize | High rate = high CF overhead |

Read and Write Patterns

Understanding GBP I/O patterns is key to optimization:

Write Pattern — Changed page writes to GBP: When a member commits a transaction that updated GBP-dependent pages, those pages are written to the GBP. This is synchronous — the commit does not complete until the GBP write is acknowledged. This adds to commit latency.

Read Pattern — GBP reads: When a member needs a page that has been cross-invalidated (or was never in its local buffer pool), it reads from the GBP. If the page is in the GBP, it is a GBP hit. If not, it falls through to a DASD read.

Castout Pattern — GBP to DASD writes: The castout owner writes dirty GBP pages to DASD. This is asynchronous and runs on a background engine. It should keep up with the incoming write rate to avoid GBP space pressure.


28.7 Workload Balancing

Why Balance Workloads?

In a data sharing group, you have multiple members capable of servicing any request. How incoming work is distributed across members has profound effects on:

  • Performance — unbalanced workloads create hot members while others idle
  • Coupling facility overhead — if the same data is accessed by multiple members, GBP activity increases
  • Recovery exposure — if all work goes to one member, that member's failure has maximum impact

The ideal workload distribution depends on the nature of the work. Two contrasting strategies:

  1. Affinity-based routing — route related work to the same member to minimize cross-member data sharing overhead
  2. Even distribution — spread work evenly across members to balance CPU and memory utilization

In practice, most installations use a hybrid approach.

Workload Manager (WLM) Routing

z/OS Workload Manager (WLM) can route work to data sharing members based on server health and capacity. For CICS regions, WLM knows which member each CICS region is connected to and can start/stop CICS regions to balance the load.

For distributed (DRDA) connections coming through TCP/IP, WLM integrates with the Sysplex Distributor — a z/OS TCP/IP feature that load-balances incoming connections across target systems.

Sysplex Distributor

The sysplex distributor is a network-level load balancer built into z/OS TCP/IP. It works as follows:

  1. A distributing stack LPAR holds a dynamic virtual IP address (DVIPA) that represents the data sharing group
  2. When a TCP connection arrives for the DVIPA, the distributing stack routes it to one of the target stacks — the LPARs running DB2 members
  3. Routing decisions use WLM health information: members with more available capacity get more connections

From the application's perspective, there is a single IP address and port for the data sharing group. The sysplex distributor handles failover transparently — if a member goes down, new connections are routed to surviving members.

Client Application
       │
       ▼
┌─────────────────┐
│ DVIPA: 10.1.1.1 │  (Distributing Stack)
│   Port: 446     │
└─────┬───────────┘
      │  WLM-advised routing
      ├──────────────┬──────────────┐
      ▼              ▼              ▼
┌──────────┐  ┌──────────┐  ┌──────────┐
│  DB2-M1  │  │  DB2-M2  │  │  DB2-M3  │
│ 10.1.1.2 │  │ 10.1.1.3 │  │ 10.1.1.4 │
└──────────┘  └──────────┘  └──────────┘
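The WLM-advised routing step can be modeled as weighted random selection: members reporting more available capacity receive proportionally more connections. The weights and member names below are fabricated for the example; real routing uses WLM server-weight recommendations:

```python
# Illustrative model of WLM-advised connection distribution: each target
# stack reports a capacity weight, and new connections are assigned in
# proportion to those weights. Weights here are made-up numbers.
import random

def pick_member(weights):
    """Choose a member with probability proportional to its WLM weight."""
    total = sum(weights.values())
    r = random.uniform(0, total)
    upto = 0.0
    for member, w in sorted(weights.items()):
        upto += w
        if r <= upto:
            return member
    return member  # guard against floating-point edge cases

# Members with more available capacity receive more connections.
wlm_weights = {"DB2-M1": 50, "DB2-M2": 30, "DB2-M3": 20}
```

Over many connections, DB2-M1 (weight 50) ends up with roughly half the sessions, which is the balancing behavior the distributing stack aims for.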

VTAM Generic Resources

For SNA-based connections (CICS-to-DB2 via VTAM), generic resources provide similar load balancing. A generic resource name (e.g., DBANKGRP) represents all members in the group. When a session is established, VTAM routes it to an available member based on WLM advice.

Connection Balancing Strategies

| Workload Type | Recommended Strategy | Rationale |
|---|---|---|
| OLTP (CICS/IMS) | Affinity by transaction type | Minimize cross-member data sharing for hot tables |
| Distributed (DRDA) | Sysplex distributor with WLM | Automatic balancing based on capacity |
| Batch | Dedicated member | Isolate batch from online workload |
| Utilities | Dedicated member or rotate | Utilities (REORG, COPY) are resource-intensive |
| Read-only queries | Any member | No GBP write overhead for read-only access |

28.8 Member Failure and Recovery

Failure Scenarios

Data sharing must handle several failure scenarios:

  1. DB2 member abnormal termination — the DB2 subsystem crashes (e.g., due to a software error)
  2. z/OS system failure — the entire LPAR crashes
  3. CPC failure — the physical machine goes down, affecting all LPARs on it
  4. Coupling facility failure — the CF crashes (the most catastrophic scenario for data sharing)

Retained Locks

When a member fails, it may have been holding locks on behalf of in-flight transactions. These locks cannot simply be released — releasing an X-lock on a partially updated row before rolling back the transaction would expose uncommitted data to other members.

Instead, the locks become retained locks. The coupling facility preserves them in the lock structure with a special "retained" status. Other members can detect retained locks and must wait for recovery before accessing the affected resources.

Retained locks are one of the most operationally significant aspects of data sharing. They protect data integrity but can also block other members from accessing critical data until recovery completes.

Peer Recovery

In data sharing, a surviving member can perform peer recovery on behalf of the failed member. This is much faster than waiting for the failed member to restart:

  1. XCF detects the member failure and notifies surviving members
  2. A surviving member reads the failed member's recovery log (the failed member's active and archive logs are on shared DASD)
  3. Backward recovery — the surviving member undoes all uncommitted transactions by reading the failed member's log
  4. Retained locks are released — as each transaction is rolled back, its retained locks are freed
  5. Normal access resumes — other members can now access the previously locked resources

Peer recovery typically completes in seconds to minutes, depending on the volume of uncommitted work on the failed member.
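The undo-and-release sequence can be sketched as a toy model. The log-record shape, unit-of-work ids, and lock names are all invented; real recovery processes physical log records, not tuples:

```python
# Toy model of peer recovery: a surviving member scans the failed member's
# log backward, undoing uncommitted units of work and freeing their retained
# locks as each rollback completes. Records and lock names are hypothetical.

def peer_recover(log_records, retained_locks):
    """Undo uncommitted work from a failed member's log; return freed locks.

    log_records: list of (uow_id, committed) pairs gathered from the log.
    retained_locks: dict uow_id -> set of lock names held at failure time.
    """
    freed = set()
    for uow_id, committed in reversed(log_records):   # backward log scan
        if not committed and uow_id in retained_locks:
            # Rolling back this unit of work releases its retained locks.
            freed |= retained_locks.pop(uow_id)
    return freed
```

Note the ordering property the model captures: retained locks are freed incrementally as each rollback completes, so other members regain access to resources progressively rather than all at once.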

Group Restart

If all members fail simultaneously (e.g., a dual CF failure or sysplex-wide outage), the group must perform a group restart. The first member to restart:

  1. Rebuilds the SCA and retained-lock information from the members' logs (the coupling facility copies may have been lost)
  2. Performs forward recovery (redo) for committed changes not yet written to DASD
  3. Performs backward recovery (undo) for uncommitted transactions
  4. Reallocates the coupling facility structures as members rejoin
  5. Releases retained locks as the corresponding transactions are backed out

Group restart is the most time-consuming recovery scenario and is a primary motivation for CF structure duplexing — losing the only copy of the lock structure or SCA forces the entire group through this process.

Restart Light

DB2 also supports restart light (START DB2 with the LIGHT(YES) option) for freeing retained locks quickly after a member failure. The member restarts with a minimal storage footprint — just enough to process its log, back out in-flight work, and release its retained locks — and then shuts down again. This is especially useful when the member's original LPAR is unavailable and the restart must run on another system with limited spare capacity.

Recovery Timeline

Time ──────────────────────────────────────────────────────►

Member M2 fails (crash)
│
├── XCF detection (1-3 seconds)
│
├── Peer recovery initiated by M1 or M3
│   ├── Read M2's log (seconds)
│   ├── Undo uncommitted transactions (seconds to minutes)
│   └── Release retained locks (immediate upon undo)
│
├── All retained locks released
│   └── Other members resume full access
│
└── M2 restarts and rejoins group (minutes)

28.9 Planned Maintenance with Zero Downtime

One of the most valuable benefits of data sharing is the ability to perform planned maintenance — including DB2 version upgrades — without any application downtime.

Rolling Maintenance

To apply PTFs (fixes) to DB2:

  1. Quiesce member M1 — drain work from M1 by removing it from the WLM routing table
  2. Stop M1 — shut down DB2 on M1 cleanly
  3. Apply PTFs — install the maintenance on M1's libraries
  4. Restart M1 — M1 rejoins the group and resumes processing
  5. Repeat for M2, M3, etc.

At no point is the data sharing group unavailable. The group attach name continues to accept connections, routing work to the members that are still running.
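The invariant that matters — never take down the last active member — can be expressed as a schematic loop. The member dictionaries, state strings, and PTF id are stand-ins for operator commands and SMP/E processing, not a real DB2 API:

```python
# Schematic of rolling maintenance: members are drained, patched, and
# restarted one at a time, so the group attach name always has at least
# one active member. States and the apply_ptf callback are illustrative.

def rolling_apply(members, apply_ptf):
    """Apply maintenance to each member in turn; the group stays available."""
    for m in members:
        active = [x for x in members if x["state"] == "ACTIVE"]
        assert len(active) >= 2, "never take down the last active member"
        m["state"] = "QUIESCED"      # drain work (e.g. remove from WLM routing)
        m["state"] = "STOPPED"       # stop the member cleanly
        apply_ptf(m)                 # install maintenance on its libraries
        m["state"] = "ACTIVE"        # restart; member rejoins the group
    return members

group = [{"name": n, "state": "ACTIVE", "ptf": None}
         for n in ("MBK1", "MBK2", "MBK3")]
```

Because each member is restored to ACTIVE before the next one is quiesced, the assertion never fires and the group as a whole never loses availability.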

Version Coexistence

DB2 data sharing supports version coexistence — members at different DB2 versions can run simultaneously in the same group during a migration. For example, during a migration from DB2 12 to DB2 13:

  1. Activate function level V12R1M510 on DB2 12 — the prerequisite level for migrating to DB2 13
  2. Install the DB2 13 code on one member and migrate that member
  3. The migrated member runs at function level V13R1M100, coexisting with the remaining DB2 12 members
  4. Migrate the remaining members one at a time
  5. Once all members are at DB2 13, activate higher function levels (for example, V13R1M500) to enable new function

This process can span weeks or months, allowing thorough testing at each step. The group never goes down.

Online Schema Changes

Many schema changes in a data sharing environment can be performed online using ALTER TABLE ... ADD COLUMN and pending definition changes that are materialized by online REORG. Combined with the REORG TABLESPACE ... SHRLEVEL CHANGE utility (which allows concurrent read/write access during reorganization), most maintenance activities can be performed without application impact.

Operational Procedures for Rolling Maintenance

A detailed rolling maintenance procedure for a three-member group (MBK1, MBK2, MBK3) applying a critical PTF:

Pre-Maintenance Checklist:

  1. Verify all three members are ACTIVE: -MBK1 DISPLAY GROUP
  2. Verify no retained locks or recovery-pending objects
  3. Verify GBP hit ratios and CF service times are within baselines
  4. Notify operations and application teams of the maintenance window
  5. Ensure the PTF has been tested in a non-production data sharing environment

Maintenance Sequence for MBK3 (batch member):

  1. Wait for any active batch jobs on MBK3 to complete or reach a checkpoint
  2. Drain MBK3: -MBK3 STOP DDF MODE(QUIESCE) (stop distributed connections)
  3. Identify remaining work with -MBK3 DISPLAY THREAD(*) TYPE(ACTIVE) and cancel any non-critical threads with -MBK3 CANCEL THREAD(token)
  4. Stop MBK3: -MBK3 STOP DB2 MODE(QUIESCE)
  5. Apply the PTF to MBK3's load libraries using SMP/E
  6. Restart MBK3: -MBK3 START DB2
  7. Verify MBK3 rejoined the group: -MBK1 DISPLAY GROUP
  8. Verify MBK3 is healthy: check CF service times, buffer pool status, active threads
  9. Resume batch operations on MBK3

Repeat for MBK1 and MBK2 (online members) — one at a time. Before taking each online member down, verify that the remaining online member can handle the full CICS and DRDA workload. Monitor response times on the remaining member during the maintenance window.

Post-Maintenance Verification:

  1. All three members ACTIVE
  2. CF service times within baselines
  3. GBP hit ratios recovering to normal levels
  4. No abnormal lock contention or false contention spikes
  5. Application response times at pre-maintenance levels


28.10 Data Sharing Performance Considerations

False Contention

The coupling facility lock structure uses a hashed lock table to detect contention. When two different resources hash to the same lock table entry, the CF reports a conflict — even though the resources are different. This is false contention.

False contention causes unnecessary P-lock negotiations and can degrade performance. To minimize false contention:

  • Size the lock table appropriately — more entries = fewer hash collisions
  • Monitor LOCK_ENTRY_SHORTAGE and FALSE_CONTENTION counters in IFCID 0230 and RMF reports
  • Rebuild the lock structure with a larger lock table if false contention is excessive
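The "more entries = fewer hash collisions" claim is easy to demonstrate with a rough simulation. The resource counts, table sizes, and uniform-hash assumption are all illustrative — real lock tables use a specific hashing scheme this sketch does not model:

```python
# Rough simulation of false contention: distinct resources hashed into a
# fixed-size lock table collide with a probability that falls as the table
# grows. Resource counts and table sizes are arbitrary illustrations.

def false_contention_rate(n_resources, table_entries, seed=1):
    """Fraction of resources whose hash slot is shared with another resource."""
    import random
    rng = random.Random(seed)
    slots = {}
    for _ in range(n_resources):
        slot = rng.randrange(table_entries)
        slots[slot] = slots.get(slot, 0) + 1
    collided = sum(c for c in slots.values() if c > 1)
    return collided / n_resources

small = false_contention_rate(75_000, 2**20)   # ~1M-entry lock table
large = false_contention_rate(75_000, 2**24)   # ~16M-entry lock table
```

With 75K simultaneous lock interests, growing the table from ~1M to ~16M entries drops the collision fraction by more than an order of magnitude — the same intuition behind rebuilding the lock structure with a larger lock table.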

GBP-Dependent Read Overhead

Every GBP-dependent read adds coupling facility overhead compared to a local buffer pool read. A local buffer pool read takes nanoseconds (it is a memory access). A GBP read takes microseconds (it crosses the CF link). For workloads that perform millions of reads per second, this difference matters.

Strategies to minimize GBP-dependent reads:

  1. Affinity routing — route transactions that access the same data to the same member, keeping the data in that member's local buffer pool and the pageset GBP-independent
  2. Large local buffer pools — a page read from the local buffer pool (if valid) avoids a GBP read
  3. Minimize cross-member interest in hot pagesets — if possible, design the workload so that hot tables are primarily accessed by one member
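The payoff of a larger local buffer pool follows directly from the latency gap. A back-of-envelope expected-cost model makes it visible; the latency figures below (100 ns local, 20 us CF, 2 ms DASD) are illustrative round numbers, not measured values:

```python
# Back-of-envelope model of average getpage cost as the local buffer pool
# hit ratio varies. Latencies are hypothetical round numbers for the three
# tiers: local pool (memory), group buffer pool (CF link), and DASD.

def avg_getpage_us(local_hit, gbp_hit,
                   t_local_us=0.1, t_cf_us=20.0, t_dasd_us=2000.0):
    """Expected microseconds per getpage given local and GBP hit ratios."""
    miss_local = 1.0 - local_hit
    return (local_hit * t_local_us
            + miss_local * gbp_hit * t_cf_us
            + miss_local * (1.0 - gbp_hit) * t_dasd_us)
```

Under these assumed latencies, raising the local hit ratio from 80% to 95% cuts the expected getpage cost by roughly a factor of four — which is why strategies 1 and 2 above are usually attacked first.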

Castout Overhead

Castout writes GBP pages to DASD, consuming both CF processing time and I/O bandwidth. If castout cannot keep up with the rate of incoming writes, the GBP fills up and DB2 must wait for space — a severe performance problem.

Monitor castout throughput and ensure that:

  • Castout thresholds (the class castout and GBP castout thresholds, set with -ALTER GROUPBUFFERPOOL) are tuned appropriately
  • The DASD subsystem can handle the I/O rate
  • Enough castout engines are available to keep pace with the incoming write rate
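The failure mode is a simple rate imbalance, which a minimal queueing sketch captures. All rates and capacities here are hypothetical:

```python
# Minimal queueing sketch: if the incoming GBP write rate exceeds castout
# throughput, changed pages accumulate until the group buffer pool fills.
# Rates (pages/sec) and the capacity (pages) are invented numbers.

def pages_backlogged(write_rate, castout_rate, seconds, gbp_capacity):
    """Changed pages queued in the GBP after `seconds`, capped at capacity."""
    backlog = max(0.0, (write_rate - castout_rate) * seconds)
    return min(backlog, gbp_capacity)
```

As long as castout keeps up (write_rate <= castout_rate), the backlog stays at zero; once writes outpace castout, the backlog grows linearly until it hits the GBP capacity — the point at which DB2 must wait for space.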

When Data Sharing Hurts

Data sharing is not free. The coupling facility overhead means that a single DB2 subsystem will always outperform a data sharing group for the same workload on the same hardware — if availability and scalability are not concerns. Specifically, data sharing performs poorly when:

  • Every member accesses every page — maximum GBP-dependent activity
  • High update rates on shared data — constant P-lock negotiations and XI signals
  • Small, frequent transactions with high lock counts — lock structure becomes a bottleneck
  • Undersized coupling facility — CF becomes a chokepoint

Data sharing shines when:

  • Continuous availability is required — the additional overhead is the cost of zero downtime
  • Workloads can be partitioned — different members handle different workload types, minimizing cross-member data sharing
  • Horizontal scalability is needed — adding a member adds capacity
  • Planned maintenance must be non-disruptive — rolling maintenance eliminates planned outages

Measuring and Reporting Data Sharing Overhead

To quantify data sharing overhead, compare the total CPU consumption of the data sharing group with an equivalent single-subsystem configuration, expressed as an overhead ratio:

DS Overhead (%) = (Total Group CPU - Equivalent Single System CPU) / Total Group CPU * 100

A well-tuned data sharing group typically shows 5-15% overhead compared to a single subsystem. Groups with poor workload affinity or undersized CF structures can exhibit 20-30% overhead or more.
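The formula above is trivial to encode as a helper; in practice the CPU figures would come from accounting data, and the numbers in the example are invented:

```python
# The data sharing overhead ratio from the text, as a small helper.
# CPU inputs are in any consistent unit (e.g. CPU seconds per interval).

def ds_overhead_pct(total_group_cpu, equivalent_single_cpu):
    """Data sharing overhead as a percentage of total group CPU."""
    return (total_group_cpu - equivalent_single_cpu) / total_group_cpu * 100.0
```

For instance, a group burning 1,100 CPU seconds for work a single subsystem would do in 1,000 shows an overhead of about 9% — inside the 5-15% band described above for a well-tuned group.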

Key metrics for measuring overhead:

  • CF service time: Average time per CF operation. Track this by structure type (lock, GBP, SCA). If it increases over time, the CF may be overloaded or its links degraded.
  • CF CPU utilization: The percentage of the CF LPAR's processing capacity consumed by CF operations. Should remain below 70% to avoid queuing delays.
  • GBP-dependent getpage ratio: The percentage of getpages that require a GBP read (as opposed to being satisfied from the local buffer pool). Lower is better; indicates workload affinity is effective.
  • P-lock negotiation rate: The number of P-lock negotiations per second. High rates indicate heavy cross-member interest in the same pagesets.
  • Synchronous CF write time: Time spent waiting for synchronous GBP writes during commit. This directly impacts transaction response time.

These metrics should be collected continuously (every 5-15 minutes) and trended over weeks and months. Sudden changes indicate workload shifts, configuration problems, or capacity issues that require investigation.

Global Deadlock Detection

In a single DB2 subsystem, deadlock detection runs within that subsystem's IRLM (Internal Resource Lock Manager). In data sharing, deadlocks can span multiple members — member M1's transaction waits for member M2's transaction, which in turn waits for M1's transaction. These global deadlocks are detected by a distributed protocol:

  1. Each member's IRLM detects local wait chains
  2. If no local cycle is found but a transaction is waiting for a lock held on another member, the IRLMs exchange their wait information over XCF messaging
  3. One IRLM, acting as the global deadlock manager, aggregates the wait information from all members
  4. If a global deadlock cycle is detected, the affected members are notified
  5. One member's transaction is chosen as the deadlock victim and rolled back

The global deadlock detection cycle is typically longer than the local cycle (seconds vs. milliseconds), because it requires inter-system communication. This means that applications in a data sharing environment may experience slightly longer waits before a deadlock is detected and resolved. Proper transaction design — keeping transactions short, accessing resources in consistent order — remains the best defense against deadlocks, whether local or global.
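Conceptually, the aggregated view is a cross-member waits-for graph, and a global deadlock is a cycle in it. The depth-first cycle check below is a teaching sketch of that decision, not IRLM's actual algorithm; the transaction ids are invented:

```python
# Conceptual cycle check over a cross-member waits-for graph. Each edge
# says "transaction A waits for transaction B", regardless of which member
# each transaction runs on. Illustrative only — not IRLM's real protocol.

def find_deadlock(waits_for):
    """Return a cycle of transaction ids if one exists, else None."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {t: WHITE for t in waits_for}
    stack = []

    def dfs(t):
        color[t] = GRAY
        stack.append(t)
        for nxt in waits_for.get(t, ()):
            if color.get(nxt, WHITE) == GRAY:          # back edge: a cycle
                return stack[stack.index(nxt):] + [nxt]
            if color.get(nxt, WHITE) == WHITE:
                found = dfs(nxt)
                if found:
                    return found
        color[t] = BLACK
        stack.pop()
        return None

    for t in list(waits_for):
        if color[t] == WHITE:
            found = dfs(t)
            if found:
                return found
    return None
```

A two-transaction example — M1's transaction waits for M2's, which waits back — yields a cycle, and one participant would be chosen as the victim.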


28.11 Meridian Bank Data Sharing Design

Requirements

Meridian National Bank's z/OS DB2 environment requires:

  • 24/7 availability — no planned or unplanned outages
  • High throughput — 15,000 CICS transactions per second at peak
  • Batch window — nightly batch processes including end-of-day settlement, interest calculation, and regulatory reporting
  • Disaster recovery — site-level redundancy (covered in Chapter 30)
  • Rolling maintenance — DB2 PTFs applied without downtime

Three-Member Data Sharing Group Design

Meridian deploys a three-member data sharing group named MBNKGRP:

Group Name:    MBNKGRP
Group Attach:  MBNKDB2
Members:       MBK1, MBK2, MBK3

Physical Layout:
  CPC-1 (z16 A01)     CPC-2 (z16 A02)
  ┌─────────┐          ┌─────────┐
  │ LPAR-A  │          │ LPAR-C  │
  │ MBK1    │          │ MBK2    │
  ├─────────┤          ├─────────┤
  │ LPAR-B  │          │ LPAR-D  │
  │ MBK3    │          │ (spare) │
  └─────────┘          └─────────┘

  CF-1 (LPAR on CPC-1)    CF-2 (LPAR on CPC-2)
  ┌────────────────┐       ┌────────────────┐
  │ MBNKGRP_LOCK1  │       │ MBNKGRP_LOCK1  │ (duplex)
  │ MBNKGRP_SCA    │       │ MBNKGRP_SCA    │ (duplex)
  │ MBNKGRP_GBP0   │       │ MBNKGRP_GBP0   │ (duplex)
  │ MBNKGRP_GBP1   │       │ MBNKGRP_GBP1   │ (duplex)
  │ MBNKGRP_GBP8K0 │       │ MBNKGRP_GBP8K0 │ (duplex)
  └────────────────┘       └────────────────┘

Workload Distribution

| Member | Primary Workload | Routing Mechanism |
|---|---|---|
| MBK1 | Online banking (CICS) | VTAM generic resource + WLM |
| MBK2 | Online banking (CICS) + Distributed (DRDA) | VTAM + Sysplex distributor |
| MBK3 | Batch processing + Utilities | JCL-specific SSID |

During the batch window (11 PM - 5 AM), MBK3 runs batch jobs while MBK1 and MBK2 continue serving online traffic. This isolation minimizes the GBP overhead — batch jobs on MBK3 access different pagesets than online transactions on MBK1/MBK2, so most pagesets remain GBP-independent.

Coupling Facility Sizing

| Structure | Size | Rationale |
|---|---|---|
| MBNKGRP_LOCK1 | 1 GB | Peak lock volume: 15K TPS * ~5 locks/txn = 75K lock requests in flight |
| MBNKGRP_SCA | 64 MB | Control information only |
| MBNKGRP_GBP0 | 4 GB | Primary 4K buffer pool, holds account tables |
| MBNKGRP_GBP1 | 2 GB | Secondary 4K pool for indexes |
| MBNKGRP_GBP8K0 | 1 GB | 8K pool for transaction history |

All structures are duplexed across CF-1 and CF-2 for resilience. If either CF fails, the structures survive on the other CF.

Failure Scenarios and Recovery

| Scenario | Impact | Recovery |
|---|---|---|
| MBK1 fails | CICS sessions reconnect to MBK2 via generic resource | Peer recovery by MBK2 or MBK3 (<60 seconds) |
| MBK3 fails during batch | Batch jobs abend; restart on MBK3 after recovery | Peer recovery by MBK1 or MBK2 |
| CPC-1 fails (MBK1 + MBK3) | MBK2 continues on CPC-2 | Peer recovery for both MBK1 and MBK3 |
| CF-1 fails | Duplexed structures survive on CF-2 | Rebuild duplex when CF-1 returns |
| Both CFs fail | Group restart required | RTO: 5-15 minutes |

Performance Baselines

Meridian establishes the following performance baselines for data sharing health:

  • CF service time: < 15 microseconds per request
  • GBP hit ratio for GBP0: > 85%
  • False contention rate: < 1% of lock requests
  • XI ratio: < 5% of getpages
  • Castout class threshold hits: < 10 per hour
  • Peer recovery time: < 90 seconds for any single member failure
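Baselines like these are most useful when checked mechanically. A minimal health gate might look like the following; the metric names mirror the list above, while the threshold encoding and the observation values are illustrative:

```python
# Simple health gate over the baselines above: each observed metric is
# compared against its threshold and direction ("max" = must stay below,
# "min" = must stay above). Observation values are fabricated examples.

BASELINES = {
    "cf_service_time_us":   (15.0, "max"),   # < 15 microseconds per request
    "gbp0_hit_ratio_pct":   (85.0, "min"),   # > 85% for GBP0
    "false_contention_pct": (1.0,  "max"),   # < 1% of lock requests
    "xi_ratio_pct":         (5.0,  "max"),   # < 5% of getpages
}

def health_violations(observed):
    """Return the sorted list of metrics that breach their baseline."""
    bad = []
    for name, value in observed.items():
        limit, kind = BASELINES[name]
        if (kind == "max" and value >= limit) or (kind == "min" and value <= limit):
            bad.append(name)
    return sorted(bad)
```

Run every few minutes against collected statistics, a gate like this turns the trend-monitoring advice from section 28.10 into an actionable alert.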

Spaced Review

From Chapter 3: SQL Foundations

Connection: Every SQL statement you learned in Chapter 3 — SELECT, INSERT, UPDATE, DELETE — works identically in a data sharing environment. The application is unaware that multiple DB2 members exist. The group attach name provides location transparency, and the coupling facility ensures that concurrent SQL from different members produces correct results. This is the beauty of data sharing: the complexity is entirely beneath the SQL layer.

From Chapter 18: Concurrency and Locking

Connection: The locking concepts from Chapter 18 — lock modes (S, X, U, IS, IX), lock escalation, isolation levels, deadlock detection — all apply in data sharing. The difference is that locks are now global, managed in the coupling facility's lock structure rather than in a single subsystem's memory. A transaction on member M1 and a transaction on member M2 can deadlock just as two transactions on a single subsystem can. DB2 detects and resolves these global deadlocks, but the detection cycle may be slightly longer due to inter-system communication.

From Chapter 26: z/OS Environment

Connection: Chapter 26 introduced the z/OS environment — LPARs, WLM, RACF, the DB2 address spaces (sMSTR, sDBM1, sDIST). In a data sharing group, each member has its own set of address spaces. WLM manages the workload across all members, and RACF secures resources consistently across the group. The skills you built in Chapter 26 for managing a single DB2 subsystem now multiply across all members in the group.


Summary

Data sharing on the IBM Parallel Sysplex represents the pinnacle of shared-disk database clustering. By leveraging the coupling facility for lock management, cache coherency, and shared control state, multiple DB2 members can provide concurrent read/write access to the same data with full transactional integrity.

The architecture delivers continuous availability through redundancy (multiple members, duplexed CF structures), transparent failover (group attach name, peer recovery), and non-disruptive maintenance (rolling PTF application, version coexistence). The cost is coupling facility overhead — every global lock request, every GBP write, every cross-invalidation adds microseconds that do not exist in a single-subsystem environment.

For Meridian National Bank and enterprises like it, that cost is a bargain. The alternative — planned downtime, single points of failure, and capacity ceilings — is simply unacceptable for systems that process billions of dollars in transactions daily.

In the next chapter, we turn to the DB2 LUW platform and explore HADR, pureScale, and replication technologies that deliver high availability in non-z/OS environments.