Chapter 28 Key Takeaways

Core Concepts

  1. Data sharing allows multiple DB2 members to concurrently read and write the same data. This is a shared-disk architecture — all members access the same physical DASD volumes. There is no data partitioning between members.

  2. The coupling facility is the technological heart of data sharing. It provides hardware-assisted lock management, shared cache (group buffer pools), and a shared communications area (SCA). Without the coupling facility, concurrent multi-member access would result in data corruption.

  3. Two types of locks maintain coherency: P-locks and L-locks. Physical locks (P-locks) manage cached data coherency between members — they are invisible to applications. Logical locks (L-locks) are traditional transaction locks, now managed globally through the coupling facility's lock structure.

  4. Group buffer pools (GBPs) serve as a shared cache in the coupling facility. When a member updates a page in a GBP-dependent tablespace, it writes the updated page to the GBP. Other members can then read the current version from the GBP instead of getting stale data from their local buffer pools.

  5. Cross-invalidation (XI) signals notify members when their cached pages are stale. The coupling facility sends XI signals when a page is updated by one member, marking copies in other members' local buffer pools as invalid.
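The GBP write and cross-invalidation flow described in points 4 and 5 can be sketched as a toy model. Everything here (the `GroupBufferPool` and `Member` classes, the page registry) is invented for illustration — it shows the shape of the coherency protocol, not how DB2 or the coupling facility actually implements it:

```python
class GroupBufferPool:
    """Toy shared cache: holds current pages and tracks which members cache each page."""
    def __init__(self):
        self.pages = {}          # page_id -> current page contents
        self.registered = {}     # page_id -> set of members with a local copy

    def register(self, page_id, member):
        self.registered.setdefault(page_id, set()).add(member)

    def write(self, page_id, data, writer):
        # The updated page goes into the shared cache...
        self.pages[page_id] = data
        # ...and every *other* registered member receives an XI signal,
        # marking its locally cached copy invalid.
        for member in self.registered.get(page_id, set()):
            if member is not writer:
                member.invalidate(page_id)

class Member:
    """Toy DB2 member with a local buffer pool of (data, valid?) entries."""
    def __init__(self, name, gbp):
        self.name, self.gbp = name, gbp
        self.local = {}          # page_id -> (data, still_valid)

    def read(self, page_id):
        data, valid = self.local.get(page_id, (None, False))
        if valid:
            return data                      # local hit, copy is still current
        data = self.gbp.pages[page_id]       # stale or absent: refresh from the GBP
        self.local[page_id] = (data, True)
        self.gbp.register(page_id, self)
        return data

    def update(self, page_id, data):
        self.local[page_id] = (data, True)
        self.gbp.register(page_id, self)
        self.gbp.write(page_id, data, writer=self)

    def invalidate(self, page_id):
        if page_id in self.local:
            cached, _ = self.local[page_id]
            self.local[page_id] = (cached, False)
```

A two-member run shows the effect: after DB2A updates a page that DB2B has cached, DB2B's next read refreshes from the GBP instead of returning the stale copy.

```python
gbp = GroupBufferPool()
a, b = Member("DB2A", gbp), Member("DB2B", gbp)
gbp.pages["P1"] = "v0"            # page already in the shared cache
assert b.read("P1") == "v0"       # DB2B caches v0 locally
a.update("P1", "v1")              # XI invalidates DB2B's copy
assert b.read("P1") == "v1"       # DB2B re-reads the current version
```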

Operational Essentials

  1. Retained locks protect data integrity during member failure. When a member crashes, its locks become retained locks in the coupling facility. Other members must wait for peer recovery to roll back uncommitted transactions and release these locks.

  2. Peer recovery is fast and automatic. A surviving member reads the failed member's log and rolls back uncommitted transactions, typically completing in seconds to minutes. This is dramatically faster than waiting for the failed member to restart.

  3. Rolling maintenance eliminates planned downtime. PTFs and even DB2 version migrations can be applied one member at a time. The group continues serving traffic at all times.

  4. Workload affinity reduces coupling facility overhead. Routing related transactions to the same member keeps pagesets GBP-independent, avoiding the cost of GBP writes, reads, and cross-invalidation signals.

  5. Data sharing has a measurable performance cost. Every CF operation adds microseconds. The overhead is justified by the availability and scalability benefits, but it means a data sharing environment will always consume more total CPU than a single subsystem running the same workload.

Sizing and Monitoring

  1. Size the lock structure to avoid false contention. False contention (hash collisions in the lock table) causes unnecessary lock negotiations. The lock table needs enough entries to minimize collisions — typically about twice the peak number of concurrently held locks, rounded up to a power of 2.

  2. Maintain a 5:1 directory-to-data ratio in GBPs. Directory entries are retained after pages are cast out, enabling the CF to validate locally cached copies. If the directory is too small, entry reclaims cross-invalidate otherwise-valid cached pages, forcing members to re-read them from DASD unnecessarily.

  3. Monitor GBP hit ratio, XI ratio, CF service time, and false contention rate continuously. These metrics are the vital signs of a data sharing group's health. Degradation in any metric requires investigation.

  4. Castout must keep pace with incoming GBP writes. If dirty pages accumulate faster than they are cast out, the GBP fills up and becomes a bottleneck. Monitor castout threshold hit rates and tune castout engines accordingly.
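The sizing rules of thumb and health metrics above reduce to simple arithmetic. This sketch follows the ratios stated in this section (2x peak locks, 5:1 directory-to-data); the function names are invented, and the counter names are stand-ins for whatever your monitor actually reports:

```python
def lock_table_entries(peak_concurrent_locks):
    """Rule of thumb from above: ~2x peak held locks, rounded up to a
    power of 2 (CF lock tables are sized in powers of 2)."""
    target = 2 * peak_concurrent_locks
    entries = 1
    while entries < target:
        entries *= 2
    return entries

def directory_entries(data_elements, ratio=5):
    """5:1 directory-to-data ratio: retain registrations for five times
    as many pages as the GBP holds as data, so cast-out pages can still
    be validated without a DASD re-read."""
    return ratio * data_elements

def gbp_hit_ratio(reads_data_returned, total_reads):
    """Fraction of GBP read requests satisfied from the shared cache."""
    return reads_data_returned / total_reads if total_reads else 0.0

def false_contention_rate(false_contentions, total_lock_requests):
    """Hash collisions as a share of all lock requests; sustained growth
    here signals an undersized lock table."""
    return false_contentions / total_lock_requests if total_lock_requests else 0.0
```

For example, a peak of 100,000 concurrently held locks calls for a 262,144-entry lock table (2 x 100,000 rounded up to the next power of 2), and a GBP with 40,000 data elements calls for roughly 200,000 directory entries.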

Design Principles

  1. Use the group attach name for all application connections. Applications should never connect to a specific member SSID (except for dedicated workloads like batch). The group attach name provides location transparency and enables automatic workload routing.

  2. Duplex all coupling facility structures. A CF failure with non-duplexed structures requires a group restart — the most disruptive recovery scenario. Duplexing ensures structures survive any single CF failure.

  3. Isolate batch on a dedicated member when possible. Batch workloads that update millions of rows generate enormous GBP activity if they share pagesets with online members. Dedicating a member to batch reduces cross-member contention.

  4. Plan for failure as part of normal operations. Test member failure, peer recovery, CF rebuild, and group restart regularly. The first time you exercise these procedures should not be during a real outage at 3 AM.
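As an illustration of points 1 and 2, group status is checked with a DB2 command and duplexing is driven from XCF, not from DB2 itself. The group name DSNDB0G and member DB2A below are hypothetical; CF structure names follow the groupname_LOCK1 / groupname_SCA / groupname_GBPn convention:

```
-DB2A DISPLAY GROUP DETAIL
SETXCF START,REBUILD,DUPLEX,STRNAME=DSNDB0G_LOCK1
```

The first command shows member status and CF structure allocation for the group; the second starts duplexing for the lock structure, and the same form applies to the SCA and each GBP structure.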