Chapter 14 Exercises: Physical Database Design
These exercises progress from basic storage calculations through tablespace design to comprehensive physical design projects. Work through them in order; later exercises build on earlier results.
Section A: Storage Calculations (Exercises 1-8)
Exercise 1: Row Size Calculation — OLTP Table
Given the following table definition, calculate the average and maximum row sizes on both z/OS (8-byte row header) and LUW (10-byte row header). Account for NULL indicators and VARCHAR length prefixes.
CREATE TABLE ORDER_LINE (
ORDER_ID BIGINT NOT NULL, -- 8 bytes
LINE_NUM SMALLINT NOT NULL, -- 2 bytes
PRODUCT_ID INTEGER NOT NULL, -- 4 bytes
QUANTITY SMALLINT NOT NULL, -- 2 bytes
UNIT_PRICE DECIMAL(9,2) NOT NULL, -- 5 bytes
DISCOUNT_PCT DECIMAL(5,2), -- 3 bytes
LINE_TOTAL DECIMAL(11,2) NOT NULL, -- 6 bytes
STATUS CHAR(1) NOT NULL, -- 1 byte
SHIP_DATE DATE, -- 4 bytes
NOTES VARCHAR(200) -- 2 + avg 40 bytes
);
(a) How many columns are nullable? How many bytes for the NULL indicator? (b) What is the average row size (z/OS)? (c) What is the maximum row size (z/OS)? (d) What is the average row size (LUW)?
Exercise 2: Rows Per Page
Using your answer from Exercise 1 (average row size on z/OS): (a) How many rows fit on a 4 KB page (usable: 4,050 bytes)? (b) How many rows fit on an 8 KB page (usable: 8,170 bytes)? (c) How many rows fit on a 16 KB page (usable: 16,340 bytes)? (d) Which page size would you recommend and why?
Exercise 3: Table Size Estimation
A retail company has 85 million ORDER_LINE rows. Using the 8 KB page size from Exercise 2: (a) How many pages are needed with PCTFREE 0? (b) How many pages with PCTFREE 10? (c) How many pages with PCTFREE 20? (d) What is the total storage in gigabytes for each PCTFREE value? (e) How much additional storage does PCTFREE 20 cost compared to PCTFREE 0?
Exercise 4: Partition Size Estimation
The ORDER_LINE table receives 250,000 new rows per day, evenly distributed. (a) How many rows accumulate in one month (assume 30 days)? (b) Using 8 KB pages and PCTFREE 0, what is the monthly partition size in GB? (c) If you define DSSIZE 4G on z/OS, will one month fit in a single partition? (d) If you retain 3 years of data, how many monthly partitions do you need? (e) What is the total storage for 3 years of ORDER_LINE data?
Exercise 5: Compression Savings
The DB2 compression estimation utility reports a 58% compression ratio for the ORDER_LINE table. (a) Starting from the uncompressed 3-year total (Exercise 4e), how much storage is needed after compression? (b) If enterprise storage costs $0.12/GB/month, what is the annual savings from compression? (c) If compression adds 5% CPU overhead and the system's CPU cost is $800,000/year, what is the annual CPU cost of compression? (d) What is the net annual savings (or cost)? (e) Over 3 years, what is the cumulative net benefit?
Exercise 6: VSAM Data Set Sizing (z/OS)
A z/OS DBA needs to set PRIQTY and SECQTY for a tablespace that will hold the CUSTOMER table: - 3.2 million rows - Average row size: 380 bytes - 8 KB pages - PCTFREE 10
(a) Calculate the number of pages required. (b) Convert pages to kilobytes (PRIQTY is specified in KB). (c) Set SECQTY to 10% of PRIQTY. What values do you specify? (d) If the table grows by 15% per year, how many secondary extents will be needed in Year 1? (e) After how many years will the data set reach the 127-extent limit (assuming no REORG to reclaim PRIQTY)?
Exercise 7: LUW Tablespace Sizing
For the same CUSTOMER table on LUW with automatic storage: (a) What INITIALSIZE would you specify? (b) What INCREASESIZE would you specify? (c) What MAXSIZE would you set for a 5-year horizon with 15% annual growth? (d) What EXTENTSIZE would you choose and why?
Exercise 8: LOB Storage Estimation
A healthcare application stores medical images as BLOBs: - 500 new images per day - Average image size: 2.4 MB - Maximum image size: 25 MB - 95th percentile: 8 MB - Retention: 10 years
(a) How many images will accumulate over 10 years? (b) What is the total LOB storage (using the average size)? (c) Add 10% overhead for LOB metadata. What is the gross storage? (d) Would you recommend compression for this LOB data? Why or why not? (e) What page size must the LOB tablespace use on z/OS?
Section B: Tablespace Design (Exercises 9-18)
Exercise 9: Tablespace Type Selection (z/OS)
For each table below, recommend a z/OS tablespace type (segmented, PBG UTS, PBR UTS, LOB) and justify your choice:
(a) AUDIT_LOG — 200 million rows/year, queried by date, retained 7 years, archival by year (b) COUNTRY_CODE — 250 rows, rarely changes (c) EMPLOYEE_PHOTO — BLOB column, average 500 KB per image (d) DAILY_BALANCE — 50 million rows/year, queried by account and date (e) SESSION_TOKEN — 10 million rows, 24-hour TTL, constantly inserted and deleted
Exercise 10: Tablespace Type Selection (LUW)
For the same five tables from Exercise 9, recommend an LUW tablespace configuration: (a) Storage model (SMS, DMS, or Automatic Storage)? (b) Page size (4K, 8K, 16K, 32K)? (c) Partitioned or non-partitioned? (d) EXTENTSIZE recommendation?
Exercise 11: Page Size Impact Analysis
A table has rows with an average length of 3,900 bytes. (a) How many rows fit on a 4 KB page (usable: 4,050 bytes)? (b) How much space is wasted per page on a 4 KB page? (c) How many rows fit on an 8 KB page (usable: 8,170 bytes)? (d) What percentage of the 8 KB page is utilized? (e) Which page size is more efficient, and by how much (in terms of storage for 10 million rows)?
Exercise 12: Buffer Pool Configuration
A database has the following tablespaces:
| Tablespace | Page Size | Size | Access Pattern |
|---|---|---|---|
| TS_CUSTOMER | 8K | 5 GB | Random OLTP |
| TS_TRANS_HIST | 8K | 200 GB | Sequential batch scans |
| TS_REFERENCE | 4K | 50 MB | Frequent random reads |
| TS_TEMP | 32K | 20 GB | Sort operations |
| TS_INDEX | 8K | 30 GB | Random index lookups |
The server has 64 GB of RAM available for buffer pools.
(a) How many buffer pools do you need (minimum)? (b) How would you distribute the 64 GB across the buffer pools? (c) Justify your largest allocation. (d) Would you give TS_REFERENCE its own buffer pool or share with TS_TEMP?
Exercise 13: PCTFREE Analysis
A table experiences the following update patterns: - 30% of rows are updated within 24 hours of insertion - Updates add an average of 60 bytes to VARCHAR columns - Original average row size: 200 bytes - Page size: 8 KB (usable: 8,170 bytes)
(a) How many rows fit per page initially (PCTFREE 0)? (b) After updates, what is the average row size? (c) If PCTFREE is 0, how many rows will experience row migration? (d) What PCTFREE value would prevent most row migration? (e) Calculate the storage overhead of your recommended PCTFREE versus PCTFREE 0 for 50 million rows.
Exercise 14: Partitioning Key Decision
An insurance company has a CLAIMS table with these columns: - CLAIM_ID (BIGINT, unique) - POLICY_ID (BIGINT) - CLAIM_DATE (DATE) - SETTLEMENT_DATE (DATE) - CLAIM_STATUS (CHAR(2): OP=Open, CL=Closed, DN=Denied) - CLAIM_AMOUNT (DECIMAL) - REGION_CODE (CHAR(4), 12 distinct values)
The most common queries are: 1. Find all open claims for a policy (40% of queries) 2. Find claims by date range (30% of queries) 3. Monthly claims summary by region (20% of queries) 4. Claim detail by CLAIM_ID (10% of queries)
(a) Evaluate CLAIM_DATE as a partitioning key. (b) Evaluate REGION_CODE as a partitioning key. (c) Evaluate CLAIM_STATUS as a partitioning key. (d) What is your recommended partitioning key and granularity? (e) How many partitions would you define for a 5-year retention period?
Exercise 15: Hot/Cold Data Separation
A telecommunications company has a CALL_DETAIL_RECORD table: - 2 billion rows per year - Row size: 180 bytes - Data retention: 5 years - Access patterns: - Current month: 500 queries/second - Months 2-3: 50 queries/second - Months 4-12: 5 queries/second - Year 2+: < 1 query/hour
Design a tiered storage strategy: (a) Define the storage tiers (hot, warm, cold). (b) Map time ranges to tiers. (c) Estimate the storage in each tier (in GB). (d) Calculate cost savings versus putting all data on the hot tier at $0.15/GB/month (hot), $0.08/GB/month (warm), $0.03/GB/month (cold).
Exercise 16: Temporary Tablespace Sizing
A batch job performs the following operations: 1. Sort 120 million rows by (ACCOUNT_ID, TRANS_DATE). Sort key: 12 bytes. Row pointer: 8 bytes. Overhead: 4 bytes. 2. Hash join between two tables: build input is 5 million rows at 200 bytes each. 3. GROUP BY producing 2 million groups, each 40 bytes.
(a) Calculate the sort space for step 1 (in GB). (b) Calculate the hash join space for step 2 (in GB). (c) Calculate the GROUP BY space for step 3 (in GB). (d) What is the peak temporary space if steps 1 and 2 run concurrently? (e) What size temporary tablespace would you provision?
Exercise 17: Migration from Segmented to UTS (z/OS)
A legacy system has a segmented tablespace with three tables:
| Table | Rows | Avg Row Size | Current TS |
|---|---|---|---|
| POLICY | 5M | 450 bytes | TS_LEGACY |
| COVERAGE | 15M | 180 bytes | TS_LEGACY |
| ENDORSEMENT | 8M | 320 bytes | TS_LEGACY |
(a) Design the target UTS tablespace layout (one TS per table or shared?). (b) What tablespace type (PBG or PBR) for each? (c) Write the ALTER statements to convert the segmented tablespace. (d) What utility must be run to materialize the change? (e) What is the estimated downtime for the migration?
Exercise 18: Inline LOB Design (LUW)
A content management system stores articles as CLOBs: - 80% of articles are under 1,500 bytes - 15% are between 1,500 and 10,000 bytes - 5% are between 10,000 and 200,000 bytes - Average article size: 2,200 bytes - Total articles: 12 million
(a) What INLINE LENGTH would you recommend? (b) How much LOB data stays inline with your recommendation? (c) What is the impact on the base table row size? (d) Calculate total storage with inline LOBs vs. all LOBs in a large tablespace. (e) What performance benefit do you expect from inlining?
Section C: Comprehensive Design Problems (Exercises 19-25)
Exercise 19: E-Commerce Physical Design
Design the complete physical tablespace layout for an e-commerce system with these tables:
| Table | Row Count | Avg Row Size | Growth Rate | Access Pattern |
|---|---|---|---|---|
| CUSTOMER | 10M | 400 bytes | 20%/year | Random OLTP |
| PRODUCT | 500K | 600 bytes | 5%/year | Read-heavy |
| ORDER_HEADER | 50M | 250 bytes | 30%/year | OLTP + batch |
| ORDER_LINE | 200M | 120 bytes | 30%/year | OLTP + batch |
| PRODUCT_REVIEW | 15M | 800 bytes | 25%/year | Read-heavy, LOB text |
| INVENTORY | 500K | 80 bytes | 5%/year | High-frequency update |
| CLICK_STREAM | 5B/year | 150 bytes | 40%/year | Insert-only, batch |
For each table, specify: (a) Tablespace type (z/OS: PBG, PBR; LUW: regular, range-partitioned) (b) Page size (c) PCTFREE (d) Compression (YES/NO) (e) Partitioning strategy (if applicable) (f) 3-year storage estimate
Exercise 20: Compression ROI — Full Analysis
Perform a complete compression ROI analysis for the e-commerce CLICK_STREAM table: - 5 billion rows per year - 150 bytes per row - Estimated compression ratio: 65% - Storage cost: $0.10/GB/month - CPU cost: $1.2 million/year for the z/OS LPAR - Compression adds 4% CPU overhead - Retention: 2 years
(a) Calculate uncompressed storage for 2 years. (b) Calculate compressed storage for 2 years. (c) Calculate annual storage savings. (d) Calculate annual CPU cost of compression. (e) Calculate 2-year net benefit. (f) At what compression ratio does the ROI break even?
Exercise 21: Partition Lifecycle Management
Design a partition management procedure for the CLICK_STREAM table (from Exercise 19): (a) Define the partition scheme (key, granularity, number of partitions). (b) Write the DDL for the initial partition definition (first 6 months). (c) Write the SQL/utility commands to add a new monthly partition. (d) Write the SQL/utility commands to archive and detach the oldest partition. (e) Create a calendar-based maintenance schedule for partition management.
Exercise 22: Multi-Workload Buffer Pool Design
A system runs three workloads concurrently: - OLTP: 2,000 transactions/second, random single-row access, targets 99% buffer pool hit ratio - Reporting: 50 concurrent queries, sequential scans of 3-12 month date ranges - Batch ETL: nightly 4-hour window, full table scans and bulk inserts
Available memory: 128 GB for buffer pools.
(a) Design the buffer pool configuration (names, page sizes, sizes). (b) Explain how you would prevent batch scans from polluting the OLTP buffer pool. (c) What buffer pool monitoring metrics would you check daily? (d) How would you adjust the configuration if the OLTP hit ratio drops to 95%?
Exercise 23: Storage Capacity Planning Presentation
You are the DBA for a hospital system. Management has asked for a 5-year storage plan. The current database has: - 200 tables, total 2.5 TB - Growing at 25% per year (new regulatory requirements) - LOB data (medical images): 8 TB, growing at 40% per year - Indexes: 800 GB - Temp space: 100 GB
(a) Calculate year-by-year storage requirements for all 5 years. (b) Factor in compression (assume 55% ratio on table data, no compression on LOBs). (c) Calculate the total cost at $0.10/GB/month. (d) Identify the single largest cost driver. (e) Propose one technical strategy to control the largest cost driver.
Exercise 24: Physical Design Review Checklist
You have inherited a database with the following issues. For each, identify the problem and propose a solution:
(a) All 200 tables in a single segmented tablespace on z/OS. (b) A 500 GB table with no partitioning, PCTFREE 20, but it is insert-only and never updated. (c) Every tablespace uses 32 KB pages, including tables with 50-byte rows. (d) Compression is disabled on all tables "because it uses too much CPU." (e) The temporary tablespace is 500 MB, but the largest batch sort processes 80 million rows. (f) LOB data (average 100 bytes) is stored in a separate LOB tablespace with no inline LOBs. (g) No storage growth monitoring is in place; the DBA adds space "when users complain."
Exercise 25: Meridian Bank Physical Design Extension
Extend the Meridian Bank physical design from Section 14.11 to include a new regulatory requirement: the bank must now store a complete audit trail of every data modification.
Design the AUDIT_TRAIL table: - Must capture: table name, column name, old value, new value, user ID, timestamp, transaction ID - Expected volume: 50 million audit rows per day - Retention: 10 years - Most queries filter by table name and date range - Must not impact OLTP performance
(a) Design the table structure (columns and data types). (b) Choose the tablespace type and partitioning strategy. (c) Calculate storage for Year 1 and Year 10. (d) Design the compression strategy. (e) Recommend the buffer pool assignment. (f) Write the complete DDL (table, tablespace, partitioning).
Answer Key Notes
Exercises 1-8 have definitive numerical answers. Exercises 9-18 have recommended approaches with justifications. Exercises 19-25 are open-ended design problems; evaluate based on the quality of reasoning, not a single correct answer. Key evaluation criteria: (1) all relevant factors considered, (2) decisions documented with rationale, (3) calculations shown with correct arithmetic, (4) trade-offs acknowledged.