Chapter 16 Quiz: Schema Evolution and Data Migration
Test your understanding of schema evolution, ALTER TABLE operations, migration strategies, and zero-downtime patterns in DB2. Try to answer each question before revealing the answer.
Question 1
What is the safest and most common schema change you can perform on a production DB2 table during business hours?
Show Answer
Adding a new nullable column (or a column with a DEFAULT value). This operation is essentially instantaneous on both z/OS and LUW, regardless of table size, because DB2 records the new column definition in the catalog without rewriting existing data pages. Existing rows are treated as having NULL (or the default value) in the new column until they are individually updated.
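As a minimal sketch — the table and column names here are illustrative, not from the chapter — both safe variants look like this:

```sql
-- Instant, catalog-only changes on a hypothetical ORDERS table:
-- existing data pages are not rewritten.
ALTER TABLE ORDERS
  ADD COLUMN PROMO_CODE VARCHAR(10);     -- nullable: existing rows read as NULL

ALTER TABLE ORDERS
  ADD COLUMN CREATED_BY VARCHAR(8) NOT NULL
      DEFAULT 'UNKNOWN';                 -- default value supplied for existing rows
```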
Question 2
On DB2 for z/OS, what is the difference between REORG PENDING and ADVISORY REORG?
Show Answer
**REORG PENDING (REORP)** means the tablespace must be reorganized before data can be accessed. SQL statements against the object will receive SQLCODE -904 (resource unavailable). This state is triggered by changes that alter the physical row format (e.g., changing SMALLINT to INTEGER).

**ADVISORY REORG (AREOR)** means DB2 recommends a REORG but does not require one. Data remains fully accessible. This state is typically triggered by adding a nullable column — DB2 can reconcile old rows (without the new column) and new rows (with it) at read time, but a REORG would improve efficiency.
Question 3
You need to widen a CHAR(5) column to CHAR(10) on z/OS. What pending state does this produce, and why?
Show Answer
This produces **REORG PENDING**. Because CHAR is a fixed-length data type, every existing row physically stores exactly 5 bytes for this column. Changing to CHAR(10) means every row must be rewritten with 10 bytes (padded with spaces). Unlike VARCHAR widening (which is just a metadata change), CHAR widening changes the physical row length, requiring reorganization before the data can be accessed.
Question 4
What is the expand-then-contract pattern, and when would you use it?
Show Answer
The expand-then-contract pattern is a three-phase approach for complex schema changes:

1. **Expand**: Add the new structure (column, table) alongside the old one. Both coexist.
2. **Migrate**: Update applications to use the new structure. During this phase, applications may dual-write to both old and new structures.
3. **Contract**: Remove the old structure once all applications have migrated.

Use it when ALTER TABLE alone cannot accomplish the change (e.g., changing CHAR to VARCHAR, renaming a column on z/OS, restructuring a table's data model). Its strength is that every phase is independently deployable and reversible.
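The three phases can be sketched in SQL. This assumes a hypothetical CUSTOMER table whose STATUS CHAR(1) column is being replaced by a STATUS_CODE VARCHAR(10) column; all names are illustrative:

```sql
-- Phase 1: Expand — add the new column; old and new coexist.
ALTER TABLE CUSTOMER ADD COLUMN STATUS_CODE VARCHAR(10);

-- Phase 2: Migrate — backfill existing rows, then have applications
-- dual-write both columns until every reader uses STATUS_CODE.
UPDATE CUSTOMER
   SET STATUS_CODE = STATUS
 WHERE STATUS_CODE IS NULL;

-- Phase 3: Contract — drop the old column once migration is complete.
-- (On LUW, DROP COLUMN is a REORG-recommended operation.)
ALTER TABLE CUSTOMER DROP COLUMN STATUS;
```

Each phase is a separate deployment, and stopping after phase 1 or 2 leaves a fully working system.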
Question 5
Before dropping a table in production, what four categories of dependencies must you check?
Show Answer
1. **Referential integrity**: Foreign keys from other tables that reference this table.
2. **Views**: Any views that SELECT from or depend on this table.
3. **Programmatic objects**: Stored procedures, functions, and triggers that reference this table.
4. **Packages/Plans** (z/OS) or **cached statements** (LUW): Compiled SQL that references the table.

Additionally, you should check for materialized query tables (MQTs) based on the table, and for any application code outside the database that references it.
Question 6
On LUW, what is the difference between an inplace REORG and a classic REORG?
Show Answer
**Classic (offline) REORG** rebuilds the table from scratch. The table may be unavailable for write operations (reads may be allowed with ALLOW READ ACCESS). It is the more thorough of the two.

**Inplace (online) REORG** reorganizes the table while it remains fully accessible for reads and writes. It runs in three phases:

1. **Forward phase** — scans and reorganizes pages while applications continue normally.
2. **Replay phase** — applies logged changes that occurred during reorganization.
3. **Truncate phase** — reclaims empty pages under a brief exclusive lock.

Inplace REORG can also be paused, resumed, and stopped, which is useful during unexpected peak load.
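As an illustrative sketch (the table name is hypothetical), an inplace REORG can be driven from SQL via the ADMIN_CMD procedure, including the pause/resume behavior mentioned above:

```sql
-- Start an online (inplace) REORG; the table stays readable and writable.
CALL SYSPROC.ADMIN_CMD('REORG TABLE APP.ORDERS INPLACE ALLOW WRITE ACCESS');

-- If load spikes unexpectedly, pause the reorganization...
CALL SYSPROC.ADMIN_CMD('REORG TABLE APP.ORDERS INPLACE PAUSE');

-- ...and pick it up again later from where it left off.
CALL SYSPROC.ADMIN_CMD('REORG TABLE APP.ORDERS INPLACE RESUME');
```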
Question 7
Why must you map Oracle DATE columns to DB2 TIMESTAMP (not DB2 DATE) when migrating from Oracle to DB2?
Show Answer
Oracle's DATE data type includes both date and time components (hours, minutes, and seconds). DB2's DATE data type stores only the date (year, month, day) with no time component. If you map Oracle DATE to DB2 DATE, you silently lose all time information. Mapping to DB2 TIMESTAMP preserves the full date-and-time value. This is the single most common source of bugs in Oracle-to-DB2 migrations.
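A sketch of the mapping, with hypothetical table and column names:

```sql
-- Oracle source (DATE carries both date and time):
--   CREATE TABLE ORDERS (ORDER_DT DATE);

-- Correct DB2 target: TIMESTAMP keeps the time portion.
CREATE TABLE ORDERS (
    ORDER_DT TIMESTAMP(0)   -- not DATE; TIMESTAMP(0) matches Oracle DATE's
                            -- seconds-level precision
);
```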
Question 8
What is the purpose of a mapping table in a z/OS online REORG (SHRLEVEL CHANGE)?
Show Answer
The mapping table tracks where rows move during the reorganization process. When DB2 reorganizes data, rows are physically relocated to new pages while applications continue reading and writing. The mapping table records the source and target row IDs (RIDs) so that DB2 can redirect in-flight queries to the correct row location even while data is being rearranged. Without a mapping table, a SHRLEVEL CHANGE REORG is not possible.
Question 9
What are the three phases of DB2 for z/OS version migration, and at which phases can you fall back to the previous version?
Show Answer
1. **Compatibility Mode (CM)**: DB2 runs the new version's code but behaves like the old version; new features are not available. **Fallback to the previous version is possible** from CM.
2. **Enabling New Function Mode (ENFM)**: New catalog structures and features are progressively activated through multiple sub-stages. Once ENFM begins, you can return to CM* but **can no longer fall back to the previous version**.
3. **New Function Mode (NFM)**: All new features are available. **Fallback is NOT possible** — returning to the previous version requires a full restore from a pre-migration backup.

This phased approach provides multiple validation checkpoints before you commit to the irreversible stages.
Question 10
In a data sharing group on z/OS, how does rolling migration achieve near-zero downtime for a version upgrade?
Show Answer
A data sharing group has multiple DB2 members accessing the same data through the coupling facility. During a rolling migration:

1. Quiesce one member (drain its workload to the remaining members).
2. Upgrade that member to the new version (in Compatibility Mode).
3. Bring it back into the group.
4. Repeat for each remaining member.

At every step the group remains operational, with the non-quiesced members handling the workload. The coupling facility ensures data consistency across members running different code levels during CM. This provides continuous availability because at least N-1 members are always active.
Question 11
On LUW, what is the difference between IMPORT and LOAD for bulk data loading?
Show Answer
| Feature | IMPORT | LOAD |
|---|---|---|
| **Mechanism** | Uses SQL INSERT internally | Writes directly to data pages |
| **Speed** | Slower (typically 5-10x) | Faster |
| **Logging** | Full logging | Minimal or no logging |
| **Triggers** | Fired for each row | Not fired |
| **Constraints** | Checked per row during insert | Checked after load (via SET INTEGRITY) |
| **Table availability** | Table remains fully available | May enter LOAD PENDING state |
| **Recoverability** | Table is immediately recoverable | May need a backup (if NONRECOVERABLE) |

Use IMPORT for small data sets or when triggers must fire. Use LOAD for large data migrations where speed is critical.
Question 12
You need to add a NOT NULL constraint to a column that currently contains NULL values. What steps must you take?
Show Answer
1. **Identify NULLs**: Query the table to find all rows where the column IS NULL.
2. **Fix NULLs**: UPDATE the NULL values to an appropriate non-NULL value. For large tables, run this in batches with a COMMIT between batches to avoid lock escalation and log space issues.
3. **Verify**: Confirm that zero rows have NULL in the column.
4. **Add constraint**: Run `ALTER TABLE ... ALTER COLUMN ... SET NOT NULL`. The ALTER will fail if any NULL values remain.

The batch UPDATE in step 2 is often the most time-consuming and resource-intensive part of the process.
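The steps above can be sketched as follows, using a common LUW batching idiom (updating through a fullselect with FETCH FIRST); the table, column, and replacement value are all illustrative:

```sql
-- Step 1: how many rows need fixing?
SELECT COUNT(*) FROM CUSTOMER WHERE EMAIL IS NULL;

-- Step 2: fix NULLs in batches; rerun until 0 rows are updated,
-- committing between runs to limit log use and lock escalation.
UPDATE (SELECT EMAIL FROM CUSTOMER
         WHERE EMAIL IS NULL
         FETCH FIRST 10000 ROWS ONLY)
   SET EMAIL = 'unknown@example.com';
COMMIT;

-- Step 3: verify (must return 0 before proceeding).
SELECT COUNT(*) FROM CUSTOMER WHERE EMAIL IS NULL;

-- Step 4: add the constraint.
ALTER TABLE CUSTOMER ALTER COLUMN EMAIL SET NOT NULL;
```

On LUW, the final ALTER is a REORG-recommended operation, so plan for a REORG of the table afterward.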
Question 13
What is the dual-write pattern, and how does it differ from the shadow table approach?
Show Answer
**Dual-write pattern**: The application is modified to write to both the old and new tables on every INSERT, UPDATE, and DELETE. This requires application code changes. After backfilling historical data and validating consistency, reads are switched to the new table, then dual-writing is stopped.

**Shadow table approach**: A trigger (or replication mechanism) on the old table automatically propagates changes to the new table. No application code changes are needed for the synchronization — the database handles it.

The key difference is where the synchronization logic lives: in the application (dual-write) or in the database (shadow table). Shadow tables are simpler for applications but add database overhead. Dual-write gives more control but requires coordinated application changes.
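A minimal shadow-table trigger sketch, assuming hypothetical ACCOUNT_OLD and ACCOUNT_NEW tables with matching ID and BALANCE columns (a real migration would also need UPDATE and DELETE triggers):

```sql
-- Database-side synchronization: every insert into the old table
-- is automatically propagated to the new one.
CREATE TRIGGER ACCOUNT_SHADOW_INS
  AFTER INSERT ON ACCOUNT_OLD
  REFERENCING NEW AS N
  FOR EACH ROW
  INSERT INTO ACCOUNT_NEW (ID, BALANCE)
  VALUES (N.ID, N.BALANCE);
```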
Question 14
What does the db2move utility do, and when would you use it instead of individual EXPORT/IMPORT commands?
Show Answer
db2move exports or imports all tables in a database (or a specified subset) in a single command. It generates individual IXF files for each table and a control file that tracks the mapping. Use db2move instead of individual EXPORT/IMPORT commands when:

- You are migrating an entire schema or database (many tables).
- You want a single command to handle all tables rather than writing individual scripts.
- You need to move data between LUW systems as part of a migration or environment refresh.

db2move supports EXPORT, IMPORT, LOAD, and COPY modes. For large migrations, LOAD mode is significantly faster than IMPORT mode.
Question 15
When performing an expand-then-contract column migration, why is it important to separate the schema change deployment from the application change deployment?
Show Answer
Separating schema and application deployments provides several benefits:

1. **Independent validation**: You can verify that the schema change (adding the new column) succeeds without the risk of the application deployment failing simultaneously.
2. **Simpler rollback**: If the application deployment causes issues, you can roll back the application without touching the schema. If the schema change causes issues, you can drop the new column without touching the application.
3. **Reduced blast radius**: A combined deployment has two potential failure points; separating them means each deployment is simpler and has fewer things that can go wrong.
4. **Flexible timing**: The schema change (adding a nullable column) can be done during business hours because it is instant. The application deployment may need a different window.

This principle — separate schema changes from application changes — is fundamental to safe production database management.
Question 16
What is the REORGCHK command on LUW, and what do its three formulas (F1, F2, F3) measure?
Show Answer
REORGCHK assesses whether a table needs reorganization by evaluating three formula-based thresholds:

- **F1**: Measures the percentage of overflow rows — rows that were updated and no longer fit on their original page, so they were moved to a different page with a pointer left behind. A high proportion of overflow rows degrades performance.
- **F2**: Measures space utilization — how much wasted space exists within data pages. High wasted space means the table occupies more pages (and more buffer pool) than necessary.
- **F3**: Measures page fetch efficiency for sequential access — how many additional page fetches are needed compared to an optimally organized table.

Any formula that exceeds its threshold is flagged with an asterisk in the REORGCHK output, indicating that a REORG is recommended. However, the decision should also consider the table's access patterns, the impact of REORG on running applications, and whether the performance degradation is actually causing problems.
Question 17
In an Oracle-to-DB2 migration, what are the key differences in exception handling between PL/SQL and SQL PL?
Show Answer
Key differences:

1. **Named exceptions**: Oracle uses named exceptions in EXCEPTION blocks (e.g., `WHEN NO_DATA_FOUND THEN`). DB2 uses DECLARE HANDLER or checks SQLSTATE directly.
2. **Raising errors**: Oracle uses `RAISE_APPLICATION_ERROR(-20001, 'message')`. DB2 uses `SIGNAL SQLSTATE 'xxxxx' SET MESSAGE_TEXT = 'message'`.
3. **Custom exceptions**: Oracle allows declaring and raising custom exceptions. DB2 uses custom SQLSTATE values with SIGNAL.
4. **Handler placement and scope**: DB2 handlers are declared at the beginning of a compound statement and have specific continuation behavior (CONTINUE or EXIT). Oracle exception blocks sit at the end of a BEGIN...END block.
5. **Implicit exception propagation**: Oracle automatically propagates unhandled exceptions to the caller. DB2 requires explicit handling, or the compound statement terminates with the error.
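A small SQL PL sketch of the DB2 side of these differences; the procedure name, table, SQLSTATE values, and messages are all illustrative:

```sql
CREATE PROCEDURE CHECK_BALANCE (IN P_ACCT INT)
LANGUAGE SQL
BEGIN
  DECLARE V_BAL DECIMAL(15,2);

  -- Plays the role of Oracle's WHEN NO_DATA_FOUND: declared at the
  -- start of the compound statement, with explicit EXIT behavior.
  DECLARE EXIT HANDLER FOR NOT FOUND
    SIGNAL SQLSTATE '70001' SET MESSAGE_TEXT = 'Account does not exist';

  SELECT BALANCE INTO V_BAL
    FROM ACCOUNT
   WHERE ACCT_ID = P_ACCT;

  -- SIGNAL replaces Oracle's RAISE_APPLICATION_ERROR.
  IF V_BAL < 0 THEN
    SIGNAL SQLSTATE '70002' SET MESSAGE_TEXT = 'Negative balance';
  END IF;
END
```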
Question 18
What is the critical difference between LOG YES and LOG NO on a z/OS LOAD utility, and when would you choose each?
Show Answer
**LOG YES**: Every row inserted by LOAD is written to the DB2 log. The table is recoverable to any point in time after the LOAD. This is safe but slower, especially for large loads, and can consume enormous amounts of log space.

**LOG NO**: Individual row inserts are not logged. The LOAD is dramatically faster and consumes minimal log space. However, the tablespace is placed in COPY PENDING status — it is not recoverable until you take an image copy (backup). If the system crashes after a LOG NO load but before an image copy, you must re-run the LOAD from the input data.

**Choose LOG YES** when the load is small, you need point-in-time recoverability, or you cannot schedule an immediate image copy afterward. **Choose LOG NO** when the load is large (millions of rows), speed is critical, and you can take an image copy immediately after the LOAD completes.
Question 19
Describe the HADR-based migration approach on LUW. What makes it suitable for near-zero-downtime migration?
Show Answer
HADR-based migration works by setting up the new system as an HADR standby:

1. Configure HADR between the old system (primary) and the new system (standby).
2. HADR continuously ships and replays log records from primary to standby, keeping them synchronized.
3. When ready, perform a TAKEOVER on the standby, making it the new primary.
4. Redirect application connections to the new system.

It is suitable for near-zero downtime because:

- Synchronization happens continuously in the background while the old system runs normally.
- The takeover operation itself takes only seconds to minutes.
- The old system can be kept as a standby, providing a fallback path.
- DB2 supports HADR between certain version pairs, allowing the standby to run a newer version.

The brief downtime window is only the time needed for the takeover and connection redirect, typically measurable in seconds to low minutes.
Question 20
Why should you never type ALTER TABLE directly into a production DB2 console? What process should be followed instead?
Show Answer
Typing ALTER TABLE directly into a production console bypasses every safety mechanism:

- **No review**: Nobody else has verified the SQL syntax or assessed the impact.
- **No testing**: The change has not been validated in a lower environment.
- **No rollback script**: If the change causes problems, there is no pre-written reversal.
- **No audit trail**: There is no record of who made the change, when, or why.
- **No impact assessment**: You may not have checked whether the change triggers REORG PENDING, invalidates packages, or locks the table.

The correct process is:

1. Write the change as a SQL script.
2. Peer-review the script.
3. Test the script in DEV, then QA, then STAGING.
4. Write and test the rollback script.
5. Assess impact (REORG requirements, lock behavior, duration estimate).
6. Schedule the change in a change management system.
7. Execute the reviewed, tested script in production through an automated deployment tool or under four-eyes supervision.
8. Validate the result.
Question 21
You are migrating from DB2 10.5 to DB2 11.5 on LUW. After running db2iupgrade and UPGRADE DATABASE, what three critical post-upgrade tasks must you perform?
Show Answer
1. **REBIND ALL PACKAGES**: Recompile all SQL packages so they can take advantage of the new version's optimizer improvements and use the updated catalog structures. Without rebinding, packages run in compatibility mode and may miss performance improvements.
2. **RUNSTATS on the system catalog tables**: Update statistics on the SYSIBM catalog tables so the optimizer has accurate information after the migration; the catalog may have changed in structure or content.
3. **Validate with INSPECT CHECK DATABASE**: Verify catalog and database integrity after the upgrade to detect any corruption or inconsistency introduced during the migration process.

Additionally, you should review DBM and DB configuration parameters for deprecated or changed defaults, run application smoke tests, and verify that backup and recovery procedures work with the new version.
Question 22
What is view-based abstraction as a schema evolution strategy, and what is its main limitation?
Show Answer
View-based abstraction places a view layer between applications and physical tables. Applications query and (where possible) modify data through views rather than directly accessing tables. When the physical table structure changes, you update the view definition to present the data in the expected format, without changing application code. For example, if you split a CUSTOMER table into CUSTOMER and CUSTOMER_CONTACT, you can create a view named V_CUSTOMER that joins both tables and presents the original column set.

**Main limitation**: It works best when planned from the beginning. Retrofitting view-based abstraction onto a system where applications already query tables directly requires a coordinated application migration to the view layer — which is itself a significant project. Additionally, not all operations can be performed through views (certain updates, and inserts into views with joins), and views add a layer of complexity to query optimization.
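A sketch of the V_CUSTOMER example from the answer; the column names are illustrative:

```sql
-- After splitting CUSTOMER, the view preserves the original column set
-- so existing queries keep working unchanged.
CREATE VIEW V_CUSTOMER AS
  SELECT C.CUST_ID,
         C.CUST_NAME,
         CT.EMAIL,       -- contact columns now live in CUSTOMER_CONTACT
         CT.PHONE
    FROM CUSTOMER C
    JOIN CUSTOMER_CONTACT CT
      ON CT.CUST_ID = C.CUST_ID;
```

Note that because this view contains a join, inserts through it are restricted — an instance of the limitation described above.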
Question 23
In the Meridian Bank schema evolution scenario, why is the INTEGER-to-BIGINT conversion scheduled in a separate maintenance window (Week 4) rather than combined with the column additions (Week 1)?
Show Answer
The INTEGER-to-BIGINT conversion is isolated because it is the only change that:

1. **Triggers REORG PENDING on z/OS**, making the table inaccessible until the REORG completes.
2. **Requires a full table REORG** on both platforms, which for a 1.8 billion row table could take hours.
3. **Cannot be performed online** — unlike adding a nullable column (instant) or widening a VARCHAR (instant), changing the physical width of a column requires rewriting every row.
4. **Carries the highest risk** — if the REORG fails partway through, recovery may require restoring from backup.

By isolating this change, Week 1's safe changes can be validated independently, and the risky change gets its own maintenance window with the team's full attention and a dedicated rollback plan. This is the principle of separating safe changes from risky ones.
Question 24
What IBM tool automates Oracle-to-DB2 conversion, and what is its typical automatic conversion rate?
Show Answer
The **IBM Database Conversion Workbench (DCW)** automates Oracle-to-DB2 conversion. It performs schema analysis, DDL conversion, SQL translation (including stored procedures), and data type mapping, and it generates detailed compatibility reports. DCW typically converts **70-85% of schema objects automatically**. The remaining 15-30% require manual work, usually concentrated in:

- Complex PL/SQL packages
- Dynamic SQL with Oracle-specific syntax
- Oracle-specific features (database links, materialized view refresh groups, Oracle optimizer hints)
- Cursor handling differences between PL/SQL and SQL PL
- Exception handling patterns

The DCW report identifies exactly which objects need manual intervention and provides estimated effort, making it invaluable for migration planning.
Question 25
Explain why "schema rigidity" (being afraid to change the schema) is more dangerous than schema evolution.
Show Answer
When teams are afraid to modify the schema, they do not stop needing new data structures — they just find workarounds that bypass the relational model:

1. **Column overloading**: Repurposing existing columns to hold data they were not designed for, destroying data type safety.
2. **Shadow tables**: Creating unofficial parallel tables with no referential integrity, leading to data inconsistency.
3. **Application-layer encoding**: Storing structured data as JSON, CSV, or delimited strings in VARCHAR columns, making the data unqueryable by SQL and uncheckable by constraints.
4. **EAV patterns**: Generic key-value tables that destroy type safety, query performance, and referential integrity.

Each workaround trades short-term convenience for compounding technical debt, and the cost of remediation keeps growing the longer it is deferred. Managed schema evolution — scripted, tested, reversible changes — is always cheaper than cleaning up years of workarounds.