Chapter 9 Quiz: Data Modification

Test your understanding of Chapter 9. Attempt all questions from memory before revealing the answers. The act of retrieval -- even when you get it wrong -- strengthens learning more than re-reading.

Scoring Guide:

- 20-25 correct: Excellent -- you have strong command of the material
- 15-19 correct: Good -- review the sections where you missed questions
- 10-14 correct: Fair -- re-read the chapter with the quiz questions in mind
- Below 10: Re-read the chapter carefully, then retake this quiz in 48 hours


Multiple Choice Questions

Question 1

Why should you always include a column list in an INSERT statement?

A) DB2 requires it; INSERT without a column list is a syntax error
B) It improves performance by telling the optimizer which columns to write
C) It makes the intent explicit, survives schema changes, and is self-documenting
D) It prevents duplicate key violations

Answer

**C) It makes the intent explicit, survives schema changes, and is self-documenting**

A column list is technically optional in DB2, but omitting it means the INSERT depends on the table's column order. If a column is added later, the INSERT breaks. Including the column list makes the statement resilient to schema changes and readable without consulting the table definition.
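To make the contrast concrete, here is a sketch of the fragile and the resilient forms, using the meridian.customer columns that appear elsewhere in this chapter (and assuming customer_id accepts an explicit value):

```sql
-- Fragile: depends on the table's current column order.
-- A later ALTER TABLE ADD COLUMN breaks this statement,
-- or worse, silently misassigns values.
INSERT INTO meridian.customer
VALUES (10001, 'David', 'Park');

-- Resilient: intent is explicit, and columns added later
-- simply receive their DEFAULT values.
INSERT INTO meridian.customer (customer_id, first_name, last_name)
VALUES (10001, 'David', 'Park');
```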

Question 2

In a multi-row INSERT, what is the primary performance advantage over separate single-row INSERT statements?

A) Multi-row INSERT bypasses constraint checking
B) Multi-row INSERT uses less disk space
C) Reduced network round trips, more efficient logging, and fewer lock acquisitions
D) Multi-row INSERT automatically runs in parallel

Answer

**C) Reduced network round trips, more efficient logging, and fewer lock acquisitions**

A single multi-row INSERT sends all data in one network round trip, allows DB2 to batch log writes, and acquires locks once rather than once per statement. Constraint checking still occurs for every row.
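A minimal sketch of the multi-row form (names borrowed from the chapter's meridian.customer examples):

```sql
-- One statement, one round trip, one set of lock acquisitions --
-- versus three separate INSERTs with three of each.
INSERT INTO meridian.customer (first_name, last_name)
VALUES ('Ana', 'Silva'),
       ('Ben', 'Okafor'),
       ('Chen', 'Wei');
```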

Question 3

What is the "Halloween Problem" in the context of INSERT...SELECT?

A) A performance bug that occurs on October 31st due to date formatting
B) An INSERT reading from the same table it writes to might see its own newly inserted rows and loop
C) A concurrency issue where two sessions insert the same row simultaneously
D) A bug where DEFAULT values are not applied to inserted rows

Answer

**B) An INSERT reading from the same table it writes to might see its own newly inserted rows and loop**

The Halloween Problem occurs when a query reads and writes the same table, potentially processing newly inserted rows. DB2 prevents this by materializing the SELECT result before performing the INSERT. The name comes from the date it was first discovered at IBM.

Question 4

In an UPDATE with SET, when are the expressions in the SET clause evaluated?

A) Left to right, so later columns see the effects of earlier assignments
B) Right to left, so earlier columns see the effects of later assignments
C) All expressions are evaluated using the row's pre-update values
D) Each expression is evaluated independently in an unpredictable order

Answer

**C) All expressions are evaluated using the row's pre-update values**

DB2 evaluates all SET expressions against the "before" picture of the row. The order of columns in the SET clause does not matter. This means `SET a = b, b = a` successfully swaps the values of a and b.
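The swap semantics can be sketched as follows (meridian.demo_pair is a hypothetical table with columns a and b, used only for illustration):

```sql
-- Both expressions see the pre-update values of the row, so this
-- swaps a and b rather than copying b into both columns.
UPDATE meridian.demo_pair
SET a = b,
    b = a
WHERE pair_id = 1;
```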

Question 5

What happens if you execute an UPDATE without a WHERE clause?

A) DB2 raises an error and prevents the statement from executing
B) DB2 prompts for confirmation before proceeding
C) Every row in the table is updated
D) Only the first row in the table is updated

Answer

**C) Every row in the table is updated**

DB2 does not warn you or prompt for confirmation. An UPDATE without WHERE affects every row in the table. This is why the golden rule is: write the WHERE clause first, test with SELECT, and never use autocommit for interactive DML.

Question 6

What is the key difference between DELETE and TRUNCATE TABLE?

A) DELETE is faster because it does not log the operation
B) TRUNCATE fires triggers for each row; DELETE does not
C) DELETE logs each row individually and fires triggers; TRUNCATE uses minimal logging and does not fire triggers
D) There is no difference; TRUNCATE is just an alias for DELETE

Answer

**C) DELETE logs each row individually and fires triggers; TRUNCATE uses minimal logging and does not fire triggers**

DELETE writes a before-image log record for every row and fires DELETE triggers. TRUNCATE uses minimal logging (recording the action itself, not individual rows) and skips trigger execution. This makes TRUNCATE dramatically faster for clearing large tables.
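Side by side, the two forms look like this (meridian.stage_load is a hypothetical staging table; note that on DB2 for LUW, TRUNCATE must be the first statement in its unit of work):

```sql
-- Row-by-row: fully logged, fires DELETE triggers, can be rolled back.
DELETE FROM meridian.stage_load;

-- Minimal logging, no triggers. The IMMEDIATE keyword makes the
-- truncation take effect at once rather than at COMMIT.
TRUNCATE TABLE meridian.stage_load IMMEDIATE;
```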

Question 7

Which ON DELETE action prevents deletion of a parent row when child rows exist?

A) ON DELETE CASCADE
B) ON DELETE SET NULL
C) ON DELETE RESTRICT
D) ON DELETE IGNORE

Answer

**C) ON DELETE RESTRICT**

Both ON DELETE RESTRICT and ON DELETE NO ACTION prevent the DELETE from proceeding if any child rows reference the parent. CASCADE deletes the child rows. SET NULL sets the foreign key columns to NULL. There is no ON DELETE IGNORE in DB2.
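As a sketch, a RESTRICT rule on the account-to-customer relationship discussed later in this quiz might be declared like this (the constraint name is illustrative):

```sql
ALTER TABLE meridian.account
    ADD CONSTRAINT fk_account_customer
    FOREIGN KEY (customer_id)
    REFERENCES meridian.customer (customer_id)
    ON DELETE RESTRICT;

-- With child accounts present, this DELETE now fails with a
-- referential constraint error instead of cascading:
DELETE FROM meridian.customer WHERE customer_id = 10001;
```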

Question 8

What does the MERGE statement combine into a single operation?

A) SELECT and INSERT
B) INSERT and UPDATE (and optionally DELETE) based on match conditions
C) CREATE TABLE and INSERT
D) DELETE and TRUNCATE

Answer

**B) INSERT and UPDATE (and optionally DELETE) based on match conditions**

MERGE examines each source row against the target table using an ON condition. For matching rows, it can UPDATE or DELETE. For non-matching rows, it can INSERT. This is the "upsert" pattern.
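A minimal upsert sketch (meridian.customer_stage is a hypothetical staging table assumed to carry the same columns as the target):

```sql
MERGE INTO meridian.customer AS t
USING meridian.customer_stage AS s
    ON t.customer_id = s.customer_id
WHEN MATCHED THEN
    -- Existing customer: refresh the email
    UPDATE SET t.email = s.email
WHEN NOT MATCHED THEN
    -- New customer: insert the full row
    INSERT (customer_id, first_name, last_name, email)
    VALUES (s.customer_id, s.first_name, s.last_name, s.email);
```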

Question 9

In a MERGE statement, why is it important that the ON clause references indexed columns on the target table?

A) DB2 requires an index for MERGE to compile
B) Without an index, MERGE performs a table scan for every source row, which is catastrophic for large tables
C) Indexes prevent the MERGE from inserting duplicate rows
D) The ON clause does not affect performance

Answer

**B) Without an index, MERGE performs a table scan for every source row, which is catastrophic for large tables**

For each source row, DB2 must determine if a matching target row exists. If the ON clause columns are indexed, DB2 uses an index lookup. Without an index, DB2 scans the entire target table for each source row, resulting in O(n * m) performance.

Question 10

After a COMMIT, what is the only way to "undo" the changes?

A) ROLLBACK
B) ROLLBACK TO SAVEPOINT
C) Execute compensating SQL (reverse INSERT/UPDATE/DELETE statements)
D) It is impossible to undo committed changes

Answer

**C) Execute compensating SQL (reverse INSERT/UPDATE/DELETE statements)**

COMMIT makes changes permanent. ROLLBACK only undoes uncommitted changes. After COMMIT, the only option is to write SQL that reverses the effect (delete inserted rows, update changed values back, re-insert deleted rows) -- or restore from backup.

Question 11

What is the DB2 term for a transaction?

A) Logical session
B) Work packet
C) Unit of work (UOW)
D) Commit scope

Answer

**C) Unit of work (UOW)**

DB2 uses the term "unit of work" to describe a transaction -- a sequence of SQL statements that form a logical unit and are committed or rolled back as a group.

Question 12

What happens if a DB2 for LUW connection drops unexpectedly during a transaction?

A) The pending changes are automatically committed
B) The pending changes are automatically rolled back
C) The changes are saved in a pending state until the next connection
D) The behavior is undefined

Answer

**B) The pending changes are automatically rolled back**

When a connection terminates abnormally, DB2 rolls back any uncommitted work. This protects against partial transactions that could leave the database in an inconsistent state.

Question 13

On DB2 for z/OS, what happens to pending DML changes when you execute a CREATE TABLE statement?

A) The pending changes are implicitly committed before the DDL executes
B) The DDL is part of the current unit of work and can be rolled back along with the DML
C) DB2 raises an error; you must COMMIT before executing DDL
D) The pending changes are automatically rolled back

Answer

**B) The DDL is part of the current unit of work and can be rolled back along with the DML**

On z/OS, DDL does NOT implicitly commit. This is a significant difference from DB2 for LUW (and many other databases), where DDL causes an implicit commit. On z/OS, you can ROLLBACK a CREATE TABLE.

Question 14

What is the primary risk of running a large batch UPDATE in a single transaction without committing?

A) The data changes will not be written to disk
B) Active log space may be exhausted, causing a log-full condition that halts all database activity
C) DB2 will automatically partition the UPDATE into smaller batches
D) Other users will see the partially updated data

Answer

**B) Active log space may be exhausted, causing a log-full condition that halts all database activity**

Every modification generates log records. These log records must remain in the active log until the transaction commits or rolls back. A very large transaction can fill the active log, causing DB2 to stop all processing until log space is freed.

Question 15

What does ROLLBACK TO SAVEPOINT do?

A) Rolls back the entire transaction to the beginning
B) Rolls back only the changes made after the named savepoint, leaving earlier changes intact
C) Commits all changes up to the savepoint and rolls back the rest
D) Deletes the savepoint without affecting any data

Answer

**B) Rolls back only the changes made after the named savepoint, leaving earlier changes intact**

ROLLBACK TO SAVEPOINT undoes changes made after the savepoint was set, but changes made before the savepoint remain. The transaction is still active -- no COMMIT has occurred. You can continue working and eventually COMMIT or ROLLBACK the entire transaction.
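A sketch of the flow, using the meridian.account table from this chapter (account 5001, the status value, and the savepoint name are illustrative; DB2 requires the ON ROLLBACK RETAIN CURSORS clause):

```sql
UPDATE meridian.account SET status = 'FROZEN' WHERE account_id = 5001;

SAVEPOINT before_fee ON ROLLBACK RETAIN CURSORS;

UPDATE meridian.account
SET current_balance = current_balance - 25.00
WHERE account_id = 5001;

-- Undoes only the fee; the status change above survives.
ROLLBACK TO SAVEPOINT before_fee;

-- The transaction is still open: this COMMIT keeps the status change.
COMMIT;
```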

Short Answer Questions

Question 16

Explain the difference between GENERATED ALWAYS and GENERATED BY DEFAULT for identity columns. When would you choose each?

Answer

**GENERATED ALWAYS:** DB2 always generates the value. You cannot specify a value in the INSERT statement. Use this for new tables where DB2 should control all key generation and you do not need to migrate existing data with pre-assigned keys.

**GENERATED BY DEFAULT:** DB2 generates a value only if you do not provide one. If you supply a value, DB2 uses yours. Use this during data migration when you need to preserve existing key values from a source system while also allowing DB2 to generate keys for new rows.

The risk with GENERATED BY DEFAULT is duplicate key errors if a manually supplied value collides with a future generated value.
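The two declarations differ by a single clause. A sketch (the table names and START WITH value are illustrative, not from the chapter):

```sql
-- DB2 always generates the key; an explicit value in the
-- INSERT is rejected.
CREATE TABLE meridian.customer_new (
    customer_id INTEGER GENERATED ALWAYS AS IDENTITY
                (START WITH 10000, INCREMENT BY 1),
    first_name  VARCHAR(50) NOT NULL,
    last_name   VARCHAR(50) NOT NULL,
    PRIMARY KEY (customer_id)
);

-- Migration-friendly: supplied values are kept, omitted
-- values are generated.
CREATE TABLE meridian.customer_mig (
    customer_id INTEGER GENERATED BY DEFAULT AS IDENTITY
                (START WITH 10000, INCREMENT BY 1),
    first_name  VARCHAR(50) NOT NULL,
    last_name   VARCHAR(50) NOT NULL,
    PRIMARY KEY (customer_id)
);
```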

Question 17

What is the purpose of CACHE on a sequence object? What happens to cached values if DB2 crashes?

Answer

**CACHE** tells DB2 to pre-allocate a block of sequence values in memory. When a session requests NEXT VALUE FOR, DB2 can return the next cached value without updating the catalog on disk. This significantly improves performance for high-frequency INSERT operations.

If DB2 crashes, the cached values that have not been used are **lost** -- they are never assigned. When DB2 restarts, it continues generating values from the start of the next cache block. This creates gaps in the sequence.

For example, with CACHE 50: if DB2 cached values 1000-1049, used values 1000-1023, and then crashed, values 1024-1049 are lost. On restart, the next value generated would be 1050.
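A sketch of such a sequence (the sequence name and START WITH value are illustrative):

```sql
CREATE SEQUENCE meridian.customer_seq
    AS INTEGER
    START WITH 10000
    INCREMENT BY 1
    CACHE 50
    NO CYCLE;

-- Served from the in-memory cache; the catalog is updated
-- only once per block of 50 values.
VALUES NEXT VALUE FOR meridian.customer_seq;
```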

Question 18

Describe the SELECT FROM INSERT pattern using FINAL TABLE. What problem does it solve?

Answer

The SELECT FROM INSERT pattern wraps an INSERT inside a SELECT:

```sql
SELECT customer_id FROM FINAL TABLE (
    INSERT INTO meridian.customer (first_name, last_name)
    VALUES ('David', 'Park')
);
```

FINAL TABLE returns the row as it exists after the INSERT, including auto-generated values (identity columns, defaults, trigger modifications).

**Problem it solves:** Without this pattern, retrieving a generated key requires a separate query after the INSERT (using IDENTITY_VAL_LOCAL() or PREVIOUS VALUE FOR). The SELECT FROM INSERT pattern does both in one atomic operation, eliminating the possibility of a race condition and reducing the number of round trips.

Question 19

What is lock escalation? Why does it matter for batch UPDATE and DELETE operations?

Answer

**Lock escalation** occurs when DB2 converts many fine-grained locks (row or page) into a single coarse-grained lock (table level). This happens when the number of individual locks held by a transaction exceeds a platform-specific threshold.

**Why it matters for batch DML:**

1. A table-level exclusive lock blocks ALL other users from reading or writing the table.
2. In a production environment, this can cause cascading application timeouts and failures.
3. Batch operations that update or delete many rows without committing are prime candidates for lock escalation.

**Prevention:** Commit frequently in batch operations to release locks periodically. Process data in smaller batches. Or, if exclusive access is acceptable, acquire a table-level lock explicitly at the start (avoiding the overhead of individual lock acquisition and escalation).

Question 20

List the five data modification safety patterns described in Section 9.10. Briefly explain why each matters.

Answer

1. **Always test with SELECT first.** Run your WHERE clause in a SELECT to verify which rows will be affected before executing UPDATE or DELETE. Prevents accidental modification of wrong rows.
2. **Write WHERE before SET.** When typing an UPDATE, write the WHERE clause first. Prevents the accidental submission of an UPDATE without a WHERE clause.
3. **Disable autocommit for interactive DML.** Ensures that mistakes can be rolled back before they become permanent. With autocommit ON, every statement is immediately committed.
4. **Verify row counts.** After every DML statement, check how many rows were affected. If the count does not match your expectation, ROLLBACK and investigate.
5. **Use transactions and appropriate commit frequency.** Wrap related changes in a transaction. For batch operations, commit at regular intervals to manage log space and lock accumulation.
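The first two patterns can be sketched as a single workflow against the chapter's meridian.account table (account 5001 and the status value are illustrative):

```sql
-- Step 1: verify exactly which rows the WHERE clause selects...
SELECT account_id, status
FROM meridian.account
WHERE account_id = 5001;

-- Step 2: ...then reuse the identical WHERE clause in the UPDATE,
-- and confirm the reported row count matches the SELECT.
UPDATE meridian.account
SET status = 'CLOSED'
WHERE account_id = 5001;

-- With autocommit disabled, an unexpected row count can still
-- be undone before it becomes permanent.
ROLLBACK;
```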

Application Questions

Question 21

A developer writes the following MERGE to update customer emails from a staging table:

```sql
MERGE INTO meridian.customer AS t
USING meridian.email_updates AS s
ON t.email = s.old_email
WHEN MATCHED THEN
    UPDATE SET t.email = s.new_email;
```

Identify at least two problems with this MERGE. How would you fix them?

Answer

**Problem 1: The ON clause matches on email, not a unique key.** Multiple customers might share the same email address (or email might be NULL). This can cause ambiguous matches and unexpected updates. The ON clause should use a unique identifier like customer_id.

**Problem 2: No NOT MATCHED clause.** If a new email appears in the staging table for a customer not yet in the system, the MERGE silently ignores it. Depending on business requirements, this might be fine, but it should be a deliberate design choice, not an oversight.

**Problem 3: No index on the match column.** If the ON clause uses email and there is no index on meridian.customer(email), DB2 performs a table scan for every staging row.

**Fixed version** (assuming the staging table also carries customer_id):

```sql
MERGE INTO meridian.customer AS t
USING meridian.email_updates AS s
ON t.customer_id = s.customer_id
WHEN MATCHED THEN
    UPDATE SET t.email = s.new_email;
```

Question 22

Explain why Meridian Bank should use ON DELETE RESTRICT instead of ON DELETE CASCADE for the foreign key from meridian.account to meridian.customer. Give a specific scenario where CASCADE would cause a serious business problem.

Answer

**Why RESTRICT:** In a financial system, data deletion must be deliberate and auditable. ON DELETE RESTRICT prevents accidental data loss by requiring explicit removal of dependent records before a parent can be deleted. This forces a multi-step process with verification at each step.

**CASCADE danger scenario:** An administrator accidentally deletes customer_id 10001 (perhaps intending to delete customer_id 10002 -- a simple typo). With ON DELETE CASCADE:

- All of customer 10001's accounts are deleted
- All transaction history for those accounts is deleted
- All audit records tied to those accounts may be deleted

The customer's entire financial history vanishes in a single DELETE statement. With ON DELETE RESTRICT, the DELETE fails immediately because accounts exist, giving the administrator the chance to realize the mistake before any data is lost.

In banking, data is never truly "deleted" -- it is archived or marked inactive. CASCADE contradicts this principle.

Question 23

Design a stored procedure outline for a batch interest calculation that processes 500,000 savings accounts. Include error handling, commit frequency, restart capability, and logging. You do not need to write complete SQL -- pseudocode with key SQL statements is sufficient.

Answer

```
PROCEDURE calculate_monthly_interest(
    IN p_batch_date DATE,
    IN p_batch_size INTEGER DEFAULT 5000
)

-- Step 1: Check if this batch already ran (restart capability)
SELECT COUNT(*) INTO v_existing
FROM meridian.batch_control
WHERE batch_name = 'INTEREST'
  AND batch_date = p_batch_date
  AND status = 'COMPLETE';

IF v_existing > 0 THEN
    SIGNAL error 'Batch already completed for this date';
END IF;

-- Step 2: Get the last successfully processed account_id (for restart)
SELECT COALESCE(MAX(last_processed_id), 0) INTO v_last_processed
FROM meridian.batch_control
WHERE batch_name = 'INTEREST'
  AND batch_date = p_batch_date
  AND status = 'IN_PROGRESS';

-- Step 3: Open cursor for unprocessed accounts
DECLARE cur CURSOR FOR
    SELECT account_id, current_balance, interest_rate
    FROM meridian.account
    WHERE account_type = 'SAVINGS'
      AND status = 'ACTIVE'
      AND current_balance > 0
      AND account_id > v_last_processed
    ORDER BY account_id;

-- Step 4: Process in batches
SET v_count = 0;
FOR each row from cursor:
    -- Calculate interest
    SET v_interest = ROUND(current_balance * (interest_rate / 12), 2);

    -- Insert transaction record
    INSERT INTO meridian.transaction_history (...);

    -- Update balance
    UPDATE meridian.account
    SET current_balance = current_balance + v_interest
    WHERE account_id = current account_id;

    SET v_count = v_count + 1;

    IF MOD(v_count, p_batch_size) = 0 THEN
        -- Update batch control with checkpoint
        UPDATE meridian.batch_control
        SET last_processed_id = current account_id,
            rows_processed = v_count
        WHERE batch_name = 'INTEREST'
          AND batch_date = p_batch_date;
        COMMIT;
        -- Log progress
    END IF;
END FOR;

-- Step 5: Mark batch complete
UPDATE meridian.batch_control
SET status = 'COMPLETE', rows_processed = v_count
WHERE batch_name = 'INTEREST'
  AND batch_date = p_batch_date;
COMMIT;
```

**Key design decisions:**

- **Restart capability:** The batch_control table tracks the last successfully processed account_id. On restart, processing resumes from where it left off.
- **Commit frequency:** Every 5,000 rows (configurable) to manage log space and locks.
- **Ordered processing:** Accounts are processed in account_id order to enable deterministic restart.
- **Error handling:** Not shown in pseudocode, but each batch should include a SAVEPOINT so that if a single account fails, the batch can skip it and continue.

Question 24

Compare the total log volume generated by each approach for loading 1 million rows into a staging table:

A) Individual INSERT statements with autocommit ON
B) Multi-row INSERT with 1,000 rows per statement, committing every 10 statements
C) INSERT...SELECT from a source table
D) LOAD utility
E) INSERT with NOT LOGGED INITIALLY (LUW)

Rank them from most to least log volume and explain your ranking.

Answer

**Ranking (most to least log volume):**

1. **A) Individual INSERT with autocommit ON** -- Most log volume. Each of the 1 million INSERTs generates a log record AND a commit record. The commit records alone are significant overhead. Total: approximately 1 million insert log records + 1 million commit log records.
2. **B) Multi-row INSERT with batched commits** -- Full row-level logging of all 1 million rows, plus 100 commit records (1,000 rows per statement * 10 statements = 10,000 rows per commit, so 100 commits for 1 million rows).
3. **C) INSERT...SELECT** -- The same row-level logging as B, but only one commit record for the whole load. In practice B and C are nearly identical; B's 99 extra commit records are the only difference.
4. **D) LOAD utility** -- Minimal logging. LOAD writes data pages directly and logs only control information (which pages were modified, not the individual row data). Log volume is typically 1-5% of the data volume.
5. **E) INSERT with NOT LOGGED INITIALLY** -- Near-zero logging for the INSERT itself. Only the COMMIT is logged plus metadata about the table's state. However, the table is unrecoverable if the transaction fails.

**Note:** The exact log volumes depend on row size, page size, and DB2 configuration, but the relative ranking is consistent.
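Option E looks something like the following sketch on DB2 for LUW (meridian.stage_load and meridian.source_extract are hypothetical tables; the table must have been created with the NOT LOGGED INITIALLY attribute, and activation lasts only until the end of the current unit of work):

```sql
ALTER TABLE meridian.stage_load ACTIVATE NOT LOGGED INITIALLY;

INSERT INTO meridian.stage_load
SELECT * FROM meridian.source_extract;

-- If this unit of work fails, the table is marked unusable and must
-- be recreated or restored -- hence: staging data only.
COMMIT;
```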

Question 25

A junior DBA asks: "Why can't sequences just keep track of which values were actually used and reclaim the ones from rolled-back transactions? That way there would be no gaps."

Write a response that explains: 1. Why this is technically difficult 2. Why it would harm performance 3. Why gaps in sequences do not actually matter for most applications

Answer

**1. Technical difficulty:** To reclaim rolled-back values, the sequence would need to maintain a "free list" of returned values and check it before generating new ones. This requires persistent storage (a table or catalog update for every transaction outcome) and coordination across concurrent sessions. If session A gets value 1000, session B gets 1001, and then session A rolls back, the sequence must somehow insert 1000 back into the available pool and ensure the next requestor gets 1000 before 1002.

**2. Performance impact:** Sequences are designed for extreme performance -- NEXT VALUE FOR must be nearly instantaneous because it is called on every INSERT. Adding a reclamation mechanism would require:

- A persistent free list with its own locking
- Additional I/O for every NEXT VALUE FOR call (check the free list first)
- Serialization across sessions to prevent two sessions from claiming the same reclaimed value

This would transform a near-zero-cost operation into an expensive, serialized one, destroying INSERT throughput.

**3. Why gaps don't matter:** A primary key's job is to uniquely identify a row. It does not need to be contiguous. Gaps carry no semantic meaning -- customer_id 10001 followed by 10003 (skipping 10002) does not mean a customer was lost. Applications should never depend on sequence contiguity. If gapless numbering is legally required (some invoice number regulations), it must be implemented with explicit application-level serialization and accepted as a significant performance bottleneck.

Return to Chapter 9 | Continue to Case Study 1