Chapter 33 Quiz: JDBC, ODBC, and Python Access
Test your understanding of DB2 application connectivity. Each question has one best answer.
Question 1
What is the primary difference between a JDBC Type 2 and Type 4 driver?
A) Type 2 supports stored procedures; Type 4 does not
B) Type 2 uses native DB2 client libraries; Type 4 is pure Java communicating directly over TCP/IP
C) Type 2 is for DB2 LUW only; Type 4 is for DB2 on z/OS only
D) Type 2 supports connection pooling; Type 4 does not
Answer
**B) Type 2 uses native DB2 client libraries; Type 4 is pure Java communicating directly over TCP/IP** The Type 2 driver requires a local DB2 client installation and uses native CLI libraries for communication. The Type 4 driver is pure Java, speaking the DRDA protocol directly over TCP/IP with no native dependencies. Type 4 is the standard choice for modern deployments, especially in containerized environments where installing a DB2 client is impractical.
Question 2
Why should you use PreparedStatement instead of Statement with string concatenation?
A) PreparedStatement is faster for single-use queries
B) PreparedStatement prevents SQL injection and enables DB2 to reuse the access plan
C) Statement does not support SELECT queries
D) PreparedStatement automatically creates database indexes
Answer
**B) PreparedStatement prevents SQL injection and enables DB2 to reuse the access plan** PreparedStatement sends parameter values separately from the SQL text, preventing user input from altering the SQL structure (SQL injection). Because the SQL text is identical across executions (only parameter values change), DB2 can cache and reuse the compiled access plan from the package cache. Statement with concatenation creates unique SQL text for each value set, defeating the cache and requiring recompilation every time.
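The same parameter-marker principle carries over to Python's DB-API. A minimal sketch, using an in-memory SQLite database as a stand-in for DB2 (table and payload are invented for illustration; with ibm_db or JDBC the `?` placeholder plays the same role):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER, owner TEXT)")
conn.execute("INSERT INTO accounts VALUES (1, 'alice')")

malicious = "x' OR '1'='1"  # classic injection payload

# Unsafe: concatenation splices the payload into the SQL text itself.
unsafe_sql = "SELECT * FROM accounts WHERE owner = '" + malicious + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the payload travels as data and never alters the SQL structure.
safe_rows = conn.execute(
    "SELECT * FROM accounts WHERE owner = ?", (malicious,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))  # 1 0
```

The concatenated query matches every row because the payload rewrites the WHERE clause; the parameterized query treats it as an ordinary string.
Question 3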
What is the recommended approach for connection management in a production Java web application?
A) Create a new connection for each request using DriverManager.getConnection()
B) Use a single shared connection for all request threads
C) Use a connection pool (e.g., HikariCP) that maintains pre-established connections
D) Create a new DataSource instance for each request
Answer
**C) Use a connection pool (e.g., HikariCP) that maintains pre-established connections** Connection pooling maintains a cache of pre-established database connections. Threads borrow connections from the pool and return them when done. This eliminates the overhead of creating new connections (TCP handshake, TLS negotiation, DRDA authentication) for every request. A single shared connection would serialize all requests. Creating connections per request wastes 50-200ms per request and can overwhelm DB2 with connection storms.
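The borrow-and-return mechanics that HikariCP automates can be sketched in a few lines. This toy pool (class name and size are ours; sqlite3 stands in for DB2) pre-establishes every connection up front, which is exactly the cost a per-request `DriverManager.getConnection()` call would pay over and over:

```python
import queue
import sqlite3

class TinyPool:
    """Toy illustration of pooling: a thread-safe queue of live connections."""
    def __init__(self, size, factory):
        self._q = queue.Queue()
        for _ in range(size):
            self._q.put(factory())   # pay the connection cost once, up front

    def borrow(self, timeout=5):
        return self._q.get(timeout=timeout)   # blocks if the pool is exhausted

    def give_back(self, conn):
        self._q.put(conn)

pool = TinyPool(4, lambda: sqlite3.connect(":memory:", check_same_thread=False))
conn = pool.borrow()
print(conn.execute("SELECT 1").fetchone()[0])  # 1
pool.give_back(conn)
```

A production pool adds liveness validation, borrow timeouts, leak detection, and maxLifetime recycling on top of this skeleton.
Question 4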
In JDBC, how do you check if a column value was NULL after calling rs.getInt()?
A) Check if the returned value is 0
B) Check if the returned value is -1
C) Call rs.wasNull() immediately after the get method
D) NULL columns throw a NullPointerException
Answer
**C) Call `rs.wasNull()` immediately after the get method** For primitive types (`getInt`, `getDouble`, `getLong`), JDBC returns 0 or 0.0 for NULL columns -- not null, because primitives cannot be null. The only way to distinguish between a NULL value and a legitimate zero is to call `rs.wasNull()` immediately after the getter. For object types (`getString`, `getBigDecimal`), NULL returns Java null directly.
Question 5
Which JDBC isolation level constant maps to DB2's Cursor Stability (CS)?
A) TRANSACTION_READ_UNCOMMITTED
B) TRANSACTION_READ_COMMITTED
C) TRANSACTION_REPEATABLE_READ
D) TRANSACTION_SERIALIZABLE
Answer
**B) `TRANSACTION_READ_COMMITTED`** The mapping between JDBC and DB2 isolation levels is: `READ_UNCOMMITTED` = UR, `READ_COMMITTED` = CS, `REPEATABLE_READ` = RS, `SERIALIZABLE` = RR. This is confusing because JDBC's `REPEATABLE_READ` maps to DB2's RS (Read Stability), NOT to DB2's RR (Repeatable Read). JDBC's `SERIALIZABLE` maps to DB2's RR. Memorize this mapping -- it causes frequent production issues.
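The mapping is small enough to keep as a lookup table. The sketch below is just the answer's table in Python form (the dict name is ours, not a driver API):

```python
# JDBC isolation constant -> DB2 isolation level
JDBC_TO_DB2 = {
    "TRANSACTION_READ_UNCOMMITTED": "UR",  # Uncommitted Read
    "TRANSACTION_READ_COMMITTED":   "CS",  # Cursor Stability
    "TRANSACTION_REPEATABLE_READ":  "RS",  # Read Stability -- NOT DB2's RR
    "TRANSACTION_SERIALIZABLE":     "RR",  # Repeatable Read
}

print(JDBC_TO_DB2["TRANSACTION_READ_COMMITTED"])  # CS
```
Question 6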
What is the purpose of pstmt.addBatch() and pstmt.executeBatch() in JDBC?
A) They execute multiple different SQL statements in parallel
B) They accumulate multiple sets of parameters and send them to DB2 in a single network round-trip
C) They create multiple database connections for parallel processing
D) They enable multi-row FETCH operations from a ResultSet
Answer
**B) They accumulate multiple sets of parameters and send them to DB2 in a single network round-trip** `addBatch()` accumulates a set of parameter values without executing. `executeBatch()` sends all accumulated parameter sets to DB2 in a single network message via DRDA. This dramatically reduces network overhead for bulk insert/update/delete operations -- typically achieving 10-50x throughput improvement over individual statement execution.
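The closest Python DB-API analogue is `executemany()`. A sketch with in-memory SQLite standing in for DB2 (SQLite has no network hop, so this shows the API shape rather than the round-trip savings):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txns (id INTEGER, amount REAL)")

rows = [(i, i * 1.5) for i in range(1000)]

# One call carries all 1000 parameter sets, analogous to repeated
# addBatch() followed by a single executeBatch().
conn.executemany("INSERT INTO txns VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM txns").fetchone()[0]
print(count)  # 1000
```
Question 7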
In Python ibm_db, what is the difference between ibm_db.fetch_tuple() and ibm_db.fetch_assoc()?
A) fetch_tuple returns all rows at once; fetch_assoc returns one at a time
B) fetch_tuple returns a tuple indexed by position; fetch_assoc returns a dictionary indexed by column name
C) fetch_tuple is for SELECT queries; fetch_assoc is for stored procedures
D) There is no difference; they are aliases
Answer
**B) `fetch_tuple` returns a tuple indexed by position; `fetch_assoc` returns a dictionary indexed by column name** `fetch_tuple()` returns values accessed as `row[0]`, `row[1]`, etc. `fetch_assoc()` returns a dictionary accessed as `row['ACCOUNT_ID']`, `row['BALANCE']`. There is also `fetch_both()` which provides both access methods. `fetch_assoc()` is more readable; `fetch_tuple()` has slightly less overhead for performance-critical loops.
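The positional-versus-named trade-off is easy to see with stdlib sqlite3, where plain tuples play the role of `fetch_tuple()` and `sqlite3.Row` plays `fetch_assoc()` (table and values invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (ACCOUNT_ID INTEGER, BALANCE REAL)")
conn.execute("INSERT INTO accounts VALUES (42, 100.0)")

# Tuple-style access, like fetch_tuple(): index by position.
row = conn.execute("SELECT ACCOUNT_ID, BALANCE FROM accounts").fetchone()
print(row[0], row[1])  # 42 100.0

# Dict-style access, like fetch_assoc(): index by column name.
conn.row_factory = sqlite3.Row
row = conn.execute("SELECT ACCOUNT_ID, BALANCE FROM accounts").fetchone()
print(row["ACCOUNT_ID"], row["BALANCE"])  # 42 100.0
```
Question 8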
Why is setting pool_pre_ping=True important on a SQLAlchemy engine in production?
A) It improves query performance by pre-caching execution plans
B) It verifies that connections are alive before handing them to application code
C) It enables connection pooling
D) It pre-compiles all SQL statements at startup
Answer
**B) It verifies that connections are alive before handing them to application code** `pool_pre_ping=True` causes SQLAlchemy to test each connection's liveness (with a lightweight query) before returning it from the pool. This detects stale connections caused by network interruptions, DB2 restarts, firewall timeouts, or `FORCE APPLICATION` commands. Without it, the application may receive a dead connection and fail on the first SQL execution with a confusing error.
Question 9
When using pandas read_sql() with chunksize=50000, what does the function return?
A) A single DataFrame with 50,000 rows
B) An iterator that yields DataFrames of up to 50,000 rows each
C) A list of 50,000 DataFrames
D) An error if the result set exceeds 50,000 rows
Answer
**B) An iterator that yields DataFrames of up to 50,000 rows each** With `chunksize` specified, `read_sql()` returns a generator (iterator) that yields DataFrame chunks. Each chunk contains up to the specified number of rows. This enables processing datasets larger than available memory by handling one chunk at a time. The last chunk may contain fewer rows. Without `chunksize`, the entire result set is loaded into a single DataFrame in memory.
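Under the hood this is the classic fetch-a-chunk-until-empty loop. A dependency-free sketch using `cursor.fetchmany()` on SQLite (pandas and DB2 are not required to see the pattern; the helper name is ours):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big (n INTEGER)")
conn.executemany("INSERT INTO big VALUES (?)", [(i,) for i in range(120)])

def read_in_chunks(cursor, chunksize):
    """Yield lists of up to `chunksize` rows until the result set is drained."""
    while True:
        chunk = cursor.fetchmany(chunksize)
        if not chunk:
            break
        yield chunk

cur = conn.execute("SELECT n FROM big")
sizes = [len(chunk) for chunk in read_in_chunks(cur, 50)]
print(sizes)  # [50, 50, 20] -- the final chunk may be smaller
```
Question 10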
What is the correct response when a JDBC application catches SQLCODE -911 (deadlock)?
A) Ignore it -- DB2 automatically retries the transaction
B) Close the connection and open a new one
C) Catch the SQLException, wait briefly with exponential backoff, and retry the entire transaction
D) Increase the LOCKTIMEOUT database configuration parameter
Answer
**C) Catch the SQLException, wait briefly with exponential backoff, and retry the entire transaction** When SQLCODE -911 occurs, DB2 has already rolled back the deadlock victim's entire unit of work. The application must retry the complete transaction from the beginning (not just the failed statement). Exponential backoff (e.g., 100ms, 200ms, 400ms) prevents the same two transactions from immediately deadlocking again. DB2 does NOT automatically retry -- the application is responsible.
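A retry wrapper might look like the sketch below. `DeadlockError` and `with_deadlock_retry` are invented names -- real code would inspect the SQLCODE on the exception raised by its driver -- but the control flow (retry the whole unit of work, back off exponentially with jitter) is the point:

```python
import random
import time

class DeadlockError(Exception):
    """Placeholder for whatever your driver raises with SQLCODE -911."""

def with_deadlock_retry(run_transaction, max_attempts=4, base_delay=0.1):
    for attempt in range(max_attempts):
        try:
            return run_transaction()  # must re-run the WHOLE unit of work
        except DeadlockError:
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff with jitter: ~100ms, ~200ms, ~400ms, ...
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))

# Demo: "deadlock" twice, then succeed on the third attempt.
attempts = {"n": 0}
def flaky_transfer():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise DeadlockError("simulated SQLCODE -911")
    return "committed"

result = with_deadlock_retry(flaky_transfer, base_delay=0.01)
print(result, attempts["n"])  # committed 3
```
Question 11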
Which pandas to_sql() option for if_exists is dangerous in production?
A) 'fail'
B) 'replace'
C) 'append'
D) None of them
Answer
**B) `'replace'`** `if_exists='replace'` drops the existing table and recreates it from the DataFrame schema. This destroys the original table definition including indexes, constraints, foreign keys, triggers, grants, and all existing data. In production, this can cause cascading application failures. Always use `'append'` for production tables, or `'fail'` for safety checks.
Question 12
What is the optimal connection pool size for most applications?
A) Equal to the number of concurrent users
B) As large as possible (500+) to handle peak load
C) Relatively small -- typically (cores * 2) + effective_spindle_count
D) Exactly 1 connection per CPU core on the application server
Answer
**C) Relatively small -- typically `(cores * 2) + effective_spindle_count`** This formula, from PostgreSQL research but applicable to DB2, suggests surprisingly small pools. For an 8-core DB2 server with SSD storage, this yields approximately 17 connections. More connections increase DB2 memory consumption, internal latch contention, and context-switching overhead -- actually degrading performance. Even high-throughput systems rarely need more than 50-100 connections per application instance.
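As arithmetic, the heuristic is one line (the function name is ours):

```python
def pool_size(cores, effective_spindles):
    # (cores * 2) + effective_spindle_count, per the sizing heuristic above
    return cores * 2 + effective_spindles

# 8-core DB2 server with SSD storage (effective spindle count taken as 1):
print(pool_size(8, 1))  # 17
```
Question 13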
What happens if you execute SQL via ibm_db.exec_immediate() without first calling ibm_db.autocommit(conn, SQL_AUTOCOMMIT_OFF)?
A) The statement fails with an error
B) Each statement is automatically committed individually after execution
C) Changes are never committed until you explicitly call ibm_db.commit()
D) The behavior depends on the DB2 server's default isolation level
Answer
**B) Each statement is automatically committed individually after execution** By default, ibm_db connections operate in auto-commit mode. Every statement is committed immediately. For multi-statement transactions (like fund transfers), you must explicitly disable auto-commit with `ibm_db.autocommit(conn, ibm_db.SQL_AUTOCOMMIT_OFF)` and then call `ibm_db.commit()` or `ibm_db.rollback()` to control the transaction boundary.
Question 14
When building a SQLAlchemy engine URL for DB2, what is the correct dialect prefix?
A) db2://
B) ibm_db://
C) ibm_db_sa://
D) jdbc:db2://
Answer
**C) `ibm_db_sa://`** SQLAlchemy engine URLs use the format `dialect+driver://`. For DB2 with the ibm_db_sa dialect, the prefix is `ibm_db_sa://`. Some configurations also accept `db2+ibm_db://`. The `jdbc:db2://` prefix is JDBC-specific (Java) and not used in SQLAlchemy. The full URL format is: `ibm_db_sa://user:password@hostname:port/database`.
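Building the URL in code also gives a place to percent-encode the password, which would otherwise break the URL if it contains `@` or `/` (host, port, and credentials below are made up):

```python
from urllib.parse import quote_plus

# URL format from the answer; host, port, and credentials are invented.
user, password = "db2inst1", "p@ss/word"
host, port, database = "db2host.example.com", 50000, "BANKDB"

url = f"ibm_db_sa://{user}:{quote_plus(password)}@{host}:{port}/{database}"
print(url)  # ibm_db_sa://db2inst1:p%40ss%2Fword@db2host.example.com:50000/BANKDB
```
Question 15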
What is the purpose of conn.setAutoCommit(false) in JDBC before performing a fund transfer?
A) It improves query performance by batching network traffic
B) It prevents other transactions from reading the accounts
C) It groups the debit, credit, and record insert into a single atomic transaction
D) It enables the use of PreparedStatement
Answer
**C) It groups the debit, credit, and record insert into a single atomic transaction** With auto-commit enabled (the default), each SQL statement is committed independently. For a fund transfer, this is catastrophic: if the debit succeeds but the credit fails, money vanishes. Setting `autoCommit(false)` means no work is committed until `conn.commit()` is explicitly called. If any step fails, `conn.rollback()` undoes ALL changes, maintaining atomicity.
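The pattern translates directly to Python's DB-API. A sketch with sqlite3 standing in for DB2 (the simulated failure is contrived to force the rollback path):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    try:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                     (amount, src))
        if dst == 999:  # contrived failure between debit and credit
            raise ValueError("no such destination account")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                     (amount, dst))
        conn.commit()    # both legs become visible together
    except Exception:
        conn.rollback()  # the debit is undone -- no money vanishes
        raise

try:
    transfer(conn, 1, 999, 30)
except ValueError:
    pass

src_balance = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(src_balance)  # 100 -- the failed transfer left the balance untouched
```
Question 16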
What does pstmt.setFetchSize(500) do in JDBC?
A) Limits the query to return at most 500 rows
B) Tells the JDBC driver to retrieve 500 rows per network round-trip from DB2
C) Sets a 500ms timeout for the query
D) Allocates 500 bytes of buffer for each row
Answer
**B) Tells the JDBC driver to retrieve 500 rows per network round-trip from DB2** Fetch size controls how many rows the JDBC driver retrieves in each network round-trip from DB2. The default may be as low as 1 row per round-trip, which is extremely inefficient for queries returning many rows. Setting fetch size to 500 means each round-trip retrieves 500 rows, reducing the total number of round-trips for a 100,000-row result set from 100,000 to 200. It does NOT limit the total rows returned.
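The round-trip arithmetic is worth making concrete (the helper name is ours):

```python
import math

def round_trips(total_rows, fetch_size):
    # Each round-trip carries up to fetch_size rows.
    return math.ceil(total_rows / fetch_size)

print(round_trips(100_000, 1))    # 100000 round-trips at the worst-case default
print(round_trips(100_000, 500))  # 200 round-trips with setFetchSize(500)
```
Question 17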
In the db2cli.ini configuration file, what does TxnIsolation=2 mean?
A) Uncommitted Read (UR)
B) Cursor Stability (CS / Read Committed)
C) Read Stability (RS / Repeatable Read)
D) Repeatable Read (RR / Serializable)
Answer
**B) Cursor Stability (CS / Read Committed)** In `db2cli.ini`, the `TxnIsolation` values are: 1=UR (Uncommitted Read), 2=CS (Cursor Stability), 4=RS (Read Stability), 8=RR (Repeatable Read). CS is the most common default for OLTP applications, providing a balance between consistency and concurrency.
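A hedged sketch of where that keyword lives (the data source name and host below are made up; the `TxnIsolation` line is the point):

```ini
[BANKDB]
Database=BANKDB
Protocol=TCPIP
Hostname=db2host.example.com
Port=50000
TxnIsolation=2
```
Question 18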
What is the risk of NOT setting maxLifetime on a HikariCP connection pool?
A) Connections will never be closed
B) The pool will grow without bound
C) Long-lived connections may become stale due to DNS changes, firewall timeouts, or DB2 restarts
D) DB2 will reject connections after a timeout
Answer
**C) Long-lived connections may become stale due to DNS changes, firewall timeouts, or DB2 restarts** Without `maxLifetime`, connections persist indefinitely. This causes problems when: (1) DB2 fails over and DNS resolves to a new IP, but existing connections still point to the old server; (2) firewalls silently drop idle TCP connections after 30-60 minutes, creating "half-open" connections that fail silently; (3) DB2 configuration changes require new connections to take effect. Setting `maxLifetime` to 25-30 minutes forces periodic connection recycling.
Question 19
How does pandas handle DB2 DECIMAL(15,2) columns by default, and why is this a problem for financial data?
A) Pandas uses Python Decimal type, which is exact
B) Pandas uses NumPy float64, which cannot represent all decimal values exactly
C) Pandas stores them as strings, which requires manual conversion
D) Pandas rejects decimal columns and raises an error
Answer
**B) Pandas uses NumPy `float64`, which cannot represent all decimal values exactly** By default, pandas stores numeric data as NumPy `float64`, which uses IEEE 754 binary floating-point. This cannot represent many decimal values exactly -- for example, `0.10` may be stored as `0.09999999999999998`. For financial data, this precision loss can cause incorrect calculations and reporting. Solutions include converting to Python `Decimal`, storing as integer cents, or converting to string for transit/storage.
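The failure mode is reproducible with nothing but the standard library:

```python
from decimal import Decimal

# IEEE 754 binary floats cannot represent most decimal fractions exactly.
print(0.1 + 0.2 == 0.3)                                   # False
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True

# Summing a million one-cent amounts:
float_total = sum(0.01 for _ in range(1_000_000))
print(float_total == 10_000.0)             # False -- rounding error accumulates
exact_total = sum(Decimal("0.01") for _ in range(1_000_000))
print(exact_total == Decimal("10000.00"))  # True
```
Question 20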
What DB2 monitoring query would help you detect a connection leak from an application?
A) SELECT * FROM SYSCAT.TABLES
B) Querying MON_GET_CONNECTION for connections with zero commits/rollbacks held for a long time
C) SELECT * FROM SYSIBM.SYSDUMMY1
D) Checking the db2diag.log for error messages
Answer
**B) Querying `MON_GET_CONNECTION` for connections with zero commits/rollbacks held for a long time** A connection leak means the application borrowed a connection from the pool but never returned it. On the DB2 side, this appears as a connection that has been open for a long time (e.g., >60 minutes) with `TOTAL_APP_COMMITS = 0` and `TOTAL_APP_ROLLBACKS = 0`. The query: `SELECT * FROM TABLE(MON_GET_CONNECTION(NULL,-1)) WHERE TIMESTAMPDIFF(4, ...) > 60 AND TOTAL_APP_COMMITS = 0` identifies these leaked connections.
Question 21
When calling a DB2 stored procedure from Python using ibm_db, which parameter direction constant is used for OUT parameters?
A) ibm_db.SQL_PARAM_INPUT
B) ibm_db.SQL_PARAM_OUTPUT
C) ibm_db.SQL_PARAM_INOUT
D) ibm_db.SQL_PARAM_RETURN
Answer
**B) `ibm_db.SQL_PARAM_OUTPUT`** When binding parameters for a stored procedure call, you specify the direction: `SQL_PARAM_INPUT` for IN parameters (values sent to the procedure), `SQL_PARAM_OUTPUT` for OUT parameters (values returned by the procedure), and `SQL_PARAM_INOUT` for parameters that serve both purposes. The `bind_param()` call takes the direction as the fourth argument.
Question 22
What is the correct validation query for testing DB2 LUW connection liveness in HikariCP?
A) SELECT 1
B) VALUES 1
C) PING
D) SELECT 1 FROM DUAL
Answer
**B) `VALUES 1`** DB2 LUW supports the `VALUES` clause as a standalone SQL expression that does not require a FROM clause. `VALUES 1` is the lightest possible validation query -- it returns a single row with a single integer without accessing any table. `SELECT 1 FROM SYSIBM.SYSDUMMY1` also works (and is required on DB2 for z/OS). `SELECT 1 FROM DUAL` is Oracle syntax. `PING` is not valid SQL.
Question 23
In a Python data pipeline reading from DB2 to a data lake, why should extract queries use WITH UR isolation?
A) It makes queries run faster by using parallel I/O
B) It prevents any locks from being acquired, minimizing impact on production OLTP workloads
C) It ensures the extracted data is perfectly consistent
D) It is required for pd.read_sql() to work with chunked reading
Answer
**B) It prevents any locks from being acquired, minimizing impact on production OLTP workloads** `WITH UR` (Uncommitted Read) means the query acquires no locks at all. For extract pipelines running against production databases, this is critical -- it prevents the pipeline from interfering with real-time transaction processing. The trade-off is that `WITH UR` may read uncommitted data, but for extract pipelines running against yesterday's (fully committed) data, this risk is negligible.
Question 24
What is the N+1 query problem, and how does it affect DB2 performance?
A) A query that returns N+1 rows instead of N rows due to an off-by-one error
B) A pattern where fetching N parent records triggers N additional queries for child records, creating N+1 total round-trips
C) A deadlock between N+1 concurrent transactions
D) A query plan that scans N+1 index levels
Answer
**B) A pattern where fetching N parent records triggers N additional queries for child records, creating N+1 total round-trips** The N+1 problem occurs when code fetches a list of parent objects (1 query) and then individually queries child objects for each parent (N queries). For example, fetching 100 customers then querying accounts for each generates 101 round-trips instead of 1 JOIN query. This is the most common performance issue in ORM-based applications and can be solved with JOIN queries, eager loading, or batch fetching.
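The effect is easy to quantify with a stand-in database. The sketch below counts statements issued both ways (sqlite3 replaces DB2; tables and sizes are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE accounts (id INTEGER, customer_id INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?)", [(i,) for i in range(100)])
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [(i, i % 100) for i in range(300)])

# N+1 style: one query for the parents, then one per parent for children.
queries = 0
customers = conn.execute("SELECT id FROM customers").fetchall()
queries += 1
for (cid,) in customers:
    conn.execute("SELECT id FROM accounts WHERE customer_id = ?", (cid,)).fetchall()
    queries += 1
print(queries)  # 101 statements for 100 customers

# JOIN style: the same data in a single statement.
rows = conn.execute(
    "SELECT c.id, a.id FROM customers c "
    "LEFT JOIN accounts a ON a.customer_id = c.id"
).fetchall()
print(len(rows))  # 300 rows from one statement instead of 101
```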
A developer sets maximumPoolSize=200 for an application serving 500 users. The DB2 server has 8 CPU cores. What is the most likely outcome?
A) Excellent performance -- plenty of connections for all users B) Degraded performance due to increased DB2 contention, memory usage, and context switching C) No impact -- DB2 handles any number of connections efficiently D) DB2 will reject the extra connections with SQLCODE -1224