Chapter 33 Key Takeaways
Core Concepts
- DB2 supports multiple access methods for different workloads: JDBC for Java applications, ODBC/CLI for native C/C++ applications, and ibm_db/ibm_db_sa for Python. All use dynamic SQL and connect over TCP/IP (DRDA protocol) or shared memory.
- JDBC Type 4 is the default choice. Pure Java, no native libraries, no DB2 client installation required. Use Type 2 only when benchmarks show a measurable advantage for local connections. Type 4 is essential for Docker/Kubernetes and cloud deployments.
- Connection pooling is mandatory for production. Opening a connection takes 50-200 ms. Use HikariCP (Java) or SQLAlchemy's built-in pool (Python). Right-size the pool: more connections is not better. The formula `(cores * 2) + spindle_count` gives a good starting point.
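The pool-sizing rule of thumb above is easy to encode; `pool_size` is a hypothetical helper, not part of any driver API:

```python
def pool_size(cores: int, spindle_count: int) -> int:
    """Rule-of-thumb starting point for a connection pool:
    (cores * 2) + spindle_count. Tune from here with load tests."""
    return cores * 2 + spindle_count

# e.g. an 8-core host whose storage counts as one "spindle"
print(pool_size(8, 1))  # → 17
```

Note the result is per application instance; multiply by instance count when budgeting DB2's total connection resources.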
- Always use PreparedStatement (JDBC) or prepared statements (ibm_db). They prevent SQL injection, enable statement caching, and maximize DB2's dynamic statement cache hit rate. Never concatenate user input into SQL strings.
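A minimal sketch of the contrast, using the stdlib sqlite3 module as a stand-in so it runs anywhere; the `?` placeholder style matches the parameter markers used by ibm_db and JDBC:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER, owner TEXT)")
conn.execute("INSERT INTO accounts VALUES (1, 'alice')")

user_input = "alice' OR '1'='1"  # hostile input

# Wrong: string concatenation. Injectable, and every distinct literal
# produces a new statement text, defeating the dynamic statement cache.
# query = "SELECT id FROM accounts WHERE owner = '" + user_input + "'"

# Right: the value travels separately from the SQL text.
rows = conn.execute(
    "SELECT id FROM accounts WHERE owner = ?", (user_input,)
).fetchall()
print(rows)  # → [] — the hostile string matched nothing
```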
- Disable auto-commit for multi-statement transactions. Fund transfers, order processing, and any multi-step operation must execute in a single transaction. Use `setAutoCommit(false)` (JDBC) or `ibm_db.autocommit(conn, SQL_AUTOCOMMIT_OFF)` (Python), then explicitly commit or roll back.
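The transaction pattern can be sketched as follows; sqlite3 stands in for a DB2 connection (its connections already behave as if auto-commit were off), and `transfer` is an illustrative helper, not a library function:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Both updates commit together, or neither does."""
    try:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?",
            (amount, src))
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (amount, dst))
        conn.commit()
    except Exception:
        conn.rollback()  # undo the partial transfer
        raise

transfer(conn, 1, 2, 40)
print(conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall())
# → [(60,), (40,)]
```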
- Batch operations reduce network overhead. JDBC's `addBatch()`/`executeBatch()` and pandas' `to_sql(chunksize=N)` send multiple rows in a single network round-trip. Batch sizes of 500-5,000 are typical.
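The batching idea in Python terms, again with sqlite3 as a runnable stand-in; `executemany` plays the role that `addBatch()`/`executeBatch()` play in JDBC:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, payload TEXT)")

rows = [(i, f"event-{i}") for i in range(10_000)]
BATCH = 1_000  # typical batch sizes are 500-5,000

# Each executemany call ships one batch instead of
# 10,000 single-row statements.
for start in range(0, len(rows), BATCH):
    conn.executemany("INSERT INTO events VALUES (?, ?)",
                     rows[start:start + BATCH])
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # → 10000
```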
- JDBC isolation levels map to DB2 isolation levels, but the names do not match intuitively: READ_COMMITTED = CS, REPEATABLE_READ = RS, SERIALIZABLE = RR. Know the mapping.
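The mapping as a lookup table (the READ_UNCOMMITTED row is added here from the standard JDBC constants; the chapter's own list covers the other three):

```python
# JDBC constant name -> DB2 isolation level
JDBC_TO_DB2 = {
    "TRANSACTION_READ_UNCOMMITTED": "UR",  # uncommitted read
    "TRANSACTION_READ_COMMITTED":   "CS",  # cursor stability (DB2 default)
    "TRANSACTION_REPEATABLE_READ":  "RS",  # read stability -- NOT RR!
    "TRANSACTION_SERIALIZABLE":     "RR",  # repeatable read
}
print(JDBC_TO_DB2["TRANSACTION_REPEATABLE_READ"])  # → RS
```

The trap is that JDBC's REPEATABLE_READ lands on DB2's RS, while DB2's own level named "repeatable read" (RR) is reached via SERIALIZABLE.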
- Python ibm_db provides direct DB2 access with `exec_immediate` for one-off queries and `prepare`/`execute` for parameterized queries. Use `fetch_assoc` for dictionary-style row access.
- SQLAlchemy (ibm_db_sa) brings ORM and expression-language support to DB2. Define Python classes mapping to DB2 tables, use sessions for transaction management, and leverage the expression language for type-safe query building.
- pandas `read_sql` with `chunksize` handles large datasets. For tables with millions of rows, chunked reading keeps memory usage bounded. Combined with Parquet output, this enables efficient DB2-to-data-lake pipelines.
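The bounded-memory pattern behind `read_sql(chunksize=N)` can be sketched without pandas; `read_in_chunks` is an illustrative generator over a sqlite3 stand-in connection:

```python
import sqlite3

def read_in_chunks(conn, sql, chunksize=1000):
    """Yield result rows chunk by chunk: only one chunk is ever
    held in memory, no matter how large the table is."""
    cur = conn.execute(sql)
    while True:
        chunk = cur.fetchmany(chunksize)
        if not chunk:
            break
        yield chunk

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(2500)])

sizes = [len(c) for c in read_in_chunks(conn, "SELECT n FROM t", 1000)]
print(sizes)  # → [1000, 1000, 500]
```

In a real pipeline each chunk would be transformed and appended to Parquet before the next one is fetched.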
Meridian Bank Patterns
- Online banking API uses Java with JDBC, HikariCP connection pooling (20 connections per pod), and PreparedStatement for all queries.
- Data pipeline uses Python with ibm_db_sa, chunked pandas extraction, and Parquet output to the data lake. Extraction uses `WITH UR` to avoid impacting production.
- Connection pools are right-sized: 20 connections per application instance, not 200. DB2 thread resources are finite and expensive.
Common Pitfalls to Avoid
- Oversizing connection pools (wastes DB2 resources, increases contention)
- Using Statement with string concatenation (SQL injection risk, defeats caching)
- Forgetting to check `rs.wasNull()` for nullable primitive columns in JDBC
- Loading entire large tables into pandas without chunking (out of memory)
- Using `to_sql(if_exists='replace')` in production (drops and recreates the table)
- Leaving auto-commit enabled for multi-step transactions
- Not handling deadlocks (-911) with retry logic
- Confusing JDBC isolation level names with DB2 isolation level names
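The deadlock-retry pitfall above deserves a sketch. `DB2Error` and `with_deadlock_retry` are illustrative stand-ins; real code would inspect the SQLCODE carried by the ibm_db or JDBC driver exception instead:

```python
import time

class DB2Error(Exception):
    """Stand-in for a driver exception carrying a SQLCODE."""
    def __init__(self, sqlcode):
        super().__init__(f"SQLCODE {sqlcode}")
        self.sqlcode = sqlcode

def with_deadlock_retry(txn, retries=3, base_delay=0.05):
    """Re-run a transaction when DB2 reports deadlock/timeout (-911),
    with exponential backoff; any other error propagates immediately."""
    for attempt in range(retries):
        try:
            return txn()
        except DB2Error as exc:
            if exc.sqlcode != -911 or attempt == retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Simulate a transaction that deadlocks twice, then succeeds.
attempts = []
def flaky_txn():
    attempts.append(1)
    if len(attempts) < 3:
        raise DB2Error(-911)
    return "committed"

result = with_deadlock_retry(flaky_txn)
print(result)  # → committed
```

The transaction function must be safe to re-run from the top, which is another reason to keep multi-step work inside a single, explicitly committed transaction.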