Chapter 33 Key Takeaways

Core Concepts

  1. DB2 supports multiple access methods for different workloads: JDBC for Java applications, ODBC/CLI for native C/C++ applications, and ibm_db/ibm_db_sa for Python. All use dynamic SQL and connect over TCP/IP (DRDA protocol) or shared memory.

  2. JDBC Type 4 is the default choice. Pure Java, no native libraries, no DB2 client installation required. Use Type 2 only when benchmarks show a measurable advantage for local connections. Type 4 is essential for Docker/Kubernetes and cloud deployments.

  3. Connection pooling is mandatory for production. Opening a connection takes 50-200ms. Use HikariCP (Java) or SQLAlchemy's built-in pool (Python). Right-size the pool — more connections is not better. The formula (cores * 2) + spindle_count gives a good starting point.
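
As a quick sanity check, the sizing rule can be computed directly. A minimal sketch; the helper name is ours, not a HikariCP or SQLAlchemy API:

```python
def starting_pool_size(core_count: int, spindle_count: int) -> int:
    """Starting point from the (cores * 2) + spindle_count rule of thumb.

    This is only a starting point: confirm with load tests and watch
    DB2-side thread usage before settling on a number.
    """
    return core_count * 2 + spindle_count

# An 8-core app host backed by a 2-spindle volume:
print(starting_pool_size(8, 2))  # 18
```

For an all-SSD or cloud volume, treat the effective spindle count as small (often 1), which keeps the starting pool modest, consistent with "more connections is not better."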

  4. Always use PreparedStatement (JDBC) or prepared statements (ibm_db). They prevent SQL injection, enable statement caching, and maximize DB2's dynamic statement cache hit rate. Never concatenate user input into SQL strings.
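
A minimal sketch of why concatenation is dangerous, using a hypothetical `accounts` table; the two helpers are illustrative, not driver APIs. The safe shape, constant SQL text plus a separate parameter tuple, is exactly what `PreparedStatement` and `ibm_db.prepare`/`ibm_db.execute` send to DB2:

```python
def unsafe_query(account_id: str) -> str:
    # Anti-pattern: user input concatenated straight into the SQL text.
    return "SELECT balance FROM accounts WHERE id = '" + account_id + "'"

def safe_query(account_id: str):
    # Preferred shape: SQL text is constant, the value travels separately
    # as a bind parameter, so it can never change the statement's meaning.
    return ("SELECT balance FROM accounts WHERE id = ?", (account_id,))

# A classic injection payload rewrites the unsafe statement to match
# every row; in the safe version it stays inert data.
payload = "x' OR '1'='1"
print(unsafe_query(payload))
# SELECT balance FROM accounts WHERE id = 'x' OR '1'='1'
```

Constant SQL text is also what makes the dynamic statement cache effective: every call with the same text reuses one cached access plan.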

  5. Disable auto-commit for multi-statement transactions. Fund transfers, order processing, and any multi-step operation must execute in a single transaction. Use setAutoCommit(false) (JDBC) or ibm_db.autocommit(conn, ibm_db.SQL_AUTOCOMMIT_OFF) (Python), then explicitly commit or rollback.
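
The commit-or-rollback shape looks like this. A sketch with stdlib sqlite3 standing in for a DB2 connection so it runs anywhere; with ibm_db you would first call ibm_db.autocommit(conn, ibm_db.SQL_AUTOCOMMIT_OFF), and the table name here is hypothetical:

```python
import sqlite3

def transfer(conn, from_acct, to_acct, amount):
    # Both updates commit together or not at all.
    try:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                     (amount, from_acct))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                     (amount, to_acct))
        conn.commit()
    except Exception:
        conn.rollback()   # undo any partial work, then let callers decide
        raise

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 0)])
conn.commit()

transfer(conn, "A", "B", 40)
print(conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())
# [('A', 60), ('B', 40)]
```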

  6. Batch operations reduce network overhead. JDBC's addBatch()/executeBatch() and pandas' to_sql(chunksize=N) send multiple rows in a single network round-trip. Batch sizes of 500-5,000 are typical.
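
The batching itself is simple to picture: rows are grouped into fixed-size chunks and each chunk is one round-trip. A sketch; the `batches` helper is ours, not a driver API, but it mirrors what addBatch()/executeBatch() and to_sql(chunksize=N) do on the wire:

```python
def batches(rows, size):
    """Yield rows in fixed-size chunks; each chunk would be sent to DB2
    as a single network round-trip instead of one round-trip per row."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

rows = [(i, f"name-{i}") for i in range(2500)]
chunk_sizes = [len(chunk) for chunk in batches(rows, 1000)]
print(chunk_sizes)  # [1000, 1000, 500]
```

2,500 single-row inserts become 3 round-trips, which is where the latency savings come from.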

  7. JDBC isolation levels map to DB2 isolation levels, but the names do not match intuitively: READ_UNCOMMITTED = UR, READ_COMMITTED = CS, REPEATABLE_READ = RS, and SERIALIZABLE = RR. Note the trap: DB2's "Repeatable Read" (RR) is its strictest level and corresponds to JDBC SERIALIZABLE, not to JDBC REPEATABLE_READ. Know the mapping.
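
The mapping in lookup-table form; the dict is illustrative (the keys echo the java.sql.Connection constant names), not a driver constant:

```python
# JDBC isolation constant -> DB2 isolation level
JDBC_TO_DB2 = {
    "TRANSACTION_READ_UNCOMMITTED": "UR",  # Uncommitted Read
    "TRANSACTION_READ_COMMITTED":   "CS",  # Cursor Stability (DB2 default)
    "TRANSACTION_REPEATABLE_READ":  "RS",  # Read Stability
    "TRANSACTION_SERIALIZABLE":     "RR",  # Repeatable Read (strictest)
}

# The trap: JDBC REPEATABLE_READ is *not* DB2 RR.
print(JDBC_TO_DB2["TRANSACTION_REPEATABLE_READ"])  # RS
```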

  8. Python ibm_db provides direct DB2 access with exec_immediate for one-off queries and prepare/execute for parameterized queries. Use fetch_assoc for dictionary-style row access.

  9. SQLAlchemy (ibm_db_sa) brings ORM and expression language to DB2. Define Python classes mapping to DB2 tables, use sessions for transaction management, and leverage the expression language for type-safe query building.

  10. pandas read_sql with chunksize handles large datasets. For tables with millions of rows, chunked reading keeps memory usage bounded. Combined with Parquet output, this enables efficient DB2-to-data-lake pipelines.
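
The bounded-memory pattern, sketched with stdlib sqlite3 and cursor.fetchmany standing in for a DB2 engine so it runs anywhere; with pandas the loop becomes `for chunk in pd.read_sql(sql, engine, chunksize=1000):`, and the table here is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txns (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany("INSERT INTO txns (amount) VALUES (?)",
                 [(i,) for i in range(10_000)])

def read_in_chunks(conn, sql, size=1000):
    """Yield result-set chunks of at most `size` rows, so only one chunk
    is ever in memory at a time (the idea behind read_sql's chunksize)."""
    cur = conn.cursor()
    cur.execute(sql)
    while True:
        chunk = cur.fetchmany(size)
        if not chunk:
            break
        yield chunk

total = sum(len(c) for c in read_in_chunks(conn, "SELECT * FROM txns", 3000))
print(total)  # 10000
```

In a pipeline, each chunk would be transformed and appended to a Parquet file before the next chunk is fetched, keeping peak memory near one chunk's size.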

Meridian Bank Patterns

  • Online banking API uses Java with JDBC, HikariCP connection pooling (20 connections per pod), and PreparedStatement for all queries.
  • Data pipeline uses Python with ibm_db_sa, chunked pandas extraction, and Parquet output to the data lake. Extraction uses WITH UR to avoid impacting production.
  • Connection pools are right-sized — 20 connections per application instance, not 200. DB2 thread resources are finite and expensive.

Common Pitfalls to Avoid

  • Oversizing connection pools (wastes DB2 resources, increases contention)
  • Using Statement with string concatenation (SQL injection risk, defeats caching)
  • Forgetting to check rs.wasNull() for nullable primitive columns in JDBC
  • Loading entire large tables into pandas without chunking (out of memory)
  • Using to_sql(if_exists='replace') in production (drops and recreates the table)
  • Leaving auto-commit enabled for multi-step transactions
  • Not handling deadlocks (-911) with retry logic
  • Confusing JDBC isolation level names with DB2 isolation level names
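
One way to handle the -911 pitfall above: wrap the whole transaction in a retry loop with backoff. A sketch that assumes the driver surfaces deadlocks as exceptions whose message contains "SQLCODE=-911" (as ibm_db error text typically does); the helper and its parameters are ours:

```python
import random
import time

def run_with_retry(operation, retries=3, base_delay=0.05):
    """Run `operation` (a callable that executes and commits one whole
    transaction), retrying with exponential backoff on SQLCODE -911."""
    for attempt in range(retries + 1):
        try:
            return operation()
        except Exception as exc:
            if "SQLCODE=-911" not in str(exc) or attempt == retries:
                raise  # not a deadlock/timeout, or out of retries
            # Back off with jitter, then replay the whole transaction.
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))

# Stand-in operation that deadlocks twice, then succeeds:
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise Exception("... SQLCODE=-911, SQLSTATE=40001 ...")
    return "committed"

print(run_with_retry(flaky))  # committed
```

Retry the entire transaction, not the single failed statement: DB2 rolled back the whole unit of work when it chose this transaction as the deadlock victim.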