Chapter 28: Further Reading — Performance Optimization

An annotated bibliography of resources for deepening your understanding of performance optimization in Python and web applications.


Books

1. High Performance Python by Micha Gorelick and Ian Ozsvald (O'Reilly, 2nd Edition, 2020)

The definitive guide to Python performance. Covers profiling, data structures, compiling to C with Cython, concurrency, and cluster computing. Particularly strong on understanding why Python code is slow at the CPU and memory level. Essential reading for anyone doing performance-critical Python work. Complements Sections 28.2, 28.3, and 28.7 of this chapter.

2. Designing Data-Intensive Applications by Martin Kleppmann (O'Reilly, 2017)

While not Python-specific, this book is the gold standard for understanding database performance, caching, replication, and distributed system design. The chapters on indexing, query optimization, and data partitioning directly support Section 28.6. Required reading for anyone building systems that must scale.

3. Python Concurrency with asyncio by Matthew Fowler (Manning, 2022)

A thorough treatment of Python's asyncio framework with practical examples. Covers event loops, tasks, synchronization primitives, and integration with web frameworks. Excellent companion to Section 28.5, particularly for readers who want to go deeper on async patterns.

4. Web Scalability for Startup Engineers by Artur Ejsmont (McGraw-Hill, 2015)

Practical guide to scaling web applications, covering caching layers, load balancing, database scaling, and asynchronous processing. The caching and load balancing chapters complement Sections 28.4 and 28.8. Written for engineers at growing companies who need to scale incrementally.

5. Systems Performance: Enterprise and the Cloud by Brendan Gregg (Pearson, 2nd Edition, 2020)

The comprehensive reference on systems performance analysis, written by the creator of flame graphs. Covers CPU, memory, file systems, networking, and observability tools. While focused on Linux systems rather than Python, the methodology and mental models apply universally. Brendan Gregg's flame graph visualization technique is discussed in Section 28.2.


Online Resources

6. Python Profiling Documentation

URL: https://docs.python.org/3/library/profile.html

The official Python documentation for the cProfile and profile modules. Includes the complete API reference, examples of programmatic use, and the pstats module for analyzing results. The canonical reference for Section 28.2.
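As a taste of the programmatic use that documentation describes, the following is a minimal sketch using only the standard library. The names slow_sum and profile_call are hypothetical helpers invented for this illustration; they are not part of cProfile or pstats.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical workload: sum of squares in a plain Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        # Always stop profiling, even if func raises.
        profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and print only the 5 most expensive entries.
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(5)
    return result, buffer.getvalue()


if __name__ == "__main__":
    value, report = profile_call(slow_sum, 200_000)
    print(value)
    print(report)
```

Writing the report to a StringIO buffer rather than standard output is one possible design choice; it makes the report easy to log or inspect in tests.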

7. py-spy GitHub Repository and Documentation

URL: https://github.com/benfred/py-spy

A sampling profiler that attaches to a running Python process without code changes, which makes it suitable for production environments. The README includes installation, usage examples, and comparisons with other profilers. The flame graph and top-like views are demonstrated with screenshots. Essential tool reference for Section 28.2.

8. "Use the Index, Luke" — SQL Indexing and Tuning Guide

URL: https://use-the-index-luke.com/

A free, comprehensive online guide to database indexing. Covers B-tree indexes, composite indexes, partial indexes, and query plan analysis for PostgreSQL, MySQL, Oracle, and SQL Server. Directly supports Section 28.6 and the indexing strategies discussed in both case studies.

9. Locust Load Testing Documentation

URL: https://docs.locust.io/

The official documentation for the Locust load testing framework. Includes getting started guides, examples of user behavior definitions, distributed testing setup, and result interpretation. Supports Section 28.8.

10. Real Python: Speed Up Your Python Program

URL: https://realpython.com/python-performance/

An accessible tutorial covering profiling, caching, and common optimization patterns in Python. Good for readers who want a gentler introduction to the topics in Sections 28.2 through 28.4 before diving into more advanced material.


Articles and Talks

11. "The Tail at Scale" by Jeffrey Dean and Luiz André Barroso (Communications of the ACM, 2013)

A foundational paper on why tail latencies (P99, P99.9) matter more than averages in distributed systems. Explains how component-level latencies compound and why "fast on average" can still mean "slow for many users." Directly relevant to the performance budgets discussed in Section 28.1 and the load testing metrics in Section 28.8.
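The compounding effect can be made concrete with a little arithmetic. The sketch below assumes that a request fans out to several backends in parallel, that each backend is independently "slow" (above its own P99) 1% of the time, and that the request is slow if any backend is slow. The function name prob_slow_request is a hypothetical helper for illustration.

```python
def prob_slow_request(fanout, per_backend_slow_prob=0.01):
    """Probability that at least one of `fanout` parallel backend calls is slow.

    Assumes backends are independent and each one exceeds its latency
    threshold with probability `per_backend_slow_prob` (0.01 means P99).
    """
    # P(no backend is slow) = (1 - p) ** fanout, so take the complement.
    return 1.0 - (1.0 - per_backend_slow_prob) ** fanout


if __name__ == "__main__":
    for fanout in (1, 10, 100):
        share = prob_slow_request(fanout)
        print(f"fan-out {fanout:>3}: {share:.1%} of requests wait on a slow backend")
```

Under these assumptions, a request that touches 100 backends waits on at least one slow response roughly 63% of the time, even though each backend is fast for 99% of its own calls.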

12. "Latency Numbers Every Programmer Should Know" (Various Sources, Originally by Jeff Dean)

A frequently updated reference showing the relative speeds of L1 cache access, memory reads, SSD reads, network round-trips, and disk seeks. Provides crucial intuition for understanding why database round-trips are expensive and why caching works. Supports the intuition-building in Sections 28.1 and 28.4.

13. Brendan Gregg's Blog — Flame Graphs

URL: https://www.brendangregg.com/flamegraphs.html

The original resource on flame graph visualization, including how to read them, generate them, and use them for performance analysis. Includes interactive examples and flame graph variants (CPU, memory, off-CPU). Extends the py-spy flame graph discussion in Section 28.2.

14. "N+1 Queries or Memory Problems: Why Not Solve Both?" by Evan Phoenix (2022)

A practical article on solving N+1 problems in ORMs while also managing memory usage for large result sets. Discusses eager loading, batch loading, and streaming approaches. Bridges Sections 28.6 and 28.7.
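For readers who want to see the N+1 pattern and a batch-loading fix side by side, here is a minimal sketch using the standard-library sqlite3 module rather than an ORM. The schema, the sample rows, and the function names are hypothetical and chosen only for illustration; they are not taken from the article.

```python
import sqlite3


def build_db():
    """Create an in-memory database with a few hypothetical authors and books."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO authors VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace"), (3, "Alan")],
    )
    conn.executemany(
        "INSERT INTO books VALUES (?, ?, ?)",
        [(1, 1, "A1"), (2, 1, "A2"), (3, 2, "G1"), (4, 3, "T1")],
    )
    return conn


def titles_n_plus_one(conn):
    """One query for the authors, then one more per author (N+1 queries)."""
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    queries = 1
    result = {}
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM books WHERE author_id = ? ORDER BY id",
            (author_id,),
        ).fetchall()
        queries += 1
        result[name] = [title for (title,) in rows]
    return result, queries


def titles_batched(conn):
    """Batch loading: two queries in total, however many authors there are."""
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    if not authors:
        return {}, 1
    ids = [author_id for author_id, _ in authors]
    placeholders = ",".join("?" * len(ids))
    rows = conn.execute(
        f"SELECT author_id, title FROM books "
        f"WHERE author_id IN ({placeholders}) ORDER BY id",
        ids,
    ).fetchall()
    # Group the book titles by author in memory instead of re-querying.
    by_author = {}
    for author_id, title in rows:
        by_author.setdefault(author_id, []).append(title)
    return {name: by_author.get(author_id, []) for author_id, name in authors}, 2


if __name__ == "__main__":
    conn = build_db()
    print(titles_n_plus_one(conn))
    print(titles_batched(conn))
```

Both functions return the same mapping of author to titles; only the number of round-trips differs. The batched version holds every matching row in memory at once, which is the trade-off that streaming approaches aim to address.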

15. SQLAlchemy Performance Documentation

URL: https://docs.sqlalchemy.org/en/20/faq/performance.html

SQLAlchemy's official performance FAQ, covering relationship loading strategies (lazy, joined, subquery, selectin), query optimization, and connection pooling configuration. Essential reference for Python developers using SQLAlchemy who encounter the N+1 patterns described in Section 28.6.


How to Use These Resources

  • Start with: the Real Python tutorial (#10) for a quick refresher, then High Performance Python (#1) for comprehensive Python-specific coverage.
  • For database optimization: "Use the Index, Luke" (#8) and Designing Data-Intensive Applications (#2) together cover indexing, query planning, and data architecture.
  • For async programming: Python Concurrency with asyncio (#3) provides the depth needed for production async code.
  • For system-level understanding: Systems Performance (#5) and Brendan Gregg's blog (#13) teach you to think about performance at every level of the stack.
  • For web application scaling: Web Scalability for Startup Engineers (#4) provides practical patterns for growing systems.