Chapter 14: Further Reading

An annotated bibliography of resources for deeper exploration of AI coding failures, verification strategies, and secure development practices.


AI-Specific Resources

1. "Do Large Language Models Know What They Don't Know?" — Yin et al. (2023)

What it covers: Systematic study of LLM self-knowledge and calibration. Explores whether language models can accurately assess their own uncertainty, finding significant gaps between confidence and correctness.
Why it matters: Provides the empirical basis for Section 14.7 (The Confidence Problem). Understanding why AI cannot flag its own uncertainty helps you develop better verification strategies.
Where to find it: arXiv preprint, freely available.

2. "Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions" — Pearce et al. (2022)

What it covers: Systematic evaluation of security vulnerabilities in AI-generated code across 89 scenarios. Found that approximately 40% of generated programs contained security weaknesses, including CWE (Common Weakness Enumeration) entries for SQL injection, path traversal, and others.
Why it matters: Provides hard data on the frequency and types of security vulnerabilities in AI-generated code. Essential reading for anyone deploying AI-generated code in production.
Where to find it: Published at the IEEE Symposium on Security and Privacy. Available on the authors' institutional pages.
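
The SQL injection weakness Pearce et al. count (CWE-89) reduces to one habit: splicing user input into query text instead of binding it. The sketch below contrasts the two, using an illustrative in-memory table; the names are hypothetical, not from the paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

def find_user_unsafe(name: str):
    # Vulnerable: the input is spliced into the SQL text itself (CWE-89).
    return conn.execute(f"SELECT role FROM users WHERE name = '{name}'").fetchall()

def find_user_safe(name: str):
    # Safe: the driver binds the value, so it is never parsed as SQL.
    return conn.execute("SELECT role FROM users WHERE name = ?", (name,)).fetchall()

payload = "' OR '1'='1"
print(find_user_unsafe(payload))  # every row comes back: injection succeeded
print(find_user_safe(payload))    # no rows: the payload is treated as data
```

The same contrast applies to any DB-API driver; only the placeholder style (`?` vs `%s`) varies.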

3. "Large Language Models for Code: Security Hardening and Adversarial Testing" — He and Vechev (2023)

What it covers: Techniques for making LLM-generated code more secure, including prompt engineering strategies to reduce vulnerability introduction and automated methods for security testing of AI output.
Why it matters: Moves beyond identifying the problem to proposing solutions. Directly applicable to building the verification pipelines discussed in Section 14.10.
Where to find it: arXiv preprint, freely available; also published at ACM CCS 2023.

4. "An Empirical Study of AI-Generated Code in Open Source" — Liu et al. (2024)

What it covers: Large-scale study analyzing AI-generated code contributions to open-source projects. Examines bug rates, code quality metrics, and maintenance patterns compared to human-written code.
Why it matters: Provides real-world data on how AI-generated code performs in production settings, beyond laboratory benchmarks.
Where to find it: Conference proceedings, often available through author preprints.


Security Resources

5. OWASP Top 10 — Open Web Application Security Project

What it covers: The ten most critical web application security risks, updated periodically. The categories listed here (injection, broken authentication, sensitive data exposure, XML external entities, broken access control, security misconfiguration, cross-site scripting, insecure deserialization, using components with known vulnerabilities, and insufficient logging) are from the 2017 edition; the 2021 edition reorganizes them and adds insecure design, software and data integrity failures, and server-side request forgery.
Why it matters: This is the definitive checklist for Section 14.4. Every security review of AI-generated web code should reference these categories.
Where to find it: https://owasp.org/www-project-top-ten/ (free, open access)

6. "Secure Coding in Python" — OWASP Cheat Sheet Series

What it covers: Python-specific security best practices covering input validation, output encoding, authentication, session management, access control, cryptographic practices, error handling, data protection, communication security, and system configuration.
Why it matters: Provides actionable, Python-specific security guidance that directly complements the vulnerability patterns discussed in this chapter.
Where to find it: https://cheatsheetseries.owasp.org/ (free, open access)
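
One input-validation pattern from this family of guidance is path containment: resolve a user-supplied filename against a base directory and reject anything that escapes it. The sketch below assumes a hypothetical upload directory and Python 3.9+ (for Path.is_relative_to); it is an illustration, not the cheat sheet's exact recipe.

```python
from pathlib import Path

BASE = Path("/srv/app/uploads")  # hypothetical upload root

def resolve_upload(filename: str) -> Path:
    """Reject any path that escapes the upload directory (path traversal)."""
    candidate = (BASE / filename).resolve()
    # resolve() collapses ".." segments; is_relative_to then checks containment.
    if not candidate.is_relative_to(BASE.resolve()):
        raise ValueError(f"illegal path: {filename!r}")
    return candidate

print(resolve_upload("report.pdf"))
try:
    resolve_upload("../../etc/passwd")
except ValueError as err:
    print(err)  # rejected before any file I/O happens
```

Checking the resolved path, rather than scanning the raw string for "..", is what defeats encodings and nested traversal tricks.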

7. "Bandit: Security-Oriented Static Analysis for Python" — PyCQA

What it covers: Documentation and usage guide for the Bandit security linter. Covers all detection rules, severity levels, configuration options, and integration with CI/CD pipelines.
Why it matters: Bandit is recommended in Section 14.10 as a key component of the automated verification pipeline. Understanding its detection capabilities and limitations helps you use it effectively.
Where to find it: https://bandit.readthedocs.io/ (free, open access)
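
A representative Bandit finding is B307, use of eval, which AI assistants reach for when parsing user-supplied literals. The sketch below shows the flagged pattern next to the standard replacement, ast.literal_eval, which only accepts literal syntax and refuses anything executable.

```python
import ast

user_input = "[1, 2, 3]"

# Bandit flags this line as B307: eval will execute arbitrary code
# embedded in the string, not just parse a literal.
risky = eval(user_input)

# ast.literal_eval accepts only Python literals; a payload such as
# "__import__('os').system('...')" raises ValueError instead of running.
safe = ast.literal_eval(user_input)

print(risky == safe)  # True for a plain literal, but only one path is safe
```

Run over a source tree (for example, bandit -r src/), Bandit reports each such hit with a severity and confidence level you can gate CI on.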


Software Quality and Testing

8. "A Philosophy of Software Design" — John Ousterhout (2018)

What it covers: Principles of good software design including complexity management, deep vs. shallow modules, information hiding, and how to recognize and eliminate design problems. Not AI-specific but highly relevant to evaluating AI-generated architectures.
Why it matters: Provides the mental framework for evaluating whether AI-generated code has good design, beyond simply checking for bugs. AI often generates "shallow" modules (Ousterhout's term) with poor abstraction boundaries.
Where to find it: Available as a book through major retailers.

9. "Lessons Learned in Software Testing" — Kaner, Bach, and Pettichord (2001)

What it covers: 293 practical lessons from decades of software testing experience. Covers test design, test automation, testing strategy, bug reporting, and the psychology of testing.
Why it matters: The boundary value analysis and edge case testing techniques described in this chapter have deep roots in the software testing discipline. This book provides the theoretical foundation for why those techniques work and how to apply them systematically.
Where to find it: Available as a book through major retailers.
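
Boundary value analysis in miniature: probe exactly at, just below, and just above each boundary, because off-by-one mistakes cluster there. The clamp function below is a hypothetical stand-in for whatever AI-generated code you are testing.

```python
def clamp(value: int, low: int, high: int) -> int:
    """Constrain value to the inclusive range [low, high]."""
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(value, high))

# Boundary value analysis: one probe at each edge, one just past it.
assert clamp(0, 0, 10) == 0      # on the lower boundary
assert clamp(-1, 0, 10) == 0     # just below it
assert clamp(10, 0, 10) == 10    # on the upper boundary
assert clamp(11, 0, 10) == 10    # just above it
assert clamp(5, 5, 5) == 5       # degenerate range: low == high
print("all boundary probes passed")
```

Five targeted cases exercise more failure modes here than fifty random mid-range values would.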

10. "Working Effectively with Legacy Code" — Michael Feathers (2004)

What it covers: Techniques for safely modifying code that lacks tests, including characterization testing, seam identification, and incremental refactoring. Directly applicable to working with AI-generated code that may not have adequate tests.
Why it matters: AI-generated code shares many characteristics with legacy code: you did not write it, you may not fully understand it, and it may lack tests. Feathers' techniques for building understanding and adding safety nets apply directly.
Where to find it: Available as a book through major retailers.
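
Characterization testing, applied to AI output, means asserting what the code does now, not what it should do, so later edits cannot silently change behavior. The slugify helper below is a hypothetical AI-generated function; the assertions simply freeze its observed outputs.

```python
def slugify(title: str) -> str:
    # Hypothetical AI-generated helper whose exact behavior is undocumented.
    return "-".join(title.lower().split())

# Characterization tests: run the function, record what it returned,
# and pin that behavior down with assertions before refactoring.
assert slugify("Hello World") == "hello-world"
assert slugify("  extra   spaces  ") == "extra-spaces"
assert slugify("Already-hyphenated title") == "already-hyphenated-title"
# A surprise the tests capture: punctuation passes through untouched.
assert slugify("C++ tips!") == "c++-tips!"
print("behavior characterized")
```

If the punctuation behavior turns out to be a bug, you change the assertion deliberately, with the old behavior on record.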


Python-Specific Quality Tools

11. "Mypy Documentation" — Python Static Type Checker

What it covers: Comprehensive guide to using mypy for static type checking in Python. Covers type annotations, generics, protocols, plugins, and configuration.
Why it matters: Type checking is a key component of the automated verification pipeline described in Section 14.10. Mypy catches many type-related bugs that AI introduces, especially in function signatures and return types.
Where to find it: https://mypy.readthedocs.io/ (free, open access)
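
A typical return-type slip mypy catches: a function annotated to return int that silently returns None on one path. The function names here are hypothetical; the point is that the code runs fine, so only a type checker surfaces the bug.

```python
from typing import Optional

def parse_port(value: str) -> int:
    # mypy reports "Missing return statement" here: the non-digit path
    # falls off the end and implicitly returns None, contradicting -> int.
    if value.isdigit():
        return int(value)

def parse_port_fixed(value: str) -> Optional[int]:
    # Honest signature: callers must now handle the None case.
    return int(value) if value.isdigit() else None

print(parse_port_fixed("8080"))  # 8080
print(parse_port_fixed("http"))  # None
```

At runtime parse_port("http") just returns None and the error surfaces far away, which is exactly why the pipeline runs mypy before the code ships.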

12. "Ruff Documentation" — Fast Python Linter

What it covers: Usage guide for the Ruff linter, which combines the functionality of flake8, isort, pyupgrade, and several other tools into a single, fast linter. Covers all supported rules, configuration, and IDE integration.
Why it matters: Ruff is the recommended all-in-one linting tool for the verification pipeline. It can detect many AI-generated code issues including unused imports, unreachable code, and style violations.
Where to find it: https://docs.astral.sh/ruff/ (free, open access)


Dependency Security

13. "Backstabber's Knife Collection: A Review of Open Source Software Supply Chain Attacks" — Ohm et al. (2020)

What it covers: Systematic review of supply chain attacks through package managers, including typosquatting, dependency confusion, and malicious package publication. Catalogs attack vectors and real-world incidents.
Why it matters: Directly relevant to the dependency confusion risk discussed in Section 14.2 and Case Study 1. As AI hallucinations create demand for non-existent package names, this attack vector becomes increasingly relevant.
Where to find it: Published at DIMVA 2020 (Conference on Detection of Intrusions and Malware, and Vulnerability Assessment); a preprint is available on arXiv.
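
One lightweight defence against the typosquatting variant Ohm et al. catalog is a near-miss check: a dependency name that is not on your approved list but is almost identical to one that is deserves manual review. The sketch below uses difflib's similarity ratio as a stand-in for a proper edit-distance library; the allowlist and threshold are illustrative assumptions, not values from the paper.

```python
from difflib import SequenceMatcher

# Illustrative allowlist; in practice this comes from your lockfile policy.
APPROVED = {"requests", "numpy", "pandas", "django"}

def suspicious(candidate: str, threshold: float = 0.85) -> bool:
    """Flag names that are nearly, but not exactly, an approved package."""
    if candidate in APPROVED:
        return False
    return any(
        SequenceMatcher(None, candidate, name).ratio() >= threshold
        for name in APPROVED
    )

print(suspicious("requests"))   # False: exact match with an approved name
print(suspicious("reqeusts"))   # True: likely typosquat of "requests"
print(suspicious("flask"))      # False: merely unknown, not a near-miss
```

A hit does not prove malice, it only routes the name to a human before pip install runs, which is the cheap part of the defence.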

14. "pip-audit: Auditing Python Environments for Known Vulnerabilities"

What it covers: Documentation for pip-audit, a tool that scans installed Python packages for known security vulnerabilities using the OSV (Open Source Vulnerabilities) database.
Why it matters: Complements the import verification discussed in Section 14.2 by checking that installed dependencies are not just real, but also free of known vulnerabilities.
Where to find it: https://github.com/pypa/pip-audit (free, open source)


A Suggested Reading Path

For readers who want to explore these resources systematically:

  1. Start with the Pearce et al. paper (#2) for empirical data on AI security issues
  2. Then read the OWASP Top 10 (#5) and Python cheat sheet (#6) for security fundamentals
  3. Follow with the Bandit (#7) and Ruff (#12) documentation to set up your verification pipeline
  4. Explore the LLM self-knowledge paper (#1) for deeper understanding of the confidence problem
  5. Read Ousterhout (#8) and Feathers (#10) for long-term code quality perspectives
  6. Reference Kaner et al. (#9) when designing test strategies for AI-generated code

Return to the main chapter or proceed to the exercises.