Chapter 36: Further Reading

AI Coding Agents and Autonomous Workflows

Research Papers

  1. Yao, S., Zhao, J., Yu, D., et al. (2023). "ReAct: Synergizing Reasoning and Acting in Language Models." International Conference on Learning Representations (ICLR). The foundational paper on the ReAct pattern, which interleaves reasoning traces and actions in language models. This paper provides the theoretical basis for the plan-act-observe loop discussed in Section 36.2. Essential reading for anyone building or evaluating agent systems. The paper demonstrates that combining reasoning with acting significantly outperforms either approach alone.
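The plan-act-observe loop the paper formalizes can be sketched in a few lines. This is an illustrative toy, not code from the paper: `fake_model` is a hypothetical stand-in that scripts two turns, and `count_tests` is an invented tool.

```python
# Minimal ReAct-style loop: the model alternates reasoning ("Thought"),
# tool calls ("Action"), and feedback ("Observation") until it answers.
# `fake_model` is a hypothetical stand-in for a real LLM call.

def fake_model(transcript: str) -> str:
    # A real agent would call an LLM here; this stub scripts two turns.
    if "Observation" not in transcript:
        return "Thought: I need the repo's test count.\nAction: count_tests[]"
    return "Thought: I have what I need.\nFinal Answer: 3 tests"

TOOLS = {"count_tests": lambda _arg: "3"}

def react_loop(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = fake_model(transcript)
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:")[1].strip()
        # Parse "Action: name[arg]" and execute the named tool.
        action = step.split("Action:")[1].strip()
        name, arg = action.split("[", 1)
        observation = TOOLS[name.strip()](arg.rstrip("]"))
        transcript += f"Observation: {observation}\n"
    return "gave up"

print(react_loop("How many tests does the repo have?"))  # → 3 tests
```

The key property the paper demonstrates is that the interleaved Thought/Action/Observation transcript itself is what improves performance: the reasoning trace grounds the next action, and the observation grounds the next reasoning step.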

  2. Jimenez, C. E., Yang, J., Wettig, A., et al. (2024). "SWE-bench: Can Language Models Resolve Real-World GitHub Issues?" International Conference on Learning Representations (ICLR). The paper introducing SWE-bench, the leading benchmark for evaluating coding agents. Understanding SWE-bench's methodology is critical for evaluating agent claims and designing your own benchmarks. The paper also provides an excellent analysis of why coding tasks are challenging for AI systems and what capabilities are needed for reliable performance.

  3. Schick, T., Dwivedi-Yu, J., Dessì, R., et al. (2023). "Toolformer: Language Models Can Teach Themselves to Use Tools." Advances in Neural Information Processing Systems (NeurIPS). Explores how language models can learn to use external tools, providing theoretical background for the tool-use patterns in Section 36.3. The paper demonstrates that models can determine when and how to use tools without extensive tool-specific training, which has implications for designing general-purpose agent tool frameworks.
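In the Toolformer setup, the model emits tool calls inline in its generated text and a wrapper executes them. The bracketed `[Calculator(...)]` syntax loosely follows the paper's examples; the parser below is an illustrative sketch, not code from the paper.

```python
import re

# Toolformer-style post-processing: the model emits inline API calls such
# as "[Calculator(400/1400)]"; a wrapper finds them, executes the named
# tool, and splices the result back into the text.

TOOLS = {"Calculator": lambda expr: str(eval(expr, {"__builtins__": {}}))}

def execute_inline_calls(text: str) -> str:
    def run(match: re.Match) -> str:
        name, arg = match.group(1), match.group(2)
        # Leave unknown tool names untouched rather than failing.
        return TOOLS[name](arg) if name in TOOLS else match.group(0)
    return re.sub(r"\[(\w+)\((.*?)\)\]", run, text)

out = execute_inline_calls("The ratio is [Calculator(400/1400)] of the total.")
print(out)
```

The paper's contribution is on the training side: teaching the model to emit such calls only where they help, via self-supervised filtering of candidate call sites.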

  4. Wang, L., Ma, C., Feng, X., et al. (2024). "A Survey on Large Language Model based Autonomous Agents." Frontiers of Computer Science. A comprehensive survey of the LLM-based agent landscape, covering architecture, capabilities, and applications. This is the best single paper for understanding the breadth of agent research. It categorizes agent architectures and provides a useful taxonomy that maps directly to the concepts in this chapter.

  5. Xi, Z., Chen, W., Guo, X., et al. (2023). "The Rise and Potential of Large Language Model Based Agents: A Survey." arXiv preprint. Another comprehensive survey that focuses on the evolution from language models to agents, with particular attention to the social and collaborative aspects of multi-agent systems. Useful for understanding how single-agent patterns extend to multi-agent architectures, which is the topic of Chapter 38.

Books

  1. Russell, S. J. and Norvig, P. (2020). Artificial Intelligence: A Modern Approach. 4th Edition. Pearson. The definitive textbook on AI includes extensive coverage of agent architectures, planning, and decision-making that provides the theoretical foundation for modern AI coding agents. Chapters on rational agents, planning algorithms, and decision theory are particularly relevant. While the book does not cover LLM-based agents specifically, the underlying principles of goal-directed behavior, environment interaction, and planning apply directly.

  2. Huyen, C. (2025). AI Engineering: Building Applications with Foundation Models. O'Reilly Media. A practical guide to building applications with foundation models, including coverage of agent architectures, tool use, and evaluation. This book bridges the gap between research papers and production systems, with actionable advice on building reliable AI-powered applications.

Technical Documentation and Guides

  1. Anthropic. (2025). "Claude Code Documentation." docs.anthropic.com. The official documentation for Claude Code, Anthropic's CLI-based coding agent. This documentation provides concrete examples of agent architecture, tool use, permission systems, and memory management (CLAUDE.md) as implemented in a production system. Reading this documentation alongside this chapter provides a real-world reference implementation for every concept discussed.

  2. Anthropic. (2025). "Tool Use (Function Calling) Guide." docs.anthropic.com. Anthropic's guide to implementing tool use with Claude models. This is the definitive reference for the function calling patterns described in Section 36.3, including tool definition schemas, multi-tool conversations, and best practices for tool descriptions. Essential reading for anyone building custom agent tools.
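At the time of writing, the guide's tool definitions take the shape below: a name, a natural-language description the model reads to decide when to invoke the tool, and a JSON Schema for the arguments. The `run_tests` tool itself is a hypothetical example, not one from the documentation.

```python
# A tool definition in the shape Anthropic's tool-use guide documents.
# The description is the model's primary interface to the tool, so the
# guide's best practices emphasize making it detailed and specific.

run_tests_tool = {
    "name": "run_tests",
    "description": (
        "Run the project's test suite and return a pass/fail summary. "
        "Use this after making code changes to verify nothing broke."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {
                "type": "string",
                "description": "Directory or file to test, e.g. 'tests/'.",
            },
        },
        "required": ["path"],
    },
}
```

A list of such definitions is passed alongside the conversation; when the model decides to call a tool, it returns structured arguments matching the schema, and your code executes the tool and returns the result as a new message.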

  3. OpenAI. (2024). "Function Calling Guide." platform.openai.com. OpenAI's documentation on function calling provides an alternative perspective on tool use implementation. Comparing this with Anthropic's approach gives you a broader understanding of tool-use patterns and helps you write tools that work well across different model providers.
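A small adapter makes the comparison concrete: both providers wrap a name, a description, and a JSON Schema, differing mainly in field names and nesting. This sketch reflects the formats each provider documents at the time of writing; the `read_file` tool is a hypothetical example.

```python
# Convert an Anthropic-style tool definition to OpenAI's function-calling
# shape. The JSON Schema payload carries over unchanged; only the
# surrounding envelope differs between the two providers.

def anthropic_to_openai(tool: dict) -> dict:
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool["description"],
            "parameters": tool["input_schema"],  # same JSON Schema payload
        },
    }

tool = {
    "name": "read_file",
    "description": "Read a file and return its contents.",
    "input_schema": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}
converted = anthropic_to_openai(tool)
```

Writing tools against a neutral internal representation and converting at the edge, as sketched here, is one way to keep an agent portable across providers.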

Blog Posts and Technical Articles

  1. Cognition AI. (2024). "Introducing Devin, the First AI Software Engineer." cognition.ai/blog. The announcement of Devin, one of the first systems to claim full autonomous software engineering capability. While the claims should be evaluated critically, the post provides useful insight into the vision for fully autonomous coding agents and the difficulty of evaluating marketing claims about agent capabilities.

  2. Anthropic. (2025). "Building Effective Agents." anthropic.com/research. Anthropic's guide to building effective agents covers practical patterns for agent reliability, including prompt engineering for agents, error handling, and testing strategies. This post provides practical guidance that complements the conceptual framework in this chapter.

  3. Karpathy, A. (2023). "The Busy Person's Intro to Large Language Models." YouTube/Blog. While not specific to agents, Andrej Karpathy's overview of LLMs provides essential background on the capabilities and limitations of the models that power coding agents. Understanding what LLMs can and cannot do is foundational to designing effective agent systems.

Standards and Specifications

  1. Anthropic. (2025). "Model Context Protocol (MCP) Specification." modelcontextprotocol.io. The specification for MCP, a standard protocol for connecting AI models to external tools and data sources. MCP is the topic of Chapter 37, but understanding the specification provides technical context for the tool-use architecture discussed in this chapter. MCP standardizes how agents discover and invoke tools, which has implications for building portable, interoperable agent systems.

  2. OWASP. (2025). "OWASP Top 10 for Large Language Model Applications." owasp.org. The OWASP guide to security risks in LLM applications, including prompt injection, insecure tool use, and excessive agency. This resource is essential for understanding the security implications of agent systems and designing the guardrails discussed in Section 36.5. Every developer building or deploying coding agents should be familiar with these risks.
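One mitigation for the "excessive agency" risk is limiting what an agent may do without review. A minimal sketch of such a guardrail, with an allowlist of read-only tools and human confirmation for everything else; the tool names and policy here are illustrative, not taken from the OWASP guide.

```python
# Guardrail sketch: read-only tools run automatically; anything with
# side effects is gated on explicit human confirmation.

SAFE_TOOLS = {"read_file", "list_directory", "run_tests"}

def authorize(tool_name: str, confirm: bool = False) -> bool:
    """Allow allowlisted tools automatically; gate the rest on confirmation."""
    if tool_name in SAFE_TOOLS:
        return True
    return confirm  # e.g. "delete_file" runs only with explicit approval

print(authorize("read_file"))                  # True
print(authorize("delete_file"))                # False
print(authorize("delete_file", confirm=True))  # True
```

Production permission systems add more nuance (per-directory rules, session-scoped approvals, audit logs), but the principle is the same: the agent proposes, and policy decides.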