Chapter 38: Further Reading

Multi-Agent Development Systems


Research Papers

  1. Wu, Q., Bansal, G., Zhang, J., et al. (2023). "AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation." arXiv preprint arXiv:2308.08155. The foundational paper behind Microsoft's AutoGen framework, which enables multi-agent conversation for complex task solving. AutoGen introduces a "conversable agent" abstraction where multiple agents interact through structured conversations, supporting the orchestration patterns discussed in Section 38.3. The paper demonstrates that multi-agent conversation outperforms single-agent approaches on coding benchmarks, mathematical reasoning, and decision-making tasks. Essential reading for anyone building multi-agent systems.

  2. Hong, S., Zhuge, M., Chen, J., et al. (2024). "MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework." International Conference on Learning Representations (ICLR). MetaGPT assigns different GPT-based agents to distinct roles (Product Manager, Architect, Engineer, QA) in a software development pipeline, mirroring the role-based design in Section 38.2. The paper demonstrates that structured role assignment with standardized operating procedures significantly improves code generation quality. Its emphasis on artifact-passing between roles -- design documents, interface definitions, code files -- directly parallels the artifact exchange communication model covered in Section 38.4.

  3. Li, G., Hammoud, H. A. A. K., Itani, H., et al. (2023). "CAMEL: Communicative Agents for 'Mind' Exploration of Large Language Model Society." Advances in Neural Information Processing Systems (NeurIPS). Introduces the "role-playing" framework where two AI agents collaborate through structured conversation, each assigned a distinct role. CAMEL's insight that role assignment plus inception prompting enables autonomous cooperation between agents provides theoretical grounding for the system prompt design principles in Section 38.2. The paper's analysis of when cooperation breaks down offers practical lessons for conflict resolution.

  4. Qian, C., Liu, W., Liu, H., et al. (2024). "ChatDev: Communicative Agents for Software Development." Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL). ChatDev models software development as a multi-phase, multi-role conversation where agents playing CEO, CTO, Programmer, and Tester roles collaborate through a "chat chain" mechanism. The paper's sequential pipeline with role-specific phases maps directly to the sequential orchestration pattern in Section 38.3. ChatDev reports that the chat chain, combined with its "communicative dehallucination" mechanism, reduces hallucination and improves code quality, which supports the principle that structured inter-agent communication outperforms unstructured interaction.

  5. Wang, L., Ma, C., Feng, X., et al. (2024). "A Survey on Large Language Model based Autonomous Agents." Frontiers of Computer Science. A comprehensive survey covering agent architecture, capabilities, and multi-agent interaction patterns. The survey's taxonomy of agent-agent interactions -- cooperation, competition, and social simulation -- provides a broader framework for understanding the cooperative multi-agent development systems described in this chapter. Particularly useful for understanding how the techniques in this chapter relate to the wider landscape of LLM-based agent research.

Books

  1. Wooldridge, M. (2009). An Introduction to MultiAgent Systems. 2nd Edition. John Wiley & Sons. The classic textbook on multi-agent systems from the pre-LLM era. Covers the fundamental concepts of agent design, communication protocols, coordination mechanisms, and cooperative problem-solving that underlie modern LLM-based multi-agent systems. While the implementation technology has changed dramatically, the principles of role assignment, coordination overhead, conflict resolution, and team scaling remain directly applicable. Chapters on communication languages and cooperation mechanisms provide theoretical depth for the practical patterns in Sections 38.4 and 38.5.

  2. Huyen, C. (2025). AI Engineering: Building Applications with Foundation Models. O'Reilly Media. A practical guide to production AI systems that covers agent orchestration, reliability engineering, evaluation, and cost management. The chapters on multi-step reasoning and agent architectures complement this chapter's focus on software development pipelines. Particularly valuable for its treatment of monitoring, observability, and failure handling in production agent systems -- topics covered in Sections 38.9 and 38.10.

  3. Russell, S. J. and Norvig, P. (2020). Artificial Intelligence: A Modern Approach. 4th Edition. Pearson. The definitive AI textbook includes chapters on multi-agent systems, game theory, and communication that provide the theoretical foundation for understanding agent coordination. The treatment of cooperative and competitive multi-agent environments gives context for why role separation and conflict resolution strategies work. The planning algorithms discussed in the book inform the orchestration patterns in Section 38.3.

Technical Documentation and Guides

  1. Anthropic. (2024). "Building Effective Agents." anthropic.com/research. Anthropic's practical guide to building agent systems includes key patterns for reliability, error recovery, and tool use that directly apply to the individual agents within a multi-agent team. The guide's emphasis on keeping agent designs simple and composable aligns with this chapter's recommendation to start with sequential pipelines and add complexity only when needed. Its workflow patterns, including prompt chaining and orchestrator-workers, and its discussion of when to use frameworks provide context for the orchestration patterns in Section 38.3.

  2. Anthropic. (2025). "Multi-Agent Systems with Claude." docs.anthropic.com. Documentation on using Claude in multi-agent configurations, including practical guidance on system prompt design for different roles, context management across agents, and strategies for handling the inherent non-determinism of LLM outputs. This resource provides Claude-specific implementation details for the general patterns described throughout this chapter.

  3. Microsoft. (2024). "AutoGen Documentation." microsoft.github.io/autogen. The official documentation for AutoGen, one of the most widely used multi-agent frameworks. Includes tutorials on building conversable agents, configuring group chats, implementing human-in-the-loop patterns, and managing agent workflows. Reading this documentation provides concrete implementation examples for the abstract concepts in this chapter, and the framework's design choices illustrate real-world trade-offs in multi-agent system design.

Blog Posts and Additional Papers

  1. Weng, L. (2023). "LLM Powered Autonomous Agents." lilianweng.github.io. A thorough overview of LLM-based agents covering planning, memory, and tool use, followed by case studies of complete agent systems. The post's analysis of agent memory systems (short-term in-context memory vs. long-term memory backed by external retrieval) has direct implications for inter-agent communication design. The case studies, which include the Generative Agents simulation and proof-of-concept systems such as AutoGPT and GPT-Engineer, help contextualize the design choices in this chapter.

  2. Shinn, N., Cassano, F., Gopinath, A., et al. (2023). "Reflexion: Language Agents with Verbal Reinforcement Learning." Advances in Neural Information Processing Systems (NeurIPS). Introduces the Reflexion pattern where agents learn from their mistakes through verbal self-reflection. This pattern is directly relevant to the feedback loops in Section 38.6 -- when a tester finds bugs, the coder receives the failures and must self-correct. Seeing how Reflexion generates verbal feedback and reuses it in subsequent attempts helps explain why bounded feedback loops work.

  3. Park, J. S., O'Brien, J. C., Cai, C. J., et al. (2023). "Generative Agents: Interactive Simulacra of Human Behavior." Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST). While focused on social simulation rather than software development, this paper demonstrates that multiple LLM-based agents with distinct roles, memories, and behaviors can interact in complex, emergent ways. The paper's architecture for agent memory, planning, and reflection provides design inspiration for the agent role configurations in Section 38.2, and its treatment of inter-agent communication informs the patterns in Section 38.4.

  4. Huang, D., Bu, Q., Zhang, J., et al. (2023). "AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation." arXiv preprint arXiv:2312.13010. Demonstrates a multi-agent approach to code generation where a programmer agent, a test designer agent, and a test executor agent collaborate iteratively. The paper validates the core premise of this chapter -- that separating code generation from testing using different agents improves code quality -- with quantitative results on HumanEval and MBPP benchmarks. The iterative refinement loop in AgentCoder maps directly to the feedback loop pattern in Section 38.6.
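
The programmer/tester split and bounded feedback loop described in the Reflexion and AgentCoder entries can be sketched in a few lines. This is a minimal illustration, not code from either paper: `feedback_loop`, `run_tests`, `LoopResult`, and `stub_programmer` are hypothetical names, and the "programmer agent" is a stub function standing in for an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class LoopResult:
    code: str
    passed: bool
    attempts: int
    feedback_log: List[str] = field(default_factory=list)


def run_tests(code: str, tests: str) -> Optional[str]:
    """Test-executor role: return None on success, else a failure message.

    Assumption: exec() is used only to keep the sketch short. Generated
    code is untrusted, so a real system would run it in a sandbox.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)   # define the candidate implementation
        exec(tests, namespace)  # run assertions against it
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def feedback_loop(
    write_code: Callable[[Optional[str]], str],
    tests: str,
    max_attempts: int = 3,
) -> LoopResult:
    """Alternate programmer and tester until tests pass or the bound is hit."""
    feedback: Optional[str] = None
    log: List[str] = []
    code = ""
    for attempt in range(1, max_attempts + 1):
        code = write_code(feedback)      # programmer sees the last failure
        failure = run_tests(code, tests)
        if failure is None:
            return LoopResult(code, True, attempt, log)
        log.append(failure)
        feedback = failure               # verbal feedback for the next try
    return LoopResult(code, False, max_attempts, log)


def stub_programmer(feedback: Optional[str]) -> str:
    """Hypothetical stand-in for an LLM: buggy first, fixed after feedback."""
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


TESTS = "assert add(2, 3) == 5, 'add(2, 3) should be 5'\n"

result = feedback_loop(stub_programmer, TESTS)
print(result.passed, result.attempts, result.feedback_log)
```

The `max_attempts` bound is the important design choice: without it, a programmer agent that never converges would loop, and spend tokens, indefinitely.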