Chapter 19 Further Reading: Prompt Engineering Fundamentals
Foundational Texts on Prompt Engineering
1. Mollick, E. (2024). Co-Intelligence: Living and Working with AI. Portfolio. The most practically useful guide to working with large language models, written by Wharton professor Ethan Mollick. Drawing on extensive classroom experimentation, Mollick provides concrete frameworks for prompt engineering, AI-augmented decision-making, and organizational adoption. His concept of "useful fiction" — treating the AI as a knowledgeable collaborator while remaining aware of its limitations — directly informs the role-based prompting approach discussed in Section 19.5. Essential reading for any MBA student seeking to develop prompt engineering as a professional skill.
2. White, J., Fu, Q., Hays, S., et al. (2023). "A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT." arXiv preprint arXiv:2302.11382. An academic catalog of reusable prompt patterns, organized as a pattern language analogous to software design patterns. The authors identify patterns including the Persona Pattern (our role-based prompting), the Template Pattern (our output format specification), and the Few-Shot Pattern. Particularly valuable for readers who want a systematic taxonomy of prompting techniques and their applications. The design pattern metaphor will resonate with readers who have a software engineering background.
3. Schulhoff, S., Ilie, M., Balepur, N., et al. (2024). "The Prompt Report: A Systematic Survey of Prompting Techniques." arXiv preprint arXiv:2406.06608. The most comprehensive academic survey of prompting techniques as of 2024, cataloging over 60 distinct prompting methods with empirical evaluations. Covers everything from basic techniques (zero-shot, few-shot) through advanced methods (chain-of-thought, tree-of-thought, self-consistency). A dense but invaluable reference for readers who want to understand the full landscape of prompting research. The taxonomy of techniques extends well beyond what a single chapter can cover and provides a roadmap for the advanced techniques discussed in Chapter 20.
Business Applications of AI Writing and Communication
4. Davenport, T. H., & Mittal, N. (2023). All-in on AI: How Smart Companies Win Big with Artificial Intelligence. Harvard Business Review Press. Examines how organizations deploy AI for competitive advantage, with extensive case studies of companies that have integrated LLMs into business workflows. The chapters on knowledge work automation and content generation connect directly to the business applications discussed in Section 19.12. Davenport's emphasis on organizational capability — not just technology adoption — reinforces the chapter's argument that prompt libraries and processes matter more than model selection.
5. Dell'Acqua, F., McFowland III, E., Mollick, E., et al. (2023). "Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality." Harvard Business School Technology & Operations Management Unit Working Paper No. 24-013. A rigorous field experiment conducted with Boston Consulting Group consultants using GPT-4. The study found that consultants using AI completed 12.2 percent more tasks, 25.1 percent faster, and produced 40 percent higher quality results — but only for tasks within the AI's capability frontier. For tasks beyond the frontier, AI users performed worse than non-users. This "jagged frontier" concept is critical context for the chapter's discussion of when prompt engineering helps and when it cannot compensate for model limitations (Section 19.9, Pitfall 4).
6. Brynjolfsson, E., Li, D., & Raymond, L. (2023). "Generative AI at Work." NBER Working Paper No. 31161. A study of 5,179 customer support agents at a large software company that deployed an LLM-based assistant. Productivity increased by 14 percent overall, with the largest gains (34 percent) for novice and low-skilled workers. The system essentially encoded the communication patterns of high-performing agents into its prompts and suggestions — a real-world example of the prompt library concept applied to customer service. Directly relevant to the customer email application in Section 19.12.
Prompt Engineering Technique and Methodology
7. Wei, J., Wang, X., Schuurmans, D., et al. (2022). "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models." Advances in Neural Information Processing Systems, 35, 24824-24837. The foundational paper on chain-of-thought (CoT) prompting, which demonstrated that adding "Let's think step by step" or providing step-by-step reasoning examples dramatically improves LLM performance on reasoning tasks. While CoT is covered in depth in Chapter 20, this paper provides essential context for understanding why the prompting strategy matters as much as the model capability. The experimental methodology — comparing zero-shot, few-shot, and chain-of-thought across multiple benchmarks — is a model for systematic prompt evaluation.
8. Brown, T. B., Mann, B., Ryder, N., et al. (2020). "Language Models Are Few-Shot Learners." Advances in Neural Information Processing Systems, 33, 1877-1901. The GPT-3 paper that demonstrated few-shot learning as an emergent capability of large language models. The experimental framework — comparing zero-shot, one-shot, and few-shot performance across dozens of tasks — established the empirical foundation for the few-shot prompting technique discussed in Section 19.4. Although technically dense, the results sections are accessible and illustrate clearly why providing examples in prompts improves model performance.
9. Zamfirescu-Pereira, J. D., Wong, R. Y., Hartmann, B., & Yang, Q. (2023). "Why Johnny Can't Prompt: How Non-AI Experts Try (and Fail) to Design LLM Prompts." Proceedings of the 2023 CHI Conference on Human Factors in Computing. An empirical study of how non-technical users approach prompt engineering, identifying common failure modes and misconceptions. The researchers found that users systematically underspecify their prompts, overestimate the model's ability to infer intent, and abandon prompt improvement too early. These findings directly inform the discussion of common pitfalls in Section 19.9 and provide empirical support for the argument that prompt engineering training produces measurable productivity gains.
AI Security and Prompt Safety
10. Perez, F., & Ribeiro, I. (2022). "Ignore This Title and HackAPrompt: Exposing Systemic Weaknesses of Language Models via a Global Scale Prompt Hacking Competition." arXiv preprint arXiv:2311.16119. A comprehensive analysis of prompt injection attacks gathered through a global competition with over 600,000 adversarial prompts. Catalogs attack patterns including direct instruction override, context manipulation, and encoding-based bypasses. Essential reading for any organization deploying LLMs in customer-facing applications or processing untrusted input. Connects directly to the prompt injection discussion in Section 19.9 and provides a catalog of specific attack patterns to test against.
11. Greshake, K., Abdelnabi, S., Mishra, S., et al. (2023). "Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection." Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security. Demonstrates indirect prompt injection attacks in which malicious instructions are embedded in content that an LLM processes (web pages, emails, documents) rather than in the user's direct prompt. This extends the security considerations beyond the direct prompt injection discussed in the chapter and highlights risks for RAG systems and AI agents that will be covered in Chapters 21 and 29.
Organizational AI Adoption and Knowledge Management
12. Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?" Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610-623. A critical examination of the risks and limitations of large language models, including environmental costs, encoding of biases, and the gap between language generation and language understanding. While the paper predates the generative AI boom, its warnings about treating fluent text as accurate text remain essential context for any business deploying LLMs. Provides the intellectual foundation for the "Caution" callouts throughout the chapter — particularly the warnings about hallucination, role prompting limitations, and the need for human validation.
13. Srivastava, A., Rastogi, A., Rao, A., et al. (2023). "Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models." Transactions on Machine Learning Research. The BIG-Bench collaboration, involving over 400 researchers, evaluated 204 language model tasks across diverse domains. The findings reveal where models excel (pattern matching, text transformation, summarization) and where they struggle (logical reasoning, mathematical computation, certain forms of common sense). This capability map is directly useful for prompt engineers: knowing the model's strengths and weaknesses helps you design prompts that play to the model's strengths and compensate for its weaknesses.
Case Study Background
14. Peng, S., Kalliamvakou, E., Cihon, P., & Demirer, M. (2023). "The Impact of AI on Developer Productivity: Evidence from GitHub Copilot." arXiv preprint arXiv:2302.06590. The foundational empirical study of GitHub Copilot's impact on developer productivity, finding a 55.8 percent faster task completion rate among Copilot users. The study's methodology — randomized controlled trial with clear productivity metrics — sets a standard for measuring AI tool impact. Essential background for Case Study 1 and a model for how organizations should measure the ROI of prompt engineering investments.
15. Ziegler, A., Kalliamvakou, E., Li, X. A., et al. (2024). "Measuring GitHub Copilot's Impact on Productivity." Communications of the ACM, 67(3), 54-63. An expanded analysis of Copilot's productivity impact, including longitudinal data on adoption patterns and the role of developer expertise in mediating productivity gains. The finding that experienced developers benefit differently from novice developers — with novices seeing larger absolute gains but experienced developers showing more sophisticated prompt-like behavior — connects directly to the skill-dependent nature of prompt engineering discussed throughout Chapter 19.
16. Rogenmoser, D. (2023). "Building Jasper: Lessons from the AI Content Revolution." (Interview, First Round Review). Jasper CEO Dave Rogenmoser discusses the company's early product decisions, including the critical insight that users needed structured templates rather than blank prompt fields. The interview provides founder perspective on the "blank prompt problem" discussed in Case Study 2 and illuminates the product thinking behind Jasper's template-first approach.
Practical Guides and Frameworks
17. OpenAI. (2024). "Prompt Engineering Guide." OpenAI Documentation. OpenAI's official guide to prompt engineering for their models, including GPT-4 and the chat completions API. Covers best practices for instruction clarity, few-shot prompting, system messages, and parameter configuration. While model-specific, the principles are broadly applicable. The guide is regularly updated and serves as a useful reference for the technical details of API-based prompt execution discussed in Section 19.11.
18. Anthropic. (2024). "Prompt Engineering Guide." Anthropic Documentation. Anthropic's guide to prompting their Claude models, with particular attention to role prompting, XML-structured prompts, and long-context prompting techniques. Provides a useful counterpoint to OpenAI's guide, illustrating how different model families respond to different prompting approaches. The emphasis on clear, structured prompts aligns closely with the six-component framework presented in this chapter.
19. Shieh, J. (2023). "Best Practices for Prompt Engineering with the OpenAI API." OpenAI Blog. A practitioner-focused blog post with concrete examples of effective and ineffective prompts, organized by task type (summarization, classification, extraction, generation). The before-and-after examples mirror the approach in Section 19.13 and provide additional practice material for students developing their prompting skills.
Economics of AI and Knowledge Work
20. Agrawal, A., Gans, J., & Goldfarb, A. (2022). Power and Prediction: The Disruptive Economics of Artificial Intelligence. Harvard Business Review Press. The sequel to Prediction Machines, focusing on how AI changes decision-making architectures within organizations. The framework of "decision factories" — systems that combine prediction, judgment, data, and action — provides economic grounding for understanding why prompt engineering matters: it is the mechanism by which human judgment is encoded into AI-powered decision systems. The "system design" perspective connects prompt engineering to the broader organizational strategy discussed in Part 6.
21. Eloundou, T., Manning, S., Mishkin, P., & Rock, D. (2023). "GPTs Are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models." arXiv preprint arXiv:2303.10130. Estimates that 80 percent of the US workforce could have at least 10 percent of their tasks affected by LLMs. The paper's task-level analysis — examining which specific work activities are most susceptible to LLM augmentation — provides a framework for prioritizing prompt engineering investments: focus first on the tasks where LLMs can add the most value, and build prompt libraries for those tasks.
Additional Resources
22. Learn Prompting (learnprompting.org). Open-source prompt engineering curriculum. A comprehensive, community-maintained curriculum covering prompting techniques from basic to advanced. Includes interactive examples, exercises, and a prompt engineering certification program. Useful as a supplement to this chapter and as a resource for continued skill development beyond the textbook.
23. Saravia, E. (2023). "Prompt Engineering Guide." GitHub (dair-ai/Prompt-Engineering-Guide). An open-source collection of prompt engineering techniques, papers, and tutorials maintained by Elvis Saravia and the DAIR.AI community. The guide catalogs techniques with examples and links to the underlying research, making it a useful reference for readers who want to explore specific techniques in depth.
24. Liang, P., Bommasani, R., Lee, T., et al. (2022). "Holistic Evaluation of Language Models." arXiv preprint arXiv:2211.09110. The HELM benchmark provides a comprehensive evaluation framework for language models across 42 scenarios and 7 metrics. Understanding how models are evaluated — and where they succeed or fail — helps prompt engineers design prompts that work within model capabilities. The evaluation methodology also provides a template for how organizations should evaluate their own prompt performance.
For readings on advanced prompting techniques (chain-of-thought, tree-of-thought, self-consistency, prompt chaining), see Chapter 20 Further Reading. For AI security and prompt injection defense, see Chapter 29 Further Reading. For organizational AI strategy, see Chapter 31 Further Reading.