Chapter 10 Further Reading: Advanced Prompting Techniques
Foundational Research Papers
Chain-of-Thought Prompting
"Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. (2022) Google Brain / NeurIPS 2022
The paper that established chain-of-thought prompting as a formal technique. Wei et al. demonstrate that providing step-by-step reasoning examples in few-shot prompts dramatically improves performance on arithmetic, commonsense, and symbolic reasoning tasks, including the seminal result that CoT roughly triples solve rates on GSM8K math word problems. Essential reading for anyone wanting to understand when and why CoT helps.
Available: arxiv.org/abs/2201.11903
"Large Language Models are Zero-Shot Reasoners" Kojima, T., Gu, S. S., Reid, M., Matsuo, Y., & Iwasawa, Y. (2022) The University of Tokyo / Google Research / NeurIPS 2022
The paper that demonstrated "Let's think step by step" as a zero-shot CoT trigger. Shows that this simple addition produces substantial reasoning improvements without any few-shot examples, establishing zero-shot CoT as a practical and accessible technique for everyday use.
Available: arxiv.org/abs/2205.11916
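The technique Kojima et al. describe reduces to appending a single trigger phrase to the prompt. A minimal sketch in Python; the model call itself is out of scope, and the `build_zero_shot_cot_prompt` helper and Q/A framing are illustrative, not from the paper:

```python
# Zero-shot chain-of-thought prompt construction (after Kojima et al., 2022).
# Only the prompt string is built here; sending it to a model is not shown.

ZERO_SHOT_COT_TRIGGER = "Let's think step by step."

def build_zero_shot_cot_prompt(question: str) -> str:
    """Frame a plain question and append the zero-shot CoT trigger."""
    return f"Q: {question}\nA: {ZERO_SHOT_COT_TRIGGER}"

prompt = build_zero_shot_cot_prompt(
    "A jug holds 4 liters. How many jugs fill a 20-liter tank?"
)
print(prompt)
```

The appeal of the technique is exactly this: no examples to curate, just one extra sentence before the model begins its answer.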
"Self-Consistency Improves Chain of Thought Reasoning in Language Models" Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., & Zhou, D. (2022) Google Brain / ICLR 2023
Introduces the idea of generating multiple reasoning paths and selecting the most consistent answer — a technique that pushes reasoning accuracy beyond single-path CoT. Highly relevant for tasks where accuracy is critical and you have flexibility to generate multiple responses.
Available: arxiv.org/abs/2203.11171
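Mechanically, self-consistency is sampling several reasoning paths at nonzero temperature and majority-voting over their final answers. A minimal sketch with the sampled completions stubbed out rather than generated by a model; the "Answer:" final-line convention is an assumption for illustration:

```python
from collections import Counter

def self_consistent_answer(completions: list[str]) -> str:
    """Majority-vote over the final answer line of each sampled reasoning path."""
    # Assumes each completion ends with a line like "Answer: <value>".
    finals = [
        c.strip().splitlines()[-1].removeprefix("Answer:").strip()
        for c in completions
    ]
    return Counter(finals).most_common(1)[0][0]

# Three stubbed reasoning paths; two agree on 18, one arithmetic slip gives 17.
samples = [
    "23 - 20 = 3 eaten, so 15 + 3 = 18.\nAnswer: 18",
    "15 left plus the 3 eaten is 18.\nAnswer: 18",
    "23 - 15 = 8, minus one eaten... 17.\nAnswer: 17",
]
print(self_consistent_answer(samples))  # → 18
```

The vote is what buys the accuracy: a single faulty reasoning path is outvoted by the paths that converge on the same answer.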
"Tree of Thoughts: Deliberate Problem Solving with Large Language Models" Yao, S., Yu, D., Zhao, J., Shafran, I., Griffiths, T. L., Cao, Y., & Narasimhan, K. (2023) Princeton University / Google DeepMind / NeurIPS 2023
The paper that formalized Tree-of-Thought prompting — exploring multiple reasoning paths before committing. Provides the theoretical framework and empirical results for problems where single-path CoT is insufficient. Read alongside the Wei et al. CoT paper for a complete picture of the reasoning technique landscape.
Available: arxiv.org/abs/2305.10601
Few-Shot Learning
"Language Models are Few-Shot Learners" Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., ... & Amodei, D. (2020) OpenAI / NeurIPS 2020
The GPT-3 paper. Section 3 establishes the formal framework for few-shot, one-shot, and zero-shot prompting as applied to large language models. This paper popularized the prompting-context usage of "few-shot" and demonstrated that very large models can perform new tasks from examples in the context window without fine-tuning. A foundational text.
Available: arxiv.org/abs/2005.14165
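The few-shot framing Brown et al. formalize is simply labeled demonstrations concatenated ahead of the unanswered query. A minimal sketch; the Input/Output template and the sentiment examples are illustrative, not from the paper:

```python
def build_few_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Concatenate labeled demonstrations, then leave the query's output blank."""
    demos = "\n\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
    return f"{demos}\n\nInput: {query}\nOutput:"

examples = [
    ("The movie was a delight.", "positive"),
    ("I want my money back.", "negative"),
]
print(build_few_shot_prompt(examples, "Best purchase I've made all year."))
```

The model infers the task from the pattern alone: the prompt ends mid-pattern at "Output:", and the completion supplies the missing label.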
"What Makes Good In-Context Examples for GPT-3?" Liu, J., Shen, D., Zhang, Y., Dolan, B., Carin, L., & Chen, W. (2022) Duke University / Microsoft
Directly addresses example selection in few-shot prompting. The paper shows that retrieving in-context examples semantically similar to the test input substantially outperforms random selection, and that example quality and representativeness matter more than quantity. Practical guidance on the selection principles this chapter discusses.
Available: arxiv.org/abs/2101.06804
Self-Critique and Constitutional AI
"Constitutional AI: Harmlessness from AI Feedback" Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A., ... & Kaplan, J. (2022) Anthropic
The foundational paper for Constitutional AI — using explicit principles to guide AI self-evaluation and revision. While primarily about training methodology, the principles are directly applicable to prompting-time self-critique techniques. This paper is the intellectual origin of the "evaluate against explicit standards" approach.
Available: arxiv.org/abs/2212.08073
"Large Language Models Can Self-Improve" Huang, J., Gu, S. S., Hou, L., Wu, Y., Wang, X., Yu, H., & Han, J. (2022) University of Illinois Urbana-Champaign / Google
Explores the degree to which large language models can improve their own outputs through self-generated rationales and self-consistency mechanisms. Relevant for understanding the theoretical basis and practical limits of self-critique approaches.
Available: arxiv.org/abs/2210.11610
Practical Guides and Books
"The Prompt Engineering Guide" DAIR.AI (continuously updated) https://www.promptingguide.ai/
The most comprehensive freely available guide to prompting techniques. Covers CoT, few-shot, self-consistency, ToT, and many more techniques with clear explanations and examples. Actively maintained as new techniques emerge. The chapter on advanced techniques (CoT, few-shot, and beyond) is an excellent companion to this chapter.
"Prompt Engineering for Generative AI" James Phoenix & Mike Taylor (O'Reilly Media, 2024)
A thorough practitioner's guide to prompt engineering, covering both foundational and advanced techniques with practical examples across multiple models. Strong coverage of few-shot example construction and chain-of-thought applications. Recommended for readers who want book-length treatment of the material this chapter introduces.
"Co-Intelligence: Living and Working with AI" Ethan Mollick (Portfolio, 2024)
While not exclusively about prompting techniques, Mollick's research-grounded guide to AI in professional work provides context for where and how advanced prompting techniques produce the largest real-world gains. His section on "getting AI to show its work" is directly relevant to CoT applications.
Online Resources and Communities
"Learn Prompting" https://learnprompting.org/
Open-source prompting education resource with hands-on examples. The advanced techniques module covers CoT, few-shot, self-consistency, and prompt chaining with interactive examples. Good for verification exercises — build the prompts you read about here and test them directly.
Anthropic's Prompt Engineering Documentation https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering
Anthropic's official guidance on prompting Claude models, including coverage of CoT and few-shot techniques as applied to their specific models. Valuable because the guidance is tailored to current model capabilities and includes model-specific best practices.
OpenAI's Prompt Engineering Guide https://platform.openai.com/docs/guides/prompt-engineering
OpenAI's equivalent resource for their model family. Useful for comparing model-specific prompting guidance. The sections on "use delimiters," "specify the output structure," and "chain of thought" are directly relevant to this chapter's material.
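The "use delimiters" and "specify the output structure" tactics that guide describes amount to fencing quoted or untrusted text and naming the expected response shape explicitly. A hedged sketch; the triple-quote delimiter and the JSON schema here are illustrative choices, not OpenAI's exact wording:

```python
def build_delimited_prompt(instructions: str, document: str) -> str:
    """Fence the source text in triple quotes and request a fixed output shape."""
    return (
        f"{instructions}\n\n"
        "The text to analyze is delimited by triple quotes.\n"
        f'"""\n{document}\n"""\n\n'
        "Respond as JSON with keys: summary (string), sentiment (string)."
    )

print(build_delimited_prompt(
    "Summarize the text and classify its sentiment.",
    "Shipping was fast and the fit is perfect.",
))
```

Delimiters keep instructions and content from bleeding into each other, and an explicit schema makes the response easy to parse downstream.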
Research Aggregators
Papers with Code — Prompt Engineering https://paperswithcode.com/task/prompt-engineering
Tracks published research on prompt engineering techniques with links to papers and implementation code. Useful for staying current as the field moves quickly.
"Awesome Prompts" (GitHub) A community-maintained list of high-quality prompts across use cases, organized by technique and domain. Valuable for seeing how these techniques are applied in practice across many professional contexts.
A Note on Recency
The field of prompting techniques moves quickly. The foundational papers cited here are stable — the core findings about CoT and few-shot effectiveness have been replicated many times across many models. But newer techniques and refinements emerge regularly.
For the most current research: check arxiv.org with the search terms "chain-of-thought prompting," "in-context learning," and "prompt engineering" filtered to the last 6-12 months. The Prompt Engineering Guide at promptingguide.ai is updated regularly and is a reliable index of current best practices.