Chapter 2: Further Reading
Annotated Bibliography
Foundational Papers
- "Attention Is All You Need" by Vaswani et al. (2017) Published in: Advances in Neural Information Processing Systems (NeurIPS)
The paper that introduced the transformer architecture, which underpins every modern AI coding assistant. While the mathematical notation can be dense, the introduction and conclusion sections are accessible and provide essential context for understanding why transformers replaced earlier architectures. Focus on Sections 1-3 for the conceptual framework; the attention diagrams in Section 3.2 are particularly illuminating.
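The paper's core operation, scaled dot-product attention, is compact enough to sketch in plain Python. This is a toy illustration with hand-picked 2-dimensional vectors, not the paper's multi-head implementation:

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # The output is the attention-weighted average of the value vectors.
        outputs.append([sum(w * v[i] for w, v in zip(weights, values))
                        for i in range(len(values[0]))])
    return outputs

# Three toy token positions with 2-dimensional keys and values.
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(attention(Q, K, V))  # a weighted mix, dominated by the first value row
```

Note that every output is a convex combination of the value vectors; the query only decides the mixing weights. That is the mechanism the diagrams in Section 3.2 depict.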
- "Language Models are Few-Shot Learners" by Brown et al. (2020) Published by OpenAI (GPT-3 paper)
This paper demonstrated that scaling language models to 175 billion parameters produced emergent abilities -- including code generation -- without task-specific training. The few-shot learning results (Sections 3-4) show how providing examples in the prompt can dramatically change model behavior, directly relevant to the context management skills discussed in this chapter.
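The mechanics of few-shot prompting are easy to see in code: the "training" happens entirely inside the prompt string. A minimal sketch (the sentiment-labeling task and examples are hypothetical, and no model is called):

```python
# Build a few-shot prompt by prepending worked examples to the new input.
examples = [
    ("This library is wonderful", "positive"),
    ("The build keeps failing", "negative"),
]

def few_shot_prompt(examples, query):
    blocks = [f"Text: {text}\nLabel: {label}" for text, label in examples]
    # The final block is left incomplete so the model fills in the label.
    blocks.append(f"Text: {query}\nLabel:")
    return "\n\n".join(blocks)

prompt = few_shot_prompt(examples, "Docs are clear and complete")
print(prompt)
```

The model sees only this one string; the examples change its behavior without any weight update, which is the paper's central finding.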
- "Training Language Models to Follow Instructions with Human Feedback" by Ouyang et al. (2022) Published by OpenAI (InstructGPT paper)
The definitive paper on RLHF for language models. Explains the three-stage training pipeline (supervised fine-tuning, reward model training, and PPO optimization) in detail. Section 3 provides an accessible overview of the methodology. Essential reading for understanding why modern AI assistants behave so differently from raw language models.
AI for Code
- "Evaluating Large Language Models Trained on Code" by Chen et al. (2021) Published by OpenAI (Codex paper)
Introduces the Codex model (which powered the original GitHub Copilot) and the HumanEval benchmark for evaluating code generation. The evaluation methodology and failure analysis (Section 4) provide valuable insight into where AI code generation excels and where it fails. The discussion of functional correctness versus syntactic correctness is particularly relevant.
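The paper's functional-correctness metric, pass@k, estimates the probability that at least one of k sampled programs passes the unit tests, given n samples of which c passed. The unbiased estimator from the paper can be implemented directly:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator from Chen et al. (2021):
    1 - C(n - c, k) / C(n, k), the probability that a random
    size-k subset of n samples contains at least one of the
    c correct solutions."""
    if n - c < k:
        return 1.0  # every size-k subset must include a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(n=200, c=10, k=1))    # 0.05: one draw, 10/200 correct
print(pass_at_k(n=200, c=10, k=100))  # much higher with 100 draws
```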
- "A Survey of Large Language Models for Code" by Zan et al. (2024) Published in: ACM Computing Surveys
A comprehensive survey covering the landscape of LLMs for code generation, code understanding, code repair, and related tasks. Provides an excellent overview of the field as of 2024, including training methodologies, benchmark comparisons, and practical applications. A good starting point for readers who want a broad understanding before diving into specific papers.
- "Competition-Level Code Generation with AlphaCode" by Li et al. (2022) Published by DeepMind
Demonstrates AI code generation at the level of competitive programming. The paper's approach of generating many candidate solutions and filtering them is instructive for understanding the gap between AI code generation and human problem-solving. The analysis of failure modes (Section 5) complements our Chapter 2 discussion of where AI struggles.
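The generate-and-filter strategy is simple to sketch: sample many candidate programs, keep only those that pass the example tests. Here the "candidates" are hand-written stand-ins for model samples (purely illustrative; AlphaCode's actual filtering is far more elaborate):

```python
# Stand-ins for sampled solutions to "return the maximum of a list".
candidates = [
    lambda xs: xs[0],            # wrong: just the first element
    lambda xs: sorted(xs)[-1],   # correct, though O(n log n)
    lambda xs: max(xs),          # correct
    lambda xs: sum(xs),          # wrong: the sum
]

# Example input/output pairs, playing the role of the problem's samples.
example_tests = [([3, 1, 2], 3), ([1, 9, 2], 9), ([-1, -7], -1)]

def passes(fn, tests):
    try:
        return all(fn(inp) == expected for inp, expected in tests)
    except Exception:
        return False  # a crashing candidate is filtered out too

survivors = [fn for fn in candidates if passes(fn, example_tests)]
print(len(survivors))  # 2 candidates survive filtering
```

The filter only removes candidates that fail the visible examples; survivors may still be wrong on hidden inputs, which is exactly the gap the paper's failure analysis explores.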
Accessible Books and Articles
- "The Illustrated Transformer" by Jay Alammar (2018) Available at: jalammar.github.io/illustrated-transformer/
The single best visual explanation of how transformers work. Alammar's step-by-step illustrations of self-attention, multi-head attention, and the encoder-decoder architecture make these concepts accessible without requiring advanced mathematics. Highly recommended as a companion to Section 2.3 of this chapter. Follow-up posts on GPT-2 and BERT are equally valuable.
- "What Are Embeddings?" by Vicki Boykis (2023) Available at: vickiboykis.com/what_are_embeddings/
A thorough, clearly written explanation of word and token embeddings -- the numerical representations that form the input to transformer models. Covers the evolution from word2vec to modern contextual embeddings with practical examples. Excellent for readers who want to understand the embedding step described in our case study.
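The central idea — that meaning becomes geometry — can be illustrated with cosine similarity between toy vectors. The 3-dimensional "embeddings" below are hand-made for illustration; real embeddings have hundreds or thousands of learned dimensions:

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made toy embeddings: related words point in similar directions.
emb = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

print(cosine(emb["cat"], emb["dog"]))  # high: related meanings
print(cosine(emb["cat"], emb["car"]))  # low: unrelated
```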
- "Build a Large Language Model (From Scratch)" by Sebastian Raschka (2024) Published by Manning Publications
A hands-on book that walks through building a GPT-style language model from scratch in Python. While more technical than this chapter, it provides concrete implementations of every concept we covered: tokenization, embeddings, attention, and training. Recommended for readers who learn best by building.
- "Understanding Deep Learning" by Simon J.D. Prince (2023) Published by MIT Press (available free online)
A comprehensive textbook that covers neural networks from basics through transformers. Chapters 12-13 on transformers and Chapter 14 on large language models are directly relevant. The mathematical treatment is thorough but the accompanying visualizations make concepts accessible. A good reference for readers who want more depth than this chapter provides.
Practical and Applied
- "Prompt Engineering Guide" by DAIR.AI (ongoing) Available at: promptingguide.ai
A continuously updated, community-driven guide to prompt engineering techniques. Covers zero-shot, few-shot, chain-of-thought, and many other prompting strategies with practical examples. Directly applicable to the practical implications discussed in Section 2.10 and serves as a bridge to the prompting chapters later in this book.
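The difference between strategies often comes down to the text wrapped around the task. A minimal sketch contrasting a zero-shot prompt with a chain-of-thought variant (the wording and task are illustrative, not prescribed templates from the guide):

```python
task = ("A repo has 3 open PRs; 2 more are opened and 1 is merged. "
        "How many are open?")

# Zero-shot: the bare task, asking directly for an answer.
zero_shot = f"{task}\nAnswer:"

# Chain-of-thought: the same task, with an instruction to reason first.
chain_of_thought = f"{task}\nLet's think step by step, then give the final answer."

print(zero_shot)
print("---")
print(chain_of_thought)
```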
- "The Tokenizer Playground" by OpenAI Available at: platform.openai.com/tokenizer
An interactive tool for exploring how text is tokenized by GPT models. Essential for developing intuition about the token counts discussed in Section 2.6. Try pasting Python code, English text, and different programming languages to see how tokenization varies. The tiktoken Python library provides the same functionality programmatically.
- "Constitutional AI: Harmlessness from AI Feedback" by Bai et al. (2022) Published by Anthropic
Describes the Constitutional AI approach used in training Claude, where the model evaluates its own outputs against a set of principles. This paper explains the alignment methodology referenced in Section 2.8 and helps readers understand how AI assistants are trained to be both helpful and safe. Sections 1-3 are accessible; later sections are more technical.
- "A Survey on Hallucination in Large Language Models" by Huang et al. (2023) Published in: ACM Computing Surveys
A systematic survey of why language models produce confident but incorrect output (hallucinations). Directly relevant to understanding why AI can produce plausible-looking but wrong code, as discussed in Section 2.9. The taxonomy of hallucination types and the mitigation strategies are both practically useful for vibe coders.
- "Scaling Laws for Neural Language Models" by Kaplan et al. (2020) Published by OpenAI
Establishes the mathematical relationships between model size, training data volume, compute budget, and model performance. While technical, the key insight is accessible: performance improves predictably with scale, following power laws. This helps explain why larger models generate better code and why the field continues to invest in scale. Focus on the summary and figures in Sections 1-2.
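The power-law form can be played with directly. A sketch of the parameter-count scaling L(N) = (N_c / N)^alpha_N, using constants of roughly the magnitude the paper reports (treat them as illustrative, not exact):

```python
# Power-law loss as a function of parameter count N, following the
# paper's form L(N) = (N_c / N) ** alpha_N. The constants below are
# approximately those reported by Kaplan et al.; treat as illustrative.
ALPHA_N = 0.076
N_C = 8.8e13

def loss(n_params):
    return (N_C / n_params) ** ALPHA_N

for n in [1e8, 1e9, 1e10, 1e11]:
    print(f"{n:.0e} params -> loss {loss(n):.3f}")
```

Each factor-of-ten increase in parameters multiplies the loss by the same fixed ratio — the "predictable improvement with scale" that makes the paper's key insight usable for planning training runs.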