Appendix G: Glossary
Definitions of key terms used throughout this book. Where a term is introduced or discussed in detail in a specific chapter, that chapter is noted in parentheses at the end of the definition. Terms are arranged alphabetically.
Agent / Agentic AI An AI system that can plan and execute multi-step tasks with some degree of autonomy, including taking actions (browsing the web, running code, calling APIs, managing files) and making decisions about how to proceed based on intermediate results. Unlike a single-turn chatbot interaction, agentic AI operates in loops, observing the results of its actions and adapting its approach. Agentic capabilities are an active area of development as of 2025. (Chapter 10)
Alignment The challenge of ensuring that AI systems pursue goals that are consistent with human values and intentions. An AI that is well-aligned behaves as intended even in novel situations; a misaligned AI may pursue proxy metrics or exhibit unexpected behavior when deployed in contexts outside its training distribution. Alignment is a major area of AI safety research. (Chapter 9)
Anthropic An AI safety company and the developer of the Claude family of AI models. Founded in 2021 by former OpenAI researchers. Notable for its emphasis on AI safety research and its Constitutional AI approach to model training.
API (Application Programming Interface) A set of defined rules that allow one software application to communicate with another. In the context of this book, AI APIs allow developers and technical users to access AI model capabilities programmatically — sending prompts and receiving responses via code rather than through a chat interface. API access enables automation, integration, and customization that the standard interface does not support. (Chapter 11, Appendix B)
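A minimal sketch of what programmatic access looks like, using only the Python standard library. The endpoint URL, header names, and model name here are placeholders; the exact request shape varies by provider, though most use a similar JSON body with a list of role-tagged messages.

```python
import json
import urllib.request

def build_chat_request(prompt: str, model: str = "example-model",
                       max_tokens: int = 256) -> dict:
    """Build a JSON payload in the role-tagged message shape most AI APIs use."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

def send(payload: dict, url: str, api_key: str) -> str:
    """POST the payload to the provider's endpoint (header names vary by provider)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"content-type": "application/json", "x-api-key": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

payload = build_chat_request("Summarize this report in three bullet points.")
```

Because the request is just code, it can be looped over hundreds of documents, embedded in a larger pipeline, or parameterized per task, which is the automation a chat interface cannot offer.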
Attention Mechanism The computational technique at the heart of transformer architectures that allows a model to weigh the relevance of different parts of an input when generating each part of an output. Attention allows the model to "focus" on distant parts of a document or conversation that are contextually relevant, rather than only considering nearby text. (Chapter 2)
Automation Bias The tendency for humans to over-trust and over-rely on automated or AI-generated outputs — accepting them without adequate verification, even when the human has the expertise to detect errors. Automation bias is a well-documented phenomenon in human factors research and is a primary risk in AI-assisted professional work. (Chapter 7, Appendix F)
Baseline In the context of AI use, a baseline is a measurement of performance or output quality before AI assistance is introduced. Establishing a baseline allows meaningful evaluation of whether AI tools are actually improving outcomes. (Chapter 13)
Benchmark A standardized test used to compare the performance of AI models across capabilities (reasoning, language understanding, coding, etc.). Common benchmarks include MMLU (Massive Multitask Language Understanding), HumanEval (coding), and HellaSwag (commonsense reasoning). Benchmark performance does not always predict real-world performance on professional tasks. (Chapter 2)
Chain-of-Thought Prompting (CoT) A prompting technique in which the user instructs the AI to reason through a problem step by step before producing a final answer. Research shows this substantially improves performance on multi-step reasoning, math, and analytical tasks. The simplest implementation is appending "Let's think step by step" to a prompt. (Chapter 5, Appendix F Study #10)
Chunking Breaking a large document or task into smaller segments to work within context window limits or to enable more focused AI processing. For example, summarizing a 200-page report by first summarizing each chapter, then synthesizing those summaries. (Chapter 6)
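The chapter-by-chapter approach above can be sketched in a few lines. This is a character-based splitter with overlap between chunks so that no sentence is cut off without context; production systems often split on paragraph or section boundaries instead.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into overlapping character-based chunks.

    Each chunk repeats the last `overlap` characters of the previous one,
    so context spanning a boundary appears in both chunks.
    """
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

report = "x" * 2500          # stand-in for a long document
pieces = chunk_text(report)  # -> 3 chunks of at most 1000 characters
```

Each chunk is then summarized separately, and the per-chunk summaries are passed to the model in a final synthesis prompt.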
Claude A family of AI models developed by Anthropic, including Haiku (fastest, most economical), Sonnet (balanced), and Opus (most capable). Claude models are known for strong instruction-following, long context windows, and a safety-focused development approach. (Throughout)
Cognitive Offloading The use of external tools, environments, or people to reduce the cognitive demands on the individual — essentially outsourcing some mental work to the outside world. All professional tools involve cognitive offloading to some degree; AI tools represent a powerful new form of it. (Chapter 8, Appendix F Study #17)
Constitutional AI Anthropic's approach to training AI models to behave safely and helpfully by specifying a set of principles (a "constitution") and using AI feedback rather than only human feedback to reinforce behavior consistent with those principles. (Chapter 9)
Context Window The maximum amount of text (measured in tokens) that a model can "see" and process at one time, including both the input (prompt, conversation history) and the output. As of early 2025, context windows range from approximately 8,000 tokens (smaller models) to over 1 million tokens (Gemini 1.5 Pro). Information outside the context window is not accessible to the model. (Chapter 3)
Corpus A large collection of text data used to train a language model. The quality, diversity, and recency of training corpora significantly affect model capabilities and biases. (Chapter 2)
Diffusion Model A type of generative model used primarily for image generation (Stable Diffusion, DALL-E). Diffusion models work by learning to reverse a process of progressively adding noise to an image, enabling them to generate images from noise by gradually "denoising" toward a coherent output guided by a text prompt. (Chapter 12)
Embedding A numerical representation of text (or other data) in a high-dimensional vector space, where semantically similar content is located close together. Embeddings enable semantic search, similarity comparison, and efficient information retrieval — key components of RAG systems and modern AI applications. (Chapter 10)
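"Close together" in vector space is typically measured with cosine similarity. The sketch below uses toy 3-dimensional vectors (real embeddings have hundreds to thousands of dimensions, produced by an embedding model) to show how semantically similar text ends up with similar vectors.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings; in practice these come from an embedding model.
docs = {
    "invoice for consulting services": [0.9, 0.1, 0.2],
    "bill for advisory work":          [0.8, 0.2, 0.3],
    "hiking trail conditions":         [0.1, 0.9, 0.1],
}
query_vec = [0.85, 0.15, 0.25]  # embedding of "consulting invoice"
best = max(docs, key=lambda d: cosine_similarity(query_vec, docs[d]))
```

Note that "bill for advisory work" also scores highly despite sharing no words with the query; matching on meaning rather than wording is exactly what makes embeddings useful for semantic search and RAG.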
Emergent Behavior Capabilities that appear in AI models at sufficient scale that were not explicitly trained for and were not present in smaller versions of the same model architecture. Examples include in-context learning and complex reasoning. Emergent behaviors are not fully predictable, which is a source of both excitement and concern in AI research. (Chapter 2)
Evaluation (Evals) Systematic methods for measuring the quality of AI model outputs on specific tasks. Evals range from automated benchmarks (scoring on standardized tests) to human preference ratings to domain-specific rubrics. Running evals is essential for making informed decisions about model selection and prompt design in professional settings. (Chapter 13)
Few-Shot Prompting Providing the model with a small number of input-output examples within the prompt to demonstrate the desired behavior or format. As opposed to zero-shot prompting (no examples) or fine-tuning (modifying the model itself). Few-shot prompting is highly effective for tasks with a specific format or style requirement. (Chapter 5)
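A few-shot prompt is just structured text. One common pattern, sketched below with a made-up headline-writing task, is to list input-output pairs and end with the new input so the model continues the pattern:

```python
def few_shot_prompt(examples: list[tuple[str, str]], new_input: str) -> str:
    """Assemble a few-shot prompt from (input, output) example pairs."""
    lines = ["Rewrite each sentence as a title-case headline.\n"]
    for inp, out in examples:
        lines.append(f"Input: {inp}\nOutput: {out}\n")
    lines.append(f"Input: {new_input}\nOutput:")  # model completes from here
    return "\n".join(lines)

prompt = few_shot_prompt(
    [("the company reported record earnings", "Company Reports Record Earnings"),
     ("the new policy takes effect monday", "New Policy Takes Effect Monday")],
    "the merger was approved by regulators",
)
```

Two or three examples are often enough to lock in a format that would take a paragraph of instructions to describe.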
Fine-Tuning Continuing to train a pre-trained foundation model on a smaller, domain-specific dataset to adapt it to particular tasks or styles. Fine-tuning modifies the model's weights, unlike prompting which does not. Organizations may fine-tune models on their own data to improve domain-specific performance. (Chapter 2)
Foundation Model A large AI model trained on broad data at scale that serves as the base for many downstream applications. Foundation models are not designed for a single task — they are adapted to specific uses through prompting, fine-tuning, or other techniques. GPT-4, Claude 3, and Gemini 1.5 are examples. (Chapter 2)
Generative AI AI systems that produce new content — text, images, audio, code, video — rather than simply classifying or retrieving existing content. Modern generative AI models are typically based on transformer or diffusion architectures trained on large datasets. (Chapter 1)
Grounding Connecting AI outputs to verified external information sources, typically through RAG or web search integration, to reduce hallucination and improve factual accuracy. A grounded response cites retrievable, verifiable sources. (Chapter 10)
Guardrails Safety constraints built into AI systems — through training, system prompts, or filtering layers — to prevent harmful, illegal, or policy-violating outputs. Guardrails are a key part of responsible AI deployment but can sometimes be over- or under-calibrated for specific professional contexts. (Chapter 9)
Hallucination The generation of text that is factually incorrect, fictional, or fabricated, presented with the same fluency and apparent confidence as accurate information. Hallucination is an inherent property of current language models, not a fixable bug. It arises from the statistical nature of text generation, not from any form of deliberate deception. (Chapter 7, Appendix F Studies #6-9)
Human-in-the-Loop (HITL) A system design principle in which human judgment is integrated into an AI-assisted workflow at key decision points, rather than allowing AI to operate fully autonomously. HITL is essential in high-stakes applications and wherever AI error rates are consequential. (Chapter 7)
Inference The process of running a trained AI model to generate an output from a given input. In AI contexts, inference is what happens when you send a prompt and receive a response. Distinct from training (which creates the model). Inference costs are what API users pay for. (Chapter 2)
In-Context Learning The ability of large language models to adapt their behavior based on examples or instructions provided within the prompt, without any change to the model's parameters. Few-shot and zero-shot prompting both rely on in-context learning. (Chapter 5)
Instruction Tuning A form of fine-tuning in which models are trained on examples of instructions and appropriate responses, teaching the model to follow directions rather than simply completing text. Instruction tuning is what transforms a raw base model into a useful assistant. (Chapter 2)
Iteration / Iterative Prompting The practice of refining a prompt through multiple attempts, using the model's output to inform how to revise the input. Effective AI use is rarely a single prompt — it is a dialogue that converges toward a useful result. (Chapter 4)
Jailbreaking Attempts to circumvent the safety guidelines and guardrails of an AI model, typically through adversarially constructed prompts. Jailbreaking raises legal and ethical concerns and violates most AI providers' terms of service. (Chapter 9)
Knowledge Cutoff (Training Cutoff) The date after which no information was included in a model's training data. A model with a training cutoff of, say, January 2024 has no knowledge of events that occurred after that date. This is a significant limitation for tasks requiring current information. (Chapter 3)
Large Language Model (LLM) A type of AI model trained on vast text datasets, with billions to trillions of parameters, capable of generating coherent, contextually appropriate text across a wide range of tasks. LLMs are the foundation of modern AI chat assistants and AI writing tools. (Chapter 2)
Latency The time between sending a request to an AI model and receiving a complete response. Latency matters in real-time applications and in user experience design. Smaller models generally have lower latency than larger ones. (Chapter 11)
Meta-Prompting Asking the AI to help you improve your prompt. For example, "Here is my current prompt — how can I revise it to get better results?" A useful technique for breaking out of prompt design dead ends. (Chapter 5)
Model Card A documentation artifact published by AI developers that describes a model's capabilities, limitations, intended uses, performance characteristics, and ethical considerations. Reading model cards is a good practice for anyone deploying AI models in professional applications. (Chapter 9)
Multimodal An AI model's ability to process, and in some cases generate, multiple types of data — text, images, audio, video, and documents — within a single model. Leading models including GPT-4o, Claude 3.5, and Gemini 1.5 are multimodal. (Chapter 12)

Next-Token Prediction The core training objective of most language models: given all the preceding text, predict the most likely next word (token). Training on this objective at massive scale produces models with broad language understanding and generation capability as a byproduct of learning to predict text well. (Chapter 2)
Overfitting When a model learns the training data too specifically, performing well on training examples but poorly on new inputs. In the context of fine-tuning, overfitting on a small domain dataset can make a model less useful for other tasks. (Chapter 2)
Parameters The numerical weights inside a neural network that are learned during training and determine how the model transforms inputs into outputs. "Parameter count," commonly measured in billions or, for frontier models, trillions (GPT-4 is estimated at roughly 1.8 trillion), is often used as a rough proxy for model capability, though the relationship is not linear. (Chapter 2)
Perplexity A technical measure of how well a language model predicts a text sample — lower perplexity indicates the model found the text more predictable. Also the name of a popular AI-powered search tool (Perplexity AI). (Chapter 2)
Pre-training The initial large-scale training of a foundation model on broad text data, using self-supervised objectives like next-token prediction. Pre-training is computationally expensive (millions of dollars for frontier models) and produces the base of general language capability on which all subsequent fine-tuning builds. (Chapter 2)
Prompt / Prompting A prompt is the input you provide to an AI model to elicit a desired output. Prompting is the practice of crafting these inputs deliberately and skillfully to get useful, accurate, and appropriately formatted responses. Good prompting is the primary practical skill developed in this book. (Chapters 4-5)
Prompt Engineering The systematic practice of designing, testing, and refining prompts to reliably produce desired outputs from AI models. Can range from simple rewording to complex structured prompting techniques. (Chapter 5)
Prompt Injection An attack or failure mode in which malicious instructions embedded in external content (a webpage, user-submitted text, a document) are interpreted as instructions by an AI agent, causing unintended behavior. A significant security concern in agentic AI applications. (Chapter 10)
RAG (Retrieval-Augmented Generation) A technique that combines information retrieval with language model generation. When a query is received, relevant documents are retrieved from a database and provided to the model as context, allowing it to ground its response in specific, verifiable sources rather than relying entirely on training data. Reduces hallucination and enables up-to-date responses. (Chapter 10)
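The retrieve-then-generate loop can be sketched in miniature. For simplicity this sketch scores documents by word overlap with the query; real RAG systems retrieve by embedding similarity from a vector database. The documents and question are invented examples.

```python
def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents sharing the most words with the query.

    (Toy scoring; production systems rank by embedding similarity.)
    """
    q_words = set(query.lower().split())
    ranked = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_rag_prompt(query: str, documents: list[str]) -> str:
    """Assemble a grounded prompt: retrieved sources first, then the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return (f"Answer using only the sources below.\n\n"
            f"Sources:\n{context}\n\nQuestion: {query}")

docs = [
    "The 2024 revenue was $4.2M, up 12% year over year.",
    "The office relocated to Denver in March.",
    "Revenue growth in 2024 was driven by enterprise contracts.",
]
prompt = build_rag_prompt("What was revenue in 2024?", docs)
```

The model never sees the irrelevant document, and the instruction to answer only from the supplied sources is what makes the response groundable and verifiable.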
RLHF (Reinforcement Learning from Human Feedback) A training technique in which human raters evaluate model outputs and those ratings are used to train a reward model, which is then used to fine-tune the main model to produce outputs humans prefer. RLHF is a key technique for making models helpful, harmless, and honest. Used by OpenAI, Anthropic, and others. (Chapter 2)
Sampling / Temperature Temperature is a parameter that controls the randomness of a model's output. At temperature 0, the model always picks the most probable next token (deterministic, consistent). At higher temperatures, less probable tokens are sampled more often, producing more varied but potentially less accurate outputs. Most chat interfaces use temperatures around 0.7-1.0. (Chapter 3)
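Mechanically, temperature divides the model's raw scores (logits) before they are converted to probabilities, so low temperature sharpens the distribution toward the top token and high temperature flattens it. A minimal sketch:

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities, scaled by temperature.

    Lower temperature -> sharper distribution (top token dominates);
    higher temperature -> flatter distribution (more variety).
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                       # toy scores for 3 candidate tokens
low = softmax_with_temperature(logits, 0.2)    # near-deterministic
high = softmax_with_temperature(logits, 2.0)   # more varied
```

At temperature 0.2 the top token takes almost all the probability mass; at 2.0 the three candidates are much closer together, which is why high-temperature output reads as more creative but less reliable.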
Semantic Search Search that uses the meaning of a query to find relevant results, rather than exact keyword matching. AI systems use embeddings to enable semantic search — finding content that means the same thing even if worded differently. (Chapter 10)
System Prompt Instructions provided to an AI model before the conversation begins — often invisible to the end user — that configure the model's behavior, role, style, and constraints for that session. System prompts are widely used in AI products to customize behavior for specific applications. Understanding system prompts is important for API users and for understanding why the same underlying model behaves differently across applications. (Chapter 3, Appendix B)
Temperature See Sampling / Temperature.
Token The basic unit of text that AI language models process. A token is roughly 3-4 characters or approximately 0.75 English words. The word "understanding" is one token; the phrase "AI tools" is two tokens. API costs and context windows are measured in tokens. (Chapter 3, Appendix E)
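The roughly-4-characters rule of thumb is enough for back-of-envelope budgeting against a context window or an API bill. The helper below is an estimate only (exact counts require the model's own tokenizer), and the per-million-token price is a hypothetical placeholder:

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic for English text: about 4 characters per token."""
    return max(1, round(len(text) / 4))

def estimate_cost(text: str, price_per_million_tokens: float = 3.00) -> float:
    """Estimate input cost in dollars (placeholder price; real prices vary by model)."""
    return estimate_tokens(text) / 1_000_000 * price_per_million_tokens

n_tokens = estimate_tokens("AI tools")  # 8 characters -> about 2 tokens
```

For exact counts, use the tokenizer published by the model's provider; the heuristic can be off substantially for code, non-English text, or unusual formatting.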
Top-p (Nucleus Sampling) A sampling technique that selects from the smallest set of tokens whose cumulative probability exceeds a threshold p. Often used alongside or instead of temperature to control output diversity. (Chapter 3)
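The "smallest set whose cumulative probability exceeds p" idea is easy to show directly. This sketch uses an invented four-token distribution; a real model would supply probabilities over its whole vocabulary.

```python
def nucleus_set(probs: dict[str, float], p: float) -> dict[str, float]:
    """Keep the top tokens until cumulative probability reaches p, then renormalize."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept: dict[str, float] = {}
    cumulative = 0.0
    for token, prob in ranked:
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {t: pr / total for t, pr in kept.items()}  # renormalized

probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "zebra": 0.05}
result = nucleus_set(probs, 0.9)  # "zebra" falls outside the nucleus
```

With p = 0.9, the long tail ("zebra") is cut off entirely, so even at high temperature the model cannot sample a wildly improbable token; this is why top-p and temperature are often used together.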
Training Cutoff See Knowledge Cutoff.
Transfer Learning The technique of taking a model pre-trained on one task or domain and adapting it for a related task, rather than training from scratch. All modern LLMs rely on transfer learning — general language understanding learned during pre-training is transferred to specific applications through fine-tuning or prompting. (Chapter 2)
Transformer The neural network architecture introduced in the paper "Attention Is All You Need" (Vaswani et al., 2017) that underlies virtually all modern large language models. Key innovations include the attention mechanism and the ability to process all parts of an input in parallel rather than sequentially, enabling training on vastly larger datasets than previous architectures. (Chapter 2)
Trust Calibration The process of aligning your level of trust in AI outputs with the actual reliability of those outputs for specific types of tasks. Well-calibrated trust means verifying when errors are likely and consequential, and trusting more freely when errors are rare or low-stakes. Both over-trust and under-trust impose costs. (Chapter 7)
Vector Database A specialized database designed to store and query high-dimensional numerical vectors (embeddings). Used in RAG systems to efficiently find documents semantically similar to a query. (Chapter 10)
Verbosity The tendency of AI models to produce longer responses than necessary — adding caveats, restating the question, providing unnecessary context, or padding outputs. Verbosity can be reduced through explicit length instructions in prompts. (Chapter 4)
Zero-Shot Prompting Asking a model to perform a task without providing any examples, relying solely on the model's pre-trained knowledge and instruction-following ability. Contrasted with few-shot prompting (which includes examples). Large modern models are capable zero-shot performers across many tasks. (Chapter 5)
Terms not found here may be defined in the chapter where they are introduced. For the most current definitions in a rapidly evolving field, also consult the glossaries maintained by Anthropic (anthropic.com), OpenAI (openai.com), and Google DeepMind (deepmind.google).