The initial training phase of an **LLM** where it learns from massive amounts of text data to predict the next token. Produces general language understanding before **fine-tuning** for specific tasks. (Ch. 2)