Attention Mechanism

The core innovation of the **transformer** architecture. Allows the model to weigh the relevance of different parts of the input when processing each element. Enables the model to "focus" on relevant context regardless of distance in the text. (Ch. 2)
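As a rough illustration of the weighing described above, here is a minimal sketch of scaled dot-product attention, the form commonly used in transformers. The entry itself does not name a specific variant, so that choice is an assumption. The function names `softmax` and `attend` and the toy vectors are hypothetical, chosen only for this example.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Return (weights, output) for one query over all input positions."""
    scale = math.sqrt(len(query))
    # Relevance score of each position: scaled dot product of query and key.
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # Output: average of the value vectors, weighted by relevance.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Toy input (made-up numbers): three positions with 2-dimensional vectors.
# Positions 0 and 2 carry the same key; position 1 carries a different one.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [10.0, 0.0]]
query = [1.0, 0.0]

weights, output = attend(query, keys, values)
print(weights)
print(output)
```

In this toy case, positions 0 and 2 receive the same weight because their keys match the query equally well. Their positions in the sequence play no part in the score, which is the sense in which attention works "regardless of distance".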