Glossary

Residual connection

**Pre-layer normalization** (again) - **Feed-forward network** — typically a gated variant: SwiGLU (Shazeer, 2020), computed as $\text{FFN}(x) = (W_1 x \odot \sigma(W_g x)) W_2$, where $\sigma$ is the SiLU activation and $\odot$ is element-wise multiplication - **Residual connection**

Learn More

Related Terms