- **Linear warmup** (most common): The learning rate increases linearly from 0 to the target
- **Exponential warmup**: The learning rate increases exponentially, spending more time at low rates
- **Gradual warmup** (Goyal et al., 2017): For very large batch training, warmup over 5--10 epochs
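
The three schedules above can be sketched as plain functions of the training step. This is a minimal illustration, not a reference implementation: the function names, the `start_lr` floor for the exponential variant (a geometric ramp cannot start at exactly 0), the geometric-interpolation form itself, and the default of 5 warmup epochs are all assumptions chosen for the example.

```python
def linear_warmup(step, warmup_steps, target_lr):
    """Ramp linearly from 0 at step 0 to target_lr at warmup_steps."""
    if step >= warmup_steps:
        return target_lr
    return target_lr * step / warmup_steps


def exponential_warmup(step, warmup_steps, target_lr, start_lr=1e-7):
    """Geometric ramp from start_lr to target_lr (assumed form).

    The rate is multiplied by a constant factor each step, so most of
    the warmup is spent at rates far below target_lr.
    """
    if step >= warmup_steps:
        return target_lr
    return start_lr * (target_lr / start_lr) ** (step / warmup_steps)


def gradual_warmup(step, steps_per_epoch, base_lr, target_lr, warmup_epochs=5):
    """Linear ramp from base_lr to target_lr, with the length given in epochs.

    Intended for large-batch training, where target_lr is a scaled-up
    version of a small-batch base_lr (hypothetical parameter names).
    """
    warmup_steps = warmup_epochs * steps_per_epoch
    if step >= warmup_steps:
        return target_lr
    return base_lr + (target_lr - base_lr) * step / warmup_steps


if __name__ == "__main__":
    # Compare the linear and exponential ramps at a few points.
    for step in (0, 25, 50, 75, 100):
        lin = linear_warmup(step, 100, 0.1)
        exp = exponential_warmup(step, 100, 0.1)
        print(f"step {step:3d}  linear {lin:.6f}  exponential {exp:.6f}")
```

Halfway through warmup, the linear schedule sits at half the target, while the geometric schedule sits at the geometric mean of `start_lr` and `target_lr`, which is much lower. That is what "spending more time at low rates" amounts to under this assumed form.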