Glossary

Variants:

**Momentum:** $\boldsymbol{v}^{(t+1)} = \gamma \boldsymbol{v}^{(t)} + \eta \nabla f$; $\boldsymbol{\theta}^{(t+1)} = \boldsymbol{\theta}^{(t)} - \boldsymbol{v}^{(t+1)}$ - **Adam:** Adaptive learning rates using first and second moment estimates. The default optimizer for most deep learning models in

Learn More

Related Terms