Glossary

hard attention

In hard attention, the model selects a single input position; because this choice is discrete and non-differentiable, it is typically trained by sampling (e.g. with reinforcement-learning-style estimators). Soft attention, by contrast, distributes weights across positions and computes an interpolation of multiple encoder states, enabling smoother gradient flow.
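The contrast can be sketched numerically (a minimal illustration with NumPy; the shapes and variable names are hypothetical, not from any particular model):

```python
import numpy as np

rng = np.random.default_rng(0)
encoder_states = rng.standard_normal((5, 8))  # 5 positions, 8-dim states
scores = rng.standard_normal(5)               # attention scores (logits)

# Soft attention: softmax weights over all positions; the context vector
# is a weighted average (interpolation) of every encoder state.
weights = np.exp(scores - scores.max())
weights /= weights.sum()
soft_context = weights @ encoder_states       # differentiable in `scores`

# Hard attention: pick one position. Argmax is used here for simplicity;
# during training this choice is usually a sample, which is why gradients
# cannot flow through it directly.
idx = int(np.argmax(scores))
hard_context = encoder_states[idx]

print(soft_context.shape, hard_context.shape)  # both (8,)
```

Note that `soft_context` blends all five states, while `hard_context` is exactly one row of `encoder_states`.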
