Glossary

Spatial reduction attention

it reduces the number of key and value tokens by a factor $R$ before computing attention, lowering the attention cost from $O(N^2)$ to $O(N^2/R)$, where $N$ is the number of query tokens. (2) **Hierarchical architecture**: it produces multi-scale features at 1/4, 1/8, 1/16, and 1/32 of the input resolution, so attention at higher levels operates on
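
The cost reduction described above can be illustrated with a small NumPy sketch. This is an assumption-laden toy rather than any particular model's implementation: it uses a single head, omits the learned query/key/value projections, and swaps in plain average pooling over `sr x sr` windows as the reduction step (published models typically use a strided convolution or a reshape followed by a linear layer). The names `spatial_reduction_attention`, `tokens_per_stage`, and `sr` are hypothetical. With this setup the token-count reduction is $R$ = `sr**2`, so the score matrix has $N \cdot N/R$ entries instead of $N^2$.

```python
import numpy as np

def spatial_reduction_attention(x, h, w, sr):
    """Toy single-head attention over x of shape (h*w, c).

    Keys and values are average-pooled over non-overlapping sr x sr
    windows (an illustrative stand-in for a learned reduction), so the
    number of key/value tokens drops by R = sr**2.
    """
    n, c = x.shape
    assert n == h * w and h % sr == 0 and w % sr == 0
    # View the row-major token sequence as a grid of sr x sr windows.
    grid = x.reshape(h // sr, sr, w // sr, sr, c)
    kv = grid.mean(axis=(1, 3)).reshape(-1, c)    # (n / sr**2, c)
    scores = x @ kv.T / np.sqrt(c)                # (n, n / sr**2)
    scores -= scores.max(axis=1, keepdims=True)   # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ kv, weights

def tokens_per_stage(height, width, strides=(4, 8, 16, 32)):
    """Token count of the feature map at each stage of the hierarchy."""
    return [(height // s) * (width // s) for s in strides]

rng = np.random.default_rng(0)
x = rng.standard_normal((8 * 8, 16))              # 64 tokens, 16 channels
out, weights = spatial_reduction_attention(x, h=8, w=8, sr=2)
print(out.shape, weights.shape)                   # (64, 16) (64, 16)
print(tokens_per_stage(224, 224))                 # [3136, 784, 196, 49]
```

In this example the attention weights form a 64 x 16 matrix rather than 64 x 64, a fourfold reduction matching $R = 4$. The second helper shows how quickly the token count falls across the 1/4 to 1/32 stages for a 224 x 224 input.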
