Attention vs. Self-Attention
Status: public · Confidence: medium (0.8) · Basis: verified_sources
## TL;DR

Attention mechanisms relate one set of sequence states to another, while self-attention relates positions within the same sequence.

## Core Explanation

The key distinction is whether attention relates decoder state to encoder state, as in early neural machine translation, or relates positions inside the same sequence.

## Source-Mapped Facts

- Bahdanau attention lets a neural machine translation model learn to align and translate by searching for relevant source-sentence parts while generating a target word. ([source](https://arxiv.org/abs/1409.0473))
- The Transformer replaces recurrence and convolution with attention mechanisms, including self-attention over sequence positions. ([source](https://arxiv.org/abs/1706.03762))
- Efficient Transformers surveys Transformer variants that target computational and memory efficiency limitations of attention. ([source](https://doi.org/10.1145/3530811))

## Further Reading

- [Neural Machine Translation by Jointly Learning to Align and Translate](https://arxiv.org/abs/1409.0473)
- [Attention Is All You Need](https://arxiv.org/abs/1706.03762)
- [Efficient Transformers: A Survey](https://doi.org/10.1145/3530811)
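
## Illustrative Sketch

The distinction drawn in the Core Explanation can be made concrete with a small sketch. It is a minimal illustration under stated assumptions, not the method of any cited paper: it uses scaled dot-product scoring for both cases (Bahdanau attention uses an additive, learned score instead), it omits the learned query/key/value projections, and the names `attend`, `encoder_states`, and `decoder_states` are hypothetical. The only thing that changes between the two calls is where the queries come from.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention without learned projections (a simplification).

    For each query, score every key, normalize the scores with softmax,
    and return the weighted average of the values plus the weights used.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in keys
        ]
        w = softmax(scores)
        weights.append(w)
        outputs.append([
            sum(wj * v[i] for wj, v in zip(w, values))
            for i in range(len(values[0]))
        ])
    return outputs, weights


# Hypothetical toy states: 3 source positions, 2 target positions, width 2.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_states = [[0.5, 0.5], [1.0, 0.0]]

# Attention across sequences: decoder states query the encoder states.
cross_out, cross_w = attend(decoder_states, encoder_states, encoder_states)

# Self-attention: queries, keys, and values all come from the same sequence.
self_out, self_w = attend(encoder_states, encoder_states, encoder_states)

print(len(cross_w), len(cross_w[0]))  # weight matrix is 2 x 3 (target x source)
print(len(self_w), len(self_w[0]))    # weight matrix is 3 x 3 (source x source)
```

In the first call the weight matrix is rectangular, with one row per target position and one column per source position. In the second it is square, because every position attends over the sequence it belongs to.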