Self-Attention

Wojciech Achtelik
AI Engineer Lead
July 3, 2025
Glossary Category: LLM

Self-Attention is a mechanism that lets a neural network weigh the importance of every element in an input sequence when processing each element, allowing models to capture long-range dependencies and contextual relationships. It computes attention scores by comparing each element against all other elements in the sequence, then forms a weighted representation that emphasizes relevant information and de-emphasizes irrelevant content. Concretely, learned query, key, and value projections transform the input embeddings; queries and keys are compared to produce similarity scores, and those scores are used to take a weighted average of the values.

Because every position attends to every other position independently, the whole sequence can be processed in parallel, in contrast to sequential architectures such as recurrent networks. Self-attention is the core component of transformer architectures, underpinning modern language models' ability to track context, maintain coherence across long texts, and perform complex reasoning tasks.
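
As a rough sketch of the mechanism described above, the NumPy snippet below implements single-head scaled dot-product self-attention. The function and variable names, dimensions, and random projection matrices are illustrative assumptions, not taken from any particular model or library.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Minimal single-head self-attention over a sequence of embeddings.

    x:             (seq_len, d_model) input embeddings
    w_q, w_k, w_v: (d_model, d_k) projection matrices for queries, keys, values
    """
    q = x @ w_q                      # queries: what each position is looking for
    k = x @ w_k                      # keys: what each position offers
    v = x @ w_v                      # values: the content to be aggregated
    d_k = q.shape[-1]

    scores = q @ k.T / np.sqrt(d_k)  # pairwise similarity between all positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row

    return weights @ v               # weighted sum of values for every position

# Toy usage: 4 tokens with 8-dimensional embeddings, projected to 8 dimensions.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (4, 8): one contextualized vector per token
```

Because each row of the score matrix depends only on the fixed projections, all positions can be computed in a single matrix multiplication, which is the source of the parallelism noted above.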