# Attention Sink in Transformers: A Survey of Utilization, Interpretation, and Mitigation

- Source: HuggingFace Daily Papers (trending community papers)
- Published: 2026-04-11 08:00
- AIHOT link: https://aihot.virxact.com/items/cmnygovrm003psl137s7rpqa0
- Original paper: https://arxiv.org/abs/2604.10098

## AI Summary

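The Attention Sink (AS) phenomenon in Transformer architectures causes attention to concentrate excessively on a small number of uninformative tokens. This undermines model interpretability, disrupts training and inference dynamics, and exacerbates hallucination. The survey is the first to systematically consolidate AS-related research, organizing the state of the field along three dimensions: fundamental utilization, mechanistic interpretation, and strategic mitigation. It clarifies key concepts, traces how the field has evolved, and offers researchers and practitioners a reference framework for managing AS. The accompanying paper list is open-sourced on GitHub.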

## Main Text

As the foundational architecture of modern machine learning, Transformers have driven remarkable progress across diverse AI domains. Despite their transformative impact, a persistent challenge across various Transformers is Attention Sink (AS), in which a disproportionate amount of attention is focused on a small subset of specific yet uninformative tokens. AS complicates interpretability, significantly affects training and inference dynamics, and exacerbates issues such as hallucination. In recent years, substantial research has been dedicated to understanding and harnessing AS. However, a comprehensive survey that systematically consolidates AS-related research and offers guidance for future advancements remains lacking. To address this gap, we present the first survey on AS, structured around three key dimensions that define the current research landscape: Fundamental Utilization, Mechanistic Interpretation, and Strategic Mitigation. Our work makes a pivotal contribution by clarifying key concepts and guiding researchers through the evolution and trends of the field. We envision this survey as a definitive resource, empowering researchers and practitioners to effectively manage AS within the current Transformer paradigm, while simultaneously inspiring innovative advancements for the next generation of Transformers. The paper list of this work is available at https://github.com/ZunhaiSu/Awesome-Attention-Sink.
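To make "a disproportionate amount of attention is focused on a small subset of tokens" concrete, the following is a minimal sketch, assuming a single causal attention head whose raw query-key logits are available as a square matrix. The helper names (`causal_softmax`, `first_token_share`, `toy_scores`), the choice of the first token as the sink, and the extra logit of 4.0 are all hypothetical illustrations; this is one simple diagnostic, not the survey's definition or measurement method.

```python
import math
from typing import List


def causal_softmax(scores: List[List[float]]) -> List[List[float]]:
    """Turn raw query-key logits into causal attention weights.

    Query i may only attend to keys 0..i; masked positions get weight 0.
    """
    weights = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]
        m = max(visible)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in visible]
        total = sum(exps)
        weights.append([e / total for e in exps] + [0.0] * (len(row) - i - 1))
    return weights


def first_token_share(weights: List[List[float]]) -> float:
    """Mean attention mass that queries 1..n-1 place on key 0.

    Query 0 is skipped because, under a causal mask, it can only attend to itself.
    """
    shares = [row[0] for row in weights[1:]]
    return sum(shares) / len(shares)


def toy_scores(n: int, sink_logit: float) -> List[List[float]]:
    """Hypothetical logits: every query gives key 0 an extra `sink_logit`."""
    return [[sink_logit if j == 0 else 0.0 for j in range(n)] for _ in range(n)]


if __name__ == "__main__":
    uniform = first_token_share(causal_softmax(toy_scores(6, 0.0)))
    sink = first_token_share(causal_softmax(toy_scores(6, 4.0)))
    print(f"uniform logits: {uniform:.3f}")  # 0.290
    print(f"sink logit 4.0: {sink:.3f}")     # 0.948
```

With uniform logits, the first token receives only its fair share (1/(i+1) for query i). A modest logit bonus on that one token is enough for it to absorb roughly 95% of the attention mass, even though, in this toy setup, it carries no special information. In trained models such sinks emerge from learning rather than from a hand-set constant.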