RAT+: Augmenting Attention with Exponentially Decaying Memory to Improve Query-Aware KV Sparsity

2026-05-27 08:00 · 37 days ago

AI summary

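RAT+ introduces attention augmented with an exponentially decaying memory, which lets the model support flexible dilated attention at inference time. When RAT+ is combined with query-aware sparse inference methods such as Quest, MoBA, and SnapKV, it is consistently more accurate than standard attention across sparse budgets on eight needle-in-a-haystack tasks. The results are validated on the checkpoints released with RAT+ and on OLMo2-7B after continued pretraining on an additional 10B tokens. The paper closes with two hypotheses for why the memory module helps.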

Original text · untranslated

Efficient inference is critical for long-context language models, where attention computation and KV-cache access dominate the cost. Recent work, RAT+, introduces a recurrence-augmented attention backbone that enables flexible dilated attention at inference time. In this paper, we investigate whether its exponentially decaying memory can also improve existing query-aware sparse inference methods. Using representative methods including Quest, MoBA, and SnapKV, we show that RAT+ consistently improves accuracy over standard attention across sparse budgets on eight needle-in-a-haystack tasks. We validate these gains both on the released checkpoints from the RAT+ paper and on OLMo2-7B, which we continue pretraining with the added memory module for 10B tokens. Finally, we propose two hypotheses explaining why this memory module benefits query-aware sparse inference and design targeted experiments to support them.
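As a rough illustration of what "exponentially decaying memory" means in general, the sketch below keeps a running state that is scaled down by a fixed factor at every step before the new input is mixed in. This is a generic scalar recurrence, not the RAT+ formulation, which the abstract does not spell out; the function names and the `decay` parameter are assumptions made for this example.

```python
from typing import List


def decayed_memory(xs: List[float], decay: float) -> List[float]:
    """Generic exponentially decaying memory (illustrative, not RAT+ itself).

    Computes m_t = decay * m_{t-1} + (1 - decay) * x_t with m_0 = 0
    and returns the memory value after each input.
    """
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must be in [0, 1)")
    memory = 0.0
    history = []
    for x in xs:
        # Older content shrinks by `decay`; the new input enters with weight (1 - decay).
        memory = decay * memory + (1.0 - decay) * x
        history.append(memory)
    return history


def contribution_weight(decay: float, age: int) -> float:
    """Weight an input seen `age` steps ago still carries in the current memory."""
    return (1.0 - decay) * decay ** age


if __name__ == "__main__":
    # A single impulse followed by zeros: its trace halves at every step.
    print(decayed_memory([1.0, 0.0, 0.0], decay=0.5))
```

Under this recurrence, an input's influence falls off geometrically with its age, so the state summarizes recent context compactly while older tokens fade without being dropped outright.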

HuggingFace Daily Papers (community trending papers)

Read the original · arxiv.org
