# LongAttnComp： 面向长上下文推理的跨模型族上下文压缩

- 来源：HuggingFace Daily Papers（社区热门论文）
- 发布时间：2026-05-31 08:00
- AIHOT 分数：61
- AIHOT 链接：https://aihot.virxact.com/items/cmpw5g4y2007ksl1ugdf7qgvv
- 原文链接：https://arxiv.org/abs/2606.01336

## AI 摘要

LongAttnComp 是一种针对长上下文适配的方法，它通过微调一个轻量级跨注意力评分层，并引入了 token 级分块、token 预算 top-p 算法、位置重排和格式无关查询解析器。该方法采用两阶段微调：第一阶段基于 NIAH 风格数据构建通用检索基础，第二阶段通过多跳和推理数据进行扩展。实验表明，在 InfiniteBench Code-Debug 上，LongAttnComp 能够匹配或超越全上下文精度，并显著优于无训练基线。在 LongBench v2 上，两阶段配方在多文档推理任务上有效缩小了性能差距，同时保持了代码调试性能，并可跨三个模型族的四个目标模型进行转移。

## 正文

As real-world applications increasingly require processing inputs of 100k+ tokens, the gap between context length and inference efficiency has become a critical bottleneck. Context compression offers a way to reduce prefill costs while preserving task accuracy. However, existing training-free attention-based methods leave substantial gaps in demanding long-context tasks such as code reasoning. We present LongAttnComp, a long-context adaptation of AttnComp that fine-tunes a lightweight cross-attention scoring layer and introduces tokenlevel chunking, a token-budget top-p algorithm, positional reordering, and a formatagnostic query parser. We further design a two-stage fine-tuning recipe for the compressor: Stage 1 builds a general retrieval foundation from NIAH-style data, and Stage 2 extends it with multi-hop and reasoning data for broader long-context task coverage. On InfiniteBench Code-Debug, LongAttnComp matches or exceeds full-context accuracy, substantially outperforms training-free baselines, and transfers across four target models from three families. On LongBench v2, the two-stage recipe largely closes the Stage 1 gap on multi-document reasoning while preserving Code-Debug performance.