A Self-Evolving Framework for Efficient Terminal Agents Based on Observation-Context Compression
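Original paper: arxiv.org
In long-horizon terminal tasks, redundant environment feedback makes token cost grow quadratically. To address this, the paper proposes TACO, a plug-and-play, self-evolving framework that automatically discovers and refines compression rules from interaction trajectories, enabling task-aware context compression. Across six benchmarks including TerminalBench, the framework with MiniMax-2.5 reduces token overhead by about 10% while improving performance on most benchmarks. On TerminalBench, it yields gains of 1%-4% for strong agentic models and a further 2%-3% accuracy improvement under the same token budget.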
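The quadratic growth in token cost can be made concrete with a back-of-the-envelope count. This is a simplification of ours, not the paper's cost model. Assume that step t appends an observation of o_t tokens, that the full history is re-sent at every step, and that system-prompt and action tokens are ignored. Let r (0 < r ≤ 1) be a hypothetical per-observation compression ratio:

```latex
\[
C(T) = \sum_{t=1}^{T} \sum_{i=1}^{t} o_i
\;\overset{o_i \equiv o}{=}\; o \sum_{t=1}^{T} t
= \frac{o\,T(T+1)}{2} = \Theta(T^2),
\qquad
C_r(T) = r\,\frac{o\,T(T+1)}{2}.
\]
```

Compression leaves the growth quadratic in T but shrinks the constant. The absolute saving, (1-r)·o·T(T+1)/2, therefore itself grows quadratically with the horizon. This is why trimming each retained observation matters most for long-horizon tasks.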
As model capabilities advance, research has increasingly shifted toward long-horizon, multi-turn terminal-centric agentic tasks, where raw environment feedback is often preserved in the interaction history to support future decisions. However, repeatedly retaining such feedback introduces substantial redundancy and causes cumulative token cost to grow quadratically with the number of steps, hindering long-horizon reasoning. Although observation compression can mitigate this issue, the heterogeneity of terminal environments makes heuristic-based or fixed-prompt methods difficult to generalize. We propose TACO, a plug-and-play, self-evolving Terminal Agent Compression framework that automatically discovers and refines compression rules from interaction trajectories for existing terminal agents. Experiments on TerminalBench (TB 1.0 and TB 2.0) and four additional terminal-related benchmarks (i.e., SWE-Bench Lite, CompileBench, DevEval, and CRUST-Bench) show that TACO consistently improves performance across mainstream agent frameworks and strong backbone models. With MiniMax-2.5, it improves performance on most benchmarks while reducing token overhead by around 10%. On TerminalBench, it brings consistent gains of 1%-4% across strong agentic models, and further improves accuracy by around 2%-3% under the same token budget. These results demonstrate the effectiveness and generalization of self-evolving, task-aware compression for terminal agents.
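To make "compression rules" concrete, here is a minimal sketch of rule-based observation compression. It assumes a rule is simply a function from raw terminal output to shorter text. The rules (`drop_progress`, `collapse_repeats`, `head_tail`), their regex patterns, and the whitespace token proxy are all hypothetical and hand-written. They are not the rules TACO discovers, and the sketch does not implement TACO's self-evolving search. `cumulative_context_tokens` replays the abstract's quadratic-cost argument on actual strings.

```python
import re
from typing import Callable, List

# Hypothetical: a compression rule is any function from raw observation text to
# (usually shorter) text. These hand-written rules are illustrations, not TACO's.
Rule = Callable[[str], str]

PROGRESS_RE = re.compile(r"^\s*\d{1,3}%")  # assumed shape of progress-bar lines
KEEP_RE = re.compile(r"error|fail|traceback", re.IGNORECASE)  # assumed task-critical lines


def drop_progress(obs: str) -> str:
    """Drop lines that start with a percentage, e.g. '42% |####      |'."""
    return "\n".join(line for line in obs.splitlines() if not PROGRESS_RE.match(line))


def collapse_repeats(obs: str) -> str:
    """Collapse runs of identical consecutive lines into one line plus a count."""
    lines, out, i = obs.splitlines(), [], 0
    while i < len(lines):
        j = i
        while j < len(lines) and lines[j] == lines[i]:
            j += 1
        run = j - i
        out.append(lines[i] if run == 1 else f"{lines[i]}  [repeated {run}x]")
        i = j
    return "\n".join(out)


def head_tail(max_lines: int = 20) -> Rule:
    """Build a rule that keeps the first/last lines of long output, plus any
    middle line matching KEEP_RE, and replaces the rest with an omission marker."""
    half = max(1, max_lines // 2)

    def rule(obs: str) -> str:
        lines = obs.splitlines()
        if len(lines) <= 2 * half:
            return obs
        head, middle, tail = lines[:half], lines[half:-half], lines[-half:]
        kept = [line for line in middle if KEEP_RE.search(line)]
        marker = f"[... {len(middle) - len(kept)} lines omitted ...]"
        return "\n".join(head + kept + [marker] + tail)

    return rule


def compress(obs: str, rules: List[Rule]) -> str:
    """Apply the rules left to right."""
    for rule in rules:
        obs = rule(obs)
    return obs


def n_tokens(text: str) -> int:
    """Crude token proxy (whitespace words); a real agent would use its model's tokenizer."""
    return len(text.split())


def cumulative_context_tokens(observations: List[str]) -> int:
    """Total observation tokens re-read over T steps when every past observation
    stays in the history: step t re-sends observations 1..t, so growth is ~quadratic."""
    total, history = 0, 0
    for obs in observations:
        history += n_tokens(obs)
        total += history
    return total


if __name__ == "__main__":
    raw = "\n".join(
        ["$ pip install foo"]
        + [f"{p}% |{'#' * (p // 10)}|" for p in range(0, 101, 10)]
        + ["Collecting bar"] * 5
        + [f"compiling module_{i}.c" for i in range(40)]
        + ["error: undefined reference to `main'"]
        + [f"linking step {i}" for i in range(10)]
    )
    short = compress(raw, [drop_progress, collapse_repeats, head_tail(max_lines=10)])
    print(short)
    print("per-step tokens:", n_tokens(raw), "->", n_tokens(short))
    print("20-step cumulative:", cumulative_context_tokens([raw] * 20),
          "->", cumulative_context_tokens([short] * 20))
```

Keeping error-like lines inside `head_tail` is one crude, fixed way to make truncation "task-aware". The heterogeneity of terminal environments is exactly why the paper argues such fixed heuristics generalize poorly and instead learns rules from trajectories.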