# TokenPilot: A Cache-Efficient Context Management Framework for LLM Agents

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-15 08:00
- AIHOT score: 47
- AIHOT link: https://aihot.virxact.com/items/cmqg0v5ep02heslspaj2hexz4
- Original link: https://arxiv.org/abs/2606.17016

## AI Summary

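TokenPilot is a dual-granularity context management framework that targets the high inference cost LLM agents incur as context accumulates in long-conversation scenarios. At the global level, Ingestion-Aware Compaction stabilizes the prompt prefix and removes environmental noise; at the local level, Lifecycle-Aware Eviction monitors the residual utility of context segments and offloads them only once their task relevance has expired. On PinchBench and Claw-Eval, it cuts cost by 61% and 56% in isolated mode and by 61% and 87% in continuous mode, while keeping performance competitive. The framework has been integrated into LightMem2.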

## Main Text

As LLM agents are deployed in long-horizon sessions, context accumulation drives up inference costs. Existing approaches use text pruning or dynamic memory eviction to minimize token footprints; however, their unconstrained sequence mutations alter prompt layouts, introducing prefix mismatches and cache invalidation. This reveals a critical trade-off between text sparsity and prompt cache continuity. To address this, we present TokenPilot, a dual-granularity context management framework. Globally, Ingestion-Aware Compaction acts as a framework harness to stabilize prompt prefixes and eliminate open-world environmental noise at the ingestion gate. Locally, Lifecycle-Aware Eviction monitors the ongoing residual utility of context segments, enforcing a conservative batch-turn schedule to offload content segments only when task relevance expires. Experiments on PinchBench and Claw-Eval under both isolated and continuous modes demonstrate that TokenPilot reduces costs by 61% and 56%, respectively, in isolated mode, and by 61% and 87%, respectively, in continuous mode, while maintaining competitive performance compared to prior systems. TokenPilot has been integrated into LightMem2 at https://github.com/zjunlp/LightMem2.
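The trade-off between text sparsity and prompt cache continuity can be illustrated with a toy simulation. The sketch below is not TokenPilot's implementation: the names `shared_prefix_len`, `evict_every_turn`, `BatchEvictor`, and `simulate` are hypothetical, context is modeled as a list of opaque segments, and the number of leading segments shared with the previous prompt stands in for cache-reusable tokens. It compares evicting an expired segment on every turn against deferring evictions to a fixed batch-turn schedule, under the assumption that each segment expires two turns after it is added.

```python
from typing import Callable, List, Set


def shared_prefix_len(a: List[str], b: List[str]) -> int:
    """Count leading segments identical in both prompts (proxy for cache hits)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def evict_every_turn(context: List[str], expired: Set[str]) -> List[str]:
    """Eager policy: drop expired segments immediately, mutating the layout each turn."""
    return [s for s in context if s not in expired]


class BatchEvictor:
    """Deferred policy (illustrative): queue expired segments, flush every `batch_turns` turns."""

    def __init__(self, batch_turns: int) -> None:
        self.batch_turns = batch_turns
        self.pending: Set[str] = set()
        self.turn = 0

    def step(self, context: List[str], expired: Set[str]) -> List[str]:
        self.turn += 1
        self.pending |= set(expired)
        if self.turn % self.batch_turns != 0:
            return list(context)  # layout untouched, so the prefix stays reusable
        kept = [s for s in context if s not in self.pending]
        self.pending.clear()
        return kept


def simulate(evict: Callable[[List[str], Set[str]], List[str]], turns: int = 6) -> int:
    """Return the total number of prefix segments reused across consecutive prompts."""
    context = ["system"]
    prev_prompt: List[str] = []
    reused = 0
    for i in range(turns):
        context = context + [f"t{i}"]
        # Assumption: a segment loses task relevance two turns after it was added.
        expired = {f"t{i - 2}"} if i >= 2 else set()
        context = evict(context, expired)
        reused += shared_prefix_len(prev_prompt, context)
        prev_prompt = context
    return reused


eager_reuse = simulate(evict_every_turn)
batched_reuse = simulate(BatchEvictor(batch_turns=3).step)
print(eager_reuse, batched_reuse)
```

In this toy setting the eager policy keeps each prompt shorter but breaks the prefix right after the system segment on almost every turn, while the batched policy carries expired segments a little longer and reuses more of the previous prompt between flushes. Real cost depends on provider pricing and tokenization, which this sketch does not model.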