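TokenPilot proposes a cache-efficient context management method for LLM agents. Using two mechanisms, ingestion-aware compaction and lifecycle-aware eviction, it achieves a 61–87% cost reduction on the PinchBench and Claw-Eval benchmarks while maintaining competitive scores. Traditional methods usually truncate or summarize the history directly, which tends to shift text and break the prompt cache. TokenPilot cleans tool results before they enter the context, keeping the early prompt layout stable; it also delays deleting old task history, because completed work may still help later tasks that reference the same files or goals.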
TokenPilot reduces LLM agent costs via ingestion-aware compaction and lifecycle-aware eviction.
Achieves 61-87% cost reduction on PinchBench and Claw-Eval with competitive scores.
Argues that cheaper AI agents need stable memory, not just shorter prompts.
Older methods usually truncate or summarize the history, but doing so can shift text within the prompt and break the prompt cache, the mechanism that reuses unchanged prompt text to save money.
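The cache-breaking effect can be illustrated with a toy sketch. It is not TokenPilot's code: the `render` serialization, the `reusable_prefix_len` helper, and the sample messages are all hypothetical, and it assumes a cache that can only reuse the leading part of a prompt that is identical to the previous request, measured here in characters rather than tokens.

```python
from os.path import commonprefix  # stdlib: longest common leading substring

def render(messages):
    """Join messages into one prompt string (toy serialization, an assumption)."""
    return "\n".join(messages)

def reusable_prefix_len(old_prompt, new_prompt):
    """Leading characters a prefix-matching cache could reuse (assumed model)."""
    return len(commonprefix([old_prompt, new_prompt]))

# Hypothetical agent history.
history = [
    "SYSTEM: you are an agent",
    "TOOL: ls -> a.txt b.txt",
    "USER: open a.txt",
]
before = render(history)
new_result = "TOOL: cat a.txt -> hello"

# Strategy A: append only, so all earlier text stays where it was.
appended = render(history + [new_result])

# Strategy B: drop the oldest tool result to save tokens, which shifts
# everything that came after it.
truncated = render([history[0]] + history[2:] + [new_result])

append_reuse = reusable_prefix_len(before, appended)
truncate_reuse = reusable_prefix_len(before, truncated)

print(f"append-only reuse: {append_reuse} of {len(before)} characters")
print(f"truncation reuse:  {truncate_reuse} of {len(before)} characters")
```

Under these assumptions, appending keeps the whole previous prompt reusable, while removing one early message leaves only the system line reusable, even though the truncated prompt is shorter.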
TokenPilot addresses both problems at once: it cleans new tool results before they enter the context, and it keeps the early prompt layout stable across tasks.
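A minimal sketch of the "clean before ingestion" idea follows. The summary does not say what TokenPilot's cleaning actually does, so the rules here (stripping ANSI color codes, dropping blank lines, capping the line count) are assumptions, and `clean_tool_result` and `AppendOnlyContext` are hypothetical names chosen for illustration.

```python
import re

# Matches ANSI color/style escape sequences such as "\x1b[32m".
ANSI = re.compile(r"\x1b\[[0-9;]*m")

def clean_tool_result(raw, max_lines=5):
    """Shrink a tool result once, before it is stored (assumed cleaning rules)."""
    text = ANSI.sub("", raw)
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    if len(lines) > max_lines:
        omitted = len(lines) - max_lines
        lines = lines[:max_lines] + [f"[{omitted} more lines omitted]"]
    return "\n".join(lines)

class AppendOnlyContext:
    """Context that only grows at the end, so earlier text never moves."""

    def __init__(self, system_prompt):
        self._messages = [system_prompt]

    def add_tool_result(self, raw):
        # Cleaning happens at ingestion time; stored entries are never rewritten.
        self._messages.append(clean_tool_result(raw))

    def render(self):
        return "\n".join(self._messages)

ctx = AppendOnlyContext("SYSTEM: you are an agent")
snapshot = ctx.render()

# Hypothetical noisy tool output: color codes, blank lines, repeated lines.
ctx.add_tool_result("\x1b[32mok\x1b[0m\n\n\nline2\n" + "noise\n" * 10)
rendered = ctx.render()
print(rendered)
```

Because the result is compacted before it is appended, nothing already in the context has to be edited later, so the earlier prompt text stays byte-for-byte identical between requests.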