该论文指出,当AI智能体在多轮对话中重复使用相同文档和历史记录时,固定的上下文策略并非最优。研究提出了“效率前沿”框架,将上下文策略选择建模为一个成本与性能的平衡问题。通过引入重用参数N进行扫描,可以识别出检索、压缩或全上下文各自占据优势的交叉区域。在5000个HotpotQA实例上的测试表明,部署感知的选择能在保持相同性能下减少约25%的有效token使用量,而摊销内存压缩在高性能设置下比全上下文提示的运行成本便宜超过50%。
// The Efficiency Frontier //
Cool paper on context management.
As agents reuse the same documents and histories across many turns, the cheapest context strategy is not fixed. This work describes a principled rule for picking one per deployment instead of defaulting to whatever topped a benchmark in isolation.
Retrieval and compression methods are almost always benchmarked on accuracy and cost separately, so you never learn when one actually beats another under real load.
The Efficiency Frontier models context strategy selection as a single cost-performance problem, with a log-utility term for diminishing returns from extra context and a reuse parameter N that amortizes preprocessing across repeated queries.