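AGENTCL proposes evaluating whether AI agents truly learn from experience rather than merely accumulating information. It builds compositional task streams, in which earlier tasks contain code snippets, research evidence, or workflows that later tasks can reuse, and compares them with naive task streams that have no fixed reuse cue. Key finding: current memory methods can reuse past experience when the connection between tasks is obvious, but they still struggle to avoid confusion when tasks differ substantially. The paper aims to provide a clearer evaluation standard for continual learning in agents.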
Most AI agents fail not because they lack memory, but because they remember badly.
AGENTCL asks a simple question: does an AI agent really learn from experience, or merely carry clutter forward?
Today's agents can spend enormous effort solving one task, then enter the next one almost as if nothing happened.
AGENTCL says AI agents need better tests for whether their memory actually helps them learn across tasks.
The paper's main idea is to build task streams where earlier tasks clearly contain pieces that later tasks can reuse, such as a small coding function, evidence for a research question, or a useful workflow.
It compares these carefully constructed "compositional" streams with ordinary "naive" streams, where tasks come from the same area but have no guaranteed reuse link.
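To make the distinction concrete, here is a minimal Python sketch of the two kinds of stream. It is not the paper's construction code: the `Task` class, its `provides` and `reuses` fields, and the helpers `reuse_links` and `is_compositional` are all hypothetical names chosen for illustration, and the example tasks are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical task record; the field names are assumptions, not the paper's schema."""
    name: str
    domain: str
    provides: frozenset = frozenset()  # reusable pieces this task produces
    reuses: frozenset = frozenset()    # pieces this task could take from earlier tasks


def reuse_links(stream):
    """List (earlier, later, piece) triples where a later task can reuse an earlier piece."""
    links = []
    for i, later in enumerate(stream):
        for earlier in stream[:i]:
            for piece in sorted(earlier.provides & later.reuses):
                links.append((earlier.name, later.name, piece))
    return links


def is_compositional(stream):
    """True if every reused piece was provided earlier and at least one task reuses something."""
    seen = set()
    has_link = False
    for task in stream:
        if not task.reuses <= seen:
            return False  # the task depends on a piece no earlier task produced
        has_link = has_link or bool(task.reuses)
        seen |= task.provides
    return has_link


# Compositional stream: the second task reuses a function written in the first.
compositional = [
    Task("parse_dates", "coding", provides=frozenset({"parse_date"})),
    Task("build_calendar", "coding", reuses=frozenset({"parse_date"})),
]

# Naive stream: same domain, but nothing guarantees a reuse link.
naive = [
    Task("parse_dates", "coding", provides=frozenset({"parse_date"})),
    Task("sort_matrix", "coding"),
]
```

Under these assumptions, `is_compositional(compositional)` holds and `is_compositional(naive)` does not, even though both streams stay within the same domain. That is the contrast the comparison relies on: in the naive stream, any memory benefit cannot be traced to a known reuse link.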