Mem-π: Adaptive Memory through Learning When and What to Generate

2026-05-20 08:00 · 44 days ago

AI Summary

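Mem-π is an adaptive memory framework for large language model agents. It uses a dedicated model to generate guidance on demand rather than retrieving static information from an external memory store. The framework is trained with a decision-content decoupled reinforcement learning method, so the model decides for itself whether to generate guidance and what guidance to generate. Across diverse agentic benchmarks, including web navigation and terminal-based tool use, Mem-π consistently outperforms retrieval-based methods and existing reinforcement-learning memory approaches, with a relative improvement of over 30% on web navigation tasks.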

Original abstract · untranslated

We present Mem-π, a framework for adaptive memory in large language model (LLM) agents, where useful guidance is generated on demand rather than retrieved from external memory stores. Existing memory-augmented agents typically rely on similarity-based retrieval from episodic memory banks or skill libraries, returning static entries that often misalign with the current context. In contrast, Mem-π uses a dedicated language or vision-language model with its own parameters, separate from the downstream agent, to generate context-specific guidance for complex tasks. Conditioned on the current agent context, the model jointly decides when to produce guidance and what guidance to produce. We train it with a decision-content decoupled reinforcement learning (RL) objective, enabling it to abstain when generation would not help and otherwise produce concise, useful guidance. Across diverse agentic benchmarks spanning web navigation, terminal-based tool use, and text-based embodied interaction, Mem-π consistently outperforms retrieval-based and prior RL-optimized memory baselines, achieving over 30% relative improvement on web navigation tasks.
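The control flow described in the abstract (a separate memory model that first decides whether to speak, then what to say) can be illustrated with a minimal sketch. This is not the authors' implementation: the `ABSTAIN` sentinel, the `MemoryOutput` type, the function names, and the toy rule-based `toy_memory_model` are all hypothetical stand-ins for a trained language or vision-language model, and the RL training objective is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical sentinel; the source does not say how abstention is encoded.
ABSTAIN = "<abstain>"


@dataclass
class MemoryOutput:
    guidance: Optional[str]  # None means the memory model abstained


def query_memory(memory_model: Callable[[str], str], context: str) -> MemoryOutput:
    """Ask the separate memory model for guidance; map the sentinel to None."""
    raw = memory_model(context).strip()
    if not raw or raw == ABSTAIN:
        return MemoryOutput(guidance=None)
    return MemoryOutput(guidance=raw)


def build_agent_prompt(context: str, out: MemoryOutput) -> str:
    """Prepend guidance only when the memory model chose to produce some."""
    if out.guidance is None:
        return context
    return f"Guidance: {out.guidance}\n\n{context}"


def toy_memory_model(context: str) -> str:
    # Assumed stand-in for a trained model: offers a hint only for checkout tasks.
    if "checkout" in context:
        return "Open the cart page before looking for the payment form."
    return ABSTAIN
```

In this sketch the downstream agent never sees a retrieved entry: it receives either the unmodified context or the context prefixed with freshly generated guidance, which mirrors the "when" and "what" split at inference time.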

HuggingFace Daily Papers (community trending papers)


Read the original · arxiv.org
