PEAM: Parametric Embodied Agent Memory via Contrastive Experience Internalization
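Source: arxiv.org. PEAM is an agent memory framework developed in Minecraft that turns memory from inference-time retrieval into parametric skills internalized through experience. It combines a slow large language model for open-ended reasoning with a parametric module for fast skill execution. The fast module uses a multimodal Mixture-of-Experts LoRA architecture with per-category physically isolated adapters, which enables parameter-level continual learning without catastrophic forgetting. The framework treats failure as a primary training signal: it learns from failure-correction trajectory pairs through a joint behavioral-cloning and contrastive objective, so the agent learns not only what succeeds but also how a correction differs from the failure it replaces. To control consolidation, PEAM introduces a parameterization-worthiness score and a scale-free self-triggered consolidation mechanism, which let the agent evolve on its own and decide when to consolidate without task-specific thresholds. Experiments show that the framework improves long-horizon task performance, mitigates skill forgetting, and improves the efficiency of parametric memory relative to retrieval-based methods.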
We present PEAM, a Parametric Embodied Agent Memory framework in Minecraft that transforms agent memory from inference-time retrieval into parameter-resident skills internalized through experience. PEAM pairs a slow deliberative LLM for open-ended reasoning with a fast parametric module for reflexive execution of consolidated skills. The fast module is a multimodal Mixture-of-Experts LoRA architecture with per-category physically isolated adapters, enabling parameter-level continual learning without catastrophic forgetting. We treat failure as a first-class training signal: failure-correction trajectory pairs are internalized through a joint behavioral-cloning and contrastive objective, so the agent learns not only what succeeds but also how corrected actions differ from failed ones. To govern consolidation, PEAM introduces two mechanisms. A parameterization-worthiness score decides which experience should be internalized, and a scale-free self-triggered consolidation mechanism decides when to internalize, without task-specific hand-tuned thresholds. Because the trigger transfers across task distributions without re-tuning, the agent is self-evolving. Experiments in Minecraft show that PEAM improves long-horizon task performance, mitigates forgetting on previously consolidated skills, and improves parametric-versus-retrieval efficiency over retrieval-based embodied agents and parametric memory variants.
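To make the joint behavioral-cloning and contrastive objective concrete, the sketch below computes one plausible loss for a single failure-correction pair over a discrete action space. The abstract does not give the actual formulation, so everything here is an assumption made for illustration: the function names `log_softmax` and `joint_loss`, the pairwise logistic form of the contrastive term, and the `contrast_weight` parameter are all hypothetical and should not be read as PEAM's implementation.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of action logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def softplus(x):
    """Stable log(1 + exp(x)), used as -log(sigmoid(-x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def joint_loss(logits, corrected_action, failed_action, contrast_weight=0.5):
    """Hypothetical joint objective for one failure-correction pair.

    Assumed form (not taken from the paper):
      bc       = -log p(corrected_action)        # behavioral cloning
      contrast = -log sigmoid(log p(corrected) - log p(failed))
      loss     = bc + contrast_weight * contrast
    """
    logp = log_softmax(logits)
    bc = -logp[corrected_action]
    gap = logp[corrected_action] - logp[failed_action]
    contrast = softplus(-gap)
    return bc + contrast_weight * contrast


# A policy that prefers the corrected action (index 0) should incur a
# lower loss than one that prefers the failed action (index 1).
good = joint_loss([2.0, 0.0, 0.0], corrected_action=0, failed_action=1)
bad = joint_loss([0.0, 2.0, 0.0], corrected_action=0, failed_action=1)
print(good < bad)
```

Under these assumptions, the cloning term alone rewards the corrected action, while the contrastive term additionally penalizes probability mass left on the specific action that failed in the same state, which is one way to encode "how corrected actions differ from failed ones".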