# Memory Confabulation in Reflective Agents

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-05-31 08:00
- AIHOT score: 42
- AIHOT link: https://aihot.virxact.com/items/cmq687vre062ksl5imrrsuxme
- Original paper: https://arxiv.org/abs/2605.29463

## AI Summary

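The study finds that Reflexion-based agents rely on self-generated reflections as memory, yet fail systematically on ALFWorld and HumanEval tasks: the agents store confident but incorrect interpretations of the task and keep acting on them even though the environment resets to the correct task every time. The authors name this phenomenon "memory confabulation". They propose the Reflection Repetition Rate (RRR), a log-based metric for detecting repeated reliance on incorrect reflective content, and use it to identify 16 frozen environments in ALFWorld (0 of 121 reflections mention the correct target object) and 4 analogous cases in HumanEval. The mitigation replaces open-ended self-diagnosis with programmatic extraction of trajectory-level failure signals, raising correct target-object mention from 0% to 86%, lowering RRR from 0.64 to 0.10, and solving 3 of the 16 frozen environments.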

## Main Text

Reflexion-style agents rely on self-generated reflections as memory, implicitly assuming that agents can accurately diagnose their own failures. We show that this assumption can fail systematically: across ALFWorld and HumanEval, agents store confident but incorrect interpretations of the task and continue acting on them across trials, even though the environment resets to the correct task each time. We call this failure mode memory confabulation and introduce the Reflection Repetition Rate (RRR), a log-based metric that detects repeated reliance on incorrect reflective content. Using RRR, we identify 16 frozen environments in ALFWorld, where 0 of 121 reflections mention the correct target object, and 4 analogous cases in HumanEval. Our mitigation replaces open-ended self-diagnosis with programmatic extraction of trajectory-level failure signals, increasing correct object mention from 0% to 86%, reducing RRR from 0.64 to 0.10, and solving 3 of 16 frozen ALFWorld environments, suggesting that reflective memory can reinforce false beliefs rather than correct them.
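
To make the idea of a log-based repetition metric concrete, here is a minimal Python sketch. The abstract does not give the formula for RRR, so the definition used here is an assumption for illustration only: the fraction of reflections after the first that are near-duplicates (token-set Jaccard similarity at or above a threshold) of any earlier reflection in the same environment. The function names, the `threshold` parameter, and the substring-based target check are all hypothetical and are not taken from the paper.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens of a reflection (hypothetical tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity between two reflections."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta and not tb:
        return 1.0  # two empty reflections are treated as identical
    return len(ta & tb) / len(ta | tb)


def reflection_repetition_rate(reflections: list[str], threshold: float = 0.8) -> float:
    """ASSUMED definition, not the paper's: share of reflections (after the
    first) that nearly repeat some earlier reflection from the same environment."""
    if len(reflections) < 2:
        return 0.0
    repeats = 0
    for i in range(1, len(reflections)):
        earlier = reflections[:i]
        if any(jaccard(reflections[i], prev) >= threshold for prev in earlier):
            repeats += 1
    return repeats / (len(reflections) - 1)


def count_target_mentions(reflections: list[str], target: str) -> int:
    """Number of reflections that mention the target object (plain substring check)."""
    return sum(target.lower() in r.lower() for r in reflections)
```

Under this assumed definition, an environment whose logged reflections keep restating the same wrong goal would score near 1, and a count of zero target mentions alongside a high score would flag it as a candidate "frozen" environment. The actual thresholds and similarity measure behind the 0.64 and 0.10 figures in the abstract are not specified there.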
