# MEMPROBE: A Benchmark for Probing Long-Term Memory Agents via Hidden User-State Recovery

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-23 08:00
- AIHOT score: 45
- AIHOT link: https://aihot.virxact.com/items/cmqsr0g2805ixslfueyqppmzw
- Original link: https://arxiv.org/abs/2606.24595

## AI Summary

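MEMPROBE is a new benchmark for evaluating AI agents with long-term memory; it measures memory quality directly through hidden user-state recovery. The benchmark simulates 50 users in controlled tasks, each carrying 31 hidden dimensions (1,550 recovery targets in total). A memory-equipped agent assists the users with their tasks, after which the user state is reconstructed from the agent's memory under two access modes, full-store and top-k. Across 5 representative memory systems, task completion nearly saturates (even a memoryless baseline comes close), whereas category-balanced recovery is only about 0.6 and drops further under top-k retrieval. MEMPROBE is the first benchmark to study memory recovery directly, and it treats recovery as an objective that can be optimized.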

## Main Text

Long-term memory promises LLM agents that grow more capable across sessions, maintaining an accurate, evolving understanding of the user that interaction forms. In practice, however, this memory is evaluated mostly through downstream behavior, such as later answers, personalization quality, or task success, which tests that understanding only indirectly and leaves the memory artifact itself largely unaudited. We argue that long-term memory should instead be evaluated as an auditable post-interaction artifact: after ordinary assistance, what structured user state can be reconstructed from the memory the agent leaves behind? We instantiate this view in MEMPROBE, a benchmark in which a memory-equipped agent assists simulated users, each carrying a hidden, taxonomy-anchored user-state bank, across a trajectory of leak-controlled tasks, after which that bank is reconstructed from the agent's resulting memory under both full-store and top-k access. Built on synthetic ground truth for efficient, scalable measurement, MEMPROBE spans 50 simulated users with 31 hidden dimensions each (1,550 recovery targets) and tests 5 representative memory systems. Testing state-of-the-art memory agents, we find that successful assistance and recoverable memory behave as distinct capabilities. Task completion nearly saturates, even for a memoryless baseline, while category-balanced recovery stays moderate (about 0.6) and drops further under top-k retrieval. MEMPROBE is the first benchmark to study memory recovery directly, reconstructing the user state a system retains and scoring it against ground truth. We see recovery as a concrete objective for future memory agents to optimize, and MEMPROBE as a step toward an environment where agents are trained to remember their users, growing more faithful the longer they know them.
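The abstract reports a "category-balanced recovery" score without defining it. The sketch below shows one plausible reading, offered as an assumption and not as the paper's definition: compute the recovery rate within each category of hidden dimensions, then average those rates so that categories with many dimensions do not dominate. The function name `category_balanced_recovery`, the category labels, and the toy data are all hypothetical.

```python
from collections import defaultdict

# Target count stated in the abstract: 50 users x 31 hidden dimensions.
NUM_USERS = 50
DIMS_PER_USER = 31
TOTAL_TARGETS = NUM_USERS * DIMS_PER_USER  # 1550


def category_balanced_recovery(targets):
    """Macro-average recovery over categories (assumed definition).

    targets: iterable of (category, recovered) pairs, where `recovered`
    is True if the hidden value was correctly reconstructed from memory.
    Returns (balanced_score, per_category_rates).
    """
    per_category = defaultdict(list)
    for category, recovered in targets:
        per_category[category].append(1.0 if recovered else 0.0)
    if not per_category:
        raise ValueError("no recovery targets given")
    # Recovery rate inside each category.
    rates = {c: sum(v) / len(v) for c, v in per_category.items()}
    # Unweighted mean across categories: each category counts equally.
    balanced = sum(rates.values()) / len(rates)
    return balanced, rates


# Hypothetical toy data: two categories of unequal size.
toy = [
    ("preferences", True), ("preferences", True),
    ("preferences", True), ("preferences", False),
    ("constraints", True), ("constraints", False),
]
balanced, rates = category_balanced_recovery(toy)
# Pooled (micro) rate over all targets, for comparison.
micro = sum(1 for _, r in toy if r) / len(toy)
print(f"balanced={balanced:.3f} micro={micro:.3f} rates={rates}")
```

On this toy data the balanced score (0.625) is lower than the pooled rate (about 0.667), because the smaller category is recovered less well and carries equal weight. Under this reading, comparing full-store and top-k access would mean building `targets` twice, once per access mode, and comparing the two balanced scores.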