# SubtleMemory: A Benchmark for Fine-Grained Relational Memory Discrimination

- Source: HuggingFace Daily Papers (trending community papers)
- Published: 2026-06-04 08:00
- AIHOT score: 53
- AIHOT link: https://aihot.virxact.com/items/cmq4oaypg006zslt2p6miqpmv
- Original paper: https://arxiv.org/abs/2606.05761

## AI Summary

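SubtleMemory is a benchmark that evaluates how well long-running AI agents discriminate fine-grained relational memories. It constructs variants of relation-controlled latent semantic artifacts (with complementary, nuanced, or contradictory relations) and embeds them into realistic user-agent histories. The benchmark contains 1,522 evaluation instances built on 10 long histories and covers both user-related and non-user-related queries. The authors evaluate several standalone memory systems and Claw-style agents and find that current systems perform poorly. The study also introduces diagnostic protocols that reveal distinct capability profiles across the memory preservation, retrieval, and downstream reasoning stages.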
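The construction described above can be pictured with a small data model: one latent artifact, a set of variants that instantiate a single relation, and a placement step that scatters those variants across a long history. This is an illustrative sketch only. The names `Relation`, `VariantSet`, `Session`, and `embed`, as well as the even-spacing placement rule, are hypothetical and are not the paper's actual data format.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Relation(Enum):
    """Hypothetical labels for how the variants of one artifact relate."""
    COMPLEMENTARY = "complementary"  # variants add compatible details
    NUANCED = "nuanced"              # variants differ subtly by context
    CONTRADICTORY = "contradictory"  # a later variant conflicts with an earlier one


@dataclass
class VariantSet:
    """One latent artifact plus the variants that instantiate its relation."""
    artifact_id: str
    relation: Relation
    variants: list[str]


@dataclass
class Session:
    """One session of a (hypothetical) user-agent history."""
    turns: list[str] = field(default_factory=list)


def embed(history: list[Session], vs: VariantSet) -> list[int]:
    """Place each variant in a different, evenly spaced session so that a later
    query can only be answered by recovering the relation from distributed
    memories. The spacing rule is an assumption for illustration.
    Returns the indices of the sessions that received a variant."""
    n, k = len(history), len(vs.variants)
    if k > n:
        raise ValueError("need at least one session per variant")
    step = n // k
    positions = [i * step for i in range(k)]
    for pos, text in zip(positions, vs.variants):
        history[pos].turns.append(f"[{vs.artifact_id}] {text}")
    return positions


# Usage: two contradictory preferences land five sessions apart.
history = [Session() for _ in range(10)]
vs = VariantSet(
    "pref-coffee",
    Relation.CONTRADICTORY,
    ["User drinks coffee every morning.",
     "User has quit caffeine on doctor's advice."],
)
print(embed(history, vs))  # → [0, 5]
```

A query such as "What should I order for the user at breakfast?" would then require noticing that the later variant supersedes the earlier one, rather than recalling either memory in isolation.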

## Main Text

Persistent AI assistants, such as OpenClaw, accumulate large collections of related memories over long-term interactions. As these memories grow, they may reinforce one another, diverge across contexts, or directly conflict, making correct assistance depend on memory relations rather than isolated recall. Existing long-term memory benchmarks rarely probe how agents preserve and utilize such relations during downstream tasks.

To address this gap, we introduce SubtleMemory, a benchmark for fine-grained relational memory discrimination in long-running AI agents. SubtleMemory constructs relation-controlled latent semantic artifacts whose variants instantiate complementary, nuanced, or contradictory relations, and embeds them into realistic user-agent histories, requiring agents to recover distributed relational structures during later queries and instructions. The benchmark contains 1,522 evaluation instances over 10 long histories, grounded in 1,090 relation-controlled memory-variant sets and spanning user-related and non-user-related queries.

Evaluating six standalone memory systems, two Claw-style agents with native memory modules, and three Claw-style agents with plugin memory modules, we find that current systems remain weak on fine-grained relational memory discrimination. We further introduce diagnostic protocols that reveal distinct capability profiles across memory preservation, retrieval, and downstream reasoning stages.
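One way to read a stage-wise diagnosis is as failure attribution: for each incorrect answer, find the earliest stage at which a required memory variant was lost. The sketch below assumes that per-instance traces of stored and retrieved memory ids are available. `Trace`, `first_failure`, and `profile` are hypothetical names, and the paper's actual diagnostic protocols may measure these stages differently.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass

STAGES = ("preservation", "retrieval", "reasoning")


@dataclass
class Trace:
    """Hypothetical per-instance diagnostic record."""
    needed: set[str]      # ids of memory variants the query depends on
    stored: set[str]      # ids still present in the memory store
    retrieved: set[str]   # ids returned by retrieval for this query
    answer_correct: bool  # whether the downstream answer was judged correct


def first_failure(t: Trace) -> str | None:
    """Return None for a correct answer; otherwise blame the earliest stage
    at which some required memory variant went missing."""
    if t.answer_correct:
        return None
    if not t.needed <= t.stored:
        return "preservation"  # a required variant was never kept
    if not t.needed <= t.retrieved:
        return "retrieval"     # kept, but not surfaced for this query
    return "reasoning"         # all evidence present, answer still wrong


def profile(traces: list[Trace]) -> dict[str, float]:
    """Fraction of instances whose first failure is at each stage, plus accuracy."""
    if not traces:
        raise ValueError("no traces to profile")
    counts = Counter(first_failure(t) for t in traces)
    n = len(traces)
    out = {stage: counts[stage] / n for stage in STAGES}
    out["correct"] = counts[None] / n
    return out


# Usage: one instance of each outcome.
traces = [
    Trace({"m1", "m2"}, {"m1", "m2"}, {"m1", "m2"}, True),
    Trace({"m1", "m2"}, {"m1"}, {"m1"}, False),              # m2 never stored
    Trace({"m1", "m2"}, {"m1", "m2"}, {"m1"}, False),        # m2 not retrieved
    Trace({"m1", "m2"}, {"m1", "m2"}, {"m1", "m2"}, False),  # wrong despite evidence
]
print(profile(traces))
# → {'preservation': 0.25, 'retrieval': 0.25, 'reasoning': 0.25, 'correct': 0.25}
```

Attributing each error to a single earliest stage keeps the profile a partition of all instances, so two systems with the same accuracy can still be told apart by where their errors originate.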