GateMem：多主体共享记忆智能体的记忆治理基准

2026-06-17 08:00·16天前

AI 摘要

GateMem 是一个针对多主体共享记忆智能体的基准，联合评估长期多步请求的效用、上下文访问控制与主动遗忘。测试覆盖医疗、办公、教育和家庭四个领域，包含长篇幅多方对话、增量记忆注入、隐藏检查点与结构化判分。对多种基线和骨干模型的实验表明，没有方法能同时实现强效用、鲁棒访问控制和可靠遗忘。长上下文提示词治理分数最高但 token 成本极高；检索与外部记忆方法成本较低，却仍会泄露未经授权或已删除的信息。当前记忆智能体远未达到在共享机构中可靠部署的要求。

原文 · 未翻译

Memory benchmarks for LLM agents largely assume single-user settings, leaving shared assistants for hospitals, workplaces, campuses, and households understudied. In these deployments, multiple principals write to a common memory pool and query it under different roles, scopes, and relationships, so memory quality requires governance as well as recall. We introduce GateMem, a benchmark for multi-principal shared-memory agents. GateMem jointly evaluates utility for legitimate long-horizon requests with state updates, access control across contextual authorization boundaries, and agent-facing active forgetting after explicit deletion requests. It spans medical, office, education, and household domains, with long-form multi-party episodes, incremental memory injection, hidden checkpoints, structured judging, and leak-target annotations. Across diverse baselines and backbone models, no method simultaneously achieves strong utility, robust access control, and reliable forgetting. Long-context prompting often yields the best governance score at high token cost, while retrieval-based and external-memory methods reduce cost yet still leak unauthorized or deleted information. These results show current memory agents remain far from reliable shared institutional deployment.

HuggingFace Daily Papers（社区热门论文）

52导出 Markdown

GateMem：多主体共享记忆智能体的记忆治理基准

2026-06-17 08:00·16天前

阅读原文· arxiv.org

AI 摘要

原文 · 保持原样，未翻译