Contextual Belief Management for Large Language Models

2026-05-28 08:00 · 36 days ago

AI Summary

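The study argues that large language models must manage accumulating information during long-horizon interactions, a challenge it calls Contextual Belief Management (CBM). It introduces the BeliefTrack benchmark for exact evaluation, covering Rule Discovery and Circuit Diagnosis tasks. Vanilla LLMs show severe CBM failures: failing to preserve state, failing to update state, and failing to isolate noise. Explicit belief-tracking prompts yield limited gains, whereas reinforcement learning with belief-state rewards reduces failure rates by 70.9% on average. In addition, representation-level steering reduces failure rates by 46.1% across the two tasks. The code will be open-sourced on GitHub.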

Original text · untranslated

Long-horizon interactions require language models to manage accumulating information: when to update their state, when to preserve their state, and what to ignore. We study this challenge as Contextual Belief Management (CBM): maintaining a predicted belief state aligned with formal evidence while isolating task-irrelevant noise. To make CBM measurable, we introduce BeliefTrack, a closed-world benchmark spanning Rule Discovery and Circuit Diagnosis, where a finite belief space and symbolic verifiers enable exact turn-level evaluation. BeliefTrack diagnoses three failures: Failed Stay, Failed Update, and Failed Isolation. Across multiple LLMs, vanilla models exhibit severe CBM failures, while explicit belief-tracking prompts provide limited gains. In contrast, reinforcement learning with belief-state rewards reduces failure rates by 70.9% on average. Further probing reveals latent belief-state dynamics behind these failures, and representation-level steering reduces failure rates by 46.1% across two tasks. Code is coming soon at https://github.com/zjunlp/CBM.
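
The abstract names three failure types but does not define them, so the following is only a minimal sketch of what exact turn-level evaluation over a finite belief space could look like. Everything in it is an assumption made for illustration and is not taken from BeliefTrack: the `Turn` fields, the `is_noise` flag, the modeling of a belief state as a set of remaining hypotheses, and the rule that maps a mismatch to Failed Stay, Failed Update, or Failed Isolation.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

# Assumption: a belief state is the set of hypotheses still considered possible.
Belief = FrozenSet[str]


@dataclass(frozen=True)
class Turn:
    """One interaction turn, as a hypothetical evaluation record."""
    is_noise: bool        # assumed flag: the turn carries only task-irrelevant content
    gold_before: Belief   # verifier's belief state before the turn
    gold_after: Belief    # verifier's belief state after the turn
    predicted: Belief     # model's reported belief state after the turn


def classify_turn(turn: Turn) -> Optional[str]:
    """Return an assumed failure label, or None if the prediction matches the verifier."""
    if turn.predicted == turn.gold_after:
        return None
    if turn.is_noise:
        # The model changed its belief in response to irrelevant content.
        return "failed_isolation"
    if turn.gold_after == turn.gold_before:
        # The evidence required no change, but the model drifted.
        return "failed_stay"
    # The evidence required a change that the model did not make correctly.
    return "failed_update"


def failure_rate(turns: List[Turn]) -> float:
    """Fraction of turns whose predicted belief state disagrees with the verifier."""
    if not turns:
        return 0.0
    failures = sum(classify_turn(t) is not None for t in turns)
    return failures / len(turns)
```

Under these assumptions, a turn counts as correct only when the predicted set equals the verifier's set exactly, which is what a closed world with a finite belief space makes possible; a real benchmark may score partial agreement or attribute failures differently.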

HuggingFace Daily Papers (trending community papers)

Read the original · arxiv.org
