CalVerT:带校准验证器遥测的智能体在知识密集型任务中提升行动与学习
阅读原文· arxiv.org大语言模型智能体在知识密集型问答中常因无法判断答案是否不确定、无支撑或已完整,导致过早给出自信但无支撑的回答,或在证据足够时过度检索。CalVerT通过向智能体状态注入校准的自信心分数和基础验证器分数,提供更完整的状态空间视图。在四个QA基准上,无需训练即可提升F1,既触发对过度依赖参数知识的检索,又减少冗余检索。经强化学习训练后,添加CalVerT遥测的智能体表现优于同等训练的无遥测系统。
LLM agents in knowledge intensive question answering take retrieval and reasoning actions with incomplete knowledge about whether their current answer is uncertain, unsupported, or already complete. This produces two failure modes: committing to confident but unsupported answers, which hurts accuracy, and over-retrieving when the evidence in hand already suffices, resulting in wasted compute. To give agents a more complete picture of the state space they are operating in, we introduce calibrated verifier telemetry (CalVerT), which augments the agent's state with additional telemetry: a calibrated self-confidence score and a grounding verifier score. We show that CalVerT can improve agents in both training-free and training-based settings. On four QA benchmarks, we find that CalVerT raises F1 by triggering retrieval in cases where agents over-rely on parametric knowledge, while cutting redundant retrieval in cases where agents have sufficient context to answer. We show that CalVerT can augment existing QA frameworks without training. Moreover, CalVerT also improves trained systems: by simply augmenting an agent's state with telemetry, we observe improvements after reinforcement learning, as compared to an agent with identical training but no CalVerT telemetry.