# Quickest Detection of Hallucination Onset: Delay Bounds and a Learned CUSUM Statistic

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-10 14:10
- AIHOT score: 52
- AIHOT link: https://aihot.virxact.com/items/cmqfd53o400dpsl2aln533smm
- Original paper: https://arxiv.org/abs/2606.12476

## AI Summary

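The paper models the detection of hallucination onset as a quickest change detection problem. Under a first-order Markov model of the faithful/hallucinated state, validated on RAGTruth, Lorden's lower bound on detection delay is about 1.3 tokens at a false-alarm rate of 0.01. A causal recurrent labeler is equivalent to a CUSUM with a learned increment: at a matched false-alarm rate it detects in 11–13 tokens, versus 31 tokens for a linear per-token baseline. The advantage comes mainly from a better per-token score rather than from temporal accumulation. An information-rate optimality theorem of Donsker-Varadhan type shows that the learned score realizes only 1/4.5 of the divergence the features carry; recalibration cannot close this gap, and the remainder is a finite-horizon effect. Classification metrics conceal the delay structure; sequential analysis makes it measurable.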
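
For orientation, the textbook i.i.d. form of the CUSUM recursion and of Lorden's asymptotic delay bound can be written as below. The notation ($f_0$, $f_1$, $x_t$, $\ell_t$, $h$, $\gamma$) is assumed here for illustration and is not taken from the paper, whose bound is derived under a Markov model and may take a different form.

$$
W_t = \max\bigl(0,\; W_{t-1} + \ell_t\bigr), \qquad W_0 = 0, \qquad \ell_t = \log\frac{f_1(x_t)}{f_0(x_t)}, \qquad \tau_h = \inf\{\, t \ge 1 : W_t \ge h \,\}
$$

$$
\inf_{\tau \,:\, \mathbb{E}_\infty[\tau] \,\ge\, \gamma} \mathrm{WADD}(\tau) \;\sim\; \frac{\log \gamma}{D(f_1 \,\|\, f_0)} \qquad (\gamma \to \infty)
$$

Here $f_0$ and $f_1$ are the pre-change and post-change densities, $\mathbb{E}_\infty[\tau]$ is the mean time to a false alarm when no change ever occurs, $\mathrm{WADD}$ is Lorden's worst-case average detection delay, and $D(f_1 \,\|\, f_0)$ is the Kullback-Leibler divergence per observation. A CUSUM with a learned increment, in this reading, replaces the exact log-likelihood ratio $\ell_t$ with a score produced by a model.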

## Main Text

Token-level hallucination detectors are evaluated as classifiers, by AUC over all tokens, yet a streaming monitor is judged by its reaction time: the number of tokens that pass between the onset of a hallucination and the alarm. We formulate hallucination onset detection as a quickest change detection problem. A first-order Markov model of the latent faithful/hallucinated state, validated on RAGTruth, places the task inside classical change-point theory and yields Lorden's lower bound on detection delay: about 1.3 tokens at a false-alarm rate of 0.01. We then show that a causal recurrent labeler acts as a CUSUM with a learned increment; at a matched false-alarm rate it detects in 11-13 tokens, against 31 for a linear per-token baseline, and a controlled decomposition attributes most of this advantage to a better per-token score rather than to temporal accumulation. An information-rate optimality theorem of Donsker-Varadhan type explains the remaining order-of-magnitude gap: the learned score realizes only 1/4.5 of the divergence the features carry, a deficit that recalibration cannot remove, with the remainder a finite-horizon effect. Classification metrics conceal this delay structure; sequential analysis makes it measurable.
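
The notion of detection delay used above can be made concrete with a minimal CUSUM sketch. This is not the paper's method: the paper's increment is learned by a causal recurrent labeler, whereas this sketch assumes a hypothetical per-token scalar score that is Gaussian with mean `mu0` before onset and `mu1` after it, and uses the exact log-likelihood ratio as the increment. The function names, the threshold, and all parameter values are illustrative assumptions.

```python
import random


def cusum_alarm_time(increments, threshold):
    """Return the 0-based index of the first token at which the CUSUM
    statistic reaches `threshold`, or None if it never does."""
    w = 0.0
    for t, inc in enumerate(increments):
        # Accumulate evidence for "hallucinated", resetting at zero.
        w = max(0.0, w + inc)
        if w >= threshold:
            return t
    return None


def gaussian_llr(x, mu0, mu1, sigma):
    """Log-likelihood ratio log f1(x)/f0(x) for N(mu1, sigma^2) against
    N(mu0, sigma^2). Stands in for a learned increment (assumption)."""
    return (mu1 - mu0) * (x - (mu0 + mu1) / 2.0) / sigma ** 2


def simulate_delay(onset, length, mu0=0.0, mu1=1.0, sigma=1.0,
                   threshold=4.0, seed=0):
    """Simulate one token stream whose score distribution changes at index
    `onset` and return (alarm index - onset). A negative value means a
    false alarm before onset; None means no alarm was raised."""
    rng = random.Random(seed)
    scores = [rng.gauss(mu0 if t < onset else mu1, sigma)
              for t in range(length)]
    increments = [gaussian_llr(x, mu0, mu1, sigma) for x in scores]
    alarm = cusum_alarm_time(increments, threshold)
    return None if alarm is None else alarm - onset


if __name__ == "__main__":
    # Hand-checkable case: evidence turns positive at index 2 and the
    # statistic goes 0, 0, 2, 4, so the alarm fires at index 3 (delay 1).
    print(cusum_alarm_time([-1.0, -1.0, 2.0, 2.0, 2.0], 4.0))  # → 3
    print(simulate_delay(onset=50, length=400))
```

Raising `threshold` lowers the false-alarm rate at the cost of a longer delay, which is why delays are only comparable between detectors at a matched false-alarm rate, as in the comparison above.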