The Illusion of Certainty: Decoupling Capability and Calibration in On-Policy Distillation

2026-04-18 08:00 · 76 days ago

AI Summary

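The study finds that on-policy distillation (OPD), while improving task accuracy, systematically makes models overconfident, a pattern the authors call a "Scaling Law of Miscalibration." The problem stems from a mismatch between the privileged context the teacher model sees during training and the information available at deployment. To address it, the authors propose the CaOPD framework, which estimates empirical confidence from model rollouts and replaces the self-reported confidence with this student-grounded target before distillation. Experiments show that CaOPD achieves Pareto-optimal calibration while keeping capability competitive, and that it generalizes robustly in out-of-distribution and continual-learning settings.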

Original text · not translated

On-policy distillation (OPD) is an increasingly important paradigm for post-training language models. However, we identify a pervasive Scaling Law of Miscalibration: while OPD effectively improves task accuracy, it systematically traps models in severe overconfidence. We trace this failure to an information mismatch: teacher supervision is formed under privileged context available during training, whereas the deployed model must report confidence using only deployment-time information. We formalize this perspective theoretically, showing that teacher-conditioned success is generally not a valid target for deployment-time confidence and that helpful privileged context induces entropy collapse and a systematic optimism bias. To address this, we propose a calibration-aware OPD framework, CaOPD, that estimates empirical confidence from model rollouts, replaces self-reported confidence with this student-grounded target, and distills the revised response through the same self-distillation pipeline. Experiments across various models and domains show that CaOPD achieves Pareto-optimal calibration while maintaining competitive capability, generalizing robustly under out-of-distribution and continual learning. Our findings highlight that capability distillation does not imply calibrated confidence, and that confidence should be treated as an essential objective in post-training. Code: https://github.com/SalesforceAIResearch/CaOPD
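The target-construction step described in the abstract (estimate empirical confidence from rollouts, then swap it in for the self-reported value) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `Confidence: <number>` response format, the function names, and the `is_correct` grader are all hypothetical assumptions, and the subsequent distillation step is omitted entirely.

```python
import re
from typing import Callable, List

# Assumed (hypothetical) response format: the text contains "Confidence: <number in [0, 1]>".
_CONF_PATTERN = re.compile(r"Confidence:\s*([0-9]*\.?[0-9]+)")


def empirical_confidence(outcomes: List[bool]) -> float:
    """Fraction of student rollouts that were judged correct."""
    if not outcomes:
        raise ValueError("need at least one rollout outcome")
    return sum(outcomes) / len(outcomes)


def replace_confidence(response: str, target: float) -> str:
    """Overwrite the first self-reported confidence value with the given target."""
    new_text, n_subs = _CONF_PATTERN.subn(f"Confidence: {target:.2f}", response, count=1)
    if n_subs == 0:
        raise ValueError("response has no 'Confidence:' field")
    return new_text


def build_revised_response(
    response: str,
    rollout_answers: List[str],
    is_correct: Callable[[str], bool],
) -> str:
    """Revise a response so its stated confidence matches the rollout success rate."""
    outcomes = [is_correct(answer) for answer in rollout_answers]
    return replace_confidence(response, empirical_confidence(outcomes))


if __name__ == "__main__":
    # Toy example: 2 of 4 sampled answers match the reference answer "4".
    revised = build_revised_response(
        "The answer is 4. Confidence: 0.95",
        ["4", "5", "4", "3"],
        lambda answer: answer == "4",
    )
    print(revised)
```

In this toy run the overconfident self-report of 0.95 is replaced by the observed success rate of the sampled answers. The point of the sketch is only that the confidence target comes from the student's own deployment-time behaviour, not from a teacher that sees privileged context.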

Source: HuggingFace Daily Papers (community trending papers)

Read the original: arxiv.org
