# Self-Evaluation Is Already There: Eliciting Base LLMs' Latent Judge-Calibration Ability with Very Little Data

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-03 08:00
- AIHOT score: 47
- AIHOT link: https://aihot.virxact.com/items/cmq687vre062gsl5itfjsn4yu
- Original paper: https://arxiv.org/abs/2606.05122

## AI Summary

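The study finds that, without any targeted training, a base large language model prompted with only a few examples can already predict an external judge's multi-attribute quality scores well above chance. Self-Evaluation Elicitation (SEE) draws out this ability in two phases: calibration-coupled reinforcement learning first improves the answer and predicts the judge, then masked distillation refines the prediction without altering the answer. Using only 160 examples (roughly 31x fewer than a reinforcement learning baseline), SEE improves held-out calibration on three benchmarks while preserving answer quality. The self-evaluation is concentrated in the model's own token distribution and remains stable on judges it was never trained against, suggesting that it captures a transferable notion of quality rather than a single judge's preference.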

## Main Text

Large language models are increasingly evaluated by other models, raising a natural question: can a model predict how a judge will score its own output? We find that the ability is largely present before any targeted training: prompted few-shot, a base model already predicts an external judge's multi-attribute quality scores on open-ended responses well above chance across three benchmarks. We introduce Self-Evaluation Elicitation (SEE), a method that surfaces this latent ability through a short cycle comprising a calibration-coupled reinforcement learning phase that improves the answer and predicts the judge, followed by a masked distillation phase that sharpens the prediction while leaving the answer untouched. From 160 unique examples, roughly 31x fewer than a reinforcement learning baseline, SEE improves held-out calibration across three benchmarks while preserving answer quality. The elicited self-evaluation is sharply localized within the model's own token distribution and stable across judges it was never trained against, indicating a transferable notion of quality rather than a single judge's preference. These results reframe judge-aligned self-evaluation as a problem of elicitation rather than acquisition.
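To make "predicts the judge's multi-attribute scores well above chance" concrete, here is a minimal sketch of one way such a comparison could be set up. It is not the paper's evaluation protocol: the attribute names, the 1-to-5 score scale, the toy data, the choice of mean absolute error, and the shuffled-pairing baseline are all assumptions made for illustration.

```python
import random
from statistics import mean


def mean_abs_error(predicted, judged):
    """Average absolute gap between self-predicted and judge scores,
    pooled over all responses and all attributes."""
    gaps = [
        abs(pred[attr] - gold[attr])
        for pred, gold in zip(predicted, judged)
        for attr in gold
    ]
    return mean(gaps)


def chance_error(predicted, judged, trials=200, seed=0):
    """Baseline error when each prediction is paired with the judge's
    scores for a randomly chosen response (a shuffled pairing)."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        shuffled = judged[:]
        rng.shuffle(shuffled)
        errors.append(mean_abs_error(predicted, shuffled))
    return mean(errors)


# Hypothetical toy data: judge scores and the model's own predictions
# for four responses on two made-up attributes (assumed 1-5 scale).
judged = [
    {"helpfulness": 4, "correctness": 5},
    {"helpfulness": 2, "correctness": 1},
    {"helpfulness": 3, "correctness": 3},
    {"helpfulness": 5, "correctness": 4},
]
predicted = [
    {"helpfulness": 4, "correctness": 4},
    {"helpfulness": 2, "correctness": 2},
    {"helpfulness": 3, "correctness": 3},
    {"helpfulness": 4, "correctness": 4},
]

model_error = mean_abs_error(predicted, judged)
baseline_error = chance_error(predicted, judged)
print(f"self-prediction error: {model_error:.3f}")
print(f"shuffled baseline error: {baseline_error:.3f}")
```

A self-prediction error clearly below the shuffled baseline would indicate that the predictions track the judge for the specific response rather than only matching the overall score distribution. A real study would likely also report rank correlation or a per-attribute breakdown.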
