# Trust Functions: Near-Lossless Weak-to-Strong Generalization by Learning When to Trust a Weak Teacher

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-05-31 08:00
- AIHOT score: 48
- AIHOT link: https://aihot.virxact.com/items/cmq7asy4q01easl5wjh7q4pax
- Original paper: https://arxiv.org/abs/2606.01000

## AI Summary

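Weak-to-strong generalization studies how to use supervision from a weaker teacher to improve a strong student model; the core challenge is identifying which weak labels are reliable enough. Trust functions assign each weak label a scalar trust score and filter the weak supervision accordingly. Across several domains, including world knowledge, quantitative reasoning, and strategy games, students trained with trust filtering match or even surpass models supervised with ground-truth labels, achieving near-lossless weak-to-strong generalization. Trust functions also support an iterative weak-to-strong chain, in which the trained student serves as the next round's teacher, compounding the gains. Their advantage can be attributed to several mechanisms.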

## Main Text

Weak-to-strong generalization studies how to improve a strong student using supervision from a weaker teacher when reliable labels are scarce. We view this primarily as a data selection problem, where the key challenge is to identify which weak labels are reliable enough to serve as a training signal. To address this, we introduce trust functions that assign each weak label a scalar trust score and use these scores to filter weak supervision. Across several domains, including world knowledge, quantitative reasoning, and strategy games, trust filtering yields students that match and sometimes surpass students trained with ground-truth supervision, achieving near-lossless weak-to-strong generalization. Moreover, trust functions enable an iterative weak-to-strong chain that compounds gains by training a student and reusing it as the next teacher. The advantage of trust functions can be attributed to several mechanisms.
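
The filtering step described above can be illustrated with a minimal sketch. The abstract does not say how trust scores are computed, so this example assumes a hypothetical trust function that simply reuses the teacher's confidence in its own label. The names `WeakExample`, `confidence_trust`, `filter_by_trust`, and the threshold of 0.8 are all illustrative assumptions, not the paper's method.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class WeakExample:
    prompt: str
    weak_label: str
    # Assumption: the teacher exposes a probability for the label it produced.
    teacher_confidence: float


def confidence_trust(example: WeakExample) -> float:
    """Hypothetical trust function: use the teacher's confidence as the trust score."""
    return example.teacher_confidence


def filter_by_trust(
    examples: Sequence[WeakExample],
    trust_fn: Callable[[WeakExample], float],
    threshold: float,
) -> List[WeakExample]:
    """Keep only the weak labels whose scalar trust score reaches the threshold."""
    return [ex for ex in examples if trust_fn(ex) >= threshold]


# Toy weak-label set, invented purely for illustration.
weak_data = [
    WeakExample("2 + 2 = ?", "4", 0.97),
    WeakExample("Capital of Australia?", "Sydney", 0.55),
    WeakExample("17 * 3 = ?", "51", 0.88),
]

# The student would then be fine-tuned on `trusted` instead of on all of `weak_data`.
trusted = filter_by_trust(weak_data, confidence_trust, threshold=0.8)
```

In the iterative chain the abstract mentions, a student trained on the filtered subset would presumably produce the next round of weak labels, and the same filter would be applied again. How the trust function is adapted between rounds is not stated in the abstract.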
