# On-Policy Distillation May Become a Lasting Post-Training Method

- Source: Nathan Lambert (@natolambert)
- Published: 2026-05-19 07:00
- AIHOT score: 52
- AIHOT link: https://aihot.virxact.com/items/cmpbu3rne1aa5slnz9cp8t9k6
- Original link: https://x.com/natolambert/status/2056510299579273447

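## AI Summary

On-policy distillation is on track to become a lasting method in post-training. The list of areas would be:

- Instruction tuning (SFT/IFT)
- RLHF
- Direct Preference Optimization (DPO and related methods)
- RLVR
- On-policy distillation (OPD)

New classes of methods are rare! Looking forward to trying it out.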

## Full Text

On-policy distillation is on track to be a lasting method in post-training. The list of areas would be:

- Instruction tuning (SFT/IFT)
- RLHF
- Direct Preference Optimization (DPO et al.)
- RLVR
- On-policy Distillation (OPD)

New classes of methods are rare! Excited to play.