# Draft-OPD: On-Policy Distillation for Speculative Draft Models

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-05-28 08:00
- AIHOT score: 56
- AIHOT link: https://aihot.virxact.com/items/cmpw3ayog03zjsluk8f6ht7mh
- Original link: https://arxiv.org/abs/2605.29343

## AI Summary

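This paper proposes Draft-OPD, an on-policy distillation method for improving draft models in speculative decoding. Existing supervised fine-tuning approaches (such as EAGLE-3 and DFlash) suffer from a mismatch between offline training data and the states encountered at inference time. To address this, Draft-OPD uses target-model-assisted sequence rollout and replays drafting from the error positions exposed by the verification step. This lets the draft model learn from the target model's accept and reject feedback on its own proposals. Experiments show that the method achieves over 5× lossless acceleration across a range of thinking models, improving over EAGLE-3 and DFlash by 23% and 13%, respectively.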

## Main Text

Speculative decoding accelerates large language model inference by pairing a target model with a lightweight draft model whose proposed tokens are verified in parallel. A common way to build draft models, as in EAGLE-3 or DFlash, is supervised fine-tuning (SFT) on target-generated trajectories. However, we observe that SFT quickly plateaus: the draft model's acceptance length on test data stops improving. The reason is an offline-to-inference mismatch: in SFT, the drafter learns from fixed target-generated trajectories, whereas during speculative decoding it is evaluated on blocks proposed under its own policy. This motivates on-policy distillation (OPD), where the target model supervises the drafter on draft-induced states. Yet OPD remains difficult for draft models: they cannot reliably roll out complete sequences on their own, while target-assisted generation makes the collected sequences follow the target distribution and thus eliminates the on-policy signal. We therefore propose Draft-OPD, which uses target-assisted rollout for stable continuations and replays drafting from the verification-exposed error positions. This allows the drafter to learn from target feedback on both accepted and rejected proposals, focusing training on the draft-induced errors that limit speculative acceptance. Experiments show that Draft-OPD achieves over 5× lossless acceleration for thinking models across diverse tasks, improving over EAGLE-3 and DFlash by 23% and 13%, respectively.
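To make the draft-and-verify loop and the notion of a "verification-exposed error position" concrete, here is a minimal toy sketch. It is not the paper's implementation. It assumes greedy exact-match verification, and the names `verify_block`, `speculative_decode`, `toy_target_next`, and `toy_draft_next` are hypothetical helpers invented for illustration. The two "models" are plain Python functions over integer tokens, and no training step is shown.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy next-token function (assumption)


def verify_block(target_next: NextToken, prefix: List[Token],
                 block: List[Token]) -> Tuple[int, Token]:
    """Return (number of accepted draft tokens, the target's next token).

    A real system scores the whole block in one parallel target pass;
    this toy version checks token by token for readability.
    """
    ctx = list(prefix)
    for i, tok in enumerate(block):
        expected = target_next(ctx)
        if tok != expected:
            return i, expected           # first rejection: target's correction
        ctx.append(tok)
    return len(block), target_next(ctx)  # all accepted: bonus target token


def speculative_decode(target_next: NextToken, draft_next: NextToken,
                       prompt: List[Token], max_new_tokens: int,
                       block_size: int):
    """Target-assisted rollout that also records where the drafter was rejected."""
    out = list(prompt)
    accept_lengths: List[int] = []
    error_positions: List[int] = []      # sequence indices of rejected proposals
    while len(out) - len(prompt) < max_new_tokens:
        # The drafter proposes a block under its own policy.
        ctx, block = list(out), []
        for _ in range(block_size):
            tok = draft_next(ctx)
            block.append(tok)
            ctx.append(tok)
        n_acc, correction = verify_block(target_next, out, block)
        if n_acc < block_size:
            error_positions.append(len(out) + n_acc)
        accept_lengths.append(n_acc)
        out.extend(block[:n_acc])
        out.append(correction)           # the output always follows the target
    return out[:len(prompt) + max_new_tokens], accept_lengths, error_positions


def toy_target_next(ctx: List[Token]) -> Token:
    # Toy target: count upward modulo 10.
    return (ctx[-1] + 1) % 10


def toy_draft_next(ctx: List[Token]) -> Token:
    # Toy drafter: agrees with the target except after tokens with t % 4 == 1.
    return 0 if ctx[-1] % 4 == 1 else (ctx[-1] + 1) % 10


out, accept_lengths, error_positions = speculative_decode(
    toy_target_next, toy_draft_next, prompt=[0], max_new_tokens=8, block_size=3)
print(out, accept_lengths, error_positions)
```

Because every rejected token is replaced by the target's own choice, the final sequence matches target-only greedy decoding, which is the sense in which the acceleration is lossless. The recorded `error_positions` are the kind of draft-induced states the abstract describes replaying from. How the drafter is actually trained on them is not specified here and is left out of the sketch.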