# Accelerating Speculative Decoding with Block Diffusion Draft Trees

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-04-14 08:00
- AIHOT link: https://aihot.virxact.com/items/cmnzr169400hcslwztd65hwhn
- Original link: https://arxiv.org/abs/2604.12989

## AI Summary

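The research team proposes DDTree (Diffusion Draft Tree), which removes DFlash's limitation of verifying only a single trajectory per round by building a draft tree directly from the per-position distributions of a block diffusion draft model. Under a fixed node budget, the method uses a best-first heap algorithm to select the sequences most likely to match the target model, and verifies them in a single forward pass using an ancestor-only attention mask. Built on DFlash, currently a leading draft model, DDTree brings speculative decoding performance to the front of the field.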

## Full Text

Speculative decoding accelerates autoregressive language models by using a lightweight drafter to propose multiple future tokens, which the target model then verifies in parallel. DFlash shows that a block diffusion drafter can generate an entire draft block in a single forward pass and achieve state-of-the-art speculative decoding performance, outperforming strong autoregressive drafters such as EAGLE-3. Vanilla DFlash, however, still verifies only a single drafted trajectory per round, potentially limiting its acceptance length. We introduce DDTree (Diffusion Draft Tree), a method that constructs a draft tree directly from the per-position distributions of a block diffusion drafter. Under a fixed node budget, DDTree uses a simple best-first heap algorithm to select the continuations that are most likely to match the target model according to a surrogate defined by the draft model's output. The resulting tree is verified efficiently in a single target model forward pass using an ancestor-only attention mask. Because DDTree builds on DFlash, a leading draft model for speculative decoding, these gains place DDTree among the leading approaches to speculative decoding.
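
The tree construction and verification mask described above can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: it assumes the surrogate score of a path is the product of the drafter's per-position probabilities (the abstract only says the surrogate is "defined by the draft model's output"), and the names `build_draft_tree`, `ancestor_mask`, `position_probs`, and `node_budget` are hypothetical. The mask covers only the drafted nodes; attention to the already-accepted prefix is omitted.

```python
import heapq
import math


def build_draft_tree(position_probs, node_budget):
    """Best-first selection of draft-tree nodes under a fixed node budget.

    position_probs: one dict {token: probability} per draft position
                    (assumed to be the drafter's per-position distributions).
    Returns a list of nodes; each node stores its token, parent index
    (-1 for children of the root), depth, and cumulative log-score.
    ASSUMPTION: path score = product of per-position probabilities.
    """
    # Rank the candidate tokens at every position, most probable first.
    ranked = [sorted(p.items(), key=lambda kv: -kv[1]) for p in position_probs]
    nodes, heap, counter = [], [], 0

    def push(parent, parent_logp, depth, rank):
        # Queue the rank-th token at `depth` as a child of `parent`, if it exists.
        nonlocal counter
        if depth < len(ranked) and rank < len(ranked[depth]):
            prob = ranked[depth][rank][1]
            if prob > 0.0:
                score = parent_logp + math.log(prob)
                # heapq is a min-heap, so store the negated score;
                # `counter` breaks ties deterministically.
                heapq.heappush(heap, (-score, counter, parent, parent_logp, depth, rank))
                counter += 1

    push(-1, 0.0, 0, 0)  # seed with the most probable first token
    while heap and len(nodes) < node_budget:
        neg_score, _, parent, parent_logp, depth, rank = heapq.heappop(heap)
        nodes.append({"token": ranked[depth][rank][0], "parent": parent,
                      "depth": depth, "logp": -neg_score})
        index = len(nodes) - 1
        push(index, -neg_score, depth + 1, 0)        # best child of the new node
        push(parent, parent_logp, depth, rank + 1)   # next-best sibling
    return nodes


def ancestor_mask(nodes):
    """mask[i][j] is True iff node j is node i itself or one of its ancestors."""
    n = len(nodes)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        j = i
        while j != -1:  # walk up to the root
            mask[i][j] = True
            j = nodes[j]["parent"]
    return mask


# Toy example: three draft positions, two candidate tokens each.
toy_probs = [{"a": 0.6, "b": 0.4}, {"c": 0.7, "d": 0.3}, {"e": 0.9, "f": 0.1}]
toy_tree = build_draft_tree(toy_probs, node_budget=5)
toy_mask = ancestor_mask(toy_tree)
```

Because a child is only queued once its parent has been selected, every selected node's parent is already in the tree, so the result is always a valid prefix tree. With such a mask, each drafted node attends only to its own path, which is what allows all branches to be scored in one target-model forward pass.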