Accelerating Speculative Decoding with Block Diffusion Draft Trees

2026-04-14 08:00

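AI Summary

The authors propose DDTree (Diffusion Draft Tree), which removes DFlash's limitation of verifying only a single trajectory per round by building a draft tree directly from the per-position distributions of a block diffusion drafter. Under a fixed node budget, the method uses a best-first heap algorithm to select the sequences most likely to match the target model, and it verifies the whole tree in a single forward pass using an ancestor-only attention mask. Because it builds on DFlash, currently a leading draft model, DDTree ranks among the leading approaches to speculative decoding.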

Original text · Untranslated

Speculative decoding accelerates autoregressive language models by using a lightweight drafter to propose multiple future tokens, which the target model then verifies in parallel. DFlash shows that a block diffusion drafter can generate an entire draft block in a single forward pass and achieve state-of-the-art speculative decoding performance, outperforming strong autoregressive drafters such as EAGLE-3. Vanilla DFlash, however, still verifies only a single drafted trajectory per round, potentially limiting its acceptance length. We introduce DDTree (Diffusion Draft Tree), a method that constructs a draft tree directly from the per-position distributions of a block diffusion drafter. Under a fixed node budget, DDTree uses a simple best-first heap algorithm to select the continuations that are most likely to match the target model according to a surrogate defined by the draft model's output. The resulting tree is verified efficiently in a single target model forward pass using an ancestor-only attention mask. Because DDTree builds on DFlash, a leading draft model for speculative decoding, these gains place DDTree among the leading approaches to speculative decoding.
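
The abstract names two mechanisms: best-first selection under a node budget, and verification with an ancestor-only attention mask. The following is a minimal sketch of both, not the authors' implementation. It assumes the surrogate score of a prefix is the sum of the drafter's per-position log-probabilities (positions treated as independent), and that each position's candidates arrive as a list sorted by descending log-probability. The names `Node`, `build_draft_tree`, and `ancestor_mask` are hypothetical.

```python
import heapq
import math
from typing import List, NamedTuple, Tuple


class Node(NamedTuple):
    parent: int    # index of the parent node, or -1 for a child of the root
    depth: int     # position inside the draft block (0-based)
    token: int     # drafted token id
    score: float   # sum of drafter log-probabilities along the path


def build_draft_tree(log_probs: List[List[Tuple[int, float]]], budget: int) -> List[Node]:
    """Select up to `budget` prefixes in descending order of path score.

    log_probs[d] lists (token, log_prob) pairs for block position d,
    sorted by descending log_prob (an assumed input format).
    """
    nodes: List[Node] = []
    if budget <= 0 or not log_probs or not log_probs[0]:
        return nodes
    # Heap entries: (-score, tie_breaker, parent, depth, rank, parent_score).
    heap = [(-log_probs[0][0][1], 0, -1, 0, 0, 0.0)]
    pushed = 1
    while heap and len(nodes) < budget:
        neg_score, _, parent, depth, rank, parent_score = heapq.heappop(heap)
        score = -neg_score
        index = len(nodes)
        nodes.append(Node(parent, depth, log_probs[depth][rank][0], score))
        # Next sibling: same parent, next most likely token at this position.
        if rank + 1 < len(log_probs[depth]):
            sibling = parent_score + log_probs[depth][rank + 1][1]
            heapq.heappush(heap, (-sibling, pushed, parent, depth, rank + 1, parent_score))
            pushed += 1
        # First child: extend this prefix with the top token of the next position.
        if depth + 1 < len(log_probs) and log_probs[depth + 1]:
            child = score + log_probs[depth + 1][0][1]
            heapq.heappush(heap, (-child, pushed, index, depth + 1, 0, score))
            pushed += 1
    return nodes


def ancestor_mask(nodes: List[Node]) -> List[List[bool]]:
    """mask[i][j] is True when node j is node i itself or one of its ancestors."""
    n = len(nodes)
    mask = [[False] * n for _ in range(n)]
    for i, node in enumerate(nodes):
        if node.parent >= 0:
            # Parents are always selected before their children, so this row exists.
            mask[i] = list(mask[node.parent])
        mask[i][i] = True
    return mask


# Toy drafter output: two block positions, three candidate tokens each.
dists = [
    [(10, math.log(0.6)), (11, math.log(0.3)), (12, math.log(0.1))],
    [(20, math.log(0.7)), (21, math.log(0.2)), (22, math.log(0.1))],
]
tree = build_draft_tree(dists, budget=4)
mask = ancestor_mask(tree)
```

Because log-probabilities are never positive, a prefix never scores higher than its parent, so popping nodes in score order always yields a connected tree. The mask here covers only the tree nodes. A real verifier would presumably also let every node attend to the already accepted context, a detail this sketch leaves out.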

HuggingFace Daily Papers (trending community papers)


Read the original · arxiv.org
