BlockPilot: A Diffusion-Based Speculative Decoding Method Built on Instance-Adaptive Policy Learning

2026-06-30 08:00 · 2 days ago

AI Summary

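Existing diffusion-based speculative decoding methods fix the inference block size and assume that one decoding strategy is optimal for every input. BlockPilot instead proposes a sample-adaptive policy: it uses the prefilling-stage representation to predict each sample's optimal block size once, turning block size selection into a lightweight policy learning problem over a low-dimensional, structured decision space. The method is plug-and-play and adds minimal overhead. On Qwen3-4B at temperature T=1 it reaches an acceptance length of 5.92 and a 4.20× speedup without sacrificing generation quality.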

Original abstract · untranslated

Speculative decoding accelerates inference by using a lightweight draft model to generate candidate tokens in parallel, which are then verified by the target model, enabling lossless acceleration. Recently, diffusion-based speculative decoding has further improved parallelism by generating multiple tokens per forward pass via block-level diffusion, achieving state-of-the-art (SOTA) performance. However, existing methods adopt a fixed inference block size and assume a uniform optimal decoding strategy across all inputs. In this paper, we show that this assumption is suboptimal: the optimal block size varies across samples and plays a critical role in speculative decoding performance. Moreover, the optimal values exhibit a clear local structure, concentrating around the training block size, which reduces the problem to a low-dimensional, structured decision space. Based on these insights, we propose BlockPilot, a sample-adaptive policy that predicts the optimal block size from the prefilling representation. Specifically, we formulate block size selection as a lightweight policy learning problem and propose an instance-adaptive decision mechanism that predicts the optimal block size from the representation produced in the prefilling stage. The prediction is performed only once, after prefilling, allowing for seamless integration. Extensive experiments demonstrate that our method is plug-and-play, introduces minimal overhead, and consistently improves efficiency, achieving an acceptance length of 5.92 and a 4.20× speedup on Qwen3-4B under temperature T=1.
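The abstract gives no implementation details, so the following is only a minimal sketch of the idea it describes: choosing one block size per sample from a small set of candidates near the training block size, based on a prefilling representation. The linear softmax head, the window radius, and every name here (`candidate_block_sizes`, `predict_block_size`, `prefill_repr`, `weights`, `biases`) are assumptions made for illustration and are not taken from the paper.

```python
import math
from typing import List, Sequence


def candidate_block_sizes(train_block_size: int, radius: int = 2) -> List[int]:
    """Return a small window of block sizes around the training block size.

    Assumption: the abstract only says optimal values concentrate around the
    training block size; the symmetric window and its radius are hypothetical.
    """
    low = max(1, train_block_size - radius)
    return list(range(low, train_block_size + radius + 1))


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_block_size(
    prefill_repr: Sequence[float],
    weights: Sequence[Sequence[float]],
    biases: Sequence[float],
    candidates: Sequence[int],
) -> int:
    """Pick the candidate block size with the highest predicted probability.

    Assumption: a single linear layer (one weight row per candidate) stands in
    for whatever lightweight policy the authors actually train.
    """
    if len(weights) != len(candidates) or len(biases) != len(candidates):
        raise ValueError("need one weight row and one bias per candidate")
    logits = [
        sum(w * x for w, x in zip(row, prefill_repr)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(logits)
    best = max(range(len(candidates)), key=probs.__getitem__)
    return candidates[best]


if __name__ == "__main__":
    candidates = candidate_block_sizes(train_block_size=16, radius=2)
    # Toy parameters: only the fourth candidate reacts to the first feature.
    weights = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [2.0, 0.0], [0.0, 0.0]]
    biases = [0.0] * len(candidates)
    chosen = predict_block_size([1.0, 0.0], weights, biases, candidates)
    print(candidates, chosen)
```

In this sketch the policy would be called once per request, right after prefilling, and the returned block size would then stay fixed for the rest of that request's decoding, which matches the one-shot prediction the abstract describes.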

HuggingFace Daily Papers (community trending papers)


Read the original · arxiv.org
