DSpark vs. JetSpec, which is better?

Authors: @Lanxiang_Hu @aaronzhfeng @YuYangQian_ai @Jensen_Yuan @haozhangml

TL;DR:

Speculative decoding (SD) techniques have proliferated recently. SD accelerates autoregressive generation by letting a lightweight draft model propose future tokens, which the target model then verifies in parallel.
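To make the draft-then-verify idea concrete, here is a minimal sketch of one verification cycle. It assumes greedy verification (the target accepts a draft token only if it equals the target's own top choice) and a single linear draft rather than a tree. The function names are hypothetical and are not taken from DSpark, JetSpec, or DFlash.

```python
from typing import List, Sequence


def accepted_prefix_length(draft_tokens: Sequence[int],
                           target_tokens: Sequence[int]) -> int:
    """Count leading draft tokens that match the target's greedy choices."""
    accepted = 0
    for draft_tok, target_tok in zip(draft_tokens, target_tokens):
        if draft_tok != target_tok:
            break
        accepted += 1
    return accepted


def speculative_step(draft_tokens: Sequence[int],
                     target_tokens: Sequence[int]) -> List[int]:
    """Return the tokens emitted by one draft-then-verify cycle.

    Assumption: target_tokens[i] is the target's greedy token at position i,
    computed in one parallel pass over the prefix plus draft_tokens[:i], so it
    has len(draft_tokens) + 1 entries.
    """
    if len(target_tokens) != len(draft_tokens) + 1:
        raise ValueError("target_tokens must have one more entry than draft_tokens")
    n = accepted_prefix_length(draft_tokens, target_tokens)
    # Keep the accepted prefix, then append the target's own token at the
    # first mismatch (or the extra token if every draft token was accepted).
    return list(draft_tokens[:n]) + [target_tokens[n]]


emitted = speculative_step([5, 9, 2, 7], [5, 9, 4, 1, 3])
print(emitted)
```

In this toy run the first two draft tokens survive verification and the third is replaced by the target's token, so one target pass emits three tokens instead of one.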

Among recent efforts, DSpark and JetSpec emerged almost concurrently to address the same bottleneck: once drafting becomes cheap, how do we preserve enough causal consistency for parallel proposals to survive verification?

This naturally raises the question: which one is better? Or, more interestingly, are they actually complementary?

The fact that both works converge on this direction suggests that causality is becoming a central lever for next-generation speculative decoding. They approach it from complementary sides of the throughput-latency frontier. DSpark targets high-concurrency serving: on Qwen3-8B and AIME25, it improves accepted length from 4.07 (DFlash) to 5.01 at budget 7, using a causal recurrent state for confidence-scheduled verification. JetSpec targets the latency-oriented, compute-budget-rich regime: by building causality directly into the parallel draft head, it turns larger draft budgets into longer accepted prefixes. On the same setting, its accepted length scales from 7.23 at budget 16 to 9.82 at budget 128, compared with 7.34 for DFlash and 8.66 for DDTree at budget 128, which supports low-latency generation.

Causality in DSpark and JetSpec

Traditional drafters like the EAGLE series often preserve draft quality through autoregressive generation, but this means longer drafts require more sequential draft steps. DFlash changes the cost structure: by using a lightweight block-parallel drafter to predict many future positions in one pass, it opens the door to cheap drafting.
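The change in cost structure can be illustrated with a toy count of sequential drafter passes. This is only a sketch under simplifying assumptions: a single linear draft of `draft_length` tokens, one token per autoregressive drafter step, and one pass for the whole block in the block-parallel case. The function name and its parameters are hypothetical.

```python
def sequential_draft_passes(draft_length: int, block_parallel: bool) -> int:
    """Return how many sequential drafter forward passes one draft needs.

    Assumption: an autoregressive drafter emits one token per pass, while a
    block-parallel drafter predicts all positions of the draft in one pass.
    """
    if draft_length < 0:
        raise ValueError("draft_length must be non-negative")
    if draft_length == 0:
        return 0
    return 1 if block_parallel else draft_length


for length in (4, 16):
    autoregressive = sequential_draft_passes(length, block_parallel=False)
    parallel = sequential_draft_passes(length, block_parallel=True)
    print(f"draft_length={length}: autoregressive={autoregressive}, block_parallel={parallel}")
```

Under these assumptions the autoregressive count grows linearly with draft length while the block-parallel count stays at one, which is why the open question shifts from draft cost to how many of the parallel proposals survive verification.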

Hao AI Lab (@haoailab) · X


2026-07-02 08:10


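AI summary

DSpark and JetSpec appeared almost simultaneously, and both address the problem of causal consistency when a lightweight draft model proposes tokens in parallel. DSpark targets high concurrency, using a lightweight Markov correction head and confidence estimation to control the budget; on Qwen3-8B and AIME25, it raises accepted length from DFlash's 4.07 to 5.01 at budget 7. JetSpec targets low latency, building causality directly into the parallel draft head; its accepted length is 7.23 at budget 16 and reaches 9.82 at budget 128, higher than DFlash's 7.34 and DDTree's 8.66. The two optimize causality from the throughput side and the latency side, respectively.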

http://x.com/i/article/2072448547069599744
