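The new MTP technique drafts several tokens ahead and verifies them in a single pass, making Qwen 3.6 models run up to 2.5x faster in Atomic Chat. The gain is largest for dense models such as Qwen 3.6 27B, which goes from 51 to 117 tokens/s; MoE models such as Qwen 3.6 35B-A3B see a smaller improvement (25%). MTP reaches a draft acceptance rate of about 80% with no loss in accuracy and needs only about 1 GB of extra VRAM. Users can test the models locally with the open-source Atomic Chat app.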
Qwen 3.6 models are now up to 2.5x faster on Atomic Chat with new MTP speedups.
MTP drafts several tokens ahead and verifies them in one pass. The speedup depends on how much memory is moved per pass, so dense models gain more than MoE models.
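To make the draft-and-verify idea concrete, here is a minimal Python sketch. It is not Atomic Chat's or Qwen's actual implementation: the function names are hypothetical, it assumes greedy verification (a drafted token is kept only if it matches the target model's own choice), and the throughput estimate assumes every drafted token is accepted independently with the same probability. The draft length of 3 in the usage note is also an assumption; only the roughly 80% acceptance rate comes from the announcement.

```python
def verify_draft(draft_tokens, target_tokens):
    """Greedy verification of a drafted token run (hypothetical helper).

    target_tokens holds the target model's own choice at each drafted
    position plus one extra position, all obtained from a single pass,
    so it has len(draft_tokens) + 1 entries.
    """
    accepted = []
    for drafted, wanted in zip(draft_tokens, target_tokens):
        if drafted != wanted:
            break  # first mismatch: discard this and all later drafts
        accepted.append(drafted)
    # The target's token at the next position is always valid output:
    # either the correction at the mismatch or a bonus token at the end.
    accepted.append(target_tokens[len(accepted)])
    return accepted


def expected_tokens_per_pass(acceptance_rate, draft_len):
    """Expected tokens emitted per verification pass.

    Assumes each drafted token is accepted independently with the same
    probability, which is a simplification of real model behaviour.
    """
    if acceptance_rate == 1.0:
        return float(draft_len + 1)
    return (1 - acceptance_rate ** (draft_len + 1)) / (1 - acceptance_rate)
```

For example, `verify_draft([1, 2, 3], [1, 2, 9, 4])` keeps the two matching drafts and then the target's correction, so one pass yields three tokens, and `expected_tokens_per_pass(0.8, 3)` gives the average yield for an assumed draft length of 3. Because every emitted token is one the target model would have chosen anyway, the output matches ordinary greedy decoding, which is consistent with the "no loss in accuracy" claim.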
To try it, run Qwen 3.6 models locally in the open-source Atomic Chat app!