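Deepseek releases DSpark inference framework, speeding up AI responses by up to 85 percent
Source: the-decoder.com. Deepseek has introduced the DSpark inference framework, which uses speculative decoding: a small model generates candidate answers and a large model verifies them in batches, producing several tokens at a time instead of one, for a 60 to 85 percent gain in per-user response speed. The system dynamically adjusts verification depth based on confidence, reducing wasted computation. DSpark and the Deepseek-V4-Pro model (developed jointly with Peking University) are open-sourced on Hugging Face and GitHub under the MIT license. More efficient inference lowers demand for high-end chips and helps China and the EU get more AI performance under chip constraints, a short-term strategic advantage.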
Deepseek's DSpark boosts AI speed by up to 85 percent, a strategic win under tightening US export controls
Deepseek has released DSpark, a new method that boosts per-user response speed for its AI models by 60 to 85 percent, according to the company.
Most LLMs generate text one token at a time. That leads to low GPU utilization and long wait times for lengthy responses, Deepseek says. Its new framework, DSpark, uses speculative decoding, where a small, lightweight model proposes answer candidates that the larger model then checks in batches. It also generates small groups of tokens per step instead of single tokens, boosting overall efficiency. A confidence-based system adjusts verification depth on the fly depending on compute load, cutting wasted processing on rejected token proposals.
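
The general idea behind speculative decoding can be sketched in a few lines of Python. This is a toy illustration of the generic technique, not Deepseek's implementation: the two "models" are made-up deterministic functions, and the names `target_next`, `draft_next`, `max_draft`, and `min_confidence` are assumptions introduced here. The confidence cutoff is only a loose stand-in for the confidence-based depth adjustment described above.

```python
from typing import List, Tuple

Token = int


def target_next(context: List[Token]) -> Token:
    """Toy stand-in for the large model: a deterministic next-token rule."""
    return (sum(context) * 31 + len(context)) % 50


def draft_next(context: List[Token]) -> Tuple[Token, float]:
    """Toy stand-in for the small model; returns (token, confidence).

    Invented behaviour for the demo: at every 5th position it guesses
    wrong and reports low confidence; otherwise it matches the target.
    """
    token = target_next(context)
    if len(context) % 5 == 0:
        return (token + 1) % 50, 0.3
    return token, 0.9


def greedy_decode(prompt: List[Token], num_tokens: int) -> List[Token]:
    """Baseline: one call to the large model per generated token."""
    out = list(prompt)
    for _ in range(num_tokens):
        out.append(target_next(out))
    return out[len(prompt):]


def speculative_decode(
    prompt: List[Token],
    num_tokens: int,
    max_draft: int = 4,
    min_confidence: float = 0.5,
) -> Tuple[List[Token], int]:
    """Return (generated tokens, number of large-model verification passes)."""
    out = list(prompt)
    goal = len(prompt) + num_tokens
    target_passes = 0
    while len(out) < goal:
        # Draft phase: the small model proposes up to max_draft tokens and
        # stops early once its confidence falls below the threshold.
        draft: List[Token] = []
        ctx = list(out)
        for _ in range(max_draft):
            tok, conf = draft_next(ctx)
            if conf < min_confidence:
                break
            draft.append(tok)
            ctx.append(tok)

        # Verify phase: counted as one pass of the large model. A real system
        # would score all drafted positions in a single batched forward pass;
        # this toy version loops over them instead.
        target_passes += 1
        accepted: List[Token] = []
        ctx = list(out)
        for tok in draft:
            expected = target_next(ctx)
            if tok != expected:
                accepted.append(expected)  # first mismatch: keep the target's token
                break
            accepted.append(tok)
            ctx.append(tok)
        else:
            # Every draft token matched (or none was drafted), so the same
            # pass also yields the target's next token.
            accepted.append(target_next(ctx))
        out.extend(accepted)
    return out[len(prompt):goal], target_passes


if __name__ == "__main__":
    prompt = [3, 1, 4]
    tokens, passes = speculative_decode(prompt, 40)
    print("same output as baseline:", tokens == greedy_decode(prompt, 40))
    print("large-model passes:", passes, "for", len(tokens), "tokens")
```

In this greedy variant every accepted token is exactly what the large model would have produced on its own, so the output is unchanged and only the number of large-model passes drops. Production systems typically verify against sampled probabilities rather than exact matches, which this sketch does not attempt.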

Deepseek also tested DSpark with open models from Google DeepMind (Gemma) and Alibaba (Qwen), suggesting the approach works broadly. The framework and the Deepseek-V4-Pro model, developed jointly with Peking University, are available on Hugging Face and GitHub under the MIT license. Technical details are in the paper.

Less chip pressure or faster scaling
This release matters strategically for China. Faster inference lowers chip requirements and cuts infrastructure costs. That's good news for China and potentially for the EU, both of which trail the US in data center buildout and high-performance chips.
But the Jevons paradox could kick in. More efficient inference does reduce chip demand per query. Yet the freed-up compute will likely get absorbed immediately by more AI requests, longer contexts, or new applications. Total chip demand could stay flat or even grow. Deepseek itself says that DSpark "enables performance tiers that were previously unattainable, shifting the Pareto frontier of our serving system."
Still, in the short term, these efficiency gains help China and the EU. They can squeeze more AI performance out of fewer high-end chips. Given tight chip supply and US export restrictions, that's a strategic advantage, reducing the US's ability to use chips as a geopolitical lever.
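
The Jevons argument above comes down to simple arithmetic, sketched here under a loud assumption: that a speedup of `s` translates into proportionally less compute per query, which the reported per-user speed figures do not by themselves establish. The function name and the growth values are hypothetical and only illustrate the break-even logic.

```python
def relative_total_compute(speedup: float, query_growth: float) -> float:
    """Total compute demand relative to before the efficiency gain.

    Illustrative assumption: compute per query shrinks to 1 / (1 + speedup),
    while the number of queries grows by the factor (1 + query_growth).
    """
    compute_per_query = 1.0 / (1.0 + speedup)
    return (1.0 + query_growth) * compute_per_query


if __name__ == "__main__":
    # Hypothetical growth scenarios for the article's 60 and 85 percent figures.
    for speedup in (0.60, 0.85):
        for growth in (0.0, speedup, 1.0):
            ratio = relative_total_compute(speedup, growth)
            print(f"speedup={speedup:.2f} growth={growth:.2f} -> total compute x{ratio:.2f}")
```

Under this assumption total demand stays flat when query volume grows by the same fraction as the speedup, falls when growth is slower, and rises when growth is faster.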