OpenBMB @OpenBMB

2026-06-18 21:00

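AI Summary

SOAR 2026, a challenge co-hosted by OpenBMB, SGLang, and NVIDIA, has concluded. Its goal was to maximize the inference performance of MiniCPM-SALA (a sparse + linear hybrid attention model) on a single consumer GPU. In total, 326 teams registered, more than 4,300 submissions were made, and 69 teams placed on the leaderboard. The winning team achieved an overall 6.33x speedup, peaking at 9.72x on single-request inference, with a solution combining NVFP4 quantization, FlashInfer plan-cache optimization, custom Triton kernels, EAGLE-3 speculative decoding, and runtime-aware scheduling. Low-bit quantization, speculative decoding, sparse attention, and phase-aware scheduling are seen as the core pillars of next-generation efficient inference.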

SOAR 2026 has officially wrapped up! 🎉

Hosted by @OpenBMB, @SGLang, and @NVIDIA, the challenge tasked developers worldwide with maximizing the inference performance of MiniCPM-SALA - our sparse+linear hybrid attention model - on a single consumer GPU.

On June 6, we brought the SOAR 2026 community together in Beijing for our final in-person Meetup. Developers, researchers, and open-source builders from @NVIDIA, @SGLang, and @OpenBMB gathered to share hard-won lessons from the frontlines of inference optimization. From Blackwell architecture tuning to SGLang-Omni and the Densing Law, it was a powerful reminder that inference efficiency is a full-stack, cross-community effort. ☺️

Huge thanks to our co-hosts @SGLang and @NVIDIA for making this possible - and to every participant who submitted, iterated, and shared. 😘

Final Metrics:
📊 326 teams registered, 370 participants
📊 4,300+ total submissions
📊 69 teams on the final leaderboard

🏆 The winning team achieved an overall 6.33x speedup over baseline - peaking at 9.72x on single-request inference. Their solution combined:
🔹 NVFP4 quantization with hybrid GEMM dispatch
🔹 FlashInfer plan-cache optimization
🔹 Custom Triton kernels for GLA layers
🔹 EAGLE-3 speculative decoding with dynamic depth switching
🔹 Runtime-aware scheduling across different concurrency levels

Low-bit quantization, speculative decoding, sparse attention, and phase-aware scheduling are emerging as the core pillars of next-gen efficient inference. SOAR 2026 put that thesis to the test - and the community delivered.

The leaderboard is closed, but the optimizations, code, and conversations will live on in the open-source ecosystem. 🚀

🔗 MiniCPM-SALA: http://huggingface.co/openbmb/MiniCPM-SALA