SpikingBrain2.0: Brain-Inspired Foundation Models for Efficient Long-Context and Cross-Platform Inference

2026-04-27 12:00 · 55 days ago · Yuqi Pan, Jinghao Zhuang, Yupeng Feng, Fangzhi Zhong, Siyu Ding, Xuerui Qiu, Shaowei Gu, Bohan Sun, Zhiyong Qin, Yibo Zhong, Lingtao Ouyang, Kun Yang, Zehao Liu, Yuhong Chou, Shurong Wang, Anjie Hu, Han Xu, Bo Xu, Guoqi Li

Why it was selected

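This paper combines spiking neural networks with sparse attention and reports a roughly 10x speedup (10.13x in time to first token) at a 4M-token context, along with execution on neuromorphic hardware. For anyone working on model compression or edge deployment, the engineering behind this brain-inspired architecture is worth a look.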

AI Summary

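SpikingBrain2.0 (SpB2.0) is a 5B-parameter brain-inspired foundation model that improves on its predecessor in both architecture and training efficiency. Its core innovation is Dual-Space Sparse Attention (DSSA), an inter-layer hybrid of sparse softmax attention and sparse linear attention that improves the performance-efficiency trade-off for long-context modeling. It also supports two quantization paths: INT8 spiking coding for event-driven computation and FP8 for GPU inference. With under 7k A100 GPU hours, the model recovers most of the capability of its base Transformer (Qwen3-4B). It achieves a 10.13x time-to-first-token (TTFT) speedup at a 4M-token context and supports sequences of more than 10M tokens on 8 A100 GPUs under vLLM. In the reported experiments, FP8 GPU inference is 2.52x faster at 250k tokens, and neuromorphic execution reaches 64.31% sparsity with area and power reductions of 70.6% and 46.5% at 500 MHz. The work offers a practical path toward lightweight, multimodal spiking foundation models for resource-constrained settings.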
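To make the link between integer spike coding, sparsity, and event-driven computation concrete, here is a minimal Python sketch. It is not the paper's encoder: the symmetric per-tensor scaling, the rounding rule, and the function names `int8_spike_encode`, `sparsity`, and `event_driven_dot` are all assumptions chosen for illustration. The idea it shows is only that activations rounded to zero emit no spike, so an event-driven accumulator can skip them.

```python
from typing import List, Tuple


def int8_spike_encode(activations: List[float]) -> Tuple[List[int], float]:
    """Map float activations to signed integer spike counts in [-127, 127].

    Assumed scheme (not taken from the paper): symmetric per-tensor scaling
    followed by round-to-nearest.
    """
    max_abs = max((abs(a) for a in activations), default=0.0)
    if max_abs == 0.0:
        # All-zero input: no spikes at all, any scale works.
        return [0] * len(activations), 1.0
    scale = max_abs / 127.0
    counts = [max(-127, min(127, round(a / scale))) for a in activations]
    return counts, scale


def sparsity(counts: List[int]) -> float:
    """Fraction of positions that emit no spike (count == 0)."""
    if not counts:
        return 0.0
    return sum(1 for c in counts if c == 0) / len(counts)


def event_driven_dot(
    counts: List[int], scale: float, weights: List[float]
) -> Tuple[float, int]:
    """Dot product that only touches weights where a spike occurred.

    Returns the dequantized result and the number of events processed.
    """
    acc = 0.0
    events = 0
    for c, w in zip(counts, weights):
        if c != 0:  # silent positions cost nothing in an event-driven design
            acc += c * w
            events += 1
    return acc * scale, events
```

Calling `int8_spike_encode` on a vector with many near-zero entries and then `sparsity` on the result gives the fraction of skipped positions, and `event_driven_dot` reports how many multiply-accumulates were actually performed.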

Original text · untranslated

[Submitted on 24 Apr 2026]

Title: SpikingBrain2.0: Brain-Inspired Foundation Models for Efficient Long-Context and Cross-Platform Inference

Authors: Yuqi Pan, Jinghao Zhuang, Yupeng Feng, Fangzhi Zhong, Siyu Ding, Xuerui Qiu, Shaowei Gu, Bohan Sun, Zhiyong Qin, Yibo Zhong, Lingtao Ouyang, Kun Yang, Zehao Liu, Yuhong Chou, Shurong Wang, Anjie Hu, Han Xu, Bo Xu, Guoqi Li

Abstract: Scaling context length is reshaping large-model development, yet full-attention Transformers suffer from prohibitive computation and inference bottlenecks at long sequences. A key challenge is to design foundation models that maintain performance and long-context efficiency with minimal training overhead. We introduce SpikingBrain2.0 (SpB2.0), a 5B model that advances both architecture and training efficiency of its predecessor.
Our contributions are two-fold. (1) Architectural Innovation: We propose Dual-Space Sparse Attention (DSSA), an inter-layer hybrid of Sparse Softmax Attention (MoBA) and Sparse Linear Attention (SSE), achieving an improved performance-efficiency trade-off for long-context modeling. SpB2.0 further supports dual quantization paths: INT8-Spiking coding enables sparse event-driven computation, while FP8 coding accelerates inference on modern GPUs. (2) Enhanced Training Strategy: We develop an optimized Transformer-to-Hybrid (T2H) pipeline with dual conversion paths for LLMs and VLMs using curated open-source data.
Empirically, SpB2.0-5B and SpB2.0-VL-5B recover most of the base Transformer (Qwen3-4B) capability with under 7k A100 GPU hours. SpB2.0 achieves a 10.13x TTFT speedup at 4M context and supports over 10M tokens on 8 A100 GPUs under vLLM, where full-attention models exceed memory limits. It also demonstrates strong cross-platform compatibility, enabling FP8 GPU inference (2.52x speedup at 250k) and efficient neuromorphic execution (64.31% sparsity, with 70.6% and 46.5% area and power reduction at 500MHz).
Overall, SpikingBrain2.0 provides a practical pathway for lightweight, multimodal, spiking foundation models, highlighting the potential of combining brain-inspired mechanisms with efficient architectures for resource-constrained and edge scenarios.
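The abstract names MoBA as the sparse softmax half of DSSA but does not spell out the routing rule. As a rough illustration of block-sparse routing in that style, the Python sketch below scores each past key block by the dot product between the query and the block's mean key, keeps the best-scoring blocks, and always includes the query's own block under a causal mask. The function names, the choice to count the current block toward `top_k`, and the tie-breaking rule are assumptions for illustration, not details taken from the paper.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def select_blocks(
    query: Sequence[float],
    keys: List[Sequence[float]],
    q_pos: int,
    block_size: int,
    top_k: int,
) -> List[int]:
    """Return sorted indices of the key blocks the query at q_pos attends to.

    Gating (assumed): score every block strictly before the query's block by
    dot(query, mean key of the block), keep the top_k - 1 highest, and always
    add the query's own block.
    """
    current = q_pos // block_size
    dim = len(query)
    scored = []
    for b in range(current):  # only fully past blocks are candidates
        block = keys[b * block_size:(b + 1) * block_size]
        mean_key = [sum(k[d] for k in block) / len(block) for d in range(dim)]
        scored.append((dot(query, mean_key), b))
    # Highest score first; ties broken toward the earlier block.
    scored.sort(key=lambda s: (-s[0], s[1]))
    chosen = [b for _, b in scored[:max(top_k - 1, 0)]]
    return sorted(chosen + [current])


def attended_positions(blocks: List[int], q_pos: int, block_size: int) -> List[int]:
    """Expand selected blocks to key positions, causally masked at q_pos."""
    positions: List[int] = []
    for b in blocks:
        start = b * block_size
        end = min((b + 1) * block_size, q_pos + 1)  # never look past the query
        positions.extend(range(start, end))
    return positions
```

With a fixed `top_k` and `block_size`, each query attends to at most `top_k * block_size` keys regardless of sequence length, which is the property that makes this kind of routing attractive at multi-million-token contexts.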

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2604.22575 [cs.LG]
	(or arXiv:2604.22575v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2604.22575 arXiv-issued DOI via DataCite

Submission history

From: Yuqi Pan
[v1] Fri, 24 Apr 2026 14:07:54 UTC (2,052 KB)

On-device · Papers/Research · Deployment/Engineering

arXiv: cs.LG (Machine Learning, full category feed)

Featured: 71