# MiniMax Unveils Sparse Attention for M3: 15.6x Faster Decoding at 1M-Token Context

- Source: Chubby♨️ (@kimmonismus)
- Published: 2026-05-26 23:54
- AIHOT score: 70
- AIHOT link: https://aihot.virxact.com/items/cmpmtpkeb0rsxsl0158toildu
- Original post: https://x.com/kimmonismus/status/2059302121489486335

## AI Summary

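MiniMax previewed the new Sparse Attention technique used in its M3 architecture. In its benchmarks at a 1M-token context, the technique achieves a 9.7x prefilling speedup and a 15.6x decoding speedup over M2. M2 had used full attention to ensure production readiness; M3 instead adopts a new two-stage approach: a lightweight index branch first selects blocks, and sparse attention is then computed only over the relevant KV blocks. This marks a new advance for open source.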

## Main Text

MiniMax just teased their Sparse Attention architecture for M3. The benchmarks show 9.7x prefilling speedup and 15.6x decoding speedup at 1M tokens vs M2.

MiniMax deliberately went back to full attention for M2 because efficient attention wasn't production-ready. Their pretraining lead wrote a whole blog post about it in March. Now they're showing a new two-stage approach: a lightweight index branch for block selection, then sparse attention only on the relevant KV blocks.
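To make the two-stage idea concrete, here is a minimal pure-Python sketch of block-sparse attention for a single decoding query. It is not MiniMax's implementation, whose details the post does not give. The "index branch" here is an assumption: each KV block is summarized by mean-pooling its keys, and the query is scored against those summaries to pick the `top_k` blocks. All function names (`block_summaries`, `select_blocks`, `block_sparse_attention`) are hypothetical. Stage two then runs ordinary softmax attention over only the tokens inside the chosen blocks, so per-query cost scales with `top_k * block_size` rather than the full context length.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def block_summaries(keys: Sequence[Vector], block_size: int) -> List[Vector]:
    """Hypothetical lightweight index: mean-pool the keys in each block.

    Assumption for illustration only; the source does not describe
    how MiniMax's index branch actually scores blocks.
    """
    summaries = []
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]
        dim = len(block[0])
        summaries.append([sum(k[i] for k in block) / len(block) for i in range(dim)])
    return summaries


def select_blocks(query: Vector, summaries: Sequence[Vector], top_k: int) -> List[int]:
    """Stage 1: score every block summary against the query, keep the top_k block ids."""
    scores = [dot(query, s) for s in summaries]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:top_k])


def attend(query: Vector, keys: Sequence[Vector], values: Sequence[Vector],
           positions: Sequence[int]) -> Vector:
    """Scaled dot-product softmax attention restricted to the given token positions."""
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, keys[p]) * scale for p in positions]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * values[p][i] for w, p in zip(weights, positions)) / total
            for i in range(dim)]


def block_sparse_attention(query: Vector, keys: Sequence[Vector], values: Sequence[Vector],
                           block_size: int, top_k: int) -> Vector:
    """Stage 2: exact attention, but only over tokens in the selected KV blocks."""
    chosen = select_blocks(query, block_summaries(keys, block_size), top_k)
    positions = [p for b in chosen
                 for p in range(b * block_size, min((b + 1) * block_size, len(keys)))]
    return attend(query, keys, values, positions)


def full_attention(query: Vector, keys: Sequence[Vector], values: Sequence[Vector]) -> Vector:
    """Dense baseline: attend over every cached token."""
    return attend(query, keys, values, list(range(len(keys))))


if __name__ == "__main__":
    q = [1.0, 0.0]
    keys = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [0.0, 1.0],
            [0.0, -1.0], [-0.5, 0.5], [0.8, 0.2], [1.0, 0.0]]
    values = [[float(i), 1.0] for i in range(len(keys))]
    print(select_blocks(q, block_summaries(keys, 2), 2))  # [0, 3]
    print(full_attention(q, keys, values))
    print(block_sparse_attention(q, keys, values, block_size=2, top_k=2))
```

With `top_k` equal to the number of blocks, the sparse path reduces to full attention; shrinking `top_k` trades some exactness for touching fewer KV entries. This toy says nothing about how the reported 9.7x and 15.6x figures were obtained, which depend on kernels, block sizes, and the real index design.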

Really interesting. And tbh I'm always happy when open source receives new wins.

### Quoted Tweet

> MiniMax (official): #MSA #OpenSource #M3 🫣😎
