MiniMax (official) @MiniMax_AI

2026-06-03 06:53 · 30 days ago

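AI summary

MiniMax shared key details about its M3 model in a live session. MSA uses block-level top-K selection and keeps the real, uncompressed KV cache, which is what keeps the 1M-token context window tractable. The share of per-decode wall-clock time spent in the attention kernel drops from about 30% to about 5%, a large gain for long-context generation. M3 is natively multimodal, accepting image and video input; it can handle long-horizon agentic tasks and operate a desktop computer, and it can visually evaluate and iterate on its own output. The model shows junior-analyst-level performance on finance tasks. Future releases will focus on harder long-horizon tasks and go deeper into finance, legal, and bio. Together AI powers its inference.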

We wrapped a live session on M3 yesterday with the @togethercompute team & our researchers @zpysky1125 and @HaohaiSun

A few highlights 🧵

1. MSA (MiniMax Sparse Attention) is the star ⭐️. Unlike CSA/HCA, which compress the KV cache, MSA keeps the real, uncompressed KV and does block-level selection with a small top-K. That's how the 1M context window stays tractable.
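To make "block-level selection with a small top-K" concrete, here is a minimal pure-Python sketch for a single query vector. It is not MiniMax's implementation: the thread does not say how blocks are scored, so the mean-key scoring rule, the function name `block_topk_attention`, and the parameters `block_size` and `top_k` are all assumptions made for illustration. The one property taken from the thread is that attention runs over the original, uncompressed keys and values of the selected blocks.

```python
import math

def block_topk_attention(q, keys, values, block_size, top_k):
    """Single-query attention restricted to the top_k highest-scoring KV blocks.

    ASSUMPTION: each block is scored by the dot product of q with the mean
    of its keys. The source does not describe MSA's actual scoring rule.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    n = len(keys)
    # Split cache positions into contiguous blocks.
    blocks = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]

    def block_score(blk):
        # Cheap per-block summary used only for selection (hypothetical choice).
        mean_key = [sum(keys[i][d] for i in blk) / len(blk) for d in range(len(q))]
        return dot(q, mean_key)

    ranked = sorted(range(len(blocks)), key=lambda b: block_score(blocks[b]), reverse=True)
    selected = sorted(ranked[:top_k])

    # Ordinary softmax attention over the uncompressed keys/values
    # of the selected blocks only.
    idx = [i for b in selected for i in blocks[b]]
    scale = 1.0 / math.sqrt(len(q))
    logits = [dot(q, keys[i]) * scale for i in idx]
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    z = sum(weights)
    out = [sum(w * values[i][d] for w, i in zip(weights, idx)) / z
           for d in range(len(values[0]))]
    return out, selected

# Toy cache: 8 positions in 2 blocks of 4; only block 1 aligns with the query.
q = [1.0, 0.0]
keys = [[0.0, 1.0]] * 4 + [[5.0, 0.0]] * 4
values = [[0.0]] * 4 + [[1.0]] * 4
out, selected = block_topk_attention(q, keys, values, block_size=4, top_k=1)
print(selected, out)
```

With `top_k=1` the query reads only the four positions in block 1, so the per-step attention cost scales with `top_k * block_size` rather than with the full cache length, while the values it reads are the original ones rather than a compressed summary.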

The efficiency win is huge. In our previous generation, ~30% of per-decode wall-clock time went to the attention kernel. With MSA that now drops to ~5%. Big gains for long-context generation.
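A back-of-the-envelope reading of those two percentages, as a short Python calculation. It rests on a strong assumption that the thread does not state: that the non-attention part of each decode step costs the same in both generations. The helper name `implied_speedups` is hypothetical, and the results are illustrative arithmetic, not measured numbers.

```python
def implied_speedups(old_frac, new_frac):
    """Speedups implied by a drop in attention's share of per-step time.

    ASSUMPTION: non-attention time per decode step is unchanged between
    the two generations. Times are in units of the old step time.
    """
    other = 1.0 - old_frac                # non-attention time
    new_total = other / (1.0 - new_frac)  # new step time, since other = (1 - new_frac) * new_total
    new_attn = new_total * new_frac       # new attention time
    return old_frac / new_attn, 1.0 / new_total

attn_speedup, step_speedup = implied_speedups(0.30, 0.05)
print(f"attention kernel: ~{attn_speedup:.1f}x faster")    # ~8.1x
print(f"whole decode step: ~{step_speedup:.2f}x faster")   # ~1.36x
```

Under that assumption, going from ~30% to ~5% corresponds to roughly an 8x faster attention kernel and roughly a 1.36x faster decode step overall. If the rest of the step also changed between generations, these figures do not apply.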

M3 isn't just a coding model. It's natively multimodal (image + video in), can handle long-horizon agentic tasks, and can even operate a desktop computer. People are already throwing game-dev + Minecraft-style builds at it (Unity included) and it's holding its own.

M3 can self-evaluate on vision-coding tasks: it builds a website or SVG, browses and inspects its own rendered output, judges it, and iterates, grading its work visually.
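The build, inspect, judge, iterate cycle described above can be pictured as a simple control loop. The sketch below is only an illustration of that loop shape: `refine_visually`, its `generate` / `render` / `judge` callables, `max_rounds`, and `threshold` are hypothetical names, and the toy stand-ins at the bottom replace what would in practice be model calls and a real browser render. Nothing here reflects how M3 is actually driven.

```python
def refine_visually(generate, render, judge, max_rounds=3, threshold=0.8):
    """Generate an artifact, render it, judge the render, and retry with feedback.

    All three callables are hypothetical stand-ins:
      generate(feedback) -> artifact (e.g. HTML or SVG source)
      render(artifact)   -> rendered view (e.g. a screenshot)
      judge(rendered)    -> (score in [0, 1], textual feedback)
    Returns the best (artifact, score) pair seen.
    """
    feedback = ""
    best = None
    for _ in range(max_rounds):
        artifact = generate(feedback)
        rendered = render(artifact)
        score, feedback = judge(rendered)
        if best is None or score > best[1]:
            best = (artifact, score)
        if score >= threshold:
            break  # good enough; stop iterating
    return best

# Toy stand-ins so the loop can run end to end.
def toy_generate(feedback):
    return "<svg>" + feedback + "</svg>"

def toy_render(artifact):
    return artifact  # a real setup would screenshot the rendered page

def toy_judge(rendered):
    if "circle" in rendered:
        return 1.0, ""
    return 0.2, "<circle r='5'/>"  # feedback: what is missing

result = refine_visually(toy_generate, toy_render, toy_judge)
print(result)
```

In the toy run, the first round produces an empty SVG that scores low, the judge's feedback is fed into the second round, and the loop stops once the score clears the threshold.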

We're also seeing junior-analyst-level performance on finance tasks, something we haven't even showcased publicly yet.

What's next: harder long-horizon / multi-file tasks in future releases, scaling data + post-training (RL) compute toward pre-training scale, and going deeper into finance, legal & bio.

Thanks to everyone who joined 🙏

Try M3: link in the comments 👇

Together AI: MiniMax M3 is live and Together AI is powering its inference 🚀 Tomorrow at 6pm PT we're going live on X Spaces with the teams behind the model and the infrastr...

