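Microsoft has released MAI-Thinking-1, a mixture-of-experts (MoE) model with 35B active parameters and 1T total parameters. It was pre-trained from scratch on 30T tokens without distillation from third-party models. Microsoft calls its iterative optimization process a "hill-climbing machine." On benchmarks, the model scores 97.0% on AIME 2025, 87.7% on LiveCodeBench v6, and 52.8% on SWE-Bench Pro.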
Microsoft unveiled MAI-Thinking-1.
This means Microsoft now has a full in-house pipeline for repeatedly building stronger reasoning models.
Microsoft calls this system a "hill-climbing machine," meaning it keeps improving the data, training setup, rewards, safety tests, and evaluations as one connected process.
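Microsoft does not describe how the hill-climbing machine is implemented. As a loose illustration of the general idea only, the sketch below is a plain hill-climbing loop: perturb one setting, re-run the evaluation, and keep the change only if the score improves. The knobs (`data_mix`, `reward_weight`) and the toy `evaluate` function are hypothetical stand-ins, not part of Microsoft's pipeline.

```python
import random


def evaluate(config):
    # Hypothetical stand-in for a full evaluation suite: a toy score
    # that peaks (at 0) when data_mix=0.6 and reward_weight=0.3.
    return -((config["data_mix"] - 0.6) ** 2) - ((config["reward_weight"] - 0.3) ** 2)


def propose(config, rng, step=0.05):
    # Perturb one randomly chosen knob by a small random amount.
    candidate = dict(config)
    key = rng.choice(sorted(candidate))
    candidate[key] += rng.uniform(-step, step)
    return candidate


def hill_climb(config, iterations=500, seed=0):
    rng = random.Random(seed)
    best_score = evaluate(config)
    history = [best_score]
    for _ in range(iterations):
        candidate = propose(config, rng)
        score = evaluate(candidate)
        if score > best_score:  # greedy: keep strict improvements only
            config, best_score = candidate, score
        history.append(best_score)
    return config, best_score, history


if __name__ == "__main__":
    start = {"data_mix": 0.2, "reward_weight": 0.8}
    best, score, history = hill_climb(start)
    print(best, round(score, 4))
```

Greedy acceptance like this can stall at a local optimum; the sketch makes no claim about how Microsoft handles that.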
MAI-Thinking-1 is the first model from that process. It is a mixture-of-experts (MoE) model with 1T total parameters, of which 35B are active: only part of the model runs for each token.
It is strong for its size, scoring 97.0% on AIME 2025, 87.7% on LiveCodeBench v6, and 52.8% on SWE-Bench Pro.
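To make "only part of the model runs for each token" concrete, here is a toy top-k mixture-of-experts layer in plain Python. A router scores every expert, only the `top_k` best-scoring experts are evaluated, and their outputs are mixed with softmax weights. The class name, layer sizes, and `top_k=2` are hypothetical; the post does not say how many experts MAI-Thinking-1 has or how many it activates. The last line is just the reported ratio: 35B active out of 1T total is 3.5% of the parameters per token.

```python
import math
import random


def softmax(values):
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(matrix, vector):
    return [dot(row, vector) for row in matrix]


class ToyMoELayer:
    """Hypothetical top-k MoE layer; sizes are illustrative, not MAI-Thinking-1's."""

    def __init__(self, dim, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # One router vector per expert: scores how well that expert fits a token.
        self.router = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
        # Each expert is a dim x dim weight matrix.
        self.experts = [
            [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(dim)]
            for _ in range(num_experts)
        ]

    def forward(self, token):
        scores = [dot(w, token) for w in self.router]
        # Keep only the top-k experts for this token; the others never run.
        chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[: self.top_k]
        gates = softmax([scores[i] for i in chosen])
        output = [0.0] * len(token)
        for gate, idx in zip(gates, chosen):
            expert_out = matvec(self.experts[idx], token)
            output = [o + gate * e for o, e in zip(output, expert_out)]
        return output, chosen


if __name__ == "__main__":
    layer = ToyMoELayer(dim=8, num_experts=16, top_k=2)
    out, used = layer.forward([0.5] * 8)
    print("experts used:", used)
    # Reported ratio for MAI-Thinking-1: 35B active of 1T total parameters.
    print("active fraction:", 35e9 / 1e12)
```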
The base model was pre-trained from scratch on 30T tokens of mostly human-generated data; Microsoft says it did not use third-party model distillation during pre-training.
The team then applied reinforcement learning (RL), in which the model practices tasks and improves from feedback, to teach math reasoning, coding, tool use, helpfulness, and safety.
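The post does not name the RL algorithm. As an illustration of "practicing tasks and improving from feedback" under stated assumptions, the sketch below runs a REINFORCE-style policy-gradient update on a toy task with a verifiable 0/1 reward, using an exponential moving average of reward as a baseline to reduce variance. The task, reward function, and hyperparameters are all hypothetical.

```python
import math
import random


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reward(action, correct_action):
    # Hypothetical verifiable reward: 1 for the correct answer, 0 otherwise.
    return 1.0 if action == correct_action else 0.0


def reinforce(num_actions=4, correct_action=2, steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * num_actions
    baseline = 0.0  # exponential moving average of reward
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(num_actions), weights=probs)[0]
        r = reward(action, correct_action)
        advantage = r - baseline
        baseline += 0.01 * (r - baseline)
        # Gradient of log pi(action) w.r.t. the logits is one_hot(action) - probs.
        for i in range(num_actions):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
    return softmax(logits)


if __name__ == "__main__":
    probs = reinforce()
    print([round(p, 3) for p in probs])
```

Over the run, probability mass shifts toward the rewarded action.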