# Nemotron 3 Ultra: An Open, Efficient Mixture-of-Experts Mamba-Transformer Model for Agentic Reasoning

- Source: HuggingFace Daily Papers (trending community papers)
- Published: 2026-06-12 08:00
- AIHOT score: 64
- AIHOT link: https://aihot.virxact.com/items/cmqg55i9n0005sl451g3nyzmp
- Original link: https://arxiv.org/abs/2606.15007

## AI Summary

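Nemotron 3 Ultra is a Mixture-of-Experts (MoE) Mamba-Attention language model with 550B total and 55B active parameters. It was pre-trained on 20T tokens, had its context window extended to 1M tokens, and was post-trained with Supervised Fine Tuning (SFT), Reinforcement Learning (RL), and Multi-teacher On-Policy Distillation (MOPD). Key technologies include LatentMoE, Multi Token Prediction (MTP), NVFP4 pre-training, multi-environment RLVR, MOPD, and reasoning budget control. Compared with publicly available state-of-the-art LLMs, it delivers up to about 6x higher inference throughput at on-par accuracy, which suits it to long-running autonomous agentic tasks. The base, post-trained, and quantized checkpoints are open-sourced, along with the training data and recipe.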

## Full Text

We introduce Nemotron 3 Ultra, a 550 billion total and 55 billion active parameter Mixture-of-Experts Hybrid Mamba-Attention language model. We pre-trained Nemotron 3 Ultra on 20 trillion text tokens, then extended the context length to 1M tokens, and post-trained using Supervised Fine Tuning (SFT), Reinforcement Learning (RL), and Multi-teacher On-Policy Distillation (MOPD). Nemotron 3 Ultra is our most capable model yet, employing multiple key technologies: LatentMoE, Multi Token Prediction (MTP), NVFP4 pre-training, multi-environment RLVR, MOPD, and reasoning budget control. Nemotron 3 Ultra achieves up to ~6x higher inference throughput than state-of-the-art publicly available LLMs while attaining on-par accuracy. The state-of-the-art accuracy, high inference throughput, and 1M token context length make Nemotron 3 Ultra ideal for long-running autonomous agentic tasks. We open-source the base, post-trained, and quantized checkpoints, along with the training data and recipe on HuggingFace.
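To put the headline figures in proportion, the short sketch below works out two ratios from the numbers quoted in the abstract: the share of parameters active per token, and the number of pre-training tokens per parameter. The function and constant names are illustrative and do not come from the paper, and the ratios are plain arithmetic on the stated totals, not results the authors report.

```python
# Figures quoted in the abstract; the names below are illustrative only.
TOTAL_PARAMS = 550e9      # total parameters
ACTIVE_PARAMS = 55e9      # parameters active per token
PRETRAIN_TOKENS = 20e12   # pre-training text tokens


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters that are used for each token."""
    return active / total


def tokens_per_param(tokens: float, params: float) -> float:
    """Pre-training tokens seen per parameter."""
    return tokens / params


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    per_total = tokens_per_param(PRETRAIN_TOKENS, TOTAL_PARAMS)
    per_active = tokens_per_param(PRETRAIN_TOKENS, ACTIVE_PARAMS)
    print(f"active fraction:         {frac:.0%}")
    print(f"tokens per total param:  {per_total:.1f}")
    print(f"tokens per active param: {per_active:.1f}")
```

Under these figures, one tenth of the parameters are active for any given token. That sparsity is consistent with the abstract's throughput claim, although the abstract does not say how much of the gain comes from sparsity and how much from the other listed techniques.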