Nemotron 3 Ultra: An Open, Efficient Mixture-of-Experts Mamba-Transformer Model for Agentic Reasoning

2026-06-12 08:00

AI Summary

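Nemotron 3 Ultra is a Mixture-of-Experts (MoE) hybrid Mamba-Attention language model with 550B total and 55B active parameters. It was pre-trained on 20T tokens, had its context window extended to 1M tokens, and was post-trained with supervised fine-tuning (SFT), reinforcement learning (RL), and Multi-teacher On-Policy Distillation (MOPD). Key technologies include LatentMoE, multi-token prediction (MTP), NVFP4 pre-training, multi-environment RLVR, MOPD, and reasoning budget control. Compared with state-of-the-art publicly available LLMs, it achieves up to ~6x higher inference throughput at on-par accuracy, which makes it well suited to long-running autonomous agentic tasks. The base, post-trained, and quantized checkpoints are open-sourced, along with the training data and recipe.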

Original abstract

We introduce Nemotron 3 Ultra, a 550 billion total and 55 billion active parameter Mixture-of-Experts Hybrid Mamba-Attention language model. We pre-trained Nemotron 3 Ultra on 20 trillion text tokens, then extended the context length to 1M tokens, and post-trained using Supervised Fine-Tuning (SFT), Reinforcement Learning (RL), and Multi-teacher On-Policy Distillation (MOPD). Nemotron 3 Ultra is our most capable model yet, employing multiple key technologies: LatentMoE, Multi-Token Prediction (MTP), NVFP4 pre-training, multi-environment RLVR, MOPD, and reasoning budget control. Nemotron 3 Ultra achieves up to ~6x higher inference throughput compared to state-of-the-art publicly available LLMs while attaining on-par accuracy. The state-of-the-art accuracy, high inference throughput, and 1M token context length make Nemotron 3 Ultra ideal for long-running autonomous agentic tasks. We open-source the base, post-trained, and quantized checkpoints, along with the training data and recipe, on HuggingFace.

HuggingFace Daily Papers (community trending papers)

Read the original: arxiv.org
