# NVIDIA Releases Nemotron 3 Nano Omni, Built for the Agent Perception Layer

- Source: Chubby♨️ (@kimmonismus)
- Published: 2026-04-29 02:02
- AIHOT score: 54
- AIHOT link: https://aihot.virxact.com/items/cmoixv4r50083slngl0v75n14
- Original link: https://x.com/kimmonismus/status/2049187410672746678

## AI Summary

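NVIDIA has released Nemotron 3 Nano Omni, a model positioned not as a general-purpose chatbot but as a lightweight perception module within agentic systems. It uses a 30B-A3B mixture-of-experts architecture and, when processing vision, audio, and text inputs, delivers up to 9x higher throughput than comparable open omni models. It is meant to serve as the "eyes and ears" of a multi-agent stack: it perceives screens, documents, and audio, then passes structured context to reasoning layers such as Nemotron Super (execution) and Ultra (planning), optimizing large-scale, high-frequency agent workflows. The model is fully open and is now available on Hugging Face.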

## Full Text

NVIDIA just launched Nemotron 3 Nano Omni. Not the first omni-model, but built for a different job. And that makes it really interesting:

Models like ChatGPT and Gemini already handle vision, audio, and text. What they're not optimized for is running as a lightweight perception sub-agent inside agentic systems, where the model gets called hundreds of times in a loop.

That's the gap Nemotron 3 Nano Omni fills. It's a 30B-A3B mixture-of-experts architecture that delivers up to 9x higher throughput than comparable open omni models. Not smarter, but faster and cheaper at scale.

The design logic: It acts as the "eyes and ears" in a multi-agent stack, paired with Nemotron Super for execution and Ultra for planning. One model sees screens, reads documents, hears audio. Then hands structured context to the reasoning layer.
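
To make the hand-off concrete, here is a minimal Python sketch of that perceive, plan, execute flow. Everything in it is an assumption for illustration: `Observation`, `StructuredContext`, `perceive`, `plan`, `execute`, and `run_agent_loop` are hypothetical names, and the stubs stand in for real model calls. It is not NVIDIA's API or the actual Nemotron interface.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Observation:
    """Raw input for the perception layer (hypothetical shape)."""
    modality: str  # e.g. "screen", "document", or "audio"
    payload: str   # text stand-in for pixels, pages, or waveform data


@dataclass
class StructuredContext:
    """What the perception layer hands to the reasoning layer (hypothetical)."""
    modality: str
    summary: str
    entities: List[str] = field(default_factory=list)


def perceive(obs: Observation) -> StructuredContext:
    """Stub for the perception model call; no real model is invoked here."""
    words = obs.payload.split()
    # Toy heuristic: treat capitalized words as "entities".
    entities = [w for w in words if w[:1].isupper()]
    return StructuredContext(obs.modality, " ".join(words[:8]), entities)


def plan(contexts: List[StructuredContext]) -> List[str]:
    """Stub planner: emits one step per perceived context."""
    return [f"handle {c.modality}: {c.summary}" for c in contexts]


def execute(steps: List[str]) -> List[str]:
    """Stub executor: marks each planned step as done."""
    return [f"done -> {s}" for s in steps]


def run_agent_loop(observations: List[Observation]) -> List[str]:
    """Perception runs once per observation, so it is the most-called stage."""
    contexts = [perceive(o) for o in observations]
    return execute(plan(contexts))
```

In this sketch `perceive` is invoked once per observation while `plan` and `execute` run once per batch, which is why the throughput of the perception stage dominates cost when the loop runs hundreds of times.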

Fully open weights, datasets, and training recipes. Runs from workstation to cloud. Available today on Hugging Face.

NVIDIA isn't trying to build the best chatbot. They're building the infrastructure layer for agentic AI, and this is the perception module.

Btw, the blog post was written by Kari Briski. I already talked to her about NVIDIA Nemotron at GTC :) Check it out in the comments.