Nvidia bets on physical AI at GTC Taipei: new world model, driving brain, and open-source humanoid robot

2026-06-01 21:26 · Maximilian Schreiner

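AI summary

At GTC Taipei, Nvidia released a series of models for robots, autonomous driving, and video systems. The core releases include the upgraded world model Cosmos 3, the significantly scaled-up driving model Alpamayo 2 Super, and an open-source reference platform for humanoid robots. Together, these products advance its position in physical AI.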

Original text

Nvidia bets big on physical AI at GTC Taipei with a new world model, driving brain, and open humanoid robot

Nvidia used GTC Taipei to launch a series of models for robots, autonomous vehicles, and video systems. The centerpieces are the new world model Cosmos 3, a significantly scaled-up driving model called Alpamayo 2 Super, and an open reference platform for humanoid robots.

Cosmos 3 is Nvidia's next version of its open "omnimodel," which processes text, images, video, ambient audio, and action data in a single system. Developers building robots, autonomous vehicles, and video surveillance systems can use it to generate synthetic training data, interpret scenes, and predict future world states without having to painstakingly recreate those situations in the real world.

Nvidia names three use cases. As a vision-language model, Cosmos 3 analyzes video, for example to detect traffic anomalies in smart cities, as partner Linker Vision is already doing.

As a world model, it generates photorealistic video sequences of rare situations like near-misses or unusual object arrangements in a warehouse.

And as the basis for so-called world-action models, it produces numerical motion data like joint angles or gripper positions that robots use to learn tasks such as picking and placing, as industrial partner Agile Robots demonstrates.

The architecture uses a mixture-of-transformers approach: a reasoning transformer first analyzes a scene, and a separate generation transformer then produces videos, descriptions, or motion trajectories from that analysis. Training data included billions of examples spanning text, images, video, audio, and action data. Nvidia offers three variants: Cosmos 3 Super delivers the best current quality, Nano is built for fast inference, and a forthcoming Edge model targets real-time operation on embedded systems. The models are available under the OpenMDW-1.1 license on Hugging Face and GitHub.
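To make the two-stage flow concrete, here is a minimal Python sketch of the data flow only: a reasoning stage produces an analysis, and a generation stage consumes that analysis to emit motion data such as joint angles and gripper positions. Every name in it (`SceneAnalysis`, `Waypoint`, `reason`, `generate`) is hypothetical, the stages are plain stub functions rather than transformers, and nothing here reflects Nvidia's actual Cosmos 3 code or API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SceneAnalysis:
    """Hypothetical output of the reasoning stage."""
    objects: List[str]
    summary: str


@dataclass
class Waypoint:
    """Hypothetical motion sample: joint angles (radians) and gripper opening (1.0 = open)."""
    joint_angles: List[float]
    gripper: float


def reason(observations: List[str]) -> SceneAnalysis:
    """Stub for the reasoning stage: de-duplicate observed objects, keeping first-seen order."""
    objects = list(dict.fromkeys(observations))
    return SceneAnalysis(objects=objects, summary=f"{len(objects)} distinct objects observed")


def generate(analysis: SceneAnalysis) -> List[Waypoint]:
    """Stub for the generation stage: it sees only the analysis, never the raw observations.

    For each object it emits an approach waypoint (gripper open) followed by
    a grasp waypoint (gripper closed). The joint angles are placeholder values.
    """
    trajectory: List[Waypoint] = []
    for index, _obj in enumerate(analysis.objects):
        angle = 0.5 * index  # placeholder angle, not a real kinematic solution
        trajectory.append(Waypoint(joint_angles=[angle], gripper=1.0))
        trajectory.append(Waypoint(joint_angles=[angle], gripper=0.0))
    return trajectory


if __name__ == "__main__":
    analysis = reason(["box", "pallet", "box"])
    for waypoint in generate(analysis):
        print(waypoint)
```

The point of the split is that the second stage depends only on the first stage's output, which is how the article describes the generation transformer working "from that analysis."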

The release comes alongside the "Cosmos Coalition," a partner group that includes Black Forest Labs, Runway, LTX, Generalist, Agile Robots, and Skild AI. In practice, it's an alliance that uses Nvidia's DGX Cloud training infrastructure and contributes models and data in return.

The Decoder: AI News (RSS)


Source: the-decoder.com
