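Google DeepMind has released Genie 3, a world model that generates interactive worlds from text, with real-time interaction at 720p and 24 fps and consistency over several minutes. The author sees this as "game engine 2.0": in the future, the layered structure of complex engines such as UE5 will be replaced by data-driven attention weights that generate spacetime chunks of pixels directly from game controller input.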
This is game engine 2.0. Some day, all the complexity of UE5 will be absorbed by a data-driven blob of attention weights. Those weights take as input game controller commands and directly animate a spacetime chunk of pixels.
Agrim and I were close friends and coauthors back at Stanford Vision Lab. So great to see him at the frontier of such cool research! Congrats!