swyx@swyx

2026-06-02 00:07 · 31 days ago

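AI Summary

Ethan He, former head of world models at xAI, shared his views on Grok Imagine and the future of video generation on the podcast. He argues that video models get most of their intelligence from LLMs rather than from simply scaling video data, which is why he is moving from video generation to LLMs. In his view, the next frontier for video generation is training video agent models that orchestrate video models. He expects AI video to follow a path similar to coding agents, with today's text-to-video being only the "autocomplete" stage. In the future, world models will become real-time and interactive, and language models may become the control layer for video.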

This pod was an incredible gift to the community:

not only our first pod about @xAI, but Ethan really indulged all our questions on how to train a SOTA Videogen world model, including specific areas (consistent extending/editing, voice) where Grok @Imagine is *still* SOTA,

on top of the factual overviews he ALSO came loaded with opinions/predictions:

why he's quitting Videogen for LLMs: video models get most of their intelligence from LLMs, not from scaling video data
why the next frontier for videogen also happens to be video agent models - agentic models trained to orchestrate video models
why deterministic compression (like MP4) is a useless target vs VAE compression
Videomaxxing: if you truly believe in the "Moore's law" of AI/genmedia, then video models become the final boss UI of everything, like Flipbook (below)

Latent.Space 🆕 Grok Imagine's Video Agent Moment: Cosmos, xAI, World Models, Generative UI, & the Codex Phase for Video! https://www.latent.space/p/video-agents @EthanHe_42,...

