# Ethan He on the Future of Video Generation

- Source: swyx (@swyx)
- Published: 2026-06-02 00:07
- AIHOT score: 71
- AIHOT link: https://aihot.virxact.com/items/cmpvfatkp06d1sl0z5zcbu7hu
- Original link: https://x.com/swyx/status/2061479719980425437

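## AI Summary

Ethan He, former head of world models at xAI, shared his views on Grok Imagine and the future of video generation on a podcast. He argued that video models get most of their intelligence from LLMs rather than from simply scaling video data, which is why he is moving from video generation to LLMs. He sees the next frontier for video generation as training **video agent models** that orchestrate video models. In his view, AI video will follow a path similar to coding agents, with today's text-to-video being only the "autocomplete" stage. In the future, world models will become real-time and interactive, and language models may become the control layer for video.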

## Full Text

This pod was an incredible gift to the community:

not only our first pod about @xAI, but Ethan really indulged on all our questions on how to train a SOTA Videogen world model, including specific areas (consistent extending/editing, voice) that Grok @Imagine is *still* SOTA,

on top of the factual overviews he ALSO came loaded with opinions/predictions:

- why he's quitting Videogen for LLMs: video models get most of their intelligence from LLMs, not from scaling video data
- why the next frontier for videogen also happens to be video agent models - agentic models trained to orchestrate video models
- why deterministic compression (like MP4) is a useless target vs VAE compression
- Videomaxxing: if you truly believe in the "Moore's law" of AI/genmedia, then video models become the final boss UI of everything, like Flipbook (below)

### Quoted Tweet

> Latent.Space: 🆕Grok Imagine's Video Agent Moment: Cosmos, xAI, World Models, Generative UI, & the Codex Phase for Video! https://www.latent.space/p/video-agents @EthanHe_42,...