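AI summary
Google has released the gemini-skills skill package for use with the Gemini Omni API. It supports video editing, text-to-video, video generation with image references, and first-frame-to-video, and it provides helper tools for preprocessing input videos to 10 seconds at 720p, stripping audio, and inspecting videos. The same author demonstrates the editing ability of the Omni Flash model: given the prompt "Change the table to be a shallow pool", the model outputs a wet hand, water ripples, refraction, shadows, and sound effects. The API is now open and can be used to build video editing pipelines.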
You can bootstrap your agent quickly with the Omni API using the skill we published:
https://github.com/google-gemini/gemini-skills
It includes:
- video editing
- text to video
- video generation with image references
- first frame to video
It also has helper tools for:
- prepping input videos for editing (10s, 720p)
- audio stripping if you want to generate new audio
- video inspection
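For a rough idea of what these helper steps involve, here is a minimal sketch that builds equivalent ffmpeg and ffprobe command lines. It is not the skill's own tooling: the function names `prep_cmd`, `strip_audio_cmd`, and `inspect_cmd` are hypothetical, and the sketch assumes ffmpeg and ffprobe are installed and that "10s, 720p" means trimming to the first 10 seconds and scaling to 720 pixels tall.

```python
from typing import List


def prep_cmd(src: str, dst: str, seconds: int = 10, height: int = 720) -> List[str]:
    """Build an ffmpeg command that trims and downscales a video (assumed prep step)."""
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-t", str(seconds),            # keep only the first `seconds` seconds
        "-vf", f"scale=-2:{height}",   # set height; -2 keeps aspect ratio with an even width
        dst,
    ]


def strip_audio_cmd(src: str, dst: str) -> List[str]:
    """Build an ffmpeg command that drops the audio track and copies the video stream."""
    return ["ffmpeg", "-y", "-i", src, "-an", "-c:v", "copy", dst]


def inspect_cmd(src: str) -> List[str]:
    """Build an ffprobe command that reports duration and per-stream type and size as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration:stream=codec_type,width,height",
        "-of", "json",
        src,
    ]
```

Each function only returns an argument list, so you can review the command before running it, for example with `subprocess.run(prep_cmd("in.mp4", "prepped.mp4"), check=True)`.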
Omni Flash is a smart model. The way the hand is wet, the water ripples, the refraction, the shadows, the sound effects 🤯
> Change the table to be a shallow po...