Google has the only true Omni model, but the elements aren't hooked up. It appears it can take in and output audio, images, video, songs, text, code, etc.
But right now each type of output is separate. Once you can access the model directly and blend modes, a lot becomes possible.