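ByteDance has open-sourced BAGEL, a multimodal model: a single 7B-parameter model that handles image generation, editing, style transfer, and visual understanding, released under the Apache 2.0 license. A quoted tweet indicates that the company had previously released Paris 2.0, the first video generation model trained in a decentralized manner, which on the FVD benchmark delivers roughly 2x the performance of a monolithic model trained with the same data and compute.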
ByteDance just open-sourced one of the most capable multimodal models out there.
BAGEL does image generation, editing, style transfer, and visual understanding - all in a single 7B parameter model. Apache 2.0 licensed!
One model. No switching between specialized tools. Amazing.