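Alibaba's HappyHorse 1.1 ranks second on the Artificial Analysis Text to Video and Image to Video leaderboards, behind only ByteDance's Seedance 2.0. Built on a unified transformer architecture, the model is a refinement of 1.0 that focuses on audio-visual sync, supports native audio and lip-synced dialogue in seven languages, and improves motion, character, and scene consistency. It accepts up to nine reference images and generates at 720p and 1080p. In the Image to Video with audio category, it rose from #5 to #2. Pricing is $9.90 per minute at 1080p, and the model is available on Alibaba Cloud Model Studio, Qwen Cloud, and fal.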
Alibaba's HappyHorse 1.1 lands at #2 on the Artificial Analysis Text to Video and Image to Video leaderboards, behind only ByteDance's Seedance 2.0!
HappyHorse 1.1 is the latest version of Alibaba's video generation model, a refinement of 1.0 on the same unified transformer architecture. Alibaba positions the upgrade around stronger audio-visual sync, including native audio with better lip-synced dialogue in seven languages, alongside gains in motion, character, and scene consistency. It supports up to nine reference images and generates at 720p and 1080p.