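Community developers built VoiceGate on top of VoxCPM2 and ComfyUI. It provides automatic speech extraction (ASR), LLM-based translation, multilingual speech synthesis (30+ languages and 9 dialects, with voice cloning and timbre design), timestamp-aligned audio, and background audio separation and remixing. Its core innovation, the VoiceBridge plugin, brings SRT timestamp-driven TTS alignment to ComfyUI for the first time, enabling fine-grained subtitle-level control and solving audio-video desynchronization in AI dubbing. Applications include converting Chinese videos into English, Japanese, Korean, and other languages, as well as converting videos from around the world into Chinese and its dialects.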
A developer in our community recently built VoiceGate using VoxCPM2 + ComfyUI for cross-lingual video dubbing and localization.💥
You can upload a video, and it automatically:
🎬 Extracts speech and generates subtitles (ASR)
🌍 Translates content using LLMs
🗣 Synthesizes multilingual speech with VoxCPM2 (30+ languages and 9 dialects, plus voice cloning & timbre design)
⏱ Aligns audio with timestamp-aware SRT scheduling
🎧 Separates and remixes voice and background audio for natural output
👍 Core innovation
The VoiceBridge plugin introduces SRT timestamp-driven TTS alignment into ComfyUI for the first time, enabling fine-grained subtitle-level control over speech generation.
📊 SRT-driven audio splitting + TTS generation
📊 Timestamp-based audio merging for precise sync
📊 ASR + forced alignment for structured subtitles
📊 Solves audio-video desynchronization in AI dubbing workflows
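
To make the idea of timestamp-based merging concrete, here is a minimal Python sketch. It is not VoiceBridge's actual code: `parse_srt`, `merge_on_timeline`, the `synthesize` callback, and the 16 kHz sample rate are all hypothetical names and values chosen for illustration. It parses SRT cues, asks a caller-supplied TTS function for one clip per cue, and writes each clip into a silent timeline at the cue's start time. A clip that runs past its slot is simply truncated here; a real pipeline would more likely time-stretch or re-synthesize it.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Matches SRT timestamps such as 00:01:02,345 (a dot separator is tolerated).
SRT_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp string to seconds."""
    m = SRT_TIME.fullmatch(stamp.strip())
    if m is None:
        raise ValueError(f"bad SRT timestamp: {stamp!r}")
    h, mi, s, ms = map(int, m.groups())
    return h * 3600 + mi * 60 + s + ms / 1000


def parse_srt(srt: str) -> list[Cue]:
    """Parse SRT text into cues; blocks without a timing line are skipped."""
    cues = []
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.strip().splitlines()
        idx = next((i for i, ln in enumerate(lines) if "-->" in ln), None)
        if idx is None:
            continue
        start, end = lines[idx].split("-->")
        # Drop optional positioning info that may follow the end timestamp.
        end = end.strip().split()[0]
        text = " ".join(lines[idx + 1:]).strip()
        cues.append(Cue(to_seconds(start), to_seconds(end), text))
    return cues


def merge_on_timeline(
    cues: list[Cue],
    synthesize: Callable[[str], list[float]],  # hypothetical TTS callback
    sample_rate: int = 16000,                  # assumed, not VoxCPM2's documented rate
) -> list[float]:
    """Place one synthesized clip per cue at the cue's start time."""
    if not cues:
        return []
    total = int(round(max(c.end for c in cues) * sample_rate))
    timeline = [0.0] * total  # silence everywhere by default
    for cue in cues:
        start = int(round(cue.start * sample_rate))
        slot = int(round(cue.end * sample_rate)) - start
        clip = list(synthesize(cue.text))[:slot]  # truncate any overflow
        timeline[start:start + len(clip)] = clip
    return timeline
```

Because every clip is anchored to its subtitle's start time rather than appended after the previous clip, timing drift in one segment does not accumulate into later segments, which is the property that keeps dubbed audio in sync with the video.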