# OpenBMB Releases VoxCPM 2, an Open-Source Multilingual TTS Model

- Source: Rohan Paul (@rohanpaul_ai)
- Published: 2026-04-13 11:56
- AIHOT link: https://aihot.virxact.com/items/cmnwp88by01eusl6xgewyc6se
- Original link: https://x.com/rohanpaul_ai/status/2043538724047425536

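## AI Summary

OpenBMB has released VoxCPM 2, an open-source TTS model with only 2B parameters that supports 30 languages and generates speech without language tags. It is Apache-2.0 licensed and runs on 8GB of VRAM. It supports creating new voices from text descriptions, controllable cloning, and ultimate cloning, preserving speaker detail. Output is 48kHz audio, with real-time inference at about 0.3 RTF on an RTX 4090. It is compatible with PyTorch, LoRA fine-tuning, and Nano-VLLM deployment, and targets professional use cases such as film, games, and audiobooks.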

## Full Text

VoxCPM 2 just dropped by @OpenBMB

An open-source TTS (Text-to-Speech) model with only 2B parameters, built for production-grade multilingual voice work.

Apache-2.0 licensed, and runs on only 8GB of VRAM.
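As a rough sanity check on the 8GB figure, the sketch below estimates how much memory the weights alone of a 2B-parameter model would occupy. The post does not state the numeric precision used, so the 16-bit and 32-bit cases are assumptions for illustration, and `weight_memory_gib` is a hypothetical helper. Activations, caches, and any audio codec components add to the total and are not counted here.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory taken by model weights only, in GiB (1 GiB = 1024**3 bytes)."""
    return num_params * bytes_per_param / 1024**3


NUM_PARAMS = 2e9  # "2B" parameters, as stated in the post

# Precision is an assumption: the post does not say which dtype is used.
for label, nbytes in [("32-bit floats", 4), ("16-bit floats", 2)]:
    gib = weight_memory_gib(NUM_PARAMS, nbytes)
    print(f"{label}: about {gib:.2f} GiB for weights")
```

Under the 16-bit assumption the weights take a bit under 4 GiB, which would leave headroom on an 8GB card for everything else the model needs at inference time.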

• Eliminates the "robotic" feel of traditional TTS, delivering prosody and emotional depth suitable for high-stakes professional environments like filmmaking, gaming, animation, and audiobooks.

• 30-language multilingual: no language tag needed, just type in a supported language and generate directly.

• Voice design: create a brand-new voice from a text description alone, with no reference audio required. Describe the desired voice characteristics (gender, age, tone, emotion, pace …) in the Control Instruction, and VoxCPM2 will craft a unique voice from that description.

• Controllable cloning: clone from a short clip, then steer delivery style without losing the speaker's core voice.

• Ultimate cloning: use reference audio + transcript for continuation-style cloning that keeps the tiny vocal details.

• 48kHz output: takes 16kHz reference audio and produces studio-quality speech without an external upsampler.

• Real-time ready: around 0.3 RTF (real-time factor) on an RTX 4090, even lower with Nano-VLLM.
• Commercial use: Apache-2.0 licensed.
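To make the RTF figure concrete: real-time factor is conventionally the time spent generating audio divided by the duration of the audio produced, so values below 1 mean faster than real time. The sketch below applies that definition to the quoted 0.3 figure. The function names and the 60-second clip length are illustrative choices, not part of any VoxCPM API.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_time(rtf: float, audio_seconds: float) -> float:
    """Estimated wall-clock seconds needed to generate audio_seconds of speech."""
    return rtf * audio_seconds


# Quoted figure from the post: about 0.3 RTF on an RTX 4090.
rtf = 0.3
clip_seconds = 60.0  # illustrative clip length

estimated = synthesis_time(rtf, clip_seconds)
print(f"{clip_seconds:.0f} s of audio takes about {estimated:.0f} s to generate")
print(f"That is roughly {1 / rtf:.1f}x faster than real time")
```

At 0.3 RTF, a one-minute clip would take about 18 seconds to generate, which is why the post describes the model as real-time ready.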

Developer-Friendly Infrastructure:
- Native Torch Inference: Direct support for PyTorch-based workflows.
- Training Flexibility: Supports both full-parameter and LoRA fine-tuning for specific domain adaptation.
- Production Readiness: Compatible with voxcpm-nanovllm for large-scale, high-concurrency deployment.
