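Fish Audio has released the S2.1 Pro text-to-speech model, which is free to use via API until July 24, 2026. The model supports 83 languages, voice cloning, and natural-language control of emotion and prosody, and it improves on its predecessor, S2 Pro, in quality, latency, and throughput. On the Artificial Analysis Speech Arena leaderboard, S2.1 Pro has an Elo of 1,153 based on 1,072 arena appearances, ranking #13 ahead of Async Pro v1.0, Speech 2.8 Turbo, and Step TTS 2. Its processing speed is 56.3 characters/s, faster than GPT-Realtime-2 (45.8 chars/s) and Gemini 3.1 Flash TTS (25.3 chars/s).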
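As a rough illustration of the throughput figures quoted above, the sketch below converts each chars/s rate into the time needed to process a fixed-length text. It assumes each rate holds constant over the whole input, which is a simplification. The 1,000-character length and the helper `seconds_to_process` are hypothetical and do not come from the benchmark.

```python
# Throughput figures (characters per second) as quoted in the summary.
THROUGHPUT_CHARS_PER_S = {
    "Fish Audio S2.1 Pro": 56.3,
    "GPT-Realtime-2": 45.8,
    "Gemini 3.1 Flash TTS": 25.3,
}


def seconds_to_process(num_chars: int, chars_per_s: float) -> float:
    """Hypothetical helper: time to process num_chars at a constant rate."""
    return num_chars / chars_per_s


if __name__ == "__main__":
    text_length = 1_000  # illustrative input length (assumption)
    baseline = THROUGHPUT_CHARS_PER_S["Fish Audio S2.1 Pro"]
    for model, rate in THROUGHPUT_CHARS_PER_S.items():
        secs = seconds_to_process(text_length, rate)
        # Ratio of S2.1 Pro's rate to this model's rate.
        print(f"{model}: {secs:.1f} s for {text_length} chars "
              f"(S2.1 Pro rate is {baseline / rate:.2f}x)")
```

Under these assumptions, 1,000 characters take about 17.8 s on S2.1 Pro versus about 21.8 s on GPT-Realtime-2 and about 39.5 s on Gemini 3.1 Flash TTS, so S2.1 Pro's rate is roughly 1.23x and 2.2x theirs, respectively.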
Fish Audio has recently released S2.1 Pro and is making it available for free via API through July 24.
Fish Audio S2.1 Pro is the latest text-to-speech model from @FishAudio. It supports multilingual speech generation across 83 languages and offers improved quality, lower latency, and higher throughput than S2 Pro. The model also supports voice cloning and natural-language control over emotion and prosody.
Key takeaways:
➤ Quality: S2.1 Pro has an Elo of 1,153, placing it #13 on the Artificial Analysis Speech Arena Leaderboard ahead of Async Pro v1.0, Speech 2.8 Turbo, and Step TTS 2, based on 1,072 arena appearances.
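To put the Elo figure in context: under the standard Elo model, a rating gap maps to an expected head-to-head preference rate. The sketch below assumes the conventional logistic formula with a 400-point scale; the Artificial Analysis leaderboard's exact rating method may differ, and the 1,053 rival rating is hypothetical.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


if __name__ == "__main__":
    s21_pro = 1153.0            # S2.1 Pro's reported Elo
    hypothetical_rival = 1053.0  # hypothetical model rated 100 points lower
    p = expected_score(s21_pro, hypothetical_rival)
    print(f"Expected preference rate for S2.1 Pro: {p:.3f}")
```

Under this assumed formula, a 100-point lead corresponds to an expected preference rate of about 64%, while equal ratings give exactly 50%.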