# Google ships its most expressive Gemini 3.1 text-to-speech model yet, with support for 70+ languages

- Source: The Decoder: AI News (RSS)
- Author: Matthias Bastian
- Published: 2026-04-16 01:45
- AIHOT link: https://aihot.virxact.com/items/cmo0d4jpf00rjsli27tucrt7u
- Original article: https://the-decoder.com/google-ships-its-most-expressive-gemini-3-1-text-to-speech-model-yet-with-70-language-support

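## AI Summary

Google has launched Gemini 3.1 Flash TTS, a text-to-speech model that supports natural speech synthesis in more than 70 languages. The model introduces audio tags, which let users precisely control the style, pace, and tone of the output speech. This markedly improves the expressiveness and controllability of speech synthesis and suits the model to multilingual content creation.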

## Full text

Google ships its most expressive Gemini 3.1 text-to-speech model yet with 70+ language support

Google is rolling out its new text-to-speech model based on Gemini 3.1 Flash. The company says it's the most natural and expressive voice output it has shipped to date. The big new feature is audio tags: simple text commands that let developers control the style, tempo, tone, and accent of the generated speech. The model supports over 70 languages and can handle multi-speaker dialogs.

On the Artificial Analysis leaderboard, the model reaches an Elo rating of 1,211 and stands out for its quality-to-price ratio. It beats ElevenLabs v3 in overall quality and sits just behind Inworld 1.5 Max.

Gemini 3.1 Flash TTS has a free tier, but Google uses data submitted through it to improve its products. The paid tier runs $1.00 per million tokens for text input and $20.00 per million tokens for audio output. Batch mode cuts those prices in half to $0.50 and $10.00, respectively. On the paid tier, Google doesn't use the data for product improvement.
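To make the pricing concrete, here is a minimal sketch that turns the quoted paid-tier rates into a cost estimate. The function name `estimate_cost_usd` and the token counts in the usage note are hypothetical and chosen only for illustration. The article does not say how many audio tokens correspond to a second of speech, so the sketch takes token counts as given.

```python
# Paid-tier prices quoted in the article, in USD per one million tokens.
TEXT_INPUT_USD_PER_M = 1.00
AUDIO_OUTPUT_USD_PER_M = 20.00
# Batch mode halves both prices, per the article.
BATCH_MULTIPLIER = 0.5


def estimate_cost_usd(input_tokens: int, output_tokens: int, batch: bool = False) -> float:
    """Estimate the paid-tier cost of one job (hypothetical helper, not an official API)."""
    cost = (
        input_tokens / 1_000_000 * TEXT_INPUT_USD_PER_M
        + output_tokens / 1_000_000 * AUDIO_OUTPUT_USD_PER_M
    )
    if batch:
        cost *= BATCH_MULTIPLIER
    return cost


if __name__ == "__main__":
    # Illustrative token counts only; real usage will differ.
    print(f"standard: ${estimate_cost_usd(2_000, 50_000):.4f}")
    print(f"batch:    ${estimate_cost_usd(2_000, 50_000, batch=True):.4f}")
```

With these rates, audio output dominates the bill: in the illustrative job above, the 50,000 output tokens cost $1.00, while the 2,000 input tokens add only $0.002.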

Gemini 3.1 Flash TTS is available as a preview through the Gemini API, Vertex AI for enterprise users, and Google Vids for Workspace users. Anyone can try it for free in Google's AI Studio. All generated audio is tagged with Google's SynthID watermark to flag AI-generated content.
