# Gemma 4 12B underperforms at speech transcription, trailing dedicated transcription models

- Source: Artificial Analysis (@ArtificialAnlys)
- Published: 2026-06-06 08:43
- AIHOT score: 52
- AIHOT link: https://aihot.virxact.com/items/cmq1ne7nh0hdpsltr15xfl6pn
- Original link: https://x.com/ArtificialAnlys/status/2063059212532642193

## AI Summary

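Google DeepMind has released Gemma 4 12B, an open-weights model that supports speech transcription. It scores 8.8% on the AA-WER benchmark (rank #58), a higher and therefore worse word error rate than the open-weights models Voxtral Mini Transcribe 2 (4B parameters, 3.6% WER), which is transcription-focused, and Voxtral Small (12B parameters, 2.8% WER). It is the largest model in the Gemma 4 family that supports transcription (the others are E4B and E2B); the 31B and 26B A4B models accept only text, image and video input. Google also launched Eloquent, a local dictation app for macOS and iOS, alongside the model. The models are available on Hugging Face, Ollama and LM Studio.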

## Body

Google's newly released open-weights model, Gemma 4 12B, supports transcription but is far from the frontier, scoring 8.8% on AA-WER (#58)

Gemma 4 12B is the latest release from @GoogleDeepMind in the Gemma 4 family. With a score of 8.8% on AA-WER, it captures a reasonable amount of conversation context but underperforms transcription-focused open-weights models like Voxtral Mini Transcribe 2 (3.6% WER, with 4B parameters) and slightly larger open-weights language models like Voxtral Small (2.8% WER, with 12B parameters). The new model launched alongside Google's local dictation app, Eloquent, available on macOS and iOS.
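For readers unfamiliar with the metric, the sketch below shows how a word error rate is conventionally computed: the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the model's output, divided by the number of reference words. This is the textbook definition only. The post does not describe how AA-WER normalizes text or which audio it uses, so the lowercasing and whitespace tokenization here, and the function name `word_error_rate`, are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Textbook WER: word-level edit distance divided by reference length.

    Assumption: text is lowercased and split on whitespace. Real benchmarks
    typically apply heavier normalization (punctuation, numerals, etc.).
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over lazy dog"
print(f"WER: {word_error_rate(reference, hypothesis):.1%}")
```

In this toy pair the output contains one substituted word and one dropped word against nine reference words, so the function returns 2/9. Lower is better, which is why 8.8% trails 3.6% and 2.8%.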

Gemma 4 12B is the largest in the Gemma 4 family to support transcription, alongside Gemma 4 E4B and Gemma 4 E2B, with Gemma 4 31B and Gemma 4 26B A4B supporting text, image and video input only. These models are available on a variety of platforms including Hugging Face, Ollama and LM Studio.

We are currently running Gemma 4 12B through the full Artificial Analysis Intelligence Index and will share results soon.