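Soniox has released its v5 Real-Time streaming STT model, which sits on the Pareto frontier of accuracy and latency on the AA-WER Streaming benchmark. First Final transcription: 4.5% WER at 0.05s latency, more accurate than Deepgram Flux (7.4%, 0.02s) and Nova-3 Realtime (6.7%, 0.06s), and faster than Cartesia Ink-2 (3.7%, 0.09s) and ElevenLabs Scribe v2 Realtime (3.6%, 0.14s). First Partial transcription: 4.7% WER at 0.05s latency, second in accuracy only to those two models while faster than both. Price: $2 per 1,000 minutes, the lowest of any proprietary streaming model tested. Supports 60+ languages and real-time translation.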
Soniox has released Soniox v5 Real-Time: a low-latency streaming Speech to Text model on the Pareto frontier for accuracy and latency, at the lowest price of any proprietary model tested
Soniox v5 Real-Time is @soniox_ai's latest streaming Speech to Text (STT) model, joining Soniox v5 Async, their non-streaming model released last week. On AA-WER Streaming it occupies the middle of the Pareto frontier: faster than the most accurate models (Cartesia Ink-2, ElevenLabs Scribe v2 Realtime) and more accurate than the fastest (Deepgram Flux, Nova-3), and priced below all of them.
AA-WER Streaming Overview
AA-WER Streaming reports WER and latency as a pair, measured from Silero VAD-detected end of speech on the same ~8 hours of audio as our non-streaming STT benchmark, AA-WER v2.0. We report both at two points: First Final (first final-denoted transcript, best for accuracy) and First Partial (first transcript-bearing event, best for when speed matters most).
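To make the two reported quantities concrete, here is a minimal sketch of how a WER and latency pair could be computed for a single utterance. It assumes WER is the standard word-level edit distance divided by the reference word count, and that latency is the transcript event time minus the VAD-detected end of speech. The function names and the lowercase-and-split normalization are illustrative assumptions, not the actual AA-WER pipeline.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length.

    Normalization here is only lowercasing and whitespace splitting,
    which is an assumption made for this sketch.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def latency_seconds(event_time_s: float, end_of_speech_s: float) -> float:
    """Delay between the detected end of speech and a transcript event."""
    return event_time_s - end_of_speech_s
```

Under these assumptions, the First Final and First Partial figures differ only in which transcript event supplies the hypothesis text and the event timestamp.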
Key takeaways
➤ First Final Transcription: Soniox v5 Real-Time achieves a 4.5% WER at 0.05s after end of speech, more accurate than both the faster Deepgram Flux (7.4%, 0.02s) and the marginally slower Deepgram Nova-3 Realtime (6.7%, 0.06s), and faster than the more accurate Cartesia Ink-2 external endpoints (3.7%, 0.09s) and ElevenLabs Scribe v2 Realtime (3.6%, 0.14s)
➤ First Partial Transcription: The model achieves a 4.7% WER at 0.05s after end of speech, behind only Cartesia Ink-2 external endpoints (4.3%, 0.07s) and ElevenLabs Scribe v2 Realtime (3.6%, 0.13s) on accuracy, while faster than both
➤ Price: The model costs $2 per 1,000 minutes, the lowest price of any proprietary streaming model tested, below Cartesia Ink-2 ($4), Deepgram Nova-3 Realtime ($4.80) and ElevenLabs Scribe v2 Realtime ($6.50)
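The Pareto frontier claim can be checked against the First Final figures quoted above. The sketch below treats a model as dominated when another model is at least as good on both WER and latency and strictly better on at least one. It covers only the five models named in this post, so it illustrates the idea rather than reproducing the full benchmark; the `Result` type and function names are assumptions for illustration.

```python
from typing import NamedTuple


class Result(NamedTuple):
    model: str
    wer_pct: float    # lower is better
    latency_s: float  # seconds after end of speech, lower is better


# First Final figures as quoted in this post, ordered by latency.
FIRST_FINAL = [
    Result("Deepgram Flux", 7.4, 0.02),
    Result("Soniox v5 Real-Time", 4.5, 0.05),
    Result("Deepgram Nova-3 Realtime", 6.7, 0.06),
    Result("Cartesia Ink-2", 3.7, 0.09),
    Result("ElevenLabs Scribe v2 Realtime", 3.6, 0.14),
]


def dominates(a: Result, b: Result) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    no_worse = a.wer_pct <= b.wer_pct and a.latency_s <= b.latency_s
    strictly_better = a.wer_pct < b.wer_pct or a.latency_s < b.latency_s
    return no_worse and strictly_better


def pareto_frontier(results: list[Result]) -> list[Result]:
    """Keep every result that no other result dominates."""
    return [r for r in results if not any(dominates(o, r) for o in results)]


if __name__ == "__main__":
    for r in pareto_frontier(FIRST_FINAL):
        print(f"{r.model}: {r.wer_pct}% WER at {r.latency_s}s")
```

Among these five, only Deepgram Nova-3 Realtime drops off the frontier, because Soniox v5 Real-Time beats it on both WER and latency at First Final.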