Elon Musk @elonmusk

2026-05-13 04:40 · 51 days ago

AI Summary

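Artificial Analysis has launched its first agentic performance benchmark for speech-to-speech (S2S) models. Built on τ-Voice, it simulates complex customer-service scenarios that include accents, background noise, and network packet loss. The results show that even the strongest current S2S models resolve only about half of realistic tasks end to end, which leaves them behind top text-based agents. xAI's Grok Voice Think Fast 1.0 leads with a 52.1% success rate and an average conversation length of 5.6 minutes; OpenAI's GPT-Realtime series and Google's Gemini follow close behind. The field is moving quickly, so the rankings may change as models are updated.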

Grok Voice is #1!

Artificial Analysis: Announcing agentic performance benchmarking for Speech to Speech models on Artificial Analysis. We use τ-Voice to measure tool calling and customer interaction ...

xAI · Evaluation/Benchmarks · Voice