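Grok Voice Think Fast 1.0: xAI Releases Flagship Voice Model Built for Complex Workflows
xAI has released its flagship voice model, Grok Voice Think Fast 1.0, built for complex multi-step workflows in customer support, sales, and similar domains. The model ranks first on the τ-voice Bench full-duplex voice leaderboard, runs reliably under demanding real-world conditions such as telephony audio, noise, accents, and frequent interruptions, and natively supports more than 25 languages. Its core strengths include precise data entry and read-back, real-time background reasoning that adds no latency, and the ability to think through a request to avoid wrong answers. The model is already deployed in Starlink's sales and customer support, where it achieves a 20% phone sales conversion rate and a 70% autonomous resolution rate for support, calling 28 tools across hundreds of workflows to handle high-stakes tasks such as hardware troubleshooting and replacements.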
Today, we're excited to announce a step change in xAI's Voice Agent capabilities: `grok-voice-think-fast-1.0`, our new flagship voice model.
This new model excels at complex, ambiguous, multi-step workflows across customer support, sales, and enterprise applications. It is especially well-suited for high-stakes scenarios that demand precise data entry and high-volume tool calling to address the user's request.
Built for the messiness of the real world
We built `grok-voice-think-fast-1.0` through tight collaboration with partners like Starlink to combine top-tier intelligence with low response latency and organic conversational ability.
Our model prioritizes snappy responses and cost effectiveness without compromising on accuracy or tool orchestration. The result is a model that lets teams confidently deploy complex, multi-turn voice experiences across a wide range of use cases: customer support, phone sales, appointment booking, restaurant reservations, and more.
This new model takes the top spot on the τ-voice Bench leaderboard, which evaluates full-duplex voice agents under realistic conditions including noise, accents, interruptions, and turn-taking. See the benchmark details here.
τ-voice Leaderboard
Retail: Order handling, returns, promotions in noisy environments
Airline: Booking changes, delays, and complex itineraries
Telecom: Plan changes, billing disputes, technical troubleshooting
The model has been battle-tested in the toughest real-world conditions: telephony audio, background noise, heavy accents, and frequent interruptions. It natively supports 25+ languages, making it ideal for global deployments.
Precise data entry and read-back
Collecting and confirming user information is critical for many workflows. Grok Voice can collect email addresses, physical street addresses, phone numbers, full names, account numbers, and other structured data, even when information is spoken quickly or with a strong accent. It gracefully handles speech disfluencies and accepts natural corrections as a human would.
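
To make the collect, read back, and correct loop concrete, here is a minimal Python sketch of what the equivalent application-side logic could look like. It is an illustration only: every function name below is hypothetical, nothing here is part of an xAI API, and the model is described as handling these steps natively in conversation.

```python
import re

# Hypothetical helpers for illustration; not part of any xAI API.
# Symbols are spoken as words so an address can be confirmed aloud.
SPOKEN_SYMBOLS = {"@": "at", ".": "dot", "_": "underscore", "-": "dash"}


def read_back_email(email: str) -> str:
    """Spell an email address character by character for spoken confirmation."""
    return " ".join(SPOKEN_SYMBOLS.get(ch, ch) for ch in email.strip().lower())


def normalize_phone(transcript: str) -> str:
    """Keep only the digits from a transcribed phone number."""
    return re.sub(r"\D", "", transcript)


def read_back_digits(digits: str) -> str:
    """Read a digit string back one digit at a time."""
    return " ".join(digits)


def apply_correction(record: dict, field: str, value: str) -> dict:
    """Return a copy of the record with one field replaced by the caller's correction."""
    updated = dict(record)
    updated[field] = value
    return updated


# The caller gives an email and phone number, then corrects the last digit.
record = {
    "email": "Ana.Diaz@example.com",
    "phone": normalize_phone("(555) 010-4477"),
}
record = apply_correction(record, "phone", normalize_phone("555 010 4478"))

print(read_back_email(record["email"]))
print(read_back_digits(record["phone"]))
```

Reading values back one character or digit at a time is a common confirmation pattern on phone lines, where similar-sounding letters are easy to mishear. Returning a new record from `apply_correction` keeps the earlier value available if the application wants to log what changed.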