Rohan Paul @rohanpaul_ai · May 12
Thinking Machines is replacing turn-taking AI with always-present AI.
They just announced TML-Interaction-Small, a 276B-parameter MoE model with 12B active parameters that treats conversation as a live stream instead of a stop-start chat box.
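To put the two parameter counts side by side, here is a quick back-of-the-envelope calculation. It only uses the 276B total and 12B active figures quoted above; the helper name `active_fraction` is made up for illustration.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of the model's parameters that are active per token."""
    return active_params_b / total_params_b


# Figures quoted in the announcement: 276B total, 12B active.
print(f"{active_fraction(276, 12):.1%}")  # prints 4.3%
```

So roughly one parameter in 23 is active for any given token, which is the usual MoE trade: large total capacity, much smaller per-token compute.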
Most AI voice systems still behave like walkie-talkies: you speak, they wait, they answer, then their view of the world freezes while they talk.
Thinking Machines changes that by slicing audio, video, and text into 200ms micro-turns, so the model can listen, watch, speak, draw, search, and call tools while the interaction is still happening.
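A minimal sketch of what 200ms slicing could look like for an audio buffer. Only the 200ms figure comes from the announcement; the 16 kHz sample rate, the function names, and the loudness rule in `toy_model_step` are assumptions for illustration, and the real model learns this behavior rather than following a hand-written rule.

```python
from typing import Iterator, Sequence

TURN_MS = 200  # micro-turn length quoted in the announcement


def micro_turns(samples: Sequence[float], sample_rate: int,
                turn_ms: int = TURN_MS) -> Iterator[Sequence[float]]:
    """Yield consecutive fixed-length slices of an audio buffer.

    The final slice may be shorter if the buffer does not divide evenly.
    """
    step = sample_rate * turn_ms // 1000  # samples per micro-turn
    for start in range(0, len(samples), step):
        yield samples[start:start + step]


def toy_model_step(chunk: Sequence[float]) -> str:
    """Hypothetical stand-in for the model: one decision per micro-turn.

    Input is consumed on every turn; the only choice is whether to emit speech.
    """
    user_is_talking = any(abs(s) > 0.5 for s in chunk)
    return "quiet" if user_is_talking else "speak"


if __name__ == "__main__":
    sample_rate = 16_000                 # assumed, not from the announcement
    audio = [0.0] * sample_rate          # one second of silence
    audio[3200:6400] = [0.9] * 3200      # user talks during the second micro-turn
    actions = [toy_model_step(c) for c in micro_turns(audio, sample_rate)]
    print(actions)  # ['speak', 'quiet', 'speak', 'speak', 'speak']
```

The point of the sketch is the loop shape: the decision to speak or stay quiet is revisited every 200ms instead of once per full user utterance.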
This is why the demos feel different: the model can interrupt when context demands it, keep talking while listening, react to visual cues, track elapsed time, and hand harder work to a background model without vanishing from the conversation.
The architecture is also cleaner than many current real-time systems because interactivity is trained into the model itself rather than patched together with voice detectors, turn detectors, separate speech models, and timing rules.
The early numbers are strong: 0.40s turn-taking latency, 77.8 on FD-bench V1.5 interaction quality, and 43.4% on Audio MultiChallenge. That suggests the model is not just fast but also retains useful reasoning and instruction-following ability.
The model can notice timing, silence, overlap, gestures, screen changes, and uncertainty as part of the same context.