Rohan Paul@rohanpaul_ai

2026-05-12 11:48 · 52 days ago

AI summary

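Thinking Machines has released TML-Interaction-Small, a model intended to replace traditional turn-taking conversational AI with "always-present" AI. The model uses a mixture-of-experts architecture and slices audio, video, and text streams into 200 ms micro-turns, so it can listen, watch, speak, draw, search, and call tools in parallel during an interaction. The core design idea is to let AI handle multiple tasks in parallel and in real time, the way people do. The model keeps latency low (0.40 s) while retaining strong reasoning and instruction-following ability, and interactivity is built directly into the model architecture rather than assembled from external components.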

Thinking Machines is replacing turn-taking AI with always-present AI.

They just announced TML-Interaction-Small, a 276B-parameter MoE model with 12B active parameters that treats conversation as a live stream instead of a stop-start chat box.
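For scale, here is the arithmetic behind "276B total, 12B active". It is a back-of-the-envelope sketch using only the two figures quoted above. The variable names are illustrative, and nothing here describes how the model actually routes tokens to experts.

```python
# Figures quoted in the post; everything else is plain arithmetic.
TOTAL_PARAMS = 276e9   # total parameters
ACTIVE_PARAMS = 12e9   # parameters active at a time, per the post

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
total_to_active = TOTAL_PARAMS / ACTIVE_PARAMS

print(f"active fraction: {active_fraction:.1%}")       # active fraction: 4.3%
print(f"total-to-active ratio: {total_to_active:.0f}")  # total-to-active ratio: 23
```

So roughly 1 parameter in 23 is active at a time. That ratio speaks to compute per step, not to memory, since the full set of experts still has to be stored somewhere.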

Most AI voice systems still behave like walkie-talkies: you speak, they wait, they answer, then their view of the world freezes while they talk.

Thinking Machines changes that by slicing audio, video, and text into 200ms micro-turns, so the model can listen, watch, speak, draw, search, and call tools while the interaction is still happening.
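To make the micro-turn idea concrete, here is a minimal sketch of cutting an input stream into 200 ms windows. Only the 200 ms figure comes from the post. The `MicroTurn` type, the `slice_micro_turns` function, the 16 kHz sample rate, and the timestamped text events are all hypothetical, chosen for illustration. This is not Thinking Machines' actual pipeline, and video is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Sequence, Tuple

MICRO_TURN_MS = 200  # window size quoted in the post


@dataclass
class MicroTurn:
    """One 200 ms slice of the interaction (hypothetical container)."""
    index: int
    start_ms: int
    audio: List[float]
    text: List[str] = field(default_factory=list)


def slice_micro_turns(
    samples: Sequence[float],
    sample_rate: int,
    text_events: Sequence[Tuple[int, str]] = (),
) -> Iterator[MicroTurn]:
    """Yield consecutive 200 ms windows of audio, each paired with the
    text events whose timestamp (in ms) falls inside that window."""
    per_turn = sample_rate * MICRO_TURN_MS // 1000  # samples per window
    for index, offset in enumerate(range(0, len(samples), per_turn)):
        start_ms = index * MICRO_TURN_MS
        end_ms = start_ms + MICRO_TURN_MS
        yield MicroTurn(
            index=index,
            start_ms=start_ms,
            audio=list(samples[offset:offset + per_turn]),
            text=[t for ts, t in text_events if start_ms <= ts < end_ms],
        )


# One second of silence at an assumed 16 kHz, with one text event at 250 ms.
turns = list(slice_micro_turns([0.0] * 16000, 16000, [(250, "hi")]))
print(len(turns), len(turns[0].audio), turns[1].text)  # 5 3200 ['hi']
```

Each window carries audio and text side by side. That is the property the post points at: input arrives in small interleaved slices, so a model reading them never has to wait for a whole turn to finish.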

This is why the demos feel different: the model can interrupt when context demands it, keep talking while listening, react to visual cues, track elapsed time, and hand harder work to a background model without vanishing from the conversation.

The architecture is also cleaner than many current real-time systems because interactivity is trained into the model itself rather than patched together with voice detectors, turn detectors, separate speech models, and timing rules.

The early numbers are strong: 0.40s turn-taking latency, 77.8 on FD-bench V1.5 interaction quality, and 43.4% on Audio MultiChallenge, which suggests it is not just fast but also retains useful reasoning and instruction-following ability.

The model can notice timing, silence, overlap, gestures, screen changes, and uncertainty as part of the same context.

Thinking Machines: People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approa...
