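Thinking Machines Lab and the OpenBMB team are pushing AI interaction away from the traditional "walkie-talkie" turn-taking mode toward a full-duplex, time-aligned micro-turn mode. The core idea is to use frameworks such as Omni-Flow to align visual and audio input with speech and text output on a unified time axis, so that perception and response happen in sync. As a practical example, the open-source 9-billion-parameter multimodal model MiniCPM-o 4.5 can already see, hear, and speak at the same time, and it surpasses larger models in multimodal capability and speech generation quality. This marks an important breakthrough at the AI interaction layer, making real-time, natural, human-like conversation possible, with code, weights, and edge deployment options already available.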
Just a few days ago, Thinking Machines Lab (TML) showcased a way of making AI interaction continuous instead of turn-based: full-duplex, time-aligned micro-turns.
It's a preview of a future of near-real-time AI voice and video conversation built on new 'interaction models'.
And MiniCPM-o 4.5 already shipped the same core idea through OpenBMB's Omni-Flow framework: time-aligned perception and response instead of old turn-based chat.
It is a 9B full-duplex omnimodal model that can see, hear, and speak at the same time.
Omni-Flow treats interaction as a continuous stream on a shared temporal axis, aligning visual input, audio input, and output speech/text into time chunks so the model can keep perceiving while it responds.
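To make the "shared temporal axis" idea concrete, here is a minimal Python sketch of bucketing timestamped events from several streams into fixed-length time chunks. It is not Omni-Flow's actual implementation: the `Event` type, the `align_to_chunks` helper, the stream labels, and the 1-second chunk length are all assumptions chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    stream: str    # hypothetical labels: "video_in", "audio_in", "speech_out"
    t: float       # timestamp in seconds on the shared clock
    payload: str   # stand-in for a frame, an audio slice, or an output token


def align_to_chunks(events, chunk_seconds=1.0):
    """Group events from all streams into fixed-length chunks on one time axis."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    chunks = defaultdict(lambda: defaultdict(list))
    for ev in sorted(events, key=lambda e: e.t):
        index = int(ev.t // chunk_seconds)  # which chunk this timestamp falls in
        chunks[index][ev.stream].append(ev.payload)
    # Convert nested defaultdicts to plain dicts, ordered by chunk index.
    return {i: dict(streams) for i, streams in sorted(chunks.items())}


# Assumed toy data: input and output events interleave in time.
events = [
    Event("video_in", 0.2, "frame0"),
    Event("audio_in", 0.5, "user: hel-"),
    Event("audio_in", 1.1, "user: -lo"),
    Event("speech_out", 1.4, "model: hi"),
    Event("video_in", 1.8, "frame1"),
]
timeline = align_to_chunks(events)
for index, streams in timeline.items():
    print(index, streams)
```

In this toy timeline, chunk 1 holds incoming audio, incoming video, and outgoing speech side by side, which is the property the post describes: input and output share one time chunk rather than waiting for the other side's turn to end.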