Rohan Paul @rohanpaul_ai

2026-05-18 02:28 · 46 days ago

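AI Summary

Thinking Machines Lab and the OpenBMB team are pushing AI interaction away from the traditional "walkie-talkie" turn-based mode and toward a full-duplex, time-aligned micro-turn mode. The core idea, implemented in frameworks such as Omni-Flow, is to align visual and audio input with speech and text output on one shared timeline, so that perception and response happen at the same time. In practice, the open-source 9-billion-parameter multimodal model MiniCPM-o 4.5 can already see, hear, and speak simultaneously, and it surpasses larger models in multimodal capability and speech generation quality. This marks an important breakthrough in the AI interaction layer: real-time, natural, human-like conversation becomes possible, with code, weights, and an edge deployment option already available.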

Just a few days back, Thinking Machines Lab (TML) showcased a way of making AI interaction continuous instead of turn-based: full-duplex, time-aligned micro-turns.

It's a preview of the future of near-real-time AI voice and video conversation, built on new 'interaction models'.

And MiniCPM-o 4.5 already shipped the same core idea through OpenBMB's Omni-Flow framework: time-aligned perception and response instead of old turn-based chat.

A 9B Full-Duplex omnimodal model that can see, hear, and speak at the same time.

Omni-Flow also treats interaction as a continuous stream on a shared temporal axis, aligning visual input, audio input, and output speech/text into time chunks so the model can perceive while responding.
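To make the "shared temporal axis" idea concrete, here is a rough sketch of bucketing timestamped events from several streams into common time chunks. This is not Omni-Flow's code: the `Event` type, the stream names, and the 1-second `CHUNK_SECONDS` are all hypothetical, picked only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

CHUNK_SECONDS = 1.0  # hypothetical chunk length; the post does not state one


@dataclass
class Event:
    stream: str   # hypothetical stream name, e.g. "video_in", "audio_in", "speech_out"
    t: float      # timestamp in seconds on the shared axis
    payload: str  # stand-in for real frames, audio, or tokens


def align_to_chunks(events, chunk_seconds=CHUNK_SECONDS):
    """Group events from all streams by the time chunk they fall into."""
    chunks = defaultdict(lambda: defaultdict(list))
    for ev in events:
        index = int(ev.t // chunk_seconds)
        chunks[index][ev.stream].append(ev.payload)
    # Return plain dicts, ordered by chunk index.
    return {i: dict(chunks[i]) for i in sorted(chunks)}


events = [
    Event("audio_in", 0.2, "user: 'what is'"),
    Event("video_in", 0.5, "frame_0"),
    Event("audio_in", 1.1, "user: 'this?'"),
    Event("video_in", 1.5, "frame_1"),
    Event("speech_out", 1.7, "model: 'That is'"),
]

aligned = align_to_chunks(events)
for index, streams in aligned.items():
    print(index, streams)
```

In this toy timeline, chunk 1 holds new audio and video input alongside output speech, which is the "perceive while responding" situation in miniature.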

That breaks the old walkie-talkie UX of AI: user talks, model waits, model replies.
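A toy comparison of the two interaction loops, as a sketch only. The functions `turn_based`, `full_duplex`, `toy_respond`, and `toy_step` are made up for this example and say nothing about how MiniCPM-o 4.5 or TML's models work internally; one "tick" stands for one time chunk.

```python
def turn_based(user_chunks, respond):
    """Walkie-talkie style: the model replies only after the whole user turn."""
    timeline = [(t, chunk, None) for t, chunk in enumerate(user_chunks)]
    timeline.append((len(user_chunks), None, respond(user_chunks)))
    return timeline


def full_duplex(user_chunks, step):
    """Each tick the model takes one input chunk and may emit output."""
    timeline, heard = [], []
    for t, chunk in enumerate(user_chunks):
        heard.append(chunk)
        timeline.append((t, chunk, step(heard)))  # step may return None (stay silent)
    return timeline


def first_reply_tick(timeline):
    """Tick index of the first non-empty model output."""
    return next(t for t, _, out in timeline if out is not None)


def toy_respond(heard):
    # Hypothetical policy: look at the finished turn, then answer.
    return "stopping" if "stop" in heard else "ok"


def toy_step(heard):
    # Hypothetical policy: react as soon as the word "stop" arrives.
    return "stopping" if heard[-1] == "stop" else None


user_chunks = ["please", "stop", "now"]
print(first_reply_tick(turn_based(user_chunks, toy_respond)))
print(first_reply_tick(full_duplex(user_chunks, toy_step)))
```

With these toy policies, the turn-based loop answers at tick 3, after the user has finished, while the full-duplex loop reacts at tick 1, while the user is still talking.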

And this is not just a demo concept. It is a 9B open model with code, weights, a report, and edge deployment under 12GB RAM.

It also surpasses Qwen3-Omni-30B-A3B in omni-modal capabilities and speech generation quality.

This feels like the interaction layer AI was missing.

OpenBMB already shipped this as a real Full-Duplex omni-modal architecture, with video tokens, audio tokens, LLM hidden states, speech tokens, and waveform generation all synced to one shared timeline.

Thinking Machines: People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approa...

Rohan Paul @rohanpaul_ai · X
