# A New Breakthrough in AI Interaction: Full-Duplex Time-Aligned Micro-Turns Enable Human-Like Real-Time Conversation

- Source: Rohan Paul (@rohanpaul_ai)
- Published: 2026-05-18 02:28
- AIHOT score: 63
- AIHOT link: https://aihot.virxact.com/items/cmpa4f11l0w2dslnz3qx9yf0x
- Original link: https://x.com/rohanpaul_ai/status/2056079437394051270

## AI Summary

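Thinking Machines Lab and the OpenBMB team are pushing AI interaction away from the traditional "walkie-talkie" turn-based mode and toward full-duplex, time-aligned micro-turns. The core idea is to use frameworks such as Omni-Flow to align visual and audio input with speech and text output on a unified time axis, so that perception and response happen simultaneously. As a practical example, the open-source 9-billion-parameter multimodal model MiniCPM-o 4.5 can already see, hear, and speak at the same time, and it surpasses larger models in multimodal capability and speech generation quality. This marks an important breakthrough in the AI interaction layer, making real-time, natural, human-like conversation possible, with code, weights, and an edge deployment option already available.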

## Body

Just a few days ago, Thinking Machines Lab (TML) showcased a way of making AI interaction continuous instead of turn-based: the Full-Duplex Time-aligned micro-turn.

It's a preview of the future of near-real-time AI voice and video conversation, built on new 'interaction models'.

And MiniCPM-o 4.5 already shipped the same core idea through OpenBMB's Omni-Flow framework: time-aligned perception and response instead of old turn-based chat.

It is a 9B Full-Duplex omnimodal model that can see, hear, and speak at the same time.

Omni-Flow also treats interaction as a continuous stream on a shared temporal axis, aligning visual input, audio input, and output speech/text into time chunks so the model can perceive while responding.
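To make the "shared temporal axis" idea concrete, here is a minimal sketch of grouping events from several streams into fixed-length time chunks. It is an illustration only, not Omni-Flow's actual code: the `Event` type, the stream labels, the `align_to_chunks` helper, and the 1-second chunk length are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    t: float       # timestamp in seconds on the shared timeline
    stream: str    # hypothetical labels: "video", "audio", "output"
    payload: str   # stand-in for a frame, audio slice, or generated token


def align_to_chunks(events, chunk_seconds=1.0):
    """Group events from all streams into fixed-length time chunks.

    Returns a list of (chunk_index, {stream: [payloads]}) sorted by time,
    so input and output that fall in the same window sit side by side.
    """
    chunks = defaultdict(lambda: defaultdict(list))
    for ev in events:
        index = int(ev.t // chunk_seconds)
        chunks[index][ev.stream].append(ev.payload)
    return [(i, dict(chunks[i])) for i in sorted(chunks)]


events = [
    Event(0.2, "video", "frame0"),
    Event(0.4, "audio", "a0"),
    Event(1.1, "audio", "a1"),
    Event(1.3, "output", "Hi"),
    Event(1.6, "video", "frame1"),
]
timeline = align_to_chunks(events)
for index, streams in timeline:
    print(index, streams)
```

In this toy timeline, the second chunk holds new audio, a new video frame, and an output token together, which is the property the thread describes: the model's response occupies the same time window as the input it is still perceiving.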

That breaks the old walkie-talkie UX of AI: user talks, model waits, model replies.
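The contrast between the two interaction styles can be sketched as two toy loops. This is a hedged illustration, not how MiniCPM-o 4.5 or TML's system decides when to speak: `turn_based`, `full_duplex`, and `toy_policy` are hypothetical names, and the "respond on pause" rule is a made-up stand-in for a real model's decision.

```python
def turn_based(user_chunks, reply):
    """Walkie-talkie style: the model emits nothing until all input has ended."""
    log = [("listen", chunk) for chunk in user_chunks]
    log += [("speak", token) for token in reply]
    return log


def full_duplex(user_chunks, respond):
    """Micro-turn style: at every chunk the model reads input and may emit output."""
    log = []
    for step, chunk in enumerate(user_chunks):
        out = respond(step, chunk)  # None means the model stays silent this step
        log.append((step, chunk, out))
    return log


def toy_policy(step, chunk):
    # Hypothetical rule: give a short backchannel whenever the user pauses.
    return "mm-hm" if chunk == "<pause>" else None


user = ["so I was", "<pause>", "thinking about"]
print(turn_based(user, ["ok"]))
print(full_duplex(user, toy_policy))
```

In the turn-based log every "speak" entry comes after the last "listen" entry, while in the full-duplex log the model's output appears in the middle of the user's input, at the step where the pause occurs.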

And this is not just a demo concept. It is a 9B open model with code, weights, a report, and edge deployment under 12GB RAM.

It also surpasses Qwen3-Omni-30B-A3B in omni-modal capabilities and speech generation quality.

This feels like the interaction layer AI was missing.

OpenBMB already shipped this as a real Full-Duplex omni-modal architecture, with video tokens, audio tokens, LLM hidden states, speech tokens, and waveform generation all synced to one shared timeline.

### Quoted tweet

> Thinking Machines: People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approa...