Rohan Paul@rohanpaul_ai

2026-06-17 04:22 · 16 days ago

AI Summary

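Catnip has launched MaineCoon, a 22B-parameter real-time audio-visual foundation model that turns text prompts into a live character stream with synchronized speech, motion, and expression, and supports interactions of unlimited duration. As the first streaming-native model, MaineCoon delivers a sub-second first frame, 47.5 FPS on a single H100, and 30 FPS on a single RTX Pro 6000, with throughput about 7x faster than comparable audio-visual systems in internal tests. Unlike passive video generation, it responds causally in real time, remembers its own imperfect past, and keeps character identity, voice, and rhythm consistent, turning AI from turn-based replies into a real-time presence that is "there with you".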

Catnip just dropped MaineCoon, a 22B real-time audio-visual foundation model that turns text prompts into a live character stream with synced speech, motion, and expression.

The first streaming-native model of its kind.

Sub-second first frame, 47.5 FPS on one H100, 30 FPS on one RTX Pro 6000, and about 7x faster throughput than comparable audio-visual systems in its internal tests.

The big deal is that a normal video generator can wait, revise, and render a finished clip, but a social interface has to move causally, remember its own imperfect past, and stay ahead of playback without breaking identity, voice, or rhythm.

Catnip🥇MaineCoon: From Passive Video to Real-Time AI Presence. The first unlimited-duration interactive audio-visual model. Most AI products today still feel like the...

Multimodal · Model release · Video · Speech

