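Kim calls Matrix the first "AI company" product that doesn't feel like cosplay. On GDPval-Bench it scored 95.45%, beating Codex (84.9%) and Claude Code (80.3%); the gap on long tasks suggests that planning and coordination matter more than raw model capability. Matrix positions itself as a runtime for running "zero-employee companies," not a simple prompt orchestrator. During last week's limited beta, users created tens of thousands of zero-employee companies and ran real business on them, and the public beta is open to everyone starting today.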
This is the first "AI company" product I've seen that doesn't feel like pure cosplay.
Two interesting points:
First, Matrix treats the company idea seriously. You are not just creating agents and hoping they coordinate.
Second, Matrix beat both Codex and Claude Code on GDPval-Bench, scoring 95.45% against 84.9% and 80.3% respectively.
That gap seems to matter most on longer tasks, where planning and coordination, rather than raw model capability, actually decide the outcome.
Which is maybe the point. A lot of "AI companies" are really just prompt orchestrators with a nice UI. Matrix looks like it's building something closer to an actual operating layer. Whether that holds up beyond benchmarks, I don't know yet. But it really makes me want to find out.