Noam Brown (@polynoamial)

2025-07-19 17:55 · 348 days ago

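AI summary

Turning experimental research into a product takes months, but AI capabilities advance so quickly that a few months can amount to a generational gap. On the new IMO questions, every model underperformed humans, and Grok-4 did poorly even with best-of-n selection.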

It takes us a few months to turn the experimental research frontier into a product. But progress is so fast that a few months can mean a big difference in capabilities.

Ravid Shwartz Ziv: So, all the models underperform humans on the new International Mathematical Olympiad questions, and Grok-4 is especially bad on it, even with best-of-n selecti...

Meta · xAI · Expert opinion · Reasoning

