Ethan Mollick @emollick

2026-05-06 07:10 · 58 days ago

All benchmarks are flawed, but GPQA has been fairly consistent & highly correlated with other measured benchmarks. I think it's a good way to see how far we've come that the free model from OpenAI, GPT 5.5 Instant, is at a level that even paid models did not reach until late 2025

OpenAI · Expert opinion · Evaluation/Benchmarks
