# All benchmarks are flawed, but GPQA has been fairly consistent & highly correlated with other me…

- Source: Ethan Mollick (@emollick)
- Published: 2026-05-06 07:10
- AIHOT score: 66
- AIHOT link: https://aihot.virxact.com/items/cmot9lvxr03h0slv7hyke750g
- Original link: https://x.com/emollick/status/2051801703209742734

## AI Summary

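All benchmarks have flaws, but GPQA has remained fairly stable and correlates highly with other measured benchmarks. I think it is a good way to see how far we have come: OpenAI's free model, GPT 5.5 Instant, has reached a level that even paid models did not reach until late 2025.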

## Original Text

All benchmarks are flawed, but GPQA has been fairly consistent & highly correlated with other measured benchmarks. I think it's a good way to see how far we've come that the free model from OpenAI, GPT 5.5 Instant, is at a level that even paid models did not reach until late 2025.
