Ethan Mollick@emollick

2026-04-19 13:01 · 74 days ago

What I find very funny about these "leaks" is that they don't even bother to get ballpark benchmarks to feed into the image generators. Ask the model to look up real data, at least. It's easy!

Like, GPQA is over 90% for all recent models. https://t.co/XljT8L3QCJ

Tags: Expert opinion · Phenomena/Trends · Evaluation/Benchmarks
