Epoch AI @EpochAIResearch · X

2026-05-06 04:26

The recipe for "classic" reasoning benchmarks is simple: text-only, several-hour time horizons, easy to grade, with expert human baselines.

What next? In this week's Gradient Update, @GregHBurnham argues it's as easy as dropping one of these four ingredients.
