# The recipe for "classic" reasoning benchmarks is simple: text-only, several-hour time horizons, easy…

- Source: Epoch AI (@EpochAIResearch)
- Published: 2026-05-06 04:26
- AIHOT score: 49
- AIHOT link: https://aihot.virxact.com/items/cmot33mlu0200slv7rw8tu5j9
- Original link: https://x.com/EpochAIResearch/status/2051760424891392204

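## AI Summary

The recipe for "classic" reasoning benchmarks is simple: text-only, time horizons of several hours, easy to grade, and backed by expert human baselines.

What comes next? In this week's Gradient Update, @GregHBurnham argues that all it takes is dropping one of these four ingredients.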

## Full Text

The recipe for "classic" reasoning benchmarks is simple: text-only, several-hour time horizons, easy to grade, with expert human baselines.

What next? In this week's Gradient Update, @GregHBurnham argues it's as easy as dropping one of these four ingredients.