Rohan Paul@rohanpaul_ai

2026-07-05 04:23

AI Summary

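Yale and the University of Chicago built a controlled test from 11,683 real papers: the LLM is given each paper's nearby prior work as a starting point and asked to propose a new motivation and method, and its ideas are then compared with the real human ones. Key finding: the gap is not in idea quality but in idea range. Human ideas spread widely across patterns such as explaining mechanisms, testing failures, and measuring evidence; only 12.1% of human ideas were mainly about connecting separate work, versus 47.1% to 64.2% of LLM ideas (about 4 to 5 times the human rate). Extra reasoning actually strengthened this pattern, suggesting LLMs tend to polish a familiar recipe rather than explore more varied research moves.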

This Yale + University of Chicago paper shows that the real gap between LLM-generated research ideas and human ones is not idea quality, but idea range: LLMs think more narrowly than human researchers.

The researchers built a controlled test from 11,683 real papers, using each paper's nearby prior work as the shared starting point.

They asked models to propose a new motivation and method from those same prior papers, then compared those ideas with the real human paper ideas.

Instead of asking whether a single idea looked novel, they labeled each idea by what gap it noticed and what kind of contribution it made.

Human ideas spread across many patterns, such as explaining mechanisms, testing failures, measuring evidence, building systems, and improving efficiency.

Only 12.1% of human ideas were mainly about connecting separate work, but 47.1% to 64.2% of LLM ideas did that, meaning models used this move about 4 to 5 times more often.
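
A quick check of the "4 to 5 times" figure, as a minimal sketch: it only divides the shares quoted above and is not the paper's own analysis. The variable and function names are made up for illustration.

```python
# Shares quoted in the post: percent of ideas that are mainly about
# connecting separate work. Names here are illustrative assumptions.
human_share = 12.1
llm_share_low, llm_share_high = 47.1, 64.2


def ratio(llm_share: float, human: float) -> float:
    """How many times larger the LLM share is than the human share."""
    return llm_share / human


low = ratio(llm_share_low, human_share)
high = ratio(llm_share_high, human_share)
print(f"{low:.1f}x to {high:.1f}x")  # prints "3.9x to 5.3x"
```

So the rounded "4 to 5 times" is a fair reading of a range that runs from just under 4x to a bit over 5x.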

Even extra reasoning made this pattern stronger, suggesting models often polish a familiar recipe instead of finding more varied research moves.

---

arxiv.org/abs/2607.01233

Title: "Measuring the Gap Between Human and LLM Research Ideas"

Papers / Research evaluation / Benchmarks
