Ethan Mollick@emollick

2026-06-15 23:28

Weird headline - I am not sure solving 7 out of 10 novel very hard problems meant AI "did not live up to the task," when 15 months ago LLMs couldn't do math.

But the actual study is interesting and illuminates flaws & successes of AIs in math. https://1stproof.org/assets/docs/report.pdf

Quoting @Nature: Artificial intelligence has undergone its most scrupulous maths test yet, and it did not live up to the task https://go.nature.com/4oqlNk6