# OpenAI researchers explain why math is the road to AGI

- Source: The Decoder: AI News (RSS)
- Author: Maximilian Schreiner
- Published: 2026-04-29 23:24
- AIHOT score: 46
- AIHOT link: https://aihot.virxact.com/items/cmok84nqi00maslgctwq1xug2
- Original link: https://the-decoder.com/openai-researchers-explain-why-math-is-the-road-to-agi

## AI Summary

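Speaking on a podcast, OpenAI researchers said that mathematical ability has become a key test for measuring AI's progress toward artificial general intelligence (AGI). In just two years, AI models have advanced from grade-school arithmetic to olympiad-level and even research-level mathematics problems. This rapid progress in complex mathematical reasoning is seen as an important sign of improved generalization and abstract thinking in models, and as one of the core paths toward AGI.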

## Full Text


AI models have jumped from grade-school arithmetic to olympiad-level and research mathematics in only two years. In the OpenAI Podcast, OpenAI researchers Sebastian Bubeck and Ernest Ryu explain why math has become the key test on the road to artificial general intelligence.

Reasoning models didn't exist two years ago. Four years ago, Bubeck was impressed when Google's Minerva model could draw a line through points on a coordinate system. Today, he told Andrew Mayne, these systems are helping Fields Medal winners with their daily work. At a conference 18 months ago, 80 percent of the mathematicians in the room thought it was impossible for scaled-up LLMs to crack open research problems, Bubeck says.

Ernest Ryu, a former UCLA math professor, says he solved a 42-year-old open problem about Nesterov's method in optimization theory using ChatGPT - in just twelve hours spread across three evenings. He had already spent more than 40 hours on it without AI and gotten nowhere. Ryu acted as a verifier, catching errors and steering the conversation in promising directions.

### Why math has become the benchmark for AGI

For Bubeck, math isn't the yardstick for AGI progress by accident. It demands exactly the kind of capability a generally intelligent system needs. Mathematical proofs require long, consistent reasoning over hours, days, or even years, and a single mistake anywhere in the chain destroys the entire argument, no matter how correct the rest is. Anything that can handle that has to be able to spot and fix its own errors.

That's what the researchers want to carry over from math training into other fields, from biology to materials science. Bubeck draws a parallel with how people are educated: students learn math not because they'll go on to write proofs, but because the subject forces them to think logically.

Math also has practical advantages as a benchmark. Problems are clearly stated, answers can be checked, and nobody argues about whether a result is correct. Bubeck introduces the idea of "AGI time": two years ago, models could simulate a student's thinking for minutes. Today, they're up to days or even a week. The next target is weeks and months.

OpenAI's training methods aren't specific to math, Bubeck says, but general, which means progress in other sciences should follow. The researchers are building an "automated researcher" that can work on problems on its own over long stretches of time.

### The Erdős problems and the fight over what they mean

Bubeck and Ryu also dig into the Erdős problems, a collection of open questions left behind by the late Hungarian mathematician Paul Erdős. Bubeck says internal models initially found solutions to ten problems marked as open, mostly through deep literature searches. His misleading tweet about it sparked a public spat with Google DeepMind CEO Demis Hassabis, since many people read it as a claim that OpenAI had produced new proofs. By now, Bubeck says, ChatGPT and internal models have actually produced more than ten genuinely new solutions worthy of publication in academic journals.

What seemed like an unrealistic claim is now reality, and the pace is picking up. Bubeck sees this as evidence that the models are making the leap from recombining existing knowledge to producing new mathematics, even if the philosophical question of whether scientific progress is anything more than clever recombination plus a bit of reasoning remains open.

### The risks: mental atrophy and fake proofs

Both researchers warn against using these tools superficially. Expertise matters more than ever, they argue, because only trained mathematicians can put the models to productive use. The long AI-generated proofs that non-mathematicians post on social media are usually wrong. Ryu sees the same pattern in programming, where a whole generation is losing the ability to use debuggers.

Bubeck says claims that scientists are no longer needed are therefore dangerous. Academic institutions need to actively reclaim their role. At the same time, AI can speed up proof verification - a process that currently takes years - and flag problems in published papers.

