OpenAI researchers explain why math is the road to AGI
AI models have jumped from grade-school arithmetic to olympiad-level and research mathematics in only two years. In the OpenAI Podcast, OpenAI researchers Sebastian Bubeck and Ernest Ryu explain why math has become the key test on the road to artificial general intelligence.
Reasoning models didn't exist two years ago. Four years ago, Bubeck was impressed when Google's Minerva model could draw a line through points on a coordinate system. Today, he told Andrew Mayne, these systems are helping Fields Medal winners with their daily work. At a conference 18 months ago, 80 percent of the mathematicians in the room thought it was impossible for scaled-up LLMs to solve open research problems, Bubeck says.
Ernest Ryu, a former UCLA math professor, says he solved a 42-year-old open problem about Nesterov's method in optimization theory using ChatGPT, in just twelve hours spread across three evenings. He had already spent more than 40 hours on it without AI and gotten nowhere. Ryu acted as a verifier, catching errors and steering the conversation in promising directions.
Why math has become the benchmark for AGI
For Bubeck, math isn't the yardstick for AGI progress by accident. It demands exactly the kind of capability a generally intelligent system needs. Mathematical proofs require long, consistent reasoning over hours, days, or even years, and a single mistake anywhere in the chain destroys the entire argument, no matter how correct the rest is. Anything that can handle that has to be able to spot and fix its own errors.
That's what the researchers want to carry over from math training into other fields, from biology to materials science. Bubeck draws a parallel with how people are educated: students learn math not because they'll go on to write proofs, but because the subject forces them to think logically.
Math also has practical advantages as a benchmark. Problems are clearly stated, answers can be checked, and nobody argues about whether a result is correct. Bubeck introduces the idea of "AGI time": two years ago, models could simulate a student's thinking for minutes. Today, they're up to days or even a week. The next target is weeks and months.
OpenAI's training methods aren't specific to math, Bubeck says, but general, which means progress in other sciences should follow. The researchers are building an "automated researcher" that can work on problems on its own over long stretches of time.
The Erdős problems and the fight over what they mean
Bubeck and Ryu also dig into the Erdős problems, a collection of open questions left behind by the late Hungarian mathematician Paul Erdős. Bubeck says internal models initially found solutions to ten problems marked as open, mostly through deep literature searches. His misleading tweet about it sparked a public spat with Google DeepMind CEO Demis Hassabis, since many people read it as a claim that OpenAI had produced new proofs. By now, Bubeck says, ChatGPT and internal models have actually produced more than ten genuinely new solutions worthy of publication in academic journals.