Turing Award winner Richard Sutton says pure generative AI can't do real science
Turing Award winner Richard Sutton argues that ordinary generative AI lacks a key ability for scientific discovery: it can't evaluate and develop its own results.
Large language models, image generators, and video models learn from massive amounts of examples and produce outputs that resemble them. According to Sutton, when these outputs are good, it's usually thanks to the source material: the texts, images, or data the model learned from. When the outputs are truly novel, they go beyond that material. In factual queries, that kind of ungrounded novelty is called hallucination.
Sutton illustrates his critique with an old joke among researchers: "This work is both novel and good. Unfortunately, the parts that are good are not novel, and the parts that are novel are not good." That diagnosis fits large parts of today's generative AI, Sutton says. It can mimic useful things or randomly produce new things, but it can't tell on its own which new ideas are actually good.
Sutton doesn't deny that generative AI can be useful for summaries, research, assistants, or entertainment. Novelty often isn't even the goal: a summary shouldn't invent new facts, and research shouldn't sneak in extra claims. "Generative AI can be extremely useful, even when it just mimics, if it is faster, or cheaper, or smaller, or more customizable, or more copy-able, than the thing being mimicked," Sutton says.
Imitation falls short for science
In Sutton's view, this limit matters most in science, where the point isn't to reproduce what's already known but to discover new things, test them, and turn them into lasting knowledge.
Sutton describes genuine discovery as a three-step process: variation, evaluation, and selective retention. A system has to generate different options, test them, and keep using the approaches that work. Sutton says this principle exists in evolution, in the scientific method, in planning, in search, and in reinforcement learning.
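The three steps can be illustrated with a toy search loop. This is only a minimal sketch, not anything Sutton presents: the scoring function `evaluate`, its peak at 3.0, and the names `discover`, `steps`, and `seed` are all assumptions made up for illustration.

```python
import random


def evaluate(candidate: float) -> float:
    """Hypothetical scoring function (an assumption): higher is better, peak at 3.0."""
    return -(candidate - 3.0) ** 2


def discover(steps: int = 200, seed: int = 0) -> float:
    """Toy generate-test-retain loop over a single number."""
    rng = random.Random(seed)
    best = 0.0                       # the option currently retained
    best_score = evaluate(best)
    for _ in range(steps):
        # Variation: propose a random alternative near the retained option.
        candidate = best + rng.gauss(0.0, 0.5)
        # Evaluation: test the proposal.
        score = evaluate(candidate)
        # Selective retention: keep the proposal only if it scores higher.
        if score > best_score:
            best, best_score = candidate, score
    return best


if __name__ == "__main__":
    print(discover())
```

If the evaluation step is removed, the loop still produces new candidates on every pass, but nothing is kept and the result never improves.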
What pure generative AI lacks most is evaluation. Language and image models do generate different variants. But without testing, there's no selection of the best and no discovery. "The novelty flickers into existence, but if its value is unrecognized, it flickers away and is lost," Sutton says.
Evaluation can come from humans, as when users pick the best image from several AI-generated options. But it can also come from a clear goal: a checkmate, a formally valid proof, a successful program run, or a high reward in a simulated environment. Only that kind of feedback turns mere generation into a search and discovery process.
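The "successful program run" case can be sketched as a filter over candidate programs. This is a rough illustration under stated assumptions: the three candidate functions stand in for generated code, and `TESTS`, `passes`, and `select` are hypothetical names, not part of any system the article mentions.

```python
from typing import Callable, List, Sequence, Tuple

TestCase = Tuple[List[int], List[int]]


# Hypothetical candidates standing in for generated programs (assumption).
def candidate_a(xs: List[int]) -> List[int]:
    return sorted(xs, reverse=True)


def candidate_b(xs: List[int]) -> List[int]:
    return list(xs)


def candidate_c(xs: List[int]) -> List[int]:
    return sorted(xs)


# The clear goal: inputs paired with the output a correct sorter must return.
TESTS: List[TestCase] = [([3, 1, 2], [1, 2, 3]), ([], []), ([5, 5, 1], [1, 5, 5])]


def passes(program: Callable[[List[int]], List[int]], tests: Sequence[TestCase]) -> bool:
    """Return True only if the program gives the expected output on every test."""
    for given, expected in tests:
        try:
            if program(list(given)) != expected:
                return False
        except Exception:
            return False  # a crash counts as a failed run
    return True


def select(candidates: Sequence[Callable[[List[int]], List[int]]],
           tests: Sequence[TestCase]) -> List[str]:
    """Keep only the names of the candidates that pass the check."""
    return [c.__name__ for c in candidates if passes(c, tests)]


if __name__ == "__main__":
    print(select([candidate_a, candidate_b, candidate_c], TESTS))  # ['candidate_c']
```

The check needs no human in the loop: the test cases alone decide which candidate survives.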
AlphaGo, AlphaFold, and Claude Code show the difference
Sutton says some AI systems that go beyond pure generative AI are already "capable of true creativity and true discovery." He points to examples like AlphaGo with its famous move 37, AlphaZero with its unique chess style, AlphaFold in protein structure prediction, AlphaProof in math, Claude Code in programming, and GT-Sophy in simulated racing.