Anthropic co-founder describes how recursive AI improvement could outpace its human supervisors

2026-05-05 20:15 · Maximilian Schreiner

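AI summary

In a long essay, Anthropic co-founder Jack Clark argues that the building blocks needed for AI systems to train their own successors are largely in place. He puts the probability of AI achieving recursive self-improvement by the end of 2028 at 60 percent. Such a process could let AI advance faster than the humans responsible for supervising it can follow, raising key questions about the autonomy of AI development.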

Original text

Anthropic co-founder maps out how recursive AI improvement could outpace the humans meant to supervise it

Jack Clark argues in a long essay that the building blocks for AI systems training their own successors are largely in place. He puts the odds at 60 percent by the end of 2028.

In his newsletter Import AI, Anthropic co-founder Jack Clark says public data points to an imminent automation of AI research. What he means specifically is a system that can train a more powerful successor on its own, "no-human-involved." He pegs the odds at roughly 60 percent by the end of 2028, and 30 percent by 2027.

Clark builds his case mainly on benchmark trends. On SWE-Bench, which tests how well AI systems handle real-world GitHub issues, success rates jumped from about two percent (Claude 2, late 2023) to 93.9 percent, essentially saturating the benchmark. METR's time horizon measure rates a task by how long a skilled human would need to finish it and reports the longest tasks an AI can complete with 50 percent reliability. That figure climbed from about 30 seconds with GPT-3.5 to roughly twelve hours with today's frontier models. METR researcher Ajeya Cotra thinks 100 hours by the end of 2026 is plausible.
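To put the time horizon figures in perspective, the growth can be expressed as a number of doublings. The short Python sketch below uses only the numbers quoted above; the helper name `doublings` is an illustrative choice, and nothing here reflects how METR itself computes or reports its measure.

```python
import math


def doublings(start: float, end: float) -> float:
    """Return how many times `start` must double to reach `end`."""
    return math.log2(end / start)


# Figures quoted in the article, converted to seconds.
gpt35_horizon_s = 30                # "about 30 seconds" with GPT-3.5
frontier_horizon_s = 12 * 3600      # "roughly twelve hours" with frontier models
projected_horizon_s = 100 * 3600    # the 100-hour scenario for the end of 2026

growth_factor = frontier_horizon_s / gpt35_horizon_s
so_far = doublings(gpt35_horizon_s, frontier_horizon_s)
still_needed = doublings(frontier_horizon_s, projected_horizon_s)

print(f"Growth so far: {growth_factor:.0f}x, about {so_far:.1f} doublings")
print(f"Twelve to 100 hours: about {still_needed:.1f} further doublings")
```

On these rounded inputs, the jump from 30 seconds to twelve hours is a 1,440-fold increase, or about ten and a half doublings, while reaching 100 hours would take roughly three more.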

Core research skills are mostly covered

Clark also points to big gains on research-specific tasks. CORE-Bench, which asks AI systems to reproduce the results of a research paper, was declared solved by one of its authors at 95.5 percent. On MLE-Bench, which tests performance in Kaggle competitions, the top score rose from 16.9 to 64.4 percent. On an internal Anthropic test that asks models to "optimize a CPU-only small language model training implementation to run as fast as possible," the mean speedup went from 2.9x (Opus 4, May 2025) to 52x (April 2026), according to Clark. A human researcher would need four to eight hours to hit a 4x speedup on the same task.

On PostTrainBench, which measures how well frontier models can fine-tune open-weight models against human-built instruct versions, the best systems reached about half the human score. Anthropic has also published a proof of concept for automated alignment research, in which AI agents beat Anthropic-designed baselines on a small-scale safety research problem.

Source: The Decoder, AI News (RSS), the-decoder.com
