# Anthropic Blog: Claude's Capabilities Are Accelerating, Nearing Recursive Self-Improvement

- Source: Chubby♨️ (@kimmonismus)
- Published: 2026-06-05 00:26
- AIHOT score: 75
- AIHOT link: https://aihot.virxact.com/items/cmpzqjovl060rslkp323grpta
- Original link: https://x.com/kimmonismus/status/2062571807274602534

## AI Summary

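Anthropic's internal data shows Claude's capabilities are growing far faster than expected and may be approaching recursive self-improvement, in which a model autonomously designs its own successor. Key metrics: engineers' per-person quarterly code output is 8x the average of the preceding four years; the length of tasks AI can reliably complete doubles roughly every 4 months, rising from 4 minutes for Opus 3 to at least 16 hours for Mythos Preview. As of May 2026, Claude authored 80%+ of the code merged into Anthropic's codebase; its code quality is now on par with human-written code and is expected to surpass it within the year. The success rate on the hardest tasks rose from 26% to 76% in 6 months. Anthropic considers a stalled trend the least likely outcome and compounding efficiency gains the most likely, while the alignment outcome of full recursive self-improvement is the one it is least certain about.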

## Body

Holy moly, Anthropic is getting very serious about recursive self-improvement!

One word: acceleration.

Insane blog article.

TL;DR:

- We are close to an AI capable of fully autonomously designing and building its own successor

- They stress this isn't here yet and isn't inevitable, but could arrive sooner than most institutions are ready for

- Anthropic engineers now ship on average 8x as much code per quarter as they did in 2021-2025

- Task length AI can reliably complete is doubling roughly every 4 months (up from every 7 months)

- Opus 3 (Mar 2024) handled ~4-minute tasks; Sonnet 3.7 (a year later) ~90-minute tasks; Opus 4.6 (a year after that) 12-hour tasks

- SWE-bench went from low single digits to saturated in two years; CORE-bench (research reproduction) went ~20% to saturated in 15 months

- METR found Claude Mythos Preview could work "at least" 16 hours, at the top of what they can currently measure

- As of May 2026, Claude authored 80%+ of code merged into Anthropic's codebase (low single digits before Claude Code launched in Feb 2025)

- A March 2026 poll of 130 research staff: median respondent estimated ~4x output with Mythos Preview

- One April 2026 example: Claude shipped 800+ fixes cutting a class of API errors 1,000x, work an engineer estimated would have taken a human four years

- Claude-written code quality: worse than human in late 2025, roughly at parity now, expected to be strictly better within the year

- On the hardest open-ended tasks, Claude's success rate hit 76% in May 2026, up 50 points in six months

- Code-speedup test: Opus 4 averaged ~3x speedup (May 2025), Mythos Preview ~52x (April 2026); a skilled human needs 4-8 hours to hit 4x

- In an AI-safety research project, Claude agents recovered 97% of a performance gap (vs ~23% for two human researchers in a week), over 800 compute-hours and ~$18K

- On picking the better "next step" in research sessions, the best model beat the human choice 51% (Nov 2025, Opus 4.5) rising to 64% (April 2026, Mythos Preview)

- Human comparative advantage, for now: research taste and judgment, i.e. choosing which problems matter and when an approach is a dead end
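
The "doubling roughly every 4 months" figure can be sanity-checked against the three task-length data points quoted in the list. The sketch below is illustrative arithmetic only, not Anthropic's or METR's methodology: the function name `doubling_time_months` is hypothetical, each "a year later" is assumed to mean exactly 12 months, and the task lengths are treated as point estimates.

```python
import math


def doubling_time_months(start_minutes: float, end_minutes: float,
                         elapsed_months: float) -> float:
    """Months per doubling implied by growth from start to end over elapsed_months."""
    if start_minutes <= 0 or end_minutes <= start_minutes or elapsed_months <= 0:
        raise ValueError("need 0 < start < end and a positive elapsed time")
    # Number of doublings is log2 of the growth factor.
    doublings = math.log2(end_minutes / start_minutes)
    return elapsed_months / doublings


# Task lengths quoted in the list, converted to minutes.
# Assumption: releases are spaced exactly 12 months apart.
OPUS_3 = 4            # ~4-minute tasks (Mar 2024)
SONNET_3_7 = 90       # ~90-minute tasks (a year later)
OPUS_4_6 = 12 * 60    # 12-hour tasks (a year after that)

if __name__ == "__main__":
    intervals = [
        ("Opus 3 -> Sonnet 3.7", OPUS_3, SONNET_3_7, 12),
        ("Sonnet 3.7 -> Opus 4.6", SONNET_3_7, OPUS_4_6, 12),
        ("Opus 3 -> Opus 4.6", OPUS_3, OPUS_4_6, 24),
    ]
    for label, start, end, months in intervals:
        print(f"{label}: {doubling_time_months(start, end, months):.1f} months per doubling")
```

Under these assumptions the most recent interval works out to exactly 4 months per doubling (90 minutes to 12 hours is a factor of 8, i.e. three doublings in 12 months), while the first interval and the full two-year span imply somewhat faster doubling, roughly 2.7 and 3.2 months respectively.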

Three possible futures

- The trend stalls (S-curve), but today's capabilities still diffuse widely; they consider this least likely

- Compounding efficiency gains, with humans still setting direction; 100-person firms doing the work of 10,000+; they think this is the likely path

- Full recursive self-improvement, where AI builds its successors and pace is set by compute; the alignment outcome here is what they're least certain about

### Quoted tweet

> Anthropic: Our internal data shows Claude is accelerating AI development - a possible path to recursive self-improvement, or AI autonomously building a more capable successo...
