Rohan Paul @rohanpaul_ai

2026-05-29 01:40 · 35 days ago

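AI Summary

Anthropic has released the Claude Opus 4.8 model. Its fast mode is 2.5x faster and 3x cheaper. On an agentic terminal coding benchmark, performance rose sharply from 66.1% to 74.6%, and the model is the new leader on the GDPval-AA benchmark. A new "dynamic workflows" feature, available through Claude Code, breaks large engineering tasks into tens to hundreds of parallel subtasks, handled by multiple AI agents that work together and verify one another. According to the official announcement, this release improves judgment, honesty, and the ability to work independently, and it is available starting today at the same price.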

Claude Opus 4.8 dropped.

2.5x faster fast mode, which is also 3x cheaper
74.6% on agentic terminal coding is the biggest benchmark jump over Opus 4.7, rising from 66.1%
New "dynamic workflows" feature that allows it to tackle very large-scale problems
The new leader on our GDPval-AA benchmark for agentic real-world work tasks
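
For scale, the jump on the agentic terminal coding benchmark can be worked out from the two scores quoted above. This is only arithmetic on the numbers in the post; the variable names are illustrative.

```python
# Scores quoted in the post, in percent.
old_score = 66.1  # Opus 4.7
new_score = 74.6  # Opus 4.8

# Absolute gain in percentage points, and gain relative to the old score.
absolute_gain = new_score - old_score
relative_gain = absolute_gain / old_score * 100

print(f"{absolute_gain:.1f} points absolute, {relative_gain:.1f}% relative")
# prints "8.5 points absolute, 12.9% relative"
```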

The dynamic workflows in Claude Code will break a massive engineering task into many smaller jobs, run them through tens to hundreds of parallel subagents, and check the results before handing anything back.

A normal coding agent works like one developer reading, editing, and testing in sequence, but dynamic workflows behave more like a temporary engineering team coordinated by Claude.

Claude first writes an orchestration plan, which is basically a task map that says what needs to be inspected, rewritten, tested, reviewed, or challenged.

Separate subagents then work on different parts of the repo at the same time, so one agent might inspect authentication code, another might port files, another might search for unsafe patterns, and another might try to break the proposed fix.

The major change is verification, because Claude does not just collect answers from subagents, but compares them, refutes weak findings, runs checks, and keeps iterating until the results converge.
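
The plan, fan-out, and verify loop described above can be sketched roughly as follows. This is a toy illustration, not how Claude Code implements dynamic workflows: Task, Finding, run_subagent, and orchestrate are all hypothetical names, and the "subagent" is a stub that fails its check once so the retry path is visible.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """One entry in the (hypothetical) orchestration plan."""
    name: str
    kind: str  # e.g. "inspect", "port", "search", "challenge"


@dataclass
class Finding:
    """What a subagent hands back, plus whether it survived verification."""
    task: str
    claim: str
    passed_check: bool


async def run_subagent(task: Task, attempt: int) -> Finding:
    """Stub standing in for a real subagent call (assumption, not a real API)."""
    await asyncio.sleep(0)  # yield control so the tasks interleave
    # Toy rule: "challenge" work fails its check on the first attempt only.
    ok = task.kind != "challenge" or attempt > 0
    return Finding(task.name, f"{task.kind} (attempt {attempt})", ok)


async def orchestrate(plan: list[Task], max_rounds: int = 3):
    """Run pending tasks in parallel, keep verified findings, retry the rest."""
    accepted: dict[str, Finding] = {}
    pending = list(plan)
    for attempt in range(max_rounds):
        if not pending:
            break  # every task has a verified finding: results converged
        results = await asyncio.gather(
            *(run_subagent(task, attempt) for task in pending)
        )
        retry = []
        for task, finding in zip(pending, results):
            if finding.passed_check:
                accepted[task.name] = finding
            else:
                retry.append(task)  # weak finding: send it back for another round
        pending = retry
    return accepted, [task.name for task in pending]


plan = [
    Task("inspect-auth", "inspect"),
    Task("port-files", "port"),
    Task("find-unsafe", "search"),
    Task("break-fix", "challenge"),
]
accepted, unresolved = asyncio.run(orchestrate(plan))
print(sorted(accepted), unresolved)
```

In this toy run three tasks are accepted in the first round and the fourth is accepted on the retry, so nothing is left unresolved. A real orchestrator would also compare findings across subagents rather than checking each one in isolation.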

Claude: Introducing Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer ...

Rohan Paul @rohanpaul_ai · X
