Artificial Analysis (@ArtificialAnlys)

2026-05-29 00:57

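AI Summary

Anthropic has officially released the Claude Opus 4.8 model. On Artificial Analysis's GDPval-AA benchmark, which focuses on agentic real-world work tasks, the model scored 1890 with the 'max' effort setting. That is 137 points higher than its predecessor, Opus 4.7, and 121 points ahead of the next-best model, GPT-5.5 xhigh. Head-to-head, this implies a win rate of roughly 67% for Opus 4.8 against GPT-5.5 xhigh. Anthropic gave Artificial Analysis early access ahead of the public release so the model could be benchmarked.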

Anthropic just launched Claude Opus 4.8, and it is the new leader on our GDPval-AA benchmark for agentic real-world work tasks.

Opus 4.8 scored 1890 on GDPval-AA at launch with its 'max' effort setting, +137 points from Opus 4.7 and +121 points ahead of the next-best model, GPT-5.5 xhigh.

Compared head-to-head on the GDPval task set, this implies a ~67% win rate against GPT-5.5 xhigh.
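For readers wondering how a 121-point gap turns into a ~67% win rate, here is a minimal sketch. It assumes GDPval-AA scores behave like Elo-style ratings on the conventional 400-point logistic scale; the post does not state the formula, so treat the function name and the scale as illustrative assumptions rather than Artificial Analysis's actual method. The GPT-5.5 xhigh and Opus 4.7 scores are derived from the gaps quoted above.

```python
def elo_win_probability(rating_gap: float, scale: float = 400.0) -> float:
    """Expected win rate of the higher-rated side under a standard Elo logistic curve.

    Assumption: GDPval-AA points are Elo-like with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))


# Figures quoted in the post.
opus_48 = 1890
gap_to_gpt55_xhigh = 121
gap_to_opus_47 = 137

# Scores implied by the quoted gaps (derived here, not quoted directly).
gpt55_xhigh = opus_48 - gap_to_gpt55_xhigh  # 1769
opus_47 = opus_48 - gap_to_opus_47          # 1753

win_rate = elo_win_probability(gap_to_gpt55_xhigh)
print(f"Implied win rate vs GPT-5.5 xhigh: {win_rate:.1%}")  # prints 66.7%
```

Under that assumption the result lands at about 66.7%, which is consistent with the ~67% figure in the post.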

@AnthropicAI shared access with us ahead of the public release to benchmark this model and we're glad to see our benchmarks referenced in today's launch.

The rest of the Artificial Analysis Intelligence Index is in progress - we'll share final results soon!

Agents · Anthropic · Model release · Evaluation/Benchmarks
