# Opus 4.8 shows a solid improvement over Opus 4.7 on DeepSWE while lowering average cost per task

- Source: Chubby♨️ (@kimmonismus)
- Published: 2026-05-31 10:11
- AIHOT score: 59
- AIHOT link: https://aihot.virxact.com/items/cmpt5go7607edsluzpszitb9a
- Original link: https://x.com/kimmonismus/status/2060906958610120785

## AI Summary

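Anthropic's Opus 4.8 scores notably higher than Opus 4.7 on the DeepSWE benchmark while lowering the average cost per task. Specifically, on its default high thinking effort setting, it scores 6% higher than Opus 4.7 at xhigh. However, GPT-5.5 xhigh still leads on this benchmark by a clear margin and at lower cost. The tweet's author is impressed by OpenAI's recent model releases and looks forward to GPT-5.6, while also warming to Opus 4.8, and sees the present as a moment in which both frontier labs keep shipping genuinely impressive models.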

## Body

Opus 4.8 is a solid jump over Opus 4.7 on DeepSWE, while also lowering the average cost per task.

However, GPT-5.5 xhigh still beats it by a pretty clear margin while being cheaper.

OpenAI has been cooking insanely hard with its models lately. Really excited to see what GPT-5.6 brings.

That said, I have to admit: I'm starting to really like Opus 4.8 as well.

We've entered a moment where both frontier labs keep shipping genuinely impressive models.

### Quoted tweet

> Datacurve: Opus 4.8 is now on DeepSWE. On the default high thinking effort, it scores 6% higher than Opus 4.7 xhigh, while also lowering average cost per task.