Opus 4.7 benchmarks colored by ranking. – Strong coding (SW · AI HOT

Deedy (@deedydas)

2026-04-16 22:55 · 77 days ago

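AI summary

Opus 4.7 benchmarks, colored by ranking. – Large gain in coding (SWE-Bench) – Large gain in computer use – Large gain in visual reasoning (CharXiv) – Small gain on Terminal Bench – Regression on BrowseComp. Slots in between 4.6 and Mythos. [Chart generated by 4.7] https://t.co/h7iXLx3xlY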

Opus 4.7 benchmarks colored by ranking.

Strong coding (SWE-Bench) bump
Strong computer use bump
Strong visual reasoning (CharXiv) bump
Weak Terminal Bench bump
BrowseComp regression

Slots in between 4.6 and Mythos.

[Chart generated by 4.7]

Tags: Agents, Anthropic, Reasoning, Coding
