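AI summary
Opus 4.7 benchmarks, colored by ranking: a strong coding (SWE-Bench) bump, a strong computer use bump, a strong visual reasoning (CharXiv) bump, a weak Terminal Bench bump, and a BrowseComp regression. Slots in between 4.6 and Mythos. [Chart generated by 4.7] https://t.co/h7iXLx3xlY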
Opus 4.7 benchmarks colored by ranking.
- Strong coding (SWE-Bench) bump
- Strong computer use bump
- Strong visual reasoning (CharXiv) bump
- Weak Terminal Bench bump
- BrowseComp regression
Slots in between 4.6 and Mythos.
[Chart generated by 4.7]