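Anthropic has released Claude Sonnet 5. On AA-Briefcase (an agentic knowledge work benchmark that tests how models work through thousands of files and produce tables, presentations, and UI prototypes), Sonnet 5 (max) scores 1391 Elo, 312 points above Sonnet 4.6 (max), placing second behind only Fable 5. The gain comes from rubric scoring and analytical quality; on presentation it still trails Opus 4.8. The max setting scores highest, but the lower settings do not sit on the cost-performance Pareto frontier: at low effort levels, Opus 4.8 (max), GLM-5.2 (max), and MiniMax-M3 offer better value for money. Sonnet 5 is comparatively expensive because its turn count rises sharply: max averages 183 turns per task (more than 4x Sonnet 4.6 max), medium averages 55 turns, and cost per task spans roughly 17x across settings.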
Claude Sonnet 5 ranks second only to Fable 5 on AA-Briefcase, our new agentic knowledge work benchmark, with a ~17x cost per task range across its five effort settings
@AnthropicAI has released Claude Sonnet 5, the latest addition to the Claude Sonnet family. On AA-Briefcase, Claude Sonnet 5 (max) scores 1391 Elo, a +312 point improvement over Claude Sonnet 4.6 (max), making it the second-highest-scoring model behind Claude Fable 5. This gain is driven primarily by improvements in rubric scoring and analytical quality, though Sonnet 5 still trails Claude Opus 4.8 on Presentation Elo.
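As a rough way to read the +312 point gap, the sketch below converts an Elo difference into an expected head-to-head preference rate. It assumes the conventional logistic Elo model with a 400-point scale; the post does not say how AA-Briefcase Elo is computed, so the function name `elo_expected_score`, the `scale` parameter, and the resulting rate are illustrative assumptions. Only the 1391 score and the 312 point gain come from the post.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the conventional logistic Elo model.

    Assumption: a 400-point scale. AA-Briefcase may use a different scale or
    fitting procedure, in which case this mapping would not apply.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


sonnet_5_max = 1391          # from the post
gain_over_sonnet_4_6 = 312   # from the post

# Implied by the two figures above, not reported directly in the post.
sonnet_4_6_max = sonnet_5_max - gain_over_sonnet_4_6

p = elo_expected_score(sonnet_5_max, sonnet_4_6_max)
print(f"Implied Sonnet 4.6 (max) Elo: {sonnet_4_6_max}")
print(f"Expected preference rate for Sonnet 5 (max) under this model: {p:.1%}")
```

Under these assumptions, a 312 point gap corresponds to Sonnet 5 (max) being preferred in well over four out of five pairwise comparisons against Sonnet 4.6 (max); a different Elo scale would change that figure.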