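Anthropic has released Claude Sonnet 5, billed as "the most agentic Sonnet model." On coding, it scores 63.2% on SWE-bench Pro (Sonnet 4.6: 58.1%; Opus 4.8: 69.2%), and on knowledge work it slightly exceeds Opus 4.8. Promotional pricing is $2 per million input tokens and $10 per million output tokens through August 26, rising to $3/$15 afterward. The upgrade is not uniform across all skills, though: on CyberGym (a vulnerability discovery and exploitation test) it is weaker than Sonnet 4.6. Anthropic explicitly states that it did not train the model specifically for cyber tasks, so that performance comes from general reasoning rather than targeted optimization.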
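To make the pricing change concrete, here is a rough sketch of what the same workload would cost at the promotional and post-promotional rates. The per-million-token prices are the ones quoted above; the workload size (40M input tokens, 5M output tokens) and the function name `cost_usd` are made-up assumptions for illustration only.

```python
from decimal import Decimal

# USD per one million tokens, as quoted in the post.
PROMO = {"input": Decimal("2"), "output": Decimal("10")}      # through August 26
STANDARD = {"input": Decimal("3"), "output": Decimal("15")}   # afterward

MILLION = Decimal(1_000_000)


def cost_usd(input_tokens: int, output_tokens: int, prices: dict) -> Decimal:
    """Cost in USD for a given token volume at per-million-token prices."""
    total = (Decimal(input_tokens) * prices["input"]
             + Decimal(output_tokens) * prices["output"])
    return total / MILLION


# Hypothetical monthly workload (assumed numbers, not from the announcement).
INPUT_TOKENS = 40_000_000
OUTPUT_TOKENS = 5_000_000

promo = cost_usd(INPUT_TOKENS, OUTPUT_TOKENS, PROMO)
standard = cost_usd(INPUT_TOKENS, OUTPUT_TOKENS, STANDARD)

print(f"promotional: ${promo}")
print(f"standard:    ${standard}")
print(f"increase:    {standard / promo}x")
```

Because both the input and output prices rise by the same ratio ($2 to $3, $10 to $15), any input/output mix costs 1.5x more after the promotion ends.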
Claude Sonnet 5's upgrades are not uniform across every skill. For example, it's weaker than Sonnet 4.6 on CyberGym 🤔
CyberGym tests vulnerability discovery and exploit-finding behavior, not general reasoning or everyday coding.
Anthropic also explicitly said in its announcement blog post that Sonnet 5 was not deliberately trained for cyber tasks, so its cyber ability likely comes from general intelligence rather than targeted optimization.
So Sonnet 5's performance on CyberGym comes from general reasoning rather than specialized exploit skill.