seeing a lot of people saying that Opus 4.7 is a net regression vs 4.6, but it seems quite anecdotal.
offline and online evals point towards a clean step up.
what's not being captured? "personality"?