swyx 🇸🇬@swyx · X

2026-05-05 04:39 · 59 days ago

seeing a lot of people saying that Opus 4.7 is a net regression vs 4.6, but it seems quite anecdotal.

offline and online evals point towards a clean step up.

what's not being captured? "personality"?
