Opus 4.7 consumes approximately 1.3 times as many tokens as before. Instructions must be very precise. Many are complaining about a "rushed release." On the Bullshit Benchmark, it performs worse than Opus 4.6. Reactions are very mixed.
Anthropic may have done OpenAI a big favor with this. Spud is expected next week, and if the release is done right, it could overshadow Opus and catapult ChatGPT back to the top.
h/t @petergostev for the benchmark and image