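AI summary
Anthropic's Claude Opus is slipping. The latest benchmarks show its accuracy fell from 83.3% to 68.3% in just a few days, which amounts to a sharp rise in its hallucination rate during coding. Grok 4.20 still holds first place and has not been overtaken. https://t.co/FA5nbKKeS0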
Anthropic's Claude Opus is FALLING.
Latest benchmarks show its accuracy dropped from 83.3% → 68.3% in just days.
That's a major spike in hallucinations during coding.
Grok 4.20 still holds the #1 spot. Undefeated.