Claude Fable 5 beats GPT-5.5 by 13 percentage points on FrontierMath's hardest problems
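Source: the-decoder.com. Anthropic's Claude Fable 5 reaches 88% accuracy on FrontierMath's hardest tier, well ahead of OpenAI's GPT-5.5 (about 75%), a lead of 13 percentage points. Compared with Opus 4.5's score of under 10% in early 2026, that is a huge leap. Progress in AI mathematical reasoning keeps accelerating.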
Claude Fable 5 outpaces GPT-5.5 by 13 points on FrontierMath's toughest problems
Anthropic's new model, Claude Fable 5, posts top scores on the FrontierMath benchmark. According to Epoch AI, Fable 5 hits 87 percent accuracy on tiers 1 through 3 and 88 percent on the hardest tier 4 (v2).
Anthropic's models are getting dramatically better at math in a short span of time. As recently as early 2026, predecessor model Opus 4.5 scored below 10 percent on tier 4. OpenAI's GPT-5.5 reaches about 75 percent on the same tier, well behind Fable 5, although GPT-5.6 is already in the making.
All models were tested on Epoch AI's standard scaffold with maximum reasoning effort. FrontierMath is widely considered one of the toughest benchmarks for AI math reasoning. These math gains aren't limited to benchmarks: real-world examples keep stacking up. Most recently, an OpenAI model solved a longstanding Erdős problem; so did Claude Mythos.