Claude Fable 5 outpaces GPT-5.5 by 13 percentage points on FrontierMath's toughest problems

2026-06-13 18:16 · 19 days ago · Matthias Bastian

AI summary

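Anthropic's Claude Fable 5 reaches 88 percent accuracy on FrontierMath's hardest tier, well ahead of OpenAI's GPT-5.5 (about 75 percent), a lead of 13 percentage points. Compared with Opus 4.5, which scored below 10 percent in early 2026, this is a major leap. Progress in AI math reasoning continues to accelerate.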

Original text

Claude Fable 5 outpaces GPT-5.5 by 13 points on FrontierMath's toughest problems

Anthropic's new model, Claude Fable 5, posts top scores on the FrontierMath benchmark. According to Epoch AI, Fable 5 hits 87 percent accuracy on tiers 1 through 3 and 88 percent on the hardest tier 4 (v2).

Anthropic's models are getting dramatically better at math in a short span of time. As recently as early 2026, predecessor model Opus 4.5 scored below 10 percent on tier 4. OpenAI's GPT-5.5 reaches about 75 percent on the same tier, well behind Fable 5, although GPT-5.6 is already in the making.

All models were tested on Epoch AI's standard scaffold with maximum reasoning effort. FrontierMath is widely considered one of the toughest benchmarks for AI math reasoning. These math gains aren't limited to benchmarks; real-world examples keep stacking up. Most recently, an OpenAI model solved a longstanding Erdős problem; so did Claude Mythos.

Source: The Decoder, AI News (RSS)
