Epoch AI (@EpochAIResearch)

2026-06-13 01:35

FrontierMath: Tiers 1-4 (v2) is live.

We concluded an audit that addressed errors in 42% of problems. Rankings are similar but scores are higher across the board. The current leaders are GPT-5.5 (xhigh) with 85% on Tiers 1-3 and Google's AI co-mathematician with 76% on Tier 4.