FrontierMath: Tiers 1-4 (v2) is live.
We completed an audit that addressed errors in 42% of problems. Rankings are similar, but scores are higher across the board. The current leaders are GPT-5.5 (xhigh) with 85% on Tiers 1-3 and Google's AI co-mathematician with 76% on Tier 4.