# FrontierMath Tier 4 Showdown: Manual Evaluation of Three Compute-Intensive Models

- Source: Epoch AI (@EpochAIResearch)
- Published: 2025-10-11 00:26
- AIHOT link: https://aihot.virxact.com/items/cmnw1yqib00rqslc3789ea080
- Original post: https://x.com/EpochAIResearch/status/1976685685349441826

## AI Summary

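On FrontierMath Tier 4, an extremely hard math benchmark, GPT-5 Pro set a new record with 13% accuracy. It edged out Gemini 2.5 Deep Think by a single problem, a margin that is not statistically significant, while Grok 4 Heavy trailed clearly behind.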

## Full Text

We manually evaluated three compute-intensive model settings on our extremely hard math benchmark. FrontierMath Tier 4: Battle Royale!

GPT-5 Pro set a new record (13%), edging out Gemini 2.5 Deep Think by a single problem (not statistically significant). Grok 4 Heavy lags. 🧵