# GPT-5.5 Ranks First on the Agents' Last Exam Benchmark; All Agents Score 0% on the Hardest Tasks

- Source: Noam Brown (@polynoamial)
- Published: 2026-06-12 01:35
- AIHOT score: 63
- AIHOT link: https://aihot.virxact.com/items/cmq9stb080ey8slldkreqys5f
- Original link: https://x.com/polynoamial/status/2065125807585149136

## AI Summary

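OpenAI researcher Noam Brown said that GPT-5.5 ranks first on the Agents' Last Exam (ALE) benchmark, and that it also performs best when results are measured against model tokens, cost, or wall-clock time. ALE, created by the @dawnsongtweets team, is a rolling benchmark of more than 1,500 expert tasks spanning 55 occupations; it tests whether AI agents can carry out work with real economic value. The evaluated systems include frontier agents such as GPT-5.5, Fable 5, and Composer 2.5. The results show that current agents can solve some professional tasks, but on the hardest tier, which demands sustained reasoning and deep domain expertise, every frontier agent tested (including Fable 5) had a 0% success rate.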

## Post

I'm happy GPT-5.5 tops this eval

I'm even happier it's still doing the best when measured vs tokens, cost, or wall-clock time!

### Quoted Tweet

> Dawn Song: Everyone says the latest AI agents will be "job-ready" soon, especially after the release of Fable 5 this week. But is that really the case? Over the past many ...