# Ethan Mollick criticizes the Intelligence Index v4.1 benchmark update

- Source: Ethan Mollick (@emollick)
- Published: 2026-06-17 06:21
- AIHOT score: 29
- AIHOT link: https://aihot.virxact.com/items/cmqh80o8j01uvsle16axqhybz
- Original post: https://x.com/emollick/status/2067009697207427534

## AI Summary

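The new GDPval-AA v2 is now the highest-weighted evaluation in Intelligence Index v4.1. The upgrade re-baselines Elo so that human performance sits at 1000, introduces a rotating panel of frontier-model judges, and raises the turn limit from 100 to 250. Claude Fable 5 (with fallback) leads at 1818 but is not currently available; Claude Opus 4.8 scores 1638 and GPT-5.5 (xhigh) scores 1531. Ethan Mollick's criticism: having AIs judge other AIs' work on publicly available questions drawn from a different, closed benchmark says little, and it is unclear how the human Elo is established. In his view it was not a good benchmark before the update and is not one now.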
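To give a rough sense of what the reported ratings would mean relative to the 1000-point human anchor, here is a minimal sketch. It assumes the standard logistic Elo formula with a 400-point scale; the source does not say how GDPval-AA actually fits its ratings, and the function names are hypothetical. It also shows that "re-baselining" under this assumption is just a uniform shift, which changes the labels but not the predicted head-to-head outcomes. It does not address Mollick's point about how the human rating itself is established.

```python
# Hypothetical sketch: assumes standard logistic Elo (400-point scale).
# GDPval-AA's actual rating procedure is not described in the source.


def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def rebaseline(ratings: dict[str, float], anchor: str, target: float = 1000.0) -> dict[str, float]:
    """Shift every rating by the same offset so `anchor` lands on `target`.

    Elo expectations depend only on rating differences, so a uniform shift
    leaves every pairwise expected score unchanged.
    """
    offset = target - ratings[anchor]
    return {name: r + offset for name, r in ratings.items()}


if __name__ == "__main__":
    # Ratings as reported in the summary; "human" is the 1000-point anchor.
    reported = {
        "human": 1000.0,
        "Claude Fable 5": 1818.0,
        "Claude Opus 4.8": 1638.0,
        "GPT-5.5 (xhigh)": 1531.0,
    }
    for name, rating in reported.items():
        if name == "human":
            continue
        p = expected_score(rating, reported["human"])
        print(f"{name}: expected score vs human = {p:.3f}")
```

Under this assumption, an 800-point gap implies the model is expected to be preferred over the human baseline in roughly 99% of judged comparisons, which is why the choice of anchor and judges matters so much for interpreting the numbers.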

## Post

This was not a good benchmark before it was updated and it is not a good benchmark now. Having AIs evaluate the work of other AIs on publicly available questions from a different closed benchmark doesn't tell you very much.

And it is unclear how they establish the human ELO.

### Quoted Post

> Artificial Analysis: GDPval-AA v2 is the highest weighted evaluation in the Intelligence Index v4.1. The upgrade re-baselines ELO to human performance at 1000, introduces a rotating...
