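The Artificial Analysis index is suitable for rough comparisons between models, but not for trend analysis. One analysis takes the current index scores and OpenAI's release cadence, halves the score gain of each update to extrapolate conservatively, and predicts that GPT's index score could reach 90 around 2029. That would mean the model's average performance across diverse frontier benchmarks such as CritPt, HLE, and SciCode approaches PhD level. The forecast already discounts the current rate of progress substantially; if agents, test-time compute, or AI-assisted research accelerate, the target could be reached sooner, which would make AGI in the late 2020s a baseline expectation.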
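The arithmetic behind that kind of extrapolation can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the reading of "halving" as a constant discount applied to every future update, and all input values are hypothetical and are not the figures used in the analysis, so the printed year is not expected to match the 2029 estimate.

```python
import math


def updates_to_reach(current: float, target: float,
                     observed_gain: float, discount: float = 0.5) -> int:
    """Number of releases needed to reach `target`, assuming every future
    release adds `observed_gain * discount` index points (hypothetical model)."""
    if observed_gain <= 0 or discount <= 0:
        raise ValueError("observed_gain and discount must be positive")
    step = observed_gain * discount  # the halved per-update gain
    return max(0, math.ceil((target - current) / step))


def projected_year(start_year: float, updates: int,
                   releases_per_year: float) -> float:
    """Convert a release count into a calendar year at a fixed release cadence."""
    if releases_per_year <= 0:
        raise ValueError("releases_per_year must be positive")
    return start_year + updates / releases_per_year


# Placeholder inputs, NOT taken from the analysis being discussed.
n = updates_to_reach(current=50.0, target=90.0, observed_gain=8.0)
print(n, projected_year(start_year=2025, updates=n, releases_per_year=2))
```

The result depends entirely on the assumed gain per update and the release cadence, and it treats index points as a uniform unit, which is a strong assumption for a normalized composite score.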
The Artificial Analysis index is a normalized score over several benchmarks, and its composition has changed over time. It is fine for roughly comparing models, but it is not useful for trend analysis, and it is unclear what individual point differences in the scores mean.