Artificial Analysis (@ArtificialAnlys)

2026-05-16 02:46 · 48 days ago

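AI Summary

AI is progressing rapidly on economically valuable tasks: based on GDPval-AA Elo scores, GPT-5.5 is expected to win about 98% of head-to-head comparisons on real-world work outputs against Claude 4 Sonnet, the leading GDPval-AA model a year ago. GDPval-AA measures how well models complete tasks across 44 occupations in nine industries. The graphic compares slide outputs for an inventory management task from Claude 4 Sonnet (May 2025) and GPT-5.5 (xhigh, May 2026).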

AI is making rapid progress in economically valuable tasks: based on their GDPval-AA Elo scores, GPT-5.5 is expected to win ~98% of head-to-head comparisons on realistic work outputs against Claude 4 Sonnet, the leading model in GDPval-AA a year ago.

GDPval-AA measures how well models complete tasks across nine industries and 44 occupations. The graphic shows slide outputs for an Inventory Management task from Claude 4 Sonnet (May 2025) against GPT-5.5 (xhigh, May 2026).
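
For readers wondering how an Elo gap turns into a figure like ~98%, here is a minimal sketch. It assumes the conventional logistic Elo formula with a 400-point scale; the post does not state the underlying Elo scores or whether GDPval-AA uses exactly this formulation, so the function names, the `ELO_SCALE` constant, and the resulting gap are illustrative only.

```python
import math

# Assumption: the conventional Elo scale of 400 points per factor of 10 in odds.
# GDPval-AA's actual rating procedure may differ.
ELO_SCALE = 400.0


def expected_win_rate(elo_gap: float) -> float:
    """Expected win rate of the higher-rated model for a given Elo gap."""
    return 1.0 / (1.0 + 10.0 ** (-elo_gap / ELO_SCALE))


def elo_gap_for_win_rate(p: float) -> float:
    """Inverse of expected_win_rate: the Elo gap implied by win rate p."""
    if not 0.0 < p < 1.0:
        raise ValueError("win rate must be strictly between 0 and 1")
    return ELO_SCALE * math.log10(p / (1.0 - p))


gap = elo_gap_for_win_rate(0.98)
print(round(gap))                         # 676
print(round(expected_win_rate(gap), 2))   # 0.98
```

Under these assumptions, a ~98% expected win rate corresponds to a gap of roughly 676 Elo points between the two models.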

OpenAI · Reasoning · Evaluation/Benchmarks
