AI is making rapid progress on economically valuable tasks: based on their GDPval-AA Elo scores, GPT-5.5 is expected to win ~98% of head-to-head comparisons on realistic work outputs against Claude 4 Sonnet, the leading model in GDPval-AA a year ago.
GDPval-AA measures how well models complete tasks across nine industries and 44 occupations. The graphic shows slide outputs for an Inventory Management task from Claude 4 Sonnet (May 2025) against GPT-5.5 (xhigh, May 2026).
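
To make the ~98% figure concrete, here is a minimal sketch of how an Elo gap maps to an expected win rate. It assumes the conventional logistic Elo curve with a 400-point scale; whether GDPval-AA uses exactly this scale is an assumption, and the function names are illustrative. No actual GDPval-AA scores are used, only the 98% win rate quoted above.

```python
import math


def elo_win_probability(rating_gap: float, scale: float = 400.0) -> float:
    """Expected win rate of the higher-rated model for a given Elo gap.

    Assumes the standard logistic Elo curve (400-point scale); the scale
    GDPval-AA actually uses is not stated in the post.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))


def elo_gap_for_win_probability(p: float, scale: float = 400.0) -> float:
    """Inverse of the curve above: the Elo gap implied by win rate p."""
    return scale * math.log10(p / (1.0 - p))


# Elo gap implied by the ~98% expected win rate quoted in the post.
implied_gap = elo_gap_for_win_probability(0.98)
print(f"Implied Elo gap: {implied_gap:.0f}")
print(f"Win rate at that gap: {elo_win_probability(implied_gap):.2f}")
```

Under these assumptions, a 98% expected win rate corresponds to a gap of roughly 676 Elo points between the two models.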