Artificial Analysis (@ArtificialAnlys)

2026-05-01 02:57 (63 days ago)

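AI summary

In the frontier science eval CritPt, GPT-5.5 Pro (xhigh) scored 30.5%, a gain of 0.5 percentage points over its predecessor GPT-5.4 Pro (xhigh), at 60% lower cost and token use. CritPt consists of graduate-level physics problems contributed by 60+ researchers from 30+ institutions worldwide. Since its release in November 2025, the top score has risen from 9% (Gemini 3 Pro Preview) to 30% (GPT-5.4 Pro). OpenAI states that GPT-5.5 Pro "uses more compute to think harder and provide consistently better answers" than GPT-5.5. The model is priced identically per token but completed the evaluation with fewer tokens.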

GPT-5.5 Pro achieves a small bump on GPT-5.4 Pro with 60% lower cost and token use in our frontier science eval, CritPt

CritPt tests models on graduate-level physics research problems contributed by 60+ researchers from 30+ institutions globally. When CritPt was released in November 2025, the highest score was 9% (Gemini 3 Pro Preview). About 4 months later, GPT-5.4 Pro (xhigh) more than tripled this score, reaching 30%.

Now, GPT-5.5 Pro (xhigh) has surpassed this result by half a percentage point at 60% lower cost. The model is priced identically per token, but used fewer tokens to complete the evaluation.
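A quick sketch of the arithmetic behind that claim: if cost is simply tokens times a per-token price, and the price is the same for both models, then a 60% lower cost corresponds to 60% fewer tokens. The price and token counts below are hypothetical values chosen only for illustration; the post does not report them, and real billing may distinguish input and output tokens.

```python
def eval_cost(tokens: int, price_per_million: float) -> float:
    """Total cost for a token count at a flat per-million-token price."""
    return tokens / 1_000_000 * price_per_million


# Hypothetical numbers for illustration only (not taken from the post).
PRICE_PER_MILLION = 100.0   # assumed identical for both models
OLD_TOKENS = 50_000_000     # assumed token use of the earlier model
NEW_TOKENS = 20_000_000     # assumed token use of the newer model (60% fewer)

old_cost = eval_cost(OLD_TOKENS, PRICE_PER_MILLION)
new_cost = eval_cost(NEW_TOKENS, PRICE_PER_MILLION)

# With a flat, identical price, the two reductions are the same fraction.
cost_reduction = 1 - new_cost / old_cost
token_reduction = 1 - NEW_TOKENS / OLD_TOKENS

# Score change reported in the post: 30% to 30.5%.
score_gain_points = 30.5 - 30.0

print(f"cost reduction:  {cost_reduction:.0%}")
print(f"token reduction: {token_reduction:.0%}")
print(f"score gain:      {score_gain_points} percentage points")
```

Under this simple model the price cancels out of the ratio, which is why the cost saving can be attributed entirely to lower token use.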

According to OpenAI, GPT-5.5 Pro "uses more compute to think harder and provide consistently better answers" than GPT-5.5.

Congratulations @OpenAI and @sama on this result

Tags: OpenAI, reasoning evals/benchmarks
