在名为CritPt的尖端科学评估中,GPT-5.5 Pro (xhigh) 以比前代GPT-5.4 Pro (xhigh) 低60%的成本和令牌使用量,实现了0.5个百分点的性能提升,将得分推至30.5%。CritPt评估包含全球30多家机构的60多名研究人员贡献的研究生级别物理问题。自2025年11月发布以来,最高分从Gemini 3 Pro Preview的9%跃升至GPT-5.4 Pro的30%。OpenAI指出,GPT-5.5 Pro相比GPT-5.5“使用了更多计算资源进行深度思考,以提供更稳定的优质答案”。该模型每令牌定价相同,但通过使用更少的令牌完成了评估。
GPT-5.5 Pro achieves a small bump on GPT-5.4 Pro with 60% lower cost and token use in our frontier science eval, CritPt
CritPt tests models on graduate-level physics research problems contributed by 60+ researchers from 30+ institutions globally. When CritPt was released in November 2025, the highest score was 9% (Gemini 3 Pro Preview). ~4 months later, GPT-5.4 Pro (xhigh) tripled this score with 30%.
Now, GPT-5.5 Pro (xhigh) has surpassed this result by half a percentage point at 60% lower cost. The model is priced identically per token, but used fewer tokens to complete the evaluation.