Artificial Analysis @ArtificialAnlys

2026-06-18 03:32 · 15 days ago

AI summary

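Zhipu released GLM-5.2 (max reasoning effort), which scores 20.9% on the CritPt benchmark (unpublished research-level physics problems), tying Claude Opus 4.8 and far ahead of other open-weights models. DeepSeek V4 Pro scores only 12.9%; GLM-5.2 also beats proprietary models including GPT-5.5, Gemini 3.1 Pro, and Claude Opus 4.7. Only GPT-5.5 Pro leads, at 30.6%. Compared with GLM-5.1's 4.6% ten weeks ago, this is a 4.5× generational improvement.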

A standout number in Z.ai's GLM-5.2 launch is CritPt, a benchmark of unpublished research-level physics problems, where it ties with Claude Opus 4.8 and is well above other open-weights models

Key takeaways:

➤ @Zai_org's GLM-5.2 (max reasoning effort) leads open weights by a wide margin: the next open model, DeepSeek V4 Pro, scores 12.9%

➤ GLM-5.2 matches Claude Opus 4.8 (20.9%) and beats several proprietary models, including GPT-5.5, Gemini 3.1 Pro, and Claude Opus 4.7

➤ Only proprietary models score higher, with GPT-5.5 Pro topping the benchmark at 30.6%

➤ A 4.5× generational jump: GLM-5.1 scored just 4.6% on CritPt ten weeks ago

Open-source ecosystem · Reasoning · Evaluation/Benchmarks
