Artificial Analysis (@ArtificialAnlys)

2026-06-23 02:13

AI summary

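Zhipu AI's GLM-5.2 scores 1524 Elo on GDPval-AA, a benchmark of real-world agentic work, ranking third behind only Claude Fable 5 and Claude Opus 4.8 and level with GPT-5.5. It is the leading open weights model and beats proprietary models including Gemini 3.5 Flash and Qwen 3.7 Max. The tasks are agentic, averaging about 31 turns per task. GLM-5.2 also leads open weights models on the Artificial Analysis Intelligence Index and ranks third on both the Agentic Index and AA-Briefcase.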

GLM-5.2 leads open weights models and sits at #3 overall on GDPval-AA, a real-world agentic work benchmark

GLM-5.2 from @Zai_org scores 1524 Elo on GDPval-AA, which measures performance on real-world, economically valuable knowledge work through long-horizon, multi-turn tasks.

Key takeaways:

➤ #3 overall, behind only Claude Fable 5 (1783) and Claude Opus 4.8 (1615), and level with GPT-5.5 (xhigh, 1509)

➤ The leading open weights model by a wide margin: the next open model, MiniMax-M3, scores 1408

➤ Ahead of many proprietary models, including Google's Gemini 3.5 Flash (1357), Qwen 3.7 Max (1289), Muse Spark (1158)

➤ The tasks are agentic. GLM-5.2 averaged ~31 turns per task across 1,999 matches

➤ Consistent with the rest of its launch, GLM-5.2 also leads open weights on the Artificial Analysis Intelligence Index, ranks #3 on the Agentic Index, and #3 on AA-Briefcase
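
For a rough sense of what the Elo gaps quoted above mean, here is a minimal sketch that converts a rating difference into an expected head-to-head win rate. It assumes the classic logistic Elo model with a 400-point scale; the post does not say how GDPval-AA fits its ratings, so the function name `expected_win_rate`, the `scale` parameter, and the resulting percentages are illustrative only, not Artificial Analysis's method.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the classic logistic Elo model.

    Assumption: standard Elo with a 400-point scale. GDPval-AA may use a
    different rating fit, so treat the output as a rough guide.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Elo scores as quoted in the post.
GLM_5_2 = 1524
others = {
    "Claude Fable 5": 1783,
    "Claude Opus 4.8": 1615,
    "GPT-5.5 (xhigh)": 1509,
    "MiniMax-M3": 1408,
    "Gemini 3.5 Flash": 1357,
    "Qwen 3.7 Max": 1289,
    "Muse Spark": 1158,
}

for name, rating in others.items():
    p = expected_win_rate(GLM_5_2, rating)
    print(f"GLM-5.2 vs {name}: {p:.0%} expected win rate")
```

Under that assumption, the 15-point gap to GPT-5.5 works out to about a 52% expected win rate, which fits "level with", while the 116-point gap to MiniMax-M3 works out to about 66%.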

