Chubby♨️@kimmonismus

2026-06-23 04:03 · 2 days ago

AI summary

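GLM-5.2 (max) scores 1524 Elo on GDPval-AA, a real-world agentic work benchmark, ranking third behind only Claude Fable 5 (1783) and Claude Opus 4.8 (1615) and roughly level with GPT-5.5 (xhigh, 1509). Averaging about 31 turns per task, the model produced deliverables such as a retail supervisor's task list and an emergency-stop circuit schematic. It leads open-weights models (the next, MiniMax-M3, scores only 1408) and beats closed models including Google Gemini 3.5 Flash (1357) and Qwen 3.7 Max (1289). GLM-5.2 also tops the open-source rankings on the Artificial Analysis Intelligence Index, Agentic Index, and AA-Briefcase.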

Absolutely incredible: GLM-5.2 (max) sits at #3 overall on GDPval-AA, a real-world agentic work benchmark, even ahead of GPT-5.5 (xhigh).

Oh and btw: looks like open source is no longer 7 months behind.

GDPval-AA is a benchmark built around real professional and creative tasks. The models had to produce practical deliverables from identical briefs, including a retail supervisor's task list, an emergency-stop circuit schematic, and a music video moodboard.

That's why we'll probably see a big leap with GPT-5.6. Even open-source competition is catching up insanely fast.

Artificial Analysis: GLM-5.2 leads open weights models and sits at #3 overall on GDPval-AA, a real-world agentic work benchmark. GLM-5.2 from @Zai_org scores 1524 Elo on GDPval-AA, w...

Agents · Open-source ecosystem · Evaluation/Benchmarks

