# Qwen3.7-Max Ranks Third on the ITBench-AA Enterprise IT Task Benchmark

- Source: Alibaba Cloud (@alibaba_cloud)
- Published: 2026-05-28 15:11
- AIHOT score: 59
- AIHOT link: https://aihot.virxact.com/items/cmpp6jszi0cc6slv4yka6tqwm
- Original post: https://x.com/alibaba_cloud/status/2059895223963275479

## AI Summary

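Artificial Analysis and IBM Research have jointly launched ITBench-AA, the first in a new series of benchmarks that evaluate how well models handle real-world enterprise IT tasks. Its initial focus is site reliability engineering (SRE) tasks. Qwen3.7-Max (Tongyi Qianwen) ranked third with a score of 42%. Every frontier model scored below 50%. Claude Opus 4.7 leads with 47%, followed closely by GPT-5.5 (xhigh) at 46%. Among open-source models, GLM-5.1 (Reasoning) leads with 40%. The benchmark will later expand to other task areas such as financial operations (FinOps).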

## Full Text

📢Qwen3.7-Max just hit #3 on ITBench-AA - a fresh benchmark testing how well models handle real-world enterprise IT tasks, agentic-style.

🔧Agentic era, go with Qwen.🏃🏃

### Quoted Post

> Artificial Analysis: Artificial Analysis and IBM Research are launching ITBench-AA, the first in a new series of benchmarks evaluating models on agentic enterprise IT tasks, startin...