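Alibaba Cloud has released its new flagship model, Qwen3.7 Max, positioned as a foundation model for the "agent era," with an emphasis on execution in practical tasks such as end-to-end coding and office automation. In one kernel optimization task, the model ran autonomously for 35 hours without human intervention and completed more than 1,000 tool calls. This is not across-the-board self-evolution, though; it is iterative improvement against a specific optimization target. More noteworthy is Qwen's claim that its agentic capabilities generalize from diverse training environments, just as language capabilities generalize from text. If that view holds, its significance would far exceed any benchmark score.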
Alibaba released Qwen3.7 Max. The benchmarks are incredible.
Their new model ran autonomously for 35 hours, made 1,158 tool calls, and achieved a 10x speedup, all on a single attention kernel.
This isn't "AI improving itself across the board." It's a model grinding through compile-profile-rewrite loops on one well-defined optimization target.
Impressive? Absolutely. The kind of self-improvement people will imagine when they see the headline? Not yet.
The actually interesting claim is buried deeper: Qwen says agentic capabilities generalize from diverse training environments the same way language capabilities generalize from diverse text. If that holds, it's a bigger deal than any benchmark number.