Qwen3.7-Plus: Alibaba turns multimodal AI into a fully autonomous agent

2026-06-06 14:54 · Jonathan Kemper

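AI Summary

Alibaba's Qwen team has released Qwen3.7-Plus, a multimodal agent model that combines visual perception, GUI operation, and coding in a single agent loop. In a demo, an agent built on the model autonomously developed a vocabulary-learning app, producing more than 10,000 lines of code across more than 1,000 agent calls over roughly eleven hours. The model leads on screen-understanding tasks in Qwen's own benchmarks, but its overall performance is mixed. Qwen3.7-Plus is closed-source and priced well below Western frontier models.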


Qwen3.7-Plus is Alibaba's bid to turn multimodal AI into a full-blown autonomous agent

Key Points

Alibaba has released Qwen3.7-Plus, a new AI model that combines visual understanding with agent capabilities, enabling it to autonomously operate graphical user interfaces and apps.

In Qwen's demos, the system recreated desktop applications, performed cloud tasks, and independently programmed a complete app with more than 10,000 lines of code.

While Qwen3.7-Plus outperforms competitors in operating user interfaces, it falls short in pure logic benchmarks. The model is available as a proprietary, comparatively inexpensive option through Alibaba Cloud.

Alibaba's Qwen team has released Qwen3.7-Plus, a multimodal model built on top of the text-only Qwen3.7. It combines visual perception with classic agent capabilities like coding and tool use.

Billed as a "multimodal interactive hybrid agent," the model is designed to recognize real-world scenes, read screen content, operate graphical interfaces, write code from visual templates, and navigate mobile apps end to end. UI clicks and command-line instructions run within the same agent loop.
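The idea of UI clicks and command-line instructions sharing one agent loop can be pictured with a minimal Python sketch. Everything in it is an assumption for illustration: `Action`, `FakeScreen`, `run_agent`, and `scripted_policy` are hypothetical names, the scripted policy stands in for the model, and the fake screen stands in for a real GUI driver. It is not Qwen's API and does not describe how Alibaba actually implements the loop.

```python
"""Hypothetical hybrid agent loop: GUI actions and shell commands share one loop.
None of these names come from Qwen or Alibaba Cloud."""
from __future__ import annotations

import subprocess
import sys
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    kind: str  # hypothetical action types: "click", "shell", or "done"
    args: list[str] = field(default_factory=list)


@dataclass
class FakeScreen:
    """Stand-in for a real GUI driver: it only records which elements were clicked."""
    clicked: list[str] = field(default_factory=list)

    def click(self, element_id: str) -> str:
        self.clicked.append(element_id)
        return f"clicked {element_id}"


def run_agent(policy: Callable[[list[str]], Action], screen: FakeScreen,
              max_steps: int = 20) -> list[str]:
    """Ask the policy for one action per step, execute it, and record the observation."""
    history: list[str] = []
    for _ in range(max_steps):
        action = policy(history)
        if action.kind == "done":
            break
        if action.kind == "click":
            # GUI step: delegate to the (fake) screen driver.
            observation = screen.click(action.args[0])
        elif action.kind == "shell":
            # Command-line step: run without a shell and use stdout as the observation.
            result = subprocess.run(action.args, capture_output=True, text=True)
            observation = result.stdout.strip()
        else:
            observation = f"unknown action {action.kind!r}"
        history.append(observation)
    return history


def scripted_policy(history: list[str]) -> Action:
    """Placeholder for the model: replays a fixed plan indexed by steps taken so far."""
    plan = [
        Action("shell", [sys.executable, "-c", "print('build ok')"]),
        Action("click", ["run_tests_button"]),
        Action("done"),
    ]
    return plan[len(history)]


if __name__ == "__main__":
    print(run_agent(scripted_policy, FakeScreen()))
    # → ['build ok', 'clicked run_tests_button']
```

Both kinds of step return a text observation into the same history, so the next decision can draw on a build log and a screen interaction alike. A real system would replace the scripted policy with model calls and the fake screen with actual screenshots and input events.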

Eleven hours of autonomous app development

Using Qwen3.7-Plus, the team had a hybrid agent system build an English vocabulary learning app. According to Qwen, the agent ran for over eleven hours, producing more than 10,000 lines of code across more than 1,000 agent calls. The process covered requirements documentation, automated code generation, installation, test case creation, GUI-based testing, parallel test scenarios, and independent version management.

A second demo targets desktop apps: the agent reportedly recreated the native macOS Stocks app by operating it autonomously, parsing the UI structure, and generating SwiftUI code from it. It then connected an external API for real-time stock data, compiled the app, and ran ten functional tests on its own, including price lookups and search filters.

The Decoder: AI News (RSS)
