Qwen3.7-Plus is Alibaba's bid to turn multimodal AI into a full-blown autonomous agent
Key Points
Alibaba has released Qwen3.7-Plus, a new AI model that combines visual understanding with agent capabilities, enabling it to autonomously operate graphical user interfaces and apps.
In demos from the Qwen team, the system recreated a desktop application, performed cloud tasks, and independently programmed a complete app of more than 10,000 lines of code.
While Qwen3.7-Plus outperforms competitors in operating user interfaces, it falls short in pure logic benchmarks. The model is available as a proprietary, comparatively inexpensive option through Alibaba Cloud.
Alibaba's Qwen team has released Qwen3.7-Plus, a multimodal model built on top of the text-only Qwen3.7. It combines visual perception with classic agent capabilities like coding and tool use.
Billed as a "multimodal interactive hybrid agent," the model is designed to recognize real-world scenes, read screen content, operate graphical interfaces, write code from visual templates, and navigate mobile apps end to end. UI clicks and command-line instructions run within the same agent loop.
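To make the "same agent loop" idea concrete, here is a minimal Python sketch of a loop that dispatches both GUI clicks and command-line instructions. It is not Qwen's implementation: the names UIClick, ShellCommand, FakeScreen, and run_agent_loop are hypothetical, the action list is scripted rather than chosen by a model, and the screen is a stand-in that only records clicks.

```python
from __future__ import annotations

import subprocess
import sys
from dataclasses import dataclass
from typing import Iterable, Union


@dataclass
class UIClick:
    """Hypothetical GUI action: click at screen coordinates."""
    x: int
    y: int


@dataclass
class ShellCommand:
    """Hypothetical command-line action: run a process with these arguments."""
    argv: list[str]


Action = Union[UIClick, ShellCommand]


class FakeScreen:
    """Stand-in for a real GUI driver; it only records where clicks landed."""

    def __init__(self) -> None:
        self.clicks: list[tuple[int, int]] = []

    def click(self, x: int, y: int) -> str:
        self.clicks.append((x, y))
        return f"clicked at ({x}, {y})"


def run_agent_loop(actions: Iterable[Action], screen: FakeScreen) -> list[str]:
    """Execute GUI and shell actions in one loop and collect observations.

    A real agent would ask the model for the next action after each
    observation; this sketch assumes a fixed, scripted sequence instead.
    """
    observations: list[str] = []
    for action in actions:
        if isinstance(action, UIClick):
            observations.append(screen.click(action.x, action.y))
        elif isinstance(action, ShellCommand):
            result = subprocess.run(
                action.argv, capture_output=True, text=True, check=False
            )
            observations.append(result.stdout.strip())
        else:
            raise TypeError(f"unsupported action: {action!r}")
    return observations


if __name__ == "__main__":
    demo_screen = FakeScreen()
    scripted = [
        UIClick(120, 48),
        ShellCommand([sys.executable, "-c", "print('build finished')"]),
    ]
    for line in run_agent_loop(scripted, demo_screen):
        print(line)
```

The point of the design is that both action types feed the same observation list, so whatever decides the next step sees the result of a click and the output of a command in one shared history.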
Eleven hours of autonomous app development
Using Qwen3.7-Plus, the team had a hybrid agent system build an English vocabulary learning app. According to Qwen, the agent ran for over eleven hours, producing more than 10,000 lines of code across more than 1,000 agent calls. The process covered requirements documentation, automated code generation, installation, test case creation, GUI-based testing, parallel test scenarios, and independent version management.
A second demo targets desktop apps: the agent reportedly recreated the native macOS Stocks app by operating it autonomously, parsing the UI structure, and generating SwiftUI code from it. It then connected an external API for real-time stock data, compiled the app, and ran ten functional tests on its own, including price lookups and search filters.