# Qwen3.7-Plus：阿里巴巴将多模态AI打造成完全自主智能体

- 来源：The Decoder：AI News（RSS）
- 作者：Jonathan Kemper
- 发布时间：2026-06-06 14:54
- AIHOT 分数：66
- AIHOT 链接：https://aihot.virxact.com/items/cmq204teo00f7slmwy2rqqls5
- 原文链接：https://the-decoder.com/qwen3-7-plus-is-alibabas-bid-to-turn-multimodal-ai-into-a-full-blown-autonomous-agent

## AI 摘要

阿里巴巴Qwen团队发布Qwen3.7-Plus，一个将视觉感知、GUI操作和编码能力整合到单一智能体循环中的多模态智能体模型。在演示中，基于该模型的智能体自主开发了一款词汇学习应用，生成了超过10,000行代码，共执行了1,000次智能体调用，耗时11小时。该模型在Qwen自主基准测试的屏幕理解任务上领先，但整体性能表现参差不齐。Qwen3.7-Plus为闭源模型，价格远低于西方前沿模型。

## 正文

Qwen3.7-Plus is Alibaba's bid to turn multimodal AI into a full-blown autonomous agent

Key Points

Alibaba has released Qwen3.7-Plus, a new AI model that combines visual understanding with agent capabilities, enabling it to autonomously operate graphical user interfaces and apps.

In testing, the system demonstrated its ability to recreate desktop applications, perform cloud tasks, and independently program a complete app with 10,000 lines of code.

While Qwen3.7-Plus outperforms competitors in operating user interfaces, it falls short in pure logic benchmarks. The model is available as a proprietary, comparatively inexpensive option through Alibaba Cloud.

Alibaba's Qwen team has released Qwen3.7-Plus, a multimodal model built on top of the text-only Qwen3.7. It combines visual perception with classic agent capabilities like coding and tool use.

Billed as a "multimodal interactive hybrid agent," the model is designed to recognize real-world scenes, read screen content, operate graphical interfaces, write code from visual templates, and navigate mobile apps end to end. UI clicks and command-line instructions run within the same agent loop.

Eleven hours of autonomous app development

Using Qwen3.7-Plus, the team had a hybrid agent system build an English vocabulary learning app. According to Qwen, the agent ran for over eleven hours, producing more than 10,000 lines of code across more than 1,000 agent calls. The process covered requirements documentation, automated code generation, installation, test case creation, GUI-based testing, parallel test scenarios, and independent version management.

A second demo targets desktop apps: the agent reportedly recreated the native macOS Stocks app by operating it autonomously, parsing the UI structure, and generating SwiftUI code from it. It then connected an external API for real-time stock data, compiled the app, and ran ten functional tests on its own, including price lookups and search filters.

A third use case shows a browser agent via "Qwen for Chrome," a sidebar extension. With user permission, the model switches into agent mode and carries out tasks in a cloud console, like purchasing the cheapest available virtual server instance, including configuring the image, storage, and security groups. In a follow-up task, the agent also handles scaling and maintenance, Qwen says.

GUI tasks shine, hard reasoning tests don't

The benchmarks Qwen published paint a clear picture: the model excels at operating graphical interfaces. On AndroidWorld and ScreenSpot Pro, Qwen3.7-Plus sits well ahead of GPT-5.4 (xhigh), Opus 4.6 Max, and Gemini 3.1 Pro. It also leads on agent-oriented terminal work and long-horizon task planning.

On classic multimodal reasoning, results are mixed. Qwen3.7-Plus tops some visual reasoning tests but falls short of Gemini 3.1 Pro and GPT-5.4 on tougher scientific tasks like MedXpertQA-MM. On the text side, the team describes performance as on par with max-tier models, without beating them across the board.

Cross-framework compatibility sets it apart

Qwen3.7-Plus supports the Anthropic API protocol and works directly with Claude Code, OpenClaw, and Alibaba's own Qwen Code. The API also offers a feature called preserve_thinking that retains reasoning content from earlier conversation turns. The Qwen team explicitly recommends this setting for agentic tasks.

preserve_thinking

Beyond image processing, the model also covers video understanding and driving scene analysis, positioning it as a foundation for embedded systems and autonomous driving.

Qwen3.7-Plus is available through Alibaba Cloud Model Studio and, like its text-based sibling Qwen3.7-Max, is a proprietary offering with no open weights. Alibaba prices the Plus tier well below Max: Qwen3.7-Plus costs $0.40 per million input tokens and $2.40 per million output tokens, compared to $2.50 and $7.50 for Qwen3.7-Max. That makes Plus roughly six times cheaper on input and three times cheaper on output and well below the list prices of Western frontier models.

AI News Without the Hype – Curated by Humans
