# Google integrates computer control directly into Gemini 3.5 Flash

- Source: The Decoder: AI News (RSS)
- Author: Matthias Bastian
- Published: 2026-06-25 17:04
- AIHOT score: 65
- AIHOT link: https://aihot.virxact.com/items/cmqta4hi401r8sl0e8arphq38
- Original link: https://the-decoder.com/google-bakes-computer-control-directly-into-gemini-3-5-flash-letting-the-model-see-and-operate-your-screen

## AI Summary

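Google has integrated its "Computer Use" capability directly into Gemini 3.5 Flash, so the model can autonomously see, understand, and operate computers, browsers, and mobile devices; previously the capability was offered only as a separate Gemini 2.5 model. Combined with tools such as function calling, Search, and Maps, it lets developers build cross-platform agents for software testing or office automation. On the OSWorld benchmark, Gemini 3.5 Flash scores 78.4, ahead of Gemini 3 Flash (65.1) and GPT-5.4 mini (72.1), slightly behind GPT-5.5 (78.7), with Anthropic's Opus 4.8 leading at 83.4. For safety, Google relies on adversarial training and two optional enterprise safeguards: user confirmation for sensitive actions and automatic blocking of indirect prompt injection. The capability is available through the Gemini API and the Gemini Enterprise Agent Platform, along with a Browserbase demo and a GitHub reference implementation.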

## Full Text

Google bakes computer control directly into Gemini 3.5 Flash, letting the model see and operate your screen

Matthias Bastian

Jun 25, 2026

Google has integrated "Computer Use" directly into Gemini 3.5 Flash. The model can now see, understand, and interact with computers, browsers, and mobile devices on its own. Previously, this was only available as a separate Gemini 2.5 model. Combined with existing tools like function calls, Search, and Maps, developers can now build agents that work across browser, mobile, and desktop environments for tasks like software testing or office automation.

On the OSWorld benchmark, Gemini 3.5 Flash scores 78.4, beating Gemini 3 Flash (65.1) and GPT-5.4 mini (72.1). GPT-5.5 sits just ahead at 78.7, while Anthropic's Opus 4.8 leads at 83.4. Sonnet 4.6 also hits 78.4, and Gemini 3.1 Pro lands at 76.2.
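To make the gaps between these scores easier to read, here is a small Python sketch that ranks the models and prints each one's point difference from Gemini 3.5 Flash. The scores are the ones quoted above; the dictionary layout and the `gap_to` helper are illustrative choices, not part of any benchmark tooling.

```python
# OSWorld scores as reported in the article (higher is better).
osworld_scores = {
    "Opus 4.8": 83.4,
    "GPT-5.5": 78.7,
    "Gemini 3.5 Flash": 78.4,
    "Sonnet 4.6": 78.4,
    "Gemini 3.1 Pro": 76.2,
    "GPT-5.4 mini": 72.1,
    "Gemini 3 Flash": 65.1,
}


def gap_to(model: str, baseline: str = "Gemini 3.5 Flash") -> float:
    """Point difference between `model` and `baseline`, rounded to one decimal."""
    return round(osworld_scores[model] - osworld_scores[baseline], 1)


# Rank from highest to lowest score. sorted() is stable, so tied models
# keep the order in which they appear in the dictionary.
ranking = sorted(osworld_scores, key=osworld_scores.get, reverse=True)

for name in ranking:
    print(f"{name:18s} {osworld_scores[name]:5.1f} {gap_to(name):+5.1f}")
```

Read this way, Gemini 3.5 Flash is 13.3 points ahead of Gemini 3 Flash, 0.3 points behind GPT-5.5, and 5.0 points behind Opus 4.8.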

To guard against prompt injection attacks, Google uses adversarial training and two optional enterprise safeguards. One requires user confirmation for sensitive or irreversible actions, while the other automatically stops tasks when it detects indirect prompt injections. Google also recommends sandboxing, human oversight, and strict access controls, with more details in its best practices documentation. The feature is available through the Gemini API and the Gemini Enterprise Agent Platform. A Browserbase demo and a GitHub reference implementation are also available.
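The two safeguards described above can be pictured as a gate in front of every action an agent wants to take. The following Python sketch is purely illustrative: it does not use the Gemini API, and every name in it (`Action`, `SENSITIVE_KINDS`, `looks_like_injection`, `guarded_execute`, `TaskStopped`) is hypothetical. The keyword check is a toy stand-in, not how Google's injection detection works.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    """A hypothetical UI action proposed by an agent."""
    kind: str    # e.g. "click", "type", "delete_file"
    target: str  # e.g. a selector or a file name


# Assumption: which action kinds count as sensitive or irreversible.
SENSITIVE_KINDS = {"delete_file", "submit_payment", "send_email"}


class TaskStopped(Exception):
    """Raised when the task is halted because of a suspected injection."""


def looks_like_injection(page_text: str) -> bool:
    """Toy heuristic: flag on-screen text that tries to redirect the agent."""
    markers = ("ignore previous instructions", "disregard your instructions")
    lowered = page_text.lower()
    return any(marker in lowered for marker in markers)


def guarded_execute(
    action: Action,
    page_text: str,
    confirm: Callable[[Action], bool],
    execute: Callable[[Action], None],
) -> bool:
    """Run `action` only if both safeguards allow it.

    Returns True if the action ran and False if the user declined it.
    Raises TaskStopped if the screen content looks like an injection.
    """
    # Safeguard 1: stop the whole task on suspected indirect prompt injection.
    if looks_like_injection(page_text):
        raise TaskStopped("suspected indirect prompt injection")
    # Safeguard 2: sensitive actions need explicit user confirmation.
    if action.kind in SENSITIVE_KINDS and not confirm(action):
        return False
    execute(action)
    return True
```

In use, `confirm` would prompt a human and `execute` would drive the browser or desktop; in line with the recommendations above, both would typically run inside a sandbox with restricted access.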
