# OpenComputer: Building Verifiable Software Worlds for Computer-Use Agents

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-05-19 08:00
- AIHOT score: 63
- AIHOT link: https://aihot.virxact.com/items/cmpdhc5hd03xkslk1cc32mxrn
- Original link: https://arxiv.org/abs/2605.19769

## AI Summary

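OpenComputer is a verifier-grounded framework for building verifiable software worlds for computer-use agents. It integrates four core components: application-specific state verifiers, a verification layer that improves itself using execution feedback, a task generator that produces realistic desktop tasks, and an evaluation harness that records trajectories and computes partial-credit rewards. The framework currently covers 33 desktop applications and provides 1,000 machine-checkable tasks across six software categories, including browsers, office tools, and creative software. Experiments show that its hard-coded verifiers align more closely with human judgment than LLM-based evaluation. The study also finds that current frontier agents still hit a bottleneck on end-to-end task completion, indicating that robust computer automation remains an open challenge.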

## Main Text

We present OpenComputer, a verifier-grounded framework for constructing verifiable software worlds for computer-use agents. OpenComputer integrates four components: (1) app-specific state verifiers that expose structured inspection endpoints over real applications, (2) a self-evolving verification layer that improves verifier reliability using execution-grounded feedback, (3) a task-generation pipeline that synthesizes realistic and machine-checkable desktop tasks, and (4) an evaluation harness that records full trajectories and computes auditable partial-credit rewards. In its current form, OpenComputer covers 33 desktop applications and 1,000 finalized tasks spanning browsers, office tools, creative software, development environments, file managers, and communication applications. Experiments show that OpenComputer's hard-coded verifiers align more closely with human adjudication than LLM-as-judge evaluation, especially when success depends on fine-grained application state. Frontier agents struggle with end-to-end completion despite partial progress, and open-source models exhibit sharp drops from their OSWorld-Verified scores, exposing a persistent gap in robust computer automation.
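
To make the idea of a hard-coded state verifier with auditable partial credit more concrete, here is a minimal Python sketch. It is not OpenComputer's actual code or API: the `Check` class, the `score` function, the weights, and the spreadsheet-like state dictionary are all hypothetical names chosen for illustration, and the abstract does not say how the real framework weights or aggregates its checks.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical: a structured snapshot of application state, as an
# inspection endpoint might return it.
State = Dict[str, Any]


@dataclass
class Check:
    """One machine-checkable condition on the application state (hypothetical)."""
    name: str
    weight: float
    predicate: Callable[[State], bool]


def score(state: State, checks: List[Check]) -> Dict[str, Any]:
    """Run every check and return a per-check report plus a weighted reward."""
    total = sum(c.weight for c in checks)
    if total <= 0:
        raise ValueError("checks must have positive total weight")
    results: Dict[str, bool] = {}
    for c in checks:
        try:
            results[c.name] = bool(c.predicate(state))
        except (KeyError, TypeError, IndexError):
            # Missing or malformed state counts as a failed check.
            results[c.name] = False
    earned = sum(c.weight for c in checks if results[c.name])
    return {
        "reward": earned / total,          # partial credit in [0, 1]
        "success": all(results.values()),  # strict end-to-end completion
        "checks": results,                 # per-check outcome for auditing
    }


# Assumed example: the agent filled in two cells but never saved the file.
state = {"cells": {"A1": "Total", "B1": 42}, "saved": False}
checks = [
    Check("header_written", 1.0, lambda s: s["cells"]["A1"] == "Total"),
    Check("value_written", 1.0, lambda s: s["cells"]["B1"] == 42),
    Check("file_saved", 2.0, lambda s: s["saved"] is True),
]
report = score(state, checks)
print(report)
```

Under these assumed weights, the agent earns half of the available credit while `success` stays false, which mirrors the abstract's distinction between partial progress and end-to-end completion. Returning the per-check outcomes alongside the scalar reward is one simple way to keep a score auditable.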
