MobileForge:无标注自适应移动GUI智能体
阅读原文· arxiv.orgMobileForge由MobileGym和层次化反馈引导策略优化(HiFPO)组成,在真实移动应用中自动生成任务和评估rollout,将轨迹结果、步骤级过程反馈及纠正提示转化为提示上下文的步骤级GRPO更新。使用自动生成的无标注数据,MobileForge将Qwen3-VL-8B适配到AndroidWorld达67.2% Pass@3,接近闭数据专用模型GUI-Owl-1.5-8B的69.0%。进一步适配的ForgeOwl-8B在AndroidWorld上达77.6% Pass@3,并在域外MobileWorld GUI-only任务上取得41.0%成功率,成为当前最强的开源数据移动GUI智能体。代码、数据和模型将开源。
MLLM-based mobile GUI agents have made substantial progress in UI understanding and action execution, but adapting them to real target apps remains costly because mobile apps are numerous, frequently updated, and hard to cover with human-written tasks, demonstrations, or reward labels. Existing annotation-free GUI learning reduces manual supervision, yet lacks a unified substrate connecting target-app exploration, curriculum mining, rollout execution, and feedback, while policy optimization often relies on isolated rollouts and coarse rewards that are hard to convert into reliable improvement signals. We present MobileForge, an annotation-free adaptation system for mobile GUI agents. MobileForge consists of MobileGym, which grounds task generation and rollout evaluation in real mobile app interaction, and Hierarchical Feedback-Guided Policy Optimization (HiFPO), which turns trajectory outcomes, step-level process feedback, and corrective hints into hint-contextualized step-level GRPO updates. Using only automatically generated annotation-free adaptation data, MobileForge adapts Qwen3-VL-8B to 67.2% Pass@3 on AndroidWorld, close to the closed-data GUI-specialized GUI-Owl-1.5-8B base model at 69.0%. The MobileForge-adapted ForgeOwl-8B further reaches 77.6% Pass@3 on AndroidWorld and 41.0% success on the out-of-domain MobileWorld GUI-only split, establishing the strongest open-data mobile GUI agent in our evaluation. Code, data, and trained models will be released at https://mobile-forge.github.io/.