PhoneBuddy: Training Open Models for Agentic Phone Use

2026-06-22 08:00 · 11 days ago

AI Summary

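Training open models for reliable phone control is hard because real devices are slow and difficult to reset, while mock environments are not realistic enough. PhoneBuddy proposes a training recipe that combines real apps with the mock environment PhoneWorld: a shared supervised fine-tuning (SFT) stage first, followed by a comparison of real-app RL against mixed RL. On a 150-task evaluation on real phones, the success rate rises from 36.67% after SFT to 45.33% after mixed RL; on AndroidWorld it rises from 60.3% to 83.2%. The results indicate that mock training is a complementary source alongside real-app RL, with the largest gains on app and mini-app tasks, while cross-app workflows remain an open challenge.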

Original abstract

Phones are becoming an important execution surface for general-purpose agents, but training open models for reliable phone use remains difficult because the environment that matters at deployment, real devices running real apps, is slow, stateful, side-effectful, and hard to reset or verify, while scalable mock environments only approximate real behavior. We present PhoneBuddy, a training recipe and open-model line for agentic phone use that combines a real-app environment with a mock-app environment, PhoneWorld, which reconstructs runnable mock apps from real GUI usage structure. PhoneBuddy first builds a shared supervised fine-tuning stage from trajectories collected in both environments, then compares real-app RL against mixed RL across both environments. Across a 150-task human evaluation on real phones spanning apps, mini-apps, and cross-app workflows, task success rate improves from 36.67% after supervised fine-tuning to 40.67% after real-app RL and 45.33% after mixed RL. On AndroidWorld, the same progression rises from 60.3% to 77.2% to 83.2%. These results show that mock-app training is not a replacement for real-app RL, but a complementary source of scalable, resettable, and automatically checked interaction. The gains are strongest on app and mini-app tasks, while long-horizon cross-app workflows remain an important open challenge.
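To make the reported percentages concrete, the short Python sketch below converts them into implied task counts and absolute gains over the SFT baseline. The rates and the 150-task total come from the abstract; the whole-number task counts are an inference that assumes each task is scored as a single pass or fail, which the abstract does not state. The function and variable names are illustrative only.

```python
# Task success rates (percent) as reported in the abstract.
REAL_PHONE = {"SFT": 36.67, "real-app RL": 40.67, "mixed RL": 45.33}
ANDROID_WORLD = {"SFT": 60.3, "real-app RL": 77.2, "mixed RL": 83.2}
NUM_REAL_TASKS = 150  # size of the real-phone human evaluation


def implied_successes(rate_pct, num_tasks):
    """Nearest whole number of solved tasks implied by a percentage.

    Assumption: one pass/fail outcome per task (not stated in the source).
    """
    return round(rate_pct / 100 * num_tasks)


def gains_over_sft(rates):
    """Absolute gain, in percentage points, of each later stage over SFT."""
    base = rates["SFT"]
    return {
        stage: round(rate - base, 2)
        for stage, rate in rates.items()
        if stage != "SFT"
    }


real_counts = {
    stage: implied_successes(rate, NUM_REAL_TASKS)
    for stage, rate in REAL_PHONE.items()
}
real_gains = gains_over_sft(REAL_PHONE)
aw_gains = gains_over_sft(ANDROID_WORLD)

print(real_counts)
print(real_gains)
print(aw_gains)
```

Under that pass/fail assumption, the real-phone rates correspond to 55, 61, and 68 solved tasks out of 150. Adding mock-app data to RL accounts for a further 4.66 points on real phones and 6.0 points on AndroidWorld beyond real-app RL alone.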

HuggingFace Daily Papers (community trending papers)

Read the original · arxiv.org
