RATs: Playful Agentic Robot Learning
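Source: arxiv.org. The paper proposes the Playful Agentic Robot Learning paradigm, in which an embodied coding agent plays autonomously to keep learning skills before tasks arrive. During the play stage, RATs (Robotics Agent Teams) autonomously proposes novel yet learnable exploratory tasks, executes code policies, diagnoses failures and retries, and distills successful executions into a persistent code skill library. At test time, skills are retrieved from the frozen library to help with new tasks. On LIBERO-PRO and MolmoSpaces, play-learned skills yield gains of 20.6 and 17.0 percentage points over CaP-Agent0, respectively. The skill library can also be plugged directly into other inference-time Code-as-Policy agents without finetuning the model, improving RoboSuite and real-world transfer by 8.9 and 8.8 percentage points, respectively.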
Current agentic robot systems can write executable Code-as-Policy programs, observe feedback, and revise behavior across multiple attempts, but they remain largely task-driven: reusable skills are acquired only after explicit instructions. We study Playful Agentic Robot Learning, where an embodied coding agent uses self-directed play as a continual skill-learning stage before downstream tasks arrive. We introduce RATs, Robotics Agent Teams designed for play-time skill acquisition. During play, RATs proposes novel yet learnable exploratory tasks, plans and executes robot-code policies, verifies intermediate progress, diagnoses failures, retries with dense, step-level feedback, and distills successful executions into a persistent code skill library. At test time, the agent reuses relevant skills from this frozen library to help solve new tasks. Experiments in LIBERO-PRO and MolmoSpaces show that play-learned skills improve held-out downstream tasks over no-play and random-play baselines, with 20.6 and 17.0 percentage-point gains over CaP-Agent0 on LIBERO-PRO and MolmoSpaces, respectively. Moreover, the learned skills can be plugged into other inference-time Code-as-Policy agents by simply retrieving them into the context, improving RoboSuite and real-world transfer by 8.9 and 8.8 points, respectively, without finetuning the underlying model.
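The play-then-reuse cycle described above can be illustrated with a minimal Python sketch. This is not the paper's implementation: `Skill`, `SkillLibrary`, `play`, `demo_propose`, and `demo_attempt` are hypothetical names introduced here, the proposer and executor are deterministic stubs standing in for the coding agent and the robot or simulator, and word-overlap scoring is a simple stand-in for whatever retrieval method the authors use.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Skill:
    name: str
    description: str
    code: str  # source text of a code policy that succeeded during play


@dataclass
class SkillLibrary:
    skills: List[Skill] = field(default_factory=list)
    frozen: bool = False

    def add(self, skill: Skill) -> None:
        # The library is only written during play; test time is read-only.
        if self.frozen:
            raise RuntimeError("skill library is frozen")
        self.skills.append(skill)

    def retrieve(self, task: str, k: int = 2) -> List[Skill]:
        # Assumption: toy relevance score = number of words shared
        # between the task text and the skill description.
        words = set(task.lower().split())
        scored = [(len(words & set(s.description.lower().split())), s)
                  for s in self.skills]
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        return [s for score, s in ranked if score > 0][:k]


# attempt(task, feedback) -> (code, success, new_feedback)
Attempt = Callable[[str, Optional[str]], Tuple[str, bool, Optional[str]]]


def play(propose: Callable[[SkillLibrary], str], attempt: Attempt,
         library: SkillLibrary, num_tasks: int = 3,
         max_retries: int = 2) -> None:
    """Propose a task, try it, retry with feedback, distill successes."""
    for _ in range(num_tasks):
        task = propose(library)
        feedback = None
        for _ in range(max_retries + 1):
            code, success, feedback = attempt(task, feedback)
            if success:
                library.add(Skill(task.replace(" ", "_"), task, code))
                break


def demo_propose(library: SkillLibrary) -> str:
    # Stub proposer: pick the first candidate not yet in the library.
    candidates = ["open the drawer", "pick up the mug", "push the red block"]
    known = {s.description for s in library.skills}
    for candidate in candidates:
        if candidate not in known:
            return candidate
    return candidates[-1]


def demo_attempt(task: str, feedback: Optional[str]):
    # Stub executor: fails on the first try, succeeds once feedback exists.
    if feedback is None:
        return "", False, f"step 1 of '{task}' failed: gripper missed target"
    return f"# code policy for: {task}", True, None


library = SkillLibrary()
play(demo_propose, demo_attempt, library)
library.frozen = True  # freeze before downstream tasks arrive

hits = library.retrieve("pick up the mug and place it in the drawer")
print([s.name for s in hits])
```

In this sketch the retrieved skills would be pasted into the context of a Code-as-Policy agent, which mirrors the abstract's claim that the library can be reused without finetuning the underlying model. The freeze flag makes the separation between play time and test time explicit.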