Play2Perfect:灵巧玩耍预训练对精确装配的关键因素
阅读原文· arxiv.orgPlay2Perfect 提出一种基于强化学习的任务无关预训练框架,让多指机器人在多样化对象和目标上通过“玩耍”习得可复用的操作先验(如抓取、手中重定向、姿态到达),再微调用于精确装配任务。系统研究表明,对象多样性、训练目标、轨迹多样性和目标精度是关键设计因素。该先验使样本效率比从零强化学习提升 33 倍。零样本 sim-to-real 迁移实现了 0.5 mm 间隙紧配插入 60% 成功率,以及长时序多部件装配和拧螺丝超过 50% 成功率。
Multi-fingered robots promise the speed and dexterity of human hands, yet challenging problems such as precise assembly have remained out of reach. These tasks are contact-rich, making data collection for imitation learning difficult, and sparse-reward, making direct exploration with reinforcement learning (RL) intractable. Consequently, prior work has made progress by structuring the problem with specialized grippers, tool attachments, and environment fixtures. In this work, we argue that before a robot can perfect precise assembly, it must first learn to play. We further ask the question: what factors in the process of learning to play matter for precise assembly? We propose Play2Perfect, an RL framework for task-agnostic pretraining through play on diverse objects and goals, which is then perfected on precise assembly. The goal of play is to acquire reusable manipulation priors, such as grasping, in-hand reorientation and pose reaching. Finetuning then adapts this general prior to assembly, focusing exploration on the final contact-rich, high-precision interactions needed for success. We systematically study key design choices in play pretraining, including object diversity, training objective, trajectory diversity, and goal precision. We show that our prior is 33x more sample-efficient than RL training from scratch, even when provided with dense, multi-stage rewards. We demonstrate zero-shot sim-to-real transfer, achieving 60% success on tight insertions with only 0.5 mm contact clearance, and over 50% success on long-horizon multi-part assembly and screwing.