# Touch Dreaming Learning for General-Purpose Humanoid Robot Manipulation

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-04-14 08:00
- AIHOT link: https://aihot.virxact.com/items/cmo0cept100qcsli2ibst6q7i
- Original link: https://arxiv.org/abs/2604.13015

## AI Summary

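The research team proposes the Humanoid Transformer with Touch Dreaming (HTD) model, which is combined with an RL-based whole-body controller and a VR teleoperation data collection system to tackle humanoid manipulation in contact-rich scenarios. The method treats touch as a modality on par with vision and proprioception, and uses a "touch dreaming" mechanism that trains the model to predict future tactile latents and hand-joint forces, so that it learns contact-aware representations. Across five real-world dexterous manipulation tasks (insertion, book organization, towel folding, cat litter scooping, and tea serving), HTD improves the average success rate by 90.9% relative to the stronger baseline, and predicting tactile latents rather than raw tactile signals yields a 30% relative gain in success rate.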

## Main Text

Humanoid robots promise general-purpose assistance, yet real-world humanoid loco-manipulation remains challenging because it requires whole-body stability, dexterous hands, and contact-aware perception under frequent contact changes. In this work, we study dexterous, contact-rich humanoid loco-manipulation. We first develop an RL-based whole-body controller that provides stable lower-body and torso execution during complex manipulation. Built on this controller, we develop a whole-body humanoid data collection system that combines VR-based teleoperation with human-to-humanoid motion mapping, enabling efficient collection of real-world demonstrations. We then propose Humanoid Transformer with Touch Dreaming (HTD), a multimodal encoder-decoder Transformer that models touch as a core modality alongside multi-view vision and proprioception. HTD is trained in a single stage with behavioral cloning augmented by touch dreaming: in addition to predicting action chunks, the policy predicts future hand-joint forces and future tactile latents, encouraging the shared Transformer trunk to learn contact-aware representations for dexterous interaction. Across five contact-rich tasks, Insert-T, Book Organization, Towel Folding, Cat Litter Scooping, and Tea Serving, HTD achieves a 90.9% relative improvement in average success rate over the stronger baseline. Ablation results further show that latent-space tactile prediction is more effective than raw tactile prediction, yielding a 30% relative gain in success rate. These results demonstrate that combining robust whole-body execution, scalable humanoid data collection, and predictive touch-centered learning enables versatile, high-dexterity humanoid manipulation in the real world. Project webpage: humanoid-touch-dream.github.io.
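
The single-stage objective described above (behavioral cloning on action chunks plus auxiliary "touch dreaming" predictions) can be pictured as a weighted sum of three prediction errors. The sketch below is illustrative only: the abstract does not state the loss type, the weights, or how target tactile latents are produced, so the use of mean squared error and the names `touch_dreaming_loss`, `force_weight`, and `latent_weight` are assumptions made here, not the authors' implementation.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length, non-empty vectors."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    if len(pred) == 0:
        raise ValueError("inputs must be non-empty")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def touch_dreaming_loss(
    pred_actions: Sequence[float],
    target_actions: Sequence[float],
    pred_forces: Sequence[float],
    target_forces: Sequence[float],
    pred_latents: Sequence[float],
    target_latents: Sequence[float],
    force_weight: float = 1.0,   # hypothetical weight, not from the paper
    latent_weight: float = 1.0,  # hypothetical weight, not from the paper
) -> float:
    """Behavioral-cloning loss plus two auxiliary touch-prediction terms.

    All inputs are flattened vectors. MSE is an assumed choice of error.
    """
    # Behavioral cloning: imitate the demonstrated action chunk.
    action_loss = mse(pred_actions, target_actions)
    # Touch dreaming, part 1: predict future hand-joint forces.
    force_loss = mse(pred_forces, target_forces)
    # Touch dreaming, part 2: predict future tactile latents.
    latent_loss = mse(pred_latents, target_latents)
    return action_loss + force_weight * force_loss + latent_weight * latent_loss
```

Setting both weights to zero reduces this sketch to plain behavioral cloning, which is one way to view the auxiliary terms: they add training signal that shapes the shared representation without changing what the policy outputs as actions.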
