# RL4IL: Reinforcement-Learning-Guided Retrieval and Soft Fusion for Robust Multimodal Imitation Learning under Missing Modalities

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-13 08:00
- AIHOT score: 37
- AIHOT link: https://aihot.virxact.com/items/cmqjz0kjf01h2slhij7i6gr0x
- Original link: https://arxiv.org/abs/2606.15514

## AI Summary

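RL4IL is a reinforcement-learning-guided imitation learning method that ranks Breadth-First Search candidate sets with Proximal Policy Optimisation and uses a soft cross-attention fusion head to aggregate the candidates' action signals.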

## Main Text

Robotic systems perceive the world through multiple input modalities, including visual camera streams and natural language instructions, and must select appropriate actions based on these signals. However, assuming that all input devices are permanently available is unrealistic: sensors may fail, become occluded, or drop out entirely during deployment. Robust handling of such missing-modality scenarios is therefore essential for real-world robot operation. This paper introduces RL4IL, a reinforcement-learning-guided method for imitation learning that selects the most suitable action for a given observation by identifying the most relevant expert demonstrations in a training library. A reinforcement learning policy, trained via Proximal Policy Optimisation over Breadth-First Search candidate sets, ranks candidate demonstrations, and a soft cross-attention fusion head aggregates their action signals to produce the final prediction. When a modality is missing at inference time, a dedicated per-modality RL retrieval policy identifies donor demonstrations from the training library, and a soft imputation head reconstructs the missing embedding via cross-attention over the top-ranked donors, without requiring any retraining of the system. Experiments on three LIBERO benchmark suites demonstrate that RL4IL substantially outperforms state-of-the-art imitation learning methods under sensor dropout conditions, while requiring no policy network training. The code can be found at https://github.com/h-ismkhan/Reinforcement-Learning-via-kNN-for-Robotic-Learning-with-Missing-Camera
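
The retrieve-then-fuse idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The learned PPO ranking policy is replaced here by a plain dot-product score, the Breadth-First Search candidate construction is omitted, and the names `retrieve_and_fuse`, `attend`, the toy `library`, and its `lang`/`cam`/`action` fields are all hypothetical. The sketch only shows how top-ranked demonstrations might be combined by single-head cross-attention, first to impute a missing camera embedding and then to fuse actions.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Single-head scaled dot-product cross-attention for one query."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

def retrieve_and_fuse(query, library, key_fn, value_fn, rank_fn=dot, k=2):
    """Rank demonstrations, keep the top k, and softly fuse their values.

    rank_fn is a stand-in (assumption) for the learned retrieval policy.
    """
    scores = [rank_fn(query, key_fn(d)) for d in library]
    order = sorted(range(len(library)), key=lambda i: scores[i], reverse=True)
    chosen = [library[i] for i in order[:k]]
    return attend(query, [key_fn(d) for d in chosen], [value_fn(d) for d in chosen])

# Toy demonstration library with hypothetical 2-D embeddings and actions.
library = [
    {"lang": [1.0, 0.0], "cam": [0.9, 0.1], "action": [1.0, 0.0]},
    {"lang": [0.9, 0.1], "cam": [0.8, 0.2], "action": [0.8, 0.2]},
    {"lang": [0.0, 1.0], "cam": [0.1, 0.9], "action": [0.0, 1.0]},
]

# The camera is missing: only the language embedding is observed.
lang_query = [1.0, 0.0]

# Soft imputation: rebuild the camera embedding from the top-ranked donors.
cam_hat = retrieve_and_fuse(
    lang_query, library,
    key_fn=lambda d: d["lang"], value_fn=lambda d: d["cam"],
)

# Soft fusion: predict an action from the completed observation.
action = retrieve_and_fuse(
    lang_query + cam_hat, library,
    key_fn=lambda d: d["lang"] + d["cam"], value_fn=lambda d: d["action"],
)
print(cam_hat, action)
```

Nothing in this sketch is trained at inference time. The imputed embedding and the predicted action are both attention-weighted averages over stored demonstrations, which is one way to read the abstract's claim that no retraining is required when a modality drops out.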
