# PolicyTrim: Improving the Intrinsic Policy Efficiency of VLA Models

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-21 08:00
- AIHOT score: 47
- AIHOT link: https://aihot.virxact.com/items/cmqq4prau06l5slp5lrpym7rj
- Original link: https://arxiv.org/abs/2606.22540

## AI Summary

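Deployment of VLA models is constrained by execution efficiency. Existing work mostly targets per-step inference latency and leaves intrinsic policy efficiency largely unexplored. PolicyTrim is a reinforcement learning-based post-training framework that uses a dynamic exploration strategy to reward longer executable action chunk lengths, and a redundancy-aware reward to cut redundant physical steps. Across three benchmarks and three VLA models, it improves action chunk utilization by 3×, reduces physical execution steps by 51.4%, and delivers up to a 5.83× end-to-end deployment speedup, without compromising task success rates.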

## Full Text

Vision-Language-Action (VLA) models provide a unified paradigm for robotic manipulation, yet their real-world deployment is often bottlenecked by execution efficiency. While existing efforts predominantly focus on compute-centric efficiency to reduce per-step inference latency, the intrinsic policy efficiency of these models remains largely unexplored. Policy efficiency is fundamentally affected by two factors, namely the effective executable length of predicted action chunks and the total physical steps required to complete a task. These two factors jointly determine the total number of forward inference calls during execution. We observe that current VLA policies struggle with planning unreliability and action redundancy, suffering from severe prediction degradation at the tail of action chunks and tending to generate unnecessarily redundant physical steps. To address this, we propose PolicyTrim, a reinforcement learning-based post-training framework that extends the reliable action chunk length and reduces redundant physical steps. For reliable chunk extension, we employ a dynamic exploration strategy that explicitly rewards the successful completion of longer executable lengths, progressively pushing the trustworthy prediction horizon to its empirical limit. For step efficiency, we design a redundancy-aware reward that directly favors successful task completions with fewer steps while penalizing unreproducible shortcuts, effectively eliminating redundant physical actions. Extensive experiments across three benchmarks and three VLA models demonstrate that PolicyTrim improves action chunk utilization by 3× and reduces physical execution steps by 51.4%. Ultimately, our framework delivers up to a 5.83× end-to-end deployment speedup without compromising task success rates.
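
The abstract states that executable chunk length and total physical steps jointly determine the number of forward inference calls. A minimal sketch of that relationship follows, assuming a simplified model in which the policy runs one forward pass, executes a fixed number of actions from the predicted chunk, and then re-plans. The function name `forward_calls` and the example numbers are hypothetical illustrations, not figures or code from the paper.

```python
def forward_calls(total_steps: int, executed_chunk_len: int) -> int:
    """Number of forward passes needed to execute `total_steps` physical
    steps when each pass contributes `executed_chunk_len` executed actions.

    Assumption (not from the paper): the executed length is constant for
    the whole episode and the last chunk may be only partially used.
    """
    if total_steps < 0:
        raise ValueError("total_steps must be non-negative")
    if executed_chunk_len <= 0:
        raise ValueError("executed_chunk_len must be positive")
    # Integer ceiling division: a partial final chunk still costs one call.
    return (total_steps + executed_chunk_len - 1) // executed_chunk_len


if __name__ == "__main__":
    # Hypothetical episode: 240 physical steps, 4 actions executed per call.
    baseline = forward_calls(total_steps=240, executed_chunk_len=4)
    # Hypothetical improved policy: fewer steps and a longer executed chunk.
    improved = forward_calls(total_steps=120, executed_chunk_len=12)
    print(f"baseline calls: {baseline}, improved calls: {improved}")
```

Under this simplified model the two factors compound: halving the physical steps and tripling the executed chunk length reduces the call count by a factor of six. Real end-to-end speedups also depend on per-call latency and actuation time, which the sketch ignores.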
