# Learning High-Frequency Continuous Action Chunks in Latent Space

- Source: HuggingFace Daily Papers (trending community papers)
- Published: 2026-05-24 08:00
- AIHOT score: 52
- AIHOT link: https://aihot.virxact.com/items/cmpnsmgf810g2sl019noee4n2
- Original link: https://arxiv.org/abs/2605.24931

## AI Summary

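High-frequency robot control (e.g., 60 Hz) struggles to achieve temporal smoothness and spatial consistency at the same time. This work addresses the problem by moving high-frequency action learning from the action space into the latent space of a variational autoencoder (VAE), which significantly improves control quality. To further enable fluid execution under asynchronous inference, it proposes Reuse-then-Refine, a chunk-level strategy that strengthens continuity between adjacent action chunks. Experiments show that the method lets robots execute complex contact-rich tasks more coherently, with fewer pauses, less jitter, and smoother task completion. The code and data are open source.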

## Main Text

Modern robotic policies increasingly rely on action chunking to execute complex tasks in the physical world. While action chunking improves temporal consistency at moderate action frequencies, it becomes insufficient when the action frequency is increased further (e.g., to 60 Hz). At such high frequencies, policies often fail to generate actions that are both temporally smooth and spatially consistent. We address this challenge by shifting high-frequency action learning from the action space to a latent space learned with a variational autoencoder (VAE). This formulation significantly improves both the temporal and the spatial consistency of high-frequency control. To enable smooth real-time execution, we further introduce Reuse-then-Refine, a chunk-level refinement strategy that improves continuity between adjacent action chunks under asynchronous inference. As a result, robots controlled by our policy can execute complex contact-rich tasks continuously, with fewer pauses and less jerky motion. Experiments on three real-world contact-rich robotic tasks show that our approach consistently completes tasks with smooth motions. Our code and data are available at https://github.com/tars-robotics/RTR.
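To make the chunk-boundary problem under asynchronous inference concrete, here is a minimal NumPy sketch. It is not the paper's Reuse-then-Refine algorithm, whose details the abstract does not give. It is a generic linear cross-fade, used only for illustration: while the new chunk is still being computed, the controller keeps executing the previous chunk, and the new chunk is then blended in over a short window. The names `stitch_chunks`, `max_step`, `delay`, and `blend` are hypothetical, and the sketch assumes both chunks are aligned to the same start time.

```python
import numpy as np


def stitch_chunks(prev_chunk, new_chunk, delay, blend):
    """Stitch two time-aligned action chunks of shape (T, D).

    Illustrative assumption, not the paper's method:
    - the first `delay` steps reuse the previous chunk, because the new
      chunk was not yet available while inference was running;
    - the next `blend` steps cross-fade linearly from the previous chunk
      to the new chunk;
    - all remaining steps follow the new chunk.
    """
    prev_chunk = np.asarray(prev_chunk, dtype=float)
    new_chunk = np.asarray(new_chunk, dtype=float)
    if delay + blend > min(len(prev_chunk), len(new_chunk)):
        raise ValueError("delay + blend exceeds the available chunk length")

    out = new_chunk.copy()
    # Reuse: actions already committed during inference latency.
    out[:delay] = prev_chunk[:delay]
    # Cross-fade weights strictly between 0 and 1, one per blended step.
    w = np.linspace(0.0, 1.0, blend + 2)[1:-1][:, None]
    window = slice(delay, delay + blend)
    out[window] = (1.0 - w) * prev_chunk[window] + w * new_chunk[window]
    return out


def max_step(traj):
    """Largest per-step change in any action dimension (a crude jerkiness proxy)."""
    return float(np.abs(np.diff(traj, axis=0)).max())


# Toy example: the previous chunk holds 0.0, the new chunk holds 1.0.
prev = np.zeros((10, 1))
new = np.ones((10, 1))

naive = np.concatenate([prev[:2], new[2:]])        # hard switch after the delay
stitched = stitch_chunks(prev, new, delay=2, blend=4)

print(max_step(naive), max_step(stitched))
```

In this toy case the hard switch produces a single jump equal to the full gap between the two chunks, while the cross-fade spreads the same change over several control steps. A real policy would also have to handle the case where the two chunks disagree about the task strategy, not only about the next few actions, which simple blending does not address.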
