# RAD-2: Scaling Reinforcement Learning with a Generator-Discriminator Framework

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-04-16 08:00
- AIHOT link: https://aihot.virxact.com/items/cmo2bd9a5022qslbaeclw35h0
- Original link: https://arxiv.org/abs/2604.15308

## AI Summary

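RAD-2 proposes a generator-discriminator framework for closed-loop planning in autonomous driving: a diffusion model generates diverse trajectory candidates, and a discriminator optimized with reinforcement learning reranks them. The method introduces Temporally Consistent Group Relative Policy Optimization and On-policy Generator Optimization, and relies on BEV-Warp, a high-throughput simulation environment, for large-scale training. Compared with existing diffusion-based planners, RAD-2 reduces the collision rate by 56%, and real-world deployment shows improved perceived safety and driving smoothness.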

## Body

High-level autonomous driving requires motion planners capable of modeling multimodal future uncertainties while remaining robust in closed-loop interactions. Although diffusion-based planners are effective at modeling complex trajectory distributions, they often suffer from stochastic instabilities and the lack of corrective negative feedback when trained purely with imitation learning. To address these issues, we propose RAD-2, a unified generator-discriminator framework for closed-loop planning. Specifically, a diffusion-based generator is used to produce diverse trajectory candidates, while an RL-optimized discriminator reranks these candidates according to their long-term driving quality. This decoupled design avoids directly applying sparse scalar rewards to the full high-dimensional trajectory space, thereby improving optimization stability. To further enhance reinforcement learning, we introduce Temporally Consistent Group Relative Policy Optimization, which exploits temporal coherence to alleviate the credit assignment problem. In addition, we propose On-policy Generator Optimization, which converts closed-loop feedback into structured longitudinal optimization signals and progressively shifts the generator toward high-reward trajectory manifolds. To support efficient large-scale training, we introduce BEV-Warp, a high-throughput simulation environment that performs closed-loop evaluation directly in Bird's-Eye View feature space via spatial warping. RAD-2 reduces the collision rate by 56% compared with strong diffusion-based planners. Real-world deployment further demonstrates improved perceived safety and driving smoothness in complex urban traffic.
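The generate-then-rerank split described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `sample_candidates` stands in for the diffusion generator, `toy_score` stands in for the RL-optimized discriminator, and trajectories are reduced to 1-D longitudinal positions. The `group_relative_advantages` helper shows only the standard group-relative normalization used in GRPO-style methods; the abstract does not specify how the temporally consistent variant modifies it, so that part is not modeled.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: longitudinal position (m) at each timestep.
Trajectory = List[float]


def sample_candidates(num_candidates: int, horizon: int,
                      rng: random.Random) -> List[Trajectory]:
    """Stand-in for the generator: random constant-speed trajectories."""
    candidates = []
    for _ in range(num_candidates):
        speed = rng.uniform(0.0, 10.0)  # metres per step, arbitrary range
        candidates.append([speed * t for t in range(1, horizon + 1)])
    return candidates


def toy_score(traj: Sequence[float], obstacle_at: float = 30.0) -> float:
    """Stand-in for the discriminator: reward progress, and heavily
    penalize a trajectory that ends at or beyond a fixed obstacle."""
    final_position = traj[-1]
    penalty = 100.0 if final_position >= obstacle_at else 0.0
    return final_position - penalty


def rerank(candidates: Sequence[Trajectory],
           score_fn: Callable[[Sequence[float]], float]) -> Tuple[int, List[float]]:
    """Score every candidate and return (index of the best, all scores)."""
    scores = [score_fn(c) for c in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return best, scores


def group_relative_advantages(rewards: Sequence[float],
                              eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one group: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    rng = random.Random(0)
    candidates = sample_candidates(num_candidates=8, horizon=5, rng=rng)
    best, scores = rerank(candidates, toy_score)
    print("selected candidate:", best)
    print("advantages:", group_relative_advantages(scores))
```

In this sketch the scalar score is only used to choose among complete candidates and to form per-group advantages, which mirrors the abstract's point that the sparse reward is not applied directly to the high-dimensional trajectory output of the generator.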
