Valdi: Value Diffusion World Models
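Source: arxiv.org
Valdi (Value Diffusion World Models) combines end-to-end online training for model predictive control (MPC) with a latent diffusion dynamics model, using a single diffusion step for both training and inference to model uncertain futures while keeping latency low. Preliminary experiments on the CarRacing environment show that Valdi performs on par with a deterministic MLP baseline and reveal a trade-off between predictive multimodality and control performance. The code is open source.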
World models can enable Model Predictive Control (MPC), but this requires dynamics prediction that is both fast enough for online use and expressive enough to represent uncertain futures. Diffusion models offer a natural mechanism for modeling uncertain dynamics, yet their iterative inference procedure makes them difficult to use for low-latency latent planning. We bridge this gap with Value Diffusion World Models (Valdi), combining end-to-end online training for MPC with a latent diffusion dynamics model. In preliminary experiments on the CarRacing environment, we show that Valdi, using a single diffusion step at both training and inference, matches a deterministic MLP baseline. Our experiments expose a trade-off between predictive multimodality and control performance in this setup. Code is available at https://github.com/Kit115/ValueDiffusionWorldModels.
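To make the idea of planning with a single-step stochastic latent model concrete, here is a minimal sketch. It is not the authors' implementation (see the linked repository for that). Every name, dimension, and weight in it is a hypothetical stand-in: the networks are random linear layers rather than trained models, and the planner is plain random shooting, chosen here only for brevity. The point it illustrates is that each predicted transition costs one network call that maps Gaussian noise, the current latent, and an action to a next-latent sample, so planning latency stays comparable to that of a deterministic model.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
LATENT_DIM, ACTION_DIM, HIDDEN = 8, 2, 32


def init_params(rng):
    """Random weights standing in for trained networks (assumption)."""
    in_dim = LATENT_DIM + ACTION_DIM + LATENT_DIM  # latent, action, noise
    return {
        "w1": rng.normal(0, 0.3, (in_dim, HIDDEN)),
        "w2": rng.normal(0, 0.3, (HIDDEN, LATENT_DIM)),
        "w_reward": rng.normal(0, 0.3, LATENT_DIM),
        "w_value": rng.normal(0, 0.3, LATENT_DIM),
    }


def one_step_denoise(params, z, a, noise):
    """Map noise to a next-latent sample in a single network call."""
    x = np.concatenate([z, a, noise], axis=-1)
    return np.tanh(x @ params["w1"]) @ params["w2"]


def plan(params, z0, rng, horizon=5, num_candidates=64, gamma=0.99):
    """Random-shooting MPC: score sampled action sequences in latent space."""
    actions = rng.uniform(-1, 1, (num_candidates, horizon, ACTION_DIM))
    z = np.repeat(z0[None, :], num_candidates, axis=0)
    returns = np.zeros(num_candidates)
    for t in range(horizon):
        # Fresh noise per step: each rollout is one sample of an uncertain future.
        noise = rng.standard_normal((num_candidates, LATENT_DIM))
        z = one_step_denoise(params, z, actions[:, t], noise)
        returns += gamma**t * (z @ params["w_reward"])
    # Bootstrap with a value estimate at the end of the horizon.
    returns += gamma**horizon * (z @ params["w_value"])
    best = int(np.argmax(returns))
    return actions[best, 0], returns


rng = np.random.default_rng(0)
params = init_params(rng)
action, returns = plan(params, np.zeros(LATENT_DIM), rng)
```

In a control loop, `plan` would be called once per environment step with the encoded observation as `z0`, and only the returned first action would be executed before replanning.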