# Tencent Hunyuan Releases UniRL and Two New RL Algorithms

- Source: Tencent Hy (@TencentHunyuan)
- Published: 2026-06-09 20:03
- AIHOT score: 74
- AIHOT link: https://aihot.virxact.com/items/cmq6lzlnn09oqsl5ixe3tpi32
- Original link: https://x.com/TencentHunyuan/status/2064317430265192810

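## AI Summary

🚀 Tencent Hunyuan has launched UniRL, an RL infrastructure for unified multimodal models. It ships with two new RL algorithms: DRPO and Flow-DPPO.

A single RL loop covers diffusion/flow matching models, LLMs/VLMs, and unified multimodal models 👇

Code: http://github.com/Tencent-Hunyuan/UniRL

(Yes: U(you)-ni-(need) RL 😉)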

## Body

🚀 Introducing UniRL, an RL infra for unified multimodal models. Together with two new RL algorithms: DRPO and Flow-DPPO.

One RL loop across diffusion/flow matching models, LLMs/VLMs, and unified multimodal models 👇

Code: http://github.com/Tencent-Hunyuan/UniRL

(yes - U(you)-ni-(need) RL 😉)