Hao AI Lab @haoailab

2025-08-28 03:56

AI summary

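The study examines whether RL-based post-training of LLMs on games generalizes to other tasks. Within the same task family (e.g., from 6×6 Sokoban to the 8×8 version), training yields improvements of up to 56%. Across domains, however, the effect is limited or unstable: Blocksworld improves slightly, WebShop gains about 6% but unstably, and GSM8K shows no improvement. To study generalization systematically, the team proposes GRL, an agent-centric multi-turn RL framework designed to make LLM-environment interaction highly customizable.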

[1/5] [Lmgame Bench] 🎮

Question: Can RL-based LLM post-training on games generalize to other tasks?

We share a preliminary study exploring this question:
- Same family (in-domain): training on 6×6 Sokoban transfers to 8×8 Sokoban, and training on Tetris with 1 block type transfers to Tetris with 2 block types. This yields up to 56% improvement across same-family variants.
- Other tasks (out-of-domain): Blocksworld improves by 3-7% and WebShop by ~6% (unstable). GSM8K shows no improvement.

We introduce GRL, an agent-centric multi-turn RL framework that makes LLM-environment interaction highly customizable for systematic generalization studies.
Repo: https://github.com/lmgame-org/GRL
Blog: https://lmgame.org/#/blog/grl (check it for details!)
