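This study explores whether RL-based post-training of LLMs on games generalizes to other tasks. Within the same task family (e.g., 6×6 Sokoban generalizing to the 8×8 version), training yields up to 56% improvement. On cross-domain tasks the effect is limited or unstable: Blocksworld improves slightly, WebShop gains about 6% but is unstable, and GSM8K shows no improvement. To study this, the team proposes GRL, an agent-centric multi-turn RL framework designed to make LLM-environment interaction highly customizable for systematic study of generalization.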
【1/5】 【Lmgame Bench】 🎮
Question: Can RL-based LLM post-training on games generalize to other tasks?
We shared a preliminary study to explore this question:
- Same-family (in-domain): Training on 6×6 Sokoban → 8×8 and Tetris (1 block type) → Tetris (2 block types) transfers, yielding up to 56% improvement across same-family variants.
- Other tasks (out-of-domain): Blocksworld +3-7% and WebShop ~+6% (unstable); GSM8K: no improvement.
We introduce GRL, an agent-centric multi-turn RL framework that makes LLM-environment interaction highly customizable for systematic generalization studies.
Repo: https://github.com/lmgame-org/GRL
Blog: https://lmgame.org/#/blog/grl (check it for details!)
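For readers unfamiliar with "multi-turn" LLM-environment interaction, here is a minimal sketch of what such a rollout loop can look like. This is not GRL's API: `CorridorEnv`, `Trajectory`, `rollout`, and `always_right` are hypothetical names made up for illustration, the environment is a toy, and a plain function stands in for the LLM policy.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class CorridorEnv:
    """Toy text environment (hypothetical): walk from cell 0 to the last cell."""
    length: int = 4
    pos: int = 0

    def reset(self) -> str:
        self.pos = 0
        return self._observe()

    def step(self, action: str) -> Tuple[str, float, bool]:
        # Unrecognized actions leave the agent where it is.
        if action == "right":
            self.pos = min(self.pos + 1, self.length - 1)
        elif action == "left":
            self.pos = max(self.pos - 1, 0)
        done = self.pos == self.length - 1
        reward = 1.0 if done else 0.0  # sparse reward, only at the goal
        return self._observe(), reward, done

    def _observe(self) -> str:
        return f"You are at cell {self.pos} of {self.length - 1}."


@dataclass
class Trajectory:
    """One episode as a list of (observation, action, reward) turns."""
    turns: List[Tuple[str, str, float]] = field(default_factory=list)

    @property
    def total_reward(self) -> float:
        return sum(reward for _, _, reward in self.turns)


def rollout(env: CorridorEnv, policy: Callable[[str], str],
            max_turns: int = 10) -> Trajectory:
    """Alternate policy and environment turns until done or max_turns."""
    traj = Trajectory()
    obs = env.reset()
    for _ in range(max_turns):
        action = policy(obs)  # in a real setup, an LLM call on the text observation
        next_obs, reward, done = env.step(action)
        traj.turns.append((obs, action, reward))
        obs = next_obs
        if done:
            break
    return traj


def always_right(observation: str) -> str:
    """Stand-in for an LLM policy: ignores the observation and moves right."""
    return "right"


trajectory = rollout(CorridorEnv(length=4), always_right)
```

In an RL post-training setup, trajectories like this one would be collected from the model and their rewards used to update it; swapping the environment class is what lets the same loop cover same-family and out-of-domain tasks.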