Hao AI Lab (@haoailab) · Aug 28
[1/5] [Lmgame Bench] 🎮
Question: Can RL-based LLM post-training on games generalize to other tasks?
We shared a preliminary study to explore this question:
- Same-family (in-domain): Training on 6×6 Sokoban transfers to 8×8 Sokoban, and training on Tetris (1 block type) transfers to Tetris (2 block types), yielding up to 56% improvement across same-family variants.
- Other tasks (out-of-domain): Blocksworld +3–7% and WebShop ~+6% (unstable); GSM8K: no improvement.
We introduce GRL, an agent-centric multi-turn RL framework that makes LLM–environment interaction highly customizable for systematic generalization studies.
Repo: https://github.com/lmgame-org/GRL
Blog: https://lmgame.org/#/blog/grl (check it for details!)