# Can Reinforcement Learning on Games Improve LLMs' General Task Abilities?

- Source: Hao AI Lab (@haoailab)
- Published: 2025-08-28 03:56
- AIHOT score: 49
- AIHOT link: https://aihot.virxact.com/items/cmnxjn85o00gvsl9o3givctkm
- Original link: https://x.com/haoailab/status/1960793679398084818

## AI Summary

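The study examines whether RL-based post-training of LLMs on games generalizes to other tasks. Within the same task family (for example, from 6×6 Sokoban to the 8×8 version), training yielded performance improvements of up to 56%. On cross-domain tasks the effect was limited or unstable: Blocksworld improved slightly, WebShop gained about 6% but unstably, and GSM8K showed no improvement. To support this line of work, the team introduces GRL, an agent-centric multi-turn reinforcement learning framework designed to make LLM-environment interaction highly customizable so that generalization can be studied systematically.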

## Post

[1/5] [Lmgame Bench] 🎮

Question: Can RL-based LLM post-training on games generalize to other tasks?

We shared a preliminary study to explore this question:
- Same-family (in-domain): Training on 6×6 Sokoban transfers to 8×8 Sokoban, and training on Tetris (1 block type) transfers to Tetris (2 block types), yielding up to 56% improvement across same-family variants.
- Other tasks (out-of-domain): Blocksworld +3-7% and WebShop ~+6% (unstable); GSM8K: no improvement.

We introduce GRL, an agent-centric multi-turn RL framework that makes LLM-environment interaction highly customizable for systematic generalization studies.
Repo: https://github.com/lmgame-org/GRL
Blog: https://lmgame.org/#/blog/grl (check it for details!)