# PokeRL: A Reinforcement Learning System for Pokémon Red

- Source: HuggingFace Daily Papers (trending community papers)
- Published: 2026-04-12 08:00
- AIHOT link: https://aihot.virxact.com/items/cmo0835a6007lsli2o2c732a9
- Original paper: https://arxiv.org/abs/2604.10812

## AI Summary

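PokeRL is a modular deep reinforcement learning system built on PyBoy. It trains agents to complete early-game tasks in Pokémon Red: leaving the player's house, exploring Pallet Town, and winning the first rival battle. PPO agents are brittle to train on this game because they tend to fall into action loops, menu spam, and aimless wandering. To address this, the system introduces a loop-aware environment wrapper (with map masking), multi-layer anti-loop and anti-spam mechanisms, and a dense hierarchical reward design. The authors argue that practical systems like this, which explicitly model failure modes, are a necessary intermediate step between toy benchmarks and agents that can become Pokémon League Champion.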
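The summary does not spell out how the anti-loop and anti-spam layers work. The Python sketch below is a minimal illustration under stated assumptions, not PokeRL's implementation: it assumes the wrapper can read the player's map id and tile coordinates on every step, and the names `LoopSpamMonitor`, `has_short_cycle`, and `MENU_ACTIONS`, as well as the window size, thresholds, and penalty values, are all hypothetical. It stacks three checks: repeated re-entry of the same tile, tight back-and-forth cycles in the recent path, and excessive menu presses.

```python
from collections import deque

# Hypothetical button names; the real PokeRL action set may differ.
MENU_ACTIONS = {"start", "select"}


def has_short_cycle(seq, max_period=4, repeats=3):
    """True if the tail of `seq` is a block of length p (2..max_period) repeated `repeats` times."""
    items = list(seq)
    # Period 1 is skipped because, in the path built by LoopSpamMonitor,
    # consecutive entries always differ.
    for p in range(2, max_period + 1):
        need = p * repeats
        if len(items) >= need:
            tail = items[-need:]
            if all(tail[i] == tail[i % p] for i in range(need)):
                return True
    return False


class LoopSpamMonitor:
    """Hypothetical multi-layer anti-loop / anti-spam tracker (illustrative only)."""

    def __init__(self, window=32, max_revisits=6, max_menu_presses=4,
                 loop_penalty=-0.05, spam_penalty=-0.02):
        self.path = deque(maxlen=window)     # recent tiles, one entry per tile change
        self.actions = deque(maxlen=window)  # recent button presses
        self.max_revisits = max_revisits
        self.max_menu_presses = max_menu_presses
        self.loop_penalty = loop_penalty
        self.spam_penalty = spam_penalty

    def step(self, map_id, x, y, action):
        """Record one environment step and return a non-positive shaping penalty."""
        pos = (map_id, x, y)
        self.actions.append(action)
        penalty = 0.0
        # Only tile changes extend the path, so standing still
        # (e.g. while reading dialogue) is never counted as a loop.
        if not self.path or self.path[-1] != pos:
            self.path.append(pos)
            # Layer 1: the same tile keeps being re-entered within the window.
            # Layer 2: the recent path is a tight back-and-forth cycle.
            if self.path.count(pos) > self.max_revisits or has_short_cycle(self.path):
                penalty += self.loop_penalty
        # Layer 3: menu buttons pressed too often within the window.
        if sum(a in MENU_ACTIONS for a in self.actions) > self.max_menu_presses:
            penalty += self.spam_penalty
        return penalty
```

In this sketch a wrapper would add the returned penalty to the environment reward on every step. Map masking is not modeled here, since the summary does not describe what is masked.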

## Main Text

Pokemon Red is a long-horizon JRPG with sparse rewards, partial observability, and quirky control mechanics that make it a challenging benchmark for reinforcement learning. While recent work has shown that PPO agents can clear the first two gyms using heavy reward shaping and engineered observations, training remains brittle in practice, with agents often degenerating into action loops, menu spam, or unproductive wandering. In this paper, we present PokeRL, a modular system that trains deep reinforcement learning agents to complete early-game tasks in Pokemon Red, including exiting the player's house, exploring Pallet Town to reach tall grass, and winning the first rival battle. Our main contributions are a loop-aware environment wrapper around the PyBoy emulator with map masking, a multi-layer anti-loop and anti-spam mechanism, and a dense hierarchical reward design. We argue that practical systems like PokeRL, which explicitly model failure modes such as loops and spam, are a necessary intermediate step between toy benchmarks and full Pokemon League champion agents. Code is available at https://github.com/reddheeraj/PokemonRL
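One way to read "dense hierarchical reward design" is a sparse layer of ordered, one-time milestone bonuses for the three early-game tasks, sitting on top of dense shaping terms that keep the learning signal from going silent between milestones. The sketch below assumes a decoded `GameState` with hypothetical fields (`map_name`, `on_tall_grass`, `rival_hp_frac`, `rival_defeated`); the field names, milestone predicates, and reward magnitudes are illustrative assumptions, not the values used by PokeRL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GameState:
    """Hypothetical decoded game state; field names are illustrative only."""
    map_name: str                           # e.g. "house", "pallet_town", "lab"
    x: int
    y: int
    on_tall_grass: bool = False
    rival_hp_frac: Optional[float] = None   # None outside the rival battle
    rival_defeated: bool = False


# Ordered milestones: (name, predicate, one-time bonus). All values are assumptions.
MILESTONES = [
    ("exit_house", lambda s: s.map_name == "pallet_town", 1.0),
    ("reach_tall_grass", lambda s: s.on_tall_grass, 2.0),
    ("beat_rival", lambda s: s.rival_defeated, 5.0),
]


class HierarchicalReward:
    """Hypothetical two-level reward: dense shaping plus ordered sparse milestones."""

    def __init__(self, tile_bonus=0.01, damage_scale=0.5):
        self.tile_bonus = tile_bonus
        self.damage_scale = damage_scale
        self.seen_tiles = set()
        self.next_milestone = 0   # index of the first milestone not yet reached
        self.prev_rival_hp = None

    def __call__(self, s: GameState) -> float:
        reward = 0.0
        # Dense layer 1: small bonus for each tile visited for the first time.
        tile = (s.map_name, s.x, s.y)
        if tile not in self.seen_tiles:
            self.seen_tiles.add(tile)
            reward += self.tile_bonus
        # Dense layer 2: reward damage dealt to the rival's Pokemon.
        if s.rival_hp_frac is not None and self.prev_rival_hp is not None:
            reward += self.damage_scale * max(0.0, self.prev_rival_hp - s.rival_hp_frac)
        self.prev_rival_hp = s.rival_hp_frac
        # Sparse layer: milestones pay out once each, strictly in order.
        if self.next_milestone < len(MILESTONES):
            _, reached, bonus = MILESTONES[self.next_milestone]
            if reached(s):
                reward += bonus
                self.next_milestone += 1
        return reward
```

Paying milestones strictly in order means an out-of-order event earns nothing, which keeps the shaping aligned with the intended task sequence; at most one milestone is paid per step.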