# MemGUI-Agent: An End-to-End Long-Horizon Mobile GUI Agent with Proactive Context Management

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-18 08:00
- AIHOT score: 51
- AIHOT link: https://aihot.virxact.com/items/cmqrh5dvw0k2aslp5gy4np9cw
- Original link: https://arxiv.org/abs/2606.19926

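## AI Summary

In long-horizon mobile GUI tasks, ReAct-style prompting passively accumulates history, which causes prompt bloat and information dilution. MemGUI-Agent introduces the ConAct mechanism, which treats context management as first-class actions emitted by the same policy that selects UI actions. ConAct maintains three structured fields (folded action history, folded UI state, and recent step record) to keep the context compact. An 8B model is trained with supervision on MemGUI-3K, a dataset of 2,956 trajectories, yielding MemGUI-8B-SFT, which achieves the best open-data 8B performance on MemGUI-Bench and generalizes to the out-of-distribution MobileWorld benchmark. Code, data, and models will be open-sourced.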

## Abstract

MLLM-based mobile GUI agents have made substantial progress on short-horizon tasks, yet remain unreliable on long-horizon tasks that require retaining intermediate facts across many steps and app transitions. We attribute this limitation to ReAct-style prompting, which passively accumulates per-step records, leading to prompt explosion and dilution of critical cross-app facts. To address this, we introduce MemGUI-Agent, an end-to-end long-horizon mobile GUI agent with proactive context management. MemGUI-Agent is built on Context-as-Action (ConAct), which casts context management as first-class actions emitted by the same policy that selects UI actions. Instead of passively appending history, ConAct maintains three structured context fields: folded action history, folded UI state, and recent step record, preserving critical UI facts while keeping context compact. To make proactive context management learnable across model scales, we construct MemGUI-3K, a 2,956-trajectory dataset with full ConAct annotations for supervised training and offline analysis. Training an 8B model on MemGUI-3K produces MemGUI-8B-SFT, an 8B MemGUI-Agent that achieves the best open-data 8B performance on MemGUI-Bench and generalizes to the out-of-distribution MobileWorld benchmark. Code, data, and trained models will be released at https://memgui-agent.github.io/.
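To make the contrast between passive accumulation and ConAct-style context management concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract names only the three context fields, so the class names (`ConActContext`, `PolicyOutput`), the schema of a policy emission, the merge rule in `apply_output`, and the toy data (such as `order_id=A123`) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ConActContext:
    """Hypothetical container for the three context fields named in the abstract."""
    folded_action_history: str = ""
    folded_ui_state: Dict[str, str] = field(default_factory=dict)
    recent_step_record: str = ""

    def render(self) -> str:
        # Serialize the three fields into a compact prompt fragment.
        facts = "; ".join(f"{k}={v}" for k, v in self.folded_ui_state.items())
        return (
            f"[history] {self.folded_action_history}\n"
            f"[ui_state] {facts}\n"
            f"[recent] {self.recent_step_record}"
        )


@dataclass
class PolicyOutput:
    """One policy emission: a UI action plus context actions (assumed schema)."""
    ui_action: str
    new_folded_history: str
    facts_to_keep: Dict[str, str]
    step_record: str


def apply_output(ctx: ConActContext, out: PolicyOutput) -> ConActContext:
    # Assumed merge rule: the history summary and step record are overwritten,
    # while retained UI facts are merged so they survive later steps.
    return ConActContext(
        folded_action_history=out.new_folded_history,
        folded_ui_state={**ctx.folded_ui_state, **out.facts_to_keep},
        recent_step_record=out.step_record,
    )


def run_demo(num_steps: int = 20) -> Tuple[ConActContext, str]:
    """Run a scripted toy policy; return the folded context and a ReAct-style log."""
    ctx = ConActContext()
    react_log: List[str] = []
    for i in range(num_steps):
        record = f"step {i}: tapped element {i} on a screen with a long description"
        out = PolicyOutput(
            ui_action=f"tap({i})",
            new_folded_history=f"completed {i + 1} steps",
            # Toy fact seen only at step 0 that must be remembered afterwards.
            facts_to_keep={"order_id": "A123"} if i == 0 else {},
            step_record=record,
        )
        ctx = apply_output(ctx, out)
        react_log.append(record)  # passive accumulation keeps every record
    return ctx, "\n".join(react_log)
```

In this toy run, the ReAct-style log grows with every step, while the rendered `ConActContext` stays roughly constant in size and still carries the fact captured at step 0. In the actual system the folding decisions would come from the trained policy rather than a script.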
