# UI-KOBE: Knowledge-Oriented Behavior Exploration for Lightweight Graph-Guided GUI Agents

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-05-28 08:00
- AIHOT score: 61
- AIHOT link: https://aihot.virxact.com/items/cmpqd5x5l03y7slnoqh8s66f9
- Original link: https://arxiv.org/abs/2605.29534

## AI Summary

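To address the shortcomings of lightweight mobile GUI agents in end-to-end planning, this paper proposes the UI-KOBE framework. The framework autonomously explores a mobile app and builds an app knowledge graph made up of UI-state nodes and transition edges. At runtime, a lightweight agent uses the graph as external guidance, combining the user task with the current screenshot to choose among several kinds of candidate actions. This reduces the end-to-end planning burden on lightweight models, helping them execute tasks more effectively while balancing efficiency, interpretability, and privacy protection.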

## Main Text

Recent advances in mobile GUI agents have shown strong potential for automating mobile tasks, but most effective systems still depend on large vision-language models for screenshot understanding and long-horizon planning. Small GUI agents that can be deployed directly on mobile devices are more attractive for practical use, offering lower inference cost and better protection of sensitive on-device information. However, due to limited model capacity, such lightweight agents remain unreliable when planning and executing GUI tasks end-to-end from screenshots alone. We propose Knowledge-Oriented Behavior Exploration (UI-KOBE), a framework that improves lightweight mobile GUI agents with reusable app-specific graph knowledge. UI-KOBE first autonomously explores a mobile application and constructs an app knowledge graph, where nodes represent distinct UI states and edges represent executable transitions. At runtime, a lightweight GUI agent uses the graph as external guidance: given a user task and the current screenshot, it identifies the current graph node and selects among self-loop actions, neighboring transitions, task completion, or fallback free actions associated with that node. By supporting runtime decisions with app-specific graph guidance, UI-KOBE reduces the burden of end-to-end GUI planning and helps lightweight models perform mobile GUI tasks more effectively, offering a practical step toward efficient, interpretable, and privacy-conscious on-device GUI agents.
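
The graph structure described above can be illustrated with a minimal sketch. This is not the authors' implementation: the class `AppGraph`, its fields, the action labels, and the `DONE` / `FALLBACK` placeholders are all hypothetical names chosen for illustration. It only shows how a graph of UI states and transitions could expose the four option types the abstract lists (self-loop actions, neighboring transitions, task completion, fallback free actions) for a given node. Identifying the current node from a screenshot and choosing among the options are left to the lightweight model and are not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class AppGraph:
    """Hypothetical app knowledge graph: nodes are UI states, edges are transitions."""

    # node id -> actions that keep the app in the same UI state (self-loops)
    self_loops: dict[str, list[str]] = field(default_factory=dict)
    # node id -> {action label: destination node id}
    transitions: dict[str, dict[str, str]] = field(default_factory=dict)

    def add_transition(self, src: str, action: str, dst: str) -> None:
        """Record an explored action; an edge back to the same node is a self-loop."""
        if src == dst:
            self.self_loops.setdefault(src, []).append(action)
        else:
            self.transitions.setdefault(src, {})[action] = dst

    def candidates(self, node: str) -> list[tuple[str, str]]:
        """List the (kind, action) options an agent could choose from at `node`."""
        options = [("self_loop", a) for a in self.self_loops.get(node, [])]
        options += [("transition", a) for a in self.transitions.get(node, {})]
        # Completion and a free-form fallback are always offered (an assumption
        # of this sketch, mirroring the option types named in the abstract).
        options.append(("complete", "DONE"))
        options.append(("free_action", "FALLBACK"))
        return options


def build_demo_graph() -> AppGraph:
    """Build a tiny made-up graph with two UI states."""
    g = AppGraph()
    g.add_transition("home", "tap_settings", "settings")
    g.add_transition("home", "scroll_feed", "home")  # stays on the same screen
    g.add_transition("settings", "tap_back", "home")
    return g
```

In this sketch, calling `build_demo_graph().candidates("home")` yields one self-loop, one outgoing transition, and the two always-available options. A node absent from the graph yields only completion and the fallback, which is one plausible way to handle screens that exploration never reached.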
