UI-KOBE: Knowledge-Oriented Behavior Exploration for Lightweight Graph-Guided GUI Agents

2026-05-28 08:00 · 36 days ago

AI Summary

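To address the weakness of lightweight mobile GUI agents at end-to-end planning, this paper proposes the UI-KOBE framework. The framework autonomously explores a mobile application and builds an app knowledge graph made up of UI-state nodes and transition edges. At runtime, a lightweight agent uses the graph as external guidance: given the user task and the current screenshot, it chooses among several kinds of candidate actions. This approach reduces the burden of end-to-end planning on lightweight models, lets them execute tasks more effectively, and balances efficiency, interpretability, and privacy protection.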

Original text · untranslated

Recent advances in mobile GUI agents have shown strong potential for automating mobile tasks, but most effective systems still depend on large vision-language models for screenshot understanding and long-horizon planning. Small GUI agents that can be deployed directly on mobile devices are more attractive for practical use, offering lower inference cost and better protection of sensitive on-device information. However, due to limited model capacity, such lightweight agents remain unreliable when planning and executing GUI tasks end-to-end from screenshots alone. We propose Knowledge-Oriented Behavior Exploration (UI-KOBE), a framework that improves lightweight mobile GUI agents with reusable app-specific graph knowledge. UI-KOBE first autonomously explores a mobile application and constructs an app knowledge graph, where nodes represent distinct UI states and edges represent executable transitions. At runtime, a lightweight GUI agent uses the graph as external guidance: given a user task and the current screenshot, it identifies the current graph node and selects among self-loop actions, neighboring transitions, task completion, or fallback free actions associated with that node. By supporting runtime decisions with app-specific graph guidance, UI-KOBE reduces the burden of end-to-end GUI planning and helps lightweight models perform mobile GUI tasks more effectively, offering a practical step toward efficient, interpretable, and privacy-conscious on-device GUI agents.
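The graph structure described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the class names `UIState` and `AppKnowledgeGraph`, the method `candidate_actions`, the action strings, and the `DONE` / `FREE_ACTION` placeholders are hypothetical and do not come from the paper. The sketch only enumerates the four kinds of options the abstract lists for a node (self-loop actions, neighboring transitions, task completion, fallback free actions); how the lightweight agent identifies the current node from a screenshot and picks among the options is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class UIState:
    """One distinct UI state (a graph node). Hypothetical schema."""
    node_id: str
    summary: str
    # Actions that leave the app on the same UI state.
    self_loop_actions: list = field(default_factory=list)


class AppKnowledgeGraph:
    """Nodes are UI states; edges are executable transitions between them."""

    def __init__(self):
        self.states = {}       # node_id -> UIState
        self.transitions = {}  # src node_id -> {action: dst node_id}

    def add_state(self, state):
        self.states[state.node_id] = state
        self.transitions.setdefault(state.node_id, {})

    def add_transition(self, src, action, dst):
        if src not in self.states or dst not in self.states:
            raise KeyError("both endpoints must be known states")
        if src == dst:
            # An action that does not change the UI state is a self-loop.
            self.states[src].self_loop_actions.append(action)
        else:
            self.transitions[src][action] = dst

    def candidate_actions(self, node_id):
        """List the options available at a node as (kind, action, target)."""
        state = self.states[node_id]
        options = [("self_loop", a, node_id) for a in state.self_loop_actions]
        options += [("transition", a, dst)
                    for a, dst in self.transitions[node_id].items()]
        # Placeholders (assumed names) for the two non-graph options.
        options.append(("complete", "DONE", None))
        options.append(("fallback", "FREE_ACTION", None))
        return options


def build_demo_graph():
    """A two-screen toy app, standing in for the result of exploration."""
    graph = AppKnowledgeGraph()
    graph.add_state(UIState("home", "Home feed"))
    graph.add_state(UIState("settings", "Settings page"))
    graph.add_transition("home", "scroll feed", "home")
    graph.add_transition("home", "tap gear icon", "settings")
    graph.add_transition("settings", "tap back", "home")
    return graph


if __name__ == "__main__":
    for option in build_demo_graph().candidate_actions("home"):
        print(option)
```

In this sketch the agent's choice is reduced to selecting one tuple from a short list, which is one way to read the abstract's claim that graph guidance relieves the model of end-to-end planning.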

HuggingFace Daily Papers (community trending papers)

Read the original · arxiv.org
