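Key takeaways: OpenAI yesterday released a similar feature that lets Codex package skills from interactions. The paper proposes a three-stage pipeline (segment GUI trajectories → cluster candidate skills → train a skill-aware policy). Cluster purity is excellent (5 of 8 clusters reach 0.95 or higher), but readability does not transfer: GRPO raises skill-step accuracy only from 18.5% to 20.5%, brings no improvement on BrowseComp+, and even loses to simple frequency priors. The authors point to three flaws: a weak boundary detector, an orderless segment representation, and an offline reward model.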
// Automating SKILL.md Generation //
Mining past agent sessions is increasingly one of the best ways to improve your agents.
OpenAI released something similar yesterday that lets Codex package skills from interactions.
(bookmark it)
This paper explains a related approach.
They run a three-stage pipeline that segments GUI trajectories, clusters them into candidate skills, and trains a skill-aware policy.
The clusters are genuinely readable, with five of eight hitting 0.95 or higher purity against ground-truth workflow labels.
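For readers unfamiliar with the metric, here is a minimal sketch of per-cluster purity: the share of a cluster's members that carry its most common ground-truth label. The function name `cluster_purity` and the toy workflow labels are illustrative assumptions, not taken from the paper, and the paper's exact evaluation may differ.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, true_labels):
    """Return {cluster_id: purity}, where purity is the fraction of a
    cluster's members that share its most common ground-truth label."""
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have equal length")

    # Group the ground-truth labels by the cluster each segment landed in.
    members = defaultdict(list)
    for cid, label in zip(cluster_ids, true_labels):
        members[cid].append(label)

    # Purity = count of the majority label / cluster size.
    return {
        cid: Counter(labels).most_common(1)[0][1] / len(labels)
        for cid, labels in members.items()
    }


# Toy data (hypothetical, not from the paper): two clusters of four segments.
cluster_ids = [0, 0, 0, 0, 1, 1, 1, 1]
true_labels = ["login", "login", "login", "login",
               "search", "search", "checkout", "search"]

purity = cluster_purity(cluster_ids, true_labels)
print(purity)  # {0: 1.0, 1: 0.75}
```

A purity of 0.95 or higher means at most 5% of a cluster's segments come from a workflow other than the dominant one.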
But readability does not transfer. GRPO lifts skill-step accuracy only from 18.5% to 20.5%, leaves BrowseComp+ flat, and loses to trivial frequency priors.
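To make the baseline comparison concrete, here is a rough sketch of one kind of frequency prior: always predict the label seen most often in training. This is an assumption about what "trivial frequency prior" means; the paper's baseline may be defined differently (for example, conditioned on context). The names `fit_frequency_prior` and `accuracy` and the toy labels are hypothetical.

```python
from collections import Counter


def fit_frequency_prior(train_labels):
    """Return the single most frequent label in the training data."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(predictions, gold):
    """Fraction of positions where the prediction matches the gold label."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


# Toy data (hypothetical, not from the paper).
train = ["click", "click", "type", "click", "scroll"]
test = ["click", "type", "click", "click"]

majority = fit_frequency_prior(train)

# The baseline ignores the input entirely and predicts the majority label.
baseline_acc = accuracy([majority] * len(test), test)
print(majority, baseline_acc)  # click 0.75
```

A learned policy that cannot beat a predictor like this is not yet extracting useful signal from the skill structure.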
The authors name three culprits: a weak boundary detector, an orderless segment representation, and an offline reward model.
Paper: https://arxiv.org/abs/2606.20363
Learn to build effective AI agents in our academy: https://academy.dair.ai/