# SkillOpt：面向智能体技能的可控文本空间优化框架

- 来源：HuggingFace Daily Papers（社区热门论文）
- 发布时间：2026-05-22 08:00
- AIHOT 分数：61
- AIHOT 链接：https://aihot.virxact.com/items/cmpkktloq07w3sl01yuy0r0h4
- 原文链接：https://arxiv.org/abs/2605.23904

## AI 摘要

SkillOpt是一个系统性可控文本空间优化器，用于智能体技能。它通过独立的优化模型，将带分数的执行轨迹转换为对单一技能文档的有限编辑（增/删/改），且仅当编辑能严格提升验证集分数时才被接受。该技能被视为冻结智能体的外部状态进行训练，并包含文本学习率预算等机制以保持稳定性，部署时不增加额外推理调用。实验表明，在GPT-5.5上，SkillOpt在直接聊天、Codex循环和Claude Code中分别实现了平均无技能准确率+23.5、+24.8和+19.1分的提升。优化后的技能在跨模型和跨环境迁移时仍保持价值。

## 正文

Agent skills today are hand-crafted, generated one-shot, or evolved through loosely controlled self-revision, none of which behaves like a deep-learning optimizer for the skill, and none of which reliably improves over its starting point under feedback. We argue the skill should instead be trained as the external state of a frozen agent, with the same discipline that makes weight-space optimization reproducible. SkillOpt is, to our knowledge, the first systematic controllable text-space optimizer for agent skills: a separate optimizer model turns scored rollouts into bounded add/delete/replace edits on a single skill document, and an edit is accepted only when it strictly improves a held-out validation score. A textual learning-rate budget, rejected-edit buffer, and epoch-wise slow/meta update make skill training stable while adding zero inference-time model calls at deployment. Across six benchmarks, seven target models, and three execution harnesses (direct chat, Codex, Claude Code), SkillOpt is best or tied on all 52 evaluated (model, benchmark, harness) cells and beats every per-cell competitor among human, one-shot LLM, Trace2Skill, TextGrad, GEPA, and EvoSkill skills. On GPT-5.5 it lifts the average no-skill accuracy by +23.5 points in direct chat, by +24.8 inside the Codex agentic loop, and by +19.1 inside Claude Code. Transfer experiments further show that optimized skill artifacts retain value when moved across model scales, between Codex and Claude Code execution environments, and to a nearby math benchmark without further optimization.