SEAL: Co-Evolution of Agents and Their Learning Environments

2026-05-23 08:00

AI Summary

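When large language model (LLM) agents self-evolve, the policy and the training environment are usually optimized separately, which leaves them misaligned. This paper proposes SEAL to address that problem. SEAL is a closed-loop co-evolution framework. It collects on-policy trajectories and diagnoses the failed rollouts. It then uses those failure diagnoses as a shared signal to optimize both the agent's policy and its training environment. On the environment side, the learning interface evolves to expose clearer tool-affordance cues, constraint information, and recovery-oriented feedback. On the policy side, the model is updated with diagnosis-guided advantage reweighting. With only 400 training samples, SEAL raises average scores by 8.25 to 26.25 points across three backbone models and shows positive out-of-distribution transfer.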
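The closed loop described above can be sketched as follows. This is a minimal, hypothetical illustration and not SEAL's implementation. The summary does not specify the failure taxonomy, the diagnoser, or how the environment chooses its cues. The labels `wrong_tool` and `constraint_violation`, the rule-based `diagnose` function, the `CUES` table, and the `Turn`/`Trajectory` fields are therefore all assumptions. What the sketch shows is the structure: one set of turn-level diagnoses updates the environment's hints directly and is also returned for the policy update.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical cue table: one environment-side hint per failure label (assumed, not from the paper).
CUES = {
    "wrong_tool": "Tool hint: list the available tools before calling one.",
    "constraint_violation": "Constraint hint: re-check the task constraints.",
}

@dataclass
class Turn:
    tool: str                         # tool the agent actually called
    expected_tool: str                # tool the verifier expected (hypothetical field)
    violated_constraint: bool = False

@dataclass
class Trajectory:
    turns: list
    passed: bool                      # outcome of executable verification

def diagnose(traj):
    """Assign a turn-level failure label (or None) to each turn; passing rollouts get no labels."""
    if traj.passed:
        return [None] * len(traj.turns)
    labels = []
    for turn in traj.turns:
        if turn.tool != turn.expected_tool:
            labels.append("wrong_tool")
        elif turn.violated_constraint:
            labels.append("constraint_violation")
        else:
            labels.append(None)
    return labels

def co_evolve_step(trajectories, hints):
    """One loop iteration: the same diagnoses drive the environment update and feed the policy."""
    diagnoses = [diagnose(t) for t in trajectories]
    counts = Counter(l for labels in diagnoses for l in labels if l is not None)
    # Environment side: add a cue for every failure type observed in this batch.
    new_hints = set(hints) | {CUES[l] for l in counts if l in CUES}
    # Policy side: the diagnoses would be passed to the optimizer (not modeled here).
    return new_hints, diagnoses, counts

batch = [
    Trajectory([Turn("search", "search"), Turn("book", "pay")], passed=False),
    Trajectory([Turn("search", "search")], passed=True),
]
hints, diagnoses, counts = co_evolve_step(batch, set())
print(counts)  # Counter({'wrong_tool': 1})
```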

Original abstract

Large Language Model (LLM) agents are increasingly improved through interaction, yet most self-evolution methods adapt either the policy or the learning environment in isolation. We identify this structural gap as Agent-Environment Misalignment: the agent's capability frontier changes during training, while the environment that provides supervision remains static or only weakly coupled to the agent's revealed failures. We propose SEAL, a closed-loop co-evolution framework for interactive tool-use agents. SEAL collects on-policy trajectories under executable verification, diagnoses failed rollouts into turn-level failure labels, and uses these diagnoses as a shared signal for both environment-side adaptation and model-side policy optimization. The environment evolves its training-time learning interface by exposing clearer tool affordance cues, constraint information, and recovery-oriented feedback, while the policy is updated with diagnosis-guided advantage reweighting. Extensive experiments across in-distribution and out-of-distribution multi-turn tool-use evaluations show that SEAL improves low-resource agent learning: with only 400 training samples, it yields +8.25 to +26.25 average-point gains across three backbones and exhibits positive out-of-distribution transfer. These results demonstrate the value of jointly adapting the learner and its training-time learning substrate for robust self-improving LLM agents.
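The abstract names diagnosis-guided advantage reweighting but does not give a formula. The sketch below makes two assumptions. First, it computes a GRPO-style group-relative advantage: each rollout's reward minus the group mean, divided by the group standard deviation. Second, it applies a hypothetical rule to below-average rollouts: the penalty is scaled up on turns the diagnoser labeled and scaled down on unlabeled turns. The function names and the weights `fail_weight=1.5` and `clean_weight=0.5` are illustrative only and are not values from the paper.

```python
import statistics

def group_advantages(rewards, eps=1e-6):
    """GRPO-style group-relative advantages: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

def reweight_turns(advantage, turn_labels, fail_weight=1.5, clean_weight=0.5):
    """Hypothetical diagnosis-guided reweighting of one trajectory's per-turn advantages.

    Negative-advantage rollouts concentrate the penalty on turns carrying a failure
    label; positive-advantage rollouts pass the advantage through unchanged.
    """
    if advantage >= 0:
        return [advantage] * len(turn_labels)
    return [advantage * (fail_weight if label else clean_weight) for label in turn_labels]

rewards = [1.0, 0.0]                           # verifier outcomes for two rollouts of one task
adv = group_advantages(rewards)
labels = [[None, None], [None, "wrong_tool"]]  # turn-level diagnoses (hypothetical)
per_turn = [reweight_turns(a, l) for a, l in zip(adv, labels)]
print([[round(x, 2) for x in t] for t in per_turn])  # [[1.0, 1.0], [-0.5, -1.5]]
```

Concentrating the negative signal on diagnosed turns is one plausible reading of "reweighting". An implementation could just as well reweight whole trajectories.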

HuggingFace Daily Papers (community trending papers)


Source: arxiv.org