# SEAL: Co-Evolution of Agents and Learning Environments

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-05-23 08:00
- AIHOT score: 55
- AIHOT link: https://aihot.virxact.com/items/cmpm8v6b80mkpsl01aumt373r
- Original link: https://arxiv.org/abs/2605.24426

## AI Summary

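Self-evolution methods for large language model agents usually optimize the policy and the environment separately, which leaves the two misaligned. This paper proposes the SEAL framework to address that problem. SEAL builds a closed-loop co-evolution system: it collects policy trajectories, diagnoses the failures, and uses those failure diagnoses as a shared signal to optimize both the agent's model policy and the training environment. On the environment side, the learning interface evolves to provide clearer tool affordance cues; on the policy side, the diagnoses are used to update the model. Experiments show that with only 400 training samples, SEAL yields average gains of 8.25 to 26.25 points across three backbones and exhibits cross-domain transfer.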

## Main Text

Large Language Model (LLM) agents are increasingly improved through interaction, yet most self-evolution methods adapt either the policy or the learning environment in isolation. We identify this structural gap as Agent-Environment Misalignment: the agent's capability frontier changes during training, while the environment that provides supervision remains static or only weakly coupled to the agent's revealed failures. We propose SEAL, a closed-loop co-evolution framework for interactive tool-use agents. SEAL collects on-policy trajectories under executable verification, diagnoses failed rollouts into turn-level failure labels, and uses these diagnoses as a shared signal for both environment-side adaptation and model-side policy optimization. The environment evolves its training-time learning interface by exposing clearer tool affordance cues, constraint information, and recovery-oriented feedback, while the policy is updated with diagnosis-guided advantage reweighting. Extensive experiments across in-distribution and out-of-distribution multi-turn tool-use evaluations show that SEAL improves low-resource agent learning: with only 400 training samples, it yields +8.25 to +26.25 average-point gains across three backbones and exhibits positive out-of-distribution transfer. These results demonstrate the value of jointly adapting the learner and its training-time learning substrate for robust self-improving LLM agents.
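
The abstract names "diagnosis-guided advantage reweighting" but does not give a formula, so the following is only a minimal sketch of one way such a scheme could look, not the authors' method. It assumes group-normalized rollout advantages and scales the negative advantage on turns that carry a failure label. The label names, the weight values, and the `Turn`, `Rollout`, and `reweighted_turn_advantages` names are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Optional

# Hypothetical failure labels and weights; none of these come from the paper.
FAILURE_WEIGHTS = {
    "wrong_tool": 2.0,
    "bad_arguments": 1.5,
    "constraint_violation": 1.5,
}


@dataclass
class Turn:
    # Turn-level diagnosis; None means no failure was diagnosed on this turn.
    failure_label: Optional[str] = None


@dataclass
class Rollout:
    reward: float      # scalar outcome from executable verification
    turns: List[Turn]  # one entry per agent turn in the trajectory


def group_advantages(rewards: List[float]) -> List[float]:
    """Normalize rewards within a group of rollouts (assumed baseline scheme)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # All rollouts scored the same, so there is no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


def reweighted_turn_advantages(rollouts: List[Rollout]) -> List[List[float]]:
    """Spread each rollout advantage over its turns, emphasizing diagnosed failures."""
    base = group_advantages([r.reward for r in rollouts])
    result = []
    for adv, rollout in zip(base, rollouts):
        per_turn = []
        for turn in rollout.turns:
            weight = 1.0
            # Only amplify the penalty on turns the diagnosis blames.
            if adv < 0 and turn.failure_label is not None:
                weight = FAILURE_WEIGHTS.get(turn.failure_label, 1.0)
            per_turn.append(adv * weight)
        result.append(per_turn)
    return result
```

Under these assumptions, a group containing one successful rollout and one failed rollout whose first turn is labeled `wrong_tool` would put twice the penalty on that turn as on the failed rollout's other turns. The same per-label failure counts could also feed the environment-side adaptation the abstract describes, but that half is not sketched here.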
