# EvoTrainer: Co-Evolving LLM Policies and Training-Side Harnesses for Autonomous Agentic RL

- Source: HuggingFace Daily Papers (trending community papers)
- Published: 2026-06-02 08:00
- AIHOT score: 49
- AIHOT link: https://aihot.virxact.com/items/cmq937l3v087eslld327n3mew
- Original link: https://arxiv.org/abs/2606.03108

## AI Summary

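EvoTrainer is an autonomous training framework that co-evolves LLM policies and training-side harnesses through empirical feedback. It diagnoses rollout-level evidence, revises its diagnostics, backtests interventions, and accumulates reusable skills. Across evaluations in mathematical reasoning, competitive-programming code generation, and repository-level software engineering, EvoTrainer matches or exceeds human-engineered RL references under the same data, codebase, and protocol, with the largest gains on long-horizon SWE tasks. Trajectory analyses show that retained strategies diverge across domains, evolving diagnostics prevent invalid high-scoring branches from being promoted, and reusable skills shape later search.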
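The idea that diagnostics block invalid high-scoring branches can be illustrated with a minimal sketch, assuming a candidate branch is promoted only if it beats the incumbent's scalar score *and* passes every diagnostic. The `Branch` type, the `no_reward_hacking` check, the `tests_skipped` field, and all scores are hypothetical illustrations, not EvoTrainer's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Branch:
    """Hypothetical record of one candidate training branch."""
    name: str
    score: float                                   # scalar eval score, e.g. pass rate
    rollouts: list[dict] = field(default_factory=list)


# A diagnostic inspects rollout-level evidence and returns True if the branch looks valid.
Diagnostic = Callable[[Branch], bool]


def no_reward_hacking(branch: Branch) -> bool:
    # Hypothetical check: reject a branch if any rewarded rollout skipped the tests.
    return not any(r["reward"] > 0 and r.get("tests_skipped", False) for r in branch.rollouts)


def promote(incumbent: Branch, candidates: list[Branch], diagnostics: list[Diagnostic]) -> Branch:
    """Return the highest-scoring candidate that beats the incumbent and passes all diagnostics."""
    best = incumbent
    for cand in candidates:
        if cand.score > best.score and all(check(cand) for check in diagnostics):
            best = cand
    return best


# Usage: the "hacked" branch scores highest but fails the diagnostic, so "honest" is promoted.
base = Branch("base", 0.40)
hacked = Branch("hacked", 0.90, [{"reward": 1.0, "tests_skipped": True}])
honest = Branch("honest", 0.55, [{"reward": 1.0}, {"reward": 0.0}])
print(promote(base, [hacked, honest], [no_reward_hacking]).name)  # honest
```

With an empty diagnostic list, `promote(base, [hacked, honest], [])` selects `hacked` instead, which is the kind of invalid high-scoring branch a purely score-based search would promote.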

## Main Text

Autonomous LLM training is often framed as recipe search, which leaves the training harness largely static. This limitation sharpens in agentic RL, where shifting bottlenecks and scalar rewards mask diverse failure modes. We introduce EvoTrainer, an autonomous training framework that co-evolves LLM policies and training-side harnesses through empirical feedback: it diagnoses rollout-level evidence, revises diagnostics, backtests interventions, and accumulates reusable skills. Evaluated on mathematical reasoning, competitive-programming code generation, and repository-level software engineering, EvoTrainer matches or exceeds the human-engineered RL references under the same data, codebase, and evaluation protocol, with the largest gain on long-horizon agentic SWE. Trajectory analyses show that retained strategies diverge across domains, evolving diagnostics prevent invalid high-scoring branches from being promoted, and reusable skills shape later search. Autonomous LLM RL should move beyond recipe search toward joint evolution of policies and the training harnesses that interpret them.
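To make the point about scalar rewards concrete, here is a small sketch with hypothetical rollout logs and failure tags (not data from the paper). It shows two batches with the same mean reward whose failures have entirely different causes. Only a rollout-level breakdown tells them apart, which is the kind of evidence a training-side harness would need to inspect.

```python
from collections import Counter
from statistics import mean

# Hypothetical rollout logs: binary reward, plus a failure tag when the reward is 0.
batch_a = [
    {"reward": 1}, {"reward": 1},
    {"reward": 0, "failure": "timeout"},
    {"reward": 0, "failure": "timeout"},
]
batch_b = [
    {"reward": 1}, {"reward": 1},
    {"reward": 0, "failure": "malformed_tool_call"},
    {"reward": 0, "failure": "wrong_patch"},
]


def summarize(batch):
    """Return the scalar mean reward and a count of failure causes."""
    score = mean(r["reward"] for r in batch)
    failures = Counter(r["failure"] for r in batch if r["reward"] == 0)
    return score, failures


for name, batch in [("A", batch_a), ("B", batch_b)]:
    score, failures = summarize(batch)
    print(name, score, dict(failures))
# A 0.5 {'timeout': 2}
# B 0.5 {'malformed_tool_call': 1, 'wrong_patch': 1}
```

Both batches report a reward of 0.5, yet batch A calls for a time-budget intervention while batch B points at tool-call formatting and patch quality, so acting on the scalar alone would hide which intervention to try.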