# Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-04-22 08:00
- AIHOT link: https://aihot.virxact.com/items/cmoceg4110340slsja1d4fcjn
- Original link: https://arxiv.org/abs/2604.20987

## AI Summary

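To address the difficulty large language models have in making sustained decisions in long-horizon interactive environments, this paper proposes COSPLAY, a co-evolution framework. The framework comprises two mutually reinforcing modules: an LLM decision agent that retrieves skills from a learnable skill bank to guide action generation, and a skill bank agent that continually discovers, extracts, and updates reusable skills from the agent's unlabeled trajectories. In tests across six game environments, COSPLAY with an 8B base model achieves a 25.1% average reward improvement over four frontier LLM baselines on single-player game benchmarks, while remaining competitive in multi-player social reasoning settings.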

## Main Text

Long-horizon interactive environments are a testbed for evaluating agents' skill-usage abilities. These environments demand multi-step reasoning, the chaining of multiple skills over many timesteps, and robust decision-making under delayed rewards and partial observability. Games are a natural instance of such environments. Large Language Models (LLMs) are promising candidates for game-playing agents, but they often struggle with consistent long-horizon decision-making because they lack a mechanism to discover, retain, and reuse structured skills across episodes. We present COSPLAY, a co-evolution framework in which an LLM decision agent retrieves skills from a learnable skill bank to guide action taking, while an agent-managed skill pipeline discovers reusable skills from the agent's unlabeled rollouts to form the skill bank. The framework improves both agents: the decision agent learns better skill retrieval and action generation, while the skill bank agent continually extracts, refines, and updates skills together with their contracts. Experiments across six game environments show that COSPLAY with an 8B base model achieves over 25.1% average reward improvement against four frontier LLM baselines on single-player game benchmarks while remaining competitive on multi-player social reasoning games.
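The retrieve-then-act loop described above can be illustrated with a toy sketch. This is not the paper's implementation: every name here (`Skill`, `SkillBank`, `upsert`, `retrieve`, `build_prompt`) is hypothetical, the word-overlap scorer is only a stand-in for whatever retrieval COSPLAY actually learns, and skills are inserted by hand rather than discovered from rollouts.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    description: str
    contract: str  # hypothetical field: when the skill applies and what it should achieve
    uses: int = 0  # how often the decision agent has retrieved this skill


@dataclass
class SkillBank:
    skills: dict = field(default_factory=dict)

    def upsert(self, skill: Skill) -> None:
        """Add a new skill, or refine an existing one while keeping its usage count."""
        old = self.skills.get(skill.name)
        if old is not None:
            skill.uses = old.uses
        self.skills[skill.name] = skill

    def retrieve(self, observation: str, k: int = 1) -> list:
        """Toy retriever (assumption): rank skills by word overlap with the observation."""
        obs_words = set(observation.lower().split())
        scored = []
        for s in self.skills.values():
            overlap = len(obs_words & set(s.description.lower().split()))
            if overlap > 0:
                scored.append((overlap, s.name, s))
        # Highest overlap first; ties broken alphabetically by name.
        scored.sort(key=lambda t: (-t[0], t[1]))
        return [s for _, _, s in scored[:k]]


def build_prompt(observation: str, bank: SkillBank) -> str:
    """Assemble the text a decision agent might see: observation plus retrieved skills."""
    hits = bank.retrieve(observation)
    for s in hits:
        s.uses += 1
    hints = "\n".join(
        f"- {s.name}: {s.description} (contract: {s.contract})" for s in hits
    )
    return f"Observation: {observation}\nRelevant skills:\n{hints or '- none'}"


# Usage: two hand-written skills, then one retrieval step.
bank = SkillBank()
bank.upsert(Skill("open_door", "use key to open locked door",
                  "pre: holding key; post: door open"))
bank.upsert(Skill("collect_key", "walk to key and pick it up",
                  "post: holding key"))
prompt = build_prompt("a locked door blocks the path", bank)
print(prompt)
```

In a fuller system, the skill bank side would call `upsert` with skills extracted from rollouts, and the resulting prompt would be passed to the LLM to generate an action; both steps are omitted here because the abstract does not specify them.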
