# Procedural Memory Management: Control, Adaptation, and Evaluation for LLM Agents

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-22 08:00
- AIHOT score: 54
- AIHOT link: https://aihot.virxact.com/items/cmr1zrwuy03ipsl8zhqumc1l1
- Original link: https://arxiv.org/abs/2606.23127

## AI Summary

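Procedural memory can help LLM agents develop reusable skills from recurring work, but how well those skills transfer remains unclear. The AFTER benchmark comprises 382 realistic enterprise tasks covering 6 professional roles and 22 procedural skills, and evaluates skill transfer across tasks, roles, and models. Experiments show that a single refinement round improves aggregate performance by 3.7 to 6.7 percentage points; skills evolved from multi-model execution traces reach 73.1% accuracy in cross-model testing, outperforming all single-model trace sources. Some skills generalize broadly, while others specialize to role-specific workflows and lose effectiveness after transfer. These results offer practical guidance for building and deploying procedural memory systems in production agent platforms.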

## Main Text

Procedural memory is increasingly used to improve LLM agents on recurring workplace tasks, yet its ability to produce reusable skills remains poorly understood. We introduce AFTER, a benchmark of 382 realistic enterprise tasks spanning six professional roles and 22 procedural skills, designed to evaluate how skills transfer across tasks, roles, and model backbones. The benchmark includes controlled evaluation settings for local improvement, cross-task transfer, cross-role transfer, and cross-model generalization. Experiments show that procedural memory delivers consistent gains in industrial workflows: a single refinement round improves aggregate performance by 3.7 to 6.7 percentage points, while skills evolved from diverse multi-model execution traces achieve 73.1% cross-model test accuracy, outperforming all single-model trace sources. We further find that some skills generalize broadly across tasks and models, whereas others become specialized to role-specific workflows and lose effectiveness under transfer. These results provide practical guidance for building, evaluating, and deploying procedural memory systems in production agent platforms.
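
The abstract reports results as accuracy per evaluation setting and as gains in points over a baseline. The following is a minimal sketch of that arithmetic only, not the benchmark's actual harness: the `Result` record, the setting names, and the toy records are hypothetical and invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One task outcome. All fields are hypothetical, not the AFTER schema."""
    setting: str      # e.g. "cross_task" or "cross_model" (assumed labels)
    with_skill: bool  # True if the agent ran with procedural-memory skills
    passed: bool      # True if the task was judged successful


def accuracy_by_setting(results):
    """Return {(setting, with_skill): accuracy in percent}."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for r in results:
        key = (r.setting, r.with_skill)
        totals[key] += 1
        passes[key] += int(r.passed)
    return {key: 100.0 * passes[key] / totals[key] for key in totals}


def gain_in_points(acc, setting):
    """Percentage-point difference between skill runs and baseline runs."""
    return acc[(setting, True)] - acc[(setting, False)]


# Toy records, made up purely to exercise the functions.
toy = [
    Result("cross_task", False, True),
    Result("cross_task", False, False),
    Result("cross_task", True, True),
    Result("cross_task", True, True),
]
acc = accuracy_by_setting(toy)
gain = gain_in_points(acc, "cross_task")
```

A gain computed this way is a difference of two percentages (percentage points), not a relative improvement, which is how the 3.7 to 6.7 figure in the summary above is phrased.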