# SePO: Self-Evolving Prompt Agents for System Prompt Optimization

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-03 08:00
- AIHOT score: 61
- AIHOT link: https://aihot.virxact.com/items/cmq0dttfu05awsltrwcgtnft0
- Original link: https://arxiv.org/abs/2606.04465

## AI Summary

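SePO proposes a self-referential design in which a single prompt agent optimizes both the task agents' system prompts and its own, using an open-ended evolutionary search that maintains an archive of candidate prompts. Training has two stages: pre-training (evolution on a multi-task pool) and fine-tuning (on the target task). On five benchmarks covering math (AIME'25), abstract reasoning (ARC-AGI-1), graduate-level science (GPQA), code generation (MBPP), and Sudoku, SePO consistently outperforms Manual-CoT, TextGrad, and MetaSPO, raising average accuracy by 4.49 percentage points over Manual-CoT. The prompt optimization skill learned in pre-training generalizes to unseen tasks.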

## Main Text

System prompt optimization improves agent behavior without modifying the underlying model, yielding human-readable, model-agnostic instructions. Existing methods build a prompt agent that refines task agents' system prompts, yet leave the prompt agent's own system prompt hand-engineered and fixed. We propose Self-Evolving Prompt Optimization (SePO), which treats the prompt agent's own system prompt as an optimization target alongside task agents' system prompts. SePO adopts a self-referential design: a single prompt agent improves both task agents' system prompts and its own under an open-ended evolutionary search that maintains an archive of candidate prompts as stepping stones. Training proceeds in two stages: pre-training evolves the prompt agent on a multi-task pool, and fine-tuning then applies it to a target task. Across five benchmarks spanning math (AIME'25), abstract reasoning (ARC-AGI-1), graduate-level science (GPQA), code generation (MBPP), and logic puzzles (Sudoku), SePO consistently outperforms Manual-CoT, TextGrad, and MetaSPO, improving average accuracy by 4.49 points over Manual-CoT. The prompt optimization skill acquired in pre-training also generalizes to tasks beyond the pre-training mixture, rather than amounting to memorized per-task prompts.
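To make the self-referential loop concrete, here is a minimal toy sketch of an archive-based evolutionary search in which one rewrite step edits both the task agent's prompt and the prompt agent's own prompt. It is not the authors' implementation. The names `Candidate`, `evolve`, `toy_rewrite`, `toy_evaluate`, and `HINTS` are hypothetical, the rewrite and scoring functions are deterministic stand-ins for LLM calls and benchmark evaluation, and uniform parent sampling is an assumption (the abstract does not say how parents are chosen).

```python
import random
from dataclasses import dataclass

# Hypothetical pool of edits; a real system would ask an LLM for rewrites.
HINTS = ["Think step by step.", "Check the final answer.", "Be thorough."]


@dataclass(frozen=True)
class Candidate:
    agent_prompt: str  # system prompt of the prompt agent itself
    task_prompt: str   # system prompt of the task agent
    score: float       # task-agent score under task_prompt


def toy_rewrite(agent_prompt, target, rng):
    """Stand-in for the prompt agent: append hints to the target prompt.

    The agent's own prompt changes how it edits: a "thorough" agent
    attempts two edits per call instead of one.
    """
    n_edits = 2 if "Be thorough." in agent_prompt else 1
    for _ in range(n_edits):
        hint = rng.choice(HINTS)
        if hint not in target:
            target = f"{target} {hint}"
    return target


def toy_evaluate(task_prompt):
    """Stand-in for benchmark accuracy: fraction of hints present."""
    return sum(h in task_prompt for h in HINTS) / len(HINTS)


def evolve(agent_prompt, task_prompt, rewrite, evaluate, steps=20, seed=0):
    rng = random.Random(seed)
    archive = [Candidate(agent_prompt, task_prompt, evaluate(task_prompt))]
    for _ in range(steps):
        # Open-ended search: any archived candidate may serve as a
        # stepping stone, not only the current best (assumed uniform).
        parent = rng.choice(archive)
        # Self-referential step: the same agent rewrites the task prompt
        # and its own system prompt.
        new_task = rewrite(parent.agent_prompt, parent.task_prompt, rng)
        new_agent = rewrite(parent.agent_prompt, parent.agent_prompt, rng)
        archive.append(Candidate(new_agent, new_task, evaluate(new_task)))
    return archive


if __name__ == "__main__":
    archive = evolve("You improve prompts.", "You solve problems.",
                     toy_rewrite, toy_evaluate)
    best = max(archive, key=lambda c: c.score)
    print(best.score, best.task_prompt)
```

In this sketch, the two-stage schedule described above would correspond to calling `evolve` first with an evaluator averaged over a pool of tasks, then again on the target task starting from the best evolved `agent_prompt`. That mapping is likewise an assumption rather than a detail given in the abstract.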
