# A Self-Improvement Method for Large Language Models Based on Bidirectional Evolutionary Search

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-05-27 08:00
- AIHOT score: 65
- AIHOT link: https://aihot.virxact.com/items/cmpoxnk890a22slv4f8fx0gb9
- Original link: https://arxiv.org/abs/2605.28814

## AI Summary

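Existing search methods for large language models, such as best-of-N sampling and tree search, rely on sparse verification signals and confine exploration to high-probability regions. To address this, the paper proposes Bidirectional Evolutionary Search (BES), a framework that couples forward candidate evolution with backward goal decomposition. Forward search uses evolution operators to recombine partial trajectories, generating candidates that are hard to obtain from a single model rollout; backward search recursively decomposes the original task, producing dense intermediate feedback that guides the search. The theoretical analysis indicates that expansion-only search is confined to a narrow entropy shell, which evolution operators can escape, and that backward search can exponentially reduce the number of samples needed to find a correct answer. Experiments show that on tasks where mainstream post-training algorithms fail, BES yields consistent gains; at inference time on three open problem solving benchmarks, BES outperforms existing open-source frameworks in both average and best-case performance.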

## Main Text

Search has been proposed as an effective method for self-improving language models and agentic systems, both for post-training sample generation and for inference. However, widely used methods such as best-of-N sampling and tree search face two fundamental limitations: they are guided by sparse verification signals, and they construct candidates primarily through autoregressive expansion, which restricts exploration to regions with substantial model probability mass. To address these limitations, we propose Bidirectional Evolutionary Search (BES), a search framework that couples forward candidate evolution with backward goal decomposition. In the forward search, BES augments standard expansion with evolution operators that recombine partial trajectories to generate candidates that are difficult to obtain from a single model rollout. In the backward search, BES recursively decomposes the original task into checkable subgoals, producing dense intermediate feedback that guides the forward search. We provide theoretical motivation showing that candidates generated by expansion-only search are confined to a narrow entropy shell, whereas evolutionary operators can escape it, and that backward search can exponentially reduce the number of samples required to find a correct answer. Experiments show that on challenging post-training tasks where mainstream post-training algorithms fail to yield improvements, BES enables consistent gains, and that on three open problem solving benchmarks at inference time, BES outperforms existing open-source frameworks in both average and best-case performance. Code and trained models are available at https://github.com/Embodied-Minds-Lab/BES.
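The interplay between the two directions can be illustrated with a toy sketch. This is not the authors' implementation: every name below (`TARGET`, `rollout`, `subgoal_score`, `crossover`, `mutate`, `evolutionary_search`) is hypothetical, the "model" is a uniform random sampler, and the "subgoals" are simply the individual steps of a fixed target sequence. The sketch assumes that a decomposition into checkable subgoals is already given, and shows how a dense subgoal score can rank candidates while a crossover operator recombines partial trajectories. In this toy setting, a single uniform rollout matches all 8 steps with probability (1/8)^8, so rollout-only search with the sparse verifier would need about 16.8 million samples in expectation.

```python
import random

# Toy stand-in: a "trajectory" is a list of steps, each an int in [0, VOCAB).
# TARGET and all function names are hypothetical, chosen for illustration only.
TARGET = [3, 1, 4, 1, 5, 2, 6, 5]
VOCAB = 8


def rollout(rng):
    """Expansion: sample a full trajectory step by step (stand-in for a model rollout)."""
    return [rng.randrange(VOCAB) for _ in TARGET]


def sparse_verify(traj):
    """Sparse signal: 1 only if the whole trajectory is correct, else 0."""
    return int(traj == TARGET)


def subgoal_score(traj):
    """Dense signal: number of checkable subgoals (here, individual steps) satisfied."""
    return sum(a == b for a, b in zip(traj, TARGET))


def crossover(a, b, rng):
    """Evolution operator: splice a prefix of one trajectory onto a suffix of another."""
    cut = rng.randrange(1, len(a))  # cut in [1, len(a) - 1], so both parents contribute
    return a[:cut] + b[cut:]


def mutate(traj, rng):
    """Evolution operator: re-sample a single step (a local re-expansion)."""
    child = list(traj)
    child[rng.randrange(len(child))] = rng.randrange(VOCAB)
    return child


def evolutionary_search(seed=0, pop_size=16, generations=200):
    """Return (best_trajectory, generations_used) under the dense subgoal score."""
    rng = random.Random(seed)
    population = [rollout(rng) for _ in range(pop_size)]
    for gen in range(generations):
        population.sort(key=subgoal_score, reverse=True)
        if sparse_verify(population[0]):
            return population[0], gen
        # Keep the better half (elitism), then refill by recombining survivors.
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    population.sort(key=subgoal_score, reverse=True)
    return population[0], generations


if __name__ == "__main__":
    best, gens = evolutionary_search(seed=0)
    print(best, subgoal_score(best), gens)
```

Replacing `subgoal_score` with `sparse_verify` as the sort key removes the gradient that selection relies on, which is the toy analogue of the sparse-feedback limitation described above. How BES actually derives subgoals and which evolution operators it uses are not specified in this abstract.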
