2FFS:面向随机Minimax树的双保真度最优动作识别算法
阅读原文· arxiv.org针对深度极小极大搜索与蒙特卡洛树搜索(MCTS)中启发式评估廉价但有偏、准确rollout可靠但昂贵的权衡,提出2FFS,一种双保真度树搜索算法。该算法将多保真度平坦bandit思想引入树结构,结合minimax式快速扩展与MCTS式随机采样,自适应决定何时利用廉价评估、何时调用昂贵准确评估。理论证明固定置信度正确性与有限终止性,并给出多项式深度成本上界。数值实验表明,相比现有BAI-MCTS基线,2FFS所需样本和计算操作显著更少。
We study fixed-confidence best-action identification (BAI) in stochastic minimax trees. This problem is increasingly relevant in modern AI planning, where deep minimax search and Monte Carlo Tree Search (MCTS) with language model long rollouts face a fundamental tradeoff: heuristic evaluations are cheap but biased, while accurate rollouts are reliable but prohibitively expensive. We propose 2FFS, a two-fidelity tree-search algorithm that brings multi-fidelity flat bandit ideas into trees. The algorithm combines minimax-style fast expansion with MCTS-style stochastic sampling, adaptively deciding when to exploit cheap biased evaluations and when to invoke expensive accurate evaluations for local certification. We prove fixed-confidence correctness, establish finite stopping for exact identification, and give a polynomial-depth cost upper bound for general-depth trees. Across numerical stochastic-tree experiments, 2FFS uses substantially fewer samples and computational operations comparing to existing BAI-MCTS baseline.