PoLar: Letting Large Language Models Skip or Loop Layers and Learn to Generate Dynamic Execution Programs
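Source: arxiv.org. The study finds that the layers of a pretrained LLM can act as modules that are flexibly skipped or looped for each input, forming a dynamic program of layers (PoLar). Most inputs reach the same or higher accuracy with fewer layers, and incorrect predictions of the original model can be corrected by alternative programs that use fewer layers. Building on this, the researchers propose a lightweight PoLar prediction network that generates, for each input, an execution program that dynamically skips or repeats layers. On mathematical reasoning benchmarks, PoLar consistently outperforms standard inference and earlier dynamic-depth methods, often improving accuracy while using fewer layers, and it remains stable under out-of-distribution evaluation. The results suggest that fixed-depth execution captures only a small part of an LLM's latent reasoning capacity.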
Large language models (LLMs) perform inference through a fixed-depth, fixed-order, non-recurrent execution of all layers. We reveal the wide existence of training-free, flexible, dynamic programs of layers (PoLar), in which pretrained layers are packed as modules and then skipped or looped to form a customized program for each input. For most inputs, substantially shorter programs achieve the same or better accuracy, and incorrect predictions of the original LLM can be corrected by alternative programs with fewer layers. These observations indicate that inference admits multiple valid latent computations beyond the standard forward pass. To achieve PoLar efficiently in practice, we propose a lightweight PoLar prediction network, which learns to generate execution programs that dynamically skip or repeat pretrained layers for each input. Experiments on mathematical reasoning benchmarks demonstrate that PoLar consistently improves accuracy over standard inference and prior dynamic-depth methods, often while executing fewer layers, and that these gains persist under out-of-distribution evaluation. Our results suggest that fixed-depth execution captures only a narrow subset of an LLM's latent reasoning capacity.
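To make the idea of a program of layers concrete, the following is a minimal sketch, not the authors' implementation. It assumes a program can be represented as a list of layer indices executed in order, so that omitting an index skips a layer and repeating an index loops it. The names `make_toy_layer` and `run_program` and the integer toy layers are hypothetical stand-ins for pretrained transformer blocks; the abstract does not specify the program format or the prediction network, so no predictor is shown.

```python
from typing import Callable, List, Sequence

# A "layer" maps a hidden state to a new hidden state.
Layer = Callable[[List[int]], List[int]]


def make_toy_layer(index: int) -> Layer:
    """Hypothetical stand-in for a pretrained block (assumption, not from the paper).

    Each toy layer doubles the state and adds its own index, so the result
    depends on which layers run and in what order.
    """
    def layer(hidden: List[int]) -> List[int]:
        return [2 * x + index for x in hidden]
    return layer


def run_program(layers: Sequence[Layer], program: Sequence[int],
                hidden: List[int]) -> List[int]:
    """Execute layers in the order given by `program`.

    An index missing from `program` is a skipped layer; a repeated index
    is a looped layer. The layer weights themselves are never changed.
    """
    for idx in program:
        hidden = layers[idx](hidden)
    return hidden


layers = [make_toy_layer(i) for i in range(4)]

# Standard inference: every layer once, in fixed order.
standard_program = list(range(len(layers)))

# One possible dynamic program: skip layers 1 and 3, run layer 2 twice.
dynamic_program = [0, 2, 2]

h0 = [1, 2]
out_standard = run_program(layers, standard_program, h0)
out_dynamic = run_program(layers, dynamic_program, h0)

print("standard:", len(standard_program), "layer calls ->", out_standard)
print("dynamic: ", len(dynamic_program), "layer calls ->", out_dynamic)
```

In this representation, the standard forward pass is just one program among many, and a per-input predictor would only need to emit a different index list; how the actual PoLar prediction network does this is not described in the abstract.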