# LayerRoute: Input-Conditional Adaptive Layer Skipping with LoRA Fine-Tuning for Agentic Language Models

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-01 08:00
- AIHOT score: 45
- AIHOT link: https://aihot.virxact.com/items/cmq59qpet05vpslt2ajglenx2
- Original link: https://arxiv.org/abs/2606.01838

## AI Summary

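Agentic language models alternate between tool-call steps (short, deterministic, low perplexity) and planning/reasoning steps (long, complex, high perplexity), yet inference spends the same compute on both. LayerRoute addresses this by adding a router and LoRA adapters (rank 8, about 1.08M parameters in total) to each of the 24 transformer layers of Qwen2.5-0.5B-Instruct, training only 1.10M parameters (0.22% of the 494M backbone). After 3,000 steps (6.4 minutes on an A100 40GB) it reaches a skip differential of 12.91 percentage points: tool calls skip 15.25% of FLOPs, while planning steps skip only 2.34%. Perplexity changes by -1.29 on tool calls and -1.30 on planning, an improvement over the base model in both cases.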
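The parameter counts quoted here can be roughly reproduced with a few lines of arithmetic. The sketch below is an illustration, not the authors' accounting: it assumes the publicly documented Qwen2.5-0.5B shapes (hidden size 896, key/value projection width 128 from grouped-query attention), bias-free LoRA matrices on Q/K/V/O, and the `Linear(896,1)` router shape given in the abstract. All function and constant names are hypothetical.

```python
# Assumed model shapes (taken from the public Qwen2.5-0.5B config, not from this entry).
HIDDEN = 896           # hidden size; also the Q and O projection width
KV_DIM = 128           # K and V projection width (2 KV heads x 64 head dim)
NUM_LAYERS = 24        # stated in the entry
LORA_RANK = 8          # stated in the entry
BACKBONE_PARAMS = 494_000_000  # stated in the entry


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters of one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def router_params(d_in: int) -> int:
    """Parameters of a Linear(d_in, 1) router: d_in weights plus one bias."""
    return d_in + 1


lora_per_layer = (
    lora_params(HIDDEN, HIDDEN, LORA_RANK)    # Q projection
    + lora_params(HIDDEN, KV_DIM, LORA_RANK)  # K projection
    + lora_params(HIDDEN, KV_DIM, LORA_RANK)  # V projection
    + lora_params(HIDDEN, HIDDEN, LORA_RANK)  # O projection
)
lora_total = NUM_LAYERS * lora_per_layer
router_total = NUM_LAYERS * router_params(HIDDEN)
trainable = lora_total + router_total
fraction = trainable / BACKBONE_PARAMS

# Skip differential: gap between the two reported skip rates, in percentage points.
skip_differential = round(15.25 - 2.34, 2)

print(f"LoRA parameters:    {lora_total:,}")
print(f"Router parameters:  {router_total:,}")
print(f"Trainable total:    {trainable:,} ({fraction:.2%} of backbone)")
print(f"Skip differential:  {skip_differential} percentage points")
```

Under these assumptions the totals come to 1,081,344 LoRA parameters plus 21,528 router parameters, which matches the reported figures of about 1.08M and 1.10M.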

## Full Text

Agentic language model systems alternate between two structurally distinct step types: structured tool calls (short, deterministic, low perplexity) and open-ended planning/reasoning steps (long, complex, high perplexity). Despite this heterogeneity, current inference systems apply identical compute to every step. We introduce LayerRoute, a lightweight adapter that learns to selectively skip transformer blocks on a per-input basis. LayerRoute augments each of the 24 transformer blocks in Qwen2.5-0.5B-Instruct with: (1) a per-layer router (~897 parameters, Linear(896,1)) that outputs a hard binary gate via the straight-through estimator, and (2) LoRA adapters (rank 8, ~1.08M parameters) on the Q/K/V/O attention projections. The backbone weights remain frozen. A single end-to-end training pass on agentic data (Hermes, Glaive, GSM8K, Turing) with a gate regularisation term forces the system to discover which blocks are skippable per input type. After 3,000 steps (6.4 minutes on an A100 40GB), LayerRoute achieves a skip differential of 12.91 percentage points: tool calls skip 15.25% of FLOPs while planning steps skip only 2.34%, using only 1.10M trainable parameters (0.22% of the 494M backbone). Quality improves over the base model due to LoRA adaptation, with a perplexity delta of -1.29 on tool calls and -1.30 on planning.
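To make the gating mechanism concrete, here is a minimal dependency-free sketch of a hard binary gate with a straight-through estimator. It is not the paper's implementation. Several details are assumptions: that the router reads a pooled hidden vector, that the threshold sits at probability 0.5, and that the backward pass borrows the sigmoid's derivative as the surrogate gradient. All names (`router_gate`, `gated_block`, `skip_fraction`) are hypothetical, and the toy uses 4 dimensions rather than 896.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def router_gate(weights, bias, pooled_hidden):
    """Linear(d, 1) router followed by a hard threshold.

    Returns (gate, surrogate_grad):
      gate           -- 1.0 (run the block) or 0.0 (skip it), used in the forward pass
      surrogate_grad -- d(sigmoid)/d(logit); the straight-through estimator uses this
                        in the backward pass because the threshold has zero gradient
    """
    logit = sum(w * h for w, h in zip(weights, pooled_hidden)) + bias
    prob = sigmoid(logit)
    gate = 1.0 if prob >= 0.5 else 0.0
    return gate, prob * (1.0 - prob)


def gated_block(hidden, block_fn, gate):
    """Residual block with skipping: hidden + gate * block_fn(hidden).

    When gate is 0.0 the block is never evaluated, which is where compute is saved.
    """
    if gate == 0.0:
        return list(hidden)
    update = block_fn(hidden)
    return [h + u for h, u in zip(hidden, update)]


def skip_fraction(gates):
    """Share of blocks skipped for one input (a proxy for FLOPs saved)."""
    return 1.0 - sum(gates) / len(gates)


# Toy router weights (assumed values for illustration only).
weights, bias = [0.5, -0.25, 0.0, 1.0], -0.1
gate_run, grad_run = router_gate(weights, bias, [1.0, 1.0, 1.0, 1.0])
gate_skip, grad_skip = router_gate(weights, bias, [-1.0, 0.0, 0.0, -1.0])


def halve(h):
    """Stand-in for a transformer block's residual update."""
    return [0.5 * v for v in h]


out_run = gated_block([1.0, 2.0], halve, gate_run)
out_skip = gated_block([1.0, 2.0], halve, gate_skip)
print(gate_run, out_run)
print(gate_skip, out_skip)
print(skip_fraction([1.0, 0.0, 1.0, 1.0]))
```

In an autograd framework the same idea is usually written as `hard + prob - prob.detach()`, so the forward value is the hard gate while gradients flow through `prob`. The abstract does not specify the form of the gate regularisation term, so it is left out of this sketch.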
