LayerRoute: Input-Conditioned Adaptive Layer Skipping with LoRA Fine-Tuning for Agentic Language Models

2026-06-01 08:00 · 32 days ago

AI Summary

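Agentic language models alternate between tool calls (short, deterministic, low perplexity) and planning/reasoning steps (long, complex, high perplexity), yet inference spends the same compute on both. LayerRoute addresses this by adding a router and LoRA adapters (rank 8, about 1.08M parameters in total) to each of the 24 transformer layers of Qwen2.5-0.5B-Instruct, training only 1.10M parameters (0.22% of the 494M backbone). After 3,000 steps (6.4 minutes on an A100 40GB) it reaches a 12.91% skip differential: tool calls skip 15.25% of FLOPs while planning steps skip only 2.34%. Perplexity changes by -1.29 on tool calls and -1.30 on planning, an improvement over the base model in both cases.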
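The parameter counts and the skip differential quoted above can be cross-checked with a few lines of arithmetic. This is a sketch under stated assumptions, not the authors' accounting: the hidden size of 896 and the single-output router with bias come from the `Linear(896,1)` description in the original abstract, while the key/value projection width of 128 (`KV_DIM`) is an assumption about Qwen2.5-0.5B's grouped-query attention layout that the source does not state. A rank-r LoRA adapter on a d_in by d_out projection is taken to add r * (d_in + d_out) parameters.

```python
# Back-of-the-envelope check of the figures in the summary.
# Stated in the source: 24 layers, rank 8, Linear(896, 1) router, 494M backbone.
# ASSUMPTION (not in the source): K and V projections map 896 -> 128.

HIDDEN = 896
KV_DIM = 128            # assumed grouped-query attention K/V width
RANK = 8
NUM_LAYERS = 24
BACKBONE_PARAMS = 494_000_000


def lora_params(d_in: int, d_out: int, rank: int = RANK) -> int:
    """Parameters added by one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Router: one weight per hidden dimension plus a bias.
router_per_layer = HIDDEN + 1

# LoRA on the Q, K, V and O attention projections of one layer.
lora_per_layer = (
    lora_params(HIDDEN, HIDDEN)    # Q
    + lora_params(HIDDEN, KV_DIM)  # K
    + lora_params(HIDDEN, KV_DIM)  # V
    + lora_params(HIDDEN, HIDDEN)  # O
)

lora_total = NUM_LAYERS * lora_per_layer
router_total = NUM_LAYERS * router_per_layer
trainable_total = lora_total + router_total
trainable_fraction = trainable_total / BACKBONE_PARAMS

# Skip differential: tool-call skip rate minus planning skip rate, in percent.
skip_differential = round(15.25 - 2.34, 2)
```

Under these assumptions the totals come to 1,081,344 LoRA parameters and 21,528 router parameters, or 1,102,872 trainable parameters overall, which matches the reported 1.08M, 1.10M and 0.22% figures after rounding.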

Original abstract · untranslated

Agentic language model systems alternate between two structurally distinct step types: structured tool calls (short, deterministic, low perplexity) and open-ended planning/reasoning steps (long, complex, high perplexity). Despite this heterogeneity, current inference systems apply identical compute to every step. We introduce LayerRoute, a lightweight adapter that learns to selectively skip transformer blocks on a per-input basis. LayerRoute augments each of the 24 transformer blocks in Qwen2.5-0.5B-Instruct with: (1) a per-layer router (~897 parameters, Linear(896,1)) that outputs a hard binary gate via the straight-through estimator, and (2) LoRA adapters (rank 8, ~1.08M parameters) on the Q/K/V/O attention projections. The backbone weights remain frozen. A single end-to-end training pass on agentic data (Hermes, Glaive, GSM8K, Turing) with a gate regularisation term forces the system to discover which blocks are skippable per input type. After 3,000 steps (6.4 minutes on an A100 40GB), LayerRoute achieves a 12.91% skip differential: tool calls skip 15.25% of FLOPs while planning steps skip only 2.34%, using only 1.10M trainable parameters (0.22% of the 494M backbone). Quality improves over the base model due to LoRA adaptation, with perplexity delta of -1.29 on tool calls and -1.30 on planning.
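The per-layer router and hard gate described in the abstract can be illustrated with a small framework-free sketch. It is hypothetical and not the authors' implementation: the abstract specifies only a `Linear(896,1)` router, a hard binary gate, and the straight-through estimator. The mean pooling over tokens, the convention that a gate of 1 means "run the block", and the use of the sigmoid derivative as the surrogate gradient are all assumptions made here for illustration, as are the function names.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def mean_pool(tokens: List[Vector]) -> Vector:
    """Average token vectors into one per-input vector (assumed pooling)."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]


def router_logit(pooled: Vector, weight: Vector, bias: float) -> float:
    """Single-output linear router: one weight per hidden dimension plus a bias."""
    return sum(w * x for w, x in zip(weight, pooled)) + bias


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def ste_gate(logit: float) -> Tuple[float, float]:
    """Straight-through gate.

    Forward: a hard 0/1 decision (assumed: 1 = run the block, 0 = skip it).
    Backward: the step function has zero gradient almost everywhere, so the
    gradient of the soft sigmoid is returned for use in its place.
    """
    soft = sigmoid(logit)
    hard = 1.0 if logit > 0.0 else 0.0
    surrogate_grad = soft * (1.0 - soft)
    return hard, surrogate_grad


def gated_block(
    tokens: List[Vector],
    block_fn: Callable[[List[Vector]], List[Vector]],
    gate: float,
) -> List[Vector]:
    """Residual block that is skipped entirely when the gate is 0."""
    if gate == 0.0:
        # Skipped: the residual stream passes through and block_fn never runs.
        return [list(t) for t in tokens]
    update = block_fn(tokens)
    return [[h + u for h, u in zip(t, d)] for t, d in zip(tokens, update)]
```

In an autograd framework the same forward/backward split is commonly written as `hard + soft - soft.detach()`; the compute saving comes from not calling the block at all when the gate is 0, rather than from multiplying its output by zero.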

HuggingFace Daily Papers (community trending papers)

Read the original · arxiv.org
