Prompt-Level Distillation：无需微调的模型推理效率提升方法

2026-06-02 08:00·31天前

AI 摘要

提出 Prompt-Level Distillation (PLD)，从 Teacher 模型提取显式推理模式并组织为结构化指令列表，注入 Student 模型的 System Prompt。在 Gemma-3 4B 上，PLD 将 StereoSet Macro F1 从 57% 提升至 90.0%，Contract-NLI 从 67% 提升至 83%，LogiQA 准确率达 70%；在 Mistral Small 3.1 上取得相似结果，验证跨架构泛化能力。PLD 无需微调，推理延迟极低，决策过程透明可人工验证，适合法律、金融、内容审核等监管行业及高吞吐边缘设备。

原文 · 未翻译

Advanced reasoning typically requires Chain-of-Thought prompting, which is accurate but incurs prohibitive latency and substantial test-time inference costs. The standard alternative, fine-tuning smaller models, often sacrifices interpretability while introducing significant resource and operational overhead. To address these limitations, we introduce Prompt-Level Distillation (PLD). We extract explicit reasoning patterns from a Teacher model and organize them into a structured list of expressive instructions for the Student model's System Prompt. Evaluated using Gemma-3 4B, PLD improved Macro F1 scores on StereoSet (57\% to 90.0\%) and Contract-NLI (67\% to 83\%), while increasing LogiQA accuracy to 70\%. Similar results on Mistral Small 3.1 demonstrate cross-architecture generalizability, enabling these compact models to match frontier performance with negligible latency overhead. These expressive instructions render the decision-making process transparent, allowing for full human verification of logic, making this approach ideal for regulated industries such as law, finance, and content moderation, as well as high-volume use cases and edge devices.

HuggingFace Daily Papers（社区热门论文）

53导出 Markdown

Prompt-Level Distillation：无需微调的模型推理效率提升方法

2026-06-02 08:00·31天前

阅读原文· arxiv.org

AI 摘要

原文 · 保持原样，未翻译