# Hybrid Policy Distillation for LLMs

- Source: HuggingFace Daily Papers (trending community papers)
- Published: 2026-04-22 08:00
- AIHOT link: https://aihot.virxact.com/items/cmocgla8403grslsj7ry3numg
- Original link: https://arxiv.org/abs/2604.20244

## AI Summary

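The research team proposes Hybrid Policy Distillation (HPD). Taking a unified view of existing methods, it reformulates knowledge distillation as a token-level reweighted log-likelihood objective. HPD combines the complementary strengths of forward and reverse KL divergence to balance mode coverage against mode seeking, and pairs off-policy data with a lightweight, approximate on-policy sampling strategy. In experiments on math reasoning, dialogue, and code tasks, HPD shows better optimization stability, computational efficiency, and final performance than existing methods, and it carries over across model families and scales.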
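For readers who want the two KL directions written out, the standard per-token definitions are below, with teacher distribution $p = p_T(\cdot \mid x, y_{<t})$ and student distribution $q_\theta = q_\theta(\cdot \mid x, y_{<t})$ over vocabulary $\mathcal{V}$ (this notation is assumed here, not taken from the paper). They show one sense in which both directions read as reweighted log-likelihood: forward KL weights each token's student log-probability by the teacher probability $p(v)$, and the reverse-KL gradient weights the student's score $\nabla_\theta \log q_\theta(v)$ on its own samples by $\log q_\theta(v) - \log p(v)$. The paper's exact general form is not given in this summary.

$$
D_{\mathrm{KL}}(p \,\|\, q_\theta) = \sum_{v \in \mathcal{V}} p(v) \log \frac{p(v)}{q_\theta(v)} = -\mathcal{H}(p) - \sum_{v \in \mathcal{V}} p(v) \log q_\theta(v)
$$

$$
\nabla_\theta D_{\mathrm{KL}}(q_\theta \,\|\, p) = \mathbb{E}_{v \sim q_\theta}\!\left[ \big(\log q_\theta(v) - \log p(v)\big)\, \nabla_\theta \log q_\theta(v) \right]
$$

Because $\mathcal{H}(p)$ does not depend on $\theta$, minimizing forward KL is equivalent to minimizing the teacher-weighted negative log-likelihood. The reverse-KL gradient follows from $\sum_v q_\theta(v)\,\nabla_\theta \log q_\theta(v) = \nabla_\theta \sum_v q_\theta(v) = 0$. Its expectation is taken under the student, which is why reverse KL is naturally paired with on-policy (student-generated) samples.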

## Main Text

Knowledge distillation (KD) is a powerful paradigm for compressing large language models (LLMs), whose effectiveness depends on intertwined choices of divergence direction, optimization strategy, and data regime. We break down the design of existing KD methods and present a unified view that establishes connections between them, reformulating KD as a reweighted log-likelihood objective at the token level. We further propose Hybrid Policy Distillation (HPD), which integrates the complementary advantages of forward and reverse KL to balance mode coverage and mode-seeking, and combines off-policy data with lightweight, approximate on-policy sampling. We validate HPD on long-generation math reasoning as well as short-generation dialogue and code tasks, demonstrating improved optimization stability, computational efficiency, and final performance across diverse model families and scales. The code related to this work is available at https://github.com/zwhong714/Hybrid-Policy-Distillation.
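To make the forward/reverse KL trade-off concrete, here is a minimal sketch of a per-token hybrid loss. It assumes a simple convex combination with a hypothetical mixing weight `alpha`. The abstract does not say how HPD actually combines the two directions or how it performs approximate on-policy sampling, so `hybrid_token_loss`, `sequence_loss`, and `alpha` are illustrative names only, not the API of the linked repository.

```python
import math
from typing import List

def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one position's vocabulary logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL(p || q) = sum_v p(v) * (log p(v) - log q(v)); terms with p(v) = 0 contribute 0."""
    return sum(
        pi * (math.log(pi) - math.log(max(qi, eps)))
        for pi, qi in zip(p, q)
        if pi > 0
    )

def hybrid_token_loss(teacher_logits: List[float],
                      student_logits: List[float],
                      alpha: float = 0.5) -> float:
    """Hypothetical hybrid objective: alpha * forward KL + (1 - alpha) * reverse KL.

    Forward KL(p_T || q_S) penalizes the student for missing teacher modes (mode coverage).
    Reverse KL(q_S || p_T) penalizes student mass where the teacher has little (mode seeking).
    """
    p_t = softmax(teacher_logits)
    q_s = softmax(student_logits)
    forward = kl(p_t, q_s)
    reverse = kl(q_s, p_t)
    return alpha * forward + (1 - alpha) * reverse

def sequence_loss(teacher_seq: List[List[float]],
                  student_seq: List[List[float]],
                  alpha: float = 0.5) -> float:
    """Average the per-token hybrid loss over the positions of one sequence."""
    losses = [hybrid_token_loss(t, s, alpha) for t, s in zip(teacher_seq, student_seq)]
    return sum(losses) / len(losses)

if __name__ == "__main__":
    # Toy logits for a 3-token vocabulary at two positions (made-up numbers).
    teacher = [[2.0, 0.5, -1.0], [0.0, 3.0, 0.0]]
    student = [[1.0, 1.0, -1.0], [0.0, 3.0, 0.0]]
    for a in (0.0, 0.5, 1.0):
        print(a, sequence_loss(teacher, student, alpha=a))
```

Setting `alpha=1.0` recovers pure forward KL and `alpha=0.0` pure reverse KL. In a real training loop the reverse term would usually be evaluated on student-generated sequences rather than on fixed positions as here; this sketch only compares distributions at given positions.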
