# Protecting Language Models Against Unauthorized Distillation through Trace Rewriting

- Source: HuggingFace Daily Papers (trending community papers)
- Published: 2026-04-16 08:00
- AIHOT link: https://aihot.virxact.com/items/cmo8sduop06s5slmlbo44xuqz
- Original link: https://arxiv.org/abs/2602.15143

## AI Summary

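The research team proposes a method that protects language models from unauthorized distillation by rewriting reasoning traces. While preserving answer correctness, the technique dynamically modifies the teacher model's reasoning output. This both lowers the training value of the responses (anti-distillation) and embeds verifiable API watermarks. Experiments show that a simple instruction-based rewriting method effectively deters knowledge theft while maintaining or even improving model performance, and that watermark detection produces essentially no false alarms. The code is open source.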

## Full Text

Knowledge distillation is a widely adopted technique for transferring capabilities from LLMs to smaller, more efficient student models. However, unauthorized use of knowledge distillation takes unfair advantage of the considerable effort and cost put into developing frontier models. We investigate methods for modifying teacher-generated reasoning traces to achieve two objectives that deter unauthorized distillation: (1) anti-distillation, or degrading the training usefulness of query responses, and (2) API watermarking, which embeds verifiable signatures in student models. We introduce several approaches for dynamically rewriting a teacher's reasoning outputs while preserving answer correctness and semantic coherence. Two of these leverage the rewriting capabilities of LLMs, while others use gradient-based techniques. Our experiments show that a simple instruction-based rewriting approach achieves a strong anti-distillation effect while maintaining or even improving teacher performance. Furthermore, we show that our rewriting approach also enables embedding watermarks that can be reliably detected with essentially no false alarms. Our code is available at https://github.com/xhOwenMa/trace-rewriting.
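
The constraint that a rewritten trace must keep the teacher's answer can be illustrated with a small guard around any rewriting step. The sketch below is not the authors' implementation: the `Final answer:` marker, the `extract_answer` and `rewrite_trace` helpers, and the `rewriter` callable (a stand-in for an instruction-prompted LLM call) are all hypothetical names chosen for illustration. It only shows one plausible way to fall back to the original trace when a rewrite changes the answer.

```python
import re
from typing import Callable, Optional

# Hypothetical convention: the last line of a trace reads "Final answer: <value>".
ANSWER_PATTERN = re.compile(r"Final answer:\s*(.+)", re.IGNORECASE)


def extract_answer(trace: str) -> Optional[str]:
    """Return the answer on the last line of the trace, or None if absent."""
    stripped = trace.strip()
    if not stripped:
        return None
    match = ANSWER_PATTERN.search(stripped.splitlines()[-1])
    return match.group(1).strip() if match else None


def rewrite_trace(trace: str, rewriter: Callable[[str], str]) -> str:
    """Apply `rewriter` to the trace, keeping the result only if the answer survives.

    `rewriter` is an assumed placeholder for an instruction-based LLM rewrite;
    any callable mapping a trace string to a new trace string works here.
    """
    original_answer = extract_answer(trace)
    candidate = rewriter(trace)
    # Accept the rewrite only when the original answer is known and unchanged.
    if original_answer is not None and extract_answer(candidate) == original_answer:
        return candidate
    return trace  # fall back to the unmodified teacher trace


if __name__ == "__main__":
    teacher_trace = "2 + 2 = 4\nFinal answer: 4"
    paraphrase = lambda t: "Adding the two numbers gives four.\nFinal answer: 4"
    print(rewrite_trace(teacher_trace, paraphrase))
```

In this sketch the check is a plain string comparison. A real deployment would likely need a task-specific notion of answer equivalence, which the abstract does not specify.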