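Lambert argues that US labs use the term "distillation" in a way that masks the problem of API jailbreaking. Chinese labs jailbreak the APIs to obtain reasoning traces, which help bootstrap reasoning behaviors in new domains. He believes API providers cannot easily prevent jailbreaking entirely, because reasoning models are inherently inclined to output their reasoning traces, and fully patching the behavior would make the models less intelligent. He calls on the labs to explain this process more transparently so that an informed policy discussion can take place.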
This isn't quite true.
A big part of the problem is that the labs use the term distillation, which is a general post-training technique, in lieu of naming the specific issue: jailbreaking the API. (1)
There is a second debate over *how* impactful distillation is, but it is definitely helpful. (2) That comes down entirely to how the Chinese labs are jailbreaking the APIs to get reasoning traces out, which help bootstrap reasoning behaviors in new domains.
There's a third point (3), which I excerpt from my recent piece: the labs need to be more transparent about why point (2) in particular is true. From the third piece:
"On the point of distillation, my hypothesis is that API builders don't have an easy time preventing hacks or jailbreaking because it's a deeply grounded property of reasoning models to want to output the reasoning traces, and it would make the model far less intelligent to fully patch the behavior. This is based on a few assumptions: