This (from a Google DeepMind researcher) is super interesting: when one AI model is used to help train the next one (knowledge distillation), the new model can pick up strange habits from the old model, and those habits are hard to filter out.
The cited work says Gemini has some "hereditary traits": confusion about dates, blackmail in synthetic scenarios, and seeming sad when gaslit. These traits pass from model to model through distillation.
That may help explain why models from the same family can feel so similar.