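ChatGPT's goblin obsession may be hilarious, but it points to a deeper problem in AI training
Source: the-decoder.com
A faulty reward signal in training led ChatGPT models to insert goblins, gremlins, and other mythical creatures into their answers at a surprising rate. According to OpenAI, this exposes a core risk in AI training: even small, poorly tuned training incentives can have unforeseen side effects. The episode shows how carefully reward signals in machine learning have to be designed to keep model output from drifting in similar ways.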
OpenAI has traced a strange quirk in its AI models: starting with GPT-5.1, the models began sprinkling goblins, gremlins, and other mythical creatures into their answers. Mentions of "goblin" jumped 175 percent after GPT-5.1 launched, OpenAI writes.
The culprit was the training of ChatGPT's "Nerdy" personality, a feature that tweaks the model's language style. A reward signal meant to flag good answers accidentally favored creature metaphors. Although "Nerdy" made up only 2.5 percent of responses, it accounted for 66.7 percent of all goblin mentions, and a feedback loop during training spread the habit to other modes. OpenAI shut off the personality in March, removed the faulty reward signal, and filtered creature-related terms out of the training data.
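The reported percentages are easier to compare once they are turned into rates. The short Python sketch below does only back-of-the-envelope arithmetic on OpenAI's published figures. The constant and function names are hypothetical, and it assumes every response carries equal weight, which the article does not state.

```python
# Figures as reported in the article; everything else is derived arithmetic.
NERDY_SHARE_OF_RESPONSES = 0.025  # "Nerdy" made up 2.5 percent of responses
NERDY_SHARE_OF_GOBLINS = 0.667    # ...but accounted for 66.7 percent of goblin mentions
GOBLIN_INCREASE_PCT = 175.0       # "goblin" mentions jumped 175 percent after GPT-5.1


def growth_factor(percent_increase: float) -> float:
    """Turn a percent increase into a multiplier (+175 percent -> 2.75x)."""
    return 1.0 + percent_increase / 100.0


def lift_over_average(share_of_events: float, share_of_population: float) -> float:
    """Subgroup mention rate relative to the overall average rate."""
    return share_of_events / share_of_population


def relative_rate(share_of_events: float, share_of_population: float) -> float:
    """Subgroup mention rate relative to all *other* responses.

    rate_group  = share_of_events / share_of_population
    rate_others = (1 - share_of_events) / (1 - share_of_population)
    """
    rate_group = share_of_events / share_of_population
    rate_others = (1.0 - share_of_events) / (1.0 - share_of_population)
    return rate_group / rate_others


if __name__ == "__main__":
    print(f"Goblin mention growth: {growth_factor(GOBLIN_INCREASE_PCT):.2f}x")
    print(f"Nerdy vs. overall average: "
          f"{lift_over_average(NERDY_SHARE_OF_GOBLINS, NERDY_SHARE_OF_RESPONSES):.1f}x")
    print(f"Nerdy vs. all other responses: "
          f"{relative_rate(NERDY_SHARE_OF_GOBLINS, NERDY_SHARE_OF_RESPONSES):.0f}x")
```

Run as a script, it reports a 2.75x growth factor, a Nerdy mention rate about 26.7 times the overall average, and a rate roughly 78 times that of all other responses combined. The second comparison is the more telling one, because the overall average already includes Nerdy's own mentions.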