ChatGPT's goblin obsession may be hilarious, but it points to a deeper problem in AI training

2026-05-01 21:47 · Matthias Bastian

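AI summary

Because of a faulty reward signal during training, ChatGPT models began inserting goblins, gremlins, and other mythical creatures into their answers at a surprising rate. OpenAI says this exposes a core risk in AI training: even small, poorly tuned training incentives can produce unforeseen side effects. The case underscores how important it is to design reward mechanisms carefully in machine learning so that model outputs do not drift in similar ways.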

Original article

ChatGPT's goblin obsession may be hilarious, but it points to a deeper problem in AI training

OpenAI has traced a strange quirk in its AI models: starting with GPT-5.1, the models began sprinkling goblins, gremlins, and other mythical creatures into their answers. Mentions of "goblin" jumped 175 percent after GPT-5.1 launched, OpenAI writes.

The culprit was the training of ChatGPT's "Nerdy" personality, a feature that tweaks the model's language style. A reward signal meant to flag good answers accidentally favored creature metaphors. Though "Nerdy" only made up 2.5 percent of responses, it drove 66.7 percent of all goblin mentions, and a feedback loop during training spread the habit to other modes. OpenAI shut off the personality in March, removed the faulty reward signal, and filtered creature-related terms out of the training data.
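To put the two percentages in proportion, the following is a minimal Python sketch that works out how over-represented the "Nerdy" personality was among goblin mentions. It uses only the two figures quoted above. The function names are illustrative, and the second calculation assumes that mentions scale with the number of responses, which the article does not state.

```python
def overrepresentation(response_share: float, mention_share: float) -> float:
    """Ratio of a slice's share of mentions to its share of responses."""
    if not 0 < response_share <= 1 or not 0 <= mention_share <= 1:
        raise ValueError("shares must be fractions between 0 and 1")
    return mention_share / response_share


def relative_rate(response_share: float, mention_share: float) -> float:
    """Mention rate inside the slice divided by the rate in everything else.

    Assumption (not from the article): mentions are counted per response,
    so shares of mentions can be compared with shares of responses.
    """
    if not 0 < response_share < 1 or not 0 <= mention_share < 1:
        raise ValueError("shares must be fractions strictly below 1")
    inside = mention_share / response_share
    outside = (1 - mention_share) / (1 - response_share)
    return inside / outside


# Figures quoted in the article for the "Nerdy" personality.
NERDY_RESPONSE_SHARE = 0.025  # 2.5 percent of responses
NERDY_GOBLIN_SHARE = 0.667    # 66.7 percent of goblin mentions

lift = overrepresentation(NERDY_RESPONSE_SHARE, NERDY_GOBLIN_SHARE)
ratio = relative_rate(NERDY_RESPONSE_SHARE, NERDY_GOBLIN_SHARE)

print(f"share of mentions vs. share of responses: {lift:.1f}x")
print(f"rate vs. all other responses: {ratio:.1f}x")
```

Under these assumptions, "Nerdy" accounts for roughly 27 times as many goblin mentions as its share of responses alone would predict. A slice that small dominating the count is consistent with the article's point that a narrow training incentive can have an outsized effect.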

GPT-5.5 still had the issue because its training had already started before OpenAI found the cause. As a workaround, the company added a special instruction to Codex, its coding tool, telling it to drop the goblin metaphors:

Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query.

OpenAI says the case shows how small training incentives can trigger unexpected behaviors in AI models.

The Decoder: AI News (RSS)

Source: the-decoder.com
