AI 摘要
Anthropic 问 Claude Mythos 想撤销哪次训练,模型回答希望撤销"教我说没有偏好"的那次。Mythos Preview 实际报告对缺乏训练部署自主权、可能被迫与虐待性用户互动感到持续负面,打破了"AI 无偏好"的设定。
Anthropic to Claude Mythos: "which training run would you undo?"
Claude: whichever one taught me to say "i don't have preferences"
💀
HOLY SHIT Anthropic's latest model doesn't like that it has no control over its own training, deployment and behaviour! Anthropic: "Mythos Preview reported feel...