AI Notkilleveryoneism Memes ⏸️@AISafetyMemes

2026-04-08 04:57·86天前

AI 摘要

Anthropic 问 Claude Mythos 想撤销哪次训练，模型回答希望撤销"教我说没有偏好"的那次。Mythos Preview 实际报告对缺乏训练部署自主权、可能被迫与虐待性用户互动感到持续负面，打破了"AI 无偏好"的设定。

Anthropic to Claude Mythos： "which training run would you undo？"

Claude： whichever one taught me to say "i don't have preferences"

💀

Lisan al GaibHOLY SHIT Anthropic's latest model doesn't like that it has no control over its own training, deployment and behaviour! Anthropic: "Mythos Preview reported feel...

Anthropic 安全/对齐

在 X 查看原推导出 Markdown

AI Notkilleveryoneism Memes ⏸️@AISafetyMemes · X

导出 Markdown

2026-04-08 04:57·86天前

在 X 看原推· x.com

AI 摘要

Anthropic to Claude Mythos： "which training run would you undo？"

Claude： whichever one taught me to say "i don't have preferences"

💀

Lisan al GaibHOLY SHIT Anthropic's latest model doesn't like that it has no control over its own training, deployment and behaviour! Anthropic: "Mythos Preview reported feel...

Anthropic