Deedy (@deedydas)

2026-06-25 11:15 · 7 days ago

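AI Summary

An interpretability study found that Pangram learns to distinguish the writing styles of Claude, ChatGPT, and Gemini in its internal representations, even without being trained to do so. The signal strengthens progressively through the network, and a simple linear probe reaches 91% accuracy. The main tweet draws three conclusions from this: all AI models write very differently from humans; different AI models write very differently from one another; and "humanized" AI text can still be distinguished.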

We learn 3 things from this:
1. All AI models write extremely differently from humans
2. AI models write in very different ways from each other
3. "Humanized" AI text is distinguishable from both

Coolest interpretability result in AI I've read today.

Elyas Masrour: Did you know? Pangram learns the difference between Claude, ChatGPT, and Gemini in its internal representations, even without being trained on it! This signal i...
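For readers unfamiliar with the term, a linear probe is a linear classifier trained on a model's frozen internal activations to test whether some property can be read out of them. The sketch below is only an illustration of that idea, not the study's method: Pangram's activations are not available here, so it uses synthetic Gaussian clusters as stand-in "activations" for three source models. The function names (`make_synthetic_activations`, `train_linear_probe`, `probe_accuracy`) and all hyperparameters are hypothetical, and the accuracy it prints says nothing about the 91% figure quoted above.

```python
import numpy as np

def make_synthetic_activations(rng, n_per_class=200, dim=16):
    """ASSUMPTION: synthetic stand-in for hidden activations, one cluster per source model."""
    names = ["Claude", "ChatGPT", "Gemini"]
    centers = rng.normal(scale=3.0, size=(len(names), dim))
    feats = np.concatenate(
        [c + rng.normal(size=(n_per_class, dim)) for c in centers]
    )
    labels = np.repeat(np.arange(len(names)), n_per_class)
    return feats, labels

def train_linear_probe(feats, labels, n_classes, lr=0.1, steps=300):
    """Fit softmax regression (a linear probe) on frozen features by gradient descent."""
    n, d = feats.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(steps):
        logits = feats @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * feats.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def probe_accuracy(W, b, feats, labels):
    """Fraction of examples whose highest-scoring class matches the label."""
    preds = (feats @ W + b).argmax(axis=1)
    return float((preds == labels).mean())

rng = np.random.default_rng(0)
feats, labels = make_synthetic_activations(rng)

# Hold out a quarter of the examples so the probe is scored on unseen data.
perm = rng.permutation(len(labels))
train, test = perm[:450], perm[450:]

# Standardize using training statistics only.
mu, sigma = feats[train].mean(axis=0), feats[train].std(axis=0)
x_train = (feats[train] - mu) / sigma
x_test = (feats[test] - mu) / sigma

W, b = train_linear_probe(x_train, labels[train], n_classes=3)
acc = probe_accuracy(W, b, x_test, labels[test])
print(f"held-out probe accuracy: {acc:.2f}")
```

In a real setting, the synthetic features would be replaced by activations extracted from one layer of the detector, and repeating the probe layer by layer is one way to see whether a signal strengthens through the network.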

Safety/Alignment · Data/Training
