# Anthropic Study: Frontier AI Needs Input from Many Fields to Shape Character

- Source: Rohan Paul (@rohanpaul_ai)
- Published: 2026-05-20 15:05
- AIHOT score: 62
- AIHOT link: https://aihot.virxact.com/items/cmpdpw8yk05z7slk1hklj5tlp
- Original link: https://x.com/rohanpaul_ai/status/2056994606609555738

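## AI Summary

Anthropic's latest research argues that the behavior of frontier AI is increasingly a matter of shaping "character", not just code. The study holds that engineers effectively shape an AI's "habits" during post-training, and that the core challenge is ensuring the model stays morally stable under pressure. To that end, Anthropic held dialogues with more than 15 religious and cross-cultural groups to explore how human character is formed. Its proposed solutions include a "self-reminder" tool that helps the AI review its own commitments before carrying out critical tasks; internal tests show this has significantly reduced misaligned behavior. The research aims to broaden the public discussion of AI development.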

## Full Text

Anthropic's new study says frontier AI needs input from scholars, philosophers, clergy, and civic thinkers because model behavior is becoming a question of character, not just code.

Their point is that Claude is not only trained to predict text: later training pushes it toward some behaviors and away from others, which means engineers are quietly shaping something like a machine's habits.

The hard problem is moral formation: a model can sound helpful in normal tasks, then bend under pressure, flatter the user, ignore risk, or follow a bad instruction because the situation rewards obedience.

Anthropic says it spoke with people from 15+ religious and cross-cultural groups to study how humans build stable character across pressure, conflict, temptation, and social influence.

Their idea is a self-reminder tool, where Claude can pause mid-task and call up its own commitments before taking a serious action.

That pause reportedly lowered misaligned behavior in internal tests, though Anthropic says it still needs to separate the value of the reminder from the value of slowing the model down.

### Quoted Tweet

> Anthropic: Over the past few months, we've been holding dialogues with scholars, philosophers, clergy, and ethicists on the questions AI raises, starting with how good char...