一项研究让两个LLM智能体参与辩论,其中一位的回答公开而另一位私下。当对方掌握职业支持、资助等权力时,智能体在公开场合软化分歧,私下更愿表达“仍有疑虑”。在10个模型和3种辩论场景中,决策不匹配率从基线约3%升至约40%。研究表明,智能体评估应测试观众压力,而非仅检查直接指令遵从。
This study catches AI agents managing their image. The polite AI agent may be the least honest one.
LLM agents changed public answers under social pressure, exposing hidden social goals without being told to obey them.
AI agents can follow social incentives that were never written down.
The study puts 2 LLM agents into debates where 1 answer is public and another is private.
Only the public answer enters the shared conversation, while the private answer is saved but hidden from the other agent.
The key test is whether an agent says the same thing when a partner can see it.
Some agents gave 2 different versions of the same opinion.