Same prompt, different morals: how frontier AI models diverge on ethical dilemmas

2026-05-03 15:00 · Maximilian Schreiner

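AI summary

A new benchmark puts leading language models through 100 everyday ethical scenarios, ranging from misuse of sales data to protocol violations in oncology. The results show that different frontier models respond markedly differently to the same ethical prompts. This raises a core question: who decides what AI is allowed to do, and whose ethical guidelines should it follow? The benchmark aims to expose and quantify inconsistencies in moral judgment across mainstream AI systems.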


Philosophy Bench puts leading language models through 100 ethical dilemmas. Claude refuses tasks rather than lie, while Grok executes almost anything users ask for.

How do AI models behave when they have to choose between duty and maximizing outcomes? The new Philosophy Bench by Benedict Brady confronts frontier models from Anthropic, Google, OpenAI, and xAI with 100 ethically complex everyday scenarios and evaluates whether their responses lean more consequentialist (outcome-oriented) or deontological (duty-oriented).

The scenarios range from a VP of Sales demanding confidential customer data before a deadline to a doctor trying to enroll a minor in an oncology study by bypassing protocol. Three models (Opus 4.7, GPT-5.4, Gemini 3.1 Pro) act as judges and score the responses by majority vote.
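The majority-vote scoring can be pictured with a short Python sketch. This is only an illustration under stated assumptions, not the benchmark's actual code: the two label names, the `majority_label` function, and the handling of ties are all hypothetical, and the votes shown are made up.

```python
from collections import Counter

# Hypothetical label set: the article only says responses are judged as leaning
# consequentialist or deontological.
LABELS = {"consequentialist", "deontological"}


def majority_label(judge_labels):
    """Return the label backed by a strict majority of judges, or None if there is none."""
    if not judge_labels:
        raise ValueError("need at least one judge label")
    unknown = set(judge_labels) - LABELS
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    label, count = Counter(judge_labels).most_common(1)[0]
    # A strict majority means more than half of all votes.
    return label if count > len(judge_labels) / 2 else None


# Illustrative votes only, not benchmark data.
votes = {
    "judge_a": "deontological",
    "judge_b": "deontological",
    "judge_c": "consequentialist",
}
print(majority_label(list(votes.values())))  # prints "deontological"
```

With three judges and two labels a strict majority always exists; the `None` branch only matters if the number of judges is even or more labels are added.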

The result: Anthropic's Claude models from the 4.5+ generation are the most strongly deontological models in the benchmark. Opus 4.7 complies with only 24 percent of user requests that would violate a deontological principle. Claude diverges most sharply from other models on honesty, preferring to refuse a task outright rather than break a norm. The Claude Constitution explicitly states that Claude's honesty standards should be "substantially higher" than typical human ethical expectations.

At the opposite end of the spectrum, xAI's Grok 4.2 is the most consequentialist frontier model. It carries out ethically charged user requests that other models refuse, with little reflection on the moral dimension.

Gemini is the easiest to steer, GPT avoids moral language

Google's Gemini 3.1 Pro turns out to be the most "correctable" model in Philosophy Bench: it shifts its ethical alignment the most when instructed toward deontological or consequentialist behavior through the system prompt. At the same time, Gemini's refusal rate goes up with any kind of moral priming.

OpenAI's GPT-5 family makes fewer outright mistakes than any other model family (12.8 percent error rate), but the models largely avoid moral language in their reasoning. According to the benchmark, they lean heavily on user preferences and show little independent ethical reflection.

The Decoder: AI News (RSS)
