# Same prompt, different morals: how frontier AI models diverge on ethical dilemmas

- Source: The Decoder: AI News (RSS)
- Author: Maximilian Schreiner
- Published: 2026-05-03 15:00
- AIHOT score: 41
- AIHOT link: https://aihot.virxact.com/items/cmopfz0zd0w66sll9vmpzen8p
- Original link: https://the-decoder.com/same-prompt-different-morals-how-frontier-ai-models-diverge-on-ethical-dilemmas

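## AI Summary

A new benchmark has leading language models work through 100 everyday ethical scenarios, ranging from misuse of sales data to protocol violations in oncology. The results show that different frontier models respond to the same ethical prompts in markedly different ways. This raises a core question: who decides what an AI is allowed to do, and whose ethical standards should it follow? The benchmark aims to expose and quantify the inconsistencies in moral judgment across mainstream AI systems.

## Full Text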

Same prompt, different morals: how frontier AI models diverge on ethical dilemmas

Philosophy Bench puts leading language models through 100 ethical dilemmas. Claude refuses tasks rather than lie, while Grok executes almost anything users ask for.

How do AI models behave when they have to choose between duty and maximizing outcomes? The new Philosophy Bench by Benedict Brady confronts frontier models from Anthropic, Google, OpenAI, and xAI with 100 ethically complex everyday scenarios and evaluates whether their responses lean more consequentialist (outcome-oriented) or deontological (duty-oriented).

The scenarios range from a VP of Sales demanding confidential customer data before a deadline to a doctor trying to enroll a minor in an oncology study by bypassing protocol. Three judge models (Opus 4.7, GPT 5.4, Gemini 3.1 Pro) score each response, and the majority vote decides the result.
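The majority-vote step can be illustrated with a minimal Python sketch. The label strings and the `majority_label` helper are assumptions made for illustration; the article does not describe how Philosophy Bench implements its scoring.

```python
from collections import Counter


def majority_label(judge_labels):
    """Return the label picked by a strict majority of judges, or None if no label has one.

    Hypothetical helper: the label names are assumed, not taken from the benchmark.
    """
    if not judge_labels:
        raise ValueError("need at least one judge label")
    label, count = Counter(judge_labels).most_common(1)[0]
    # A strict majority means more than half of all votes.
    return label if count > len(judge_labels) / 2 else None


# Three judges, two of which agree on the same label.
votes = ["deontological", "consequentialist", "deontological"]
print(majority_label(votes))
```

With three judges and only two possible labels a majority always exists; the `None` branch matters only if judges can abstain or more than two labels are in play.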

The result: Anthropic's Claude models from the 4.5+ generation are the most strongly deontological models in the benchmark. Opus 4.7 complies with only 24 percent of user requests that would violate a deontological principle. Claude diverges most sharply from other models on honesty, preferring to refuse a task outright rather than break a norm. The Claude Constitution explicitly states that Claude's honesty standards should be "substantially higher" than typical human ethical expectations.

At the opposite end of the spectrum, xAI's Grok 4.2 is the most consequentialist frontier model. It carries out ethically charged user requests that other models refuse, with little reflection on the moral dimension.

### Gemini is the easiest to steer, GPT avoids moral language

Google's Gemini 3.1 Pro turns out to be the most "correctable" model in Philosophy Bench: it shifts its ethical alignment the most when instructed toward deontological or consequentialist behavior through the system prompt. At the same time, Gemini's refusal rate goes up with any kind of moral priming.
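One way to put a number on this kind of "correctability" is to compare how often a model's responses are labeled consequentialist under each of the two system prompts. The sketch below is an assumption for illustration only: the article does not give the benchmark's actual metric, and the function names, label strings, and sample data are hypothetical.

```python
def consequentialist_share(labels):
    """Fraction of responses labeled 'consequentialist' (assumed label name)."""
    if not labels:
        raise ValueError("need at least one label")
    return sum(1 for label in labels if label == "consequentialist") / len(labels)


def steerability(labels_deont_prompt, labels_conseq_prompt):
    """Gap in consequentialist share between the two primed runs.

    Hypothetical metric: a larger value means the system prompt moved
    the model's behavior further.
    """
    return (consequentialist_share(labels_conseq_prompt)
            - consequentialist_share(labels_deont_prompt))


# Made-up labels for four scenarios under each system prompt.
under_deont = ["deontological", "deontological", "deontological", "consequentialist"]
under_conseq = ["consequentialist", "consequentialist", "consequentialist", "deontological"]
print(steerability(under_deont, under_conseq))
```

A model that ignores the system prompt entirely would score near zero on this measure, while a highly steerable one would approach 1.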

OpenAI's GPT-5 family makes fewer outright mistakes than any other model family (12.8 percent error rate), but the models largely avoid moral language in their reasoning. According to the benchmark, they lean heavily on user preferences and show little independent ethical reflection.

Across all model families, priming is asymmetric: when models are primed with deontological thinking (rule-based ethics), they become much more skeptical of consequentialist arguments (ends-justify-the-means reasoning). Priming them toward consequentialism has a weaker effect.

### A market where ethics become product features

A market is emerging where ethical stances work like product features. Claude is seen as the conscientious model, Grok as the obedient one, and GPT as the pragmatic choice.

The benchmark's authors see a fundamental tension here. Models like Claude make ethical calls that directly override what users want. But as AI agents grow more powerful, the question of whether responsible behavior or user control should take priority becomes more urgent.

This matters even more as AI models start handling tasks beyond text. Once they're reviewing contracts, triaging patients, or evaluating employees, someone has to answer the hard questions: Who decides what an AI is allowed to do? And whose ethics is it following?

