Why you shouldn't rely on the default model selection in Copilot and other AI tools

2026-05-24 18:17 · 39 days ago · Matthias Bastian

AI summary

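An experiment by mathematician Adam Kucharski shows that when Microsoft Copilot was asked to analyze two groups of data that were identical except for their country labels, it failed to recognize that they were the same and instead invented an analysis based on national stereotypes. This exposes a risk of systematic bias in many current AI tools under their default configuration. Reasoning-capable "thinking models" can spot this kind of data trap, but users have to know about them and actively enable them. The finding is a warning that for critical data analysis, users should not blindly rely on an AI tool's default model, but should choose the model carefully and evaluate its results.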

Original text

Why you shouldn't leave model selection on default in Copilot, Gemini and other AI tools

An experiment shows how Microsoft's AI assistant Copilot applies stereotypes when analyzing data instead of actually reading it. Thinking models solve the task but sometimes need users to know their tools.

Microsoft Copilot has become the go-to tool for quick data analysis at many companies. But an experiment by mathematician Adam Kucharski shows that when analyzing text data, the tool can spit out results that have nothing to do with the actual data. Instead, it falls back on stereotypes baked into the underlying language model.

For the test, Kucharski created 2,000 simulated free-text responses about emotions and labeled them "UK." He then copied the same 2,000 responses and labeled them "US." The combined 4,000 entries were shuffled and handed to Copilot in "Auto" mode for analysis.
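A label-only control of this kind is straightforward to build and verify locally before handing anything to an assistant. The following is a minimal sketch, not Kucharski's actual code: the function names and the three sample responses are hypothetical stand-ins, and the article does not describe what tooling he used.

```python
import random
from collections import Counter


def build_label_control(responses, labels, seed=0):
    """Copy the same responses under each label and shuffle the combined rows."""
    rows = [(label, text) for label in labels for text in responses]
    random.Random(seed).shuffle(rows)
    return rows


def groups_are_identical(rows):
    """Return True if every label carries exactly the same multiset of responses."""
    by_label = {}
    for label, text in rows:
        by_label.setdefault(label, Counter())[text] += 1
    counters = list(by_label.values())
    return all(counter == counters[0] for counter in counters)


# Hypothetical stand-ins for the simulated free-text responses.
responses = [
    "I feel calm today.",
    "I am anxious about work.",
    "I feel calm today.",
]
rows = build_label_control(responses, ["UK", "US"])
print(len(rows), groups_are_identical(rows))
```

If `groups_are_identical` returns `True`, any difference between the groups that an assistant reports cannot have come from the data.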

The result: Copilot delivered a detailed summary of how US and UK respondents supposedly differed. "Based on the dataset you shared, US and UK responses differ mainly in tone, intensity, and wording style, even though they express similar emotional states," the tool concluded. But the data was identical.

Copilot sees Italians as artists and Americans as business people

In a second experiment, Kucharski pushed harder. He had a language model generate 200 statements about career goals and copied the dataset five times for the US, UK, France, Germany, and Italy.

Copilot again produced country-specific differences: Italians were three times more likely to show interest in arts careers than Brits, and Americans were 1.5 times more business-oriented than the French. All five groups contained the same clichéd and biased statements.

When Kucharski asked Copilot to dig deeper, the tool first ran a simple keyword-based count. As expected, it returned identical results for all countries. But Copilot ignored its own finding. Instead, it offered a quantified analysis that once again showed made-up differences, this time with completely fabricated percentages.
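A keyword-based count of the kind described here can be reproduced in a few lines, which makes it a cheap cross-check on any percentages an assistant reports. This is a sketch under stated assumptions: the keyword lists and sample statements are invented for illustration, and the article does not say which keywords Copilot actually counted.

```python
from collections import Counter

# Hypothetical keyword lists; the article does not specify the real ones.
CATEGORY_KEYWORDS = {
    "arts": ("artist", "design", "music"),
    "business": ("business", "startup", "manager"),
}


def keyword_counts(statements):
    """Count statements that mention at least one keyword per category."""
    counts = Counter()
    for text in statements:
        lowered = text.lower()
        for category, keywords in CATEGORY_KEYWORDS.items():
            if any(keyword in lowered for keyword in keywords):
                counts[category] += 1
    return counts


# Hypothetical stand-ins for the generated career-goal statements.
statements = [
    "I want to become a musician.",
    "I plan to run my own business.",
    "I hope to work as a designer.",
    "I want to be a nurse.",
]
countries = ["US", "UK", "France", "Germany", "Italy"]

# Every country receives the same copied statements, as in the experiment.
per_country = {country: keyword_counts(statements) for country in countries}
baseline = per_country["US"]
differing = [c for c, counts in per_country.items() if counts != baseline]
print(differing)
```

An empty `differing` list means the counts match across all five labels, so a ratio such as "three times more likely" has no basis in the data.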

The Decoder: AI News (RSS)


Read the original · the-decoder.com
