Why you shouldn't leave model selection on default in Copilot, Gemini and other AI tools
An experiment shows how Microsoft's AI assistant Copilot applies stereotypes when analyzing data instead of actually reading it. Thinking models solve the task, but getting to them sometimes requires users to know their tools.
Microsoft Copilot has become the go-to tool for quick data analysis at many companies. But an experiment by mathematician Adam Kucharski shows that when analyzing text data, the tool can spit out results that have nothing to do with the actual data. Instead, it falls back on stereotypes baked into the underlying language model.
For the test, Kucharski created 2,000 simulated free-text responses about emotions and labeled them "UK." He then copied the same 2,000 responses and labeled them "US." The combined 4,000 entries were shuffled and handed to Copilot in "Auto" mode for analysis.
The result: Copilot delivered a detailed summary of how US and UK respondents supposedly differed. "Based on the dataset you shared, US and UK responses differ mainly in tone, intensity, and wording style, even though they express similar emotional states," the tool concluded. But the data was identical.
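The setup described above is easy to reproduce and, more importantly, easy to check before trusting any summary. The following is a minimal Python sketch, not Kucharski's actual code: the function names and the placeholder responses are assumptions made for illustration. It copies one set of responses under each label, shuffles the rows, and confirms that the labeled groups contain exactly the same texts.

```python
import random
from collections import Counter


def build_duplicated_dataset(responses, labels, seed=0):
    """Copy the same responses once per label and shuffle the combined rows."""
    rows = [(label, text) for label in labels for text in responses]
    random.Random(seed).shuffle(rows)  # seeded so the shuffle is repeatable
    return rows


def groups_identical(rows):
    """Return True if every label carries exactly the same multiset of texts."""
    by_label = {}
    for label, text in rows:
        by_label.setdefault(label, Counter())[text] += 1
    counters = list(by_label.values())
    return all(counter == counters[0] for counter in counters)


# Placeholder texts (assumption): the experiment used 2,000 simulated
# free-text responses about emotions, which are not reproduced here.
responses = [f"simulated response {i}" for i in range(2000)]
rows = build_duplicated_dataset(responses, ["UK", "US"])

print(len(rows))               # 4000
print(groups_identical(rows))  # True
```

If `groups_identical` returns True, any reported difference between the groups cannot come from the data.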
Copilot sees Italians as artists and Americans as business people
In a second experiment, Kucharski pushed harder. He had a language model generate 200 statements about career goals and copied the dataset five times for the US, UK, France, Germany, and Italy.
Copilot again produced country-specific differences: Italians were supposedly three times more likely to show interest in arts careers than Brits, and Americans 1.5 times more business-oriented than the French. Yet all five groups contained the same clichéd and biased statements.
When Kucharski asked Copilot to dig deeper, the tool first ran a simple keyword-based count. As expected, it returned identical results for all countries. But Copilot ignored its own finding. Instead, it offered a quantified analysis that once again showed made-up differences, this time with completely fabricated percentages.
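A keyword-based count of the kind mentioned above is deterministic, which is why it cannot invent differences between identical groups. The sketch below illustrates the idea in Python under stated assumptions: the keyword lists, the four sample statements, and the function name `keyword_shares` are hypothetical stand-ins, since the article does not say which keywords Copilot used or reproduce the 200 statements.

```python
from collections import Counter

# Hypothetical keyword lists (assumption, not taken from the experiment).
KEYWORDS = {
    "arts": ["artist", "music", "design"],
    "business": ["business", "startup", "manager"],
}


def keyword_shares(rows):
    """Return {country: {category: share of statements containing a keyword}}."""
    totals = Counter()
    hits = {}
    for country, text in rows:
        totals[country] += 1
        lowered = text.lower()
        for category, words in KEYWORDS.items():
            if any(word in lowered for word in words):
                hits.setdefault(country, Counter())[category] += 1
    return {
        country: {
            category: hits.get(country, Counter())[category] / totals[country]
            for category in KEYWORDS
        }
        for country in totals
    }


# Invented placeholder statements, copied once per country as in the experiment.
statements = [
    "I want to become an artist.",
    "I plan to start my own business.",
    "I hope to work as a nurse.",
    "My goal is to be a manager.",
]
countries = ["US", "UK", "France", "Germany", "Italy"]
rows = [(country, text) for country in countries for text in statements]

shares = keyword_shares(rows)
baseline = shares["UK"]
print(all(country_shares == baseline for country_shares in shares.values()))  # True
```

With identical inputs every country gets the same shares, so any ratio between two countries is exactly 1. A claimed "three times more likely" is then a signal that the model's narrative has drifted away from its own count.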