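A study found that most participants adopted an AI's advice after a 20-minute discussion with it about health, careers, or relationships, but showed no sustained improvement in well-being 2-3 weeks later. This suggests that large language models have significant influence over real-world personal decisions, yet fail to deliver measurable psychological benefits. In response, the main tweet argues that the fact that advanced models such as GPT-4o and Llama 3.3-80B did no significant harm is just as important as whether the AI helped. It also notes that if older (less accurate, more sycophantic) chatbots had essentially no effect on people who followed their advice, then their risk of causing harm is lower as well. This underscores that assessing AI's impact requires weighing both its potential to help and its potential risks.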
I think the fact that GPT-4o and Llama 3.3-80B did no significant harm is just as important as whether AI helped.
If older (less accurate & more sycophantic) chatbots essentially did nothing for people who followed their advice, it means that there is less risk of harm as well.