Rohan Paul (@rohanpaul_ai)

2026-05-26 06:32

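AI Summary

Google's latest paper argues that the core of the LLM hallucination problem is that models sound certain when they should hesitate, rather than simply getting facts wrong. The paper shifts the optimization target from perfect factual accuracy to having the model honestly distinguish "I know this" from "I am guessing." The authors propose the concept of "faithful uncertainty," which requires the model's wording to match its internal confidence. The post also introduces the "utility tax," which explains why products lean toward confident but possibly wrong answers. For agents, metacognition is crucial: it determines when to call a tool and when to trust a source.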

A new Google paper says LLMs should stop feigning certainty and instead show clearly when they are unsure.

Hallucination is less about machines being wrong than about machines sounding certain when they should hesitate.

That distinction changes the target problem.

The paper changes the target from making models perfectly factual to making them honest about their own uncertainty.

For years, the obvious goal has been to make language models know more, so they make fewer factual mistakes.

Perfect factuality may be very hard, but a model that clearly separates "I know this" from "I am guessing" can stay useful without quietly damaging trust.

This paper argues that the harder missing skill is not knowledge, but self-knowledge.

A model can be well calibrated in the broad sense, knowing that answers of a given kind are correct about 60% of the time, yet still fail to identify which particular answer is the dangerous one.

That is the trap: to eliminate errors, the system must refuse many answers that would have been right.

The authors call this the utility tax, and it explains why products keep drifting toward confident usefulness rather than cautious truth.
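The utility tax can be made concrete with a toy calculation. The sketch below is not from the paper: the function name `utility_tax`, the confidence scores, and the thresholds are all invented for illustration. It assumes the system answers only when its confidence clears a threshold, then counts how many correct answers get refused along the way.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data: (model confidence, whether the answer was correct).
# These numbers are made up for illustration, not taken from the paper.
ANSWERS: List[Tuple[float, bool]] = [
    (0.95, True), (0.90, True), (0.85, False), (0.80, True), (0.70, True),
    (0.65, False), (0.60, True), (0.55, True), (0.40, False), (0.30, True),
]


def utility_tax(answers: List[Tuple[float, bool]], threshold: float) -> Dict[str, int]:
    """Answer only when confidence >= threshold; refuse everything else."""
    answered = [ok for conf, ok in answers if conf >= threshold]
    refused = [ok for conf, ok in answers if conf < threshold]
    return {
        "answered": len(answered),
        "errors_shown": sum(1 for ok in answered if not ok),
        # Correct answers the user never sees: the "tax".
        "correct_refused": sum(1 for ok in refused if ok),
    }


always_answer = utility_tax(ANSWERS, threshold=0.0)
zero_errors = utility_tax(ANSWERS, threshold=0.9)
print(always_answer)
print(zero_errors)
```

In this toy set, one error carries a confidence of 0.85, so the threshold has to rise above that to remove every error, and most of the correct answers are refused with it.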

Here's the key point.

A wrong answer wrapped in honest uncertainty is not the same social object as a wrong answer delivered as fact.

It gives the user a different instruction: verify this, treat it as provisional, do not build too much on it.

The proposed fix is "faithful uncertainty," where the model's language mirrors its internal confidence instead of smoothing doubt into authority.
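One rough way to picture faithful uncertainty is a mapping from an internal confidence score to the wording of the reply. The sketch below is an assumption-laden illustration, not the paper's method: the function `verbalize`, the cutoffs 0.9, 0.6, and 0.3, and the phrasings are all hypothetical, and it takes for granted that a usable confidence score exists in the first place.

```python
def verbalize(answer: str, confidence: float) -> str:
    """Wrap an answer in wording that matches the stated confidence.

    The cutoffs below are arbitrary illustrative choices.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= 0.9:
        # High confidence: state the answer plainly.
        return answer
    if confidence >= 0.6:
        # Moderate confidence: flag the doubt instead of hiding it.
        return f"I believe the answer is: {answer} (I am not certain)."
    if confidence >= 0.3:
        # Low confidence: mark it as a guess and ask for verification.
        return f"This is a guess, please verify it: {answer}"
    # Very low confidence: decline rather than bluff.
    return "I don't know."


print(verbalize("Canberra", 0.95))
print(verbalize("Canberra", 0.70))
print(verbalize("Canberra", 0.40))
print(verbalize("Canberra", 0.10))
```

The point of the mapping is that the same answer reaches the user with a different instruction attached, depending on how sure the model actually is.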

For agents, this becomes even more important, because uncertainty is what should decide when to search, when to trust a source, and when to stop.

Tools expand what a model can access, but metacognition governs whether access is used wisely.
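For the agent case, the idea that uncertainty should drive tool use can be sketched as a small decision rule. Everything here is hypothetical: the `Action` enum, the `next_action` function, the 0.8 answer threshold, and the search budget of 3 are illustrative assumptions, not anything the paper specifies.

```python
from enum import Enum


class Action(Enum):
    ANSWER = "answer"  # confident enough to reply directly
    SEARCH = "search"  # not confident, but a tool call may still help
    STOP = "stop"      # budget exhausted: report the uncertainty and stop


def next_action(
    confidence: float,
    searches_used: int,
    answer_threshold: float = 0.8,  # assumed cutoff, chosen for illustration
    max_searches: int = 3,          # assumed tool budget
) -> Action:
    """Pick the next step from the agent's current confidence."""
    if confidence >= answer_threshold:
        return Action.ANSWER
    if searches_used < max_searches:
        return Action.SEARCH
    return Action.STOP


print(next_action(confidence=0.9, searches_used=0))
print(next_action(confidence=0.5, searches_used=1))
print(next_action(confidence=0.5, searches_used=3))
```

The rule is only as good as the confidence signal feeding it. That is the post's argument: an agent with poor self-knowledge will search when it does not need to, or answer when it should have searched.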

----

Paper Link - arxiv.org/abs/2605.01428v1

Paper Title: "Hallucinations Undermine Trust; Metacognition is a Way Forward"

Google · Safety/Alignment · Papers/Research
