Claude 的新模型在"出错时更'诚实'"
阅读原文· theverge.comAnthropic 在周四发布了其最新模型 Claude Opus 4.8。新模型在生成错误内容时,更倾向于主动标示不确定性,并减少做出无根据的断言。在内部评估中,其产出未经证实断言的可能性比前代模型降低约 4 倍。
AI
News
Anthropic
Claude’s new model is more ‘honest’ when it messes up
Opus 4.8 is launching on Thursday.
Opus 4.8 is launching on Thursday.
Anthropic is releasing Claude Opus 4.8 on Thursday, and the company is touting the model’s “honesty.”
According to Anthropic, it trains “all [its] models to be honest — for instance, to avoid making claims that they can’t support.” But it notes that “a general problem with AI models is that they sometimes jump to conclusions, confidently presenting their work as making progress despite thin evidence.”
The AI lab claims that early testers have found that Opus 4.8 “is more likely to flag uncertainties about its work and less likely to make unsupported claims.” In the company’s evaluations, Opus 4.8 is “around 4x less likely than its predecessor to allow flaws in code it’s written to pass unremarked.”