Rohan Paul@rohanpaul_ai

2026-06-13 06:23 · 20 days ago

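AI summary

A study in Nature Medicine found that general-purpose large language models have overtaken dedicated medical AI products on physician-reviewed clinical tasks. The study compared OpenEvidence and UpToDate Expert AI with GPT-5.2, Gemini 3.1 Pro, and Claude Opus 4.6 on medical exam questions, clinician-style answers, and real-time clinical questions. On 100 de-identified physician questions drawn from real clinical settings, blinded physician reviewers preferred the frontier models, particularly for the completeness and clarity of their answers.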

A Nature Medicine study found general-purpose LLMs are now outperforming dedicated medical AI products on physician-reviewed clinical tasks.

The authors compared OpenEvidence and UpToDate Expert AI with GPT-5.2, Gemini 3.1 Pro, and Claude Opus 4.6 on medical exam questions, clinician-style answers, and real questions doctors asked during care.

In 100 de-identified physician questions from live clinical use, blinded clinicians again preferred the frontier models, especially on completeness and clarity.

Anthropic Google OpenAI Papers/Research
