《自然·医学》一项研究发现,通用大语言模型在经医生评审的临床任务上已超越专用医疗 AI 产品。研究对比了 OpenEvidence、UpToDate Expert AI 与 GPT-5.2、Gemini 3.1 Pro、Claude Opus 4.6 在医学考试题、医生风格回答及实时临床提问上的表现。在来自真实临床场景的 100 个脱敏医生问题中,盲审医生更偏好前沿模型,尤其在其回答的完整性和清晰度方面。
A Nature Medicine study found general-purpose LLMs are now outperforming dedicated medical AI products on physician-reviewed clinical tasks.
The authors compared OpenEvidence and UpToDate Expert AI with GPT-5.2, Gemini 3.1 Pro, and Claude Opus 4.6 on medical exam questions, clinician-style answers, and real questions doctors asked during care.
In 100 de-identified physician questions from live clinical use, blinded clinicians again preferred the frontier models, especially on completeness and clarity,