# LLMs' Weak Spot in Medical Diagnosis: Poor Early Differential Diagnosis

- Source: Rohan Paul (@rohanpaul_ai)
- Published: 2026-04-14 09:38
- AIHOT link: https://aihot.virxact.com/items/cmny26ux702g8sl9ogf7rymmu
- Original link: https://x.com/rohanpaul_ai/status/2043866379540148327

## AI Summary

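A study ran 21 LLMs through 29 clinical cases in a staged test and found that they performed poorly at the hardest part of medical diagnosis: early differential diagnosis. Faced with incomplete, scattered symptoms, all models failed on more than 80% of the early tasks, often resolving uncertainty prematurely instead of listing several possible causes. Once exam findings and lab results were added to the case data, failure rates fell below 40%, and the best systems reached 90% accuracy on the final diagnosis. The results show that current AI still has major limits on diagnostic reliability when information is incomplete.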

## Full Text

AI chatbots are still poor at the hardest part of medicine: figuring out what might be wrong before the full picture is available.

The study tested 21 LLMs on 29 clinical cases revealed step by step, which matters because real diagnosis usually starts with scattered symptoms, not neat final answers.

The weak spot was differential diagnosis, which means listing several plausible causes early instead of locking onto one answer too fast.

When the case data was incomplete, all models failed on more than 80% of these early diagnostic tasks, showing that they often collapse uncertainty too early.

When fuller details such as exam findings and lab results were added, failure rates dropped below 40%, and the best systems passed 90% accuracy on the final diagnosis.

---

ft.com/content/b10002fc-5fff-4e4d-bf64-0502b2d09bb1?syn-25a6b1a6=1

The study

jamanetwork.com/journals/jamanetworkopen/fullarticle/2847679