Rohan Paul@rohanpaul_ai

2026-06-15 23:48 · 17 days ago

AI summary

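Clinical search tool Heidi Evidence says that six weeks ago its in-house small model matched the quality of the frontier-scale model Sonnet 4.6 on clinical search tasks. The approach was to train on clinician preference feedback rather than simply scale up the model. In blinded tests, clinicians shown the same medical question with two anonymous answers chose the Heidi small model's answer 49.9% of the time. Heidi notes that the key difficulty in medicine is knowing when to search, what to cite, how much to say, and when a vague answer is worse than no answer.

"You don't need frontier scale to reach frontier quality" in specialized domains; you need the right expert feedback loop.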

Heidi says it matched Sonnet 4.6 in clinical search with a much smaller model trained on clinician preferences instead of raw scale.

Heidi Evidence is a clinical search tool where doctors ask medical questions and get sourced answers.

Here, clinicians were shown the same medical question with 2 anonymous answers, one from Heidi's smaller model and one from Sonnet 4.6, and they picked Heidi's answer 49.9% of the time.
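A 49.9% pick rate in a two-way blinded comparison reads as parity only if the interval around it is reasonably tight, and that depends on how many comparisons were run. The post does not give that number, so the sketch below uses a hypothetical count (499 wins out of 1000) purely to illustrate the arithmetic with a standard Wilson score interval. The function names are illustrative, not anything from Heidi.

```python
import math


def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def parity_is_plausible(wins: int, n: int, z: float = 1.96) -> bool:
    """True when a 50/50 preference split lies inside the interval."""
    lo, hi = wilson_interval(wins, n, z)
    return lo <= 0.5 <= hi


# ASSUMPTION: the post reports only the 49.9% rate, not the sample size.
# These counts are hypothetical and chosen just to reproduce that rate.
wins, n = 499, 1000
lo, hi = wilson_interval(wins, n)
print(f"win rate = {wins / n:.3f}, ~95% interval = [{lo:.3f}, {hi:.3f}]")
print("parity plausible:", parity_is_plausible(wins, n))
```

With far fewer comparisons the interval widens, so the same 49.9% would say much less about whether the two models are really indistinguishable to clinicians.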

In medicine specifically, the hard problem is knowing when to search, what to cite, how much to say, and when a vague answer is worse than no answer.

Tom Kelly: There's been debate in the last couple days about whether general models beat specialized medical AI. It's the wrong question. This is an argument about how to ...

Anthropic · Data/Training · Evaluation/Benchmarks
