# Heidi Evidence's Small Model Matches Sonnet 4.6 on Clinical Search Quality

- Source: Rohan Paul (@rohanpaul_ai)
- Published: 2026-06-15 23:48
- AIHOT score: 54
- AIHOT link: https://aihot.virxact.com/items/cmqfewo3p00tysl2aiq2g8nbj
- Original link: https://x.com/rohanpaul_ai/status/2066548487093940514

## AI Summary

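Clinical search tool Heidi Evidence says that six weeks ago its in-house small model matched the quality of the frontier-scale model Sonnet 4.6 on clinical search tasks. The approach was to train on clinician preference feedback rather than simply scaling up the model. In a blinded test, doctors were shown the same medical question with two anonymous answers and chose the Heidi small model's answer 49.9% of the time. Heidi notes that the key difficulty in medicine is knowing when to search, what to cite, how much to say, and when a vague answer is worse than no answer.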

## Body

"You don't need frontier scale to reach frontier quality" in specialized domains; you need the right expert feedback loop.

Heidi says it matched Sonnet 4.6 in clinical search with a much smaller model trained on clinician preferences instead of raw scale.

Heidi Evidence is a clinical search tool where doctors ask medical questions and get sourced answers.

Here, clinicians were shown the same medical question with two anonymous answers, one from Heidi's smaller model and one from Sonnet 4.6, and they picked Heidi's answer 49.9% of the time.
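
To illustrate why a pick rate near 50% in a blinded pairwise comparison reads as parity, here is a minimal Python sketch that computes a win rate and a Wilson score interval. The post does not report how many comparisons were run, so the counts below are hypothetical placeholders chosen only to reproduce a 49.9% rate; the function and variable names are likewise illustrative and are not Heidi's evaluation code.

```python
import math


def wilson_interval(wins: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = wins / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


# HYPOTHETICAL counts, not from the source: 499 wins out of 1000 comparisons.
wins, total = 499, 1000

win_rate = wins / total
low, high = wilson_interval(wins, total)

# If 0.5 falls inside the interval, the data are consistent with the two
# models being preferred equally often.
parity_plausible = low <= 0.5 <= high

print(f"win rate: {win_rate:.3f}, interval: ({low:.3f}, {high:.3f})")
print(f"parity plausible: {parity_plausible}")
```

How wide the interval is depends entirely on the number of comparisons, which is why the same 49.9% figure would be weaker evidence of parity with a small sample than with a large one.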

In medicine specifically, the hard problem is knowing when to search, what to cite, how much to say, and when a vague answer is worse than no answer.

### Quoted Tweet

> Tom Kelly: There's been debate in the last couple days about whether general models beat specialized medical AI. It's the wrong question. This is an argument about how to ...