Wisedocs 发布 Medical Long Context Reasoning (MLCR) 基准,测试 LLM 对真实医疗档案的长文档推理能力。评测包含 250 个问题,横跨 6 个难度等级,另设私有保留集,涵盖复杂医学推理、幻觉检测及单次查询中的并行提问。Wisedocs 同步开源 10 个合成病例、低三级问题及评估工具。Artificial Analysis 将合作上线该基准。
Introducing MLCR, a novel Medical Long Context Reasoning benchmark. Our eval measures the ability of LLMs to answer real...