Ethan Mollick (@emollick)

2026-06-28 12:54 · 5 hours ago

AI summary

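To address the problem that AI research results go out of date during long peer-review cycles, a medical AI paper open-sourced its evaluation framework (GitHub: health-ai-readiness-eval). @yishan used the framework to rerun the tests on the latest models: GPT-5.5 Pro scored 79/100 on radiology image interpretation, ahead of the best model in the original paper (69/100), but still short of the paper's bar for being "fit for reliable medical use" (which requires robustness to perturbation, recognizing when information is insufficient, and giving clinically sound reasoning). @yishan could not fully reproduce the qualitative evaluation, but the basic tests indicate that the latest models, although improved, are not yet reliable enough for clinical use. He calls on all AI papers to open-source their experimental frameworks so the community can keep validating the results.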

Nice example of the increasing benefits of open science and transparent methodologies when writing papers about AI.

Yishan: A big problem with research studies on AI models is that given how long the peer review process is, the results are always out-of-date by the time the paper is ...

OpenAI · Multimodal · Reasoning · Evaluation/Benchmarks

View the original post on X
