# The Gap Between Open-Weight Models' Real Capabilities and Their Benchmark Results

- Source: Ethan Mollick (@emollick)
- Published: 2026-05-30 22:55
- AIHOT score: 61
- AIHOT link: https://aihot.virxact.com/items/cmpshsl2001lnsluzlghm5q8x
- Original link: https://x.com/emollick/status/2060736941453189622

## AI Summary

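Using its aggregate metric, the Epoch Capabilities Index, Epoch AI finds that the capability gap between open-weight and closed models averages about three months. The author of the main tweet is skeptical, however: he argues that open-weight large language models are more fragile in practice, especially on out-of-distribution tasks, than their benchmark scores suggest, and that the gap as actually experienced may be far more than three or four months.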

## Post

I think Epoch does a great job benchmarking, but I continue to believe that open weights models are much more fragile, especially out-of-distribution, than their benchmarks indicate. Vibe-wise, I don't think they were only 3 months behind last year or only 4 months behind today.

### Quoted Tweet

> Epoch AI: We measure the gap using the Epoch Capabilities Index, our aggregate measure of model capability. Compared to our last analysis, the gap has widened slightly - ...
