# The Gap Between Open-Source and Closed-Source LLMs

- Source: Hacker News top stories (buzzing.cc Chinese translation)
- Author: kkm
- Published: 2026-06-27 08:43
- AIHOT score: 58
- AIHOT link: https://aihot.virxact.com/items/cmqvnobb70emrsl802qhq8m65
- Original link: https://blog.doubleword.ai/frontier-os-llm

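## AI Summary

On the Artificial Analysis Intelligence Index, the performance gap between open-source and closed-source LLMs has been shrinking steadily since summer 2024, and a linear extrapolation predicts it will reach zero on December 3, 2026. Averaged across all 18 benchmarks, however, the gap has stayed almost constant at just under 5 months. The gap on the coding benchmark shrank from 15 months to 1-2 months, while the gap on most other benchmarks widened slightly. The analysis suggests that a single benchmark of LLM quality can lead to misleading conclusions, and that the overall gap has not narrowed significantly.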

## Full Text

Interactive plot of the Artificial Analysis Intelligence Index for open and closed frontier models.

I have seen a version of the above plot going around Twitter and wanted to dig a bit deeper into it. The plot shows the gap between open weights LLMs and closed source LLMs. We measure this gap by taking the frontier of open weights performance on a benchmark and looking back to see how long ago the closed source frontier was at that level. It is a measure of how long it took open source models to catch up to the capabilities reached by the closed source frontier. The benchmark here is the Artificial Analysis Intelligence Index, their headline index that tries to assess the overall capabilities of models. In general it correlates quite well with the ‘vibe’ people seem to get from models.
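The lag measurement described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code behind the plot: the function names, the `(release_date, score)` input format, the 30.44-day month, and the sample scores are all assumptions made for the example.

```python
from datetime import date

DAYS_PER_MONTH = 30.44  # assumption: average month length used to convert days to months


def running_frontier(releases):
    """Return [(date, best_score_so_far)] for each release that sets a new best score."""
    frontier = []
    best = float("-inf")
    for day, score in sorted(releases):
        if score > best:
            best = score
            frontier.append((day, best))
    return frontier


def lag_in_months(open_releases, closed_releases, as_of):
    """Months between `as_of` and the date the closed frontier first reached
    the best open score available on `as_of`."""
    open_scores = [score for day, score in open_releases if day <= as_of]
    if not open_scores:
        return None  # no open model released yet
    open_best = max(open_scores)
    for day, score in running_frontier(closed_releases):
        if day > as_of:
            break
        if score >= open_best:
            return (as_of - day).days / DAYS_PER_MONTH
    return 0.0  # closed frontier has not reached this level yet, so there is no lag


# Hypothetical scores, not real benchmark data.
closed = [(date(2024, 1, 1), 50), (date(2024, 6, 1), 60), (date(2025, 1, 1), 70)]
open_weights = [(date(2024, 4, 1), 45), (date(2024, 12, 1), 60)]

lag = lag_in_months(open_weights, closed, as_of=date(2025, 1, 1))
print(f"open frontier lag: {lag:.1f} months")
```

In this made-up example the best open score on 2025-01-01 is 60, a level the closed frontier first reached on 2024-06-01, so the lag comes out at roughly seven months.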

You can see that around summer 2024 the gap on this benchmark starts to shrink, and it has been reliably shrinking since then. If you plot a line of best fit and extend it into the future, you find that the gap shrinks to 0 months around December 3rd 2026, 6 months or so from the time of writing.
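The extrapolation step is an ordinary least-squares line through the (date, lag) points, solved for the date where the fitted lag hits zero. Here is a rough sketch of that arithmetic; the helper names and the sample lags are hypothetical and are not the data behind the December 3rd figure.

```python
from datetime import date, timedelta


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def zero_crossing_date(dates, lags_months):
    """Date at which the fitted line reaches a lag of 0, or None if the gap is not shrinking."""
    origin = dates[0]
    xs = [(d - origin).days for d in dates]  # days since the first observation
    slope, intercept = fit_line(xs, lags_months)
    if slope >= 0:
        return None  # flat or growing gap never reaches zero
    return origin + timedelta(days=round(-intercept / slope))


# Hypothetical observations: the lag drops by one month every ten days.
dates = [date(2025, 1, 1), date(2025, 1, 11), date(2025, 1, 21)]
lags = [3.0, 2.0, 1.0]
print(zero_crossing_date(dates, lags))
```

A straight line is the simplest possible model here, and the predicted date is very sensitive to which points are included in the fit.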

Now is probably a good time to liquidate your pension, fly to a remote island somewhere, and live out the remaining 6 months or so of civilization in peace.

…

Except.

This might not be the whole picture. This is only a single benchmark, and doesn’t give a complete picture of the capabilities of LLMs. Kindly, Artificial Analysis gives us access to 18 different benchmarks that they have measured for these models. I have repeated the analysis for all the 18 different benchmarks and I have summarized them in the plot below:

Interactive boxplot of monthly open frontier lag across Artificial Analysis metrics.

For each of the 18 datasets we have created a similar chart. You can see all 18 at the bottom of the page. For each month we have created a box plot of the gaps across datasets, and then plotted all the box plots over time. We have also calculated the average of the gaps across datasets, and calculated a line of best fit for that. That line is almost completely flat, at just under 5 months for the entire period.
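To make the aggregation concrete, here is a small sketch of the per-month summary: box plot statistics across benchmarks, the mean gap, and the slope of a best-fit line through the means. The benchmark names and lag values are invented so that the mean stays flat while one benchmark shrinks quickly; they are not the Artificial Analysis numbers.

```python
from statistics import mean, quantiles


def monthly_summary(lags_by_benchmark):
    """Per-month box plot statistics and mean across benchmarks.

    lags_by_benchmark: {name: [lag at month 0, lag at month 1, ...]}, equal lengths.
    """
    summary = []
    for month_values in zip(*lags_by_benchmark.values()):
        q1, median, q3 = quantiles(month_values, n=4, method="inclusive")
        summary.append({"mean": mean(month_values), "q1": q1, "median": median, "q3": q3})
    return summary


def slope_per_month(values):
    """Least-squares slope of `values` against the month index 0, 1, 2, ..."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = mean(values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    return sxy / sxx


# Hypothetical lags in months: coding catches up while the others drift apart.
lags_by_benchmark = {
    "coding": [12, 9, 6, 3],
    "reasoning": [3, 4, 5, 6],
    "knowledge": [3, 4, 5, 6],
    "math": [2, 3, 4, 5],
}

summary = monthly_summary(lags_by_benchmark)
mean_gaps = [row["mean"] for row in summary]
print("mean gap per month:", mean_gaps)
print("slope of mean gap:", slope_per_month(mean_gaps))
print("slope of coding gap:", slope_per_month(lags_by_benchmark["coding"]))
```

With these invented numbers the coding gap falls by three months per month while the average gap does not move at all, which is the same shape of result as the flat line described above.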

What is notable is that a large share of the total improvement has come from the coding benchmark. The coding index has gone from 15 months behind to only a month or two behind. Most other datasets show a moderate increase in their gaps over time.

So maybe the open source apocalypse won’t happen yet.

What this exercise does suggest is the difficulty of measuring LLM quality. Depending on how you measure it, you would either predict the open source singularity by Christmas, or say that open source LLMs are consistently 5 months behind closed source and that the gap might be growing.

Interactive frontier plot for the Artificial Analysis Intelligence Index.
