Researchers just estimated the size of the major frontier LLMs by asking them knowledge questions of varying degrees of obscurity!
- GPT 5.5: ~10T params
- Claude Opus 4.x: ~4-5T
- Grok 4: ~3T
The idea here is that factual capacity scales log-linearly with size. The paper defines 7 knowledge tiers, and accuracy on T7 is ~0% for all models, suggesting there is still significant headroom for pretraining. Gemini 3.1 Pro is likely >10T, given that it's used as an anchor but has no direct estimate.
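A minimal sketch of how a log-linear relationship lets you back out a size estimate, assuming a knowledge score modeled as score = a + b * log10(params): fit the line on anchor models whose sizes are known (or assumed), then invert it for a model whose size is unknown. This is an illustration of the general idea, not the paper's actual method; the function names and anchor numbers are hypothetical, and the anchors are synthetic points generated from an exact line, not data from the paper.

```python
import math


def fit_log_linear(params, scores):
    """Ordinary least-squares fit of score = a + b * log10(params)."""
    xs = [math.log10(p) for p in params]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    b = sxy / sxx          # slope: score gained per 10x more parameters
    a = mean_y - b * mean_x  # intercept
    return a, b


def estimate_params(score, a, b):
    """Invert the fit: params = 10 ** ((score - a) / b)."""
    return 10 ** ((score - a) / b)


# Hypothetical anchors: (parameter count, accuracy %). Synthetic values
# generated from score = -160 + 20 * log10(params), NOT the paper's numbers.
anchor_params = [1e10, 1e11, 1e12]
anchor_scores = [40.0, 60.0, 80.0]

a, b = fit_log_linear(anchor_params, anchor_scores)
estimate = estimate_params(70.0, a, b)  # a model scoring 70% on the same quiz
print(f"{estimate:.3g} params")  # 3.16e+11 params
```

A straight line in log10(params) only makes sense in the unsaturated range: a tier where every model scores ~0% (like T7) or ~100% carries no size signal, which is presumably why a benchmark would need several tiers of obscurity to span a wide range of model sizes.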
This means we can, to some degree, infer what different models cost to train and how effective their post-training is (how well they perform on non-factual tasks relative to their size).
One of the coolest papers I've read of late.