On the Sizes of OpenAI API Models
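Original post: blog.eleuther.ai

The research team used the eval harness evaluation framework to compare the performance of OpenAI API models on standard benchmarks and work backward to an estimate of their parameter counts. The approach relies on the correlation between model capability and parameter count: it compares how closed models, including the GPT series, score across tasks in order to infer the model sizes that OpenAI has not publicly disclosed, giving a quantitative basis for understanding these models' actual scale and capability limits.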
OpenAI hasn't officially said anything about their API model sizes, which naturally leads to the question of just how big they are. Thankfully, we can use eval harness to evaluate the API models on a bunch of tasks and compare the results to the figures in the GPT-3 paper. Obviously, since there are going to be minor differences in task implementation, and OpenAI is probably fine-tuning their API models all the time, the numbers don't line up exactly, but they should give a pretty good idea of the ballpark things are in.
| Model | LAMBADA ppl ↓ | LAMBADA acc ↑ | Winogrande ↑ | Hellaswag ↑ | PIQA ↑ |
|---|---|---|---|---|---|
| GPT-3-124M | 18.6 | 42.7% | 52.0% | 33.7% | 64.6% |
| GPT-3-350M | 9.09 | 54.3% | 52.1% | 43.6% | 70.2% |
| Ada | 9.95 | 51.6% | 52.9% | 43.4% | 70.5% |
| GPT-3-760M | 6.53 | 60.4% | 57.4% | 51.0% | 72.9% |
| GPT-3-1.3B | 5.44 | 63.6% | 58.7% | 54.7% | 75.1% |
| Babbage | 5.58 | 62.4% | 59.0% | 54.5% | 75.5% |
| GPT-3-2.7B | 4.60 | 67.1% | 62.3% | 62.8% | 75.6% |
| GPT-3-6.7B | 4.00 | 70.3% | 64.5% | 67.4% | 78.0% |
| Curie | 4.00 | 68.5% | 65.6% | 68.5% | 77.9% |
| GPT-3-13B | 3.56 | 72.5% | 67.9% | 70.9% | 78.5% |
| GPT-3-175B | 3.00 | 76.2% | 70.2% | 78.9% | 81.0% |
| Davinci | 2.97 | 74.8% | 70.2% | 78.1% | 80.4% |
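The comparison can also be done mechanically. The following is a rough sketch, not something the eval harness does for you: it copies the four accuracy columns from the table and, for each API model, picks the GPT-3 paper model with the smallest mean absolute accuracy gap. The distance metric, the choice to leave perplexity out (it is on a different scale), and the names `PAPER`, `API`, `mean_abs_gap`, and `closest_paper_model` are all assumptions made for illustration.

```python
# Accuracies (in %) copied from the table:
# (LAMBADA acc, Winogrande, Hellaswag, PIQA)
PAPER = {
    "GPT-3-124M": (42.7, 52.0, 33.7, 64.6),
    "GPT-3-350M": (54.3, 52.1, 43.6, 70.2),
    "GPT-3-760M": (60.4, 57.4, 51.0, 72.9),
    "GPT-3-1.3B": (63.6, 58.7, 54.7, 75.1),
    "GPT-3-2.7B": (67.1, 62.3, 62.8, 75.6),
    "GPT-3-6.7B": (70.3, 64.5, 67.4, 78.0),
    "GPT-3-13B": (72.5, 67.9, 70.9, 78.5),
    "GPT-3-175B": (76.2, 70.2, 78.9, 81.0),
}

API = {
    "Ada": (51.6, 52.9, 43.4, 70.5),
    "Babbage": (62.4, 59.0, 54.5, 75.5),
    "Curie": (68.5, 65.6, 68.5, 77.9),
    "Davinci": (74.8, 70.2, 78.1, 80.4),
}


def mean_abs_gap(a, b):
    """Mean absolute difference in percentage points across the tasks."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def closest_paper_model(api_name):
    """Return the GPT-3 paper model whose accuracies are nearest (assumed metric)."""
    scores = API[api_name]
    return min(PAPER, key=lambda name: mean_abs_gap(scores, PAPER[name]))


if __name__ == "__main__":
    for api_name in API:
        match = closest_paper_model(api_name)
        gap = mean_abs_gap(API[api_name], PAPER[match])
        print(f"{api_name:8s} ~ {match} (mean gap {gap:.2f} points)")
```

With the numbers in the table, this pairs Ada with GPT-3-350M, Babbage with GPT-3-1.3B, Curie with GPT-3-6.7B, and Davinci with GPT-3-175B. Given the caveats about task implementation and fine-tuning, these are ballpark matches rather than exact sizes.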