On the Sizes of OpenAI API Models

2021-05-25 04:00

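AI summary

Using the eval harness evaluation framework, the research team compared the performance of OpenAI API models on standard benchmarks to estimate their parameter counts. The approach relies on the correlation between model capability and parameter count: it analyzes score differences across tasks for closed models, including the GPT series, to infer the model sizes that OpenAI has not publicly disclosed, giving a quantitative basis for understanding these models' actual scale and capability limits.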

Original text

OpenAI hasn't officially said anything about the sizes of their API models, which naturally raises the question of just how big they are. Thankfully, we can use the eval harness to evaluate the API models on a bunch of tasks and compare the results to the figures in the GPT-3 paper. Since there are bound to be minor differences in task implementation, and OpenAI is probably fine-tuning their API models all the time, the numbers don't line up exactly, but they should give a pretty good idea of the ballpark things are in.

| Model | LAMBADA ppl ↓ | LAMBADA acc ↑ | Winogrande ↑ | Hellaswag ↑ | PIQA ↑ |
|---|---|---|---|---|---|
| GPT-3-124M | 18.6 | 42.7% | 52.0% | 33.7% | 64.6% |
| GPT-3-350M | 9.09 | 54.3% | 52.1% | 43.6% | 70.2% |
| Ada | 9.95 | 51.6% | 52.9% | 43.4% | 70.5% |
| GPT-3-760M | 6.53 | 60.4% | 57.4% | 51.0% | 72.9% |
| GPT-3-1.3B | 5.44 | 63.6% | 58.7% | 54.7% | 75.1% |
| Babbage | 5.58 | 62.4% | 59.0% | 54.5% | 75.5% |
| GPT-3-2.7B | 4.60 | 67.1% | 62.3% | 62.8% | 75.6% |
| GPT-3-6.7B | 4.00 | 70.3% | 64.5% | 67.4% | 78.0% |
| Curie | 4.00 | 68.5% | 65.6% | 68.5% | 77.9% |
| GPT-3-13B | 3.56 | 72.5% | 67.9% | 70.9% | 78.5% |
| GPT-3-175B | 3.00 | 76.2% | 70.2% | 78.9% | 81.0% |
| Davinci | 2.97 | 74.8% | 70.2% | 78.1% | 80.4% |

All GPT-3 figures are from the GPT-3 paper; all API figures are computed using the eval harness.

Ada, Babbage, Curie and Davinci line up closely with 350M, 1.3B, 6.7B, and 175B respectively. Obviously this isn't ironclad evidence that the models are those sizes, but it's pretty suggestive.
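One rough way to make "line up closely" concrete is to find, for each API model, the GPT-3 size whose reported accuracies are nearest. The sketch below is only an illustration, not the method used in the post: the distance measure (mean absolute gap in percentage points over the four accuracy columns, ignoring perplexity) and the names `GPT3_PAPER`, `API`, and `nearest_size` are assumptions introduced here. The numbers are copied from the table above.

```python
# Columns: LAMBADA ppl, LAMBADA acc, Winogrande, Hellaswag, PIQA (accuracies in %).
# Values copied from the table in this post.
GPT3_PAPER = {
    "124M": (18.6, 42.7, 52.0, 33.7, 64.6),
    "350M": (9.09, 54.3, 52.1, 43.6, 70.2),
    "760M": (6.53, 60.4, 57.4, 51.0, 72.9),
    "1.3B": (5.44, 63.6, 58.7, 54.7, 75.1),
    "2.7B": (4.60, 67.1, 62.3, 62.8, 75.6),
    "6.7B": (4.00, 70.3, 64.5, 67.4, 78.0),
    "13B": (3.56, 72.5, 67.9, 70.9, 78.5),
    "175B": (3.00, 76.2, 70.2, 78.9, 81.0),
}
API = {
    "Ada": (9.95, 51.6, 52.9, 43.4, 70.5),
    "Babbage": (5.58, 62.4, 59.0, 54.5, 75.5),
    "Curie": (4.00, 68.5, 65.6, 68.5, 77.9),
    "Davinci": (2.97, 74.8, 70.2, 78.1, 80.4),
}


def nearest_size(scores):
    """Return the GPT-3 size whose accuracies are closest to `scores`.

    Assumed metric: mean absolute difference over the four accuracy
    columns. Perplexity (index 0) is skipped because it is on a
    different scale from the percentages.
    """
    def gap(size):
        ref = GPT3_PAPER[size]
        diffs = [abs(a - b) for a, b in zip(scores[1:], ref[1:])]
        return sum(diffs) / len(diffs)

    return min(GPT3_PAPER, key=gap)


matches = {name: nearest_size(scores) for name, scores in API.items()}
for name, size in matches.items():
    print(f"{name}: closest to GPT-3-{size}")
```

Under this particular metric the nearest sizes are the same four named above. A different choice of distance or task set could shift a borderline case, so this is a sanity check rather than proof.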

EleutherAI: Blog

Original post: blog.eleuther.ai
