GPT-2: Too Dangerous to Release (2019)

2026-06-10 04:10 · 23 days ago · AbuAssar

AI summary

In 2019, the full GPT-2 model was not publicly released because it was considered too dangerous.

Original text

GPT-2 is a direct scale-up of GPT-1, with more parameters and trained on more data. However, OpenAI deemed it too dangerous to release:

"Due to our concerns about malicious applications of the technology, we are not releasing the trained model. As an experiment in responsible disclosure, we are instead releasing a much smaller model for researchers to experiment with, as well as a technical paper."

OpenAI Blog – Better Language Models and Their Implications

GPT-1 was released to the public without such serious concerns. The statement above therefore made the public wonder how powerful GPT-2 must be at generating text that reads as if a human wrote it.

And what exactly is the difference between GPT-1 and GPT-2?

1 The Difference: GPT-1 vs. GPT-2

In the GPT-1 paper, the authors experimented with zero-shot task transfer: they used the pre-trained model, combined with heuristic solutions, to perform specific tasks. The experiment's success suggests that, even without supervised fine-tuning, the language model already contains information required to perform those tasks. All that knowledge is stored in the network parameters (weights and biases).

In other words, more parameters should increase the capacity of the language model and make it more robust on those specific tasks. In this sense, fine-tuning simply adds the final touch for a specific task, so the main thing that makes GPT-1 great is the pre-training.

So, pre-training such a model with more parameters should improve the model's performance further. Hence, GPT-2 is a direct scale-up of GPT-1, with more parameters and trained on more data. As such, GPT-1 and GPT-2 share essentially the same architecture: both are based on the transformer's decoder.

Their main difference lies in the number of parameters and in the amount and variety of training text. Together, these allow the neural network to acquire more language knowledge and understanding and to absorb it into its parameters.

The largest GPT-2 model (the one that was not released in February 2019) has 1.5 billion parameters, more than 10 times as many as GPT-1. It was trained on 40GB of web text and achieved state-of-the-art results on many language modeling benchmarks, along with promising zero-shot performance on reading comprehension, question answering, and summarization.
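To make the parameter comparison concrete, here is a rough sketch that counts the parameters of a GPT-style decoder from a handful of settings. The layer counts, widths, vocabulary sizes, and context lengths below are assumptions taken from commonly reported configurations, not values given in this article. The function name `decoder_param_count` is also made up for illustration. The formula assumes a feed-forward layer four times wider than the model, biases on every projection, and an output layer that shares weights with the token embedding.

```python
def decoder_param_count(n_layer, d_model, vocab_size, n_ctx, final_layer_norm=True):
    """Approximate parameter count of a GPT-style decoder-only transformer.

    Assumes a 4x-wide feed-forward layer, biases on every projection, and an
    output projection that shares weights with the token embedding.
    """
    # Token embedding plus learned position embedding
    embeddings = vocab_size * d_model + n_ctx * d_model
    # Query, key, value, and output projections (weights + biases)
    attention = 4 * d_model * d_model + 4 * d_model
    # d_model -> 4*d_model -> d_model (weights + biases)
    feed_forward = 8 * d_model * d_model + 5 * d_model
    # Two layer norms per block, each with a gain and a bias vector
    layer_norms = 4 * d_model
    total = embeddings + n_layer * (attention + feed_forward + layer_norms)
    if final_layer_norm:
        total += 2 * d_model
    return total


# Assumed settings (not stated in this article)
gpt1 = decoder_param_count(
    n_layer=12, d_model=768, vocab_size=40478, n_ctx=512, final_layer_norm=False
)
gpt2_largest = decoder_param_count(
    n_layer=48, d_model=1600, vocab_size=50257, n_ctx=1024
)

print(f"GPT-1:         {gpt1:,}")
print(f"GPT-2 largest: {gpt2_largest:,}")
print(f"ratio:         {gpt2_largest / gpt1:.1f}x")
```

Under these assumptions the count comes out at roughly 117 million parameters for GPT-1 and about 1.56 billion for the largest GPT-2, a ratio of around 13, which is consistent with the "more than 10 times" figure. Almost all of the growth comes from the per-layer term, which scales with the square of the model width.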

2 GPT-2: 1.5B Release

The GPT-2 paper explains that there are four configurations of GPT-2.
