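In the paper "Attention Is All You Need", a Google research team proposes a new model, the Transformer, that drops recurrent and convolutional structures such as RNNs and LSTMs entirely and relies only on self-attention, so the whole sentence is processed in parallel. The model sets new results on machine translation: 28.4 BLEU on English-to-German, more than 2 BLEU above the previous best, and 41.8 BLEU on English-to-French, at a fraction of the training cost of earlier models. The paper reports training in as little as 12 hours on 8 GPUs, and its multi-head attention lets the model learn different relationships in the data at the same time. The result marks a fundamental paradigm shift in NLP.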
(Sorry, after seeing so many of these, could not resist):
🚨 BREAKING: Google just dropped a NEW paper that completely deletes RNNs from existence.
No recurrence. No convolutions. Nothing. Just one mechanism. And it's destroying every translation benchmark on the planet.
The title alone is a flex: "Attention Is All You Need"
Vaswani. Shazeer. Parmar. Uszkoreit. Jones. Gomez. Kaiser. Polosukhin.
8 researchers. 1 architecture. The entire field of NLP will never be the same.
Here's why this is INSANE
→ LSTMs took DAYS to train. This thing trains in 12 hours on 8 GPUs. 🤯
→ 28.4 BLEU on English-to-German. That's not an improvement. That's a MASSACRE. They beat the previous SOTA by over 2 points.
→ English-to-French? 41.8 BLEU. At a FRACTION of the training cost of every model that came before it.
→ They called it the "Transformer." The name alone tells you they knew.
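
For anyone wondering what the "one mechanism" actually looks like: below is a minimal sketch of scaled dot-product self-attention in plain Python. It assumes a toy 3-token input with made-up 2-dimensional embeddings, and it skips the learned query/key/value projections and the multiple heads the paper uses. The helper names (`softmax`, `matmul`, `attention`) are my own, not from the paper or any library.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]

def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    # Every token scores every other token; no step waits on a previous one.
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights

# Toy "sentence": 3 tokens, 2-dimensional embeddings (illustrative values only).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Self-attention: queries, keys and values all come from the same sequence.
# A real layer would first apply learned projections; omitted in this sketch.
out, weights = attention(x, x, x)

for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` sums to 1 and says how much one token attends to every token in the sentence, all computed in one pass with no recurrence. Multi-head attention, roughly, runs several of these side by side on different learned projections and concatenates the results.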