Google speeds up Gemma 4 threefold with multi-token prediction

2026-05-07 00:05 · Matthias Bastian

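AI summary

Google has released multi-token prediction modules for its Gemma 4 open model family that can speed up text generation by up to three times. A small auxiliary model predicts several tokens at once, and the main model then verifies them in a single check, which makes inference significantly more efficient.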

Original text

Google speeds up Gemma 4 threefold with multi-token prediction

Google has released multi-token prediction drafters (MTP) for its open AI model family Gemma 4, designed to speed up text generation by up to three times. LLMs normally generate text one token at a time, loading billions of parameters from memory at each step. The processor's computing core spends most of its time just waiting for data, Google says.

The company's new MTP technology tackles this bottleneck. While the main model waits for its data, a small auxiliary model uses the idle capacity to suggest several tokens at once. The main model then checks all of those suggestions in a single pass and accepts the correct ones together. Because the smaller model only fills time that would otherwise go to waste, the same text is produced faster with no loss in quality or accuracy, according to Google.
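The draft-then-verify loop described above can be illustrated with a toy sketch. This is not Google's implementation or the Gemma 4 API: `target_next`, `draft_next`, and `speculative_decode` are hypothetical stand-ins, the "models" are simple arithmetic functions over integer tokens, and the verification loop merely simulates what a real system would do in one batched forward pass of the main model.

```python
from typing import Callable, List, Tuple

Token = int
NextTokenFn = Callable[[List[Token]], Token]


def target_next(context: List[Token]) -> Token:
    """Toy stand-in for the large main model (hypothetical, deterministic)."""
    return (context[-1] * 3 + 1) % 11


def draft_next(context: List[Token]) -> Token:
    """Toy stand-in for the small drafter: usually agrees with the main model,
    but guesses wrong whenever the last token is divisible by 4."""
    guess = target_next(context)
    return (guess + 1) % 11 if context[-1] % 4 == 0 else guess


def greedy_decode(next_fn: NextTokenFn, prompt: List[Token], n_new: int) -> List[Token]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(next_fn(out))
    return out


def speculative_decode(
    target: NextTokenFn, draft: NextTokenFn,
    prompt: List[Token], n_new: int, k: int = 4,
) -> Tuple[List[Token], int]:
    """Draft k tokens, verify them against the main model, keep the matching
    prefix plus one token chosen by the main model itself."""
    out = list(prompt)
    goal = len(prompt) + n_new
    target_passes = 0
    while len(out) < goal:
        # 1. The drafter proposes k tokens in a row.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2. The main model checks every proposed position. A real system
        #    does this in a single batched pass; this loop simulates it.
        target_passes += 1
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok != expected:
                accepted.append(expected)  # replace the first wrong guess
                break
            accepted.append(tok)
        else:
            # All k guesses matched, so the pass also yields one extra token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[:goal], target_passes


if __name__ == "__main__":
    baseline = greedy_decode(target_next, [1], 20)
    fast, passes = speculative_decode(target_next, draft_next, [1], 20)
    print(fast == baseline, passes)
```

In this sketch every emitted token is either confirmed or chosen by the main model, so the output matches plain one-token-at-a-time decoding while the main model runs fewer passes. That mirrors the "no loss in quality" claim under the toy assumptions only; actual speedups depend on how often the drafter's guesses are accepted.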

The speedup works on smartphones, local computers, and cloud applications. The drafters are available under the open Apache 2.0 license on Hugging Face and Kaggle. Google's Gemma 4 open-weight model, introduced in early April, has already been downloaded over 60 million times.


The Decoder: AI News (RSS)

Original article: the-decoder.com
