# Google Releases Gemma 4 QAT Checkpoints, Shrinking the Smallest Model from 11.4GB to 1.1GB

- Source: Rohan Paul (@rohanpaul_ai)
- Published: 2026-06-06 07:34
- AIHOT score: 68
- AIHOT link: https://aihot.virxact.com/items/cmq1l5wmm0gsmsltrqw0ci1n9
- Original post: https://x.com/rohanpaul_ai/status/2063041793827176568

## AI Summary

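Google has released QAT (Quantization-Aware Training) checkpoints for Gemma 4 that shrink the smallest model from 11.4GB to 1.1GB (0.84GB for the text-only version), making it easier to run on phones and laptops. Conventional PTQ (Post-Training Quantization) can hurt quality because the model never learned to cope with rounding; QAT simulates compression during training, so the model learns while its weights are being squeezed and the compressed version is less likely to lose reasoning ability. Google also built a mobile-optimized format with static activations, channel-wise quantization, targeted 2-bit quantization, and KV cache optimization, which reduces scaling computation on the phone and keeps long conversations from consuming memory too quickly.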

## Main Text

Google just made Gemma 4 much easier to run on phones and laptops by releasing QAT (Quantization-Aware Training) checkpoints that shrink the smallest model from 11.4GB to 1.1GB, or 0.84GB for text-only use.
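As a rough sanity check on these figures, the sketch below computes the reduction ratios implied by the quoted sizes, plus a generic weight-storage estimate (parameters × bits / 8). The helper `weight_bytes` and the 1B parameter count are hypothetical illustrations, not Gemma 4 specifications; real checkpoint sizes also depend on embeddings, scales, metadata, and which layers stay at higher precision.

```python
def reduction_ratio(original_gb: float, compressed_gb: float) -> float:
    """How many times smaller the compressed checkpoint is."""
    return original_gb / compressed_gb


def weight_bytes(n_params: int, bits_per_weight: int) -> int:
    """Hypothetical helper: raw storage for n_params weights at a given bit width,
    ignoring scales, metadata, and any layers kept at higher precision."""
    return n_params * bits_per_weight // 8


# Figures quoted in the post.
full_vs_qat = reduction_ratio(11.4, 1.1)         # about 10.4x smaller
full_vs_text_only = reduction_ratio(11.4, 0.84)  # about 13.6x smaller

# Hypothetical 1B-parameter model, purely to show how bit width drives size.
for bits in (16, 8, 4, 2):
    gb = weight_bytes(1_000_000_000, bits) / 1e9
    print(f"{bits:>2}-bit weights: {gb:.2f} GB")
```

Halving the bit width halves raw weight storage, so most of a roughly 10x reduction has to come from dropping well below 16 bits per weight on the bulk of the parameters.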

Normal PTQ (Post-Training Quantization) compresses a model after training, which can damage quality because the model never learned to survive that rounding.
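To make the rounding problem concrete, here is a minimal sketch of post-training quantization in its simplest form: symmetric round-to-nearest with one scale per tensor, applied to weights that are already trained. The function names and example weights are hypothetical, and production PTQ pipelines typically add calibration data and finer-grained scales.

```python
def quantize_symmetric(weights, bits):
    """Map floats to signed integers in [-qmax, qmax] using a single shared scale."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats; the leftover error is what the model must tolerate."""
    return [c * scale for c in codes]


# Hypothetical weights from an already-trained layer.
trained = [0.31, -0.07, 0.52, -0.49, 0.003]
codes, scale = quantize_symmetric(trained, bits=4)
restored = dequantize(codes, scale)
errors = [abs(a - b) for a, b in zip(trained, restored)]
print(codes)        # [4, -1, 7, -7, 0]
print(max(errors))  # at most scale / 2 for values inside the representable range
```

Every weight moves by up to half a quantization step, and nothing in training ever exposed the model to these particular perturbations, which is the gap QAT targets.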

QAT fixes this by simulating compression during training, so Gemma 4 learns while its weights are being squeezed, and the final compressed model is therefore less likely to lose reasoning quality.
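A common way to simulate compression during training is fake quantization with a straight-through estimator (STE): the forward pass uses rounded weights, while the backward pass treats rounding as the identity so the underlying float weights still receive gradients. The post does not say which QAT recipe Google used, so treat this as a generic sketch; the toy linear model, `qat_step`, and all numbers are hypothetical.

```python
def fake_quant(w, scale, qmax):
    """Quantize then immediately dequantize, so the forward pass sees rounded weights."""
    q = max(-qmax, min(qmax, round(w / scale)))
    return q * scale


def ste_grad(w, scale, qmax, upstream):
    """Straight-through estimator: pass the gradient through rounding unchanged,
    but block it for weights that fall outside the clipping range."""
    inside = abs(w / scale) <= qmax
    return upstream if inside else 0.0


def qat_step(weights, x, y, scale, qmax, lr):
    """One SGD step on a linear model y_hat = sum(w_q * x) with squared-error loss,
    where w_q are the fake-quantized versions of the latent float weights."""
    wq = [fake_quant(w, scale, qmax) for w in weights]
    y_hat = sum(a * b for a, b in zip(wq, x))
    err = y_hat - y
    new_weights = []
    for w, xi in zip(weights, x):
        grad_wq = 2 * err * xi                      # d(err**2) / d(w_q)
        grad_w = ste_grad(w, scale, qmax, grad_wq)  # rounding treated as identity
        new_weights.append(w - lr * grad_w)         # update the latent float weight
    return new_weights, err ** 2


# Hypothetical case: the two weights should sum to 1.2, but 0.6 rounds to 0.5 on a 0.25 grid.
weights, loss = qat_step([0.6, 0.6], x=[1.0, 1.0], y=1.2, scale=0.25, qmax=7, lr=0.1)
print(weights, loss)
```

The loss is computed on the quantized weights (0.5 + 0.5 = 1.0 rather than 1.2), so the update already pushes the latent weights upward, to about 0.64 each, to compensate for rounding that plain PTQ would only apply afterward. Real frameworks do the same on tensors with autograd; PyTorch, for example, provides fake-quantization ops such as `torch.fake_quantize_per_tensor_affine`.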

Google also built a mobile-focused format with static activations, channel-wise quantization, targeted 2-bit quantization, and KV cache optimization. In practice, this means the phone does less scaling work at runtime, some components involved in token generation are stored with more aggressive compression, and long chats do not eat memory as fast.
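The sketch below illustrates two of these ideas under simple assumptions. First, channel-wise quantization gives each output channel (here, each matrix row) its own scale, so a row of small weights is not flattened to zero by one large row; dropping to 2 bits shows how much coarser the grid becomes. Second, in a standard transformer with full attention, the KV cache grows linearly with conversation length, which is why shrinking it matters for long chats. The matrix values, the symmetric 2-bit scheme (which uses only three of the four available codes), and the model dimensions passed to the hypothetical `kv_cache_bytes` helper are illustrative assumptions, not Gemma 4's actual format or configuration.

```python
def quantize_rows(matrix, bits, per_channel):
    """Symmetric round-to-nearest quantization, returning dequantized rows and scales.
    per_channel=True gives each row (output channel) its own scale."""
    qmax = 2 ** (bits - 1) - 1  # 7 at 4-bit, 1 at 2-bit
    if per_channel:
        # `or 1.0` avoids dividing by zero for an all-zero row
        scales = [(max(abs(v) for v in row) / qmax) or 1.0 for row in matrix]
    else:
        shared = (max(abs(v) for row in matrix for v in row) / qmax) or 1.0
        scales = [shared] * len(matrix)
    restored = [
        [max(-qmax, min(qmax, round(v / s))) * s for v in row]
        for row, s in zip(matrix, scales)
    ]
    return restored, scales


def mean_abs_error(a, b):
    """Average absolute difference between two matrices of the same shape."""
    diffs = [abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value):
    """Hypothetical helper: keys and values (hence the factor 2) stored for every
    layer, KV head, and token seen so far."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical weight matrix: one large-magnitude row, one tiny-magnitude row.
w = [[2.0, -0.9, 0.5],
     [0.02, -0.012, 0.015]]

per_tensor_4bit, _ = quantize_rows(w, bits=4, per_channel=False)
per_channel_4bit, ch_scales = quantize_rows(w, bits=4, per_channel=True)
per_channel_2bit, _ = quantize_rows(w, bits=2, per_channel=True)

print(per_tensor_4bit[1])   # the small row collapses to zeros under one shared scale
print(per_channel_4bit[1])  # its own scale keeps the small row distinguishable
print(mean_abs_error(w, per_channel_4bit), mean_abs_error(w, per_channel_2bit))

# Hypothetical model shape: cache size doubles whenever the chat length doubles.
for tokens in (4096, 8192, 16384):
    mib = kv_cache_bytes(26, 4, 256, tokens, bytes_per_value=2) / 2**20
    print(f"{tokens:>5} tokens: {mib:.0f} MiB")
```

The 2-bit grid keeps only three levels per channel, so error rises sharply; this is one plausible reason such aggressive quantization would be applied selectively ("targeted") rather than everywhere. Likewise, any reduction in bytes per cached value, or in how many layers keep a full-length cache, scales the whole linear curve down.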