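Google's latest Gemma 4 12B model is now available on Hugging Face under the Apache 2.0 license. It shares the multimodal capabilities of Gemma 4 E2B/E4B: it accepts text, audio, image, and video inputs and provides native audio and vision understanding without separate encoders. This unified, encoder-free design keeps the deployment footprint small, which makes the model well suited to consumer devices and local execution. Google says the model is intended to bridge the gap between edge efficiency and advanced reasoning.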
GOOGLE 🔥: A new Gemma 4 12B is now available on Hugging Face under the Apache 2.0 license!
Built with the same multimodal functionality as Gemma 4 E2B and E4B (text, audio, image, and video inputs), it brings native audio and vision understanding directly to local environments without the need for separate encoders.
This unified approach to multimodality makes the model encoder-free, keeping its deployment size small enough for consumer devices and streamlined local execution.