# Gemma 4 12B open-sourced: encoder-free multimodal model supporting text/audio/image/video

- Source: 🚨 AI News | TestingCatalog (@testingcatalog)
- Published: 2026-06-04 00:08
- AIHOT score: 65
- AIHOT link: https://aihot.virxact.com/items/cmpya2m4o035vslax8kvjl2rb
- Original link: https://x.com/testingcatalog/status/2062204758929051756

## AI Summary

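Google's latest Gemma 4 12B model is now available on Hugging Face under the Apache 2.0 license. It shares the same multimodal capabilities as Gemma 4 E2B/E4B, accepting text, audio, image, and video inputs, and provides native audio and vision understanding without separate encoders. This unified, encoder-free design reduces the deployment size, making the model well suited to consumer devices and local execution. According to the official announcement, it aims to bridge the gap between edge efficiency and advanced reasoning.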

## Body

GOOGLE 🔥: A new Gemma 4 12B is now available on Huggingface under Apache 2.0 license!

> Built with the same multimodal functionality as Gemma 4 E2B and E4B (text, audio, image, and video inputs), it brings native audio and vision understanding directly to local environments without the need for separate encoders.

> This unified approach to multimodality makes the model encoder-free, offering a deployment size that is perfect for consumer devices and streamlined local execution.

### Quoted tweet

> Google Gemma: Meet Gemma 4 12B! A unified, encoder-free multimodal model designed to bring high-performance intelligence directly to your laptop, and released under an Apache...