Artificial Analysis@ArtificialAnlys · June 6 · 52

Google’s newly released open weights model, Gemma 4 12B, supports transcription but is far from the frontier, scoring 8.8% on AA-WER (#58). Gemma 4 12B is the latest release from @GoogleDeepMind in the Gemma 4 family. With a score of 8.8% on AA-WER, it is able to capture a reasonable amount of conversation context, but underperforms compared to transcription-focused open weights models like Voxtral Mini Transcribe 2 (3.6% WER, with 4B parameters) and slightly larger open weights language models like Voxtral Small (2.8% WER, with 12B parameters). The new model launched alongside their local dictation app, Eloquent, available on macOS and iOS. Gemma 4 12B is the largest in the Gemma 4 family to support transcription, alongside Gemma 4 E4B and Gemma 4 E2B, with Gemma 4 31B and Gemma 4 26B A4B supporting text, image and video input only. These models are available on a variety of platforms including Hugging Face, Ollama and LM Studio. We are currently running Gemma 4 12B through the full Artificial Analysis Intelligence Index and will share results soon.

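Summary: Google DeepMind released the open-weights model Gemma 4 12B, which supports speech transcription. It scores 8.8% on the AA-WER benchmark (rank #58), well behind Voxtral Mini Transcribe 2 (4B parameters, 3.6% WER) and Voxtral Small (12B parameters, 2.8% WER). It is the largest Gemma 4 model that supports transcription (alongside E4B and E2B), while the 31B and 26B A4B models accept only text, image and video input. Google launched the local dictation app Eloquent (macOS/iOS) at the same time. The model is available on Hugging Face, Ollama and LM Studio.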
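
For readers unfamiliar with the metric, here is a minimal sketch of how a word error rate is usually computed: the word-level edit distance between a reference transcript and the model's output, divided by the reference length. The post does not describe AA-WER's dataset or text normalization, so the lowercasing and whitespace splitting below are assumptions, and `word_error_rate` is an illustrative name rather than anything from Artificial Analysis.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()   # assumed normalization: lowercase + split
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edits needed to turn the first i-1 reference words
    # into the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1        # reference word missing from output
            insertion = curr[j - 1] + 1   # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over lazy dog"
# One substitution (jumps -> jumped) and one deletion (the): 2 edits / 9 words
print(f"WER = {word_error_rate(reference, hypothesis):.1%}")
```

Read this way, 8.8% means roughly 9 word-level edits per 100 reference words, against about 3 to 4 for the Voxtral models cited.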

Rohan Paul@rohanpaul_ai · June 6 · 68

Google just made Gemma 4 much easier to run on phones and laptops by releasing QAT (Quantization-Aware Training) checkpoints that shrink the smallest model from 11.4GB to 1.1GB, or 0.84GB for text-only use. Normal PTQ (Post-Training Quantization) compresses after training and can damage quality because the model never learned to survive that rounding. QAT fixes this by simulating compression during training, so Gemma 4 learns while its weights are being squeezed, making the final compressed model less likely to lose reasoning quality. Google also built a mobile-focused format with static activations, channel-wise quantization, targeted 2-bit quantization, and KV cache optimization, which means the phone does less scaling work, stores some token-generation parts more aggressively, and keeps long chats from eating memory too fast.

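Summary: Google released QAT (Quantization-Aware Training) checkpoints for Gemma 4 that shrink the smallest model from 11.4GB to 1.1GB (0.84GB text-only), making it easier to run on phones and laptops. Conventional PTQ (Post-Training Quantization) can hurt quality because the model never learned to cope with rounding. QAT simulates compression during training, so the compressed model is less likely to lose reasoning ability. Google also built a mobile-optimized format with static activations, channel-wise quantization, targeted 2-bit quantization and KV cache optimization, which reduces scaling work on the phone and keeps long chats from consuming memory too quickly.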
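
The rounding that QAT simulates can be illustrated with a small "fake quantization" sketch: weights are snapped to a 4-bit integer grid and mapped back to floats, which is the operation a QAT forward pass would apply so the model can adapt to it. This is only an illustration of the idea, not Google's recipe: nothing is trained here, the symmetric scheme and the toy weights are assumptions, and `fake_quantize` is a hypothetical name. It also compares one scale for the whole tensor against one scale per row, which is the intuition behind channel-wise quantization.

```python
def fake_quantize(weights, bits=4):
    """Snap weights to a symmetric integer grid, then map back to floats."""
    qmax = 2 ** (bits - 1) - 1                 # 7 for 4-bit
    largest = max(abs(w) for w in weights)
    scale = largest / qmax if largest else 1.0
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def mean_abs_error(original, quantized):
    return sum(abs(a - b) for a, b in zip(original, quantized)) / len(original)


# Toy weight matrix: one row of small values, one row of large values.
rows = [[0.02, -0.01, 0.03, -0.025], [1.5, -2.0, 0.8, 1.1]]
flat = [w for row in rows for w in row]

per_tensor = fake_quantize(flat)                                # one shared scale
per_channel = [w for row in rows for w in fake_quantize(row)]   # one scale per row

per_tensor_error = mean_abs_error(flat, per_tensor)
per_channel_error = mean_abs_error(flat, per_channel)
print(f"per-tensor error:  {per_tensor_error:.4f}")
print(f"per-channel error: {per_channel_error:.4f}")
print(f"size ratio quoted in the post: {11.4 / 1.1:.1f}x")
```

With a single shared scale the small-valued row rounds entirely to zero, while a per-row scale keeps it distinguishable. That loss of information is the kind of damage post-training rounding can cause and that training with the rounding in the loop is meant to reduce.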

Rohan Paul@rohanpaul_ai · June 6 · 48

Today’s edition of my newsletter just went out. 🔗 https://www.rohan-paul.com/p/anthropic-just-disclosed-that-claude 🗞️ Anthropic says 80% of its new production code is now authored by Claude 🗞️ New Google paper shows general LLMs can solve formal math by planning proofs and checking each step. Raised general LLM performance from under 10% to 70% 🗞️ Google’s new open source Gemma 4 12B can analyze audio and video while running fully locally on a consumer 16GB GPU 🗞️ Alibaba’s Qwen3.7-Plus supports text, video, and image inputs at a low price of $0.4/$1.6 per 1M tokens, though it remains proprietary. 🗞️ Anthropic’s new chemistry report has a genuinely wild result.

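Summary: Anthropic says 80% of its new production code is written by Claude. A new Google paper shows that general LLMs, by planning proofs and verifying each step, raise formal-math performance from under 10% to 70%. Google's open Gemma 4 12B runs locally on a consumer 16GB GPU and can analyze audio and video. Alibaba's Qwen3.7-Plus supports text, video and image input at $0.4/$1.6 per 1M tokens and is proprietary. Anthropic's new chemistry report has a striking result.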

NotebookLM@NotebookLM · June 6 · 31

Ok, it’s probably about time we changed the game. Stay tuned 👀

Summary: NotebookLM hints that a significant change is coming and asks followers to stay tuned.

Rohan Paul@rohanpaul_ai · June 6 · 78

Anthropic previously committed to paying SpaceX $1.25B per month for GPU compute. With the new Google cloud deal disclosed today adding $920M per month, the two AI labs (Google + Anthropic) are now together paying SpaceX $2.17B per month, a $26 billion annualized revenue run rate. Notably, Alphabet has also made a huge gain from backing SpaceX. Google invested about $900M in SpaceX in Jan-2015 for roughly 7%, when SpaceX’s valuation was around $12B. SpaceX now targets a $1.75T IPO valuation. A reported 6.11% Google stake at 12-25 would be worth about $107B at $1.75T, while a diluted 5% stake would be worth about $87.5B. Against a roughly $900M entry cost, that implies around 97x to 119x on paper, before taxes, lockups, dilution, or any discount investors apply after trading starts. The business shift is also important: Alphabet first backed SpaceX in 2015, partly for satellite internet, but the upside now includes Starlink, launch dominance, and newly filed AI compute contracts.

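Summary: Anthropic previously committed to paying SpaceX $1.25B per month for GPU compute. Google's newly disclosed cloud services agreement adds $920M per month (about $11B annualized), so the two AI labs together pay $2.17B per month, a $26B annualized revenue run rate. Alphabet bought roughly 7% of SpaceX in 2015 for about $900M. At SpaceX's targeted $1.75T IPO valuation, a 6.11% stake would be worth about $107B, a return of roughly 97-119x. The shift in the business shows that AI compute is becoming a strategic commodity.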
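
The figures in the post can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above. The helper names are illustrative, and the result is a paper multiple that ignores taxes, lockups and any post-IPO discount, as the post itself notes.

```python
def annualized_run_rate(*monthly_payments: float) -> float:
    """Sum the monthly payments and multiply by 12."""
    return sum(monthly_payments) * 12


def stake_multiple(stake: float, valuation: float, entry_cost: float):
    """Return (stake value, multiple on the entry cost)."""
    value = stake * valuation
    return value, value / entry_cost


anthropic_monthly = 1.25e9   # USD per month, as quoted
google_monthly = 0.92e9      # USD per month, as quoted
run_rate = annualized_run_rate(anthropic_monthly, google_monthly)
print(f"combined monthly: ${(anthropic_monthly + google_monthly) / 1e9:.2f}B")
print(f"annualized:       ${run_rate / 1e9:.2f}B")

entry_cost = 900e6           # Google's reported 2015 investment
ipo_valuation = 1.75e12      # targeted IPO valuation
for stake in (0.0611, 0.05):
    value, multiple = stake_multiple(stake, ipo_valuation, entry_cost)
    print(f"{stake:.2%} stake: ${value / 1e9:.1f}B, {multiple:.1f}x")
```

The 6.11% case comes to about $106.9B (roughly 119x) and the 5% case to $87.5B (roughly 97x), matching the range in the post.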

Rohan Paul@rohanpaul_ai · June 6 · 77

SpaceX just disclosed a new Cloud Service Agreement with Google. Google is to pay SpaceX $920 million a month (about $11B a year) for compute capacity at xAI data centers. It shows again that AI compute is becoming a strategic commodity like launch capacity or energy, and the companies that can finance, power, cool, and operate giant GPU fleets may gain leverage far outside their original business.

Summary: SpaceX disclosed a new Cloud Service Agreement with Google, under which Google will pay $920M per month (about $11B per year) for compute capacity at xAI data centers. This again shows that AI compute is becoming a strategic commodity, like launch capacity or energy, and that companies able to finance, power, cool and operate giant GPU fleets may gain significant leverage outside their original business.

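Emad@EMostaque · June 6 · 73

To put it in perspective, this single deal is about the revenue of @CoreWeave. @SpaceX is the largest neocloud, and its AI cloud revenue at a $26B run rate is actually at the level of Google Cloud & AWS already, catching up to Azure ($37B run rate).

Summary: As the largest neocloud, SpaceX has an AI cloud revenue run rate of $26B, on par with Google Cloud and AWS and approaching Azure ($37B). According to SpaceX's amended S-1 filing, its agreement with Google runs from October 2026 through June 2029 at $920M per month, and either party can terminate with 90 days' notice. Emad Mostaque points out that the deal is about the size of CoreWeave's entire revenue.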

Chubby♨️@kimmonismus · June 6 · 71

Google DeepMind released new Gemma 4 QAT models that make the model family much more efficient for local, on-device use. Using Quantization-Aware Training, the models are trained with compression in mind, which reduces memory needs while preserving more quality than standard post-training quantization. The release includes support for the popular Q4_0 format and a new mobile-specialized quantization format. Gemma 4 E2B can now run with around 1GB of memory (!), and the text-only version can even require less than 1GB (!). That makes local AI on phones, laptops, edge devices, and consumer GPUs far more practical. Really cool to see.

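Summary: Google DeepMind released Gemma 4 QAT (Quantization-Aware Training) models optimized for local, on-device use. QAT reduces memory use while preserving more quality than standard post-training quantization. The release supports the Q4_0 format and a new mobile-specific quantization format. Gemma 4 E2B can run in about 1GB of memory, and the text-only version in under 1GB, making local AI on phones, laptops, edge devices and consumer GPUs more practical.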

Logan Kilpatrick@OfficialLoganK · June 6 · 12

We are exploring doing a Google Summer of Building to help students, early career builders, and more get the most out of AI tools. Does this sound cool to you and should we do it?

Summary: Google is exploring a "Google Summer of Building" to help students, early-career builders and others get more out of AI tools. Logan Kilpatrick asks whether it sounds appealing and whether Google should do it.

Google AI@GoogleAI · June 6 · 78

Here’s this week’s shipping recap 👇
- Nano Banana 2 & Nano Banana Pro are now GA and available via the Gemini Enterprise Agent Platform, Gemini API, and in @GoogleAIStudio
- Co-Scientist, our new multi-agent system for structured scientific thinking, generates and refines novel hypotheses to solve complex scientific problems
- dreambeans from @GoogleLabs works overnight to curate a personalized daily collection of topics that are relevant to you based on your connected Google apps
- @GoogleGemma 4 12B, our unified encoder-free model, brings powerful multimodal intelligence straight to your laptop fully offline
- Gemma 4 models and their drafters are now optimized with Quantization-Aware Training (QAT) to cut memory requirements and maximize on-device performance
- @GoogleMagenta RealTime 2 is our open-weights live music model that you can play like an instrument using a MIDI keyboard, text prompts, and gestures

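Summary: Google AI's updates this week: Nano Banana 2 and Nano Banana Pro are GA via the Gemini Enterprise Agent Platform, the Gemini API and Google AI Studio. The Co-Scientist multi-agent system generates and refines novel hypotheses for scientific research. Google Labs launched dreambeans, which builds a personalized daily collection of topics from the user's connected Google apps. Gemma 4 12B, a unified encoder-free multimodal model, runs fully offline on a laptop. Gemma 4 models and their drafters adopt QAT to cut memory requirements. Google Magenta RealTime 2 is an open-weights live music model playable with a MIDI keyboard, text prompts and gestures.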

Google AI Developers@googleaidevs · June 6 · 72

New @GoogleGemma 4 QAT (Quantization-Aware Training) checkpoints are here, so you can run models locally on consumer GPUs and mobile devices with minimal quality loss. What’s new: 🔹 GGUF (Q4_0) checkpoints: Max local performance across all sizes and drafter models 🔹 Custom Mobile Schema: We shrunk Gemma 4 down to less than 1GB for mobile devices by using a custom mixed precision schema designed for edge hardware (featuring targeted 2-bit decoding layers, optimized KV caches, and static activations). By simulating compression during training rather than after (Post-Training Quantization), we've drastically reduced the memory footprint and accelerated decode speeds while preserving reasoning quality. https://blog.google/innovation-and-ai/technology/developers-tools/quantization-aware-training-gemma-4/

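Summary: Google released Gemma 4 Quantization-Aware Training (QAT) checkpoints that run locally on consumer GPUs and mobile devices with minimal quality loss. The checkpoints come in GGUF (Q4_0) format for all sizes and drafter models, targeting maximum local performance. A custom mobile schema uses mixed precision to shrink Gemma 4 below 1GB, with 2-bit decoding layers, optimized KV caches and static activations. Simulating compression during training, rather than quantizing after training, substantially cuts the memory footprint and speeds up decoding while preserving reasoning quality.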
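
To make "Q4_0" concrete, here is a simplified sketch of the block layout as it is commonly described for GGUF: weights are grouped into blocks of 32, each block stores one 16-bit scale plus thirty-two 4-bit codes, and a weight is recovered as `(code - 8) * scale`. The real format stores the scale as fp16 and packs two codes per byte. This sketch keeps Python floats and an unpacked list, and the function names are illustrative rather than taken from any library.

```python
import math

BLOCK_SIZE = 32  # weights per Q4_0 block


def quantize_block(block):
    """Return (scale, codes) for one block; codes are integers in 0..15."""
    extreme = max(block, key=abs)              # largest magnitude, sign kept
    scale = extreme / -8 if extreme else 0.0
    inverse = 1.0 / scale if scale else 0.0
    codes = [min(15, int(x * inverse + 8.5)) for x in block]
    return scale, codes


def dequantize_block(scale, codes):
    return [(code - 8) * scale for code in codes]


def bits_per_weight() -> float:
    """16 bits of scale plus 4 bits per weight, spread over the block."""
    return (16 + 4 * BLOCK_SIZE) / BLOCK_SIZE


block = [math.sin(i) for i in range(BLOCK_SIZE)]   # stand-in weights
scale, codes = quantize_block(block)
restored = dequantize_block(scale, codes)
worst = max(abs(a - b) for a, b in zip(block, restored))
print(f"bits per weight: {bits_per_weight()}")
print(f"worst round-trip error: {worst:.4f} (scale {abs(scale):.4f})")
```

Because the per-block scale adds 16 bits for every 32 weights, a "4-bit" Q4_0 file costs about 4.5 bits per weight in practice.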

Google Gemini@GeminiApp · June 6 · 81

You can now create and edit images directly in Gemini Live. Whether testing out room decor, getting help with math, or creating shareable memes, it all happens in real-time. Just open the Gemini app, tap the Live button, share your camera, and tell Gemini what you want to see.

Summary: Images can now be created and edited directly in Gemini Live, in real time, for tasks such as testing room decor, solving math problems or making shareable memes. Open the Gemini app, tap the Live button, share the camera and tell Gemini what to show.

fofr@fofrAI · June 5 · 62

Today I'm experimenting with Gemini 3.5 Flash and the Antigravity CLI to see how fast and how autonomously the agents can do things.
- It took 20 minutes to install and run the original CompVis Stable Diffusion 1.5 repo, get the weights, debug, run inference and generate an image on a Linux CPU. It fixed every crash and managed dependencies while making changes to run on a CPU.
- I gave it the original LoRA and SD papers and asked it to make a LoRA fine-tuner from first principles, with a set of 10 images. That took about 1h30, most of the time being slow training runs on the CPU, but it did optimize for multiple CPUs. It worked: it made a LoRA that showed a likeness, and then it wanted to hill climb. I told it to think of the poor CPUs.
- I wanted to experiment with the new Ideogram v4 weights. It used Modal to find the right class of GPU, get the code, set up the env, get the weights and run inference. That took about 20 mins in total.

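Summary: fofrAI used Gemini 3.5 Flash and the Antigravity CLI to test how fast and how autonomous AI agents are. In 20 minutes the agent installed and ran the original Stable Diffusion 1.5 on a Linux CPU and generated an image. Working from the LoRA and SD papers and 10 images, it built a LoRA fine-tuner from scratch (about 1h30, mostly CPU training). Through Modal it took about 20 minutes to find a GPU, fetch the Ideogram v4 weights and run inference. The post is a baseline example of current long-horizon agent tasks.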

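Chubby♨️@kimmonismus · June 5 · 26

Living in the EU be like: (hey @Google, any ETA for us living in the EU?)

Summary: We are rolling out Search profiles, a new way for publishers and creators to shape how they appear in Search. Search profiles are a dedicated, shareable space for highlighting content from social media, video and news platforms, helping audiences find accurate, up-to-date information about a source in Search. (EU users: @Google, is there a launch timeline?)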

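Josh Woodward@joshwoodward · June 5 · 72

Love this Gemini feature on my macOS app!

Summary: Josh Woodward likes this feature in the Gemini app for macOS: pressing both Command ⌘ keys at the same time attaches the active window to the chat, with no manual screenshots or tab switching.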

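fofr@fofrAI · June 5 · 67

First frame now in Omni

Summary: Bring images to life: upload an image as the first frame, add a prompt, and generate a video with Gemini Omni Flash. The main post says Omni now supports first-frame input.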

🚨 AI News | TestingCatalog@testingcatalog · June 5 · 51

GOOGLE 🔥: A new Troubleshooting mode has been spotted on Gemini. In this mode, Gemini will explain the troubleshooting process via text responses and interactive widgets. Even though it is working and available, it still looks like an unintended release and might get reverted soon. Models for Troubleshooting 👀

Summary: A new Troubleshooting mode has been spotted in Gemini. In this mode Gemini explains the troubleshooting process through text responses and interactive widgets. It works and is available, but it still looks like an unintended release and may be reverted soon.

Rohan Paul@rohanpaul_ai · June 5 · 70

Another great paper from Google. It shows general LLMs can solve formal math by planning proofs and checking each step, raising general LLM performance from under 10% to 70%. A general LLM failed badly when asked to write full formal proofs in 1 try, but became much stronger when it planned, split the work into smaller claims, reused past claims, and learned from Lean’s feedback. The paper shows the weakness was not just the model’s math ability, but the way it was being used: the absence of structured interaction with a verifier. The key idea is that the model does not try to write one giant perfect proof at once, because that usually fails on long and tricky problems. Instead, LEAP stores the proof as a graph of goals and subgoals, so useful lemmas can be reused instead of rediscovered every time. The authors tested LEAP on Putnam 2025 and a new Lean benchmark built from 60 IMO-style problems, where ordinary one-shot proof writing did very poorly. LEAP solved all 12 Putnam 2025 problems and raised general LLM performance on the Lean IMO benchmark from under 10% to 70%. Link: arxiv.org/abs/2606.03303 Title: "LEAP: Supercharging LLMs for Formal Mathematics with Agentic Frameworks"

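Summary: Google's new LEAP paper proposes an agentic framework that plans proofs, decomposes them into subgoals, reuses existing lemmas and learns from Lean verifier feedback, raising general LLM performance on formal mathematical proofs from under 10% to 70%. Conventional one-shot full proofs perform very poorly on long, hard problems, whereas LEAP stores the proof as a graph of goals and subgoals, planning first and then verifying step by step. LEAP solved all 12 problems of Putnam 2025, and achieved the jump described above on a Lean benchmark of 60 IMO-style problems.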
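
The "graph of goals and subgoals" idea can be pictured with a small data-structure sketch. This is not the paper's implementation: the `Goal` class, the `prove` function and the `check_step` callback (a stand-in for a call to the Lean verifier) are all hypothetical names, and the LLM planning step that would produce the subgoals is left out. The sketch only shows how a cache of already-proved statements lets a shared lemma be verified once and then reused.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Goal:
    statement: str
    subgoals: List["Goal"] = field(default_factory=list)


def prove(goal: Goal,
          check_step: Callable[[str], bool],
          proved: Dict[str, bool]) -> bool:
    """Prove subgoals first, then the goal; reuse any cached result."""
    if goal.statement in proved:
        return proved[goal.statement]          # lemma reuse: no verifier call
    ok = all(prove(sub, check_step, proved) for sub in goal.subgoals)
    ok = ok and check_step(goal.statement)     # stand-in for a Lean check
    proved[goal.statement] = ok
    return ok


# A tiny goal graph in which two claims depend on the same lemma.
lemma = Goal("lemma: n * (n + 1) is even")
claim_a = Goal("claim A", [lemma])
claim_b = Goal("claim B", [lemma])
theorem = Goal("theorem", [claim_a, claim_b])

verifier_calls: List[str] = []


def check_step(statement: str) -> bool:
    """Toy verifier that accepts everything and records what it was asked."""
    verifier_calls.append(statement)
    return True


cache: Dict[str, bool] = {}
result = prove(theorem, check_step, cache)
print(result, verifier_calls)
```

In this toy run the shared lemma reaches the verifier once even though two claims depend on it, which is the reuse benefit the post attributes to storing the proof as a graph.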

Google Gemini@GeminiApp · June 5 · 73

Get tailored help for what's on your screen using the Gemini app for macOS. 💻 Simply press both Command ⌘ keys at the same time to seamlessly attach your active window to the chat, without needing to take manual screenshots or switch tabs.

Summary: The Gemini app for macOS gives tailored help for on-screen content: pressing both Command ⌘ keys at the same time attaches the active window to the chat, with no manual screenshots or tab switching.

NotebookLM@NotebookLM · June 5 · 68

Today we’re launching another highly requested feature: Source Attribution! 🥳 No more guessing. Now you can see the exact formula (prompts + sources) used to make each of your artifacts. Want to make an adjustment? Just tap "Iterate" and customize to your heart’s content 💖

Summary: NotebookLM launched Source Attribution, a highly requested feature that shows the exact formula (prompts + sources) behind each artifact. Tapping "Iterate" lets users adjust and customize it.

Chubby♨️@kimmonismus · June 5 · 66

That’s so cool! I love the creativity of those guys. An open model for live music generation with only 2.4B parameters. If you are bored on long flights you can now start creating bangers

Summary: Chubby praises the team's creativity: an open model for live music generation with only 2.4B parameters, handy for making music when bored on a long flight.

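Google AI Developers@googleaidevs · June 5 · 70

Play our new open-weights music model, @GoogleMagenta RealTime 2, using a MIDI keyboard, live text prompts, and even hand gestures ✌️ https://x.com/GoogleMagenta/status/2062589313372594538

Summary: Google AI for Developers announced Magenta RealTime 2 (MRT2), an open-weights live music model that can be played with a MIDI keyboard, live text prompts and even hand gestures. MRT2 runs natively on a MacBook with latency under 200ms, and ships with open weights, an open-source inference engine, and a suite of companion apps and plugins.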

NotebookLM@NotebookLM · June 5 · 60

PRO TIP: Gamify your notebooks. Don't just read your notes; investigate them. Our new Sherlock Holmes notebook turns studying into an interactive mystery game. Deduce facts, uncover clues, & prove that even the most complex matters can be elementary. ➡️ https://goo.gle/Sherlock

Summary: Pro tip: gamify your notebooks. NotebookLM's new Sherlock Holmes notebook turns studying into an interactive mystery game in which you deduce facts, uncover clues and show that even the most complex problems can be solved. ➡️ https://goo.gle/Sherlock

Google Gemini@GeminiApp · June 5 · 60

See how easy it is to bring your wildest ideas to life with Gemini Omni. Just select "Create videos" in Gemini, add text, video, or up to five images, and let your imagination run wild.

Summary: Gemini Omni turns ideas into video: select "Create videos" in Gemini, then add text, video or up to five images.

Google AI Developers@googleaidevs · June 4 · 47

Join @GoogleDeepmind and @HeyGen on June 11th! Our LA event for builders working at the intersection of AI agents, creative tooling, and multimodal apps is now open for registration 👇 https://x.com/HeyGen/status/2062256762867388748

Summary: @GoogleDeepmind and @HeyGen are hosting an LA event on June 11 for developers working on AI agents, creative tooling and multimodal apps. Registration is open 👇 https://x.com/HeyGen/status/2062256762867388748

Jeff Dean@JeffDean · June 4 · 75

Check out our Gemma 4 12B model: it's a super capable open weights model that can run directly on your laptop.

Summary: Jeff Dean highlights Gemma 4 12B as a very capable open-weights model that runs directly on a laptop.

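meng shao@shao__meng · June 4 · 50

Starting June 18, 2026, some free/consumer-tier access to Gemini CLI and Gemini Code Assist will be discontinued, but enterprise and paid API key access are not affected by this change. I haven't even tried Gemini CLI yet 🤦🏻‍♀️

Summary: From June 18, 2026, some free and consumer-tier access to Gemini CLI and Gemini Code Assist ends. Enterprise and paid API key access is unaffected. The author has not yet tried Gemini CLI.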

Josh Woodward@joshwoodward · June 4 · 25

These are so fun!

Summary: These are so fun! Our current favorite Gemini Omni trend: using real-world footage to create unexpected twists. Try making one yourself! 🧵

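Berryxia.AI@berryxia · June 4 · 66

On-device model capabilities keep getting amplified! Gemma 4 12B is now fully wired into Google AI Edge, so you can run a 100% on-device agentic workflow directly on a laptop. Mac users benefit most: AI Edge Gallery generates code directly, and AI Edge Eloquent takes voice input and then edits the text in real time. Both are newly launched. Under the hood, LiteRT-LM serves Gemma 4 12B locally, so the whole process runs with no network connection and no network latency, and all data stays on your own machine. We used to assume a 12B model was nowhere near ready for local agentic tasks, but this time Google packaged the model, the inference engine and the developer toolchain in one go, so ordinary developers can treat AI on a laptop as a truly private local teammate that can execute continuously. This effectively flips the industry's most mainstream path. Everyone is racing toward bigger cloud models and lower latency, and Google is now using a 12B local model to say that the real productivity leap is pushing agentic capability all the way down to the device, so that AI becomes part of your operating system.

Summary: Google has integrated Gemma 4 12B deeply with Google AI Edge, so developers can run fully on-device agentic workflows on a laptop. Mac users get two new tools: AI Edge Gallery, which generates code, and AI Edge Eloquent, which takes voice input and edits text in real time. LiteRT-LM serves the model locally, so no network is needed and data stays on the device. By packaging the model, inference engine and developer toolchain together, Google gives developers a private local AI teammate that can execute continuously.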

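Berryxia.AI@berryxia · June 4 · 70

Whoa! I just noticed that models now ship for Apple's MLX framework on day 0. It looks like the releases are coordinated, with the MLX framework working directly with model vendors from the very first moment! Mac users are strongly advised to go straight to MLX-format models, which are typically at least 10-20% faster.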

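小互@xiaohu · June 4 · 71

Google releases the open model Gemma 4 12B: full-modality AI on a 16GB laptop. Gemma 4 12B uses an encoder-free architecture called "Unified" that feeds four input types (text, image, audio and video) directly into the same Transformer backbone, so the model processes raw images and sound directly. An analogy makes this clear. A traditional multimodal model handles images and audio like a boss who only reads Chinese and has two translators: an English translator (the vision encoder) and a Japanese translator (the audio encoder). Whenever English or Japanese material comes in, a translator must first convert it into Chinese before the boss can read it. The translators take up desks (VRAM), translation means waiting in a queue (latency), and the boss gets a translator-processed version rather than the original. What Gemma 4 12B does is lay off both translators and have the boss learn to read English and Japanese directly. A few key numbers: it runs in 16GB of VRAM or unified memory, and in as little as 8GB with 4-bit quantization, the goal being local use on an ordinary laptop; a 256K-token context window; support for 140+ languages; a built-in Thinking mode (step-by-step reasoning); and native Function Calling.

Summary: Google released the open model Gemma 4 12B, whose encoder-free Unified architecture processes text, image, audio and video directly without separate encoders. It runs in 16GB of VRAM, or as little as 8GB with 4-bit quantization, and supports a 256K-token context, 140+ languages, a built-in Thinking mode and Function Calling.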
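
A back-of-the-envelope estimate shows where memory figures of this kind come from. The sketch below counts weights only, assuming a nominal 12 billion parameters. It ignores the KV cache, activations and runtime overhead, which is presumably part of why quoted requirements (16GB, or 8GB at 4-bit) sit above the raw weight size. The 4.5 bits-per-weight row assumes GGUF Q4_0-style storage, where each block of 32 four-bit codes also carries one 16-bit scale. `weight_memory_gb` is an illustrative helper, not part of any toolkit.

```python
def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


PARAMS = 12e9  # nominal parameter count (assumption: exactly 12B)

estimates = {
    "16-bit (bf16/fp16)": weight_memory_gb(PARAMS, 16),
    "8-bit": weight_memory_gb(PARAMS, 8),
    "Q4_0-style (4.5 bpw)": weight_memory_gb(PARAMS, 4.5),
    "plain 4-bit": weight_memory_gb(PARAMS, 4),
}
for label, gigabytes in estimates.items():
    print(f"{label:>22}: {gigabytes:5.2f} GB")
```

Under these assumptions the weights alone take about 24GB at 16-bit and about 6 to 7GB at 4-bit, which is consistent with a 16GB machine needing some form of quantization to hold the model.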

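Berryxia.AI@berryxia · June 4 · 69

Google released the multimodal Gemma 4 12B model last night. It needs at least 16GB of memory to run. It should be compared against the Qwen models to see how well it performs.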

SemiAnalysis@SemiAnalysis_ · June 4 · 57

With the introduction of the TPUv8t, their new training-focused TPU, Google unveiled a new scale-out network architecture called Virgo. Virgo is able to interconnect up to 134,400 chips with up to 47 Pbps of non-blocking bisection bandwidth. (1/4)🧵

Summary: With TPUv8t, its new training-focused TPU, Google unveiled a new scale-out network architecture called Virgo, which can interconnect up to 134,400 chips with up to 47 Pbps of non-blocking bisection bandwidth. (1/4)🧵

Rohan Paul@rohanpaul_ai · June 4 · 45

Sergey Brin joined Rocky Yu for an unscripted fireside chat on topics like superintelligence and how specialized scientific models for tasks like fusion reactor management and protein folding are converging into general-purpose models like Gemini.

Summary: Sergey Brin joined Rocky Yu for an unscripted fireside chat covering superintelligence and how specialized scientific models for tasks such as fusion reactor management and protein folding are converging into general-purpose models like Gemini.

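Sundar Pichai@sundarpichai · June 4 · 73

Our new Gemma 4 12B model hits a sweet spot between size + performance: it can run locally on a laptop, while enabling powerful multi-step reasoning and agentic workflows. Can’t wait to see what the community does with this one!

Summary: Cumulative downloads of the Gemma 4 family have passed 150 million, and Google has added a new member, Gemma 4 12B. At 12B parameters it runs locally on a laptop with 16GB of VRAM, balances size and performance, and supports multi-step reasoning and agentic workflows. It is released under the Apache 2.0 license for the community to use.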

Chubby♨️@kimmonismus · June 4 · 71

Gemma 4 12B shipped today under the label "encoder-free": a local 12B model that shows really good results. I'm a big fan of Gemma. Gemma 4 12B is out: a dense, fully open model (Apache 2.0) that runs on a 16GB laptop and does agentic reasoning, vision and audio at a quality Google puts near its 26B model. The reason a 12B can pull this off: Google removed the separate vision and audio encoders and feeds both straight into the model, which keeps the memory footprint small enough for consumer GPUs. For on-device assistants and private coding agents, that lowers the bar a lot. I always look forward to the updates, and 12B is a good sweet spot in terms of size. A few facts: Vision: the 550M encoder (27 transformer layers) is now a 35M embedder, one matmul on 48x48 pixel patches. Roughly 15x smaller. Audio: the 300M encoder (12 conformer layers) is gone. Raw 16kHz audio is cut into 40ms frames and projected straight into the LLM. So encoding didn't vanish, it collapsed into the backbone. The payoff is real: one shared set of weights, so you LoRA-tune vision, audio and text in a single pass.

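Summary: Google open-sourced Gemma 4 12B (dense, Apache 2.0) with a new encoder-free architecture that removes the separate vision encoder (550M parameters, 27 Transformer layers) and audio encoder (300M parameters, 12 Conformer layers). Vision now uses a 35M embedder (about 15x smaller), and audio is projected directly into the LLM as 40ms frames. The model runs agentic reasoning, vision and audio tasks on a 16GB laptop, with quality near the 26B model. Shared weights allow a single LoRA tuning pass to cover vision, audio and text.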
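
The patch and frame sizes quoted above translate into simple token arithmetic. The sketch below is an illustration under stated assumptions rather than Gemma's actual preprocessing: it assumes RGB input, non-overlapping 48x48 patches, non-overlapping 40ms audio frames, and image sizes that divide evenly. The `embed` function shows what "one matmul" means on a toy example, and all function names are hypothetical.

```python
def patch_grid(height: int, width: int, patch: int = 48):
    """Number of non-overlapping patches along each image axis."""
    return height // patch, width // patch


def patch_vector_length(patch: int = 48, channels: int = 3) -> int:
    """Raw values in one flattened patch (RGB channels assumed)."""
    return patch * patch * channels


def audio_frame_count(num_samples: int, sample_rate: int = 16_000,
                      frame_ms: int = 40) -> int:
    """Non-overlapping frames; 16 kHz * 40 ms = 640 samples per frame."""
    samples_per_frame = sample_rate * frame_ms // 1000
    return num_samples // samples_per_frame


def embed(vector, weight):
    """One matrix multiply: each weight row produces one output value."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weight]


rows, cols = patch_grid(960, 960)
print(f"960x960 image -> {rows * cols} patches "
      f"of {patch_vector_length()} values each")
print(f"10 s of 16 kHz audio -> {audio_frame_count(10 * 16_000)} frames")
print(f"encoder size ratio: {550 / 35:.1f}x")

# Toy projection: a 3-value "patch" mapped to 2 dimensions.
toy_weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
print(embed([1.0, 2.0, 3.0], toy_weight))
```

Under these assumptions a 960x960 image becomes 400 patch tokens and ten seconds of audio becomes 250 frame tokens, each produced by a single projection rather than a multi-layer encoder.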

Demis Hassabis@demishassabis · June 4 · 74

Celebrating the milestone of a massive 150+ million downloads of Gemma 4 with the release of the new Gemma 4 12B model! It's incredibly powerful for such a small model and it’s tiny enough to run locally on a laptop with just 16GB VRAM. Apache 2.0 license - happy building!

Summary: Demis Hassabis announced that Gemma 4 downloads have passed 150 million and released the new Gemma 4 12B model, a unified, encoder-free multimodal model that combines edge efficiency with advanced reasoning. Despite having only 12B parameters it performs strongly and is small enough to run locally on a laptop with just 16GB of VRAM. It uses the Apache 2.0 license, so developers can build on it freely.

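AYi@AYi_AInotes · June 4 · 65

A 150M job, done by 35M. In Google's new Gemma 4 12B, the heaviest part of a multimodal stack, the vision encoder, has been squeezed from 150M-550M parameters down to 35M. Multimodal models used to follow a fixed recipe: hand the image to a dedicated vision encoder that translates it into a language the model understands, then pass it to the LLM, like hiring an interpreter. That interpreter, a traditional ViT encoder, costs 150M to 550M parameters. Gemma 4 12B fires the interpreter and keeps only a lightweight 35M embedder that cuts the image into 48×48 patches and feeds them in directly as tokens, letting the Transformer learn to see the world by itself. Audio is handled the same way: the raw 16kHz waveform is cut into 40ms frames and fed straight into the same model. In other words, images, sound and text are being treated as the same kind of thing for the first time. Why dare to do this? Because it bets on one thing: once the base model passes a certain critical size, those specialized submodules are no longer necessities. You may have seen this script before. When ViT displaced CNNs it was the same play: at sufficient scale, rather than hand-designing a pile of specialized structures, hand the job to one unified large model and let it learn. That logic is now spreading from single-modality vision to the whole multimodal architecture. The 12B size was not picked at random either: just large enough to drop the encoders, and just small enough to fit in a 16GB laptop. According to aaryan_kakad's measurements on an M4 Max, image-recognition latency under 4-bit quantization is 1.2 to 1.5 seconds. Officially 16GB is enough, and the community's take is more down-to-earth: it runs, but high-resolution multi-image inputs push the limit. What is really worth pondering in this news, though, is not that it runs on your laptop but what it means. Building a multimodal app used to mean assembling Whisper for transcription, LLaVa for images, and then an LLM, like building a machine from parts, where you had to wire up, align and debug every interface yourself. If the encoder-free path works out, a single fine-tuned unified model may swallow that whole pipeline. What loses value at that moment is not any one tool, but all the craft you accumulated building that machine and stitching that pipeline together. The model is not saving you a part. It is quietly rewriting which crafts are still worth money.

Summary: Google released Gemma 4 12B (Apache 2.0), a unified multimodal architecture with no separate vision encoder. A lightweight 35M-parameter embedder cuts images into 48×48 patches, and raw 16kHz audio is cut into 40ms frames. Both go into the Transformer directly as tokens. On an M4 Max, image-recognition latency with 4-bit quantization is 1.2-1.5 seconds. Google says 16GB of memory is enough, but the community notes that high-resolution multi-image inputs push the limit. The design suggests that once the base model is large enough, specialized submodules stop being necessary, and a single fine-tuned unified model may replace multimodal pipelines assembled from Whisper, LLaVa and similar parts.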

Josh Woodward@joshwoodward · June 4 · 44

A short backstory on this one: A small Google Labs team had an idea to make an app designed to connect you with what matters, without the endless scroll. "Hope scrolling, not doom scrolling" was the hallway pitch. "Go for it." And today, that little experiment is rolling out. Meet Dreambeans, a daily dose of inspiration, brewed fresh for you. We're excited to see what you think!

Summary: Google Labs released Dreambeans, an experimental mobile app. It uses Personal Intelligence to connect the user's Google apps and delivers a personalized collection of stories each day, helping users find things they might have missed and focus on what matters. The team describes the idea as "hope scrolling, not doom scrolling." It is currently limited to eligible Google AI Ultra subscribers in the US (18+), with a public waitlist also open.

🚨 AI News | TestingCatalog@testingcatalog · June 4 · 46

GOOGLE 🔥: A new Dreambeans experiment is now available in Google Labs for US-based Google AI Ultra users on the waitlist. This experiment uses Personal Intelligence to deliver daily stories based on the user's data context. Not a testing time for most 👀

Summary: A new Dreambeans experiment is now live in Google Labs for US-based Google AI Ultra users (waitlist required). It uses Personal Intelligence to deliver daily stories based on the user's data context. For most people it is not yet testing time 👀