Meet Hiroki-san (@tomiyasu16) who is running his farm in Japan with ChatGPT and Codex: https://chatgptpro.substack.com/p/hiroki-tomiyasu

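Summary: Hiroki Tomiyasu (@tomiyasu16), a farmer in Hokkaido, Japan, never studied agriculture and did not inherit land; he used to be a civil servant. He uses ChatGPT and Codex to build his own tools for running a 100-hectare farm: remote control of greenhouse vents from a chat app (ESP32 board + motor driver + Cloudflare Workers); a bot that monitors temperature and opens the windows automatically; satellite crop-health data overlaid on field maps; Airtable linking plots, tasks, materials, and sensors; and control-cabinet wiring diagrams generated from photos. Previously, only large agribusinesses could afford systems like these.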
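The "monitor temperature and open the windows" bot can be pictured as a small hysteresis controller. The sketch below is only an illustration of that idea, not Tomiyasu's actual code: the class name `VentController` and both thresholds are hypothetical, and the real system would forward the resulting command to the ESP32 board rather than return a string.

```python
from dataclasses import dataclass


@dataclass
class VentController:
    """Hysteresis controller: open when hot, close only once clearly cooler."""

    open_at_c: float = 28.0   # hypothetical threshold, not from the source
    close_at_c: float = 24.0  # hypothetical threshold, not from the source
    is_open: bool = False

    def update(self, temperature_c: float) -> str:
        """Return the command for the vent motor given one temperature reading."""
        if not self.is_open and temperature_c >= self.open_at_c:
            self.is_open = True
            return "open"
        if self.is_open and temperature_c <= self.close_at_c:
            self.is_open = False
            return "close"
        return "hold"


if __name__ == "__main__":
    controller = VentController()
    for reading in [22.0, 27.5, 28.4, 26.0, 23.9]:
        print(reading, controller.update(reading))
```

Using two thresholds instead of one keeps the motor from toggling repeatedly when the temperature hovers around a single set point.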

Chubby♨️@kimmonismus · June 6 · 71

Google DeepMind released new Gemma 4 QAT models that make the model family much more efficient for local, on-device use. Using Quantization-Aware Training, the models are trained with compression in mind, which reduces memory needs while preserving more quality than standard post-training quantization. The release includes support for the popular Q4_0 format and a new mobile-specialized quantization format. Gemma 4 E2B can now run with around 1GB of memory (!), and the text-only version can even require less than 1GB (!). That makes local AI on phones, laptops, edge devices, and consumer GPUs far more practical. Really cool to see.

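Summary: Google DeepMind released Gemma 4 QAT (Quantization-Aware Training) models optimized for local, on-device use. QAT reduces memory use while preserving more quality than standard post-training quantization. The release supports the Q4_0 format and a new mobile-specific quantization format. Gemma 4 E2B can run in about 1GB of memory, and the text-only version in less than 1GB, which makes local AI on phones, laptops, edge devices, and consumer GPUs more practical.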

jason@jxnlco · June 6 · 63

wow its @tomiyasu16 https://x.com/itsolelehmann/status/2062840689415905369?s=46

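Summary: @tomiyasu16, a former civil servant in Hokkaido, Japan, never studied agriculture and did not inherit land. Using OpenAI's Codex, he built a full set of automation tools for a 100-hectare broccoli farm: remote control of greenhouse vents from a chat app via an ESP32, a motor driver, and Cloudflare Workers; automatic temperature monitoring and window opening; satellite crop-health data overlaid on maps; Airtable linking plots, tasks, materials, and sensors; and distribution-panel wiring diagrams generated from photos. Engineering that only large agribusinesses could previously afford was built with one laptop and Codex.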

Google AI Developers@googleaidevs · June 6 · 72

New @GoogleGemma 4 QAT (Quantization-Aware Training) checkpoints are here, so you can run models locally on consumer GPUs and mobile devices with minimal quality loss. What’s new: 🔹 GGUF (Q4_0): Checkpoints: Max local performance across all sizes and drafter models 🔹 Custom Mobile Schema: We shrunk Gemma 4 down to less than 1GB for mobile devices by using a custom mixed precision schema designed for edge hardware (featuring targeted 2-bit decoding layers, optimized KV caches, and static activations) By simulating compression during training rather than after (Post-Training Quantization), we've drastically reduced the memory footprint and accelerated decode speeds while preserving reasoning quality. https://blog.google/innovation-and-ai/technology/developers-tools/quantization-aware-training-gemma-4/

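Summary: Google released Gemma 4 Quantization-Aware Training (QAT) checkpoints that run locally on consumer GPUs and mobile devices with minimal quality loss. The new checkpoints come in GGUF (Q4_0) format, covering all sizes and the drafter models, for the best local performance. A custom mobile schema uses mixed precision to shrink Gemma 4 to under 1GB, with 2-bit decoding layers, optimized KV caches, and static activations. Simulating compression during training, rather than quantizing after training, sharply reduces the memory footprint and speeds up decoding while preserving reasoning quality.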
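"Simulating compression during training" is usually done with a quantize-then-dequantize round trip (often called fake quantization) applied to the weights in the forward pass. The sketch below shows that round trip for a simplified symmetric 4-bit block scheme. It is an assumption-laden illustration: the block size of 32 and the scale rule are chosen for clarity, it is not the exact Q4_0 layout, and it says nothing about Google's actual training recipe.

```python
from typing import List

BLOCK_SIZE = 32  # assumption: each block of 32 weights shares one scale


def fake_quantize_block(weights: List[float], bits: int = 4) -> List[float]:
    """Round weights to a signed integer grid, then map them back to floats."""
    q_max = 2 ** (bits - 1) - 1   # 7 for 4 bits
    q_min = -(2 ** (bits - 1))    # -8 for 4 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return [0.0] * len(weights)
    scale = max_abs / q_max       # one float scale per block
    dequantized = []
    for w in weights:
        q = max(q_min, min(q_max, round(w / scale)))  # integer code
        dequantized.append(q * scale)                 # value the model "sees"
    return dequantized


def fake_quantize(weights: List[float], bits: int = 4,
                  block_size: int = BLOCK_SIZE) -> List[float]:
    """Apply the round trip block by block over a flat weight list."""
    result: List[float] = []
    for start in range(0, len(weights), block_size):
        result.extend(fake_quantize_block(weights[start:start + block_size], bits))
    return result


if __name__ == "__main__":
    original = [0.70, -0.35, 0.10, 0.02]
    print(fake_quantize(original))
```

In a QAT setup the forward pass would use the rounded values while the optimizer keeps updating the full-precision weights, so the model learns weights that survive the rounding.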

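Berryxia.AI@berryxia · June 5 · 60

After Locally AI was brought under LM Studio, it launched a mobile client. That really is AI speed, and local large models should now run better on phones. Still, I don't think this use case has truly been explored yet, or that users' needs are well supported.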

Rohan Paul@rohanpaul_ai · June 5 · 53

Nemotron 3 Ultra vs GPT-5.5 on atomic[.]chat, a desktop app that runs LLMs locally. Nemotron 3 Ultra gave almost similar result on a test to build HTML5 canvas with real physics, while being 10X cheaper. - Nemotron 3 Ultra: 11.3k tokens, $0.051 - GPT 5.5: 11.0k tokens, $0.57 Nemotron 3 Ultra has 550 bn total parameters (55 bn active per token), because it is a Mixture-of-Experts model.

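Summary: In the atomic.chat local desktop app, Nemotron 3 Ultra (an MoE model with 550B total parameters and 55B active per token) and GPT-5.5 performed almost the same on a task to build an HTML5 canvas with real physics (a spinning bucket, a Galton board, collisions between blocks of extreme mass). Nemotron 3 Ultra used 11.3k tokens at a cost of $0.051; GPT-5.5 used 11.0k tokens at a cost of $0.57. The former costs roughly one tenth as much, and the quality gap is far smaller than the price gap.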
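The "10X cheaper" claim can be checked directly from the quoted token counts and costs. The snippet below is just that arithmetic; the function names are illustrative, and the only inputs are the four numbers from the post.

```python
# Figures quoted in the post: (tokens used, cost in USD).
RUNS = {
    "Nemotron 3 Ultra": (11_300, 0.051),
    "GPT-5.5": (11_000, 0.57),
}


def usd_per_thousand_tokens(tokens: int, usd: float) -> float:
    """Effective price per 1,000 tokens for one run."""
    return usd / tokens * 1000


def cost_ratio(expensive: str, cheap: str) -> float:
    """How many times more the expensive run cost than the cheap one."""
    return RUNS[expensive][1] / RUNS[cheap][1]


if __name__ == "__main__":
    for name, (tokens, usd) in RUNS.items():
        print(f"{name}: ${usd_per_thousand_tokens(tokens, usd):.4f} per 1k tokens")
    print(f"cost ratio: {cost_ratio('GPT-5.5', 'Nemotron 3 Ultra'):.1f}x")
```

The ratio comes out slightly above 11, so "about 10X" is a fair rounding of the quoted figures.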

Berryxia.AI@berryxia · June 5 · 60

😂 LM Studio's mobile app is out too, so now you can "burn" your iPhone running large models locally... 😆

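宝玉@dotey · June 5 · 55

Codex's new Build iOS Apps plugin lets you view and test your iOS app inside Codex, preview SwiftUI components, and see changes as soon as you make them. A quick explanation of how it works: think of the plugin as moving the app, which you could previously only see in Xcode and iOS Simulator, into the browser on the right side of Codex. The iPhone screen on the right in the video is still a real, running iOS Simulator; the plugin uses a third-party npm package, serve-sim, to continuously capture the simulator screen as a video stream and display it in a browser page. So Codex doesn't just read the code, it can also "see" what the app currently looks like. The browser can operate the app because the plugin also sets up a control channel. When you click, drag, or type in the browser, the browser converts those actions into touch coordinates or keyboard events in the simulator and sends them back to iOS Simulator. In other words, the browser itself isn't running the iOS app; it acts like a remote screen, receiving the simulator's frames on one side and forwarding your input to the simulator on the other. Combined with Codex's Browser Use capability for driving a browser, this lets Codex debug an iOS app on its own. The "select element" feature in the video is not selecting buttons in a web page either. An iOS app has no web DOM, so the plugin reads the app's Accessibility information, the data the system provides to assistive features describing what a button is called, where it is, and whether it can be tapped. The plugin then lays a layer of transparent HTML buttons over the browser view, one per iOS UI element, so Codex can click, identify, and describe these native iOS elements. SwiftUI Preview and hot reload are a separate capability: the plugin temporarily generates a small app dedicated to hosting Previews and runs your SwiftUI preview inside it. After you change code, it can recompile only the small dynamic library related to the preview and notify the running Preview app to refresh, without necessarily reinstalling the whole app each time. The end result: Codex can read code, change UI, run the simulator, check the result, and keep iterating in a single window, forming a complete iOS development loop.

Summary: OpenAI Codex introduced the Build iOS Apps plugin, which lets users view and test iOS apps, preview SwiftUI, and hot reload without leaving Codex. How it works: the third-party npm package serve-sim captures the iOS Simulator screen as a video stream shown in the browser, while a control channel converts browser clicks, drags, and other actions into simulator touch or keyboard events. The plugin reads the iOS app's Accessibility information and overlays transparent HTML buttons on the browser view so Codex can select native UI elements. SwiftUI preview and hot reload work by temporarily generating a small preview app and recompiling only the relevant dynamic library, with no full reinstall of the app.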
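The control channel described above has to turn a click inside the browser's video element into a touch position in the simulator. A minimal sketch of that conversion follows. It is not serve-sim's code: the `Size` type and function name are hypothetical, and it assumes the stream fills the element with no letterboxing and that the simulator expects coordinates in device points.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Size:
    width: float
    height: float


def browser_to_simulator(click_x: float, click_y: float,
                         view: Size, device: Size) -> Tuple[float, float]:
    """Scale a click inside the browser view to simulator coordinates.

    Assumes the video stream fills the view exactly (no letterboxing).
    """
    if not (0 <= click_x <= view.width and 0 <= click_y <= view.height):
        raise ValueError("click is outside the streamed simulator view")
    return (click_x / view.width * device.width,
            click_y / view.height * device.height)


if __name__ == "__main__":
    view = Size(196.5, 426.0)      # size of the element in the browser (example)
    device = Size(393.0, 852.0)    # simulator screen in points (example)
    print(browser_to_simulator(98.25, 213.0, view, device))  # (196.5, 426.0)
```

The same scaling, run in reverse, would place the transparent HTML buttons over the positions reported by the Accessibility data.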

Chubby♨️@kimmonismus · June 5 · 66

That’s so cool! I love the creativity of those guys. An open model for live music generation only 2.4B parameters. If you are bored on long flights you can now start creating bangers

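Summary: So cool! I love these guys' creativity. An open model for live music generation with only 2.4B parameters. If you get bored on long flights, you can now start creating bangers.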

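歸藏(guizang.ai)@op7418 · June 4 · 68

I built a simple little tool, Glimpse (即览): no AI, no network access, focused on solving the problem of previewing AI-generated Markdown and HTML on a phone. Apple's review held me up for three days. I've opened 8,000 test slots, which should be enough. If they fill up, I'll probably list it on the store in a few days.

Summary: 歸藏 released the mobile utility Glimpse (即览), with no AI and no network access, built specifically for previewing AI-generated Markdown and HTML on iOS. After three days in Apple review, 8,000 test slots are now open. If the slots run out, it is expected to launch on the App Store within days.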

Jeff Dean@JeffDean · June 4 · 75

Check out our Gemma 4 12B model: it's a super capable open weights model that can run directly on your laptop.

Summary: Check out our Gemma 4 12B model: a very capable open-weights model that can run directly on your laptop.

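Berryxia.AI@berryxia · June 4 · 66

On-device model capability keeps getting amplified! Gemma 4 12B is now fully wired into Google AI Edge, so you can run a 100% on-device agentic workflow right on a laptop. Mac users get the best of it: AI Edge Gallery generates code directly, and AI Edge Eloquent supports voice input with real-time text editing; both are brand new. Under the hood, LiteRT-LM serves Gemma 4 12B locally, so the whole process involves zero network, zero latency, and all data stays on your own machine. We used to think a 12B model was far from ready for local agentic tasks, but this time Google packaged the model, the inference engine, and the developer toolchain together, letting ordinary developers treat AI on a laptop as a truly private local teammate that can execute continuously. This flips the industry's current mainstream path. Everyone is competing on bigger cloud models and lower latency, and Google is using a 12B local model to say that the real productivity leap comes from pushing agentic capability all the way down to the device, making AI part of your operating system.

Summary: Google integrated Gemma 4 12B deeply with Google AI Edge, so developers can run a 100% on-device agentic workflow on a laptop. Mac users get two new tools: AI Edge Gallery generates code directly, and AI Edge Eloquent supports voice input with real-time text editing. Under the hood, LiteRT-LM serves the model locally, with zero network, zero latency, and data kept entirely on the device. Google packaged the model, inference engine, and developer toolchain together, giving developers a private local AI teammate that can execute continuously.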

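Berryxia.AI@berryxia · June 4 · 70

Whoa! I just noticed that models for Apple's MLX framework now ship on day 0? It looks like the releases are synchronized, with the MLX framework coordinating directly with model vendors from the start! I strongly recommend that Mac users go straight to MLX-format models; the speedup is generally at least 10-20%.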

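小互@xiaohu · June 4 · 71

Google releases the open-source Gemma 4 12B model: full multimodal AI on a 16GB laptop. Gemma 4 12B uses an encoder-free architecture called "Unified," which sends four kinds of input (text, image, audio, and video) directly into the same Transformer backbone. The model can process raw images and sound directly. An analogy makes this clear: the way traditional multimodal models handle images and audio is like a boss who only reads Chinese and employs two translators, an English translator (the vision encoder) and a Japanese translator (the audio encoder). Whenever English or Japanese material comes in, a translator must first convert it to Chinese before the boss can read it. The translators take up desks (VRAM), translation means waiting in line (latency), and what the boss receives is the translator's processed version, not the original. What Gemma 4 12B does is lay off both translators and have the boss learn to read English and Japanese directly. A few key numbers: it runs in 16GB of VRAM or unified memory, and 4-bit quantization brings that down to 8GB, with the goal of running locally on an ordinary laptop; a 256K-token context window; support for 140+ languages; and a built-in Thinking mode (step-by-step reasoning) plus native Function Calling.

Summary: Google released the open-source Gemma 4 12B model, which uses an encoder-free Unified architecture that processes text, image, audio, and video directly, with no separate encoders. It runs in 16GB of VRAM, or as little as 8GB with 4-bit quantization. It supports a 256K-token context and 140+ languages, with built-in Thinking mode and Function Calling.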
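The memory figures can be sanity-checked with a back-of-the-envelope estimate of weight storage alone. This is only a rough sketch: it counts weights at a uniform precision, uses decimal gigabytes, and ignores the KV cache, activations, and runtime overhead, all of which add to the real footprint. The source does not state which precision the 16GB figure refers to.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate storage for the weights only, in decimal gigabytes."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    for bits in (16, 8, 4):
        print(f"12B parameters at {bits}-bit: {weight_memory_gb(12, bits):.1f} GB")
```

At 4 bits the weights of a 12B model come to about 6 GB, which leaves a couple of gigabytes for everything else within the quoted 8GB figure.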

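Berryxia.AI@berryxia · June 4 · 69

Google released the multimodal Gemma 4 12B model last night; it needs at least 16GB of memory to run. It should be compared against the Qwen models to see how it performs.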

MiniMax (official)@MiniMax_AI · June 4 · 65

We are part of @nvidia and @Microsoft ’s Local LLM lineup at #GTC Taipei.🔥 The PC is being reinvented around local, agentic, open-weight models MiniMax-M3 is built exactly for this future: Open-weight. 1M context. Strong coding. Native multimodality. Excited for what comes next!

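Summary: We have joined @nvidia and @Microsoft's Local LLM lineup at #GTC Taipei. 🔥 The PC is being redefined around local, agentic, open-weight models. MiniMax-M3 is built for exactly this future: open-weight, 1M context, strong coding, native multimodality. Excited for what comes next!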

Sundar Pichai@sundarpichai · June 4 · 73

Our new Gemma 4 12B model hits a sweet spot between size + performance: it can run locally on a laptop, while enabling powerful multi-step reasoning and agentic workflows. Can’t wait to see what the community does with this one!

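Summary: With cumulative downloads of the Gemma 4 family passing 150 million, Google introduced a new member, Gemma 4 12B. At only 12B parameters, it runs locally on a laptop with 16GB of VRAM, balances size and performance, and supports multi-step reasoning and agentic workflows. It is released under the Apache 2.0 open-source license for community use.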

Chubby♨️@kimmonismus · June 4 · 65

I took a "behind-the-scenes" tour at Microsoft today, where I was able to inspect the Surface Laptop Ultra firsthand and therefore was able to record those clips. The most obvious takeaway: Microsoft is now aiming to enter into direct competition with Apple and challenge the MacBook Pro. Needless to say, I wasn't able to conduct any real-world testing. However, the build quality, thermal management, the display, and- above all- the NVIDIA chip are certainly impressive. Whether it will truly manage to challenge Apple's MacBooks remains to be seen. But one thing is certain: Microsoft means business.

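Summary: Microsoft launched the new Surface Laptop Ultra, positioned as a creator and AI laptop, with a new NVIDIA chip (RTX GPU), up to 1 petaflop of AI compute, and up to 128GB of unified memory. It has a 15-inch mini-LED PixelSense Ultra touchscreen (3:2 aspect ratio, 262 PPI, 2,000 nits peak HDR brightness) and is less than 18mm thick. After inspecting it firsthand on a behind-the-scenes tour, the author found the build quality, thermal management, display, and chip impressive; Microsoft is clearly targeting the MacBook Pro and intends to challenge Apple directly.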

Chubby♨️@kimmonismus · June 4 · 71

Gemma 4 12B shipped today under the label "encoder-free." A local 12b model that shows really good results. I'm a big fan of Gemma Gemma 4 12B is out: a dense, fully open model (Apache 2.0) that runs on a 16GB laptop and does agentic reasoning, vision and audio at a quality Google puts near its 26B model. The reason a 12B can pull this off: Google removed the separate vision and audio encoders and feeds both straight into the model, which keeps the memory footprint small enough for consumer GPUs. For on-device assistants and private coding agents, that lowers the bar a lot. always look forward to the updates. 12b is a good sweet spot in terms of size. a few facts: Vision: the 550M encoder (27 transformer layers) is now a 35M embedder, one matmul on 48x48 pixel patches. Roughly 15x smaller. Audio: the 300M encoder (12 conformer layers) is gone. Raw 16kHz audio cut into 40ms frames, projected straight into the LLM. So encoding didn't vanish, it collapsed into the backbone. The payoff is real: one shared set of weights, so you LoRA-tune vision, audio and text in a single pass.

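Summary: Google open-sourced Gemma 4 12B (dense, Apache 2.0 license) with a new encoder-free architecture that removes the separate vision encoder (550M parameters, 27 Transformer layers) and audio encoder (300M parameters, 12 Conformer layers). Vision becomes a 35M embedder (roughly 15x smaller), and audio is projected straight into the LLM in 40ms frames. The model runs agentic reasoning, vision, and audio tasks on a laptop with 16GB of VRAM, with performance close to the 26B model. The shared weights allow one LoRA tuning pass to cover vision, audio, and text.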
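The framing numbers in the post translate into concrete input sizes. The sketch below works out samples per audio frame, frames per clip, and patches per image from the quoted 16kHz, 40ms, and 48x48 figures. Dropping partial frames and patches and assuming 3 color channels are illustrative assumptions, not details from the source.

```python
SAMPLE_RATE_HZ = 16_000  # from the post: raw 16kHz audio
FRAME_MS = 40            # from the post: 40ms frames
PATCH_PIXELS = 48        # from the post: 48x48 pixel patches


def samples_per_frame() -> int:
    """Raw audio samples that land in one 40ms frame."""
    return SAMPLE_RATE_HZ * FRAME_MS // 1000


def audio_frames(duration_s: float) -> int:
    """Whole frames in a clip (assumption: a trailing partial frame is dropped)."""
    return round(duration_s * 1000) // FRAME_MS


def image_patches(width: int, height: int) -> int:
    """Whole patches in an image (assumption: partial edge patches are dropped)."""
    return (width // PATCH_PIXELS) * (height // PATCH_PIXELS)


def patch_input_dim(channels: int = 3) -> int:
    """Values per patch fed to the embedder's matmul (assumption: 3 channels)."""
    return PATCH_PIXELS * PATCH_PIXELS * channels


if __name__ == "__main__":
    print(samples_per_frame())       # 640
    print(audio_frames(10))          # 250
    print(image_patches(960, 480))   # 200
    print(patch_input_dim())         # 6912
```

Each frame or patch becomes one input position for the backbone, so these counts give a feel for how many tokens a clip or image contributes.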

Demis Hassabis@demishassabis · June 4 · 74

Celebrating the milestone of a massive 150+ million downloads of Gemma 4 with the release of the new Gemma 4 12B model! It's incredibly powerful for such a small model and it’s tiny enough to run locally on a laptop with just 16GB VRAM. Apache 2.0 license - happy building!

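Summary: Demis Hassabis announced that Gemma 4 downloads have passed 150 million and officially released the new Gemma 4 12B model. It is a unified, encoder-free multimodal model that combines edge efficiency with advanced reasoning. Despite having only 12B parameters, it performs strongly and is small enough to run locally on a laptop with just 16GB of VRAM. It uses the Apache 2.0 open-source license, so developers are free to build on it.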

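AYi@AYi_AInotes · June 4 · 65

35M now does the work of 150M. Google's new Gemma 4 12B takes the heaviest part of a multimodal model, the vision encoder, from 150M-550M parameters straight down to 35M. Multimodal work used to follow a fixed recipe: hand the image to a dedicated vision encoder that translates it into a language the model understands, then pass it to the large model, like hiring an interpreter. That interpreter, a traditional ViT encoder, takes 150M to 550M parameters. Gemma 4 12B fires the interpreter and keeps only a lightweight 35M embedder that cuts the image into 48×48 patches and feeds them in directly as tokens, letting the Transformer learn to see for itself. Audio is handled the same way: the raw 16kHz waveform is cut into 40ms frames and fed straight into the same model. In other words, images, sound, and text are treated as the same kind of thing for the first time. Why dare to do this? Because it is betting on one thing: once the base model passes a certain critical size, those specialized submodules are no longer necessary. You may have seen this script before. When ViT replaced CNNs, it was the same playbook: at sufficient scale, rather than hand-designing a pile of specialized structures, hand the job to one unified large model and let it learn. That logic is now spreading from single-modality vision to the whole multimodal architecture. And the 12B size was not picked at random: just big enough to drop the encoders, and just small enough to fit in a 16GB laptop. According to aaryan_kakad's tests on an M4 Max, image recognition latency is 1.2 to 1.5 seconds under 4-bit quantization. Google says 16GB is enough; the community's take is more down-to-earth: it runs, but multiple high-resolution images push the limit. What is really worth pondering about this news, though, is not that it runs on your laptop but what it implies. Until now, building a multimodal app meant assembling Whisper for transcription, LLaVa for vision, and then an LLM, like building a machine where you tune, align, and debug every part's interface yourself. If the encoder-free route works out, a single fine-tuned unified model may swallow that entire pipeline. What loses value at that moment is not any one tool but all the craft you built up assembling that machine and stitching that pipeline together. The model is not saving you one part; it is quietly rewriting which skills are still worth something.

Summary: Google introduced Gemma 4 12B (Apache 2.0), a unified multimodal architecture with no separate vision encoder. A lightweight 35M-parameter embedder cuts images into 48×48 patches, and audio (raw 16kHz waveform) is cut into 40ms frames; both are fed to the Transformer directly as tokens. On an M4 Max, image recognition latency with 4-bit quantization is 1.2-1.5 seconds. Google says 16GB of memory is enough, but the community notes that multiple high-resolution images push the limit. The design suggests that once the base model is large enough, specialized submodules are no longer required, and that a single fine-tuned unified model may replace multimodal pipelines traditionally assembled from Whisper, LLaVa, and similar parts.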

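郭明錤｜Ming-Chi Kuo@mingchikuo · June 4 · 65

1. The roadmap of Apple's XR headsets and smart glasses that I made about a year ago no longer has much reference value; only two smart glasses devices currently have visibility. 2. The major roadmap change was decided by Apple's next CEO, John Ternus (it actually changed a while ago; I just didn't update in time). I think removing the Vision Pro line and shifting resources to smart glasses products with broader consumer potential is the right decision. 3. The latest supply chain survey indicates that Apple's display-equipped AR/XR smart glasses (using optical waveguides) will be delayed to 2029. The display-less AI glasses (similar to Ray-Ban Meta) are still expected to launch in 2027.

Summary: Apple analyst Ming-Chi Kuo updated his forecast: the previously planned XR headset roadmap is obsolete, and only two smart glasses devices currently have visibility. The overhaul was decided by the next CEO, John Ternus; the Vision Pro line was removed and resources shifted to smart glasses. The latest supply chain survey shows the display-equipped AR/XR smart glasses (optical waveguides) delayed to 2029, while the display-less AI glasses (similar to Ray-Ban Meta) are still expected in 2027. Kuo believes smart glasses will drive the next wave of consumer electronics.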

郭明錤｜Ming-Chi Kuo@mingchikuo · June 4 · 63

1. The Apple XR headset and smart glasses roadmap I put together about a year ago is no longer a useful reference. For now, only two smart glasses products remain visible in the roadmap. 2. The major overhaul was signed off by Apple's next CEO, John Ternus. This shift actually happened a while back. I'm just late updating the chart. I think removing the Vision Pro line was the right call, as Apple shifts resources toward smart glasses with greater mass-market potential. 3. My latest supply chain checks suggest Apple’s display-equipped AR/XR smart glasses device, powered by optical waveguides, has slipped to 2029. The display-less AI glasses, similar to Ray-Ban Meta, are still expected to ship in 2027.

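Summary: Ming-Chi Kuo updated his Apple XR headset and smart glasses roadmap; the earlier version is no longer valid. Only two smart glasses products remain in the plan. The main change was approved by Apple's next CEO, John Ternus, who cancelled the Vision Pro line and shifted resources to smart glasses with greater mass-market potential. The latest supply chain checks show the AR/XR smart glasses device with an optical waveguide display delayed to 2029; the display-less AI glasses (similar to Ray-Ban Meta) are expected to ship in 2027.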

🚨 AI News | TestingCatalog@testingcatalog · June 4 · 51

Perplexity Personal Computer is now available to Max and Enterprise Max users on Windows! Waitlist below 👀

Summary: Perplexity Personal Computer is now available on Windows for Max and Enterprise Max users! Waitlist below 👀

🚨 AI News | TestingCatalog@testingcatalog · June 4 · 65

GOOGLE 🔥: A new Gemma 4 12B is now available on Huggingface under Apache 2.0 license! > Built with the same multimodal functionality as Gemma 4 E2B and E4B (text, audio, image, and video inputs), it brings native audio and vision understanding directly to local environments without the need for separate encoders. > This unified approach to multimodality makes the model encoder-free, offering a deployment size that is perfect for consumer devices and streamlined local execution.

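Summary: Google's latest Gemma 4 12B model is now on Hugging Face under the Apache 2.0 license. It shares the same multimodal capabilities as Gemma 4 E2B/E4B, accepting text, audio, image, and video inputs, and delivers native audio and vision understanding without separate encoders. This encoder-free unified design gives it a smaller deployment size that suits consumer devices and local execution. Google says it is meant to bridge the gap between edge efficiency and advanced reasoning.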

Chubby♨️@kimmonismus · June 4 · 57

First hands-on with Microsoft’s new Surface Laptop Ultra. Microsoft is clearly positioning this as a new class of creator and AI laptop, powered by new NVIDIA silicon with an RTX GPU built for local AI, creative workflows, and gaming. A few standout specs: -New NVIDIA chip with RTX GPU -Up to 1 petaflop of AI compute -Up to 128GB unified memory -15-inch mini-LED PixelSense Ultra touchscreen -3:2 aspect ratio -262 PPI -Up to 2,000 nits peak HDR brightness -Less than 18mm thick

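Summary: First hands-on with Microsoft's new Surface Laptop Ultra. Microsoft is clearly positioning it as a new class of creator and AI laptop, powered by a new NVIDIA chip with an RTX GPU and built for local AI, creative workflows, and gaming. A few standout specs: - New NVIDIA chip with RTX GPU - Up to 1 petaflop of AI compute - Up to 128GB unified memory - 15-inch mini-LED PixelSense Ultra touchscreen - 3:2 aspect ratio - 262 PPI - Up to 2,000 nits peak HDR brightness - Less than 18mm thick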

Google AI Developers@googleaidevs · June 4 · 77

We’re launching Gemma 4 12B: Our unified, encoder-free model that brings powerful multimodal intelligence straight to your laptop 🚀 The model bridges the gap between our mobile E4B model and larger 26B MoE models, packaging frontier-class reasoning and native audio into a highly optimized footprint, all under a permissive Apache 2.0 license. Here’s what makes it unique: + Encoder-Less Architecture: We removed the multimodal encoders. The vision and audio inputs flow directly into the LLM backbone. + Agentic Performance (16GB VRAM): Run complex, multi-step workflows locally, with performance nearing our 26B model.

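Summary: Google released Gemma 4 12B, an encoder-free unified multimodal model that sends vision and audio inputs directly into the LLM backbone, without traditional multimodal encoders. The model fills the gap between the mobile E4B model and the 26B MoE model, packaging frontier-class reasoning and native audio under the Apache 2.0 license. With 16GB of VRAM it can run complex multi-step agentic workflows locally, with performance close to the 26B model.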

Perplexity@perplexity_ai · June 3 · 61

Personal Computer is coming to Windows. Personal Computer for Windows runs on your machine and orchestrates across the apps and files you use every day. We'll roll out first to paying Max and Enterprise Max subscribers on the waitlist.

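Summary: Personal Computer is coming to Windows. Personal Computer for Windows runs on your machine and orchestrates across the apps and files you use every day. It will roll out first to paying Max and Enterprise Max subscribers on the waitlist.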

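小互@xiaohu · June 3 · 71

After more than half a year of being tormented by AI that won't follow instructions, I finally found a fix: an open-source project called OpenSquilla, built by a team in China. They rewrote "Little Lobster" (小龙虾) in Python, addressing its heavy token use, its failure to follow rules, and its security problems. 100 conversations can save 1 million tokens. First, the savings: it integrates a small local model. Every request you send is quickly vectorized by this small model before it goes to a large model, and classified as a simple or complex task. Simple tasks go to a cheap model; only complex ones go to a top-tier model. It works like a hospital triage desk: you don't book a specialist for a cold. The key is that the classification runs locally, costs no tokens, and is fast enough to be essentially unnoticeable. The team ran a test with 25 tasks: using Claude Opus 4.7 alone cost $6.2 in total, while OpenSquilla routing across Opus 4.7, GLM5.1, and DS4 Flash scored almost the same and cost only $0.68. Same results at one-ninth the cost! Now I finally dare to plug in Opus and GPT! Each round also shows how many tokens it saved. And the token savings are not limited to model calls. I have more than ninety Skills installed, and every round used to stuff all of their descriptions into the context, which I calculated at about 9,000 tokens per round. OpenSquilla injects only the few best-matching Skills based on the semantics of the current conversation; at my scale, that saves about 1 million tokens per 100 conversations.

Summary: OpenSquilla, an open-source project from a team in China, rewrites "Little Lobster" (小龙虾) in Python to address heavy token use, failure to follow rules, and security problems. It integrates a small model that classifies requests in real time: simple tasks go to a cheap model, complex tasks to a top-tier model. In a 25-task test, Claude Opus 4.7 alone cost $6.2, while OpenSquilla mixing Opus 4.7, GLM5.1, and DS4 Flash cost only $0.68 with almost the same score. It also injects only the best-matching Skills (out of 90+) based on conversation semantics, saving about 9,000 tokens per round, or roughly 1 million tokens over 100 conversations.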
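The two mechanisms described above, routing by task difficulty and injecting only the best-matching Skills, can be sketched in a few lines. This is a toy illustration and not OpenSquilla's implementation: the post says a small local model vectorizes each request, whereas the code below substitutes plain word counts and a keyword rule, and every name in it (`pick_model`, `select_skills`, the model labels, the marker words) is hypothetical.

```python
import re
from collections import Counter
from math import sqrt
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens; a stand-in for a real embedding model."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_skills(message: str, skills: Dict[str, str], top_k: int = 3) -> List[str]:
    """Return the names of the top_k skills whose descriptions best match."""
    query = Counter(tokenize(message))
    ranked = sorted(
        skills,
        key=lambda name: cosine(query, Counter(tokenize(skills[name]))),
        reverse=True,
    )
    return ranked[:top_k]


def pick_model(message: str,
               complex_markers: Tuple[str, ...] = ("refactor", "architecture", "debug"),
               ) -> str:
    """Toy triage rule: long or marker-bearing requests go to the top model."""
    words = tokenize(message)
    is_complex = len(words) > 40 or any(m in words for m in complex_markers)
    return "top-tier-model" if is_complex else "cheap-model"


if __name__ == "__main__":
    skills = {
        "pdf": "extract text and tables from pdf files",
        "git": "create commits and branches in git repositories",
        "email": "draft and send email messages",
    }
    print(select_skills("extract tables from this pdf", skills, top_k=1))
    print(pick_model("debug this race condition"))
```

The quoted numbers are consistent with each other: $6.2 divided by $0.68 is about 9.1, and roughly 9,000 tokens of Skill descriptions per round over 100 rounds is about 900,000 tokens, which the post rounds to 1 million.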

🚨 AI News | TestingCatalog@testingcatalog · June 3 · 44

Perplexity Computer will soon be able to dynamically split compute power between local models and cloud models! If that would drive Perplexity Computer costs down, it would be huge, since it is one of the top blockers for many at this moment. Soon 👀

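Summary: Perplexity Computer will soon be able to split compute dynamically between local and cloud models! If that lowers the cost of Perplexity Computer, it would be a big step, since cost is currently one of the main blockers for many users. Soon 👀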

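小互@xiaohu · June 3 · 64

Drawn by how popular the Mac mini has become with developers, Microsoft has released a Mac mini-like desktop: the Surface RTX Spark Dev Box. It is a small box that simply sits on your desk. It comes with NVIDIA's latest RTX Spark chip and 128GB of memory, reaches 1 petaflop of compute (1,000 trillion operations), and can run a 120-billion-parameter model locally without connecting to cloud GPUs. It looks like a "flattened Xbox Series X," with a similar cooling grille on top, except that the vents are square rather than round. The whole body is 3D-printed anodized aluminum, with 1,000 vent holes on top. Positioning: for developers to run AI models, agent workflows, and model fine-tuning locally instead of sending everything to the cloud, which saves money and is faster. Ready out of the box: it ships with a developer edition of Windows 11 Pro, with VS Code, GitHub Copilot, WSL, and PowerShell 7 all set up, so you can write code as soon as it boots. Cooling: the entire aluminum body is the cooling system, drawing 100W, with 1,000 vent holes on top, and it can sustain long training jobs without throttling. Price: not yet announced; industry analysts estimate $3,000 to $3,500, while the comparable AMD Ryzen AI Halo PC and NVIDIA DGX Spark sell for about $3,999. It goes on sale in the US later this year...

Summary: Microsoft introduced the Surface RTX Spark Dev Box, a small desktop built for local AI development. It has an NVIDIA RTX Spark chip and 128GB of memory, reaches 1 petaflop of compute, and can run a 120-billion-parameter model locally. Its anodized aluminum body doubles as the cooling system, and it draws 100W. The device ships with a developer edition of Windows 11 Pro and a developer toolchain, is expected to cost $3,000 to $3,500, and goes on sale in the US later this year.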

SenseTime@SenseTime_AI · June 3 · 34

At SenseTime, we believe the future of #AI is shaped by continuously pushing the boundaries of #FoundationalInnovation. At the 2026 AI Innovation Forum, our Co-founder and Chief Scientist Dr. @lindahua highlighted an important industry trend: #ModelArchitecture optimization can significantly reduce the compute required per unit of intelligence. He also noted that China’s AI ecosystem should leverage application and model innovation to drive chip development forward. SenseNova U1, SenseTime's latest multimodal model built on our proprietary Neo-Unify architecture, demonstrates this in practice, achieving significantly lower #ComputeCosts in infographic generation while being simultaneously adapted to multiple #ChineseChips. At the same time, we continue developing AI solutions that genuinely solve user problems and create sustainable #CommercialValue, strengthening our long-term competitiveness in the evolving AI arena. Forum organisers: China International Capital Corporation Limited, @hkust

Summary: SenseTime's co-founder and chief scientist said at the 2026 AI Innovation Forum that model architecture optimization can significantly reduce the compute required per unit of intelligence. The company's newly released multimodal model SenseNova U1, built on its in-house Neo-Unify architecture, puts this into practice: it achieves significantly lower compute costs in infographic generation and has been adapted to multiple Chinese chips. SenseTime stressed that it will keep driving chip development through application and model innovation, to create commercial value and long-term competitiveness.

🚨 AI News | TestingCatalog@testingcatalog · June 3 · 65

HERMES 🔥: A new Hermes Desktop app from Nous Research is now available on macOS, Windows, and Linux! Testing time 👀

Summary: HERMES 🔥: The new Hermes Desktop app from Nous Research is now available on macOS, Windows, and Linux! Testing time 👀

Satya Nadella@satyanadella · June 3 · 74

With Project Solara, we are building a new platform purpose-built for agent-first devices. Excited to work with @cristianoamon and @Qualcomm on this!

Summary: With Project Solara, we are building a new platform purpose-built for agent-first devices. Excited to work with @cristianoamon and @Qualcomm on this!

Microsoft Research@MSFTResearch · June 3 · 54

Agentic experiences powered by small models that fit on your own device. Hear from Maya Murad on MagenticLite at the Microsoft Research Lab at #MSBuild.

Summary: Agentic experiences powered by small models that run on your own device. Hear Maya Murad present MagenticLite at the Microsoft Research Lab at #MSBuild.

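Chubby♨️@kimmonismus · June 3 · 51

Very excited for this "no prior" episode! Curious whether we'll hear more about their project Solaris, their agentic handhelds

Summary: Very excited for this "no prior" episode! Curious whether we'll learn more about their project Solaris, their agentic handheld devices.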

Chubby♨️@kimmonismus · June 3 · 56

Microsoft Scout revealed "your always-on personal agent for work." If "AI" was the Word of the Year in 2025, in 2026 it will be "agents" (always-on). Everything is agentic this year.

Summary: Microsoft Scout revealed "your always-on personal agent for work." If "AI" was the word of the year in 2025, then in 2026 it will be "agents" (always-on). Everything is agentic this year.

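AYi@AYi_AInotes · June 3 · 57

Damn, these glasses run full Linux! Not a concept render, not a slide deck: Buildroot Linux + Arm Cortex A7, and you can SSH in and run your Claude Code, Codex, or OpenClaw. The whole system will also be open-sourced on GitHub before August. I think the most striking thing about these glasses is not that they cram a computer into eyewear, but that they pull vibe coding off the desktop and onto your face. You used to have to sit at a computer to write code; now your coding agent sits on your shoulder, gets visual context in real time from whatever you look at, and gives you feedback directly through the bone-conduction microphone. This is not AR-glasses showmanship; it is a genuine Agent Terminal. Put simply, it pulls your Claude out of the chat box and turns it into a partner that walks with you. When a bug occurs to you on the street, you don't need to pull out your phone or find a computer; the agent in your glasses is already listening. This "compute follows the person" paradigm may be the true form of the fourth class of productivity computer. With a laptop, you go to the computer; with Monako, the computer follows you. When agents become the main work partners, the computing form factor shifts from "people chasing devices" to "devices chasing people."

Summary: These smart glasses have an Arm Cortex A7 processor and run a full Buildroot Linux system, with coding tools such as Claude Code and Codex runnable directly over SSH. The whole system will be open-sourced on GitHub before August. The core value is bringing the coding agent from the desktop to the user's eyes, using the glasses' visual context and bone-conduction microphone for real-time "compute follows the person" collaboration; it is seen as a new kind of "Agent Terminal."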

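郭明錤｜Ming-Chi Kuo@mingchikuo · June 3 · 63

A few thoughts on NVIDIA RTX Spark (setting spec details aside for now): the on-device AI agent narrative, a reality check on delivery, and Apple WWDC. 1. The core is NVIDIA CEO Jensen Huang's "reinvent the PC" slogan and a concept demo of an on-device AI agent workflow (I call it a concept demo because there was no live demonstration on real hardware). The slogan and the concept demo help accelerate market consensus on on-device AI agents in the short term. 2. Elements of the on-device AI agent concept shown: OS + cloud/local LLM switching + agent harness + cross-app workflow + sandbox. The concept is not original, but with GTC's high exposure and narrative pull, it will dominate the narrative around on-device AI agent use cases for the foreseeable future. 3. Although Huang was first to lay out the vision and narrative for on-device AI agents, RTX Spark devices will remain a niche within the laptop market over the next two years, so it is too early to judge who wins or loses commercially. 4. Before GTC, the vast majority of discussion and predictions about RTX Spark (N1X) focused on the chip codename, specs, and supply chain; the importance of the operating system was rarely mentioned. Huang's keynote placed the operating system alongside the chip platform at the core of "reinventing the PC," which echoes the core view I raised earlier: the operating system is the key to on-device AI driving an upgrade and replacement cycle. 5. Software is the key to user experience. A lot of work remains before users can experience the agentic workflow Huang showed. At a minimum, NVIDIA's CUDA Toolkit needs to publicly support Windows Arm64, and Microsoft needs to move the Windows local AI agent architecture from preview to general availability (GA), including MCP on Windows, ODR, and agent connectors, which are still in public preview, and Agent Workspace, which is still in private preview. If these developer and OS tools are not in place when the hardware goes on sale, RTX Spark devices will struggle to deliver on the keynote's core promise, namely letting users actually create and experience AI agent workflows, the key selling point. 6. After Huang's "reinvent the PC" slogan, how Apple responds to on-device AI agent workflows at WWDC, expected on June 8, becomes another point to watch besides how much Siri improves. For NVIDIA and Microsoft, any change in RTX Spark's development and shipping schedule would not affect their strong growth momentum in AI infrastructure. By contrast, consumer electronics is the entirety of Apple's hardware business, and on-device AI is the main axis of consumer electronics innovation, so beyond an appealing narrative Apple also needs to present a concrete delivery plan, such as clearer developer tools and an update timeline for an agent-ready OS.

Summary: Ming-Chi Kuo argues that the core of the "reinvent the PC" slogan NVIDIA CEO Jensen Huang presented at GTC is a concept demo of an on-device AI agent workflow. He notes that the concept is not original but will dominate the narrative thanks to its high exposure. The practical challenges: RTX Spark devices with the N1X chip are expected to ship about 10 million units over the next two years, which is still a niche market, and today's mainstream PC AI applications have little to do with on-device compute. The key constraint is operating system support; Windows needs to complete the relevant tools before the on-device AI agent experience can be delivered. This will also shape how Apple responds at WWDC.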

郭明錤｜Ming-Chi Kuo@mingchikuo · June 3 · 63

A few thoughts on NVIDIA RTX Spark, setting aside the specs for now: the on-device AI agent narrative, a reality check on delivery, and Apple’s WWDC. 1. At the heart of it are two things: Jensen Huang’s “reinvent the PC” slogan and a concept demo of an on-device AI agent workflow. (I call it a concept demo because there was no live demo.) The slogan and concept demo should help speed up market consensus around on-device AI agents in the near term. 2. The key elements of the on-device AI agent concept: OS + cloud/local LLM switching + agent harness + cross-app workflow + sandbox The concept isn't new, but thanks to GTC's reach, it will likely shape how people talk about on-device AI agent use cases for the foreseeable future. 3. Jensen laid out the vision and narrative for on-device AI agents earlier than most. But over the next two years, RTX Spark devices will still be a niche slice of the laptop market, so it's too early to call who wins commercially. 4. Before GTC, most discussion and predictions around RTX Spark / N1X focused on its codename, specs, and supply chain. The operating system rarely came up. In his keynote, Jensen placed the OS alongside the chip platform at the heart of “reinventing the PC.” That echoes my earlier point: the operating system is the key to on-device AI driving the next upgrade cycle. 5. Software is what makes or breaks the user experience. For users to actually experience the agentic workflow Jensen showed, a lot still has to happen. At a minimum, NVIDIA’s CUDA Toolkit needs to officially support Windows Arm64, while Microsoft needs to move Windows’ on-device AI agent stack from preview to general availability (GA), including MCP on Windows, ODR, and agent connectors (all still in public preview), plus Agent Workspace (still in private preview). 
If these developer and OS tools still aren't in place when the hardware ships, RTX Spark devices will struggle to deliver on the keynote’s core promise: enabling users to actually create and experience AI agent workflows, the product’s core selling point. 6. After Huang's "reinvent the PC" pitch, how Apple responds to on-device AI agent workflows at WWDC (expected June 8) becomes another thing to watch, alongside how much Siri improves. For NVIDIA and Microsoft, even if RTX Spark's development or shipping timeline slips, it won't dent their strong growth in AI infrastructure. Apple is in a different position: consumer electronics is its entire hardware business, and on-device AI is where consumer electronics innovation is heading. So beyond a compelling narrative, Apple also needs to show a concrete plan to deliver, including clearer developer tools and an agent-ready OS update timeline.

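Summary: Ming-Chi Kuo analyzed the RTX Spark laptop and on-device AI agent concept NVIDIA presented at GTC. He notes that the concept demo (with no live demonstration) includes the operating system, cloud/local LLM switching, an agent harness, and other elements. Supply chain checks indicate that devices with the related N1X chip will ship about 10 million units over the next two years, still a niche market. Today's mainstream PC AI applications still rely on cloud compute. If, when the devices ship, NVIDIA's CUDA Toolkit does not officially support Windows Arm64 and Microsoft's on-device AI agent stack (including MCP on Windows, ODR, and others) is still in preview, RTX Spark will struggle to deliver on its core selling point. How Apple responds to on-device AI agent workflows at WWDC is also worth watching.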