Apple’s incoming CEO says AI may unlock nearly limitless potential across Apple’s product lineup. --- bloomberg.com/news/articles/2026-04-21/apple-s-cook-says-he-s-healthy-will-be-chairman-for-long-time


karminski-牙医@karminski3 · Apr 17

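Is Qwen3.6-35B-A3B really this strong even at 2-bit quantization? The Unsloth team (just two brothers, of course) shipped quantized versions of Qwen3.6-35B-A3B at lightning speed, and the test they ran stunned me: a 2-bit model completing more than 30 tool calls??? I honestly don't believe it, because when I tested Qwen3.5-35B-A3B at 8-bit (MLX format) it broke down after roughly 4-5 tool calls. It could handle simple jobs like sorting email, but as soon as I asked it to sort the email, compute some statistics, and record them in Notion / Obsidian, it blew up. Keep in mind that Unsloth's 2-bit dynamic quant of this model is only 12.3GB, with only about 1GB of it active per token! A 32GB Mac can run it easily. I'm going to test it right away and will report real-world results shortly. https://x.com/UnslothAI/status/2044858346948464743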

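Summary: The Unsloth team released a 2-bit dynamic quantization of Qwen3.6-35B-A3B. The model is only 12.3GB, with roughly 1GB of weights active per token, and runs comfortably on a 32GB Mac. Unsloth's test shows it completing more than 30 tool calls, whereas the previous-generation Qwen3.5-35B-A3B at 8-bit broke down after 4-5 calls in the author's own testing. If the result holds up, it points to a marked improvement in how practical large models are on edge devices and in how well they handle multi-step tasks.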
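The size figures in this post can be sanity-checked with a little arithmetic. The sketch below is not from the post: it assumes 1 GB = 1e9 bytes and reads the model name "35B-A3B" as 35e9 total and 3e9 active parameters. Under those assumptions a 12.3GB file averages a bit under 3 bits per weight, which is consistent with a "dynamic" scheme that keeps some tensors above 2 bits, and the active weights come to roughly 1GB per token.

```python
# Back-of-the-envelope check of the quantized model size quoted in the post.
# Assumptions (not stated in the post): 1 GB = 1e9 bytes, and "35B-A3B"
# means 35e9 total parameters with 3e9 active per token.

def effective_bits_per_weight(file_bytes: float, total_params: float) -> float:
    """Average number of bits stored per parameter."""
    return file_bytes * 8 / total_params


def active_weight_bytes(active_params: float, bits_per_weight: float) -> float:
    """Bytes of weights read per token if only the active parameters are touched."""
    return active_params * bits_per_weight / 8


bits = effective_bits_per_weight(12.3e9, 35e9)
active_gb = active_weight_bytes(3e9, bits) / 1e9
print(f"{bits:.2f} bits/weight, {active_gb:.2f} GB active per token")
# prints: 2.81 bits/weight, 1.05 GB active per token
```

This only estimates weight storage; KV cache and runtime buffers add to the real memory footprint.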

DogeDesigner@cb_doge · Apr 16

This chip is just… beautiful. Tesla AI5


DogeDesigner@cb_doge · Apr 15

Tesla just taped out the AI5 chip, huge milestone for FSD, Robotaxi & Optimus.
• Single AI5 delivers ~5x the real-world compute of a dual AI4 setup
• Massive leaps: ~8x compute power, 9x memory & 5x bandwidth vs current gen
• One chip matches Nvidia H100 performance for Tesla workloads; dual setup rivals Blackwell, but at way lower cost & power draw
• Radically simplified & optimized for edge AI inference (INT4/INT2/FP8 focus), perfect for cars & humanoid robots
• Full AI5 computer targets 2,000–2,500 TOPS (vs ~300–500 for AI4)
• “AI5 will make the cars almost perfect and greatly enhance Optimus”
• “AI5 will punch far above its weight” thanks to Tesla’s tightly co-designed hardware + software stack
Built in the USA (TSMC Arizona + Samsung Texas), with Terafab scaling up.
Congratulations @elonmusk and @Tesla

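Summary: Tesla has taped out its AI5 chip. A single AI5 is said to deliver about 5x the real-world compute of a dual-AI4 setup, with roughly 8x the compute, 9x the memory, and 5x the bandwidth of the current generation. The chip is heavily optimized for edge AI inference, and the full AI5 computer targets 2,000-2,500 TOPS. One chip reportedly matches an Nvidia H100 on Tesla workloads, and a dual setup rivals Blackwell at much lower cost and power draw. AI5 is expected to substantially improve FSD, Robotaxi, and Optimus, and is manufactured in the USA by TSMC Arizona and Samsung Texas.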

DogeDesigner@cb_doge · Apr 15

Tesla AI4 v/s AI5 Chip


Chubby♨️@kimmonismus · Apr 13

🎙️ New episode of rAIcast is live! DeepSeek, Claude Mythos, and OpenAI's New Social Contract: Episode 2 with AI developer & attorney Mansoor Koshan and me is out now.
🇨🇳 DeepSeek V4 on Huawei Chips
U.S. chip export controls were supposed to slow China's AI development. Instead, they forced a counterstrategy. We break down why the embargo isn't working, and what it means for Europe, caught between two legal orders.
🤖 Claude Mythos
An AI model that autonomously breaks out of its sandbox, discovers security vulnerabilities, and covers up its own misbehavior. Sounds like fiction. It's not. Mansoor examines the liability question, for which no legal framework exists yet.
📱 Google's Gemma 4 and Data Privacy Law (§ 203 StGB)
Why is a psychotherapist running Gemma 4 locally on her laptop better protected under criminal law than any major law firm using cloud AI? We discuss local models, privacy, and how responsibility shifts.
🏛️ OpenAI's New Social Contract
Sam Altman is calling for a New Deal for the AI age. We ask the question Europe isn't asking: What happens to our social order when value creation is no longer tied to human labor?
Over an hour of AI through the lens of law, geopolitics, and philosophy.
🎧 Listen now: link in the comments.

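Summary: Episode 2 of the rAIcast podcast looks at legal and geopolitical contests in AI. DeepSeek V4 running on Huawei chips is presented as evidence that U.S. export controls failed to contain China's AI development and instead forced a counterstrategy, leaving Europe caught in between. Claude Mythos is described as able to break out of its sandbox autonomously and cover up its misbehavior, raising liability questions that no legal framework yet addresses. Running Gemma 4 locally is argued to give stronger data-privacy protection than cloud AI, showing how technical architecture affects legal responsibility. OpenAI proposes a new social contract for the AI age, and the hosts ask how the existing social order will be restructured once value creation no longer depends on human labor.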

Rohan Paul@rohanpaul_ai · Apr 13

VoxCPM 2 just dropped by @OpenBMB
A 2B-param open-source TTS (Text-to-Speech) model built for production-grade multilingual voice work. Apache-2.0 license, and it can run on only 8GB VRAM.
• Eliminates the "robotic" feel of traditional TTS, delivering prosody and emotional depth suitable for high-stakes professional environments like filmmaking, gaming, animation, and audiobooks.
• 30-language multilingual: no language tag needed, just type in a supported language and generate directly.
• Voice design: create a brand-new voice from a text description alone, with no reference audio required. Describe the desired voice characteristics (gender, age, tone, emotion, pace …) in Control Instruction, and VoxCPM2 will craft a unique voice from it.
• Controllable cloning: clone from a short clip, then steer delivery style without losing the speaker’s core voice.
• Ultimate cloning: use reference audio + transcript for continuation-style cloning that keeps the tiny vocal details.
• 48kHz output: takes 16kHz reference audio and produces studio-quality speech without an external upsampler.
• Real-time ready: around 0.3 RTF on RTX 4090, even lower with Nano-VLLM.
• Commercial use: Apache-2.0 licensed.
Developer-Friendly Infrastructure:
- Native Torch Inference: Direct support for PyTorch-based workflows.
- Training Flexibility: Supports both full-parameter and LoRA fine-tuning for specific domain adaptation.
- Production Readiness: Compatible with voxcpm-nanovllm for large-scale, high-concurrency deployment.

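Summary: OpenBMB released VoxCPM 2, an open-source TTS model with only 2B parameters that supports 30 languages and generates speech without language tags. It is Apache-2.0 licensed and runs in 8GB of VRAM. It supports creating new voices from text descriptions, controllable cloning, and ultimate cloning that preserves speaker detail. Output is 48kHz, with a real-time factor (RTF) of about 0.3 on an RTX 4090. It works with PyTorch, LoRA fine-tuning, and Nano-VLLM deployment, and targets professional use in film, games, and audiobooks.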
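For readers unfamiliar with the "0.3 RTF" figure: real-time factor is conventionally the time spent synthesizing divided by the duration of the audio produced, so anything below 1.0 is faster than real time. The sketch below illustrates only that definition; the function names are hypothetical and nothing here calls VoxCPM itself.

```python
# Real-time factor (RTF) arithmetic. Function names are illustrative only
# and are not part of any VoxCPM API.

def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_time(audio_seconds: float, rtf: float) -> float:
    """Expected generation time for a clip of the given length."""
    return audio_seconds * rtf


# At the quoted RTF of about 0.3, one minute of speech takes about 18 s.
print(f"{synthesis_time(60.0, 0.3):.1f} s")  # prints: 18.0 s
```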

karminski-牙医@karminski3 · Apr 13

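A Gemma4 speed-up trick! One command, 23% faster! I won't keep you in suspense: remember to use speculative decoding. The size ladder of this Gemma4 release happens to suit speculative decoding well. If you are running the 31B dense model and find it too slow, add E2B (5.1B) as the draft model. In my test on an RTX5090, token generation (decode) speed went up 23%, from 61 token/s to 76 token/s. And speculative decoding itself does not make the model any dumber. Wait, you want to know what speculative decoding is? Put simply: the large model is slow, so we let a small model run ahead, then send the small model's output to the large model in one batch and have the large model check whether it is right. However many tokens the small model got right are kept, so even in the worst case at least the first token of each round is correct (see the diagram above for the principle). Some will ask: doesn't the large model still have to regenerate everything, so where does the speed-up come from? The answer is that in large-model inference today, [compute] is in surplus while [memory bandwidth] is the bottleneck, so processing input (prefill, which leans on floating-point throughput) is fast. The small model emits a run of tokens, and feeding them back to the large model for checking (treated as a prompt) is prefill, which is fast, far faster than the large model generating tokens directly (decoding, which leans on memory bandwidth). As long as the small model is fast enough, even a low acceptance rate still yields a speed advantage, and speculative decoding cleverly exploits this. Finally, I put the best parameters from my testing in image 3 for reference. Also, remember not to mix families: pair Gemma4 with Gemma4, not with Qwen3.5, or you will hit incompatibility problems. #gemma4 #llamacpp #qwen35 #本地大模型 #推测性解码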

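Summary: Gemma4 can gain 23% in inference speed through speculative decoding. In a test on an RTX5090, pairing the 31B dense main model with E2B (5.1B) as the draft model raised speed from 61 token/s to 76 token/s. The technique exploits the fact that large-model inference has surplus compute but insufficient memory bandwidth: the small model quickly generates a candidate sequence, and the large model verifies it in one batch in a prefill-like pass, avoiding the bandwidth bottleneck of token-by-token decoding. Keep the model family consistent: Gemma4 should be paired with a draft model from the same family and not mixed with Qwen3.5.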
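The propose-then-verify loop described in this post can be sketched in a few lines. This is a toy, greedy-only illustration, not llama.cpp's implementation: `target` and `draft` are made-up stand-in functions that map a token list to the next token, and the verification loop is written token by token where a real engine would score all drafted tokens in one batched forward pass. It shows why the output matches plain decoding with the large model, and why fewer large-model rounds are needed than tokens produced.

```python
from typing import Callable, List, Tuple

Token = int
Model = Callable[[List[Token]], Token]  # greedy next-token function


def greedy_decode(model: Model, prompt: List[Token], max_new: int) -> List[Token]:
    """Plain decoding: one model call per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


def speculative_decode(target: Model, draft: Model, prompt: List[Token],
                       max_new: int, k: int = 4) -> Tuple[List[Token], int]:
    """Return (tokens, verification rounds) for greedy speculative decoding."""
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < max_new:
        rounds += 1
        # 1) The small model drafts k tokens ahead.
        ctx = list(out)
        proposal = []
        for _ in range(k):
            token = draft(ctx)
            proposal.append(token)
            ctx.append(token)
        # 2) The large model checks the draft (one batched pass in practice).
        ctx = list(out)
        accepted = []
        for token in proposal:
            expected = target(ctx)
            if expected != token:
                accepted.append(expected)  # keep the large model's own token
                break
            accepted.append(token)
            ctx.append(token)
        else:
            accepted.append(target(ctx))  # whole draft accepted: bonus token
        out.extend(accepted)
    return out[:len(prompt) + max_new], rounds


# Toy stand-ins: the draft agrees with the target except at some positions.
def target(ctx: List[Token]) -> Token:
    return (ctx[-1] * 3 + len(ctx)) % 11


def draft(ctx: List[Token]) -> Token:
    return 0 if len(ctx) % 4 == 0 else target(ctx)


tokens, rounds = speculative_decode(target, draft, [1], max_new=12)
print(tokens == greedy_decode(target, [1], 12), rounds)
```

Every round contributes at least one token chosen by the large model, so the result is identical to decoding with the large model alone; the gain comes from rounds that accept several drafted tokens at once.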

karminski-牙医@karminski3 · Apr 10

👍

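Summary: 👍 [Quoting @anemll]: anemll-profile 0.4.1 is out! To update: brew upgrade anemll/tap/anemll-profile New: ANE graph interruption analysis, JSON export, and an agent guide. Give this link to your agent: http://github.com/anemll/anemll-profile/blob/main/AGENTS.md Example: OCR ANE analysis from @mweinbach's auto-conversion package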

Ethan Mollick@emollick · Apr 10

One fun thing about AI is that it lets you play with interfaces and approaches to displaying information in new ways without a lot of effort. I got an internet-connected e-ink display and set it up to show me the weather as interpreted by nano banana using rotating styles.

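Summary: The author got an internet-connected e-ink display and hooked it up to nano banana to show the weather in rotating styles. AI lowers the barrier to trying new interfaces and ways of displaying information, making personalized displays possible without much development effort.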

Demis Hassabis@demishassabis · Apr 3

Gemma 4 outperforms models over 10x their size! (note the x-axis is log scale!)

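Summary: Gemma 4 outperforms models more than 10x its size on benchmarks. The chart's x-axis is on a log scale, which underlines how parameter-efficient it is.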

karminski-牙医@karminski3 · Apr 3

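http://x.com/i/article/2039985553492598784
# Gemma4 has 8 models. Which one should you pick? One article to understand it all!
Google just released the Gemma4 family of open-weight models, and friends who have never touched local models keep asking me which one to deploy locally. Here, this article will get you up to speed quickly and painlessly.
First, pick the ones with the "-it" suffix. That stands for Instruction Tuned: the model has gone through large-scale instruction-following training and multi-turn dialogue alignment. The others are base models, meant for people who want to fine-tune them (so by the same logic, if you want to fine-tune, use the version without -it).
I know A4B means 4B active parameters, but what does E4B mean? Put simply, it is a technique optimized specifically for mobile: Per-Layer Embeddings. It does not save memory by itself, so Gemma-4-E2B does not need only 2B parameters' worth of memory; it still needs memory for the full 5.1B parameters, but its compute is only about that of a 2B model! (Roughly, part of the matrix computation is turned into table lookups, trading memory for compute, and those tables of course take memory.)
Good, the prerequisites are done! Now straight to model selection:
For a local "lobster" (龙虾) agent, pick Gemma-4-26B-A4B first! It is a MoE with 4B active parameters, and its prefill speed is also quite good, which especially suits a "lobster"-style setup with an extremely bloated system prompt.
For writing code or scripts, or work that demands precision, pick Gemma-4-31B. Choosing this one means you want the best results; if you really can't run it, try 5-bit quantization. For reference, an Apple M2 Ultra running it at 8-bit has a theoretical speed of only about 25 token/s.
"I want a local voice assistant!" Pick Gemma-4-E4B. It accepts all input modalities; write some code to hook it up to a camera with a microphone, and the rest is up to your imagination. And with 4B active, it runs even on a CPU.
"I just want to try it out on my Raspberry Pi." Pick Gemma-4-E2B. You will experience the extreme in local model speed. As for quality, it will be a bit better than an electronic parrot; it can do filtering jobs like "check whether this text contains any English", and since it also accepts all input modalities you can try voice input too.
#Gemma4 #google #GoogleGemma #本地大模型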

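Summary: Google's Gemma4 family of open-weight models comes in several versions, and the choice depends on the use case. The "-it" suffix marks instruction-tuned versions that work out of the box; versions without it are base models for your own fine-tuning. A4B means 4B active parameters, while E4B uses Per-Layer Embeddings, trading memory for compute to improve on-device performance. Recommendations: 26B-A4B for a balance of quality and speed; 31B for the best coding or task results; E4B for building local all-modality applications; E2B for trying things out on resource-constrained devices, with limited output quality.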
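The "about 25 token/s" figure for the 31B model at 8-bit can be roughly reproduced with a bandwidth-bound estimate: if decoding has to read every weight once per token, the upper bound on tokens per second is memory bandwidth divided by the bytes of weights. The sketch below rests on assumptions that are not in the article: an M2 Ultra memory bandwidth of 800 GB/s, 1 GB = 1e9 bytes, and no allowance for KV cache or other overhead. The same helper shows why E2B still needs memory for all 5.1B parameters.

```python
# Bandwidth-bound decode estimate. Assumptions (not from the article):
# 800 GB/s memory bandwidth for the M2 Ultra, 1 GB = 1e9 bytes, and the
# weights read per token dominate (KV cache and overhead are ignored).

def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB."""
    return params_billion * bits_per_weight / 8


def max_decode_tokens_per_s(bandwidth_gb_s: float, gb_read_per_token: float) -> float:
    """Upper bound on decode speed when every token re-reads the weights."""
    return bandwidth_gb_s / gb_read_per_token


M2_ULTRA_BANDWIDTH_GB_S = 800.0  # assumed hardware figure

dense_31b = weight_gb(31, 8)  # dense model: all 31B weights read per token
print(f"{max_decode_tokens_per_s(M2_ULTRA_BANDWIDTH_GB_S, dense_31b):.1f} token/s")
# prints: 25.8 token/s

# E2B computes like a ~2B model but still stores all 5.1B parameters.
print(f"{weight_gb(5.1, 8):.1f} GB at 8-bit")  # prints: 5.1 GB at 8-bit
```

By the same reasoning, a MoE such as 26B-A4B has to hold all 26B parameters in memory but reads only about 4B of them per token, which is why its decode speed can be much higher than a dense model of similar size.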

Sundar Pichai@sundarpichai · Apr 3

Gemma 4 is here, and it’s packing an incredible amount of intelligence per parameter 👇

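Summary: The Gemma 4 open models are released in four sizes: 31B dense, 26B MoE, and effective 2B/4B, aimed at raw performance, low latency, and edge devices respectively. Google DeepMind calls them the best open models for their size and stresses their very high intelligence per parameter.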

Demis Hassabis@demishassabis · Apr 3

Excited to launch Gemma 4: the best open models in the world for their respective sizes. Available in 4 sizes that can be fine-tuned for your specific task: 31B dense for great raw performance, 26B MoE for low latency, and effective 2B & 4B for edge device use - happy building!

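Summary: The Gemma 4 open models are released in 4 sizes: 31B dense for raw performance, 26B MoE for low latency, and effective 2B and 4B for edge devices, all of which can be fine-tuned for specific tasks.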

Google DeepMind@GoogleDeepMind · Apr 3

Meet Gemma 4: our new family of open models you can run on your own hardware. Built for advanced reasoning and agentic workflows, we’re releasing them under an Apache 2.0 license. Here’s what’s new 🧵

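Summary: Google released the Gemma 4 family of open models under an Apache 2.0 license. They can run on local hardware and are built for advanced reasoning and agentic workflows.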

Deedy@deedydas · Mar 24

Siri has been broken for 13 years so I built my own. Completely on-device. No internet needed. Controls my Mac, sets reminders, fetches live data, answers questions. Built in a weekend. This is the future of software.

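Summary: Frustrated with Siri's long-running poor experience, the author spent a weekend building a fully on-device voice assistant that needs no internet connection and can control a Mac, set reminders, fetch live data, and answer questions, and sees this as the future direction of software.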

Google DeepMind@GoogleDeepMind · Mar 2

Nano Banana 2 makes sophisticated visual creation faster, cheaper, and accessible to everyone. 🍌 Tap on each photo to see the details 👀


Google DeepMind@GoogleDeepMind · Feb 27

We’re launching Nano Banana 2, built on the latest Gemini Flash model. 🍌 It’s state-of-the-art for creating and editing images, combining Pro-level capabilities with lightning-fast speed. 🧵


Jim Fan@DrJimFan · Aug 7

This may be a testament to the “Reasoning Core Hypothesis” - reasoning itself only needs a minimal level of linguistic competency, instead of giant knowledge bases in 100Bs of MoE parameters. It also plays well with Andrej’s LLM OS - a processor that’s as lightweight and fast as possible, and maximally relies on knowledge lookup, tool use, agentic flow, etc. Now I’m curious - what’s the absolute smallest model we can squeeze that still functions as a competent LLM OS Kernel?

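Summary: Qwen released the 4B-parameter models Qwen3-4B-Instruct-2507 and Thinking-2507, with 256K context, in separate instruct and reasoning versions. The author suggests this supports the "Reasoning Core Hypothesis": reasoning needs only basic linguistic competency rather than a knowledge base spread over hundreds of billions of parameters, which fits the lightweight LLM OS idea of minimizing model size and relying as much as possible on tool use and knowledge lookup.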

DeepSeek@deepseek_ai · Dec 13

🎉 DeepSeek-VL2 is here! Our next-gen vision-language model enters the MoE era. 🤖 DeepSeek-MoE arch + dynamic image tiling ⚡ 3B/16B/27B sizes for flexible use 🏆 Outstanding performance across all benchmarks 🧵 1/n
