There is a single thin-film material required by every AI chip on earth. GPUs, TPUs, custom ASICs. All of them. 98% of global supply controlled by one Japanese chemical company. Zero production-ready alternatives. One producer fully booked through 2027. Raising prices. Lead times past 6 months. NVIDIA is so scared they're paying half the capex to expand supplier fabs themselves. The keyword is “umami”. Nobody's talking about this. They will be in about

AK @_akhaliq · Apr 29 · 58

Apple presents "Stochastic KV Routing: Enabling Adaptive Depth-Wise Cache Sharing". Paper: https://huggingface.co/papers/2604.22782

OpenRouter @OpenRouter · Apr 28 · 61

We studied data across the market for Opus 4.7 and found that costs increased 12–27%, with the exception of short prompts, which actually got more cost efficient. Full post: https://openrouter.ai/announcements/opus-47-tokenizer-analysis

Ant Ling @AntLingAGI · Apr 28 · 62

🥳 It has always been our pleasure to work with the SGLang team, as we all believe that fast and stable inference is the key to our valued users' experience. 🫡 Hope you all enjoy Ling-2.6-flash (aka Elephant-alpha) 🐘⚡️⚡️ Max it out~ max it out~~ 😝

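Summary: AntLingAGI and the SGLang team have launched Ling-2.6-flash (also known as Elephant-alpha), an instant-instruct model, with day-one support on SGLang. The model has 104B total parameters but only 7.4B active, and is optimized for low-latency agentic workflows that need immediate responses. It shows high token efficiency on coding, document processing, and agent tasks, using significantly fewer tokens. Despite the small active-parameter count, its quality is said to be on par with the current SOTA, combining speed and execution for production agent applications that need fast responses. The team stresses that fast, stable inference is key to user experience.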

TestingCatalog News 🗞 @testingcatalog · Apr 28 · 58

Plurai introduced vibe-training 👀 A new way to build real-time, tailored evals and guardrails for your agent, with high accuracy at a fraction of the LLM cost. > Goes from intent to a production-ready API endpoint in minutes > SLMs run at sub-100ms latency, over 8x cheaper than LLM-as-a-judge > 43% fewer failures reaching users vs frontier LLM judges

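Summary: Plurai introduced vibe-training, aimed at a common pain point: AI agents that perform well in demos but break when facing real users. Developers define their intent quickly with a prompt or a few examples; the method automatically generates a dataset of edge cases and trains a specialized model aligned with the specific use case. The claimed advantage is building production-ready, real-time evals and guardrails in minutes: small language models run at under 100 ms latency, more than 8x cheaper than LLM-as-a-judge, with 43% fewer failures reaching users than frontier LLM judges, i.e. better-than-frontier judging at a fraction of the cost.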

Chubby♨️ @kimmonismus · Apr 28 · 42

OpenAI and Anthropic are racing to build the smartest agents. Base44 is building the platform those agents actually need to run on. While the labs ship models, Base44 just shipped: one-click migration from 6 major platforms, schema reconstruction, custom UI generation, and agent workflows on top of your data. Different layer of the stack. Maybe the more important one. Pretty cool, ngl

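Summary: While companies such as OpenAI and Anthropic focus on the AI models themselves, Base44 is taking a different path: building the infrastructure platform those agents actually need to run on. Its newest feature is one-click migration, which lets users move projects from six major platforms (Salesforce, Shopify, WordPress, Lovable, Bolt, and Replit) to Base44. Migration covers not only data transfer but also schema reconstruction and custom UI generation, and users can build agent workflows on top of their own data. To celebrate the launch, users who migrate before midnight ET on May 5 get 25 free credits. The move underlines Base44's focus on what may be the more critical foundational layer of the AI stack.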

Mistral AI @MistralAI · Apr 28 · 57

🆕 Today, we're releasing the public preview of Workflows, the orchestration layer for enterprise AI. 🌎 Enterprise teams have capable models. What they don't have is a way to run them reliably in production. That's the gap Workflows fills. It takes AI-powered business processes from prototype to production, with the durability, observability, and fault tolerance that production actually requires. Leading organisations like ASML, ABANCA, CMA-CGM, France Travail, La Banque Postale, Moeve, and many others are already using Workflows to automate critical processes.

Alibaba Cloud @alibaba_cloud · Apr 28 · 34

@AkoolInc x Alibaba Cloud: 60% less flickering in AI video! ⚡️ ● Integrates Wan, Qwen-Image & Qwen-VL ● 40-60% cost reduction via flexible deployment ● 180+ languages translated in minutes Commercial AI video at scale. Read the success story: https://int.alibabacloud.com/m/1000412139/

Alibaba Cloud @alibaba_cloud · Apr 28 · 39

🚀 Alibaba Cloud AI Gateway now supports DeepSeek V4! ☁️ Plug-and-play via OpenAI/Anthropic-compatible APIs ☁️ Smart routing + automatic fallback (e.g., Qwen) ☁️ Full support for 1M-context, Tool Calls, and thinking mode ☁️ Unified management for security, quotas, and observability 🧠 Deploy DeepSeek V4 in production—securely, reliably, and at scale! 🔗 Learn more: https://int.alibabacloud.com/m/1000412507/
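The "smart routing + automatic fallback" bullet describes a common gateway pattern: send the request to the primary model and, if that call fails, retry it against a backup. Below is a minimal sketch of that control flow only. The model names `deepseek-v4` and `qwen-fallback` and the `fake_call` backend are hypothetical stand-ins for a real OpenAI-compatible client, and nothing here reflects how Alibaba Cloud AI Gateway actually implements routing.

```python
from typing import Callable, Sequence, Tuple


class ModelError(Exception):
    """A failed backend call (timeout, 5xx, rate limit, ...)."""


def complete_with_fallback(
    prompt: str,
    models: Sequence[str],
    call_model: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Try each model in priority order; return (model_used, reply) from the first success."""
    errors = []
    for model in models:
        try:
            return model, call_model(model, prompt)
        except ModelError as exc:
            errors.append(f"{model}: {exc}")  # record why this model was skipped
    raise ModelError("all models failed: " + "; ".join(errors))


# Hypothetical backend: the primary model is "down", the fallback answers.
def fake_call(model: str, prompt: str) -> str:
    if model == "deepseek-v4":
        raise ModelError("503 upstream unavailable")
    return f"[{model}] echo: {prompt}"


print(complete_with_fallback("hello", ["deepseek-v4", "qwen-fallback"], fake_call))
```

A production gateway would also decide which errors are worth retrying and add timeouts and health checks; this sketch treats every `ModelError` as a reason to move on to the next model.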

Alibaba Cloud @alibaba_cloud · Apr 28 · 33

Your media library should be a valuable asset, not a liability. Alibaba Cloud’s Media AI solution provides a unified AI platform that understands, organizes, and accelerates your entire media workflow by automatically tagging and summarizing video content, moderating content at the frame level, and enabling AI search across multimodal content. So your content finally starts working for you. 🔗 https://int.alibabacloud.com/m/1000412499/

Alibaba Cloud @alibaba_cloud · Apr 28 · 36

🚀 Alibaba Cloud Releases DDoS Security Operations Agent (Anti-DDoS SecOps Agent). Powered by LLMs, this cloud-native security agent supports natural language interaction and automates the generation of protection policies. Learn more: https://int.alibabacloud.com/m/1000412296/

SiliconFlow @SiliconFlowAI · Apr 28 · 43

Builders are voting with their tokens 🔥 SiliconFlow is now the #1 third-party model provider by daily token usage on @OpenRouter: • ~280B tokens/day • ~1.9T tokens/month • 33 frontier models: DeepSeek V4 series, GLM 5.1, Kimi K2.6, etc. Big thanks to every dev building with us. And more is coming 🚀

Alibaba Cloud @alibaba_cloud · Apr 28 · 35

🚀 Claw Talks Ep2 | Bring Claw to Work with QoderWork & Quick BI ⏰ Apr 29, 2026 | 5:00 PM (UTC+8) 👉 Live: https://youtu.be/cK3qfRTjgWE See how QoderWork makes AI a true work partner—enabling secure desktop automation and seamless Quick BI integration for analytics, reporting, content creation, and workflows. 📌 Join live and see the future of enterprise productivity! #AlibabaCloud #ClawTalks #QoderWork #QuickBI #EnterpriseAI

Baidu Inc. @Baidu_Inc · Apr 28 · 49

GenFlow 4.0 is live, and it's already serving 100M+ monthly active users with 200M tasks completed each month! 🚀 Jointly released by Baidu Wenku and Baidu Drive, GenFlow 4.0 is a major upgrade to our general AI Agent, with a fully revamped Office Agent at its core. Users can now invoke PowerPoint, Excel, and Word Agents in parallel from a single prompt. GenFlow 4.0 is also deeply integrated with OpenClaw, deployable in one click from the Baidu Drive PC or mobile app, turning Baidu Drive into a personal AI workspace. More to come at Baidu Create 2026 in Beijing May 13-14, where we'll explore this year's theme: "Agents at Scale."

SemiAnalysis @SemiAnalysis_ · Apr 28 · 57

8x VLLM CUDA MOAT ALERT: InferenceX has added @deepseek_ai V4 Pro for @vllm_project for day 3 performance across B200, B300, H200, GB200 disagg. We are seeing that B300 is up to 8x faster than H200. The team is working on benchmarking vLLM 0.20 which has the new DeepGEMM MegaMoE which fuses EP dispatch/EP combine/GEMMs & SwiGLU activations into a single mega-kernel, we believe that the perf will be even better. Thank you to vLLM maintainers from @NVIDIAAI & @rogerw0108 & team from @interact for their passion for open source & burning the midnight oil over the weekend!
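For readers who want to see what MegaMoE is fusing, here is an unfused reference for one MoE layer on a single token: route the token to its top-k experts (the step EP dispatch ships across GPUs), run each selected expert's SwiGLU feed-forward (the GEMMs plus activation), and take the gate-weighted sum (the step EP combine gathers back). This is a minimal pure-Python sketch with tiny random weights, not DeepGEMM's kernel and not DeepSeek V4's exact router; the softmax over top-k scores is an assumption, since routers differ in how they normalize gate weights.

```python
import math
import random


def matvec(w, x):
    # w: rows x cols weight matrix (list of lists); x: vector of length cols.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def silu(v):
    return v / (1.0 + math.exp(-v))


def swiglu_expert(x, w_gate, w_up, w_down):
    # SwiGLU FFN: down(silu(gate(x)) * up(x)), i.e. three GEMMs and one activation.
    g = matvec(w_gate, x)
    u = matvec(w_up, x)
    h = [silu(gi) * ui for gi, ui in zip(g, u)]
    return matvec(w_down, h)


def softmax(z):
    m = max(z)
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]


def moe_layer(x, router, experts, top_k=2):
    # 1) Routing: score every expert, keep the top-k ("dispatch" in expert parallelism).
    scores = matvec(router, x)
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    weights = softmax([scores[i] for i in top])  # assumed gate normalization
    # 2) Expert GEMMs + SwiGLU, then 3) gate-weighted sum ("combine").
    out = [0.0] * len(x)
    for w, i in zip(weights, top):
        y = swiglu_expert(x, *experts[i])
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, top


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_model, d_ff, n_experts = 4, 8, 4
router = rand_matrix(n_experts, d_model, rng)
experts = [
    (rand_matrix(d_ff, d_model, rng), rand_matrix(d_ff, d_model, rng), rand_matrix(d_model, d_ff, rng))
    for _ in range(n_experts)
]
y, chosen = moe_layer([1.0, -0.5, 0.25, 0.0], router, experts)
print(len(y), chosen)
```

A fused kernel produces the same numbers while keeping the intermediate `g`, `u`, and `h` vectors on-chip and overlapping the cross-GPU dispatch/combine traffic with the expert GEMMs.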

TestingCatalog News 🗞 @testingcatalog · Apr 28 · 75

OpenAI models are coming to Amazon Bedrock in the coming weeks. Besides that, Amazon is hosting a livestream on April 28 with OpenAI leaders. > Watch Matt Garman, Colleen Aubrey, Julia White, and OpenAI leaders in a candid conversation about what's next with agentic AI.

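Summary: OpenAI models will be available to customers through Amazon Bedrock in the coming weeks, giving developers more model choice to match different tasks. Amazon is also hosting a livestream on April 28 in which AWS executives Matt Garman, Colleen Aubrey, and Julia White will talk with OpenAI leaders about where agentic AI is heading. Further details are to be announced at an AWS event in San Francisco.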

Berryxia.AI @berryxia · Apr 28 · 54

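MiniMax's Music-2.6 is free to use on Cloudflare this week! Generate full-length songs or instrumental pieces from a text prompt, with optional auto-generated lyrics. Go try it!!!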

TestingCatalog News 🗞 @testingcatalog · Apr 28 · 55

Meta partners with Overview Energy to bring up to 1 GW of space solar energy to Earth! Meta also partners with Noon Energy to deploy up to 1 GW/100 GWh of energy storage. It is Meta vs SpacexAI now 👀

[Quoting @Meta_Engineers]: These partnerships with Overview and Noon continue our consistent approach of strengthening the grid with diverse, reliable solutions while powering our AI infrastructure. Learn more: https://go.meta.me/635755

SemiAnalysis @SemiAnalysis_ · Apr 28 · 44

While Intel gains external customers using EMIB – Google's TPU being the big one – they're moving away from it for their own products. Diamond Rapids will likely use UCIe over substrate for a long-reach die-to-die interconnect instead. At ISSCC, Intel showed a UCIe-S D2D link on 22nm hitting 48 Gb/s/lane over standard organic substrate at a reach of up to 30 mm. It beat a 3nm design with 3× higher data rate and 2.8× higher bandwidth density, on a 5-2-5 substrate vs 11-2-11 on EMIB. With substrate in short supply, EMIB is Intel's "best" packaging tech – for everyone but Intel.

Rohan Paul @rohanpaul_ai · Apr 28 · 56

Optimizing RAG for precision can quietly hurt retrieval accuracy by 40%, putting agentic pipelines at risk. Redis says in new research that enterprise teams fine-tuning RAG embedding models for improved precision may be unknowingly reducing the retrieval quality those pipelines need. Training embeddings to notice meaning-level edits can damage the retrieval they were built for. The paper argues that a single embedding cannot do broad search and exact meaning checks at the same time. The reason is simple. A dense retriever squeezes an entire sentence into one vector, then asks cosine similarity to decide both topical relevance and exact meaning. That works well when the job is broad recall. It works much less well when the difference is structural, like "the dog bit the man" versus "the man bit the dog," or a negation that reverses the claim. Here's the deeper point. When you force one embedding to separate those near-misses, you spend representational space that was previously helping the model group related material across domains. The paper shows that this extra sensitivity is uneven. Negation and spatial flips improve, but binding errors remain stubborn, which is precisely the kind of mistake that matters in contracts, compliance, and other role-sensitive work. So the fix is not to keep squeezing harder on the same vector. The better design is two-stage retrieval: use embeddings for fast recall, then verify the shortlisted results with token-level comparisons that can actually see structure. That is also why MaxSim helps relevance but still misses identity-level errors, while a small Transformer over token similarity maps does better at rejecting near-misses. The real lesson is not that RAG fails. It is that "almost the same sentence" is not the same thing as "the same meaning," and systems that blur those two will fail most confidently where precision matters most. ---- Paper link: arxiv.org/abs/2604.16351 Paper title: "Training for Compositional Sensitivity Reduces Dense Retrieval Generalization"
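The two-stage design can be sketched in a few lines: stage 1 ranks the corpus by cosine similarity of one pooled vector per text (cheap recall), and stage 2 re-scores only the shortlist with token-level MaxSim (ColBERT-style late interaction). This is a toy illustration, not the paper's code: the hand-written 4-dimensional token vectors stand in for a real encoder, and mean pooling is an assumption about how the single vector is formed.

```python
import math
from typing import Dict, List

Vec = List[float]


def cosine(a: Vec, b: Vec) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mean_pool(tokens: List[Vec]) -> Vec:
    # Stage-1 representation: squeeze the whole text into a single vector.
    return [sum(col) / len(tokens) for col in zip(*tokens)]


def maxsim(query: List[Vec], doc: List[Vec]) -> float:
    # For each query token, take its best-matching doc token; sum over query tokens.
    return sum(max(cosine(q, d) for d in doc) for q in query)


def two_stage(query: List[Vec], corpus: Dict[str, List[Vec]], shortlist: int = 2) -> List[str]:
    q_vec = mean_pool(query)
    # Stage 1: fast recall over the whole corpus with pooled vectors.
    recalled = sorted(corpus, key=lambda name: cosine(q_vec, mean_pool(corpus[name])), reverse=True)[:shortlist]
    # Stage 2: token-level re-scoring of the shortlist only.
    return sorted(recalled, key=lambda name: maxsim(query, corpus[name]), reverse=True)


# Toy token embeddings (hypothetical, hand-written).
DOG, BIT, MAN = [1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]
SUN, WEATHER = [0.0, 0.2, 0.0, 1.0], [0.0, 0.0, 0.0, 1.0]
CORPUS = {
    "dog bit man": [DOG, BIT, MAN],
    "man bit dog": [MAN, BIT, DOG],
    "sunny weather": [SUN, WEATHER],
}
QUERY = [DOG, BIT, MAN]

print(two_stage(QUERY, CORPUS))
print(maxsim(QUERY, CORPUS["dog bit man"]), maxsim(QUERY, CORPUS["man bit dog"]))
```

Both stages drop the off-topic document, but the role-swapped sentences tie at every step. That is the post's point about MaxSim: catching "the man bit the dog" needs a verifier that sees token order or roles, such as the small Transformer over token-similarity maps the paper describes.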

宝玉 @dotey · Apr 28 · 74

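GitHub Copilot switches to usage-based billing on June 1. Subscription prices are unchanged, but "pay for what you use" will make heavy users' bills less predictable. Over the past year, Copilot has gone from an in-editor completion assistant to an agentic coding platform that can run multi-step tasks across an entire repository. A simple chat question and an hours-long autonomous coding task could previously consume the same number of "premium requests", and GitHub was quietly absorbing the soaring inference costs behind the scenes. Now it can't keep that up. The core of the new rules: "premium requests" are replaced by AI Credits. Credits are charged by token consumption, including input, output, and cached tokens, at rates tied to each model's API pricing. Plan prices stay the same, and each month you automatically receive credits equal to your subscription price: Pro is $10 for $10 of credits, Pro+ is $39 for $39, Business is $19 per user, and Enterprise is $39 per user. Basic features such as code completion and next edit suggestions don't consume credits and remain included in the subscription, as before. Two details are worth noting. First, you used to be able to fall back to a cheaper model and keep working after your premium requests ran out; that fallback is gone, and once your credits are spent, they're spent, unless you buy more or an admin has set a budget. Second, Copilot's code review consumes GitHub Actions minutes in addition to AI Credits. Enterprise customers get a three-month transition buffer: from June through August, Business users get $30 of credits per month ($11 more than the subscription price) and Enterprise users get $70 ($31 more). Organizations can also pool credits across team members so that unused credits aren't wasted. In early May, GitHub will launch a billing preview so you can see roughly what you would spend under the new rules before the switch. Annual subscribers are unaffected for now and move to the new system only when their plan expires. For light users, the change will be barely noticeable. But if you've gotten used to letting Copilot Agent run long tasks, keep an eye on your bill after June.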
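To see why long agent runs are the ones to watch, here is a minimal sketch of token-based credit accounting. Only the structure comes from the post (input, output, and cached tokens priced at each model's API rates, and a monthly allowance equal to the subscription price); the per-million-token rates, the token counts, the 1 credit = $1 convention, and treating cached tokens as separate from input tokens are all assumptions for illustration, not GitHub's actual rates or formula.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class ModelRates:
    """Hypothetical USD prices per 1M tokens; real rates follow each model's API pricing."""
    input_per_m: float
    cached_input_per_m: float
    output_per_m: float


def credits_used(rates: ModelRates, input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Credits for one request, assuming 1 credit = $1 and cached tokens counted separately."""
    return (
        input_tokens * rates.input_per_m
        + cached_tokens * rates.cached_input_per_m
        + output_tokens * rates.output_per_m
    ) / 1_000_000


def remaining_credits(allowance: float, requests: Iterable[Tuple[int, int, int]], rates: ModelRates) -> float:
    """Monthly allowance minus spend; a negative result means the credits ran out."""
    return allowance - sum(credits_used(rates, *req) for req in requests)


rates = ModelRates(input_per_m=3.0, cached_input_per_m=0.3, output_per_m=15.0)
chat_question = (2_000, 0, 500)               # short chat turn: (input, cached, output)
agent_run = (4_000_000, 20_000_000, 300_000)  # hypothetical hours-long agentic task
print(f"chat: ${credits_used(rates, *chat_question):.4f}")
print(f"agent run: ${credits_used(rates, *agent_run):.2f}")
print(f"Pro allowance left: ${remaining_credits(10.0, [chat_question, agent_run], rates):.2f}")
```

With these made-up numbers the chat question costs about a cent, while the single agent run comes to $22.50 and overshoots the $10 Pro allowance on its own, which is exactly where losing the cheaper-model fallback bites.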

François Chollet @fchollet · Apr 28 · 60

Keras Kinetic has a new alpha release: v0.0.2! Including a new docs website: http://kinetic.readthedocs.io Kinetic is my favorite new release from the Keras team: a super simple Modal-like API to run training jobs on TPU.

Google AI Developers @googleaidevs · Apr 28 · 52

Zoom in on how @GoogleGemma 4 is optimized to handle high-concurrency serving for complex tasks (such as generating SVGs) — on a single GPU. ✓ 10+ sessions are sent to the 26B A4B model ✓ The system routes, accelerates, and processes those workloads — without bottlenecking ✓ A live dashboard visually tracks the load balancing in real time, displaying active slots, context sizes, and token generation speeds Watch the demo to see it in action ⬇️

SemiAnalysis @SemiAnalysis_ · Apr 28 · 38

Great block diagram, @GoogleCloudTech. There is an error here. The HBM3E is 12-hi not 8-hi as per the diagram. At 6 stacks of HBM capacity for TPU 8t, it must be 12-hi to achieve the 216GB of HBM capacity quoted.
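The correction is simple arithmetic. Assuming 24 Gb (3 GB) HBM3E DRAM dies, the density used in 8-hi 24 GB and 12-hi 36 GB HBM3E stacks (an assumption, since the tweet does not state die density), six 8-hi stacks give 6 × 8 × 3 GB = 144 GB, and only 12-hi stacks reach the quoted 6 × 12 × 3 GB = 216 GB:

```python
GB_PER_DIE = 3  # assumption: 24 Gb (3 GB) HBM3E DRAM dies


def hbm_capacity_gb(stacks: int, dies_per_stack: int, gb_per_die: int = GB_PER_DIE) -> int:
    """Total HBM capacity = stacks x dies per stack (stack height) x capacity per die."""
    return stacks * dies_per_stack * gb_per_die


print(hbm_capacity_gb(6, 8))   # 8-hi, as drawn in the diagram
print(hbm_capacity_gb(6, 12))  # 12-hi, matching the quoted 216 GB
```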

elvis @omarsar0 · Apr 28 · 69

How do AI agents spend your money:

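Summary: A systematic study of token costs for AI agents on coding tasks finds that agents can consume roughly 1000x more tokens than chat or code reasoning, and that the same task can vary by up to 30x in consumption across runs. Higher token spend does not directly buy higher accuracy: performance peaks at moderate cost and then saturates. Models also struggle to predict their own token usage, with self-prediction correlation topping out at 0.39. Different models can spend an extra 1.5 million tokens on the same task with no gain in quality. In short, agent runtime cost is high-variance, only weakly tied to quality, and unpredictable even to the model itself, which affects teams' budget planning, routing between models, and decisions about when to kill a run.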

凡人小北 @frxiaobei · Apr 28 · 34

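OpenClaw ships releases so often that at first I assumed they had some secret automated-testing magic, until I did two upgrades recently. 😤 Sure, the AI era is all about speed, but let's show the testing phase a little respect.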

SemiAnalysis @SemiAnalysis_ · Apr 27 · 50

PALISADES TAHOE, APRIL 26, 2026 — InferenceX has added DeepSeekv4 MTP support with chat template for @sgl_project's B300! Great Work to @radixark @liin1211 for the engineering! Massive interactivity gains, and 7x throughput at iso-interactivity!

Jeff Dean @JeffDean · Apr 27 · 56

The video of my conversation with Amin Vahdat, @gilbert, and @djrosent at Cloud Next last week is now up. https://youtu.be/BpnJYJmbXcM?si=vUY3hI_aDX8K6gco Thanks for a great conversation!

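Summary: The video records a conversation at Cloud Next with Amin Vahdat and the AcquiredFM hosts, centered on Google's newly announced TPU v8t and v8i chips. Drawing on the chip details published in the official blog, it discusses their significance as infrastructure for the "agentic era" and their technical highlights.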

Chubby♨️ @kimmonismus · Apr 27 · 68

Google just broke a decade-long tradition. At Cloud Next 2026, the company unveiled not one, but two new AI chips, the TPU 8t for training and TPU 8i for inference. For the first time ever, Google is splitting its custom silicon into specialized architectures instead of relying on a one-size-fits-all design. The TPU 8t superpod packs 9,600 liquid-cooled chips delivering 121 FP4 ExaFlops of peak compute, roughly a 3x leap over the previous generation. The TPU 8i delivers 80% better performance-per-dollar than its predecessor, with triple the on-chip memory and a new Boardfly topology that cuts network latency in half. The important aspect: Anthropic, Meta, and now OpenAI are buying multi-gigawatt allocations of TPU capacity. OpenAI booking Google silicon is a first visible crack in NVIDIA's grip on frontier AI training. Broadcom co-designed the TPU 8t, while MediaTek handles the TPU 8i, both fabbed by TSMC. NVIDIA still holds 81% of the AI chip market, but the era of serious competition has officially begun.

Chubby♨️ @kimmonismus · Apr 27 · 63

Google's TPU v8 and Huawei's Ascend NPU platform: the global chip war just began. At Cloud Next 2026, Google unveiled its eighth-generation TPU as two separate chips for the first time: the TPU 8t for training and the TPU 8i for inference, claiming up to 2.8x faster training and 80% higher performance per dollar for inference compared to last year's Ironwood. The 8t was designed by Broadcom, the 8i by MediaTek, applying mobile-edge efficiency logic to inference while maximizing raw throughput on training. The 8t connects up to 9,600 accelerators via optical-circuit switches, dwarfing NVIDIA's 576-GPU NVLink domain, and a new Virgo network fabric scales beyond one million chips for a single training job. Google is also replacing x86 hosts with its own Arm-based Axion CPUs, completing full vertical control from host to accelerator to network. The message is clear: the general-purpose AI accelerator is a fading category. DeepSeek V4 on Huawei Ascend: China's parallel infrastructure takes shape. DeepSeek's V4 release is the more geopolitically consequential event. The 1.6 trillion-parameter V4-Pro is the first major frontier model to validate both training and inference on Huawei's Ascend NPU platform alongside NVIDIA GPUs. The nuance: DeepSeek adapted only part of V4's training for Chinese chips and confirmed Ascend for inference, while pre-training of V4-Pro likely still relied on NVIDIA silicon. Is this a first? Yes. No frontier-class model has ever publicly validated on non-NVIDIA hardware at this scale. More importantly, DeepSeek is tying future pricing to Huawei's Ascend 950 production ramp in H2 2026, making this an economic bet, not a symbolic gesture. V4-Pro costs $3.48 per million output tokens versus $30 for GPT-5.4 and $25 for Claude Opus 4.6. The real story isn't whether V4 beats Western models on benchmarks (it doesn't quite), but whether the hardware decoupling that U.S. sanctions were designed to prevent is now irreversibly underway.

阿绎 AYi @AYi_AInotes · Apr 27 · 54

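This app a Japanese guy built is so cool: your own chubby cat pops up and forces you to take a break. Let me look into it so I can set it up for all of my cats, hahaha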

Peter Steinberger 🦞 @steipete · Apr 27 · 34

Been so CPU-constrained on OpenClaw work. Switched local tests running to @useblacksmith and IT IS SO GOOD. codex can literally spin up to 32vCPU instances and rip through our test suite. https://docs.blacksmith.sh/blacksmith-testbox/overview

向阳乔木 @vista8 · Apr 27 · 48

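I tried several chatbot clients and found that none of them support uploading audio or video into a conversation, which makes testing omni-modal models a real pain. So I found an open-source Chatbot UI and had Codex turn it into a usable product. This open-source UI is interesting: it builds several different chatbot interfaces modeled on ChatGPT, Grok, Gemini, and Perplexity. It already has close to 10k stars; link in the replies.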

阿绎 AYi @AYi_AInotes · Apr 27 · 57

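A painful truth: 90% of AI engineers haven't actually shipped anything. At an NYU event, Cluely CEO Roy Lee pulled out $500 in cash on the spot and asked all the AI students and engineers in the room who had actually launched a public project on LinkedIn. Almost nobody raised a hand. That's the AI scene right now: everyone can talk to you about LLMs, agents, and world models, they've skimmed hundreds of papers and tinkered with dozens of demos, but ask whether they've shipped something other people can use and most of them go quiet. We assume the AI era is won by whoever knows the most or has the deepest technical skills, but that's not it at all. An LLM can write 80% of your code and solve most of your technical problems, but the remaining 20% of grunt work (deployment, edge cases, user experience, cost control) is what actually sets you apart. So stop being the engineer who only watches tutorials. Go build, go practice, go solve real problems: an offline small-model app, a self-improving coding agent, a personal life OS, anything. Don't wait until you've learned everything, don't wait for perfect. Start this weekend and launch publicly next week. Even if it's rough, even if only a handful of people use it, it beats the hundred demos hidden on your computer by a mile. In the AI era, knowledge has become the cheapest thing there is; tutorials and papers are everywhere. What's truly scarce is the execution to turn knowledge into a public, verifiable product. Don't be the one sitting in that NYU classroom who couldn't even claim the $500. Get moving, folks.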

Rohan Paul @rohanpaul_ai · Apr 27 · 60

AI at scale is constrained by physical inputs, and China has way more slack in electricity plus dominant control over several minerals and magnet supply chains that data centres and chips depend on. --- ft.com/content/d9af562c-1d37-41b7-9aa7-a838dce3f571

elvis @omarsar0 · Apr 27 · 64

NEW paper from Alibaba. A 30B MoE with only 3B active params matches Qwen3-235B on real tool-use workloads. AgenticQwen-30B-A3B: 50.2 average on TAU-2 + BFCL-V4 Multi-Turn. AgenticQwen-8B: 47.4. Both more than double their vanilla Qwen baselines and close most of the gap to a 235B model. How: two RL flywheels run in parallel. - The reasoning loop mines the model's own errors into harder problems each round. - The agentic loop grows simple linear tool-use trajectories into multi-branch behavior trees. - Simulated users actively try to mislead the agent. The training distribution gets harder on its own. Why it matters for agent devs: you can stop paying frontier prices for routine tool-use workloads. And the flywheel recipe is reusable. Generate your hard examples from your own agent's failures, not from static synthetic data. Paper: https://arxiv.org/abs/2604.21590 Learn to build effective AI agents in our academy: https://academy.dair.ai/

DeepSeek @deepseek_ai · Apr 27 · 62

🔥DeepSeek Input Cache Price Drop! Effective immediately, the price for input cache hits across the ENTIRE DeepSeek API series is reduced to just 1/10th of the original price! Build more efficiently for less. 📌Reminder: The DeepSeek-V4-Pro 75% OFF promotion is still active until May 5th, 2026, 15:59 (UTC Time).
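How much a 10x cut on cache-hit input pricing is worth depends on how often requests hit the cache. The sketch below computes a blended input price from the hit rate. The USD-per-million-token prices are made up (DeepSeek's actual price list is not quoted in the post); only the 1/10 ratio comes from the announcement.

```python
def blended_input_price(miss_price: float, hit_price: float, hit_rate: float) -> float:
    """Average price per 1M input tokens, given the fraction of tokens that hit the cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_price + (1.0 - hit_rate) * miss_price


MISS_PRICE = 1.00                   # hypothetical USD per 1M cache-miss input tokens
OLD_HIT_PRICE = 0.10                # hypothetical USD per 1M cache-hit input tokens before the cut
NEW_HIT_PRICE = OLD_HIT_PRICE / 10  # the announced cut: 1/10 of the original

old_price = blended_input_price(MISS_PRICE, OLD_HIT_PRICE, 0.8)
new_price = blended_input_price(MISS_PRICE, NEW_HIT_PRICE, 0.8)
print(f"old={old_price:.3f} new={new_price:.3f} saving={1 - new_price / old_price:.1%}")
```

With these illustrative numbers and an 80% hit rate, the blended input price falls by roughly a quarter; agent workloads that resend long, stable prefixes (high hit rates) benefit the most.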

meng shao @shao__meng · Apr 26 · 64

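Claude Platform on AWS is coming soon, and it is completely different from the earlier Claude on Amazon Bedrock: Claude Platform on AWS lets developers use Anthropic's native products directly within the AWS account system. Claude Platform on AWS offers: · the full Anthropic-native console and API experience · an AWS account (backed by Anthropic's native platform) · all Claude Platform features, with future features shipping in sync · Anthropic's native experience, with billing and authentication handled by AWS https://aws.amazon.com/claude-platform/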

meng shao @shao__meng · Apr 26 · 77

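[Paper share] A deep read of the leaked Claude Code source code, combined with Anthropic's official documentation and community analysis, reconstructs a complete architectural map of a production-grade coding agent, using the independent open-source system OpenClaw as a comparison group! Paper: https://arxiv.org/pdf/2604.14228
# The single most important number: 1.6% vs 98.4%
Community estimate: only about 1.6% of the entire Claude Code codebase is "AI decision logic" (prompts, model calls, the loop); the other 98.4% is deterministic runtime (permissions, context, tool routing, recovery). This lopsided ratio means:
· The model has almost complete autonomy over decisions (where to reason, which tools to call)
· But the model never touches the file system, shell, or network directly
· The engineering complexity exists not to constrain the model but to let it operate freely in a safe, well-provisioned environment
This is the opposite route from LangGraph (which constrains control flow with a state graph) and Devin (with an explicit planner): minimal scaffolding plus a maximal operational harness.
# Five human values that drive the team's design trade-offs
· Human decision authority: the user ultimately keeps control, formalized through a principal hierarchy (Anthropic → operators → users)
· Safety/privacy: the system must protect code, data, and infrastructure even when the user isn't paying attention
· Reliable execution: correct within a single turn, and consistent across context windows, sessions, and subagents
· Capability amplification: let users do things they would never have attempted before (Anthropic internal data: ~27% of tasks are ones that "wouldn't have been done without this tool")
· Contextual adaptation: the system adapts to the user's project, habits, and skills, and the relationship evolves over time
A sixth item is an evaluation lens rather than a design value: long-term preservation of human capability. This is the paper's most important critical observation, covered below.
# Thirteen design principles and the architectural skeleton
· Deny-first with human escalation (deny by default; escalate to a human when something isn't recognized)
· Graduated trust spectrum (trust is a gradual spectrum)
· Defense in depth (multiple independent safety layers)
· Externalized programmable policy (policy lives outside the code and is configurable)
· Context as scarce resource
· Append-only durable state
· Minimal scaffolding, maximal harness
· Values over rules (favor value judgments over hard rules)
· Composable multi-mechanism extensibility
· Reversibility-weighted risk (weight risk by how reversible an action is)
· Transparent file-based config/memory (transparent files rather than an opaque database)
· Isolated subagent boundaries
· Graceful recovery and resilience
The overall architecture can be read as two views:
· Seven-component view (high level): user → interface → agent loop → permission system → tools → state/persistence → execution environment
· Five-layer view (detailed): Surface layer (CLI/SDK/IDE) → Core layer (loop + compaction) → Safety/Action layer (permissions, hooks, tools, sandbox, subagents) → State layer (context assembly, sessions, CLAUDE.md) → Backend layer (shell, MCP, remote execution)
# The agent main loop: a plain while-true
queryLoop() is an async generator that runs the same 9 steps every turn: settings resolution → state initialization → context assembly → five pre-model shapers → model call → tool_use dispatch → permission gate → tool execution → stop check. What it deliberately doesn't do: no explicit planner, no state graph, no tree search. It is the most minimal implementation of ReAct.
Tool execution uses StreamingToolExecutor: while the model is still streaming tool_use blocks, read-only tools run in parallel and writes run serially. Results are filled back in the order the requests were received, so the model sees tool results in the same order in which it issued the requests.
There are five recovery mechanisms (output-token escalation, reactive compact, prompt-too-long handling, streaming fallback, fallback model), all following the rule "recover silently first; tell the human only if that fails."
# Security: seven layers of defense
Every tool call must pass through these seven layers, and any one of them can veto it:
1. Tool pre-filtering (globally denied tools never even appear in the model's view)
2. Deny-first rules (deny always overrides allow, even when the allow rule is more specific)
3. Permission mode constraints (seven modes: plan/default/acceptEdits/auto/dontAsk/bypassPermissions/bubble)
4. Auto-mode ML classifier (yoloClassifier.ts, a separate LLM call that judges safety)
5. Shell sandbox (file-system/network isolation independent of the permission system)
6. Resume does not restore session-level permissions (re-authorization is forced)
7. Hook interception (PreToolUse can block, rewrite, or approve asynchronously)
The key design philosophy: Anthropic's own research found that users approve 93% of permission prompts, which means interactive confirmation is behaviorally unreliable. So the architecture chooses not to rely on a human watching, and instead uses the sandbox plus the classifier to cut the number of decisions that need a human by 84%.
# Context management: five layers of progressive compaction
The model's context window is the bottleneck resource of the whole system. Five shapers run in order before every model call:
· Budget reduction (always on): any single tool result over its size limit is replaced with a reference
· Snip: deletes old segments of history
· Microcompact: cache-friendly, fine-grained compaction that waits for the API response and then uses the real cache_deleted_input_tokens
· Context collapse: a read-time projection; storage is untouched and the model sees a projected view (one of the most elegant designs in the paper)
· Auto-compact: a fallback generative summary by the full model
Why five layers instead of one: each layer has a different cost, so cheap, light compaction runs first and escalation happens only when needed. This is a lazy-degradation idea. The cost is that users find the system's behavior hard to predict, because some layers (especially context collapse) are invisible to them.
CLAUDE.md's four-level hierarchy (managed → user → project → local) is file-based memory that deliberately rejects a vector database, on the grounds that "users must be able to read it, edit it, and git commit it." The trade-off is that retrieval granularity stops at the file level (an LLM scans file headers and picks at most 5), which is coarser than vector retrieval.
An important insight: CLAUDE.md is injected as a "user message" rather than into the system prompt, so its constraints on the model are probabilistic. Real enforcement comes from the deny-first permission rules. This is a deliberate separation between a guidance layer (probabilistic) and an enforcement layer (deterministic).
# Extension mechanisms: four, not one
The paper answers a common question: why does Claude Code have MCP and also plugins, skills, and hooks? The answer is that the four carry different context costs:
· MCP servers: external service integration, high context overhead
· Plugins: multi-component packaged distribution, medium context overhead
· Skills: domain instructions + meta-tools, low context overhead
· Hooks: lifecycle interception, zero context overhead by default
Graduated context cost means cheap extensions (hooks) can be rolled out widely, while expensive ones (MCP) are reserved for cases that genuinely need new tools. The cost is that developers have to learn four APIs.
The hook system is extremely fine-grained: the source defines 27 event types, 5 of which take part in permission decisions and 22 of which are used for lifecycle/orchestration.
# Subagents: isolation rather than sharing
Subagents are dispatched via AgentTool (Task is its legacy alias) and have three isolation modes:
· Worktree: a temporary git worktree, file-system isolation
· Remote (internal only): runs Claude Code remotely
· In-process (default): shared file system, isolated context
Key constraint: a subagent returns only its final summary text to the parent, while the full transcript goes to a sidechain stored in a separate .jsonl file, which keeps it auditable without polluting the parent's context. The cost: almost every invocation needs a self-contained prompt (except fork-subagent). Anthropic itself has disclosed that the agent teams mode uses about 7× the tokens of a normal session, which is exactly why "return a summary" matters so much. Multi-agent coordination uses file locks rather than a message broker: zero dependencies and easy to debug, at the expense of throughput.
# Persistence: append-only JSONL
Sessions are stored as nearly append-only JSONL (apart from a few cleanup rewrites). There are three independent persistence channels:
1. Session transcript (project-level, one file per session)
2. Global prompt history (user input only; supports Up arrow and Ctrl+R)
3. Subagent sidechain (separate .jsonl + .meta.json)
--resume replays the transcript to rebuild the session but deliberately does not restore session-level permissions. This treats trust as a session-isolation security invariant: the user re-authorizes every time, so authorization decisions made in an old context are not carried into a new one.
The compact_boundary marker embeds headUuid/anchorUuid/tailUuid so the loader can patch and splice the message chain at read time, compacting the context while keeping the full history reconstructible.
# Comparison with OpenClaw: same problems, different answers
Dimension: Claude Code vs. OpenClaw
· System shape: ephemeral CLI process vs. persistent gateway daemon
· Trust model: per-action deny-first evaluation + 7 modes vs. authentication at the gateway boundary (DM pairing, allowlists, optional sandbox)
· Agent runtime: queryLoop() at the center of the system vs. Pi-agent embedded in gateway RPC, with per-session queues
· Extension architecture: 4 mechanisms graded by context cost vs. manifest-first plugins with 12 capability types and a central registry
· Memory: 4-level CLAUDE.md + 5-layer compaction vs. workspace bootstrap files + "dreaming" promotion into long-term memory
· Multi-agent: parent-child task delegation vs. a two-level split between routing (multiple agents serving different channels) and delegation
The most interesting finding is that the two are composable: OpenClaw can host Claude Code as an external coding harness via ACP. This suggests the agent design space is not a flat taxonomy but a layered one, where the gateway layer and the task layer can be stacked.
Core insight: "Claude Code places the trust boundary between the model and the execution environment; OpenClaw places it at the gateway perimeter."
# Five value tensions (the most thought-provoking chapter)
· Authority × Safety: the 93% approval rate shows human oversight is unreliable, so safety must be backed by classifiers and the sandbox
· Safety × Capability: bash commands with >50 subcommands skip per-subcommand checks (parsing is slow and freezes the UI), so the defense-in-depth layers share a performance bottleneck
· Adaptability × Safety: several CVEs exploit the hook/MCP initialization window that runs "before the trust dialog appears"
· Capability × Adaptability: proactive suggestions raise task completion by 12–18%, but user preference drops sharply when they come too often
· Capability × Reliability: bounded context + subagent isolation → locally good decisions ≠ globally good outcomes
# A sixth lens: long-term preservation of human capability
The paper treats this not as a value but as an evaluation lens, summarizing external empirical evidence:
· Becker et al. 2025 (RCT with 16 experienced developers): AI tools made developers 19% slower, yet they felt 20% faster
· Shen & Tamkin 2026: the AI-assisted group scored 17% lower on comprehension tests
· He et al. 2025 (causal analysis of Cursor across 807 repositories): code complexity +40.7%, and the initial velocity gains dissipated within three months
· Liu et al. 2026: an audit of 304k AI commits found that about 1/4 of the issues introduced persisted into the latest version, with security issues persisting at a higher rate
· Kosmyna et al. 2025 (54-person EEG study): LLM users showed weakened neural connectivity, which persisted even after the AI was removed
· Rak 2025: entry-level tech hiring fell 25% from 2023 to 2024
The paper's verdict: Claude Code significantly amplifies short-term capability, but offers very limited mechanisms that support long-term human growth, deep understanding, and codebase coherence. The paper closes by naming "future systems should treat the sustainability gap as a first-class design problem" as its most important open challenge.
# Six open directions (for future agent systems)
1. The observability-evaluation gap: 78% of AI failures are silent, and 89% of teams have observability but only 52% run offline evaluations. Scaffolding that separates generator and evaluator is needed.
2. Cross-session persistence: the "middle layer" between CLAUDE.md (static) and the transcript (single session) is empty.
3. Evolution of the harness boundary: expansion along four axes, where/when/what/with whom (in particular, physical VLA actions would change the cost asymmetry in reversibility-weighted risk).
4. Horizon scaling: reliability from a single session up to multi-period scientific research.
5. Governance and regulation: the EU AI Act (fully applicable from August 2026) and the GPAI Code of Practice impose external constraints on logging, transparency, and human oversight.
6. Long-term human capability as a first-class design goal: both the measurement layer and the design layer are empty.
# A few judgments worth remembering
"Where the model reasons and where the harness executes is the root question of agent system design."
"At 95% per-step accuracy, a 100-step task succeeds only 0.6% of the time." That is why every step needs verification.
"Frontier models' coding capabilities are converging; the quality of the operational harness is becoming the main differentiator."
"Agent design choices are not a flat taxonomy but a hierarchy: a task-level harness can be hosted by a gateway-level control plane."
"The engineering complexity exists not to limit the model's decisions but to let the model decide better."
# Implications for engineering practice
For those of us building agent systems:
· Investing in deterministic infrastructure (context management, layered safety, recovery mechanisms) pays off more than wrapping ever-stronger models in planning scaffolding
· Deny-first plus multiple independent checks is more robust in production than a single sandbox, but watch for shared performance bottlenecks that degrade several layers at once
· Multi-layer progressive context compaction is more reliable than one-shot truncation or single-step summarization, but users need observability into it
· Append-only persistence plus not restoring permissions across sessions is a cheap way to get auditability and a security invariant at the same time
· Layer extension mechanisms by context cost: reserve "expensive" extensions (MCP) for cases that truly need new tools, and let "cheap" ones (hooks) spread widely
· Have subagents return summaries rather than share transcripts; otherwise token costs explode linearly (Claude Code data: 7×)
· Write long-term preservation of user capability into your design goals instead of only measuring it after the fact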
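The deny-first rule in the post (deny always overrides allow, even when the allow rule is more specific, and unrecognized actions escalate to a human) comes down to an evaluation order. Below is a minimal sketch of that order. The `Rule` shape (a tool name plus a glob over the tool argument), the `decide` function, and the example rules are hypothetical and are not Claude Code's actual rule syntax or code.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Literal

Decision = Literal["allow", "deny", "ask"]


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule shape: tool name plus a glob pattern over the tool argument.
    tool: str
    pattern: str
    effect: Literal["allow", "deny"]


def decide(rules: List[Rule], tool: str, arg: str) -> Decision:
    """Deny-first: any matching deny wins regardless of how specific a matching
    allow is; if no rule matches at all, escalate to the human ("ask")."""
    matching = [r for r in rules if r.tool == tool and fnmatchcase(arg, r.pattern)]
    if any(r.effect == "deny" for r in matching):
        return "deny"
    if any(r.effect == "allow" for r in matching):
        return "allow"
    return "ask"


rules = [
    Rule("Bash", "git *", "deny"),        # broad deny
    Rule("Bash", "git status", "allow"),  # more specific allow, still loses
    Rule("Read", "src/*", "allow"),
]
print(decide(rules, "Bash", "git status"))   # the broad deny beats the specific allow
print(decide(rules, "Read", "src/main.py"))  # allowed
print(decide(rules, "Bash", "ls -la"))       # no rule matches: ask the human
```

Making deny order-independent means adding an allow rule can never silently reopen something a deny rule closed, which is the property that lets the enforcement layer stay deterministic while CLAUDE.md guidance stays probabilistic.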

Rohan Paul @rohanpaul_ai · Apr 26 · 40

AWS CEO Matt Garman: "Because there is so much more demand than supply, there typically still is demand for the older chips, actually. And today, we actually are completely sold out of and have never retired an A100 server, as an example."
