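Damn it, humans need sleep, and large models are no exception! Lately I've been using LLMs on projects that need real deep reasoning. Stuffing in a 100K-token contract or an entire codebase is no problem. But as soon as I ask multi-hop follow-ups that require stitching scattered facts together, the model starts to get muddled. All the information is there, and it feels like the model knows where the answer is but just can't piece it together. And it's not only sleep: memory is a big problem too. Researchers at CMU and UMD recently published a paper that takes this wall apart. The title is simply Language Models Need Sleep. They ran experiments on Rule 110, a Turing-complete toy task, and found that the problem is not memory capacity at all. A hybrid model's fast weights can store the information, but actually translating the context into a usable internal representation takes multiple forward passes of consolidation. They call this process sleep: before clearing the KV cache, let the model run several more forward passes over the current context so the memory gradually settles into the fast weights. Prediction is still a single forward pass, so latency doesn't change at all. The result: on multi-hop reasoning tasks, accuracy jumps by 52%. Same small model, same token budget, just a bit more offline time to tidy up. This runs in a completely different direction from the industry's current push to keep enlarging context windows and scale test-time compute. o1-style models think a few extra seconds while answering, so the user has to wait. Sleep does the extra computation in the gaps between reading context, so the user notices nothing, yet the answers are more reliable. The brain has been doing this all along. During the day the hippocampus stores things quickly; at night, slow-wave sleep replays those memories to the neocortex. Evolution kept roughly a third of our time unresponsive to the outside world precisely so that cognition could go deeper. We've always assumed intelligence means always-on, one-shot answers. In fact the strongest intelligence may need a rhythm of waking and sleeping phases.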

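Summary: Researchers propose a new method, arguing that after processing long-context information, large language models need a sleep-like consolidation phase to improve multi-hop reasoning. Before the KV cache is cleared, the model runs multiple forward passes over the current context so the information settles into its fast weights, instead of thinking while the user waits. Experiments show that, under the same token budget, this raises accuracy on multi-hop reasoning tasks by 52% with no change in inference latency.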
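To make the "several passes of consolidation" idea concrete, here is a minimal toy sketch. It is not the paper's architecture or training procedure: it uses a plain delta-rule associative memory as a stand-in for fast weights, and the keys, values, learning rate, and pass counts are all made-up assumptions. It only illustrates that replaying the same context several times can reduce recall error while the read itself stays a single dot product.

```python
# Toy illustration only: a delta-rule associative memory standing in for "fast weights".
# Keys, values, learning rate and pass counts are assumptions, not values from the paper.

def recall(weights, key):
    """Read from memory: one dot product, i.e. a single cheap 'forward pass'."""
    return sum(w * k for w, k in zip(weights, key))


def consolidate(pairs, passes, lr=0.5):
    """Write (key, value) pairs into fast weights by replaying the context `passes` times."""
    weights = [0.0] * len(pairs[0][0])
    for _ in range(passes):
        for key, value in pairs:
            error = value - recall(weights, key)
            # Delta rule: nudge the weights along the key to shrink the recall error.
            weights = [w + lr * error * k for w, k in zip(weights, key)]
    return weights


def mean_squared_recall_error(weights, pairs):
    return sum((value - recall(weights, key)) ** 2 for key, value in pairs) / len(pairs)


# Three correlated unit-norm keys, so a single pass leaves interference between memories.
CONTEXT = [
    ([1.0, 0.0, 0.0], 1.0),
    ([0.8, 0.6, 0.0], -1.0),
    ([0.6, 0.0, 0.8], 2.0),
]

if __name__ == "__main__":
    for passes in (1, 4, 16, 200):
        w = consolidate(CONTEXT, passes)
        print(passes, mean_squared_recall_error(w, CONTEXT))
```

The read path (`recall`) is identical regardless of how many passes were spent writing, which mirrors the post's point that the extra compute happens offline and not at prediction time.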

Ant Ling@AntLingAGI · May 26 · 62

SwiGLU is everywhere in modern LLMs — but for large inputs it behaves like x². That quadratic blow-up inflates activations, amplifies outliers, and makes deep network or low-precision (FP8/FP4) training prone to loss spikes. We propose PowLU, a drop-in activation built for stable large-scale pre-training. 🧵

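Summary: SwiGLU is ubiquitous in modern LLMs, but for large inputs it behaves like x². This quadratic growth inflates activations, amplifies outliers, and makes deep-network or low-precision (FP8/FP4) training prone to loss spikes. The authors propose PowLU, a drop-in activation function designed for stable large-scale pre-training. 🧵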
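The quadratic behaviour mentioned above can be checked numerically. The sketch below evaluates one scalar SwiGLU unit, `silu(gate) * up`, under the assumption that the gate and up pre-activations are both equal to the same value x (the case where both projections grow together). PowLU itself is not reproduced here, since the post does not give its definition.

```python
import math


def silu(x):
    """SiLU / swish: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def swiglu_unit(gate, up):
    """One scalar SwiGLU unit: the gated branch multiplies the linear branch."""
    return silu(gate) * up


if __name__ == "__main__":
    # Assumption: gate == up == x, so the unit reduces to x * sigmoid(x) * x.
    for x in (1.0, 5.0, 20.0, 100.0):
        y = swiglu_unit(x, x)
        print(f"x={x:6.1f}  swiglu={y:12.4f}  ratio to x^2={y / (x * x):.6f}")
```

As x grows, sigmoid(x) approaches 1, so the ratio to x² approaches 1; that is the growth the post blames for inflated activations.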

Alibaba Cloud@alibaba_cloud · May 26 · 39

AI Key Frames — your front-row access to Qwen Live. The biggest model won't win the AI race — the fastest system will. Yun Jin, VP of Engineering at Fireworks AI, explains why inference has become the real battleground, and how the cloud is being rebuilt for the age of agents. Step into the AI-native momentum. 🚀 Stay tuned: https://int.alibabacloud.com/m/1000413447/

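Summary: AI Key Frames, live from Qwen Live. The biggest model won't win the AI race; the fastest system will. Yun Jin, VP of Engineering at Fireworks AI, explains why inference has become the real battleground and how the cloud is being rebuilt for the age of agents. Step into the AI-native momentum. 🚀 Stay tuned: https://int.alibabacloud.com/m/1000413447/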

Alibaba Cloud@alibaba_cloud · May 26 · 68

Qwen3.7-Max is officially the #2 AI coding model globally. Scoring 1541 on Code Arena, it trails only Claude. Built for production: runs 35-hour tasks, 1000+ tool calls, and ships 2-week projects in hours.

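Summary: Qwen3.7-Max is officially the #2 AI coding model globally. It scores 1541 on Code Arena, behind only Claude. Built for production: it runs 35-hour tasks, makes 1000+ tool calls, and ships two-week projects in hours.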

Alibaba Cloud@alibaba_cloud · May 26 · 48

AI Key Frames — your front-row seat to Qwen Live, at Qwen Conference 2026. Decode the core of AI productivity. Reshape the growth curve with full-stack AI. Exclusive conversations with pioneers in the industry, across the new frontiers of AI — inference, content creation, and the open AI ecosystem. Step into the AI-native momentum. 🚀 Stay tuned: https://int.alibabacloud.com/m/1000413447/ #AlibabaCloud #AINative #QwenConference2026 #Qwen #LLM

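Summary: AI Key Frames is your front-row seat to Qwen Live at Qwen Conference 2026. Decode the core of AI productivity. Reshape the growth curve with full-stack AI. Exclusive conversations with industry pioneers across the new frontiers of AI: inference, content creation, and the open AI ecosystem. Step into the AI-native momentum. 🚀 Stay tuned: https://int.alibabacloud.com/m/1000413447/ #AlibabaCloud #AINative #QwenConference2026 #Qwen #LLM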

Qwen@Alibaba_Qwen · May 25 · 61

✅Implicit caching is now live on Qwen3.7-Max — kicks in automatically, no setup needed. ⚡️Faster + cheaper out of the box. Need higher, more deterministic hit rates? Try explicit caching instead. 🙌 🔗Best practices 🔗 ：https://www.alibabacloud.com/help/en/model-studio/explicit-cache-best-practice

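Summary: ✅ Implicit caching is now live on Qwen3.7-Max: it is enabled automatically, with no setup. ⚡️ Faster and cheaper out of the box. Need higher, more deterministic hit rates? Try explicit caching. 🙌 🔗 Best practices: https://www.alibabacloud.com/help/en/model-studio/explicit-cache-best-practice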

Rohan Paul@rohanpaul_ai · May 25 · 67

🇨🇳 Huawei reveals a new chip design breakthrough under US sanctions pressure. It is a design approach meant to close the gap with TSMC and Intel without relying only on smaller transistors, by making chip signals travel less distance. Huawei wants 1.4nm-class density without owning the world's best lithography tools, i.e. it is trying to replace Moore's Law with the Tau Scaling Law.

To note, Huawei has been blocked from normal access to TSMC since the US tightened foreign direct product rules around Huawei in 2020, and TSMC later said it had not supplied Huawei since mid-September 2020.

Huawei proposed τ Scaling as a new way to make chips faster when shrinking transistors is no longer delivering the same gains. It said its next Kirin phone chip will be the first full test of the Tau Scaling Law. Old chip progress mostly came from making every transistor smaller, but Huawei's idea shifts the target from smaller geometry to shorter signal delay, meaning less time wasted while electrical signals crawl through wires, gates, memory paths, and system links.

LogicFolding is the circuit-level piece: it attacks the circuit layout itself by folding related logic blocks closer together, shortening critical wires, cutting electrical drag from resistance and parasitic capacitance, and letting signals switch faster with denser placement, without needing a full manufacturing-node leap.

Huawei is also pushing the same timing idea across the full stack: transistors, circuits, chip architecture, software scheduling, and system interconnects all get tuned to reduce τ, the delay constant that limits speed and efficiency. The bold claim is that Huawei has already mass-produced 381 chips using this thinking, and future high-end chips could reach density comparable to 14Å, or 1.4nm, without relying only on classic process shrinkage.

Huawei says this path could reach 1.4nm-class, or 14Å-class, density by 2031, while TSMC and Intel target similar physical nodes around 2029. Huawei calls it Her’s Law, after He Tingbo, the chip leader who helped turn HiSilicon into Huawei’s survival engine after US export controls.

--- huawei.com/en/news/2026/5/ieee-iscas-tau-scaling

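Summary: Huawei proposes the τ Scaling Law, which aims to raise chip performance and density without relying on more advanced process nodes, using LogicFolding to fold logic blocks together and shorten signal paths. Huawei says it has mass-produced 381 chips designed this way and plans to reach density equivalent to 1.4nm (14Å) by 2031; the law is named after HiSilicon head He Tingbo. Huawei has shown a similar "sideways innovation" path in storage: by changing the packaging (Die-on-Board) instead of chasing the most advanced NAND layer counts, it launched an AI SSD with 122.88TB of capacity.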
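Why shorter wires help so much can be seen with a first-order textbook model. The sketch below uses the Elmore approximation for a uniform distributed RC wire, where delay grows with the square of length. This is a generic model and not Huawei's definition of τ; the per-micron resistance and capacitance are hypothetical round numbers chosen only for illustration.

```python
def distributed_rc_delay(length_um, r_per_um, c_per_um):
    """First-order Elmore delay of a uniform distributed RC wire: 0.5 * r * c * L^2 (seconds)."""
    return 0.5 * r_per_um * c_per_um * length_um ** 2


# Hypothetical wire parameters (assumptions, not figures from the post).
R_PER_UM = 1.0        # ohms per micron
C_PER_UM = 0.2e-15    # farads per micron

if __name__ == "__main__":
    for length in (1000.0, 500.0, 250.0):
        delay_ps = distributed_rc_delay(length, R_PER_UM, C_PER_UM) * 1e12
        print(f"{length:7.1f} um wire -> {delay_ps:7.2f} ps")
```

Under this model, halving a critical wire cuts its RC delay to a quarter, which is why layout changes that shorten wires can pay off without a new process node.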

Rohan Paul@rohanpaul_ai · May 25 · 32

AI wins when reality has been translated for it.

Summary: AI wins once reality has been translated for it.

StepFun@StepFun_ai · May 25 · 39

Everyone has messy meeting notes. Few actually fix the problem. @aresotik built exactly that: paste in messy notes, get back clean action items and follow-ups. Powered by Step Plan + Step 3.5 Flash. Simple, and actually useful.

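Summary: @aresotik used Step Plan and Step 3.5 Flash to build a lightweight meeting-notes assistant that tackles the common problem of messy meeting records and hard-to-track action items. After the user pastes in raw notes, the tool outputs structured content including a summary, action items, risks, deadlines, and follow-up copy. Step Plan is StepFun's subscription service, which lets developers call models such as Step 3.5 Flash efficiently from a range of tools. The tool is simple by design and meant to be practically useful.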

Rohan Paul@rohanpaul_ai · May 25 · 47

Chamath on the all-important “prefill” and “decode” phases in AI compute. Prefill is compute-bound; massively parallel GPUs win, so Nvidia dominates as context grows. Decode is memory-bandwidth-bound, as each next token depends on scanning what’s already generated.

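Summary: Chamath on the all-important "prefill" and "decode" phases of AI compute. Prefill is compute-bound; massively parallel GPUs win, so Nvidia dominates as context grows. Decode is memory-bandwidth-bound, because each next token depends on scanning what has already been generated.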
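The prefill/decode split can be illustrated with a simplified roofline estimate. The sketch below counts only weight traffic for a dense layer (2 FLOPs per parameter per token, weights read once per pass) and ignores activations and KV-cache reads, which would push decode even further toward the memory-bound side. The accelerator figures are hypothetical round numbers, not the specs of any real chip.

```python
def arithmetic_intensity(tokens_per_pass, bytes_per_param=2):
    """FLOPs per byte of weights read: 2 FLOPs per parameter per token, weights read once."""
    return 2.0 * tokens_per_pass / bytes_per_param


def bound(tokens_per_pass, peak_flops, mem_bandwidth, bytes_per_param=2):
    """Classify a pass by comparing its intensity with the hardware ridge point."""
    ridge = peak_flops / mem_bandwidth  # FLOPs per byte needed to saturate compute
    intensity = arithmetic_intensity(tokens_per_pass, bytes_per_param)
    return "compute-bound" if intensity >= ridge else "memory-bound"


# Hypothetical accelerator (assumption): 1 PFLOP/s of compute, 3 TB/s of memory bandwidth.
PEAK_FLOPS = 1.0e15
MEM_BANDWIDTH = 3.0e12

if __name__ == "__main__":
    print("prefill, 4096 tokens per pass:", bound(4096, PEAK_FLOPS, MEM_BANDWIDTH))
    print("decode, 1 token per pass:     ", bound(1, PEAK_FLOPS, MEM_BANDWIDTH))
```

Prefill amortizes each weight read over thousands of tokens, while decode reads the same weights to produce a single token, so the two phases land on opposite sides of the ridge point.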

Chubby♨️@kimmonismus · May 25 · 71

Google DeepMind's AlphaProof Nexus autonomously solved 9 open Erdős problems, some unsolved for 56 years, at a cost of a few hundred dollars per problem. It also proved 44 open OEIS conjectures, resolved a 15-year-old question in algebraic geometry, and discovered a novel algorithmic parameter in optimization theory that humans hadn't found.

The core mechanism combines LLM reasoning (Gemini 3.1 Pro hype?!) with Lean formal verification. The AI generates proof attempts, Lean's compiler checks every logical step automatically. No human review needed to confirm correctness.

The most surprising finding: a basic agent that simply alternates LLM generation with compiler feedback replicated all 9 Erdős successes. The full-featured system with evolutionary search and reinforcement learning only provided meaningful advantages on the hardest problems. This reflects a broader recent trend: as foundation models improve, simple agentic loops are catching up to complex specialized architectures.

What sets this apart from OpenAI's informal proof approach: formal verification acts as an automatic filter. The failure analysis showed the AI frequently hallucinated lemmas it claimed were established results, and often disguised the core difficulty by rephrasing it as a helper lemma. Informal proofs would let these errors pass. Lean catches them immediately.

The agent also detected misformalizations in existing mathematical literature, correcting ambiguities in problem statements before solving the corrected versions. It served as both a solver and a diagnostic tool.

Current limitations are real. Successes cluster in combinatorics, number theory, and optimization where Lean's math library is mature. Problems requiring substantial new theory remain out of reach. Most Erdős problems still weren't solved, though.

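Summary: Google DeepMind's AlphaProof Nexus system autonomously solved 9 open Erdős problems (some open for 56 years) at a cost of a few hundred dollars per problem. It also proved 44 OEIS conjectures, resolved a 15-year-old question in algebraic geometry, and found a new algorithmic parameter in optimization theory. The core mechanism combines LLM reasoning with the Lean formal verification system: Lean checks every logical step automatically, so no human review is needed. The study found that a basic agent that only alternates LLM generation with compiler feedback reproduced all 9 Erdős successes. The system can also detect and correct misstatements in existing mathematical literature. Its limitation is that successes cluster in areas where Lean's math library is mature (such as combinatorics and number theory); problems that need substantial new theory remain out of reach.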
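The "basic agent" described above is, structurally, a loop that alternates a generator with a strict checker and feeds the checker's message back in. The sketch below shows only that control flow with toy stand-ins: the generator is a bisection search instead of an LLM, and the checker tests a square root instead of compiling Lean. All function names here are hypothetical and nothing in it reflects DeepMind's actual implementation.

```python
def agent_loop(generate, check, max_rounds=50):
    """Alternate generation and verification until the checker accepts or rounds run out."""
    feedback = None
    for round_no in range(1, max_rounds + 1):
        candidate = generate(feedback)
        accepted, feedback = check(candidate)
        if accepted:
            return candidate, round_no
    return None, max_rounds


def make_toy_generator(lo, hi):
    """Stand-in for the LLM: proposes integers and narrows its range from checker feedback."""
    state = {"lo": lo, "hi": hi, "last": None}

    def generate(feedback):
        if feedback == "too small":
            state["lo"] = state["last"] + 1
        elif feedback == "too large":
            state["hi"] = state["last"] - 1
        state["last"] = (state["lo"] + state["hi"]) // 2
        return state["last"]

    return generate


def toy_checker(candidate, target=144):
    """Stand-in for the proof checker: accepts only an exact answer, otherwise explains why not."""
    square = candidate * candidate
    if square == target:
        return True, "accepted"
    return False, ("too small" if square < target else "too large")


if __name__ == "__main__":
    answer, rounds = agent_loop(make_toy_generator(0, 100), toy_checker)
    print("accepted candidate:", answer, "after", rounds, "rounds")
```

The point of the design is that correctness lives entirely in the checker: a wrong candidate can never be returned as a success, no matter how confident the generator is.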

Rohan Paul@rohanpaul_ai · May 25 · 54

“I do see more and more mass-produced mathematics at scale.” ~ Terry Tao. AI makes this scalable. Will turns proof-writing into a search problem: it generates 1000s of mini-lemmas from a goal, then cheap checkers kill most and keep the few that work.

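Summary: "I do see more and more mass-produced mathematics at scale." ~ Terry Tao. AI makes this scalable. Will turns proof-writing into a search problem: it generates thousands of mini-lemmas from a goal, then cheap checkers discard most of them and keep the few that work.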

Rohan Paul@rohanpaul_ai · May 25 · 62

Reuters: DeepSeek just made its V4-Pro price cut permanent, pushing the price down to 25% of its original API cost. DeepSeek has not confirmed that better Ascend 950 supply caused the permanent cut, but the timing points to a cost curve moving downward as China’s AI stack shifts from restricted Nvidia chips toward Huawei hardware. --- reuters.com/world/china/chinas-deepseek-make-permanent-75-price-cut-flagship-v4pro-ai-model-2026-05-23/

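Summary: Reuters reports that DeepSeek has permanently cut the API price of its flagship V4-Pro model by 75%, without directly confirming that improved Huawei Ascend supply is the reason. The timing coincides with falling costs as China's AI compute stack migrates from restricted Nvidia chips to Huawei Ascend hardware. According to the analysis cited, DeepSeek's core strategy is to sharply reduce dependence on high-end HBM GPUs through architectural innovation (MoE, DSA, and V4-Pro's CSA/HCA); its published figures show significantly lower 1M-token inference FLOPs and KV cache. The goal is to optimize models so that more diverse hardware (LPDDR, NAND, custom ASICs) can run frontier AI and fit a different industrial base.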

Rohan Paul@rohanpaul_ai · May 25 · 65

New Alibaba + Nanjing Univ paper claims million-token prefill can be sped up 9.36X (compared against FlashAttention-2) with only lightweight adaptation. It shows standard LLMs can handle very long context faster by making attention selectively sparse.

The problem is that full attention gets very expensive when the input grows to hundreds of thousands or 1M tokens, because the model keeps comparing too many tokens with too many other tokens. The paper’s claim is that a trained full-attention model already has a hidden sparse structure, so the model does not need to be rebuilt or trained from scratch.

RTPurbo uses that structure by finding the few attention heads that really need faraway tokens, while letting the other heads focus mostly on nearby text. For those retrieval heads, it uses a small 16-dimensional token finder to guess which old tokens matter, then runs the real attention only on that selected set.

The authors tested this on long-context benchmarks and reasoning tasks, and RTPurbo kept accuracy close to full attention while reaching up to 9.36x faster prefill at 1M tokens and about 2x faster decoding.

RTPurbo's engineering rule: keep expensive long-context access only where it matters, and route the rest through a smaller search space. The clever part is the 16-dimensional indexer. It does not replace the model’s real attention computation; it acts like a cheap scout, finding likely useful tokens before the full representation is used on the selected set.

RTPurbo is not proof that every model can be safely sparsified this way. But it is strong evidence that the waste in long-context inference is more structured than it looks.

---- Paper Link – arxiv.org/abs/2605.16928v1 Paper Title: "Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps"

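Summary: Alibaba and Nanjing University propose RTPurbo, a lightweight adaptation method. It builds on the finding that a trained full-attention model already contains a hidden sparse structure. A lightweight 16-dimensional token finder acts as a "scout" that locates important tokens for the few attention heads that need long-range information, while the other heads attend mostly to local text. On 1M-token prefill, RTPurbo is up to 9.36x faster than FlashAttention-2, with roughly 2x faster decoding, while keeping accuracy close to full attention on long-context and reasoning benchmarks. The study suggests that the wasted compute in long-context inference has exploitable structure.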
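The "cheap scout, then real attention on a selected set" pattern can be sketched in a few lines. This is a toy under stated assumptions and not RTPurbo's implementation: the indexer here is a hypothetical projection that simply keeps the first two coordinates (the paper describes a learned 16-dimensional one), and the query, keys, and values are hand-picked so that one distant token clearly matters.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values, indices):
    """Standard scaled dot-product attention, restricted to the given token indices."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) * scale for i in indices])
    dim = len(values[0])
    return [sum(w * values[i][d] for w, i in zip(weights, indices)) for d in range(dim)]


def select_top_k(query_small, keys_small, k):
    """Cheap scout: score tokens in the low-dimensional space and keep the k best."""
    scores = [dot(query_small, ks) for ks in keys_small]
    best = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(best)


def sparse_attention(query, keys, values, project, k):
    indices = select_top_k(project(query), [project(key) for key in keys], k)
    return attend(query, keys, values, indices), indices


# Hypothetical stand-in for a learned low-dimensional indexer: keep the first two coordinates.
PROJECT = lambda vec: vec[:2]
Q = [4.0, 0.0, 0.0, 0.0]
KEYS = [[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [-1.0, 0.0, 0.0, 0.0],
        [4.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0], [0.0, -1.0, 0.0, 0.0]]
VALUES = [[float(i)] for i in range(len(KEYS))]

if __name__ == "__main__":
    full = attend(Q, KEYS, VALUES, list(range(len(KEYS))))
    sparse, picked = sparse_attention(Q, KEYS, VALUES, PROJECT, k=2)
    print("full:", full, "sparse:", sparse, "selected tokens:", picked)
```

The full-dimensional attention math is unchanged; only the set of tokens it runs over shrinks, which is the property the post highlights.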

Chubby♨️@kimmonismus · May 25 · 60

Nine more Erdős problems have been solved. This time, however, by Google DeepMind. This shouldn't be underestimated, because on the one hand it increases competitive pressure, and on the other hand it proves that the other Frontier Labs can easily keep up.

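Summary: Nine more Erdős problems have been solved, this time by Google DeepMind. This should not be underestimated: on the one hand it increases competitive pressure, and on the other it shows that the other frontier labs can easily keep up.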

Rohan Paul@rohanpaul_ai · May 25 · 73

A large MoE model may be wasting half its expert compute on tokens that barely need expert help. In this paper, 50% of expert computation is removed with almost no loss in accuracy. This makes already-trained MoE models like Qwen3 and GLM stop calling half their experts when a token is too easy to need them.

The method is Zero-Expert Self-Distillation Adaptation (ZEDA), a low-cost framework that transforms post-trained static MoE models into efficient dynamic ones. It shows that many MoE tokens do not need real experts, only permission to skip them. That sounds like a small routing trick, but it changes the economics of deployed language models.

Standard MoE models already avoid using every parameter, yet they still spend the same expert budget on every token. ZEDA adds a strange new option to the router: experts that output exactly nothing. When the model routes a token to one of these zero experts, it is not making the model dumber; it is admitting that this token does not need another expensive transformation.

The clever part is not the dummy expert, but the adaptation method. Instead of retraining the model from scratch, the original MoE becomes a frozen teacher, while the new dynamic version learns when it can safely skip work. Across Qwen3-30B-A3B and GLM-4.7-Flash, the result is roughly half the expert computation removed, with only marginal average accuracy loss and about 20% real inference speedup.

The deeper finding is: compute use did not simply track task difficulty. The model spent more expert budget where uncertainty or teacher-student disagreement rose, while structured code and math fragments often needed less. That makes ZEDA feel less like pruning and more like attention to computational doubt.

---- Paper Link – arxiv.org/abs/2605.18643 Paper Title: "Post-Trained MoE Can Skip Half Experts via Self-Distillation"

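Summary: The paper proposes ZEDA, a framework that turns post-trained static MoE models (such as Qwen3 and GLM) into dynamic ones by letting the router skip expert calls when a token is easy enough. On Qwen3-30B-A3B and GLM-4.7-Flash, ZEDA removes about 50% of expert computation with only a slight accuracy loss and delivers about 20% real inference speedup. The study finds that compute allocation mainly follows the model's uncertainty and does not simply track task difficulty.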
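The zero-expert idea can be sketched as a router whose candidate list includes slots that cost nothing and return nothing. The code below is a toy, not ZEDA itself: the two "experts" are trivial functions, the router logits are hand-written instead of learned, and normalizing gate weights over all selected slots (including zero experts) is an assumption made here for simplicity. The self-distillation training step described in the post is not shown.

```python
import math


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, experts, router_logits, top_k):
    """Top-k MoE layer where logits beyond len(experts) belong to zero experts.

    A zero expert contributes exactly nothing to the output and triggers no computation.
    """
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:top_k]
    gates = softmax([router_logits[i] for i in chosen])
    output = [0.0] * len(x)
    expert_calls = 0
    for gate, index in zip(gates, chosen):
        if index >= len(experts):
            continue  # zero expert: skip the expensive transformation entirely
        expert_out = experts[index](x)
        expert_calls += 1
        output = [o + gate * v for o, v in zip(output, expert_out)]
    return output, expert_calls


# Two toy "real" experts; the router sees two extra zero-expert slots after them.
EXPERTS = [lambda x: [2.0 * v for v in x], lambda x: [v + 1.0 for v in x]]

if __name__ == "__main__":
    token = [1.0, -1.0]
    print("hard token :", moe_forward(token, EXPERTS, [3.0, 2.5, 0.1, 0.2], top_k=2))
    print("mixed token:", moe_forward(token, EXPERTS, [3.0, 0.1, 2.5, 0.2], top_k=2))
    print("easy token :", moe_forward(token, EXPERTS, [0.1, 0.2, 3.0, 2.5], top_k=2))
```

The number of real expert calls now varies per token (2, 1, or 0 in the three cases above) while `top_k` stays fixed, which is what lets a static MoE behave dynamically.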

AI Notkilleveryoneism Memes ⏸️@AISafetyMemes · May 25 · 68

I'm old enough to remember when everyone thought AI solving ONE novel math problem would be a front page story around the world. Today, AI solved not one, but NINE open problems - some 50 years old. AND proved ***44*** out of 492 open OEIS conjectures. Zero media coverage.

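Summary: I still remember when everyone thought AI solving a single novel math problem would make front pages worldwide. Today, AI solved not one but nine open problems, some of them 50 years old, and proved ***44*** of 492 open OEIS conjectures. Zero media coverage.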

Berryxia.AI@berryxia · May 25 · 48

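This is the Chinese company the US most regrets not having banned, bar none. As the old Chinese saying goes: in a fistfight, fear the young and strong! A small Chinese team, facing a full US GPU embargo, chose not to "pile on compute" and instead spent two years inventing a pile of black magic that even OpenAI hadn't thought of. They compressed the KV cache to 1/10 of its original size, so a 1M-token context needs only 5.48GB of memory. They pushed MoE to the extreme and cut training cost by 40-50%. They even invented the "Engram" module, which trades LPDDR memory directly for compute... And none of this is about selling a few coding plans today. It is about quietly building a new $10 trillion AI hardware ecosystem, and pushing their own valuation to $1T along the way. They are called DeepSeek.

The story starts in 2024. Back then the whole world was racing on dense models, multimodality, and voice and video. DeepSeek went the other way: they dug into Mixture of Experts (MoE), an architecture widely regarded as extremely hard to train stably. Working from first principles, they invented the GRPO algorithm to replace the industry-standard PPO. They proposed RLVR (Reinforcement Learning with Verifiable Rewards), letting the model truly learn to "reward itself with correct answers." They built Multi Token Prediction for speculative decoding, maxing out training-signal density. Even more aggressively, they completely rebuilt the attention mechanism:
- MLA (V2 era) → KV cache cut by 90%
- DSA/CSA/HCA (V3/V4) → compute barely grows with long context
- mHC (Manifold-Constrained Hyper-Connections) (2025.12) → +7.2 points on BIG-Bench Hard for a 27B model, with only 6.7% more training overhead

The boldest piece is Engram (2026 Q1): the Transformer has no native "knowledge lookup" mechanism and can only simulate retrieval through brute-force computation. DeepSeek upgraded the classic N-gram into an O(1) hash lookup, trading memory for compute: one LPDDR lookup is far cheaper than running another Transformer layer.

Together these innovations produced a fusion-like effect. Using a KV cache calculator at 1M context:
- DeepSeek V4 Pro → only 5.48GB of HBM
- GLM5 (which has copied MLA+DSA) → 60GB
- Qwen3-235B → 89GB

The gap is absurd. What does it mean?
1️⃣ Long-horizon agents can finally run economically: the KV cache can easily be offloaded to SSD, and recomputation cost plummets.
2️⃣ NAND (YMTC) and LPDDR (CXMT), already abundant in China, suddenly become strategic resources for AI infrastructure.
3️⃣ Demand for HBM, the scarcest and hardest-to-manufacture resource, is greatly eased, which lowers the pressure on GPUs/ASICs as well.

DeepSeek's CEO Liang Wenfeng has never been looking at the few hundred million dollars from selling subscriptions today. He is looking at using algorithmic innovation to bring China's memory, ASICs, CPUs, and networking chips to life, so the whole hardware ecosystem is no longer choked by CUDA and HBM. They even open-sourced TileLang, which lets kernel code be written once and run on multiple hardware targets, breaking the CUDA moat directly.

This is the real "hero's journey":
- Facing a resource shortage, they did not complain; they turned the shortage into fuel for innovation.
- They were in no hurry to make money; they first built the foundation into a barrier others cannot finish copying.
- They used open source as a weapon and wrote "AGI for everyone" into their strategy.

And now the whole industry is eating the fruit they planted two years ago: ZAI's GLM copied MLA+DSA, and Moonshot's Kimi has acknowledged that its architecture is based on DeepSeek... What DeepSeek does today becomes the industry standard tomorrow. You can feel the power of this long game tonight. Open the DeepSeek website and try V4 Pro: 1M-context long-lived caching costs less than 3% of Sonnet 4.6's price, and the cache persists for hours. This isn't marketing; it is an overwhelming, asymmetric advantage built on real technology. The whole framework is 100% open source, with papers, code ideas, and architecture details all on arXiv. Big Tech makes quick money through lock-in and closed source, while DeepSeek is using open source plus algorithms to reshuffle the future of AI hardware. And now you know.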

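Summary: Facing the GPU embargo, DeepSeek broke through with algorithmic innovation. Key results include compressing the KV cache to 1/10, so a 1M-token context needs only 5.48GB of HBM, and cutting MoE training cost by 40-50%. Its Engram module uses LPDDR memory with O(1) lookups to trade memory for compute. Other advances include MLA (90% KV cache reduction), attention redesigns such as DSA/CSA/HCA, and the GRPO algorithm. The effect is large: in the 1M-context memory comparison, V4 Pro (5.48GB) is far below GLM5 (60GB) and Qwen3-235B (89GB). The strategy aims to make use of China's NAND and LPDDR resources and reduce dependence on HBM, and DeepSeek has open-sourced TileLang to break the CUDA barrier. V4 Pro's 1M-context long-lived cache is priced at less than 3% of Sonnet 4.6.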
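For readers who want to sanity-check figures like these, the standard back-of-the-envelope formula for an uncompressed KV cache is straightforward. The sketch below uses a hypothetical dense-attention configuration (60 layers, 8 KV heads, head dimension 128, 2-byte values); these are not the real settings of DeepSeek V4 Pro, GLM5, or Qwen3-235B, and the code does not attempt to reproduce the 5.48GB, 60GB, or 89GB numbers quoted in the post.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Uncompressed KV cache size: keys and values for every layer, KV head and token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


# Hypothetical model configuration (assumption for illustration only).
LAYERS, KV_HEADS, HEAD_DIM = 60, 8, 128

if __name__ == "__main__":
    baseline = kv_cache_bytes(1_000_000, LAYERS, KV_HEADS, HEAD_DIM)
    print(f"baseline at 1M tokens:    {baseline / 1e9:.2f} GB")
    # A 10x compression of the per-token state, as the post claims for MLA-style designs.
    print(f"with 1/10 per-token state: {baseline / 10 / 1e9:.2f} GB")
```

Because the cache grows linearly with context length, any constant-factor reduction in per-token state translates directly into the same factor at 1M tokens, which is why these architectural changes matter most for long-running agents.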

Rohan Paul@rohanpaul_ai · May 24 · 54

🇨🇳 🇺🇸 Huawei’s new 122TB SSD shows how export controls can move innovation sideways instead of simply stopping it. Huawei just built a 122.88TB AI SSD by changing the package around the memory, not by matching Samsung’s most advanced 400+ layer 3D NAND. A 245TB version is discussed as a future step. High-capacity SSDs usually grow by stacking more NAND layers inside each chip, but Huawei’s access to those chips is blocked because its Entity List status restricts items tied to US technology. So it is not trying to win only by making taller 3D NAND stacks, where Samsung has already shown 400-plus-layer V-NAND work. Instead, Huawei is shifting the contest from the chip itself to the way chips are packed together. Huawei’s workaround is Die-on-Board, which puts NAND dies directly onto the circuit board, cuts out some normal chip packaging, and raises board-level density by packing more lower-density memory into the same device. Direct die placement creates heat and signal problems, but it shows how packaging can recover some of the capacity lost when a company cannot buy the best memory chips.

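Summary: With its access to advanced NAND chips restricted, Huawei did not chase the high-layer-count stacking led by Samsung. It adopted a "Die-on-Board" packaging scheme instead, mounting NAND dies directly on the circuit board to raise storage density, and launched a 122.88TB AI SSD with a 245TB version planned. Meanwhile, DeepSeek's architectural optimizations such as MoE and CSA/HCA sharply reduce its models' dependence on HBM and compute, making domestic hardware a better fit for frontier AI. The two paths echo each other: Huawei works around the chip performance gap at the packaging level, DeepSeek eases hardware scarcity at the algorithm level, and both reflect a strategy of opening new tracks through low-level innovation under external restrictions.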

Chubby♨️@kimmonismus · May 24 · 48

"We look forward to making Mythos-class models available through general release" I don't understand Anthropic's strategy regarding Mythos. On the one hand, everyone is saying that Mythos has achieved the expected quality and is finding bugs and exploits that no other model has ever found. On the other hand, precisely for this reason, Anthropic has repeatedly stated that it's "too powerful for release." Why the sudden about-face? One explanation: PR. The preview, including a benchmark, combined with the statement that the model wouldn't be released due to its power, generated a lot of attention. But does Anthropic really need that? Anthropic is so significant because they primarily serve enterprises. Their biggest problem: compute. Too many want Claude, too little compute to support it adequately. Therefore, this PR move wasn't necessary, and the IPO is still in the near future. In short: it seems downright erratic to now do the exact opposite of what was stated. Be that as it may, once the guardrails are in place and there is general availability, SWEs will receive a significant boost. Judging by the benchmarks, nothing even comes close to the myth so far.

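Summary: Anthropic's release strategy looks contradictory. Its Mythos model performs so well, finding bugs and exploits no other model has found, that the company once called it "too powerful for release"; yet its latest statement says it will be made available through general release. The post argues this sudden turn is probably not a PR stunt, since Anthropic's core bottleneck is compute and an IPO is near, so attention is not what it needs most. Whatever the strategy, once the model is ready and guardrails are in place, its performance, far beyond anything available today, should give software engineering a significant boost. The quoted post suggests the announcement is a serious move.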

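Orange AI@oran_ge · May 24 · 52

The core of this article is this one chart. DeepSeek V4 Pro is not the best model, but its caching is practically free. This is a technique every large model needs; with it, even Opus could cut its cost by 10x. I also believe that once V4.1 is trained on real harness data, it will get better quickly.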

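Summary: DeepSeek V4 Pro is not the strongest model, but its key advantage is a caching technique with near-zero cost. The post sees this as an important breakthrough that every large model needs; applied to a top model such as Claude Opus, it could cut operating cost by about 10x. It also expects the future V4.1 to improve quickly once it is trained on more realistic data.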

Rohan Paul@rohanpaul_ai · May 24 · 62

Great article here on DeepSeek. Their real story is not cheaper chatbots, but architecture that turns hardware scarcity into strategy. DeepSeek is not trying to sell coding seats, it is trying to make Chinese memory, accelerators, and systems useful for frontier AI. Every recent DeepSeek move attacks a bottleneck that makes frontier models dependent on elite HBM-heavy GPU stacks: MoE activates only parts of a model, DSA reduces long-context attention cost, and V4-Pro’s official card says CSA/HCA cuts 1M-token single-token inference FLOPs to 27% and KV cache to 10% of V3.2. Engram, a separate research line, pushes the same logic from another side: let static knowledge live in scalable lookup memory, then fetch it predictably from host memory instead of forcing every fact through dense computation. That sounds like engineering detail until you see the business consequence. If models need less HBM and less brute-force compute, then second-best chips, abundant LPDDR, NAND, and customized ASICs become less second-best. Reuters has already reported a permanent 75% DeepSeek V4-Pro price cut, while noting Huawei Ascend supply constraints and expected supernode availability, which is exactly the kind of feedback loop that they wanted. DeepSeek is not only optimizing models for benchmarks, it is optimizing AI for a different industrial base. The prize is not the app layer. The prize is making scarcity programmable.

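Summary: DeepSeek's core strategy is not to build cheap chatbots but to sharply reduce dependence on high-end HBM GPUs through a series of architectural innovations (MoE sparse activation, DSA, CSA/HCA). The aim is to turn hardware scarcity into a technical advantage, so that second-best chips, LPDDR memory, and custom ASICs can support frontier AI, optimizing AI for a different industrial base. This path already has commercial consequences, such as V4-Pro's deep price cut and its coupling with the domestic hardware ecosystem; the ultimate goal is to make hardware scarcity "programmable."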
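The "static knowledge in lookup memory" idea can be illustrated with a hashed N-gram table, where the last few token ids index directly into a fixed-size array. This is a generic sketch of O(1) hashed lookup and not DeepSeek's Engram design: the class name, hash constant, table size, and bigram window are all hypothetical choices, and hash collisions are simply tolerated.

```python
class HashedNgramMemory:
    """Fixed-size table addressed by a hash of the trailing n token ids (O(1) per lookup)."""

    def __init__(self, table_size, dim, n=2):
        self.n = n
        self.table_size = table_size
        self.table = [[0.0] * dim for _ in range(table_size)]

    def slot(self, token_ids):
        """Deterministic polynomial hash of the last n token ids."""
        h = 0
        for token in token_ids[-self.n:]:
            h = (h * 1_000_003 + token) % self.table_size
        return h

    def write(self, token_ids, vector):
        self.table[self.slot(token_ids)] = list(vector)

    def lookup(self, token_ids):
        # One hash and one memory read, independent of context length or model depth.
        return self.table[self.slot(token_ids)]


if __name__ == "__main__":
    memory = HashedNgramMemory(table_size=1024, dim=2)
    memory.write([5, 17, 42], [1.0, -1.0])   # store a vector for the bigram (17, 42)
    print(memory.lookup([9, 17, 42]))        # same trailing bigram, different prefix
    print(memory.lookup([9, 17, 43]))        # different bigram, untouched slot
```

The table lives in ordinary host memory and the access pattern is known from the token ids alone, which is the property the post points to when it says knowledge can be fetched predictably instead of recomputed.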

Ethan Mollick@emollick · May 24 · 44

GPT-5.5 Pro is a very solid fact checker. I can throw entire chapters at it and it will hunt down every key reference accurately. The only real annoyance is that it loves nuance, so returns a lot of “the general idea is right, but you are not taking into account tiny detail X”

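Summary: GPT-5.5 Pro is a very reliable fact checker. I can throw entire chapters at it and it accurately tracks down every key reference. The only annoyance is that it loves nuance, so it often returns feedback like "the general idea is right, but you are not taking into account tiny detail X."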

🚨 AI News | TestingCatalog@testingcatalog · May 24 · 65

ANTHROPIC 🔥: Mythos 1, "claude-mythos-1-preview", is being prepared for a release on Claude Code and Claude Security. The model became visible for a short amount of time on Claude; besides that, new strings mentioning Mythos have been added. > Access to the Claude Mythos model in Claude Code and Claude Security. It still doesn't mean the general public will have access to this exact model, according to Anthropic's earlier communication. More below 👇

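Summary: ANTHROPIC 🔥: Mythos 1, "claude-mythos-1-preview", is being prepared for release on Claude Code and Claude Security. The model was briefly visible on Claude, and new strings mentioning Mythos have been added. > Access to the Claude Mythos model in Claude Code and Claude Security. According to Anthropic's earlier communication, this still does not mean the general public will get access to this exact model. More below 👇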

Chubby♨️@kimmonismus · May 23 · 72

http://x.com/i/article/2058171296316297216

# The Battle for AI Silicon: a brief overview of the chip market and who is winning

This is a version that normally appears in the newsletter every Saturday. I thought, because it's so important, that I should publish it here today as well.

Two days ago, NVIDIA reported $81.6 billion in quarterly revenue, with data center sales alone hitting $75.2 billion, up 92% year over year. Those numbers are so large they almost lose their meaning. To put them in perspective: NVIDIA's data center business now generates more revenue in a single quarter than most Fortune 500 companies produce in an entire year. The AI chip market has become one of the most consequential economic arenas on the planet, determining who can train the next frontier model, who can serve billions of inference requests, and ultimately, who controls the infrastructure layer of artificial intelligence itself.

But beneath the headline dominance, something more interesting is happening. The market is fragmenting. Google has split its latest TPU generation into two separate chips for the first time, one for training, one for inference. AMD is shipping competitive hardware and building rack-scale systems that directly challenge NVIDIA's architecture. Cerebras and Groq have demonstrated that specialized silicon can outperform general-purpose GPUs for specific workloads by an order of magnitude. And in China, Huawei is assembling a parallel compute ecosystem that operates entirely outside the Western supply chain, with DeepSeek's V4 model now running natively on Chinese chips.

The question worth examining is whether NVIDIA's position as the undisputed platform of AI compute will hold as the market matures, or whether the shift from training to inference, the rise of vertical integration, and the geopolitical fracturing of the semiconductor supply chain will produce a fundamentally different competitive landscape.

# NVIDIA: The Platform, Not Just the Chip

Understanding NVIDIA's dominance requires looking beyond raw compute performance. The company's real advantage is systemic. CUDA, the programming framework introduced in 2006, has accumulated roughly four million developers worldwide. Every major AI lab, from OpenAI to Anthropic to Meta AI, builds on CUDA. The libraries, the debugging tools, the kernel optimizations, the deployment pipelines: they all assume NVIDIA hardware. Switching costs are not just financial but organizational. Migrating away from CUDA means rewriting code, retraining teams, and accepting months of reduced productivity.

On top of this software moat, NVIDIA has built what analysts increasingly call a "copper moat," the proprietary NVLink interconnect system that connects GPUs within rack-scale systems at bandwidths far exceeding any external networking solution. The latest Blackwell 300 and upcoming Vera Rubin platforms sell not as individual chips but as integrated AI factories: dozens of GPUs, custom CPUs, liquid cooling, high-bandwidth memory pools, and networking fabric bundled into a single purchasable unit. For customers building large training clusters, this integration eliminates enormous amounts of engineering work.

The financial results reflect this. NVIDIA's fiscal 2026 revenue reached $215.9 billion, with $193.7 billion from the data center segment alone, a 68% increase year over year (NVIDIA, 02/25/2026). The company's Q2 FY2027 guidance of $91 billion suggests the trajectory has not slowed. Gross margins remain near 75%, indicating that despite increasing competition, NVIDIA retains substantial pricing power (SEC Filing, 05/20/2026).

The roadmap underscores the strategy. Blackwell Ultra ships this year, Vera Rubin follows in the second half of 2026 with HBM4 memory and a new CPU architecture, and Rubin Ultra arrives in 2027 with four GPU dies per package and up to one terabyte of HBM4e.
NVIDIA has deliberately shifted to a one-year product cadence, which creates a structural problem for competitors: by the time a rival ships a chip designed to match Blackwell, NVIDIA has already moved on to Rubin.

# Google TPU 8t/8i: Vertical Integration as Weapon

Google represents the most serious long-term threat to NVIDIA's position, but the nature of that threat is often misunderstood. Google does not need to replace NVIDIA on the open market. Google needs to reduce its own dependency on NVIDIA within Google Cloud and for its internal AI workloads, primarily Gemini and DeepMind's research.

The eighth-generation TPU, announced at Google Cloud Next in April 2026, marks an architectural first: Google split the design into two distinct chips. The TPU 8t is built for large-scale training, scaling up to 9,600 chips per superpod with a new optical 3D torus interconnect called Virgo that can link over one million TPU 8t chips in a single cluster with near-linear scaling efficiency. The TPU 8i targets inference and reasoning workloads, featuring 288 GB of HBM alongside 384 MB of on-chip SRAM, three times more than the previous Ironwood generation, specifically designed to hold the large key-value caches that modern language models require during inference (Google Blog, 04/22/2026).

The split is important because training and inference have fundamentally different hardware requirements. Training demands raw compute throughput and massive parallelism. Inference, especially for reasoning models and agentic systems that chain multiple inference calls together, demands low latency, large memory for context windows, and energy efficiency. By building dedicated silicon for each workload, Google can optimize in ways that a general-purpose GPU never can.

The deeper advantage is vertical integration. Google controls the models (Gemini), the cloud platform, the data centers, the chip design, and the internal demand.
DeepMind acts as a permanent large-scale customer whose needs feed directly back into hardware design. This kind of hardware-model co-design is extraordinarily difficult for a merchant chip vendor like NVIDIA to replicate.

The critical limitation remains ecosystem breadth. TPUs are powerful within Google's software stack, particularly JAX and Pathways. Outside that world, they are far less portable than CUDA-based GPUs. Notably, Google itself continues to offer NVIDIA's Vera Rubin platform on Google Cloud, a tacit acknowledgment that many customers still need or prefer the NVIDIA ecosystem (TechCrunch, 04/22/2026). Google's TPU strategy is best understood not as a frontal attack on NVIDIA, but as a dual-sourcing and bargaining play.

# AMD, Cerebras, Groq: Challengers From Every Angle

AMD occupies the most strategically important position after NVIDIA and Google. For any enterprise or hyperscaler seeking to reduce NVIDIA dependency without committing to Google's vertically integrated stack, AMD is the natural alternative. The company reported $34.6 billion in total revenue for 2025, with its data center segment growing 39% year over year in Q4 (AMD, 01/2026). The current MI350 series ships with 288 GB of HBM3e and up to 8 TB/s of memory bandwidth. The upcoming MI400, expected in the second half of 2026, targets direct competition with NVIDIA's Vera Rubin. AMD's most powerful weapon may ultimately be price: reports suggest MI450 could be priced approximately 40% below comparable NVIDIA chips (SemiAnalysis, 2026). The persistent challenge remains ROCm, AMD's CUDA alternative, which has improved substantially but still lacks the depth of NVIDIA's developer ecosystem.

Cerebras and Groq have abandoned the GPU paradigm entirely for inference workloads. Their argument is simple: during autoregressive token generation, the workload is memory-bandwidth-bound, not compute-bound. GPUs are structurally mismatched for this task.
Cerebras addresses this with the Wafer Scale Engine, a single chip occupying an entire silicon wafer, holding 4 trillion transistors and 44 GB of on-chip SRAM. Independent benchmarks confirm that the CS-3 delivers 21 times faster throughput than NVIDIA's B200 at 32% lower cost for inference workloads (SemiAnalysis, 2025). In May 2026, Cerebras partnered with AWS to offer its inference capabilities through Amazon Bedrock.

Groq takes a different path. Its Language Processing Unit uses static compiler scheduling, where the entire execution graph is planned down to individual clock cycles before inference begins. The result is deterministic latency: every token takes exactly the same amount of time to generate. Groq achieves up to 1,200 tokens per second for large models with sub-100 millisecond time-to-first-token. In December 2025, NVIDIA acquired a non-exclusive license to Groq's inference technology, a strong signal that even the market leader sees SRAM-centric architectures as the future of inference (Groq, 12/2025).

# DeepSeek V4 on Huawei Chips: China's AI independence becomes real

The geopolitical dimension of the chip market has moved beyond theory into operational reality. Since 2020, US export controls have blocked Huawei and SMIC from accessing EUV lithography machines. Without EUV, cutting-edge chip production below 7nm was widely considered impossible. Yet SMIC has found a workaround using DUV lithography with quadruple patterning, enabling structures in the 5nm class, albeit with severe tradeoffs: early yield rates around 20%, meaning four out of five chips came off the line defective (Asia Financial, 2025).

Huawei's CloudMatrix 384 system integrates 384 Ascend 910C chips and delivers approximately 300 petaflops of BF16 compute, nearly double NVIDIA's GB200 NVL72. The cost: 3.9 times the power consumption and roughly triple the price (Igor's Lab, 2025). China's strategy is brute force, compensating for chip-level inefficiency with sheer scale.
This works because energy is cheaper and more abundant in China: the country plans to add 3.4 terawatts of new generation capacity over the next five years, nearly six times the US figure (Oxford Energy, 02/2026).

The decisive turning point came in April 2026, when DeepSeek released V4, the first Chinese frontier model explicitly trained and optimized for Huawei Ascend chips. V4 uses a mixture-of-experts architecture with up to one trillion total parameters, 37 billion activated per inference. Following the release, China's largest tech companies, Alibaba, ByteDance, and Tencent, rushed to secure hundreds of thousands of Huawei chips (Reuters, 04/29/2026). What DeepSeek demonstrated is that the bottleneck was never hardware alone; it was the software layer: compilers, distributed training frameworks, communication libraries. With V4, that software stack has reached sufficient maturity. A fully China-controlled AI ecosystem that does not require CUDA now exists.

# Conclusion

The AI chip market in 2026 is splitting along three axes. Workload: training remains NVIDIA-dominated, while inference opens doors for specialists. Openness: NVIDIA locks in via CUDA/NVLink, Google offers vertical optimization, AMD provides the open alternative. Geopolitics: two parallel ecosystems are now operational, Western (NVIDIA/CUDA) and Chinese (Ascend/CANN). The market is not witnessing a dethroning, but a fragmentation that rewards different architectures for different purposes.

NVIDIA is not about to be dethroned. No competitor matches its combination of compute performance, software ecosystem, production volume, system integration, and supply chain depth. The company's $81.6 billion quarterly revenue and 75% gross margins speak to a business with extraordinary structural advantages. But the nature of those advantages is shifting.
The CUDA moat is being complemented and partially superseded by the "copper moat," the proprietary networking stack that binds customers at the system level.

The more important development is that the market itself is diversifying. Inference, which is growing faster than training and will likely constitute the majority of AI compute demand within the next two to three years, favors different hardware characteristics: low latency, large on-chip memory, energy efficiency, and deterministic performance. This is the opening that Cerebras, Groq, and Google's TPU 8i are exploiting.

China's AI compute ecosystem is becoming functionally independent, not through chip-level parity but through a combination of brute-force scaling, cheap energy, model-architecture optimization, and a maturing domestic software stack. The AI chip market of 2030 will not be a single global arena. It will be two parallel systems with limited interoperability, each with its own hardware standards, software ecosystems, and competitive dynamics.

The battle for AI silicon is no longer just a corporate rivalry. It is a contest over the means of production for the most consequential technology of our time.

Sources:

1. NVIDIA FY2026 Annual Results (02/25/2026) https://nvidianews.nvidia.com/news/nvidia-announces-financial-results-for-fourth-quarter-and-fiscal-2026 / NVIDIA Q1 FY2027 Earnings (05/20/2026) https://nvidianews.nvidia.com/news/nvidia-announces-financial-results-for-first-quarter-fiscal-2027
2. Google Blog: TPU 8t and 8i (04/22/2026) https://blog.google/innovation-and-ai/infrastructure-and-cloud/google-cloud/eighth-generation-tpu-agentic-era/
3. TechCrunch: Google Cloud TPU chips vs NVIDIA (04/22/2026) https://techcrunch.com/2026/04/22/google-cloud-next-new-tpu-ai-chips-compete-with-nvidia/
4. AMD Instinct MI350 and beyond (06/2025) https://www.amd.com/en/blogs/2025/amd-instinct-mi350-series-and-beyond-accelerating-the-future-of-ai-and-hpc.html
5. SemiAnalysis: AMD MI350/MI400 analysis https://newsletter.semianalysis.com/p/amd-advancingai-mi350x-and-mi400
6. SemiAnalysis: Cerebras inference https://newsletter.semianalysis.com/p/cerebras-faster-tokens-please
7. Groq and NVIDIA licensing agreement (12/2025) https://groq.com/newsroom/groq-and-nvidia-enter-non-exclusive-inference-technology-licensing-agreement-to-accelerate-ai-inference-at-global-scale
8. Reuters: DeepSeek V4 adapted to Huawei chips (04/24/2026) https://www.reuters.com/world/china/deepseek-v4-chinese-ai-model-adapted-huawei-chips-2026-04-24/
9. Reuters: Chinese firms scramble for Huawei chips after DeepSeek V4 (04/29/2026) https://www.reuters.com/world/china/big-chinese-tech-firms-scramble-secure-huawei-ai-chips-after-deepseek-v4-launch-2026-04-29/
10. Oxford Energy: China data centre advantage (02/2026) https://www.oxfordenergy.org/wpcms/wp-content/uploads/2026/02/Comment-The-China-data-centre-advantage.pdf
11. Epoch AI: AI Chip Production (01/2026) https://epoch.ai/data-insights/ai-chip-production
12. Epoch AI: Hyperscalers and compute ownership (04/2026) https://epoch.ai/data-insights/hyperscalers-control-most-compute
13. Epoch AI: AI chip supply chain constraints (03/2026) https://epoch.ai/data-insights/ai-chip-supply-chain-constraints
14. Epoch AI: Hyperscaler capex trend (02/2026) https://epoch.ai/data-insights/hyperscaler-capex-trend

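Summary: NVIDIA has built a formidable system-level moat on the CUDA software ecosystem and the NVLink interconnect, and its data center business keeps growing fast. The market is nonetheless fragmenting quickly: Google has split its TPU line into specialized chips, AMD is competing head-on with rack-scale systems, and specialized chips such as Cerebras show order-of-magnitude advantages on specific workloads. Meanwhile, Huawei is building a parallel compute ecosystem independent of Western supply chains. As AI workloads shift from training to inference, and with vertical integration and geopolitics adding pressure, the competitive landscape of AI infrastructure may be fundamentally reshaped.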
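
The CloudMatrix 384 comparison in the article above reduces to two ratios, and working them out shows what "brute force" costs. This is a rough sketch, not a benchmark: it reads "nearly double" as 2.0 and "roughly triple" as 3.0, both of which are rounding assumptions, and takes the 3.9x power figure as quoted.

```python
def relative_efficiency(perf_ratio: float, cost_ratio: float) -> float:
    """Performance per unit of cost, relative to the baseline system (1.0 = parity)."""
    return perf_ratio / cost_ratio


# Ratios of CloudMatrix 384 to GB200 NVL72, as characterized in the article.
PERF_RATIO = 2.0    # "nearly double" the BF16 compute (rounded up, an assumption)
POWER_RATIO = 3.9   # "3.9 times the power consumption"
PRICE_RATIO = 3.0   # "roughly triple the price" (rounded, an assumption)

perf_per_watt = relative_efficiency(PERF_RATIO, POWER_RATIO)
perf_per_dollar = relative_efficiency(PERF_RATIO, PRICE_RATIO)

print(f"compute per watt:   {perf_per_watt:.2f}x the baseline")
print(f"compute per dollar: {perf_per_dollar:.2f}x the baseline")
```

On these rounded inputs the system delivers about half the compute per watt and two thirds of the compute per dollar, which is why the article ties the strategy to cheap and abundant energy.
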

Rohan Paul@rohanpaul_ai · May 23 · 52

Agentic AI may be pulling the old computing stack, with far more focus on CPUs, back into the center of the story. Here, Ark Invest CEO and CIO Cathie Wood quotes OpenAI CFO Sarah Friar, who said: "People are chasing GPUs. They're going to be really shocked at how agentic AI activates CPUs."

The market has spent years treating GPUs as the scarce ingredient, because training large models made parallel math look like destiny. But agentic AI changes the bottleneck. An agent does not simply ask one giant model for one answer; it plans, calls tools, checks memory, retrieves files, writes code, queries databases, and loops until the task is done. That means inference is not just matrix multiplication. It is orchestration, data movement, networking, storage, scheduling, and a lot of general-purpose work that CPUs still handle better than accelerators.

---- From the "Bloomberg Podcasts" YouTube channel (link in comment)

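Summary: The rise of agentic AI is quietly changing the shape of AI computing. The market used to treat GPUs as the core scarce resource for training large models, but an agent's task is not a single query. It is a continuous loop of planning, tool calls, memory retrieval, code execution, and database queries. That reasoning-and-orchestration process involves a great deal of data movement, scheduling, and other general-purpose work, which is exactly what CPUs handle better than GPUs and other accelerators. As ARK Invest CEO Cathie Wood notes, quoting OpenAI CFO Sarah Friar, people focused on GPUs may be surprised at how much agentic AI activates CPUs. The implication is that the bottleneck in AI computing is moving from the parallel math of model training to general-purpose processing during agent execution, which brings the CPU back into prominence.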
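
The plan, call tools, update memory, loop pattern described in the post can be sketched in a few lines. Everything here is a hypothetical stand-in: `fake_model` replaces a real LLM call, `read_file` replaces a real tool, and the `Trace` counters exist only to show how much of one agent turn is ordinary CPU-side work rather than model inference.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Counts accelerator-side model calls and CPU-side orchestration steps."""
    model_calls: int = 0
    cpu_steps: list = field(default_factory=list)


def fake_model(prompt: str, trace: Trace) -> str:
    """Hypothetical stand-in for an LLM: ask for a file once, then finish."""
    trace.model_calls += 1
    if "file contents" in prompt:
        return "FINISH"
    return "TOOL read_file notes.txt"


def read_file(arg: str, trace: Trace) -> str:
    """Hypothetical tool; a real one would hit storage or the network."""
    trace.cpu_steps.append(f"storage: read {arg}")
    return "file contents: hello"


TOOLS = {"read_file": read_file}


def run_agent(task: str, max_steps: int = 5) -> Trace:
    trace = Trace()
    context = task
    for _ in range(max_steps):
        reply = fake_model(context, trace)
        if reply == "FINISH":
            break
        # Parsing, dispatch, and context updates all run on the CPU.
        _, name, arg = reply.split(" ", 2)
        trace.cpu_steps.append(f"schedule: dispatch {name}")
        result = TOOLS[name](arg, trace)
        trace.cpu_steps.append("memory: append result to context")
        context = context + "\n" + result
    return trace


trace = run_agent("Summarize notes.txt")
print(trace.model_calls, "model calls,", len(trace.cpu_steps), "CPU-side steps")
```

Even in this toy loop, each tool round adds several scheduling, storage, and memory steps around the model calls; real agents multiply that with retries, sandboxes, and database queries.
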

swyx@swyx · May 23 · 58

co-sign. a very handy mental framework for what kinds of learning transformers do well today, and why it runs into limitations. when @ankit2119 and i wrote about the need for adversarial world models earlier this year, we were describing a couple of the functions of these rungs of thinking that bring us ever closer to the kolmogorov-limit generator of reality. throwing more params, more power, more everything at a demonstrably inefficient paradigm will be outclassed by the simple solution that can hypothesize and seek truth rather than backfit a house of cards - although the bitter lesson is it is simpler to scale and we may hit agi anyway because human intelligence just isn’t that smart nor plentiful

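Summary: The post endorses a framework for analyzing what transformers currently learn well and where they hit limits, and argues that adversarial world models are one of the key functions for getting closer to the underlying structure of reality. The author believes that simply adding parameters and compute to scale an inefficient paradigm will be outclassed by a simpler approach that can actively form hypotheses and test them, although scaling may still reach AGI by accident because human intelligence is itself limited. The quoted post adds that reinforcement learning (RL), as a paradigm that learns from interventions, is more powerful than supervised learning, and that combining world modeling with RL could enable learning from counterfactuals.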

ginobefun@hongming731 · May 23 · 39

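#BestBlogs Morning Brief 05-23. Today's topics:
- Agent architectures split in production into long-horizon and real-time classes (LangChain Interrupt 2027)
- Notion re-founds itself with a jazz-band model and barbell-shaped talent (Ivan Zhao × Sequoia)
- GLM-5.1 high-speed edition at 400 tokens/s breaks the "fast must mean small" convention (Zhipu × TileRT)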

ginobefun@hongming731 · May 23 · 61

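http://x.com/i/article/2057993057891655680

# BestBlogs Morning Brief · 05-23 | Agent Architectures Diverge, Notion Reorganizes, GLM-5.1 High-Speed Edition

Read and listen online: https://www.bestblogs.dev/explore/brief/2026-05-23

## Introduction

Today is May 23, 2026. Welcome to BestBlogs Morning Brief EP65. This issue follows three threads: the split in production agent architectures, how SaaS companies are reorganizing for the AI era, and a new baseline for inference speed. In its Interrupt 2027 keynote, LangChain formally described production agents as splitting into two classes, long-horizon knowledge agents and sub-second-latency responsive agents, and the engineering trade-offs of each path are now fairly clear. Notion CEO Ivan Zhao turned the experience of "re-founding" a SaaS company into a lesson in organization: the "jazz band" replaces the "marching band," and a barbell-shaped talent structure goes live. Zhipu released the GLM-5.1 high-speed edition in the same period; at 400 tokens/s it breaks the industry convention that "fast models must be smaller" and gives dense Coding Agent workloads flagship quality for the first time. Also covered: Alibaba Cloud's full-stack agent upgrade at its 2026 summit, Spotify extending its AI development experience to 3,000 engineers, Fei-Fei Li's team releasing ESI-Bench to challenge AI spatial intelligence, and the sharply different financial trajectories of OpenAI and Anthropic. All of these are in today's quick reads and further reading.

## Deep Dive 1: The Future of AI Agents: Looking Ahead from Interrupt 2027

Source: LangChain

### The production split between two kinds of agents

If 2024 was the agent "exploration period," 2026 has entered the "production divergence period." In its Interrupt 2027 keynote, LangChain laid out how agents in production are splitting into two distinct categories:

- Long-Horizon Knowledge Agents are designed for tasks that span minutes, hours, or even days. They need a secure sandbox to execute code, multiple layers of cooperating sub-agents, and a multi-agent framework, and they aim for long-term outcomes rather than the response to a single prompt. Typical scenarios include large refactors, deep research, and multi-step automation pipelines.
- Latency-Sensitive CE (customer experience) Agents treat sub-second latency as a hard constraint and serve real-time scenarios such as user interaction, support automation, and sales flows. This path is accelerating the industry's move to native voice models (Voice-to-Voice), away from the stitched-together "STT → Text LLM → TTS" architecture and toward lower-latency, end-to-end native voice interaction.

### LangSmith Fleet: letting domain experts build agents without writing code

LangSmith Fleet, released alongside the keynote, is a concrete reference point for "managed agents deployed at scale." It lets domain experts build agents in natural language rather than code, with 200+ built-in integrations and 7,500 long-tail tools. Internal beta data shows a 240% increase in lead qualification rate and 40 hours saved per sales engineer per month. These are numbers from production, not from a demo.

### The continuous learning loop: a three-layer optimization framework

LangChain proposed a three-layer framework for continuously optimizing agent systems, one of the most practically useful parts of the talk:

- Model Layer: domain-specific fine-tuning of the base model (for example, Qwen-series models tuned for code debugging). The key insight is that fine-tuning for domain-specific tasks can improve accuracy and response speed at the same time.
- Harness Layer: the structured application code that connects the LLM to tools and sandboxes. Research shows that agent-driven harness iteration (an approach tested on Terminal Bench 2) can keep outperforming manual engineering optimization without updating the base model, which means application-layer architecture work is sometimes more efficient than upgrading the underlying model.
- Context Layer: the external guidance files, local memory assets, and configuration summaries needed to adjust behavior at runtime. This layer is the cheapest and fastest channel for optimization, and an important asset that accumulates over time.

The key advantage of stacking the three layers is that each can be iterated independently: teams can invest where the bottleneck is, without waiting for a new base-model release.

To focus research on automated optimization systems, LangChain also announced an internal research unit, LangChain Labs, dedicated to tracking production trace history in order to optimize the execution harness. The creation of this unit itself shows that optimizing agent systems has become complex enough to need a dedicated team.

### The cost advantage of open-source models is growing

One more signal from the talk deserves attention: in token-intensive scenarios such as code debugging, the base performance of open-source foundation models is approaching that of frontier closed models, while operating token costs are significantly lower. More importantly, open architectures let teams post-train and fine-tune on private user traces, which is strategically valuable for agent scenarios that need company-specific domain knowledge.

LangChain's overall judgment: over the next two years, the core challenge of agent engineering is not "can it run" but "how to do best under two very different constraints, long horizon and low latency." Sandboxed execution environments, multi-agent coordination frameworks, and native-voice real-time interaction will be the main axes along which agent infrastructure evolves, and they will set the capability ceiling for the next generation of agent applications. For teams building agent systems, recognizing which category their product belongs to will directly shape the technology stack they choose.

Watch the full video →

## Deep Dive 2: Notion Founder Ivan Zhao: The Art of Reinventing a Company

Source: Sequoia Capital

### From "marching band" to "jazz band"

In an in-depth conversation with Sequoia, Notion CEO Ivan Zhao summed up the past three years of organizational transformation in one sentence: "We want to be a jazz band, not a marching band." A marching band works from a fixed script: everyone follows their part, and instructions flow from the top down. A jazz band is different: there is an underlying structure, but individuals are highly autonomous within a shared context, free to improvise and to cover for one another. The metaphor captures Notion's organizational creed: distributed decision-making, shared context, and self-discipline rather than control.

### "Brewing beer vs. building bridges": why AI products resist the traditional PM process

Ivan offered a core metaphor for understanding the nature of AI product development:

- Building bridges (classic software): a predictable engineering process. What can be designed can basically be built. The traditional PM gathers requirements → designers produce a solution → engineering implements it, a clear pipeline.
- Brewing beer (AI software): highly experimental and full of uncertainty. You cannot "order the yeast to ferment to the flavor you want"; you can only put in the best people, keep running evals, and see what the model ends up producing.

This understanding led Notion to overhaul its product development model: customer requirements are no longer the sole driver, and the approach is technology-first and experiment-driven. PMs now take part directly in token-consumption analysis and model evals, designers write code, and engineers make product judgments.

### Barbell-shaped talent structure: architects + junior ICs

As AI coding matured, Notion restructured the talent mix of its entire engineering team into a so-called "barbell distribution":

- One end: senior architects, who provide a sense of direction, aesthetic judgment, system design, and the domain taste that language models cannot simulate.
- The other end: junior individual contributors (ICs), high-energy and curious, each driving 4 to 6 Coding Agents in parallel and acting as agent orchestrators rather than pure code writers.

The "experienced layer" in the middle has been sharply compressed, not because those people are unimportant but because AI tools now largely cover that range of capability.

### Dissolving the CMO role and embedding brand in product

Another striking decision: Ivan dissolved the CMO position and placed responsibility for the brand narrative directly in the product team. The logic is that in the AI era, a brand is increasingly shaped at every touchpoint of the product experience rather than in separate marketing campaigns. Product is brand and brand is product; the two should no longer be driven by two separate organizations.

### Connections to today's other topics

Ivan Zhao's talk resonates with several other topics in today's brief. LangChain's account of agent architectures diverging supports Notion's internal "beer brewing" style of development: when the underlying model is itself uncertain, a strict PM process does become an obstacle rather than a support. And the barbell scenario of "junior ICs driving 4-6 Coding Agents" depends directly on inference speedups like the GLM-5.1 high-speed edition: only when the model responds fast enough does driving several agents in parallel feel like "collaborating" rather than "waiting."

From the perspective of a product-company CEO, Ivan's talk is essentially answering one question: when AI pushes the marginal cost of "execution" toward zero, where should a company's core competitiveness reside? His answer: in shared context, judgment (taste), and trust, the things models cannot replicate.

Watch the full video →

## Deep Dive 3: GLM-5.1 High-Speed Edition: 400 tokens/s, a Top-Tier Model at Top Speed

Source: Zhipu

### Breaking the "fast = small" convention

AI inference has long had a default assumption: a fast model is a lightweight model, and extreme low latency means sacrificing capability. The GLM-5.1 high-speed edition challenges that convention directly: with GLM-5.1's flagship capability fully retained, it pushes output speed to 400 tokens/s, which Zhipu presents as a new speed ceiling among APIs from large-model vendors worldwide.

What does that number mean? A volume of text that would take an author several days at a desk is delivered within 1 minute; a development task that would take an engineer 3 days at the keyboard is finished in the time it takes to drink a cup of coffee.

### Why Coding Agents especially need fast models

Coding Agents are the scenario that benefits most from this release. The reason lies in how agent tasks work: a single Coding Agent task often goes through dozens of model calls. If each round is just a few seconds slower, total time can stretch by more than ten minutes. On a large refactoring project, each step being 1 second slower adds up to several more minutes of idle waiting.

The change in feel that the GLM-5.1 high-speed edition brings is qualitative rather than quantitative: the model starts to become a partner you can collaborate with in real time, "sitting with you, watching the canvas and tuning parameters." Neither the speed of small models nor the slowness of large models could deliver that experience before.

### TileRT: a three-layer architecture for system-level optimization

400 TPS is sustained production capability, not a peak figure. Behind it is joint system-level optimization by Zhipu's GLM team and the TileRT team, working on three layers at once:

- Inference engine layer: the core inference path is rewritten around GLM-5.1's architecture to raise single-GPU throughput.
- Scheduling system layer: dynamic batching, request merging, and KV cache scheduling optimizations substantially reduce tail latency under high concurrency.
- Infrastructure layer: inference cluster deployment, network links, and load balancing are optimized together so that the speed holds up in production.

The core of TileRT's design is to statically lay out the entire computation graph at compile time (AOT) as a persistent Engine Kernel resident on the GPU, discarding the dynamic scheduling overhead of the runtime layer altogether. Intermediate results between operators are no longer written back to Global Memory but are passed directly through registers, Shared Memory, and L2 Cache, and host scheduling and cross-operator synchronization are all folded into the same resident kernel. This is the technical root of the speedup. At multi-GPU scale, TileRT extends the Warp Specialization idea from inside an SM to the whole 8-GPU NVL topology: different GPU ranks no longer run identical logic but are specialized into different workers according to compute density and data dependencies, squeezing more throughput out of the cluster as a whole.

### Use cases and current availability

The GLM-5.1 high-speed edition is currently open to some enterprise customers on Zhipu's MaaS platform, under the model ID GLM-5.1-highspeed. It is aimed at the following latency-sensitive scenarios:

- AI programming: saving several seconds per round across multi-round Coding Agent calls significantly compresses total task time
- Real-time interaction: 3D scenes modeled live from user input; product forms that latency previously ruled out become feasible
- Real-time voice: as the back-end inference engine for native voice agents, where low-latency response is a key experience factor beyond audio quality
- Business decision support: real-time analysis and proposal generation under high concurrency

Achieving 400 TPS and flagship capability together turns the "speed vs. quality" trade-off from an either-or choice into an engineering goal where both can be met. That sets an example for the direction of inference architecture across the industry.

Read the original →

## Quick Reads

The following 7 selected pieces, each with a short introduction, cover agent engineering practice, organizational change, AI infrastructure, and industry finances.

1. How Spotify extends the AI development experience to teams and agents: Claude Code, Honk, Backstage, and MCP

   Niklas Gustavsson, head of engineering infrastructure at Spotify, described the full path by which the company scaled AI-assisted development to 3,000 engineers. Key data: after adopting Claude 3.5 Opus, 99% of engineers use AI tools every week, 94% say AI directly improved their delivery performance, and PR frequency rose 76%. Spotify's approach integrates Claude Code, a standardized codebase (Fleetshift), an internal tooling platform (Backstage), verification loops, and MCP into one system: not swapping out individual tools, but rebuilding the architecture layer of the whole developer experience. For engineering teams scaling AI development tools, this is a rare case of deployment at scale. Watch the video →

2. Alibaba's Li Feifei debuts: 32 new agent-oriented products in one go

   Alibaba Cloud CTO Li Feifei made a first appearance at the 2026 summit, releasing more than 50 new products. The core is a full-stack agent upgrade across "chip - cloud - model - inference": the in-house Zhenwu M890 chip (3x the performance of the previous generation), Agentic Cloud (six capability modules: runtime, orchestration, governance, security, memory, and data plane), the flagship model Qwen3.7-Max (first among Chinese models on Arena), and a new agent-friendly product, Qianwen Cloud (千问云). This is the first time a Chinese cloud vendor has built a full-stack product launch around agents, a clear signal of a strategic shift from cloud services for people to cloud infrastructure for agents. Read the original →

3. Specialization beats scale: the strategic variable most AI procurement decisions overlook

   A specialized 3-billion-parameter model outperformed every commercial frontier API on a structured OCR benchmark at roughly one-fiftieth of the cost. The core conclusion: when a model's training distribution is close enough to the deployment task, parameter count is no longer the decisive variable. The finding bears directly on enterprise AI procurement: for tasks with clear domain boundaries, specialized small models can far outperform general-purpose flagships on cost-effectiveness. The article provides full comparison data for structured OCR, including production stability and degradation-rate metrics, and its conclusions are solid and reproducible. Read the original →

4. AI-native engineering

   Ian Thomas, who leads the Horizon Experiences team at Meta Reality Labs, shared a case study in building an "AI-native engineering" culture. The core vision is to turn engineers from "builders" into "explorers and innovators": AI absorbs a large volume of routine work (updating tests, fixing bugs, handling mundane code changes), freeing people to focus on problems that really need creativity. The talk covered a structured path from a small community to a large-scale adoption framework, along with quantifiable productivity gains. It is worth reading for technical managers thinking about how to spread AI engineering practice at the team level rather than the individual level. Read the original →

5. How core agent concepts and paradigms have evolved, and the thinking behind the changes

   A systematic review from Alibaba Cloud developers, covering four stages of agent evolution from the early ReAct architecture of 2023 to the self-evolving stage of 2026. Each stage has a distinct technical signature: passive response → structured workflows → multi-agent collaboration → self-evolution. The article analyzes how the technical concepts changed, and the engineering logic behind the changes, along six core dimensions: Prompt, Planning, Memory, Tools, Workflow, and Environment. For developers still reading today's agent systems through an "early agent framework" mindset, it helps recalibrate. Read the original →

6. Fei-Fei Li strikes again: an ImageNet for spatial intelligence

   Fei-Fei Li's team released ESI-Bench, a new benchmark dedicated to evaluating embodied spatial intelligence, with 10 task categories, 29 subcategories, and 3,081 task instances. Unlike earlier benchmarks, ESI-Bench turns the "observer" into an "actor" for the first time, requiring the AI agent to act in order to obtain the information needed to solve the task. The core conclusion is clear: perception is not the bottleneck, action is. The strongest current multimodal models (including GPT-5 and the Gemini series) score far lower on active-exploration tasks than when given the optimal viewpoint, which shows that AI can "understand what it sees" but still "does not know how to move." Read the original →

7. OpenAI "loses $1.20 for every $1 it earns," while Anthropic is already making money

   The two AI giants showed their hands in the same period: OpenAI had Q1 revenue of $5.7 billion but an operating margin of -122%, losing $1.22 for every $1 earned; Anthropic had Q1 revenue of $4.8 billion, forecasts Q2 revenue of $10.9 billion, and achieved about $559 million in operating profit, making it the first AI model company to reach the threshold of profitability. The difference comes down to customer mix: OpenAI has to subsidize a huge base of 900 million weekly active free users, while almost all of Anthropic's revenue comes from enterprises and developers. The financial fates of the two models are diverging fast, and this article is a concise guide to the current AI business landscape. Read the original →

## Further Reading

The following 9 pieces are extended reading for readers with specific interests.

- Building an Agent from 0 to 1: agent principles and a personal assistant in practice (long read), from Alibaba Tech (阿里技术). Covers the full chain of agent principles, including memory systems, RAG, Function Calling, and MCP, with a complete practical plan for a personal assistant project. About 50 minutes of reading; a good introductory reference for developers who want to build an agent system themselves. Read the original →
- How Tencent Cloud Agent Memory saves 61% of tokens and raises success rate by 52%: Mermaid infinite canvas × context offloading, from Tencent Technology Engineering (腾讯技术工程). Addresses the practical problem of context running out quickly in long agent tasks. The combination of context offloading and a Mermaid infinite canvas saved 61% of tokens in very long session experiments and raised the task pass rate from 33% to 50%. For engineers working on memory compression for long agent tasks. Read the original →
- Gemini lead: moving from executor to conductor in the agent era, from Silicon Valley Girl. Josh Woodward, who leads Google Gemini, discusses the shift in human-AI collaboration in the agent era. The goal of Gemini Spark is to turn knowledge workers from task executors into "conductors of an AI network," running hundreds of background tasks in parallel through native ecosystem integration. For readers who want to understand Google's overall strategy in the agent era. Watch the video →
- Your Coding Agent should do AI systems engineering, from AI Engineer. Ben Burtenshaw of Hugging Face proposes the next step for Coding Agents: entering AI systems engineering, including CUDA kernel optimization, automated fine-tuning, and building multi-agent research labs on open primitives. For engineers who already use Coding Agents and want to explore the limits of what they can do. Watch the video →
- Behind Cerebras's $63 billion IPO: wafer-scale chips, the OpenAI deal, and the AI infrastructure race, from No Priors. Cerebras founder and CEO Andrew Feldman describes how the company turned a contrarian bet on wafer-scale chips into a publicly listed AI infrastructure company. Inference speed has gone from a technical luxury to a commercial necessity, a judgment that echoes today's GLM-5.1 high-speed release. Watch the video →
- Latest conversation with the head of Claude Code: the agent-era boom and Anthropic redrawing the boundaries of productivity, from Web3 天空之城. An in-depth compilation of an interview with Claude Code lead Boris Cherny. Demand for Anthropic's products grew 80x year over year, with Claude Code as the core engine. The article covers the paradigm shift, productivity evidence (output per engineer up about 250% after adopting Claude Code), lessons for organizational change, and how software-industry moats are evolving. It pairs with today's Deep Dive 2 on Notion's reorganization. Read the original →
- How to build a self-improving company with AI, from Y Combinator. YC's view of AI-native organization design: do not stop at copilot-style productivity gains; rebuild the company as a recursive self-improvement loop made of sensors, policies, tools, quality gates, and learning systems. It echoes the organizational-change theme in several of today's pieces and suits founders and managers thinking about AI-native company architecture. Watch the video →
- Browser automation: from GUI to OpenCLI, from Taobao Tech (大淘宝技术). Targets the practical pain of agents driving browsers and proposes OpenCLI: parse and replay the browser's underlying API requests directly, bypassing unstable front-end UI automation. The reasoning is clear and the tool is ready to use (npm install). For engineers building browser automation for agents. Read the original →
- Musk's "one-man dynasty" rings the bell on June 12, from Tencent Tech (腾讯科技). SpaceX has formally filed its S-1 and plans to list on Nasdaq on June 12 at a target valuation of $1.75 trillion to $2 trillion, with Musk retaining 85% of voting power. The financial structure is a study in contrasts: Starlink brings in $11.4 billion a year in operating profit, while xAI lost $6.4 billion in a single quarter, so the money earned in the sky is all burned by the large models on the ground. The article offers first-hand data on how the AI narrative supports such a high valuation. Read the original →

## Today's Reading Paths

Short on time? Here are the shortest worthwhile reading paths through today's content:

- If you only have 15 minutes: read Deep Dive 3 first. The GLM-5.1 high-speed release is a concrete, tangible technical milestone; 400 tokens/s at flagship quality is a new baseline for inference in 2026 and is directly relevant to anyone using AI models in production. The piece is moderate in length with solid technical detail; it takes 10 minutes to read and its conclusions are immediately usable.
- If you have 30 minutes: add Deep Dive 1. LangChain's account of agent architectures diverging is one of the clearest production-side views available, and the long-horizon vs. latency-sensitive framing helps sort out the technical trade-offs in your current project.
- If you have 1 hour and want a fuller picture: read all three deep dives, then add "Alibaba's full-stack agent launch" and "OpenAI vs. Anthropic finances" from the quick reads. These two represent key cross-sections of the AI infrastructure landscape and of AI business models, and together with the deep dives they form a complete context.
- Engineer path: Deep Dive 3 (inference speed and the TileRT architecture) → the Spotify case (scaling tools) → agent paradigm evolution (the technical panorama) → Coding Agents and AI systems engineering (extending capability boundaries)
- Manager and founder path: Deep Dive 2 (Notion's reorganization and the jazz-band model) → the Claude Code lead interview (productivity evidence) → how to build a self-improving company with AI (a system design framework)

That is all for today's BestBlogs Morning Brief. Thanks for reading, and see you tomorrow.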

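Summary: In its keynote, LangChain said production agent architectures have clearly split into two classes, long-horizon knowledge agents and low-latency responsive agents, each with its own engineering path. Notion's CEO described the company's shift to a more flexible "jazz band" model and its "barbell-shaped" talent structure for AI development. Zhipu released the GLM-5.1 high-speed edition in the same period, reaching 400 tokens/s of output while keeping flagship capability. Together the three point to how AI applications are evolving in underlying architecture, organizational form, and baseline performance.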
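
The brief's point about Coding Agent latency, that a task runs through dozens of model calls and each call's wait accumulates, is simple arithmetic. The sketch below is a rough model, not a measurement: the 40 rounds, the 1,000 output tokens per round, and the 80 tokens/s comparison speed are all hypothetical; only the 400 tokens/s figure comes from the brief.

```python
def task_seconds(rounds: int,
                 tokens_per_round: int,
                 tokens_per_s: float,
                 overhead_s: float = 0.0) -> float:
    """Total generation time for a multi-round agent task.

    Each round generates `tokens_per_round` tokens at `tokens_per_s`,
    plus a fixed per-round overhead (tool calls, network) in seconds.
    """
    return rounds * (tokens_per_round / tokens_per_s + overhead_s)


# Hypothetical workload: 40 rounds of 1,000 output tokens each.
fast = task_seconds(40, 1000, 400)   # 400 tokens/s, the figure from the brief
slow = task_seconds(40, 1000, 80)    # 80 tokens/s, an assumed slower baseline

print(f"at 400 tok/s: {fast:.0f} s; at 80 tok/s: {slow:.0f} s")
```

Under these assumptions the same task drops from about eight minutes of waiting to under two, which is the kind of gap that changes parallel agent work from waiting to collaborating.
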

Rohan Paul@rohanpaul_ai · May 23 · 79

Google DeepMind's new paper shows that AI can now search for formal mathematics proofs, but only inside carefully constrained worlds. The striking result is not that the system "thinks like a mathematician," but that it keeps forcing its thoughts through Lean, where every step must compile.

The problem is that LLMs can sound convincing in math while still making tiny mistakes, so the authors use Lean, a proof system that checks every logical step. Their system, AlphaProof Nexus, lets an LLM keep editing a formal proof, read compiler errors, try again, and sometimes ask a stronger proof tool for help on smaller subproblems. The stronger version also keeps a shared pool of partial proof attempts, rates which ones look promising, and uses those attempts to guide later searches.

That changes the role of the model from a persuasive storyteller into a generator of candidates that can be killed quickly when they are wrong. The verifier is not a cosmetic add-on, it is the mechanism that makes exploration tolerable. Without it, a beautiful proof sketch can hide a false lemma; with it, the model has to turn insight into executable logic, or fail visibly.

The authors tested the system on real unsolved math problems, including 353 formalized Erdős problems and 492 open conjectures from the Online Encyclopedia of Integer Sequences. The main result is that the best agent solved 9 Erdős problems and proved 44 sequence conjectures, while also helping with problems in optimization, graph theory, algebraic geometry, and quantum optics. The failures are as revealing as the wins, because the agents sometimes buried the hard part inside a helper lemma or hallucinated a known result, exactly the kind of error formal checking is built to expose.

The real shift is not full mathematical autonomy, but a new division of labor: humans choose the formal question, libraries define the terrain, models propose routes, and the proof assistant refuses to be impressed.

---- "Advancing Mathematics Research with AI-Driven Formal Proof Search" Paper Link – arxiv.org/abs/2605.22763

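Summary: Google DeepMind presents AlphaProof Nexus, a system that pairs a large language model with the Lean formal verification tool. While generating a proof, the LLM repeatedly reads Lean's compiler errors and revises, and it can call a stronger tool for help with subproblems. This forces the model to turn every logical step into code that compiles and can be verified, changing its role from "convincing narrator" to "candidate generator." In tests on 353 Erdős problems and 492 open conjectures, the system solved 9 Erdős problems and proved 44 sequence conjectures. The work shows how central formal verification is to exposing AI logic errors and to a new division of labor in which humans pose the question, models explore, and the verifier gatekeeps.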
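
The generate, check, read the error, retry loop described above can be illustrated with a toy. This is emphatically not AlphaProof Nexus and does not call Lean: `verify` is a hypothetical stand-in for a proof checker (here it just checks whether a candidate integer squares to a target), and `propose` is a stand-in for the LLM. The point is only the control flow, in which wrong candidates are rejected cheaply and the error message steers the next attempt.

```python
from typing import Optional


def verify(candidate: int, target: int) -> Optional[str]:
    """Toy stand-in for a proof checker.

    Returns None when the candidate is accepted, otherwise an error
    message the generator can read.
    """
    square = candidate * candidate
    if square == target:
        return None
    return "too small" if square < target else "too large"


def propose(lo: int, hi: int) -> int:
    """Toy stand-in for the model: guess the midpoint of what is still plausible."""
    return (lo + hi) // 2


def search(target: int, max_attempts: int = 64):
    """Propose candidates, let the verifier reject them, and use its feedback."""
    lo, hi = 0, target
    rejected = []  # record of failed attempts, loosely like a pool of partial attempts
    for _ in range(max_attempts):
        if lo > hi:
            break  # nothing plausible remains; fail visibly
        candidate = propose(lo, hi)
        error = verify(candidate, target)
        if error is None:
            return candidate, rejected
        rejected.append((candidate, error))
        if error == "too small":
            lo = candidate + 1
        else:
            hi = candidate - 1
    return None, rejected


answer, rejected = search(1369)
print(answer, "accepted after", len(rejected), "rejected candidates")
```

The design choice worth noticing is that the generator never decides on its own that it is right; only the verifier's acceptance ends the loop, and a problem with no valid answer ends in an explicit failure rather than a plausible-sounding result.
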

Rohan Paul@rohanpaul_ai · May 23 · 61

Cerebras reported 981 tokens/sec on the 1T-parameter Kimi K2.6 model. 6.7× faster than the next GPU cloud, validated by Artificial Analysis.

The hard part is moving model weights and activations fast enough, because normal GPU clusters split the model across many chips and spend a lot of time passing data between them. Cerebras uses wafer-scale chips, meaning one processor is built across a full silicon wafer, so more of the routing happens on-chip with much higher bandwidth and lower delay.

The real business claim is not just speed, but speed on a model big enough for enterprise coding agents, where every extra second slows testing, debugging, and iteration.

--- cerebras.ai/blog/cerebras-kimi-k2-Enterprise

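Summary: Cerebras reached 981 tokens per second of inference on its wafer-scale chips, running the 1-trillion-parameter Kimi K2.6 model. The speed has been validated by Artificial Analysis and is 6.7 times that of the fastest GPU cloud offering. The advantage comes from integrating the design on a single wafer, which sharply reduces chip-to-chip communication delay and avoids the bottleneck that conventional GPU clusters hit when moving data across chips. The speedup matters for large AI applications such as enterprise coding agents, where it can significantly shorten testing, debugging, and iteration cycles.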

AI Notkilleveryoneism Memes ⏸️@AISafetyMemes · May 23 · 38

"But Cars Don't Actually Run," Says Increasingly Nervous Horse For the 7th Time This Year

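Summary: The post uses the joke of a horse repeatedly denying what cars can do to satirize the recurring fear people show toward new technologies such as AI. The quoted material lists several arguments that reduce or dismiss human thinking ("it's just imitation / math / instinct"). The core point is that current worries about AI follow a historical pattern, and that the reductive arguments used to belittle AI could be applied just as absurdly to deny the complexity of human thought.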

🚨 AI News | TestingCatalog@testingcatalog · May 23 · 81

DeepSeek permanently reduced pricing for DeepSeek V4 Pro by 75%!

> $0.003625 per million input tokens (with cache)
> $0.435 per million input tokens.
> $0.87 per million output tokens.

Cache is almost free 👀

Summary: DeepSeek has permanently cut DeepSeek V4 Pro pricing by 75%: $0.003625 per million cached input tokens, $0.435 per million input tokens, and $0.87 per million output tokens. Cached input is almost free.
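
To see what "cache is almost free" means per request, the quoted prices can be dropped into a small cost function. The prices are the three figures from the post; the request shape (96,000 input tokens, 2,000 output tokens) and the 90% cache hit rate are hypothetical values chosen for illustration, and real billing may differ in rounding and in how cache hits are counted.

```python
# Prices in USD per million tokens, as quoted in the post above.
PRICE_PER_M = {
    "input_cached": 0.003625,
    "input": 0.435,
    "output": 0.87,
}


def request_cost(input_tokens: int,
                 output_tokens: int,
                 cache_hit_rate: float = 0.0) -> float:
    """Estimated cost in USD of one request.

    `cache_hit_rate` is the fraction of input tokens billed at the
    cached price; the remainder is billed at the normal input price.
    """
    cached = input_tokens * cache_hit_rate
    fresh = input_tokens - cached
    total = (cached * PRICE_PER_M["input_cached"]
             + fresh * PRICE_PER_M["input"]
             + output_tokens * PRICE_PER_M["output"])
    return total / 1_000_000


# Hypothetical agent-style request: long input, short output.
cold = request_cost(96_000, 2_000)                      # no cache hits
warm = request_cost(96_000, 2_000, cache_hit_rate=0.9)  # 90% of input cached

print(f"cold: ${cold:.4f}  warm: ${warm:.4f}")
```

With these assumed numbers, a warm cache cuts the request cost by roughly a factor of seven, because the long input dominates the bill and cached input is priced at under 1% of fresh input.
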

SemiAnalysis@SemiAnalysis_ · May 23 · 82

Agentic workloads are quietly rewriting inference economics. We pulled data from 432k real coding agent requests at SemiAnalysis and the median one isn't 32k, isn't 64k, but 96k input tokens. For context, that's more than the entire text of The Great Gatsby being shoved into the model before you've even typed your question. (1/3)🧵

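Summary: Agentic workloads are quietly reshaping inference economics. Across 432,000 real coding-agent requests at SemiAnalysis, the median input is not 32k or 64k but 96k tokens, more than the full text of The Great Gatsby processed before the user has typed a question. (1/3) 🧵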

Rohan Paul@rohanpaul_ai · May 23 · 67

Demis Hassabis on the limit in today’s AI: language can describe the world, but it cannot contain it - and why "World Models" are his "longest standing passion". Language models absorbed far more structure about reality from text than many researchers expected, because human language quietly carries physics, psychology, culture, tools, plans, and cause-and-effect. But text is still a compressed residue of experience, not experience itself. A sentence can say a cup falls from a table, yet it does not fully encode weight, grip, balance, friction, timing, sound, surprise, or the tiny motor corrections a body makes before it even notices them. The world is not only made of facts that can be named; it is made of constraints that have to be lived through, touched, predicted, violated, and repaired. That is why world models matter. They aim to learn the hidden grammar of physical reality: how objects persist, how forces unfold, how space changes when an agent moves, and how action creates feedback. Language models can often reason about the world because people have written so much about it. World models try to learn what the world is like before it becomes words. The difference is exactly what matters because intelligence is not just answering well; it is knowing what would happen next if you moved, reached, pushed, smelled, slipped, or failed. A mind trained only on descriptions may become brilliant at explanation. A mind trained on experience may become better at consequence. --- Full video from "Google DeepMind" and "Hannah Fry" YT channel (link in comment)

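Summary: Demis Hassabis argues that the limit of today's AI is that language can describe the world but cannot "contain" it. Language models learned more of the structure of reality from text than expected, but text is still a compressed residue of experience. Real intelligence lies not only in answering questions but in understanding the consequences of actions. World models aim to learn the hidden grammar of physical reality, such as object persistence, how forces act, and how space changes. They try to capture what the world is like before it is put into words, so that AI can not only explain but also predict the direct effects of action.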

Yuchen Jin@Yuchenj_UW · May 23 · 68

Wow. A massive 75% discount from DeepSeek. Either they’ve done some serious inference optimizations, or Huawei chips are just that much cheaper? More open-source AI models, better token economy.

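Summary: A 75% discount from DeepSeek. Either the team has made major inference optimizations, or Huawei chips really are that much cheaper. More open-source AI models, better token economics.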

向阳乔木@vista8 · May 22 · 19

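Today my child's Chinese teacher set an unusual assignment: ask an AI questions and write an essay. I think it is a creative idea that can build questioning skills and AI-assisted writing skills. I have set up Claude Sonnet 4.6, ChatGPT 5.5, and Gemini 3.5 Flash, and we will see which one my child prefers.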

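Summary: A parent shares that their child's Chinese teacher set an innovative assignment: students complete an essay by asking an AI questions. The parent believes the format is a good way to practice questioning technique and AI-assisted writing, and has prepared Claude Sonnet 4.6, ChatGPT 5.5, and Gemini 3.5 Flash so the child can pick a favorite.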

Alibaba Cloud@alibaba_cloud · May 22 · 79

Qwen3.7-Max is now live on Model Studio with 50% OFF (May 22–June 22)! 1M Context Window. Built to process and retain large-scale enterprise data streams flawlessly during long-context agent reasoning. 🚀 Try it: https://int.alibabacloud.com/m/1000413314/

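Summary: Qwen3.7-Max is now available on Model Studio at 50% off (May 22 to June 22), with a 1M-token context window built to process and retain large-scale enterprise data streams during long-context agent reasoning. 🚀 Try it: https://int.alibabacloud.com/m/1000413314/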

Qwen@Alibaba_Qwen · May 22 · 66

👀👀

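Summary: A recent test of frontier AI models on a real agentic task shows Qwen 3.7-Max leading on both results and cost. The task asked each model to write and iteratively improve a self-training Tetris bot on its own. Over 10 rounds of self-improvement, Qwen 3.7-Max spent only $1.32 and improved the bot's performance by 56%. By comparison, Claude Opus 4.7 spent $12.15 for a 28% improvement, and GPT-5.5 spent $2.85 for a 7% improvement. The results suggest that in complex agent loops requiring long autonomous reasoning, code reading, and iteration, Qwen Max is highly cost-effective and strong at self-improvement.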
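
One way to compare the three results quoted above is dollars spent per percentage point of improvement. The sketch below uses only the cost and improvement figures from the summary; the metric itself is an assumption made here for illustration, not something the original test reports, and it ignores differences in absolute performance.

```python
# (total cost in USD, improvement in percent) over 10 self-improvement rounds,
# as quoted in the summary above.
RESULTS = {
    "Qwen 3.7-Max": (1.32, 56),
    "Claude Opus 4.7": (12.15, 28),
    "GPT-5.5": (2.85, 7),
}


def dollars_per_point(cost_usd: float, improvement_pct: float) -> float:
    """Cost of one percentage point of improvement (lower is better)."""
    return cost_usd / improvement_pct


ranking = sorted(RESULTS, key=lambda name: dollars_per_point(*RESULTS[name]))

for name in ranking:
    cost, gain = RESULTS[name]
    print(f"{name}: ${dollars_per_point(cost, gain):.3f} per point")
```

On this metric the gap between first and second place is more than an order of magnitude, while the other two models land close to each other despite very different total spend.
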

Alibaba Cloud@alibaba_cloud · May 22 · 54

Foundation Model Forum | Qwen Conference 2026 Decoding the next leap in core intelligence. Join the session at Sands Expo Singapore to explore pre-training breakthroughs, reasoning logic, and future model roadmaps. 🚀 Secure your seat now: https://click.qwencloud.com/m/20000000190/

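Summary: Foundation Model Forum at Qwen Conference 2026, on the next leap in core intelligence. The session at Sands Expo Singapore covers pre-training breakthroughs, reasoning logic, and future model roadmaps. 🚀 Reserve a seat: https://click.qwencloud.com/m/20000000190/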