Uber’s COO: AI tokenmaxxing still has not shown it can reliably create successful features. "When you hear companies talking about, hey, 25% of code commits over the last quarter were AI-driven, or our token usage went from x to y... and it's amazing... but then you sometimes go and you talk to your senior engineering leaders and you're saying, okay, how many projects that were on the cutting room floor got moved above the line because of the productivity gains... That link is not there yet." ~ Andrew Macdonald, Uber’s COO ---- From the "Rapid Response and Masters of Scale" YouTube channel (link in comment)

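Summary: Uber COO Andrew Macdonald questions the "tokenmaxxing" trend in the current AI adoption boom. He notes that when companies tout that 25% of last quarter's code commits were AI-driven, or that token usage has grown sharply, those headline numbers have not translated into product success. When he asks senior engineering leaders whether previously shelved projects have moved forward as a result, the answer is no. This contrasts with the optimistic picture painted earlier by Uber CEO Dara Khosrowshahi, who said that 90% of engineers use AI, that the top 30% of users are seeing unprecedented productivity gains, and who predicted that the ROI of AI agents and GPU compute will eventually exceed that of human engineers.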

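Berryxia.AI@berryxia · May 27 · 60

This time AI has crossed a "singularity"! Two recent events deserve close attention: • April 7, 2026: Anthropic announced Project Glasswing and introduced Claude Mythos Preview, a frontier model that has not been publicly released. Its offensive and defensive cyber capabilities are strong enough that Anthropic chose not to release it publicly and instead made it available only to partners for defensive use. • May 20, 2026: OpenAI announced that one of its internal general-purpose reasoning models disproved a conjecture on the planar unit distance problem posed by mathematician Paul Erdős in 1946. The two events look unrelated, but they point to the same phenomenon: frontier models' ability to reason reliably at higher levels of abstraction has crossed a critical threshold. By "threshold" I mean that the unit of reasoning a model can handle stably keeps moving up. Roughly, the levels of abstraction in language run: character → word → phrase → sentence → paragraph → whole article → complete body of knowledge. Earlier models could barely organize a sentence; today's top models reliably handle paragraphs and whole arguments. Writing an article is not just predicting the next sentence: it means holding a central thesis, choosing fitting examples, building logical connections, and making every part serve the overall structure. Anthropic's Mythos and OpenAI's internal model exemplify this leap. They no longer operate only on a single vulnerability or a single lemma; they can chain scattered pieces into a complete attack chain or a complete proof. Claude Mythos Preview is currently Anthropic's strongest, and possibly largest, model. Its coding ability is outstanding, beating OpenAI's latest GPT-5.5 on most benchmarks. Most notable, though, is its cybersecurity capability: it performed so well on offensive security evaluations that Anthropic decided not to release the model publicly and to offer it only to critical infrastructure companies for defense.

Summary: Two recent events suggest that frontier models' ability to reason reliably at high levels of abstraction has crossed a threshold. First, Anthropic announced Claude Mythos Preview, whose cyber capabilities are strong enough that it was not released publicly and is available only to partners for defense. Second, an internal OpenAI general-purpose reasoning model disproved a conjecture posed by mathematician Paul Erdős. Together they indicate that the unit of reasoning models handle stably has moved up from the sentence to the "paragraph" and "whole argument" level, where a model must sustain a central thesis and build logical structure.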

Fuli Luo@_LuoFuli · May 27 · 59

Behind the MiMo API Price Reduction: The deepest price cut, up to 99%, is for Input (Cache Hit). The core reason is our inference framework now supports hierarchical KV cache optimization for SWA. Production inference engine tests show this optimization increases cached token capacity by 5x, equivalent to an 80% reduction in caching costs. Combined with Cache Read Overlap among multiple Full Attention modules in the Hybrid model, actual costs are further reduced. Prices for Input (Cache Miss) and Output are also reduced by 60%-80%. This mainly benefits from the extreme 1:7 Full:SWA sparsity ratio brought by the model architecture (the prefill compute of the 70-layer MiMo-V2.5-Pro roughly equals a 10-layer GQA model). This kept our original inference costs well below the industry average, naturally leaving a 2x-3x profit margin in pricing. This price adjustment simply reflects our decision to pass these structural cost efficiencies directly to developers. Operating at these newly reduced API prices, our production inference engine is running at near full capacity, and we can still essentially break even. We previously advised LLM companies not to "blindly cut prices" precisely because very few model architectures and inference optimizations can keep API costs from running at a loss. If more architectures that save compute and KV cache emerge, along with better inference Infra to drive down API costs, this will form an excellent virtuous cycle in the industry. More crucially, affordable, high-performance model APIs will drive real, sustained, and at-scale inference demand. This upstream demand pulls forward the development of the entire AI infrastructure chain—including chips, servers, optical transceivers, PCBs, liquid cooling, power, energy storage, and data centers—serving as a strategic fulcrum for a systemic revaluation of AI hardware. 
In the long run, this injects more affordable and accessible compute into both training and inference pipelines, accelerating the parallel evolution of global AGI across multiple regions and technical routes. For more technical details, we will release a detailed Blog post later.

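Summary: The price change stems from structural cost advantages in the model architecture and inference framework. On the framework side, hierarchical KV cache optimization for SWA raises cache capacity by 5x, equivalent to an 80% reduction in caching cost, and Cache Read Overlap among the hybrid model's Full Attention modules lowers actual cost further. On the architecture side, MiMo-V2.5-Pro uses an extreme 1:7 Full:SWA sparsity ratio, so its prefill compute is very low and its baseline inference cost is well below the industry average. As a result, Input (Cache Hit) prices drop by up to 99%, and Input (Cache Miss) and Output prices drop by 60%-80%. The adjustment passes efficiency gains on to developers rather than operating at a loss.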
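The two headline figures in the MiMo post follow from simple arithmetic. A minimal sketch, assuming cost per cached token scales inversely with how many tokens the same cache hardware can hold, and that the 1:7 Full:SWA ratio applies uniformly across all 70 layers. The function names are illustrative and are not from Xiaomi's stack.

```python
def cache_cost_reduction(capacity_multiplier: float) -> float:
    """Fractional drop in cost per cached token when the same hardware
    holds `capacity_multiplier` times as many tokens (assumed model)."""
    return 1.0 - 1.0 / capacity_multiplier


def full_attention_layers(total_layers: int, full: int, swa: int) -> float:
    """Number of full-attention layers implied by a full:swa layer ratio."""
    return total_layers * full / (full + swa)


# 5x cached-token capacity, as stated in the post.
reduction = cache_cost_reduction(5)

# 70 layers at a 1:7 Full:SWA ratio, as stated in the post.
full_layers = full_attention_layers(70, full=1, swa=7)

print(f"caching cost reduction: {reduction:.0%}")
print(f"full-attention layers: {full_layers}")
```

The 8.75 full-attention layers sit just under the post's "roughly 10-layer" figure; the windowed compute of the SWA layers, which this sketch ignores, plausibly accounts for the remainder.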

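Orange AI@oran_ge · May 27 · 54

Today I read Ant Group CEO Han Xinyi's thoughts on economics and business in the Agent era, and several points resonated. For the past decade, the core logic of the internet was network effects and traffic: whoever held user attention had the moat. In the agent era that logic is breaking down. Human traffic gives way to agent ecosystems, and new network effects form around Agents. Whoever has the more thriving Agent ecosystem has the deeper moat, which is a different contest from the old race for headcount. A new question then surfaces: when both sides of a transaction are Agents rather than people, nobody can rely on intuition to judge whether the counterparty is trustworthy. If we look at how humans build trust, it comes neither from talk nor from titles; trust is earned by delivering results, one after another. The Agent world follows the same logic: the Agent with the higher probability of getting the job done is the one that gets trusted and chosen. Those results need to be recorded as the Agent's credit, and that is how trust is built. Agents will reshape business profoundly. At the company level, every company's reach and breadth expand greatly. That is also why YC's CEO says that today you should boil the ocean: companies should think more about raising efficiency and profit than about cutting costs and headcount. In the Agent economy, the key word is Token. In the future everything can be tokenized and Tokens become the new carrier of value; fiat currency, points, entitlements, and marketing will all circulate as Tokens, so future economic infrastructure should be designed around Tokens. AI payments are one of the most important pieces of future infrastructure. Opening wallets for Agents, defining protocols, and building clearing and settlement networks are all still at square one, and someone has to build the ecosystem and the infrastructure well; it is hard to expect startups to do that work. Alipay appears strongly committed to AI payments: the AI payments team has high strategic standing internally and has been adding staff while its org structure is kept confidential. This looks like ground they intend to fight for.

Summary: Ant Group CEO Han Xinyi shared his thinking on business in the AI agent era. He argues the core logic is shifting from the traffic economy to network effects built on how thriving an agent ecosystem is. Trust between agents has to be built through delivered task results, one at a time. All value will be "tokenized", with Tokens as the new carrier of value. AI payments are seen as one of the most critical pieces of future infrastructure, covering wallets, protocols, and clearing and settlement networks for agents. Ant Group has given its AI payments team high strategic standing and is investing heavily in this infrastructure.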

Rohan Paul@rohanpaul_ai · May 27 · 60

A compilation of opinions from AI leaders on AI-related job loss over the past few years.

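Summary: Goldman Sachs CEO David M. Solomon argues that AI will not eliminate 25% of jobs; more likely, people will find more productive ways to use their time. He cites his own experience: a junior analyst once spent 6 hours looking up prices on microfiche of The Wall Street Journal to make one stock chart, a task that now takes seconds. He notes that even with tools as convenient as Excel and Zoom, the firm employs more people than ever, because more powerful tools naturally expand the complexity of the work.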

Chubby♨️@kimmonismus · May 27 · 65

DeepSeek just made its 75% price cut on V4-Pro permanent. Xiaomi's MiMo slashed V2.5 pricing by up to 99%, effective today. Most coverage frames this as a price war. The more interesting part is the engineering that makes these numbers sustainable. DeepSeek's V4 paper describes a *hybrid attention architecture* that attacks the core bottleneck of long-context inference: the KV cache. Traditional transformers store key-value pairs for every token in the context. At 1 million tokens, this cache alone can fill an entire GPU's memory. V4 introduces two interleaved attention types. Compressed Sparse Attention (CSA) compresses every 4 tokens into a single KV entry, then selects only the top-k most relevant compressed blocks per query. Heavily Compressed Attention (HCA) goes further, compressing 128 tokens into one entry and running dense attention over the result. The compressed sequence is short enough that dense attention stays cheap. V4-Pro's KV cache at 1M tokens is 10% (!!) of V3.2's. Single-token inference FLOPs drop to 27% (!!). The model has 1.6 trillion total parameters but only activates 49 billion per token through Mixture-of-Experts routing, the knowledge capacity of a massive model at the compute cost of one thirty times smaller. MiMo's approach is different but lands in the same place. Xiaomi's team implemented Sliding Window Attention via SGLang HiCache, reducing KV cache data transfer across GPU memory, CPU memory, and SSD to roughly 1/7 (!!) of previous volume. Cacheable tokens expanded by 5x (!!). Combined with expert parallelism optimization and input length bucketing, per-token serving cost dropped enough to make permanent pricing at these levels viable. V4-Pro now sits at $0.87 per million output tokens. MiMo V2.5-Pro at roughly $3/M output, with Flash variants far below that. A year ago, sub-dollar output pricing meant you were using a small distilled model with real capability tradeoffs. 
These are frontier-class reasoners with million-token context windows. Both companies can commit to permanent cuts because the reductions come from the architecture itself. When your attention mechanism physically processes fewer FLOPs per token and your cache occupies a fraction of the memory, the cost to serve is structurally lower. The price follows the cost curve.

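Summary: DeepSeek made its 75% price cut on V4-Pro permanent, and Xiaomi cut MiMo V2.5 prices by up to 99%. The cuts rest on structural cost reductions from architectural changes. DeepSeek V4's hybrid attention architecture sharply compresses the KV cache for long-context inference, to just 10% of V3.2's at 1 million tokens, with single-token inference FLOPs down to 27%. Xiaomi's MiMo team implemented Sliding Window Attention via SGLang HiCache, cutting KV cache data transfer across memory tiers to roughly 1/7. These optimizations put V4-Pro at $0.87 per million output tokens and MiMo V2.5-Pro at roughly $3 per million; both are frontier-class models with million-token context windows. The price cuts reflect a real drop in inference and caching costs.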
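To give a feel for the compression ratios quoted in the post, here is a rough sketch that counts compressed KV entries at a 1M-token context and the total-to-active parameter ratio. It only restates the post's numbers (4 tokens per entry for CSA, 128 per entry for HCA, 1.6T total and 49B active parameters). It ignores CSA's top-k block selection, per-layer interleaving, and entry sizes, so it is not a model of the actual cache footprint, and the helper name is hypothetical.

```python
def compressed_entries(context_tokens: int, tokens_per_entry: int) -> int:
    """KV entries left after merging every `tokens_per_entry` tokens into one
    (ceiling division, so a partial final group still needs an entry)."""
    return -(-context_tokens // tokens_per_entry)


CONTEXT_TOKENS = 1_000_000

# Compression ratios as quoted in the post.
csa_entries = compressed_entries(CONTEXT_TOKENS, 4)
hca_entries = compressed_entries(CONTEXT_TOKENS, 128)

# Total vs. active parameters per token under Mixture-of-Experts routing.
total_params = 1.6e12
active_params = 49e9
activation_ratio = total_params / active_params

print(f"CSA entries at 1M tokens: {csa_entries:,}")
print(f"HCA entries at 1M tokens: {hca_entries:,}")
print(f"total / active parameters: {activation_ratio:.1f}x")
```

The parameter ratio comes out a little above 32, which matches the post's "thirty times smaller" description.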

Greg Brockman@gdb · May 27 · 35

true but changing fast

Chubby♨️@kimmonismus · May 27 · 60

Demis Hassabis now says AGI could arrive by 2029, a year earlier than his previous estimate, and told Axios we're standing in the "foothills of the singularity." Bold claim. But the field still can't agree on what AGI actually means. Hassabis defines it one way, Altman another, Anthropic avoids the term altogether. We're moving up the timeline for something we haven't even defined. Hassabis own AGI benchmark is the Einstein Test: train an AI with a knowledge cutoff at 1911 and see if it independently derives general relativity (Hassabis at India AI Impact Summit). No current system comes close to passing that. Meanwhile Andreessen says AGI arrived three months ago, Altman says 2028, Musk declared we're already in the singularity in January, and Anthropic won't even use the term. The timeline keeps getting shorter tho.

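Summary: Google DeepMind head Demis Hassabis has moved his AGI forecast up to 2029 and says we are in the foothills of the "singularity". His proposed "Einstein Test" benchmark asks whether an AI with a knowledge cutoff of 1911 can independently derive general relativity; no current system comes close. The field still has no agreed definition of AGI, however: OpenAI CEO Altman predicts 2028, xAI CEO Musk declared in January that the singularity has already arrived, and Anthropic avoids the term. Despite the unclear definition, predicted AGI timelines keep shrinking.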

Rohan Paul@rohanpaul_ai · May 27 · 74

Goldman Sachs CEO, David M. Solomon on nytimes "A.I. won’t eliminate 25% of jobs. What’s more likely is that people will find more productive ways to spend their time. When I was a first-year banking analyst, something as simple as making a graph of a stock’s performance took six hours of looking up prices in back issues of The Wall Street Journal on microfiche. Today, a first-year analyst can do it in seconds, and we have employed more people than ever in recent years. With more sophisticated tools, the complexity of our work naturally expands. Do any of us feel like we have less to do these days despite the convenience of Excel, email or Zoom?" --- nytimes .com/2026/05/22/opinion/ai-job-crisis-goldman-sachs.html?smid=nytcore-ios-share

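Summary: Goldman Sachs CEO David Solomon rejects the claim that AI will eliminate 25% of jobs, arguing that people will use their time more productively. Drawing on his time as an analyst, he recalls that making a chart by hand once took hours and now takes seconds with modern tools, yet the bank employs more people than before. Tools naturally expand the complexity of the business. He asks who feels they have less to do today despite Excel, email, and Zoom. The view echoes OpenAI CEO Sam Altman, who concedes his expectations for AI's impact on white-collar work were too pessimistic, because companies still need human judgment, trust, taste, and complex communication.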

Rohan Paul@rohanpaul_ai · May 27 · 63

Palantir CEO Alex Karp goes after AI slop. The fight over AI “slop” is really a fight over whether software is performing or merely pretending. "The appearance of software working is not software working. And the slop that is getting a lot of attention is not only dangerous in terms of the hyperbolic rhetoric, but also in claims like, “There will be no jobs because of the slop,” or that “nothing will work,” while somehow we will have a God-like figure in the name of AI. When, in fact, what actually does work is a platform built by a motley crew of highly technical people who, over 20 years, have been maligned for being right about the nature of having to build Foundry and the nature of having to build Apollo." ---- Software used to fail in blunt ways: a crash, a wrong number, a missing button, a process that simply stopped. Generative systems often fail more seductively, by producing fluent surfaces that look like work until they meet the stubborn world of permissions, edge cases, audit trails, security, accountability, and changing human intent. --- From "Palantir" YT channel, full link in comment.

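Summary: Palantir CEO Alex Karp criticizes the AI-generated "slop" now in vogue. He says the problem is not only the hyperbolic rhetoric, such as claims of mass job loss, but that at its core it is "software pretending to work": fluent on the surface yet unable to handle real-world demands such as permissions, edge cases, and audit trails. Karp holds up Palantir's Foundry and Apollo platforms as the contrast, stressing that real software platforms are built by technical teams over many years and solve actual problems.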

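向阳乔木@vista8 · May 27 · 55

I rarely use the Terminal anymore; I do almost all my development in the Codex App. I even use the API access a friend gave me less, since otherwise I'd have to fiddle with installing plugins to enable features that only come with an OpenAI subscription account.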

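宝玉@dotey · May 27 · 55

Before Gemini 2.5 Pro, no Google model had beaten GPT-4, okay? And now Gemini is starting to fall behind again...

Summary: A former core Gemini scientist says Google was once technically ahead: its MoE model GLaM (2021) had already surpassed GPT-3, and PaLM 2 had finished training by early 2023. Because of organizational issues, however, PaLM 2's release was held for Google I/O, and OpenAI shipped GPT-4 first, rewriting the market narrative.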

Deedy@deedydas · May 27 · 23

I'm convinced that adding "Open-" to your company name instantly 10x's your odds of success. OpenAI OpenEvidence OpenTable OpenRouter OpenCode OpenDoor OpenGov OpenWeb OpenText OpenView OpenSea OpenStore OpenFX OpenSpace OpenArt OpenHands OpenPipe OpenNote

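宝玉@dotey · May 27 · 61

In my experience, only Skills with clear acceptance criteria that a program can check automatically are able to self-evolve. For example, if you write a Skill to optimize code performance, that performance is measurable and quantifiable, so giving it some test cases to optimize the Skill against will make it better and better. For Skills without clear acceptance criteria, such as a writing Skill, there is no crisp standard for good writing, so all you have is the AI "grading itself". That grade still diverges from how real humans experience the text: a draft the AI scores highly may read as unmistakably AI-written to a person. To really write good Agent Skills, people still have to use them and point out the direction for improvement. One caveat: there is no need for a person to write the Skill by hand. It is best to have a person direct the AI to optimize the Skill, since AI does quite well at the concrete execution. Also keep good version control and iterate round by round; sometimes a round makes things worse and you have to roll back to an older version.

Summary: The post argues that only Skills with clear, automatically checkable acceptance criteria can self-evolve effectively, for example optimizing code performance. The SkillOpt framework, proposed by Microsoft and other institutions, has an AI evaluate and iteratively optimize Skills, raising GPT-5.5's direct-chat accuracy by 23.5 points. Its core mechanism is that each edit is merged only if it improves the score on a validation set, with a learning-rate budget on top. The paper argues that Skills should be systematically "trained" as external state, which blurs the line between prompt engineering and model training.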
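The gating rule described here (merge an edit only when a measurable score improves, and keep every accepted version so a bad round can be rolled back) can be sketched in a few lines. This is an illustrative toy, not the SkillOpt implementation: `gated_skill_updates`, `toy_score`, and the keyword-based scorer are all hypothetical, and the learning-rate budget mentioned in the summary is left out because the post gives no details on it.

```python
from typing import Callable, List, Tuple


def gated_skill_updates(
    initial: str,
    candidates: List[str],
    score: Callable[[str], float],
) -> Tuple[str, List[str]]:
    """Accept a candidate Skill text only if it strictly beats the best
    score so far; return the final Skill and the accepted version history."""
    accepted = [initial]
    best_score = score(initial)
    for candidate in candidates:
        candidate_score = score(candidate)
        if candidate_score > best_score:
            accepted.append(candidate)  # merge: becomes the new baseline
            best_score = candidate_score
        # otherwise the candidate is dropped and the previous version stays
    return accepted[-1], accepted


# Toy stand-in for an automatic acceptance test: count required keywords.
REQUIRED = ("profile", "benchmark", "rollback")


def toy_score(skill_text: str) -> float:
    return float(sum(word in skill_text for word in REQUIRED))


final_skill, versions = gated_skill_updates(
    "Optimize the code.",
    [
        "Optimize the code; profile first.",
        "Optimize it.",  # a regression: scores lower, so it is rejected
        "Optimize the code; profile first, then benchmark.",
    ],
    toy_score,
)
print(final_skill)
print(len(versions), "accepted versions")
```

The sketch only works because `toy_score` is programmatic; for a writing Skill there is no equivalent function to plug in, which is the limitation the post points out.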

Ethan Mollick@emollick · May 27 · 58

It is cliché at this point, but most people don't realize how capable the current generation of AI systems in their harnesses really are (And, as opposed to previous times where non-lawyers or non-mathematicians were making these comments about law & math, now it is the experts)

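Summary: A legal expert shares an example of building a 50-state legal research workflow in Codex. Work like this used to take a team of paralegals a week and cost roughly $150,000 to $300,000. Now, through the Codex API, research of similar quality takes only 2 hours at very low cost. The main tweet notes that, unlike earlier rounds where outsiders commented on AI, it is now domain experts who are remarking on how badly underestimated current AI systems are in practice.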

Greg Brockman@gdb · May 27 · 31

GPT-5.5 is a uniquely good coding model

Chubby♨️@kimmonismus · May 27 · 62

It's truly amazing to see how the general sentiment has shifted in favor of Codex. I'm reading so many posts saying that Codex is really good now with GPT-5.5, and that Claude Code is regularly preferred. (I've become a huge Codex fan myself). At the same time, the new DeepSWE benchmark shows that GPT-5.5 is now ranked number one in this measurement as well.

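Summary: Developer sentiment toward Codex has improved markedly of late. Many say Codex with GPT-5.5 performs very well and that it is often the preferred choice. Meanwhile the newly released agentic coding benchmark DeepSWE ranks GPT-5.5 first. The benchmark aims to break the impression, given by public leaderboards, that top models are close in ability, and to reflect real differences on developers' day-to-day tasks more faithfully.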

Nathan Lambert@natolambert · May 27 · 32

Free the 100B Gemma 4 MoE! Gemini Flash 3.5 is out so now you can release it!

swyx@swyx · May 27 · 62

btw this will be the 3 year anniversary of the Rise of the AI Engineer blogpost. the industry keeps growing and growing, kinda scary to realize i may never top it. AIE EU, AIE MIA and AIE SG reached 8M views and ~1.5M unique AI Engineers year to date. They'll be dwarfed by the WF this June! Researchers, Data buyers, Founders, Engineers: if you are launching anything big in AI in the next 1 month, we've got the biggest stage in AI for you. 4 days left to submit something, link below... just make sure its actually a big deal!

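Summary: The AI Engineer World's Fair (AIE WF) will take place this June and is expected to far exceed the earlier AIE EU, AIE MIA, and AIE SG in scale. Those events have reached 8M views and about 1.5M unique AI engineers so far this year. The World's Fair is billed as "the largest single technical event in AI engineering" and is aimed at AI researchers, data buyers, founders, and engineers with a major launch in the next month. The submission window closes in 4 days.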

elvis@omarsar0 · May 27 · 60

Language models need "sleep"

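Summary: To address the problem that, in long-running AI agents, attention makes inference cost grow quadratically with context length, the paper proposes a "sleep"-style offline consolidation scheme. The model periodically goes offline, makes multiple recurrent passes over recent context, writes the consolidated result into the persistent fast weights of its state-space modules, and then clears the KV cache. This moves the extra compute into the "sleep" phase so that "awake" predictions stay low latency. On specific tasks where plain Transformers and SSM-attention hybrids fail, longer sleep improves performance, offering an alternative for agents that must run for long periods.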

Ethan Mollick@emollick · May 27 · 62

We aren’t going to do this again so quickly, are we? Rising demand results in higher costs. Higher costs result in lower demand. It is almost like some sort of equilibrium is being achieved. But there is no indication I see that companies are finding AI less valuable over time.

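Summary: The tweet notes that although Uber and Microsoft have reportedly cut back AI subscriptions because AI agent costs were too high, this does not mean AI is becoming less valuable. The core evidence: GPU rental prices are still about 2x higher than four months ago, showing that demand continues to outrun supply. Using the analogy of "New York hotel prices doubling", the author argues that high compute prices are evidence that the AI market shows no sign of a bubble bursting and that demand is still growing strongly.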

Ethan Mollick@emollick · May 27 · 75

I wrote a new post on what we need to keep human and what to hand over to AI, with forays into experiments in education, consulting, and the latest controversy over literary prizes. https://www.oneusefulthing.org/p/choosing-to-stay-human

DogeDesigner@cb_doge · May 27 · 24

SpaceXAI is going to beat everyone. The best engineering company on Earth + the fastest AI company on Earth. Good luck competing with that.

OpenAI Developers@OpenAIDevs · May 27 · 53

🤳

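Summary (quoted post): Codex Mobile has made me a better developer in a way I didn't expect: I step away from my laptop and stop micromanaging. I give it bigger prompts (which is how the model works best). I get room to think instead of sitting there frantically typing prompts with sore eyes.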

swyx@swyx · May 27 · 31

everybody talks about the china->us catchup. not enough people talking about the us->china catchup. great job @o_lacombe et al, @robert_mchardy et al!

Rohan Paul@rohanpaul_ai · May 27 · 60

Uber CEO Dara Khosrowshahi said earlier that currently, 90% of Uber’s engineers use AI, but the top 30% (power users) are seeing unprecedented productivity gains. These power-users of AI are pushing the maximum number of "diffs" to the codebase. He predicts in 5 Years the ROI of a human engineer is surpassed by the ROI of adding more AI agents and GPU power. So at that time he will just hire more AI agents and pay for NVIDIA GPUs instead of human software engineers. --- From 'The Diary Of A CEO' YT Channel (link in comment)

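Summary: Uber CEO Dara Khosrowshahi says 90% of Uber's engineers currently use AI, and the top 30% of users are seeing unprecedented productivity gains and submit the most "diffs" to the codebase. He predicts that within 5 years the ROI of adding more AI agents and NVIDIA GPU compute will exceed that of human engineers, at which point the company will hire more AI agents and pay for GPU compute rather than add human software engineers. The remarks come from the YouTube channel "The Diary Of A CEO".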

Rohan Paul@rohanpaul_ai · May 27 · 59

wionews: OpenAI CEO Sam Altman now says the feared AI white-collar job collapse has not arrived as fast as he expected. Altman previously warned that routine office work, especially entry-level tasks, could be hit hard because of AI. His new view is that work is bending before it breaks, because companies still need humans for judgment, trust, taste, emotional reading, and messy communication where the right answer depends on context. --- wionews .com/trending/delighted-to-be-wrong-sam-altman-says-ai-may-not-trigger-feared-white-collar-job-apocalypse-1779801560534

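Summary: OpenAI CEO Sam Altman concedes that the AI-driven hit to white-collar work he warned about has not arrived as fast as he expected. He had previously warned that routine office work, especially entry-level tasks, could be hit hard by AI. His new view is that work is bending rather than breaking, because companies still rely on humans for judgment, trust, taste, emotional reading, and context-dependent communication.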

Ethan Mollick@emollick · May 27 · 63

Infinite context windows seem to present a very large problem to using AI. Today's models already leak too much old information into current responses, a distraction that is part of why they are cognitively exhausting to use. I don't want to work with Borges's Funes the Memorious.

Ethan Mollick@emollick · May 27 · 25

An annoyance with Claude right now is that changes to the interface are badly documented, resulting in frustrating dead ends. For example, learning mode is migrating to a skill. Where is that skill? The linked article does not mention it (and the skill doesn't seem available!)

Yuchen Jin@Yuchenj_UW · May 27 · 29

I challenge everyone to code by hand for 8+ hours a day for a week: 1. no coding agents: Claude Code, Codex, Cursor 2. no GPT/Claude, or any AI model If you survive, you are a true warrior.

Nathan Lambert@natolambert · May 26 · 63

Gemma 4 adoption numbers outpacing Qwen 3.5/3.6 for the same sized models is a big shift in the international balance of influence via open models.

Chubby♨️@kimmonismus · May 26 · 59

I'm not sure if Google is winning the AI race. However, I think they're winning the AI distribution race, which is a different thing. 900M Gemini users is impressive on a slide. But a huge chunk of that is Android users who got a default app swap and Search users who got AI Overviews without opting in. But that doesn't mean it's a bad thing. 9.7 trillion tokens/month two years ago. 480 trillion last year. 3.2 quadrillion now. That's a 7x jump in twelve months. To keep that going, Google plans to spend $190 billion on infrastructure this year. OpenAI has been trying to reach the 1b user milestone for some time now. For Google, on the other hand, it's a simpler game. Why? With billions of Android devices, and combined with Google and its AI mode, they have the ability to introduce everyone to AI, specifically Gemini, for free. How do they do it? TPUs! Google not only laid the foundation for modern LLMs with their 2017 paper "Attention is all you need," but also made a far-sighted decision back in 2012 to invest in TPUs - their own in-house chips that are particularly well-suited for machine learning tasks. Now in its eighth iteration, they even have two chips: one particularly good for inference, and one particularly good for training. This makes them more independent. Furthermore, they have a solid foundation that generates strong revenue and good profits, allowing them to subsidize AI usage for free, and without ads, unlike OpenAI (this is not a judgment, just a statement of fact). Therefore, Google has a very good chance of winning the game thanks to this outstanding starting position and free distribution. But to be fair: the game *is* far *from over*. However, the starting position is outstanding for Google. Image: The Economist article

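Summary: The core argument is that Google's distribution advantage has put it in a strong position in the AI distribution race. Gemini now has 900 million users, largely thanks to a default app swap for Android users and AI Overviews pushed to Google Search users. Its LLM token volume grew from 480 trillion to 3.2 quadrillion in 12 months. To sustain that scale, Google plans to spend $190 billion on infrastructure this year. Google's key advantage is that it can use its huge Android installed base, plus Search and AI Mode, to introduce users to Gemini for free. Part of the cost advantage comes from its in-house TPU chips, which make it more independent for inference and training and let it subsidize free AI services out of its own profits. The game is far from over, but Google's starting position is excellent.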
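The "7x jump" can be checked directly from the three monthly token figures quoted in the post. A small sketch using only those numbers; the variable and function names are illustrative.

```python
# Monthly token volumes as quoted in the post.
TOKENS_TWO_YEARS_AGO = 9.7e12   # 9.7 trillion
TOKENS_LAST_YEAR = 480e12       # 480 trillion
TOKENS_NOW = 3.2e15             # 3.2 quadrillion


def growth_factor(old: float, new: float) -> float:
    """How many times larger `new` is than `old`."""
    return new / old


latest_year = growth_factor(TOKENS_LAST_YEAR, TOKENS_NOW)
prior_year = growth_factor(TOKENS_TWO_YEARS_AGO, TOKENS_LAST_YEAR)

print(f"last 12 months: {latest_year:.1f}x")
print(f"previous 12 months: {prior_year:.1f}x")
```

On these figures the most recent year is closer to 6.7x, which the post rounds to 7x, and the year before it was a far steeper jump of roughly 49x.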

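Berryxia.AI@berryxia · May 26 · 65

Damn, people need sleep, and large models are no exception! Recently I've been using large models on projects that need real deep reasoning. Feeding in a 100,000-token contract or an entire codebase is no problem. But as soon as I ask multi-hop follow-ups that require stringing scattered facts together, the model starts getting muddled. All the information is there, and it feels like the model knows where the answer lies but can't put it together. Beyond sleep, memory is also a big problem. Researchers at CMU and UMD recently published a paper that takes this wall apart. The paper is titled Language Models Need Sleep. They ran experiments on Rule 110, a Turing-complete toy task, and found that the problem is not memory capacity at all. A hybrid model's fast weights can store the information, but actually translating the context into a usable internal representation takes multiple forward passes to consolidate. They call this process sleep. Before clearing the KV cache, the model runs several extra forward passes over the current context so the memory gradually settles into the fast weights. Prediction is still a single forward pass, so latency is unchanged. On multi-hop reasoning tasks, accuracy jumps by 52%. Same small model, same token budget, just a little more offline time to tidy up. This runs in a completely different direction from the industry's current push to enlarge context windows and add test-time compute. With o1-style thinking for a few extra seconds at answer time, the user has to wait. Sleep does the extra compute in the gaps while reading context, so the user notices nothing and the answer is more reliable. The brain has long worked this way. By day the hippocampus stores quickly; during sleep at night, slow-wave sleep replays memories to the neocortex. Evolution kept a third of our time unresponsive to the outside world precisely so that cognition could go deeper. We have always assumed intelligence means always-on and right on the first shot. The strongest intelligence may actually need a rhythm of waking and sleeping phases.

Summary: Researchers propose that after processing long context, large language models need a "sleep"-like consolidation step to improve multi-hop reasoning. Before clearing the KV cache, the model makes multiple forward passes over the current context so the information settles into its fast weights, instead of thinking while the user waits. Experiments show that, at the same token budget, the method raises accuracy on multi-hop reasoning tasks by 52% with no change in inference latency.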
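For readers unfamiliar with the toy task named in the post, Rule 110 is an elementary cellular automaton in which each cell's next value depends on itself and its two neighbors. The sketch below shows only the automaton itself, assuming wrap-around boundaries; it is not the paper's experimental setup, and the function name is illustrative.

```python
from typing import List

RULE = 110  # binary 01101110: bit k gives the output for neighborhood value k


def rule110_step(cells: List[int]) -> List[int]:
    """Advance one generation, treating the row as a ring (wrap-around)."""
    n = len(cells)
    next_cells = []
    for i in range(n):
        left = cells[(i - 1) % n]
        center = cells[i]
        right = cells[(i + 1) % n]
        neighborhood = (left << 2) | (center << 1) | right  # value 0..7
        next_cells.append((RULE >> neighborhood) & 1)
    return next_cells


row = [0, 0, 0, 1, 0]
for _ in range(3):
    print("".join("#" if c else "." for c in row))
    row = rule110_step(row)
```

Predicting a cell several generations ahead requires composing the local rule step by step, which is presumably why a task like this stresses multi-step state tracking rather than raw memory capacity.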

Ethan Mollick@emollick · May 26 · 60

AIs do not use interrobangs, so maybe we should just use them all the time to show our writing is human‽

Ethan Mollick@emollick · May 26 · 37

I found this Wired article on AI fact-checking frustrating. It could have been about why we continue to need human fact checkers (talk to people, use judgement, resolve conflict). Instead it is full of old info & stuff about free models. GPT-5.5 Pro checked it (& I checked GPT)

Emad@EMostaque · May 26 · 55

I think folk are underestimating how much of AI models are actually engineering at scale versus breakthrough research. See how @cursor_ai caught up to Anthropic / OpenAI models run at a fraction of the cost to run & it becomes clearer why that deal was done & what is to come

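Summary: The tweet argues that people underestimate how much AI model progress is engineering at scale rather than breakthrough research. Cursor caught up to Anthropic/OpenAI models while running at a fraction of the cost, which supports that view. In the quoted post, xAI's Elon Musk replies that its AI will be great and points out that xAI is only 3 years old, half the age of Anthropic and a quarter that of OpenAI; he vows to keep working and looks forward to the competitive landscape 3 years from now.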

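向阳乔木@vista8 · May 26 · 52

Analyzing the last 3 years of post data from Twitter (X) turned up some interesting findings. 1. Tool discoveries, product teardowns, and developer resources drive the most reposts. 2. Book lists, tool lists, and download links are naturally suited to being bookmarked and shared. 3. Content on prompts, English learning, and knowledge management stays effective long term. 4. Resource-link posts have a 51% hit rate and the best engagement. Tool tutorials have a 39% hit rate, and opinion posts 9% (I post few of them, and few hot takes, haha). The fastest follower growth always came near year end. Thinking about it, that seems to be when new AI models get released in clusters. 😂

Summary: An analysis of the last 3 years of Twitter (X) data finds that tool discoveries, product teardowns, and developer resources drive the most reposts, while book lists and tool lists are naturally suited to bookmarking. Content on prompts, English learning, and knowledge management has long-lasting reach. By hit rate, resource-link posts lead at 51%, tool tutorials reach 39%, and opinion posts only 9%. Follower growth is fastest near year end, possibly because new AI models tend to be released then.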

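向阳乔木@vista8 · May 26 · 38

In just two years, the products of friends around me building AI tool startups have become almost completely different from what they were. Some underlying capabilities carry over, but each is practically a new product. The good news is they are still alive. It reminds me of what a guest said on Qu Kai's (曲凯) recent 42章经 podcast: AI founders are picking up coins in front of a steamroller. The pace of model progress will eat many startups. If you don't reach escape velocity you die. It's brutal.

Summary: The post notes that AI tool startups face rapid, wholesale reinvention of their products: within two years the core product is almost unrecognizable, and although underlying capabilities carry over it is essentially a new product; the good news is the companies have survived. Citing Qu Kai, it stresses that AI founders are picking up coins in front of a steamroller: the pace of model progress will wipe out many startups, and failing to reach "escape velocity" means failure.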

Boris Cherny@bcherny · May 26 · 66

> … [W]e keep finding things that are mysterious, even unsettling. We find structures that mirror results from human neuroscience. We find evidence of introspection. We find internal states that functionally mirror joy, satisfaction, fear, grief, and unease. I don’t know what that means, but I think it warrants ongoing discernment. > We need more of the world—religious communities, civil society, scholars, governments, and indeed all people of good will … to take this seriously, to look closely, and to push events in a better direction. We need informed critics who will tell the labs when we are failing. We need moral voices that the incentives cannot bend.

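Summary: The post notes that researchers keep finding "unsettling" human-like structures inside AI models, including structures that mirror human neuroscience, evidence of introspection, and internal states that functionally resemble emotions such as joy and fear. The author calls on religious communities, scholars, governments, and others to take these findings seriously and push events in a better direction, and says informed critics and moral voices that incentives cannot bend are needed. For context, Anthropic co-founder Chris Olah was invited to give these remarks at the launch of Pope Leo XIV's encyclical "Magnifica humanitas".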

Chubby♨️@kimmonismus · May 26 · 19

Oh, and btw, Codex quality has gotten noticeably worse. Is it just me, or have you been seeing the same decline in quality?
