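Garry Tan complained that Dropbox tops out at 3 TB, but 90% of people missed what he was really saying. This is clearly not an ordinary product gripe; it reads more like an explicit warning ahead of the coming AI data tsunami. There is only one core change: the underlying logic of data production has shifted completely.
1. Data creation has moved from humans alone to humans plus AI, and the AI's output is larger. Writing a few thousand words a day used to be the ceiling. Now a single agent workflow run, a round of simulation experiments, or a generated set of teaching materials leaves behind traces, outputs, and knowledge fragments, all of it structured, high-quality data that can be fed straight to a model. The phrase Garry stressed, "actually usable", is the heart of the matter. Delete this data and you lose the fuel for training your future personal model.
2. The value logic of storage has changed entirely. What used to be sold was capacity plus sync plus sharing. The core need now is to turn massive amounts of usable data into fuel that can be found quickly, linked, and fed directly to models. Raw storage keeps getting cheaper, and piling up capacity is no moat; what is truly valuable is management intelligence. Semantic search, automatic classification, and vector database integration are the real must-haves, and the old architecture of clicking down through folders breaks as soon as data volume grows.
3. The most counterintuitive point: AI will not save you storage. Many people assume AI can compress and deduplicate, so data should shrink. The opposite happens. AI drives the marginal cost of creating high-quality data extremely low, so people and agents will produce far more genuinely valuable data than before. The first to leave are always the heavy users: individual developers, researchers, and content creators will hit the 3 TB ceiling soonest. They have the lowest migration costs and decide fastest, so they will be the first adopters of new tools. Mainstream users may still feel they have enough, but the fastest-growing core user base will churn first.
As CEO of YC, Garry is hardly just bashing Dropbox. I think he is sending a signal to every founder: whoever can catch this explosion of usable data for individuals and small teams will catch the next wave of tools. Anyone building AI agents, a personal OS, or cognitive tools should think hard about this. Your users will soon need a data backend that can hold their entire AI life, not a file sync tool from 2015.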

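Summary: YC CEO Garry Tan points out that Dropbox's lack of plans above 3 TB is outdated, and that AI-generated "actually usable" data will grow exponentially. Core changes: data creation shifts from humans to "humans + AI", with a single agent workflow producing large amounts of structured, high-quality data; storage value moves from capacity to management intelligence (semantic search, vector databases, and so on); AI pushes the marginal cost of data extremely low, which produces more high-quality data, not less. Heavy users (developers, researchers, creators) will hit the 3 TB ceiling first and migrate. The author reads this as a signal to founders: whoever catches the usable-data explosion for individuals and small teams catches the next wave of tools.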
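The "semantic search" idea mentioned above can be illustrated with a minimal sketch: rank stored items by cosine similarity between a query embedding and each item's embedding. The file names and three-dimensional vectors below are hypothetical toy values; a real system would obtain embeddings from a model and would likely use a vector database instead of a Python dict.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query_vec, index, k=2):
    """Return the names of the k items most similar to the query."""
    scored = [(cosine(query_vec, vec), name) for name, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]

# Hypothetical toy embeddings (assumption: produced elsewhere by some model).
index = {
    "agent_trace_2026_06.json": [0.9, 0.1, 0.0],
    "holiday_photos.zip": [0.0, 0.2, 0.9],
    "sim_experiment_notes.md": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]  # hypothetical embedding of "agent run outputs"
results = top_k(query, index, k=2)
print(results)
```

The point of the sketch is the contrast with folder browsing: retrieval depends on content similarity, not on where a file happens to be stored.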

elvis@omarsar0 · June 23 · 27

Highly-recommended read. It's exciting to see large-scale agentic RL becoming more accessible. Cool to see the infra layer for this is being built and I think this plays an important role in self-improving agents arc and "owning your AI."

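Summary: A blog post rounds up all the infrastructure components needed to run reinforcement learning on GLM-5. The tweet's author highly recommends it and sees this infrastructure layer as playing an important role in the development of self-improving agents.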

Nathan Lambert@natolambert · June 23 · 32

Something I should add -- on-policy distillation was the last content I got to sneak into the book before going to print. Felt very important to have this method covered, it's growing rapidly and used in distinct ways. So you can also read what is covered in this lecture!

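Summary: Nathan Lambert released a new lecture for his book, bringing his post-training video series to 7.4 hours. It traces the literature from Hinton's 2015 knowledge distillation paper to today's multi-teacher on-policy distillation (OPD, MOPD, OPSD). The video focuses on the 3-4 core changes to the original formulation that made on-policy distillation fit mainstream RL frameworks, and reviews how synthetic data gradually took over post-training data research. It also covers Constitutional AI, AI feedback, and rubrics as rewards. Timeline: 00:00 the emergence of synthetic data, 10:50 background on teacher-student distillation, 24:47 on-policy distillation, 37:11 Constitutional AI, 45:50 rubrics and conclusions.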

Nathan Lambert@natolambert · June 23 · 44

New lecture for the book! Nominally about synthetic data, but mostly a walkthrough of the distillation literature from the Hinton 2015 paper to the multi-teacher on-policy distillation of today! At 7.4 hours of video in my post-training brain dump and counting :) It was fun to stare at the math long enough and talk through the 3-4 core changes that needed to be made to the original formulation to have on-policy distillation be ready for the mainstream like it is today (and in RL frameworks). Otherwise, I include a bit of a history lesson for how synthetic data slowly took over all post-training data research (it wasn't always the case)! Then I do some 101 review on constitutional AI, rubrics, and other popular methods.
00:00 The emergence of synthetic data
10:50 Background on teacher-student knowledge distillation
24:47 On-policy distillation (OPD, MOPD, and OPSD)
37:11 Constitutional AI & AI Feedback
45:50 Rubrics as rewards & conclusions
Ofc, watch on YouTube etc.

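Summary: Nathan Lambert released a new lecture for his book (his video series now totals 7.4 hours). Nominally about synthetic data, it is really a systematic walk through the knowledge distillation literature, from Hinton's 2015 paper to today's mainstream on-policy distillation (OPD/MOPD/OPSD). He focuses on the 3-4 core mathematical changes needed to make on-policy distillation practical. The lecture also reviews how synthetic data gradually took over post-training data research and introduces Constitutional AI, rubrics, and other popular methods. Chapter timestamps are provided (00:00 to 45:50).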
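As a rough illustration of what "on-policy distillation" usually means, the sketch below computes a per-token reverse KL divergence, KL(student || teacher), averaged over a sequence that is assumed to have been sampled from the student. This is one common formulation and not necessarily the exact one derived in the lecture; the logits are hypothetical toy values and the function names are illustrative only.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher) at a single token position."""
    p_s = softmax(student_logits)
    p_t = softmax(teacher_logits)
    return sum(ps * (math.log(ps) - math.log(pt))
               for ps, pt in zip(p_s, p_t) if ps > 0)

def opd_loss(student_logits_seq, teacher_logits_seq):
    """Mean per-token reverse KL over one sequence.

    Assumption: the sequence was sampled from the student (on-policy),
    and the teacher scored the same tokens.
    """
    kls = [reverse_kl(s, t)
           for s, t in zip(student_logits_seq, teacher_logits_seq)]
    return sum(kls) / len(kls)

# Hypothetical two-token sequence over a three-word vocabulary.
student = [[2.0, 0.5, 0.1], [0.3, 1.2, 0.0]]
teacher = [[1.5, 0.7, 0.2], [0.1, 2.0, 0.1]]
print(opd_loss(student, teacher))
```

Reverse KL penalizes the student for putting probability where the teacher puts little, and evaluating it on student samples is what distinguishes this from classic off-policy distillation on teacher outputs.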

Rohan Paul@rohanpaul_ai · June 23 · 59

China is growing very quickly in AI, but the scale difference is brutal, spending gap is enormous. By 2027E, US hyperscalers are expected to spend about 8.3x more than Chinese hyperscalers on AI infrastructure. And at least as of now, AI advantage is very much tied to compute access: GPUs, data centers, power, networking, cooling, storage, and the ability to serve models at massive scale. Better algorithms matter, but when 1 side can deploy hundreds of billions more into infrastructure, it gets a much larger surface area for training frontier models, running inference, attracting developers, and subsidizing AI products. So looks like top US AI companies will have far more muscle to roll out AI systems than their Chinese competitors over the next few years.

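Summary: According to the tweet, US hyperscalers are expected to spend about 8.3x more than their Chinese counterparts on AI infrastructure by 2027E, a stark gap. AI advantage is currently tied closely to compute access (GPUs, data centers, power, networking, and so on), and larger investment means more room to train frontier models, run inference, attract developers, and subsidize AI products. The quoted tweet notes that even after adjusting for purchasing power parity (PPP), the US-China gap in AI capital expenditure is still striking; over the next few years, top US AI companies may have far more resources than their Chinese competitors to roll out AI systems.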

Rohan Paul@rohanpaul_ai · June 22 · 71

New Nature-published study: AI may save time, but early evidence suggests it can weaken the hard skills professionals rely on. In a Polish colonoscopy study, experienced endoscopists’ unaided adenoma detection fell from 28.4% to 22.4% after AI was introduced into their workflow. That does not mean AI made doctors careless overnight. The deeper problem is that skill is maintained by friction: looking, judging, doubting, correcting, and staying mentally answerable for the next move. When the machine starts flagging the abnormal patch, the human eye can begin to change its job from searching to confirming. The same pattern shows up in software: in a 2026 randomized study, AI helped some developers complete tasks, but heavy delegation weakened conceptual understanding, code reading, and debugging skill. --- nature. com/articles/d41586-026-01947-1

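Summary: A study reported in Nature finds that AI can save time but may weaken the hard skills professionals depend on. In a Polish colonoscopy study, experienced endoscopists' unaided adenoma detection rate fell from 28.4% to 22.4% after AI entered their workflow. AI did not make doctors careless overnight; it changed the "friction" that maintains skill, shifting the job from active searching to passive confirming. A similar pattern appears in software development: a 2026 randomized study found that AI assistance helped some developers complete tasks, but heavy delegation weakened conceptual understanding, code reading, and debugging skill.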

ginobefun@hongming731 · June 22 · 44

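BestBlogs Morning Brief · 06-22 # Claude Code / WWDC26 / Apple Intelligence / Qubot / Fiona Fung
[1] ★ Deep dive | Building the world's most "all-in AI" engineering team: frontline practices from Fiona Fung, head of Claude Code at Anthropic [video]. Fiona Fung, head of Claude Code at Anthropic, appeared on Lenny's Podcast with a notable data point: Anthropic engineers now ship 8x as much code per quarter as before. She stresses that the real change is that "coding is no longer the bottleneck"; the constraint has moved to "how to verify that the output is correct and impactful". She shares her frontline playbook: always-on Claude Code remote sessions, routines that scan feedback every morning and generate PRs, a bad/sad framework to guard quality, and JIT monthly plans in place of half-year roadmaps. First-hand practice, worth a close read for engineering teams in the AI era. Source: Lenny's Podcast https://www.bestblogs.dev/video/2f4fa0a
[2] ★ Deep dive | Cook's exit, Apple's new AI power structure, and the balance of values | WWDC26. 硅谷 101 attended WWDC26 in person and reviews how Apple restructured its AI leadership ahead of Cook's departure. The article traces the personnel shake-up that followed the lawsuits over Apple Intelligence's two-year delay: Giannandrea out, Rockwell taking over Siri and now reporting to Federighi, and Amar Subramanya hired from Google to lead in-house models. On the technical side it breaks down the five AFM models (3B on-device + 20B MoE), the system orchestrator, and the PCC privacy pipeline built jointly by Apple and Google. Several Silicon Valley experts call the demo "not agentic enough", and Wall Street's wait-and-see stance sent the stock down more than 5%. An in-depth review with an on-site view and inside detail, worth reading. Source: 硅谷 101 https://www.bestblogs.dev/article/d85a290d
[3] ★ Deep dive | How we built an internal data analysis agent. GitHub's engineering team publishes, for the first time, how it built Qubot, its internal data analysis agent. Qubot lets any employee query the data warehouse in natural language and get an answer within seconds; it is positioned for exploratory questions, not as a replacement for reports. The article breaks down a three-layer architecture: a multi-entry UI (Slack / VS Code / Copilot CLI), a context layer federated by bronze/silver/gold tiers, and a query engine that switches automatically between Kusto and Trino. The key insight is that structured context is not only more accurate but also returns correct answers 3x faster. First-hand experience deploying an enterprise agent, useful for teams. Source: The GitHub Blog https://www.bestblogs.dev/article/86c8a421
[4] [AINews] GLM-5.2 is the real deal; http://Z.ai predicts Open Fable by year-end. This issue of AINews highlights GLM-5.2, the first open-weight model to pass the "frontier model intuition check", validated by independent practitioners and benchmarks. It also covers the shift from models to agent toolchains, new automation primitives, and a more realistic benchmark for agentic knowledge work. Source: http://Latent.Space https://www.bestblogs.dev/article/c41193a1
[5] A Turing Award winner bets $1 billion on "world models": is this AI's next decade? (Part 2). An in-depth look at the JEPA world-model route backed by Turing Award winner LeCun, comparing it with mainstream VLA architectures, showing its advantages in visual encoding and efficiency while being candid about its weaknesses in robot control: a technical gamble with no verdict yet. Source: 十字路口 Crossing https://www.bestblogs.dev/article/d1d68cc1
[6] AI coding in practice: using software engineering thinking to steer agent-generated code. 宝玉 shares a detailed methodology for applying traditional software engineering practices (requirements analysis, system design, code review, testing, CI/CD) to AI agent programming, aiming to raise code quality, reduce production issues, and automate fixes. Source: 宝玉(@dotey) https://www.bestblogs.dev/status/2068363092904276316
[7] How to design agent systems around a company's unwritten rules. The article argues that, for AI agents, the most critical organizational wisdom does not live in documented processes but is implicit inside the organization, in an unwritten system of knowledge, motivation, and judgment, and it offers a framework for designing agent systems accordingly. Source: http://HBR.org https://www.bestblogs.dev/article/5b807ac7
[8] Starting from the Cerebras IPO: shifts in AI compute, the early stirrings of scaling laws, and the story of Baidu's US research lab. Through an interview with early Cerebras investor 周楠, the article looks back at the lab's forward-looking calls and investments on AI compute bottlenecks around 2016, and discusses the origins of scaling laws, the value of Cerebras's wafer-scale architecture, and how sharply the window from non-consensus to consensus has narrowed in AI investing today. Source: 晚点 AI https://www.bestblogs.dev/article/109f1dce
[9] The 356 people behind DeepSeek: a white paper lifts the lid on the US-China AI talent war. Stanford's Hoover Institution and HAI updated a white paper tracing the career paths of the 356 researchers behind seven core DeepSeek papers. It reveals the talent structure: 53.5% were trained entirely in China; the US remains an important training ground, but most of that talent has returned; the core group of 31 has stayed stable. Source: AINLP https://www.bestblogs.dev/article/d8fe9d6c
[10] ACL 2026 | Tencent Hunyuan identifies "incomplete learning": SFT still fails to learn 15% of training data. A joint team from Tencent Hunyuan and UNSW published a paper at ACL 2026 that systematically documents "incomplete learning", in which an average of 15.3% of training samples are still not effectively learned after SFT, and proposes a complete framework for detection, attribution, and intervention. Source: PaperWeekly https://www.bestblogs.dev/article/bc878641
--- http://BestBlogs.dev · Discover high-quality content that truly fits you. BestBlogs is an AI-powered personal reading assistant that helps you find high-quality content that truly fits you. Give it a try. Read online: https://www.bestblogs.dev/explore/brief/2026-06-22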

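Summary: Anthropic's head of Claude Code says engineers now ship 8x as much code per quarter and coding is no longer the bottleneck. At WWDC26, Apple changed its Siri leadership; its AFM models include a 3B on-device model and a 20B MoE, and the stock fell more than 5%. GitHub published the three-layer architecture of its internal data analysis agent Qubot, with correct answers returned 3x faster. GLM-5.2 passed the "frontier model intuition check". 53.5% of the researchers behind DeepSeek's core papers were trained domestically. Tencent Hunyuan found that 15.3% of samples are not effectively learned after SFT.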

Rohan Paul@rohanpaul_ai · June 22 · 57

A massive legal dataset just dropped on Hugging Face. For the first time, researchers used AI to gather, run optical character recognition, process, and build a database of every law in America. That is 2.2M laws. LocalLaws/LOCUS-v1 - Datasets on Hugging Face.

Summary: A large legal dataset was just released on Hugging Face. For the first time, researchers used AI to collect, OCR, process, and build a database of every law in America, 2.2M laws in total: LocalLaws/LOCUS-v1.

Rohan Paul@rohanpaul_ai · June 22 · 51

Drones create the kind of data AI labs cannot scrape from the web. 500K hours of real drone footage from Ukraine are now being packaged for AI model training. These are full-motion video captured in messy combat conditions, where smoke, weather, terrain, shadows, heat signatures, and fast movement break many clean demos. The data wall will be much less of a problem when drones keep turning the physical world into labeled video. --- defensescoop. com/2026/06/16/data-from-half-a-million-hours-of-ukraine-conflict-drone-footage-now-available-to-train-ai/

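Summary: Drones generate real-world data that AI labs cannot scrape from the web. 500K hours of real full-motion combat drone footage from Ukraine (with smoke, weather, terrain, shadows, heat signatures, fast movement, and other messy conditions) are being packaged for AI model training. Labeled video converted from the physical world in this way would greatly ease the data wall facing AI training.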

AYi@AYi_AInotes · June 21 · 54

A Sr. Staff Engineer at Tesla AI dropped an ML time breakdown that nobody dares to face: 50% evaluation, 40% data cleaning, 8% integration, 2% training. The post got shared over 2,000 times, and everyone was shocked that training only takes 2%. But the truly terrifying part is what he said next: The first two — evaluation and data cleaning — directly determine the noise floor of learning. No matter how powerful the model is, it cannot lower this floor, because it's already the Shannon optimal bound of your data itself. In plain English: the quality of the data you feed the model has already welded the ceiling shut. No matter how strong a model you switch to, that ceiling won't budge an inch. Then he dropped an even harder punch. He said he thinks about ontology every single day — old labels must be continuously reviewed. Not a one-and-done labeling exercise. In production systems, distribution drift and edge cases constantly expose the flaws in old labels. This guy works on Tesla AI's self-driving and robotics ML — he's not a researcher running benchmarks in a lab. Every day he steps on landmines in real-world deployment, and what he distilled is just four numbers and an information theory statement. One line from his replies nails it: "Even a genius needs a good textbook." Give an IQ 180 genius a textbook where the table of contents is all scrambled, and they won't learn linear algebra. Not because they're not smart enough — but because what you're teaching doesn't even define what linear algebra is. Ontology is the textbook's table of contents for the model. If the table of contents is messy, the model can only memorize noise. If it's clear, the model knows which direction to reason toward. Training is not the bottleneck. Our ability to clean reality is. While you're out chasing the latest strongest model every day, the real pros are reviewing old labels and ontology.

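Summary: A Sr. Staff Engineer at Tesla AI (working on self-driving and robotics ML) laid out how time on ML projects is really spent: 50% evaluation, 40% data cleaning, 8% integration, 2% training. The first two together set the noise floor of learning, which no model can lower because it is the Shannon optimal bound of the data itself. He thinks about ontology every day: old labels must be reviewed continuously, because distribution drift and edge cases in production keep exposing label flaws. Core conclusion: training is not the bottleneck; the ability to clean real-world data is.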

Nathan Lambert@natolambert · June 21 · 45

Something I haven't advertised much is that I made a Discord to go with my RLHF book, launching in print in a few weeks. Trying to create the place for the next generation of folks trying to learn post-training to learn and have community.

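Summary: Nathan Lambert created a Discord to accompany his RLHF book, which launches in print in a few weeks, as a place for the next generation of people learning post-training to study and find community.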

Chubby♨️@kimmonismus · June 20 · 22

It all started just a few years ago. And then the liftoff really picked up speed. Via @EpochAIResearch

Summary: It all started just a few years ago, and then the liftoff really picked up speed. Via @EpochAIResearch

Eric@ericmitchellai · June 20 · 29

tired: model training
wired: model selection

Summary: Out: model training. In: model selection.

Nathan Lambert@natolambert · June 20 · 39

Not enough people studying SFT methods. It’s a foundation of post training with limited literature that seems very serious in an empirical sense.

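Summary: The main tweet says too few people study SFT methods, even though SFT is a key foundation of post-training with limited empirical literature. The quoted tweet describes a systematic study: across many customer models, in both dense and MoE model families (up to 235B parameters), the team varied one SFT lever at a time on 4 real customer datasets. Each dataset came with an eval built over weeks with the customer, and the training outputs were generated specifically to pass that eval, so the supervision target matches the downstream metric and a common confound is removed. The study aims to distill best practices for fine-tuning.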

Ethan Mollick@emollick · June 19 · 59

More evidence, from a large-scale study in China, that using AI hurts learning if it undermines mental effort. When homework time drops due to AI use, so do test scores. Across studies, a theme: AI tutoring in support of classes is good, using AI to "help" with homework is bad.

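Summary: More evidence, from a large-scale study in China, that AI use hurts learning when it undermines mental effort. When AI use cuts homework time, test scores fall too. The theme across studies: AI tutoring in support of classes is good; using AI to "help" with homework is bad.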

elvis@omarsar0 · June 19 · 51

// Automating SKILL.md Generation // Increasingly, mining sessions is one of the best ways to improve your agents. OpenAI released something similar yesterday that lets Codex package skills from interactions. (bookmark it) This paper explains a related approach. They run a three-stage pipeline that segments GUI trajectories, clusters them into candidate skills, and trains a skill-aware policy. The clusters are genuinely readable, with five of eight hitting 0.95 or higher purity against ground-truth workflow labels. But readability does not transfer. GRPO lifts skill-step accuracy only from 18.5% to 20.5%, leaves BrowseComp+ flat, and loses to trivial frequency priors. The authors name the three culprits: a weak boundary detector, an orderless segment representation, and an offline reward model. Paper: https://arxiv.org/abs/2606.20363 Learn to build effective AI agents in our academy: https://academy.dair.ai/

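Summary: Key points: OpenAI released a similar feature yesterday that lets Codex package skills from interactions. The paper proposes a three-stage pipeline (segment GUI trajectories → cluster into candidate skills → train a skill-aware policy). Cluster purity is excellent (5 of 8 clusters at 0.95 or higher), but readability does not transfer: GRPO lifts skill-step accuracy only from 18.5% to 20.5%, brings no improvement on BrowseComp+, and even loses to simple frequency priors. The authors name three weaknesses: a weak boundary detector, an orderless segment representation, and an offline reward model.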
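For readers unfamiliar with the "purity" figure quoted above, here is a minimal sketch of the usual definition: a cluster's purity is the share of its members that carry the cluster's most common ground-truth label. The cluster names and workflow labels below are hypothetical, and the paper may compute purity with additional details not shown here.

```python
from collections import Counter

def cluster_purity(true_labels):
    """Fraction of members that share the cluster's majority label."""
    if not true_labels:
        raise ValueError("cluster is empty")
    majority_count = Counter(true_labels).most_common(1)[0][1]
    return majority_count / len(true_labels)

def count_pure_clusters(clusters, threshold=0.95):
    """Number of clusters whose purity reaches the threshold."""
    return sum(1 for labels in clusters.values()
               if cluster_purity(labels) >= threshold)

# Hypothetical ground-truth workflow labels for segments in each cluster.
clusters = {
    "cluster_0": ["login"] * 20,                    # purity 1.0
    "cluster_1": ["search"] * 39 + ["checkout"],    # purity 39/40
    "cluster_2": ["checkout"] * 6 + ["login"] * 4,  # purity 6/10
}
for name, labels in clusters.items():
    print(name, round(cluster_purity(labels), 3))
print(count_pure_clusters(clusters), "of", len(clusters), "clusters at >= 0.95")
```

High purity only says the clusters line up with human workflow labels; as the tweet notes, that did not translate into better downstream policy accuracy.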

Nathan Lambert@natolambert · June 19 · 52

It'll come down to, if the U.S. labs don't want distillation they shouldn't have an API. Seems like eventually they'll do this for some models, and that's their choice to make. More onerous regulation won't really work and will hurt startups in the US.

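Summary: In the end, if US labs do not want their models distilled, they should not offer an API. They will probably do this for some models eventually, which is their choice. More onerous regulation will not really work and will hurt US startups.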

Nathan Lambert@natolambert · June 19 · 49

It's obvious that eventually a speedrun for RL will stick. I currently think the biggest bottleneck is price, as an individual entry currently has too much noise from instability of RL, so running multiple seeds makes it cost O($100). Glad to see attempts!

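Summary: Nathan Lambert comments that an RL speedrun will eventually stick, and that the biggest bottleneck today is price: a single RL entry is too noisy because of instability, so running multiple seeds costs on the order of $100. @jeankaddour's Sokoban Speedrun project modifies Karpathy's nanochat pipeline to train Qwen3-4B-Instruct with RL to solve Sokoban puzzles, with a GRPO baseline taking only 87 minutes on 8×H100. The attempt shows the potential for cheap, fast validation of RL methods.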

Jeff Dean@JeffDean · June 19 · 49

My @Google colleagues @NormJouppi, Sridhar Lakshmanamurthy, Cliff Young, and David Patterson recently wrote a paper that will appear in the July/August 2026 edition of @ieeemicro titled "Google's Training Supercomputers from TPU v2 to Ironwood: Architectural Stability, Scale, Resilience, Power Efficiency, and Sustainability Across Five Generations". It's chock full of interesting data about the evolution of TPU chip generations, as well as how workloads at Google have transformed over time (hint: lots more transformer-based models!), and how the generations have gotten ~30X more energy efficient per flop. Lots of changes over these generations:
- Air cooling in TPUv2 to water cooling in TPUv3 onwards
- 2D to 3D torus-based interconnects
- 30X improvement TFLOPS/Watt
- 256 chips (TPUv2) to 9216 chips (Ironwood) per pod
Read the full paper: https://arxiv.org/abs/2606.15870

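Summary: Jeff Dean's Google colleagues wrote a paper reviewing five generations of training supercomputers, from TPU v2 to Ironwood, to appear in the July/August 2026 issue of IEEE Micro. Key changes: air cooling in TPU v2, water cooling from v3 onward; interconnect upgraded from 2D to 3D torus; chips per pod up from 256 to 9216; energy efficiency per flop improved about 30x. Google's internal workloads have also shifted heavily toward transformer-based models.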
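To see why moving from a 2D to a 3D torus matters as pods grow, the sketch below computes the worst-case hop count (diameter) of a torus from its dimensions. The specific shapes are assumptions chosen only so the chip counts match the 256 and 9216 figures above; the source does not state the actual pod topologies.

```python
from math import prod

def torus_diameter(dims):
    """Worst-case hop count between two nodes in a torus.

    Each dimension wraps around, so the farthest node along a
    dimension of size d is d // 2 hops away.
    """
    return sum(d // 2 for d in dims)

# Hypothetical shapes (assumptions, picked to match the chip counts).
shapes = {
    "2D, 256 chips": (16, 16),
    "2D, 9216 chips": (96, 96),
    "3D, 9216 chips": (16, 24, 24),
}
for name, dims in shapes.items():
    print(f"{name}: {prod(dims)} chips, diameter {torus_diameter(dims)} hops")
```

Under these assumed shapes, the same 9216 chips are at most 32 hops apart in 3D versus 96 hops in 2D, which is the general reason higher-dimensional tori scale better.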

Epoch AI@EpochAIResearch · June 18 · 23

Help shape how the world understands AI. We're hiring two designers at Epoch AI to turn complex research into dashboards and visualizations researchers and policymakers can easily use.

Summary: Epoch AI is hiring two designers to turn complex research into dashboards and visualizations that researchers and policymakers can easily use.

Rohan Paul@rohanpaul_ai · June 18 · 73

The Economist: AI has pushed the internet’s content machine into a new phase, with books, lawsuits, research papers, apps, and songs now being produced at volumes that old review systems were not built to handle. Amazon e-book releases rose from about 100,000 a month before ChatGPT-3.5 to roughly 300,000 by late 2025, and detection tools suggest AI-generated text drove much of that jump. US self-filed civil lawsuits doubled to 41,000 from 2023 to 2025, with 18% of sampled 2026 complaints flagged as AI-written, yet their success rate did not fall. Research is seeing the same pressure, as arXiv submissions keep rising, rejection rates have more than doubled since 2023, and one study found 57% of 2025 papers carried AI-influenced language, up from 12% in 2023. Coding agents have also changed software output, with new iOS App Store releases now above 100,000 a month after sitting below 50,000 last May. In Music production, 75,000 AI songs are arriving daily, up from 10,000, while 44% of new uploads are AI-made and 97% of listeners in one survey could not reliably tell the difference. --- economist. com/graphic-detail/2026/06/16/did-ai-write-this-article

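Summary: Data from The Economist shows AI has sharply increased content output. Amazon e-book releases rose from about 100,000 a month before ChatGPT-3.5 to roughly 300,000 by late 2025, with AI-generated text driving much of the jump. US self-filed civil lawsuits doubled to 41,000 from 2023 to 2025; 18% of sampled 2026 complaints were flagged as AI-written, and their success rate did not fall. arXiv rejection rates have more than doubled since 2023, and one study found 57% of 2025 papers carried AI-influenced language (12% in 2023). New iOS App Store releases now exceed 100,000 a month (previously below 50,000). In music, 75,000 AI songs arrive daily (up from 10,000), 44% of new uploads are AI-made, and 97% of listeners in one survey could not reliably tell the difference.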

Alibaba Cloud@alibaba_cloud · June 18 · 18

🚀 Quick BI @quick68554 V6.2 has just been released—featuring 9 major core function upgrades designed to make AI truly serve actual business operations. 👉🏻 Click to view details→https://int.alibabacloud.com/m/1000414628/ #QuickBI #BusinessIntelligence #DataAnalytics #AIforBusiness #SmartQ

Summary: Quick BI V6.2 has been released with 9 major core function upgrades, designed to make AI truly serve actual business operations. Details: https://int.alibabacloud.com/m/1000414628/ #QuickBI #BusinessIntelligence #DataAnalytics #AIforBusiness #SmartQ

Deedy@deedydas · June 18 · 60

I thought this was a joke. Meta now has made 30-50% of software engineers on core teams become data labelers. Their job is "giving human feedback on AI-generated Github repos" in an org called Agent Data Optimization. Maybe we are all training data generators after all.

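Summary: Deedy thought it was a joke: Meta has made 30-50% of software engineers on core teams become data labelers. Their job, in an org called Agent Data Optimization, is "giving human feedback on AI-generated Github repos". Maybe we are all training data generators after all.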

Nathan Lambert@natolambert · June 18 · 64

OpenAI just fixed their supposed "scaling pretraining problem"

Summary: OpenAI just fixed its supposed "scaling pretraining problem".

Epoch AI@EpochAIResearch · June 18 · 41

How close is AI to automating AI R&D? Right now, the tools economists use to track automation are too blunt to say. In this week's newsletter, @datagenproc, @joemkwon, and @ansonwhho propose a sharper tool: a thorough taxonomy of 60+ tasks involved in frontier AI research. 🧵

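Summary: How close is AI to automating AI R&D? The tools economists currently use to track automation are too blunt to say. In this week's newsletter, @datagenproc, @joemkwon, and @ansonwhho propose a sharper tool: a thorough taxonomy of 60+ tasks involved in frontier AI research. 🧵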

OpenAI@OpenAI · June 18 · 68

Introducing LifeSciBench, a benchmark for measuring and improving how well AI supports real-world life science research. Developed with 173 scientists from biotechnology and pharmaceutical research, LifeSciBench includes 750 expert-authored tasks across seven biological research workflows. https://openai.com/index/introducing-life-sci-bench/

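Summary: OpenAI introduces LifeSciBench, a benchmark for measuring and improving how well AI supports real-world life science research. Developed with 173 scientists from biotechnology and pharmaceutical research, it includes 750 expert-authored tasks across seven biological research workflows.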

Ethan Mollick@emollick · June 17 · 59

If the leaked financial data is right, OpenAI is profitable on serving customers with 40%+ gross margins. But training remains incredibly expensive. Automating AI research may also be a play to increase the efficiency of training: a superhuman researcher could do more with less.

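Summary: If the leaked financial data is right, OpenAI is profitable on serving customers, with gross margins above 40%, but training remains extremely expensive. Automating AI research may also be a way to make training more efficient: a superhuman researcher could do more with less.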

Nathan Lambert@natolambert · June 17 · 28

I was not ready for this PPO vs GRPO debate. Here we go again. The truth is just that policy gradient good.

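Summary: Nathan Lambert was not ready for the PPO vs GRPO debate, which is starting up again. His take: policy gradient methods are simply good.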
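For context on the debate, the sketch below shows the group-relative advantage that GRPO is generally described as using: sample several completions for one prompt, then normalize each reward by the group's mean and standard deviation. PPO, by contrast, typically estimates advantages with a learned value function. The rewards are hypothetical, and the use of the population standard deviation and the `eps` constant are assumptions that vary between implementations.

```python
import statistics

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages for completions of a single prompt.

    Each reward is centered on the group mean and scaled by the
    group standard deviation (eps avoids division by zero).
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical rewards for four sampled completions of one prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = grpo_advantages(rewards)
print(advantages)
```

Both methods then plug these advantages into a policy gradient update, which is consistent with the tweet's point that the shared policy gradient core is what matters.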

Rohan Paul@rohanpaul_ai · June 17 · 50

This was long needed for AI in finance: making SEC filings readable for machines without flattening the accounting logic. Researchers from Stanford, Univ of Calif, and Nanjing Univ have just released a dataset and methods for a cleaner way to turn SEC filings into useful LLM training data without losing the meaning inside financial tables. They release a 152B-token public snapshot and estimate the full archive could become about 550B tokens of long financial documents. It has less than 0.1% overlap with Common Crawl-derived corpora. The authors propose SEFD, a rebuilt version of EDGAR filings that keeps table structure, indentation, and financial meaning while using fewer tokens for LLM training. The dataset turns EDGAR into layout-faithful MultiMarkdown, preserving merged headers, indentation, signs, spans, and table hierarchy while shrinking enormous presentation scaffolding into usable tokens. ---- Link – arxiv. org/abs/2606.18192v1

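Summary: Researchers from Stanford, the University of California, and Nanjing University released the SEFD dataset and methods, which convert SEC EDGAR filings into layout-faithful MultiMarkdown, preserving merged headers, indentation, signs, spans, and table hierarchy while compressing redundant presentation scaffolding, so that the structure and accounting logic of financial tables can be used directly by LLMs. The public snapshot is 152B tokens, and the full archive is estimated at about 550B tokens of long documents. The dataset overlaps less than 0.1% with Common Crawl-derived corpora.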

Rohan Paul@rohanpaul_ai · June 17 · 55

This was long needed for AI in finance: making SEC filings readable for machines without flattening the accounting logic. A Stanford researcher has just released a dataset and methods for a cleaner way to turn SEC filings into useful LLM training data without losing the meaning inside financial tables. They release a 152B-token public snapshot and estimate the full archive could become about 550B tokens of long financial documents. It has less than 0.1% overlap with Common Crawl-derived corpora. The authors propose SEFD, a rebuilt version of EDGAR filings that keeps table structure, indentation, and financial meaning while using fewer tokens for LLM training. The dataset turns EDGAR into layout-faithful MultiMarkdown, preserving merged headers, indentation, signs, spans, and table hierarchy while shrinking enormous presentation scaffolding into usable tokens. ---- Link – arxiv. org/abs/2606.18192v1

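Summary: A Stanford researcher released the SEFD dataset and processing methods, which turn SEC EDGAR filings into structured data suited to LLM training while preserving table structure, indentation, merged headers, signs, spans, and hierarchy. The public snapshot contains 152B tokens; the full archive is about 550B tokens. Overlap with Common Crawl-derived corpora is below 0.1%. The layout-faithful MultiMarkdown format sharply compresses the original presentation scaffolding, keeping financial meaning while cutting wasted tokens.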

Satya Nadella@satyanadella · June 17 · 59

New Azure milestone. The fastest time to train yet at the largest reported scale for this leading AI training benchmark. A great example of what is possible when we bring together full-stack innovation across silicon, systems, networking, and software, along with our deep partnership with @nvidia, to advance the frontier of AI infra. https://techcommunity.microsoft.com/blog/AzureHighPerformanceComputingBlog/azure-sets-a-new-performance-record-for-llm-training-benchmark-at-extreme-scale/4523077

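Summary: A new Azure milestone: the fastest training time yet at the largest reported scale on this leading AI training benchmark. Satya Nadella credits full-stack innovation across silicon, systems, networking, and software, together with a deep partnership with @nvidia, for advancing the frontier of AI infrastructure. https://techcommunity.microsoft.com/blog/AzureHighPerformanceComputingBlog/azure-sets-a-new-performance-record-for-llm-training-benchmark-at-extreme-scale/4523077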

SemiAnalysis@SemiAnalysis_ · June 17 · 51

RL Systems Mind the Gap: Matching Trainer and Generator Throughput
RL Training Infrastructure, GRPO, PipelineRL, Async RL, Policy Staleness, RL Sandbox Infra, CPU Requirements, TCO Analysis, Thinking Machines Tinker
https://newsletter.semianalysis.com/p/rl-systems-mind-the-gap-matching

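Summary: RL Systems Mind the Gap: Matching Trainer and Generator Throughput. Covers RL training infrastructure, GRPO, PipelineRL, async RL, policy staleness, RL sandbox infrastructure, CPU requirements, TCO analysis, and Thinking Machines Tinker.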

SemiAnalysis@SemiAnalysis_ · June 17 · 54

ALERT: OpenAI's CFO claims their next big training run will happen in Fall 2026 on Vera Rubin but that doesn't add up. Rubin NVL72 clusters likely won't be stable enough by then, and the software stack won't be mature enough to support a true "big training run." Rubin may be ready for production inference and small-scale training experiments, but not frontier-scale training in Fall 2026.

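Summary: Alert: OpenAI's CFO claims the company's next big training run will happen in Fall 2026 on Vera Rubin, but that does not add up. Rubin NVL72 clusters likely will not be stable enough by then, and the software stack will not be mature enough to support a true "big training run". Rubin may be ready for production inference and small-scale training experiments, but not for frontier-scale training in Fall 2026.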

Nathan Lambert@natolambert · June 16 · 53

New podcast with @finbarrtimbers! We survey the latest post-training recipes, from GLM 5.1, Kimi K2.6, DeepSeek V4, Xiaomi MiMo V2.5, Nemotron Ultra, etc. and discuss:
- Why the industry slowly shifted to multi-teacher on-policy distillation (MOPD).
- What an Olmo-style recipe would need improvements in
- How post-training works / suits larger organizational efforts
- Career advice in the foothills of the singularity
- and other topics
I heard y'all wanted me to start doing this, so making some time when I'm in funemployment!
Chapters:
00:00 Introduction & Olmo reflections
06:28 Post-train recipes review (history)
23:00 2026’s model recipes (MiMo Flash, DeepSeek V4, GLM 5, Kimi K2.6, etc.)
39:05 Open-ended post-training discussions
48:22 Career advice in the LLM race
Links below, please follow @interconnectsai and like and subscribe and buy my book?

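Summary: Nathan Lambert recorded a new podcast with Finbarr Timbers surveying the latest post-training recipes from GLM 5.1, Kimi K2.6, DeepSeek V4, Xiaomi MiMo V2.5, Nemotron Ultra, and others. Main topics: why the industry shifted to multi-teacher on-policy distillation (MOPD); where an Olmo-style recipe needs improvement; how post-training suits larger organizational efforts; and career advice in the foothills of the singularity. Chapters cover a historical review, 2026's model recipes (MiMo Flash, DeepSeek V4, GLM 5, Kimi K2.6, etc.), and open-ended post-training discussion.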

François Chollet@fchollet · June 16 · 36

The way we will create a future where powerful AI is open-source and available to all is by making AI radically more efficient, both in terms of inference compute and (more importantly) in terms of training data requirements. This is what symbolic learning will achieve.

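Summary: The way to a future where powerful AI is open-source and available to all is to make AI radically more efficient, both in inference compute and (more importantly) in training data requirements. François Chollet says this is what symbolic learning will achieve.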

Rohan Paul@rohanpaul_ai · June 16 · 58

Pythagoras-Prover just made Lean theorem proving look far less dependent on giant models, with a 4B prover beating DeepSeek-Prover-V2-671B at MiniF2F Pass@32. Shows in formal reasoning, better data geometry can buy back an astonishing amount of scale. A theorem prover is not just a language model writing clever math; it is a machine trying to produce text that survives a compiler with no patience for style, confidence, or almost-right reasoning. The main trick is data efficiency: the team built about 800K Lean-verified examples, trained from easy to hard, then used LoRA so the model learned without updating every parameter.

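Summary: The Pythagoras-Prover team released its smallest theorem prover, a 4B version, along with a first diffusion-model proof of concept, both at only 4B parameters. On MiniF2F, the 4B model beats DeepSeek-Prover-V2-671B with 86.1% Pass@32; the 32B version reaches 89.8% Pass@32 and 92.6% Pass@2024, the current best result. The key is data efficiency: about 800K Lean-verified examples, trained from easy to hard, with LoRA fine-tuning to avoid updating every parameter. The model's context window is 8192 tokens. The models, data, and training pipeline will be open-sourced in stages.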
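Pass@32 figures like those above are commonly computed with the unbiased pass@k estimator: generate n attempts per problem, count the c that verify, and estimate the chance that at least one of k random attempts succeeds. The sketch below shows that estimator; whether the Pythagoras-Prover team computed its numbers this way is not stated in the source, and the per-problem counts are hypothetical.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one problem.

    n: total attempts sampled, c: attempts that succeeded,
    k: budget being evaluated (1 <= k <= n).
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Probability that k attempts drawn without replacement all fail.
    return 1.0 - comb(n - c, k) / comb(n, k)

def benchmark_pass_at_k(results, k):
    """Average pass@k over (n, c) pairs, one pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)

# Hypothetical (attempts, successes) for three problems.
results = [(64, 10), (64, 0), (64, 1)]
print(benchmark_pass_at_k(results, k=32))
```

In a Lean setting, "success" would mean the generated proof compiles, which makes the count c objective and easy to verify.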

Nathan Lambert@natolambert · June 16 · 56

I launched 3 more videos in my post-training course!
1. Lecture 5: The rise of reasoning models
2. Lecture 6: DPO derivation, intuitions, and practice
3. A Q&A from readers on lectures 1-4
rlhfbook dot com slash course
More soon!

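Summary: Nathan Lambert published 3 more videos in his post-training course: Lecture 5 on the rise of reasoning models, Lecture 6 on DPO derivation, intuitions, and practice, and a reader Q&A on lectures 1-4. Available at rlhfbook dot com slash course, with more to come.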

凡人小北@frxiaobei · June 16 · 51

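Seeing YC post about Hub, I suddenly thought of that video that went viral a while back: workers in an Indian factory wearing cameras on their heads while they work. Many people joked at the time that they were training their own AI replacements. Looking at Hub now, that may have been the start of something much larger. All of humanity is becoming data producers for world models. 😂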

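Summary: Hub (@hubxyz), a new project announced by Y Combinator, provides real-world training data for frontier AI labs and robotics. Hub notes that human labor accounts for half of global GDP yet has almost never been recorded; it captures hard-to-access data through a global network of contributors. The main tweet recalls the video of Indian factory workers wearing head-mounted cameras, which people joked was training their own AI replacements, and suggests Hub may be the start of something larger: all of humanity becoming data producers for world models.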

Microsoft Research@MSFTResearch · June 16 · 27

30x faster analytics, GPU kernels generated automatically from SQL, AI matched to lab-grown tumor models for cancer treatment, and LLMs that learn across tasks without retraining. Dive into the latest issue of Research Focus: https://msft.it/6010vcYZ4

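Summary: The latest issue of Research Focus covers 30x faster analytics, GPU kernels generated automatically from SQL, AI matched to lab-grown tumor models for cancer treatment, and LLMs that learn across tasks without retraining: https://msft.it/6010vcYZ4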

Rohan Paul@rohanpaul_ai · June 16 · 54

"You don’t need frontier scale to reach frontier quality" in specialized domains, you need the right expert feedback loop. Heidi says it matched Sonnet 4.6 in clinical search with a much smaller model trained on clinician preferences instead of raw scale. Heidi Evidence is a clinical search tool where doctors ask medical questions and get sourced answers. Here, clinicians were shown the same medical question with 2 anonymous answers, one from Heidi’s smaller model and one from Sonnet 4.6, and they picked Heidi’s answer 49.9% of the time. In medicine specifically, the hard problem is knowing when to search, what to cite, how much to say, and when a vague answer is worse than no answer.

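Summary: Clinical search tool Heidi Evidence says that six weeks ago its in-house small model matched the quality of the frontier-scale Sonnet 4.6 on clinical search. The approach was training on clinician preference feedback instead of simply scaling the model. In a blind test, clinicians saw the same medical question with two anonymous answers and chose the Heidi model's answer 49.9% of the time. Heidi notes that the hard part in medicine is knowing when to search, what to cite, how much to say, and when a vague answer is worse than no answer.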