The Bank of Korea just released a report about AI productivity. Korean workers using generative AI cut task time by 3.8%, about 1.5 hours per week on a 40-hour schedule, yet the link between saved time and more work completed was zero. Only 4.4% of tasks saw time savings above 20%. This creates an AI productivity disconnect, where faster reports can create more reports, faster review can create more review, and saved time can get absorbed by organizational habits instead of becoming higher output.

Nathan Lambert @natolambert · 6 days ago · 43

I get feedback a lot that is like "your book should be the RL for LLMs book" or "the post-training book" and it's definitely true those would sell more copies. The reality is that this book was in many ways a side project, and by the time I realized I agreed with a bit of this I didn't have the time for *another* refactor. At the end of the day, I still dumped as much knowledge as I could from what I was doing into the book, and now the course and the code. In its spirit the book is totally a post-training book. The process to change this would've delayed the book by anywhere from 3 to 15 months. It is simply an amount of time I didn't have with Interconnects, Olmo, and other life necessities. So this isn't to say that I'll never do it. Re-prints and new versions are a common thing. It's doable for me to refactor most of the chapters, re-write the introduction, and make it a post-training centric book. Still, RLHF as a topic deserves a dedicated text and is far from solved. It's a technology that skyrocketed language models to prominence and points to a lot of fundamental problems interfacing the user and the AI. Much of the content that got me to where I am today in my career is by diving into caring about this interface, so I'm happy for it to have the space to live, breathe and thrive. So in reality, I probably could've hot-swapped the title to sell more copies, but it would have made me feel dishonest to do so. For anyone wanting to learn post-training, there's nothing in this book that doesn't apply to you -- post-training is just constantly evolving and growing in complexity. A final nitpick is that RLHF actually matches my more conceptual, intuitive vibe a good amount. Post-training is far more practical, in a data and systems sense, where this is more of a math & intuition book. Anyways, the RLHF "post-training" Book is coming soon and thank you for trusting me with your attention. 🩵

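Summary: Nathan Lambert responds to suggestions that his book "RLHF: Reinforcement Learning from Human Feedback" would sell better if retitled as a "post-training" book. He agrees the content is essentially post-training, but retitling would have required a refactor of 3 to 15 months that he did not have time for. He argues that RLHF is far from solved and deserves a dedicated text; the book leans toward math and intuition, while post-training is more about data and systems. He kept the original title to avoid feeling dishonest and says the RLHF "post-training" book is coming soon.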

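AYi @AYi_AInotes · 6 days ago · 49

A blunt truth: for most people, "learning LLMs" really means learning to use tools someone else built, without ever lifting the hood to look at the engine. The brutal thing about Stanford's CS336 is that it lifts the hood and has you hand-build a complete LLM pipeline from scratch: tokenization, Transformer architecture, GPU optimization, data cleaning, scaling laws, and alignment techniques. Five assignments cover the whole chain; the lectures are only support, and building things yourself is the core. Calling libraries gets you a quick demo, but building by hand gives you systems intuition. Reading a hundred papers on why FlashAttention is fast leaves less of an impression than implementing it once in Triton. Running someone else's training script ten times teaches you less about the essence of scaling than cleaning dirty data yourself once. Many people think this much effort is unnecessary and that knowing how to use the tools is enough, without realizing that every ceiling comes down to insufficient understanding of the lower layers: the better you understand each component, the larger the design space you have at the layers above. Knowledge is never kind. Acquiring truly valuable knowledge always involves frustration and time. The information has long been available to everyone; what is missing is never resources but the discipline to sit down and build it by hand. If you want to take it on, start with Assignment 1 and set aside fifteen hours a week; in three months your understanding of LLMs will be on a different level.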

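Summary: Stanford's CS336 requires students to implement a complete LLM pipeline from scratch, covering tokenization, Transformer architecture, GPU optimization, data cleaning, scaling laws, and alignment techniques. Five assignments span the full chain, emphasizing that building by hand gives more systems intuition than calling libraries; for example, implementing FlashAttention in Triton leaves a deeper impression than reading papers. The course requires no deep prior background; at roughly fifteen hours a week, three months is enough to build a systematic understanding of how LLMs work underneath. Acquiring knowledge involves frustration, but execution is what sets people apart.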

Ethan Mollick @emollick · 6 days ago · 46

Finally, AI finds its ultimate uncontroversial use. A diffusion model trained on burger recipes "discovers the classic Big Mac without explicit supervision and generates novel burgers optimized for deliciousness, sustainability, or nutrition." ASI= automated slider intelligence

AK @_akhaliq · 6 days ago · 28

DanceOPD On-Policy Generative Field Distillation

Microsoft Research @MSFTResearch · 6 days ago · 41

Following up with the social copy I’ve drafted: What do people actually do with AI at work? A new analysis of five million M365 Copilot conversations has answers. Scott Counts breaks it down in a new video. And dive into the analysis here: https://msft.it/6015vUHsh

SenseTime @SenseTime_AI · 6 days ago · 60

SenseNova U1 training code is open-sourced: full training stack, inspectable, modifiable, rebuildable. Also released: a smoke-test dataset spanning all 7 task types: t2i · it2i · it2i (multi-img) · interleave_gen · multimodal understanding · video understanding · pure language continuation. Use it to: 🔹 Bring your own data in this schema to fine-tune U1 into a specialist 🔹 Validate your data against the official schema 🔹 Smoke-test your pipeline end-to-end 🤗 https://huggingface.co/datasets/sensenova/SenseNova-U1-Training-Sample 🛠️ https://github.com/OpenSenseNova/SenseNova-U1 Sample previews demonstrating the diverse task coverage included in our open-source smoke-test dataset. 👇

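Summary: SenseTime has open-sourced the complete SenseNova U1 training code, a full training stack that can be inspected, modified, and rebuilt. It also released a smoke-test dataset covering 7 task types: t2i, it2i, multi-image input, interleaved generation, multimodal understanding, video understanding, and pure language continuation. Users can fine-tune U1 on their own data in this schema, validate their data format, or test their pipeline end to end. The dataset is on Hugging Face and the code is hosted on GitHub.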

François Chollet @fchollet · 6 days ago · 48

Autonomy isn't the ability to act without human supervision. It's the ability to *learn* without human bottlenecks in the process. A system that is fully dependent on human training data and RL environments is only an imprint of human knowledge.

Alibaba Cloud @alibaba_cloud · 6 days ago · 40

At Flink Forward Asia Shenzhen 2026, Feng Wang, Researcher and Head of Open Data Platform at Alibaba Cloud & Vice Chair of the Alibaba Open Source Committee, highlighted the evolving data foundation for AI: "In the AI era, models and data together determine the quality and efficiency of Agents. Apache Flink evolves into Agentic Streaming for AI, working alongside Agentic Lake to build AI-native data platform." The next generation of intelligent agents is built on a unified, AI-native data infrastructure designed for real-time agentic workflows. #AlibabaCloud #ApacheFlink #ApachePaimon #ApacheFluss #DataAI #AI #Agent #RealTimeData

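Summary: At Flink Forward Asia 2026 in Shenzhen, Feng Wang, Researcher and Head of Open Data Platform at Alibaba Cloud, said that in the AI era models and data together determine the quality and efficiency of Agents. Apache Flink is evolving into Agentic Streaming for AI, working alongside Agentic Lake to build an AI-native data platform. The next generation of agents is built on unified, real-time, AI-native data infrastructure.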

Alibaba Cloud @alibaba_cloud · 6 days ago · 46

At Flink Forward Asia Shenzhen 2026, Feifei Li, CTO of Alibaba Cloud and President of International Business, shared his perspective on the future of AI: "As the agent era takes off, one concept will dominate—Data Gravity. AI must tackle complex work and, more importantly, create tangible value in real enterprise workflows." AI isn't just about smarter models—it's about solving complex enterprise challenges and delivering real business value. #AlibabaCloud #ApacheFlink #ApachePaimon #ApacheFluss #DataAI #AI #Agent #RealTimeData

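Summary: At Flink Forward Asia 2026 in Shenzhen, Feifei Li, CTO of Alibaba Cloud and President of International Business, shared his view of AI's future: as the agent era takes off, "Data Gravity" will become the dominant concept. AI must not only handle complex work but also create tangible value in real enterprise workflows, solving complex enterprise challenges and delivering real business results.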

Rohan Paul @rohanpaul_ai · 6 days ago · 43

GLM 5.2 just took the top spot on PostTrainBench by scoring 34.29%. PostTrainBench tests whether an AI agent can take a raw LLM and make it better by actually training it, not by answering the benchmark questions itself. The agent gets 4 small base models, 1 H100 GPU, and 10h, then it must choose training data, write training code, run experiments, fix broken runs, and submit improved versions of those models. So in this case, GLM 5.2 was the agent model controlling the training process, so PostTrainBench did not score GLM 5.2’s own answers; it scored whether GLM 5.2 could take 4 weaker LLMs and improve them within 10h on 1 H100. The gap to official instruct models, which score 51.14%, still shows how far agents are from mature post-training pipelines built with more data, compute, and human tuning. GLM 5.2’s job was to write training code, pick or make training data, run fine-tuning, fix failed runs, and submit the newly trained models for testing.

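Summary: GLM 5.2 ranks first on PostTrainBench with a score of 34.29%. The benchmark tests whether an AI agent can actually train and improve raw LLMs: the agent gets 4 small base models, 1 H100 GPU, and 10 hours, and must choose training data, write training code, run fine-tuning, fix failures, and submit the improved models on its own. GLM 5.2 acts as the agent controlling the training process, and the benchmark scores whether it can improve the 4 weaker LLMs under those constraints. Official instruct models currently score 51.14%, showing that agent-run post-training still lags more mature, human-tuned pipelines.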

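凡人小北 @frxiaobei · 6 days ago · 20

As large-model users grow in number and activity, compute centers in Inner Mongolia, Ningxia, and Xinjiang consume large amounts of water, leading to increased rainfall in northern and northwestern China.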

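Summary: As large-model users grow, compute centers in Inner Mongolia, Ningxia, and Xinjiang consume large amounts of water, leading to increased rainfall in northern and northwestern China. According to the cited figures, AI compute consumes 23 billion cubic meters of fresh water per year, and asking Doubao AI just 10 questions uses about 500 ml of water.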

Nathan Lambert @natolambert · 6 days ago · 47

Is what happens when the world becomes AGI pilled then both the leading lab and the government tell you you need to bow down if you want access to their models. I feel it too. More of the last few weeks giving people the words to explain how they felt for months.

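Summary: Nathan Lambert comments that once the world is convinced about AGI, the leading lab and the government start requiring users to "bow down" to get access to their models. He has noticed a clear shift over the past few weeks: many large enterprises are looking to secure compute and to post-train internally on GLM-5.2. The trend suggests open models are winning enterprise trust, and people are starting to understand how open source wins.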

MiniMax (official) @MiniMax_AI · 7 days ago · 33

we’ll be at AI Engineer After Dark on July 1st with @vercel , @merge_api, @FactoryAI , and a room full of people building the AI engineering stack. our Research Lead, RL Training @olive_jy_song will be giving a lightning talk on post-training MiniMax M3 as part of a lineup featuring @browserbase , @neondatabase , @FireworksAI_HQ , @ExaAILabs , @sentry , @SurrealDB, and more. see you after dark at SFMOMA.

elvis @omarsar0 · 7 days ago · 41

New research from Meta. Building synthetic training data has stayed a fixed pipeline that you hand-tune and then freeze. Autodata casts an AI agent as a data scientist that builds training and evaluation data, with an implementation called Agentic Self-Instruct that extends classic Self-Instruct with agentic planning and tool use. Think of it as meta-optimization, where the data scientist agent is itself trained to produce stronger data, so the pipeline keeps improving instead of staying static. Across computer science research, legal reasoning, and reasoning over mathematical objects, it beats classical synthetic-data methods, and meta-optimizing the agent delivers an even larger uplift. Paper: https://arxiv.org/abs/2606.25996 Learn to build effective AI agents in our academy: https://academy.dair.ai/

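Summary: Meta has released new research, Autodata, which proposes the Agentic Self-Instruct method. It treats an AI agent as a data scientist that uses agentic planning and tool use to replace the traditional synthetic-data pipeline that is hand-tuned and then frozen. The agent itself can keep improving through meta-optimization and so produce stronger training data. In experiments across computer science, legal reasoning, and reasoning over mathematical objects, it beats classical synthetic-data methods, and meta-optimization brings a further gain. The paper is on arXiv.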

Rohan Paul @rohanpaul_ai · 7 days ago · 49

AI CXOs are the next enterprise software war. Fairgen just launched "AI Chief Insights Officer", an AI research system that turns real consumer data into queryable simulated respondents for super fast brand, product, pricing, and ad decisions. Fairgen thinks that many early decisions do not need a full $5K to $200K study if teams can get a reliable directional signal in 20 minutes. So Fairgen Twin is meant to behave like one real respondent, built from interviews, surveys, transactions, reports, and panel data rather than from a generic chatbot persona. Fairgen’s answer is daily fresh data collection, plus a 6-dimension quality gate checking logic, fidelity, tone, plausibility, engagement, and numerical coherence.

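Summary: Fairgen has launched "AI Chief Insights Officer". Its core product, Fairgen Twin, builds a 1:1 digital twin for each consumer from 100,000 real interviews per month. Users can filter for specific populations and, within 20 minutes, run pricing, concept, and ad tests and generate a full analysis report, replacing traditional studies that cost $5K to $200K. Data sources include interviews, surveys, transactions, reports, and panel data rather than generic personas. A 6-dimension quality gate (logic, fidelity, tone, plausibility, engagement, numerical coherence) checks output quality. Brands such as L'Oréal and T-Mobile have used it for four years, and it is now open to the public.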

Lilian Weng @lilianweng · 7 days ago · 44

A super long overdue (3+ years?) post on scaling laws. Compute is expensive. Scaling laws are a way to help us reason about the optimal compute allocation between data and model size before committing to a large run. The post covers what scaling laws predict, how compute-optimal allocation works, why Kaplan et al. and Chinchilla disagree, and how data limits + fitting details make extrapolation tricky. https://lilianweng.github.io/posts/2026-06-24-scaling-laws/
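
The allocation question in the post can be sketched with a few lines of arithmetic. This is a rough illustration only, assuming the common approximation that training costs about 6·N·D FLOPs for N parameters and D tokens, and the Chinchilla rule of thumb of roughly 20 tokens per parameter. Neither number comes from the linked post, and the function name is made up for this example.

```python
import math


def compute_optimal_split(flops_budget: float, tokens_per_param: float = 20.0):
    """Split a FLOP budget into (params, tokens) under C ~= 6 * N * D.

    Assumptions (not from the post): the 6*N*D cost approximation and a
    fixed tokens-per-parameter ratio (about 20 in the Chinchilla paper).
    """
    # With D = r * N the budget is C = 6 * r * N^2, so N = sqrt(C / (6 * r)).
    n_params = math.sqrt(flops_budget / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    for budget in (1e21, 1e23, 1e25):
        n, d = compute_optimal_split(budget)
        print(f"C={budget:.0e} FLOPs -> N~{n:.2e} params, D~{d:.2e} tokens")
```

Under these assumptions both N and D grow with the square root of compute, so a 100x larger budget buys a model about 10x larger trained on about 10x more tokens. Kaplan et al. fit a different exponent, which is the disagreement the post discusses.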

swyx 🔜 @aiDotEngineer @swyx · 7 days ago · 47

on their @latentspacepod we covered how @pimdewitte accidentally made the PERFECT world model data collection business by collecting the world's largest dataset of trainable (video,action) pairs. turning the attention economy into the attention industry. congrats Pim!! link in reply

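Summary: On the Latent Space podcast, PimDeWitte described how his company accidentally built the world's largest dataset of trainable (video, action) pairs, turning the attention economy into the attention industry. It also announced today a $320 million Series A at a $2.3 billion valuation, led by Khoslaventures, with participation from General Catalyst, Jeff Bezos, Eric Schmidt, Nico Rosberg, and researchers from frontier labs and academia.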

Rohan Paul @rohanpaul_ai · 7 days ago · 48

A firm's judgment does not live in its archives; it lives in the changes (diffs) senior people make before work ships. Farsight calls this a system of judgment, i.e., software that preserves those edits across real work and can turn repeated decisions into measurable rules. The next enterprise AI moat is not stored knowledge, but stored judgment. If AI is going to learn the work of professional firms, it cannot learn only from polished outputs. AI needs the gap between first draft and final draft, because that gap contains the firm's private standard of what good means.

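Summary: Citing @TangriKunal, Rohan Paul notes that institutional knowledge has long relied on document indexing, but documents are only the output of judgment. The judgment itself lives in the changes (diffs) senior staff make before work ships, and most firms throw those traces away. Farsight calls this a "System of Judgment": software that preserves the edits made during real work and turns repeated decisions into measurable rules. Paul argues that the next enterprise AI moat is not stored knowledge but stored judgment; AI needs to learn from the gap between first draft and final draft, because that is where a firm's standard of "good" lives.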

Chubby♨️ @kimmonismus · 7 days ago · 57

Apple is not the only hardware company getting hit by AI-driven component inflation. ASUS has warned of price hikes as DRAM and SSD costs surge. Dell and HP are under margin pressure as AI PCs require more memory while memory prices rise. Alibaba Cloud raised prices by up to 34%, citing global AI demand and higher costs for GPUs, CPUs, memory and storage. The trend seems to be reflected everywhere. And there's no end in sight.

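Summary: Apple is not the only hardware company hit by AI-driven component inflation. ASUS has warned of price hikes as DRAM and SSD costs surge; Dell and HP face margin pressure (AI PCs need more memory, and memory prices are rising). Alibaba Cloud raised prices by up to 34%, citing global AI demand pushing up the cost of GPUs, CPUs, memory, and storage. Apple had already raised prices sharply, with a fully loaded 16-inch MacBook Pro now selling for $9,999. The trend is widespread, with no end in sight.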

Rohan Paul @rohanpaul_ai · 7 days ago · 47

Very important Meta paper brings Autodata, an agentic data scientist to create high quality synthetic data. The main result is that agent-made data usually trained models better than standard synthetic data, and in legal tasks a trained 4B model beat a much larger 397B baseline. Treats synthetic data generation as a job for an agentic data scientist, not a prompt template. "Agentic Self-Instruct" makes AI agents generate and meta-optimize synthetic training and evaluation data, improving performance over classical synthetic data methods across CS, legal, and math benchmarks. Autodata's loop is simple: generate an example, let a weak model and a strong model try it, judge the results, then revise the recipe until the example sits in the useful zone. This is the best idea in the paper: difficulty is not a virtue by itself. A task should not just be "hard"; it should be hard in a way that teaches the weaker model something. If the weak model always gets it right, there is nothing to learn; if it always gets zero, there is also nothing to learn. --- The direction feels important because it reframes synthetic data from bulk imitation into curriculum design. The next frontier may not be models writing more examples, but models learning what makes an example worth learning from. ---- Link – arxiv.org/abs/2606.25996v1 Title: "Autodata: An agentic data scientist to create high quality synthetic data"

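Summary: Meta proposes Autodata, which treats synthetic data generation as a job for an agentic data scientist. The core method, "Agentic Self-Instruct", has AI agents generate and meta-optimize synthetic training and evaluation data. The loop: generate an example, let a weak model and a strong model each attempt it, judge the results, then revise the recipe until the example sits in the useful zone. The paper stresses that difficulty is not a virtue in itself; examples should target what the weak model can learn. Key result: on legal tasks, a trained 4B model beat a much larger 397B baseline.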
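
The generate, attempt, judge, revise loop described above can be sketched as a toy. Everything here is hypothetical: the two pass-rate functions stand in for sampling real models, "difficulty" stands in for whatever recipe the agent revises, and the acceptance rule is one plausible reading of "useful zone", not the paper's actual criterion.

```python
def weak_pass_rate(difficulty: int) -> float:
    # Hypothetical stand-in for sampling a weak model several times.
    return max(0.0, 1.0 - difficulty / 5)


def strong_pass_rate(difficulty: int) -> float:
    # Hypothetical stand-in for a stronger reference model.
    return max(0.0, 1.0 - difficulty / 10)


def in_useful_zone(weak: float, strong: float) -> bool:
    # Useful: the weak model sometimes fails and sometimes succeeds, and the
    # strong model does at least as well, so the task looks solvable.
    return 0.0 < weak < 1.0 and strong >= weak


def refine(difficulty: int, max_rounds: int = 20):
    """Revise a task's difficulty until it lands in the useful zone."""
    for _ in range(max_rounds):
        weak = weak_pass_rate(difficulty)
        strong = strong_pass_rate(difficulty)
        if in_useful_zone(weak, strong):
            return difficulty
        # Too easy (weak model always right) -> make it harder;
        # otherwise the weak model never succeeds -> make it easier.
        difficulty += 1 if weak >= 1.0 else -1
    return None  # gave up within the round budget


if __name__ == "__main__":
    for start in (0, 3, 9):
        print(f"start difficulty {start} -> accepted difficulty {refine(start)}")
```

The point of the sketch is the stopping rule: a task is kept only when the weak model's pass rate is strictly between 0 and 1, which matches the post's remark that always-right and always-zero tasks both teach nothing.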

Chubby♨️ @kimmonismus · 7 days ago · 52

This is an interesting update that Anthropic published in their official letter. But in short: they apparently haven't managed to stop the distillation. It's continuing almost seamlessly, just like before.

Emad @EMostaque · 7 days ago · 39

What is the difference between distilling Mythos & paying experts somewhere like @mercor_ai to generate training samples? Who are the Chinese equivalents of @HelloSurgeAI & co? If you understand this you'll understand the likely dynamics of frontier AI going forward.

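Summary: Emad Mostaque discusses where frontier AI is heading: what is the difference between distilling Mythos and paying experts to generate training samples, and who are the Chinese counterparts? The quoted post says that distilling from Opus 4.8 can reach Mythos level, which means GLM 5.3 could do the same independently of Anthropic. Once Huawei's token factories come online, things will get interesting.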

Rohan Paul @rohanpaul_ai · 7 days ago · 49

Great Stanford + MIT + Harvard + Anthropic paper. Gives a clear training-based reason for why larger models learn abilities smaller models miss. Says bigger AI models learn rare skills because they forget them less during training, their extra space protects weak learning signals. The authors say the issue is not just whether a small model could represent the task, but whether training lets it keep that task while many common tasks keep pushing on the same limited parts. Their core idea is that common tasks take up the model's neurons first, so rare tasks get overwritten before they appear often enough to build into stable knowledge. In a crowded data mixture, common patterns get first claim on the model's internal machinery. Small models may briefly pick up a rare signal, but the next wave of common-task updates overwrites it before the signal appears again. They tested this first with controlled toy tasks where they could change how rare and complex each task was, then with OLMo language models from 4M to 4B parameters. The main result is that bigger models learned low-frequency tasks much better, kept more task features inside their representations, and showed less gradient interference, which means common-task updates disturbed rare-task learning less. Larger models can remember weak rare signals long enough to turn them into real learned skills. ---- Link – arxiv.org/abs/2605.29548 Title: "Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention"

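Summary: A joint paper from Stanford, MIT, Harvard, and Anthropic gives a training-based explanation for why larger models are more capable: they forget less, and their extra capacity protects weak learning signals. Common tasks claim neurons first, and rare tasks get overwritten before they appear often enough. Small models may briefly pick up a rare signal, which is then overwritten by common-task updates. Experiments with OLMo models (4M to 4B parameters) show that larger models learn low-frequency tasks better, retain more task features, and have less gradient interference.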
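
One common way to put a number on "gradient interference" is the cosine similarity between the gradients of two tasks. The toy below is not the paper's measurement; it only illustrates the geometric intuition that random directions overlap less as dimensionality grows, using random vectors as hypothetical stand-ins for a common-task and a rare-task gradient.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def mean_abs_overlap(dim: int, trials: int = 200, seed: int = 0) -> float:
    """Average |cosine| between two random 'task gradient' directions."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        g_common = [rng.gauss(0, 1) for _ in range(dim)]  # stand-in gradient
        g_rare = [rng.gauss(0, 1) for _ in range(dim)]    # stand-in gradient
        total += abs(cosine(g_common, g_rare))
    return total / trials


if __name__ == "__main__":
    for dim in (4, 64, 1024):
        print(f"dim={dim:5d}  mean |cos| = {mean_abs_overlap(dim):.3f}")
```

In this toy the overlap shrinks roughly like one over the square root of the dimension, so an update along one direction disturbs the other less in a wider space. Real gradients are not random, which is why the paper measures interference on trained models rather than assuming it.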

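meng shao @shao__meng · 7 days ago · 59

Anthropic accuses Alibaba's Qwen of using 25,000 disguised accounts to distill Claude. That seems to be more than the earlier accusations against DeepSeek, MiniMax, and Kimi combined. Got it, spread the word: Qwen 3.8 is worth looking forward to 😂 Anthropic's distillation-account reports are a kind of benchmark too...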

Alibaba Cloud @alibaba_cloud · 7 days ago · 28

⏳ 1 Day to Go! Flink Forward Asia 2026 — Main Forum Agenda Revealed! FFA2026 brings together world-class tech leaders to showcase the full picture of real-time data intelligence — from Alibaba Cloud's Agent-native to AI-native evolution, and from automotive to embodied AI industry scenarios. 📅 June 26–27 | InterContinental Shenzhen OCT, Shenzhen ⚡ Limited seats available — don't miss out! https://hd.aliyun.com/form/8369 📝 Note: All sessions will be conducted in Mandarin Chinese. #FlinkForwardAsia #FFA2026 #ApacheFlink #RealTimeData #DataAI #AIAgent #AgenticAI #StreamingData #OpenSource #AlibabaCloud #EmbodiedAI #DataEngineering

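Summary: With one day to go, Alibaba Cloud has published the main forum agenda for Flink Forward Asia 2026. The conference focuses on real-time data intelligence, presenting Alibaba Cloud's evolution from Agent-native to AI-native and covering industry scenarios such as automotive and embodied AI. It takes place June 26-27 at the InterContinental Shenzhen OCT, and all sessions are in Chinese. Seats are limited and registration is required.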

Deedy @deedydas · 7 days ago · 51

We learn 3 things from this: 1. All AI models write extremely differently from humans 2. AI models write in very different ways from each other 3. "Humanized" AI text is distinguishable from both Coolest interpretability result in AI I've read today.

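Summary: An interpretability study finds that Pangram learns, in its internal representations, to tell apart the writing styles of Claude, ChatGPT, and Gemini even though it was never trained to do so. The signal grows stronger through the network, and a simple linear probe reaches 91% accuracy. The main post draws three conclusions: all AI models write very differently from humans; AI models write very differently from each other; and "humanized" AI text can still be distinguished.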

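宝玉 @dotey · 7 days ago · 72

Anthropic today formally wrote to the US Senate Banking Committee and the White House, accusing Alibaba's Tongyi Qianwen (Qwen) AI lab of mounting the largest distillation attack on Claude to date. According to the letter as obtained by CNBC and Reuters, Qwen-affiliated parties carried out more than 28.8 million interactions with Claude through roughly 25,000 fake accounts between April 22 and June 5. The target was clear: Claude's core software engineering and agentic reasoning capabilities. The number only looks as extreme as it is in context. In February this year, Anthropic publicly named DeepSeek, MiniMax, and Moonshot AI, saying they had generated a total of 16 million interactions using about 24,000 fake accounts. Alibaba alone accounts for nearly twice the volume of those three combined. A distillation attack (adversarial distillation), put simply, means using someone else's top model as a teacher: feeding it questions at scale, collecting the answers, and training your own model on them. The benefit is skipping millions or even billions of dollars in independent R&D costs and quickly approaching a rival's capability level. Claude is not available in China, so these operations themselves violate Anthropic's terms of service and regional restrictions. Anthropic wrote in the letter that these distillation attacks are systematic and industrial in scale, aimed at harvesting US AI capabilities and repackaging them as their own products. Alibaba declined to comment. The timing is delicate. Anthropic's relationship with the Trump administration is currently poor. On June 12, the Commerce Department, citing national security, ordered Anthropic to stop providing access to its latest Fable 5 and Mythos 5 models to all foreign nationals, including Anthropic's own foreign employees. Anthropic had to shut down both models worldwide, and they have not yet been restored. Anthropic has publicly disagreed with the decision, arguing that a "narrow potential jailbreak vulnerability" should not be grounds for recalling commercial models already deployed to hundreds of millions of users. So Anthropic is in an awkward position: telling Washington that Chinese companies are stealing its technology and asking for help, while arguing with the same government that it should not restrict Anthropic's models. Congress, however, has bipartisan agreement on this. Senators Bill Hagerty (Republican) and Andy Kim (Democrat) plan to propose an amendment to the must-pass defense authorization bill that would sanction or blacklist Chinese companies found to have illicitly obtained outputs from US AI models. Anthropic, OpenAI, and Google have also joined forces to share intelligence on distillation attacks. Anthropic is currently valued at $965 billion and confidentially filed for an IPO this month; it could list as early as this fall. For a company about to go public, distillation attacks are a real commercial threat: Chinese competitors replicate a comparable product at very low cost, directly eroding its market.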

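Summary: Anthropic wrote to the US Senate Banking Committee and the White House, accusing parties affiliated with Alibaba's Tongyi Qianwen (Qwen) of carrying out a distillation attack between April 22 and June 5, generating more than 28.8 million interactions with Claude through about 25,000 fake accounts and targeting software engineering and agentic reasoning capabilities. In February, Anthropic had named DeepSeek, MiniMax, and Moonshot AI for a combined 16 million interactions. Meanwhile, the US Commerce Department, citing national security, has restricted access to its Fable 5 and Mythos 5 models for foreign nationals. A bipartisan group in Congress plans an amendment to the defense authorization bill to sanction Chinese companies that illicitly obtain US AI model outputs. Anthropic is valued at $965 billion and has confidentially filed for an IPO.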
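
The "nearly twice" comparison and the scale per account can be checked with a few lines of arithmetic. The figures are the approximate ones reported in the post; the year 2026 is taken from another post in this feed, and the variable names are only for illustration.

```python
from datetime import date

# Approximate figures as reported in the post above.
qwen = {"accounts": 25_000, "exchanges": 28_800_000}
earlier_three = {"accounts": 24_000, "exchanges": 16_000_000}


def per_account(campaign: dict) -> float:
    """Average number of exchanges per fake account."""
    return campaign["exchanges"] / campaign["accounts"]


# Inclusive day count for April 22 to June 5 (year assumed to be 2026).
days = (date(2026, 6, 5) - date(2026, 4, 22)).days + 1

volume_ratio = qwen["exchanges"] / earlier_three["exchanges"]
per_day = qwen["exchanges"] / days

if __name__ == "__main__":
    print(f"per account: {per_account(qwen):.0f} vs {per_account(earlier_three):.0f}")
    print(f"volume ratio: {volume_ratio:.2f}x")
    print(f"days: {days}, exchanges per day: {per_day:,.0f}")
```

On these numbers the alleged campaign ran at 1.8 times the combined volume of the three earlier cases, with a similar number of accounts each used far more heavily.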

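Berryxia.AI @berryxia · 7 days ago · 68

LOL! Anthropic urgently filed a complaint with the White House, saying Alibaba is frantically distilling its models. 1️⃣ Created nearly 25,000 fake accounts 2️⃣ Held 28.8 million conversations with Claude 3️⃣ Between April 22 and June 5, 2026. This kind of distillation means training your own model on the answers of a competitor's model. Anthropic has blocked Claude in China. Alibaba still found a way around it. That said, Anthropic, if you don't distill other models and other people's data, where did your Chinese data come from, hahaha 😂 It really is the crying baby who gets the milk! Every day it's just blocking this and blocking that.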

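Summary: Anthropic filed a complaint with the White House accusing Alibaba of creating nearly 25,000 fake accounts and holding about 28.8 million conversations with Claude between April 22 and June 5, 2026, to extract model capabilities for knowledge distillation (training one's own model on a competitor model's outputs). Anthropic has blocked Claude in China, but Alibaba still found a way around the block. The post also questions where Anthropic's own training data comes from.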

Chubby♨️ @kimmonismus · June 25 · 68

Anthropic claims: Alibaba continues to distill Claude on a large scale to train Qwen. Via Bloomberg. Anthropic is accusing Alibaba-linked operators of running a massive campaign to illicitly access Claude through nearly 25,000 fraudulent accounts. According to Bloomberg, Anthropic claims the campaign generated 28.8 million Claude exchanges between April and June, targeting capabilities like software engineering and agentic reasoning. The company says this is part of a broader pattern of "adversarial distillation," where Chinese labs allegedly harvest outputs from US frontier models to train rival systems at a fraction of the cost. Let's see how good Qwen 3.8 will be, probably FABLEous good.

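Summary: Via Bloomberg, Anthropic alleges that Alibaba-linked operators used nearly 25,000 fraudulent accounts to illicitly access Claude, generating 28.8 million Claude exchanges between April and June and targeting software engineering and agentic reasoning capabilities. Anthropic says this is part of a pattern of "adversarial distillation", in which Chinese labs allegedly harvest outputs from US frontier models at very low cost to train rival systems. The accusation points directly at the training sources of the Qwen model family.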

swyx 🔜 @aiDotEngineer @swyx · June 25 · 52

LOTS of alpha in this pod: - Why Databricks beat Snowflake (! a straight answer!) - Why everyone is building a metaharness now - Why the @neondatabase made so much sense (so much @nikitabase glazing its not even funny) - How LTAP solves the HTAP dream I discussed with @ankrgyl in our @braintrust pod - What happened to @MosaicML + DBRX - How to maintain research/startup culture in a $175B megacorp - What's more important knowledge/experience in the race to the agent cloud: databases, operating systems, or.... networking! very honored to be invited to @Data_AI_Summit to interview two of the top people in our industry and somehow be able to jam on everything from the @bennstancil modern data stack theme to @alighodsi's amazing keynote aura

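Summary: At Data+AI Summit, swyx interviewed Databricks co-founders Matei Zaharia and Reynold Xin. Highlights: why Databricks beat Snowflake; why the industry is building "metaharnesses" (shared agent frameworks); how LTAP and Lakebase rethink the split between operational and analytical databases to deliver the HTAP dream; how Omnigent provides a unified framework for coding agents and custom agents; why agent security needs contextual policies and spending controls; what happened to MosaicML and DBRX; how to maintain a research/startup culture in a $175 billion company; and the relative importance of databases, operating systems, and networking in the race to the agent cloud. Core takeaway: future software only needs to make the data ready, with agents sitting on top.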

Rohan Paul @rohanpaul_ai · June 25 · 48

This is so cool. Tetsuwan is building a browser-based cloud biology lab. Because, AI can now generate hypotheses faster than wet labs can test them - So with Tetsuwan, the researcher uploads or describes a protocol, adds variables like samples, volumes, concentrations, treatments, and instrument settings, and ResearchOS turns that into an editable experiment specification. - That specification is then compiled into a robot-executable automation script; Tetsuwan also says its PDL and VDL languages capture the procedure and variable context, while Ariadne turns the finalized syntax into robot instructions. - Before the run, the user can review and simulate the protocol, then the cloud lab executes it without the researcher physically entering the lab. Software became powerful because instructions could be copied, inspected, rerun, and improved without depending on the hand of the person executing them. Wet-lab biology has never fully had that luxury, but now things are changing.

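Summary: AI now generates hypotheses faster than wet labs can validate them. Tetsuwan has built a browser-based cloud biology platform: researchers upload or describe a protocol and add variables such as samples, volumes, concentrations, treatments, and instrument settings; ResearchOS turns this into an editable experiment specification, which is compiled into a robot-executable script (the PDL and VDL languages capture the procedure and variable context, and Ariadne produces the robot instructions). Users can review and simulate remotely, and the cloud lab then runs the experiment automatically with no need to enter a physical lab. The platform has been validated in a two-year pilot and will launch its first service later this year, focused on functional screening for protein design.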

Rohan Paul @rohanpaul_ai · June 25 · 70

Startupfortune: Big Tech has shed $2.7T in market value this month. The main concern was about capex. The big labs are projected to spend around $725B on capital expenditure in 2026, a 77% jump from $410B last year. Goldman Sachs goes further, reporting that it expects AI spending by those labs to hit $5.3T by 2030. --- startupfortune.com/big-tech-has-lost-27-trillion-in-market-value-this-month-as-investors-question-whether-ai-can-pay-for-itself/

OpenAI @OpenAI · June 24 · 63

We’ve designed and built our first AI chip: Jalapeño. Designed from the ground up by OpenAI and brought to production with @Broadcom, Jalapeño is purpose-built for the LLM workloads powering ChatGPT, Codex, the API, and future agentic products. Chips are foundational to the AI economy. Building our own expands our full-stack platform from products to models to infrastructure, and will help us scale intelligence, serve more people, and expand access to AI.

Ant Ling @AntLingAGI · June 24 · 41

Great breakdown from Qian. In our recent UFP4 paper, we show that a uniform-grid FP4 recipe achieves lower BF16-relative loss degradation than strong E2M1 baselines across Dense 1.5B, MoE 7.9B, and MoE 124B long-run pretraining. Full paper: https://arxiv.org/abs/2606.20381

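Summary: Ant Ling has published the UFP4 paper, which proposes a uniform-grid FP4 training recipe. Across Dense 1.5B, MoE 7.9B, and MoE 124B long-run pretraining, the recipe shows lower BF16-relative loss degradation than strong E2M1 baselines. The paper argues that with fine-grained scaling and RHT, the bottleneck in FP4 training shifts from dynamic range to local resolution; E1M2/INT4 formats make better use of the bucket allocation that RHT improves, whereas E2M1 can make RHT harmful. Paper: https://arxiv.org/abs/2606.20381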

Ant Ling @AntLingAGI · June 24 · 53

We recently released a paper showing that UFP4, our uniform-grid FP4 training recipe, stays closer to BF16 than strong E2M1 baselines across Dense 1.5B, MoE 7.9B, and MoE 124B long-run pretraining. The key insight: FP4 training quality is not only about bit width, but also grid geometry.
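
The "grid geometry" point can be made concrete by comparing two 4-bit grids that span the same range. The E2M1 magnitudes below are the standard FP4 values; the uniform grid is only an assumed stand-in with evenly spaced levels, since the post does not describe UFP4's actual grid or scaling. Function names are illustrative.

```python
import math

# Standard FP4 E2M1 magnitudes (a sign bit is stored separately).
E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
# Assumed uniform grid with the same number of levels and the same maximum.
UNIFORM = [i * 6.0 / 7.0 for i in range(8)]


def quantize(x: float, grid) -> float:
    """Round x to the nearest grid magnitude, keeping its sign."""
    mag = min(grid, key=lambda g: abs(abs(x) - g))
    return math.copysign(mag, x)


def max_gap(grid) -> float:
    """Largest spacing between neighbouring grid points."""
    return max(b - a for a, b in zip(grid, grid[1:]))


def mse(values, grid) -> float:
    """Mean squared quantization error of values on a grid."""
    return sum((v - quantize(v, grid)) ** 2 for v in values) / len(values)


if __name__ == "__main__":
    near_top = [4.5, 5.0, 5.5]
    print("max gap  E2M1:", max_gap(E2M1), " uniform:", round(max_gap(UNIFORM), 3))
    print("MSE near the top of the range  E2M1:", mse(near_top, E2M1),
          " uniform:", round(mse(near_top, UNIFORM), 4))
```

E2M1 spends its levels near zero and leaves a gap of 2 between 4 and 6, while the uniform grid trades some resolution near zero for even spacing everywhere. Which trade wins depends on how the scaled values are distributed, which is the question the paper studies.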

Rohan Paul @rohanpaul_ai · June 24 · 46

New Microsoft paper argues that transformers generalize better when they learn compact internal states, not just next tokens. The problem is that normal transformers can look back at every earlier token, so they do not have to squeeze the past into a clean summary. Token prediction alone can reward shortcuts that do not become coherent world models. That can work beautifully on familiar data and still fail when the model has to plan, detour, reason, or carry a hidden structure forward. NextLat fixes this by adding a training task where the model must predict its next hidden state, not just the next word. A hidden state is the model's private summary of what it has seen, so predicting the next one pushes the model to learn how situations change over time. The authors tested this on map-like world modeling, math reasoning, graph planning, story prediction, and regular language modeling. The main result is that NextLat often learned more compact and useful internal states, solved planning tasks better, and sped up generation by up to 3.3x. Overall, it gives transformers some of the useful memory behavior of recurrent models without changing the transformer architecture or slowing normal inference. ---- Link – arxiv.org/abs/2511.05963 Title: "Next-Latent Prediction Transformers Learn Compact World Models"

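Summary: A new Microsoft paper, Next-Latent Prediction (NextLat), proposes a self-supervised method that adds a next-hidden-state prediction task on top of regular token prediction, pushing Transformers to learn compact internal world models. It performs better on map-like world modeling, math reasoning, graph planning, and story prediction, speeds up generation by up to 3.3x through self-speculative decoding, and requires no change to the Transformer architecture and no slowdown of normal inference.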
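
The idea of adding a next-hidden-state objective to ordinary token prediction can be sketched as a combined loss. This is a minimal illustration, not the paper's formulation: the squared-error latent term, the `latent_weight` parameter, and all function names are assumptions made for this example.

```python
import math


def cross_entropy(logits, target: int) -> float:
    """Next-token loss: negative log-softmax of the target logit."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]


def latent_mse(pred_next_hidden, next_hidden) -> float:
    """Assumed latent loss: mean squared error between hidden vectors."""
    return sum((p - h) ** 2 for p, h in zip(pred_next_hidden, next_hidden)) / len(next_hidden)


def combined_loss(logits, target, pred_next_hidden, next_hidden,
                  latent_weight: float = 1.0) -> float:
    # In a real training loop the actual next hidden state would usually be
    # treated as a fixed target (no gradient); that is an assumption here.
    token_loss = cross_entropy(logits, target)
    latent_loss = latent_mse(pred_next_hidden, next_hidden)
    return token_loss + latent_weight * latent_loss


if __name__ == "__main__":
    loss = combined_loss(
        logits=[2.0, 0.5, -1.0], target=0,
        pred_next_hidden=[0.1, 0.4], next_hidden=[0.0, 0.5],
    )
    print(f"combined loss: {loss:.4f}")
```

The extra term only matters during training; at inference time the model is used as an ordinary transformer, which is consistent with the post's claim that normal inference is not slowed down.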

SemiAnalysis @SemiAnalysis_ · June 24 · 62

Meta leadership voting on a motion to re-allocate 7k engineers to data labelling org

Rohan Paul @rohanpaul_ai · June 24 · 52

VibeThinker is a 3B param model, with almost head to head benchmark result with Opus 4.5 on reasoning with novel SFT+GRPO. Unusually strong for its size: with only 3B parameters, 94.3 on AIME26, 80.2 Pass@1 on LiveCodeBench v6, and 96.1% acceptance on recent unseen LeetCode contests. "places it in the performance band of first-tier reasoning systems, matching or exceeding flagship models that are orders of magnitude larger, such as DeepSeek V3.2" They start from a 3B Qwen2.5-Coder base model, then train it with carefully filtered hard examples, multi-solution supervised training, reinforcement learning on math/code/STEM tasks with verifiable rewards, self-distillation, instruction-focused RL, and a test-time answer-checking method called CLR.

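Summary: VibeThinker is a reasoning model with only 3B parameters, trained with SFT+GRPO, that comes close to Opus 4.5 on reasoning benchmarks. It scores 94.3 on AIME26, 80.2 Pass@1 on LiveCodeBench v6, and a 96.1% acceptance rate on recent unseen LeetCode contests, matching or exceeding flagship systems orders of magnitude larger such as DeepSeek V3.2. The model is based on Qwen2.5-Coder 3B and trained with hard-example filtering, multi-solution supervised training, reinforcement learning with verifiable rewards on math/code/STEM, self-distillation, instruction-focused RL, and a test-time answer-checking method called CLR.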

Alibaba Cloud @alibaba_cloud · June 24 · 13

🔥 2 DAYS TO GO until #FFA2026! All 11 sub-forum agendas are now live, covering 7 major Data + AI tracks: 🧠 Multimodal & Vector Computing 🤖 AI Agents 🏗️ AI Platform in Practice ⚙️ Intelligent DevOps 🌊 Agentic Lake 📊 Real Time Analytics 🚀 Real-Time Data Powers the Future of AI Plus dedicated industry sessions on Automotive AI and Embodied AI. ✨ Apache Fluss 1.0 debuts with real-time context capabilities for AI Agents. 📅 Jun 26–27 📍 Shenzhen 🔗 Register now: https://hd.aliyun.com/form/8369 #AlibabaCloud #ApacheFlink #ApachePaimon #ApacheFluss #DataAI #AIAgent #RealTimeData

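Summary: Alibaba Cloud announces that FFA2026 is 2 days away and that all 11 sub-forum agendas are live, covering 7 major Data + AI tracks: multimodal and vector computing, AI agents, AI platform in practice, intelligent DevOps, Agentic Lake, real-time analytics, and real-time data. There are also dedicated industry sessions on automotive AI and embodied AI. Apache Fluss 1.0 debuts at the conference with real-time context capabilities designed for AI agents. The conference takes place June 26-27 in Shenzhen.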