AIHOT
内容
精选全部 AI 动态AI 日报主题收藏
接入
Agent 接入
更多
关于更新日志反馈
内部员工登录
精选全部日报更多
内部员工登录
全部动态X · 468 条
全部一手资讯X论文
标签「数据/训练」清除
karminski-牙医@karminski3 · 5月27日69

什么?! skill 也能"训练"了? 以往大家都是凭经验让AI写 skill, 然后调试的时候也是运行几下感觉没bug就完事了. 但 skill 能运行就一定好吗? 于是微软联合上交复旦同济等机构发了一个新框架 SkillOpt, 直接让AI评估skill写的好不好然后不断去优化! 最终, 这个框架写的 skill 让GPT-5.5的直接对话准确率飙升了 23.5分! 这个框架具体是怎么做的也很简单, 让skill迭代过程实现 harness 闭环! 大模型写完 skill 后, 立刻进入跑分流程, 只有得分更高的 skill 变更才会留下来. 跟大模型的强化学习过程如出一辙. 框架的设计也很值得做 Agent 框架的同学借鉴, 比如: 它设计了一个独立的优化器模型, 这个模型是用来写 skill 的, 它会根据 Agent 执行任务的试错表现得分, 对 skill 进行编辑操作(增加、删除、替换文本). 然后就是 harness 流程了:每一次文本编辑都必须在独立的验证集上分数有提升, 才会允许合并. 最后, 也是最精彩的地方, 框架还引入深度学习训练机制, 设计了文本层的学习率预算, 这个的核心就是限制大模型每次只能修改skill的一小部分, 慢慢迭代, 而不是全都重写. 论文中最有价值的数据就在这里, 论文实验发现, 每一步设置 4 到 8 个编辑操作的预算效果最好. 最终的最佳 skill 往往只包含 1 到 4 个被接受的核心修改. 甚至他们还设计了被拒编辑缓冲区, 用来存储训练过程的反面胶材, 以及周期性慢速/元更新, 这个则是跑完一个周期后, 会进行一次盘点, 类似于让框架形成记忆, 能更好的维持后续迭代. 这篇论文的结论十分深刻: skill(prompt) 完全配得上, 也需要一套系统级的训练流程. 原文中的描述直接是: 我们主张, skill 应当作为 Agent 的外部冻结状态来被"训练", 并且训练过程还要"让权重空间优化具有可重复性"! 这是不是意味着, 提示词工程(Prompting)和模型训练(Training) 的界限将逐渐变得模糊? 而提示词工程完全进入了机器学习的领域. 也许很快, 我们再也不需要人类去手动瞎改和调试提示词了! 论文地址: http://arxiv.org/pdf/2605.23904 #skillopt #微软 #提示词工程 #harness

译微软联合上海交通大学等机构发布SkillOpt框架,旨在通过机器学习流程系统性地优化AI智能体的技能。该框架引入独立的优化器模型,通过harness闭环流程对技能进行编辑,且每次编辑必须在验证集上带来分数提升才被接受。框架设置了每步4到8个编辑操作的学习率预算,使核心修改控制在1到4个。实验表明,优化后的技能可使GPT-5.5的对话准确率提升23.5分。

Epoch AI@EpochAIResearch · 5月27日69

Are we nearing a compute crunch? In our latest Gradient Update, @luke__emberson and @Jsevillamol estimate how many tokens all the Blackwell chips on Earth could serve, and compare this to total token demand. Direct comparisons are difficult, but it appears demand is growing much faster than supply.

译我们是否正接近算力危机? 在最新的 Gradient Update 中,@luke__emberson 和 @Jsevillamol 估算全球所有 Blackwell 芯片能处理多少 token,并与总 token 需求进行比较。直接对比很困难,但需求增长似乎远快于供应。

Epoch AI@EpochAIResearch · 5月27日39

Help us produce the most useful work on AI by taking our 5-minute survey: https://docs.google.com/forms/d/e/1FAIpQLSfzw_ad497AhTPNS5sQaCjBwqChjvM96RiiKXZqKTTS4ko53g/viewform (You can sign up at the end to join our compensated user research panel.)

译请花5分钟参与我们的调研,帮助我们产出最有用的AI工作:https://docs.google.com/forms/d/e/1FAIpQLSfzw_ad497AhTPNS5sQaCjBwqChjvM96RiiKXZqKTTS4ko53g/viewform (您可以在最后注册加入我们的有偿用户研究小组。)

Ant Ling@AntLingAGI · 5月26日69

From IcePop to KPop — our team keeps pushing on RL training stability for large MoE models. 👇 KPop replaces the fixed-ratio mask with an adaptive binary-KL region that matches each token's inherent noise. More robust updates, stable long-horizon agentic RL. Ring-2.6-1T → 76+ on SWE-bench Verified, pure RL. Congrats to @Jia__Guo & team! Blog: https://ringtech.notion.site/kpop

译团队发布了KPop技术,用于稳定大规模MoE模型的强化学习训练。它取代了此前IcePop方法的固定比例掩码,改用自适应二元KL散度区域来匹配每个token的固有噪声,从而实现更鲁棒的参数更新,支持长期、智能体化的强化学习训练。具体应用中,万亿参数的Ring-2.6-1T模型在仅使用纯强化学习训练(未修改基础设施或路由重放)的情况下,于SWE-bench Verified评测中得分超过76。KPop仅通过一个关键参数即可实现该优化。

Chubby♨️@kimmonismus · 5月26日59

I'm not sure if Google is winning the AI race. However, I think they're winning the AI distribution race, which is a different thing. 900M Gemini users is impressive on a slide. But a huge chunk of that is Android users who got a default app swap and Search users who got AI Overviews without opting in. But that doesnt mean its a bad thing. 9.7 trillion tokens/month two years ago. 480 trillion last year. 3.2 quadrillion now. That's a 7x jump in twelve months. To keep that going, Google plans to spend $190 billion on infrastructure this year. OpenAI has been trying to reach the 1b user milestone for some time now. For Google, on the other hand, it's a simpler game. Why? With billions of Android devices, and combined with Google and its AI mode, they have the ability to introduce everyone to AI, specifically Gemini, for free. How do they do it? TPUs! Google not only laid the foundation for modern LLMs with their 2017 paper "Attention is all you need," but also made a far-sighted decision back in 2012 to invest in TPUs - their own in-house chips that are particularly well-suited for machine learning tasks. Now in its eighth iteration, they even have two chips: one particularly good for inference, and one particularly good for training. This makes them more independent. Furthermore, they have a solid foundation that generates strong revenue and good profits, allowing them to subsidize AI usage for free, and without ads, unlike OpenAI (this is not a judgment, just a statement of fact). TPUs Therefore, Google has a very good chance of winning the game thanks to this outstanding starting position and free distribution. But to be fair: the game *is* far *from over*. However, the starting position is outstanding for Google. Image: The Economist article

译文章的核心论点是 Google 凭借其分发优势,在 AI 分发竞赛中占据了有利位置。目前 Gemini 拥有 9 亿用户,这主要归功于向 Android 用户进行的默认应用替换,以及向 Google 搜索用户推送的 AI 概览。其大语言模型 token 用量在 12 个月内从 480 万亿增长至 3.2 千万亿。为支撑此规模,Google 计划今年投入 1900 亿美元用于基础设施。Google 的关键优势在于能够利用庞大的 Android 设备基础,通过其搜索和 AI 模式免费向用户推广 Gemini。这一策略的部分成本优势源于自研的 TPU 芯片,使其在推理和训练上更独立,并能基于自身盈利补贴免费 AI 服务。尽管游戏远未结束,但 Google 的开局位置非常出色。

Ant Ling@AntLingAGI · 5月26日68

From IcePop to KPop — our team keeps pushing on RL training stability for large MoE models. 👇 KPop replaces the fixed-ratio mask with an adaptive binary-KL region that matches each token's inherent noise. More robust updates, stable long-horizon agentic RL. Ring-2.6-1T → 76+ on SWE-bench Verified, pure RL. Congrats to @Jia__Guo & team! Blog: https://ringtech.notion.site/kpop

译团队推出 KPop,用于稳定大规模 MoE 模型的智能体强化学习训练。它用基于二元 KL 散度的自适应掩码机制,替代了此前 IcePop 方法中的固定比例掩码,能根据训练过程中的训练-推理不匹配程度动态调整。这一改进使得 Ring-2.6-1T 模型在无需修改基础设施或路由重放的情况下,仅通过纯 RL 训练,在 SWE-bench Verified 上取得了超过 76 分的成绩。

SenseTime@SenseTime_AI · 5月26日77

🚀 𝗪𝗲'𝘃𝗲 𝗼𝗽𝗲𝗻-𝘀𝗼𝘂𝗿𝗰𝗲𝗱 𝘁𝗵𝗲 𝗳𝘂𝗹𝗹 𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴 𝗰𝗼𝗱𝗲𝗯𝗮𝘀𝗲 𝗳𝗼𝗿 𝗦𝗲𝗻𝘀𝗲𝗡𝗼𝘃𝗮-𝗨𝟭 (8B dense + A3B MoE). ​ ​ One stack for 𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴 𝗺𝘂𝗹𝘁𝗶𝗺𝗼𝗱𝗮𝗹 𝘁𝗮𝘀𝗸𝘀 across: text-to-image · editing · interleaved generation · text & vision understanding.​ ​ Built for practical large-scale training: ​ ⚙ Hybrid WP/TP/PP + ISP parallelism​ 🌊 Streaming, resumable, packed data pipeline ​ 🎛 Env-var driven configs for easy experimentation ​ 🧱 Decoupled backbone, data, and objective modules ​ 📈 Scales from 1×8 GPUs to multi-node clusters ​ ​ Apache-2.0 👇 ​ https://github.com/OpenSenseNova/SenseNova-U1​ Discord: https://discord.gg/BuTXPHmQub​ ​ @GitHub

译商汤开源了SenseNova-U1(8B dense + A3B MoE)的完整训练代码库。这是一个统一的框架,支持文本到图像、图像编辑、交错生成、文本与视觉理解等多种多模态任务的训练。其设计注重实用性与大规模训练,采用混合并行、流式可恢复数据管道、环境变量配置、解耦模块化设计,并支持从1×8 GPU扩展到多节点集群的规模。代码库以Apache-2.0协议开源。

Ant Ling@AntLingAGI · 5月26日62

SwiGLU is everywhere in modern LLMs — but for large inputs it behaves like x². That quadratic blow-up inflates activations, amplifies outliers, and makes deep network or low-precision (FP8/FP4) training prone to loss spikes. We propose PowLU, a drop-in activation built for stable large-scale pre-training. 🧵

译SwiGLU在现代大语言模型中无处不在——但对于大输入,它的行为类似于x²。这种二次增长会膨胀激活值,放大异常值,并使深层网络或低精度(FP8/FP4)训练容易出现损失尖峰。 我们提出了PowLU,一种为稳定大规模预训练而设计的即插即用激活函数。🧵

向阳乔木@vista8 · 5月26日64

让 Codex 分享过去 3 年 X 的发帖数据(约3.4G)总结。 注意:每人数据和发帖习惯不一样,粗看有点过拟合,仅供参考: 1. 最爆的内容类别:编程/产品/创业、资源/推荐/合集、学习/认知/方法论 2. 爆款内容公式:一个真实有用的工具,加一个明确场景,再给三步以内的使用路径。 3. 发帖窗口:周日、周六、周五数据好,周一最差。 下午5点到晚上11点、上午10到下午1点、凌晨0到2点是三个黄金发帖窗口。 4. 内容形式和长度:带媒体(图/视频)和链接的明显表现更好,内容 101-180 字,是黄金长度。

译用户让Codex分析自己过去3年在X上的约3.4G发帖数据,总结出几点规律:最爆内容为编程/产品/创业、资源推荐合集、学习方法论类;爆款公式是“真实工具+明确场景+三步内路径”;发帖时间上,周五至周日、及每日三个时段(下午5-11点、上午10-下午1点、凌晨0-2点)数据更好,周一最差;内容形式上,带媒体和链接、篇幅在101-180字的表现更优。

Ethan Mollick@emollick · 5月26日56

Its very limiting that a big set of very hard problems that we have just lying around are Erdos problems. Don’t get me wrong, they are quite cool, but we really need hard problems repositories for many fields, including areas that have less specified answers & require judges. Yes, math is the easiest field in which to do verified work, but it is also an area where direct implications of increasing AI ability on everyday life are less clear. We need more types of problems (complex engineering problems, large data sets in economics, physics, biology), for people to turn AI loose on, including speciations of how to evaluate them.

译推文指出,当前用于推动AI能力发展的困难问题过于集中于数学领域(如Erdős问题)。虽然数学易于验证,但其成果对日常生活的直接影响不够明确。作者呼吁需要为包括工程、经济、物理、生物等在内的更多领域建立困难问题库,并配套制定相应的评估方法,以让AI智能体处理更复杂、答案更不明确的任务。

X.PIN@thexpin · 5月26日46

China's AI compute grid is challenging the US. While US tech giants focus on profit, China is turning AI tokens into a state utility. Read further here: http://www.thexpin.com/china-ai-grid-vs-us-market

译中国的AI算力网络正在挑战美国。当美国科技巨头专注于盈利时,中国正将AI token转变为一种国家公用事业。阅读更多: http://www.thexpin.com/china-ai-grid-vs-us-market

Alibaba Cloud@alibaba_cloud · 5月25日57

At Qwen Conference 2026, Minglei Feng (Principal Database Solution Architect, Alibaba Cloud) and Foong Chee Mun (CEO, YTL AI Lab) hit the Agent-Native Cloud Forum to present Activate Enterprise AI Actions with AI-native Data Foundation. 🚀 Stay tuned: https://click.qwencloud.com/m/20000000190/

译在通义千问大会2026上,阿里云首席数据库解决方案架构师冯明磊与YTL AI Lab首席执行官冯志文在智能体原生云论坛上,共同展示了《用AI原生数据基础激活企业AI行动》。 🚀 敬请关注:https://click.qwencloud.com/m/20000000190/

Chubby♨️@kimmonismus · 5月25日68

Welcome to "Intelligence from the Community", our Sunday format where a selected author from the Superintelligence community publishes an original essay or analysis. The idea hasn't changed: some of you are researchers, some are operators, some are engineers building the systems everyone else writes about. That expertise deserves space. This week's piece comes from Amish Regmi, an AI engineer at Klaviyo who previously built inference infrastructure and agentic systems at Amazon. Amish tackles something that has been bugging me for months: the way "AI is exponential" gets thrown around as if it were a single, self-evident fact. It rarely comes with the numbers that would make it testable. What is the base of the exponent? What is the doubling time? Which curve are we even talking about? Amish goes through the data, separates confirmed steep exponents from fast hillclimbs and broken instruments, and arrives at a conclusion that is more useful than the slogan: the transition will be governed by mismatched slopes. Read his article for free at http://getsuperintel.com

译Klaviyo的AI工程师Amish Regmi(前亚马逊推理基础设施与智能体系统构建者)撰文,批判了笼统的“AI发展是指数级”的说法。他指出,这种说法常缺乏可验证的具体数据,如指数的基数、翻倍时间以及具体所指哪条技术曲线。文章通过分析数据,区分了真正陡峭的指数增长与单纯快速提升或指标失效的情况,其结论是,未来的转型将由不同技术或能力曲线之间“不匹配的斜率”所主导。

Nathan Lambert@natolambert · 5月25日18

I heard people needed clarification that my book was a post-training book http://posttrainingbook.com/

译我听说人们需要澄清,我的书是一本关于后训练的书 http://posttrainingbook.com/

swyx@swyx · 5月23日58

co-sign. a very handy mental framework for what kinds of learning transformers do well today, and why it runs into limitations. when @ankit2119 and i wrote about the need for adversarial world models earlier this year, we were describing a couple of the functions of these rungs of thinking that bring us ever closer to the kolmogorov-limit generator of reality. throwing more params, more power, more everything at a demonstrably inefficient paradigm will be outclassed by the simple solution that can hypothesize and seek truth rather than backfit a house of cards - although the bitter lesson is it is simpler to scale and we may hit agi anyway because human intelligence just isn’t that smart nor plentiful

译本文肯定了对Transformer当前学习能力及局限性的分析框架,并指出对抗性世界模型是逼近现实本质的关键功能之一。作者认为,单纯增加参数和算力以扩展一个低效范式,将被能主动假设与验证真理的简洁方案所超越,尽管规模化可能因人类智能本身有限而意外通向AGI。引用推文补充了强化学习(RL)作为从干预中学习的范式,比监督学习更强大,而世界建模与RL的结合有望实现对反事实的学习。

Rohan Paul@rohanpaul_ai · 5月23日64

New Google paper shows that wearable data becomes far more useful when AI learns the person behind the signals. It's is not another heart-rate algorithm, but a general model trained on more than one trillion minutes of sensor data from five million people. The authors propose SensorFM, a foundation model trained on more than 1 trillion minutes of unlabeled wearable data from 5 million people, so it can learn general patterns of human physiology before seeing specific health tasks. That scale changes the problem from measuring isolated events to learning patterns of lived physiology: sleep, movement, temperature, oxygen, heart rhythms, and their ordinary daily messiness. Wearables are not weak because they lack data; they are weak because most systems compress that data into crude summaries before the meaningful structure has a chance to appear. SensorFM tries to learn that structure first, then reuse it across tasks, which is why the same representation can help with cardiovascular, metabolic, mental health, sleep, lifestyle, and demographic predictions. The evidence is strongest as a scaling story: larger models trained on more data performed better, and the learned embeddings beat engineered-feature baselines on 34 of 35 prediction tasks. ---- Paper Link – arxiv. org/abs/2511.15352v3 Paper Title: "People readily follow personal advice from AI but it does not improve their well-being"

译谷歌研究院提出基础模型SensorFM,通过学习超过500万人产生的逾1万亿分钟可穿戴设备传感器数据,掌握了人类生理活动的一般性模式。该模型超越了将数据压缩为简单指标的传统方法,能够从数据中提取出有意义的结构并将其复用于多种健康预测任务。实验显示,模型规模和数据量越大性能越强,且其学习到的数据表征在35项预测任务中的34项上,均优于基于工程特征的基线方法。

Epoch AI@EpochAIResearch · 5月22日65

OpenAI kicked off the AI compute buildout in 2023. But today it uses ~10% of the world's compute, and the top labs together are probably under half. In this week's newsletter, @justjoshinyou13 discusses how much that share may change, and when it could hit a ceiling. 🧵

译OpenAI在2023年开启了AI算力建设浪潮。但如今它仅占全球算力的约10%,顶尖实验室的总和可能也不到一半。 在本周的通讯中,@justjoshinyou13 探讨了这一份额可能如何变化,以及何时会触及天花板。 🧵

Krea@krea_ai · 5月21日69

introducing LoRAs for Krea 2 (beta). our most powerful fine-tuning system to date; now you can train Krea 2 on a your own specific style, object, or character with incredible precision. learn how it works 👇

译为 Krea 2(测试版)引入 LoRA。 我们迄今最强大的微调系统;现在你可以用惊人的精度,在 Krea 2 上训练你自己的特定风格、对象或角色。 了解其工作原理 👇

Rohan Paul@rohanpaul_ai · 5月21日48

StanChart is using AI to automate rule-based support work like compliance screening and customer data updates. plans to cut over 15% of support roles by 2030. CEO says this is about building a future-fit operation.

译渣打银行正在使用AI自动化合规筛查和客户数据更新等基于规则的支持工作。计划到2030年削减超过15%的支持岗位。 CEO表示,这是为了打造面向未来的运营模式。

Rohan Paul@rohanpaul_ai · 5月21日69

WOW, 🤯 A leaked audio from Meta’s April 30 all-hands. Meta is reportedly using its own engineers’ work traces to train coding AI while cutting thousands of jobs. Here Zuckerberg arguing that models learn better when they watch “really smart people” perform tasks, meaning Meta’s internal code, tool use, clicks, and problem-solving can become higher-grade training data than contractor-written examples. The idea is behavior cloning: instead of only feeding an AI finished code, Meta can feed it the step-by-step path a strong engineer takes, including edits, tests, mistakes, fixes, and tool choices. That can teach a model not just what correct code looks like, but how a skilled developer moves from a vague task to a working solution. Meta is reportedly cutting about 8,000 jobs, roughly 10% of its workforce, and additionaly moving about 7,000 employees toward AI-focused work, so the hard realy is that human expertise is being turned into training data before some of those humans leave. The story is not fully independently verified, but the shift is happening for sure: tech companies no longer see AI as a tool sitting beside workers, but as a system that can absorb worker patterns and then compress them into software.

译Meta正利用内部工程师的工作痕迹——如代码编写、工具使用和问题解决步骤——来训练其编程AI。CEO扎克伯格认为,让AI观察“聪明人”执行任务(行为克隆),比使用外部承包商代码样本更有效。同时,Meta正裁员约8000人,并计划让约7000名员工转向AI相关岗位。此举反映科技行业新趋势:公司正将人类专业知识直接转化为训练数据,AI不再只是工具,而是能吸收并压缩员工工作模式的系统。

Ethan Mollick@emollick · 5月21日57

Math is easy* because it has verifiable outputs and few messy judgement choices to make. Which AI labs have the guts to make advancing social science a priority? It may actually do more for human flourishing to unlock sociology, econ & psych reseach. * For AIs, not for humans

译数学很简单*,因为它有可验证的输出,且无需做太多混乱的判断选择。 哪些AI实验室有勇气将推进社会科学作为优先事项?解锁社会学、经济学和心理学研究可能实际上更能促进人类繁荣。 *对AI而言,而非对人类

Google AI@GoogleAI · 5月21日69

For centuries, the scientific method has been our best tool for progress. But today, there’s so much data out there that it’s impossible for any one researcher to connect all the dots. We want to fix that: Introducing Gemini for Science, a collection of science tools and experiments designed to accelerate the speed and scale of scientific exploration. Read on to learn more about each announcement in this inaugural set 🧵👇

译几个世纪以来,科学方法一直是我们取得进步的最佳工具。但如今,数据如此之多,任何单一研究者都无法将所有点连接起来。我们希望解决这个问题: 推出 Gemini for Science,这是一套旨在加速科学探索速度和规模的科学工具与实验集合。 请继续阅读,了解本次首批发布中各项公告的详情 🧵👇

AYi@AYi_AInotes · 5月21日68

AI时代最恐怖的事情不是AI取代你,而是你亲手教AI取代你,然后你自己被裁🤯 扎克伯格4月30号的内部音频泄露了, 他直白地告诉所有员工,公司正在收集你们的键盘鼠标屏幕数据,训练AI。 因为Meta员工的平均智力远高于外包,这些数据能让Llama的编码能力实现戏剧性超越。 然后20天之后,也就是今天凌晨4点,8000名员工收到了裁员邮件。 这哪是为了AI转型啊,分明就是企业食人主义, 好家伙,你教AI怎么干活, AI学会了, 然后你滚蛋。 你以为这就完了? 还有更狠的, 以前资本剥削你的时间, 现在资本剥削你的智慧, 以前你996是为了给自己挣工资, 现在你996是为了训练一个能完美取代你的AI, 而且你还不能划水, 你划水训练出来的AI不够强,你还是会被裁🤣 扎克伯格在效率上肯定是赢了, 他找到了AI时代最暴利的商业模式, 用自己的员工当免费的高质量训练数据, 用完就扔, 但他也输掉了所有信任。 以后再也不会有员工愿意全力以赴了, 因为所有人都知道, 你越优秀, 你被榨干的速度就越快。 你被裁掉的日子就越近。 #Meta #AI #裁员

译近日,Meta CEO扎克伯格的内部音频泄露,他承认公司秘密收集员工键盘、鼠标和屏幕数据,用于训练Llama等AI模型,因Meta员工智力高可提升模型能力。然而,数据收集约20天后,Meta裁员8000人,引发“企业食人主义”批评:员工在不知情下训练可能取代自己的AI,资本剥削从时间升级到智慧。这损害了员工信任,揭示了AI时代高效但冷酷的用人逻辑——员工越优秀,其价值被快速榨取并抛弃的风险越高。

Yuchen Jin@Yuchenj_UW · 5月21日73

You’re praying for 8×H100s. An Anthropic MTS spins up a 10,000×B300 auto-research run with Mythos to train the next Claude. We are the permanent underclass.

译你还在祈祷能用上8块H100。 Anthropic的MTS已经用Mythos启动了万倍B300的自动研究流程来训练下一代Claude。 我们永远是算力底层的阶级。

Chubby♨️@kimmonismus · 5月21日68

And so it starts

译事情开始了 [引用 @kimmonismus]:重磅:Meta 4月30日全员会议泄露音频: 扎克伯格告诉员工,公司正在利用他们训练AI模型,随后大规模裁员即将开始。 他的理由是?Meta的工程师比任何外部劳动力都聪明,让他们在内部解决编码任务,能让Meta的模型比竞争对手更好、更快。 裁员预计在周三凌晨4点进行。先培训你的替代者,然后被请走。这就是现在的规矩。

小互@xiaohu · 5月21日63

Mdjourney创始人暗示他们被Google的 TPU坑了 白白浪费了一年时间… 如果回到过去他会选择英伟达的GPU🤣 “这大概让我们的研究进度,比起一开始就完全采用 Nvidia 技术栈,落后了差不多一年。并不算特别理想。如果我能回到过去,我会从第一天开始就全部使用 Nvidia 的方案。”

译Midjourney创始人暗示他们被Google的 TPU坑了 白白浪费了一年时间… 如果回到过去他会选择英伟达的GPU🤣 “这大概让我们的研究进度,比起一开始就完全采用 Nvidia 技术栈,落后了差不多一年。并不算特别理想。如果我能回到过去,我会从第一天开始就全部使用 Nvidia 的方案。”

AK@_akhaliq · 5月21日67

Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information

译基于点互信息的推理强化学习反自蒸馏方法

elvis@omarsar0 · 5月20日73

Self-improving AI is a big deal! As a first step, I've been exploring how much of the post-training can be automated. Here is a first post on how I am using @FireworksAI_HQ Agent to automate LLM fine-tuning itself. Dataset + Skill file included. For the use case, I took inspiration from @karpathy's tweet on LLM Knowledge Bases. I asked Claude Code to interact with Fireworks Agent to fine-tune a small Qwen model to get the right output style to efficiently keep growing my PaperWiki (https://x.com/omarsar0/status/2042286186920550498?s=20). All done via natural language. This is obviously the future of improving AI systems. The next step with the PaperWiki project is how to tune a model to better "know" the data. Harder to do, but if possible, then we have an incredibly powerful system that can recursively self-improve and can be extremely useful for things like knowledge discovery and automating all kinds of research end-to-end. More on this soon. Thanks to the Fireworks team for allowing me to test this early. Super excited about this.

译作者探索利用Fireworks AI Agent,通过自然语言交互自动化完成大语言模型的微调流程。他以Qwen小模型为例,调整其输出风格以优化PaperWiki项目的扩展效率。这一方法灵感源于@karpathy关于LLM知识库的推文,强调微调是让模型更“懂”数据的关键步骤。核心观点是自动化微调可推动构建可递归自我改进的AI系统,最终目标是打造一个能自我优化、用于知识发现和端到端自动化研究的强大工具。

elvis@omarsar0 · 5月20日74

http://x.com/i/article/2056851733582880768 # Automating LLM Fine-Tuning with Fireworks Agent ## From Context Window to Weights Andrej Karpathy (@karpathy) recently described the personal LLM Wiki as a kind of pre-AGI memory aid, a curated repo of notes about papers, tools, and ideas you read into context when you want a model to reason over them. In his viral post, Karpathy flagged the obvious next move: "As the repo grows, the natural desire is to also think about synthetic data generation + finetuning to have your LLM 'know' the data in its weights instead of just context windows." Building LLM Knowledge Bases or LLM Wikis is already possible with agents like Claude Code or Codex, but this approach can quickly get inefficient and expensive as you try to scale them. Fine-tuning LLMs to maintain your knowledge bases is often a more efficient path forward. This post takes that next step by putting the wiki's output style into the weights. In under ten minutes of GPU time and a couple of cents of compute, a small open-weight model writes summaries of new papers in the exact format the wiki uses, with no system-prompt gymnastics, no few-shot exemplars, and no router logic. Once deployed, the summary comes back in a single fast call, fast enough to use inline inside a larger agent loop rather than as a batch job. The harder version (parametric knowledge injection of the wiki's contents) is the natural follow-up to Karpathy's framing, and I treat it as future work at the end. The interesting part is not the model itself, but that one @FireworksAI_HQ Agent session did the entire pipeline (dataset inspection, hyperparameter sweep, full training, deployment, and a working inference endpoint). Fireworks Agent is the autonomous orchestration layer for fine-tuning runs, where you give it a natural-language goal, and it plans, executes, and surfaces decision gates back to you. The whole flow can be driven from a coding agent you already use (Claude Code, Codex, or similar), which is how I ran it. The bigger picture this points to is self-improving LLMs and agents. Once training is a callable step inside an agent loop, the same coding agent that drives your workflow can also kick off fine-tuning runs to bake recurring patterns (a wiki's voice, a coding style, a triage policy) into the model itself, closing the loop between using a model and improving it. The rest of this post is the full walkthrough. All resources from this run are available in a companion repo, including the training and validation splits (train.jsonl, val.jsonl, wiki-sft-2026.jsonl), the data-build scripts (parse_2026.py, fetch_abstracts.py, build_jsonl.py), the pilot-agent.md slash command, the smoke-test script (test_new_deployment.py), and the baseline-vs-fine-tuned comparison code (before_after.py). Grab it at github.com/dair-ai/wiki-sft, clone it, point it at your own corpus, and reproduce the run end to end. ## Why Output Style Is the Right First SFT Target For a personal wiki, the high-leverage thing is consistency. Readers recognize a summary by its shape, which is a one-paragraph lede that names the authors' affiliation and the core contribution, followed by three to five bulleted takeaways with bolded short labels. A capable base model can be coaxed into this format with a careful system prompt, but the failure modes are familiar. It reverts to title-case headers, drops the affiliation line, varies bullet count, and sneaks in marketing language. Supervised fine-tuning (SFT) fixes this at the parameter level. Once the format is in the weights, every generation conforms by default, and the system prompt collapses to a single sentence (or drops out entirely). The cost stays small when the dataset stays small, and a clean stylistic dataset of 50 to 100 examples is usually enough to get started. ## Handing the Work to an Agent Most fine-tuning tutorials walk you through ten distinct steps. You format your data, upload it, choose a base model, decide on LoRA rank and learning rate, launch a job, parse logs, pick a winner, retrain on full data, deploy, and smoke test. Each step is its own surface to mess up, and you end up playing the role of a tuning agent yourself. Fireworks Agent inverts this. The interface is firectl session create -n "<your instruction>", where firectl is the Fireworks CLI. After that, you watch events stream and respond to gates when the agent surfaces a decision, such as the proposed plan or the hyperparameter (HP) sweep results. Fireworks also ships a Claude Code slash command (or you can format it as an agent skill), pilot-agent.md (previously known as Pilot Agent), that wraps the firectl commands and handles event streaming, gate detection, and resume-from-last-timestamp logic. ## Full Walkthrough Step 0: Setup Install the Fireworks CLI and confirm your account. In the Fireworks dashboard, create a service account that has the permissions Training Agent needs (the role that lets it launch training jobs and deployments on your behalf), then generate an API key tied to that service account. Also, create a separate user-level API key for inference and deployment inspection. Drop both into a .env file next to the project. Step 1: Build the Dataset The training data I use consists of chat-format records derived from the DAIR.AI Top AI Papers of the Week wiki, drawn from the top 5 papers per week in 2026 and paired with their arXiv abstracts. Three small Python scripts handle the pipeline, namely parse_2026.py (wiki to structured entries), fetch_abstracts.py (arXiv abstract lookup), and build_jsonl.py (chat-format assembly). The chat schema is the standard Fireworks shape: The final outputs are train.jsonl and val.jsonl (plus the combined wiki-sft-2026.jsonl for reference), with about 90 percent of records reserved for training and 10 percent for validation. Step 2: Upload the Dataset to Fireworks Confirm the dataset is `READY`: The dataset path you will pass to the Fireworks Agent looks like accounts/<your-account>/datasets/wiki-sft-2026. Step 3: Kick Off the Fireworks Agent This is the entire user-facing config for the run, just one instruction. The session returns an ID like 1777224532-7ddb. Stream the events: The --wait flag is important; without it, the command dumps existing events and exits. The Claude Code slash command handles this for you. Step 4: Approve the Plan and Promote the Winner The agent surfaces two gates. The first is a plan with a cost estimate and three HP configs to sweep in parallel, with validation loss as the evaluator, which you approve to resume streaming. The HP sweep then runs three SFT jobs in parallel and returns a ranked table, after which the agent surfaces a second gate with the winning config. In my run, the top three configs landed very close to each other on eval loss, which tells you the task is not particularly HP-sensitive at this dataset size, so approving full training is the obvious next step. Full training takes about eight minutes of GPU time and costs a few cents. Step 5: Verify the Deployment Deployment is where ad-hoc fine-tuning workflows usually go sideways, picking the wrong accelerator, missing a compatible shape, or stalling on capacity. The agent handles the recovery itself, so the session lands at status succeeded with a READY scale-to-zero deployment. Confirm the deployment with the following command: Step 6: Call the Model Inference uses the standard Fireworks chat completions endpoint, with a deployment-pinned model ID so requests route to your custom deployment: Once warm, calls return fast enough to use as an inline step inside an agent rather than a batch job. ## Why This Workflow Pays Off I tested the fine-tuned model on a few papers that sit outside the training set, sending the same system prompt and abstract to both the baseline qwen3-8b and the fine-tuned model. The fine-tuned model produces affiliation-led ledes that name the researchers' lab, followed by three to five bullets with bolded short-label prefixes (Method:, Performance Gains:, Scalability:), and an analytical, non-promotional tone. For instance, on Chain-of-Thought, it opened with "Researchers at Stanford University demonstrate that chain-of-thought prompting significantly enhances large language models' reasoning capabilities..." That is the wiki's voice, baked into the weights and produced in a single fast call. The practical payoff is that you no longer need a large, inefficient LLM or agent to write the summaries for your LLM Wiki. A smaller fine-tuned model can do it effectively, efficiently, and cheaply. Getting the style and tone right matters for this use case, and no amount of tuning a skill or system prompt can replace what a properly fine-tuned LLM gives you. Two more things make this useful beyond a one-off experiment. First, training becomes a tool, not a project, with one CLI command, cents of compute, and a real callable endpoint at the end, while the agent handles the boring failure modes. Second, you own the resulting model. The weights live in your account, deployed on infrastructure you control, and the idle cost is zero. At this price and friction, reaching for SFT becomes a reasonable answer to a much wider set of style and format problems. ## What's Next, Knowledge in the Weights I intentionally stopped at style transfer because it is the cleanest first SFT target on a small dataset. The harder version Karpathy described (your wiki's contents in the weights) is the natural follow-up, with synthetic data generation, more training records, and knowledge-recall evaluators in the loop. The pattern generalizes beyond a personal papers wiki. Any structured knowledge surface (an internal docs wiki, a product manual, a research vault) is a candidate for the same two-step recipe, where you SFT on style first and layer knowledge injection on top. A model that has internalized both the voice and the substance of a corpus is what makes a personalized agent on top of it genuinely useful. Fireworks Agent is currently in private preview and will be generally available soon. If you are thinking about applying this workflow to your own corpus and want to request access or talk it through with the Fireworks team, reach out at fireworks.ai/contact-training.

译本文探讨了通过微调,将个人知识库(如LLM Wiki)的内容从依赖上下文窗口,转变为固化到模型自身权重中的方法。关键在于利用如Fireworks Agent这样的自主AI代理,仅需提供自然语言目标,它就能自动完成从数据准备、训练到部署的完整微调流程。这标志着模型自我改进的闭环成为可能:当训练成为AI工作流中一个可调用的步骤时,模型能主动将反复使用的模式(如特定写作风格或决策逻辑)学习并内化到权重中,从而实现使用与优化的持续迭代。

Chubby♨️@kimmonismus · 5月20日63

Holy: Leaked audio from a Meta all-hands on April 30: Zuckerberg told employees the company is using them to train AI models before mass layoffs hit. His argument? Meta's engineers are smarter than any external workforce, so having them solve coding tasks internally will make Meta's models better, faster than competitors. The layoffs are expected Wednesday at 4 a.m. Train your replacement, then get walked out. That's the deal now.

译泄露的Meta内部音频显示,CEO扎克伯格在4月30日的全员会议上向员工表示,公司正在利用他们的工作成果来训练AI模型,随后即将启动大规模裁员。其核心论点是,Meta工程师的平均智力远高于外部可雇佣的群体,让这些顶尖人才在内部完成编码等任务,能更快速、更有效地提升公司AI模型的编程能力,从而超越竞争对手。裁员计划预计在周三凌晨启动,员工需先完成对替代者的培训。

X.PIN@thexpin · 5月20日58

JUST IN: Alibaba just dropped a 128-chip AI supernode at its 2026 Cloud Summit. We were lucky to see it in person.

译突发:阿里巴巴刚刚在2026云峰会上发布了一款128芯片AI超级节点。我们有幸在现场亲眼目睹。

Emad@EMostaque · 5月20日40

Seems a lot of autoregressive models will be converted to diffusion models

译看来许多自回归模型将被转换为扩散模型。

meng shao@shao__meng · 5月19日71

Cursor 发布 Composer 2.5,仍基于 Kimi K2.5,同时因为与 SpaceXAI 合作,马斯克亲自发帖证实 Composer 2.5 已经开始使用 Colossus 2 算力训练,同时正在合作从零训练一个算力规模 10 倍以上的全新模型! Composer 2.5 相对 Composer 2 在智能水平和行为表现上均有显著提升,重点改进了三类能力:长任务的持续推进、复杂指令的可靠遵循、协作交互的自然度。 https://cursor.com/blog/composer-2-5 三项关键训练创新 1. 定向文本反馈强化学习 解决问题:长任务(数十万 token 的 rollout)中,最终奖励难以告诉模型究竟是哪一步出了错——典型的 RL 信用分配难题。 2. 合成训练数据 合成任务量是 Composer 2 的 25 倍。其中一种代表性方法是 feature deletion: · 给模型一个有完整测试套件的代码库 · 删除若干代码以剥离某个特性 · 让 agent 重新实现该特性,以原测试作为可验证奖励 3. 基础设施层优化 继续预训练阶段使用 Muon 优化器 + 分布式正交化: · 按模型自然粒度跑 Newton-Schulz(attention 按 head,MoE 按 expert) · 分片张量先 all-to-all 拼回完整矩阵,正交化后再 all-to-all 散回;通信与计算异步重叠 · 1T 模型的优化器单步耗时仅 0.2s 训练目标的"软"维度 Cursor 明确指出现有 benchmark 无法很好衡量的两个维度,他们专门优化了: · Communication style(沟通风格) · Effort calibration(投入度校准——什么时候该多想、什么时候该收手) 这两点在实际协作中体感差异很大,也是这次定向文本反馈方法的重点应用场景。

译Cursor发布迄今最强模型Composer 2.5,仍基于Kimi K2.5。模型已与SpaceXAI合作,使用Colossus 2算力开始训练,并计划合作训练一个规模大10倍的全新模型。Composer 2.5在长任务推进、复杂指令遵循及协作自然度方面均有显著提升。关键创新包括:采用定向文本反馈强化学习解决长任务信用分配问题、使用25倍于前代的合成数据进行训练,以及通过Muon优化器与分布式正交化技术优化基础设施层。此外,模型还专门针对沟通风格和投入度校准等协作“软”维度进行了优化。

Nathan Lambert@natolambert · 5月19日52

On-policy distillation is on track to be a lasting method in post-training. The list of areas would be: Instruction tuning (SFT/IFT) RLHF Direct Preference Optimization (DPO et al) RLVR On-policy Distillation (OPD) New classes of methods are rare! Excited to play.

译在线蒸馏有望成为后训练中的持久方法。涉及领域包括: 指令微调(SFT/IFT) RLHF 直接偏好优化(DPO等) RLVR 在线蒸馏(OPD) 新方法类别实属罕见!期待参与实践。

SemiAnalysis@SemiAnalysis_ · 5月19日57

Our SemiAnalysis Weekly Podcast often asks - Is the AI cycle this time truly different from other cycles? Well, at least from our analysis, we think the return from AI is real and it looks like a structural trend that is truly different from other cycles. We tracked token spend vs human labor cost across 9 real workflows at SemiAnalysis - these are tasks our analyst team do consistently to stay on top of the industry: company initiations, earnings recaps, conference transcript mining, financial data pulls. (1/3) 🧵

译SemiAnalysis在播客中探讨了本次AI周期是否真正不同于以往技术周期。团队基于分析认为,AI带来的回报是真实的,且呈现为与其他周期不同的结构性趋势。为验证此观点,他们在内部追踪了9个实际工作流程(包括公司研究、财报总结等)中的token消耗成本与人工劳动成本对比,通过具体数据表明AI的效率与经济价值。研究认为这一趋势已显现出区别于历史技术迭代的独特性与持续性。

elvis@omarsar0 · 5月19日67

NEW paper from Meta. (bookmark it) It's an agent system that autonomously discovers neural architectures that beat Llama 3.2 at 350M, 1B, and 3B scales, all under a 24-hour compute budget. They get this work by splitting the search into two agents: > AIRA-Compose searches the macro architecture. > AIRA-Design implements the low-level mechanisms. For devs: If one agent in your stack is doing both strategy and implementation, split it. Run a planner that picks the structure and an implementer that fills in the mechanisms. AIRA shows this beats a single end-to-end agent on a real, non-toy search problem. The same split is useful for pipeline assembly, query planning, prompt scaffolding, and tool-use programs. Paper: https://arxiv.org/abs/2605.15871 Learn to build effective AI agents in our academy: https://academy.dair.ai/

译Meta提出AIRA系统,通过分离策略与实现的双代理架构,实现神经架构的自主发现。AIRA-Compose负责宏观架构搜索,AIRA-Design专注低级机制实现。该系统在24小时计算预算内,于350M、1B和3B规模上找到超越Llama 3.2的架构。其核心方法论表明,在复杂任务中分离规划代理与实现代理能提升效能,此思路同样适用于流水线组装、查询规划等其他AI代理场景。

Berryxia.AI@berryxia · 5月18日56

兄弟们,今天必需卧槽一下了! 昨晚发完这条推文后,终于等到了… xAI算法开源后,终于有人把源码真正啃完了。 岚叔@LufzzLiz (某大厂架构师,多模态与模型私有化领域专家)直接上手,把xai-org/x-algorithm仓库的每一行结论都追溯到源码,用Opus-4.7花了两天时间,搞出了一个完整wiki。 所有页面都有明确源码出处,跟市面上很多“AI批量生成”的解读完全不一样,直接Wiki库整起来了… 就是不一样啊! 这才是真正有价值的算法拆解。 GitHub仓库:https://github.com/cclank/x-algorithm-wiki 在线阅读地址:https://lansu-wiki-web.lank.workers.dev/wiki/cclank/x-algorithm-wiki#index

译xAI算法开源后,专家岚叔@LufzzLiz深入研究了xai-org/x-algorithm仓库源码,使用Opus-4.7创建了带有明确源码出处的完整wiki。这与引用推文所指出的现状形成对比:市面上95%的分析是AI批量生产的同质化废话,缺乏对源码的真正理解。岚叔的工作提供了有价值的算法拆解,GitHub仓库和在线阅读地址已公开。

Ethan Mollick@emollick · 5月18日61

So the two most obvious barriers to some sort of true AI takeoff are robust RSI (AI acting as an independent AI researcher, rather than “merely” a multiplier of human effort) and continual learning. Either would represent a major change in trajectory for AI development.

译因此,实现真正AI腾飞的两个最明显障碍是: 强大的RSI(AI作为独立的AI研究者,而“不仅仅”是人类工作的倍增器) 以及持续学习能力。 其中任何一项都将代表AI发展轨迹的重大转变。

Rohan Paul@rohanpaul_ai · 5月17日70

New Illinois+ Tsinghua University and other labs study finds that LLM agents still have unreliable memory and that it can get worse when they keep rewriting their own memories. LLM agents can learn from experience, but their rewritten memories often become unreliable. The problem is that many agent systems store past work by asking an LLM to compress messy experience into neat written lessons. That sounds useful because the agent should remember what worked before, but the paper finds that repeated rewriting slowly damages the memory. The core idea is that raw episodes, meaning the actual past attempts and solutions, often stay more useful than the polished lessons made from them. The authors tested this across tasks like web shopping, simulated worlds, app use, and ARC-style puzzle problems where they could control the correct solutions. The sharpest result is that GPT-5.4 solved 100% of a small ARC-AGI set with no memory, but after memory was built from correct solutions, streaming updates dropped it to about 54%. The failures came from bad grouping, overbroad lessons, and overfitting, so the memory forgot details, mixed up task types, or learned rules that only worked on narrow examples. The big deal is that agent memory should not automatically rewrite every experience into a summary, because keeping raw evidence and only sometimes making summaries worked better. The paper is really proposing that agent memory should treat raw past episodes as important evidence, not as disposable notes to summarize away. ---- Paper Link – arxiv. org/abs/2605.12978 Paper Title: "Useful Memories Become Faulty When Continuously Updated by LLMs"

译伊利诺伊大学与清华大学等机构的研究发现,LLM智能体虽能从经验中学习,但其通过LLM将原始经历压缩成书面教训的记忆重写机制会损害记忆可靠性。在网页购物、模拟世界及ARC风格谜题等任务测试中,反复重写记忆会导致错误分组、规则过度泛化或过拟合,使智能体遗忘细节或混淆任务类型。例如,GPT-4在无记忆时可100%解决小型ARC-AGI问题集,而建立记忆并流式更新后,性能降至约54%。研究主张智能体记忆系统应重视原始经历作为关键证据,而非自动将所有经验重写为摘要,保留原始证据并选择性摘要效果更佳。

SemiAnalysis@SemiAnalysis_ · 5月17日53

At Stanford CS153 Frontier Systems, Jensen states word for word that he "would like to be at low MFU all the time" & the reasoning Jensen gives is that he wants be so smart, he over-provisioned the work like flops, networking, memory, etc. maybe the kernel folks at @xai are following this philosophy too

译在斯坦福CS153前沿系统课程中,Jensen Huang逐字表示他“希望始终保持低MFU”,其给出的理由是:他希望系统足够智能,以至于超额配置了如浮点运算、网络、内存等工作负载。或许@xai的内核团队也在遵循这一理念。

全部 AI 动态
AI 相关资讯全量信息流
全部一手信源资讯推文
全部模型产品行业论文技巧
5月27日
07:21
karminski-牙医@karminski3
69
微软等发布SkillOpt框架,用机器学习流程系统优化AI智能体技能

微软联合上海交通大学等机构发布SkillOpt框架,旨在通过机器学习流程系统性地优化AI智能体的技能。该框架引入独立的优化器模型,通过harness闭环流程对技能进行编辑,且每次编辑必须在验证集上带来分数提升才被接受。框架设置了每步4到8个编辑操作的学习率预算,使核心修改控制在1到4个。实验表明,优化后的技能可使GPT-5.5的对话准确率提升23.5分。

智能体arXivMicrosoft数据/训练
04:50
Epoch AI@EpochAIResearch
69
我们是否正接近算力危机? 在最新的 Gradient Update 中,@luke__emberson 和 @Jsevillamol 估算全球所有 Blackwell 芯片能处理多少 token,并与总 token 需求进行比较。直接对比很困难,但需求增长似乎远快于供应。
推理数据/训练现象/趋势
02:20
Epoch AI@EpochAIResearch
39
请花5分钟参与我们的调研,帮助我们产出最有用的AI工作:https://docs.google.com/forms/d/e/1FAIpQLSfzw_ad497AhTPNS5sQaCjBwqChjvM96RiiKXZqKTTS4ko53g/viewform (您可以在最后注册加入我们的有偿用户研究小组。)
其他数据/训练
5月26日
23:59
Ant Ling@AntLingAGI
69
团队发布了KPop技术,用于稳定大规模MoE模型的强化学习训练。它取代了此前IcePop方法的固定比例掩码,改用自适应二元KL散度区域来匹配每个token的固有噪声,从而实现更鲁棒的参数更新,支持长期、智能体化的强化学习训练。具体应用中,万亿参数的Ring-2.6-1T模型在仅使用纯强化学习训练(未修改基础设施或路由重放)的情况下,于SWE-bench Verified评测中得分超过76。KPop仅通过一个关键参数即可实现该优化。

Jia Guo: Curious about the secret sauce behind our trillion-scale agentic foundation model? Here it comes!🥳 Last year, we releas...

智能体数据/训练论文/研究
关联讨论 4 条蚂蚁 inclusionAI:HuggingFace 新模型HuggingFace Daily Papers(社区热门论文)公众号:蚂蚁百灵(Ling)X:蚂蚁百灵 (@AntLingAGI)
23:31
Chubby♨️@kimmonismus
59
Google 正在赢得 AI 分发竞赛,而非 AI 竞赛本身

文章的核心论点是 Google 凭借其分发优势,在 AI 分发竞赛中占据了有利位置。目前 Gemini 拥有 9 亿用户,这主要归功于向 Android 用户进行的默认应用替换,以及向 Google 搜索用户推送的 AI 概览。其大语言模型 token 用量在 12 个月内从 480 万亿增长至 3.2 千万亿。为支撑此规模,Google 计划今年投入 1900 亿美元用于基础设施。Google 的关键优势在于能够利用庞大的 Android 设备基础,通过其搜索和 AI 模式免费向用户推广 Gemini。这一策略的部分成本优势源于自研的 TPU 芯片,使其在推理和训练上更独立,并能基于自身盈利补贴免费 AI 服务。尽管游戏远未结束,但 Google 的开局位置非常出色。

Google大佬观点搜索数据/训练
23:29
Ant Ling@AntLingAGI
同事件精选68
团队推出 KPop,用于稳定大规模 MoE 模型的智能体强化学习训练。它用基于二元 KL 散度的自适应掩码机制,替代了此前 IcePop 方法中的固定比例掩码,能根据训练过程中的训练-推理不匹配程度动态调整。这一改进使得 Ring-2.6-1T 模型在无需修改基础设施或路由重放的情况下,仅通过纯 RL 训练,在 SWE-bench Verified 上取得了超过 76 分的成绩。

Jia Guo: Curious about the secret sauce behind our trillion-scale agentic foundation model? Here it comes!🥳 Last year, we releas...

智能体数据/训练编码论文/研究
同一事件,精选展示《蚂蚁 inclusionAI 推出万亿参数推理模型 Ring-2.6-1T》
推荐理由:蚂蚁团队把 IcePop 升级成 KPop,从固定掩码变成自适应 KL 区域,思路很巧。Ring-2.6-1T 纯 RL 直接冲到 SWE-bench 76+,做 agentic RL 训练的同学值得翻一下博客。
23:29
SenseTime@SenseTime_AI
同事件精选77
开源多模态模型SenseNova-U1完整训练代码库

商汤开源了SenseNova-U1(8B dense + A3B MoE)的完整训练代码库。这是一个统一的框架,支持文本到图像、图像编辑、交错生成、文本与视觉理解等多种多模态任务的训练。其设计注重实用性与大规模训练,采用混合并行、流式可恢复数据管道、环境变量配置、解耦模块化设计,并支持从1×8 GPU扩展到多节点集群的规模。代码库以Apache-2.0协议开源。

多模态开源/仓库开源生态数据/训练
同一事件,精选展示《商汤发布信息图生成模型升级,增强多项核心能力》
推荐理由:商汤把 SenseNova-U1 的训练代码全量开源,支持多模态任务和 MoE,还给了完整的并行策略,做多模态训练的可以直接 fork 过去用,Apache-2.0 很友好。
22:28
Ant Ling@AntLingAGI
62
SwiGLU在现代大语言模型中无处不在--但对于大输入,它的行为类似于x2。这种二次增长会膨胀激活值,放大异常值,并使深层网络或低精度(FP8/FP4)训练容易出现损失尖峰。 我们提出了PowLU,一种为稳定大规模预训练而设计的即插即用激活函数。🧵
推理数据/训练论文/研究
20:59
向阳乔木@vista8
64
Codex分析揭示X平台内容规律

用户让Codex分析自己过去3年在X上的约3.4G发帖数据,总结出几点规律:最爆内容为编程/产品/创业、资源推荐合集、学习方法论类;爆款公式是“真实工具+明确场景+三步内路径”;发帖时间上,周五至周日、及每日三个时段(下午5-11点、上午10-下午1点、凌晨0-2点)数据更好,周一最差;内容形式上,带媒体和链接、篇幅在101-180字的表现更优。

向阳乔木: 有朋友问:什么样的内容在 X 上受欢迎,如何做 X 的运营增长? 我先让 Codex 把自己三年的 X 数据分析一遍,看有什么发现。

教程/实践数据/训练
04:54
Ethan Mollick@emollick
56
AI评估挑战:数学问题单一,亟需多样化难题库

推文指出,当前用于推动AI能力发展的困难问题过于集中于数学领域(如Erdős问题)。虽然数学易于验证,但其成果对日常生活的直接影响不够明确。作者呼吁需要为包括工程、经济、物理、生物等在内的更多领域建立困难问题库,并配套制定相应的评估方法,以让AI智能体处理更复杂、答案更不明确的任务。

大佬观点数据/训练评测/基准
01:22
X.PIN@thexpin
46
中国的AI算力网络正在挑战美国。当美国科技巨头专注于盈利时,中国正将AI token转变为一种国家公用事业。阅读更多: http://www.thexpin.com/china-ai-grid-vs-us-market
政策/监管数据/训练现象/趋势
5月25日
19:54
Alibaba Cloud@alibaba_cloud
57
在通义千问大会2026上,阿里云首席数据库解决方案架构师冯明磊与YTL AI Lab首席执行官冯志文在智能体原生云论坛上,共同展示了《用AI原生数据基础激活企业AI行动》。 🚀 敬请关注:https://click.qwencloud.com/m/20000000190/
数据/训练行业动态
07:57
Chubby♨️@kimmonismus
68
AI指数增长论需要具体数据支撑,转型或由不匹配斜率决定

Klaviyo的AI工程师Amish Regmi(前亚马逊推理基础设施与智能体系统构建者)撰文,批判了笼统的“AI发展是指数级”的说法。他指出,这种说法常缺乏可验证的具体数据,如指数的基数、翻倍时间以及具体所指哪条技术曲线。文章通过分析数据,区分了真正陡峭的指数增长与单纯快速提升或指标失效的情况,其结论是,未来的转型将由不同技术或能力曲线之间“不匹配的斜率”所主导。

数据/训练现象/趋势
02:48
Nathan Lambert@natolambert
18
我听说人们需要澄清,我的书是一本关于后训练的书 http://posttrainingbook.com/
其他数据/训练
5月23日
14:44
swyx@swyx
58
Transformer学习局限与RL的突破潜力

本文肯定了对Transformer当前学习能力及局限性的分析框架,并指出对抗性世界模型是逼近现实本质的关键功能之一。作者认为,单纯增加参数和算力以扩展一个低效范式,将被能主动假设与验证真理的简洁方案所超越,尽管规模化可能因人类智能本身有限而意外通向AGI。引用推文补充了强化学习(RL)作为从干预中学习的范式,比监督学习更强大,而世界建模与RL的结合有望实现对反事实的学习。

Rishabh Agarwal: Very well written blog. I think of RL as learning from interventions, and it kinda explains why it's more powerful as a ...

大佬观点推理数据/训练
08:27
Rohan Paul@rohanpaul_ai
64
谷歌新研究:AI学习生理模式提升可穿戴设备价值

谷歌研究院提出基础模型SensorFM,通过学习超过500万人产生的逾1万亿分钟可穿戴设备传感器数据,掌握了人类生理活动的一般性模式。该模型超越了将数据压缩为简单指标的传统方法,能够从数据中提取出有意义的结构并将其复用于多种健康预测任务。实验显示,模型规模和数据量越大性能越强,且其学习到的数据表征在35项预测任务中的34项上,均优于基于工程特征的基线方法。

Google数据/训练端侧论文/研究
5月22日
00:37
Epoch AI@EpochAIResearch
65
OpenAI在2023年开启了AI算力建设浪潮。但如今它仅占全球算力的约10%,顶尖实验室的总和可能也不到一半。 在本周的通讯中,@justjoshinyou13 探讨了这一份额可能如何变化,以及何时会触及天花板。 🧵
数据/训练现象/趋势
5月21日
22:41
Krea@krea_ai
精选69
为 Krea 2(测试版)引入 LoRA。 我们迄今最强大的微调系统;现在你可以用惊人的精度,在 Krea 2 上训练你自己的特定风格、对象或角色。 了解其工作原理 👇
产品更新图像生成数据/训练

推荐理由:Krea 2 把 LoRA 微调直接做进了产品,对需要固定角色或风格的设计师来说省事了,虽然不是新概念但低门槛就是好文明。
17:56
Rohan Paul@rohanpaul_ai
48
渣打银行正在使用AI自动化合规筛查和客户数据更新等基于规则的支持工作。计划到2030年削减超过15%的支持岗位。 CEO表示,这是为了打造面向未来的运营模式。
数据/训练行业动态
13:14
Rohan Paul@rohanpaul_ai
69
工程师经验成AI燃料:Meta用员工工作痕迹训练模型

Meta正利用内部工程师的工作痕迹——如代码编写、工具使用和问题解决步骤——来训练其编程AI。CEO扎克伯格认为,让AI观察“聪明人”执行任务(行为克隆),比使用外部承包商代码样本更有效。同时,Meta正裁员约8000人,并计划让约7000名员工转向AI相关岗位。此举反映科技行业新趋势:公司正将人类专业知识直接转化为训练数据,AI不再只是工具,而是能吸收并压缩员工工作模式的系统。

More Perfect Union: LEAKED AUDIO: In an all-hands meeting on April 30, Mark Zuckerberg tells employees that he's training AI on them ahead o...

Meta数据/训练行业动态
09:09
Ethan Mollick@emollick
57
数学很简单*,因为它有可验证的输出,且无需做太多混乱的判断选择。 哪些AI实验室有勇气将推进社会科学作为优先事项?解锁社会学、经济学和心理学研究可能实际上更能促进人类繁荣。 *对AI而言,而非对人类
大佬观点数据/训练
02:17
Google AI@GoogleAI
69
几个世纪以来,科学方法一直是我们取得进步的最佳工具。但如今,数据如此之多,任何单一研究者都无法将所有点连接起来。我们希望解决这个问题: 推出 Gemini for Science,这是一套旨在加速科学探索速度和规模的科学工具与实验集合。 请继续阅读,了解本次首批发布中各项公告的详情 🧵👇
Google产品更新数据/训练
关联讨论 3 条X:Google AI for Developers (@googleaidevs)X:Google DeepMind (@GoogleDeepMind)Google DeepMind:Blog(RSS)
01:56
AYi@AYi_AInotes
68
Meta泄露音频:员工培训AI后遭裁员,信任危机

近日,Meta CEO扎克伯格的内部音频泄露,他承认公司秘密收集员工键盘、鼠标和屏幕数据,用于训练Llama等AI模型,因Meta员工智力高可提升模型能力。然而,数据收集约20天后,Meta裁员8000人,引发“企业食人主义”批评:员工在不知情下训练可能取代自己的AI,资本剥削从时间升级到智慧。这损害了员工信任,揭示了AI时代高效但冷酷的用人逻辑——员工越优秀,其价值被快速榨取并抛弃的风险越高。

More Perfect Union: LEAKED AUDIO: In an all-hands meeting on April 30, Mark Zuckerberg tells employees that he's training AI on them ahead o...

Meta数据/训练现象/趋势
01:33
Yuchen Jin@Yuchenj_UW
73
你还在祈祷能用上8块H100。 Anthropic的MTS已经用Mythos启动了万倍B300的自动研究流程来训练下一代Claude。 我们永远是算力底层的阶级。
Anthropic数据/训练行业动态
00:35
Chubby♨️@kimmonismus
68
事情开始了 【引用 @kimmonismus】:重磅:Meta 4月30日全员会议泄露音频: 扎克伯格告诉员工,公司正在利用他们训练AI模型,随后大规模裁员即将开始。 他的理由是?Meta的工程师比任何外部劳动力都聪明,让他们在内部解决编码任务,能让Meta的模型比竞争对手更好、更快。 裁员预计在周三凌晨4点进行。先培训你的替代者,然后被请走。这就是现在的规矩。

Chubby♨️: Holy: Leaked audio from a Meta all-hands on April 30: Zuckerberg told employees the company is using them to train AI mo...

Meta数据/训练行业动态
00:19
小互@xiaohu
63
Midjourney创始人称被Google TPU坑惨

Midjourney创始人暗示他们被Google的 TPU坑了 白白浪费了一年时间… 如果回到过去他会选择英伟达的GPU🤣 “这大概让我们的研究进度,比起一开始就完全采用 Nvidia 技术栈,落后了差不多一年。并不算特别理想。如果我能回到过去,我会从第一天开始就全部使用 Nvidia 的方案。”

David: @bubbleboi it probably put our research a year behind where it could have been if we were pure Nvidia stack, not totally...

Google图像生成大佬观点数据/训练
00:05
AK@_akhaliq
67
基于点互信息的推理强化学习反自蒸馏方法
arXiv推理数据/训练论文/研究
5月20日
23:33
elvis@omarsar0
73
自我改进的AI是件大事!

作者探索利用Fireworks AI Agent,通过自然语言交互自动化完成大语言模型的微调流程。他以Qwen小模型为例,调整其输出风格以优化PaperWiki项目的扩展效率。这一方法灵感源于@karpathy关于LLM知识库的推文,强调微调是让模型更“懂”数据的关键步骤。核心观点是自动化微调可推动构建可递归自我改进的AI系统,最终目标是打造一个能自我优化、用于知识发现和端到端自动化研究的强大工具。

elvis: http://x.com/i/article/2056851733582880768

智能体开源/仓库教程/实践数据/训练
23:03
elvis@omarsar0
74
通过AI代理自动化微调,将知识注入大语言模型权重

本文探讨了通过微调,将个人知识库(如LLM Wiki)的内容从依赖上下文窗口,转变为固化到模型自身权重中的方法。关键在于利用如Fireworks Agent这样的自主AI代理,仅需提供自然语言目标,它就能自动完成从数据准备、训练到部署的完整微调流程。这标志着模型自我改进的闭环成为可能:当训练成为AI工作流中一个可调用的步骤时,模型能主动将反复使用的模式(如特定写作风格或决策逻辑)学习并内化到权重中,从而实现使用与优化的持续迭代。

智能体MCP/工具教程/实践数据/训练
20:05
Chubby♨️@kimmonismus
63
泄露的Meta内部音频显示,CEO扎克伯格在4月30日的全员会议上向员工表示,公司正在利用他们的工作成果来训练AI模型,随后即将启动大规模裁员。其核心论点是,Meta工程师的平均智力远高于外部可雇佣的群体,让这些顶尖人才在内部完成编码等任务,能更快速、更有效地提升公司AI模型的编程能力,从而超越竞争对手。裁员计划预计在周三凌晨启动,员工需先完成对替代者的培训。

More Perfect Union: LEAKED AUDIO: In an all-hands meeting on April 30, Mark Zuckerberg tells employees that he's training AI on them ahead o...

Meta数据/训练行业动态
11:34
X.PIN@thexpin
58
突发:阿里巴巴刚刚在2026云峰会上发布了一款128芯片AI超级节点。我们有幸在现场亲眼目睹。
产品更新数据/训练部署/工程
04:59
Emad@EMostaque
40
看来许多自回归模型将被转换为扩散模型。
数据/训练现象/趋势
5月19日
08:56
meng shao@shao__meng
71
Cursor发布最强模型Composer 2.5,与SpaceXAI合作启动Colossus 2算力训练

Cursor发布迄今最强模型Composer 2.5,仍基于Kimi K2.5。模型已与SpaceXAI合作,使用Colossus 2算力开始训练,并计划合作训练一个规模大10倍的全新模型。Composer 2.5在长任务推进、复杂指令遵循及协作自然度方面均有显著提升。关键创新包括:采用定向文本反馈强化学习解决长任务信用分配问题、使用25倍于前代的合成数据进行训练,以及通过Muon优化器与分布式正交化技术优化基础设施层。此外,模型还专门针对沟通风格和投入度校准等协作“软”维度进行了优化。

Cursor: Introducing Composer 2.5, our most powerful model yet. It's more intelligent, better at sustained work on long-running t...

数据/训练模型发布编码
07:27
Nathan Lambert@natolambert
52
在线蒸馏有望成为后训练中的持久方法。涉及领域包括: 指令微调(SFT/IFT) RLHF 直接偏好优化(DPO等) RLVR 在线蒸馏(OPD) 新方法类别实属罕见!期待参与实践。
大佬观点数据/训练
05:13
SemiAnalysis@SemiAnalysis_
57
AI周期的独特性与实际回报获数据分析验证

SemiAnalysis在播客中探讨了本次AI周期是否真正不同于以往技术周期。团队基于分析认为,AI带来的回报是真实的,且呈现为与其他周期不同的结构性趋势。为验证此观点,他们在内部追踪了9个实际工作流程(包括公司研究、财报总结等)中的token消耗成本与人工劳动成本对比,通过具体数据表明AI的效率与经济价值。研究认为这一趋势已显现出区别于历史技术迭代的独特性与持续性。

数据/训练现象/趋势
02:09
elvis@omarsar0
67
Meta新系统双代理协同,自动设计超越Llama 3.2的神经架构

Meta提出AIRA系统,通过分离策略与实现的双代理架构,实现神经架构的自主发现。AIRA-Compose负责宏观架构搜索,AIRA-Design专注低级机制实现。该系统在24小时计算预算内,于350M、1B和3B规模上找到超越Llama 3.2的架构。其核心方法论表明,在复杂任务中分离规划代理与实现代理能提升效能,此思路同样适用于流水线组装、查询规划等其他AI代理场景。

智能体Meta数据/训练论文/研究
5月18日
09:54
Berryxia.AI@berryxia
56
xAI算法开源深度解析,专家创建完整源码wiki

xAI算法开源后,专家岚叔@LufzzLiz深入研究了xai-org/x-algorithm仓库源码,使用Opus-4.7创建了带有明确源码出处的完整wiki。这与引用推文所指出的现状形成对比:市面上95%的分析是AI批量生产的同质化废话,缺乏对源码的真正理解。岚叔的工作提供了有价值的算法拆解,GitHub仓库和在线阅读地址已公开。

Berryxia.AI: xAI 算法开源后,解读内容铺天盖地。 我敢说一句颠覆多数人认知的实话: 市面上 95% 的分析,是 AI 批量生产的同质化废话, 连源码文件名都没翻过一次。 「多互动」「多发帖」「账号要垂直」 这种谁都会说的话,说了等于没说。 真正藏在 ...

GitHubxAI开源/仓库教程/实践
06:39
Ethan Mollick@emollick
61
因此,实现真正AI腾飞的两个最明显障碍是: 强大的RSI(AI作为独立的AI研究者,而"不仅仅"是人类工作的倍增器) 以及持续学习能力。 其中任何一项都将代表AI发展轨迹的重大转变。
智能体大佬观点数据/训练
5月17日
16:10
Rohan Paul@rohanpaul_ai
70
研究揭示LLM智能体记忆重写机制损害可靠性

伊利诺伊大学与清华大学等机构的研究发现,LLM智能体虽能从经验中学习,但其通过LLM将原始经历压缩成书面教训的记忆重写机制会损害记忆可靠性。在网页购物、模拟世界及ARC风格谜题等任务测试中,反复重写记忆会导致错误分组、规则过度泛化或过拟合,使智能体遗忘细节或混淆任务类型。例如,GPT-4在无记忆时可100%解决小型ARC-AGI问题集,而建立记忆并流式更新后,性能降至约54%。研究主张智能体记忆系统应重视原始经历作为关键证据,而非自动将所有经验重写为摘要,保留原始证据并选择性摘要效果更佳。

智能体数据/训练论文/研究
10:42
SemiAnalysis@SemiAnalysis_
53
在斯坦福CS153前沿系统课程中,Jensen Huang逐字表示他"希望始终保持低MFU",其给出的理由是:他希望系统足够智能,以至于超额配置了如浮点运算、网络、内存等工作负载。或许@xai的内核团队也在遵循这一理念。
大佬观点数据/训练
‹ 上一页
1…56789…12
下一页 ›