AIHOT
All updates · X · 468 posts
Tag: Data/Training
AK @_akhaliq · May 8 · 61

MARBLE: Multi-Aspect Reward Balance for Diffusion RL. Paper: https://huggingface.co/papers/2605.06507

SemiAnalysis @SemiAnalysis_ · May 8 · 50

Floating point math is not associative! And many of the highest performance kernels split the workload among SMs and accumulate partial results in a nondeterministic order. Many AI labs just accept this, or pay a huge performance penalty for determinism. DeepSeek decided to do neither. (1/4) 🧵
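A minimal Python sketch of the non-associativity the thread describes. The values are illustrative and are not taken from any DeepSeek kernel; the point is only that the same addends can round to different totals depending on accumulation order, and that `math.fsum` is one order-independent (but slower) alternative.

```python
import math

# Regrouping the same three addends changes the rounded result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

# The same effect when partial sums arrive in a different order,
# as they can when several workers accumulate into one total.
values = [1e16, 1.0, -1e16]
order_a = sum(values)                 # 1.0 is absorbed by 1e16 before the cancellation
order_b = sum([1e16, -1e16, 1.0])     # the cancellation happens first, so 1.0 survives

# math.fsum tracks exact partial sums, so its result does not depend on order.
stable_a = math.fsum(values)
stable_b = math.fsum([1e16, -1e16, 1.0])

print(left == right, order_a, order_b, stable_a, stable_b)
```

A GPU kernel cannot simply call an exact summation routine at full speed, which is why the thread frames determinism as a performance trade-off.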

Nathan Lambert @natolambert · May 8 · 63

Work led by @jacobcares showed that little compute for building an LLM is actually in the final runs. The vast majority of compute goes to developing a recipe. Creating the recipe openly is a huge lever in making sure the research community's compute pushes to new knowledge.

SemiAnalysis @SemiAnalysis_ · May 8 · 51

We are so used to seeing chip company marketing teams exaggerate specs that it is refreshing to see them understate specs for a change. Here's one example from Cerebras's website, where they understate on-chip SRAM by a factor of 8! @cerebras y'all are far too modest!

Chubby♨️ @kimmonismus · May 8 · 57

The xAI / Anthropic compute story is not about one company having GPUs and the other wanting them. It's that they have opposite problems. xAI reportedly runs one of the largest GPU fleets in the world. Yet according to The Information, its recent model FLOPs utilization was around 11%. Buying GPUs is only half the battle. Turning them into actual work is the other half. Anthropic looks like the mirror image. Claude demand is running ahead of available capacity. Revenue run-rate passed $30B, up from roughly $9B at the end of 2025. Its $1M+ business customers doubled from 500 to 1,000+ in under two months. The new SpaceX compute capacity is immediately being converted into higher Claude Code and Opus limits. So the real compute race may not be about who can announce the biggest cluster. It's about who can digest compute fastest. xAI shows that raw GPU ownership can outpace operational absorption. Anthropic shows what happens when product demand is so intense that new capacity instantly becomes more usage, higher limits, and more revenue. The scarce resource is no longer just GPUs. It's the ability to turn them into products people pay for to be honest.

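Summary: xAI and Anthropic face mirror-image compute problems. xAI reportedly runs one of the world's largest GPU fleets, yet its model FLOPs utilization is only about 11%, which shows how hard it is to turn hardware into useful compute. Anthropic, by contrast, has demand running well ahead of supply: Claude's revenue run-rate has passed $30B, its $1M+ business customers grew from 500 to more than 1,000 in under two months, and new capacity is converted immediately into higher usage limits and revenue. The race is no longer just about cluster size but about how fast raw compute can be turned into products people pay for. The scarce resource is shifting from GPUs themselves to that conversion ability.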

Nathan Lambert @natolambert · May 8 · 72

Visiting most of the leading Chinese AI labs, I'm struck by a culture that's extremely well suited to building LLMs with fewer resources, but one happening in a very different ecosystem, more companies at play, almost no data industry, etc. Full report: https://www.interconnects.ai/p/notes-from-inside-chinas-ai-labs

宝玉 @dotey · May 7 · 76

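http://x.com/i/article/2052198374636404736

# The Anthropic siblings Dario Amodei and Daniela Amodei in their latest conversation: why is Claude always rate-limited?

On May 6, at the San Francisco stop of Code with Claude, Anthropic siblings Dario Amodei and Daniela Amodei took the stage together. This was Anthropic's second developer conference, and on the same day Anthropic had just announced a deal with SpaceX for the full compute of the Colossus 1 data center (more than 300 MW and 220,000 NVIDIA GPUs).

The conversation was hosted by Anthropic Chief Product Officer Ami Vora (who in January 2026 succeeded Mike Krieger when he moved to Labs). It started from what the exponential feels like and went on to cover the developer ecosystem, the next step in model training, Anthropic's trade-offs in releasing capabilities, and the capability change Dario is most excited about over the next six months.

Below is a write-up of the roughly half-hour conversation. The original video is from Anthropic's official Code with Claude series.

Original video: https://www.youtube.com/watch?v=7xco5Qd2Oo8

## Key points

- One: Anthropic had prepared compute for "10x per year", but actual growth in Q1 2026 annualized to about 80x, which is the direct reason Claude has been rate-limited. Dario said plainly that he hopes growth returns to 10x, calling 80x "just crazy" and "too hard to handle".
- Two: a year ago, at last year's Code with Claude, Dario told Mike Krieger that 2026 would see the first "one-person, $1B-valuation" company. With seven or eight months left in 2026, the latest status is that two-person AI companies valued at $1B already exist, as do single-person cases valued in the hundreds of millions.
- Three: software engineers are the "leading indicator" of AI's diffusion through the economy. How developers use Claude foreshadows how other industries will.
- Four: coding has improved fast because it is "verifiable" (run the unit tests and you know). The next hard part is security, design quality, and code review, the "subjective" capabilities that unit tests cannot judge automatically. Anthropic is training models on these, and the gains will feed back into writing and scientific research.
- Five: "Hold light and shade" is an internal cultural principle at Anthropic. The latest example is its strongest model, Mythos: because it can identify and exploit software vulnerabilities, the company did not release it publicly and instead took the restricted route of Project Glasswing, giving it to more than 50 organizations to strengthen defenses.
- Six: the capability change Dario most looks forward to in the next six months is organization-level AI. AI no longer just does many people's work for one person; it does that work many times over inside an organization of people.

## [1] What 80x annualized growth feels like

Ami opened with a pointed question: the two of you actually live this exponential, so what does that growth feel like?

Daniela started with a company in-joke. Anthropic's Slack has a "roller coaster" emoji, the kind where the slope suddenly goes vertical. She said she and Dario are like riders at the front and the back of the car, so the whiplash differs depending on where you sit. She then added a line that got a laugh:

> "We're not totally sure that the operator of the roller coaster isn't like a 15 year old who's doing a summer job of like questionable level of sound mind."

Dario's answer was more technical. He said he and several co-founders wrote down this curve more than a decade ago through scaling laws (the finding that model capability grows predictably with training compute), predicting how well the model would do on one task or another as spending went from $1,000 a month to $10,000, $100,000, and on up to hundreds of billions. So on paper, everything happening now is within the prediction.

> Note: Dario Amodei first observed the "bigger is better" pattern in 2014 while working on the Deep Speech 2 project at Baidu Research, and in 2020 co-authored the influential scaling-laws paper at OpenAI. Several of Anthropic's seven co-founders took part in that research. This is the background to his remark about predicting the curve more than a decade ago.

But he said writing the curve on paper and watching it become real are two different things. He used the famous scene from Interstellar as an analogy: the ship lands on a planet near a black hole where the waves are 2,000 feet high.

> "I was a physicist, I know the math, the general relativity, how much things can be sheared. But actually seeing it on human scale, there's something deeply, it's kind of deeply strange and unsettling about seeing it actually happen."

He said every year inside Anthropic feels like that.

Dario then made the "exponential" concrete with three points.

First, this year, for the first time in the company's history, Claude produced an upward inflection in the number of internal PRs (pull requests) at Anthropic. Claude writes code faster than people are being added.

Second, external growth this year "exceeded the exponential" for the first time. Anthropic had planned compute for "10x per year", with multiple scenarios ranging from "almost no growth" to "10x growth". But in Q1 2026, annualizing that quarter's pace, revenue and usage grew 80x.

> Note: Dario qualified this with "if you were to annualize it", meaning the 80x figure extrapolates a single quarter's surge to a full year. Actual full-year growth is unlikely to stay at that level, but even discounted, the figure far exceeds the headroom in the company's 10x plan.

Third, this is why Anthropic has been rate-limiting. Dario struck an apologetic tone:

> "I hope the 80x growth doesn't continue 'cause that's just crazy and it's too hard to handle. I hope for some more normal numbers, a mere 10x."

He then tied this to the day's other news:

> "As you saw today with the SpaceX compute deal, we're working as quickly as possible to provide more compute than we have in the past."

> Note: The news Anthropic announced on May 6 was an agreement with SpaceX to use the full compute of the Colossus 1 data center (in Memphis, Tennessee, formerly belonging to Elon Musk's xAI), bringing more than 300 MW and over 220,000 NVIDIA GPUs online "within a month". Anthropic's other compute deals include an agreement with Amazon for up to 5 GW (nearly 1 GW of it online by the end of 2026), a 5 GW agreement with Google + Broadcom (coming online from 2027), and a $30B Azure compute partnership with Microsoft + NVIDIA. Musk had publicly criticized Anthropic and Dario several times, but on May 6 posted that after meeting Anthropic's leadership the previous week he "was impressed". The deal announcement itself is the most direct support for the interview's "compute is the real bottleneck" narrative.

## [2] Why Anthropic puts developers at the top of its user pyramid

Ami then turned to the developer community. Nearly everyone in the room that day was a developer, and she wanted to hear how Dario and Daniela position this group.

Daniela was direct: in many ways, developers are Claude's most important users. She gave several reasons.

First, Anthropic itself is mostly developers, and they are the most sensitive to the tools they build.

Second, feedback from the developer community is genuine. Anyone who has built products knows how rare that is:

> "You build a product and you're like I see some numbers like those are nice but...the genuineness with which the developer community I think engages with us is something that is so special."

Finally, Anthropic has built products "primarily for developers and enterprises" from day one, which Daniela thinks is not that common in AI.

She listed fields Claude has already reached, including medicine, software development, and financial services; in nearly every industry, developer-centered companies are using Claude to reshape their business. She described the relationship as "both a privilege and a responsibility".

Dario added another angle. Technology does not diffuse evenly through the economy, and software engineers are always the fastest adopters. So it is no accident that the industry spotlight is on coding: it is a miniature preview of how AI will reshape the rest of the economy.

## [3] The "one-person $1B company" bet, with seven or eight months left

Dario then took the developer thread to a specific bet. About a year ago, at Code with Claude 2025, Mike Krieger asked him directly:

> In what year will the first company valued at $1B with only one person appear?

Dario's answer then was 2026. Seven or eight months remain. The audience laughed. Dario added, half joking:

> "That's eternity on the exponential."

He gave a status update: two-person companies valued at $1B that started with AI already exist, as do single-person cases valued in the hundreds of millions, but a strict "one person, $1B" has not yet materialized. In his view the real meaning is not saving labor cost but that an individual or tiny team with ideas can, for the first time, work at a level of resources that used to take years to accumulate, and build what they imagine.

> We have gone from "the model helps us write code", to "the model helps us think about software engineering as a task", to "the model helps us think about a whole business unit, a whole economic unit, as a task".

> Note: Mike Krieger is a co-founder of Instagram who joined Anthropic as Chief Product Officer in 2024 and in January 2026 moved to the newly formed Anthropic Labs in a technical role, focused on incubating experimental products (the best-known current project being Mythos, mentioned below), with Ami Vora taking over as CPO. In that conversation Dario put the probability at "70%-80%". The bet ends on December 31, 2026. He did not name the "two-person $1B company", so the claim cannot currently be independently verified.

## [4] From single agent to multi-agent; the next bottleneck is verification

Ami asked Dario how developers' use of Claude will change next. Dario gave several interlocking trends.

First, from single agent to multi-agent. A developer will have not one Claude but a group of Claudes, possibly in a hierarchy where upper-level Claudes delegate tasks to lower-level ones. Dario used a metaphor he often uses:

> "We're gradually making our way to the country of geniuses in the data center. We're starting with a team of smart people in a room or something."

Second, Claude Code today mainly makes individuals more productive, but Anthropic is thinking more and more about whole teams and organizations, so that a group of people plus a group of Claudes produces more than the simple sum.

Third, and the one Dario stressed repeatedly: watch Amdahl's law. When one segment is accelerated to the limit, the bottleneck jumps to the segments that were not.

> "If you're living in a world where you can, within an organization, write three or four times as many PRs as you could previously, you start to understand there are all these other things that are holding you back or that will go wrong if you speed up just that and not everything else."

He named those "other things": security, verification, code review, design quality. Anthropic's next step is not to speed up a single point further but to lift this whole ring of bottlenecks together, so the acceleration can be released "smoothly and reliably".

> Note: Amdahl's law comes from the parallel-computing formula proposed by computer scientist Gene Amdahl in 1967. It originally says that if only part of a program can be sped up by parallelism and the rest must run serially, overall speed is limited by the serial part. Dario borrows it to describe collaboration bottlenecks in engineering organizations; it is the core analytical frame he returns to throughout the conversation, and it comes up again in the product and model-training discussions.

## [5] How models are trained has to change too

Ami followed up: will these trends change how Anthropic trains its models?

Dario's answer had two layers.

The first is already happening: Anthropic is using Claude to accelerate Claude's own development.

The second is more interesting. Dario said software engineering is where AI has progressed fastest because it has a special property: it is verifiable. Give the model a coding task, it writes the code, and unit tests immediately tell you whether it is right. That feedback loop is simple and effective, so training is especially efficient.

But a large part of software engineering is not verifiable:

> "Is this thing really right? Can we find errors? Are there security issues? Not quite as verifiable."

The logic is straightforward: training efficiency depends on how easy verification is. Code can be tested and right or wrong is obvious, so progress is fast; security analysis and design judgment lack such automatic verification, so progress is slower. Once Anthropic makes a breakthrough in training on these "semi-subjective" tasks, the benefit will extend beyond software engineering to writing, scientific research, and other fields.

He restated this with Amdahl's law: within software engineering, the "soft, subjective" capabilities are the current bottleneck segment and so become disproportionately important.

## [6] Mission: walking the tightrope between shipping fast and shipping responsibly

Ami turned to mission. As Anthropic grows and the industry's stakes rise, what should outsiders most understand about Anthropic?

Daniela gave two pillars.

One is "how to do this transformative technology well so that it benefits everyone". Claude is a tool that amplifies people's creative ambition and ability; this is the opportunity side.

The other is acknowledging risk: disruption to labor, whether releases are safe, whether the technology is truly good for people.

Daniela said Anthropic wants to treat both ends with equal weight. She introduced an internal cultural keyword: "Hold light and shade".

She cited the recently released "Mythos and Glasswing" as an example:

> A model at Mythos's capability level has enormous potential in what can be done with it. But because of some security vulnerabilities, we wanted to be a bit more careful about the release.

She summed up the tension:

> The balance is actually quite delicate. We want to ship as fast as possible, build the best products, and release the strongest models, but we also want to do it responsibly. Most of our decisions start from calibrating back and forth between these two pillars.

> Note: Claude Mythos Preview is a preview model Anthropic released in April 2026 that showed a generational leap on cybersecurity tasks and found a large number of zero-day vulnerabilities in several mainstream operating systems and browsers. Project Glasswing is the accompanying defensive security alliance, in which dozens of critical-infrastructure organizations use Mythos to scan for and fix vulnerabilities. Because of these security risks, Mythos was released only to a very small group. "Glassman" in the transcript is probably a speech-recognition error for "Glasswing".

## [7] Product thinking on the exponential: building products for AI vs. building products with AI

On product, Daniela first teased Ami. She said, in effect: you just said Dario and I "leaned in a lot" on product, which in plain language means the two of us meddle in your work every day and you would like to be left alone to do it.

But she then acknowledged that both of them do take product very seriously, because product is the outward expression of what Anthropic is trying to do. She also offered a less common view: inside Anthropic, "product" and "research" are two inputs that pull on each other. Sometimes you think "we should build a better tool", but more often "product innovation is pushed by new capabilities emerging from the model".

Her example was coding. Anthropic did not set out from day one to build a coding product. At some point the team found the model could write "decent, imperfect" code, noticed that many heavy users were developers, and naturally thought "we should build something for this group", which eventually became Claude Code.

Dario then broke the topic down. He separated two things: building products for AI (building products in the AI era) and building products with AI.

On the former, he gave what he sees as the key patterns.

First, the technology base is changing fast. In the 2010s product era the technology landscape moved step by step, with an occasional new framework. In the AI era, each step up in capability makes products that were impossible suddenly "light up". So teams must keep experimenting internally, and "even if something can't be built now, come back and try again in a few months".

He gave an example:

> "If we had tried to do Claude Code in 2022, it wouldn't have worked because the models wouldn't have been strong enough...I've been training these models since 2015. They were really dumb."

Second, in the AI era a product form saturates when the model becomes stronger than the form can use. Dario said the chatbot form is near saturation: the market is still large, but as models get smarter the marginal gain for the chatbot form is no longer obvious. Each new capability step now shows up more in agentic forms such as Claude Code.

Third, the API market will never disappear, because new products keep appearing, inside Anthropic and even more so outside. Beyond code, the medical, legal, and financial applications that developers build gain a new batch of possibilities with each step in model capability.

Fourth (returning to Amdahl's law), on building products with AI, he has observed inside the company that shipping speed has gone up 2x, 4x, 5x, but then "systemic debt" starts to surface.

> "It's possible to accumulate an extraordinary amount of internal technical debt when you ship that fast. And so then you have to say, well, can we also use the AI models to undo that technical debt or keep track of what it is that we're doing?"

He added that teams then find they have to collaborate in a completely different way, and that new lessons surface every month.

So the AI era is not only about a faster release cadence; "even 'how you work' is forced to upgrade at high frequency".

Ami added her own take: the problems themselves do not change that fast, and people are still people. But you have to keep "looking at the technology with fresh eyes" and accept that "your daily work is changing too, because the bottleneck keeps jumping to a new place".

## [8] The capability Dario is most excited about in the next six months

Ami asked Dario to answer in one sentence: what in model capability excites you most over the next six months?

Dario's answer: the jump from "individual-level AI" to "organization-level AI".

> "AI is not just doing the work of many people working for one person, but that it does the work of many people many times over by operating within an organization of humans."

He connected this to the "one-person $1B company" bet: that bet may, if anything, understate things. What is more likely to happen is "a group of people plus AI doing the work that used to take hundreds or thousands", rather than one person alone building a $1B company.

## [9] The Claude use cases that moved them most

Finally Ami turned to Daniela: which user stories moved you most?

Daniela gave several very different examples.

The first is a mobile-doctor project in the Global South. In some regions seeing a real doctor is hard and means traveling dozens of miles on dirt roads to the nearest city, yet people there still have illnesses and health problems. Developers used Claude to build a consultation-style interface that gives vetted medical advice, turning model capability into a tool that works in low-resource settings.

She also mentioned acceleration in biomedical research, an area she has long followed.

The last two were more personal. One developer used Claude to recover wedding photos from a damaged hard drive. Another person uses Claude to track the growth of the tomatoes in their garden.

The tomato example amused Daniela: "I would never in my life have thought of that use. But do you have a live camera feed? I want to subscribe."

On the question of what AI can be used for, users' imagination always runs ahead of product managers' plans.

## Closing Q&A at a glance

Q: How fast is Anthropic growing today?
Annualizing Q1's pace gives 80x (Dario qualified it with "if you were to annualize it"; it extrapolates a short-term surge). Compute had been planned for 10x, hence the rate limits.

Q: What does the SpaceX compute deal solve?
Within the next month, 300 MW and more than 220,000 NVIDIA GPUs will come online. Anthropic will pass that compute on to developers as higher limits as quickly as it can.

Q: Where does the "one-person $1B company" bet stand?
There are already two-person $1B cases and single-person cases in the hundreds of millions (Dario gave no names, so this cannot be independently verified). At Code with Claude 2025, Dario gave a window of 2026 with 70%-80% confidence. Seven or eight months remain in that window.

Q: What in model capability excites Dario most over the next six months?
Organization-level AI. AI no longer just does many people's work for one person; it does that work many times over inside an organization made up of people.

Q: How does Anthropic make trade-offs in releasing capabilities?
Internally it is called "Hold light and shade". Because of security risks, the Mythos model was not released publicly and was instead distributed in limited fashion through Project Glasswing to dozens of organizations for defensive hardening.

## Final thoughts

The core takeaway from this conversation is the tension, almost a fight with itself, in Anthropic's attempt to hold two extreme positions at once.

On one side, it is the fastest-growing AI company: 80x annualized growth (even if that number involves some selective calculation), the SpaceX compute deal, and Claude Code bending the internal PR count upward. Dario admitted on stage that 80x is too hard to handle and that he hopes to return to 10x, and on the same day signed one of the hardest deals in the industry to pull off. This is the strongest evidence for "we have gone after all the compute we can find".

On the other side, it is also the most cautious AI company. A generational model like Mythos was restricted purely because of security risk, and "Hold light and shade" has become a repeatedly invoked safeguard. Facing a model this strong, Anthropic has in effect given up the speed of pushing it straight to market.

Holding both at once is certainly much harder than Dario and Daniela made it sound on stage. 80x growth means enormous delivery pressure, and technical debt accumulating at an "extraordinary" rate is Dario's own wording. At that speed, braking for safety evaluation and insisting on responsible release depends not only on a few principles but on hard day-to-day fights over resource scheduling.

Dario's repeated use of Amdahl's law is the key analytical frame of the conversation. It points to a more practical question than "AI makes everything faster": after acceleration, where does the bottleneck move? For developers, this question deserves more thought than "the model got stronger again".

Two signals are worth tracking. First, after Colossus 1 comes online, whether limits are really loosened in a noticeable way (doubling the 5-hour limit while leaving the weekly limit unchanged looks more like wordplay), and how much of the GW-scale commitments from Amazon, Google, and Microsoft turns into user-available compute by year-end. Second, when and under what conditions Mythos moves out of preview and beyond Glasswing. The former tests Anthropic's infrastructure capability as a product company; the latter tests how long "Hold light and shade" holds up under commercial pressure.

As for the "one-person $1B company" bet, seven or eight months remain in 2026. Dario was already revising it on stage: the real proposition may be "a group of people plus AI doing the work that used to take hundreds". If that revision is right, the "one-person unicorn" may become a relatively uninteresting part of this story.

Original video source: Anthropic Code with Claude, San Francisco, May 6, 2026, "A conversation with Dario Amodei & Daniela Amodei".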
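The Amdahl's law argument that runs through this write-up can be made concrete with a small sketch. The 40% share and the 4x factor below are hypothetical numbers chosen for illustration; they are not figures from the talk.

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when only `accelerated_fraction` of the work gets `factor` times faster."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)

# Hypothetical split: writing code is 40% of delivery time and becomes 4x faster.
overall = amdahl_speedup(0.4, 4.0)
# Even an unbounded speedup of that 40% is capped by the untouched 60%.
ceiling = 1.0 / (1.0 - 0.4)
print(round(overall, 3), round(ceiling, 3))
```

Under these assumed numbers, a 4x faster coding step yields well under 2x end to end, which is the sense in which review, verification, and security become the segments that matter.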

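Summary: At Anthropic's developer conference, co-founder Dario Amodei said the direct cause of Claude's persistent rate limits is demand growing far faster than expected. The company had planned compute for 10x annual growth, but annualized growth in Q1 2026 reached 80x, leaving compute short. Anthropic has therefore signed an agreement with SpaceX for the full capacity of the Colossus 1 data center, more than 300 MW and 220,000 NVIDIA GPUs. Dario said this exponential growth was within theoretical predictions but is still startling to live through. The company sees developers as the leading indicator of AI diffusion and its most important user group, and is working on "subjective" capabilities such as code security.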
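To put the 80x and 10x figures on the same footing, here is a small worked calculation. It assumes "annualized" means compounding one quarter's growth factor over four quarters; the talk as reported does not spell out the exact method, so treat this as one plausible reading.

```python
def quarterly_factor(annualized_multiple: float) -> float:
    """Per-quarter growth factor that compounds to `annualized_multiple` over four quarters."""
    return annualized_multiple ** 0.25

observed = quarterly_factor(80.0)   # the annualized figure quoted on stage
planned = quarterly_factor(10.0)    # the planning assumption quoted on stage
print(f"80x/year is about {observed:.2f}x per quarter; 10x/year is about {planned:.2f}x per quarter")
```

Under that assumption, the gap is roughly a tripling in one quarter against a plan that allowed for a bit less than a doubling.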

Rohan Paul @rohanpaul_ai · May 7 · 57

MRC was introduced by NVIDIA, Microsoft, and OpenAI, in collaboration with AMD, Broadcom, and Intel. Multipath Reliable Connection is a new RDMA transport protocol, proven first and optimized on NVIDIA Spectrum-X Ethernet hardware. It spreads AI training traffic across many paths instead of forcing each GPU connection through one route. Basically, it is a new way to move training data between huge numbers of GPUs without letting one bad network path slow the whole cluster. RDMA lets GPUs move data through the network with very little CPU help, which is crucial when thousands of GPUs must exchange model updates constantly during one training run. MRC changes the connection itself by letting one RDMA stream use multiple network paths, so traffic can shift around congestion, failed links, and overloaded switches without waiting for software-level repair.

Summary: Multipath Reliable Connection (MRC) is a new RDMA transport protocol introduced by NVIDIA, Microsoft, and OpenAI in collaboration with AMD, Broadcom, and Intel. It was first proven and optimized on NVIDIA Spectrum-X Ethernet hardware. Its core change is to the connection itself: a single RDMA stream can carry AI training traffic over multiple network paths instead of forcing each GPU connection through one fixed route. RDMA lets GPUs move data with very little CPU help, which matters when thousands of GPUs constantly exchange model updates during training. When the network hits congestion, link failures, or overloaded switches, traffic shifts around them without waiting for software-level repair, so one bad path does not slow the whole cluster.
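The multipath idea in this post can be illustrated with a toy sketch. This is not the MRC wire protocol or any NVIDIA API: the `spray` function, the path names, and the health check are all hypothetical, and real hardware makes this decision per packet using congestion signals rather than a simple rotation.

```python
from itertools import cycle

def spray(packets, paths, is_healthy):
    """Assign each packet to the next healthy path in round-robin order."""
    healthy = [p for p in paths if is_healthy(p)]
    if not healthy:
        raise RuntimeError("no healthy path available")
    rotation = cycle(healthy)
    return {packet: next(rotation) for packet in packets}

paths = ["path-0", "path-1", "path-2", "path-3"]
down = {"path-2"}  # hypothetical failed link
assignment = spray(range(8), paths, lambda p: p not in down)
print(assignment)
```

The contrast with a single-path connection is that losing `path-2` here only shrinks the rotation; no packet waits for that path to be repaired.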

Rohan Paul @rohanpaul_ai · May 7 · 48

This research builds a system that trains language models continuously using everyday conversations instead of manual labeling. The huge deal here is that this method completely removes the traditional need for human workers to manually gather, review, and score massive datasets. AI Agents can now use their everyday mistakes to get smarter automatically. Whenever a person replies to the digital assistant or corrects a mistake, the software treats that response as a direct learning signal. A background program reads these natural follow-up messages and extracts specific text hints about what the model should have done differently. The software agent simply updates itself in real time during normal use by analyzing how people naturally interact with it. Every time a person corrects an agent or a software test fails, the system receives a valuable clue about how to improve. ---- Think about a student looking at their final grade and throwing the paper away without reading the teacher's helpful notes. Current Reinforcement Learning systems do the exact same thing. Current models throw this natural feedback away because they only care about whether the final outcome was a success or a failure. OpenClaw-RL fixes this by grabbing 2 specific signals from every single interaction. - First, it looks at evaluative signals to see if the action worked. If a user asks the same question again, they are probably unhappy. If a test passes, it is a success. These become simple numerical rewards using a Process Reward Model judge. - Second, it gathers directive signals to figure out how the action needs to change. User corrections and error logs offer direct guidance. These become word-level supervision using a technique called Hindsight-Guided On-Policy Distillation. Personal chats, terminal commands, Graphical User Interface clicks, and software tasks all create these reaction signals. A single policy can learn from all of them at the same time. 
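It runs the training process in the background so the model never has to pause its normal tasks to learn. By treating standard deployment as a continuous learning environment, the model constantly adapts to individual user preferences without any manual data labeling.

Paper: "OpenClaw-RL: Train Any Agent Simply by Talking", arxiv.org/abs/2603.10165

Summary: This research proposes OpenClaw-RL, a system that trains language models continuously from everyday conversations without manually labeled data. It uses the natural feedback produced in user interactions, such as corrections or repeated questions, as a real-time learning signal. From each interaction it extracts two signals: evaluative signals (whether the action succeeded, turned into numerical rewards) and directive signals (how the action should change, turned into word-level supervision). The method treats standard deployment as a continuous learning environment, so the model keeps updating itself in the background and adapts to individual user preferences without large manually labeled datasets.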
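The two signal types described in this post can be sketched as a small data structure. Everything here is a hypothetical stand-in: the post says the paper uses a Process Reward Model judge for rewards and Hindsight-Guided On-Policy Distillation for directive hints, whereas the hand-written rules, event names, and `Signal` class below only illustrate the evaluative/directive split.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Signal:
    reward: float          # evaluative: did the action work?
    hint: Optional[str]    # directive: how should the action change?

def read_reaction(kind: str, text: str = "") -> Signal:
    """Map one follow-up event to the two signal types (hypothetical rules)."""
    if kind == "test_passed":
        return Signal(reward=1.0, hint=None)
    if kind == "repeated_question":
        # The user asking again suggests the first answer did not help.
        return Signal(reward=-1.0, hint=None)
    if kind in ("user_correction", "error_log"):
        # Corrections and logs say what went wrong, not just that it did.
        return Signal(reward=-1.0, hint=text)
    return Signal(reward=0.0, hint=None)

print(read_reaction("user_correction", "use the staging database, not production"))
```

The design point is that a correction carries more information than a pass/fail bit, so it is kept as text instead of being collapsed into the reward.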

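TestingCatalog News 🗞 @testingcatalog · May 7 · 72

SpaceX ❤️ Anthropic > Elon: I spent a lot of time last week with senior members of the Anthropic team to understand what they do to ensure Claude is good for humanity and was impressed. The balance of power just disrupted 👀

Summary: Elon Musk said he recently spent time with senior members of the Anthropic team and was impressed by their efforts to ensure Claude is good for humanity, finding the team highly professional and guided by the right values. On the basis of that trust he agreed to lease SpaceX's Colossus 1 supercomputer cluster to Anthropic, since SpaceXAI has already moved its own training work to Colossus 2. The deal is seen as a shift in the balance of power among technology giants.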

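向阳乔木 @vista8 · May 7 · 60

On desktop, open X -> Creator Studio -> Analytics, download the last 90 days of data (or more), and give it to an LLM to analyze. The AI came back with a few findings:
1. Posting more usually brings more impressions, but the most efficient range is closer to 3-5 posts per day; more is not simply better.
2. Wednesday has the highest average engagement rate, Thursday the highest average follower growth, and Saturday is best for pushing impressions.
3. Over the 90 days, 44% of new follows came from the top 10 follower-growth days; growth comes from posts that break out.
What do you find in your own data?

Summary: Feeding 90 days of X Creator Studio analytics to an LLM surfaced several patterns: 3-5 posts per day is the most efficient range for impressions; Wednesday has the highest engagement rate, Thursday the best follower growth, and Saturday the best reach; and about 44% of new followers came from a handful of high-growth days, so growth depends mainly on breakout posts.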

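OpenAI @OpenAI · May 6 · 66

AI supercomputers need a new kind of network to stay in sync at massive scale. OpenAI’s @markjhandley and @poyntingatgreg join @AndrewMayne to discuss what it takes to move data across record numbers of chips reliably and efficiently, the new Multipath Reliable Connection (MRC) networking protocol, and why it's available for the whole industry to use.

Summary: Large-scale AI supercomputers need a new kind of network to keep chips in sync. OpenAI engineers discuss the challenge of moving data reliably and efficiently across very large chip clusters and introduce the newly released Multipath Reliable Connection (MRC) networking protocol. The protocol was introduced by OpenAI together with industry partners including AMD, Broadcom, Intel, Microsoft, and NVIDIA, and is meant to help large AI training clusters run faster and more reliably with less GPU idle time. MRC is an open industry protocol available for the whole industry to use.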

Chubby♨️ @kimmonismus · May 6 · 54

NVIDIA just open-sourced a transport protocol that powers OpenAI's Blackwell clusters. It opened MRC, a new RDMA transport protocol for massive AI training clusters. Instead of pushing GPU traffic through one fragile path, MRC spreads a single connection across multiple network paths. If one path fails or gets congested, traffic can be rerouted in hardware within microseconds. This is important because frontier training is no longer only about GPUs. The network is becoming one of the biggest bottlenecks in AI factories. OpenAI is already using MRC on Blackwell clusters. Microsoft and Oracle are also named by NVIDIA as major deployments. NVIDIA is pushing Ethernet into territory historically associated with InfiniBand. And by opening MRC through OCP, while optimizing it first for Spectrum-X, NVIDIA is making a smart platform move: more open standard on the surface, stronger full-stack NVIDIA advantage underneath.

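Summary: NVIDIA has opened the MRC protocol through OCP. MRC is a new RDMA transport protocol designed for large-scale AI training clusters. Its core idea is to spread a single connection across multiple network paths, so that when one path fails or becomes congested, traffic is rerouted in hardware within microseconds, addressing the growing network bottleneck in frontier training. The protocol is already in use on OpenAI's Blackwell clusters, and Microsoft and Oracle are also major deployments. While the move promotes a more open standard on the surface, optimizing first for its own Spectrum-X platform strengthens NVIDIA's full-stack advantage and pushes Ethernet into high-performance territory traditionally dominated by InfiniBand.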

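ginobefun @hongming731 · May 6 · 63

The mismatch between current AI and real work scenarios

Summary: Based on 1,500 workers and 844 tasks, a Stanford study finds that current AI investment is misaligned with what real work needs. Using the WORKBank framework, it sorts work tasks into four quadrants by workers' desire for AI and AI's current capability: high-desire, high-capability "green light" tasks (such as data entry) can already be automated; the high-desire, low-capability "R&D opportunity" zone points to startup directions; the low-desire, high-capability "red light" zone (such as the final presentation of creative work) tends to provoke resistance; and the low-desire, low-capability "low priority" zone needs no attention. A key finding is that different tasks within the same occupation (programmers, for example) span several quadrants, so "occupations being replaced" is a false framing; work is being re-cut and recombined.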

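meng shao @shao__meng · May 6 · 77

"SubQ", described as the world's first frontier LLM built on a Subquadratic Sparse Attention (SSA) architecture, offers a practical 12M-token context window while being far more efficient than a traditional Transformer. From @subquadratic.

Core technical breakthrough: the SSA mechanism
Standard attention in a traditional Transformer is all-pairs, with O(n²) compute complexity, so long-context cost explodes. Most token-to-token interactions carry no useful signal but are still computed in full.
SSA's innovation is content-dependent selection:
· Each query dynamically picks only the truly relevant key positions to attend to.
· Linear scaling: compute and memory cost grow linearly with sequence length rather than quadratically.
· It keeps content-driven routing and exact retrieval from arbitrary positions, avoiding the trade-offs of fixed-pattern sparse attention (position-independent), RNN/SSM (state compression loses detail), or DeepSeek DSA (whose selector is still quadratic).

Measured results (B200 GPU, compared with FlashAttention-2):
· 128K tokens: 7.2× prefill speedup
· 1M tokens: 52.2× speedup
· Cost below 5% of Opus, with support for a 12M-token context.

Training and positioning
SubQ uses three-stage training (pretraining → SFT → RL), with particular emphasis on reliable retrieval and multi-hop reasoning over long contexts, optimized for real enterprise scenarios (full codebases, long contracts, cross-document research) rather than benchmark scores alone.
Positioning: closing the gap between the "nominal context window" (how many tokens fit) and the "functional context window" (how many tokens can be used effectively). It suits Coding Agents, long-running Agent sessions, enterprise knowledge bases, and other cases that need to "see everything at once" rather than RAG or chunking.

SubQ Code is also open for trial applications; I have just applied and will share details once approved. Application link: https://subq.ai/request-early-access

Summary: The frontier model SubQ is built on the Subquadratic Sparse Attention architecture and offers a practical 12M-token context window. Its SSA mechanism uses content-dependent selection so each query attends only to relevant keys, making compute and memory cost grow linearly with sequence length instead of quadratically as in a traditional Transformer. At 1M tokens it is measured at 52.2× faster than FlashAttention-2, at under 5% of the cost of Opus. The model targets enterprise long-context workloads that need a full codebase or long document processed in one pass, aiming to close the gap between "nominal" and "functional" context windows.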
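The quadratic-versus-linear claim in this post can be checked with a simple counting model. The sketch below only counts query-key pairs; the per-query key budget `K` is a hypothetical value, not a SubQ parameter, and the ratios it prints are pair counts, not the measured speedups quoted in the post (which also depend on kernels, memory traffic, and the cost of the selector itself).

```python
def attention_pairs(n_tokens: int, keys_per_query=None) -> int:
    """Query-key pairs scored: all pairs when dense, n * k when each query keeps k keys."""
    if keys_per_query is None:
        return n_tokens * n_tokens
    return n_tokens * min(keys_per_query, n_tokens)

K = 4096  # hypothetical per-query key budget, not a SubQ parameter
for n in (128_000, 1_000_000, 12_000_000):
    ratio = attention_pairs(n) / attention_pairs(n, K)
    print(f"{n:>10,} tokens: dense attention scores {ratio:,.0f}x more pairs")
```

Because the dense count grows with n² and the sparse count with n, the ratio itself grows linearly with context length, which is why the advantage widens from 128K to 1M tokens.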

Nathan Lambert @natolambert · May 6 · 43

Adding an on policy distillation section to the RLHF book and it’s remarkable how bad LLMs / coding agents are at it, despite me giving them the core papers and 250 pages of context on how I present ideas.

Epoch AI @EpochAIResearch · May 5 · 44

Join us for in-person workshops to develop problems for FrontierMath: Open Problems! We are seeking highly interesting unsolved problems from research mathematics whose solutions can be verified programmatically. These are hard to find. Come take a crack at it! Link below.

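Berryxia.AI @berryxia · May 5 · 52

Folks, it is already 2026! Strongly recommended.
The most absurd part: plenty of engineers at top AI companies spend their days tuning prompts and chasing benchmarks, yet understand far less about how an LLM is actually built from scratch than this 2-hour Stanford open lecture covers.
It lays out the full creation process of large models like ChatGPT and Claude, from the Transformer architecture to training techniques, scaling laws, data mixtures, and even low-level optimization details.
Short videos on Douyin or Kuaishou can relax you for 2 hours; this Stanford lecture can, in the same 2 hours, actually show you the core foundations of the whole AI era.
Free, public, and packed with value.
Many people working at OpenAI and Anthropic have never studied this much systematically.
If you really want to understand AI, close Douyin now and start this video.

Summary: A 2-hour Stanford open lecture systematically explains how large language models such as ChatGPT are built from scratch, covering the Transformer architecture, training techniques, scaling laws, and other core topics. The course is free and dense with content, and lays out the foundations of the AI era. By contrast, many engineers at top AI companies focus only on prompt tuning and benchmark chasing and lack this systematic knowledge.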

François Chollet @fchollet · May 5 · 73

I wrote Deep Learning with Python to be the definitive guide to how deep learning works and how to best make use of it. Tens of thousands of people got their career start via this book. 120,000 copies sold, and downloaded by millions more. And now it's free to read online: https://deeplearningwithpython.io/

Nathan Lambert @natolambert · May 5 · 53

We need to create a new term for the attacks some Chinese labs are doing on APIs that is different than distillation or else we risk tarnishing a technique that is crucial to AI diffusion, academic research & the open-source ecosystem. https://www.interconnects.ai/p/the-distillation-panic

Epoch AI @EpochAIResearch · May 5 · 46

Are AI benchmarks doomed? @GregHBurnham and @tmkadamcz join @ansonwhho to push back on benchmark pessimism and dig into what the next generation of AI benchmarks could look like.
(0:00:00) - Preview
(0:00:36) - Intro: Are AI benchmarks doomed?
(0:03:13) - The costs and benefits of benchmark development
(0:11:48) - MirrorCode and scalable benchmarks
(0:20:57) - AI speed-up in benchmark development
(0:23:28) - The benchmark-reality gap
(0:38:26) - Can an AGI benchmark exist?
(0:43:18) - Beyond automated scoring
(1:00:45) - How AI changes benchmark building in practice

Summary: The speakers push back on the pessimistic view that AI benchmarks are doomed and explore what the next generation of benchmarks could look like. Topics include the costs and benefits of benchmark development, building scalable benchmarks such as MirrorCode, how AI speeds up benchmark development itself, and the gap between benchmark results and real-world capability. The conversation also touches on whether an AGI benchmark can exist and on evaluation methods that go beyond automated scoring.

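elvis @omarsar0 · May 4 · 66

Autodata (from Meta) is an agentic data scientist that builds high-quality training and evaluation data autonomously. Great work on the autoharness track. (bookmark it)

Summary: Autodata, developed by Meta FAIR, is an agentic system that autonomously builds high-quality training and evaluation data. At its core is an "agentic self-instruct" loop: an orchestrator LLM directs a challenger agent to generate questions from domain documents, weak and strong solvers attempt them, and a judge scores the answers and analyzes failures to drive the next round, yielding challenging data that separates model capabilities. On a CS research QA task the method produced a 34-percentage-point performance gap, against 1.9 points for the standard method. The system also has a meta-optimization outer loop that adjusts instructions, raising the validation pass rate from 12.8% to 42.4%. The study processed more than 10,000 papers and produced 2,117 high-quality QA pairs, using additional inference compute to make the data harder and so improve downstream model performance.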

Eric @ericmitchellai · May 4 · 40

I am begging you to look at your data. Please look at the data
evals worse than expected? look at the data
evals better than expected? *definitely* look at the data
evals about what you expected? believe it or not ....

Rohan Paul @rohanpaul_ai · May 3 · 54

Figure's freshly assembled F.03 humanoid can now walk autonomously from the manufacturing line straight to headquarters. It navigates stairs using only its onboard camera feeds, with no LiDAR and no pre-mapped floors. The full locomotion policy was trained end-to-end with reinforcement learning entirely in simulation, then transferred zero-shot to the physical robot. Watch its depth perception in action as it handles stair navigation. The colorful reconstruction is how neural networks infer geometry from cameras, though some jitter in scale and artifacts around windows are visible.

Summary: Figure's newly assembled F.03 humanoid can walk autonomously from the production line to headquarters. It relies only on onboard cameras, with no LiDAR or pre-built maps, to handle navigation such as stairs. The full locomotion policy was trained end-to-end with reinforcement learning in simulation and transferred zero-shot to the physical robot. The demo shows depth perception inferred by neural networks from camera data, with some remaining jitter in scale and artifacts around windows.

SemiAnalysis @SemiAnalysis_ · May 3 · 54

🚨 A junior new grad at Jane Street just locked in & signed a $220K–$600K role. 🚨🚀🔥 Not because he worked harder. Because he built an agentic AI system that uses JAX & Mesh-TF to chew through trillions of data points while entire teams are still loading their spreadsheets. He just dropped a 1-hour breakdown of the exact system: 🟠 how he mines massive datasets most people don't even know exist 🟠 how AI catches patterns the human brain physically can't 🟠 how raw data becomes real trades and real decisions 🟠 how you can build the same thing from scratch Kill the TikTok/Reels/XHS scrolling tonight. This one hour will do more for your career than the last six months of scrolling

Summary: A new graduate at Jane Street landed a role paying $220K to $600K by building his own agentic AI system. The system uses JAX and Mesh-TF to process massive datasets and catch patterns humans cannot, feeding directly into real trading decisions. The post credits technical leverage rather than working harder. He has released a one-hour breakdown of the system, from mining little-known datasets to turning raw data into trades, and the post claims the hour will do more for a career than months of scrolling social media.

Chubby♨️ @kimmonismus · May 2 · 63

http://x.com/i/article/2050492808184659968 # NVIDIA Blackwell vs. Huawei Ascend: Did DeepSeek V4 prove China doesn't need Western silicon? Every Saturday, I write a Deep Dive for my newsletter at getsuperintel.com. Given how important the China–US chip race has become, I’m publishing today’s Deep Dive here on X as a full article. Yesterday, I promised to take a closer look at Huawei chips vs. NVIDIA and DeepSeek. Here it is. Enjoy the read. For the better part of three years, the Western technology establishment slept soundly on a reassuring premise: China was hopelessly behind in AI chips, and export controls would keep it that way. Chris Miller's bestselling book "Chip War" painted a vivid and persuasive picture of a global semiconductor supply chain so intricate, so dependent on Western chokepoints, that Chinese self-sufficiency seemed a decade or more away. ASML's monopoly on extreme ultraviolet lithography, NVIDIA's stranglehold on AI training through its CUDA software ecosystem, and TSMC's unmatched manufacturing prowess formed what appeared to be an impenetrable triple lock. Then, in April 2026, DeepSeek released V4, a 1.6 trillion parameter Mixture-of-Experts model with 49 billion active parameters and a one-million-token context window. On selected coding and reasoning benchmarks, it approaches frontier-class performance, even though CAISI’s May 2026 evaluation still places it roughly eight months behind the absolute frontier; a model deeply optimized for Huawei's domestic Ascend chip ecosystem and confirmed to run on Huawei's latest Ascend 950 infrastructure for inference and deployment. While the full details of V4's training hardware remain ambiguous, with some reports suggesting pre-training still relied on NVIDIA GPUs (ChinaTalk, 04/27/2026), the strategic significance is clear: DeepSeek has built a frontier model that no longer depends on Western hardware to operate at scale, and that may soon no longer need it to train, either. 
Huawei's Ascend processors, manufactured domestically by China's SMIC foundry using equipment that Western analysts said could never produce chips this advanced. The implications are staggering, and they demand an honest reckoning with a central question: How did China close a gap that was supposed to take 10 to 15 years, in roughly three? ## The chip gap was real, but measured wrong To understand what happened, you first need to understand what the "chip gap" actually meant, and where the framing went wrong. On the level of a single chip, Western superiority remains overwhelming. NVIDIA's current flagship, the Blackwell B200, is fabricated on TSMC's cutting-edge 4-nanometer process and delivers around 2,250 teraflops of computing power at BF16 precision, paired with 192 gigabytes of the latest HBM3e memory running at 8 terabytes per second of bandwidth. Huawei's earlier domestic alternative, the Ascend 910C, illustrates the scale of the gap. Built on SMIC's optimized 7-nanometer process using older lithography tools, it manages roughly 700 teraflops and offers only 3.2 terabytes per second of memory bandwidth, roughly a third of the compute and less than half the bandwidth of a single B200. Huawei's newer Ascend 950 generation, which is now central to the DeepSeek V4 story, narrows the gap further but still appears to trail NVIDIA's most advanced chips significantly. This is the metric much of the Western chip-control debate focused on, and on this metric, the diagnosis was largely correct. China remains one to two hardware generations behind. But here is where the Western analysis made a critical error: it assumed the chip-level gap would translate directly into a capability gap. It did not. Brute Force at Scale Huawei's answer to NVIDIA's chip-level dominance is what engineers call a "scale-out" strategy, and it is as elegant in concept as it is brutal in execution. 
Where NVIDIA's reference data center system, the GB200 NVL72, connects 72 Blackwell GPUs into a unified computing fabric delivering about 180 petaflops, Huawei simply built bigger. Its CloudMatrix 384 system packs 384 Ascend 910C chips into a densely interconnected cluster, delivering a theoretical 300 petaflops of BF16 compute, roughly 1.7 times the NVIDIA system's raw output. It also offers 3.6 times the aggregate memory capacity and 2.1 times the total memory bandwidth.

The trade-off is enormous. A single NVIDIA NVL72 rack consumes about 145 kilowatts. The Huawei CloudMatrix 384 devours 560 kilowatts, making it about 2.5 times less energy-efficient per unit of useful computation. In any normal commercial context, this would be economic suicide. No Western cloud provider would willingly operate hardware this inefficient when cheaper, more performant alternatives exist.

But China is not operating under normal commercial logic. The development of domestic AI infrastructure is treated as a matter of national sovereignty. State-backed telecommunications giants and government investment funds subsidize the astronomical energy costs. When the goal is strategic independence from a hostile technology embargo, electricity bills become a secondary variable.

## Software Ate the Hardware Gap

### The CUDA moat falls?

The brute-force hardware story only gets you halfway to an explanation. Even with 384 chips wired together, you still need software sophisticated enough to orchestrate them. This was supposed to be NVIDIA's second, even more durable advantage: its CUDA software platform, the invisible infrastructure that makes AI training on NVIDIA hardware almost effortless and that locked in developers through massive switching costs. Huawei's alternative, called CANN (Compute Architecture for Neural Networks), was for years considered unstable and painful to use. Training runs on Huawei clusters frequently crashed.
Hardware utilization rates hovered around a dismal 60 percent, meaning 40 percent of the expensive compute was being lost to coordination failures and software bugs.

DeepSeek V4 is the proof that this barrier has been overcome. DeepSeek engineers worked directly with Huawei to write custom software kernels, specifically designed for the Ascend chip's architecture, that overlap computation, memory access, and network communication. These optimizations pushed hardware utilization from 60 percent to over 85 percent, fundamentally changing the economics of Chinese AI clusters.

### Algorithmic genius as compensation

But the truly revolutionary contribution of DeepSeek V4 is not the hardware adaptation. It is the model architecture itself, a masterclass in using software innovation to compensate for hardware limitations.

The model employs a Mixture-of-Experts (MoE) architecture. While it has 1.6 trillion total parameters, only 49 billion, roughly 3 percent, are activated for any given computation. The network consists of hundreds of specialized sub-networks, or "experts," each trained for specific tasks like mathematical reasoning, Chinese grammar, or Python code generation. A dynamic routing system decides which experts to engage for each input token. The result is a model with the knowledge capacity of a 1.6-trillion-parameter giant but the computational cost of something far smaller.

Earlier MoE systems suffered from a problem called "routing collapse," where a few popular experts got overwhelmed while others sat idle. DeepSeek solved this with what they call "Anticipatory Routing," computing expert assignments asynchronously in advance using slightly older network weights. This decouples the routing decision from the critical computation path and dramatically stabilizes training (DeepSeek-AI, Technical Report, 04/2026).

The team also deployed the Muon optimizer, a departure from the AdamW optimizer used across virtually the entire Western AI industry.
Muon works by ensuring that parameter updates during training remain mathematically orthogonal to each other, preventing the kind of conflicting gradient updates that can cause training to collapse, a risk that is especially acute on less reliable hardware.

Perhaps most impressively, DeepSeek introduced FP4 quantization-aware training. While most AI labs train their models in 16-bit or 8-bit numerical precision, DeepSeek trained its expert weights in just 4-bit precision. Because each expert handles only a narrow domain, this extreme compression works without meaningful quality loss, and it dramatically reduces memory bandwidth consumption, precisely the resource where Huawei's chips are most disadvantaged relative to NVIDIA.

The cumulative effect of these innovations is staggering. DeepSeek V4-Pro can process contexts of one million tokens, the equivalent of 15 to 20 full novels, while requiring only 27 percent of the compute and 10 percent of the memory cache compared to its predecessor, DeepSeek V3.2.

## The Lithography Question: Did China Copy ASML?

The question of how SMIC (Semiconductor Manufacturing International Corporation, the largest and most advanced pure-play semiconductor foundry in mainland China) manufactures advanced chips without access to ASML's extreme ultraviolet (EUV) lithography machines is perhaps the most technically fascinating part of this story. EUV uses light with a wavelength of 13.5 nanometers to etch transistor patterns onto silicon wafers. It is considered physically essential for chip features below 7 nanometers, and the Netherlands has banned its export to China since 2019.

SMIC's workaround is a technique called Self-Aligned Quadruple Patterning (SAQP).
Since the older deep ultraviolet (DUV) light it has access to, at 193 nanometers, is too coarse to draw fine features in a single pass, SMIC exposes the wafer four times in succession with extraordinary precision, effectively creating structures equivalent to 7-nanometer and, as of late 2025, even 5-nanometer processes. Independent analysis by TechInsights confirmed that Huawei's Kirin 9030 uses SMIC's N+3 process, a scaled evolution of its 7nm-class technology that shows how close SMIC is getting to 5nm-class manufacturing without EUV, while still remaining meaningfully behind leading commercial 5nm nodes from TSMC and Samsung (TechInsights, 12/11/2025).

The catch is yield. SMIC's multi-patterning approach produces catastrophic defect rates, with only 30 to 40 percent of chips coming off the line in working condition. For comparison, TSMC achieves yields above 80 percent with its EUV processes. Each wafer takes longer to produce, the machinery wears out faster, and the cost per working chip is astronomical. For any company operating in a free market, this approach would mean bankruptcy. For China, it is a matter of state policy: hundreds of billions of yuan in subsidies from government investment funds absorb the losses.

### China's EUV Manhattan Project

The long-term DUV workaround has a ceiling. Pushing beyond the current 5nm-class toward the 3nm and emerging 2nm frontier becomes exponentially harder without EUV. Each additional patterning step adds cost, defect risk, and cycle time, and the economics deteriorate rapidly. DUV can be stretched further, but not indefinitely, and not competitively.

An ASML EUV machine costs over 370 million dollars, weighs more than 180 tons, contains over 100,000 specialized components, and requires three Boeing 747 cargo planes to transport. The precision of its mirror system, supplied by Germany's Carl Zeiss, operates at tolerances measured in picometers, smaller than the width of an individual atom.
You cannot reverse-engineer this from a blueprint. The knowledge is embedded in people.

China has pursued exactly this vector. Reporting from late 2025 revealed that China had initiated a classified research program of extraordinary scale, internally compared to the Manhattan Project (Reuters, 12/2025). Under high-level political coordination, a secured laboratory in Shenzhen produced a functioning EUV prototype in early 2025. The effort relied heavily on recruiting former ASML engineers, including key figures from the company's light-source development division, with signing bonuses reportedly reaching up to $700,000. Within 18 months, one recruited team filed eight critical EUV-related patents.

The prototype is far from commercially viable. It fills nearly an entire factory hall, uses secondary-market optics from Nikon and Canon rather than Zeiss-grade components, and achieves only about 3.4 percent conversion efficiency, far too low for high-volume manufacturing. Still, it marks an important proof-of-concept milestone. Western intelligence agencies, which had projected a Chinese EUV machine for 2035 at the earliest, were caught off guard. The timeline has compressed by nearly a decade, with Chinese officials targeting functional EUV chip production by 2028 to 2030.

## A preliminary verdict

The evidence leads to a clear, if uncomfortable, set of conclusions.

DeepSeek V4 is not a benchmark stunt. On selected coding tasks, V4-Pro is highly competitive! It achieves 80.6% on the SWE-bench Verified coding benchmark, essentially matching Claude Opus 4.6 at 80.8%, and surpasses it on LiveCodeBench with 93.5% versus 88.8% (though real-world usage of course differs from the benchmarks). It accomplishes this while offering API prices 90 to 97 percent lower than Western equivalents, a cost advantage driven not by predatory pricing but by genuine architectural efficiency.

China did not close the chip gap. It went around it!
The hardware remains inferior chip-for-chip, but radical system-level scaling, extraordinary software innovation, state-subsidized energy costs, and a willingness to accept manufacturing inefficiencies that would destroy any commercial enterprise combined to produce an outcome that the sanctions were specifically designed to prevent.

## The sanctions paradox

The deepest irony of this story is that the export controls may have accelerated the very outcome they sought to prevent. Before October 2022, Chinese AI labs were happy NVIDIA customers, content to buy American hardware and train their models on CUDA. The sanctions forced them into an uncomfortable but ultimately productive marriage with Huawei, compelled DeepSeek to invent algorithmic solutions to hardware problems, and gave the Chinese government the political mandate to pour unlimited resources into semiconductor independence.

Chris Miller's analysis in "Chip War" was not wrong about the physics. EUV lithography is genuinely hard, and NVIDIA's chips are genuinely superior. What it underestimated was the degree to which software innovation, system-level engineering, and state-directed economic irrationality could neutralize those advantages in practice. The 10-to-15-year gap was measured in hardware generations. China's response was to make the hardware generation gap matter less.

The question going forward is not whether China can match NVIDIA chip for chip. It probably cannot, at least not soon. The question is whether chip-for-chip superiority still matters when the competition is being fought on a different axis entirely, one where algorithmic efficiency, system architecture, and political will have proven to be just as decisive as nanometers and transistors.

The West built a fortress around its silicon. China built a ladder out of software, and climbed over the wall.
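The chip-level and system-level ratios quoted earlier in the article ("roughly a third of the compute", "roughly 1.7 times", "roughly 3 percent") can be re-derived from the article's own rounded figures. The following is a minimal sketch, assuming only those quoted numbers; the variable names are illustrative, and none of the inputs have been independently verified here.

```python
# Figures as quoted in the article (rounded, not independently verified).
B200_TFLOPS, ASCEND_910C_TFLOPS = 2250, 700      # BF16 teraflops per chip
B200_BW_TBS, ASCEND_910C_BW_TBS = 8.0, 3.2       # memory bandwidth, TB/s
NVL72_PFLOPS, CM384_PFLOPS = 180, 300            # BF16 petaflops per system
NVL72_KW, CM384_KW = 145, 560                    # system power draw, kW
TOTAL_PARAMS, ACTIVE_PARAMS = 1.6e12, 49e9       # DeepSeek V4 MoE sizes


def ratio(a: float, b: float) -> float:
    """Return a / b rounded to two decimals."""
    return round(a / b, 2)


# Single chip: Ascend 910C relative to B200.
chip_compute = ratio(ASCEND_910C_TFLOPS, B200_TFLOPS)
chip_bandwidth = ratio(ASCEND_910C_BW_TBS, B200_BW_TBS)

# Whole system: CloudMatrix 384 relative to GB200 NVL72.
system_compute = ratio(CM384_PFLOPS, NVL72_PFLOPS)
system_power = ratio(CM384_KW, NVL72_KW)

# Kilowatts needed per petaflop, Huawei relative to NVIDIA.
power_per_pflop = ratio(CM384_KW / CM384_PFLOPS, NVL72_KW / NVL72_PFLOPS)

# Share of parameters active per token, in percent.
active_percent = ratio(100 * ACTIVE_PARAMS, TOTAL_PARAMS)

print(f"chip compute ratio:      {chip_compute}")
print(f"chip bandwidth ratio:    {chip_bandwidth}")
print(f"system compute ratio:    {system_compute}")
print(f"system power ratio:      {system_power}")
print(f"power per petaflop:      {power_per_pflop}")
print(f"active parameters (%):   {active_percent}")
```

With these inputs the per-petaflop power ratio comes out near 2.3, in the same range as the article's "about 2.5 times"; the difference is most likely down to rounding in the quoted inputs.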
## A few final words and personal views

The future of AI infrastructure is more open than anyone in Washington or Silicon Valley assumed even 12 months ago, and the comfortable narrative of permanent Western dominance no longer holds. What we are watching is the emergence of a genuine two-player race between the US and China, one that will be fought across hardware, software, and industrial policy simultaneously, with escalating intensity on both sides.

Europe, absent any frontier chip design capability or hyperscaler of its own, risks being reduced to a spectator in this contest. But one European lever remains decisive: as long as ASML remains the only supplier of production-grade EUV lithography, Europe is not merely watching the game. It holds one of the few choke points that still shapes the board.

P.S. This text is essentially the answer to my open question:

Sources referenced in the article:

1. DeepSeek V4 Technical Report (04/24/2026) https://huggingface.co/collections/deepseek-ai/deepseek-v4 / https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf
2. TechInsights: SMIC N+3 Confirmed, Kirin 9030 Analysis (12/11/2025) https://www.techinsights.com/blog/smic-n3-confirmed-kirin-9030-analysis-reveals-how-close-smic-5nm
3. Reuters (via Modern Diplomacy): Inside China's Secret Push to Build Its Own EUV Chip Machine (12/17/2025) https://moderndiplomacy.eu/2025/12/18/inside-chinas-secret-push-to-build-its-own-euv-chip-machine/ (Original Reuters article is paywalled; this is the most complete openly accessible version citing Reuters directly)
4. MIT Technology Review: Three Reasons Why DeepSeek's New Model Matters (04/24/2026) https://www.technologyreview.com/2026/04/24/1136422/why-deepseeks-v4-matters/
5. NIST/CAISI Evaluation of DeepSeek V4 Pro (05/02/2026) https://www.nist.gov/news-events/news/2026/05/caisi-evaluation-deepseek-v4-pro
6. EE Times: China EUV Breakthrough and the Rise of the 'Silicon Curtain' (12/23/2025) https://www.eetimes.com/china-euv-breakthrough-and-the-rise-of-the-silicon-curtain/
7. Asia Times: Made-in-China EUV Machine Targets AI Chip Output by 2028 (12/24/2025) https://asiatimes.com/2025/12/made-in-china-euv-machine-targets-ai-chip-output-by-2028/

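Translation: The West long believed China was 10 to 15 years behind in AI chips, but the release of DeepSeek V4 has upended that view. The model is deeply optimized for Huawei's Ascend chip ecosystem and can be deployed for inference on Ascend 950 infrastructure, so a frontier model can run at scale without relying on Western hardware. On single-chip performance the Ascend 950 still trails NVIDIA's Blackwell B200 significantly, but China's "scale-out" strategy, which combines large clusters of domestic chips with software optimization and model-architecture innovations such as MoE, has brought system-level AI capability rapidly close to the frontier. This exposes the fundamental error in Western analysis: equating the chip-level gap directly with a capability gap.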

Rohan Paul @rohanpaul_ai · May 2 · 59

🇨🇳 Tsinghua's AI patent count now exceeds Harvard, MIT, and Stanford combined. Tsinghua has been filing AI and machine learning patents at a completely different scale than top United States universities for well over a decade, and the gap keeps getting wider. Tsinghua University is almost operating with a different kind of machine that turns AI research into protected, transferable assets at scale. I felt it while browsing arXiv every day: there are just so many papers with 'Tsinghua' in the author list. --- Source: Bloomberg bloomberg .com/news/features/2025-11-18/china-s-tsinghua-university-is-beating-us-in-the-race-for-ai-patents

Translation: Tsinghua University's patent count in AI and machine learning now exceeds that of Harvard, MIT, and Stanford combined. For more than a decade, Tsinghua has filed patents at a far larger scale than top US universities, and the gap keeps widening. It operates like an efficient machine that turns AI research into legally protected, transferable assets at scale. Browsing arXiv and similar platforms day to day, the sheer number of papers with "Tsinghua" in the author list is also plain to see.

AK @_akhaliq · May 2 · 35

Co-Evolving Policy Distillation paper: https://huggingface.co/papers/2604.27083

Translation: Co-Evolving Policy Distillation. Paper: https://huggingface.co/papers/2604.27083

Elon Musk @elonmusk · May 1 · 55

Grok 4.3

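Translation: Grok 4.3. This release shows improved cost efficiency for running the Artificial Analysis Intelligence Index, with Grok 4.3 holding a solid position on the Pareto frontier of intelligence versus cost. Thanks to a 37.5% cut in input token prices and a 58.3% cut in output token prices, running the Intelligence Index evaluation costs $395, about 20% lower overall than Grok 4.20 0309 v2.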

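Berryxia.AI @berryxia · May 1 · 57

See how much large models weigh? This one is pretty interesting 😂

Translation: Li Bojie (李博杰), chief scientist at Pine AI, proposes a new method that estimates a model's parameter count from its ability to answer 1,400 obscure-trivia questions. The principle is that storing facts takes up parameter space: a curve is first fitted on open models of known size, and closed models' scores are then projected onto it to obtain an estimate. The study evaluated 92 closed models; GPT-5.5 leads by a wide margin at roughly 9.7T parameters, followed by Claude Opus 4.6 at about 5.3T. Mainstream flagships such as GPT-5 and Claude Opus 4.7 cluster in the 3-4T range. The analysis also infers that GPT-5's .x versions and Claude Opus 4.7, among others, are likely newly trained rather than fine-tuned, and notes that an MoE model's knowledge capacity depends on its total parameter count. The evaluation tool and data have been open-sourced.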

Anthropic @AnthropicAI · May 1 · 63

How do people seek guidance from Claude? We looked at 1M conversations to understand what questions people ask, how Claude responds, and where it slips into sycophancy. We used what we found to improve how we trained Opus 4.7 and Mythos Preview. https://www.anthropic.com/research/claude-personal-guidance

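Translation: How do people seek guidance from Claude? We analyzed 1 million conversations to understand what questions people ask, how Claude responds, and when it slips into sycophancy. We used these findings to improve how Opus 4.7 and Mythos Preview were trained. https://www.anthropic.com/research/claude-personal-guidance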

Epoch AI @EpochAIResearch · May 1 · 59

How much AI compute has been smuggled to China? We estimate between 290k and 1.6M H100-equivalents by the end of 2025 — representing ~20% to ~60% of China’s total compute.

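Translation: How much AI compute has been smuggled into China? We estimate 290,000 to 1.6 million H100-equivalents by the end of 2025, roughly 20% to 60% of China's total compute.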

Rohan Paul @rohanpaul_ai · May 1 · 56

Time published a piece. Google’s AI position came from a long series of early bets by Sundar Pichai on DeepMind, TPUs, cloud infrastructure, and AI products, not from a last-minute reaction to ChatGPT. Google’s biggest strength in AI is its full-stack control of research, chips, cloud, products, and distribution across billions of users. "Critics once underestimated CEO Sundar Pichai. Now, critics wonder if he’s made Google too powerful" Google just secured absolute architectural control over the AI landscape by merging its custom physical silicon manufacturing directly with a single unified research laboratory. Competitors pay steep financial premiums for external chips while Google seamlessly executes complex neural calculations on its proprietary Tensor Processing Units. Building internal hardware allows engineers to aggressively scale pretraining, the critical phase where models ingest massive datasets, without facing crushing financial overhead. --- time .com/collection/time100-most-influential-companies/2026/saudi-aramco/

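Translation: Time magazine argues that Google's lead in AI stems from a series of early, long-term bets by CEO Sundar Pichai on DeepMind, TPU chips, cloud infrastructure, and AI products, not from a rushed reaction to ChatGPT. Its core strength is full-stack control over research, chips, cloud, products, and distribution reaching billions of users. By tightly integrating custom silicon with a unified research lab, Google has gained absolute control over its AI architecture: it can run complex computations efficiently on its own TPUs and let engineers scale pretraining cheaply, without bearing the high cost of buying external chips as competitors do.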

Qwen @Alibaba_Qwen · April 30 · 73

Today we're releasing Qwen-Scope 🔭, an open suite of sparse autoencoders for the Qwen model family. It turns SAE features into practical tools: 🎯 Inference: Steer model outputs by directly manipulating internal features, no prompt engineering needed 📂 Data: Classify & synthesize targeted data with minimal seed examples, boosting long-tail capabilities 🏋️ Training: Trace code-switching & repetitive generation back to their source, fix them at the root 📊 Evaluation: Analyze feature activation patterns to select smarter benchmarks and cut redundancy We hope the community uses Qwen-Scope to uncover new mechanisms inside Qwen models and build applications beyond what we explored. Excited to see what you build! 🚀 🔗🔗 Blog: https://qwen.ai/blog?id=qwen-scope HuggingFace: https://huggingface.co/collections/Qwen/qwen-scope ModelScope: https://modelscope.cn/collections/Qwen/Qwen-Scope Technical Report: https://qianwen-res.oss-accelerate.aliyuncs.com/qwen-scope/Qwen_Scope.pdf

Translation: The Qwen team has released Qwen-Scope, an open-source suite of sparse autoencoders that turns SAE features into practical tools. The suite supports four application areas: steering model outputs by directly manipulating internal features, with no prompt engineering; classifying and synthesizing targeted data from very few seed examples to improve long-tail capabilities; tracing code-switching and repetitive generation back to their source and fixing them there; and analyzing feature activation patterns to choose better benchmarks and cut redundancy. The team hopes the community will use Qwen-Scope to probe the internal mechanisms of Qwen models and build applications beyond what the team explored. The related resources are openly available.

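向阳乔木 @vista8 · April 30 · 60

The most interesting thing in the DeepSeek-VL paper released today is this conclusion and recipe. Multimodal training "eats" language ability: train a language model on 100% vision data and its language benchmarks fall off a cliff. 70% pure text plus 30% multimodal data is the best recipe. The two modalities compete with each other, and that is not something hyperparameter tuning can get around.

Translation: The DeepSeek-VL paper finds that multimodal training damages a language model's language ability, and that training on 100% vision data causes language benchmark performance to collapse. The study identifies 70% pure text plus 30% multimodal data as the best training recipe and stresses that vision and language compete inherently, in a way that parameter tuning cannot avoid. The paper's conclusions underline how important a balanced multimodal data mix is for preserving language performance.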

Chubby♨️ @kimmonismus · April 30 · 76

For the first time ever, Meta lost daily active users. The "Family daily active people" metric dropped by 20 million in Q1 2026, falling from 3.58b to 3.56b. Meta blames internet disruptions in Iran and Russia's WhatsApp ban, but here's the thing: the company bundles all its platforms into one metric, making it impossible to see which app is actually bleeding users. Convenient. Meanwhile, Zuckerberg is doubling down on the AI bet like never before. Meta raised its 2026 capex guidance to $125b to $145b, $10b more than previous estimates, driven largely by surging memory chip prices. That's roughly $400m per day spent on infrastructure. Revenue surged 33% to $56.3b and net income jumped 61%, so the money machine is humming. But the company also announced plans to lay off 8,000 employees, about 10% of its workforce, to "offset" those very AI investments. Reality Labs continues to hemorrhage cash too, posting another $4 billion operating loss. Wall Street wasn't impressed. Meta's stock dropped over 7% after hours, punishing the company not for its results, which beat estimates, but for its spending trajectory.

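Translation: In Q1 2026, Meta's global daily active users fell for the first time, with "Family" daily actives down by 20 million. The company attributes this to internet disruptions in Iran and Russia's WhatsApp ban, but the bundled metric hides which app is losing users. Meanwhile, Meta raised its 2026 capex guidance to $125-145 billion, driven mainly by rising memory chip prices and heavier AI infrastructure investment, or roughly $400 million per day. Although revenue grew 33% to $56.3 billion and net income jumped 61%, the company still plans to lay off 8,000 employees to "offset" its AI investment costs, and Reality Labs lost $4 billion. Wall Street was unhappy with the spending trajectory, and the stock fell more than 7% after hours.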

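ginobefun @hongming731 · April 30 · 51

With traditional CTR models stuck at the traffic ceiling, JD.com's advertising team has published its GRAM architecture: stop patching traditional feature engineering and move fully to LLM-native knowledge engineering. Three core values:
- Building "factual guardrails" to cure hallucination: no more uncontrolled free-styling by the LLM. A cascaded knowledge graph that can be queried within 5 ms hard-injects product attributes, business rules, and common knowledge, ensuring AI recommendations are 100% consistent with physical reality and business rules.
- Overturning the "cold start" path: no more deep reliance on users' historical click data. Even a new product with zero sales can be semantically aligned and precisely distributed almost instantly through the knowledge network's high-dimensional mapping (associations across features such as light source, material, and price band).
- From exposure computation to "deep decision-making": high-frequency updates of traditional features often interfere with the LLM. Structuring the tacit knowledge an enterprise has accumulated over years and feeding it in as background context lets the LLM act as a seasoned expert and handle highly complex cross-category purchase decisions.

Translation: JD.com's advertising team has introduced the GRAM architecture, which aims to break through the bottleneck of traditional CTR models with LLM-native knowledge engineering. The architecture builds a cascaded knowledge graph with millisecond-level queries and injects product attributes and business rules as "factual guardrails" to prevent AI hallucination and keep recommendations grounded in reality. It overturns the cold-start approach that depends on historical data: even new products with zero sales can be precisely distributed through high-dimensional feature associations in the knowledge network. GRAM also structures an enterprise's tacit knowledge as context so that the LLM can make complex, deep decisions rather than merely computing exposure.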

Chubby♨️ @kimmonismus · April 30 · 62

Cloud revenue explodes, stocks still tumble: Meta, Amazon, Alphabet and Microsoft earnings: All four tech giants reported Q1 2026 earnings on the same day, and every single one beat Wall Street expectations. Alphabet led the pack with $109.9 billion in revenue, up 22% year over year, as Google Cloud exploded 63% to cross $20 billion for the first time. Microsoft posted $82.9 billion in revenue with Azure growing 40%, while Meta surged 33% to $56.3 billion in revenue and Amazon hit $181.5 billion with AWS growing 28%, its fastest pace in 15 quarters. But here's the number that shook markets: combined 2026 capex across the four hyperscalers is on track to exceed $650 billion!! Alphabet raised its full year 2026 capex guidance to $180 billion to $190 billion, Microsoft guided $190 billion for calendar year 2026, and Meta bumped its range to $125 billion to $145 billion. Amazon's capex reached $44.2 billion in Q1 alone. The revenue beats were massive, but so was the market's anxiety: Meta slid 6% and Microsoft dropped 2.5% after hours, even as Alphabet shares rose 7% in after-hours trading, on course to open at a record market value. The hyperscalers are collectively spending more on AI infrastructure than the GDP of most nations, completely reshaping the global economy around compute. Whether this bet generates returns proportional to its scale will define tech investing for the next decade, at least thats for sure.

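Translation: Meta, Amazon, Alphabet, and Microsoft all beat revenue expectations for Q1 2026 on strong cloud growth, with Google Cloud revenue surging 63% to pass $20 billion for the first time. However, the four hyperscalers' combined 2026 capex is expected to exceed $650 billion, and the huge AI infrastructure spend made markets anxious, sending Meta and Microsoft shares lower after hours. The scale of these companies' investment in compute is reshaping the global economy, and whether it earns a matching return will define tech investing for the next decade.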

Rohan Paul @rohanpaul_ai · April 30 · 55

Anthropic's new research shows that Claude can solve real bioinformatics problems human experts miss. 23 “human-difficult” problems that their expert panel could not solve, and their top model, Claude Mythos Preview, solved 29.6% of that set. The problem is that older science tests mostly check clean questions, not messy biology data work on real datasets. BioMysteryBench tries to fix that by hiding objective answers inside real datasets and grading only the final answer. It gives Claude standard biology tools and database access on 99 tasks, while up to 5 experts try them too. On the 76 problems at least 1 expert solved, the best model got about 83%, and on 23 expert-stumping problems it got about 30%. The post also found that wins on the hard problems were much less repeatable across 5 tries, so many successes were shaky rather than dependable. Anthropic’s own examples suggest Claude is strongest when it behaves less like an oracle and more like an unusually fast research collaborator: it layers methods, cross-checks evidence, and uses broad background knowledge to narrow the search space.

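Translation: Anthropic's latest research uses the BioMysteryBench benchmark to evaluate Claude on real bioinformatics problems. The benchmark hides objective answers inside real datasets and covers 99 tasks. On the 76 problems solved by at least one human expert, the Claude Mythos Preview model scored about 83%; more notably, on the 23 problems the expert panel could not solve, the model still solved about 29.6%. However, its successes on the hard problems were less repeatable, indicating that its performance is not yet stable. The research suggests Claude is most effective not as an "oracle" but as a fast research collaborator that layers methods, cross-checks evidence, and draws on broad background knowledge to narrow the search space.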

Chubby♨️ @kimmonismus · April 30 · 61

Anthropic just dropped a benchmark that should make every scientist pay attention. BioMysteryBench puts AI models through 99 real bioinformatics challenges, using raw, messy datasets from actual research, think unprocessed DNA sequences and clinical samples. However: these aren't textbook problems with neat answers. They're the kind of open-ended puzzles that keep PhD students up at night. The results are exciting. Claude's latest models (4.7) solve the majority of tasks that trained human experts can handle, and on 23 problems that a panel of five domain experts couldn't crack, Claude Mythos Preview nailed 30% of them. How? By combining knowledge from hundreds of thousands of papers and layering multiple analytical strategies when uncertain, essentially doing what a room full of specialists would do, but faster and in a single run. Genentech and Roche independently confirmed this trajectory with their own CompBioBench, where Claude Opus 4.6 reached 81% overall accuracy and 69% on the hardest questions. Two separate benchmarks, same conclusion: AI is no longer just keeping pace with biologists, it's pulling ahead on some of the hardest problems.

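Translation: Anthropic has released BioMysteryBench, a benchmark of 99 open-ended bioinformatics challenges built on raw, messy real-world biological datasets. The latest Claude models (4.7) solved most of the tasks that human experts can handle and cracked about 30% of the 23 problems an expert panel could not solve. This ability comes from combining knowledge from hundreds of thousands of papers and layering multiple analytical strategies when uncertain. In independent testing by Genentech and Roche (CompBioBench), Claude Opus 4.6 reached 81% overall accuracy and 69% on the hardest questions. Together the two benchmarks indicate that AI has already surpassed human experts on some of the hardest biology problems.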

All AI Updates
Full feed of AI-related news
May 8
23:47
AK @_akhaliq
61
MARBLE: Multi-Aspect Reward Balance for Diffusion RL. Paper: https://huggingface.co/papers/2605.06507
Data/Training · Papers/Research
05:06
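SemiAnalysis @SemiAnalysis_
50
Floating point math is not associative! And many of the highest performance kernels split the workload among SMs and accumulate partial results in a nondeterministic order. Many AI labs just accept this, or pay a huge performance penalty for determinism. DeepSeek decided to do neither. (1/4) 🧵
DeepSeek · Data/Training · Phenomena/Trends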
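
The first claim in the item above is easy to reproduce. The sketch below is a plain-Python illustration of why accumulation order matters for floating-point sums; it is not DeepSeek's method, which the item does not describe, and the `partials` values are made up to make the effect obvious.

```python
import math

# Three-term case: regrouping the same additions changes the rounded result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

# Partial sums arriving in different orders, as they might from parallel workers.
partials = [1e16, 1.0, -1e16]
order_a = (partials[0] + partials[1]) + partials[2]  # the 1.0 is absorbed by 1e16
order_b = (partials[0] + partials[2]) + partials[1]  # cancellation happens first

# math.fsum tracks exact intermediate sums, so its result is order-independent.
exact = math.fsum(partials)

print(left == right, order_a, order_b, exact)  # prints: False 0.0 1.0 1.0
```

Fixing the accumulation order, or using an exact summation such as `math.fsum`, restores reproducibility on a CPU; doing the equivalent inside a GPU kernel is where the performance cost mentioned in the item usually comes from.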
02:40
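Nathan Lambert @natolambert
63
Work led by @jacobcares showed that little compute for building an LLM is actually in the final runs. The vast majority of compute goes to developing a recipe. Creating the recipe openly is a huge lever in making sure the research community's compute pushes to new knowledge.

Ai2: Today we're bringing new NSF OMAI compute online with NVIDIA Blackwell Ultra-powered systems, turning a $152M national i...

Expert Opinion · Open-Source Ecosystem · Data/Training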
01:06
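SemiAnalysis @SemiAnalysis_
51
We are so used to seeing chip company marketing teams exaggerate specs that it is refreshing to see them understate specs for a change. Here's one example from Cerebras's website, where they understate on-chip SRAM by a factor of 8! @cerebras y'all are far too modest!
Data/Training · Phenomena/Trends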
00:31
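Chubby♨️ @kimmonismus
57
The core of the compute race: a shift from owning hardware to digesting it efficiently

xAI and Anthropic face mirror-image problems in putting compute to use. xAI has one of the world's largest GPU clusters, yet its model FLOPs utilization is only about 11%, which highlights how hard it is to turn hardware into useful compute. Anthropic, by contrast, faces demand far ahead of supply: Claude's revenue run-rate has passed $30 billion, its customers spending $1 million or more grew from 500 to over 1,000 within two months, and newly added compute is immediately converted into higher usage limits and revenue. The race is no longer decided by cluster size alone but by "compute digestion efficiency", that is, who can most quickly turn raw compute into product capability people pay for. The scarce resource is shifting from GPU hardware itself to this ability to convert it.

Anthropic · xAI · Expert Opinion · Data/Training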
00:10
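Nathan Lambert @natolambert
72
After visiting several of China's top AI labs, I came away struck: there is a culture here that is extremely well suited to building LLMs with fewer resources, but it exists within a very different ecosystem, with more participating companies, almost no data industry, and so on. Full write-up: https://www.interconnects.ai/p/notes-from-inside-chinas-ai-labs
Data/Training · Phenomena/Trends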
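May 7
09:36
宝玉 @dotey
76
Anthropic co-founder explains Claude's rate limits: demand growth far exceeded expectations, reaching 80x annualized

Speaking at a developer conference, Anthropic co-founder Dario Amodei said the direct cause of Claude's continued rate limits is demand growing far faster than expected. The company had planned compute for 10x annual growth, but actual annualized growth in Q1 2026 reached 80x, leaving compute short of demand. In response, Anthropic has signed an agreement with SpaceX for the full capacity of the Colossus 1 data center: more than 300 MW and 220,000 NVIDIA GPUs. Dario said that although this exponential growth was within theoretical forecasts, experiencing it was still staggering. The company sees developers as a leading indicator of AI diffusion and its most important user group, and is working on "subjective" capabilities such as code security.

Anthropic · Expert Opinion · Safety/Alignment · Data/Training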
04:34
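Rohan Paul @rohanpaul_ai
57
NVIDIA, Microsoft, and OpenAI jointly launch the Multipath Reliable Connection (MRC) protocol

Multipath Reliable Connection (MRC) is a new RDMA transport protocol launched jointly by NVIDIA, Microsoft, and OpenAI in collaboration with AMD, Broadcom, and Intel. It was first validated and optimized on NVIDIA Spectrum-X Ethernet hardware. MRC's core innovation is a change in how connections work: a single RDMA flow can carry AI training traffic over multiple network paths instead of forcing each GPU connection onto one fixed route. RDMA lets GPUs move data with very little CPU involvement, which is essential when thousands of GPUs continually exchange model updates during training. When the network hits congestion, a link failure, or an overloaded switch, traffic reroutes automatically with no software-level fix, so a single bad path does not slow the whole cluster and large-scale training jobs keep running efficiently.

OpenAI · Data/Training · Industry News · Deployment/Engineering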
04:34
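Rohan Paul @rohanpaul_ai
48
OpenClaw-RL: continually training language models through everyday conversations

This work proposes OpenClaw-RL, a system that lets language models train continually through everyday conversations without human-labeled data. Its core idea is to use the natural feedback that arises in user interactions, such as corrections or repeated questions, as a real-time learning signal. The system extracts two signals from each interaction: an evaluative signal (judging whether an action succeeded, converted into a numeric reward) and a directive signal (capturing the specific direction for improvement, converted into token-level supervision). The approach turns a standard deployment environment into a continual-learning setting in which the model keeps updating itself in the background and adapts to different users' preferences, removing the dependence on large human-labeled datasets.

Agents · arXiv · Data/Training · Papers/Research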
02:10
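TestingCatalog News 🗞 @testingcatalog
72
Elon Musk said he recently spent considerable time with Anthropic's senior team and was impressed by their efforts to ensure Claude is good for humanity, describing the team as highly professional and holding the right values. On that basis of trust, he agreed to lease SpaceX's Colossus 1 supercomputing cluster to Anthropic, since SpaceXAI has already moved its own training workloads to Colossus 2. The deal is seen as a shift in the balance of power among tech giants.

Elon Musk: Same here. By way of background for those who care, I spent a lot of time last week with senior members of the Anthropic...

Anthropic · Data/Training · Industry News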
00:37
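向阳乔木 @vista8
60
AI analysis of X platform data reveals patterns in posting efficiency and follower growth

After feeding the last 90 days of analytics from X's Creator Studio into a large model, the AI distilled several operating patterns. Key findings: 3 to 5 posts per day is the most efficient range for impressions, rather than chasing volume; Wednesday has the highest engagement rate, Thursday is best for gaining followers, and Saturday is best for maximizing impressions. In addition, nearly 44% of new followers came from a handful of "high-growth days", suggesting that follower growth depends mainly on the pull of viral posts.

Tutorials/Practice · Data/Training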
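May 6
23:04
OpenAI @OpenAI
66
Large-scale AI supercomputers need a new kind of network to keep chips in sync. OpenAI experts discuss the challenge of moving data reliably and efficiently across enormous chip clusters and introduce the newly released Multipath Reliable Connection (MRC) network protocol. Released by OpenAI together with industry partners including AMD, Broadcom, Intel, Microsoft, and NVIDIA, the protocol is intended to help large AI training clusters run faster and more reliably and to reduce GPU idle time. MRC is an open industry protocol available to the whole industry.

OpenAI: We've partnered with @AMD, @Broadcom, @Intel, @Microsoft, and @NVIDIA, to release Multipath Reliable Connection (MRC), a...

OpenAI · Data/Training · Industry News · Deployment/Engineering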
21:29
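Chubby♨️ @kimmonismus
54
NVIDIA open-sources the new network transport protocol behind OpenAI's Blackwell clusters

NVIDIA has open-sourced the MRC protocol through OCP. MRC is a new RDMA transport protocol designed for large-scale AI training clusters. Its core innovation is spreading a single connection across multiple network paths; when a path fails or becomes congested, hardware reroutes the traffic within microseconds, addressing the increasingly severe network bottleneck in frontier AI training. The protocol is already used in OpenAI's Blackwell clusters, and Microsoft and Oracle are also major deployers. While the move ostensibly promotes more open standards, NVIDIA optimized first for its own Spectrum-X platform, which in practice strengthens its full-stack competitive advantage and pushes Ethernet into high-performance computing territory traditionally dominated by InfiniBand.

OpenAI · Open Source/Repos · Data/Training · Deployment/Engineering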
17:20
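ginobefun @hongming731
63
A Stanford study covering 1,500 workers and 844 tasks finds that current AI investment is misaligned with what real work needs. Using the WORKBank framework, the study sorts work tasks into four quadrants by workers' desire for AI and AI's current capability: "green light" tasks with high desire and high capability (such as data entry) can already be automated; the "R&D opportunity" zone with high desire and low capability is where startups should look; the "red light" zone with low desire and high capability (such as final creative presentation) tends to provoke resistance; and the "low priority" zone, low on both, needs no attention. The key finding is that different tasks within one occupation (programmers, for example) span several quadrants, so "occupations being replaced" is a false framing: work is being re-divided and recombined.

indigo: With 1,500 workers and 844 tasks, Stanford tells YC: 41% of your money went in the wrong direction. You funded things people "don't want" or "don't need", while the things people "want but hardly anyone builds" are waiting for founders. In the paper, the top 10 that workers most want automated ...

Data/Training · Phenomena/Trends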
09:34
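meng shao @shao__meng
Featured · 77
SubQ, the first model built on the SSA architecture, reaches a 12-million-token context window with a large efficiency lead

The frontier model SubQ is built on the new Subquadratic Sparse Attention (SSA) architecture and offers a usable 12-million-token context window. Its core technique, SSA, uses a content-dependent selection mechanism so that each query dynamically computes attention only against relevant keys, making compute and memory cost grow linearly with sequence length rather than quadratically as in a standard Transformer. In reported tests at 1 million tokens it is 52.2x faster than FlashAttention-2 and costs less than 5% of Opus. The model is optimized for real enterprise long-context workloads that need to process entire codebases or long documents in one pass, and aims to close the gap between the "nominal" and "functional" context window.

Alexander Whedon: Introducing SubQ - a major breakthrough in LLM intelligence. It is the first model built on a fully sub-quadratic sparse...

Data/Training · Model Releases · Coding

Why it's recommended: this is the first breakthrough that actually brings sub-quadratic attention to a frontier model. The 12M context window is no longer just a spec but a window you can really use, and the cost logic of long-context workloads will have to be rewritten.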
07:33
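Nathan Lambert @natolambert
43
Adding a chapter on policy distillation to the RLHF book, and it's notable that LLMs/coding agents are surprisingly bad at this, even though I've provided the core papers and 250 pages of context on how I articulate things.
Expert Opinion · Data/Training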
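May 5
23:27
Epoch AI @EpochAIResearch
44
Join our in-person workshop to help develop problems for FrontierMath: Open Problems! We are looking for unsolved problems in research mathematics that are highly interesting and whose solutions can be verified programmatically. Such problems are hard to come by. Come try your hand! Link below.
Data/Training · Industry News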
13:14
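Berryxia.AI @berryxia
52
A 2-hour Stanford open lecture explains how LLMs are built

A 2-hour Stanford open lecture systematically walks through building large language models such as ChatGPT from scratch, covering the Transformer architecture, training techniques, scaling laws, and other core topics. The course is free and high in substance, and it lays out the underlying logic of the AI era. By comparison, many engineers at top AI companies focus only on tuning prompts and chasing benchmarks and lack this kind of systematic knowledge. The course is a valuable opportunity for anyone who really wants to understand AI.

Tutorials/Practice · Data/Training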
02:48
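François Chollet @fchollet
Featured · 73
I wrote Deep Learning with Python to be the definitive guide to understanding how deep learning works and how best to apply it. Tens of thousands of people started their careers with this book. It has sold 120,000 copies, and millions more have downloaded and read it. It is now free to read online: https://deeplearningwithpython.io/
Tutorials/Practice · Data/Training

Why it's recommended: Chollet's Deep Learning with Python was the first deep learning book for countless people entering the field. It is now free to read online, so newcomers no longer need to debate whether to buy it; they can just start reading.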
00:56
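Nathan Lambert @natolambert
53
We need a new term for the attacks some Chinese labs are running against APIs, to distinguish them from distillation; otherwise we risk tainting a key technique that is vital to AI diffusion, academic research, and the open-source ecosystem. https://www.interconnects.ai/p/the-distillation-panic
Expert Opinion · Safety/Alignment · Data/Training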
00:26
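Epoch AI @EpochAIResearch
46
Discussing the predicament and future direction of AI benchmarks

Responding to the pessimistic claim that AI benchmarks are already broken, the speakers push back and explore what the next generation of AI benchmarks might look like. Core topics include the costs and benefits of benchmark development, building scalable benchmarks (such as MirrorCode), how AI itself speeds up benchmark development, and the gap between current benchmarks and real-world capability. The conversation also touches on whether an AGI benchmark is feasible and looks ahead to more comprehensive evaluation methods beyond automated scoring.

Data/Training · Evaluation/Benchmarks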
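May 4
23:24
elvis @omarsar0
66
Autodata, developed by Meta FAIR, is an agentic system that autonomously builds high-quality training and evaluation data. At its core is an "agentic self-instruct" loop: an orchestrator LLM directs a challenger agent to generate questions from domain documents, weak and strong solvers attempt them, a judge scores the answers, and failures are analyzed to refine the next round, yielding challenging data that effectively separates model capabilities. On a CS research QA task, the method produced a 34-percentage-point performance gap, far above the 1.9 points of the standard method. The system also supports meta-optimization: an outer loop adjusts the instructions and raised the validation pass rate from 12.8% to 42.4%. The study processed more than 10,000 papers and produced 2,117 high-quality QA pairs, using additional inference compute to make the data more challenging and thereby improve downstream model performance.

DAIR.AI: Banger paper from Meta FAIR. They introduce Autodata, an agentic data scientist that builds high-quality training and ev...

Agents · Meta · Data/Training · Papers/Research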
10:18
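Eric @ericmitchellai
40
I am begging you to look at your data. Please look at the data. Eval results worse than expected? Look at the data. Eval results better than expected? *Definitely* look at the data. Eval results as expected? Believe it or not...
OpenAI · Expert Opinion · Data/Training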
5月3日
18:42
Rohan Paul@rohanpaul_ai
54
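Figure's F.03 Humanoid Robot Achieves Autonomous Walking and Stair Navigation

Figure's newly assembled F.03 humanoid robots can now walk autonomously, going on foot straight from the production line to headquarters. The key advance is that the robot relies only on onboard camera perception, with no LiDAR or prebuilt maps, to handle complex navigation such as going up and down stairs. The full locomotion policy was trained entirely with end-to-end reinforcement learning in simulation and transferred zero-shot to the physical robot. The demo shows depth perception in which a neural network infers scene geometry from camera data, although slight jitter and artifacts remain in scale stability and in regions such as windows.

Brett Adcock: F.03 can now walk up/down stairs purely using it's onboard camera perception Our robots now walk from manufacturing when...

Product Update · Embodied AI · Data/Training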
09:18
SemiAnalysis@SemiAnalysis_
54
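New Graduate Lands High-Paying Jane Street Job with Self-Built AI Trading System

A new graduate at Jane Street landed a position paying $220,000 to $600,000 a year on the strength of an intelligent AI system they built themselves. At its core, the system uses the JAX and Mesh-TF frameworks to process massive amounts of data efficiently and to identify hidden patterns that humans cannot perceive, which directly drive real trading decisions. The key to this success was not simply working overtime but a qualitative leap in efficiency through technical innovation. The graduate has released an hour-long walkthrough of how the system was built, covering everything from mining scarce datasets to turning raw data into trading decisions, and points out that this helps a career far more than spending months scrolling social media.

Agents · Tutorial/Practice · Data/Training
May 2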
17:44
Chubby♨️@kimmonismus
63
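DeepSeek V4 Challenges the Western View That China Lags in AI Chips

The West has long held that China is 10 to 15 years behind in AI chips, but the release of DeepSeek V4 upends that view. The model is deeply optimized for Huawei's Ascend chip ecosystem and can be deployed for inference on Ascend 950 infrastructure, so a frontier model can run at scale without relying on Western hardware. Although the Ascend 950 still trails NVIDIA's Blackwell B200 significantly in per-chip performance, China is using a "scale-out" strategy, combining large clusters of domestic chips with software optimization and model-architecture innovations (such as MoE), to bring system-level AI capability quickly toward the frontier. This exposes a fundamental error in Western analysis: equating the chip-level gap directly with a capability gap.

DeepSeek · Open-Source Ecosystem · Inference · Data/Training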
04:41
Rohan Paul@rohanpaul_ai
59
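Tsinghua's AI Patent Count Exceeds Harvard, MIT, and Stanford Combined

Tsinghua University now holds more patents in artificial intelligence and machine learning than Harvard University, the Massachusetts Institute of Technology, and Stanford University combined. For more than a decade, Tsinghua's patent filings have far outpaced those of top US universities, and the gap keeps widening. It operates like an efficient machine that converts AI research into legally protected, transferable assets at scale. Browsing arXiv and similar platforms day to day, one also sees a very large number of papers with Tsinghua affiliations.

Data/Training · Trends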
01:16
AK@_akhaliq
35
Co-Evolutionary Policy Distillation paper: https://huggingface.co/papers/2604.27083
Data/Training · Papers/Research
May 1
23:39
Elon Musk@elonmusk
55
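This Grok 4.3 release shows improved cost efficiency for running the Artificial Analysis Intelligence Index, with Grok 4.3 sitting solidly on the intelligence-versus-cost Pareto frontier. Thanks to a 37.5% cut in input token price and a 58.3% cut in output token price, running the Intelligence Index evaluation costs $395, about 20% lower overall than Grok 4.20 0309 v2.

Artificial Analysis: This release shows increased cost efficiency to run the Artificial Analysis Intelligence Index, with Grok 4.3 sitting co...

xAI · Data/Training · Model Release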
08:10
Berryxia.AI@berryxia
57
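Bojie Li (李博杰), chief scientist at Pine AI, proposes a new method that estimates a model's parameter count from its ability to answer 1,400 trivia questions. The principle is that storing facts takes up parameter space: first fit a curve using open-source models of known size, then project closed-source models' scores onto it to obtain an estimate. The study evaluated 92 closed-source models. GPT-5.5 leads by a wide margin at roughly 9.7T parameters, followed by Claude Opus 4.6 at roughly 5.3T. Mainstream flagship models such as GPT-5 and Claude Opus 4.7 cluster in the 3-4T range. The analysis also infers that the .x versions of GPT-5, Claude Opus 4.7, and others may be newly trained models rather than fine-tunes, and notes that an MoE model's knowledge capacity depends on its total parameter count. The evaluation tool and data are open source.

思维怪怪: Someone did a fun study that uses trivia to weigh large models, concluding: GPT-5.5 is about 9.7T, Opus 4.7 about 4T, Grok-4 about 3.2T... Pine AI chief scientist Bojie Li published the paper "Incompressible Knowledge Probes: Factual-Capacity-Based Estimation of Black-Box Large Language Mo...

Anthropic · OpenAI · Data/Training · Papers/Research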
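
The fit-then-project idea in the item above can be sketched in a few lines. This is a minimal illustration, not the paper's method: it assumes accuracy grows linearly with log10 of total parameter count (the summary only says "fit a curve"), and the calibration points, the helper names `fit_line`, `calibrate`, and `estimate_params`, and the 0.65 score are all made-up placeholders.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical calibration set: (total parameters, trivia accuracy).
# Placeholder numbers for illustration only, not values from the paper.
OPEN_MODELS = [
    (7e9, 0.20),
    (70e9, 0.35),
    (700e9, 0.50),
]

def calibrate(open_models):
    """Fit accuracy = a + b * log10(total_params) on models of known size."""
    xs = [math.log10(params) for params, _ in open_models]
    ys = [accuracy for _, accuracy in open_models]
    return fit_line(xs, ys)

def estimate_params(score, a, b):
    """Invert the fitted line to project a closed model's score to a size."""
    return 10 ** ((score - a) / b)

a, b = calibrate(OPEN_MODELS)
estimate = estimate_params(0.65, a, b)  # 0.65 is a hypothetical closed-model score
print(f"estimated total parameters: {estimate:.3g}")
```

Total rather than active parameters are used on the x-axis because the summary reports that an MoE model's knowledge capacity tracks its total parameter count. A real calibration would also need uncertainty bands, since a small score error becomes a large size error after exponentiation.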
03:16
Anthropic@AnthropicAI
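Same Event · Featured 63
How do people ask Claude for guidance? We analyzed 1 million conversations to understand what people ask, how Claude responds, and when it slips into sycophancy. We used these findings to improve how Opus 4.7 and Mythos Preview are trained. https://www.anthropic.com/research/claude-personal-guidance
Anthropic · Safety/Alignment · Data/Training
Same event, featured item: "How Users Ask Claude for Personal Life Guidance, and the Resulting Model Improvements"
Why it's recommended: Anthropic dug sycophancy patterns out of a million real conversations and, rather than just publishing a paper, fed the conclusions straight into Opus 4.7 training. Anyone building assistants should look closely at what users actually ask and how models slide into people-pleasing.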
03:14
Epoch AI@EpochAIResearch
59
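How much AI compute has been smuggled into China? We estimate the total at 290,000 to 1.6 million H100-equivalents by the end of 2025, roughly 20% to 60% of China's total compute.
Data/Training · Trends · Papers/Research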
02:09
Rohan Paul@rohanpaul_ai
56
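Google's AI Lead Stems from Long-Term Strategic Investment, Not a Rushed Response to ChatGPT

TIME magazine notes that Google's lead in artificial intelligence stems from a series of long-term investments that CEO Sundar Pichai made early on in DeepMind, TPU chips, cloud infrastructure, and AI products, rather than from a rushed reaction to ChatGPT. Its core advantage is full-stack control over research, chips, cloud services, products, and distribution channels reaching billions of users. By tightly integrating custom chip development with a unified research lab, Google gained absolute control over its AI architecture: it can run complex computation efficiently on in-house TPUs while letting engineers scale model pretraining at low cost, without bearing the high cost of buying external chips as competitors do.

DeepMind · Google · Expert Opinion · Search
April 30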
22:43
Qwen@Alibaba_Qwen
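Featured 73
Qwen-Scope Open-Source Suite Released: Sparse Autoencoders for Steering Internal Model Features

The Qwen team has released Qwen-Scope, an open-source sparse autoencoder suite that turns SAE features into practical tools. The suite supports four application areas: steering model output by directly manipulating internal features, with no prompt engineering; classifying and synthesizing target data from very few samples to improve long-tail capability; tracing the root causes of code-switching and repetitive generation and fixing them; and analyzing feature activation patterns to optimize evaluation benchmarks and reduce redundancy. The team hopes the community will use Qwen-Scope to explore the internals of Qwen models in depth and build applications beyond the scope of existing research. The resources are publicly available.

Hugging Face · Open Source/Repos · Open-Source Ecosystem · Data/Training

Why it's recommended: Interpretability tooling is moving from academia to engineering. Qwen-Scope bundles internal feature steering, data synthesis, and root-cause tracing into one kit, and teams doing model debugging and long-tail optimization should try it right away.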
22:13
向阳乔木@vista8
60
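DeepSeek-VL Paper Reveals the Best Multimodal Training Recipe: 70% Text + 30% Multimodal

The DeepSeek-VL paper finds that multimodal training damages a language model's language ability, and that training on 100% vision data causes language benchmark performance to collapse. The study identifies the best training recipe as 70% text-only data combined with 30% multimodal data, and stresses that the vision and language modalities compete inherently, in a way that parameter tuning cannot avoid. The paper's conclusions highlight how important a balanced multimodal data ratio is for preserving a model's language performance.

向阳乔木: http://x.com/i/article/2049847033758916609

DeepSeek · Multimodal · Data/Training · Papers/Research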
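
One simple way to hold a fixed 70/30 ratio during training is to choose the source pool per example with a weighted coin flip. The sketch below is an assumption about how such a mix could be enforced, not the DeepSeek-VL pipeline; `mixed_stream`, `sample_batch`, and the toy pools are hypothetical names and data.

```python
import random

def mixed_stream(text_examples, multimodal_examples, text_ratio=0.7, seed=0):
    """Yield (source, example) pairs forever.

    Each draw picks the text pool with probability text_ratio and the
    multimodal pool otherwise, so the ratio holds in expectation.
    """
    rng = random.Random(seed)
    while True:
        if rng.random() < text_ratio:
            yield "text", rng.choice(text_examples)
        else:
            yield "multimodal", rng.choice(multimodal_examples)

def sample_batch(stream, batch_size):
    """Pull batch_size items from the mixed stream."""
    return [next(stream) for _ in range(batch_size)]

# Toy stand-ins for real datasets (hypothetical).
text_pool = ["t0", "t1", "t2"]
multimodal_pool = ["m0", "m1"]

stream = mixed_stream(text_pool, multimodal_pool, text_ratio=0.7, seed=0)
batch = sample_batch(stream, 10_000)
text_share = sum(1 for source, _ in batch if source == "text") / len(batch)
print(f"text share in sample: {text_share:.3f}")
```

Per-example sampling keeps the ratio only on average; a pipeline that needs exactly 70% text in every batch would instead fix the per-batch counts and shuffle within the batch.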
21:41
Chubby♨️@kimmonismus
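Featured 76
Meta Posts First Drop in Daily Active Users as Capex Surge Sends Stock Tumbling

In the first quarter of 2026, Meta's global daily active users fell for the first time, with "Family of Apps" daily actives down by 20 million. The company attributed the drop to internet outages in Iran and Russia's ban on WhatsApp, but the consolidated figure hides losses at individual apps. Meta also raised its 2026 capital expenditure guidance to $125-145 billion, mainly to cover rising memory chip prices and to step up AI infrastructure investment, or roughly $400 million per day. Although revenue grew 33% to $56.3 billion and net income jumped 61%, the company still plans to lay off 8,000 people to "offset" AI investment costs, and its Reality Labs division lost $4 billion. Wall Street was unhappy with the spending trajectory, and the stock fell more than 7% after hours.

Meta · Data/Training · Industry News

Why it's recommended: Meta's first daily-active decline, combined with AI capital spending above $100 billion, puts the social empire's anxiety on the table. User losses and the AI arms race are accelerating at the same time, and the intersecting signals deserve a close look.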
20:10
ginobefun@hongming731
51
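JD Advertising Releases GRAM Architecture, Using LLM Knowledge Engineering to Break the Recommendation Bottleneck

The JD advertising team has introduced the GRAM architecture, which aims to overcome the bottlenecks of traditional CTR models through LLM-native knowledge engineering. The architecture builds a cascaded knowledge graph with millisecond-level queries and injects product attributes and business rules as "factual guardrails" to eliminate AI hallucinations and keep recommendations grounded in reality. It overturns the cold-start approach that depends on historical data: even new products with zero sales can be distributed precisely through high-dimensional feature associations in the knowledge network. GRAM also structures a company's tacit knowledge and supplies it as context, so that the LLM can make complex, in-depth decisions rather than merely computing exposure.

Tutorial/Practice · Data/Training · Deployment/Engineering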
19:11
Chubby♨️@kimmonismus
62
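Big Tech Four See Cloud Revenue Surge as Soaring Capex Worries Markets

Meta, Amazon, Alphabet, and Microsoft all beat revenue expectations for the first quarter of 2026 on strong cloud growth, with Google Cloud revenue jumping 63% to pass $20 billion for the first time. However, the four hyperscalers' combined 2026 capital expenditure is expected to exceed $650 billion, and the scale of AI infrastructure investment has unsettled the market, pushing Meta and Microsoft shares lower after hours. The size of these companies' compute spending is reshaping the global economy, and whether it earns a matching return will define the technology investment landscape for the next decade.

Google · Microsoft · Data/Training · Industry News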
17:39
Rohan Paul@rohanpaul_ai
55
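Anthropic Study Shows Claude Can Solve Real Bioinformatics Problems That Human Experts Missed

Anthropic's latest research uses the BioMysteryBench test suite to evaluate Claude on real bioinformatics problems. The benchmark hides objective answers inside real datasets and covers 99 tasks. On the 76 problems solved by at least one human expert, the Claude Mythos Preview model reached about 83% accuracy; more notably, on the 23 problems the expert panel could not solve, the model still solved about 29.6%. However, its successes on hard problems were not very repeatable, which indicates that performance is not yet stable. The study notes that Claude is most effective not as an "oracle" but as a fast research collaborator: it layers methods, cross-checks evidence, and draws on broad background knowledge to narrow the search space.

Anthropic · Data/Training · Papers/Research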
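
As a back-of-the-envelope check, the two quoted rates can be combined into an implied overall figure across all 99 tasks. The result is derived only from the rounded percentages in the summary above, not a number reported by Anthropic, and `expected_solved` is a hypothetical helper.

```python
def expected_solved(n_tasks, accuracy):
    """Expected number of solved tasks at a given average accuracy."""
    return n_tasks * accuracy

EXPERT_SOLVABLE_TASKS = 76      # solved by at least one human expert
EXPERT_UNSOLVED_TASKS = 23      # not solved by the expert panel

solved_easy = expected_solved(EXPERT_SOLVABLE_TASKS, 0.83)    # ~83% quoted
solved_hard = expected_solved(EXPERT_UNSOLVED_TASKS, 0.296)   # ~29.6% quoted

total_tasks = EXPERT_SOLVABLE_TASKS + EXPERT_UNSOLVED_TASKS
overall = (solved_easy + solved_hard) / total_tasks

print(f"expert-solvable: {solved_easy:.1f} of {EXPERT_SOLVABLE_TASKS}")
print(f"expert-unsolved: {solved_hard:.1f} of {EXPERT_UNSOLVED_TASKS}")
print(f"implied overall accuracy: {overall:.1%}")
```

Under these inputs the implied overall rate is roughly 70%, with about seven of the 23 expert-unsolved problems accounting for the headline result.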
16:39
Chubby♨️@kimmonismus
61
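Anthropic Releases BioMysteryBench, with AI Starting to Surpass Human Experts on Hard Bioinformatics Problems

Anthropic has released the BioMysteryBench benchmark, which contains 99 open-ended bioinformatics challenges built on raw, messy real-world biological datasets. The latest Claude model (4.7) solved most of the tasks that human experts could handle and cracked about 30% of the 23 hard problems that the expert panel failed to solve. Its ability comes from integrating knowledge from hundreds of thousands of papers and stacking multiple analysis strategies when uncertain. In independent testing by Genentech and Roche (CompBioBench), Claude Opus 4.6 reached 81% overall accuracy and 69% on the hardest problems. Together, the two benchmarks indicate that AI has surpassed human experts on some of the hardest problems in biology.

Anthropic: New on the Science Blog: We gave Claude 99 problems analyzing real biological data and compared its performance against ...

Anthropic · Data/Training · Papers/Research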