DeepSeek真的是性价比和技术双重斩杀线... 有同学看不懂DSpark是啥, 简单给大家写个小教程讲讲. 推测性解码(投机解码)这个技术是用来提升大模型输出速度的. 本质是让小模型给大模型接话, 大模型判断小模型说的对不对. 因为现在模型普遍卡内存带宽, 而GPU算力是富余的, 所以大模型的prefill速度(看字)比decode速度(吐字)快很多. 那么让小模型沿着大模型的思路先说一段话, 大模型判断对不对(只需要看字), 只要小模型猜对了, 那么这就利用了prefill速度, 吐字就会成倍的提升. 但问题来了, 外挂小模型也要看字(prefill), 也要占用显存, 也要吃显存带宽. 那么有没有更好的方法来解决呢? 来了, 这就是DSpark. 看我的这个图(左侧DSv4架构图是 @rasbt 大佬的), DSpark 接在了 Final RMSNorm 过程中. 不是接一个完整的小模型, 而是一个3 层的MTP(多Token预测)微型Transformer堆叠. 大模型算完前面60多层后, 刚把当前这句话的"高浓缩概念"(特征向量/隐藏状态)推到 Final RMSNorm 这个出口，还没来得及翻译成具体文字时，DSpark开始截胡: 首先是半自回归极速脑补 (MTP + Markov Head), DSpark自己有一丢丢参数, 然后它就瞬间并行猜5个字(特征向量), 然后再用自己内部的一个串行网络理顺逻辑. (注意啊,先并行然后串行消除并行导致的逻辑不连贯). 然后, 它会有一个置信度预测头, 预判自己猜的准不准, 比如5个字的后2不准就直接砍掉, 防止后续送回大模型浪费算力. 最后把留下的3个字塞回词表映射层, 把向量翻译为token. 到此为止DSpark工作就做完了. 然后就是大模型扫一遍DSpark输出的对不对(只用prefill，不decode), 一旦正确了, 就直接吐字, 这样之前模型一次只能吐一个字, 现在就能吐3个字了! 最后, 推测性解码是不会降智的, 速度能提升60%-85%! 之前是雇一个小模型帮忙写草稿, 现在则是直接脑子里植入芯片了. 目前SGLang已经有这个特性的PR了(29538), 而且DeepSeek刚在自己的HuggingFace主页发了一大堆小模型的DSpark魔改版. 大胆猜一波未来发布的模型会不会标配DSpark? #dspark #deepseek #投机解码 #推测性解码

译DeepSeek推出的DSpark是一种推测性解码技术，通过在Final RMSNorm后接入3层MTP微型Transformer堆叠，让大模型在输出前并行猜5个token，经置信度头剪裁后，送回大模型用prefill验证，正确则一次性吐出多个token。相比外挂小模型更高效，不降智，速度提升60%-85%。目前SGLang已有相关PR（#29538），DeepSeek已在HuggingFace发布多款DSpark魔改版小模型。

ClaudeDevs@ClaudeDevs · 3天前53

You can now run Claude models in Microsoft Foundry, hosted on Azure. Claude Opus 4.8 and Claude Haiku 4.5 are available through the Messages API with capabilities like prompt caching, thinking, and more. https://x.com/claudeai/status/2071653958905467027?s=20

译你现在可以在 Microsoft Foundry（托管于 Azure）上运行 Claude 模型。 Claude Opus 4.8 和 Claude Haiku 4.5 通过 Messages API 提供，支持 prompt caching、thinking 等功能。

Claude@claudeai · 3天前55

Claude in Microsoft Foundry is now generally available, hosted on Azure. Azure customers get Claude Opus 4.8 and Claude Haiku 4.5, with Azure authentication, billing, and commitment retirement.

译Claude 现已在 Microsoft Foundry 中正式可用，托管于 Azure。 Azure 客户可获得 Claude Opus 4.8 和 Claude Haiku 4.5，并支持 Azure 身份验证、计费和承诺预留。

Nathan Lambert@natolambert · 3天前50

Together AI processing 400T tokens a month.

译Together AI 月处理 400T tokens。

Alibaba Cloud@alibaba_cloud · 3天前43

We’re proud to announce a strategic partnership between Alibaba Cloud International and NovaxAI @NovaxAi26 . By integrating our global cloud infrastructure and AI advancements with Novax AI’s unique capabilities, we are enabling AI companies to achieve faster, more stable, and more efficient global growth. #AlibabaCloud #NovaxAI #AIInnovation #YourAInnovationPlatform

译我们很自豪地宣布，阿里云国际与NovaxAI @NovaxAi26 达成战略合作。通过将我们的全球云基础设施及AI进展与Novax AI的独特能力相结合，我们正助力AI公司实现更快、更稳定、更高效的全球增长。 #AlibabaCloud #NovaxAI #AIInnovation #YourAInnovationPlatform

宝玉@dotey · 3天前56

Ford 重新雇了 350 名老工程师回来，因为 AI 质检系统没能达到预期。过去三年，福特悄悄招回了 350 名资深工程师，有的是退休或离职的老员工，有的是从供应商那边挖来的。公司内部管他们叫 gray beard，直译是白胡子，意思就是经验老到的老师傅。他们回来干两件事：带新人，以及重新调教那些没干好活的 AI 工具。负责整车硬件工程的副总裁 Charles Poon 说： > 我们错误地以为，只要把 AI 引进来，把设计要求输入 AI，它就能产出高质量的产品。首席运营官 Kumar Galhotra 对 Bloomberg 的说法类似。福特这些年越来越依赖自动化质检系统，结果一直不理想；把技术专家请回来后，他们在零件还没上产线之前，就先把故障点揪出来。效果立竿见影。福特时隔 16 年重新拿下 JD Power 新车质量榜主流品牌第一，从去年的第 10 名一口气冲到榜首，是所有品牌里年度进步最大的一个，把丰田和本田都甩在了后面。这个榜单(Initial Quality Study)测的是新车买来头 90 天内车主遇到多少问题，问题越少排名越高。 F-150 皮卡、Super Duty 卡车和 Mustang 跑车在各自品类都拿了第一。CEO Jim Farley 说，质保和召回成本跟着下来了，福特预计今年因此能省下大约 10 亿美元。 Ford 没打算丢掉 AI。它还在扩充 AI 测试，新增了大约 10 万项评估来模拟更多路况。 AI 是个好工具,但它有多好,取决于你拿什么数据去训练它。老师傅回来，主要是给 AI 当老师，告诉它什么样的零件算合格、什么样的设计会埋雷。现在主流叙事都是“AI 要取代白领”，福特这个案例倒是反例，类似的案例还有一些：瑞典金融科技公司 Klarna 几年前高调宣布，AI 客服干了相当于 700 名人工客服的活，到 2025 年，CEO 公开承认这套全 AI 客服质量更差，又开始招人。麦当劳在美国上百家门店试过 AI 点餐，出了一堆翻车视频后撤掉，把人工收银员请了回来。咨询公司 Gartner 早就预测，到 2027 年，因为 AI 裁掉客服的公司里，有一半会需要重新招人。

译福特过去三年召回350名退休/离职资深工程师（gray beard），负责带新人并重新调教未达预期的AI质检系统。整车工程副总裁Charles Poon承认曾错误认为引入AI就能产出高质量产品。效果立竿见影：福特时隔16年重返JD Power新车质量榜主流品牌第一（从第10升至第1），F-150、Super Duty、Mustang分别拿下品类冠军，预计今年节省约10亿美元质保和召回成本。福特未抛弃AI，正新增约10万项评估模拟更多路况。

向阳乔木@vista8 · 3天前52

Agent基建越来越好，利好中小企业。当开发部署不是问题时，又回到根本问题，如何理解企业需求用AI解决问题。最近FDE岗位（Forward Deployed Engineer，前沿部署工程师）很火，可能也是这个原因。外派到客户公司，让AI技术与企业真实业务场景结合，推动AI落地并产生商业价值。不知道有没有正在做FDE工作的朋友，想学习交流下。

译腾讯云 EdgeOne 今日发布「EdgeOne Makers」，通过 `npm install -g edgeone` 等几行命令即可部署 AI Agent 开发框架，自动处理上下文、并发、沙箱环境等问题，支持绑定域名、关联 GitHub 持续迭代。产品处于 Beta 内测，注册可免费领取 50 万 Token。该工具大幅降低 Agent 部署门槛，利好中小企业。Vista 指出，当开发部署不再是问题，关键转向如何理解企业需求用 AI 解决问题，近期 FDE（前沿部署工程师）岗位走热，正是推动 AI 与业务场景结合、实现落地的具体实践。

🚨 AI News | TestingCatalog@testingcatalog · 4天前16

Tasks on Grok for iOS got renamed to Automations. For now, it seems to be only a name change along with a slightly different UI. Are we still about to see Grok desktop eventually?

译Grok for iOS 上的 Tasks 已更名为 Automations。目前看来，这似乎只是名称变更，外加 UI 略有不同。我们最终还能看到 Grok 桌面版吗？

DogeDesigner@cb_doge · 4天前33

Cybercabs are taking over Austin.

译Cybercabs 正在占领奥斯汀。

Rohan Paul@rohanpaul_ai · 4天前52

FT: Google capped Meta’s use of Gemini after Meta asked for more model compute capacity than Google could supply. Meta’s problem is that it uses Gemini inside safety automation, customer support, ad tools, coding, and internal workflows. Google’s problem is different because it has paying cloud customers, its own Gemini products, and limited data center capacity all competing for the same chips, power, and networking. Google Cloud’s March-quarter revenue rose to $20 billion, but Sundar Pichai said a shortage of compute capacity kept growth lower and helped backlog nearly double versus the previous quarter. --- ft .com/content/c5d52f72-71ef-40bc-bad3-61afdba8b378?syn-25a6b1a6=1

译Google限制了Meta对Gemini模型的使用，原因是Meta要求的计算容量超出Google供应能力。Meta在安全自动化、客服、广告工具、编程及内部工作流中均依赖Gemini。Google面临自身云客户、Gemini产品与有限数据中心容量之间的资源竞争。Google Cloud 3月季度收入增至200亿美元，CEO Sundar Pichai表示计算容量短缺制约了增长，并导致未交付订单较前一季度近乎翻倍。

Rohan Paul@rohanpaul_ai · 4天前56

Techcrunch: Micron, the only U.S.-based manufacturer of high bandwidth memory chips, just became Wall Street’s newest AI infrastructure bet because AI servers now need huge amounts of HBM, DRAM, and NAND beside every GPU. Micron’s case got stronger after Q3 revenue hit $41.46B, gross margin reached 84.6%, and Q4 guidance moved to $49 B-$51B. Current market cap ~$1.27 trillion. Memory has usually been a boom-bust business because factories take years and billions to build, then prices collapse when too much supply arrives. Micron is trying to break that cycle with 16 strategic customer agreements worth $22B, using deposits, pricing floors, and take-or-pay terms to make demand harder to cancel. --- 📌 The current state of the global memory market. This market has 2 main layers: DRAM, which includes the memory used next to CPUs and AI GPUs, and NAND flash, which is the storage inside SSDs, phones, and data centers. In DRAM, the market is extremely concentrated, with Samsung at 38.5%, SK hynix at 28.8%, and Micron at 22.4% in 1Q26, meaning the top 3 control about 90% of global DRAM revenue. In HBM, which is a premium submarket inside DRAM, the AI-specific memory used beside Nvidia GPUs, SK hynix is the market leader, with 58% share in 1Q26, while Samsung and Micron each had 21%. HBM, or High Bandwidth Memory, is a special form of DRAM built for extreme data movement. The difference is physical design. Normal DRAM chips usually sit on memory modules or near the processor, and data moves through relatively narrower connections. HBM stacks multiple DRAM dies vertically and places them very close to the GPU through advanced packaging, which creates a much wider data path. That wider path gives AI chips much higher memory bandwidth, meaning the GPU can receive data faster instead of sitting idle. --- techcrunch .com/2026/06/28/why-wall-street-thinks-us-memory-maker-micron-is-the-next-nvidia/

译美光是美国唯一高带宽内存（HBM）制造商，因AI服务器需求激增成为华尔街新宠。Q3营收414.6亿美元，毛利率84.6%，Q4指引490-510亿；利润同比增长15倍，调整后毛利率84.9%（去年39%）。全球DRAM市场高度集中，三星、SK海力士、美光合计占约90%收入；HBM细分领域SK海力士占58%，美光占21%。为打破内存行业周期性，美光签下16个价值220亿美元的战略客户协议，通过定金、价格下限和照付不议条款稳定需求。

Berryxia.AI@berryxia · 4天前50

兄弟们，DeepSeek开源了DSpark！一个投机解码框架，不是新模型，是推理优化。核心问题：传统投机解码里，一个小的draft模型先猜一串token，然后大模型一次性验证。问题是猜的越后面越容易错，验证错误的猜测也浪费GPU算力。 DSpark的解法： 1. 并行backbone + 顺序head混合。纯并行猜测速度快，但后面的token会衰减，因为每个位置猜的时候不知道前面实际采样了什么。 DSpark加了一个小的Markov head，用前一个token调整当前猜测，解决了后缀衰减问题。 2. 置信度调度。加了一个置信度head，估算每个draft token的存活概率。再配合一个负载感知调度器，GPU空闲时多验证几个token，忙碌时少验证。不是所有猜的token都值得检查，只检查那些可能正确的部分。效果：在DeepSeek-V4生产环境中，单用户生成速度比MTP-1基线快60-85%。不同场景下吞吐提升1.5x到5x。开源内容： - 模型checkpoint：`DeepSeek-V4-Pro-DSpark` 和 `DeepSeek-V4-Flash-DSpark`，复用现有V4权重，附加draft模块 - 训练代码：MIT协议的DeepSpec代码库 - 与北京大学联合开发为什么重要：投机解码一直被认为"理论好但实战难"。 DSpark证明了在真实生产系统中，投机解码能稳定提速60%以上，而且不影响输出质量。 DeepSeek已经部署在生产环境里了。

译DeepSeek 开源 DSpark，一个面向生产环境的投机解码框架。核心解决传统投机解码中 draft 模型猜测后期 token 错误率高、浪费算力的问题。DSpark 采用并行 backbone + 顺序 Markov head 混合架构，消除后缀衰减；并引入置信度 head 和负载感知调度器，动态控制验证数量。在 DeepSeek-V4 生产系统中，单用户生成速度比 MTP-1 基线快 60-85%，吞吐提升 1.5x 至 5x。开源内容包括基于 V4 权重的 `DeepSeek-V4-Pro-DSpark`/`Flash-DSpark` checkpoint，以及 MIT 协议的 DeepSpec 训练代码，与北京大学联合开发。

Ethan Mollick@emollick · 4天前56

In my experience, all model routers underestimate the difficulty of non-math/coding tasks and assign them too little intelligence. This is worth addressing, as non-verifiable tasks (innovation, marketing, qualitative analysis) often benefit the most from using “smarter” AI models

译根据我的经验，所有模型路由器都低估了非数学/编码任务的难度，并为它们分配了过少的智能。这是一个值得解决的问题，因为非可验证任务（创新、营销、定性分析）通常从使用“更聪明”的 AI 模型中获益最多。

🚨 AI News | TestingCatalog@testingcatalog · 4天前52

Google vs Meta 🤖 > Google introduces restrictions on Meta's use on Gemini amid capacity shortage, according to the Financial Times. > Reportedly, this negatively affected internal projects at Meta related to customer support and content moderation, causing delays. I bet token efficiency will be a huge market in the long run, with a very transparent and predictable business model.

译Google vs Meta 🤖 > 据《金融时报》报道，Google因容量短缺对Meta使用Gemini施加限制。 > 据报道，这负面影响了Meta内部与客户支持和内容审核相关的项目，导致项目延期。我敢打赌，从长远来看，token效率将成为一个巨大的市场，其商业模式非常透明且可预测。

Rohan Paul@rohanpaul_ai · 4天前29

Somebody quit his data center job and leased an empty warehouse. filled it with rows of server machines. charges clients between $4-6 K p/month for private isolated deployment. Each client receives dedicated machines to host model. He runs all on vLLM.

译某人辞去了数据中心的工作，租了一个空仓库。里面摆满了成排的

Rohan Paul@rohanpaul_ai · 5天前50

Bloomberg: Two prominent Chinese hedge funds are warning that the global AI stock boom has crossed from strong demand into a super bubble. Their point is that many AI-linked stocks now price in years of perfect growth before the businesses have proved they can defend profits. The weakest point is AI infrastructure, where companies must keep spending huge amounts on chips, servers, power, and data centers just to stay relevant. A business with a moat can protect pricing, margins, and customers, while a business built only on sudden demand can look brilliant until supply catches up or customers push back. Wealspring says some hot Chinese AI shares could fall more than 80%, while Banxia points to Anthropic’s revenue run-rate as a pressure point because token costs can rise faster than customer budgets. --- bloomberg. com/news/articles/2026-06-26/chinese-hedge-funds-warn-the-ai-super-bubble-is-ready-to-burst

译彭博社报道，两家中国对冲基金警告全球AI股票繁荣已从强劲需求转为超级泡沫。许多AI相关股票的定价已包含多年完美增长预期，但企业尚未证明能捍卫利润。最薄弱环节是AI基础设施——公司必须持续在芯片、服务器、电力和数据中心上巨额投入以维持竞争力。Wealspring称部分热门中国AI股可能下跌超80%；Banxia指出Anthropic的收入运行率是压力点，因为token成本上升速度可能超过客户预算。

Rohan Paul@rohanpaul_ai · 5天前54

Ford’s AI push hit a hard limit: factories still need human failure memory. They hiring back 350 specialists to catch failures machines missed. Ford leaned on automated inspection to find defects faster, but car manufacturing is full of edge cases where tiny design, material, supplier, and assembly changes interact in ways rules-based systems and trained models can miss. The missing ingredient was tacit engineering knowledge, the hard-earned pattern memory built from many product cycles, failed parts, supplier mistakes. Ford’s rehired “gray beards” now review designs before parts reach the plant floor, while also helping improve the training data behind the AI systems. --- independent. co .uk/tech/ford-ai-automation-human-workers-b3003787.html

译福特汽车的AI自动化缺陷检测遇到硬限制：汽车制造中存在大量边缘案例，微小设计、材料、供应商和装配变化相互作用，导致基于规则的系统与训练模型容易遗漏故障。福特因此召回350名经验丰富的工程师（“gray beards”），利用他们多年积累的隐性工程知识（即故障模式记忆），在零件到达工厂前审查设计，同时帮助改进AI系统的训练数据。

Rohan Paul@rohanpaul_ai · 5天前71

Gallup found that 71% of Americans oppose building local AI data centers. Women (55%) are also more likely than men (43%) to register strong opposition to data center construction. There are no meaningful differences in total opposition by age, race, education, income or urbanicity. Opposition is slightly lower among those living in the West (63%) and East (68%) than in the Midwest (76%) and South (75%). The opposition is not mainly abstract fear of AI, because 50% of opponents mention resource strain, including 18% water use and 18% electricity demand. Local residents are reacting to AI as a land, grid, water, noise, traffic, and bill-pressure issue. Supporters see the other side of the bargain, with 66% citing local economic benefits and 55% naming jobs. The political risk is unusually broad because majorities of Democrats, independents, and Republicans all oppose local construction, even though Democrats show the strongest opposition at 56% strongly opposed.

译Gallup民调显示，71%美国人反对在本地建设AI数据中心，女性（55%）强烈反对比例高于男性（43%）。反对主因并非恐惧AI，而是资源压力（50%反对者提及，其中水、电各占18%）；支持者则看重经济（66%）和就业（55%）。政治风险广泛，多数民主党、独立派、共和党均反对，民主党最强烈（56%强烈反对）。自2023年以来，美国已出现300+州及地方数据中心禁令/暂停。但现代数据中心已能缓解担忧：微软新一代芯片级闭环冷却零水耗；谷歌全球PUE 1.09低于行业平均1.56；数据中心未推高居民电价；"自带电力"成趋势，Google、微软、Meta纷纷签订核电合同。

Rohan Paul@rohanpaul_ai · 5天前58

The U.S. AI buildout is running into a harder constraint than GPUs: permission. The Information’s new map finds 300+ state and local data-center bans or moratoriums since 2023, with 275+ passed this year and 75+ still under consideration; resistance is strongest in the Midwest and South, exactly where hyperscalers want cheap land and megawatt-scale power. But the backlash against data centers is outrunning the facts. The most current datacenter is increasingly designed to solve the two biggest fears: water and power bills. Start with water. Microsoft’s next-generation AI data centers use chip-level, closed-loop cooling that consumes zero water for cooling and can avoid more than 125 million liters per year per site. Its fleetwide water-use efficiency has already improved 39% since 2021, to 0.30 liters per kWh. Google reports that 86% of its freshwater withdrawals come from low- or medium-risk sources, and its global data-center fleet runs at a 1.09 PUE versus a 1.56 industry average—meaning far less wasted overhead energy. Electric bills are not automatically shifted to households, either. A recent causal study of U.S. retail rates from 2015–2024 found data centers modestly lowered average rates by spreading fixed grid costs across more electricity sales. And “bring your own power” is already the new trend - e.g. Google’s 500 MW nuclear deal, Microsoft’s 835 MW Three Mile Island agreement, and Meta’s 1,121 MW nuclear contract.

译美国AI基础设施建设最大瓶颈已从GPU变为许可。《The Information》地图显示2023年以来有300多项州级和地方数据中心禁令或暂停，今年通过275项，还有75项在审，中西部和南部抵制最强。但现实数据反超担忧：微软新一代AI数据中心采用芯片级闭环冷却，每站每年避免超1.25亿升水，全舰队用水效率自2021年提升39%至0.30升/kWh；谷歌86%淡水来自低中风险源，全球PUE 1.09（行业均值1.56）。2015–2024年研究表明数据中心通过分摊固定电网成本适度降低了居民平均电价。趋势已转为“自带电源”：谷歌500 MW核电、微软835 MW三哩岛、Meta 1121 MW核电协议。

凡人小北@frxiaobei · 5天前49

医疗 AI 很容易被讲成“模型答题准确率”的竞争。但真正难的是进入工作流，比如医生说话、病历结构化、患者上下文、支付方、院内系统、审计责任。医疗 AI 的产品壁垒最后大概率不是一个 chatbox。

译医疗AI常被简化为“模型答题准确率”的竞争，但真正的难点在于进入实际工作流——包括医生自然语言处理、病历结构化、患者上下文理解、支付方对接、院内系统集成以及审计责任。产品壁垒最终大概率不是一个 chatbox，而是与医疗场景的深度融合。

AYi@AYi_AInotes · 5天前62

Cloudflare 上免费无限使用GLM 5.2的方法，冲！

译在Cloudflare Workers AI上配置GLM 5.2免费使用：登录后创建API Token，在Chatbox中设置OpenAI API兼容的自定义API，填入API Key和拼接了Account ID的Host地址，模型名选@cf/zai-org/glm-5.2即可。但实测每日有使用限制，并非真正无限。冲！

宝玉@dotey · 5天前51

讨厌老登，理解老登，成为老登

译推文围绕AI行业“老登”与“新登”展开讨论。老登指注重基建、有认知思辨的专业程序员，被认为能支撑AI健康稳定迭代，是专业尊严的最后阵地；新登则概念强、快速落地、吹牛忽悠投资后砍掉产研，导致裁员频发（有朋友一年被三家AI创业公司裁）。主推文以“讨厌老登，理解老登，成为老登”概括了从对立到认同的态度转变。

meng shao@shao__meng · 5天前46

刚刚在 Chat 垃圾箱翻到一封邮件，做 llm api 中转站的，咨询合作。好奇点进网站，首页赫然写着「Claude Fable 5」，第一反应是它们可能曾经接过，后来下架后没调。手贱我用他们给的邀请码注册了，让 Codex 跑一下 Fable 5 的调用，居然通了 😂 这。。到底是背景太硬，还是嘴太硬。。

译作者在垃圾箱发现一封LLM API中转站的合作邮件，网站首页声称提供「Claude Fable 5」模型。作者用邀请码注册后，通过Codex调用该模型，居然成功返回结果。作者质疑该站是背景过硬还是虚假宣传。

Rohan Paul@rohanpaul_ai · 5天前70

AI revenue has crossed its first serious accounting test: $25B in quarterly sales now exceeds $21B in estimated chip and data-center depreciation. i.e. AI infrastructure is starting to pay for itself before power, labor, financing, and leases are counted. --- Chart from Bloomberg bloomberg. com/news/articles/2026-06-25/ai-demand-begins-to-justify-massive-cost-of-data-center-buildout

译AI季度收入达$250亿，首次超过芯片与数据中心折旧估算$210亿（未计电力、人力等成本）。据@exponentialview报告，过去12个月去除重复计算的真实AI收入为$1100亿，当前年化$1750亿（终端客户支出，不含中国等），增长速度约为移动/互联网浪潮的3倍。每新增$10亿收入所需时间从2023年的180天缩至不到2天。企业AI已超越试点阶段，但全面推广仍处早期。降价效应显著：每降10%推动12-18%更多token使用，需求呈价格弹性。电力与数据中心成本仍是未来扩展的主要瓶颈。

SemiAnalysis@SemiAnalysis_ · 5天前24

Spotted in @modal NYC office

译在@modal纽约办公室被发现。

Rohan Paul@rohanpaul_ai · 5天前56

The top 1% of U.S. AI firms are now spending about $7,500 per employee each month on AI. --- Source: econlab .substack.com/p/how-much-does-it-cost-to-be-ai-pilled

译美国前1%的AI公司现在每名员工每月在AI上花费约7500美元。

SemiAnalysis@SemiAnalysis_ · 6天前24

At HPC Summit Asia this year we enjoyed checking out the new 2u DLC HGX R200 systems on display. (1/3)🧵

译今年在 HPC Summit Asia 上，我们很高兴看到了展出的新款 2u DLC HGX R200 系统。（1/3）🧵

Sam Altman@sama · 6天前64

team cooked, spicily

译团队完成了工作，带点辣味。 OpenAI 设计并制造了首款 AI 芯片：Jalapeño。该芯片由 OpenAI 从零开始设计，并与 Broadcom 合作量产，专为支持 ChatGPT、Codex、API 及未来智能体产品的 LLM 工作负载而打造。芯片是 AI 经济的基础。自研芯片扩展了从产品到模型再到基础设施的全栈平台，将助力扩展智能、服务更多用户并扩大 AI 的普及。

DogeDesigner@cb_doge · 6天前58

BREAKING: Elon Musk has been cleared by the FTC to acquire Mesh Optical Technologies. Mesh was founded by former SpaceX engineers who worked on Starlink laser-link communications. The company builds optical transceivers for AI data centers.

译BREAKING: Elon Musk已获FTC批准收购Mesh Optical Technologies。 Mesh由前SpaceX工程师创立，他们曾参与Starlink激光链路通信。该公司为AI数据中心制造光学收发器。

Replit ⠕@Replit · 6天前27

450+ Integrations, Now Easier to Find https://x.com/i/broadcasts/1yxBeeQApqyJN

译450+集成，现更易查找 https://x.com/i/broadcasts/1yxBeeQApqyJN

Berryxia.AI@berryxia · 6天前68

PaddleOCR的PP-OCRv6又扔了一波硬核部署数据。他们在A100上做到0.13秒一张图，在Intel CPU上比PP-OCRv5快3.9倍到5.2倍。 Apple M4上用ONNX Runtime也能跑到0.35秒一张。还提供了Tiny、Small、Medium三种尺寸，分别对应移动端、CPU文档系统和高并发API的不同场景。最有意思的是他们最后总结的那句话：在专用OCR任务上，轻量架构 + 高质量训练数据，往往比单纯堆参数更实用。这其实是把当前大模型“暴力scaling”的思路，在垂直领域做了一次反向验证。从v5到v6，PaddleOCR在精度、速度、多语言和工程部署上持续迭代，这次把部署侧的数据拉得这么细。等于把“怎么在真实生产环境里用好OCR”这件事讲透了。

译PaddleOCR发布PP-OCRv6完整端到端部署基准。A100上PP-OCRv6_tiny达0.13秒/图；Intel CPU上用OpenVINO，PP-OCRv6_medium比PP-OCRv5_server快5.2倍，PP-OCRv6_tiny比PP-OCRv5_mobile快3.9倍；Apple M4上用ONNX Runtime跑出0.35秒/图。提供Tiny、Small、Medium三种尺寸，Medium/Small均支持50种语言，PP-OCRv6_medium英文准确率88.4%，拉丁字母准确率88.0%。官方总结认为，在专用OCR任务上，轻量架构+高质量训练数据比单纯堆参数更实用，是对大模型“暴力scaling”路线的反向验证。

Alibaba Cloud@alibaba_cloud · 6天前45

At Flink Forward Asia Shenzhen 2026, NVIDIA’s Chuan Chen shared how NVIDIA and Alibaba Cloud accelerate multimodal data stream processing for Apache Flink: “NVIDIA and Alibaba Cloud's team technically collaborate to enable the CUDA library-accelerated multimodal data stream processing of Apache Flink.” This open-source collaboration enables end-to-end, high-performance multimodal streaming architectures for AI commentary, live image-text feeds, and interactive Q&A. #NVIDIA #AlibabaCloud #ApacheFlink #DataAI #AI #Multimodal #RealTimeStreaming

译在Flink Forward Asia Shenzhen 2026大会上，NVIDIA的Chuan Chen分享了NVIDIA与阿里云的技术合作：通过CUDA库加速Apache Flink的多模态数据流处理。这一开源协作实现了端到端的高性能多模态流式架构，可应用于AI解说、实时图文信息流和交互式问答等场景。

Alibaba Cloud@alibaba_cloud · 6天前30

Welcome to Qwen Live – Episode 1: Agent-First, When Your Next User Isn't Human. 📅 June 30, 2026 | 10:00 AM (UTC+8) 🔔Set Reminder Now: https://youtube.com/live/Hh-ftRYsGkI?feature=share As AI evolves at lightning speed, your next user might not be human—it could be an AI agent. In this debut episode, we are joined by Linlin Kong, Head of Qwen Cloud, alongside Qwen Cloud Product Managers Pan Gu and Xijue. Together, they will explore building cloud platform for agent from scratch, redefine developer experiences for non-human users, and uncover the new paradigms of large-scale human-agent collaboration. Ready to build for agents? Get started with Qwen Cloud Platform: https://click.qwencloud.com/m/20000000401/ #Qwen #QwenLive #AIagents #AICloud #DeveloperExperience #FutureOfWork

译阿里云宣布Qwen Live系列首期节目，主题为“Agent-First：当你的下一个用户不是人类”。节目将于2026年6月30日10:00（UTC+8）直播，由Qwen Cloud负责人林林孔、产品经理潘古和西觉共同主持。他们将探讨从零构建面向AI智能体的云平台、为非人类用户重新定义开发者体验，以及大规模人机协作的新范式。节目还提供Qwen Cloud平台入门链接。

向阳乔木@vista8 · 6天前68

3行命令搭一个 AI Agent 框架，腾讯云给力啊！很多人想开发 AI Agent，除了选框架开发，其实更麻烦的事情是部署。本地运行没问题，一上线就翻车。需要解决上下文问题，并发问题，为了安全还要搭沙箱环境，全都自己搞非常麻烦。腾讯云 EdgeOne 今天发布「EdgeOne Makers」，一切都变简单了。测试了下，在 Terminal 执行几行指令，就能部署个AI Agent开发框架： npm install -g edgeone edgeone makers create --template openai-agents-starter-node cd openai-agents-starter-node && npm install && edgeone makers dev 本地会起个测试网站，直接能对话看Agent效果和工具调用细节。线上能绑定域名、关联Github，持续迭代开发，太省心了！产品正在 Beta 内测，注册就能免费领 50w Token，方法见评论区。 #腾讯云 #EdgeOne #AIAgent #EdgeOneMakers

译腾讯云 EdgeOne 发布「EdgeOne Makers」，简化 AI Agent 开发与部署。用户在终端执行三行命令即可部署 Agent 框架：`npm install -g edgeone`；`edgeone makers create --template openai-agents-starter-node`；`cd openai-agents-starter-node && npm install && edgeone makers dev`。本地启动测试网站，可实时查看 Agent 对话与工具调用细节；线上支持绑定域名、关联 GitHub 实现持续迭代。产品处于 Beta 内测阶段，注册即免费领取 50 万 Token。

Alibaba Cloud@alibaba_cloud · 6天前40

At Flink Forward Asia Shenzhen 2026, Feng Wang, Researcher and Head of Open Data Platform at Alibaba Cloud & Vice Chair of the Alibaba Open Source Committee, highlighted the evolving data foundation for AI: "In the AI era, models and data together determine the quality and efficiency of Agents. Apache Flink evolves into Agentic Streaming for AI, working alongside Agentic Lake to build AI-native data platform." The next generation of intelligent agents is built on a unified, AI-native data infrastructure designed for real-time agentic workflows. #AlibabaCloud #ApacheFlink #ApachePaimon #ApacheFluss #DataAI #AI #Agent #RealTimeData

译在深圳举办的Flink Forward Asia 2026上，阿里云研究员、开放数据平台负责人Feng Wang指出，AI时代模型与数据共同决定Agent质量与效率。Apache Flink演进为Agentic Streaming for AI，与Agentic Lake协同，构建AI原生数据平台。下一代智能体建立在统一、实时的AI原生数据基础设施之上。

Rohan Paul@rohanpaul_ai · 6天前61

FT: Micron just reported a 15-fold profit jump because AI servers are now short of high-bandwidth memory, the stacked memory that keeps GPUs fed with data. The shortage gives Micron rare pricing power, with adjusted gross margin reaching 84.9%, compared with 39% a year earlier. --- ft .com/content/9b739203-3274-43f1-b61e-c1905061d32a

译FT: 美光刚刚报告利润增长15倍，因为AI服务器目前缺乏高带宽内存，这种堆叠内存可保持GPU的数据供给。短缺赋予美光罕见的定价权，调整后毛利率达84.9%，而一年前为39%。 --- ft .com/content/9b739203-3274-43f1-b61e-c1905061d32a

MiniMax (official)@MiniMax_AI · 6天前44

More great options for the open-weight ecosystem. Thanks @NVIDIAAI for making MiniMax M3 available in NVFP4.

译开源权重生态的更多好选择。感谢 @NVIDIAAI 使 MiniMax M3 可在 NVFP4 中使用。

SemiAnalysis@SemiAnalysis_ · 6天前62

H100 ornn index spot prices are falling, now at $2.42 per hour, roughly 40% below the May peak. The ecosystem is concerned that this is a sign that compute demand and by extension the appetite for AI is waning. (1/5)🧵

译H100 ornn 指数现货价格正在下跌，目前为每小时 2.42 美元，比 5 月峰值低约 40%。生态系统担忧这是计算需求以及由此产生的 AI 兴趣减弱的迹象。(1/5)🧵

Ethan Mollick@emollick · 6天前50

I feel like on X all you hear about is elaborate plans by firms to build their own AI stacks but in my experience companies are full of people who want access to Claude or ChatGPT and are pressuring their purchasing staff to get licenses so they can just use the tools they know.

译我感觉在X上听到的都是公司构建自有AI堆栈的复杂计划，但根据我的经验，公司里满是想要访问Claude或ChatGPT的人，他们正施压采购人员获取许可，以便直接使用那些他们已经熟悉的工具。

OpenRouter@OpenRouter · 6天前56

TIP 💡@Zai_org GLM-5.2 providers are working on faster and faster inference! Today's new endpoints include @wafer_ai and @FireworksAI_HQ fast variants. Set your model to "z-ai/glm-5.2:nitro" to continuously get the fastest provider based on live traffic data.

译提示💡@Zai_org GLM-5.2 提供商正努力实现越来越快的推理！今天的新端点包括 @wafer_ai 和 @FireworksAI_HQ 快速变体。将模型设置为 "z-ai/glm-5.2:nitro"，即可根据实时流量数据持续获得最快的提供商。