"So you're saying that your SRAM supply is infinite?" "Yes" "But the logic wafers on which the SRAM is fabbed are supply constrained?" "Yes Dave that's right"

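Rohan Paul @rohanpaul_ai · 7 days ago · 39

John Carmack on the anti-datacenter conversation.

Summary: John Carmack weighs in on the anti-datacenter debate. He argues that the US anti-nuclear movement killed nuclear power on the basis of emotion rather than fact, which was a tragedy, and he does not want the same thing to happen to AI: public opinion matters, and the argument should not be conceded unchallenged. He is also convinced that AI is driving a transformation more dramatic than the Industrial Revolution, and that the "AI is useless" view of a few years ago no longer holds. Millions of users and organizations are getting enormous returns from AI, and datacenter demand is the market responding to that value signal, which is how progress happens.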

OpenRouter @OpenRouter · 7 days ago · 63

Interface + inference, in one place. @OpenWebUI now runs on OpenRouter. Give your team one chat interface, one unified bill, and access to 400+ frontier and open models through a single API.

swyx 🔜 @aiDotEngineer @swyx · 7 days ago · 41

btw as a musician myself i've gone ahead and pulled the trigger for the first ever music corner at AIE if you code + music, we want impromptu jams between sessions. grab me to sing with u if u want. give us your best dad rock. sponsorship for this + networking night on jun 30 are the only show floor opportunities left at aie, everything else is sold out (btw tix on track to sell out TOMORROW)

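Summary: swyx, a musician himself, is setting up the first-ever music corner at AI Engineer World's Fair 2026 and inviting attendees who both code and play music to jam between sessions. The music corner and the June 30 networking night are the only show-floor sponsorship opportunities left. This year's event is larger: a 4x bigger expo area, 4 expo stages, a researcher poster session, the Token Billionaires and Off the Record leadership forums, vertical tracks for healthcare/GTM/FDE/AGC/finance, and NEO and kids' day side events. Attendees can get $40k in credits to try sponsor products. swyx notes that tickets are expected to sell out tomorrow.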

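Rohan Paul @rohanpaul_ai · 7 days ago · 66

US Congress is moving a Ratepayer Protection Act that would make AI data center builders, not ordinary electricity customers, pay for the new power plants, transmission lines, and grid upgrades their projects require. The bill pushes states to create a large-load standard, meaning a data center that creates a huge new electricity burden can be put in a separate rate category and charged upfront, through special tariffs, deposits, guarantees, or contracts, so normal homes and businesses do not subsidize the infrastructure built mainly for that data center. Tech firms already signed a White House pledge to cover new energy and delivery infrastructure for data centers, but this bill would give states a formal path to enforce that idea. So this is the first serious federal move to put AI’s physical footprint into the price of AI itself, and it could make data center siting depend less on cheap land and more on who can fund reliable power fastest. --- cnbc.com/2026/06/24/ai-data-centers-tech-companies-congress-energy-costs.html

Summary: The US Congress is advancing the Ratepayer Protection Act, which would require AI data center developers to pay for new power plants, transmission lines, and grid upgrades rather than passing those costs on to ordinary electricity customers. The bill encourages states to set up a "large-load standard" that puts power-hungry data centers in a separate rate category and charges them upfront through special tariffs, deposits, guarantees, or contracts, so that ordinary homes and businesses do not subsidize this infrastructure. Tech companies had already signed a White House pledge to cover energy and delivery infrastructure for data centers, but the bill gives states a formal path to enforce it. The post calls it the first serious federal move to price in AI's physical footprint, and it could make data center siting depend more on power availability than on cheap land.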

Hao AI Lab @haoailab · 7 days ago · 52

Introducing JetSpec: we find speculative decoding can push LLM generation latency to the extreme by co-optimizing drafting cost and drafting quality with causal parallel tree drafting. JetSpec reaches up to 9.64x end-to-end speedup on MATH-500 and 4.58x on open-ended chat while remaining lossless. With CUDA graph and kernel optimizations, JetSpec further translates to around 1000 TPS on a single B200. ⚡️ Check out our project page for demos and a blog post on how we built it 👇 https://jetspec-project.github.io/jetspec-web/ https://haoailab.com/blogs/parallel-tree-decoding/

Summary: Hao AI Lab introduces JetSpec, a speculative decoding method that co-optimizes drafting cost and drafting quality through causal parallel tree drafting to push LLM generation latency to the extreme. It reaches up to 9.64x end-to-end speedup on MATH-500 and 4.58x on open-ended chat while remaining lossless. Combined with CUDA graph and kernel optimizations, it achieves around 1000 TPS on a single B200.
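To make the trade-off between drafting cost and drafting quality concrete, here is a minimal sketch of the standard speedup estimate for simple chain drafting from the speculative decoding literature. It is not JetSpec's tree-drafting method. The parameters `alpha` (per-token acceptance probability), `gamma` (draft length), and `cost_ratio` (cost of one draft pass relative to one target pass) are illustrative assumptions, not JetSpec measurements.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target-model verification step.

    Assumes each drafted token is accepted independently with probability
    alpha, and that the target model always contributes one token itself.
    """
    if alpha >= 1.0:
        return float(gamma + 1)
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def estimated_speedup(alpha: float, gamma: int, cost_ratio: float) -> float:
    """Estimated speedup over plain autoregressive decoding.

    One step costs gamma draft passes (each cost_ratio of a target pass)
    plus a single target pass that verifies all drafted tokens at once.
    """
    step_cost = gamma * cost_ratio + 1.0
    return expected_tokens_per_step(alpha, gamma) / step_cost


if __name__ == "__main__":
    # Hypothetical settings: better drafts (higher alpha) and cheaper
    # drafts (lower cost_ratio) both raise the estimate.
    for alpha, cost_ratio in [(0.6, 0.2), (0.8, 0.2), (0.8, 0.05)]:
        print(alpha, cost_ratio, round(estimated_speedup(alpha, 4, cost_ratio), 2))
```

The estimate shows why both levers matter: raising `alpha` increases the numerator, while lowering `cost_ratio` shrinks the denominator.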

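Rohan Paul @rohanpaul_ai · 7 days ago · 41

Code is automated, but debugging has stayed mostly manual. @sazabi is trying to close that gap with an AI observability system that detects issues, investigates failures, and helps prepare fixes. Logs are all you need: its bet is that logs can become the source of truth, with AI deriving metrics, traces, and possible fixes from the raw events teams already collect.

Summary: AI observability startup sazabi has raised $8 million. Its platform treats logs as the single source of truth and lets AI detect issues, investigate failures, and help prepare fixes automatically. It derives metrics, traces, and possible fixes from the raw logs teams already have, with the aim of replacing traditional manual monitoring. sazabi positions itself as a next-generation, general-purpose observability solution for any workload (including AI agents), not as another AI SRE or LLM observability tool. With software shipping extremely fast in 2026, the platform aims to use AI to maximize automation and speed on the way to self-healing software.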
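As a rough illustration of the "logs as the source of truth" idea, the sketch below derives simple per-service metrics from raw JSON log lines. It is not sazabi's implementation; the function name `derive_metrics` and the fields `service`, `level`, and `latency_ms` are hypothetical and stand in for whatever a team's logs actually contain.

```python
import json
from collections import defaultdict


def derive_metrics(log_lines):
    """Aggregate raw JSON log events into per-service metrics.

    The field names used here (service, level, latency_ms) are assumed
    for illustration only.
    """
    counts = defaultdict(int)
    errors = defaultdict(int)
    latencies = defaultdict(list)

    for line in log_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines instead of failing the batch
        if not isinstance(event, dict):
            continue  # ignore JSON values that are not event objects
        service = event.get("service", "unknown")
        counts[service] += 1
        if event.get("level") == "error":
            errors[service] += 1
        latency = event.get("latency_ms")
        if isinstance(latency, (int, float)):
            latencies[service].append(float(latency))

    metrics = {}
    for service, total in counts.items():
        samples = latencies[service]
        metrics[service] = {
            "requests": total,
            "error_rate": errors[service] / total,
            "max_latency_ms": max(samples) if samples else None,
        }
    return metrics


if __name__ == "__main__":
    sample = [
        '{"service": "api", "level": "info", "latency_ms": 10}',
        '{"service": "api", "level": "error", "latency_ms": 30}',
        "not json",
    ]
    print(derive_metrics(sample))
```

Counters and latency summaries fall out of the events directly, so no separate metrics pipeline is needed for this simple case.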

AK @_akhaliq · 7 days ago · 41

glm 5.2 in hf-claude building a gradio server app for Ornith-1.0-9B

Rohan Paul @rohanpaul_ai · 7 days ago · 67

IBM debuts world’s first sub-1 nanometer chip. They just announced a 0.7nm chip technology. (As a technology marker, 0.7nm is about 100,000 times thinner than a human hair, only a few atoms across.) It stops treating smaller chips as a flat-layout problem and starts stacking transistors upward through a new nanostack architecture. A chip node no longer means every feature has that exact size, but 7 angstroms means IBM is pushing transistor design into a scale where individual atoms start to shape the engineering limits. Traditional scaling packs more switches side by side, while nanostack vertically staggers nanosheet transistors, letting each layer use different materials for speed or lower power instead of forcing one transistor recipe across the chip. IBM says the design can fit nearly 100B transistors on a fingernail-sized chip, deliver up to 50% more performance or 70% better energy efficiency than its 2nm node, and shrink SRAM by 40%, which is huge because AI chips constantly wait on memory. Practically it means future chips that can run bigger AI models, phones, laptops, servers, and cloud systems faster while using much less power, because more compute and memory can fit into the same tiny chip area. --- newsroom.ibm.com/2026-06-25-ibm-debuts-worlds-first-sub-1-nanometer-chip-technology

Summary: IBM has unveiled a 0.7nm chip technology that uses a new nanostack architecture to stack transistors vertically instead of relying on traditional planar scaling. A fingernail-sized area can hold nearly 100 billion transistors, with up to 50% more performance or 70% better energy efficiency than IBM's 2nm node, and 40% smaller SRAM. The technology pushes engineering to the atomic scale and could make AI chips, phones, servers, and other systems faster and more power-efficient.

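elvis @omarsar0 · 7 days ago · 20

Running agents these days shouldn't be too hard. But local agents are tricky to operate. @hyperagentapp gives every agent its own dedicated cloud machine. Handles infra for you so it runs whether your laptop is on or not.

Summary: Hyperagent gives each AI agent a dedicated cloud machine and manages the infrastructure, so agents keep running without a laptop left on. It positions itself as a stable, secure alternative to the common problems of local frameworks such as OpenClaw (daily crashes, leaked secrets, constant babysitting). Limited-time offer: $100 in inference credits on sign-up, plus $500 for migrating your first agent.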

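ginobefun @hongming731 · 7 days ago · 62

Using the FreeLLMAPI project to freeload in broad daylight. It currently shows 1.3B tokens, and you can choose and customize routing strategies. Really nice.

Summary: Developer @hongming731 shares his use of the FreeLLMAPI project to "freeload openly", with roughly 1.3B tokens consumed so far and support for custom strategies. He also suggests a cost-saving trick based on Dify's exception branch: add an openrouter/free node and fall back to a flash model on exceptions, which allows 1,000 free calls per day.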

Chubby♨️ @kimmonismus · 7 days ago · 54

We are still not building enough data centers. That sounds almost absurd, given the scale of the current AI infrastructure boom. OpenAI and SoftBank’s Stargate campus in Texas alone is expected to cost well over $40 billion and draw around 1.2 gigawatts at peak load. Such an interesting article by @ChrisGillett tl;dr: AI labs need more compute. Compute needs more data centers. Data centers need enormous amounts of electricity. And the real bottleneck may not be chips, GPUs, or even energy generation itself. It may be the grid! Before a new data center or power plant can connect, grid operators have to study whether it will overload transmission infrastructure. In the US, the median wait for power plant interconnection reportedly increased from less than 20 months in 2005 to 55 months by 2023. That is a brutal constraint for an industry trying to scale in months, not decades. The current system often works on a first-come, first-served basis, which means serious projects can get stuck behind speculative or lower-value ones. The result is a growing mismatch between the speed of AI infrastructure demand and the speed of Western grid bureaucracy. America may not have an energy shortage. It has a grid connection problem. And if AI becomes one of the defining infrastructure races of the century, the winners may not just be the countries with the best models or the most chips, but the ones that can actually plug them in. Highly recommend you read his whole article

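Summary: Surging demand for AI compute is driving data center expansion, but the real bottleneck may be grid interconnection rather than chips or energy generation. OpenAI and SoftBank's Stargate campus in Texas is expected to cost more than $40 billion and draw about 1.2 gigawatts at peak load. Yet the median US wait for power plant interconnection rose from under 20 months in 2005 to 55 months by 2023. The current first-come, first-served process lets speculative projects block serious ones. The eventual winners may not be the countries with the best models or the most chips, but those that can connect to the grid quickly.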

Alibaba Cloud @alibaba_cloud · 7 days ago · 37

Securing AI Agents on Alibaba Cloud: The Constraint Infra. Solve Agent chaos with a robust governance layer: ✅ Dynamic Control: Hot-update Prompts/rules via Nacos. ✅ Granular Governance: Token limits & multi-agent security. ✅ Proven in Prod: StarOps SRE Agent runs high-risk tasks safely within these boundaries. ✅ Self-Evolving: Rules iterate via AgentLoop data flywheel. Build safer, smarter Agents! 🚀 https://int.alibabacloud.com/m/1000414834/ #AI #AlibabaCloud #Nacos #Higress #StarOps #AgentLoop

Summary: Alibaba Cloud has released Constraint Infra, a governance layer for AI agents aimed at agent chaos. Core capabilities: dynamic control by hot-updating prompts and rules via Nacos; fine-grained governance with token limits and multi-agent security; production validation, with the StarOps SRE Agent safely running high-risk tasks within these boundaries; and self-evolving rules driven by the AgentLoop data flywheel.

AK @_akhaliq · 7 days ago · 38

Over 300 GLM 5.2 requests through hf-claude on HF Inference Providers for just $34

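AYi @AYi_AInotes · 7 days ago · 71

Whoa, I have to share this one: an open-source tool that gets you enterprise-grade routing from a pool of free API keys, which amounts to 1 billion+ free LLM tokens at zero cost. It looks set to crush paid gateways. For comparison: high-volume tokens and enterprise routing, cost 0. The idea is simple: it is a routing framework, not an API reseller. You apply for free keys from each provider yourself and put them in the config, and the tool handles load balancing and automatic failover for you. It runs in 30 seconds: clone the repo, configure your keys, point your app at the local endpoint, done. You get to use your free quotas fully and reliably without writing your own fallback logic. The project launched just a few weeks ago, so if you get in now you can still send improvement suggestions straight to the author. GitHub link in the comments 👇 If it's useful, remember to star the repo.

Summary: An open-source routing framework (not an API reseller) that lets users bring their own free API keys from each provider and configure automatic load balancing and failover, giving access to 1 billion+ free LLM tokens at zero cost. Setup is minimal: clone the repo, fill in the keys, and point the app at the local endpoint; it runs in 30 seconds with no hand-written fallback logic. The project launched a few weeks ago, the author is open to improvement suggestions, and the GitHub link is in the comments.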
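The load-balancing and failover behavior described above can be sketched as follows. This is not the project's code, which the post does not show. `KeyPool`, `AllKeysFailed`, and the `send` callable are hypothetical names, and `send` stands in for whatever function actually calls a provider with a given key.

```python
from itertools import cycle
from typing import Callable, Sequence


class AllKeysFailed(Exception):
    """Raised when every key in the pool has been tried without success."""


class KeyPool:
    """Round-robin over API keys, failing over to the next key on error."""

    def __init__(self, keys: Sequence[str]):
        if not keys:
            raise ValueError("at least one API key is required")
        self._keys = list(keys)
        self._rotation = cycle(self._keys)

    def call(self, send: Callable[[str], str]) -> str:
        """Try each key at most once, starting from the next in rotation."""
        last_error = None
        for _ in range(len(self._keys)):
            key = next(self._rotation)
            try:
                return send(key)
            except Exception as exc:  # e.g. quota exhausted or provider down
                last_error = exc
        raise AllKeysFailed("every key in the pool failed") from last_error


if __name__ == "__main__":
    def fake_send(key: str) -> str:
        # Hypothetical provider call: pretend the first key is out of quota.
        if key == "key-a":
            raise RuntimeError("quota exhausted")
        return f"response via {key}"

    pool = KeyPool(["key-a", "key-b", "key-c"])
    print(pool.call(fake_send))
    print(pool.call(fake_send))
```

Rotation spreads requests across keys, and the retry loop provides failover. A fuller router would presumably also track per-key quotas and back off from keys that keep failing.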

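ginobefun @hongming731 · 7 days ago · 61

A money-saving trick based on Dify's exception branch: add an openrouter/free node to handle exceptions and fall back to a flash model when one occurs. It can be called 1,000 times a day.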

swyx 🔜 @aiDotEngineer @swyx · Jun 25 · 19

we are going to have to Rebuild So. Much. Infra. for the age of Software Factories

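karminski-牙医 @karminski3 · Jun 25 · 50

Running GLM-5.2 locally with vLLM is finally fast! Good news has finally arrived for local GLM-5.2 deployment. As everyone knows, GLM-5.2 ships with an MTP head, which enables speculative decoding. However, that only works with GLM-5.2 at its original bf16 precision, which takes about 1.5TB. Few people running locally are that well-resourced, so everyone uses quantized versions, since a 4-bit quantization needs only 430GB. Here is the problem: because GLM-5.2's MTP uses a very unusual DSA (dynamic sparse attention), several current inference engines (llama.cpp, vLLM, mlx) cannot support it. llama.cpp and mlx cannot enable MTP at all, and vLLM supports it only at FP8 precision. SGLang is fine, though: its architecture supports mixed precision within the same compute stream from the start, so you can just use GLM-5.2-W4AFP8. Back to the engines that lack support: with most quantized GLM-5.2 builds, enabling MTP actually slows things down, and some quantized versions simply strip out the MTP part (mlx). Community author dnhkng came up with a stitching approach and produced GLM-5.2-AWQ-INT4-FP8-MTP-delta: INT4 for the base model (via Marlin kernels) plus FP8 for MTP (to preserve precision), in a form vLLM can run. Speed jumped from 2 token/s to 43.39 token/s (NUMA binding + MTP-3). So as of now, SGLang and (modified) vLLM can both run GLM-5.2 with MTP at full throttle, while llama.cpp and mlx users need to wait a bit longer; the community is still working on it. The author's blog (a fascinating write-up with plenty of optimization tricks): http://dnhkng.github.io/posts/gh200-benchmarking-part-3-glm52/ #glm52 #mtp #dsa

Summary: GLM-5.2 ships with an MTP (speculative decoding) head that uses DSA (dynamic sparse attention), which makes it hard for inference engines such as vLLM, llama.cpp, and mlx to support. The original bf16 weights need 1.5TB, while a 4-bit quantization needs only 430GB. Community author dnhkng built a modified GLM-5.2-AWQ-INT4-FP8-MTP-delta: INT4 for the base model (Marlin kernels) plus FP8 for MTP, which lets vLLM run with MTP and raised speed from 2 token/s to 43.39 token/s (NUMA binding + MTP-3). SGLang supports mixed precision and can use GLM-5.2-W4AFP8 directly; llama.cpp and mlx users still have to wait for community support.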
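A back-of-envelope check of the sizes and speeds quoted above, as a minimal sketch. It assumes the 1.5TB figure is entirely bf16 weights at 2 bytes per parameter and uses decimal units. The post does not state a parameter count, so the number derived here is an inference, not a published spec.

```python
TB = 1e12  # decimal terabyte, in bytes
GB = 1e9   # decimal gigabyte, in bytes


def implied_param_count(total_bytes: float, bytes_per_param: float) -> float:
    """Parameter count implied by a checkpoint size at a given precision."""
    return total_bytes / bytes_per_param


def weights_only_gb(params: float, bits_per_param: float) -> float:
    """Size in GB of the weights alone, ignoring scales and metadata."""
    return params * bits_per_param / 8 / GB


# Assumption: the quoted 1.5TB is entirely bf16 weights (2 bytes each).
params = implied_param_count(1.5 * TB, bytes_per_param=2)
int4_gb = weights_only_gb(params, bits_per_param=4)
extra_gb = 430 - int4_gb   # gap between pure INT4 weights and the quoted 430GB
speedup = 43.39 / 2        # quoted token/s after vs. before

print(f"implied parameters: {params / 1e9:.0f}B")
print(f"pure INT4 weights:  {int4_gb:.0f} GB (quoted: 430 GB, gap {extra_gb:.0f} GB)")
print(f"reported speedup:   {speedup:.1f}x")
```

Under that assumption the weights alone would take about 375 GB at 4 bits. The remaining gap to the quoted 430GB is consistent with quantization scales and layers kept at higher precision, though the post does not break this down.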

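Tibo @thsottiaux · Jun 25 · 65

Spicy

Summary: OpenAI has designed and built its first AI chip, Jalapeño. Designed from scratch by OpenAI and produced with Broadcom, it is built for the large language model workloads behind ChatGPT, Codex, the API, and future agentic products. Chips are the foundation of the AI economy, and in-house silicon extends OpenAI's full-stack platform from products to models to infrastructure, which OpenAI says will help it scale intelligence, serve more people, and broaden access to AI.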

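Berryxia.AI @berryxia · Jun 25 · 63

Stop just hyping how great OpenAI's chip is… OpenAI today officially announced its first in-house AI chip, Jalapeño, and everyone online is cheering that "the era of vertical integration is here"… But here is what nobody is saying: this is not a victory declaration, it is a reluctant act of self-rescue after being backed into a corner by inference costs. Inference (running models to answer users) costs are eating into OpenAI's profits at an explosive rate and even threaten its survival. Background: ChatGPT handles a massive volume of user queries every day, and NVIDIA GPUs are both expensive and scarce. Back in October 2025, OpenAI and Broadcom announced a partnership to develop custom AI accelerators, targeting 10 gigawatts. Now Jalapeño is out: designed from scratch by OpenAI, produced by Broadcom. Consequences: if gigawatt-scale deployment happens by the end of 2026, inference costs could fall by about 50% (in the Broadcom CEO's words), with performance per watt far better than today's top accelerators. That makes ChatGPT, the API, and future agent products faster and cheaper. OpenAI would go from a "model company" to a "full-stack AI infrastructure company", serving more people, but it also means big companies gaining deeper control over the underlying compute. The details others overlook (these are the truly striking points): ✅ Absurd development speed: only 9 months from initial design to manufacturing tape-out, with OpenAI's own AI models assisting the design (AI helping design the hardware that accelerates itself, which is as meta as it gets). ✅ The chip targets inference only, not training. Training will most likely still depend on NVIDIA. ✅ First samples are in hand and being tested. Early data: performance per watt is "substantially better than the current state of the art". ✅ The Broadcom CEO said outright that performance is comparable to NVIDIA Blackwell and Google TPU at half the cost. ✅ It is not a lone chip but the first step in OpenAI's multi-generation compute platform, and it comes with Broadcom's networking technology. ✅ The name Jalapeño is hot enough to suit this increasingly "spicy" AI era. What this chip quietly announces is that AI has started using itself to accelerate the build-out of its own infrastructure, while humanity's appetite for compute will only grow. What do you think: smart self-rescue by OpenAI, or another wild escalation of the AI arms race?

Summary: OpenAI has released its first in-house AI chip, Jalapeño, designed for LLM inference and produced in partnership with Broadcom. It went from design to tape-out in only 9 months, with OpenAI's own AI models assisting the design. First samples are in hand, and performance per watt is substantially better than today's top accelerators; the Broadcom CEO says performance is comparable to NVIDIA Blackwell and Google TPU at about half the cost. The target is gigawatt-scale deployment by the end of 2026, which could cut inference costs by about 50%. The chip will power ChatGPT, Codex, the API, and future agent products, marking OpenAI's shift from a model company to a full-stack AI infrastructure company.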

Chubby♨️ @kimmonismus · Jun 25 · 42

We are so back. Fable 5 is so back.

Summary: Fable 5 has reportedly reappeared on Amazon Bedrock.

SemiAnalysis @SemiAnalysis_ · Jun 25 · 49

Chat develop a chip from initial design to tape out in 9 months, make no mistakes.

xAI @xai · Jun 25 · 47

Use the official @MongoDB plugin in Grok Build to query data, optimize indexes, and manage databases.

Greg Brockman @gdb · Jun 25 · 64

Introducing Jalapeño — designed from scratch for LLM inference over nine months, accelerated by our models. Perf per watt looking incredible.

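Summary: OpenAI co-founder Greg Brockman officially introduced the company's first AI chip, Jalapeño, designed from scratch for large language model inference over nine months. The chip is being produced in partnership with Broadcom and will accelerate ChatGPT, Codex, the API, and future agentic products. Its design was accelerated by OpenAI's own models, and its performance per watt is described as "incredible". It extends OpenAI's full-stack platform from products to models to infrastructure, with the aim of scaling intelligence and broadening access to AI.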

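向阳乔木 @vista8 · Jun 25 · 58

From Hong Dingkun's talk at ByteDance's Volcano Engine conference, a few points worth noting: 1. Prototype-driven development: use AI to generate an interactive prototype in place of a PRD, and discuss around it so disagreements surface early. 2. Systematized AI development: AI writes the spec → feature implementation → Browser Use verification → automatic commit and release. 3. Harness infrastructure: context engineering + architectural constraints + team-knowledge memory + tech-debt cleanup, which can raise deliverability from 40~60 points to 80. Original article in the comments.

Summary: At ByteDance's Volcano Engine conference, Hong Dingkun shared three core practices for AI development: prototype-driven development in place of PRDs, a systematized pipeline from AI-written spec through Browser Use verification to automatic release, and harness infrastructure (context engineering, architectural constraints, team-knowledge memory, tech-debt cleanup) that can raise deliverability from 40~60 points to 80.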

Rohan Paul @rohanpaul_ai · Jun 24 · 65

OpenAI rolls out its 1st chip through a Broadcom tie-up as part of its “build the full stack” push. Jalapeño is an ASIC, so it is less flexible than an Nvidia GPU, but can be cheaper and faster when the workload is known very well. They say "the architecture reduces data movement and balances compute, memory, and networking resources to achieve realized utilization much closer to theoretical peak performance." Overall better performance per watt. Jalapeño also signals OpenAI’s shift from buying compute to shaping the whole stack: models, software, servers, networks, and now silicon. The 9-month tape-out means OpenAI and Broadcom finalized the chip design and moved it to manufacturing unusually fast for advanced AI silicon. OpenAI says its own models helped speed up parts of the design work.

Summary: OpenAI and Broadcom have introduced OpenAI's first in-house AI chip, Jalapeño, an ASIC designed for the LLM workloads behind ChatGPT, Codex, the API, and future AI agent products. When the workload is well known, it can be cheaper and faster than an NVIDIA GPU; by reducing data movement and balancing compute, memory, and networking resources, it achieves realized utilization closer to theoretical peak, with better energy efficiency. The chip went from design to tape-out in only 9 months, and OpenAI's own models sped up parts of the design work. It marks OpenAI's strategic shift from buying compute to building the whole stack (models, software, servers, networks, silicon).

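AYi @AYi_AInotes · Jun 24 · 61

The wild-west era of AI integration is officially over! While everyone is shouting that AI will replace human jobs, @NotionHQ, the company that should be using AI the most, is hiring at scale instead, which is fascinating. I read Notion's hiring post as a smart counter-manifesto for the AI era. There is no role called Prompt Engineer in the list; in its place are Model Behavior Engineer, AI Evaluator, AI Governance Specialist, and Customer Experience Knowledge Architect. The early approach of hooking up an API and writing a few prompts to cobble together a feature no longer works. Now someone has to tune the model's behavioral boundaries, someone has to evaluate output quality, and someone has to design permission and audit systems; every link has become a proper engineering role. These new roles all answer the same question: once AI can do the work, what should humans do? Notion's answer is clear: AI executes, and humans define what is worth executing. Trust, relationships, judgment, structuring, and commercial amplification, the things AI cannot do, have become the core value. So they are hiring hard for AI technical roles while also expanding sales, customer success, and knowledge architects; the more complex the AI product, the more people are needed for what AI cannot do. The Model Behavior Engineer role is the most emblematic. It is not a traditional machine learning engineer; it owns the model's personality consistency, safety boundaries, and context integration within the product, much like the first native app developers who appeared when the mobile internet moved from wrapped web pages to native apps. The right to define a new kind of job always hides in titles like these, ones you have never seen but understand at a glance. Another easily overlooked detail: besides experience building with AI, the intern requirements list an interest in art, history, and the social sciences. They are not after people who can only call APIs; they want people with judgment who can make trade-offs on compounding work. Tools go out of date, but judgment does not. Even the post itself is a signal: a pure ASCII-art hiring post drew seven thousand likes and views in the millions, which shows that in an age of information overload, personality plus participation spreads far more effectively than information density. With one hiring post Notion said something it never wrote down: the winners of the AI era will not be the companies best able to replace people, but the ones best at defining new roles for them.

Summary: Notion published a hiring post in which the traditional Prompt Engineer title is absent, replaced by engineering roles such as Model Behavior Engineer, AI Evaluator, AI Governance Specialist, and Customer Experience Knowledge Architect, responsible for tuning model behavior boundaries, evaluating output quality, and designing permission and audit systems. Notion's view is that AI executes and humans define what is valuable, so it is also expanding sales, customer success, and knowledge architect hiring. Intern requirements include an interest in art, history, and the social sciences, with an emphasis on judgment. The post was designed as ASCII art and spread widely.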

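Chubby♨️ @kimmonismus · Jun 24 · 55

Absolutely insane: "Jalapeño was co-developed from initial design to manufacturing tape-out in just nine months, and the custom AI accelerator program represents what we believe to be the fastest ASIC development cycle ever achieved in high-performance advanced semiconductors." ChatGPT helped design the chip so they could reach a 9-month development cycle. "If AI can help engineers design better chips faster, it can lower the cost of compute across the industry and help democratize access to advanced AI."

Summary: OpenAI has introduced its first in-house AI chip, Jalapeño, designed from scratch for LLM inference. It went from initial design to tape-out in only 9 months with ChatGPT taking part in the design, in what OpenAI believes is the fastest ASIC development cycle in high-performance advanced semiconductors. The chip is built with Broadcom and Celestica and optimized for the real workloads of ChatGPT, Codex, the API, and future agent products. Early samples have reached target frequency and power in the lab and run ML workloads including GPT-5.3-Codex-Spark; performance per watt is substantially better than the current SOTA, with detailed benchmarks to follow. Deployment is planned to start by the end of 2026, with the strategic aim of reducing dependence on external GPUs and strengthening control over compute economics.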

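meng shao @shao__meng · Jun 24 · 66

OpenAI releases its first in-house inference chip, Jalapeño. Together with Broadcom (and Celestica), OpenAI designed from scratch an accelerator optimized for LLM inference. It taped out in 9 months, claims energy efficiency substantially better than the current SOTA, and is planned for gigawatt-scale deployment starting at the end of 2026. This is a landmark step in OpenAI extending "full stack" down to the silicon layer. Why is OpenAI building chips? The official argument rests on "full-stack advantage" and a flywheel model: better infrastructure → higher compute efficiency → better training and inference → stronger models → better products → more usage and revenue → reinvestment in next-generation infrastructure. The logic treats the chip as the lowest-level lever of the flywheel: only by owning the chip architecture can kernels, memory, networking, scheduling, and product experience be co-optimized toward the same goal. This is the same vertical-integration path taken by Google (TPU), Amazon (Trainium/Inferentia), and Meta (MTIA); frontier AI companies designing their own inference chips has become industry consensus. For OpenAI there is also a direct commercial payoff: inference is where AI reaches users. Every improvement in cost, speed, or reliability translates directly into faster ChatGPT answers, Codex tasks that can run a few more steps, a cheaper API, and more stable access at peak times.

Summary: OpenAI, together with Broadcom and Celestica, designed its first in-house inference chip, Jalapeño, from scratch. It taped out in 9 months, is optimized for LLM inference, and claims better energy efficiency than the current SOTA. Gigawatt-scale deployment is planned from the end of 2026 for ChatGPT, Codex, the API, and future agentic products. OpenAI calls this a key part of its "full-stack advantage", building a flywheel through in-house silicon: better infrastructure → higher compute efficiency → better training and inference → stronger models → better products → more usage and revenue → reinvestment. Inference chips directly improve cost, speed, and reliability, and inference is where AI reaches users.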

Chubby♨️ @kimmonismus · Jun 24 · 60

OpenAI just unveiled Jalapeño, its first custom AI chip designed from scratch for LLM inference. It is OpenAI moving deeper into the full stack: chips, kernels, memory, networking, racks, scheduling, deployment and product experience. OpenAI has learned from the Cerebras deal what is valuable in specialized inference hardware and is now attempting to translate that lesson into its own controllable platform. Built with Broadcom and Celestica, Jalapeño is optimized around the workloads OpenAI actually runs across ChatGPT, Codex, the API and future agentic products. Early samples are already running ML workloads in the lab at target frequency and power, including GPT-5.3-Codex-Spark. OpenAI says performance per watt should be substantially better than current state of the art, with detailed benchmarks coming later! The strategic angle is obvious: less dependence on external GPUs, more control over compute economics, and a stronger flywheel between models, products, revenue and infrastructure. Deployment is planned to start by the end of 2026.

Summary: OpenAI has introduced its first in-house AI chip, Jalapeño, built with Broadcom and Celestica and optimized for the workloads of ChatGPT, Codex, the API, and future agentic products. Early samples are already running ML workloads in the lab at target frequency and power, including GPT-5.3-Codex-Spark. OpenAI says performance per watt is substantially better than the current state of the art, with detailed benchmarks to come. Deployment is planned to start by the end of 2026. The move aims to reduce dependence on external GPUs, increase control over compute economics, and strengthen the flywheel between models, products, revenue, and infrastructure.

SemiAnalysis @SemiAnalysis_ · Jun 24 · 53

NVIDIA POOR DRIVER QUALITY ALERT: There is a GB300 NVL72 firmware bug where the rack needs to be rebooted every 66.5 days. Although people tend to think of NVIDIA as having top-tier software, it turns out there are still many issues with NVIDIA drivers and firmware. The thing is, among the competition, NVIDIA just has the least-worst software quality.

Kimi.ai @Kimi_Moonshot · Jun 24 · 50

The Kimi API is now live on AWS Marketplace. 🚀 If your team is already running on AWS, you can now access Kimi with consolidated billing. Plus, eligible customers can apply Kimi API usage directly toward their AWS EDP commitments. Build and scale with Kimi today: https://aws.amazon.com/marketplace/pp/prodview-rfjb2elzc5jp4

Alibaba Cloud @alibaba_cloud · Jun 24 · 49

Rule-based data classification is officially a thing of the past. 🚀 Alibaba Cloud’s Data Security Center (DSC) now leverages a converged architecture of AI Foundation Models + Expert Models + Regex Rules to deliver: 800+ auto-identified data types; context-aware accuracy & recall; millisecond-level compliance response; seamless cloud-native integration. Shift from rule-driven to AI-driven data security today! https://int.alibabacloud.com/m/1000414795/ #DataSecurity #AI #DataSecurityCenter #AlibabaCloud

Alibaba Cloud @alibaba_cloud · Jun 24 · 38

Alibaba Cloud joined the CCI France Chine Gala 2026 on May 29th. LVMH x Alibaba Cloud, together we win the Innovation & Transformation Award - "Responsible Generative AI for Luxury Retail in China". A landmark partnership embedding Alibaba's Qwen and Alibaba Cloud Model Studio into LVMH's retail ecosystem to deliver responsible, Generative AI-powered luxury experiences. Thanks for the trust and support! #AlibabaCloud #Qwen #LVMH #CCIFranceChine

OpenClaw🦞 @openclaw · Jun 24 · 46

🦞 OpenClaw 2026.6.10 just dropped. Just a small release to keep things brewing: ⚡ Automatic fast mode for short talks 🧠 Much more reliable model routing 🔒 Safer session state + trusted policies 🛠️ Better provider onboarding Helping deliver rock-solid lobsters. 🦞 https://github.com/openclaw/openclaw/releases/tag/v2026.6.10

Rohan Paul @rohanpaul_ai · Jun 24 · 55

How Andrew Ng organizes his engineering team to move faster in the era of AI. "1 to 10 engineers in a team, often made up of generalists: high-context, highly empowered generalists." When code gets generated much faster, organizations become the slow part. Once a feature can move from idea to working prototype in a day, every surrounding function is suddenly exposed. Product has to decide faster, design has to clarify faster, marketing has to understand faster, and legal has to review faster. So his way is 1-10 high-context generalists who can move much faster because they do not need every decision translated across departments before anything happens. --- From "LangChain" YouTube channel, (link in comment)

Summary: Andrew Ng describes how to organize engineering teams to move faster in the AI era: teams of 1 to 10 made up of highly empowered, high-context generalists. Once code generation gets much faster, the organization becomes the bottleneck: a feature can go from idea to prototype in a day, forcing product, design, marketing, legal, and every other function to speed up too. His answer is to let small teams of high-context generalists decide independently, avoiding the delay of translating decisions across departments. From the LangChain YouTube channel.

AK @_akhaliq · Jun 24 · 42

hf-claude works well with glm 5.2 hf extensions install hf-claude

Rohan Paul @rohanpaul_ai · Jun 24 · 41

Brilliant 🫡 NVIDIA’s Rubin AI servers can now cool every chip and networking part with 45°C water-glycol coolant instead of cold air. The big deal with this is that cooling water use can drop from about 2.6M gallons per MW per year to near zero in suitable climates Traditional data centers cool air, then force that air across servers, so the building needs fans, chillers, cold aisles, and often cooling towers that dump heat by evaporating water. Direct-to-chip cooling skips most of that air problem by bolting cold plates onto GPUs, CPUs, and networking parts, then pumping water-glycol coolant through them so heat leaves the chip through liquid instead of room air. The strange part is that warmer coolant can be more efficient, because 45°C inlet coolant and roughly 55°C outlet coolant are hot enough for outdoor dry coolers to reject heat like a car radiator in many climates. A cooling tower spends water to remove heat through evaporation, while a dry cooler spends fan power to move heat into outside air, so water use can fall from about 2.6M gallons per MW per year to near zero in the right location. NVIDIA’s design is not water-free, because the loop still uses mostly water mixed with glycol, but it can be closed-loop, meaning the same liquid circulates rather than being continually evaporated. Single-phase immersion goes further by putting electronics in a non-conductive dielectric fluid, where the fluid stays liquid, pulls heat from many surfaces at once, and can run in a sealed loop without water evaporation. Another strong claim from the immersion side is not only “less water,” but zero process-water cooling plus easier heat capture, because the whole server bath becomes a heat collector. Heat reuse works only when someone nearby needs low-grade heat, such as buildings, greenhouses, or industrial preheating, while BESS does not create energy but can shift power demand and provide grid services. 
They say that a 50MW AI facility can save over $4M per year in cooling-related energy and water costs by shifting to liquid-cooled infrastructure. And fully liquid-cooled Rubin servers can shrink a system from 6 rack units to 2, which means more AI compute fits into the same building footprint.

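Summary: NVIDIA's Rubin AI servers cool chips and networking parts directly with 45°C water-glycol coolant instead of traditional air cooling. In suitable climates, dry coolers can replace cooling towers, cutting facility cooling water use from about 2.6 million gallons per MW per year to near zero. The liquid loop is closed, so water is not continually evaporated. A 50MW AI facility can save over $4 million per year in cooling-related energy and water costs. Fully liquid-cooled Rubin servers also shrink a system from 6 rack units to 2, fitting more compute into the same space. Citing NVIDIA figures, data center water use accounts for only 0.2% of daily US water use, and liquid cooling is sharply reducing water consumption while creating opportunities for heat reuse.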
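The figures quoted above can be combined into a quick back-of-envelope calculation. This is a sketch using only the numbers in the post (2.6M gallons per MW per year, a 50MW facility, over $4M in annual savings, 6 rack units shrinking to 2). The function and constant names are illustrative, and the water figure applies only where an evaporative baseline is replaced by dry coolers in a suitable climate.

```python
GALLONS_PER_MW_YEAR = 2.6e6     # quoted evaporative-cooling baseline
FACILITY_MW = 50                # quoted example facility size
ANNUAL_SAVINGS_USD = 4e6        # quoted as "over $4M", so a lower bound
RACK_UNITS_BEFORE = 6
RACK_UNITS_AFTER = 2


def annual_water_gallons(mw: float, per_mw: float = GALLONS_PER_MW_YEAR) -> float:
    """Baseline cooling water use per year for a facility of the given size."""
    return mw * per_mw


def savings_per_mw(total_savings: float, mw: float) -> float:
    """Quoted annual savings spread evenly across facility capacity."""
    return total_savings / mw


def density_gain(units_before: int, units_after: int) -> float:
    """How many systems fit in the rack space one system used to take."""
    return units_before / units_after


water = annual_water_gallons(FACILITY_MW)
per_mw = savings_per_mw(ANNUAL_SAVINGS_USD, FACILITY_MW)
gain = density_gain(RACK_UNITS_BEFORE, RACK_UNITS_AFTER)

print(f"baseline water use: {water / 1e6:.0f}M gallons/year")
print(f"savings per MW:     at least ${per_mw:,.0f}/year")
print(f"rack density gain:  {gain:.0f}x")
```

For the 50MW example this works out to a baseline of about 130M gallons of cooling water per year, at least $80,000 in savings per MW per year, and three times as many systems per unit of rack space.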

OpenRouter @OpenRouter · Jun 24 · 41

Access all the providers for GLM 5.2 in one spot. Including one serving at over 125 TPS: https://openrouter.ai/z-ai/glm-5.2