Great work to @vllm_project team and @NVIDIA on smooth, out-of-the-box day 0 @MiniMax_AI M3 experience with @inferact EAGLE3 spec decode. Here are the details of ongoing M3 workstream: NVIDIA, Inferact and SemiAnalysis are working hard on enabling disaggregated inferencing (PR 45879), and the Inferact team is working on enabling FlashInfer M3 MoE kernels (PR 45723). Performance should be much better once those PRs land. Huge shoutout to @rogerw0108 & @mgoin_ and the maintainers for the rapid review and mentorship here!

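Summary (translated): The vLLM team worked with NVIDIA to deliver an out-of-the-box day 0 experience for the MiniMax M3 model, with Inferact's EAGLE3 speculative decoding integrated. Ongoing work: NVIDIA, Inferact, and SemiAnalysis are pushing disaggregated inference (PR 45879), and the Inferact team is enabling FlashInfer M3 MoE kernels (PR 45723); performance should improve markedly once these land. NVIDIA says M3 joins frontier open agentic models such as DeepSeek V4 and Kimi-K2.6. On M3, NVIDIA Blackwell Ultra delivers up to 5x the AI-factory throughput of Hopper and exceeds 300 TPS/user. Further gains are expected from optimized kernels, NVFP4, and disaggregated inference with NVIDIA Dynamo.

elvis@omarsar0 · June 18 · 56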

I was a bit suspicious of the claim, but GLM-5.2 is pretty good at designing stuff. Obviously not at the level of a professional designer, but it has that Opus-level quality. Great at: - games - landing pages - HTML artifacts - 3D worlds Wish I had Fable 5 to compare with.

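Summary (translated): GLM-5.2 took first place on Design Arena with an Elo of 1360, passing the now-withdrawn Claude Fable 5; it climbed 4 places and gained 27 Elo points, and its weights are open. Elvis Saravia of DAIR.AI tested it and found its design ability good: not at the level of a professional designer, but with Opus-level quality, and strong at games, landing pages, HTML artifacts, and 3D worlds.

Chubby♨️@kimmonismus · June 18 · 48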

All the major news outlets agree: The biggest winner in the Anthropic controversy is open source. And I wholeheartedly agree. I said it a few days ago, and I'll say it again: it's the biggest PR win for open source ever. Whether it's Bloomberg, Fortune, or CNBC: the consensus is clear: "Making the model open means that companies, governments or organizations with sufficient hardware can run it locally, and never have to worry about it being yanked on a whim." (Bloomberg) The reason is as simple as it is straightforward: Companies and entire alliances of states that have had their access cut off overnight will look for a sovereign solution that makes them relatively insensitive to shutdowns. By far the most significant and powerful open-source models come from China. This has, in effect, done China a great favor, probably unintentionally. Open source is essentially a lifeline, the opportunity to continue participating in this revolution with the assurance of independence. That's why the GLM 5.2 release was so important and came at precisely the right time. Open source is the solution.

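Summary (translated): Several major outlets (Bloomberg, Fortune, CNBC) agree that the biggest winner of the Anthropic controversy is open source. Bloomberg notes that open models can be run locally with no risk of being pulled on a whim. Companies and alliances of states whose access was cut off will look for sovereign solutions, and the most powerful open-source models currently come from China, which unintentionally benefits China. The post argues that the GLM 5.2 release came at exactly the right time and that open source is key to taking part in the AI revolution independently.

Artificial Analysis@ArtificialAnlys · June 18 · 51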

A standout number in Z ai’s GLM-5.2 launch is CritPt, a benchmark of unpublished research-level physics problems where it ties with Claude Opus 4.8 and is well above other open weights models Key takeaways: ➤ @Zai_org ’s GLM-5.2 (max reasoning effort) leads open weights by a wide margin: the next open model, DeepSeek V4 Pro, scores 12.9% ➤ GLM-5.2 matches Claude Opus 4.8 (20.9%) and beats several proprietary models, including GPT-5.5, Gemini 3.1 Pro, and Claude Opus 4.7 ➤ Only proprietary models score higher with GPT-5.5 Pro topping the benchmark at 30.6% ➤ A 4.5× generational jump: GLM-5.1 scored just 4.6% on CritPt ten weeks ago

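Summary (translated): Zhipu released GLM-5.2 (max reasoning effort), which scores 20.9% on the CritPt benchmark (unpublished research-level physics problems), tying Claude Opus 4.8 and far ahead of other open-weights models. DeepSeek V4 Pro scores only 12.9%; GLM-5.2 also beats proprietary models including GPT-5.5, Gemini 3.1 Pro, and Claude Opus 4.7. Only proprietary models score higher, with GPT-5.5 Pro on top at 30.6%. Compared with GLM-5.1's 4.6% ten weeks ago, this is a 4.5x generational jump.

向阳乔木@vista8 · June 18 · 37

A noble move. This earns goodwill, but will the data be used for training?

小互@xiaohu · June 17 · 74

OpenAI is thinking big: it announced that Codex (including the App client, the command-line CLI, and the SDK) can connect directly to any open-source large model, with no forced binding to OpenAI's own models. It also published a document that walks developers step by step through swapping the "brain" underneath the Codex client for a free open-source model…

SiliconFlow@SiliconFlowAI · June 17 · 72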

Just dropped the entire War and Peace (~750K tokens) into GLM-5.2. Then asked it to analyze the book and build an interactive 3D character universe. The result: · 27 characters, 9 factions · ~50 relationships mapped across 66,000 lines No drift, no confusion, still had room to think GLM-5.2 is now live on SiliconFlow🔥 Time to give it a try and show us what you build👇

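Summary (translated): Zhipu's GLM-5.2 is live on SiliconFlow and fully open source. Given the entire War and Peace (about 750K tokens) as input, the model analyzed it and built an interactive 3D character universe with 27 characters, 9 factions, and about 50 relationships mapped across 66,000 lines, with no drift and no confusion. GLM-5.2 ranks #1 among available models on CodeArena, supports a 1M context window, and offers production-grade coding on par with Opus 4.8, with dual thinking modes (max for depth, high for quality-cost balance). Pricing: cached input / input / output at $0.26 / $1.40 / $4.40 per 1M tokens.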
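As a rough illustration of what the quoted prices imply, the sketch below estimates the cost of a single request from the per-1M-token rates in the post. The function name `request_cost` and the rule that a cached portion of the input is billed at the cached-input rate are assumptions made for illustration, not SiliconFlow's documented billing logic.

```python
# Prices quoted in the post, in USD per 1M tokens.
PRICE_PER_M = {"cached_input": 0.26, "input": 1.40, "output": 4.40}


def request_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    Assumption: `cached_tokens` is the part of the input billed at the
    cached-input rate; the remainder is billed at the normal input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * PRICE_PER_M["input"]
        + cached_tokens * PRICE_PER_M["cached_input"]
        + output_tokens * PRICE_PER_M["output"]
    )
    return total / 1_000_000


# Feeding ~750K input tokens (the War and Peace example) with no cache
# hits, ignoring output tokens:
print(round(request_cost(750_000, 0), 2))  # 1.05
```

Under these assumptions, the full-novel prompt costs about a dollar of input per uncached pass, and output tokens are billed on top at the higher rate.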

Rohan Paul@rohanpaul_ai · June 17 · 50

This was long needed for AI in finance. Making SEC filings readable for machines without flattening the accounting logic. Researchers from Stanford, Univ of Calif, and Nanjing Univ have just released a dataset and methods for a cleaner way to turn SEC filings into useful LLM training data without losing the meaning inside financial tables. They release a 152B-token public snapshot and estimate the full archive could become about 550B tokens of long financial documents. It has less than 0.1% overlap with Common Crawl-derived corpora. The authors propose SEFD, a rebuilt version of EDGAR filings that keeps table structure, indentation, and financial meaning while using fewer tokens for LLM training. The dataset turns EDGAR into layout-faithful MultiMarkdown, preserving merged headers, indentation, signs, spans, and table hierarchy while shrinking enormous presentation scaffolding into usable tokens. ---- Link – arxiv.org/abs/2606.18192v1

Summary (translated): Researchers from Stanford, the University of California, and Nanjing University released the SEFD dataset and methods, which convert SEC EDGAR filings into layout-faithful MultiMarkdown, preserving merged headers, indentation, signs, spans, and table hierarchy while compressing redundant presentation scaffolding, so that the structure and accounting logic of financial tables are directly usable by LLMs. The public snapshot is 152B tokens, and the full archive is estimated at about 550B tokens of long documents. The dataset overlaps with Common Crawl-derived corpora by less than 0.1%.

Emad@EMostaque · June 17 · 54

Important to note @Zai_org train on @Huawei Ascend chips, no NVIDIA (!) So you have frontier -3 months on a fully Chinese stack, 90% cheaper. Would estimate the total cost of this to be $25m, largely post training (80%) @Zai_org market cap now nearly $100b, $$s in open source!

Summary (translated): Worth noting: @Zai_org trains on @Huawei Ascend chips, no NVIDIA (!). So you get frontier minus 3 months on a fully Chinese stack, 90% cheaper. I would estimate the total cost at $25M, mostly post-training (80%). @Zai_org's market cap is now nearly $100B; there is money in open source!

Rohan Paul@rohanpaul_ai · June 17 · 55

This was long needed for AI in finance. Making SEC filings readable for machines without flattening the accounting logic. A Stanford researcher has just released a dataset and methods for a cleaner way to turn SEC filings into useful LLM training data without losing the meaning inside financial tables. They release a 152B-token public snapshot and estimate the full archive could become about 550B tokens of long financial documents. It has less than 0.1% overlap with Common Crawl-derived corpora. The authors propose SEFD, a rebuilt version of EDGAR filings that keeps table structure, indentation, and financial meaning while using fewer tokens for LLM training. The dataset turns EDGAR into layout-faithful MultiMarkdown, preserving merged headers, indentation, signs, spans, and table hierarchy while shrinking enormous presentation scaffolding into usable tokens. ---- Link – arxiv.org/abs/2606.18192v1

Summary (translated): A Stanford researcher released the SEFD dataset and processing methods, which turn SEC EDGAR filings into structured data suitable for LLM training while preserving table structure, indentation, merged headers, signs, spans, and hierarchy. The public snapshot contains 152B tokens; the full archive is about 550B tokens. Overlap with Common Crawl corpora is below 0.1%. The layout-faithful MultiMarkdown format heavily compresses the original presentation scaffolding, keeping the financial meaning while cutting token waste.

Tibo@thsottiaux · June 17 · 30

Reminder that you can use the Codex App, CLI and SDK with any open source model, not just with OpenAI models. https://developers.openai.com/codex/config-advanced#oss-mode-local-providers

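Summary (translated): Reminder: you can use the Codex App, CLI, and SDK with any open-source model, not only OpenAI models.

🚨 AI News | TestingCatalog@testingcatalog · June 17 · 80

ZAI 🔥: GLM-5.2 by @Zai_org scored 51 points on the Artificial Analysis Intelligence Index and placed in the 4th spot! This made GLM-5.2 a new SOTA open-weight model. Besides that, GLM-5.2 ranked second on Frontend Code Arena, after the currently unavailable Claude Fable 5. Should be ZOTA! 👀

Summary (translated): Z ai launched GLM-5.2, which scores 51 on the Artificial Analysis Intelligence Index, ranks fourth, and becomes the open-weight SOTA. The model is the same size as GLM-5.1 (744B total / 40B active parameters) and gains 11 points on Intelligence Index v4.1. Scientific reasoning improves markedly: CritPt +16 points to 21%, HLE +12 points to 40%, GPQA Diamond +3 points to 89%. The context window grows to 1M tokens. API pricing is $1.4/$4.4/$0.26 per 1M input/output/cache-hit tokens, and cost per task is about $0.46, on the intelligence vs. cost Pareto frontier. MIT license; available on third-party platforms including DeepInfra.

Artificial Analysis@ArtificialAnlys · June 17 · 61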

Z ai’s GLM-5.2 is the new leading open weights model on the Artificial Analysis Intelligence Index scoring 51 and it sits on the Pareto frontier of Intelligence vs Cost per Task @Zai_org’s GLM-5.2 is the same size as GLM-5.1 (744B total / 40B active parameters) but scores 11 points higher on the Intelligence Index v4.1, placing ahead of MiniMax-M3 (44) and DeepSeek V4 Pro (max, 44). On the first-party API it is priced in line with GLM-5.1 at $1.4/$4.4/$0.26 per 1M input/output/cache hit tokens Key results: ➤ GLM-5.2 is the leading open weights model on the Intelligence Index v4.1. At 51, it leads MiniMax-M3 (44), DeepSeek V4 Pro (max, 44) and Kimi K2.6 (43) ➤ Improvements across most evaluations, particularly scientific reasoning: GLM-5.2 gains over GLM-5.1 on most evaluations, led by scientific reasoning on CritPt (+16 points to 21%) and HLE (+12 points to 40%), alongside AA-LCR (+9 points to 71%), tau3 banking (+15 points to 27%) and SciCode (+7 points to 50%). TerminalBench v2.1 also improves (+16 points to 78%) and GPQA Diamond gains 3 points to 89% ➤ Leading open weights model on GDPval-AA v2 and competitive with proprietary models: GLM-5.2 scores 1524 on GDPval-AA v2, ahead of MiniMax-M3 (1418) and DeepSeek V4 Pro (max, 1328). This impressive result places GLM-5.2 in-line with proprietary models including GPT-5.5 (xhigh reasoning). GDPval-AA v2 builds on the original GDPval-AA by baselining Elo to human performance at 1000, introducing a rotating panel of frontier-model judges, and raising the turn limit from 100 to 250 for longer-horizon agent trajectories ➤ GLM-5.2 uses more output tokens per task than other leading open weights models: the model uses 43k output tokens per Intelligence Index task, up from GLM-5.1 (26k) and above MiniMax-M3 (24k), Kimi K2.6 (35k) and DeepSeek V4 Pro (max, 37k) ➤ On the Intelligence vs. Cost per Task Pareto Frontier: GLM-5.2 is on the Pareto frontier of the Intelligence vs Cost per Task chart, with the lowest cost per task among models at its intelligence level. GLM-5.2 costs ~$0.46 per task, compared to GLM-5.1 ($0.25), Kimi K2.6 ($0.31), MiniMax-M3 ($0.18) and DeepSeek V4 Pro (max, $0.05) Additional Model Details: ➤ License: MIT ➤ Size: 744B total parameters, 40B active parameters, equivalent to GLM-5.1 ➤ Context window: 1M tokens, up from 200K on GLM-5.1 ➤ Pricing: $1.4/$0.26/$4.4 per 1M input/cache hit/output tokens ➤ Availability: Alongside Z ai's first-party API, GLM-5.2 is available across third-party providers including @DeepInfra, @novita_labs, @nebiusai, @parasailnetwork, @SiliconFlowAI, @gmi_cloud, @Baseten and @FireworksAI_HQ
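To make the Pareto-frontier claim concrete, here is a minimal sketch that checks which of the open-weights models named in the post are non-dominated on (Intelligence Index score, cost per task). It covers only the five models quoted above, so it is not a reproduction of the full Artificial Analysis chart, which also includes proprietary models. GLM-5.1's score of 40 is inferred from "11 points higher" (51 − 11) rather than quoted directly, and the function name `pareto_frontier` is illustrative.

```python
# (model, Intelligence Index score, cost per task in USD), as quoted in the post.
# Assumption: GLM-5.1's score of 40 is derived as 51 - 11, not stated directly.
MODELS = [
    ("GLM-5.2", 51, 0.46),
    ("GLM-5.1", 40, 0.25),
    ("MiniMax-M3", 44, 0.18),
    ("DeepSeek V4 Pro (max)", 44, 0.05),
    ("Kimi K2.6", 43, 0.31),
]


def pareto_frontier(models):
    """Return names of models not dominated by any other model.

    A model is dominated if another model scores at least as high and
    costs at most as much, and is strictly better on at least one axis.
    """
    frontier = []
    for name, score, cost in models:
        dominated = any(
            other_score >= score
            and other_cost <= cost
            and (other_score > score or other_cost < cost)
            for _, other_score, other_cost in models
        )
        if not dominated:
            frontier.append(name)
    return frontier


print(pareto_frontier(MODELS))  # ['GLM-5.2', 'DeepSeek V4 Pro (max)']
```

Within this subset, GLM-5.2 stays on the frontier because no listed model matches its score at any price, which is consistent with the post's "lowest cost per task among models at its intelligence level".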

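Summary (translated): Z ai released GLM-5.2 (744B total / 40B active parameters), which scores 51 on the Artificial Analysis Intelligence Index v4.1, ahead of MiniMax-M3, DeepSeek V4 Pro, and Kimi K2.6. Scientific reasoning improves substantially: CritPt +16 points, HLE +12 points, and GPQA Diamond reaches 89%. It scores 1524 on GDPval-AA v2, in line with GPT-5.5 (xhigh reasoning). The context window expands to 1M tokens, under the MIT license. First-party API pricing is $1.4/$4.4/$0.26 per 1M input/output/cache-hit tokens, and cost per task is about $0.46, on the intelligence vs. cost Pareto frontier.

数字生命卡兹克@Khazix0918 · June 17 · 56

Zhipu is the GOAT! The official scores are finally out, and it really can go toe to toe with Opus 4.8.

Summary (translated): Zhipu released GLM-5.2, an open-source model (MIT license) with significant gains on coding and agentic tasks and a 1M context window. It offers two reasoning-effort levels: GLM-5.2 (max) for pushing the limits, and GLM-5.2 (high) for balancing performance and token efficiency. API pricing is unchanged from GLM-5.1. Official evaluations show performance competitive with Opus 4.8.

Orange AI@oran_ge · June 17 · 71

Zhipu's GLM 5.2 was officially open-sourced today. Its significance is that GLM 5.2 is the first open-source model whose coding ability reaches Opus level. We integrated it into Cola right away as a beta model for everyone to test. Model pricing is the same as the official pricing. You are welcome to try it and send feedback.

Summary (translated): Zhipu officially open-sourced GLM 5.2 today, the first open-source model whose coding ability reaches Opus level. The model is now available in Cola as a beta model for testing, priced the same as the official API; trials and feedback are welcome.

Alibaba Cloud@alibaba_cloud · June 17 · 28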

🚀 Flink Forward Asia 2026 lands in Shenzhen for the first time! 🗓️ June 26–27 📍 InterContinental Shenzhen OCT Theme: Real-time Data Power Future AI 🌟 70+ speakers from Alibaba Cloud, Qwen, ByteDance, Tencent, LinkedIn & more 🔹 Deep‑dive tracks: AI Native, multi‑modal streams, Agents, inference acceleration 🎁 Exclusive merch + prizes on site 🎟️ Free registration: https://hd.aliyun.com/form/8369 #FlinkForwardAsia #FFA2026 #ApacheFlink #AI #StreamProcessing #Shenzhen

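Summary (translated): 🚀 Flink Forward Asia 2026 comes to Shenzhen for the first time! 🗓️ June 26–27 📍 InterContinental Shenzhen OCT. Theme: Real-time Data Power Future AI 🌟 70+ speakers from Alibaba Cloud, Qwen, ByteDance, Tencent, LinkedIn, and more 🔹 Deep-dive tracks: AI Native, multi-modal streams, Agents, inference acceleration 🎁 Exclusive on-site merch + prizes 🎟️ Free registration: https://hd.aliyun.com/form/8369 #FlinkForwardAsia #FFA2026 #ApacheFlink #AI #StreamProcessing #Shenzhen

歸藏(guizang.ai)@op7418 · June 17 · 72

You can add Zhipu GLM-5.2 yourself in Codepilot's model management.

Summary (translated): Zhipu GLM-5.2 is officially released and open-sourced, positioned for long-horizon tasks. The model has a stable 1M-token context window and introduces thinking-effort control. Architecturally it uses the IndexShare mechanism, in which every four sparse-attention layers share one indexer, cutting per-token compute by about 2.9x at a million-token context. Users can now add GLM-5.2 in Codepilot's model management.

SiliconFlow@SiliconFlowAI · June 17 · 42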

Code like a real G😎 Congrats to @Zai_org 's GLM 5.2 ranks #1 as available model on CodeArena 💪 SiliconFlow is proud to be T+0 launch partner🔥 💰 Input Cache/Input/Output: $ 0.26/1.40/4.40 per 1M tokens 📚 Usable 1M context for entire codebases and project-scale workflows ⚙️ Reliable long-horizon execution that stays on track through complex tasks 💪 Production-grade coding on par with Opus 4.8 🧠 Dual thinking modes: max for depth, high for quality-cost balance And it's still fully open-source. Big shoutout to @Zai_org for keeping frontier model accessible to builders and the community 🙌 Get started today 👇

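Summary (translated): Zhipu's GLM 5.2 ranks #1 among available models on the coding benchmark CodeArena. SiliconFlow launched it on day zero, priced at $0.26 / $1.40 / $4.40 per 1M tokens for cached input / input / output, with 1M context, reliable long-horizon task execution, and coding performance on par with Opus 4.8. It offers dual thinking modes: max for depth, high for quality-cost balance. The model is fully open source.

karminski-牙医@karminski3 · June 17 · 67

GLM-5.2 is officially out! An evaluation video is coming shortly~

Summary (translated): Zhipu (Z.ai) released the GLM-5.2 model, with significant improvements on coding and agentic tasks and a 1M context window. It offers two reasoning modes: GLM-5.2 (max) for maximum performance and GLM-5.2 (high) for balancing performance and token efficiency. The weights are open under the MIT license, and API pricing matches GLM-5.1.

歸藏(guizang.ai)@op7418 · June 17 · 79

Zhipu GLM-5.2 is officially released and open-sourced, and the benchmark results are rather startling. Its core positioning is long-horizon tasks, with a stable 1M-token context, and the model also introduces thinking-effort control. At the architecture level, GLM-5.2 proposes the IndexShare mechanism: every four sparse-attention layers share one indexer, which cuts per-token compute by about 2.9x at a million-token context.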

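Summary (translated): Zhipu released and open-sourced GLM-5.2, positioned for long-horizon tasks, with a stable 1M-token context. It introduces thinking-effort control: GLM-5.2 max pushes for peak performance, and GLM-5.2 high balances efficiency. The architecture uses the IndexShare mechanism, in which every four sparse-attention layers share an indexer, cutting per-token compute by about 2.9x at a million tokens. Coding and agentic performance improve significantly. The weights are open under the MIT license, and API pricing matches GLM-5.1.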
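The sharing pattern described for IndexShare can be pictured with a small sketch. The post only states that every four sparse-attention layers share one indexer; the consecutive grouping, the function names, and the 12-layer count below are hypothetical, and the sketch does not model the attention computation or the reported 2.9x saving.

```python
def indexer_id(layer_idx: int, group_size: int = 4) -> int:
    """Map a sparse-attention layer to the indexer it would share.

    Assumption: layers are grouped consecutively (0-3, 4-7, ...).
    """
    if layer_idx < 0:
        raise ValueError("layer_idx must be non-negative")
    return layer_idx // group_size


def num_indexers(num_layers: int, group_size: int = 4) -> int:
    """Number of indexers needed when each group of layers shares one."""
    return (num_layers + group_size - 1) // group_size  # ceiling division


# Hypothetical 12-layer stack, chosen only to show the pattern.
NUM_LAYERS = 12
print([indexer_id(i) for i in range(NUM_LAYERS)])
# [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
print(num_indexers(NUM_LAYERS))  # 3
```

In this picture the indexer runs once per group rather than once per layer, which is where the compute saving would come from.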

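Orange AI@oran_ge · June 17 · 76

The significance of GLM 5.2 is that, for the first time, an open-source model's coding ability has reached Opus level.

Summary (translated): The open-source model GLM-5.2 has been released, and its coding ability reaches Opus level for the first time. It improves significantly on coding and agentic tasks, supports a 1M context window, and offers two reasoning-effort levels: GLM-5.2 (max) for maximum performance and GLM-5.2 (high) for balancing performance and token efficiency. It is open-sourced under the MIT license, and API pricing matches GLM-5.1.

elvis@omarsar0 · June 17 · 50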

The era of meta apps is here.

Summary (translated): The era of meta apps is here.

Ethan Mollick@emollick · June 17 · 58

Credit to GLM-5.2 Max, the new open weights model, for pulling this off. ...but you can see the difference between it and Fable in a way benchmarks don't show. GLM-5.2 gives a correct poem (& the Welsh is fun) but Fable weaves the disappearing letters into the theme of the poem.

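Summary (translated): Credit to GLM-5.2 Max, the new open-weights model, for pulling this off. ...but you can see a difference between it and Fable that benchmarks do not show. GLM-5.2 gives a correct poem (and the Welsh is fun), but Fable weaves the disappearing letters into the poem's theme.

meng shao@shao__meng · June 17 · 66

Microsoft Copilot Cowork is officially GA and is considering Azure-hosted DeepSeek V4 as a low-cost model option, billed by compute/usage. Token maxxing has been proven commercially unviable! Agents like Copilot Cowork can no longer be sold on a "flat monthly fee, unlimited use" model, because an agent calls the model repeatedly within a single task (reading files, writing code, calling tools, self-correcting), which sharply amplifies token consumption; when users run hundreds of tasks a week, productivity goes up but the bill spirals out of control. Simple tasks also get sent to the most expensive frontier models, pushing costs higher still. DeepSeek entering the Copilot stack? · Testing a fine-tuned DeepSeek V4 as a low-cost alternative to Anthropic / OpenAI models · Final choice expected within weeks · If it ships: optional, not mandatory, fully hosted on Azure, data stays in the Microsoft cloud, under the existing enterprise security/compliance/data-residency regime · Already fine-tuned, with safety layers added such as bias reduction

Summary (translated): Microsoft Copilot Cowork is now generally available, with multi-model support. To control costs, Microsoft is evaluating a fine-tuned DeepSeek V4 as a low-cost alternative to Anthropic/OpenAI models, billed by compute/usage. The model would be fully hosted on Azure, data would stay within the Microsoft cloud, safety layers have been added, and a decision is expected within weeks. The post also notes that agent tasks call the model repeatedly, greatly increasing token consumption, so unlimited monthly plans are no longer viable.

Chubby♨️@kimmonismus · June 17 · 69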

Open Source is so back. Let’s freaking go

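Summary (translated): GLM-5.2 took first place in the code category on Design Arena with an Elo of 1360, passing the now-withdrawn Claude Fable 5, with open weights. This is one of the highest Elo scores in the code category since the leaderboard launched, up 4 places and 27 Elo points from before.

MiniMax (official)@MiniMax_AI · June 17 · 28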

This weekend, we’re bringing M3 open-weights to the first RSI-focused hackathon, co-hosted by @hud_evals × @ycombinator. in 24 hours, top builders will turn verifiable tasks into RL environments, evals, RFT workflows, and agents. because you can improve models at anything you can verify. the only question left is: what will you teach them? RSVPs are open until tomorrow June 17th 👇

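Summary (translated): This weekend we are bringing the M3 open weights to the first RSI-focused hackathon, co-hosted by @hud_evals × @ycombinator. In 24 hours, top builders will turn verifiable tasks into RL environments, evals, RFT workflows, and AI agents, because you can improve models at anything you can verify. The only question left: what will you teach them? RSVPs are open until tomorrow, June 17 👇

Rohan Paul@rohanpaul_ai · June 17 · 70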

DeepSeek takes the crown as China’s most valuable AI startup after a massive $7.4B raise at a $50B valuation. The unusual part is control: Liang Wenfeng, DeepSeek’s founder, held almost 90% of the company before the financing and invested around $3 B as the biggest contributor. DeepSeek’s bet is to keep pushing open-source models and AGI research, while also helping domestic chipmakers such as Huawei run powerful models despite U.S. chip limits. Other top disclosed investors : Tencent: about $1.5B CATL: about $740M China’s National Artificial Intelligence Industry Investment Fund: about $150M

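Summary (translated): DeepSeek completed a $7.4B raise at a $50B valuation, becoming China's most valuable AI startup. Founder Liang Wenfeng held nearly 90% before the financing and was the largest investor, contributing about $3B personally. Other major investors in the round include Tencent (about $1.5B), CATL (about $740M), and the National Artificial Intelligence Industry Investment Fund (about $150M). DeepSeek plans to keep pushing open-source models and AGI research while helping domestic chipmakers such as Huawei run powerful models under U.S. chip restrictions.

Nathan Lambert@natolambert · June 17 · 47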

It's hard to pinpoint open-closed gap and so-on, but I trust the @arena team and just look where GLM 5.2 is on this. An MIT licensed, to be open weight model. At this point you could argue they have a better agent than Gemini does. That's a serious accomplishment.

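Summary (translated): It is hard to pin down the open-closed gap and so on, but I trust the @arena team; just look at where GLM 5.2 sits. An MIT-licensed, soon-to-be open-weight model. At this point you could even argue its agent is better than Gemini's. That is a real accomplishment.

elvis@omarsar0 · June 17 · 70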

No time wasting on the frontier of open-weight models. GLM-5.2 looks impressive based on the results I've seen. Very curious to see how it holds on long-horizon tasks.

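Summary (translated): Z.AI released GLM-5.2 with open weights under the MIT license. The model improves significantly on coding and agentic tasks, supports a 1M context window, and has long-horizon capabilities. It offers two reasoning-effort levels, GLM-5.2 (max) and GLM-5.2 (high), the latter balancing performance and token efficiency. API pricing is the same as GLM-5.1. Elvis Saravia of DAIR.AI calls it impressive among frontier open-weight models and is curious how it holds up on long-horizon tasks.

Nathan Lambert@natolambert · June 17 · 45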

Still hard to expect the unexpected with AI. It goes to show how skilled many of the scientists are in China. They're hitting high peaks with much less compute. Overall, I think the US models are really ahead, but you can't just discount the Chinese labs. Not at all.

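Summary (translated): Zhipu AI's latest model, GLM-5.2, took first place on Design Arena with 1360 Elo, passing the now-withdrawn Claude Fable 5, with open weights. It climbed 4 places and gained 27 Elo points, one of the highest scores ever in the benchmark's code category. AI analyst Nathan Lambert commented that Chinese research teams are hitting high peaks with much less compute, and that although US models are ahead overall, the progress of Chinese labs cannot be discounted.

Chubby♨️@kimmonismus · June 17 · 75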

Axios reports that Microsoft is considering a Microsoft-hosted version of DeepSeek V4 as a cheaper model option for Copilot Cowork. Microsoft says Copilot Cowork can’t work on unlimited pricing. “We have users who do hundreds of tasks a week… but the consequence is the costs can go very high,” Charles Lamanna told Axios. So Microsoft is moving Copilot Cowork to usage-based pricing, and exploring cheaper open-source model options. If Microsoft really goes with DeepSeek, it would be optional, fine-tuned, safeguarded, and hosted fully on Azure. Still: Microsoft adding a Chinese AI model to an enterprise Copilot product would be huge.

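Summary (translated): Microsoft is considering a Microsoft-hosted version of DeepSeek V4 as a cheaper model option for Copilot Cowork. Copilot Cowork will drop unlimited pricing and move to usage-based billing because costs run too high (users running hundreds of tasks a week drive them up sharply). If DeepSeek is adopted, the model would be optional, fine-tuned, safeguarded, and fully hosted on Azure. Axios reports that Microsoft has already fine-tuned a usable model, with the final decision pending.

Chubby♨️@kimmonismus · June 17 · 83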

Lets go, GLM-5.2 released as Open Weights model. tl;dr -1M context window -MIT-licensed open weights -Stronger long-horizon coding agents -Two reasoning modes: max and high -Same API pricing as GLM-5.1 Zai says GLM-5.2 was trained specifically for large-scale implementation, automated research, performance optimization, and complex debugging. Open Source got a serious upgrade today!

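Summary (translated): GLM-5.2 is released as an open-weights model under the MIT license, with a 1M context window. It offers two reasoning modes: max (maximum reasoning) and high (balancing performance and token efficiency). It improves significantly on coding and agentic tasks and was trained specifically for large-scale implementation, automated research, performance optimization, and complex debugging. API pricing matches GLM-5.1.

🚨 AI News | TestingCatalog@testingcatalog · June 17 · 77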

ZAI 🔥: GLM-5.2 is now available on huggingface! > It comes with a 1M context window and 2 levels of reasoning effort, max and high. MIT license and same pricing as GLM-5.1. > GLM-5.2 scores 46.2% on DeepSWE, the SOTA score among open-weight models.

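Summary (translated): ZAI released GLM-5.2 on Hugging Face under the MIT open-source license, with the same API pricing as GLM-5.1. The model supports a 1M context window and two reasoning-effort levels: max (peak performance) and high (balancing performance and token efficiency). It improves significantly on coding and AI agent tasks and has long-horizon capabilities. It scores 46.2% on the DeepSWE benchmark, the SOTA among open-weight models.

Z.ai@Zai_org · June 17 · 73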

Introducing GLM-5.2: Frontier Intelligence, Open Weights - Significant improvements in coding and agentic tasks - Strong long-horizon capabilities with a 1M context window - Two levels of reasoning effort: GLM-5.2 (max) pushes the limits, while GLM-5.2 (high) strikes a strong balance between performance and token efficiency - MIT-licensed open weights - Same API pricing as GLM-5.1 Tech Blog: http://z.ai/blog/glm-5.2 Weights: http://huggingface.co/zai-org/GLM-5.2 API: http://docs.z.ai/guides/llm/glm-5.2 Coding Plan: http://z.ai/subscribe Chat: http://chat.z.ai

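Summary (translated): Zhipu (Z.ai) officially released GLM-5.2, with model weights open under the MIT license. Compared with its predecessor, it improves significantly on coding and agentic tasks and supports a 1M context window. It offers two reasoning-effort levels: GLM-5.2 (max) for peak performance, and GLM-5.2 (high) for a balance between quality and token efficiency. API pricing matches GLM-5.1. The tech blog, weights, and API docs are all live.

eric zakariasson@ericzakariasson · June 17 · 43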

extremely excited to join this talented team. a lot of things in the works!

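Summary (translated): Extremely excited to join this talented team. A lot of things in the works! (Key point of the quoted post: Cursor announced it is teaming up with SpaceX to push the frontier of practical AI, and major improvements to Cursor are expected soon.)

Jim Fan@DrJimFan · June 17 · 64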

Today, we enable AutoResearch in the physical world for the first time! Introducing ENPIRE: we give 8 Codex agents a fleet of robots, an allocation of GPUs, and generous token budget. We set them free with a simple goal: solve the task as quickly as possible, keep the robots busy but stay safe, don't waste precious compute. Make no mistake. Then humans step aside and our watch begins. The robot fleet starts to come alive: they learn to look for visual clues, reset the scene, practice novel skills, tinker with control stack, read papers online, debate, reflect, get stuck, and try again directly on the hardware. All we did is to give Codex an API to the world of atoms, and the rest is emergence. ENPIRE is able to solve high-precision tasks like tying zip-ties, organizing fine pins, and installing GPUs all by itself. We also discovered a new type of "physical scaling": 8 robots exploring in parallel improves significantly faster than fewer ones. A part of our NVIDIA GEAR lab now self-improves tirelessly over night. We just read the reports in the morning. /goal: we all take a holiday and Jensen wouldn't even notice ;) We will be open-sourcing everything, so you can host your self-running robot lab at home too! Deep dive in the thread:

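Summary (translated): NVIDIA's GEAR lab enabled AutoResearch in the physical world for the first time with the ENPIRE project. It gives 8 Codex agents a robot fleet, GPUs, and a token budget, with the goal of completing tasks quickly and safely. After the humans step aside, the robot fleet learns on its own to look for visual clues, reset the scene, practice new skills, adjust the control stack, read papers, and debate and reflect. ENPIRE can perform high-precision tasks such as tying zip-ties, organizing fine pins, and installing GPUs. It also found a form of physical scaling: 8 robots exploring in parallel improve significantly faster than fewer robots. Part of the lab now self-improves overnight, with reports read in the morning. Everything will be open-sourced.

🚨 AI News | TestingCatalog@testingcatalog · June 17 · 41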

MISTRAL 🔥: A new “fat” model family has been teased to arrive this summer! The model will be open-weight and initially released in early access for key partners. > This will be the start of a new family of models, fat indeed, but sparse. We're opening up an early access program in July for key partners in research, government and the industry. > This model and upcoming ones will be open-weight. We believe this is critical for our customer confidence and for the research and developer communities. Le Chaton Fat soon? 👀

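Summary (translated): Mistral teased a new "fat" model family arriving this summer. The models are open-weight, with early access opening in July for key partners in research, government, and industry. Mistral describes the family as "fat indeed, but sparse" and stresses that open weights are critical for customer confidence and for the developer community. Upcoming models will also be open. The post also mentions the name "Le Chaton Fat".

向阳乔木@vista8 · June 16 · 52

A podcast interview with the Factory AI CEO, TL;DR: 1. Roughly 80% to 90% of tasks can be done with open-source models; top models are best suited for planning and decision-making. 2. AI tools give high-leverage people even more leverage, and comparatively limited help to low-leverage people. 3. The most valuable engineers in the future will not be those who write code and algorithms fast, but those who can own business outcomes end to end. 4. Within three years, median token spend will be on the same order of magnitude as salary. https://www.youtube.com/watch?v=lgo_QbgV198

Summary (translated): In a podcast, the Factory AI CEO shared these views: about 80%-90% of tasks can be completed with open-source models, while top models are better suited to planning and decision-making; AI tools boost high-leverage people more, with limited benefit for low-leverage people; the most valuable future engineers are those who own business outcomes end to end, not those who only write code; and within three years, median token spend is expected to be on the same order of magnitude as salary.

向阳乔木@vista8 · June 16 · 57

A lightweight, fast, free RSS client that also supports AI summaries and Q&A with your own API key. Papr looks like a nice project; the link and install instructions are in the comments.

Chubby♨️@kimmonismus · June 16 · 65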

Axios reports that the industry is now worried White House export controls on Anthropic’s latest model could hurt the entire U.S. AI industry. The problem is trust. And that was to be expected. As Deutsche Bank’s Jim Reid put it: “You can’t rely on something that could be switched off.” If companies fear future frontier models from OpenAI, Anthropic or Google can be restricted overnight, they’ll diversify faster. And that could be a major advantage for open models. “You have no idea whether the U.S. government is just going to shut off your access to any future models,” Martin Chorzempa told Axios. “That’s a big advantage to open models.” As I already said: this Anthropic / US Gov dispute was the biggest PR for open source.

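Summary (translated): Axios reports industry concern that White House export controls on Anthropic's latest model, Claude Fable 5, could hurt the entire U.S. AI industry. The core issue is trust; as Deutsche Bank's Jim Reid put it, "You can't rely on something that could be switched off." If companies fear that future frontier models from OpenAI, Anthropic, or Google can be restricted overnight, they will diversify faster, which is a major advantage for open models. According to Wired, talks on Monday between Anthropic and the Trump administration ended without a result, and the export controls on Fable 5 remain in place. The core disagreement is whether Fable 5's guardrails can be stripped to unlock the more powerful Mythos capabilities: the NSA believes they can, while Anthropic says the risk is overstated. There is currently no plan for next steps.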