I think many people are not yet aware of the tectonic shift taking place. By preventing state-of-the-art capabilities - at least insofar as we are able to use them - open source becomes not only more attractive for one’s own applications, but more attractive overall. This also applies, for example, to entire states such as the European Union, provided it genuinely sets itself the goal of implementing AI. This not only attracts new investments and fresh capital, but also creates an opportunity for a PR coup that draws many idealists toward open-source companies. In that sense, I can well imagine that companies like OpenAI and Anthropic will suffer the most from regulation by the U.S. government - on two fronts - and that this indirectly does open source a tremendous service, especially given that it now primarily comes from China.

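Summary: Kim argues that U.S. government restrictions on frontier AI capabilities (blocking SOTA models from being used) actually make open-source models more attractive, both for in-house applications and for the market overall, and that states such as the European Union could benefit too. This draws new investment and idealistic talent. OpenAI and Anthropic stand to suffer most from the regulatory backlash, which indirectly boosts open source, especially open source from China. The quoted post says that after Anthropic previewed Mythos in April, DeepSeek raised $7.4 billion because it could not otherwise compete; the lab had previously relied on CEO Liang Wenfeng's personal wealth, has about 300 employees, and plans to at least double that.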

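Berryxia.AI@berryxia · 6 days ago · 68

PaddleOCR's PP-OCRv6 just dropped another batch of hard-core deployment numbers. They hit 0.13 seconds per image on an A100, and on Intel CPUs it runs 3.9x to 5.2x faster than PP-OCRv5. On Apple M4 with ONNX Runtime it reaches 0.35 seconds per image. It comes in three sizes, Tiny, Small, and Medium, aimed at mobile, CPU document systems, and high-concurrency APIs respectively. The most interesting part is their closing takeaway: for specialized OCR tasks, a lightweight architecture plus high-quality training data is often more practical than simply piling on parameters. That is effectively a counter-test, in a vertical domain, of today's "brute-force scaling" approach to large models. From v5 to v6, PaddleOCR has kept iterating on accuracy, speed, multilingual support, and engineering deployment, and this time the deployment-side data is broken down in fine detail. It amounts to a thorough account of how to use OCR well in real production environments.

Summary: PaddleOCR released full end-to-end deployment benchmarks for PP-OCRv6. On an A100, PP-OCRv6_tiny reaches 0.13 s/image; on Intel CPUs with OpenVINO, PP-OCRv6_medium is 5.2x faster than PP-OCRv5_server and PP-OCRv6_tiny is 3.9x faster than PP-OCRv5_mobile; on Apple M4 with ONNX Runtime it runs at 0.35 s/image. Three sizes are offered (Tiny, Small, Medium); Medium and Small both support 50 languages, and PP-OCRv6_medium reaches 88.4% accuracy on English and 88.0% on Latin-script text. The official takeaway is that for specialized OCR tasks a lightweight architecture plus high-quality training data is more practical than simply piling on parameters, a counter-test of the "brute-force scaling" route for large models.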

OpenBMB@OpenBMB · 6 days ago · 63

Hybrid LLMs are everywhere now: full attention is mixed with efficient modules like SWA, Mamba-2, and GDN. But what does efficient attention actually do inside these models? 🧵

New work from THUNLP Lab & OpenBMB: "Rethinking the Role of Efficient Attention in Hybrid Architectures." Through scaling laws, mechanistic analysis, and design studies, they reach a counter-intuitive conclusion 👇

📄 arXiv: https://arxiv.org/abs/2606.15378
💻 Code: https://github.com/thunlp/rethinking-hybrid-attention

1️⃣ Same destination, different speed: Efficient-attention design barely affects short-context Loss; all seven curves nearly overlap. But on the long-context metric LongPPL, early-training gaps are large, with large-window SWA worst of all. With enough training, every hybrid converges to the full-attention level.

2️⃣ Full attention carries retrieval: Restricting full attention's receptive field at inference spikes LongPPL across all hybrids; restricting efficient attention barely moves it. Even recurrent mixers with in-principle unbounded receptive fields (like GDN) store little long-range info in their states. Layer-wise probing shows the same pattern: retrieval gains concentrate in the full-attention layers.

3️⃣ Large-Window Laziness: A large SWA window already covers most useful dependencies, so the model needn't push full attention to retrieve from afar, which delays retrieval-head formation. It's like a student who won't walk to the library when the reference book is already on the desk. Smaller windows force full attention to do the retrieval work, training it faster.

4️⃣ A simple design that works: Apply NoPE to just the full-attention layers of a small-window SWA hybrid (SWA-128-NoPE). It substantially improves long-context performance with negligible short-context cost.

Under an effective training budget, the bottleneck for the long-context capability of hybrid models is not how powerful the efficient attention module is; it is whether full attention's retrieval capability can be effectively activated. Furthermore, strengthening full attention itself can bring greater performance improvements. Read the full paper! 🚀 #AI #THUNLP #OpenBMB #LLM #Attention #LongContext #HybridArchitecture #NLP

Summary: Tsinghua's NLP lab (THUNLP) and ModelBest's OpenBMB released a paper re-examining what efficient attention (e.g. SWA, Mamba-2, GDN) actually does in hybrid LLM architectures. Findings: efficient-attention design has minimal effect on short-context Loss, but long-context LongPPL differs markedly; full attention carries retrieval, so restricting its receptive field sharply raises LongPPL, while restricting efficient attention has almost no effect. Large-window SWA makes the model lazy and delays the formation of retrieval ability. A simple fix, applying NoPE only to the full-attention layers of a small-window SWA hybrid (SWA-128-NoPE), substantially improves long-context performance at a very small short-context cost. The paper argues that the bottleneck is whether full attention's retrieval capability can be effectively activated.
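To make the receptive-field contrast in this thread concrete, here is a minimal sketch of the two attention masks involved: a full causal mask and a causal sliding-window (SWA) mask. It is not taken from the paper's code; the function names, the sequence length of 8, and the window of 3 are assumptions chosen for illustration.

```python
def causal_mask(n):
    """Full causal attention: query i may attend to every key j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def sliding_window_mask(n, window):
    """Causal sliding-window attention: query i sees only the last `window` keys."""
    return [[(j <= i) and (i - j < window) for j in range(n)] for i in range(n)]


def visible_keys(mask, i):
    """Indices of the keys that query position i is allowed to attend to."""
    return [j for j, allowed in enumerate(mask[i]) if allowed]


full = causal_mask(8)
swa = sliding_window_mask(8, window=3)

print(visible_keys(full, 7))  # [0, 1, 2, 3, 4, 5, 6, 7]
print(visible_keys(swa, 7))   # [5, 6, 7]
```

With a small window, anything older than a few tokens is reachable only through the full-attention layers, which is one way to read the thread's point that small windows force full attention to do the retrieval work.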

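AYi@AYi_AInotes · 6 days ago · 74

💥 Breaking: Oh my gosh, GPT-5.6 has slipped too 😲 The U.S. government will decide who gets to use GPT-5.6 🤨 U.S. Commerce Secretary Lutnick personally called Altman to warn that GPT-5.6 cannot ship without interagency approval. Last week Anthropic's Mythos was blocked the same way: its release was urgently restricted to selected partners only. In Altman's own words from an internal memo, the government will approve access "customer by customer": not industry self-regulation, not the company deciding who can use it, but an approval process run by national agencies. And it is not just the Commerce Department; the Office of the National Cyber Director and the Office of Science and Technology Policy are involved as well. None of this went through congressional legislation; it relies on executive power via national-security exceptions plus export-control tools. In other words, the barrier to entry for frontier AI has shifted from "how much money do you have" to "does the state think you may have it". The thing most worth watching is the precedent: Anthropic was episode one, OpenAI is episode two, and that is clearly no coincidence. It means the national-security apparatus is building a pre-release review mechanism for frontier models. It used to be "encourage innovation, assign liability after the fact"; now it is "national-security review up front". Once this category sticks, GPT-6, the next-generation o-series, and future models from Google and xAI, every generation of the most advanced capabilities, may have to pass through this gate. Looking back 3-5 years from now, this may prove to be a watershed: the point where AI went from "commercial technology" to "strategic resource". I see many people saying this is self-sabotage, but look at it from another angle: when a country decides a technology is important enough that it must control who can touch its frontier, it is no longer treating it with the logic of "technology". It is applying the logic it uses for nuclear, chips, and biotech, and that shift is far bigger than a single delayed release.

Summary: U.S. Commerce Secretary Lutnick personally called Altman to warn that GPT-5.6 cannot be released without interagency approval. Anthropic's Mythos was previously blocked the same way, with its release urgently restricted. Altman's internal memo says the government will approve access "customer by customer", involving the Commerce Department, the Office of the National Cyber Director, and the Office of Science and Technology Policy. The move rests on executive power under national-security exceptions and export controls and establishes a pre-release review mechanism for frontier models. It signals that GPT-6 and later models may all have to go through this process, marking AI's shift from commercial technology to strategic resource.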

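Chubby♨️@kimmonismus · 6 days ago · 77

This looks too good to be true. A 397B open-source model on par with or even outperforming Claude Opus 4.8? I need to check it out.

Summary: Ornith-1.0 is a family of open-source large language models built for agentic coding, offered in four sizes: 9B Dense, 31B Dense, 35B MoE, and 397B MoE. It is post-trained on gemma4 and qwen3.5 with a self-improving strategy that uses reinforcement learning to jointly optimize the task scaffold and the solution. It reports the best results among open-source models on several coding benchmarks: Terminal-Bench 2.1 (77.5), SWE-Bench Verified (82.4) / Pro (62.2) / Multilingual (78.9), NL2Repo (48.2), SWE Atlas (QnA 41.2 / RF 42.6 / TW 39.1), and ClawEval (77.1). All models are released under the MIT license for commercial and research use. The main post says the 397B version matches or even surpasses Claude Opus 4.8.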

Chubby♨️@kimmonismus · 6 days ago · 55

All hope now rests on open source. I have never been more bullish on DeepSeek, GLM, Qwen, and so many others than after the drama around Fable 5 and now GPT-5.6 - because we will presumably only rarely get access to frontier models, and even then only with difficulty.

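Summary: The author of the main post is more bullish on open-source models such as DeepSeek, GLM, and Qwen because of GPT-5.6's release troubles. Axios reports that OpenAI had proactively approached the Trump administration before Anthropic's Fable 5 conflict; the White House previewed the model's capabilities, and Altman discussed it with Commerce Secretary Lutnick, with government review required before any public release. Altman said of GPT-5.6 that this is "not our preferred long-term model", implying that frontier releases will have to pass safety review and partner screening. The author speculates that GPT-5.6 was originally scheduled for release this Thursday and was delayed by government intervention.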

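宝玉@dotey · 6 days ago · 59

PPT Master really is the best PPT skill. My new skill is also pretty good at making slide decks: it can export an editable version, generate AI illustrations, and supports markup editing in the agent's built-in browser. https://github.com/jimliu/baoyu-design

Summary: 宝玉 (Baoyu, @dotey) calls PPT Master the best PPT skill and recommends his own new skill. He quotes a Bilibili creator's ranking of seven PPT skills on GitHub: hugohe's PPT Master (31k stars) makes every element editable and includes voice cloning and narration generation; 花叔 (19k stars) outputs editable PPTX; 歸藏 (15k stars) has built-in keyboard shortcuts; Lewis (6,500 stars) includes a timer and a word-for-word script; 宝玉 (22k stars) is image-only; 张咋啦 (23k stars) outputs HTML; 乔木 (5,400 stars) produces image-only cards. 宝玉 adds that his new skill can export an editable version, generate AI illustrations, and supports markup editing in the agent's built-in browser.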

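AYi@AYi_AInotes · 6 days ago · 56

The more I look, the more I think that in 2026 the maturity of AI tools is making cross-domain transfer extremely cheap. On the surface, this open-source book on GitHub teaches quant trading; in practice it gives us a template for using AI to crack any field you know nothing about. Put plainly: get something running first, learn as you go, turn each place you get stuck into a Spec, and let AI break the deadlock for you. Main repo 🔗 http://github.com/xingwudao/xquant-beginner

Summary: The open-source quant book on GitHub, "XQuant: Everyone Is a Quant Trader" (《XQuant：人人都是量化交易员》), is problem-driven rather than knowledge-driven: each chapter supplies a ready-made Spec to hand to Claude or Cursor to generate code, so you get a strategy running first (even a money-losing one) and fill in the theory afterward. The book strings the quant pipeline together with 9 questions (minimal closed loop, ETF selection, position sizing, buy/sell signals, backtesting, overfitting detection, live trading, and so on), and Chapter 1 starts directly with a minimal working system. The main text and the exercise code are maintained separately. The author believes that in 2026 mature AI tools make cross-domain transfer extremely cheap, and that the skill of turning a vague idea into a clear Spec can be reused in any complex field.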

MiniMax (official)@MiniMax_AI · 6 days ago · 44

More great options for the open-weight ecosystem. Thanks @NVIDIAAI for making MiniMax M3 available in NVFP4.


Rohan Paul@rohanpaul_ai · 6 days ago · 64

UBS says 60% of companies now watching AI budgets are moving to cheaper models and open-source Chinese models. The pressure is coming from extreme bills, including users spending up to $35K/month, teams exceeding quotas by 200%, and companies cutting internal AI tools from 5 to 2. Companies are not abandoning AI; they are using model routing, which sends easy tasks to cheaper models and saves premium models for hard reasoning, code, and long-context work. Chinese open-source models such as Qwen, DeepSeek, MiniMax, GLM, and Kimi now fit the enterprise cost curve because they can be run locally or used through cloud catalogs. --- news .futunn.com/en/post/75068082/ubs-group-finds-60-have-already-started-curbing-ai-spending?level=2&data_ticket=1780870170397383

Summary: A UBS report says 60% of companies watching their AI budgets are shifting to cheaper models and Chinese open-source models. Users are spending up to $35K per month, teams are exceeding quotas by 200%, and companies are cutting internal AI tools from 5 to 2. Companies are adopting model routing, sending simple tasks to low-cost models and reserving premium models for hard reasoning, coding, and long-context work. Chinese open-source models such as Qwen, DeepSeek, MiniMax, GLM, and Kimi fit the enterprise cost curve because they can be deployed locally or used through cloud catalogs.
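The model-routing idea described in this post can be sketched in a few lines. This is an illustrative toy, not any vendor's router: the model names, the per-token prices, the set of "hard" task types, and the 32,000-token long-context threshold are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    usd_per_million_tokens: float


# Hypothetical models and prices, for illustration only.
CHEAP = Model("cheap-open-model", 0.30)
PREMIUM = Model("premium-frontier-model", 15.00)

# Task types the post says are kept on premium models.
HARD_TASK_TYPES = {"reasoning", "code", "long_context"}
LONG_CONTEXT_TOKENS = 32_000  # assumed cutoff for "long-context work"


def route(task_type: str, prompt_tokens: int) -> Model:
    """Send hard or very long requests to the premium model, everything else to the cheap one."""
    if task_type in HARD_TASK_TYPES or prompt_tokens >= LONG_CONTEXT_TOKENS:
        return PREMIUM
    return CHEAP


def estimated_cost(model: Model, tokens: int) -> float:
    """Rough cost in USD for processing `tokens` tokens on `model`."""
    return model.usd_per_million_tokens * tokens / 1_000_000


print(route("summarize", 800).name)  # cheap-open-model
print(route("code", 800).name)       # premium-frontier-model
print(route("chat", 50_000).name)    # premium-frontier-model
```

A production router would more likely classify difficulty with a small model or with heuristics learned from traffic; the fixed rule here only shows where the cost saving comes from.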

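AYi@AYi_AInotes · 6 days ago · 56

A quant book was just open-sourced on GitHub, and its design is a bit different. I think what it really teaches is not just quant trading but a badly underrated meta-skill: turning a vague idea into a clear Spec and then having AI execute it. That skill works in any complex field; quant trading is just its first practice ground.

Most people get the learning path for quant trading backwards. The traditional route: grind through the math first → feel you are not ready → never get hands-on → give up. This open-source book on GitHub flips the path: write a Spec first and let AI help you get a strategy running, even one that loses money, then fill in the theory once it runs. The book is called "XQuant: Everyone Is a Quant Trader" (《XQuant：人人都是量化交易员》), and its core design comes down to one rule: problem-driven, not knowledge-driven.

Nine questions string together the whole quant pipeline:
1. How does quant make money? (get the minimal closed loop running first)
2. What to buy? (start with 3 ETFs)
3. How much to buy? (3 position-sizing schemes tested in practice)
4. When to buy and sell? (signals, rebalancing, take-profit and stop-loss)
5. How do you know it works? (a backtesting framework)
6. How do you avoid fooling yourself? (overfitting detection) This chapter comes very early, which shows the author understands how beginners actually blow up.
7-9. Live execution, continuous improvement, and day-to-day factor research

A few counterintuitive points:
• Chapter 1 already has you running a strategy. Instead of starting with CAPM and Black-Scholes, you build a minimal system that actually runs; the feedback and dopamine from seeing it run drive you to keep learning better than any theory does.
• The main text and the exercise code are maintained separately: the manuscript repo holds the clean text, and the learning repo holds the Specs plus Jupyter Notebooks. Reading is uninterrupted, and hands-on work has a complete reference.
• Each chapter gives you a ready-made Spec to hand to Claude or Cursor to generate code. What you are training is not hand-writing code but the ability to turn a vague strategy idea into a clear task description.

Summary: The open-source quant book "XQuant: Everyone Is a Quant Trader" uses a problem-driven design: write a Spec first so that AI generates code and gets a strategy running, then fill in the theory. It strings the quant pipeline together with 9 questions: how quant makes money, what to buy (3 ETFs), how much to buy (3 position-sizing schemes), when to buy and sell, how to backtest, overfitting detection (covered very early, in Chapter 6), live trading, improvement, and factor research. The main text and exercise code are maintained separately, and each chapter provides a ready-made Spec for Claude/Cursor to generate code, training the ability to turn vague ideas into clear task descriptions.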

Yuchen Jin@Yuchenj_UW · 6 days ago · 73

I can’t believe my eyes: “the government will be approving access to GPT-5.6 customer by customer.” OK. Then we will see Chinese open models be the best LLMs before GPT-5.6 / Fable 5 ships to the public.


Emad@EMostaque · 7 days ago · 40

Will above-Fable-level open-source models be made illegal in the USA?


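Berryxia.AI@berryxia · 7 days ago · 76

The real heavyweights saw this trend coming early! The U.S. government is reportedly going to personally approve who can use GPT-5.6, so this is no longer just a model release. OpenAI is said to be planning only a limited preview for a small group of partners, and Sam Altman was told to wait for approval from other government agencies. Commerce Secretary Lutnick also called personally to warn against releasing without authorization. This is close to a de facto licensing regime. Yann LeCun warned about this trend earlier: if AI systems are locked up in the name of safety and only a few people get access, AI cannot truly "democratize intelligence". He has long argued that open source is the right way to put AI in everyone's hands. When the strongest closed models start being approved customer by customer by the government, open-source models are no longer just about catching up technically; they become a practical path for resisting centralized control of the technology.

Summary: The Trump administration has asked OpenAI to release its next frontier model, GPT-5.6, in stages, citing cybersecurity and national-security concerns. OpenAI CEO Sam Altman told employees that the new model will not be fully public right away; it will first be opened as a limited preview to a small group of partners and enterprise customers, and the U.S. government will approve each customer's access individually. The request came from the Office of the National Cyber Director and the Office of Science and Technology Policy and resembles the recent Anthropic case. Yann LeCun has warned that restricting access to AI systems in the name of safety will hinder the democratization of intelligence.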

Ethan Mollick@emollick · 6 days ago · 52

As this post points out, contrary to what many say, the US government could absolutely effectively ban open weights models. That doesn’t mean you won’t be able to download the weights & run them, but they can ensure that no US company would use or provide access or host them

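Summary: Ethan Mollick points out that the U.S. government is fully capable of effectively banning open-weights models. A ban would not stop individuals from downloading and running them; it would use regulation to ensure that U.S. companies cannot use, provide access to, or host unapproved models. Specific measures include barring companies from using models not approved by the government, imposing severe criminal penalties for knowingly using an unapproved model within the United States to harm Americans or property, and requiring U.S. government approval for all models above a given capability threshold. Such a framework would restrict commercial distribution without completely shutting out individual use.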

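Nathan Lambert@natolambert · 6 days ago · 47

This is what happens when the world becomes AGI-pilled: both the leading lab and the government tell you that you need to bow down if you want access to their models. I feel it too. More of the last few weeks has been giving people the words to explain how they have felt for months.

Summary: Nathan Lambert comments that once the world has been convinced about AGI, the leading labs and the government start requiring users to "bow down" to use their models. He has noticed a clear change over the past few weeks: many large enterprises are looking to secure compute and are post-training internally on GLM-5.2. The trend suggests that open-source models are winning enterprise trust and that people are beginning to understand how open source can win.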

Ethan Mollick@emollick · 7 days ago · 41

It would be very useful to understand more about the government safety concerns associated with frontier AI releases so we could (a) know what risks everyone will face if/when open source reaches Mythos class & (b) whether they are doing enough or too much to prevent those risks.


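Berryxia.AI@berryxia · 7 days ago · 76

Whoa! Open-source large models have gotten fiercely competitive lately! Here comes yet another open-source model family focused on agentic coding, called Ornith-1.0. It spans the full size range from 9B up to a 397B MoE and reaches the top tier among current open-source models on agent coding benchmarks such as Terminal-Bench and SWE-Bench. The most interesting part is how it is trained: rather than only having the model generate answers, it uses RL to optimize both the task scaffold and the final solution, so the model learns for itself how to build a better execution framework. It is a neat idea: many agents fail not because they cannot write code but precisely because they cannot organize the execution flow. Ornith turns "how to build the framework" into a learnable signal as well. The whole series is MIT-licensed open source, and GGUF versions are provided that run directly in tools such as Ollama and Unsloth. One more strong option for the local-inference crowd. Link in the comments 👇

Summary: The Ornith-1.0 open-source model family has been released. It focuses on agentic coding and covers the full size range: 9B Dense, 31B Dense, 35B MoE, and 397B MoE. It reaches the top tier among open-source models on agent coding benchmarks: SWE-Bench Verified 82.4, SWE-Bench Pro 62.2, Terminal-Bench 2.1 77.5, NL2Repo 48.2, SWE Atlas 41.2 QnA, ClawEval 77.1. It is post-trained on gemma4 and qwen3.5, using reinforcement learning to jointly optimize the task scaffold and the final solution so that the model improves its execution framework on its own. The whole series is MIT-licensed, with GGUF versions that run locally in Ollama, Unsloth, and similar tools.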

Rohan Paul@rohanpaul_ai · 7 days ago · 72

Another fantastic open source release. DeepReinforce just dropped Ornith-1.0, an MIT-licensed open-source family of agentic coding LLMs. The flagship Ornith-1.0-397B MoE (17B-active) is the most powerful model in the release, reporting 82.4 on SWE-Bench Verified and 77.5 on Terminal-Bench 2.1, surpassing Claude Opus 4.7 on both benchmarks. It is built on top of pretrained Gemma 4 and Qwen 3.5 and employs a novel self-improving training strategy. With this, Ornith changes the training target by asking the model to improve both the answer and the task scaffold, meaning the plan, memory pattern, tool rhythm, error handling, and search process that shape the answer. During RL, the model proposes a better scaffold first, then uses it to produce solution rollouts, and the reward updates both stages together. That makes the model less like a coder following one rigid checklist and more like a coder learning which checklist works for each type of bug, repo, or terminal task. The most interesting result is the 9B model reaching 69.4 on SWE-Bench Verified.

Summary: DeepReinforce released Ornith-1.0, an MIT-licensed open-source family of agentic coding LLMs covering 9B Dense, 31B Dense, 35B MoE, and the flagship 397B MoE (17B active parameters). The flagship scores 82.4 on SWE-Bench Verified and 77.5 on Terminal-Bench 2.1, both surpassing Claude Opus 4.7, and reaches best-in-size results among open-source models on benchmarks such as SWE-Bench Pro (62.2) and Multilingual (78.9). The models are post-trained on Gemma 4 and Qwen 3.5 with a new self-improving strategy: reinforcement learning not only generates solutions but also jointly optimizes a task-specific scaffold (plan, memory pattern, tool rhythm, error handling, and so on). The smallest 9B model also reaches 69.4 on SWE-Bench Verified. All models are released under the MIT license for commercial and research use.

🚨 AI News | TestingCatalog@testingcatalog · 7 days ago · 74

DeepReinforce has released Ornith-1.0, their new self-improving family of open-source models designed for agentic coding. > Ornith-1.0 learns to write its own task scaffolds during training rather than relying on human-designed harnesses. > The 397B MoE flagship can match Claude Opus 4.7 on coding benchmarks, and the compact 9B Dense variant is optimized for edge devices.

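Summary: DeepReinforce released the Ornith-1.0 series of open-source models, designed for agentic coding. Sizes cover 9B Dense, 31B Dense, 35B MoE, and 397B MoE, fine-tuned from gemma4 and qwen3.5. It uses a self-improving training strategy in which reinforcement learning produces both the solution and the task scaffold. The flagship 397B MoE matches Claude Opus 4.7 on coding benchmarks, and the 9B Dense variant is optimized for edge devices. Reported scores include Terminal-Bench 2.1 77.5, SWE-Bench Verified 82.4, SWE-Bench Pro 62.2, and NL2Repo 48.2. All models are open-sourced under the MIT license for commercial and research use.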

OpenBMB@OpenBMB · 7 days ago · 22

🎉 Congratulations to @aia_gh on hosting this fantastic hands-on AI workshop! 👏 We're truly honored that you chose our open-source MiniCPM-V models.❤️ Looking forward to more collaborations in the future!🤗


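Berryxia.AI@berryxia · 7 days ago · 60

This really should shut a lot of people up! Unsloth compressed GLM-5.2 down to 1-bit, and running locally it can still go head to head with Claude Opus and GPT-5.5 on creative output. They ran the 1-bit version on a Mac Studio M3 Ultra with 256GB RAM and still got around 21 tok/s. Given the same prompt, the generated HTML/design output even looks richer and more "thoughtful" than the closed models' output. This is no longer simple quantization; it is cramming a very large model that originally needed massive amounts of VRAM onto consumer-grade hardware and having it still hold its own. GLM-5.2 is known for creativity and long context to begin with, and the fact that it keeps fairly strong performance even after extreme quantization does somewhat exceed expectations. It also confirms a trend once again: after extreme optimization, open-source models are rapidly closing the gap with closed frontier models in practical usability, especially for local deployment and specific tasks. Big-memory laptops are suddenly very appealing, and models like Qwen 3.7 are due for another iteration.

Summary: Unsloth compressed GLM-5.2 into a 1-bit GGUF quantized version that runs locally on a Mac Studio M3 Ultra (256GB RAM) at about 21.6 tok/s. In a creative-output comparison (HTML/design results) against Claude 4.8 Opus and GPT-5.5 using the same prompt, the 1-bit version holds its own and is even richer and "more thoughtful". GLM-5.2 is known for creativity and long context, and it keeps fairly strong performance after extreme quantization, confirming that open-source models, once heavily optimized, are rapidly closing the practical-usability gap with closed frontier models, especially for local deployment.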

X.PIN@thexpin · 7 days ago · 61

http://x.com/i/article/2069762663366975488

# Tokenmaxxing is dying, and Chinese open-source models fill the gap

Amazon, Meta, and Uber are capping the token spend as GLM-5.2 and DeepSeek give their models away for free.

Over the past week, a new Chinese model called GLM-5.2 has set off another round of alarm in Silicon Valley. Released by the company z.AI under a permissive open-source license, it takes direct aim at the coding and agentic-workflow business that Anthropic has built its reputation on, and, running on a one-million-token context window, it lands surprisingly close to Claude Opus 4.8 and OpenAI’s GPT-5.5. The open-source community is ecstatic.

At the same moment, America’s “unlimited AI credits” mania is draining away. Amazon, Meta and others are killing their no-limits AI plans. After Uber’s engineers burned through a full year’s AI budget in four months, the company capped each employee at $1,500. Even Microsoft CEO Satya Nadella has warned that the industry can’t let a few AI giants swallow the whole economy.

The link between open-source models and what people now call “Tokenmaxxing” is simple enough: programmers burn too many tokens, the bills get too big, and faced with a mountain of invoices, people reach for the open-source option.

This is not the Tokenmaxxing takedown you’ve read on Substack, though. Because a few questions kept nagging at me. If open-source models can do the job, why is anyone still topping up their Claude account? And if everyone runs to open-source, how does anyone building a model make money?

It was only after GLM-5.2 shipped that I arrived at a first answer. Both of these waves, the rush to open-source and the rush to burn tokens, come down to the same thing: how we decide to think about a token.

## Born Out of Scarcity

Start with the open-source side, and start with GLM-5.2.

Z.ai has released the core weights of GLM-5.2 under an unrestricted MIT license. Any company can download it free from Hugging Face, customize or fine-tune it, and run it locally or on a virtual machine. Standing the thing up is still a slog, but next to the now-delisted Fable 5, it’s a genuinely good option. The model was built on Huawei’s Ascend chips, with no Nvidia hardware involved.

But GLM-5.2 is not another DeepSeek. DeepSeek’s Liang Wenfeng came out of a quant fund, is worth billions, and has chosen near-total seclusion. (He recently put about $2.8 billion of fresh money into DeepSeek.) Z.ai, by contrast, is an open-source model maker that’s already publicly listed in Hong Kong. It has no billionaire patron, and its road has been every bit as winding as DeepSeek’s.

In 2020, BAAI’s Tang Jie argued the language model still deserved the effort. Of BAAI’s 480 A100 cards, 400 went to Tang’s team. Tang also tried Huawei’s 910A and 920 chips. On large-model training, the 920’s operator efficiency was just 18% of an A100’s; after Tang’s team helped rewrite the operators, they pushed it to roughly 40%, and trained a 13B code model, CodeGeeX. But Tang’s real goal was a 100B-parameter model, and even 2,000 910A cards weren’t enough. In the end, Tang turned to z.AI, the company he’d founded back in 2018, and rented 1,000 cards. In July 2022, they finally had their hundred-billion model: GLM-130B.

I tell his story because he embodies the type. Most of China’s open-source AI companies grew out of academic projects; they incorporated mainly because they needed to buy compute, and they open-sourced their architecture to keep their academic visibility. Starved of chips, they learned to adapt to whatever domestic silicon they could get. Z.ai wasn’t placed on the U.S. entity list until 2025, but it was already optimizing for Huawei chips in 2020. Localized compute and open architecture became, almost by default, the signature of Chinese AI.

The open-source bet has its skeptics inside China, too. In 2024, Baidu founder Robin Li argued that closed models were more powerful and cheaper to run than open ones. His point was that closed models came with more compute and bigger teams, and that ERNIE was nearly a match for ChatGPT. (A little ironic, isn’t it?) ERNIE was not, in fact, in ChatGPT’s league, and China never produced a closed model strong enough to make Li’s case.

Turning open-source into profit is a hard problem. In a 2025 interview, a z.AI expert described the company’s three possible lanes (inference, agentic, and coding) and said z.AI chose coding. MiniMax, by contrast, chose multimodal AI and AI companionship. At the time it wasn’t an obvious call: z.AI’s business leaned on enterprise and government contracts, coding showed no clear path to profit, and multimodal could win consumers directly. Z.ai was not the favorite.

Then the AI-coding boom arrived. Z.ai’s latest results show a net loss of about ￥3.18B ($444M) against R&D spending of roughly ￥3.2B ($444M). Still in the red, but strip out the open-ended spend on compute, and z.ai’s revenue can cover day-to-day operations. If it can get cheaper chips, or use its chips more efficiently, or land a wave of enterprise buyers, the losses could narrow. That would be good news.

In a sense, z.AI may owe Anthropic a thank-you note: both for the AI-doom evangelism and for the AI-coding fervor. Anthropic’s strong models cultivated customers, and its incessant messaging then drove some of them away. One of the places those customers landed was z.AI.

A first conclusion, then: going open-source is a passive choice, a Chinese model maker admitting, out loud, that it’s behind on both compute and model quality. But if closed-model progress stalls, users won’t keep paying premium prices for closed-model tokens; they’ll choose open-source on their own. The Chinese saying fits: just hold your plate steady, and the roast duck falls from the sky.

## Water, Electricity, and a Bad Analogy

Now the other wave: Tokenmaxxing.

GLM-5.2, DeepSeek and Kimi are mostly catching customers who fled the bills. But if OpenAI and Anthropic were good enough, would open-source still persuade anyone?

Then Alibaba gave me a frame. In a March internal memo, CEO Wu Yongming argued that in the AI era, the token would become a basic factor of production, the way traffic was in the internet era. Alibaba set up the Alibaba Token Hub (ATH) around that idea.

Follow the logic. In the age of electrification, a country’s electricity output and its GDP growth tend to rise together; no nation ever went bankrupt building power plants. So I looked at U.S. electricity prices, consumption and GDP from the 1920s to the 1960s. As prices fell, total spending on electricity rose 6.2x, but nominal GDP rose 11.1x. Americans spent relatively less on power and got more output for it.

The pattern doesn’t always hold cleanly, though. Through the fast-industrializing decades in Japan, China, and West Germany, electricity spending actually outran GDP. But in West Germany and Japan, even during those high-growth years, the share of GDP eaten by electricity fell sharply to almost 2.0%. That suggests a kind of lag: a rising industrial economy takes roughly fifteen years to work through the adjustment and reach the point where cheap power finally translates into abundant output.

If Wu is right and tokens really are AI’s water and electricity, they ought to deliver something similar. But run the numbers and the story breaks. Over the past four years, the cost of a given unit of AI dropped more than 90 percent, while total token spending rose 70x.

My god. If this is water and electricity, the bill is climbing far too fast. A seventyfold jump in token spending over four years has not produced anything like a matching surge in what society actually makes. Yes, the data centers went up, and the chips are back-ordered for months. But none of it has meaningfully improved the quality or efficiency of production outside the AI industry itself.

What breaks the “AI as utility” analogy is the reasoning model. Across coding and agentic tasks, a model now generates thousands of internal reasoning tokens before it answers, pushing single-task consumption 10 to 100 times higher than older models.

So how much does all that buy you? In an NBER paper, DeMiller, Musolff and Yang measured the gains from AI coding tools across four stages of work:

- Writing a single file: +290%
- Bulk work: +150%
- A specific deliverable: +50%
- A shipped, delivered product: +30%

In other words, even in coding, the thing AI does best, the gains shrink fast as you zoom out from a single file to a finished product. Optimizing the whole pipeline is far harder than optimizing one slice of it.

## Three Months of Unlimited Tokens

As latecomers, Chinese firms tried to copy the Tokenmaxxing wave too. Per public reports in March, Tencent gave core R&D teams an annual token package worth about $31,700 each, plus $1,000 a month for outside tools; ByteDance opened its internal AI tools for unlimited use and reimbursed half of employees’ personal AI experiments, capping technical staff at $1,000 a year; Baidu handed engineers unlimited ERNIE access plus up to $800 a year for outside tokens; 360 simply loaded every employee with 100 million tokens.

The recalibration came fast. Three months later, Tencent’s Hunyuan team was capped at roughly $970 worth of outside models, and everyone moved onto quotas, though using Tencent’s own Hunyuan model stayed unlimited. ByteDance staff likewise faced no limit on its in-house TRAE tool. Internally, Tencent came out against usage rankings, refusing to treat token consumption as a single yardstick for output.

The reason was simple: Chinese companies wanted real output, and they weren’t seeing it. One employee, speaking anonymously, described a team that built workflows across several different models, only to find the AI-generated pieces wouldn’t fit together, and to scrap the whole thing and start over. Twenty-odd people spent about $6,900 in tokens in a month and had nothing to show for it. At some firms, the free tokens got quietly repurposed (for analyzing stocks, say) and the company had no idea where they’d gone.

Meta is tightening what employees can spend on Anthropic and other providers, a sharp reversal from the scene a few months earlier, when staff competed to burn tokens. Bloomberg has reported that Uber and Walmart each capped AI coding-tool use; the Financial Times reported that Amazon scrapped the internal leaderboard that ranked employees by AI usage.

A June report from the consultancy Bain, titled “Your AI Budget Is Growing. Your Returns Aren’t. Here’s Why.”, found that among companies able to quantify AI’s cost savings, 40 percent saw actual savings of 10 percent or less. Of the 37 percent who’d targeted savings of 11 to 20 percent, only 31 percent actually got there.

The grassroots buying isn’t over, though. One ByteDance engineer pays for Claude Max ($100 a month, reimbursed) to write what he considers the cleanest code. Better than DeepSeek, by his lights, and GLM he can’t get. But one employee’s purchase doesn’t make the whole company better off. Tokenmaxxing shifts an individual’s cost onto the employer.

The irony is that the last firm into the water was the first one out. Tencent, a relative laggard in China’s AI race, quit Tokenmaxxing earlier than anyone. ByteDance is still touting its numbers: as of June, it says, daily token calls to its Doubao model topped 180 trillion, up more than tenfold in a year.

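Summary: The Chinese company z.AI has open-sourced its GLM-5.2 model under the MIT license. It has a one-million-token context window, was trained on Huawei Ascend chips, and performs close to Claude Opus 4.8 and GPT-5.5. Meanwhile, U.S. companies such as Amazon, Meta, and Uber have begun capping AI budgets because engineers were burning through tokens (Uber caps each employee at $1,500), which is driving demand for open-source models. The GLM team grew out of an academic project and has long adapted to domestic chips; together with DeepSeek, which recently received about $2.8 billion in fresh money from Liang Wenfeng, it has become an alternative to the "Tokenmaxxing" trend.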
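The article's electricity-versus-token comparison is simple arithmetic, and it may help to see it worked through. The sketch below uses only the multiples quoted in the article (6.2x, 11.1x, a 90 percent price drop, 70x spending) and assumes that spending equals unit price times volume; the function names are made up for illustration.

```python
def share_multiple(spend_multiple: float, gdp_multiple: float) -> float:
    """How the spending-to-GDP share scales when spending and GDP each grow by a multiple."""
    return spend_multiple / gdp_multiple


def implied_volume_multiple(spend_multiple: float, unit_price_drop: float) -> float:
    """Volume growth implied by spending growth and a fractional drop in unit price.

    Assumes spending = unit price * volume, so volume = spending / unit price.
    """
    return spend_multiple / (1.0 - unit_price_drop)


# U.S. electricity, 1920s-1960s: spending up 6.2x while nominal GDP rose 11.1x.
electricity_share = share_multiple(6.2, 11.1)

# Tokens, past four years: unit cost down 90 percent while spending rose 70x.
token_volume = implied_volume_multiple(70, 0.90)

print(f"{electricity_share:.2f}")  # 0.56
print(f"{token_volume:.0f}")       # 700
```

Read this way, electricity's share of GDP fell to a little over half its starting level, while token volume must have grown by roughly 700x or more (the article says the unit cost dropped "more than" 90 percent), which is the gap the author is pointing at.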

Alibaba Cloud@alibaba_cloud · 7 days ago · 28

⏳ 1 Day to Go! Flink Forward Asia 2026 — Main Forum Agenda Revealed! FFA2026 brings together world-class tech leaders to showcase the full picture of real-time data intelligence — from Alibaba Cloud's Agent-native to AI-native evolution, and from automotive to embodied AI industry scenarios. 📅 June 26–27 | InterContinental Shenzhen OCT, Shenzhen ⚡ Limited seats available — don't miss out! https://hd.aliyun.com/form/8369 📝 Note: All sessions will be conducted in Mandarin Chinese. #FlinkForwardAsia #FFA2026 #ApacheFlink #RealTimeData #DataAI #AIAgent #AgenticAI #StreamingData #OpenSource #AlibabaCloud #EmbodiedAI #DataEngineering

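Summary: With one day to go, Alibaba Cloud has announced the main forum agenda for Flink Forward Asia 2026. The conference focuses on real-time data intelligence, presents Alibaba Cloud's evolution from Agent-native to AI-native, and covers industry scenarios such as automotive and embodied AI. It takes place June 26-27 at the InterContinental Shenzhen OCT, and all sessions are in Mandarin Chinese. Seats are limited and advance registration is required.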

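AYi@AYi_AInotes · 7 days ago · 71

Whoa, this one has to be shared: an open-source tool that lets you get enterprise-grade routing out of a pool of free API keys, which amounts to grabbing 1 billion+ free LLM tokens at zero cost, and it looks set to crush paid gateways. For comparison: high-volume tokens and enterprise routing, at a cost of 0. The principle is simple: it is a routing framework, not an API reseller. You apply for free keys from each provider yourself and put them in the config, and the tool handles load balancing and automatic failover for you. It is up and running in 30 seconds: clone the repo, configure the keys, point your app at the local endpoint, done. You use your free quotas fully and reliably without writing your own fallback logic. The project was released only a few weeks ago, so if you jump in now you can still send improvement suggestions straight to the author. GitHub link in the comments 👇 If it is useful, remember to star the repo.

Summary: An open-source routing framework (not an API reseller) lets users apply for free API keys from each provider and configure them for automatic load balancing and failover, giving zero-cost use of 1 billion+ free LLM tokens. Setup is minimal: clone the repo, fill in the keys, and point the app at the local endpoint; it runs in 30 seconds with no hand-written fallback logic. The project was released a few weeks ago, the author is open to improvement suggestions, and the GitHub link is in the comments.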
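The post does not name the tool or show its internals, so the following is only a generic sketch of the two behaviors it describes: round-robin load balancing across a pool of keys and automatic failover when a key fails. `KeyPool`, `AllKeysFailed`, and `fake_send` are hypothetical names; a real setup would replace `fake_send` with an actual provider call.

```python
from typing import Callable, List, Sequence


class AllKeysFailed(RuntimeError):
    """Raised when every key in the pool has failed for one request."""


class KeyPool:
    """Round-robin over API keys, skipping to the next key when a call fails."""

    def __init__(self, keys: Sequence[str]) -> None:
        if not keys:
            raise ValueError("at least one API key is required")
        self._keys: List[str] = list(keys)
        self._cursor = 0  # index of the key to try first on the next request

    def call(self, send: Callable[[str], str]) -> str:
        errors: List[str] = []
        count = len(self._keys)
        for offset in range(count):
            index = (self._cursor + offset) % count
            try:
                result = send(self._keys[index])
            except Exception as exc:  # failover: record the error, try the next key
                errors.append(f"key #{index}: {exc}")
                continue
            self._cursor = (index + 1) % count  # load balancing: rotate the start
            return result
        raise AllKeysFailed("; ".join(errors))


def fake_send(key: str) -> str:
    """Stand-in for a provider request; pretends that key-a is out of quota."""
    if key == "key-a":
        raise RuntimeError("quota exhausted")
    return f"ok via {key}"


pool = KeyPool(["key-a", "key-b", "key-c"])
print(pool.call(fake_send))  # ok via key-b
print(pool.call(fake_send))  # ok via key-c
```

Errors are reported by key index rather than by key value so that secrets do not end up in logs. A fuller version would also track per-key cooldowns and rate limits, which this sketch leaves out.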

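Berryxia.AI@berryxia · 7 days ago · 74

PaddleOCR's PP-OCRv6 is finally on Hugging Face. This time it is not just another accuracy bump: it also adds two backends at once, transformers and ONNX Runtime. That means you can now use a more unified API and switch seamlessly between inference frameworks without changing much code. PaddleOCR has long been one of the most widely used open-source OCR solutions in industry, and landing on HF with multi-backend support lowers the barrier considerably again. In particular, anyone who wants high-performance OCR directly within the transformers ecosystem can now get started right away. From the earlier Unlimited-OCR to this PP-OCRv6, teams in China have kept iterating on long-document and practical OCR, with a growing focus on engineering usability. Link in the comments 👇

Summary: PaddleOCR's PP-OCRv6 (corresponding to PaddleOCR 3.7) is now officially on HuggingFace, with further accuracy gains and two new inference backends, transformers and ONNX Runtime. Users can switch between backends through a unified API without major code changes. PP-OCRv6 is an open-source OCR solution widely used in industry; listing on HF with multi-backend support lowers the engineering barrier to adoption, which especially benefits developers who want high-performance OCR directly within the transformers ecosystem.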

AK@_akhaliq · 7 days ago · 28

Have high conviction open models will win


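ginobefun@hongming731 · June 25 · 46

BestBlogs Morning Brief · 06-25 # OpenAI / Jalapeño / Claude Tag / Open Code Review / Broadcom

[1] ★ Deep dive | OpenAI and Broadcom release an inference chip optimized for LLMs. OpenAI and Broadcom jointly released Jalapeño, their first custom LLM inference chip. It went from design to tape-out in just nine months, billed as the fastest ASIC development cycle in the history of high-performance chips, and the process itself was accelerated by OpenAI's own models. This marks OpenAI's push from models and products all the way down to the chip layer, building a full-stack flywheel in which models feed back into chip design and chips support cheaper inference, with the aim of steadily lowering the cost of access to advanced AI. Source: OpenAI News https://www.bestblogs.dev/article/41ff73d7

[2] ★ Deep dive | Anthropic's lessons on building effective human-agent teams | Claude. Anthropic makes a rare disclosure of internal practice: as Claude Tag lets agents move directly into team collaboration spaces, work is shifting from a single-player mode of one person with one agent to a "multiplayer game" in which humans and multiple agents share the same workbench. The article distills four lessons (information is public by default, humans and agents each have clear roles, humans set the north-star goal, and autonomy is granted gradually according to verifiability), offering a reproducible governance framework for team-level agent collaboration. Source: Claude Blog https://www.bestblogs.dev/article/4929a2db

[3] ★ Deep dive | Alibaba open-sources Open Code Review: 5k stars in a week, a more professional code-review CLI. Alibaba has open-sourced Open Code Review, an AI code-review assistant validated internally for two years and serving tens of thousands of developers; it picked up 5k stars in a week. It uses a hybrid "deterministic engineering + Agent" architecture to address three common pain points of general-purpose agent reviews: incomplete coverage, position drift, and unstable results. Engineering logic handles file filtering and localization, and the agent handles only dynamic reasoning. Its measured accuracy is 25%-38%, far above Claude Code's 7%-16%, though its recall is slightly lower, which shows that "AI writing code" and "AI reviewing code" are two very different capabilities. Source: 阿里技术 https://www.bestblogs.dev/article/3732f5a7

[4] Where are the artists we were promised? The three deaths of the content industry and the rebirth of creators in the AI era [Podcast]. The talk analyzes in depth how AI is "killing" the traditional content industry at three levels (material, process, and copyright) and argues that creators can be "reborn" under this technological onslaught only by building an entirely new vision grounded in human intuition, taste, and trust. Source: 屠龙之术 https://www.bestblogs.dev/podcast/e1238ff

[5] Flutter rendering internals: BuildContext and the Element Tree explained. This article digs into Flutter's rendering internals, covering the three trees (Widget, Element, RenderObject), what BuildContext really is, and how setState works step by step, to help developers understand and fix common context-related errors. Source: freeCodeCamp https://www.bestblogs.dev/article/c7c34649

[6] Computer use arrives in Gemini 3.5 Flash. Google announced that computer use is now a built-in capability of Gemini 3.5 Flash, enabling developers to build agents that interact with browser, mobile, and desktop environments. Source: Google DeepMind News https://www.bestblogs.dev/article/16a75c47

[7] Qwen-AgentWorld open-sourced: teaching agents to "predict first, then act". Tongyi Lab open-sourced Qwen-AgentWorld, the first native language world model. It starts modeling environments from the continued-pretraining stage, surpasses frontier models such as GPT-5.4 on AgentWorldBench, and demonstrates two application paradigms: controllable simulation and cross-task generalization. Source: 通义实验室 https://www.bestblogs.dev/article/8810d85f

[8] How a Cisco SD-WAN Manager zero-day was exploited to gain root, start to finish. This analysis describes in detail how a threat actor exploited CVE-2026-20245, a zero-day privilege-escalation vulnerability in Cisco Catalyst SD-WAN Manager, to gain root after achieving initial compromise through a malicious peering connection, and then carried out extensive anti-forensic cleanup. Source: Google Cloud Blog https://www.bestblogs.dev/article/bcfc7fba

[9] How to build memory for AI agents. This article from LangChain presents a structured approach to building memory for AI agents, covering a conceptual framework, a three-step loop (capture, analyze, update), and a concrete implementation using LangSmith's observability, engine, and context hub. Source: LangChain Blog https://www.bestblogs.dev/article/35c6d909

[10] 40 days without sleep, 5 people grinding: a DeepMind lead reveals the inside story of Gemini versus DeepSeek. Compiled from a podcast interview with Vlad Feinberg, DeepMind's Gemini pretraining lead, this piece reveals the behind-the-scenes story of Gemini 2.0 Flash being trained by a 5-person team over 40 sleepless days and discusses pretraining research, quantization, inference co-design, and how programmers can adapt in the AI era. Source: CSDN https://www.bestblogs.dev/article/87f785ef

--- http://BestBlogs.dev · Discover high-quality content that truly suits you. BestBlogs is an AI-driven personal reading assistant that helps you find it; give it a try. Read online: https://www.bestblogs.dev/explore/brief/2026-06-25

Summary: OpenAI and Broadcom released Jalapeño, their first custom LLM inference chip, which went from design to tape-out in just nine months with the process accelerated by OpenAI's own models. Anthropic disclosed internal practice: Claude Tag brings multiple agents into collaboration spaces, and the article lays out four lessons on open information, clear roles, north-star goals, and gradual delegation. Alibaba open-sourced the code-review tool Open Code Review, which uses a hybrid "deterministic engineering + Agent" architecture and reaches 25%-38% accuracy, far above Claude Code's 7%-16%, with slightly lower recall.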

François Chollet@fchollet · June 25 · 64

This is the strongest ARC-AGI-2 performance to date by an open-source model.


Nathan Lambert @natolambert · June 25 · 51

Add more wins for GLM. The model has some brittle characteristics, and is getting crushed by closed models here, but we should expect open models to be more jagged, and you use multiple of them depending on the task. Congrats again to @Zai_org and am excited for the next one


Ethan Mollick @emollick · June 25 · 57

Gemini 3 Pro was the first model to achieve at least 23% on ARC-AGI-2, which it did in November, 2025 (it actually scored 31%). So the 8-12 month gap between closed and open weights models still seems to hold. But they are also more jagged, better at some tasks, worse at others.


Nathan Lambert @natolambert · June 25 · 68

A much needed data release! Excited to tinker with the data.

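Summary: How do you train small agent models that are strong at terminal and coding tasks? Announcing OpenThoughts-Agent and OpenThinkerAgent-32B, the strongest open-data agent model based on Qwen-3, averaging 44.8% across 7 agent benchmarks. Nathan Lambert calls it a much-needed data release and says he is excited.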

Nathan Lambert @natolambert · June 25 · 14

"Knowledge wants to be free" is an unofficial Interconnects mission statement, courtesy of @xeophon


Berryxia.AI @berryxia · June 25 · 78

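Qwen has trained a language world model, Qwen-AgentWorld, that can simulate 7 kinds of agent environments. Rather than training an agent first and adding environments afterward, it treats environment modeling as the core training objective from the start. The model has to learn to predict what a terminal will output, how a web page will change, and how state changes after code runs, not just how to operate. The team explored two directions. One uses the world model as a high-quality environment simulator for controllable Sim RL, and found that agents trained in the simulated environment can even outperform agents trained in the real environment on some tasks. The other is more interesting: training the model only on environment prediction, with no agent training at all, yields a prediction ability that transfers directly to real multi-turn agent tasks, with clear gains on multiple benchmarks, including some entirely unseen domains. Qwen has open-sourced a 35B MoE version and the corresponding benchmark. The core idea is clear: to make an agent stronger, first make it truly understand the environment rather than only teaching it how to act.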

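Summary: Qwen-AgentWorld is a native language world model trained end to end with environment modeling as the objective, rather than adapted after the fact. A single model simulates 7 agent environments (MCP, Search, Terminal, SWE, Web, OS, and Android) and outperforms Claude Opus 4.8 and GPT-5.4 on AgentWorldBench. Two directions were explored: 1) using the world model as the environment simulator for controllable Sim RL, where agents trained in simulation outperform agents trained in real environments on some tasks; 2) environment prediction alone (no agent training) transfers to multi-turn agent tasks without fine-tuning, with gains on multiple benchmarks. A 35B MoE version and the corresponding benchmark have been open-sourced.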
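To make the "world model as environment simulator" idea concrete, here is a toy Python sketch of the interface such a setup implies: given a state and an action, predict the observation and the next state, then roll an agent out against those predictions instead of a real environment. Everything here is an assumption for illustration: `WorldModel`, `ToyTerminalModel`, `rollout`, and `scripted_policy` are hypothetical names, and the rule-based "model" merely stands in for a learned one. It does not reflect Qwen-AgentWorld's actual interface or training setup.

```python
from typing import Callable, Dict, List, Protocol, Tuple


class WorldModel(Protocol):
    """Hypothetical interface: predict (observation, next_state) for an action."""

    def predict(self, state: Dict, action: str) -> Tuple[str, Dict]: ...


class ToyTerminalModel:
    """Rule-based stand-in for a learned terminal world model."""

    def predict(self, state: Dict, action: str) -> Tuple[str, Dict]:
        files = set(state.get("files", ()))
        parts = action.split()
        if len(parts) == 2 and parts[0] == "touch":
            # Predicted effect: the file exists afterward, no terminal output.
            return "", {"files": sorted(files | {parts[1]})}
        if parts == ["ls"]:
            return "\n".join(sorted(files)), {"files": sorted(files)}
        name = parts[0] if parts else ""
        return f"sh: command not found: {name}", {"files": sorted(files)}


def rollout(
    model: WorldModel,
    policy: Callable[[Dict], str],
    state: Dict,
    goal: Callable[[Dict], bool],
    max_steps: int = 5,
) -> Tuple[List[Tuple[str, str]], float]:
    """Run the policy against predicted transitions; reward 1.0 if the goal is met."""
    trajectory: List[Tuple[str, str]] = []
    for _ in range(max_steps):
        action = policy(state)
        observation, state = model.predict(state, action)
        trajectory.append((action, observation))
        if goal(state):
            return trajectory, 1.0
    return trajectory, 0.0


def scripted_policy(state: Dict) -> str:
    # Hypothetical agent: create the file if it is missing, otherwise list files.
    return "ls" if "notes.txt" in state["files"] else "touch notes.txt"


trajectory, reward = rollout(
    ToyTerminalModel(),
    scripted_policy,
    {"files": []},
    goal=lambda s: "notes.txt" in s["files"],
)
print(trajectory, reward)
```

In a Sim RL setting, the reward from such simulated rollouts would drive policy updates; the sketch stops at collecting the trajectory and reward.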

Rohan Paul @rohanpaul_ai · June 25 · 57

Sentient Foundation just launched a $42M open-source AGI funding program to back researchers, developers, and startups building advanced AI outside closed corporate stacks. The program has 2 tracks: grants with no equity and no claim on the work, and investments for companies turning open-source AI into commercial products while keeping openness central. Projects can qualify without open-sourcing everything, as long as at least one essential part is open, useful, and important to adoption. The review standard is technical quality, ecosystem value, openness, and long-term potential, which makes the program more like infrastructure funding than charity. imo, the strongest part is the split between public-good grants and startup capital, because open AI needs both unpaid research infrastructure and companies that can survive market pressure.

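Summary: Sentient Foundation recently launched a $42 million open-source AGI funding program to support researchers, developers, and startups. It has two tracks: grants with no equity and no claim on the work, and investments for companies commercializing open-source AI. Applicants need not open-source everything; at least one essential part must be open. Review criteria include technical quality, ecosystem value, openness, and long-term potential. The program is run in partnership with Alibaba Cloud, Franklin Templeton, Princeton University, and the Indian Institute of Science, and aims to keep AGI open, decentralized, and aligned with human interests.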

OpenClaw🦞 @openclaw · June 24 · 46

🦞 OpenClaw 2026.6.10 just dropped. Just a small release to keep things brewing: ⚡ Automatic fast mode for short talks 🧠 Much more reliable model routing 🔒 Safer session state + trusted policies 🛠️ Better provider onboarding Helping deliver rock-solid lobsters. 🦞 https://github.com/openclaw/openclaw/releases/tag/v2026.6.10


OpenBMB @OpenBMB · June 24 · 63

Big thanks to @JackdeS11 for bringing VoxCPM-0.5B fully on‑device to iPhone! 🎉❤️ The entire stack (MiniCPM4 + LocDiT flow‑matching + AudioVAE) runs on Neural Engine and GPU, with no network required. Great work! 👍👍

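Summary: VoxCPM-0.5B, the diffusion-based TTS model from OpenBMB (面壁智能), has been deployed fully on-device to iPhone through Apple Core AI, with no network connection required. The model combines the MiniCPM4 language model, LocDiT flow-matching, and AudioVAE, and every layer runs on the Neural Engine and GPU. The model weights and deployment code are open-sourced on HuggingFace and GitHub.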

elvis @omarsar0 · June 24 · 45

This is just wild! I really like the Google Workspace CLI project. Sad to see this kind of treatment of folks who want to push interesting ideas out. There is so much to explore that it's puzzling to see this.

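Summary: Former Google employee @JPoehnelt says he was fired two months ago for creating the (unofficial) Google Workspace CLI. After release, the tool quickly reached #1 on Hacker News and gained thousands of GitHub stars and tens of thousands of users. He recalls a confusing sequence of events, from a director asking how to learn the tool to the legal team questioning why the GitHub repository used Google's logo and brand colors. He believes he was fired because Workspace and some leaders feared being disrupted, particularly by agents. Ironically, two days before he was fired, Google announced at Cloud Next that an official Workspace CLI was coming. He spent nearly 7 years at Google and thanks his team for their support.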

swyx 🔜 @aiDotEngineer @swyx · June 24 · 41

btw Zai IPO'ed in Jan at HK$120 a share. when I first met @louszbd nobody really knew anyone using GLM's. now they have beat deepseek with the world's undisputed top open model and in some respects (see @ml_angelopoulos) say top model period, and are returning to SF @aidotengineer on top of the world and open for business! excited for @Thom_Wolf and @ZixuanLi_ to chat onstage!

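Summary: Zhipu AI (Zai) IPO'ed in Hong Kong in January at HK$120 per share. Its GLM-5.2 model beat DeepSeek to become what is widely regarded as the world's best open-source model, and it leads overall on some benchmarks. The team is making its first appearance in Silicon Valley at the AI Engineer World's Fair, where it will share its latest work.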

Peter Steinberger 🦞 @steipete · June 24 · 56

Google fired the guy that made the google workspace cli, because he made the google workspace cli. Lucky me, Google can't fire me. https://gogcli.sh

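Summary: Peter Steinberger mocks Google for firing @JPoehnelt, the employee who created the Google Workspace CLI, and promotes his own alternative tool, gogcli.sh. According to @JPoehnelt, he was fired two months ago for developing the CLI; the tool had reached #1 on Hacker News and gained thousands of GitHub stars and many users within days. He attributes the firing to fear inside Workspace that AI agents would disrupt the status quo. Ironically, two days before he was fired he had just learned that Google Cloud Next would announce an official Workspace CLI.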