Tokenmaxxing is dying, and Chinese open-source models fill the gap

Amazon, Meta, and Uber are capping token spend as GLM-5.2 and DeepSeek give their models away for free.

Over the past week, a new Chinese model called GLM-5.2 has set off another round of alarm in Silicon Valley. Released by Z.ai under a permissive open-source license, it takes direct aim at the coding and agentic-workflow business that Anthropic has built its reputation on. It offers a one-million-token context window, and its performance lands surprisingly close to Claude Opus 4.8 and OpenAI's GPT-5.5. The open-source community is ecstatic.

At the same moment, America's "unlimited AI credits" mania is draining away. Amazon, Meta, and others are killing their no-limits AI plans. After Uber's engineers burned through a full year's AI budget in four months, the company capped each employee at $1,500. Even Microsoft CEO Satya Nadella has warned that the industry can't let a few AI giants swallow the whole economy.

The link between open-source models and what people now call "Tokenmaxxing" is simple enough: programmers burn too many tokens, the bills get too big, and, faced with a mountain of invoices, people reach for the open-source option.

This is not the Tokenmaxxing takedown you've read on Substack, though, because a few questions kept nagging at me. If open-source models can do the job, why is anyone still topping up their Claude account? And if everyone runs to open-source, how does anyone building a model make money?

It was only after GLM-5.2 shipped that I arrived at a first answer. Both of these waves, the rush to open-source and the rush to burn tokens, come down to the same thing: how we decide to think about a token.

Born Out of Scarcity

Start with the open-source side, and start with GLM-5.2.

Z.ai has released the core weights of GLM-5.2 under the permissive MIT license. Any company can download them for free from Hugging Face, customize or fine-tune the model, and run it locally or on a virtual machine. Standing the thing up is still a slog, but next to the now-delisted Fable 5, it's a genuinely good option. The model was trained on Huawei's Ascend chips, with no Nvidia hardware involved.

X.PIN@thexpin · X

2026-06-25 16:58

http://x.com/i/article/2069762663366975488
