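Chinese company z.AI has open-sourced its GLM-5.2 model under the MIT license. The model has a one-million-token context window, was trained on Huawei Ascend chips, and performs close to Claude Opus 4.8 and GPT-5.5. Meanwhile, U.S. companies including Amazon, Meta, and Uber have begun capping AI budgets because engineers were burning too many tokens (Uber's cap is $1,500 per employee), which is driving demand for open-source models. The GLM team grew out of an academic project and has long adapted its work to domestic chips; DeepSeek has $2.8 billion of investment behind it. Together they have become the alternative to the "Tokenmaxxing" trend.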
http://x.com/i/article/2069762663366975488
Tokenmaxxing is dying, and Chinese open-source models fill the gap
Amazon, Meta, and Uber are capping token spend as GLM-5.2 and DeepSeek give their models away for free.
Over the past week, a new Chinese model called GLM-5.2 has set off another round of alarm in Silicon Valley. Released by the company z.AI under a permissive open-source license, it takes direct aim at the coding and agentic-workflow business that Anthropic has built its reputation on - and running on a one-million-token context window, it lands surprisingly close to Claude Opus 4.8 and OpenAI's GPT-5.5. The open-source community is ecstatic.
At the same moment, America's "unlimited AI credits" mania is draining away. Amazon, Meta and others are killing their no-limits AI plans. After Uber's engineers burned through a full year's AI budget in four months, the company capped each employee at $1,500. Even Microsoft CEO Satya Nadella has warned that the industry can't let a few AI giants swallow the whole economy.
The link between open-source models and what people now call "Tokenmaxxing" is simple enough: programmers burn too many tokens, the bills get too big, and faced with a mountain of invoices, people reach for the open-source option.
This is not the Tokenmaxxing takedown you've read on Substack, though. Because a few questions kept nagging at me. If open-source models can do the job, why is anyone still topping up their Claude account? And if everyone runs to open-source, how does anyone building a model make money?
It was only after GLM-5.2 shipped that I arrived at a first answer. Both of these waves - the rush to open-source and the rush to burn tokens - come down to the same thing: how we decide to think about a token.
Born Out of Scarcity
Start with the open-source side, and start with GLM-5.2.
Z.ai has released the core weights of GLM-5.2 under an unrestricted MIT license. Any company can download it free from Hugging Face, customize or fine-tune it, and run it locally or on a virtual machine. Standing the thing up is still a slog, but next to the now-delisted Fable 5, it's a genuinely good option. The model was built on Huawei's Ascend chips - no Nvidia hardware involved.
But GLM-5.2 is not another DeepSeek. DeepSeek's Liang Wenfeng came out of a quant fund, is worth billions, and has chosen near-total seclusion. (He recently put about $2.8 billion of fresh money into DeepSeek.)
Z.ai, by contrast, is an open-source model maker that's already publicly listed in Hong Kong. It has no billionaire patron, and its road has been every bit as winding as DeepSeek's.
In 2020, BAAI's Tang Jie argued the language model still deserved the effort. Of BAAI's 480 A100 cards, 400 went to Tang's team.
Tang also tried Huawei's 910A and 920 chips. On large-model training, the 920's operator efficiency was just 18% of an A100's; after Tang's team helped rewrite the operators, they pushed it to roughly 40%, and trained a 13B code model, CodeGeeX.
But Tang's real goal was a 100B-parameter model, and even 2,000 910A cards weren't enough. In the end, Tang turned to z.AI, the company he'd founded back in 2018, and rented 1,000 cards. In July 2022, they finally had their hundred-billion model: GLM-130B.
I tell his story because he embodies the type. Most of China's open-source AI companies grew out of academic projects; they incorporated mainly because they needed to buy compute, and they open-sourced their architecture to keep their academic visibility.
Starved of chips, they learned to adapt to whatever domestic silicon they could get. Z.ai wasn't placed on the U.S. entity list until 2025, but it was already optimizing for Huawei chips in 2020. Localized compute and open architecture became, almost by default, the signature of Chinese AI.
The open-source bet has its skeptics inside China, too. In 2024, Baidu founder Robin Li argued that closed models were more powerful and cheaper to run than open ones. His point was that closed models came with more compute and bigger teams, and that ERNIE was nearly a match for ChatGPT. (A little ironic, isn't it?) ERNIE was not, in fact, in ChatGPT's league, and China never produced a closed model strong enough to make Li's case.
Turning open-source into profit is a hard problem. In a 2025 interview, a z.AI expert described the company's three possible lanes - inference, agentic, and coding - and said z.AI chose coding. MiniMax, by contrast, chose multimodal AI and AI companionship. At the time it wasn't an obvious call: z.AI's business leaned on enterprise and government contracts, coding showed no clear path to profit, and multimodal could win consumers directly. Z.ai was not the favorite.
Then the AI-coding boom arrived. Z.ai's latest results show a net loss of about ¥3.18B ($444M) against R&D spending of roughly ¥3.2B ($444M). Still in the red - but strip out the open-ended spend on compute, and z.ai's revenue can cover day-to-day operations. If it can get cheaper chips, or use its chips more efficiently, or land a wave of enterprise buyers, the losses could narrow. That would be good news.
In a sense, z.AI may owe Anthropic a thank-you note: both for the AI-doom evangelism and for the AI-coding fervor. Anthropic's strong models cultivated customers, and its incessant messaging then drove some of them away. One of the places those customers landed was z.AI.
A first conclusion, then: going open-source is a passive choice. It is a Chinese model maker admitting, out loud, that it's behind on both compute and model quality. But if closed-model progress stalls, users won't keep paying premium prices for closed-model tokens; they'll choose open-source on their own. The Chinese saying fits: just hold your plate steady, and the roast duck falls from the sky.
Water, Electricity, and a Bad Analogy
Now the other wave: Tokenmaxxing.
GLM-5.2, DeepSeek and Kimi are mostly catching customers who fled the bills. But if OpenAI and Anthropic were good enough, would open-source still persuade anyone?
Then Alibaba gave me a frame. In a March internal memo, CEO Wu Yongming argued that in the AI era, the token would become a basic factor of production, the way traffic was in the internet era. Alibaba set up the Alibaba Token Hub (ATH) around that idea.
Follow the logic. In the age of electrification, a country's electricity output and its GDP growth tend to rise together - no nation ever went bankrupt building power plants. So I looked at U.S. electricity prices, consumption and GDP from the 1920s to the 1960s.
As prices fell, total spending on electricity rose 6.2x, but nominal GDP rose 11.1x. Americans spent relatively less on power and got more output for it.
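To see what those two multiples imply, here is a back-of-envelope sketch in Python. It uses only the 6.2x and 11.1x figures quoted above; the function name is my own, and the result says nothing about the absolute share, only how it moved.

```python
def share_change(spending_multiple: float, gdp_multiple: float) -> float:
    """Factor by which an item's share of GDP changes when spending
    on it and GDP each grow by the given multiples."""
    return spending_multiple / gdp_multiple

# Figures quoted in the text: electricity spending 6.2x, nominal GDP 11.1x.
electricity_factor = share_change(6.2, 11.1)

print(f"Electricity's share of GDP ended at about {electricity_factor:.0%} "
      f"of its starting level")
```

Put differently, the slice of the economy spent on power shrank by a bit more than two fifths while output grew elevenfold.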
The pattern doesn't always hold cleanly, though. Through the fast-industrializing decades in Japan, China, and West Germany, electricity spending actually outran GDP. But in West Germany and Japan, even during those high-growth years, the share of GDP eaten by electricity fell sharply to almost 2.0%.
That suggests a kind of lag: a rising industrial economy takes roughly fifteen years to work through the adjustment and reach the point where cheap power finally translates into abundant output.
If Wu is right and tokens really are AI's water and electricity, they ought to deliver something similar. But run the numbers and the story breaks. Over the past four years, the cost of a given unit of AI dropped more than 90 percent, while total token spending rose 70x. My god.
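A rough sketch of what those two numbers imply together, assuming spending is simply price times volume and reading "a given unit of AI" as a single average token price. Both are my simplifications, and the function name is illustrative.

```python
def implied_volume_multiple(spend_multiple: float, price_drop: float) -> float:
    """If spending = price * volume, the volume multiple is the spending
    multiple divided by the fraction of the old price that remains."""
    remaining_price = 1.0 - price_drop
    return spend_multiple / remaining_price

# Figures quoted in the text: spending up 70x, unit cost down 90 percent.
volume = implied_volume_multiple(70, 0.90)

print(f"Implied growth in tokens consumed: about {volume:.0f}x")
```

Because the text says the price fell by more than 90 percent, this is a floor on the implied growth in volume, not a point estimate.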
If this is water and electricity, the bill is climbing far too fast. A seventyfold jump in token spending over four years has not produced anything like a matching surge in what society actually makes. Yes, the data centers went up, and the chips are back-ordered for months. But none of it has meaningfully improved the quality or efficiency of production outside the AI industry itself.
What breaks the "AI as utility" analogy is the reasoning model. Across coding and agentic tasks, a model now generates thousands of internal reasoning tokens before it answers, pushing single-task consumption 10 to 100 times higher than older models.
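A small sketch of why cheaper tokens don't mean cheaper tasks. It combines the 90 percent price drop cited earlier with the 10x to 100x range above, and assumes the price drop applies evenly to every token, which is a simplification of mine.

```python
def per_task_cost_multiple(price_multiple: float, tokens_multiple: float) -> float:
    """Cost of one task relative to before, when the price per token and
    the tokens consumed per task both change."""
    return price_multiple * tokens_multiple

PRICE_MULTIPLE = 0.10  # a 90 percent drop leaves 10 percent of the old price

low = per_task_cost_multiple(PRICE_MULTIPLE, 10)    # reasoning uses 10x the tokens
high = per_task_cost_multiple(PRICE_MULTIPLE, 100)  # reasoning uses 100x the tokens

print(f"Per-task cost relative to the old model: {low:.0f}x to {high:.0f}x")
```

Under those assumptions, a task costs at best the same as before and at worst about ten times more, despite the cheaper token.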
So how much does all that buy you? In an NBER paper, DeMiller, Musolff and Yang measured the gains from AI coding tools across four stages of work:
Writing a single file: +290%
Bulk work: +150%
A specific deliverable: +50%
A shipped, delivered product: +30%
In other words, even in coding - the thing AI does best - the gains shrink fast as you zoom out from a single file to a finished product. Optimizing the whole pipeline is far harder than optimizing one slice of it.
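One way to read that shrinkage is through Amdahl's law: speed up only part of a process and the overall gain is capped by the part you left alone. The sketch below is my framing, not the paper's method. It assumes the +290% and +30% figures can be treated as throughput multipliers (3.9x and 1.3x), which the paper may define differently.

```python
def overall_speedup(fraction_accelerated: float, local_speedup: float) -> float:
    """Amdahl's law: whole-process speedup when only a fraction of the
    work is accelerated by local_speedup."""
    return 1.0 / ((1.0 - fraction_accelerated)
                  + fraction_accelerated / local_speedup)

def implied_fraction(overall: float, local_speedup: float) -> float:
    """Invert Amdahl's law: what fraction of the work must have been
    accelerated to produce the observed overall speedup?"""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / local_speedup)

file_speedup = 1 + 2.90     # "+290%" read as 3.9x (assumption)
product_speedup = 1 + 0.30  # "+30%" read as 1.3x (assumption)

p = implied_fraction(product_speedup, file_speedup)
print(f"Implied share of delivery work that got faster: {p:.0%}")
```

On this reading, file-level code writing accounts for only about a third of what it takes to ship a product, and the rest of the pipeline moves at its old speed.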
Three Months of Unlimited Tokens
As latecomers, Chinese firms tried to copy the Tokenmaxxing wave too. Per public reports in March, Tencent gave core R&D teams an annual token package worth about $31,700 each, plus $1,000 a month for outside tools; ByteDance opened its internal AI tools for unlimited use and reimbursed half of employees' personal AI experiments, capping technical staff at $1,000 a year; Baidu handed engineers unlimited ERNIE access plus up to $800 a year for outside tokens; 360 simply loaded every employee with 100 million tokens.
The recalibration came fast. Three months later, Tencent's Hunyuan team was capped at roughly $970 worth of outside models, and everyone moved onto quotas - though using Tencent's own Hunyuan model stayed unlimited. ByteDance staff likewise faced no limit on its in-house TRAE tool. Internally, Tencent came out against usage rankings, refusing to treat token consumption as a single yardstick for output.
The reason was simple: Chinese companies wanted real output, and they weren't seeing it. One employee, speaking anonymously, described a team that built workflows across several different models - only to find the AI-generated pieces wouldn't fit together, and to scrap the whole thing and start over. Twenty-odd people spent about $6,900 in tokens in a month and had nothing to show for it. At some firms, the free tokens got quietly repurposed - for analyzing stocks, say - and the company had no idea where they'd gone.
Meta is tightening what employees can spend on Anthropic and other providers - a sharp reversal from the scene a few months earlier, when staff competed to burn tokens. Bloomberg has reported that Uber and Walmart each capped AI coding-tool use; the Financial Times reported that Amazon scrapped the internal leaderboard that ranked employees by AI usage.
A June report from the consultancy Bain, titled "Your AI Budget Is Growing. Your Returns Aren't. Here's Why.", found that among companies able to quantify AI's cost savings, 40 percent saw actual savings of 10 percent or less. Of the 37 percent who'd targeted savings of 11 to 20 percent, only 31 percent actually got there.
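For scale, a quick sketch of how small that successful group is. It assumes the 31 percent is a share of the 37 percent who set the 11-to-20 target, as the sentence above reads; the variable names are mine.

```python
# Shares quoted in the text, among companies able to quantify savings.
targeted_11_to_20 = 0.37  # set a savings target of 11 to 20 percent
hit_target = 0.31         # of that group, actually achieved it

# Share of all quantifying companies that both set the target and met it.
achieved_share = targeted_11_to_20 * hit_target

print(f"Set the 11-20% target and hit it: about {achieved_share:.0%} of companies")
```

That works out to roughly one company in nine.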
The grassroots buying isn't over, though. One ByteDance engineer pays for Claude Max - $100 a month reimbursed - to write what he considers the cleanest code. Better than DeepSeek, by his lights, and GLM he can't get. But one employee's purchase doesn't make the whole company better off. Tokenmaxxing shifts an individual's cost onto the employer.
The irony is that the last firm into the water was the first one out. Tencent, a relative laggard in China's AI race, quit Tokenmaxxing earlier than anyone. ByteDance is still touting its numbers: as of June, it says, daily token calls to its Doubao model topped 180 trillion, up more than tenfold in a year.