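# Frontier Radar #3: How agentic AI is turning tokens into a business metric

- Source: The Decoder: AI News (RSS)
- Author: Maximilian Schreiner
- Published: 2026-06-08 21:54
- AIHOT score: 62
- AIHOT link: https://aihot.virxact.com/items/cmq5af63v0616slt2m3cmrduf
- Original link: https://the-decoder.com/frontier-radar-3-how-agentic-ai-is-turning-tokens-into-a-business-metric

## AI Summary

The business model of generative AI is shifting from monthly subscriptions to billing by token consumption. Agentic workflows consume several times as many tokens as conventional chat and can run autonomously for hours, which makes flat rates unsustainable. Token prices vary with response speed, degree of specialization, and the economic value of the result. This issue traces the migration from subscription to usage-based pricing, the real costs behind seemingly low per-token prices, and why measuring AI value creation by token consumption alone is one-sided.

## Article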

Monthly subscription, open chat, ask a question: that's how generative AI worked until now. Agentic workflows blow up this model. They burn through far more tokens, run autonomously for hours, and make flat rates untenable for providers. At the same time, token prices are splitting along axes of speed, specialization, and economic value. But while costs get more precise, the benefits often stay vague. The result: token usage becomes a stand-in metric for value creation, even though it only measures activity, not outcomes.

Six times a year, THE DECODER's editorial team takes an in-depth look at a fundamental AI topic in its "Frontier Radar," as a newsletter and exclusively here on the site for THE DECODER subscribers. Issue #3 covers the emerging token economy of generative AI. Issue #1 looked at the current state of agentic AI. Issue #2 examined the measurable impact of AI on productivity.

For a long time, generative AI felt like classic software. Sign up for a monthly plan, open a chat, ask a question, and get an answer. Power users could always see through APIs what individual requests actually cost. That's why many of them went with flat rates, which were much cheaper under heavy use. For most users, though, the costs stayed invisible.

Flat rates worked broadly because human usage has natural limits. People type slowly, read answers, take breaks, go to meetings, and clock out. An agent doesn't know those limits. It reads files, calls tools, writes code, checks intermediate results, fixes errors, and tries again. If the user wants, it keeps going until the task is done.

There's also the pressure on the provider side: The big AI companies have poured hundreds of billions of dollars into data centers, chips, and model training. Those investments have to pay off, at a scale that flat rates simply can't support.

This issue of the Frontier Radar maps out the emerging token economy along these lines. How is billing shifting from subscription to usage? How is the token itself becoming a segmented product? And why is token usage still a poor measure of AI value?

### Why providers are walking away from flat rates

The most visible change is the overhaul of pricing models in response to growing usage. Starting June 1, 2026, GitHub Copilot is gradually moving to a usage-based model with "GitHub AI Credits." The credits are tied to actual token usage and the API prices of each model. They kick in wherever Copilot does more than just suggest code, mainly in chat, CLI, and agent features. Standard completions stay free of these rules in paid plans.

GitHub's reasoning nails the problem: a short chat question used to be treated about the same as an autonomous coding session running for hours. That can't last.

Anthropic is also drawing a sharper line between normal use and agentic workflows. Claude Code, Claude Cowork, and Managed Agents turn Claude into a digital worker. Anthropic blamed bottlenecks in Claude Code on peak loads and contexts of up to one million tokens. The older plans fit heavy chat use but not always-on agent workflows.

How sharply usage differs between fields shows up in Anthropic's own analysis of its public API: nearly half of all agentic tool calls go to software development, the area that first benefited from agentic models and scaffolding like Claude Code.

Customer service, sales, finance, and e-commerce each sit at just a few percent. Simple chat requests still dominate there. That spread will likely widen as agentic workflows mature in office, research, finance, and legal tools. With it, the token bill moves into areas where it isn't yet felt today.

### Why the token price alone is misleading

This development shifts the cost question: As long as AI was mainly used as a chat tool, the price per token could feel like a technical footnote. In agentic workflows, it becomes a business metric.

The most obvious mistake in the new token economy is a flat price comparison. GPT-5.5 costs $30 per million output tokens, DeepSeek V4 Pro 87 cents. That says little about actual costs in use. Beyond price per token, what matters is consumption per task. As with a car, the price of gas alone tells you nothing about what a drive from Berlin to Munich costs. You also have to know the distance and the mileage.

A cheap model can get expensive if it needs more tries, fails more often, or requires more cleanup. A pricier model pays off when it gets to the goal with fewer loops and needs less human oversight.
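The trade-off can be sketched with a little arithmetic. The following is a minimal illustration, not a pricing model: only the two output prices ($30 and $0.87 per million tokens) come from the text, while the token counts per attempt, the success rates, the human review cost, and the function name are made-up assumptions. It also prices output tokens only and treats attempts as independent.

```python
def expected_cost_per_solved_task(price_per_million_usd, tokens_per_attempt,
                                  success_rate, review_cost_usd=0.0):
    """Expected spend until one attempt succeeds.

    Assumes independent attempts with a fixed success rate, so the expected
    number of attempts is 1 / success_rate. review_cost_usd is a hypothetical
    per-attempt cost for a human checking the output.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    token_cost = price_per_million_usd * tokens_per_attempt / 1_000_000
    return (token_cost + review_cost_usd) / success_rate


# Illustrative inputs: the pricey model needs fewer tokens and fails less often.
pricey = expected_cost_per_solved_task(30.0, 20_000, 0.9, review_cost_usd=5.0)
cheap = expected_cost_per_solved_task(0.87, 60_000, 0.5, review_cost_usd=5.0)
print(f"pricey model: ${pricey:.2f} per solved task")
print(f"cheap model:  ${cheap:.2f} per solved task")
```

With these assumed inputs, the cheap model wins on tokens alone but loses once each attempt has to be checked by a person: roughly $10.10 versus $6.22 per solved task.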

Benchmarks and other analyses make this clear. GPT-5.5, for instance, was supposed to offset part of its higher list price with shorter answers. An analysis of real-world usage by OpenRouter still showed cost increases of 49 to 92 percent over its predecessor, depending on input length.

Of course, both can rise: the token price and the number of tokens consumed, as with Google's Gemini 3.5 Flash. Here, the token price jumped threefold over the predecessor Gemini 3 Flash. In Artificial Analysis's evaluation, the model also needed more steps in the Intelligence Index run. The result: in that test, it ended up more expensive than Google's current flagship, Gemini 3.1 Pro.

Pushing the other way is the price pressure from providers like DeepSeek. Behind the rock-bottom prices is a bet of its own: if you pay only a fraction per token, you can run the same job four or five times and still come out cheaper. As long as the final result holds up, that's attractive. Where it doesn't, rework quickly eats the price advantage.
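How much room that bet leaves can be estimated directly from the list prices. This is a rough sketch under stated assumptions: it compares output prices only, the function name is hypothetical, and `token_ratio` (how many more tokens the cheap model needs per run) is an illustrative parameter, not a measured value.

```python
import math


def affordable_reruns(expensive_price, cheap_price, token_ratio=1.0):
    """Full runs of the cheaper model that cost no more than one run of the
    expensive model.

    Prices are per million tokens. token_ratio is the cheap model's tokens
    per run divided by the expensive model's tokens per run.
    """
    if min(expensive_price, cheap_price, token_ratio) <= 0:
        raise ValueError("prices and token_ratio must be positive")
    return math.floor(expensive_price / (cheap_price * token_ratio))


print(affordable_reruns(30.0, 0.87))                 # same token count per run
print(affordable_reruns(30.0, 0.87, token_ratio=7))  # cheap model needs 7x the tokens
```

The headroom shrinks quickly as the cheaper model needs more tokens per run, and the calculation ignores the human rework that a weak result triggers.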

### How the token market is splitting by performance class

The more the market splits, the less sense it makes to talk about "the" token price. The price per million tokens still matters but only says something within a clear performance class. A fast token in a coding agent, a cheap token in a mass-market app, and a specialized token in security analysis can be billed in similar technical fashion, but they're different economic products.

Different model tiers and subscription levels have existed for a while. What's new is that the differentiation now spans more axes: latency, processing mode, context size, agent runtime, specialization, and increasingly the economic value of the output. Providers aren't just selling compute time in token form anymore. They're selling different inference services. The scarcer, faster, or more valuable that service is, the further the price can drift from raw compute costs.

Nvidia CEO Jensen Huang spelled this out in two recent interviews. On Dwarkesh Patel's show, he explains why Nvidia recently licensed the inference architecture of startup Groq and folded it into its own CUDA ecosystem. The reason is economic: the value of a token has risen so much that different prices for different token types now make sense.

> Back in the old days, just a couple of years ago, Tokens were either free or barely expensive. But now you can have different customers, and those customers want different answers. Because the customers make so much money - for example, our software engineers - if I can give them much more responsive Tokens so that they're even more productive than they are today, I would pay for it.
>
> Jensen Huang, Nvidia

Huang is describing the technical side of this segmentation. Premium inference with lower latency pays off because tokens at the top of the market can command much higher prices. Nvidia talks about expanding the Pareto front: multiple optimal points of price and speed, depending on customer segment.

Where the value comes from the possible outcome, further segmentation becomes possible. According to The Information, Palo Alto Networks tested Anthropic's security model Mythos to scan its own source code for vulnerabilities. The model reportedly found more than two dozen critical vulnerabilities in about three weeks, roughly five times as many as existing methods.

At the same time, the test quickly racked up token costs in the millions. Those costs can still be rational if the security holes found would cost many times more if exploited. The token in a run like that is economically a different product than the token in a chat reply, even if both are billed by token usage.

Another form of this segmentation shows up where tokens open access to proprietary data and specialized models. British biotech company Basecamp Research wants to scale its biological AI dataset from 10 billion to one trillion genes and other data points with its "Trillion Gene Atlas" project, to train models for drug development. The dataset is proprietary.

If such models deliver solid intermediate products like drug candidates or biologically viable hypotheses, a token run can't be compared to a chat or coding reply anymore. What matters then isn't what the token run costs technically, but what exclusive access it opens up: to proprietary data, specialized models, and possible intermediate products with high economic value.

In conversation with Lex Fridman, Huang puts it this way: computers used to be warehouses for data, today they're factories for tokens. And like every factory, this one produces several products at the same time.

> The Tokens are starting to segment, like iPhones. You have free Tokens, you have premium Tokens, and you have several Tokens in the middle. […] The idea that somebody's willing to pay $1,000 per million Tokens is just around the corner. It's not if, it's only when.
>
> Jensen Huang

In Huang's reading, a market with clearly tiered segments is taking shape: tokens are increasingly tied to different value propositions.

### The productivity gap and the temptation of tokenmaxxing

Agentic AI is billed by usage, and token prices are splitting by performance class. The cost side of AI use becomes more precise, higher, and more visible. That sharpens the questions: Does AI save time? Does it make people more productive? Does the spend pay off?

But the math is lopsided. Costs can be measured ever more exactly, while the benefits often stay vague: better decisions, faster research, less routine work, or earlier error detection.

We already described this gap between local productivity gains and the difficulty of measuring impact in Frontier Radar #2: Why AI productivity gets lost between benchmarks and the balance sheet.

Uber shows how hard the attribution gets even inside a single company. According to Fortune, the company burned through its planned 2026 AI coding tools budget in just four months. Uber COO Andrew Macdonald questioned whether rising use of Claude Code clearly translates into more useful consumer features. Token costs are known down to the cent. Whether they turn into products that users actually need, and that show up positively on the bottom line, is an open question.

One level up, in national accounting, the problem gets more fundamental. SemiAnalysis calls it "Dark Output": AI may be doing economically valuable work that barely shows up in traditional statistics. It becomes especially visible when tasks once paid for as consulting hours, legal services, or external contracts move into internal AI workflows. The token or cloud costs stay measurable, but the value of the work done no longer appears as its own transaction in GDP.

SemiAnalysis's argument: unlike screws or cars, the service sector has no countable unit of quantity. Statistical agencies derive the "volume" of services from revenue and list prices. If invoices from a law firm or agency disappear because the same work is done internally with AI, the statistics read that as an output decline, not a productivity gain.

Out of this double measurement gap comes a pragmatic stopgap in management. Because clean impact measurement is missing, token usage itself becomes the steering metric. More tokens, more agent runs, and higher tool adoption get read as signals of more value creation, even when nobody can cleanly prove the link. A term has emerged for this reflex: tokenmaxxing.

Tokenmaxxing is the assumption that more AI use automatically brings more benefit. The appeal of this thinking is its simplicity: if AI generally makes you productive, then more AI is generally better. And the only reliable measure of "more AI" is token usage. But that measures activity, not outcome. An agent that spends two hours solving a task wrong burns more tokens than one that solves it correctly in five minutes. In tokenmaxxing logic, the first would look more productive.
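The difference between measuring activity and measuring outcome can be shown with two toy runs. This is a hedged sketch: the token counts are invented for illustration, and `AgentRun` and `tokens_per_success` are hypothetical names rather than part of any real tool.

```python
from dataclasses import dataclass


@dataclass
class AgentRun:
    name: str
    tokens_used: int
    succeeded: bool


def tokens_per_success(runs):
    """Total tokens divided by the number of successful runs.

    Returns infinity when nothing succeeded, so pure activity never
    looks like productivity.
    """
    successes = sum(1 for r in runs if r.succeeded)
    total = sum(r.tokens_used for r in runs)
    return total / successes if successes else float("inf")


slow_wrong = AgentRun("two-hour run, wrong answer", 2_000_000, False)
fast_right = AgentRun("five-minute run, correct answer", 80_000, True)

# Tokenmaxxing view: rank by raw usage, so the failed run comes out on top.
most_active = max([slow_wrong, fast_right], key=lambda r: r.tokens_used)

# Outcome view: tokens spent per solved task.
print(most_active.name)
print(tokens_per_success([slow_wrong]), tokens_per_success([fast_right]))
```

Ranked by usage, the failed run looks like the most productive one. Ranked by tokens per solved task, it is infinitely expensive.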

Agentic AI makes the problem worse in two ways. First, consumption rises massively. Second, the immediate human quality check falls away. In chat, the user sees the answer right away and judges it in the same second. An agent runs autonomously for minutes or hours and delivers a result at the end that has to be checked, fixed, or thrown out. Until then, token usage is the only signal about the run.

That's what makes tokenmaxxing so seductive in agentic systems: Once usage becomes the goal, the incentive is to burn tokens. Big tech companies like Meta and Amazon have already learned this the hard way.

### Why agentic AI needs clear task framing

If token usage alone isn't a reliable steering metric, control has to start earlier: with the task itself, long before the output is generated. This is the real break with past practice. In chat, a bad prompt fails cheaply. The user sees the useless answer, rewrites, and is done. An agent, by contrast, is supposed to take on longer, more complex tasks. A failed attempt is much more expensive here. If a run breaks off after two hours with no result, the tokens are still gone.

Agentic AI therefore needs more than good prompts and context engineering. It needs clear task framing: What should be solved? Which data and tools are allowed? When does a human review? When does the agent abort? What can the attempt cost?

Every company knows this logic from working with freelancers or agencies. An editor doesn't tell a freelance writer "just write, no matter how long it takes." They give a topic, length, purpose, deadline, and fee.
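The five framing questions map naturally onto a small record that has to be filled in before an agent starts. The sketch below is one possible shape, not an established schema: the class name, field names, and the example values are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskFrame:
    goal: str                       # What should be solved?
    allowed_tools: tuple[str, ...]  # Which data and tools are allowed?
    review_points: tuple[str, ...]  # When does a human review?
    max_runtime_minutes: int        # When does the agent abort?
    max_tokens: int                 # What can the attempt cost?

    def missing_answers(self) -> list[str]:
        """Names of framing questions that are still unanswered."""
        missing = []
        if not self.goal.strip():
            missing.append("goal")
        if not self.allowed_tools:
            missing.append("allowed_tools")
        if not self.review_points:
            missing.append("review_points")
        if self.max_runtime_minutes <= 0:
            missing.append("max_runtime_minutes")
        if self.max_tokens <= 0:
            missing.append("max_tokens")
        return missing


frame = TaskFrame(
    goal="Review pull request for security-relevant changes",
    allowed_tools=("read_diff", "run_tests"),  # hypothetical tool names
    review_points=("security-relevant change found",),
    max_runtime_minutes=30,
    max_tokens=500_000,
)
print(frame.missing_answers())
```

A run would only be started when `missing_answers()` comes back empty, which is the agent equivalent of not briefing a freelancer without a topic, deadline, and fee.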

An example: "Review this pull request with the standard model. If you spot security-relevant changes, escalate only the relevant files and hunks to the more expensive review model. Before each call, abort if the input context exceeds 200,000 tokens. Track cumulative input and output tokens, and stop if the review exceeds the token budget."
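The rules in that instruction can also be enforced in code around the model calls rather than left to the prompt. The following is a minimal sketch under assumptions: the 200,000-token context limit is taken from the example, while the default total budget, the class and function names, and the model labels are hypothetical. No real provider API is called.

```python
class BudgetExceeded(Exception):
    """Raised when a call or the whole review would break a token limit."""


class ReviewBudget:
    def __init__(self, max_context_tokens=200_000, token_budget=1_000_000):
        self.max_context_tokens = max_context_tokens
        self.token_budget = token_budget  # assumed value, set per use case
        self.used = 0

    def check_context(self, input_tokens):
        """Call before each model request: abort on oversized input."""
        if input_tokens > self.max_context_tokens:
            raise BudgetExceeded(f"context of {input_tokens} tokens exceeds limit")

    def record(self, input_tokens, output_tokens):
        """Call after each model request: track cumulative usage."""
        self.used += input_tokens + output_tokens
        if self.used > self.token_budget:
            raise BudgetExceeded(f"review used {self.used} tokens, over budget")


def pick_model(is_security_relevant):
    """Escalate only security-relevant hunks to the more expensive model."""
    return "expensive-review-model" if is_security_relevant else "standard-model"


budget = ReviewBudget(token_budget=500)
budget.check_context(150_000)           # within the context limit
budget.record(input_tokens=300, output_tokens=100)
print(pick_model(False), pick_model(True), budget.used)
```

Keeping the limits outside the prompt means they still hold when the agent ignores or misreads its instructions.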

Setting limits like that is hard because the consumption of a task is tough to estimate in advance. In practice, the values have to be built up empirically per use case. Initial runs show typical token amounts, budgets get derived from them, and anomalies trigger alerts. Quality, cost, and accountability have to be planned together.
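One simple way to build such values empirically is to derive the budget from a handful of pilot runs and flag outliers. This is a sketch of one possible rule, not a recommended standard: the pilot numbers are invented, and the headroom factor of 1.5 and the three-standard-deviation threshold are arbitrary assumptions that would need tuning per use case.

```python
import statistics


def derive_budget(pilot_token_counts, headroom=1.5):
    """Token budget = median usage of the pilot runs times a headroom factor."""
    return int(statistics.median(pilot_token_counts) * headroom)


def is_anomaly(tokens_used, pilot_token_counts, threshold=3.0):
    """True if usage lies more than `threshold` sample standard deviations
    above the pilot mean. Needs at least two pilot runs."""
    mean = statistics.mean(pilot_token_counts)
    stdev = statistics.stdev(pilot_token_counts)
    return tokens_used > mean + threshold * stdev


pilot_runs = [40_000, 45_000, 50_000, 55_000, 60_000]  # illustrative token counts
print(derive_budget(pilot_runs))
print(is_anomaly(58_000, pilot_runs), is_anomaly(120_000, pilot_runs))
```

The median is used for the budget because a single runaway pilot run would otherwise inflate it.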

The example above also contains the practical answer to token segmentation. Using a cheap standard model for routine work and only escalating to a pricey specialist model when needed turns the abstract idea of different token classes into a concrete steering rule.

Early Mythos testers, according to The Information, already report exactly this kind of routing approach. The expensive model handles planning, evaluation, or critical analysis, while cheaper models do parts of the execution. What looks like product differentiation on the provider side becomes a routing architecture on the user side.

### The token economy isn't an IT topic

That's why the token economy isn't a pure IT topic either. IT measures what happens technically. It builds dashboards, sets limits, and compares providers. But it usually can't judge whether a financial report or a similar work product is good enough on the merits. That takes domain expertise.

Token economics will therefore likely become a skill that grows into many roles. Developers steer coding agents and weigh costs against test depth. Lawyers decide which contract reviews run automatically and where human review tips the balance.

Marketing teams budget agent runs for campaign analysis and judge whether the generated results justify another iteration. Financial analysts set the complexity threshold at which a report escalates from the cheaper standard model to a more powerful one.

Alongside that, a second steering layer is forming that reaches beyond individual roles. Procurement and finance negotiate credits, quotas, and provider terms in a market that's rebuilding its pricing logic. FinOps structures from the cloud business can be partly carried over but aren't enough on their own because, like IT, FinOps can't judge whether an expensive run delivered the right result.

### What token usage actually tells you in operations

Once task framing and routing architecture are in place, one question remains: how do you tell during operations whether a workflow is actually working?

The token economy only becomes steerable when usage and outcome are read together. Token usage then isn't a goal but a diagnostic signal. It shows where something's off but doesn't say what. Four symptom patterns can be distinguished in practice.

High usage, usable result. The most unremarkable case, and exactly for that reason, easy to miss. The task gets done, but more expensively than necessary. The causes usually lie in routing: a frontier model for a task a smaller one could have handled, a stuffed context dragged along at every step, or missing caching.

High usage, bad result. This is the biggest risk of the agentic era. Money is burned without anything usable at the end. The cause is rarely in one single spot, as unclear task framing, the wrong model class, and missing abort rules usually overlap. Was the task even solvable by an agent? Was the chosen model up to it? Did the agent know what "done" meant?

Low usage, high rework. Tokens are cheap because the model answers fast and thinks little. But every output has to be reworked by humans at length. The costs just shift from the token bill to the payroll. A more expensive model can end up cheaper in such cases. This pattern is especially deceptive because the token bill looks like a success.

Usage without attributable value. Token costs show up on the balance sheet, but nobody can say which process contributed what. Work that used to be done differently, externally, or not at all moves into internal token costs and vanishes there from value attribution. It's the same mechanism as Dark Output, just at the process level instead of the macro level. The only fix is clearly tying costs and benefits to processes and owners.
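The four patterns can be turned into a first-pass triage rule over per-run records. The sketch below is a simplified reading of the text: "high usage" is interpreted as exceeding a token budget and "high rework" as exceeding an hours limit, and all names, thresholds, and the order of the checks are assumptions.

```python
from enum import Enum


class Pattern(Enum):
    OVERPRICED_SUCCESS = "high usage, usable result"
    BURNED_MONEY = "high usage, bad result"
    HIDDEN_REWORK = "low usage, high rework"
    UNATTRIBUTED = "usage without attributable value"
    NONE = "none of the four patterns"


def diagnose(tokens_used, token_budget, result_usable,
             rework_hours, rework_limit_hours, owner):
    """Map one run to a symptom pattern. owner is the accountable process
    or team, or None if the cost cannot be attributed."""
    if owner is None:
        return Pattern.UNATTRIBUTED
    high_usage = tokens_used > token_budget
    if high_usage and not result_usable:
        return Pattern.BURNED_MONEY
    if high_usage:
        return Pattern.OVERPRICED_SUCCESS
    if rework_hours > rework_limit_hours:
        return Pattern.HIDDEN_REWORK
    return Pattern.NONE


print(diagnose(900_000, 500_000, True, 0.5, 2, "platform-team").value)
print(diagnose(100_000, 500_000, True, 6, 2, "legal-ops").value)
```

The label only says where to look. The cause, such as routing, task framing, or a missing abort rule, still has to be found by someone who knows the workflow.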

### Where the token economy could go

Where the token economy stands in the coming years depends on more than models and prices. It also depends on how fast companies learn to steer AI work: framing tasks, assigning models deliberately, and judging results. This is where the two drivers meet: agentic usage and token segmentation run into the steering question. Three scenarios follow.

#### Baseline

The big providers roll out the hybrid model of base subscription plus usage-based credits across the board, first in software development, then in other functions like research, sales, and legal. Companies gradually build FinOps structures for AI, set up budgets per workflow, and experiment with model routing. Premium segments emerge in tightly bounded fields like cybersecurity, life sciences, and select research applications, without flipping the broad market. The debate over the real productivity contribution stays fuzzy because productivity gains still only partly show up on balance sheets. Token economics gets anchored as a management skill in domain roles, without becoming its own discipline.

#### Acceleration

If agent models and tool integration improve faster than expected, autonomous workflows spread quickly beyond software development: into cybersecurity, life sciences, finance, and consulting. The drivers are higher success rates per run, mature routing architectures, and pressure on the hyperscalers to refinance their capex. Token market segmentation speeds up. Jensen Huang's prediction of a market with tokens going up to $1,000 per million gets tested empirically. Companies that master task framing, routing, and diagnosis pull measurably ahead of less disciplined rivals. Differentiated prices per model class eventually turn into outcome-based pricing: "pay per pull request," "pay per vulnerability," and later maybe even "pay per validated drug candidate."

#### Slowdown

If cases like Uber's pile up, where AI budgets explode without clear benefit, CFOs set harder limits and delay rollouts. The brakes are unreliable agents, high rework costs, regulatory requirements, and the persistent difficulty of proving productivity gains on the balance sheet. Providers come under pressure to guarantee result quality or cut prices. Low-cost providers like DeepSeek win market share without the agentic vision breaking through broadly. Token segmentation stays confined to narrowly scoped pilot workflows. Premium tokens exist, but find no mass market.

### Our take

The baseline scenario is the most likely. The shift to usage-based models is already decided or underway at the big providers. A broad return to pure flat rates seems unlikely under current cost structures. At the same time, cases like Uber and the cost jumps with GPT-5.5 or Gemini 3.5 Flash show that companies still have to build the steering competence they need. That argues against fast acceleration.

A real slowdown is also unlikely. The investment pressure on providers and the early evidence of benefits in software development are too strong for that. More likely is a transition in which AI use becomes more expensive, more visible, and more actively managed.

In the agent era, the token becomes a business metric, comparable to the fuel consumption of a trucking company. To run economically, you have to know how many liters each trip burns, which trip needs which fuel, and which trip is even worth taking. The companies that will master this economy are the ones that can answer one question: which work are we buying with which tokens, and how do we know it was worth it?

