# 智能性价比

- 来源：Tomer Tunguz 博客（VC 分析）
- 发布时间：2026-06-03 08:00
- AIHOT 分数：66
- AIHOT 标记：精选
- AIHOT 链接：https://aihot.virxact.com/items/cmpy42n0t01jnslaxhzgxjmzh
- 原文链接：https://www.tomtunguz.com/tokens-per-result

## 精选理由

微软在模型发布卡上悄悄加了“平均token消耗”这个指标，这不是小改动，而是宣告AI从堆算力转向算账时代。Uber和Salesforce的预算教训已经很清楚了。

## AI 摘要

微软在模型发布卡中首次加入平均token使用量指标。其模型在SWE-Bench Verified上达71.6分，仅消耗约Claude Haiku 4.5三分之一的token。Artificial Analysis的Intelligence Index显示GPT 5.5与Claude Opus 4.8得分相近（约60分），但Opus 4.8运行成本高出40%（$4,685 vs $3,357）。Uber因四个月内AI预算超支而限制员工使用；Salesforce花费$3亿购买Anthropic tokens并冻结工程招聘。模型公司如今需同时在性能和成本两个维度竞争。

## 正文

Yesterday Microsoft added a new metric to a model release card, one that will likely become a standard.1

Average token usage.

In the first row, the Microsoft model hits 71.6 on SWE-Bench Verified using about a third of the tokens Claude Haiku 4.5 burns.

Benchmarks are now measured on two different dimensions, the overall performance & the cost to achieve that intelligence.

This is yet another sign that the era of subsidies2, tokenmaxxing3, & all-out performance for many use cases is over.

Even the most valuable companies in the world cannot afford state-of-the-art intelligence for every conceivable use case.4 Uber capped employee AI spending after blowing through its budget in four months.5 Salesforce is spending $300M on Anthropic tokens & has frozen engineering hires.6

This new dual benchmark answers the buyer’s only question : what is my intelligence per dollar?

Artificial Analysis already benchmarks this.7 GPT 5.5 & Claude Opus 4.8 land within a point of each other on the Intelligence Index, around 60. Running the index costs $3,357 on GPT 5.5 & $4,685 on Opus 4.8. Same answer, 40% more expensive.

Model companies must now compete on both dimensions. The application layer will compete one level up, on dollars per outcome, what a closed ticket, a shipped PR, or a resolved support case actually costs.

Every layer in the stack now has to price the same way the customer thinks : per result, not per token.

Introducing MAI-Code-1-Flash — Microsoft announces a new coding model with average token usage on the release card. ↩︎

The Unsustainable Subsidy — The era of AI subsidies is ending. ↩︎

Tokenmaxxing — Models that game benchmarks with extra tokens are losing their edge. ↩︎

Microsoft cancels Claude Code licenses, shifting developers to GitHub Copilot CLI — Microsoft cancelled Claude Code licenses across its Experiences and Devices division (Windows, Microsoft 365, Outlook, Teams, Surface) after engineering usage outran budgets. ↩︎

Uber caps employee AI spending after blowing through budget in 4 months — Uber caps employee AI spending after blowing through budget in four months. ↩︎

Salesforce Spends $300M on AI, Freezes Engineering Hires — Salesforce Spends $300M on AI, Freezes Engineering Hires. ↩︎

AI Model & API Providers Analysis — Independent analysis of AI model costs. ↩︎
