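Source: the-decoder.com
Summary: Claude Sonnet 5 tied for fifth place in the Artificial Analysis evaluation with a peak score of 53, level with GPT-5.5 (high). Its input/output token prices remain unchanged at $3/$15, yet its average cost per task reaches $2.29, higher than Opus 4.8's $1.97. The reason: at the max setting it uses 40% more output tokens than Sonnet 4.6, and in agent tasks it runs about three times as many loops, nearly doubling the cost. Anthropic is continuing the hidden price increases it began with a new tokenizer that raised token consumption by about 30%. Sonnet 5 scored only 17% on CritPt, below several stronger models, but gained 9, 10, and 7 points on Terminal-Bench v2.1, HLE, and SciCode respectively. A promotional price of $2/$10 runs until September 1, but the hidden cost increase puts it at a disadvantage against lower-priced competitors.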
Claude Sonnet 5 continues Anthropic's pattern of hiding price increases behind unchanged token rates
In an independent test, Claude Sonnet 5 tied for fifth place and beat the pricier Opus 4.8 on some agent-based tasks. But a steep jump in token consumption makes it more expensive per task than Opus 4.8, Anthropic's previous top model.
Artificial Analysis evaluated Claude Sonnet 5 before its release and added it to its Intelligence Index. Sonnet 5 scored 53 points at peak performance, tying with GPT-5.5 (high) for fifth place. Four models rank higher: Claude Fable 5 at 60 points (once again generally available as of today), Opus 4.8 at 56, GPT-5.5 (xhigh) at 55, and Opus 4.7 at 54.

That's a six-point jump over Sonnet 4.6 (47 points), but Sonnet 5 chews through far more tokens to get there.
Same token prices, double the real cost
On paper, Sonnet 5 keeps the same token prices as its predecessor: $3 per million input tokens and $15 per million output tokens, while Opus 4.8 sits at $5 and $25. Yet according to Artificial Analysis, an average task in the Intelligence Index costs $2.29 with Sonnet 5, versus about $1.97 with Opus 4.8.
At the maximum performance setting ("max"), Sonnet 5 uses about 40 percent more output tokens per task than Sonnet 4.6. In agent-based knowledge work benchmarks such as AA-Briefcase and GDPval-AA, it runs about three times as many agent loops as its predecessor. Sonnet 4.6 cost about $1.20 per task, so the per-task cost has nearly doubled, even though Sonnet 5 beats Opus 4.8 on some of these tasks.
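The per-task figures above can be checked with a few lines of arithmetic. The sketch below uses only the list prices and average costs quoted in this article; the helper names are hypothetical and not part of any Artificial Analysis tool. Because Sonnet 5's rates are exactly 60 percent of Opus 4.8's for both input and output (and output costs five times input for both), the ratio of per-task costs reveals how much more price-weighted token volume Sonnet 5 needs per task, whatever its input/output mix.

```python
# List prices in USD per million tokens, as quoted in the article.
# Sonnet 4.6 shares Sonnet 5's prices.
PRICES = {
    "Sonnet 5": {"input": 3.0, "output": 15.0},
    "Sonnet 4.6": {"input": 3.0, "output": 15.0},
    "Opus 4.8": {"input": 5.0, "output": 25.0},
}

# Average cost per Intelligence Index task reported by Artificial Analysis (USD).
COST_PER_TASK = {"Sonnet 5": 2.29, "Opus 4.8": 1.97, "Sonnet 4.6": 1.20}


def task_cost(model: str, input_mtok: float, output_mtok: float) -> float:
    """Hypothetical helper: cost of one task from token counts in millions."""
    p = PRICES[model]
    return input_mtok * p["input"] + output_mtok * p["output"]


def weighted_mtok(model: str) -> float:
    """Back out 'input-equivalent' millions of tokens per task.

    Every model here charges 5x more for output than input, so
    cost / input_price == input_mtok + 5 * output_mtok for any mix.
    """
    return COST_PER_TASK[model] / PRICES[model]["input"]


# Sanity check with made-up token counts: for the same mix, Sonnet costs 0.6x Opus.
assert abs(task_cost("Sonnet 5", 0.2, 0.04) / task_cost("Opus 4.8", 0.2, 0.04) - 0.6) < 1e-9

price_ratio = PRICES["Sonnet 5"]["input"] / PRICES["Opus 4.8"]["input"]
cost_ratio = COST_PER_TASK["Sonnet 5"] / COST_PER_TASK["Opus 4.8"]
volume_ratio = weighted_mtok("Sonnet 5") / weighted_mtok("Opus 4.8")
gen_ratio = COST_PER_TASK["Sonnet 5"] / COST_PER_TASK["Sonnet 4.6"]

print(f"List price, Sonnet 5 vs Opus 4.8:    {price_ratio:.2f}x")
print(f"Cost per task, Sonnet 5 vs Opus 4.8: {cost_ratio:.2f}x")
print(f"Weighted tokens, Sonnet 5 vs Opus:   {volume_ratio:.2f}x")
print(f"Cost per task, Sonnet 5 vs 4.6:      {gen_ratio:.2f}x")
```

On these averages, Sonnet 5 would need about 1.94 times Opus 4.8's price-weighted token volume per task, well past the roughly 1.67x (1 / 0.6) at which its lower rates would merely break even.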

Anthropic is running a promotional rate of $2 per million input tokens and $10 per million output tokens through September 1, but Artificial Analysis based its results on regular prices.
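Both promotional rates are exactly two-thirds of the regular ones ($2 vs. $3 and $10 vs. $15), so any input/output mix gets the same discount. The rough sketch below assumes Sonnet 5 uses the same number of tokens per task at either price and scales the $2.29 average accordingly; the result is a derived estimate, not a figure Artificial Analysis published.

```python
from fractions import Fraction

# Regular and promotional Sonnet 5 prices in USD per million tokens (from the article).
REGULAR = {"input": Fraction(3), "output": Fraction(15)}
PROMO = {"input": Fraction(2), "output": Fraction(10)}

# Both rates shrink by the same factor, so the input/output mix doesn't matter.
factors = {kind: PROMO[kind] / REGULAR[kind] for kind in REGULAR}
assert factors["input"] == factors["output"]
factor = factors["input"]

# Assumption: token consumption per task is identical at both price levels.
regular_task_cost = 2.29  # Artificial Analysis average at regular prices
promo_task_cost = regular_task_cost * float(factor)
opus_task_cost = 1.97  # Opus 4.8 average at regular prices

print(f"Discount factor: {factor}")
print(f"Estimated promo cost per task: ${promo_task_cost:.2f}")
print(f"Cheaper than Opus 4.8 per task: {promo_task_cost < opus_task_cost}")
```

Under that assumption, the promotion would temporarily push Sonnet 5 below Opus 4.8's $1.97 per task, an edge that lasts only as long as the discounted rates.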
Complex reasoning still exposes Sonnet 5's limits
Sonnet 5 still falls short of larger models on reasoning- and knowledge-heavy benchmarks. On CritPt, a frontier physics reasoning test from Argonne National Laboratory and the University of Illinois, it scored 17 percent. That's 14 points above its predecessor, but below GLM-5.2 and the Claude Opus, Claude Fable, and GPT-5.5 models in their higher configurations.
Elsewhere, Sonnet 5 shows solid gains over Sonnet 4.6: a 9-point jump on Terminal-Bench v2.1, 10 points on Humanity's Last Exam, and 7 points on SciCode. Scores on the remaining evaluations stayed roughly flat.
Anthropic keeps raising prices without saying so
Anthropic has done this before. When Opus 4.7 launched, token prices stayed flat on paper, but a new tokenizer split the same text into "approximately 30%" more tokens, inflating the real bill. Developer Abhishek Ray measured a 1.325x to 1.47x increase, and a community analysis of 483 submissions found a 37.4 percent jump in tokens per request. With Sonnet 5, the tokenizer issue is compounded by the model's more agentic behavior, which eats through far more tokens per task.
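A tokenizer change at a fixed list rate works like a price increase on the underlying text. The sketch below converts each inflation factor quoted above into an effective rate per million tokens as the old tokenizer would have counted them. Applying these Opus 4.7 measurements to Sonnet 5's $15 output rate is an illustrative assumption, and the community figure measures tokens per request rather than identical text.

```python
# Token inflation after the tokenizer change, as quoted in the article.
INFLATION = {
    "quoted ~30%": 1.30,
    "Ray, low end": 1.325,
    "Ray, high end": 1.47,
    "community, per request": 1.374,
}

# Assumption: apply the factors to Sonnet 5's output list price for illustration.
LIST_OUTPUT_RATE = 15.0  # USD per million output tokens


def effective_rate(list_rate: float, inflation: float) -> float:
    """Price per million tokens as counted by the old tokenizer.

    If the same text now takes `inflation` times as many tokens at an
    unchanged list rate, the bill for that text grows by the same factor.
    """
    return list_rate * inflation


for source, factor in INFLATION.items():
    rate = effective_rate(LIST_OUTPUT_RATE, factor)
    print(f"{source:<24} {factor:.3f}x -> ${rate:.2f} per old-tokenizer million")
```

At these factors, an unchanged $15 rate behaves more like $19.50 to about $22 per million tokens of old-tokenizer-equivalent text. Since it isn't stated whether Sonnet 5's 40 percent output increase is already measured in new-tokenizer tokens, the two effects shouldn't simply be multiplied.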
Anthropic's models keep getting pricier with each generation, sometimes dramatically so, yet the official price lists don't reflect it. That kind of hidden cost creep is a hard sell in Sonnet's mid-range segment, where Chinese competitors such as DeepSeek V4 Pro and GLM-5.2 offer competitive performance at a fraction of the cost.
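AI providers need more transparent pricing, such as cost per standardized task or per real-world knowledge work job, rather than raw token prices, which lose meaning when tokenizers and per-task token consumption change from one model generation to the next.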