Artificial Analysis@ArtificialAnlys

2026-06-17 12:50

AI Summary

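Artificial Analysis released its Intelligence Index v4.1 update yesterday. There are three main changes:

- Upgraded evaluations: Terminal-Bench 2.1, τ³-Bench Banking, and GDPval-AA v2.
- Per-task data: cost, time, and model token consumption for each task, plus how these metrics trade off against intelligence.
- Cached input token reporting (new): how many cached tokens a given model uses, and how that affects cost.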

Following up on our Intelligence Index v4.1 release yesterday, in the video below, Daniel from our team shares a short overview of what's changed:

Three upgraded evaluations: Terminal-Bench 2.1, τ³-Bench Banking and GDPval-AA v2

Cost, time, and tokens per task: Understand the cost, time, and tokens of tasks across our Index and for individual evals, and how these trade off against Intelligence

Cached input token reporting: We now report the amount of cached tokens a particular model uses and how this influences cost
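The post does not spell out how cached tokens enter the cost figures. The sketch below makes one assumption: cached input tokens are billed at a separate, lower per-token rate than uncached input. This is a common provider pricing scheme, but nothing here confirms it is Artificial Analysis's methodology.

Under that assumption, the code computes per-task cost and the average cost across a set of tasks. The class names, field names, and prices are all hypothetical, and any cache-write surcharges that some providers apply are ignored.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class TaskUsage:
    """Token usage for a single task (hypothetical structure)."""
    input_tokens: int         # all input tokens, cached or not
    cached_input_tokens: int  # portion of input_tokens read from the prompt cache
    output_tokens: int        # generated tokens


@dataclass
class Pricing:
    """Hypothetical list prices in USD per 1M tokens."""
    input_per_m: float
    cached_input_per_m: float
    output_per_m: float


def task_cost(usage: TaskUsage, price: Pricing) -> float:
    """USD cost of one task, billing cached input at its own (assumed lower) rate.

    Assumption: cache-write surcharges are ignored.
    """
    if not 0 <= usage.cached_input_tokens <= usage.input_tokens:
        raise ValueError("cached_input_tokens must lie in [0, input_tokens]")
    uncached = usage.input_tokens - usage.cached_input_tokens
    total = (
        uncached * price.input_per_m
        + usage.cached_input_tokens * price.cached_input_per_m
        + usage.output_tokens * price.output_per_m
    )
    return total / 1_000_000


def mean_cost_per_task(usages: Iterable[TaskUsage], price: Pricing) -> float:
    """Average USD cost over a set of tasks, e.g. all tasks in one eval."""
    costs = [task_cost(u, price) for u in usages]
    if not costs:
        raise ValueError("need at least one task")
    return sum(costs) / len(costs)


if __name__ == "__main__":
    # Made-up prices, for illustration only.
    price = Pricing(input_per_m=2.0, cached_input_per_m=0.5, output_per_m=8.0)
    with_cache = TaskUsage(input_tokens=100_000, cached_input_tokens=80_000, output_tokens=10_000)
    no_cache = TaskUsage(input_tokens=100_000, cached_input_tokens=0, output_tokens=10_000)
    print(f"{task_cost(with_cache, price):.2f}")  # 0.16
    print(f"{task_cost(no_cache, price):.2f}")    # 0.28
```

With these made-up prices, an 80% cache hit rate cuts the cost of an otherwise identical task from $0.28 to $0.16. That is why a model's cached-token share matters when comparing cost per task across models.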

Evaluation/Benchmarks · Deployment/Engineering
