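AI summary
Artificial Analysis released the Intelligence Index v4.1 update yesterday, with three main changes: upgraded evaluations (Terminal-Bench 2.1, τ³-Bench Banking, and GDPval-AA v2); cost, time, and model token consumption data for each task, shown alongside how these metrics trade off against intelligence; and new cached input token reporting, which shows how many cached tokens a given model uses and how that affects cost.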
Following up on our Intelligence Index v4.1 release yesterday, in the video below, Daniel from our team shares a short overview of what's changed:
- Three upgraded evaluations: Terminal-Bench 2.1, τ³-Bench Banking and GDPval-AA v2
- Cost, time, and tokens per task: Understand the cost, time, and tokens of tasks across our Index and for individual evals, and how these trade off against Intelligence
- Cached input token reporting: We now report the number of cached tokens a particular model uses and how this influences cost
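
To make the cached-token point concrete, here is a minimal sketch of how cached input tokens can change the cost of a task. It is an illustration only, not Artificial Analysis's methodology: the `Pricing` class, the `task_cost` function, and every price and token count below are hypothetical, and the sketch assumes a provider bills cached input tokens at a lower per-million rate than uncached ones.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Hypothetical prices in USD per one million tokens."""
    input_per_m: float          # uncached input tokens
    cached_input_per_m: float   # input tokens served from cache (assumed cheaper)
    output_per_m: float         # output tokens


def task_cost(input_tokens: int, cached_tokens: int,
              output_tokens: int, pricing: Pricing) -> float:
    """Return the cost in USD of one task.

    `cached_tokens` is the portion of `input_tokens` that was read from cache,
    so it can never exceed the input total.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached_tokens = input_tokens - cached_tokens
    total = (uncached_tokens * pricing.input_per_m
             + cached_tokens * pricing.cached_input_per_m
             + output_tokens * pricing.output_per_m)
    return total / 1_000_000


# Made-up numbers, chosen only to show the effect of caching.
pricing = Pricing(input_per_m=2.0, cached_input_per_m=0.5, output_per_m=8.0)

without_cache = task_cost(1_000_000, 0, 100_000, pricing)
with_cache = task_cost(1_000_000, 600_000, 100_000, pricing)

print(f"no caching:  ${without_cache:.2f}")
print(f"60% cached:  ${with_cache:.2f}")
```

With these assumed prices, the same task costs less when part of the input is served from cache, which is why reporting cached tokens separately matters when comparing cost per task across models.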