Artificial Analysis@ArtificialAnlys

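2026-05-23 04:00 · 41 days ago

AI Summary

Benchmarks show that Cursor Composer 2.5 has a strong cost and efficiency advantage on coding tasks. Its cost per task is 1/3 to 1/18 that of Claude Opus 4.7 and 1/5 to 1/32 that of GPT-5.5. The low cost stems from high token efficiency: it used only 1.6M tokens to complete the full benchmark, while other models used up to 5.7M. On speed, its average task completion time is about 9 minutes, roughly 1.3x faster than the average across the agents tested, and its Fast variant cuts this to about 7 minutes.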

Cursor Composer 2.5 is 3-18x cheaper than Opus 4.7 in Claude Code (medium reasoning), and 5-32x cheaper than GPT-5.5 in Codex (medium), based on API pricing

This low Cost per Task isn't driven only by relatively low token pricing; it's also driven by relatively low token usage compared to other leading models. @cursor_ai Composer 2.5 used only 1.6M tokens to complete our Coding Agent Index benchmarks, while other models used up to 5.7M.
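A rough way to see how the two factors combine, as a minimal sketch: it assumes Cost per Task is approximately tokens used times a single blended price per token (a simplification that ignores input/output price differences and caching). The function `cost_ratio` and its arguments are hypothetical names for illustration; only the 1.6M and 5.7M token figures come from the post.

```python
def cost_ratio(token_ratio: float, price_ratio: float) -> float:
    """Approximate cost ratio between two agents.

    Assumes cost ~= tokens * blended price per token, so the ratio of
    costs is the product of the token ratio and the price ratio.
    """
    return token_ratio * price_ratio


# Token totals reported in the post, in millions of tokens.
composer_tokens_m = 1.6
max_other_tokens_m = 5.7

# How many times more tokens the heaviest model used than Composer 2.5.
token_ratio = max_other_tokens_m / composer_tokens_m
print(round(token_ratio, 2))  # 3.56
```

Under this simplification, token usage alone accounts for up to about a 3.6x cost gap, with the rest of any larger gap attributable to per-token pricing.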

This lower token usage also contributes to a low Time per Task. Across the Coding Agent Index configurations shown, average Time per Task was ~12 minutes. Composer 2.5 completed tasks in ~9 minutes on average, making it ~1.3x faster than the average, while Composer 2.5 Fast completed tasks in ~7 minutes, making it ~1.8x faster than the average across agents.
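The speedup figures can be checked with simple arithmetic. The sketch below assumes "Nx faster than average" means average time divided by the agent's time; `speedup` is a hypothetical helper name, and the inputs are the rounded minute values from the post.

```python
def speedup(avg_minutes: float, agent_minutes: float) -> float:
    """Speedup relative to the average: average time / agent time."""
    return avg_minutes / agent_minutes


# Rounded figures from the post: ~12 min average, ~9 min and ~7 min per task.
print(round(speedup(12, 9), 2))  # 1.33
print(round(speedup(12, 7), 2))  # 1.71
```

With the rounded inputs the Fast variant comes out at about 1.7x rather than the quoted ~1.8x; the difference is presumably due to the post computing from unrounded times, which are not given here.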

Link to full benchmark results below

Agentic coding evaluations / benchmarks
