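NVIDIA released Nemotron 3 Ultra today, with a focus on low-latency agentic performance. On Terminal-Bench v2.1, the model was tested against peers under 4 increasing turn limits. Thanks to its high inference speed (time is based on token usage and endpoint output speeds measured on a pre-release deployment on blackboxai, plus the actual time spent executing tools), Nemotron 3 Ultra completed tasks faster than peers at every turn limit while keeping competitive benchmark scores, placing it at the leading edge of the performance-versus-time Pareto frontier for this evaluation.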
Nemotron 3 Ultra launched today with a focus on low-latency agentic performance. We tested it against peers under restricted turn limits on Terminal-Bench v2.1. @NVIDIA Nemotron 3 Ultra completes tasks much faster than peers thanks to its high inference speed, while scoring competitively on the benchmark.
In this analysis, each model is given a 'turn limit' within which it can complete tasks, inside a customized version of the Terminus 2 harness that informs the model of this limit. We apply 4 increasing turn limits and trace how task latency trades off against performance at each one. Time per task, on the X axis, is calculated as decode time, derived from token usage and measured endpoint output speeds (for Nemotron 3 Ultra, speeds were measured on a pre-release deployment on @blackboxai), plus the actual time spent executing tools to complete the benchmark.
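
As a rough illustration of the time-per-task calculation described above, here is a minimal Python sketch. It assumes decode time is simply output tokens divided by measured output speed, which is one reading of "based on token usage" (the post does not say whether prefill or input tokens are counted). The names `TaskRun`, `time_per_task` and `pareto_front` are hypothetical, and the numbers in the usage note are invented for illustration, not benchmark results.

```python
from dataclasses import dataclass


@dataclass
class TaskRun:
    """One benchmark task attempt (hypothetical structure)."""
    output_tokens: int   # tokens generated by the model for this task
    tool_seconds: float  # measured wall-clock time spent executing tools


def time_per_task(run: TaskRun, output_tokens_per_second: float) -> float:
    """Estimate task time as decode time plus tool execution time.

    Assumption: decode time = output tokens / measured endpoint output speed.
    """
    if output_tokens_per_second <= 0:
        raise ValueError("output speed must be positive")
    decode_seconds = run.output_tokens / output_tokens_per_second
    return decode_seconds + run.tool_seconds


def pareto_front(points: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Return the (seconds, score) points not dominated by a faster,
    equal-or-higher-scoring point. Lower time and higher score are better."""
    front = []
    best_score = float("-inf")
    # Sort by time ascending; for equal times, put the higher score first.
    for seconds, score in sorted(points, key=lambda p: (p[0], -p[1])):
        if score > best_score:
            front.append((seconds, score))
            best_score = score
    return front
```

For example, with made-up inputs, `time_per_task(TaskRun(output_tokens=12000, tool_seconds=45.0), 200.0)` gives 60 seconds of decode time plus 45 seconds of tool time. Passing each model's (time, score) pair at each turn limit to `pareto_front` would then pick out the results that no other result beats on both speed and score.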