# NVIDIA Posts First Agentic AI Benchmark Results on AgentPerf: GB300 NVL72 Runs Up to 20x More Agents per Megawatt Than H200

- Source: Rohan Paul (@rohanpaul_ai)
- Published: 2026-06-13 07:26
- AIHOT score: 45
- AIHOT link: https://aihot.virxact.com/items/cmqbkcfry02a6slamy9sfcxm9
- Original link: https://x.com/rohanpaul_ai/status/2065576558312710584

## AI Summary

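NVIDIA has published its first agentic AI results on AgentPerf, a benchmark developed by Artificial Analysis. Instead of traditional token-generation speed, the benchmark measures how many coding agents can run concurrently per megawatt while staying responsive. The workload replays real coding-agent trajectories (long chains of model calls, code edits, command runs, tool delays, and growing context) across 12+ programming languages, with request lengths from 5K to 131K tokens (average about 27K). Result: GB300 NVL72 reaches 61.4K concurrent agents per megawatt at the lowest service tier, versus 2.6K for H200 (reported as a 20x improvement). The gain is attributed to a rack-scale system of 72 GPUs connected through NVLink, combined with software optimizations (MoE expert distribution, overlapping communication with compute, and keeping batches large).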

## Full Text

NVIDIA just posted the first agentic AI benchmark results where GB300 NVL72 runs up to 20x more coding agents per megawatt than H200.

Older inference benchmarks mostly ask how fast a system can produce tokens after one prompt.

AgentPerf, from Artificial Analysis, asks a harder question: how many agents can run at the same time while still feeling responsive?

It tests a harder workload than normal LLM serving because an agent is not one request and one answer, but a long chain of model calls, code edits, command runs, tool delays, and growing context.
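To make the "growing context" point concrete, here is a minimal sketch of how prompt length could evolve over one agent session. It is not AgentPerf's actual replay logic; the `Step` type, the `request_lengths` helper, and every token count and delay below are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Step:
    output_tokens: int       # tokens the model generates in this call (hypothetical)
    tool_result_tokens: int  # tokens a tool (edit, command run) appends to the context
    tool_delay_s: float      # time the agent waits on the tool before the next call


def request_lengths(initial_context: int, steps: list[Step]) -> list[int]:
    """Return the prompt length seen by each model call in one agent session."""
    lengths = []
    context = initial_context
    for step in steps:
        lengths.append(context)
        # The model output and the tool result both become part of the next prompt.
        context += step.output_tokens + step.tool_result_tokens
    return lengths


# Hypothetical three-step session starting from a 5K-token prompt.
steps = [Step(300, 1200, 2.0), Step(500, 4000, 5.0), Step(200, 800, 1.0)]
print(request_lengths(5_000, steps))       # [5000, 6500, 11000]
print(sum(s.tool_delay_s for s in steps))  # 8.0 seconds spent waiting on tools
```

Each call carries a longer prompt than the one before it, and the accelerator sits idle for this session while tools run, which is why serving many such sessions at once differs from answering independent single prompts.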

The benchmark replays real coding-agent paths from public repos across 12+ programming languages, with request lengths from 5K to 131K tokens and an average near 27K tokens.

NVIDIA says GB300 NVL72 reaches 61.4K concurrent agents per megawatt at the lowest service tier, while H200 reaches 2.6K.
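As a quick check on these figures, the ratio and the implied power budget per agent can be computed directly. The sketch below uses only the two numbers quoted in the post; the function and variable names are illustrative. The ratio of the two rounded figures works out to roughly 23.6, somewhat above the headline "up to 20x", and the post does not explain the difference.

```python
WATTS_PER_MEGAWATT = 1_000_000


def agents_ratio(agents_a: float, agents_b: float) -> float:
    """How many times more agents per megawatt system A runs than system B."""
    return agents_a / agents_b


def watts_per_agent(agents_per_mw: float) -> float:
    """Implied power budget per concurrent agent, in watts."""
    return WATTS_PER_MEGAWATT / agents_per_mw


# Figures quoted in the post (lowest service tier).
gb300_agents_per_mw = 61_400
h200_agents_per_mw = 2_600

print(f"{agents_ratio(gb300_agents_per_mw, h200_agents_per_mw):.1f}x")  # 23.6x
print(f"{watts_per_agent(gb300_agents_per_mw):.1f} W per agent")        # 16.3 W per agent
print(f"{watts_per_agent(h200_agents_per_mw):.1f} W per agent")         # 384.6 W per agent
```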

The gain comes from 72 GPUs acting like one rack-scale machine through NVLink, plus software that spreads MoE expert work, overlaps communication with compute, and keeps batches large.

@NVIDIAAIDev