Artificial Analysis @ArtificialAnlys

2026-03-28 00:08

AI summary

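AA-AgentPerf is an AI hardware benchmark for the agent era. It uses real agent workloads (up to 200 turns and sequence lengths above 100K tokens) rather than synthetic queries. The benchmark allows production-grade optimizations such as KV cache reuse and disaggregated prefill/decode, and it measures max concurrent users per accelerator, per kW TDP, per $/hr, and per rack. It supports architectures from a single accelerator up to a full rack, covers gpt-oss-120b and DeepSeek V3.2 at launch, and aims to give AI hardware buyers and deployers a realistic performance reference.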

Introducing AA-AgentPerf - the hardware benchmark for the agent era.

Key details:
➤ Real agent workloads, not synthetic queries: we've captured real coding agent trajectories where our agents used up to 200 turns and worked with sequence lengths >100K tokens
➤ Production optimizations allowed: KV cache reuse, disaggregated prefill/decode, speculative decoding - we're allowing the optimizations that labs and inference providers are serving in production so that we can capture what real deployments should look like
➤ Measures what developers need to know: Max concurrent users at each target output speed, expressed per accelerator, per kW TDP, per $/hr, and per rack
➤ Built for every kind of scale: designed to measure systems from a single accelerator up to a full rack, and to fairly evaluate every architecture from DRAM-only designs to SRAM-only designs and everything in between
➤ Live now: we're announcing AA-AgentPerf today and opening submissions of configurations for benchmarking effective immediately. The models supported at launch are gpt-oss-120b and DeepSeek V3.2. We'll be publishing results on a rolling basis.
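The post does not say how the per-accelerator, per-kW, per-$/hr, and per-rack figures are derived. As a rough sketch only, assuming each is the measured max concurrent users divided by the matching system resource, the arithmetic could look like this. `SystemConfig`, `normalize`, and every number below are hypothetical and not part of AA-AgentPerf.

```python
from dataclasses import dataclass


@dataclass
class SystemConfig:
    """Hypothetical description of a benchmarked system (all fields are assumptions)."""
    accelerators: int          # number of accelerators in the system
    tdp_kw: float              # total TDP of the system, in kW
    cost_per_hour_usd: float   # hourly cost of the system, in USD
    racks: float               # racks occupied (fractional for sub-rack systems)


def normalize(max_concurrent_users: int, system: SystemConfig) -> dict[str, float]:
    """Divide a measured max-concurrent-users figure by each system resource."""
    return {
        "per_accelerator": max_concurrent_users / system.accelerators,
        "per_kw_tdp": max_concurrent_users / system.tdp_kw,
        "per_usd_hour": max_concurrent_users / system.cost_per_hour_usd,
        "per_rack": max_concurrent_users / system.racks,
    }


# Made-up inputs for illustration; these are not benchmark results.
system = SystemConfig(accelerators=8, tdp_kw=8.0, cost_per_hour_usd=32.0, racks=0.25)
result = normalize(max_concurrent_users=64, system=system)
print(result)
```

In this sketch the same measurement would be normalized once per target output speed, since the post reports max concurrent users at each target speed.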

AA-AgentPerf is a benchmark for real-world performance of AI accelerator hardware. We're benchmarking inference of particular models on a specific system with a specific config (i.e., inference stack, parallelism config, and more).

AA-AgentPerf has been shaped by our work with inference providers and engagement with AI accelerator companies, developers, and enterprise buyers over the past year. Our goal is for anyone deploying models - whether buying or leasing accelerators - to be able to use AA-AgentPerf as the definitive resource for understanding real-world hardware performance.

