# SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-02-10 08:00
- AIHOT link: https://aihot.virxact.com/items/cmnygovrl0038sl13da6zplcv
- Original link: https://arxiv.org/abs/2604.09557

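## AI Summary

The research team has released SPEED-Bench, which aims to establish a unified evaluation standard for Speculative Decoding (SD) algorithms. The benchmark comprises a Qualitative data split that emphasizes semantic diversity and a Throughput data split that supports a range of concurrency levels, and it integrates with production engines such as vLLM and TensorRT-LLM. Using SPEED-Bench, the authors show that synthetic inputs overestimate real-world throughput, identify batch-size-dependent optimal draft lengths, expose the evaluation bias of low-diversity data, and analyze the potential problems of vocabulary pruning in state-of-the-art draft models.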

## Main Text

Speculative Decoding (SD) has emerged as a critical technique for accelerating Large Language Model (LLM) inference. Unlike deterministic system optimizations, SD performance is inherently data-dependent, meaning that diverse and representative workloads are essential for accurately measuring its effectiveness. Existing benchmarks suffer from limited task diversity, inadequate support for throughput-oriented evaluation, and a reliance on high-level implementations that fail to reflect production environments. To address this, we introduce SPEED-Bench, a comprehensive suite designed to standardize SD evaluation across diverse semantic domains and realistic serving regimes. SPEED-Bench offers a carefully curated Qualitative data split, selected by prioritizing semantic diversity across the data samples. Additionally, it includes a Throughput data split, allowing speedup evaluation across a range of concurrencies, from latency-sensitive low-batch settings to throughput-oriented high-load scenarios. By integrating with production engines like vLLM and TensorRT-LLM, SPEED-Bench allows practitioners to analyze system behaviors often masked by other benchmarks. We highlight this by quantifying how synthetic inputs overestimate real-world throughput, identifying batch-size-dependent optimal draft lengths and biases in low-diversity data, and analyzing the caveats of vocabulary pruning in state-of-the-art drafters. We release SPEED-Bench to establish a unified evaluation standard for practical comparisons of SD algorithms.
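
To make the point about batch-size-dependent optimal draft lengths concrete, the sketch below uses the standard textbook approximation for speculative decoding, in which each draft token is accepted independently with probability `alpha`, so a draft of length `gamma` yields `(1 - alpha^(gamma+1)) / (1 - alpha)` tokens per verification step on average. Everything else is an assumption made for illustration and is not taken from SPEED-Bench: the function names, the relative `draft_cost`, and the toy `verify_cost` model, in which a hypothetical `compute_fraction` between 0 and 1 stands in for how compute-bound the serving batch is.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per verification step, assuming each of the
    gamma draft tokens is accepted independently with probability alpha."""
    if alpha >= 1.0:
        # Every draft token is accepted, plus one token from the target model.
        return float(gamma + 1)
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def verify_cost(gamma: int, compute_fraction: float) -> float:
    """Hypothetical cost of verifying gamma + 1 tokens, relative to one plain
    decode step of the target model (cost 1.0).

    compute_fraction = 0.0 models a memory-bound, low-batch regime where the
    extra tokens are almost free; 1.0 models a compute-bound, high-batch
    regime where cost grows linearly with the number of verified tokens.
    """
    return 1.0 + compute_fraction * gamma


def estimated_speedup(alpha: float, gamma: int, draft_cost: float,
                      compute_fraction: float) -> float:
    """Tokens produced per unit of cost, relative to plain decoding.

    draft_cost is the assumed cost of one draft-model step relative to one
    target-model decode step.
    """
    step_cost = gamma * draft_cost + verify_cost(gamma, compute_fraction)
    return expected_tokens_per_step(alpha, gamma) / step_cost


def best_draft_length(alpha: float, draft_cost: float,
                      compute_fraction: float, max_gamma: int = 16) -> int:
    """Draft length in [1, max_gamma] that maximizes the estimated speedup."""
    return max(
        range(1, max_gamma + 1),
        key=lambda g: estimated_speedup(alpha, g, draft_cost, compute_fraction),
    )


if __name__ == "__main__":
    # Illustrative values only; real acceptance rates depend on the workload.
    for fraction in (0.0, 0.25, 1.0):
        g = best_draft_length(alpha=0.8, draft_cost=0.05, compute_fraction=fraction)
        print(f"compute_fraction={fraction}: best draft length = {g}")
```

Under this toy model the preferred draft length shrinks as the batch becomes more compute-bound, which is the qualitative effect the abstract describes. Because `alpha` itself varies with the prompt distribution, the same model also hints at why low-diversity or synthetic inputs can give a misleading picture of speedup.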
