# SpeechEditBench: A Bilingual Multi-Attribute Benchmark for Instruction-Guided Speech Editing

- Source: HuggingFace Daily Papers (trending community papers)
- Published: 2026-06-03 08:00
- AIHOT score: 57
- AIHOT link: https://aihot.virxact.com/items/cmq07bl7r03jisltr0p008uhu
- Original link: https://arxiv.org/abs/2606.01804

## AI Summary

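SpeechEditBench is a bilingual, multi-attribute benchmark that systematically evaluates instruction-guided speech editing. It covers seven atomic editing tasks plus compositional editing tasks, and proposes an anchor-based evaluation protocol that separately measures target-attribute edit success, non-target-attribute preservation success, and joint success. An evaluation of mainstream Speech LLMs and specialized speech editing systems finds that (1) no single model excels across all dimensions; (2) closed-source Speech LLMs generally outperform open-source models; and (3) compositional editing is highly challenging. The benchmark provides a diagnostic framework for locating bottlenecks in Speech LLMs, and the data and code are publicly available.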

## Main Text

Instruction-guided speech editing requires a model to modify specified speech attributes while preserving unrelated characteristics. Despite rapid progress in Speech Large Language Models (Speech LLMs), systematic evaluation of this capability remains challenging, as existing benchmarks are fragmented across isolated editing tasks. To bridge this gap, we introduce SpeechEditBench, a bilingual multi-attribute benchmark for instruction-guided speech editing. SpeechEditBench encompasses seven atomic editing tasks, as well as compositional editing tasks that integrate multiple operations within a single instruction. We propose an anchor-based evaluation protocol that separately assesses the edit success of target attributes and the preservation of untargeted attributes, leading to three metrics: target success, preservation success, and joint success. Using this benchmark, we evaluate mainstream Speech LLMs and specialized speech editing systems. The results reveal three key findings: (1) no single model performs well across all editing dimensions; (2) closed-source Speech LLMs generally outperform open-source models; (3) compositional editing remains highly challenging, with even the most advanced models struggling to achieve high joint success. SpeechEditBench provides a rigorous diagnostic framework to identify bottlenecks in Speech LLMs, thereby facilitating the development of next-generation Speech LLMs with more robust and precise instruction-guided editing capabilities. Data and code are available at https://github.com/daxintan-cuhk/SpeechEditBench.
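
The three metrics named in the abstract can be illustrated with a small sketch. The abstract does not give their exact definitions, so this assumes each edited sample yields two boolean judgments (target attribute edited, untargeted attributes preserved) and that joint success means both hold for the same sample. The names `EditResult` and `success_rates` are hypothetical and are not taken from the SpeechEditBench code.

```python
from dataclasses import dataclass


@dataclass
class EditResult:
    """Per-sample judgments (hypothetical structure, not the benchmark's API)."""
    target_ok: bool      # the requested attribute was edited as instructed
    preserved_ok: bool   # the untargeted attributes stayed unchanged


def success_rates(results):
    """Return target, preservation, and joint success rates over a list of samples.

    Assumption: joint success counts a sample only when BOTH judgments are true.
    """
    if not results:
        raise ValueError("results must be non-empty")
    n = len(results)
    return {
        "target_success": sum(r.target_ok for r in results) / n,
        "preservation_success": sum(r.preserved_ok for r in results) / n,
        "joint_success": sum(r.target_ok and r.preserved_ok for r in results) / n,
    }


# Toy data for illustration only; these are not results from the paper.
demo = [
    EditResult(True, True),
    EditResult(True, False),
    EditResult(False, True),
    EditResult(True, True),
]
rates = success_rates(demo)
print(rates)
```

Under this assumed definition, joint success can never exceed either of the other two rates, which is one way to read the finding that high joint success is hard to reach on compositional edits.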
