# OpenSTBench：超越语义评估的语音翻译统一评估框架

- 来源：HuggingFace Daily Papers（社区热门论文）
- 发布时间：2026-05-29 08:00
- AIHOT 分数：50
- AIHOT 链接：https://aihot.virxact.com/items/cmpz0edm30525sli3mrvmhfwj
- 原文链接：https://arxiv.org/abs/2605.30792

## AI 摘要

OpenSTBench 是一个统一的多维评估框架，将语音翻译系统（S2TT 和 S2ST，涵盖离线与流式两种模式）输出转化为共享评估格式，联合评测翻译质量、语音质量、说话人保留、情感与副语言保真度、时间一致性以及延迟。实验表明，翻译质量强的系统在语音质量和时间质量上仍存在显著差异。代码与数据集已开源至 GitHub。

## 正文

Speech translation systems increasingly span speech-to-text translation (S2TT), speech-to-speech translation (S2ST), offline translation, and streaming generation, producing outputs that differ in modality, speech realization, and timing behavior. Existing evaluation practices assess important aspects such as translation quality, speech quality, and temporal quality, but these aspects are often evaluated under separate protocols, making it difficult to compare heterogeneous systems comprehensively. To address this gap, we present OpenSTBench, a unified multidimensional evaluation framework that organizes heterogeneous speech translation outputs into a shared evaluation format. OpenSTBench supports both S2TT and S2ST systems in offline and streaming settings, and jointly evaluates translation quality, speech quality, speaker preservation, emotion and paralinguistic fidelity, temporal consistency, and latency. Through experiments on representative speech translation systems, we show that systems with strong translation quality can still differ substantially in speech quality, as well as in temporal quality. OpenSTBench provides a reproducible protocol for analyzing these cross-dimensional differences and supporting application-oriented comparison of speech translation systems. The code and datasets are available at https://github.com/sjtuayj/OpenSTBench.