Toward Truly Multilingual ASR: A Study of Generalizing Code-Switching Speech Recognition to Unseen Language Pairs
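Source: arxiv.org
Code-switching ASR (CS-ASR) is highly challenging because multilingual code-switching speech resources are scarce. Existing methods rely on synthetic data generation or fine-tuning on specific language pairs, but their scalability is limited because the number of language pairs grows combinatorially with the number of supported languages. This work uses model merging and domain generalization methods to examine whether CS capabilities learned from a limited set of seen language pairs can generalize to unseen language pairs. Experiments show that merged bilingual CS-ASR models generalize only modestly to unseen language pairs, suggesting that bilingual CS capabilities transfer only to a limited extent across language pairs.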
Automatic Speech Recognition (ASR) has become a key technology for human-AI interaction. However, code-switching ASR (CS-ASR) remains particularly challenging due to the severe scarcity of multilingual CS speech resources across diverse language pairs. Existing approaches primarily improve CS-ASR performance through synthetic CS speech generation or pair-specific fine-tuning on limited bilingual datasets. Nevertheless, these approaches face an inherent scalability limitation, as support for CS must be developed separately for language pairs whose number grows combinatorially with the number of supported languages. In this work, we investigate whether CS capabilities learned from a limited set of seen language pairs can generalize to unseen language pairs through model merging and domain generalization methods. Our experiments show that merged bilingual CS-ASR models generalize only modestly to unseen language pairs, suggesting limited transfer of bilingual CS capabilities across language pairs.
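To make the scalability argument and the idea of model merging concrete, the following is a minimal sketch and not the procedure used in this work. It assumes unordered language pairs, so the pair count is n(n-1)/2, and it assumes the simplest form of merging, a weighted average of parameters; the abstract does not say which merging method is used. The function names `num_language_pairs` and `merge_by_averaging` are hypothetical, and checkpoints are represented as plain dictionaries of float lists rather than real ASR model weights.

```python
from math import comb


def num_language_pairs(num_languages: int) -> int:
    """Count unordered language pairs among `num_languages` languages: n(n-1)/2."""
    return comb(num_languages, 2)


def merge_by_averaging(checkpoints, weights=None):
    """Merge checkpoints by a weighted average of their parameters.

    Assumption: each checkpoint is a dict mapping a parameter name to a flat
    list of floats, and all checkpoints share the same names and shapes.
    This illustrates plain parameter averaging only.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    if weights is None:
        # Default to a uniform average over all checkpoints.
        weights = [1.0 / len(checkpoints)] * len(checkpoints)
    if len(weights) != len(checkpoints):
        raise ValueError("need exactly one weight per checkpoint")

    names = checkpoints[0].keys()
    for ckpt in checkpoints:
        if ckpt.keys() != names:
            raise ValueError("checkpoints must have identical parameter names")

    merged = {}
    for name in names:
        # Group the i-th value of this parameter across all checkpoints.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        merged[name] = [
            sum(w * v for w, v in zip(weights, column)) for column in columns
        ]
    return merged


if __name__ == "__main__":
    # Pair-specific fine-tuning needs one model per pair, which grows quickly.
    for n in (5, 10, 20):
        print(n, "languages ->", num_language_pairs(n), "pairs")

    # Two toy "bilingual" checkpoints merged with equal weights.
    model_a = {"w": [1.0, 3.0]}
    model_b = {"w": [3.0, 5.0]}
    print(merge_by_averaging([model_a, model_b]))
```

With 10 supported languages there are already 45 pairs, which is why training one model per pair scales poorly and why merging a few pair-specific models is an attractive alternative to test.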