One Model for Many Latencies: A Universal Speech Enhancement Scheme for Diverse Real-Time Applications

2026-06-24 08:00

AI Summary

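Different real-time speech applications usually each need a separately trained enhancement model. To remove that burden, this paper proposes a universal real-time speech enhancement model that controls both algorithmic latency and computational latency. Algorithmic latency is adjusted through a configurable number of look-ahead frames, and parallel convolutional layers are introduced to avoid the learning inefficiency caused by different padding configurations. Computational latency is controlled by an early-exit mechanism that supports inference at different network depths. A two-stage training strategy, transitioning from a shared decoder to multiple decoders, narrows the performance gap between the universal model and specialized models. The framework lets a single model be deployed under multiple latency budgets without retraining.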
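A minimal sketch of how a configurable look-ahead might work, written in plain Python for clarity rather than as the authors' implementation. It assumes frame-level 1-D convolutions where a look-ahead of L frames moves L frames of zero padding from the past side to the future side, one separate kernel per look-ahead setting (our reading of the "parallel convolutional layers"), and an STFT front end whose algorithmic latency is taken as window length plus L times the hop size. The names `lookahead_conv1d`, `ParallelLookaheadConv`, and `algorithmic_latency_ms` are hypothetical.

```python
from typing import Dict, List


def lookahead_conv1d(x: List[float], weights: List[float], lookahead: int) -> List[float]:
    """Frame-wise 1-D convolution covering (K - 1 - lookahead) past frames,
    the current frame, and `lookahead` future frames (K = kernel size).
    lookahead = 0 gives a purely causal convolution."""
    k = len(weights)
    if not 0 <= lookahead < k:
        raise ValueError("lookahead must be in [0, kernel_size)")
    left = k - 1 - lookahead  # zero frames padded on the past side
    padded = [0.0] * left + list(x) + [0.0] * lookahead
    # Output frame t sees input frames t - left ... t + lookahead.
    return [sum(w * padded[t + i] for i, w in enumerate(weights))
            for t in range(len(x))]


class ParallelLookaheadConv:
    """Hypothetical layer holding one kernel per supported look-ahead setting,
    so each padding configuration gets its own weights."""

    def __init__(self, kernels: Dict[int, List[float]]) -> None:
        for lookahead, w in kernels.items():
            if not 0 <= lookahead < len(w):
                raise ValueError(f"invalid look-ahead {lookahead} for kernel size {len(w)}")
        self.kernels = dict(kernels)

    def __call__(self, x: List[float], lookahead: int) -> List[float]:
        # Select the branch matching the requested look-ahead at inference time.
        if lookahead not in self.kernels:
            raise KeyError(f"no branch for look-ahead {lookahead}")
        return lookahead_conv1d(x, self.kernels[lookahead], lookahead)


def algorithmic_latency_ms(window_ms: float, hop_ms: float, lookahead: int) -> float:
    """Assumed STFT latency model: one analysis window plus `lookahead` hops."""
    return window_ms + lookahead * hop_ms
```

Under these assumptions, a 20 ms window with a 10 ms hop and two look-ahead frames gives `algorithmic_latency_ms(20.0, 10.0, 2) == 40.0`, while `lookahead=0` reduces the layer to a causal convolution. Giving each setting its own kernel means no single set of weights has to serve inputs whose alignment shifts with the padding, which is one plausible reading of why parallel branches help.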

Original abstract (untranslated)

Different real-time speech applications impose distinct latency budgets, often requiring separately trained enhancement models for each scenario. In this paper, we propose a one-for-all, real-time universal speech enhancement model that provides explicit control over both algorithmic and computational latency. Algorithmic latency is flexibly adjusted via configurable look-ahead frames. To avoid learning inefficiency caused by varying padding configurations, we introduce parallel convolutional layers corresponding to different look-ahead settings. Computational latency is controlled through an early-exit mechanism, enabling inference at different network depths. To narrow the performance gap between specialized and flexible models, we propose a two-stage training strategy with a shared-to-multiple decoder transition. Overall, the proposed framework enables a single model to be deployed across diverse latency budgets without retraining separate models.
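The early-exit mechanism and the shared-to-multiple decoder transition can be sketched in the same spirit. This assumes the network is a stack of blocks with an exit decoder after each block, that stage 1 trains a single decoder shared by all exits, and that stage 2 copies it into one decoder per exit for further training. `EarlyExitEnhancer`, `ScaleHead`, and `split_decoders` are hypothetical illustrations; real blocks and decoders would be neural layers, not toy functions.

```python
import copy
from typing import Callable, List, Sequence

Vector = List[float]


class ScaleHead:
    """Toy stand-in for a decoder: scales every feature by a trainable gain."""

    def __init__(self, gain: float = 1.0) -> None:
        self.gain = gain

    def __call__(self, h: Vector) -> Vector:
        return [self.gain * v for v in h]


class EarlyExitEnhancer:
    """Hypothetical stack of blocks with an exit decoder after each block."""

    def __init__(self, blocks: Sequence[Callable[[Vector], Vector]], head: ScaleHead) -> None:
        self.blocks = list(blocks)
        # Stage 1 (assumed): every exit point reuses the same decoder object.
        self.heads = [head] * len(self.blocks)

    def split_decoders(self) -> None:
        # Stage 2 (assumed): give each exit its own copy of the shared decoder,
        # initialized from the stage-1 weights, so the copies can diverge.
        self.heads = [copy.deepcopy(h) for h in self.heads]

    def enhance(self, x: Vector, depth: int) -> Vector:
        # Running only the first `depth` blocks is what lowers computational latency.
        if not 1 <= depth <= len(self.blocks):
            raise ValueError(f"depth must be in [1, {len(self.blocks)}]")
        h = list(x)
        for block in self.blocks[:depth]:
            h = block(h)
        return self.heads[depth - 1](h)


def add_one(h: Vector) -> Vector:
    """Toy block used only for the demo below."""
    return [v + 1.0 for v in h]


if __name__ == "__main__":
    net = EarlyExitEnhancer([add_one] * 3, ScaleHead(gain=2.0))
    print(net.enhance([0.0, 1.0], depth=1))  # [2.0, 4.0]: one block, then exit
    print(net.enhance([0.0, 1.0], depth=3))  # [6.0, 8.0]: all three blocks
```

Before `split_decoders()` all exits share one decoder, so updating it affects every depth; afterwards each exit can be tuned for its own depth, which is how this sketch interprets narrowing the gap to specialized models.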

HuggingFace Daily Papers (trending community papers)

Read the original · arxiv.org