# A Single Model for Multiple Latencies: A Universal Speech Enhancement Approach for Diverse Real-Time Applications

- Source: HuggingFace Daily Papers (trending community papers)
- Published: 2026-06-24 08:00
- AIHOT score: 47
- AIHOT link: https://aihot.virxact.com/items/cmr0svezl04s4slolvm050hra
- Original paper: https://arxiv.org/abs/2606.25621

## AI Summary

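Different real-time speech applications currently each need a separately trained enhancement model. To address this, the paper proposes a universal real-time speech enhancement model that controls both algorithmic and computational latency. Algorithmic latency is adjusted through configurable look-ahead frames, and parallel convolutional layers are introduced to counter the learning inefficiency caused by differing padding configurations. Computational latency is controlled by an early-exit mechanism that supports inference at different network depths. A two-stage training strategy (a shared-to-multiple decoder transition) narrows the performance gap between the universal model and specialized models. The framework lets a single model be deployed under multiple latency budgets without retraining.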

## Main Text

Different real-time speech applications impose distinct latency budgets, often requiring separately trained enhancement models for each scenario. In this paper, we propose a one-for-all, real-time universal speech enhancement model that provides explicit control over both algorithmic and computational latency. Algorithmic latency is flexibly adjusted via configurable look-ahead frames. To avoid learning inefficiency caused by varying padding configurations, we introduce parallel convolutional layers corresponding to different look-ahead settings. Computational latency is controlled through an early-exit mechanism, enabling inference at different network depths. To narrow the performance gap between specialized and flexible models, we propose a two-stage training strategy with a shared-to-multiple decoder transition. Overall, the proposed framework enables a single model to be deployed across diverse latency budgets without retraining separate models.
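The two latency controls described above can be illustrated with a toy sketch. It is not the paper's architecture: the names `conv1d_time` and `LatencyConfigurableEncoder`, the choice to put the parallel kernels only on the input layer, the strictly causal deeper blocks, and the hand-picked weights are all assumptions made for illustration. The two-stage training and the decoders are left out entirely. Pure Python lists stand in for feature frames so the example runs without a deep-learning framework.

```python
from typing import Dict, List, Sequence


def conv1d_time(frames: Sequence[float], kernel: Sequence[float], lookahead: int) -> List[float]:
    """Sliding-window filter over time (cross-correlation, as in deep-learning 'conv').

    The output at frame t depends on inputs t - (K - 1 - lookahead) .. t + lookahead.
    """
    k = len(kernel)
    if not 0 <= lookahead <= k - 1:
        raise ValueError("lookahead must be between 0 and kernel_size - 1")
    # Splitting the K - 1 padding frames between past and future sets the look-ahead.
    left = k - 1 - lookahead
    padded = [0.0] * left + list(frames) + [0.0] * lookahead
    return [sum(kernel[j] * padded[t + j] for j in range(k)) for t in range(len(frames))]


class LatencyConfigurableEncoder:
    """Hypothetical toy model: parallel input kernels plus an early exit."""

    def __init__(self, input_kernels: Dict[int, List[float]], causal_kernels: List[List[float]]):
        # Assumption: one input kernel per supported look-ahead setting, so each
        # padding configuration has its own weights instead of sharing one set.
        self.input_kernels = input_kernels
        # Assumption: deeper blocks are shared and strictly causal (look-ahead 0).
        self.causal_kernels = causal_kernels

    def forward(self, frames: Sequence[float], lookahead: int, exit_depth: int) -> List[float]:
        if lookahead not in self.input_kernels:
            raise KeyError(f"no parallel branch for lookahead={lookahead}")
        if not 1 <= exit_depth <= 1 + len(self.causal_kernels):
            raise ValueError("exit_depth is outside the available network depth")
        # Algorithmic latency: select the branch matching the requested look-ahead.
        x = conv1d_time(frames, self.input_kernels[lookahead], lookahead)
        # Computational latency: run only the first exit_depth - 1 deeper blocks.
        for kernel in self.causal_kernels[: exit_depth - 1]:
            x = conv1d_time(x, kernel, 0)
        return x


model = LatencyConfigurableEncoder(
    input_kernels={0: [1.0, 1.0, 1.0], 2: [1.0, 1.0, 1.0]},
    causal_kernels=[[0.5, 0.5], [0.5, 0.5]],
)
impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
causal_out = model.forward(impulse, lookahead=0, exit_depth=1)
lookahead_out = model.forward(impulse, lookahead=2, exit_depth=1)
deeper_out = model.forward(impulse, lookahead=0, exit_depth=2)
```

Feeding an impulse at frame 3 makes the look-ahead visible: with `lookahead=0` the first non-zero output appears at frame 3, while with `lookahead=2` it appears at frame 1, meaning the output at frame 1 had to wait for input frame 3. In a frame-based system this wait would scale with the hop size, which the abstract does not state. Raising `exit_depth` only adds work per frame in this sketch and leaves the look-ahead unchanged.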