# ConvFill: Conversational Infill Makes Voice Agents Both Responsive and Capable

- Source: HuggingFace Daily Papers (trending community papers)
- Published: 2026-06-23 08:00
- AIHOT score: 57
- AIHOT link: https://aihot.virxact.com/items/cmqz0fno1010dslxsezsnv6nt
- Original paper: https://arxiv.org/abs/2511.07397

## AI Summary

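Voice agents must trade response speed against complex capabilities. ConvFill proposes "conversational infill": a small talker model generates contextual responses in real time to hide the inference latency of an external reasoner model, and fluently integrates the reasoner's streamed knowledge into its replies during inference. Using a 290,571-example synthetic dataset spanning six domains, the authors show the task is learnable across seven small language models ranging from 135M to 1.7B parameters. The system sustains millisecond-level time-to-first-response while closing the accuracy gap to within 6.3% of the corresponding frontier reasoner. In an 18-participant live user study with the talker running on an Apple M2 SoC, participants ranked ConvFill on par with frontier models overall, preferred it for retrieval-heavy tasks, and rated it significantly more responsive. Code, models, and datasets are open source.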
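The latency-hiding idea can be pictured as two concurrent tasks: a slow reasoner that streams knowledge chunks into a queue, and a talker that speaks immediately and then weaves each chunk into its ongoing reply as it arrives. The following is a minimal asyncio sketch under loud assumptions: `fake_reasoner`, `pump`, `talker`, and `converse` are hypothetical names, the reasoner is a timed stub, and template strings stand in for the small language model that ConvFill actually trains to produce these responses. It illustrates the control flow only, not the authors' implementation.

```python
import asyncio
import time
from typing import AsyncIterator, List, Optional, Tuple


async def fake_reasoner(query: str, delay_s: float = 0.05) -> AsyncIterator[str]:
    """Hypothetical stand-in for a slow external reasoner.

    Streams knowledge chunks, each after a simulated reasoning/retrieval delay.
    """
    for i in (1, 2):
        await asyncio.sleep(delay_s)
        yield f"fact {i} about {query}"


async def pump(stream: AsyncIterator[str], queue: asyncio.Queue) -> None:
    """Forward reasoner chunks into a queue; None marks the end of the stream."""
    async for chunk in stream:
        await queue.put(chunk)
    await queue.put(None)


async def talker(query: str, queue: asyncio.Queue, poll_s: float = 0.01) -> List[Tuple[float, str]]:
    """Template-based stand-in for the small talker model.

    Returns (seconds since start, utterance) pairs.
    """
    start = time.perf_counter()
    utterances: List[Tuple[float, str]] = []

    def say(text: str) -> None:
        utterances.append((time.perf_counter() - start, text))

    # 1) Respond immediately, grounded only in the user's query.
    say(f"Good question about {query}, let me check.")
    holding = False
    while True:
        try:
            chunk: Optional[str] = await asyncio.wait_for(queue.get(), timeout=poll_s)
        except asyncio.TimeoutError:
            # 2) Reasoner still busy: emit one contextual holding line per wait.
            if not holding:
                say(f"Still looking into {query}...")
                holding = True
            continue
        if chunk is None:  # reasoner finished streaming
            break
        # 3) Integrate the streamed knowledge into the ongoing response.
        say(f"So, {chunk}.")
        holding = False
    return utterances


async def converse(query: str) -> List[Tuple[float, str]]:
    """Run the stub reasoner and the talker concurrently for one user turn."""
    queue: asyncio.Queue = asyncio.Queue()
    producer = asyncio.create_task(pump(fake_reasoner(query), queue))
    utterances = await talker(query, queue)
    await producer
    return utterances


if __name__ == "__main__":
    for t, text in asyncio.run(converse("tide tables")):
        print(f"{t * 1000:6.1f} ms  {text}")
```

Running it shows the talker's first utterance at roughly 0 ms, a holding line whenever it is waiting on the reasoner, and each stub fact shortly after it is streamed. The design choice being illustrated is that time-to-first-response depends only on the talker, while the reasoner's latency is absorbed by the conversation.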

## Main Text

Voice agents face a fundamental tension: the reasoning, retrieval, and tool use that make foundation models capable are iterative and slow, while conversational interaction demands responses on a millisecond timescale. Smaller, real-time models meet the latency bar but cannot match foundation models on complex tasks, leaving current voice agents to trade away either responsiveness or capability. We introduce conversational infill, where a small talker model both immediately generates contextually grounded responses to hide the latency of an external reasoner model and fluently integrates streamed reasoner knowledge into its responses during inference. We curate a 290,571-example synthetic dataset spanning six domains and demonstrate that this task is learnable across seven widely used small language models ranging from 135M to 1.7B parameters. Our system implementation, ConvFill, sustains millisecond-level time-to-first-response while closing the accuracy gap to within 6.3% of the corresponding frontier reasoner performance. In a live user study (n=18) with talker deployments running on an Apple M2 SoC, participants rank ConvFill on par with frontier models overall, prefer it for retrieval-heavy tasks, and rate it significantly more responsive. These results show that conversational infill unlocks a new point on the latency-capability Pareto frontier, offering a practical path toward voice agents that are both responsive and highly capable. Code, models, and datasets are available at https://github.com/vysri/conversational-infill.
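The phrase "a new point on the latency-capability Pareto frontier" has a precise meaning: a system sits on the frontier if no other system is at least as fast and at least as accurate while being strictly better on one of the two axes. The generic dominance check below makes that definition concrete. The function names and the four-system toy dictionary are hypothetical, and the numbers are made-up placeholders rather than measurements from the paper.

```python
from typing import Dict, List, Tuple

# (latency in seconds, accuracy in [0, 1]); lower latency and higher accuracy are better.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no slower and no less accurate than b, and strictly better on at least one axis."""
    (lat_a, acc_a), (lat_b, acc_b) = a, b
    return lat_a <= lat_b and acc_a >= acc_b and (lat_a < lat_b or acc_a > acc_b)


def pareto_frontier(systems: Dict[str, Point]) -> List[str]:
    """Names of systems not dominated by any other system, sorted alphabetically."""
    return sorted(
        name
        for name, point in systems.items()
        if not any(dominates(other, point) for other_name, other in systems.items() if other_name != name)
    )


if __name__ == "__main__":
    # Made-up placeholder values for illustration only, not results from the paper.
    toy = {
        "small_realtime": (0.1, 0.50),
        "large_slow": (5.0, 0.90),
        "new_point": (0.1, 0.85),
        "dominated": (3.0, 0.40),
    }
    print(pareto_frontier(toy))  # ['large_slow', 'new_point']
```

With these placeholders, `small_realtime` drops off the frontier because `new_point` matches its latency at higher accuracy, while `large_slow` stays on it because nothing else reaches its accuracy.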
