ChildVox: A Speech, Audio, and Large Audio-Language Model Benchmark for Understanding and Characterizing Childhood Sounds

2026-05-28 08:00

AI Summary

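ChildVox is a new benchmark for evaluating how well AI models understand the diverse acoustic signals children produce. It covers the full developmental trajectory from birth through school age, including physiological sounds, non-linguistic vocalizations, canonical syllables, and spoken language. The benchmark integrates more than 20 sub-tasks across 17 child audio and speech datasets, enabling systematic cross-corpus and cross-domain comparison. We evaluate three classes of foundation models (self-supervised, ASR-oriented, and large audio-language models) on tasks spanning physiological sound classification, vocalization and canonical syllable modeling, and speech quality assessment and recognition. The results indicate that ChildVox provides a suite of high-performance models that recognize a wide range of child acoustic signals and support downstream applications such as characterizing children's language levels and tracking speech development.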

Original abstract (untranslated)

We present ChildVox, a novel benchmark for characterizing the diverse acoustic signals through which children communicate. ChildVox follows the full developmental trajectory from birth through school age, covering physiological sounds, non-linguistic vocalizations, canonical syllables, and spoken language. It integrates more than 20 sub-tasks across 17 child-centered audio and speech datasets, enabling systematic cross-corpus and cross-domain comparison. We evaluate a representative range of audio and speech foundation models, including self-supervised, ASR-oriented, and large audio-language models, on tasks including physiological sound classification, vocalization and canonical syllable modeling, and speech quality assessment and recognition. Benchmark results show that ChildVox provides a suite of high-performance models for recognizing a wide range of acoustic signals from children, supporting downstream applications such as characterizing children's language levels and tracking speech production with age.

HuggingFace Daily Papers (community trending papers)

Read the original: arxiv.org
