# ChildVox： 一个用于理解和表征儿童期声音的语音、音频与大型音频-语言模型基准

- 来源：HuggingFace Daily Papers（社区热门论文）
- 发布时间：2026-05-28 08:00
- AIHOT 分数：69
- AIHOT 链接：https://aihot.virxact.com/items/cmpqjlfr105k2slnoopmafkel
- 原文链接：https://arxiv.org/abs/2605.29257

## AI 摘要

ChildVox 是一个用于评估AI模型对儿童多样化声学信号理解能力的新基准。它覆盖了从出生到学龄的完整发展轨迹，包含生理声音、非语言发声、规范音节和口语语言。该基准整合了17个儿童音频与语音数据集中的20多个子任务，实现了系统性跨语料库、跨领域比较。我们评估了自监督、面向ASR及大型音频-语言模型三类基础模型，任务涵盖生理声音分类、发声与规范音节建模、语音质量评估与识别。结果表明，ChildVox提供了一套高性能模型，能够识别广泛的儿童声学信号，支持下游应用，如表征儿童语言水平和追踪语音发展。

## 正文

We present ChildVox, a novel benchmark for characterizing the diverse acoustic signals through which children communicate. Specifically, ChildVox follows the full developmental trajectory from birth through school age, covering physiological sounds, non-linguistic vocalizations, canonical syllables, and spoken language. ChildVox integrates more than 20 sub-tasks across 17 child-centered audio and speech datasets, enabling systematic cross-corpus and cross-domain comparison. We evaluate a representative range of audio and speech foundation models, including self-supervised, ASR-oriented, and large audio-language models, on tasks including physiological sound classification, vocalization and canonical syllables modeling, and speech quality assessment and recognition. Benchmark results show that ChildVox provides a suite of high-performance models in recognizing a wide range of acoustic signals from children, supporting downstream applications such as characterizing children's language levels and tracking speech production with age.