# Lexical Consensus：人工智能体基于具身经验的词汇习得与共享意义

- 来源：HuggingFace Daily Papers（社区热门论文）
- 发布时间：2026-06-21 04:08
- AIHOT 分数：51
- AIHOT 链接：https://aihot.virxact.com/items/cmr2ahrgh06aysl8zzn2el9qn
- 原文链接：https://arxiv.org/abs/2606.22207

## AI 摘要

Lexical Consensus 是一个研究AI智能体通过具身经验习得、稳定并运用词汇意义的框架。使用冻结DINOv2视觉嵌入、Carroll式假词和可解释词汇学习器，实验发现感知连贯性梯度主导学习效果：原生类别最易习得，远析取概念接近随机。CIFAR-100解离实验证实，感知距离显著预测习得准确率（partial R²=0.245, p<1e-7），语义距离无显著解释力。双向评估显示，样例机制在标签到图像检索中优于质心原型，命名与检索是分离的能力。控制实验表明，冻结的感知几何同时支撑了词汇基础并限制了无需表征适应即可习得的范围。

## 正文

Artificial intelligence systems are commonly evaluated through task performance and behavioral imitation, but such evaluations leave open whether an artificial agent can acquire, stabilize, and use new lexical meanings from grounded experience. This paper introduces Lexical Consensus, an experimental framework for studying grounded word learning over a structured perceptual substrate. Using frozen DINOv2 visual embeddings, Carroll-style nonce words, and interpretable lexical learners plus linear baselines, we test whether agents can acquire artificial labels for visual concepts, generalize them bidirectionally, and stabilize them across controlled settings. The main result is a robust perceptual-coherence gradient: native categories are easiest to learn, coherent overextensions remain learnable, mid-range disjunctive concepts degrade, and far-disjunctive concepts approach chance. A pre-registered CIFAR-100 dissociation experiment confirms that this gradient is governed by perceptual distance rather than semantic relatedness: perceptual distance predicts acquisition accuracy (partial R^2 = 0.245, p < 1e-7), while semantic distance adds no significant explanatory power (partial R^2 = 0.002, p = 0.660). Bidirectional evaluation shows that naming and retrieval are distinct: exemplar-based mechanisms outperform centroid prototypes in label-to-image retrieval, exposing a memory-fidelity dimension separate from naming accuracy. Falsification controls, homogeneous candidate-pool evaluations, and null results on representational restructuring indicate that frozen perceptual geometry both enables lexical grounding and limits what can be acquired without representational adaptation.
