# Beyond Perception Errors: Semantic Fixation in Large Vision-Language Models

- Source: HuggingFace Daily Papers (trending community papers)
- Published: 2026-04-13 08:00
- AIHOT link: https://aihot.virxact.com/items/cmnzl60r2003aslwz4xvn2cn5
- Original paper: https://arxiv.org/abs/2604.12119

## AI Summary

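Large vision-language models exhibit "semantic fixation": they keep the default semantic interpretation even when the prompt specifies an alternative rule. The study introduces the VLM-Fix benchmark, built on four abstract strategy games, and an evaluation of 14 models shows accuracy clearly skewed toward the standard rules. In experiments, neutral alias prompts narrow the inverse-rule gap, while semantically loaded aliases widen it again. Training on a single rule hurts transfer to the opposite rule, whereas joint-rule training improves transfer more broadly. Late-layer activation steering partially restores performance, indicating that the error is at least partly editable in the model's late-layer representations.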
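The fixation gap can be made concrete as a paired comparison: each terminal board state is judged once under the standard rule and once under the inverse rule, and the gap is standard-rule accuracy minus inverse-rule accuracy on the same states. The following is a minimal sketch under that assumption; `PairedResult`, `accuracy`, and `fixation_gap` are hypothetical names, not part of the VLM-Fix release, and the paper may aggregate results differently (for example per game or per model).

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class PairedResult:
    """One board state judged under both rule formulations (hypothetical record type)."""
    state_id: str
    correct_standard: bool  # model answer was correct under the standard rule
    correct_inverse: bool   # model answer was correct under the inverse rule


def accuracy(flags: Iterable[bool]) -> float:
    """Fraction of True values; returns 0.0 for an empty input."""
    values: List[bool] = list(flags)
    return sum(values) / len(values) if values else 0.0


def fixation_gap(results: List[PairedResult]) -> Tuple[float, float, float]:
    """Return (standard accuracy, inverse accuracy, gap) over the same states.

    A positive gap means the model does better when the familiar (standard)
    rule applies, which is the signature of semantic fixation.
    """
    std = accuracy(r.correct_standard for r in results)
    inv = accuracy(r.correct_inverse for r in results)
    return std, inv, std - inv


if __name__ == "__main__":
    # Toy data: every state is solved under the standard rule, half under the inverse rule.
    toy = [
        PairedResult("s1", True, True),
        PairedResult("s2", True, False),
        PairedResult("s3", True, False),
        PairedResult("s4", True, True),
    ]
    print(fixation_gap(toy))
```

Because both accuracies are computed on identical states, any gap cannot be attributed to differences in what the model has to perceive, which is the separation the benchmark is designed to isolate.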

## Main Text

Large vision-language models (VLMs) often rely on familiar semantic priors, but existing evaluations do not cleanly separate perception failures from rule-mapping failures. We study this behavior as semantic fixation: preserving a default interpretation even when the prompt specifies an alternative, equally valid mapping. To isolate this effect, we introduce VLM-Fix, a controlled benchmark over four abstract strategy games that evaluates identical terminal board states under paired standard and inverse rule formulations. Across 14 open and closed VLMs, accuracy consistently favors standard rules, revealing a robust semantic-fixation gap. Prompt interventions support this mechanism: neutral alias prompts substantially narrow the inverse-rule gap, while semantically loaded aliases reopen it. Post-training is strongly rule-aligned: training on one rule improves same-rule transfer but hurts opposite-rule transfer, while joint-rule training improves broader transfer. To test external validity beyond synthetic games, we evaluate analogous defamiliarization interventions on VLMBias and observe the same qualitative pattern. Finally, late-layer activation steering partially recovers degraded performance, indicating that semantic-fixation errors are at least partly editable in late representations. Project page, code, and dataset available at https://maveryn.github.io/vlm-fix/.
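The abstract does not say how the late-layer steering direction is built. One common construction, assumed here rather than taken from the paper, is a difference of means: average the late-layer activations on prompts the model answers correctly under the inverse rule, subtract the average on prompts where it falls back to the standard interpretation, and add a scaled copy of that direction to the hidden state at inference time. The sketch below uses plain Python lists in place of real hidden states; `mean_vector`, `steering_vector`, `steer`, and `alpha` are hypothetical names. In an actual model the vectors would be captured from, and injected into, a chosen late layer, for example via a forward hook.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equally sized activation vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def steering_vector(
    target_acts: Sequence[Sequence[float]],
    source_acts: Sequence[Sequence[float]],
) -> Vector:
    """Difference of means: points from 'source' behaviour toward 'target' behaviour.

    Assumption: target_acts come from correctly answered inverse-rule prompts,
    source_acts from prompts where the model stayed fixated on the standard rule.
    """
    t = mean_vector(target_acts)
    s = mean_vector(source_acts)
    return [ti - si for ti, si in zip(t, s)]


def steer(hidden: Sequence[float], direction: Sequence[float], alpha: float) -> Vector:
    """Add alpha * direction to one hidden state at the chosen (late) layer."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


if __name__ == "__main__":
    direction = steering_vector([[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [2.0, 2.0]])
    print(direction)                          # difference of the two means
    print(steer([0.5, 0.5], direction, 0.5))  # hidden state nudged along it
```

The scale `alpha` trades recovery against drift away from the model's normal behaviour; the abstract reports only partial recovery, so this sketch makes no claim about which layer or scale works best.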
