Chartographer: Counterfactual Chart Generation for Evaluating Vision-Language Models

2026-05-26 08:00 · 38 days ago

AI summary

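Existing chart question-answering benchmarks have a limitation: models may answer by relying on shortcuts or background knowledge rather than visual reasoning. To evaluate visual reasoning strictly, the study proposes "counterfactual charts", in which the chart-question task stays fixed while the underlying chart and its answer are varied. To this end, it introduces Chartographer, a framework that reverse engineers charts into executable code, validates reconstruction fidelity, generates seed-controlled variants, and derives new answers from executable QA logic. Applying the framework to existing datasets, the study evaluates the variation sensitivity and generalizability of proprietary and open-source vision-language models. The results show that counterfactual charts reveal failures hidden by single-chart testing: after answering the original chart correctly, models often fail to generalize when the updated chart requires a new visual reasoning pathway.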

Original text · untranslated

Chart question-answering (QA) benchmarks aim to pose questions that require visual reasoning to answer correctly, but models can often reach solutions through shortcuts or through prior familiarity with a chart drawn from their own background knowledge. To strictly evaluate visual reasoning, we propose counterfactual charts, where the chart-question task remains fixed but the underlying chart and the corresponding answer are varied. We introduce Chartographer, a framework to reverse engineer charts into executable code, validate reconstruction fidelity, generate seed-controlled counterfactual variants, and derive new answers from executable QA logic. We apply this framework to existing chart QA datasets and evaluate proprietary and open-source vision-language models (VLMs), measuring variation sensitivity and generalizability. Counterfactual charts reveal failures hidden by single-chart performance: VLMs often fail to generalize after answering the original chart correctly. We find that failures are most prevalent when updated charts require novel visual reasoning pathways.
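The idea of seed-controlled variants with answers derived from executable QA logic can be illustrated with a minimal sketch. This is not the authors' implementation: the `BarChart` class, `make_variant`, and `answer_tallest_bar` are hypothetical names chosen for illustration, the chart is reduced to its underlying data (no rendering), and the reverse-engineering and fidelity-validation steps are assumed to have already produced the original specification.

```python
import random
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BarChart:
    """Hypothetical stand-in for a chart recovered as executable code."""
    title: str
    categories: Tuple[str, ...]
    values: Tuple[float, ...]


def make_variant(original: BarChart, seed: int) -> BarChart:
    """Return a seed-controlled counterfactual: same layout, new values."""
    rng = random.Random(seed)  # local RNG, so a given seed always yields the same chart
    low, high = min(original.values), max(original.values)
    new_values = tuple(round(rng.uniform(low, high), 1) for _ in original.values)
    return BarChart(original.title, original.categories, new_values)


def answer_tallest_bar(chart: BarChart) -> str:
    """Executable QA logic for the fixed question 'Which category is highest?'."""
    best = max(range(len(chart.values)), key=lambda i: chart.values[i])
    return chart.categories[best]


# Assumed example data, not taken from any dataset in the paper.
original = BarChart("Units sold", ("A", "B", "C", "D"), (12.0, 30.0, 18.0, 25.0))

# The question stays fixed; the chart and its ground-truth answer change per seed.
variants = [make_variant(original, seed) for seed in range(5)]
answers = [answer_tallest_bar(v) for v in variants]
```

Because the answer is recomputed from the variant's data rather than copied from the original, a model that memorized "B" for the original chart gets no credit on variants where another category is highest.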

HuggingFace Daily Papers (trending community papers)

Read the original · arxiv.org
