# Chartographer: Counterfactual Chart Generation for Evaluating Vision-Language Models

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-05-26 08:00
- AIHOT score: 61
- AIHOT link: https://aihot.virxact.com/items/cmpovigyt09i4slv4jqij5wtp
- Original paper: https://arxiv.org/abs/2605.27311

## AI Summary

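Existing chart question-answering benchmarks are limited: models may answer by relying on shortcuts or background knowledge rather than visual reasoning. To evaluate visual reasoning rigorously, the study proposes "counterfactual charts," which keep the chart-question task fixed while varying the underlying chart and its answer. To this end, it introduces the Chartographer framework, which reverse engineers charts into executable code, validates reconstruction fidelity, generates seed-controlled variants, and derives new answers from executable QA logic. Applying the framework to existing datasets, the study evaluates the variation sensitivity and generalizability of proprietary and open-source vision-language models. The results show that counterfactual charts reveal failures hidden by single-chart tests: after answering the original chart correctly, models often fail to generalize when the updated chart requires a new visual reasoning pathway.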

## Body

Chart question-answering (QA) benchmarks aim to pose questions that require visual reasoning to answer correctly, but models can often reach solutions through shortcuts or through prior familiarity with a chart drawn from their own background knowledge. To strictly evaluate visual reasoning, we propose counterfactual charts, in which the chart-question task remains fixed but the underlying chart and the corresponding answer are varied. We introduce Chartographer, a framework to reverse engineer charts into executable code, validate reconstruction fidelity, generate seed-controlled counterfactual variants, and derive new answers from executable QA logic. We apply this framework to existing chart QA datasets and evaluate proprietary and open-source vision-language models (VLMs), measuring variation sensitivity and generalizability. Counterfactual charts reveal failures hidden by single-chart performance: VLMs often fail to generalize after answering the original chart correctly. We find that failures are most prevalent when updated charts require novel visual reasoning pathways.
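
The idea of seed-controlled variants with answers derived from executable QA logic can be illustrated with a minimal sketch. This is not Chartographer's actual code or API. `ChartSpec`, `make_variant`, `answer_argmax`, `counterfactual_set`, and `variant_accuracy` are hypothetical names introduced here, the chart is reduced to a plain label-to-value mapping instead of real plotting code, and the accuracy function is an illustrative stand-in, not the paper's metric.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ChartSpec:
    """Hypothetical stand-in for a chart reverse-engineered into code."""
    title: str
    values: Dict[str, float]  # category label -> bar height


def make_variant(spec: ChartSpec, seed: int, scale: float = 0.5) -> ChartSpec:
    """Return a counterfactual chart: same labels, seed-controlled new values."""
    rng = random.Random(seed)  # local RNG so the same seed gives the same variant
    new_values = {
        label: round(value * (1 + rng.uniform(-scale, scale)), 1)
        for label, value in spec.values.items()
    }
    return ChartSpec(title=spec.title, values=new_values)


def answer_argmax(spec: ChartSpec) -> str:
    """Executable QA logic for 'Which category has the highest value?'."""
    return max(spec.values, key=spec.values.get)


def counterfactual_set(
    spec: ChartSpec,
    qa: Callable[[ChartSpec], str],
    seeds: List[int],
) -> List[Tuple[ChartSpec, str]]:
    """Pair each variant with the answer recomputed from the QA logic."""
    variants = [make_variant(spec, seed) for seed in seeds]
    return [(variant, qa(variant)) for variant in variants]


def variant_accuracy(
    model: Callable[[ChartSpec], str],
    pairs: List[Tuple[ChartSpec, str]],
) -> float:
    """Fraction of variants the model answers correctly (illustrative only)."""
    if not pairs:
        return 0.0
    correct = sum(1 for chart, answer in pairs if model(chart) == answer)
    return correct / len(pairs)


if __name__ == "__main__":
    original = ChartSpec("Sales by region", {"North": 10.0, "South": 30.0, "East": 20.0})
    pairs = counterfactual_set(original, answer_argmax, seeds=[1, 2, 3])

    # A model that memorized the original answer instead of reading the chart.
    memorizer = lambda chart: answer_argmax(original)
    print("accuracy on variants:", variant_accuracy(memorizer, pairs))
```

Because the question stays fixed and each answer is recomputed from the variant's data, a model that memorized the original answer scores well only on variants where the answer happens not to change. That is the kind of failure a single-chart test cannot expose.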
