# When Large Language Models Read Tables Carelessly: Measuring and Reducing Data Referencing Errors

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-30 08:00
- AIHOT score: 48
- AIHOT link: https://aihot.virxact.com/items/cmr307xet0cvfsl8zparg7idb
- Original link: https://arxiv.org/abs/2606.32029

## AI Summary

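Large language models still make data referencing errors (DREs) on table tasks, that is, they cite table values incorrectly or omit them. The study presents the first systematic evaluation of DRE rates across models (1.7B to 20B parameters) and finds that every tested model exhibits the problem. Using data referencing as a critic for filtering and rejection sampling raises answer accuracy by up to 12.0%. The team also trained a lightweight 4B-parameter critic model that achieves an average F1 score of 78.2% on in-distribution and out-of-distribution DRE detection and can effectively assist inference for larger models.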
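To make the notion of a DRE concrete, the following is a minimal sketch of an exact-match check between cited values and table cells. It is not the paper's detector (the paper trains a 4B-parameter critic model); `CellReference`, `find_dres`, and the table contents are hypothetical names and made-up values used only for illustration, and the sketch assumes that references have already been extracted from the model's reasoning. It covers only incorrectly cited values, not omitted ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellReference:
    """A value that a reasoning step claims to have read from the table (hypothetical type)."""
    row: int
    column: str
    cited_value: str


def find_dres(table, references):
    """Return the references whose cited value does not match the table cell."""
    errors = []
    for ref in references:
        # A reference to a row or column that does not exist is also an error.
        if not (0 <= ref.row < len(table)) or ref.column not in table[ref.row]:
            errors.append(ref)
            continue
        if str(table[ref.row][ref.column]).strip() != ref.cited_value.strip():
            errors.append(ref)
    return errors


# Made-up table values, for illustration only.
table = [
    {"city": "Oslo", "population": "709000"},
    {"city": "Bergen", "population": "291000"},
]
references = [
    CellReference(0, "population", "709000"),  # matches the table
    CellReference(1, "population", "219000"),  # digits transposed: a DRE
]
errors = find_dres(table, references)
```

Here `errors` contains only the second reference. A learned critic would be needed for cases that exact matching cannot handle, such as paraphrased values or omitted rows.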

## Main Text

While large language models (LLMs) perform well on table tasks, they still make data referencing errors (DREs), i.e., incorrectly citing or omitting table values, despite understanding the table structure. Beyond final-answer accuracy, DREs directly compromise the correctness and reliability of intermediate reasoning steps. Yet prior studies have offered only limited, small-scale analyses. In this work, we present the first systematic evaluation of tabular data referencing errors across different models and tasks. Our results show that DREs occur across all tested models (1.7B to 20B parameters). Furthermore, we demonstrate that incorporating data referencing as a critic significantly improves answer accuracy, by up to 12.0%, through critic-based filtering and rejection sampling. Finally, we train a lightweight 4B-parameter critic model that achieves an average F1 score of 78.2% in detecting both in-distribution and out-of-distribution DREs, and effectively assists inference for larger models.
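The critic-based filtering described above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: `filter_and_vote` and `toy_critic` are hypothetical names, the critic is assumed to be any callable that returns `True` when it detects a DRE in a reasoning trace, and falling back to a plain majority vote when every candidate is rejected is a design choice made here rather than something the abstract specifies.

```python
from collections import Counter
from typing import Callable, Optional, Sequence, Tuple

# A sampled candidate: (reasoning trace, final answer).
Candidate = Tuple[str, str]


def filter_and_vote(
    candidates: Sequence[Candidate],
    critic: Callable[[str], bool],
) -> Optional[str]:
    """Drop candidates whose reasoning the critic flags, then majority-vote.

    `critic(reasoning)` returns True when it detects a data referencing error.
    If every candidate is rejected, fall back to voting over all of them
    (an assumption of this sketch).
    """
    accepted = [answer for reasoning, answer in candidates if not critic(reasoning)]
    pool = accepted or [answer for _, answer in candidates]
    if not pool:
        return None
    return Counter(pool).most_common(1)[0][0]


def toy_critic(reasoning: str) -> bool:
    """Stand-in critic: flags traces that cite a value known to be wrong."""
    return "219000" in reasoning


candidates = [
    ("Bergen has 219000 residents, so the answer is A.", "A"),
    ("Bergen has 219000 residents, so the answer is A.", "A"),
    ("Bergen has 291000 residents, so the answer is B.", "B"),
]
selected = filter_and_vote(candidates, toy_critic)
```

In this toy case a plain majority vote would return `"A"`, because two of the three samples share the same misread value, whereas filtering with the critic first leaves only the candidate that cites the table correctly, so `selected` is `"B"`. In practice the stand-in critic would be replaced by a trained model such as the 4B-parameter critic the abstract describes.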