Mitigating Perceptual Judgment Bias in Multimodal Large Language Model Evaluation via Perceptual Perturbation and Reward Modeling
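Source: arxiv.org
When multimodal large language models act as evaluators and the visual evidence conflicts with textual cues, they tend to reward answers that sound plausible but are perceptually wrong, a failure mode called perceptual judgment bias. This paper builds a perceptually perturbed judgment dataset in which minimally edited counterfactual responses isolate perceptual errors and provide verifiable supervision. It then proposes a unified training framework that combines a GRPO-based structured reward with a batch-ranking objective, achieving globally consistent rankings without explicit pairwise labels. Experiments show that the method significantly improves the perceptual fidelity, ranking consistency, and human alignment of the resulting evaluations.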
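The idea of a minimally edited counterfactual can be made concrete with a small sketch. The code below assumes the simplest possible perturbation: swapping one perceptual attribute word (for example a color) in an otherwise correct response, so the original and the counterfactual differ in exactly one visual fact and each carries a known correctness label. The names `JudgeExample`, `make_counterfactual`, `token_edit_count`, `build_pair`, and `verifiable_reward` are hypothetical illustrations, not the paper's actual pipeline, which the abstract does not describe at this level of detail.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class JudgeExample:
    question: str
    response: str
    is_perceptually_correct: bool  # verifiable label attached at construction time


def make_counterfactual(response: str, true_attr: str, wrong_attr: str) -> str:
    """Hypothetical minimal edit: swap the first whole-word occurrence of an attribute."""
    pattern = rf"\b{re.escape(true_attr)}\b"
    if re.search(pattern, response) is None:
        raise ValueError(f"attribute {true_attr!r} not found in response")
    # A callable replacement avoids treating backslashes in wrong_attr as escapes.
    return re.sub(pattern, lambda _m: wrong_attr, response, count=1)


def token_edit_count(a: str, b: str) -> int:
    """Count whitespace-token positions that differ (requires equal token counts)."""
    ta, tb = a.split(), b.split()
    if len(ta) != len(tb):
        raise ValueError("responses differ in token count; edit is not a single swap")
    return sum(x != y for x, y in zip(ta, tb))


def build_pair(question: str, response: str, true_attr: str, wrong_attr: str):
    """Return (original, counterfactual) examples that differ in one perceptual fact."""
    perturbed = make_counterfactual(response, true_attr, wrong_attr)
    return (
        JudgeExample(question, response, True),
        JudgeExample(question, perturbed, False),
    )


def verifiable_reward(judge_says_correct: bool, example: JudgeExample) -> float:
    """Binary reward: 1.0 when the judge's verdict matches the known label."""
    return 1.0 if judge_says_correct == example.is_perceptually_correct else 0.0


if __name__ == "__main__":
    q = "What color is the car on the left?"
    r = "The car on the left is red, parked next to a bicycle."
    original, counterfactual = build_pair(q, r, "red", "blue")
    print(counterfactual.response)
    print(token_edit_count(original.response, counterfactual.response))
```

Because the label is fixed when the pair is built, a judge that rates the counterfactual as correct is measurably wrong, which is what makes the supervision verifiable rather than dependent on another model's opinion.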
Recent multimodal large language models (MLLMs) have demonstrated strong reasoning ability, yet their reliability as automated evaluators remains limited by a critical weakness: when visual evidence conflicts with textual cues, MLLM judges tend to reward plausible narratives over perceptually correct answers. We identify and systematically analyze this phenomenon, which we term Perceptual Judgment Bias. Using controlled visual perturbations, we show that existing multimodal judges frequently anchor on the response text rather than on their own visual perception, leading to inconsistent and non-verifiable evaluations. To address this issue, we introduce the Perceptually Perturbed Judgment Dataset, built from minimally edited counterfactual responses that isolate perceptual errors and enable verifiable supervision. Building on this dataset, we develop a unified training framework that combines a structured reward for GRPO (Group Relative Policy Optimization) training with a batch-ranking objective, achieving coherent global ordering without explicit pairwise labels. Experiments across diverse MLLM-as-a-Judge benchmarks show that our approach substantially improves perceptual fidelity, ranking coherence, and alignment with human evaluation. Our results establish a scalable and generalizable pathway for training multimodal judges that are perceptually grounded, interpretable, and robust to visual-reasoning conflicts.
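The two training signals can be illustrated with a rough sketch. The group-relative advantage below follows the standard GRPO recipe: several judgments are sampled for the same input, and each reward is standardized against the group's mean and standard deviation. Everything else is an assumption, since the abstract does not specify it: the hypothetical `structured_reward` (a `Score: N` output format, a 0.2 format term, and a 0.8 accuracy term) and the ranking loss, for which a ListNet-style listwise cross-entropy is swapped in. Its targets come from hypothetical per-response error counts, so it needs no pairwise preference labels, but it is not claimed to be the paper's batch-ranking objective.

```python
import math
import re
from statistics import mean, pstdev

# Hypothetical output format for a judge's verdict, e.g. "Score: 7".
SCORE_PATTERN = re.compile(r"Score:\s*(\d+(?:\.\d+)?)")


def structured_reward(output: str, target_score: float, max_score: float = 10.0) -> float:
    """Hypothetical structured reward: a format term plus an accuracy term."""
    match = SCORE_PATTERN.search(output)
    if match is None:
        return 0.0  # judgments that cannot be parsed earn no reward
    predicted = float(match.group(1))
    format_term = 0.2
    accuracy_term = 0.8 * max(0.0, 1.0 - abs(predicted - target_score) / max_score)
    return format_term + accuracy_term


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: standardize each reward within its sampled group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_ranking_loss(pred_scores: list[float], error_counts: list[int],
                          temperature: float = 1.0) -> float:
    """ListNet-style top-1 cross-entropy over a batch of responses.

    The target distribution is built from per-response error counts (fewer
    injected perceptual errors -> more probability mass), so no pairwise
    preference labels are needed.
    """
    target = softmax([-c / temperature for c in error_counts])
    pred = softmax(pred_scores)
    return -sum(t * math.log(p) for t, p in zip(target, pred))


if __name__ == "__main__":
    outputs = ["Score: 9", "Score: 6", "The answer looks fine.", "Score: 3"]
    rewards = [structured_reward(o, target_score=9.0) for o in outputs]
    print(group_relative_advantages(rewards))
    # Judge scores for three responses carrying 0, 1 and 2 injected errors.
    print(listwise_ranking_loss([3.0, 2.0, 1.0], [0, 1, 2]))  # consistent ordering
    print(listwise_ranking_loss([1.0, 2.0, 3.0], [0, 1, 2]))  # reversed, higher loss
```

In a full training loop the advantages would weight the policy-gradient term for each sampled judgment, and a ranking term like this one could be added with some weighting coefficient. A listwise loss compares a whole batch at once, which is one way to obtain a coherent global ordering without enumerating every pair.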