# OmniVerifier-M1: A Multimodal Meta-Verifier with Explicit Structured Recalibration

- Source: HuggingFace Daily Papers (trending community papers)
- Published: 2026-05-27 08:00
- AIHOT score: 52
- AIHOT link: https://aihot.virxact.com/items/cmpozt6xb0aluslv4zu1618os
- Original paper: https://arxiv.org/abs/2605.28805

## AI Summary

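OmniVerifier-M1 is a multimodal verifier trained with symbolic meta-verification and decoupled reinforcement learning. The study finds that using symbolic verifier outputs (such as bounding boxes) as meta-verification rationales works better than textual explanations and makes rule-based reinforcement learning rewards straightforward. It also finds that decoupling the reinforcement learning objectives for binary judgment and meta-verification substantially outperforms optimizing them jointly. On this basis, OmniVerifier-M1 achieves robust visual verification and fine-grained error localization, and it powers M1-TTS, a generation system that uses the verifier to perform dynamic region-level self-correction.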
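
The summary credits bounding-box rationales with enabling rule-based rewards, but the source does not spell out the reward function. As a minimal sketch, assuming the verifier emits error regions as `(x1, y1, x2, y2)` boxes and that annotated ground-truth error regions are available, one common rule-based choice is to match predicted and gold boxes by intersection-over-union (IoU) and score the match with F1. The names `box_iou` and `meta_verification_reward` and the 0.5 IoU threshold are hypothetical, not taken from the paper.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixel coordinates, x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def meta_verification_reward(
    pred: List[Box], gold: List[Box], iou_thresh: float = 0.5
) -> float:
    """Hypothetical rule-based reward: F1 of predicted error boxes vs. gold error regions.

    Matching is greedy (highest IoU first) and one-to-one.
    """
    if not pred and not gold:
        return 1.0  # nothing to localize and nothing predicted
    if not pred or not gold:
        return 0.0
    # Every (prediction, gold) pair, best overlaps first.
    pairs = sorted(
        ((box_iou(p, g), i, j) for i, p in enumerate(pred) for j, g in enumerate(gold)),
        reverse=True,
    )
    used_p, used_g = set(), set()
    for iou, i, j in pairs:
        if iou < iou_thresh:
            break  # remaining pairs overlap too little to count
        if i not in used_p and j not in used_g:
            used_p.add(i)
            used_g.add(j)
    tp = len(used_p)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```

For example, `meta_verification_reward([(10, 10, 50, 50)], [(12, 12, 50, 50)])` returns `1.0`, because the single prediction overlaps the single gold region with IoU ≈ 0.90. Because the score comes from a fixed rule, no auxiliary judge model is needed to grade the rationale, which is the advantage the source attributes to symbolic outputs.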

## Main Text

Visual outcomes are increasingly central to multimodal large language models, making reliable and fine-grained verification essential for scaling generalist foundation models. In this work, we investigate multimodal meta-verification, which leverages verifier-generated rationales rather than decision-only signals, and explore how to effectively incorporate meta-verification feedback into multimodal verifier training.

We identify two key findings. First, symbolic verifier outputs (e.g., bounding boxes) outperform textual explanations as meta-verification rationales, enabling efficient rule-based reinforcement learning rewards while avoiding reliance on model-based rewards from auxiliary judge models. Second, decoupling reinforcement learning objectives for binary judgment and meta-verification substantially outperforms joint reward optimization, due to intrinsic differences in output structure and learning dynamics.

Based on these insights, we train OmniVerifier-M1, a generalist visual verifier leveraging symbolic meta-verification and decoupled reinforcement learning. OmniVerifier-M1 provides robust verification and fine-grained error localization, and further enables M1-TTS, a verifier-driven agentic generation system achieving dynamic region-level self-correction. This approach paves the way for more reliable, interpretable, and fine-grained multimodal verification, supporting safer and more controllable foundation model deployment.
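
The abstract reports that decoupling the RL objectives for binary judgment and meta-verification beats joint reward optimization, but it does not give the formulation. The sketch below contrasts the two under stated assumptions: a GRPO-style group-normalized advantage (an assumed choice, not named in the source), fixed-length rollouts made of a verdict segment followed by a bounding-box segment, and hypothetical helper names `group_normalize`, `joint_token_advantages`, and `decoupled_token_advantages`.

```python
from statistics import mean, pstdev
from typing import List


def group_normalize(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Standardize rewards within one group of rollouts (GRPO-style advantage)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def joint_token_advantages(
    judge_r: List[float], meta_r: List[float], judge_len: int, meta_len: int
) -> List[List[float]]:
    """Joint optimization: add the two rewards, normalize once, and give every
    token of a rollout (verdict tokens and box tokens alike) the same advantage."""
    adv = group_normalize([j + m for j, m in zip(judge_r, meta_r)])
    return [[a] * (judge_len + meta_len) for a in adv]


def decoupled_token_advantages(
    judge_r: List[float], meta_r: List[float], judge_len: int, meta_len: int
) -> List[List[float]]:
    """Decoupled optimization (hypothetical layout): normalize each reward on its
    own, then credit verdict tokens with the judgment advantage and box tokens
    with the meta-verification advantage."""
    adv_j = group_normalize(judge_r)
    adv_m = group_normalize(meta_r)
    return [[aj] * judge_len + [am] * meta_len for aj, am in zip(adv_j, adv_m)]


# A group of 4 rollouts: binary verdict correctness and box-quality rewards.
judge_rewards = [1.0, 1.0, 0.0, 0.0]
meta_rewards = [0.9, 0.1, 0.0, 0.0]
joint = joint_token_advantages(judge_rewards, meta_rewards, judge_len=2, meta_len=3)
decoupled = decoupled_token_advantages(judge_rewards, meta_rewards, judge_len=2, meta_len=3)
```

Rollouts 0 and 1 both reach the correct verdict. Under the joint scheme their verdict tokens receive different credit because their boxes differ in quality; under the decoupled scheme the verdict tokens get identical credit and the box-quality difference lands only on the box tokens. This is one plausible illustration of the abstract's point about differing output structure and learning dynamics, not the paper's confirmed mechanism.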
