# Visual Reasoning through Tool-supervised Reinforcement Learning

- Source: HuggingFace Daily Papers (trending community papers)
- Published: 2026-04-21 08:00
- AIHOT link: https://aihot.virxact.com/items/cmob5e9hb061msl1y8yy18sk2
- Original paper: https://arxiv.org/abs/2604.19945

## AI Summary

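The research team proposes the ToolsRL framework, which uses tool-supervised reinforcement learning to improve the visual reasoning ability of multimodal large language models. The framework follows a curriculum: the first stage trains basic tool operations (zoom-in, rotate, flip, and drawing points and lines) with tool-specific rewards, and the second stage brings in accuracy rewards for end-to-end optimization. Separating the stages avoids optimization conflicts between heterogeneous tasks, so the model masters tool calling before applying it to complex visual reasoning. Experiments show that the method efficiently acquires interpretable visual tool-use skills and significantly improves performance on complex visual reasoning tasks.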

## Main Text

In this paper, we investigate how Multimodal Large Language Models can effectively master tool use to solve complex visual reasoning tasks. To that end, we propose a novel Tool-supervised Reinforcement Learning (ToolsRL) framework, which uses direct tool supervision for more effective tool-use learning. We focus on a set of simple, native, and interpretable visual tools, including zoom-in, rotate, flip, and draw point/line, for which tool supervision is easy to collect. We develop a reinforcement learning curriculum in which the first stage is optimized solely by a set of well-motivated tool-specific rewards, and the second stage is trained with accuracy-targeted rewards while still allowing tool calls. In this way, the model masters tool calling before it uses tools to complete visual reasoning tasks, which avoids potential optimization conflicts among these heterogeneous tasks. Our experiments show that the tool-supervised curriculum training is efficient and that ToolsRL achieves strong tool-use capabilities on complex visual reasoning tasks.
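
The two-stage curriculum described above can be illustrated with a minimal sketch of how the reward signal might switch between stages. This is not the authors' implementation: the abstract does not specify the reward functions, so the `Rollout` structure, the partial-credit scheme in `tool_reward`, the exact-match rule in `accuracy_reward`, and all argument names are hypothetical choices made only for illustration. Only the tool names and the stage ordering come from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Tool names follow the abstract; the identifier spelling is an assumption.
KNOWN_TOOLS = {"zoom_in", "rotate", "flip", "draw_point", "draw_line"}


@dataclass
class Rollout:
    """Hypothetical record of one model response."""
    tool_name: Optional[str]   # tool the model called, if any
    tool_args: Optional[dict]  # arguments passed to that tool
    answer: Optional[str]      # final answer text, if any


def tool_reward(rollout: Rollout, target_tool: str, target_args: dict) -> float:
    """Stage 1 (assumed form): reward the tool call itself.

    0.5 for calling the supervised tool, 1.0 if the arguments also match.
    """
    if rollout.tool_name not in KNOWN_TOOLS or rollout.tool_name != target_tool:
        return 0.0
    return 1.0 if rollout.tool_args == target_args else 0.5


def accuracy_reward(rollout: Rollout, gold_answer: str) -> float:
    """Stage 2 (assumed form): reward only the correctness of the final answer."""
    if rollout.answer is None:
        return 0.0
    same = rollout.answer.strip().lower() == gold_answer.strip().lower()
    return 1.0 if same else 0.0


def curriculum_reward(stage: int, rollout: Rollout, *, target_tool: str = "",
                      target_args: Optional[dict] = None,
                      gold_answer: str = "") -> float:
    """Select the reward for the current curriculum stage."""
    if stage == 1:
        return tool_reward(rollout, target_tool, target_args or {})
    if stage == 2:
        # Tool calls are still allowed here but are not scored directly.
        return accuracy_reward(rollout, gold_answer)
    raise ValueError(f"unknown curriculum stage: {stage}")


if __name__ == "__main__":
    r = Rollout("rotate", {"degrees": 90}, "A cat")
    print(curriculum_reward(1, r, target_tool="rotate", target_args={"degrees": 90}))
    print(curriculum_reward(2, r, gold_answer="a cat"))
```

Keeping the two rewards in separate functions mirrors the separation the abstract argues for: the stage 1 signal depends only on the tool call, and the stage 2 signal depends only on the final answer, so neither objective competes with the other within a stage.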